Remove Old Pipelines From GitLab Backup
Once I switched to the SKIP=tar style of backup for GitLab, I noticed how
much space the artifacts, and consequently the pipelines, were taking:
significantly more than the actual code and registry.
Current state
In a backup created with SKIP=tar, this is my current layout before any
optimization:
root@gitlab-server:/data/backups# l
total 31G
drwxrwxrwt 4 root            root              15 Aug 20 00:04 .
drwxr-xr-x 4 root            root               4 Aug 19 21:35 ..
-rw------- 1 systemd-network systemd-network  20G Aug 19 23:53 artifacts.tar.gz
-rw-r--r-- 1 systemd-network systemd-network  509 Aug 20 00:04 backup_information.yml
-rw------- 1 systemd-network systemd-network  16M Aug 19 22:36 builds.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 20 00:04 ci_secure_files.tar.gz
drwxr-xr-x 2 systemd-network systemd-network    3 Aug 19 22:17 db
-rw------- 1 systemd-network systemd-network  146 Aug 20 00:04 external_diffs.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 19 23:55 lfs.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 20 00:04 packages.tar.gz
-rw------- 1 systemd-network systemd-network 1.8G Aug 19 23:55 pages.tar.gz
-rw------- 1 systemd-network systemd-network 8.8G Aug 20 00:04 registry.tar.gz
drwx------ 4 systemd-network systemd-network    4 Aug 19 22:17 repositories
-rw------- 1 systemd-network systemd-network 147K Aug 19 23:55 terraform_state.tar.gz
-rw------- 1 systemd-network systemd-network  37M Aug 19 22:36 uploads.tar.gz
Analyzing the Data Using the Rails Console
In my case, I am running GitLab in a Docker container. In this scenario, to open the Rails console, run:
docker exec -it gitlab gitlab-rails console
In my analysis, I plan to check how much disk space pipelines older than 6 months are consuming using the following Ruby snippet:
cutoff_date = 6.months.ago

total_pipelines = 0
total_jobs = 0
total_artifacts = 0
total_artifact_size = 0

old_pipelines = 0
old_jobs = 0
old_artifacts = 0
old_artifact_size = 0

pipelines_to_delete = []

Project.all.pluck(:id).each do |project_id|
  pipelines_in_project = Ci::Pipeline.where(project_id: project_id).order(created_at: :desc)
  next unless pipelines_in_project.any?

  most_recent_pipeline = pipelines_in_project.first
  pipelines_to_delete_in_project = []

  if most_recent_pipeline.created_at > cutoff_date
    pipelines_to_delete_in_project = pipelines_in_project.where("created_at < ?", cutoff_date)
  else
    pipelines_to_delete_in_project = pipelines_in_project.where.not(id: most_recent_pipeline.id)
  end

  pipelines_to_delete += pipelines_to_delete_in_project

  pipelines_in_project.each do |pipeline|
    total_pipelines += 1
    is_old_pipeline = pipelines_to_delete_in_project.where(id: pipeline.id).any?

    # Get all associated jobs
    pipeline.jobs_in_self_and_project_descendants.each do |job|
      total_jobs += 1
      old_jobs += 1 if is_old_pipeline

      # Get all associated artifacts
      job.job_artifacts.each do |artifact|
        total_artifacts += 1
        total_artifact_size += artifact.file.size rescue 0

        if is_old_pipeline
          old_artifacts += 1
          old_artifact_size += artifact.file.size rescue 0
        end
      end
    end
  end

  old_pipelines += pipelines_to_delete_in_project.count
end

puts "-----------------------------------------"
puts " GitLab CI/CD Data Analysis Report "
puts "-----------------------------------------"
puts "Total Pipeline Data:"
puts " Pipelines: #{total_pipelines}"
puts " Jobs: #{total_jobs}"
puts " Artifacts: #{total_artifacts}"
puts " Artifact Space: #{(total_artifact_size.to_f / 1.gigabyte).round(2)} GB"
puts
puts "Data to be Cleaned Up:"
puts " Pipelines: #{old_pipelines}"
puts " Jobs: #{old_jobs}"
puts " Artifacts: #{old_artifacts}"
puts " Artifact Space: #{(old_artifact_size.to_f / 1.gigabyte).round(2)} GB"
puts "-----------------------------------------"
-----------------------------------------
 GitLab CI/CD Data Analysis Report
-----------------------------------------
Total Pipeline Data:
 Pipelines: 15419
 Jobs: 60590
 Artifacts: 51015
 Artifact Space: 90.4 GB

Data to be Cleaned Up:
 Pipelines: 14837
 Jobs: 55806
 Artifacts: 46116
 Artifact Space: 71.55 GB
-----------------------------------------
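The per-project rule used by the analysis (delete everything older than the cutoff, but always keep a project's newest pipeline, even in a dormant project) is easy to get wrong, so it can help to check it in isolation first. The following is a minimal plain-Ruby sketch, not GitLab code: the Pipeline struct, the select_for_deletion helper, and the sample dates are all hypothetical stand-ins that only mirror the branching in the script above.

```ruby
# Hypothetical stand-in for a pipeline record; this is not a GitLab class.
Pipeline = Struct.new(:id, :project_id, :created_at)

# Returns the pipelines the retention rule would delete:
# - active project (newest pipeline is newer than the cutoff):
#   everything older than the cutoff;
# - dormant project (newest pipeline is itself older than the cutoff):
#   everything except that newest pipeline.
def select_for_deletion(pipelines, cutoff)
  pipelines.group_by(&:project_id).flat_map do |_project_id, group|
    newest = group.max_by(&:created_at)
    if newest.created_at > cutoff
      group.select { |p| p.created_at < cutoff }
    else
      group.reject { |p| p.equal?(newest) }
    end
  end
end

cutoff = Time.utc(2024, 2, 20) # illustrative cutoff date
pipelines = [
  Pipeline.new(1, 10, Time.utc(2023, 1, 5)), # old, project 10 is active
  Pipeline.new(2, 10, Time.utc(2024, 8, 1)), # recent, kept
  Pipeline.new(3, 20, Time.utc(2022, 6, 1)), # old, project 20 is dormant
  Pipeline.new(4, 20, Time.utc(2023, 3, 1))  # old, but newest in project 20
]

doomed_ids = select_for_deletion(pipelines, cutoff).map(&:id).sort
puts doomed_ids.inspect
```

Pipeline 4 survives even though it is older than the cutoff, because it is the only record left of project 20's last run.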
Cleanup
Still in the same Rails console, destroy the pipelines collected in pipelines_to_delete by the analysis:
pipelines_to_delete.each do |pipeline|
  puts "Deleting pipeline #{pipeline.id}..."
  pipeline.destroy
end
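With thousands of pipelines queued for deletion, a single destroy call that raises would abort the whole loop partway through. One possible variation, shown here as a plain-Ruby sketch under assumptions, collects failures and offers a dry run. FakePipeline and destroy_all are hypothetical names introduced only for illustration; in the console the same loop body would receive the real pipeline objects instead.

```ruby
# Hypothetical stand-in for a pipeline record; only #id and #destroy matter here.
class FakePipeline
  attr_reader :id

  def initialize(id, fail_on_destroy: false)
    @id = id
    @fail_on_destroy = fail_on_destroy
  end

  def destroy
    raise "could not destroy pipeline #{id}" if @fail_on_destroy
    self
  end
end

# Destroys each pipeline in turn, recording failures instead of aborting.
# With dry_run: true nothing is destroyed; the ids are only printed.
def destroy_all(pipelines, dry_run: false)
  result = { destroyed: [], failed: [] }
  pipelines.each do |pipeline|
    if dry_run
      puts "Would delete pipeline #{pipeline.id}"
      next
    end
    begin
      pipeline.destroy
      result[:destroyed] << pipeline.id
    rescue StandardError => e
      puts "Skipping pipeline #{pipeline.id}: #{e.message}"
      result[:failed] << pipeline.id
    end
  end
  result
end

batch = [
  FakePipeline.new(1),
  FakePipeline.new(2, fail_on_destroy: true), # simulated failure
  FakePipeline.new(3)
]
summary = destroy_all(batch)
puts "Destroyed: #{summary[:destroyed].inspect}, failed: #{summary[:failed].inspect}"
```

Running with dry_run: true first gives a last chance to review the ids before anything is removed, and the failed list shows which pipelines need a second look afterwards.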
Verification by Running the Analysis Again
-----------------------------------------
 GitLab CI/CD Data Analysis Report
-----------------------------------------
Total Pipeline Data:
 Pipelines: 576
 Jobs: 4719
 Artifacts: 4862
 Artifact Space: 18.48 GB

Data to be Cleaned Up:
 Pipelines: 0
 Jobs: 0
 Artifacts: 0
 Artifact Space: 0.0 GB
-----------------------------------------
Result
I then regenerated the backup with:
docker exec -it gitlab gitlab-rake gitlab:backup:create SKIP=tar
The backup folder now looks like this:
root@gitlab-server:/data/backups# l
total 14G
drwx------ 4 systemd-network root              15 Aug 22 19:38 .
drwxr-xr-x 4 root            root               4 Aug 19 21:35 ..
-rw------- 1 systemd-network systemd-network 2.9G Aug 22 19:31 artifacts.tar.gz
-rw-r--r-- 1 systemd-network systemd-network  509 Aug 22 19:38 backup_information.yml
-rw------- 1 systemd-network systemd-network  14M Aug 22 19:28 builds.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 22 19:38 ci_secure_files.tar.gz
drwxr-xr-x 2 systemd-network systemd-network    3 Aug 22 19:25 db
-rw------- 1 systemd-network systemd-network  146 Aug 22 19:38 external_diffs.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 22 19:33 lfs.tar.gz
-rw------- 1 systemd-network systemd-network  146 Aug 22 19:38 packages.tar.gz
-rw------- 1 systemd-network systemd-network 1.8G Aug 22 19:33 pages.tar.gz
-rw------- 1 systemd-network systemd-network 8.8G Aug 22 19:38 registry.tar.gz
drwx------ 4 systemd-network systemd-network    4 Aug 22 19:25 repositories
-rw------- 1 systemd-network systemd-network 147K Aug 22 19:33 terraform_state.tar.gz
-rw------- 1 systemd-network systemd-network  37M Aug 22 19:28 uploads.tar.gz
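To put a number on the difference between the two listings, the human-readable sizes can be converted back to bytes and compared. This is a rough sketch: parse_size and reduction_percent are hypothetical helpers, the suffixes are assumed to be binary units (as ls -h prints them), and because the listed sizes are already rounded the percentages are only approximate.

```ruby
# Multipliers for the suffixes seen in the listings (assumed to be binary units).
UNITS = { "K" => 1024, "M" => 1024**2, "G" => 1024**3 }.freeze

# Converts a human-readable size such as "2.9G" or "509" into bytes.
def parse_size(text)
  match = text.match(/\A(\d+(?:\.\d+)?)([KMG])?\z/)
  raise ArgumentError, "unrecognized size: #{text}" unless match

  (match[1].to_f * UNITS.fetch(match[2], 1)).round
end

# Percentage by which `after` is smaller than `before`, to one decimal place.
def reduction_percent(before, after)
  ((1 - parse_size(after).to_f / parse_size(before)) * 100).round(1)
end

# Sizes copied from the two listings (before and after the cleanup).
puts "artifacts.tar.gz: #{reduction_percent('20G', '2.9G')}% smaller"
puts "whole backup:     #{reduction_percent('31G', '14G')}% smaller"
```

By this estimate the artifacts archive shrank by roughly 85% and the backup as a whole by roughly 55%; registry.tar.gz and pages.tar.gz are unchanged and now account for most of what remains.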