git-backup
git-backup is a command-line tool for backing up your Git repositories to Amazon S3 or any S3-compatible storage.
Why Choose git-backup?
- Simple Setup: Get started quickly with minimal configuration.
- Durable Backups: Creates .tar.gz archives, ensuring you can restore your data without needing specific tools.
- Automated Cleanup: Includes a built-in prune command to delete old snapshots based on your retention policy.
Installation
Using NPM:
$ npm install @larose/git-backup
Using Yarn:
$ yarn add @larose/git-backup
Creating a Snapshot
Use the snapshot command to create a compressed archive of your Git repository and upload it to your S3-compatible storage. The snapshot command works by executing git clone --mirror <repo>, which captures all commits, tags, and branches. It then compresses the clone into a .tar.gz file and uploads it to S3.
$ git-backup snapshot \
--repo $REPO \
--remote $REMOTE \
--access-key-id $ACCESS_KEY_ID \
--secret-access-key $SECRET_ACCESS_KEY
Arguments:
- --repo: The URL of the Git repository you want to back up.
- --remote: The URL of the remote storage location where the snapshot will be stored.
- --access-key-id: Your access key ID for the S3-compatible storage.
- --secret-access-key: Your secret access key for the S3-compatible storage.
Example:
$ git-backup snapshot \
--repo git@github.com:larose/utt.git \
--remote https://1234.r2.cloudflarestorage.com/bucket-name/path/in/your/bucket \
--access-key-id AKIAIOSFODNN7EXAMPLE \
--secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
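The clone-and-compress steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the tool's actual implementation: the function names are hypothetical, the archive layout (one top-level directory named after the clone) is a guess, and the upload step is left out because it depends on your S3 client.

```python
import subprocess
import tarfile
import tempfile
from pathlib import Path


def mirror_clone(repo_url: str, dest: Path) -> None:
    # Same command the section names: a mirror clone is a bare copy with all refs.
    subprocess.run(["git", "clone", "--mirror", repo_url, str(dest)], check=True)


def compress(directory: Path, archive: Path) -> None:
    # Store everything under the directory's own name (an assumed layout),
    # so that extracting the archive yields a single folder.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(directory, arcname=directory.name)


def snapshot_locally(repo_url: str, name: str, out_dir: Path) -> Path:
    """Clone and compress a repository; uploading the result is not shown."""
    archive = out_dir / f"{name}.tar.gz"
    with tempfile.TemporaryDirectory() as tmp:
        clone_dir = Path(tmp) / name
        mirror_clone(repo_url, clone_dir)
        compress(clone_dir, archive)
    return archive
```

Calling snapshot_locally("git@github.com:larose/utt.git", "larose-utt", Path(".")) would leave larose-utt.tar.gz in the current directory, ready to be uploaded with whichever S3 tool you prefer.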
Pruning Old Snapshots
The prune command helps you manage storage space by deleting old snapshots based on a defined retention policy.
$ git-backup prune \
--repo $REPO \
--remote $REMOTE \
--retention-policy $RETENTION_POLICY \
--access-key-id $ACCESS_KEY_ID \
--secret-access-key $SECRET_ACCESS_KEY
Arguments:
- --repo: The URL of the Git repository whose snapshots you want to prune.
- --remote: The URL of the remote storage location where the snapshots are stored.
- --retention-policy: Defines how many snapshots to keep for different durations. Format: daily=<number>,weekly=<number>,monthly=<number>. See below for more details on the retention policy.
- --access-key-id: Your access key ID for the S3-compatible storage.
- --secret-access-key: Your secret access key for the S3-compatible storage.
Example:
$ git-backup prune \
--repo git@github.com:larose/utt.git \
--remote https://1234.r2.cloudflarestorage.com/bucket/base/path \
--retention-policy "daily=7, weekly=4, monthly=3" \
--access-key-id AKIAIOSFODNN7EXAMPLE \
--secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Retention Policy
The prune command uses a retention policy (--retention-policy) to manage how many snapshots are kept for different durations. This ensures you have enough snapshots for recovery while optimizing storage usage.
Format: daily=D, weekly=W, monthly=M
- D: The number of most recent daily snapshots to retain. Days start at UTC midnight.
- W: The number of most recent weekly snapshots to retain. Weeks start on Monday.
- M: The number of most recent monthly snapshots to retain. Months start on the first day of the month.
If a scheduled backup fails or is skipped, it doesn't count towards its retention window. This ensures you always have at least the intended number of successful snapshots available for each period. This is particularly helpful to avoid situations where a string of failed backups could lead to the deletion of all your snapshots for a specific timeframe.
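To make the period boundaries concrete, here is a minimal Python sketch of how a policy string and the day, week, and month buckets could be computed. The function names are hypothetical, and treating an omitted period as zero is an assumption; the tool's real parser may behave differently.

```python
from datetime import date, datetime, timedelta, timezone


def parse_policy(text: str) -> dict[str, int]:
    """Parse 'daily=7, weekly=4, monthly=3' into a dict of counts."""
    # Assumption: a period missing from the string keeps zero snapshots.
    policy = {"daily": 0, "weekly": 0, "monthly": 0}
    for part in text.split(","):
        name, _, value = part.strip().partition("=")
        if name not in policy:
            raise ValueError(f"unknown retention period: {name!r}")
        policy[name] = int(value)
    return policy


def period_start(taken_at: datetime, period: str) -> date:
    """Return the first UTC day of the period containing the snapshot.

    Expects a timezone-aware datetime.
    """
    day = taken_at.astimezone(timezone.utc).date()
    if period == "daily":
        return day  # days start at UTC midnight
    if period == "weekly":
        return day - timedelta(days=day.weekday())  # Monday is weekday 0
    if period == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown retention period: {period!r}")
```

Two snapshots belong to the same retention bucket when period_start returns the same date for both. A period with no snapshot never produces a bucket, which is one way to read the rule that failed or skipped backups do not count towards the window.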
Retention Policy Example
This example demonstrates how prune works with a policy to retain only the four most recent daily snapshots (daily=4, weekly=0, monthly=0).
Snapshots taken:
- May 28 (midnight)
- May 26 (at various times) - We have multiple snapshots for May 26th
- May 25 (midnight)
- May 24 (midnight)
- May 23 (midnight)
The table below shows which snapshots are retained and why:
Snapshots | Status | Explanation |
---|---|---|
May 28 at midnight | ✅ Retained | Most recent daily snapshot |
May 26 at 11pm | ✅ Retained | Second most recent daily snapshot (keeps the latest for each day) |
May 26 at 8am | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day) |
May 26 at midnight | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day) |
May 25 at midnight | ✅ Retained | Third most recent daily snapshot |
May 24 at midnight | ✅ Retained | Fourth most recent daily snapshot, reaches the retention limit of 4 daily snapshots |
May 23 at midnight | ❌ Deleted | Exceeds the retention window (policy keeps only the 4 most recent daily snapshots) |
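The table can be reproduced with a short Python sketch of the daily rule: keep the latest snapshot of each day, then keep the most recent days that actually have a snapshot. The function name select_daily and the year 2024 are assumptions made for illustration; this is not the tool's own code.

```python
from datetime import datetime, timezone


def select_daily(snapshots: list[datetime], keep: int) -> set[datetime]:
    """Keep the latest snapshot of each of the `keep` most recent days with a snapshot."""
    latest_per_day: dict = {}
    for taken_at in snapshots:
        day = taken_at.astimezone(timezone.utc).date()
        if day not in latest_per_day or taken_at > latest_per_day[day]:
            latest_per_day[day] = taken_at
    # Days without any snapshot (May 27 here) never appear, so they do not
    # use up a slot in the retention window.
    recent_days = sorted(latest_per_day, reverse=True)[:keep]
    return {latest_per_day[day] for day in recent_days}


def utc(day: int, hour: int = 0) -> datetime:
    # The year 2024 is an assumption; the table gives only month and day.
    return datetime(2024, 5, day, hour, tzinfo=timezone.utc)


snapshots = [utc(28), utc(26, 23), utc(26, 8), utc(26), utc(25), utc(24), utc(23)]
retained = select_daily(snapshots, keep=4)
for taken_at in sorted(snapshots, reverse=True):
    status = "retained" if taken_at in retained else "deleted"
    print(f"{taken_at:%b %d %H:%M}  {status}")
```

Under these assumptions the sketch retains May 28, May 26 at 11pm, May 25, and May 24, and marks the two earlier May 26 snapshots and May 23 for deletion.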
Using git-backup with GitHub Actions
While git-backup is a command-line tool, you can leverage GitHub Actions to automate backups for your Git repositories hosted on GitHub. Here's an example workflow demonstrating how to achieve this:
name: Back up Public Repositories
on:
  schedule:
    - cron: "0 0 1 * *" # Runs at midnight on the first day of every month
  workflow_dispatch:
jobs:
  back-up:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        repo:
          [
            "https://github.com/cicd-excellence/app.git",
            "https://github.com/cicd-excellence/infra.git",
            "https://github.com/larose/cargo.git",
            "https://github.com/larose/conjugueur.git",
            "https://github.com/larose/eef.git",
            "https://github.com/larose/ena.git",
            "https://github.com/larose/git-backup-demo.git",
            "https://github.com/larose/pretty-printer.git",
            "https://github.com/larose/tsp.git",
            "https://github.com/larose/utt.git",
            "https://github.com/larose/verbes.git",
            "https://github.com/larose/yarn-monorepo-change-based-testing-demo.git",
            "https://github.com/larose/wiki.git",
          ]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "lts/*"
      - name: Install @larose/git-backup
        run: npm install -g @larose/git-backup
      - name: Back up ${{ matrix.repo }}
        run: |
          git-backup snapshot \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }}
          git-backup prune \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }} \
            --retention-policy "monthly=3"
Source: https://github.com/larose/git-backup-demo
Note that the Git repository URL uses https instead of ssh because, by default, the SSH key provided in a workflow does not have the permission to clone other Git repositories.
If you want to back up private Git repositories, simply use a personal access token (PAT) as the username in the Git repository URL. Example: git clone https://$GITHUB_PAT@github.com/larose/utt.git.
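If you generate the repository list programmatically, the token can be placed in the URL with a small helper. The sketch below is only an illustration: with_token is a hypothetical name, and it simply puts the token in the username position of an https URL, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def with_token(repo_url: str, token: str) -> str:
    """Return the https URL with the token in the username position."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("token authentication requires an https URL")
    netloc = f"{token}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, with_token("https://github.com/larose/utt.git", token) yields a URL of the same shape as the one shown above. Read the token from a secret or an environment variable rather than hard-coding it, and avoid printing the resulting URL in logs.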
Restoring from a Snapshot
To restore a Git repository from a snapshot created by git-backup, follow these steps:
Step 1: Download the Snapshot
Use the AWS CLI, another S3-compatible tool, or the S3 UI to download the backup snapshot to your local machine.
Step 2: Extract the Snapshot
Use a tool like tar to extract the contents of the downloaded archive. This will create a directory containing the complete mirrored (bare) repository, which is a special type of repository without a working directory.
$ tar -xzf <snapshot-name>.tar.gz
Replace <snapshot-name> with the actual filename of your downloaded snapshot.
Example:
$ tar -xzf larose-utt-20240602T161101Z.tar.gz
Step 3: Clone the Bare Repository as a Regular Repository
The extracted directory contains a bare Git repository, meaning it only holds the Git data (commits, branches, tags) but not your working files.
To convert the bare repository into a regular working directory, use the git clone command, specifying the extracted directory as the source and a new directory for your restored working repository.
$ git clone <extracted_directory_name> my-restored-repo
Replace <extracted_directory_name> with the actual name of the extracted directory and my-restored-repo with your desired name for the restored working directory.
Example:
$ git clone larose-utt my-restored-repo
Your Git repository is now restored and ready to use.
Source Code
Download the source code from this link.