Mathieu Larose

git-backup

git-backup is a command-line tool for backing up your Git repositories to Amazon S3 or any S3-compatible storage.

Why Choose git-backup?

Installation

Using NPM:

$ npm install @larose/git-backup

Using Yarn:

$ yarn add @larose/git-backup

Creating a Snapshot

Use the snapshot command to create a compressed archive of your Git repository and upload it to your S3-compatible storage. The snapshot command works by executing git clone --mirror <repo>, which captures all commits, tags, and branches. It then compresses the clone into a .tar.gz file and uploads it to S3.

$ git-backup snapshot \
  --repo $REPO \
  --remote $REMOTE \
  --access-key-id $ACCESS_KEY_ID \
  --secret-access-key $SECRET_ACCESS_KEY

Arguments:

Example:

$ git-backup snapshot \
  --repo git@github.com:larose/utt.git \
  --remote https://1234.r2.cloudflarestorage.com/bucket-name/path/in/your/bucket \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
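
The steps the snapshot command is described as performing can be approximated by hand. The following is a minimal sketch, not git-backup's actual implementation: the make_snapshot function, its arguments, and the archive naming (modeled on the larose-utt-20240602T161101Z.tar.gz filename used in the restore section) are assumptions, and the upload is shown only as a commented-out AWS CLI command.

```shell
#!/bin/sh
# Sketch only: approximates the clone, compress, upload sequence described above.
# make_snapshot and its arguments are hypothetical names, not part of git-backup.
make_snapshot() (
  repo=$1      # URL or local path of the repository to back up
  name=$2      # directory name stored inside the archive, e.g. larose-utt
  out_dir=$3   # existing directory where the .tar.gz is written
  set -e
  work_dir=$(mktemp -d)
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  # Mirror clone: a bare copy containing all commits, branches, and tags.
  git clone --quiet --mirror "$repo" "$work_dir/$name"
  # Compress the mirror into a single archive.
  tar -czf "$out_dir/$name-$stamp.tar.gz" -C "$work_dir" "$name"
  rm -rf "$work_dir"
  # Print the archive path so the caller can upload it.
  echo "$out_dir/$name-$stamp.tar.gz"
)

# Upload step (assumption: AWS CLI installed, credentials in the environment):
# aws s3 cp "$archive" s3://bucket-name/path/in/your/bucket/ \
#   --endpoint-url https://1234.r2.cloudflarestorage.com
```

A call such as archive=$(make_snapshot git@github.com:larose/utt.git larose-utt /tmp) would leave the archive path in $archive, ready for the upload command.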

Pruning Old Snapshots

The prune command helps you manage storage space by deleting old snapshots based on a defined retention policy.

$ git-backup prune \
  --repo $REPO \
  --remote $REMOTE \
  --retention-policy $RETENTION_POLICY \
  --access-key-id $ACCESS_KEY_ID \
  --secret-access-key $SECRET_ACCESS_KEY

Arguments:

Example:

$ git-backup prune \
  --repo git@github.com:larose/utt.git \
  --remote https://1234.r2.cloudflarestorage.com/bucket/base/path \
  --retention-policy "daily=7, weekly=4, monthly=3" \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Retention Policy

The prune command uses a retention policy (--retention-policy) to decide how many daily, weekly, and monthly snapshots to keep. This keeps enough snapshots for recovery while limiting storage usage.

Format: daily=D, weekly=W, monthly=M, where D, W, and M are the numbers of daily, weekly, and monthly snapshots to keep.
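
To make the format concrete, here is a small sketch that reads one count out of a policy string. It is an illustration only: policy_count is a hypothetical helper, and treating a missing period as 0 is an assumption based on policies such as "monthly=3" appearing elsewhere in this document.

```shell
#!/bin/sh
# Print the count for one period of a retention policy string.
# Hypothetical helper; a period that is not listed is assumed to mean 0.
policy_count() (
  policy=$1   # e.g. "daily=7, weekly=4, monthly=3"
  period=$2   # daily, weekly, or monthly
  count=0
  # Split the policy on commas, strip spaces, then match "period=N".
  IFS=,
  for entry in $policy; do
    entry=$(printf '%s' "$entry" | tr -d ' ')
    case $entry in
      "$period"=*) count=${entry#*=} ;;
    esac
  done
  echo "$count"
)

policy_count "daily=7, weekly=4, monthly=3" weekly   # prints 4
policy_count "monthly=3" daily                        # prints 0
```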

If a scheduled backup fails or is skipped, the missing snapshot does not count towards its retention window: only periods with a successful snapshot are counted. This guarantees that you always keep at least the intended number of successful snapshots for each period, and it prevents a string of failed backups from leading to the deletion of all your snapshots for a given timeframe.

Retention Policy Example

This example demonstrates how prune works with a policy to retain only the four most recent daily snapshots (daily=4, weekly=0, monthly=0).

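Snapshots taken: May 23, 24, and 25 at midnight; May 26 at midnight, 8am, and 11pm; May 28 at midnight. No snapshot was taken on May 27.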

The table below shows which snapshots are retained and why:

Snapshot | Status | Explanation
May 28 at midnight | ✅ Retained | Most recent daily snapshot
May 26 at 11pm | ✅ Retained | Second most recent daily snapshot (keeps the latest for each day)
May 26 at 8am | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day)
May 26 at midnight | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day)
May 25 at midnight | ✅ Retained | Third most recent daily snapshot
May 24 at midnight | ✅ Retained | Fourth most recent daily snapshot, reaches the retention limit of 4 daily snapshots
May 23 at midnight | ❌ Deleted | Exceeds the retention window (policy keeps only the 4 most recent daily snapshots)
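
The daily rule illustrated in the table can be sketched in a few lines of shell. This illustrates the behavior described above and is not git-backup's actual code: select_daily is a hypothetical helper, and it assumes timestamps in the UTC form seen in snapshot filenames (for example 20240602T161101Z), with the table's times read as UTC.

```shell
#!/bin/sh
# Print the snapshots a "daily=N" policy would retain: the newest snapshot
# of each of the N most recent days that actually have a snapshot.
# Hypothetical helper, written to mirror the table above.
select_daily() (
  keep=$1
  shift
  kept=0
  last_day=""
  # Timestamps like 20240526T230000Z sort chronologically as plain strings.
  printf '%s\n' "$@" | sort -r | while IFS= read -r ts; do
    day=${ts%%T*}
    # Only the newest snapshot of each day is a candidate.
    [ "$day" = "$last_day" ] && continue
    last_day=$day
    # Stop once N days have been retained.
    [ "$kept" -ge "$keep" ] && break
    echo "$ts"
    kept=$((kept + 1))
  done
)

select_daily 4 \
  20240523T000000Z 20240524T000000Z 20240525T000000Z \
  20240526T000000Z 20240526T080000Z 20240526T230000Z \
  20240528T000000Z
# prints:
# 20240528T000000Z
# 20240526T230000Z
# 20240525T000000Z
# 20240524T000000Z
```

Because the loop counts days that have a snapshot rather than calendar days, the missing May 27 does not use up one of the four slots.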

Using git-backup with GitHub Actions

While git-backup is a command-line tool, you can use GitHub Actions to automate backups of Git repositories hosted on GitHub. Here is an example workflow:

name: Back up Public Repositories

on:
  schedule:
    - cron: "0 0 1 * *" # Runs at midnight UTC on the first day of every month
  workflow_dispatch:

jobs:
  back-up:
    runs-on: ubuntu-22.04

    strategy:
      matrix:
        repo:
          [
            "https://github.com/cicd-excellence/app.git",
            "https://github.com/cicd-excellence/infra.git",
            "https://github.com/larose/cargo.git",
            "https://github.com/larose/conjugueur.git",
            "https://github.com/larose/eef.git",
            "https://github.com/larose/ena.git",
            "https://github.com/larose/git-backup-demo.git",
            "https://github.com/larose/pretty-printer.git",
            "https://github.com/larose/tsp.git",
            "https://github.com/larose/utt.git",
            "https://github.com/larose/verbes.git",
            "https://github.com/larose/yarn-monorepo-change-based-testing-demo.git",
            "https://github.com/larose/wiki.git",
          ]

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "lts/*"

      - name: Install @larose/git-backup
        run: npm install -g @larose/git-backup

      - name: Back up ${{ matrix.repo }}
        run: |
          git-backup snapshot \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }}

          git-backup prune \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }} \
            --retention-policy "monthly=3"

Source: https://github.com/larose/git-backup-demo

Note that the Git repository URLs use https instead of ssh because, by default, a workflow run has no SSH key with permission to clone other Git repositories.

If you want to back up private Git repositories, simply use a personal access token (PAT) as the username in the Git repository URL. Example: git clone https://$GITHUB_PAT@github.com/larose/utt.git.

Restoring from a Snapshot

To restore a Git repository from a snapshot created by git-backup, follow these steps:

Step 1: Download the Snapshot

Use the AWS CLI, another S3-compatible tool, or the S3 UI to download the backup snapshot to your local machine.
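
If you use the AWS CLI, the remote URL passed to git-backup has to be split into an endpoint and an s3:// URI. The sketch below assumes a path-style remote like the one in the snapshot example (https://host/bucket/path); the helper names are hypothetical, and the exact object names under that path are not specified here, so list the path before downloading.

```shell
#!/bin/sh
# Hypothetical helpers: split a path-style remote URL for use with the AWS CLI.

# https://host/bucket/path -> https://host
remote_endpoint() {
  printf '%s\n' "$1" | sed -E 's#^(https?://[^/]+)/.*$#\1#'
}

# https://host/bucket/path -> s3://bucket/path
remote_s3_uri() {
  printf '%s\n' "$1" | sed -E 's#^https?://[^/]+/#s3://#'
}

REMOTE=https://1234.r2.cloudflarestorage.com/bucket-name/path/in/your/bucket
remote_endpoint "$REMOTE"   # prints https://1234.r2.cloudflarestorage.com
remote_s3_uri "$REMOTE"     # prints s3://bucket-name/path/in/your/bucket

# With AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY exported, list the
# snapshots, then copy the one you need (object name shown is an example):
# aws s3 ls "$(remote_s3_uri "$REMOTE")/" --recursive \
#   --endpoint-url "$(remote_endpoint "$REMOTE")"
# aws s3 cp "$(remote_s3_uri "$REMOTE")/larose-utt-20240602T161101Z.tar.gz" . \
#   --endpoint-url "$(remote_endpoint "$REMOTE")"
```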

Step 2: Extract the Snapshot

Use a tool like tar to extract the contents of the downloaded archive. This will create a directory containing the complete mirrored (bare) repository, which is a special type of repository without a working directory.

$ tar -xzf <snapshot-name>.tar.gz

Replace <snapshot-name> with the actual filename of your downloaded snapshot.

Example:

$ tar -xzf larose-utt-20240602T161101Z.tar.gz

Step 3: Clone the Bare Repository as a Regular Repository

The extracted directory contains a bare Git repository, meaning it only holds the Git data (commits, branches, tags) but not your working files.

To get a regular repository with working files, run git clone with the extracted directory as the source and a new directory as the destination.

$ git clone <extracted_directory_name> my-restored-repo

Replace <extracted_directory_name> with the actual name of the extracted directory and my-restored-repo with your desired name for the restored working directory.

Example:

$ git clone larose-utt my-restored-repo

Your Git repository is now restored and ready to use.
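
A few optional follow-up steps can confirm the restore and reconnect the repository to its hosting service. This is a sketch under assumptions: finish_restore is a hypothetical helper, and the origin URL is whatever remote you want the restored repository to use. Note that the clone checks out only the default branch; the other branches are available as remote-tracking branches (origin/<name>).

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a restored clone and repoint its origin.
finish_restore() (
  repo_dir=$1     # restored working repository, e.g. my-restored-repo
  origin_url=$2   # remote the restored repository should fetch from and push to
  set -e
  # Verify the integrity of the restored objects.
  git -C "$repo_dir" fsck --no-progress
  # After the clone, origin points at the extracted directory; repoint it.
  git -C "$repo_dir" remote set-url origin "$origin_url"
  # List the branches restored as remote-tracking refs.
  git -C "$repo_dir" branch -r
)

# Example (not run here):
# finish_restore my-restored-repo git@github.com:larose/utt.git
```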

Source Code

Download the source code from this link.