rsync — Efficient File Synchronisation

rsync transfers only the differences between files, making it the standard tool for backups, deployments, and file synchronisation across systems. Its delta algorithm minimises bandwidth, and running it over SSH gives an authenticated, encrypted transport. Many production backup pipelines and deployment scripts use rsync under the hood.

Overview

rsync is a file transfer utility and protocol designed around a single insight: when synchronising files between two systems, you usually do not need to transfer the entire file — only the parts that changed. The rsync delta-transfer algorithm computes the differences between the source and destination versions of a file and transfers only the changed blocks.

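rsync operates in two transport modes:

Remote shell (usually SSH): the client opens an SSH connection and rsync is started on the remote host as a subprocess.
Daemon: the client connects directly to an rsync daemon listening on port 873.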

rsync is not just a protocol: it is the de facto standard tool for backups, deployments, and file synchronisation across systems.


The Delta Algorithm

The rsync algorithm, developed by Andrew Tridgell, works in three phases:

Phase 1 — Checksum generation (receiver side): The receiving system splits the destination file into fixed-size blocks and computes two checksums for each block: a fast rolling checksum modelled on Adler-32 and a slower strong checksum (MD4 in the original design; later versions use MD5 or a newer negotiated hash). The default block size starts at 700 bytes and grows with the file size in current versions. The receiver sends this list of checksums to the sender.

Phase 2 — Block matching (sender side): The sender scans the source file using a sliding window. For each position, it computes the rolling checksum. If it matches any block’s rolling checksum, it computes the strong checksum to confirm. Matching blocks are referenced by their offset in the destination file rather than transmitted. Non-matching data is added to a literal list.

Phase 3 — Delta transmission: The sender transmits a sequence of instructions: “copy block N from existing file” or “insert these literal bytes”. The receiver reconstructs the new file by following these instructions.

The result: if you modify 10 KB of a 1 GB file, rsync transfers roughly the 10 KB of changed data plus the block checksums and copy instructions, not the full 1 GB.
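The three phases can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not rsync's actual implementation or wire format: the block size is a toy value of 4 bytes, MD5 stands in for the strong checksum, only full blocks are indexed, and the function names (weak, roll, signatures, compute_delta, apply_delta) are hypothetical.

```python
import hashlib

BLOCK = 4        # toy block size so the demo stays readable (assumption)
MOD = 1 << 16    # each half of the weak checksum is kept to 16 bits


def weak(data):
    """Weak checksum of one window: two running sums, Adler-32 style."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a, b, out, inp, n):
    """Slide the window one byte in O(1): drop byte `out`, append byte `inp`."""
    a = (a - out + inp) % MOD
    b = (b - n * out + a) % MOD
    return a, b


def signatures(old):
    """Phase 1 (receiver): weak checksum -> [(strong hash, block index), ...]."""
    table = {}
    for idx in range(len(old) // BLOCK):   # simplification: full blocks only
        block = old[idx * BLOCK:(idx + 1) * BLOCK]
        table.setdefault(weak(block), []).append((hashlib.md5(block).digest(), idx))
    return table


def compute_delta(new, table):
    """Phase 2 (sender): emit ("copy", block_index) or ("literal", bytes)."""
    ops, literal, i, state = [], bytearray(), 0, None
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        if state is None:
            state = weak(window)
        match = None
        if state in table:                 # cheap test first, strong hash to confirm
            strong = hashlib.md5(window).digest()
            match = next((idx for s, idx in table[state] if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            i += BLOCK
            state = None                   # recompute at the new position
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                state = roll(*state, new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal.extend(new[i:])                # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(old, ops):
    """Phase 3 (receiver): rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick brown cat jumps over the lazy dog!"
ops = compute_delta(new, signatures(old))
sent = sum(len(arg) for kind, arg in ops if kind == "literal")
print(ops)
print(f"literal bytes sent: {sent} of {len(new)}")
```

The weak checksum is what makes the byte-by-byte scan affordable: sliding the window costs a few arithmetic operations, and the strong hash is computed only when the weak checksum already matches a known block.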


rsync Over SSH

Sequence between the rsync client and the SSH daemon on the remote host:

1. Client → SSH daemon: SSH connection to port 22 (authenticated, encrypted channel).
2. SSH daemon: exec rsync --server ... (rsync started as a remote subprocess).
3. File list exchange: the client sends the list of files to sync.
4. Read destination checksums: block checksums of the existing files are computed.
5. The block checksum list is sent to the sending side.
6. Delta (changed blocks only): only the differences are transmitted.
7. Reconstruct files: the delta is applied to the existing files.
8. Transfer statistics: bytes sent, bytes received, transfer rate.

Common rsync Commands

# Basic sync — local to remote
rsync -avz /local/path/ user@remote:/remote/path/

# Remote to local (backup pull)
rsync -avz user@remote:/remote/path/ /local/backup/

# Dry run — show what would change without doing it
rsync -avzn /source/ user@remote:/dest/

# Mirror with deletion (remove files in dest not in source)
rsync -avz --delete /source/ user@remote:/dest/

# Exclude patterns
rsync -avz --exclude='*.log' --exclude='.git/' /source/ user@remote:/dest/

# Preserve hard links, ACLs, extended attributes
rsync -avzHAX /source/ /dest/

# Limit bandwidth (KB/s)
rsync -avz --bwlimit=10000 /source/ user@remote:/dest/

# Resume interrupted transfer (partial files)
rsync -avz --partial --progress /source/ user@remote:/dest/

# Use specific SSH key or port
rsync -avz -e "ssh -i ~/.ssh/backup_key -p 2222" /source/ user@remote:/dest/
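Deployment and backup scripts often assemble these flags programmatically. The sketch below shows one way to do that in Python, using only the options listed above. The function build_rsync_command and its parameters are hypothetical, and defaulting to a dry run is a design choice of this sketch rather than rsync behaviour.

```python
import shlex


def build_rsync_command(src, dest, *, dry_run=True, delete=False,
                        excludes=(), ssh_key=None, ssh_port=None):
    """Return an argv list for rsync; nothing is executed here."""
    cmd = ["rsync", "-avz"]
    if dry_run:
        cmd.append("-n")                  # show what would change, change nothing
    if delete:
        cmd.append("--delete")            # mirror: remove extras on the destination
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    if ssh_key or ssh_port:
        ssh = ["ssh"]
        if ssh_key:
            ssh += ["-i", ssh_key]
        if ssh_port:
            ssh += ["-p", str(ssh_port)]
        cmd += ["-e", shlex.join(ssh)]    # rsync expects the remote shell as one string
    cmd += [src, dest]
    return cmd


cmd = build_rsync_command("/source/", "user@remote:/dest/",
                          delete=True, excludes=["*.log", ".git/"], ssh_port=2222)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True), after reviewing the dry-run output.
```

Passing the command as a list avoids a second round of shell quoting for patterns such as *.log, and keeping dry_run on by default means a --delete run has to be requested explicitly.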

Key Options

Flag            Effect
-a (archive)    Recursive; preserves permissions, ownership, timestamps, and symlinks
-v              Verbose: show files being transferred
-z              Compress data during transfer
-n              Dry run: no changes made
--delete        Delete destination files not present in source
--progress      Show per-file transfer progress
--partial       Keep partially transferred files (enables resume)
--checksum      Compare by checksum instead of timestamp and size
--bwlimit       Throttle transfer speed

rsync Daemon Mode

When rsync runs as a daemon (port 873), it serves modules — named directory trees defined in /etc/rsyncd.conf:

# /etc/rsyncd.conf
[backups]
    path = /var/backups/
    comment = Backup storage
    read only = false
    auth users = backupuser
    secrets file = /etc/rsyncd.secrets
    hosts allow = 10.0.0.0/24

[mirrors]
    path = /srv/mirror/
    comment = Public mirror
    read only = true

Connect to a module:

rsync -avz backupuser@remote::backups/myhost/ /local/restore/

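Daemon mode authentication uses a simple username:password secrets file, not SSH keys. The password itself is not sent over the wire: the client answers a server challenge with a hash computed from the password and the challenge (MD5 in many deployed versions). The file data that follows is still unencrypted. For any sensitive use, prefer rsync over SSH rather than the daemon protocol.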
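The idea behind challenge-based password authentication can be illustrated with a generic sketch. This is not rsync's wire format: the way the password and challenge are combined, the hex encoding, and the function names are assumptions made for illustration, and MD5 appears only because the text names it.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: a fresh random value for each connection attempt."""
    return secrets.token_bytes(16)


def response(password, challenge):
    """Client side: prove knowledge of the password without sending it."""
    return hashlib.md5(password.encode() + challenge).hexdigest()


def verify(stored_password, challenge, client_response):
    """Server side: recompute the hash from the secrets-file password."""
    expected = response(stored_password, challenge)
    return hmac.compare_digest(expected, client_response)


challenge = make_challenge()
answer = response("s3cret", challenge)        # "s3cret" is a placeholder password
print(verify("s3cret", challenge, answer))    # matching password
print(verify("wrong", challenge, answer))     # mismatching password
```

Two consequences follow from this shape: the server has to keep the password in recoverable form in its secrets file, and anyone who captures a challenge and its response can try password guesses offline. Neither applies to key-based SSH authentication.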


Backup Patterns

Push backup (client initiates): Each server runs a cron job pushing its data to a backup server. Simple, but every client needs credentials with write access to the backup destination, so a compromised client can overwrite or delete its own backups.

Pull backup (backup server initiates): The backup server SSHes into each client and pulls data. Preferred from a security standpoint: clients hold no credentials for the backup server and need no outbound access to it, and the backup server controls the schedule.

Snapshot backups with hard links (rsnapshot): Uses rsync plus hard links to maintain multiple point-in-time snapshots while storing each unchanged file only once. A week of hourly snapshots of a 100GB system might use only about 105GB of total storage if roughly 5GB of file data changes during that week.
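The hard-link trick can be demonstrated without rsync at all. The sketch below is a simplified stand-in for what rsnapshot arranges (rsync's own --link-dest option achieves a similar effect): files unchanged since the previous snapshot become hard links, and changed files are copied. The snapshot function and the hourly.0 / hourly.1 directory names are illustrative assumptions, and symlinks and special files are ignored.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    target.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # unchanged: a second name for the same data
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "data"
    data.mkdir()
    (data / "big.bin").write_bytes(b"x" * 1000)
    (data / "notes.txt").write_text("v1")

    snapshot(data, None, root / "hourly.1")              # first snapshot: full copy
    (data / "notes.txt").write_text("v2")                # one small file changes
    snapshot(data, root / "hourly.1", root / "hourly.0")

    newer = os.stat(root / "hourly.0" / "big.bin")
    older = os.stat(root / "hourly.1" / "big.bin")
    print("big.bin shared between snapshots:", newer.st_ino == older.st_ino)
    print("link count of big.bin:", newer.st_nlink)
```

Each snapshot directory looks like a complete copy, yet big.bin occupies disk space once. Deleting an old snapshot only removes names; the data is freed when the last link to it disappears.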


Key Concepts

The trailing slash matters

rsync -a /source/ /dest/ syncs the contents of source into dest. rsync -a /source /dest/ syncs the directory itself into dest, creating /dest/source/. This distinction trips up even experienced administrators, so always verify with --dry-run first.
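The rule can be written down as a small path computation. This sketch models only the source-side trailing slash for a recursive copy between POSIX-style paths; destination_path is a hypothetical helper, not part of rsync.

```python
import posixpath


def destination_path(src, dest, relative_file):
    """Where a file inside `src` ends up under `dest` in a recursive copy."""
    if src.endswith("/"):
        # Trailing slash: copy the contents of the directory.
        return posixpath.join(dest, relative_file)
    # No trailing slash: the directory itself is recreated inside dest.
    return posixpath.join(dest, posixpath.basename(src), relative_file)


print(destination_path("/source/", "/dest/", "a/b.txt"))  # /dest/a/b.txt
print(destination_path("/source", "/dest/", "a/b.txt"))   # /dest/source/a/b.txt
```

The difference is most dangerous in combination with --delete, where a missing or extra slash changes which destination files count as extraneous.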

rsync does not encrypt by default in daemon mode

rsync over SSH inherits SSH’s encryption. rsync daemon mode (port 873) is plaintext. Never use daemon mode for sensitive data over untrusted networks — use the SSH transport instead.

