Engineering Design Document

Meridian
TB-Scale Reliable Data Transfer

A production-grade, highly available system for transferring terabytes of data across nodes with zero data loss, end-to-end integrity verification, and automatic failover.

50+ TB max transfer · ≥80% link utilization · zero data loss · <5 s resume time
Version 1.0 · March 2026

Executive Summary

This document specifies the architecture, design decisions, and implementation details for Meridian, a production-grade system that transfers terabyte-scale datasets between nodes with zero data loss.

The system provides resumable transfers, content-defined deduplication, multi-path parallelism, end-to-end cryptographic integrity verification, and automatic failover — all while saturating available network bandwidth.

Resumable Transfers
Content-Defined Dedup
Multi-Path Parallelism
Cryptographic Integrity
Automatic Failover
Bandwidth Saturation

Requirements

Functional Requirements

  • Bulk Transfer: Transfer datasets from 1 GB to 50+ TB between two nodes (source and destination).
  • Resumability: If interrupted at any point (crash, network failure, reboot), resume from the point of interruption, not restart from scratch.
  • Incremental Sync: When transferring a previously-transferred dataset, only changed portions are re-sent.
  • Integrity Guarantee: The receiver cryptographically verifies that received data is bit-for-bit identical to the source. Silent corruption is detected.
  • Atomic Commit: A transfer is either fully committed or not committed at all. No partial states visible to consumers.
  • Multi-File Support: A single transfer job can include thousands of files in a directory tree, preserving structure on the destination.

Performance Targets

Metric | Target | Rationale
Throughput | ≥80% of link capacity | On a 10 Gbps link, achieve ≥8 Gbps sustained
Chunk Latency (p99) | <50 ms | End-to-end time for one chunk: send, verify, ACK
Resume Time | <5 seconds | Time from process restart to transfer resumption
Dedup Ratio (incremental) | ≥40% | For typical edit patterns on previously-transferred data

Reliability Targets

Metric | Target | Rationale
Data Loss | Zero | Not one byte lost, ever. Cryptographic verification end-to-end.
Transfer Success Rate | 99.999% | Measured over a 30-day rolling window
Recovery from Node Crash | Automatic | WAL-based recovery, no manual intervention
Recovery from Network Partition | Automatic | Reconnect with exponential backoff, delta sync on reconnect

Security Requirements

Encryption in Transit
TLS 1.3 with mTLS. AES-256-GCM, ChaCha20-Poly1305.
Encryption at Rest
AES-256-GCM per-chunk. Per-transfer DEKs. Master key in HSM/Vault.
Access Control
RBAC with OPA policy engine. Roles: admin, operator, transfer-agent, read-only.
Audit Logging
Immutable, append-only audit log. 2-year retention.

Scalability

  • Datasets up to 50 TB in a single transfer job
  • Up to 10 million chunks per transfer (~40 TB at 4 MB average)
  • Up to 100 concurrent transfer jobs per node
  • Horizontal scaling via multiple sender/receiver pairs

Observability

  • Metrics: Prometheus-compatible on :9090 — throughput, chunk latency histogram, dedup ratio, error rate, WAL backlog
  • Tracing: OpenTelemetry distributed tracing with per-chunk spans. 100% sampling on failures, 10% on success
  • Alerting: Pre-configured rules for throughput drops, integrity failures, stalled transfers, disk space, certificate expiry

Constraints & Assumptions

  • Both nodes run Linux (kernel 5.6+ for io_uring support)
  • Network link of at least 1 Gbps. Optimized for 10–100 Gbps
  • NVMe SSD storage on both nodes
  • All ordering uses logical clocks, not wall clocks (no time sync required)
  • Designed for batch/bulk transfers, not real-time streaming

High-Level Architecture

The system consists of three planes: the data plane (chunking, transfer, reassembly), the control plane (coordination, state management, health monitoring), and the observability plane (metrics, tracing, alerting).

Architecture: End-to-End Data Flow (diagram)

End-to-End Data Flow

1. File Scanning
Scan directory tree, compute metadata. inotify/fswatch for change detection.
2. CDC Chunking
Variable-size chunks (256 KB – 16 MB, avg 4 MB) via Buzhash rolling hash.
3. Deduplication
SHA-256 hash checked against receiver's dedup index. Skip existing chunks.
4. Compression
Adaptive: LZ4, Zstd L3, Zstd L9, or none based on entropy sampling.
5. Transport
gRPC bi-directional streams over 8 parallel TCP connections. TLS 1.3.
6. Integrity
CRC32C (wire), SHA-256 (chunk), Merkle root (file completeness).
7. Reassembly & Commit
Chunks reassembled via manifest. fsync'd. Atomically committed.

Transfer State Machine

QUEUED → SCANNING → CHUNKING → NEGOTIATING → TRANSFERRING → VERIFYING → COMMITTING → COMPLETE

Failure at any state triggers RETRYING (exponential backoff). After max retries: FAILED with diagnostic snapshot. Operators can PAUSE or CANCEL at any point.

Component Responsibilities

Component | Location | Responsibility
File Scanner | Source | Discover files, detect changes, feed chunker
CDC Chunker | Source | Split files into content-defined variable-size chunks
Dedup Index | Both | Track which chunks exist at each node (Bloom filter + RocksDB)
Transfer Queue | Source | Priority queue of chunks to send, backed by WAL
Transport Pool | Both | 8 gRPC connections, 16 streams each, TLS 1.3, flow control
Integrity Verifier | Destination | CRC32C, SHA-256, Merkle tree verification
Reassembler | Destination | Reconstruct files from chunks using manifest
WAL | Both | Crash-safe state tracking (append-only, fsync'd)
Coordinator | Both | State machine management, retry logic, job lifecycle
Metrics Exporter | Both | Prometheus metrics, OpenTelemetry traces

Content-Defined Chunking

Why Not Fixed-Size Chunks?

Fixed-size chunking (e.g., splitting every 4 MB) breaks catastrophically on insertions. Consider a 1 TB file where a 50 KB row is inserted near the beginning:

  • Fixed chunks: Every chunk boundary after the insertion shifts by 50 KB. Every chunk has a different SHA-256 hash. You must re-transfer the entire 1 TB file.
  • CDC chunks: Boundaries are data-dependent via rolling hash. An insertion only affects 2–3 chunks near the edit point. All other boundaries remain stable.

CDC reduces re-transfer volume by 90–99% compared to fixed-size chunking.

Algorithm: Buzhash

We use Buzhash (bitwise rolling hash) rather than Rabin fingerprinting:

Property | Buzhash | Rabin
Speed | ~3.5 GB/s | ~2.8 GB/s
Distribution Quality | Good | Excellent
Implementation | XOR + rotate (simple) | Polynomial arithmetic (moderate)

We chose Buzhash because chunking is CPU-bound and runs inline with I/O. The 25% speed advantage keeps pace with NVMe read speeds (~3.5 GB/s). Rabin's better distribution doesn't matter because we rely on SHA-256 for content addressing, not the rolling hash.

Chunk Size Parameters

Parameter | Value | Rationale
Minimum Chunk | 256 KB | Prevents pathological tiny chunks from repetitive data patterns
Average Chunk | 4 MB | Sweet spot: 2.5M chunks per 10 TB fits in a ~640 MB RAM index
Maximum Chunk | 16 MB | Caps worst-case for uniform data (zeros, compressed streams)
Window Size | 48 bytes | Sliding window for rolling hash computation

The 4 MB average is a three-way tension: smaller chunks (1 MB) give better dedup granularity but produce 10M chunks per 10 TB, requiring ~2.5 GB for the index. Larger chunks (16 MB) reduce index size but miss small edits. 4 MB is tunable per-environment.
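
The boundary-detection loop is small enough to sketch. The following is a minimal, illustrative Rust version of Buzhash-based cutpoint selection under the parameters above; the 256-entry random table is supplied by the caller, and the 22-bit mask (≈4 MB average) is the tunable discussed here, not a fixed production constant.

// Illustrative Buzhash CDC boundary detection, not the production implementation.
// A boundary is declared when the low bits of the rolling hash are all zero.

const WINDOW: usize = 48;                   // rolling-hash window (bytes)
const MIN_CHUNK: usize = 256 * 1024;        // 256 KB
const MAX_CHUNK: usize = 16 * 1024 * 1024;  // 16 MB
const MASK: u64 = (1 << 22) - 1;            // 22 zero bits ≈ 4 MB average chunk

/// Split `data` into content-defined chunks, returning (offset, length) pairs.
fn cdc_boundaries(data: &[u8], table: &[u64; 256]) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < data.len() {
        let mut hash: u64 = 0;
        let end = data.len().min(start + MAX_CHUNK);
        let mut cut = end; // default: chunk ends at max size or end of data

        for i in start..end {
            // Add the incoming byte.
            hash = hash.rotate_left(1) ^ table[data[i] as usize];
            if i >= start + WINDOW {
                // Remove the byte that just left the 48-byte window.
                hash ^= table[data[i - WINDOW] as usize].rotate_left(WINDOW as u32);
            }
            // Only consider boundaries once the minimum chunk size is reached.
            if i + 1 - start >= MIN_CHUNK && hash & MASK == 0 {
                cut = i + 1;
                break;
            }
        }
        chunks.push((start, cut - start));
        start = cut;
    }
    chunks
}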

The Chunk Manifest

ChunkMeta {
  chunk_index: u64,
  file_path:   String,
  offset:      u64,
  length:      u32,
  sha256:      [u8; 32]
}

The manifest records each chunk's position (byte offset within the file), size, and content hash. It's deterministic: given the same file content and CDC parameters, the same boundaries are always produced. For multi-file transfers, chunk indices are global across all files. The sender can transmit chunks in any order; the receiver uses the manifest to place each chunk correctly.
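
As a rough sketch of what "place each chunk correctly" means on the receiver, assuming verified chunks are written straight into the destination file at their manifest offsets (the real reassembler also tracks staging and commit state):

use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

/// Write one verified chunk into its final position using manifest metadata.
/// Chunks can arrive in any order; the (offset, length) pair pins placement.
fn place_chunk(file_path: &str, offset: u64, data: &[u8]) -> std::io::Result<()> {
    let mut f = OpenOptions::new()
        .create(true)
        .write(true)
        .open(file_path)?;
    f.seek(SeekFrom::Start(offset))?;
    f.write_all(data)?;
    f.sync_data()?; // durable before the WAL records CHUNK_COMMITTED
    Ok(())
}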

Deduplication Index

RocksDB Backend
SHA-256(chunk) → chunk_id, size, ref_count, locations[]
Bloom Filter
0.01% false positive rate. ~600 KB RAM for 2.5M chunks. Eliminates 99.99% of disk lookups.
Garbage Collection
ref_count=0 eligible for GC. Background compactor every 6 hours, 72-hour grace period.
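
A sketch of the lookup path during dedup negotiation, with a from-scratch Bloom filter and an in-memory map standing in for the real filter and RocksDB; the sizes and hash functions here are illustrative.

use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// k probe positions for a chunk hash (illustrative; the real filter is sized
/// for a ~0.01% false-positive rate).
fn bit_positions(key: &[u8; 32], k: u32, num_bits: usize) -> Vec<usize> {
    (0..k)
        .map(|i| {
            let mut h = DefaultHasher::new();
            i.hash(&mut h);
            key.hash(&mut h);
            (h.finish() as usize) % num_bits
        })
        .collect()
}

struct Bloom {
    bits: Vec<u64>,
    num_bits: usize,
    k: u32,
}

impl Bloom {
    fn new(num_bits: usize, k: u32) -> Self {
        Bloom { bits: vec![0; (num_bits + 63) / 64], num_bits, k }
    }
    fn insert(&mut self, key: &[u8; 32]) {
        for p in bit_positions(key, self.k, self.num_bits) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }
    fn maybe_contains(&self, key: &[u8; 32]) -> bool {
        bit_positions(key, self.k, self.num_bits)
            .into_iter()
            .all(|p| self.bits[p / 64] & (1u64 << (p % 64)) != 0)
    }
}

/// Dedup check: the Bloom filter answers "definitely not present" without any
/// disk I/O; only possible hits fall through to the persistent index
/// (RocksDB in the real system, a HashMap stand-in here).
fn chunk_exists(sha256: &[u8; 32], bloom: &Bloom, index: &HashMap<[u8; 32], u64>) -> bool {
    bloom.maybe_contains(sha256) && index.contains_key(sha256)
}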

Transport Layer

Transport Layer: Protocol Stack & TCP vs UDP (diagram)

Protocol Stack: gRPC over HTTP/2 over TCP

TCP (Layer 4)
Provides packet-level reliable delivery. 8 independent TCP connections, each with 128 MB socket buffer, BBR congestion control. Kernel handles retransmission at 1.5 KB granularity.
HTTP/2 (Framing)
Multiplexes 16 logical streams per TCP connection. Frames from different streams interleaved on the wire, tagged with stream IDs. 16-way parallelism per connection without separate handshakes.
gRPC (Application)
RPC semantics (TransferChunks, NegotiateChunks, CommitTransfer), protobuf serialization, bi-directional streaming, deadline/cancellation propagation. Each gRPC stream maps to one HTTP/2 stream.
Combined
8 TCP connections × 16 gRPC streams = 128 parallel chunk transfers. A packet loss on connection #3 only stalls its 16 streams; the other 112 streams continue unaffected.

Why 8 Connections and 16 Streams?

Too Few (1 TCP, 128 streams)
Single packet loss causes head-of-line blocking across all 128 streams. At 0.1% loss, effective throughput drops dramatically.
Too Many (128 TCP, 1 stream each)
128 TLS handshakes, 128 × 128 MB = 16 GB RAM, 128 congestion windows fighting for bandwidth, middlebox DDoS flags.
Sweet Spot (8 TCP, 16 streams)
8 fault domains limit blast radius to 12.5%. 16 streams amortize overhead. 8 handshakes, 2 GB buffer memory, good fairness.

What is Multiplexing?

Without multiplexing, sending 16 chunks simultaneously requires 16 TCP connections. With multiplexing, one TCP connection carries all 16 by interleaving. HTTP/2 chops each chunk's data into small frames (16 KB default), stamps each frame with a stream ID, and interleaves them on the wire. The receiver reads frames, sorts by stream ID, and reassembles each stream independently. This is purely a framing-layer concern — invisible to application code.
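
A toy illustration of the idea (not the actual HTTP/2 wire format): split each logical stream into fixed-size frames tagged with a stream ID, interleave them round-robin, and reassemble per stream on the other side.

use std::collections::HashMap;

const FRAME_SIZE: usize = 16 * 1024; // HTTP/2 default DATA frame payload size

struct Frame {
    stream_id: u32,
    payload: Vec<u8>,
}

/// Interleave several logical streams onto one "wire" as tagged frames.
fn multiplex(streams: &[(u32, &[u8])]) -> Vec<Frame> {
    let mut cursors = vec![0usize; streams.len()];
    let mut wire = Vec::new();
    loop {
        let mut progressed = false;
        for (i, (id, data)) in streams.iter().enumerate() {
            if cursors[i] < data.len() {
                let end = data.len().min(cursors[i] + FRAME_SIZE);
                wire.push(Frame { stream_id: *id, payload: data[cursors[i]..end].to_vec() });
                cursors[i] = end;
                progressed = true;
            }
        }
        if !progressed { break; }
    }
    wire
}

/// Reassemble each stream independently by sorting frames on stream ID.
fn demultiplex(wire: &[Frame]) -> HashMap<u32, Vec<u8>> {
    let mut out: HashMap<u32, Vec<u8>> = HashMap::new();
    for frame in wire {
        out.entry(frame.stream_id).or_default().extend_from_slice(&frame.payload);
    }
    out
}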

Why TCP and Not UDP?

The Granularity Argument

TCP operates at the packet level (1.5 KB). Our application operates at the chunk level (4 MB). A 4 MB chunk is ~2,700 packets. If one packet is lost:

  • With TCP: Kernel retransmits just 1.5 KB. Application receives a complete 4 MB chunk. Cost: 1.5 KB re-sent, ~50 ms delay.
  • With UDP (chunk-level retry): SHA-256 verification fails. Discard all 4 MB and re-request. Cost: 4 MB re-sent for a 1.5 KB loss — a ~2,700× amplification. At 0.1% packet loss, roughly 93% of 4 MB chunks contain at least one lost packet, so nearly the entire dataset is re-transmitted (see the quick calculation after this list).
  • With UDP (packet-level reliability): Track individual packets, detect losses, retransmit. But this requires sequence numbers, retransmit timers, duplicate detection, congestion control — you've reimplemented TCP.

TCP's ordering guarantee is a small tax (occasional HOL blocking). Its packet-level reliability saves us from 2,700× retransmit amplification.
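
The amplification figure above can be checked with a one-line probability calculation; the packet count and loss rate are the illustrative numbers from this section.

/// Probability that a 4 MB chunk (~2,700 MTU-sized packets) contains at least
/// one lost packet, assuming independent per-packet loss.
fn chunk_loss_probability(packets_per_chunk: u32, packet_loss: f64) -> f64 {
    1.0 - (1.0 - packet_loss).powi(packets_per_chunk as i32)
}

fn main() {
    // At 0.1% packet loss, ~93% of 4 MB chunks fail whole-chunk verification
    // and must be re-sent in full under chunk-level retry.
    println!("{:.1}%", 100.0 * chunk_loss_probability(2700, 0.001));
}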

QUIC Fallback

QUIC runs over UDP but builds its own per-stream reliability, eliminating TCP's head-of-line blocking. Used as a fallback when measured packet loss exceeds 1% (typically WAN/internet paths). Not the default because: (1) NIC hardware offloading only works for TCP; (2) BBR over TCP is more mature; (3) kernel bypass (io_uring) integrates better with TCP sockets.

Adaptive Compression

Algorithm | Ratio | Speed | When Used
LZ4 | 2.1× | 4.5 GB/s | CPU bottleneck (fast network, slow CPU)
Zstd level 3 | 3.2× | 1.2 GB/s | Balanced default
Zstd level 9 | 3.8× | 200 MB/s | Network bottleneck (slow WAN link)
None | 1.0× | n/a | Already compressed/encrypted data (entropy >7.5 bits/byte)

The first 64 KB of each chunk is sampled to estimate entropy. High-entropy data is sent uncompressed. Algorithm choice between LZ4/Zstd is based on the ratio of measured network throughput to available CPU cycles.
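
A sketch of that decision, assuming a Shannon-entropy estimate over byte frequencies in the 64 KB sample; the entropy cutoff matches the table above, while the throughput-ratio thresholds are purely illustrative.

/// Estimate Shannon entropy (bits per byte) over a sample of the chunk.
fn entropy_bits_per_byte(sample: &[u8]) -> f64 {
    let mut counts = [0u64; 256];
    for &b in sample {
        counts[b as usize] += 1;
    }
    let n = sample.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

enum Codec { None, Lz4, Zstd3, Zstd9 }

/// Pick a codec from the entropy of the first 64 KB and the balance between
/// network headroom and spare CPU (thresholds here are illustrative).
fn pick_codec(chunk: &[u8], net_gbps: f64, spare_cpu_gbps: f64) -> Codec {
    let sample = &chunk[..chunk.len().min(64 * 1024)];
    if entropy_bits_per_byte(sample) > 7.5 {
        return Codec::None; // already compressed or encrypted
    }
    if spare_cpu_gbps < net_gbps {
        Codec::Lz4   // CPU-bound: cheapest codec
    } else if net_gbps < 1.0 {
        Codec::Zstd9 // slow WAN: spend CPU to save bytes
    } else {
        Codec::Zstd3 // balanced default
    }
}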

Zero-Copy I/O & Kernel Optimizations

sendfile()
Transfers data directly from disk to socket without copying into userspace. Saves 2 memory copies per chunk.
io_uring
Linux 5.6+ async batched I/O submission. Up to 32 SQEs per syscall, reducing context switches 10–30×.
Huge Pages
2 MB huge pages for chunk buffers to reduce TLB misses during hash computation and compression.

Why a Single TCP Connection Doesn't Fill the Pipe

A 10 Gbps NIC can push 1.25 GB/s onto the wire. A single “normal” TCP connection achieves a fraction of this. Five independent bottlenecks compound:

1. Socket Buffer Limits
TCP throughput is bounded by: throughput = socket_buffer / RTT. Default Linux socket buffers are ~4 MB. With 50 ms RTT: 4 MB / 50 ms = 640 Mbps — just 6.4% of a 10 Gbps link. The kernel defaults to small buffers because large per-socket buffers don't scale: a server with 1,000 connections at 128 MB each would need 128 GB of RAM for socket buffers alone.
2. Congestion Window Ramp-Up
TCP starts with a small congestion window (~14.6 KB) and probes upward. To fill a 10 Gbps link with 50 ms RTT, cwnd must reach the bandwidth-delay product: 10 Gbps × 50 ms = 62.5 MB. CUBIC ramps conservatively. BBR probes much faster.
3. Head-of-Line Blocking
HTTP/2 on one TCP connection: a lost packet stalls ALL streams until retransmission (one RTT). At 0.1% loss on 10 Gbps, thousands of stall events per second. Multiple TCP connections contain the blast radius.
4. Receiver Processing Bottleneck
Single-threaded receiver doing decompress + SHA-256 + disk write tops out at 350–500 MB/s. Buffer fills, TCP window closes, sender throttled. We use 64 parallel processing workers.
5. Memory Copy Overhead
Normal read()/write() copies data twice per direction. At 10 Gbps that's ~2.5 GB/s of memcpy competing with application work. sendfile(), io_uring, and zero-copy techniques reduce this.

Combined Effect

Bottleneck | Effect on 10 Gbps | Mitigation
Default 4 MB buffers | 640 Mbps (6%) | 128 MB per-socket buffers
CUBIC cwnd ramp | ~7 Gbps (70%) | BBR congestion control
HOL blocking (0.1% loss) | ~4 Gbps (40%) | 8 independent TCP connections
Single-threaded receiver | ~2.8 Gbps (28%) | 64 parallel processing workers
2× memory copies | ~2.2 Gbps (22%) | sendfile() + io_uring + huge pages
All mitigations applied | ~8 Gbps (80%) | Remaining 20%: TLS, checksums, fsync

Data Integrity

Four Layers of Data Integrity & Merkle Tree (diagram)

Four layers of verification, each catching a different class of corruption at a different point in the pipeline:

Layer 1: CRC32C on the Wire
Every chunk carries a CRC32C checksum in its protobuf header. Verified immediately upon receipt, before writing to disk. Hardware-accelerated (SSE4.2, ~30 GB/s), ~0.1% overhead. Catches: bit flips in transit, NIC errors, kernel buffer corruption.
Layer 2: SHA-256 Content Hash
After decompression, receiver computes SHA-256 of raw chunk data and compares to manifest hash. Uses hardware acceleration (SHA-NI on x86, CE on ARM) at ~1.5 GB/s. Catches: decompression errors, memory corruption, anything CRC32C might miss.
Layer 3: Merkle Tree
Once all chunks for a file are received, a Merkle tree is built from chunk SHA-256 hashes. Root compared against sender's. Catches: missing chunks, duplicate chunks, misordered chunks. Enables O(log n) failure localization via binary search.
Layer 4: Background Scrubbing
After commit, a background scrubber periodically re-reads stored chunks and re-verifies SHA-256 hashes. Detects bit rot from storage media degradation. Low I/O priority (ionice class 3). Full scrub of 10 TB completes in ~8 hours at 350 MB/s.

Why Three Layers Instead of Just SHA-256?

SHA-256 alone is technically sufficient. CRC32C adds value because it is 20× faster and catches wire errors before wasting CPU on decompression and SHA-256. The Merkle tree adds value because per-chunk SHA-256 cannot detect missing or misordered chunks. A single Merkle root comparison proves the entire dataset is correct; without it, you'd compare 2.5 million hashes individually.
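
For concreteness, a minimal Merkle-root construction over the per-chunk hashes, assuming the RustCrypto sha2 crate (an assumption; the document does not name a hashing library). The odd-node rule here is an illustrative choice and simply has to match on both sides.

use sha2::{Digest, Sha256};

/// Build a Merkle root from per-chunk SHA-256 hashes. Pairs of nodes are
/// hashed together level by level; an unpaired node is promoted unchanged.
fn merkle_root(mut level: Vec<[u8; 32]>) -> Option<[u8; 32]> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        for pair in level.chunks(2) {
            if pair.len() == 2 {
                let mut h = Sha256::new();
                h.update(pair[0]);
                h.update(pair[1]);
                next.push(h.finalize().into());
            } else {
                next.push(pair[0]); // promote the odd node
            }
        }
        level = next;
    }
    Some(level[0])
}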

Reliability & Fault Tolerance

Write-Ahead Log (WAL)

The WAL is an append-only file on disk — not a service, not a database, not a distributed system. It stores a sequence of binary records, each describing a state transition:

Record Type | Meaning | Fields
CHUNK_QUEUED | Chunk added to send queue | transfer_id, chunk_index, sha256
CHUNK_SENT | Chunk data written to socket | transfer_id, chunk_index
CHUNK_ACKED | Receiver confirmed receipt | transfer_id, chunk_index, ack_status
CHUNK_COMMITTED | Written to final location + fsync'd | transfer_id, chunk_index
STATE_CHANGE | Transfer phase transition | transfer_id, old_state, new_state
CHECKPOINT | Snapshot of all in-flight state | full state map

The Write-Ahead Contract
Before doing any action, write a WAL record describing the intended action. Only after the record is safely on disk (fsync'd) do you execute the action. If you crash between the WAL write and the action, on restart you replay the WAL and retry. Because all operations are idempotent (sending a duplicate chunk is a no-op via dedup), retrying is always safe.
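
A minimal sketch of that append-then-fsync path using only the standard library; the record layout is a simplified stand-in for the real binary format.

use std::fs::{File, OpenOptions};
use std::io::Write;

fn open_wal(path: &str) -> std::io::Result<File> {
    // Append-only: every write is sequential I/O at the end of the file.
    OpenOptions::new().create(true).append(true).open(path)
}

/// Append one record and fsync it BEFORE executing the action it describes.
fn wal_append(wal: &mut File, record_type: u8, transfer_id: u64, chunk_index: u64) -> std::io::Result<()> {
    let mut buf = Vec::with_capacity(17);
    buf.push(record_type);
    buf.extend_from_slice(&transfer_id.to_le_bytes());
    buf.extend_from_slice(&chunk_index.to_le_bytes());
    wal.write_all(&buf)?;
    wal.sync_data()?; // the record is durable before the action runs
    Ok(())
}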

Why a WAL and Not a Database?

The WAL is append-only: every write is sequential I/O at the end of the file — the fastest possible disk operation. NVMe SSDs can do 500K+ sequential small writes per second. A database doing random I/O, maintaining B-tree indexes, and journaling is 10–50× slower for this access pattern. With 64 parallel workers recording state, a database's row-level locking creates contention. An append-only file has zero lock contention.

Crash Recovery Scenarios

Sender Crash
Read WAL from last checkpoint. Identify chunks in QUEUED or SENT-but-not-ACKED state. Resume from there. Skip file scanning if manifest already persisted.
Receiver Crash
Read WAL. Identify partially-written chunks (ACKED but not COMMITTED). Delete incomplete chunks. Send updated manifest to sender. Sender fills gaps.
Both Crash Simultaneously
Both replay WALs independently. Sender sends manifest; receiver responds with what it has. Delta transfer resumes. Worst case — verified with chaos testing.
Network Partition
gRPC keepalive pings every 10s. 3 consecutive failures (30s) = dead. Sender pauses and queues locally. Auto-reconnect with exponential backoff (1s → 60s cap). Lightweight delta sync via bitmap.

Retry Strategy

delay = min(base × 2^attempt + random(0, jitter), max_delay)
base: 100ms, jitter: 50ms, max: 30s, max_attempts: 5
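
The same policy as a small helper; a hash of the attempt counter stands in for a real random-number source, purely for illustration.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// delay = min(base * 2^attempt + random(0, jitter), max_delay)
fn retry_delay(attempt: u32) -> Duration {
    let base_ms = 100u64;
    let jitter_ms = 50u64;
    let max_ms = 30_000u64;

    // Cheap deterministic jitter; the real implementation would use an RNG.
    let mut h = DefaultHasher::new();
    attempt.hash(&mut h);
    let jitter = h.finish() % jitter_ms;

    let backoff = base_ms.saturating_mul(1u64 << attempt.min(20));
    Duration::from_millis((backoff + jitter).min(max_ms))
}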

Per-connection retries: if a gRPC stream breaks, reconnect with fresh TLS handshake. All in-flight chunks on that stream are re-queued.

Chaos Engineering Tests

Test | Mechanism | Validates
Kill sender mid-transfer | SIGKILL + restart | WAL replay, resume from checkpoint
Kill receiver mid-write | SIGKILL + restart | Partial chunk cleanup, gap filling
Network partition (60s) | iptables DROP | Reconnect + delta sync
Flip random bits on wire | tc netem corrupt 1% | CRC32C + SHA-256 catch all corruption
Disk full on receiver | fallocate fill disk | Backpressure, THROTTLE ack, retry
Slow disk (10ms latency) | dm-delay | I/O queue depth adaptation
OOM pressure | cgroups memory limit | Graceful degradation, fewer streams

Security

Transport Security

All data in transit uses TLS 1.3 with cipher suites AES-256-GCM-SHA384 and ChaCha20-Poly1305-SHA256. Mutual TLS (mTLS) requires both sender and receiver to present X.509 certificates signed by the internal CA. Certificate rotation every 90 days.

Encryption at Rest — Key Hierarchy

Master Key (MK)
Stored in HSM / AWS KMS / HashiCorp Vault. Never leaves the HSM. Rotates quarterly.
Data Encryption Key (DEK)
Per-transfer random 256-bit key. Encrypted with MK (envelope encryption). Stored alongside transfer metadata.
Chunk Nonce
Unique per-chunk (chunk_index as nonce counter). Ensures no nonce reuse even across retries.

Why Chunk-Level, Not Volume-Level (LUKS)?

  • Per-transfer DEKs limit the blast radius of a key compromise to one transfer, not all data
  • Destroying a DEK cryptographically erases that transfer instantly, without re-encrypting the volume
  • Chunks can live on any storage backend (local disk, object store, archive) carrying their own encryption

When LUKS is acceptable: single-tenant systems where per-transfer key isolation isn't required.
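
A sketch of the per-chunk encryption step, assuming the RustCrypto aes-gcm crate (an assumption; the document does not name the AEAD library). The per-transfer DEK is used directly here; generating it and wrapping it with the master key in the HSM/KMS is outside this snippet.

use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Key, Nonce};

/// Encrypt one chunk with the per-transfer DEK, deriving the 96-bit nonce from
/// the chunk index so no nonce is ever reused within a transfer, even across
/// retries.
fn encrypt_chunk(dek: &[u8; 32], chunk_index: u64, plaintext: &[u8]) -> Result<Vec<u8>, aes_gcm::Error> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(dek));

    // 12-byte nonce: 4 zero bytes + little-endian chunk index.
    let mut nonce_bytes = [0u8; 12];
    nonce_bytes[4..].copy_from_slice(&chunk_index.to_le_bytes());

    cipher.encrypt(Nonce::from_slice(&nonce_bytes), plaintext)
}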

Access Control & Audit

RBAC with OPA policy engine evaluates every transfer request against policies: allowed source/dest pairs, max transfer size, time-of-day windows, rate limits per-user. Immutable audit log records every API call, state transition, and access with who, what, when, from_ip, transfer_id, result. Retained for 2 years.

Observability

Key Metrics (Prometheus)

Metric | Type | Description
transfer_bytes_total | Counter | Total bytes transferred by direction and compression type
transfer_chunks_total | Counter | Chunks sent, received, deduped, or failed
transfer_throughput_bytes | Gauge | Current throughput per connection
chunk_latency_seconds | Histogram | End-to-end chunk delivery time (p50/p95/p99)
wal_entries_pending | Gauge | Unprocessed WAL entries (backlog indicator)
dedup_ratio | Gauge | Fraction of chunks skipped via deduplication
integrity_failures_total | Counter | CRC or SHA mismatches (should be ~0)
connection_resets_total | Counter | gRPC stream resets (network health indicator)

Distributed Tracing

Every transfer gets a root span. Child spans for each phase: scan_files, chunk_file, negotiate_dedup, transfer_chunks (with per-chunk sub-spans for compress/encrypt/send), verify_merkle, commit. Sampling: 100% for failed transfers, 10% for successful (adaptive based on volume).

Alerting Rules

Alert | Condition | Severity
Throughput Drop | <50% of baseline for 5 min | WARN
Integrity Failure | Any checksum mismatch | CRITICAL
Transfer Stalled | No progress for 2 min | WARN
WAL Backlog | >10K pending entries | WARN
Connection Failures | >5 resets in 1 min | CRITICAL
Disk Space Low | <10% free on receiver | CRITICAL
Certificate Expiry | <7 days to expiration | WARN

Design Tradeoffs Summary

Every major decision involved rejecting viable alternatives:

Decision | Chosen | Runner-Up | Key Tradeoff
Transport | gRPC streaming | Raw TCP | 5–8% overhead vs. months of protocol engineering
Chunking | CDC (Buzhash) | Fixed-size | Code complexity vs. 90–99% savings on incremental
Chunk Size | 4 MB average | 1 MB / 16 MB | RAM vs. dedup granularity vs. round-trips
Integrity | 3-layer | SHA-256 only | Complexity vs. early detection + failure localization
Reliability | WAL | Database | Sequential I/O speed vs. query flexibility
Dedup | Chunk-level | File-level / None | CPU+RAM overhead vs. 40–80% bandwidth savings
Compression | Adaptive | Fixed Zstd-3 | Decision logic vs. optimal per-environment
Flow Control | Receiver-driven | Sender rate-limit | ACK overhead vs. perfect receiver knowledge
Encryption | Chunk-level AES | Volume LUKS | Code complexity vs. key granularity
Coordination | etcd + RocksDB | Postgres | Operational overhead vs. multi-node needs
Connections | 8 TCP × 16 streams | 1 TCP / 128 TCP | HOL blast radius vs. connection overhead
Primary Protocol | TCP + QUIC fallback | QUIC-only | NIC offload + maturity vs. loss handling

Operations & Capacity Planning

Kernel Tuning

# Socket buffers
net.core.rmem_max = 134217728          # 128 MB receive buffer max
net.core.wmem_max = 134217728          # 128 MB send buffer max
net.ipv4.tcp_rmem = 4096 1048576 134217728
net.ipv4.tcp_wmem = 4096 1048576 134217728

# Congestion control
net.ipv4.tcp_congestion_control = bbr

# Network queue
net.core.netdev_max_backlog = 250000

# VM tuning
vm.dirty_ratio = 40
vm.swappiness = 10

# File descriptors
ulimit -n 1048576  # 1M open fds

Deployment Modes

Lite Mode (2 nodes)
In-process coordinator, no etcd. WAL + RocksDB on local disk. Simplest deployment.
Standard Mode (2+ nodes)
etcd for coordination, RocksDB for local state. Multiple concurrent jobs and failover.
Relay Mode
Source → Relay → Destination. For transfers crossing network boundaries. Relay stores chunks temporarily.

Hardware Recommendations

Component | Specification | Rationale
CPU | 16+ cores (AMD EPYC / Xeon) | SHA-256, Zstd, CDC hashing are CPU-bound
RAM | 64 GB+ | Chunk buffers (64 × 16 MB = 1 GB), dedup index, RocksDB block cache
Network | 25 Gbps+ NIC (dual-port for HA) | 10 TB @ 25 Gbps ≈ 53 minutes
Storage | NVMe SSD, 2× transfer size | Chunk store + WAL. 3+ GB/s sequential writes
NIC Offload | TSO, GRO, TCP checksum offload | Reduces CPU load for TCP processing

Transfer Time Estimates

Data Size | 10 Gbps | 25 Gbps | 100 Gbps
100 GB | ~1.3 min | ~32 sec | ~8 sec
1 TB | ~13 min | ~5.3 min | ~1.3 min
10 TB | ~2.2 hrs | ~53 min | ~13 min
50 TB | ~11 hrs | ~4.4 hrs | ~66 min

Times shown are simple line-rate estimates (data size ÷ link speed), with no dedup or compression applied. In practice, ~80% link utilization stretches them somewhat, while Zstd level 3 (~3.2× on compressible data) and dedup on incremental transfers shorten them substantially.

When This System is Wrong

Use Something Simpler When...
  • Sub-100 GB, one-time transfer: Use rsync or scp. Engineering investment unjustified.
  • Streaming/real-time data: Use Kafka, Pulsar, or Kinesis. This is a batch system.
  • Cloud-to-cloud with managed services: Use AWS DataSync, GCP Transfer Service, or Azure Data Box.
  • Cross-region over public internet: Consider Aspera, Signiant with proprietary WAN optimization.
  • Immutable write-once data: Simple PUT to object storage with file-level checksums. CDC overhead adds no value.

Conclusion

Meridian is designed for organizations that need to move terabytes of data between nodes reliably, efficiently, and securely. Every design choice reflects a conscious tradeoff — gRPC over raw TCP for engineering velocity, CDC over fixed-size for incremental efficiency, WAL over database for I/O speed, multiple TCP connections over one for fault isolation.

The system is not the right tool for every job. For small, one-time transfers, rsync is simpler. For real-time streaming, Kafka is better. For cloud-managed environments, first-party transfer services may suffice. Meridian's value proposition is the combination of TB-scale throughput, zero-data-loss integrity, crash-safe resumability, and production-grade observability — all in a single, self-contained system.