Breaking the Bandwidth Bottleneck: Benchmarking Rclone vs. Rsync for 4x Faster Cloud Migrations
Discover why rclone outperforms rsync for modern cloud migrations. Learn how parallelism and object-storage optimization can quadruple your transfer speeds.
In the world of IT infrastructure, few things induce anxiety quite like a massive data migration. Whether you are a CTO overseeing a lift-and-shift to AWS or a DevOps engineer tasked with syncing petabytes of training data for an AI model, the physics of data transfer often feels like the enemy. For decades, rsync has been the gold standard for file synchronization—reliable, robust, and universally understood. But as infrastructure moves from local block storage to distributed object storage, the old tools are beginning to show their age.
At Nohatek, we frequently assist clients in moving massive datasets to the cloud to fuel AI initiatives and modernize legacy stacks. We’ve noticed a recurring pattern: teams clinging to rsync are hitting a performance wall, not because of bandwidth limitations, but because of protocol inefficiencies. In this deep dive, we are benchmarking the industry veteran against the cloud-native challenger, rclone, to demonstrate how shifting your tooling strategy can result in 4x faster migration speeds.
The Legacy of Rsync: Why It Struggles in the Cloud
To understand why we need a change, we must first respect the incumbent. rsync is a masterpiece of engineering. Its delta-transfer algorithm, which only sends the differences between source and destination files, is unbeatable for local-to-local or server-to-server transfers over SSH. If you are updating a 10GB log file where only a few kilobytes have changed, rsync is magic.
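For that classic use case, a single line is all it takes. A minimal sketch, with host and paths as placeholders:

```bash
# Sync only the differences to a remote host over SSH.
# -a preserves permissions/timestamps, -v is verbose, -z compresses in transit,
# --partial keeps partially transferred files so interrupted runs can resume.
rsync -avz --partial /var/log/app/ user@backup-host:/var/log/app/
```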
However, the cloud introduces a different set of physics. Modern cloud storage (S3, Google Cloud Storage, Azure Blob) is Object Storage, not block storage. Object storage APIs generally do not support the granular block-level operations that rsync relies on to calculate deltas. Consequently, when you use rsync over a mounted file system (like S3FS) or standard SSH tunnels to a cloud instance, you often encounter the "high latency penalty."
The Bottleneck: Rsync is single-threaded by nature. It processes files serially—one after another. When moving millions of small files to an S3 bucket, the handshake overhead for each file often takes longer than the data transfer itself.
In a high-latency cloud environment, a serial process leaves the vast majority of your available bandwidth sitting idle. You might have a 10Gbps pipe, yet rsync may only utilize 50Mbps of it, because it waits for the server to acknowledge the previous file before sending the next.
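The arithmetic is unforgiving. As a rough illustration: at a 20 ms round-trip time and at least one request-response cycle per file, 1 million files spend 1,000,000 × 0.02 s ≈ 5.5 hours doing nothing but waiting on the wire, no matter how fat the pipe is. Parallelism is the only way to hide that latency.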
Enter Rclone: The Swiss Army Knife of Cloud Storage
If rsync is a surgical scalpel, rclone is a heavy-duty industrial conveyor belt. Designed specifically for cloud storage, rclone interfaces directly with the APIs of over 40 cloud providers. It doesn't try to treat the cloud like a hard drive; it treats it like an API.
The primary architectural difference—and the source of the speed boost—is parallelism. Rclone is multi-threaded. It can transfer multiple files simultaneously, saturate your bandwidth, and manage the overhead of API requests much more efficiently than a serial process.
Key Advantages of Rclone:
- Multi-threading: You can define how many files to transfer at once using the --transfers flag.
- Chunking: Large files are split into chunks and uploaded in parallel.
- API Optimization: It understands the specific limitations and features of S3, GCS, and Azure, handling retries and backoffs natively.
- Checksum Validation: It ensures data integrity using the cloud provider's native hashing (MD5/SHA1) rather than reading the file back (which costs money in egress fees).
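Here is a minimal sketch putting several of these features together; the remote name and bucket are placeholders, and the flag values are illustrative rather than prescriptive:

```bash
# --transfers: number of files uploaded in parallel
# --s3-upload-concurrency: parallel chunk uploads within a single large file
# --s3-chunk-size: multipart chunk size for S3
# --checksum: compare source and destination by hash instead of size/mod-time
rclone copy /local/data remote:bucket \
  --transfers=16 --s3-upload-concurrency=8 --s3-chunk-size=64M --checksum
```

Because --checksum leans on the hashes the provider already stores, verification never has to re-download the objects.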
For a company looking to migrate a data lake or a massive image repository, these features aren't just convenient; they are critical for meeting downtime windows.
The Benchmark: Achieving the 4x Speed Boost
Let’s look at a practical scenario we recently encountered at Nohatek. A client needed to migrate 2TB of data consisting of 5 million small files (images and JSON logs) from an on-premise server to AWS S3.
The Setup
- Source: 10Gbps Fiber connection
- Destination: AWS S3 Standard
- Dataset: 2TB total, approx 400KB avg file size
Test 1: Rsync (over S3FS mount)
Using standard rsync -avz, the transfer speed hovered around 15-20 MB/s. The overhead of opening and closing millions of HTTP requests serially killed the throughput. At this rate, the migration was projected to take over 30 hours.
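For context, the test looked roughly like this; the bucket name, mount point, and credentials file are placeholders:

```bash
# Mount the bucket as a POSIX-ish filesystem, then rsync into it.
# Every file rsync writes becomes at least one serial HTTP request.
s3fs my-bucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs
rsync -avz /local/data/ /mnt/s3/data/
```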
Test 2: Rclone (Default Settings)
Out of the box, rclone defaults to 4 parallel transfers. The speed jumped to 85 MB/s immediately. A 4x improvement just by switching binaries.
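The equivalent command is almost identical in shape; with no tuning, rclone runs 4 parallel transfers and 8 checkers, and -P prints live throughput (remote and bucket names are placeholders):

```bash
# Out-of-the-box run: default concurrency, live progress display.
rclone copy /local/data remote:bucket -P
```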
Test 3: Rclone (Tuned)
Here is where the magic happens. We tuned the concurrency to match the available CPU and bandwidth headroom:
```bash
rclone copy /local/data remote:bucket --transfers=32 --checkers=32 --fast-list
```

With --transfers=32, we saturated the link. Speeds hit 350 MB/s. The migration that was projected to take 30 hours with rsync was completed in under 2 hours. That is a massive reduction in engineering babysitting time and a faster time-to-value for the client.
Why 32 transfers? There is a point of diminishing returns where CPU context switching or disk I/O becomes the bottleneck. We typically recommend starting at --transfers=16 and stepping up until you see network saturation.
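A crude but effective way to find that saturation point is to sweep the concurrency on a representative sample before committing to the full run. A sketch, with the sample directory and bucket prefix as placeholders:

```bash
# Time the same sample at increasing concurrency; stop stepping up when
# throughput flattens or the source disk becomes the bottleneck.
# Each run targets a fresh prefix so nothing is skipped as already copied.
for t in 8 16 32 64; do
  echo "--transfers=$t"
  time rclone copy /local/data/sample "remote:bucket/tune-$t" \
    --transfers="$t" --checkers="$t" --fast-list
done
```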
While rsync will always have a place in our hearts (and our local cron jobs), it is no longer the default choice for cloud-scale migrations. The shift from block storage to object storage requires tools that understand the latency and API mechanics of the cloud. By leveraging rclone and understanding the power of parallelism, IT leaders can turn week-long migration nightmares into manageable afternoon tasks.
However, raw speed is only one part of the equation. Data integrity, encryption, cost management (API request fees), and delta strategies are equally important. At Nohatek, we specialize in architecting these high-performance cloud environments. Whether you are building the infrastructure for your next AI model or migrating legacy systems to a modern cloud stack, we ensure your data gets there fast, safe, and cost-effectively.
Ready to optimize your cloud infrastructure? Contact the Nohatek team today to discuss your migration strategy.