The Atomic Broker: Architecting Serverless Message Queues with Object Storage & Python
Learn how to build scalable, serverless message queues using Object Storage conditional writes and Python. A cost-effective architecture guide for CTOs and Devs.
In the landscape of distributed systems, the message broker is often the heavy lifter. Whether it is Kafka, RabbitMQ, or a managed service like AWS SQS, these systems are the central nervous system of modern architecture. But for many organizations, they introduce significant complexity: maintenance overhead, vendor lock-in, and costs that scale aggressively with throughput.
But what if you could eliminate the dedicated broker entirely? What if your storage layer was your queue?
With the evolution of Object Storage (like AWS S3, Google Cloud Storage, or MinIO) offering strong consistency and conditional write operations, we can now architect serverless, infinite-scale message queues using nothing but standard storage buckets and Python. We call this pattern "The Atomic Broker." In this post, we will dismantle the traditional queuing model and rebuild it using object storage primitives, offering a robust solution for high-payload, asynchronous processing workflows that can save your organization significant infrastructure costs.
The Paradigm Shift: Why Treat Storage as a Queue?
Traditionally, developers are taught to separate storage (databases, object stores) from transport (queues, streams). The standard pattern for handling large data payloads—such as video rendering jobs, AI model training datasets, or large-scale ETL—is the Claim Check Pattern.
- Step 1: Upload the heavy payload (GBs of data) to Object Storage.
- Step 2: Send a tiny reference message (the "claim check") to SQS or RabbitMQ.
- Step 3: The consumer reads the queue, gets the reference, and downloads the payload.
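As a point of reference, the claim-check flow above can be sketched as follows. This is a minimal illustration, not a production implementation: the `payloads/` key scheme is an arbitrary choice, and `s3` and `sqs` are assumed to be boto3-style clients passed in by the caller.

```python
import json
import uuid

def build_claim_check(bucket: str, key: str) -> str:
    """The tiny reference message that stands in for the heavy payload."""
    return json.dumps({"bucket": bucket, "key": key})

def enqueue_with_claim_check(s3, sqs, bucket: str, queue_url: str, payload: bytes) -> str:
    """Step 1: upload the payload; Step 2: send only a reference to the queue.
    `s3` and `sqs` are assumed to be boto3-style clients."""
    key = f"payloads/{uuid.uuid4()}.bin"  # illustrative key scheme
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    sqs.send_message(QueueUrl=queue_url, MessageBody=build_claim_check(bucket, key))
    return key
```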
While effective, this introduces two points of failure and two billing meters. You are paying for the storage and the queue requests. Furthermore, you have to manage the synchronization between the two; if the queue message is lost but the file remains, you have orphaned data. If the file is deleted before the message is processed, you have a processing error.
The Atomic Broker approach collapses this stack. By leveraging the atomic guarantees of modern object storage, the file is the message. When a file lands in a specific bucket prefix (e.g., /queue), it is effectively "enqueued." When a consumer successfully claims it via a conditional operation, it is "locked." This simplifies the architecture, reduces moving parts, and leverages the practically infinite scalability of cloud storage.
The Secret Sauce: Conditional Writes and Atomicity
The biggest challenge in using a file system or object store as a queue is concurrency. If you have ten worker nodes scanning a bucket and they all see job_123.json, how do you ensure only one worker picks it up? Without a broker managing state, you risk race conditions where multiple workers process the same job.
The solution lies in Conditional Writes (specifically If-None-Match or "Preconditions"). This is the mechanism that turns a dumb storage bucket into an intelligent locking system.
In distributed systems, a conditional write allows you to say: "Only write this file if a file with this name does NOT already exist."
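In boto3-style terms, that rule can be expressed as a small helper. This is a sketch under the assumption that your S3-compatible store supports `IfNoneMatch` on `PutObject` (AWS S3 does; the function names here are illustrative):

```python
def is_lost_race(error_code: str) -> bool:
    """S3 reports a failed If-None-Match precondition as HTTP 412."""
    return error_code in ("PreconditionFailed", "412")

def try_create_exclusive(s3, bucket: str, key: str, body: bytes = b"") -> bool:
    """Write `key` only if no object with that name exists yet.
    Returns True if this caller created it, False if another writer won."""
    from botocore.exceptions import ClientError  # botocore ships alongside boto3
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
        return True
    except ClientError as e:
        if is_lost_race(e.response["Error"]["Code"]):
            return False
        raise
```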
Here is the architectural workflow for atomicity:
- Producer: Writes a JSON payload to `s3://my-bucket/queue/task_id.json`.
- Consumer: Lists objects in the `/queue` prefix.
- The Atomic Lock: The consumer attempts to copy the object to `s3://my-bucket/processing/task_id.json` IF AND ONLY IF that file does not already exist in the processing folder.
- Verification: If the storage service returns a success (200 OK), the consumer has won the "lock." It can now safely delete the original from `/queue` and begin work. If it fails (412 Precondition Failed), another worker beat them to it, and they move to the next message.
This leverages the cloud provider's internal consistency engine to handle the locking for us, effectively offloading the complexity of distributed consensus to AWS or Google Cloud.
Implementing the Atomic Broker in Python
Let’s look at how we implement this practically using Python. We will use the boto3 library as an example, targeting AWS S3, though the logic applies to GCS or MinIO just as well. The key is catching the specific error code that indicates a failed condition.
First, the Producer is straightforward—it simply dumps data:

```python
import boto3
import json
import uuid

s3 = boto3.client('s3')

def enqueue_message(bucket, data):
    task_id = str(uuid.uuid4())
    key = f"queue/{task_id}.json"
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(data)
    )
    return task_id
```

The Consumer is where the magic happens. We need to attempt to "claim" a file. Note that S3 does not have a native atomic "Move" operation, so we implement a copy-if-not-exists strategy to create a lock.
```python
from botocore.exceptions import ClientError  # botocore ships alongside boto3

def acquire_lock(bucket, task_key):
    filename = task_key.split('/')[-1]
    lock_key = f"locks/{filename}.lock"
    try:
        # Attempt to create the lock file ONLY if it does not already exist.
        # Note: S3 CopyObject does not support If-None-Match on the DESTINATION,
        # so instead of copying into /processing we create a small sidecar lock
        # object with 'PutObject' + 'IfNoneMatch'. (A DynamoDB item is another
        # common home for this lock.)
        s3.put_object(
            Bucket=bucket,
            Key=lock_key,
            Body=b'locked',
            IfNoneMatch='*'
        )
        # If we get here, we own the lock.
        return True
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code in ('PreconditionFailed', '412', 'ConditionalRequestConflict'):
            # Another worker claimed it (412), or two conditional writes on the
            # same key collided (409 ConditionalRequestConflict); move on.
            return False
        raise

def worker_process(bucket):
    # List available tasks
    response = s3.list_objects_v2(Bucket=bucket, Prefix='queue/')
    for obj in response.get('Contents', []):
        key = obj['Key']
        if acquire_lock(bucket, key):
            print(f"Processing {key}...")
            # ... Perform heavy AI or Data Logic here ...
            # Cleanup: delete from the queue first, then release the lock
            s3.delete_object(Bucket=bucket, Key=key)
            s3.delete_object(Bucket=bucket, Key=f"locks/{key.split('/')[-1]}.lock")
```

Important Consideration: While the code above demonstrates the logic, production environments should implement a "visibility timeout" mechanism. If a worker crashes after acquiring a lock but before finishing, that message becomes a "zombie." A robust Atomic Broker includes a reaper process that checks the timestamps of lock files and releases them if they are older than a specific threshold (e.g., 10 minutes).
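A minimal reaper might look like the sketch below. The 10-minute threshold and the `locks/` prefix are assumptions matching the workflow described above; tune both to your workload.

```python
from datetime import datetime, timedelta, timezone

LOCK_TIMEOUT = timedelta(minutes=10)  # assumed threshold; exceed your longest job

def is_stale(last_modified: datetime, now: datetime,
             timeout: timedelta = LOCK_TIMEOUT) -> bool:
    """A lock is a zombie once its file outlives the visibility timeout."""
    return (now - last_modified) > timeout

def reap_zombie_locks(s3, bucket: str) -> int:
    """Delete stale lock files so their tasks become claimable again.
    `s3` is assumed to be a boto3-style client."""
    now = datetime.now(timezone.utc)
    released = 0
    response = s3.list_objects_v2(Bucket=bucket, Prefix="locks/")
    for obj in response.get("Contents", []):
        if is_stale(obj["LastModified"], now):  # S3 returns tz-aware timestamps
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
            released += 1
    return released
```

Note that if the reaper releases a lock while its worker is merely slow rather than dead, two workers may process the same task, so the timeout must comfortably exceed your worst-case job duration (and jobs should be idempotent).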
Strategic Fit: When to Use (and When to Avoid)
As with any architectural pattern, the Atomic Broker is not a silver bullet. It is a specialized tool for specific scenarios. Understanding when to deploy this versus a standard Kafka or SQS setup is vital for a CTO or Lead Architect.
Use the Atomic Broker when:
- Payloads are Large: If you are processing images, video, or large datasets, you are already using Object Storage. Combining the queue and storage reduces latency and cost.
- Throughput is Low-to-Medium: If you are processing hundreds or thousands of complex jobs per minute, this works beautifully.
- Cost is a Primary Factor: You avoid the hourly costs of managed brokers. You only pay for S3 requests and storage.
- Vendor Agnosticism is Required: This pattern works on AWS, Azure, GCP, and on-premise with MinIO. It makes multi-cloud deployments significantly easier.
Stick to Standard Queues (SQS/Kafka) when:
- Latency is Critical: If you need sub-millisecond message delivery, object storage API calls (which can take 20-100ms) are too slow.
- Throughput is Massive: For millions of messages per second (e.g., IoT telemetry, clickstream data), object storage API limits (typically 3,500 PUTs/sec per prefix) will become a bottleneck.
- Message Ordering is Strict: Object storage does not guarantee FIFO (First-In-First-Out). If order matters, use a FIFO queue.
By correctly identifying the workload, Nohatek has helped clients reduce cloud bills by up to 40% simply by removing unnecessary middleware layers in their AI processing pipelines.
The Atomic Broker architecture represents a shift towards simplification. In an era where cloud architectures are becoming increasingly convoluted, collapsing the queue and storage layers offers a refreshing alternative that is cost-effective, durable, and surprisingly powerful. By mastering conditional writes and Python automation, you can build a serverless processing engine that scales with your data, not your infrastructure budget.
At Nohatek, we specialize in optimizing cloud architectures and building high-performance custom software. Whether you are looking to refactor legacy message queues or build a greenfield AI pipeline, our team is ready to help you architect the most efficient solution.
Ready to optimize your cloud infrastructure? Contact Nohatek today for a consultation.