The Inference Scheduler: Architecting High-Throughput LLM Serving with Continuous Batching and vLLM on Kubernetes
The Ephemeral Sandbox: Architecting Secure Runtime Environments for AI Coding Agents with Firecracker MicroVMs