The Inference Scheduler: Architecting High-Throughput LLM Serving with Continuous Batching and vLLM on Kubernetes
The Knowledge Anchor: Architecting Hallucination-Resistant RAG Pipelines with Knowledge Graphs and Python