[GitHub Trending] LMCache/LMCache
KV cache layer for LLMs, directly relevant to AI/ML infrastructure and performance optimization.
LMCache is a vendor-neutral KV cache management layer for LLM inference. It persists KV caches and reuses them across serving engines, which reduces time-to-first-token (TTFT) and improves throughput for long-context and agentic workloads. It supports tiered storage (GPU, CPU, SSD, Redis, S3) and engine-independent deployment, with integrations and ecosystem ties including vLLM, NVIDIA Dynamo, and the PyTorch Foundation. Key features include non-prefix KV reuse via CacheBlend, prefill/decode (PD) disaggregation with KV transfer over NVLink/RDMA, and production-grade observability metrics.
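The idea of persisting KV caches in storage tiers and reusing them on a later request can be sketched as follows. This is a minimal illustration under stated assumptions and not LMCache's API. The names chunk_keys, Tier, TieredKVStore and cached_prefix_tokens, the tiny chunk size, the LRU-with-demotion eviction policy, and the string stand-ins for KV tensors are all hypothetical. The sketch models prefix reuse only; CacheBlend's non-prefix reuse is not covered.

```python
import hashlib
from collections import OrderedDict
from typing import List, Optional, Sequence, Tuple

CHUNK_SIZE = 4  # hypothetical: kept tiny so the example is readable


def chunk_keys(tokens: Sequence[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk. Each key hashes the whole prefix up to
    and including that chunk, so a chunk's KV is reused only after an
    identical prefix."""
    h = hashlib.sha256()
    keys: List[str] = []
    full = len(tokens) - len(tokens) % chunk_size  # ignore a trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = tokens[start:start + chunk_size]
        h.update((",".join(str(t) for t in chunk) + "|").encode())
        keys.append(h.copy().hexdigest())
    return keys


class Tier:
    """One storage level (e.g. GPU, CPU, disk) holding at most `capacity` chunks."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.entries: "OrderedDict[str, object]" = OrderedDict()  # LRU order


class TieredKVStore:
    """Hypothetical tiered store: new and recently used chunks live in the
    fastest tier; LRU victims are demoted one tier down and dropped after the last."""

    def __init__(self, tiers: List[Tier]) -> None:
        self.tiers = tiers  # ordered fastest first

    def store(self, key: str, kv: object) -> None:
        for tier in self.tiers:
            tier.entries.pop(key, None)  # keep at most one copy of each chunk
        self._insert(key, kv, 0)

    def _insert(self, key: str, kv: object, level: int) -> None:
        if level >= len(self.tiers):
            return  # evicted from the slowest tier: the chunk is dropped
        tier = self.tiers[level]
        tier.entries[key] = kv
        if len(tier.entries) > tier.capacity:
            victim, victim_kv = tier.entries.popitem(last=False)  # least recently used
            self._insert(victim, victim_kv, level + 1)

    def get(self, key: str) -> Tuple[Optional[object], Optional[str]]:
        """Return (kv, name of the tier it was found in) and promote a hit."""
        for tier in self.tiers:
            if key in tier.entries:
                kv = tier.entries.pop(key)
                self.store(key, kv)  # promote to the fastest tier
                return kv, tier.name
        return None, None


def cached_prefix_tokens(store: TieredKVStore, tokens: Sequence[int],
                         chunk_size: int = CHUNK_SIZE) -> int:
    """Count leading tokens whose KV is already cached (prefill can skip them)."""
    hit = 0
    for key in chunk_keys(tokens, chunk_size):
        kv, _ = store.get(key)
        if kv is None:
            break
        hit += chunk_size
    return hit


# Usage: request 1 caches three chunks; request 2 shares the first two.
store = TieredKVStore([Tier("gpu", 2), Tier("cpu", 4)])
prompt = list(range(12))
for i, key in enumerate(chunk_keys(prompt)):
    store.store(key, f"kv-chunk-{i}")  # stand-in for real KV tensors

followup = list(range(8)) + [100, 101, 102, 103]
hit = cached_prefix_tokens(store, followup)
print(f"reused {hit} tokens, prefill {len(followup) - hit}")  # reused 8 tokens, prefill 4
```

Every cached chunk is prefill work the engine can skip, which is where the TTFT saving comes from. In this sketch the first chunk is found in the "cpu" tier and promoted back to "gpu", loosely mirroring how a slower tier can still serve a hit. A real system would also have to move the tensors between devices or over the network, which the sketch ignores.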