[GitHub Trending] LMCache/LMCache

7.6 relevance

KV cache layer for LLMs, directly relevant to AI/ML infrastructure and performance optimization.

AI/ML github.com

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer - LMCache/LMCache

Summary

LMCache is a vendor-neutral KV cache management layer for LLM inference that persists and reuses KV caches across serving engines, reducing time-to-first-token (TTFT) and improving throughput for long-context and agentic workloads. It supports tiered storage (GPU, CPU, SSD, Redis, S3) and engine-independent deployment, integrates with vLLM and NVIDIA Dynamo, and is associated with the PyTorch Foundation. Key features include non-prefix KV reuse via CacheBlend, prefill-decode (PD) disaggregation with KV transfer over NVLink/RDMA, and production-grade observability metrics.
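To make the tiered-storage idea concrete, here is a minimal toy sketch in Python. It is not LMCache's API: the names `chunk_keys`, `TieredStore`, and `reusable_prefix_len`, the chunk size, and the use of plain dictionaries as storage tiers are all assumptions made for illustration. It shows only simple prefix-based reuse (hashing token chunks and looking them up across a fast and a slow tier), not the non-prefix reuse that CacheBlend provides.

```python
import hashlib
from collections import OrderedDict

CHUNK = 4  # tokens per chunk (hypothetical value chosen for the demo)


def chunk_keys(tokens, chunk=CHUNK):
    """Return one key per full chunk; each key hashes the entire prefix so far."""
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk  # ignore a trailing partial chunk
    for i in range(0, full, chunk):
        h.update(",".join(map(str, tokens[i:i + chunk])).encode() + b";")
        keys.append(h.copy().hexdigest())
    return keys


class TieredStore:
    """Toy two-tier store: a small LRU 'fast' tier backed by an unbounded 'slow' tier."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for GPU/CPU memory
        self.slow = {}             # stands in for SSD/Redis/S3
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value  # demote the least recently used entry

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)
            self.put(key, value)  # promote on hit
            return value
        return None


def reusable_prefix_len(store, tokens):
    """Count how many leading tokens already have cached entries in the store."""
    n = 0
    for key in chunk_keys(tokens):
        if store.get(key) is None:
            break
        n += CHUNK
    return n


store = TieredStore(fast_capacity=2)
prompt_a = list(range(12))
for key in chunk_keys(prompt_a):
    store.put(key, "kv-placeholder")  # a real system would store KV tensors here

prompt_b = list(range(8)) + [99, 98, 97, 96]  # shares the first 8 tokens with prompt_a
reused = reusable_prefix_len(store, prompt_b)
print(reused)
```

In this sketch, a second prompt that shares a prefix with an earlier one finds its leading chunks in the store even after they have been demoted to the slow tier, so only the remaining tokens would need a fresh prefill.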

Author

LMCache