[GitHub Trending] LMCache/LMCache

Relevance: 7.6
Score Breakdown
technical depth: 8
novelty: 8
actionability: 7
community: 7
strategic: 6
personal: 9

KV cache layer for LLMs, directly relevant to AI/ML infrastructure and performance optimization.

AI/ML github.com
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer - LMCache/LMCache
Summary

LMCache is a vendor-neutral KV cache management layer for LLM inference. It persists KV caches and reuses them across serving engines, which reduces time-to-first-token (TTFT) and improves throughput for long-context and agentic workloads. It supports tiered storage (GPU, CPU, SSD, Redis, S3) and engine-independent deployment, with integrations including vLLM, NVIDIA Dynamo, and PyTorch Foundation. Key features include non-prefix KV reuse via CacheBlend, prefill-decode (PD) disaggregation with KV transfer over NVLink/RDMA, and production-grade observability metrics.
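
To make the idea of tiered storage and KV reuse concrete, here is a minimal toy sketch in Python. It is not LMCache's API: the names `CHUNK`, `chunk_keys`, `TieredKVStore`, and `reusable_prefix_len` are hypothetical, the two dictionaries merely stand in for a fast tier (GPU/CPU memory) and a slow tier (SSD or a remote store), and it only illustrates plain prefix reuse, not the non-prefix reuse attributed to CacheBlend.

```python
import hashlib
from collections import OrderedDict

CHUNK = 4  # tokens per chunk (hypothetical; chosen small for illustration)


def chunk_keys(tokens):
    """Return one key per full chunk; each key hashes the entire prefix up to that chunk."""
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % CHUNK  # ignore a trailing partial chunk
    for i in range(0, full, CHUNK):
        h.update(repr(list(tokens[i:i + CHUNK])).encode())
        keys.append(h.copy().hexdigest())
    return keys


class TieredKVStore:
    """Toy two-tier store: an LRU 'fast' tier that demotes into an unbounded 'slow' tier."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for GPU/CPU memory
        self.slow = {}             # stands in for SSD or a remote store
        self.fast_capacity = fast_capacity

    def put(self, key, kv):
        self.fast[key] = kv
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_kv = self.fast.popitem(last=False)  # evict least recently used
            self.slow[old_key] = old_kv                      # demote instead of dropping

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            kv = self.slow.pop(key)
            self.put(key, kv)  # promote back to the fast tier
            return kv
        return None


def reusable_prefix_len(store, tokens):
    """Number of leading tokens whose KV entries are already cached in some tier."""
    n = 0
    for key in chunk_keys(tokens):
        if store.get(key) is None:
            break
        n += CHUNK
    return n
```

In this sketch, a serving engine would call `reusable_prefix_len` before prefill and compute KV entries only for the tokens past the returned length, which is the mechanism by which a cache hit shortens TTFT.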

Author

LMCache