LMCache/LMCache
Original Summary
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
A KV Cache Management Layer for Scalable LLM Inference

Updates
[2026/05] 🔥 Agentic workload benchmark on AMD MI300X (blog).
[2026/04] 🔥 LMCache's new multiprocess (MP) architecture release (blog).
[2026/03] LMCache at GTC 2026 (post).
[2026/01] LMCache multi-node P2P CPU memory sharing, from experimental feature to production (blog).
[2025/11] LMCache x CoreWeave accelerate efficient LLM inference for Cohere (blog).
[2025/10] LMCache joins the PyTorch Foundation and Tensormesh unveiled (blog, PyTorch).
[2025/09] NVIDIA Dynamo integrates LMCache, accelerating LLM inference (blog).
[2025/08] 🎉 LMCache hits 5,000+ GitHub stars (blog).
[2025/08] LMCache supports gpt-oss (20B/120B) on day 1 (blog).
[2025/07] Get faster LLM inference and cheaper responses with LMCache and Redis (Redis blog).
[2025/07] LMCache extends its turbo-boost to multimodal models in vLLM V1 (blog).
[2025/06] LLM Production Stack goes cross-hardware: AMD, Arm and Ascend (blog).

About
LMCache is a KV cache management layer for LLM inference. It turns KV cache f…