Show HN: Agent-cache – Multi-tier LLM/tool/session caching for Valkey and Redis

Posted by kaliades 1 day ago


Multi-tier exact-match cache for AI agents backed by Valkey or Redis. LLM responses, tool results, and session state behind one connection. Framework adapters for LangChain, LangGraph, and Vercel AI SDK. OpenTelemetry and Prometheus built in. No modules required - works on vanilla Valkey 7+ and Redis 6.2+.

Shipped v0.1.0 yesterday, v0.2.0 today with cluster mode. Streaming support coming next.

Existing options lock you into one tier (LangChain = LLM responses only, LangGraph = state only) or one framework. This solves both.

npm: https://www.npmjs.com/package/@betterdb/agent-cache
Docs: https://docs.betterdb.com/packages/agent-cache.html
Examples: https://valkeyforai.com/cookbooks/betterdb/
GitHub: https://github.com/BetterDB-inc/monitor/tree/master/packages...

Happy to answer questions.

Comments

Comment by potter098 7 hours ago

I’d be curious how you’re handling freshness for tool caches. Exact-match caching seems great for pure functions, but once a tool depends on external state I’d want a TTL or invalidation hook, otherwise the hit rate can look great while the answer is already stale.

Comment by revenga99 1 day ago

Can you explain what this does?

Comment by kaliades 1 day ago

It caches AI agent operations in Valkey (or Redis) so you don't repeat expensive work.

Three tiers: if your agent calls gpt-4o with the same prompt twice, the second call returns from Valkey in under 1ms instead of hitting the API. Same for tool calls - if your agent calls get_weather("Sofia") twice with the same arguments, the cached result comes back instantly. And session state (what step the agent is on, user intent, LangGraph checkpoints) persists across requests with per-field TTL.

The main difference from existing options is that LangChain's cache only handles LLM responses, LangGraph's checkpoint-redis only handles state (and requires Redis 8 + modules), and none of them ship OpenTelemetry or Prometheus instrumentation at the cache layer. This puts all three tiers behind one Valkey connection with observability built in.
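To give a feel for the pattern (not the package's actual API - the client and function names below are made up for illustration), the tool tier boils down to a read-through cache against a vanilla Valkey/Redis instance. This sketch uses plain node-redis and a hypothetical getWeather tool:

```ts
import { createClient } from "redis";

// Valkey speaks the same protocol, so the standard redis client works against it.
const valkey = createClient({ url: "redis://localhost:6379" });
await valkey.connect();

// Hypothetical tool the agent might call - stands in for a real external API.
async function getWeather(city: string): Promise<string> {
  return `Sunny in ${city}`;
}

// Read-through cache: identical tool name + arguments => serve from Valkey,
// otherwise execute the tool and store the result with a TTL.
async function cachedToolCall(
  tool: string,
  args: unknown,
  fn: () => Promise<string>
): Promise<string> {
  const key = `tool:${tool}:${JSON.stringify(args)}`;
  const hit = await valkey.get(key);
  if (hit !== null) return hit;               // cache hit: tool never runs
  const result = await fn();                  // cache miss: run the tool once
  await valkey.set(key, result, { EX: 300 }); // TTL so external state can't stay stale forever
  return result;
}

await cachedToolCall("get_weather", { city: "Sofia" }, () => getWeather("Sofia")); // miss
await cachedToolCall("get_weather", { city: "Sofia" }, () => getWeather("Sofia")); // hit
```

The LLM tier is the same idea with the serialized request as the key; the session tier is a hash per session with TTLs on individual fields.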

Comment by trueno 1 day ago

When you say "same prompt", are you saying it's a similar prompt and something in the middle determines that "this is basically the same question", or is it looking for someone who for whatever reason prompted, then copied and pasted that prompt and prompted it again word for word?

Comment by kaliades 23 hours ago

Exact match, word for word. agent-cache takes everything that defines an LLM request - which model you're calling (gpt-4o, Claude, etc.), the full conversation history (system prompt + user messages + assistant responses), sampling parameters like temperature, and any tool/function definitions the model has access to - serializes it all into a canonical JSON string with sorted keys, and hashes it with SHA-256. That hash is the cache key in Valkey. Same inputs down to the last character = cache hit, anything different = miss.
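Roughly, the key derivation described above looks like this (a sketch of the idea, not the package's exact serialization; the request shape is simplified):

```ts
import { createHash } from "node:crypto";

// Recursively sort object keys so that requests with the same content but
// different key order serialize to the same canonical JSON string.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value as Record<string, unknown>)
        .sort()
        .map((k) => [k, canonicalize((value as Record<string, unknown>)[k])])
    );
  }
  return value;
}

// Everything that defines the request goes into the key: model, full message
// history, sampling params, tool definitions. A one-character change = new key.
function cacheKey(request: {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  tools?: unknown[];
}): string {
  const canonical = JSON.stringify(canonicalize(request));
  return "llm:" + createHash("sha256").update(canonical).digest("hex");
}
```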

If you want the 'basically the same question' behavior, that's our other package - @betterdb/semantic-cache. It embeds the prompt as a vector and does similarity search, so 'What is the capital of France?' and 'Capital city of France?' both hit. The trade-off is it needs valkey-search for the vector index, while agent-cache works on completely vanilla Valkey with no modules.

In practice, agent-cache hits its cache less often than semantic-cache would, but when it does hit, you know the result is correct - there's no chance of returning a response for a question that was similar but not actually the same.
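To make the exact-vs-similar distinction concrete, here's a rough sketch of the similarity idea. This is not @betterdb/semantic-cache's API (that stores vectors in valkey-search); it just embeds two rewordings with the OpenAI SDK and compares them, and the 0.9 threshold is arbitrary:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY; model choice below is an assumption

async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity between two vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// These two prompts hash to unrelated keys under exact matching, but their
// embeddings score close to 1.0, so a semantic cache would treat them as a hit.
const [a, b] = await Promise.all([
  embed("What is the capital of France?"),
  embed("Capital city of France?"),
]);
console.log(cosine(a, b) > 0.9 ? "semantic hit" : "semantic miss");
```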
