AutoCache: How we reduce AI costs by up to 90% with our smart proxy

At Montevive, we constantly face the challenge of optimizing costs in AI projects without compromising quality. After analyzing thousands of API calls and detecting waste patterns, we developed AutoCache: a smart proxy that automatically reduces Claude costs by up to 90%.

The problem we identified

While working with platforms like n8n, Flowise, and Make.com, we discovered something frustrating: these tools do not support Anthropic's prompt caching. As a result:

  • Every call resends the full context (system prompts, tools, documentation)
  • Users pay 10x more than necessary
  • Latency multiplies unnecessarily

Real-world example: a documentation chat with 8,000 tokens cost $0.024 per request. With smart caching, the same request costs $0.0066, roughly a 72% saving.

Our solution: AutoCache

AutoCache is a transparent proxy that works as a drop-in replacement for the Claude API. Zero code changes, maximum impact.

Key features

🧠 Intelligent Token Analysis

  • Automatically identifies which parts of the prompt to cache
  • Conservative, Moderate, and Aggressive strategies
  • Up to 4 simultaneous cache breakpoints
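To illustrate what a cache breakpoint looks like on the wire, here is a minimal Python sketch of the kind of transformation a caching proxy can apply to an outgoing request. The helper name is hypothetical; only the `cache_control` field itself is part of Anthropic's Messages API.

```python
# Hypothetical sketch of breakpoint injection. AutoCache's real logic
# analyzes token counts and can place up to 4 breakpoints; here we just
# mark the system prompt as cacheable with one ephemeral breakpoint.

def add_cache_breakpoint(request: dict) -> dict:
    """Mark the last system block as cacheable (illustrative helper)."""
    system = request.get("system")
    if isinstance(system, str):
        # A plain-string system prompt must be normalized to block form
        # before cache_control can be attached.
        system = [{"type": "text", "text": system}]
    if system:
        system[-1]["cache_control"] = {"type": "ephemeral"}
        request["system"] = system
    return request

payload = {
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 1024,
    "system": "You are a documentation assistant. <several thousand tokens of docs>",
    "messages": [{"role": "user", "content": "How do I configure the proxy?"}],
}
cached_payload = add_cache_breakpoint(payload)
```

With the breakpoint in place, subsequent requests that repeat the same prefix are served from Anthropic's prompt cache instead of being re-billed at the full input rate.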

📊 Real-Time ROI Analytics

  • HTTP headers with detailed savings metrics
  • /savings endpoint with complete statistics
  • Automatic break-even point calculation
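As a sketch of how those statistics can be consumed, here is a small Python example. The `/savings` endpoint is real, but the response shape shown is a hypothetical illustration, not AutoCache's documented schema.

```python
# Hypothetical example response from the /savings endpoint; the field
# names here are assumptions for illustration only.
sample_response = {
    "total_requests": 1200,
    "baseline_cost_usd": 28.80,  # what the traffic would have cost uncached
    "actual_cost_usd": 7.92,     # what it actually cost through the proxy
}

def savings_summary(stats: dict) -> dict:
    """Derive headline ROI numbers from the raw statistics."""
    saved = stats["baseline_cost_usd"] - stats["actual_cost_usd"]
    return {
        "saved_usd": round(saved, 2),
        "savings_pct": round(100 * saved / stats["baseline_cost_usd"], 1),
        "avg_cost_per_request": round(
            stats["actual_cost_usd"] / stats["total_requests"], 4
        ),
    }

print(savings_summary(sample_response))
```

The same derivation works on live data: fetch the endpoint, feed the JSON body to `savings_summary`, and chart the result.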

⚡ Robust Architecture

  • Developed in Go with modular architecture
  • Full streaming and non-streaming support
  • Docker-ready with docker-compose included

Real-world use cases with proven ROI

Technical documentation chat

  • Typical request: 8,000 tokens (6,000 cached + 2,000 user input)
  • Without AutoCache: $0.024/request
  • With AutoCache: $0.0066/request (~72% savings)
  • Break-even: 2 requests

Code review assistant

  • Typical request: 12,000 tokens (10,000 cached + 2,000 review)
  • Without AutoCache: $0.036/request
  • With AutoCache: $0.009/request (75% savings)
  • Break-even: 1 request
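The break-even idea can be sanity-checked with a back-of-the-envelope model based on Anthropic's published cache pricing multipliers: cache writes cost 25% more than the base input rate, cache reads cost 10% of it. The per-token price below is an assumption (Sonnet-class input pricing), so exact dollar figures will vary.

```python
# Back-of-the-envelope cache economics. Multipliers (1.25x write, 0.10x read)
# are Anthropic's published prompt-caching pricing; the base price is assumed.

PRICE_PER_TOKEN = 3.00 / 1_000_000  # assumed base input price, USD/token

def request_cost(cached: int, fresh: int, first_request: bool) -> float:
    """Cost of one request: cached tokens are written once, then read cheaply."""
    cache_rate = 1.25 if first_request else 0.10
    return (cached * cache_rate + fresh) * PRICE_PER_TOKEN

def break_even(cached: int, fresh: int) -> int:
    """Requests needed before cumulative cost beats the uncached baseline."""
    baseline = (cached + fresh) * PRICE_PER_TOKEN
    total_with_cache = total_plain = 0.0
    n = 0
    while True:
        n += 1
        total_with_cache += request_cost(cached, fresh, first_request=(n == 1))
        total_plain += baseline
        if total_with_cache < total_plain:
            return n

# Documentation-chat scenario: 6,000 cacheable + 2,000 fresh tokens.
print(break_even(6_000, 2_000))  # → 2
```

Under these assumptions the documentation-chat workload pays off on the second request, matching the figure above; actual break-even points depend on the model's pricing tier and how much of the prompt is cacheable.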

Implementation in 5 minutes

git clone https://github.com/montevive/autocache
cd autocache
export ANTHROPIC_API_KEY="your-api-key"
docker-compose up -d

Then change the API endpoint in your application:

// Before: "https://api.anthropic.com/v1/messages"
// After:  "http://localhost:8080/v1/messages"

That’s it! AutoCache begins optimizing automatically.

Impact on our operation

Since implementing AutoCache in our internal projects:

  • 78% cost reduction on average
  • 65% latency improvement in requests with repetitive context
  • Total ROI transparency via automatic analytics

Development philosophy

At Montevive, we believe that optimization should be invisible. AutoCache reflects our philosophy:

  • Zero-config by default
  • Automatic intelligence
  • Total transparency in metrics
  • Robust and scalable architecture

Next steps

AutoCache is open source and available on GitHub. We are working on:

  • Web dashboard for advanced monitoring
  • Support for more AI providers
  • One-click integration with popular platforms

Do you want to reduce your AI costs by up to 90%? Try AutoCache and share your experience with us.