At Montevive, we constantly face the challenge of optimizing costs in AI projects without compromising quality. After analyzing thousands of API calls and detecting waste patterns, we developed AutoCache: an intelligent proxy that automatically reduces Claude’s costs by up to 90%.

The problem we identified

While working with platforms like n8n, Flowise and Make.com, we discovered something frustrating: these tools do not support Anthropic prompt caching. This means that:

  • Each call forwards the complete context (system prompts, tools, documentation).
  • Users pay 10x more than necessary
  • Latency is unnecessarily multiplied

Real example: A documentation chat with 8,000 tokens cost $0.024 per request. With smart caching, the same request costs $0.0066. Savings of 90%.

Our solution: Autocache

AutoCache is a transparent proxy that works as a drop-in replacement for Claude’s API. Zero code changes, maximum impact.

Key features

🧠 Smart Token Analysis

  • Automatically identifies which parts of the prompt to cache
  • Conservative, Moderate and Aggressive Strategies
  • Up to 4 simultaneous cache breakpoints

📊 Real-Time ROI Analytics

  • HTTP headers with detailed savings metrics
  • Endpoint /savings with complete statistics
  • Automatic calculation of break-even point

⚡ Robust Architecture

  • Developed in Go with modular architecture
  • Full streaming and non-streaming support
  • Docker-ready with docker-compose included

Real use cases with proven ROI

Technical documentation chat

  • Typical request: 8,000 tokens (6,000 cached + 2,000 user input)
  • Without AutoCache: $0.024/request
  • With AutoCache: $0.0066/request(90% savings)
  • Break-even: 2 requests

Code review assistant

  • Typical request: 12,000 tokens (10,000 cached + 2,000 review)
  • Without AutoCache: $0.036/request
  • With AutoCache: $0.009/request(75% savings)
  • Break-even: 1 request

Implementation in 5 minutes

git clone https://github.com/montevive/autocache
cd autocache
export ANTHROPIC_API_KEY="tu-api-key"
docker-compose up -d

Change in your application:

// Antes: "https://api.anthropic.com/v1/messages"
// Después: "http://localhost:8080/v1/messages"

That’s all! AutoCache starts optimizing automatically.

Impact on our operation

Since we implemented AutoCache in our internal projects:

  • Cost reduction of 78% on average
  • 65% latency improvement in repetitive context requests
  • Full ROI transparency via automatic analytics

Development philosophy

At Montevive we believe that optimization should be invisible. Autocache reflects our philosophy:

  • Zero-config default
  • Automatic intelligence
  • Total transparency in metrics
  • Robust and scalable architecture

Next steps

AutoCache is open source and available at GitHub . We are working on:

  • Web dashboard for advanced monitoring
  • Support for more AI providers
  • One-click integration with popular platforms

Do you want to reduce your AI costs by up to 90%? Try Autocache and share your experience with us.