Overview
LARouter is an intelligent LLM proxy that sits between your applications and AI model providers. Instead of hardcoding model choices, LARouter classifies each request and routes it to the optimal model based on complexity, cost, and availability.
Problem
Most AI applications face a dilemma:
- Use cloud APIs → Expensive, high latency, privacy concerns
- Use local models → Limited capability, complex setup
- Use both → Requires custom routing logic in every application
Solution
LARouter provides a single OpenAI-compatible endpoint that automatically routes requests:
- Simple tasks (greetings, classification) → Free local models
- Moderate tasks (conversation, summarization) → Fast local MoE models
- Complex tasks (analysis, code generation) → Cloud APIs
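Because the endpoint is OpenAI-compatible, a client should only need to change its base URL and credentials. The sketch below builds such a request without sending it. The base URL, the `/v1/chat/completions` path (the conventional OpenAI path), the `"auto"` model placeholder, and the example token are assumptions for illustration, not confirmed LARouter values.

```typescript
// Shape of an OpenAI-style chat message.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Plain description of an HTTP request, so it can be inspected before sending.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a chat completion request aimed at a LARouter deployment.
// ASSUMPTIONS: the path and the "auto" model placeholder are hypothetical;
// check your deployment for the values it actually expects.
function buildChatRequest(
  baseUrl: string,
  token: string,
  messages: ChatMessage[],
): HttpRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ model: "auto", messages }),
  };
}

// Hypothetical local deployment and project token.
const chatRequest = buildChatRequest("http://localhost:3000/", "lr_example_token", [
  { role: "user", content: "Summarize this paragraph in one sentence." },
]);
console.log(chatRequest.url);
```

The resulting object can be passed to any HTTP client, for example `fetch(chatRequest.url, chatRequest)`. The application never names a concrete model; that choice is left to the router.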
Routing Tiers
| Tier | Complexity | Default Model | Cost |
|---|---|---|---|
| HEARTBEAT | Trivial: greetings, ping | Gemma 4 E2B (local) | Free |
| SIMPLE | Low: short answers, lookups | Gemma 4 E4B (local) | Free |
| MODERATE | Medium: conversation, summaries | Gemma 4 26B-A4B (local) | Free |
| COMPLEX | High: analysis, reasoning | Gemini 2.5 Pro (cloud) | ~$0.01/req |
| FRONTIER | Maximum: code gen, architecture | Claude Opus (cloud) | ~$0.05/req |
Default behavior: HEARTBEAT, SIMPLE, and MODERATE requests are routed entirely to local models at zero cost. Only COMPLEX and FRONTIER requests hit cloud APIs.
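The tier table can be read as a lookup from tier to default route. The following is a minimal sketch of that mapping as data, with the approximate per-request costs from the table stored in integer cents. The type names, field names, and helper functions are hypothetical and do not reflect LARouter's internal configuration format.

```typescript
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";

interface TierRoute {
  model: string;
  location: "local" | "cloud";
  approxCostCents: number; // rough per-request cost from the table, in cents
}

// Default routes, transcribed from the Routing Tiers table.
const DEFAULT_ROUTES: Record<Tier, TierRoute> = {
  HEARTBEAT: { model: "Gemma 4 E2B", location: "local", approxCostCents: 0 },
  SIMPLE: { model: "Gemma 4 E4B", location: "local", approxCostCents: 0 },
  MODERATE: { model: "Gemma 4 26B-A4B", location: "local", approxCostCents: 0 },
  COMPLEX: { model: "Gemini 2.5 Pro", location: "cloud", approxCostCents: 1 },
  FRONTIER: { model: "Claude Opus", location: "cloud", approxCostCents: 5 },
};

// Tiers that never leave the machine under the default routes.
function localTiers(): Tier[] {
  return (Object.keys(DEFAULT_ROUTES) as Tier[]).filter(
    (tier) => DEFAULT_ROUTES[tier].location === "local",
  );
}

// Rough spend estimate for a batch of already-classified requests.
function estimateCostCents(tiers: Tier[]): number {
  return tiers.reduce((sum, tier) => sum + DEFAULT_ROUTES[tier].approxCostCents, 0);
}

console.log(localTiers());
console.log(estimateCostCents(["HEARTBEAT", "COMPLEX", "FRONTIER", "FRONTIER"]));
```

Keeping costs in integer cents avoids floating-point drift when many small amounts are summed.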
Hybrid Classification
LARouter uses a two-stage classification pipeline:
- Heuristic classifier: runs first, using regex-based pattern matching. Handles obvious cases (greetings, one-liners, code markers) with negligible added latency.
- AI classifier: used when the heuristics are inconclusive. A lightweight local model analyzes the prompt and returns a tier classification. Adds ~500ms but catches edge cases.
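The two-stage pipeline can be sketched as a cheap synchronous check followed by an asynchronous fallback. The regex rules, the mapping of code markers to FRONTIER, and the function names below are illustrative assumptions, not LARouter's actual rule set. The AI classifier is injected as a function, so no particular local model API is assumed.

```typescript
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";

// Stage 1: cheap pattern checks. Returns null when no rule applies.
// These rules are hypothetical examples of "obvious cases".
function heuristicClassify(prompt: string): Tier | null {
  const text = prompt.trim();
  // Bare greetings and pings.
  if (/^(hi|hello|hey|ping)[\s!.?]*$/i.test(text)) return "HEARTBEAT";
  // A fenced code marker (three backticks) suggests a code task.
  if (/`{3}/.test(text)) return "FRONTIER";
  // Short single-line prompts are treated as simple lookups.
  if (text.length < 40 && !text.includes("\n")) return "SIMPLE";
  return null;
}

// Stage 2 is any async function that maps a prompt to a tier,
// for example a call to a small local model.
type AiClassifier = (prompt: string) => Promise<Tier>;

// Run the heuristic first; only pay for the AI classifier when it is inconclusive.
async function classify(prompt: string, aiClassify: AiClassifier): Promise<Tier> {
  return heuristicClassify(prompt) ?? (await aiClassify(prompt));
}

// Stub standing in for the local model, counting how often it is consulted.
let aiCalls = 0;
const stubAiClassifier: AiClassifier = async () => {
  aiCalls += 1;
  return "COMPLEX";
};

classify("Hello!", stubAiClassifier).then((tier) => console.log(tier));
```

Injecting the second stage keeps the fast path free of model calls and makes the fallback easy to stub in tests.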
Multi-Tenant Projects
Each project gets:
- A unique bearer token (`lr_` prefix) for API authentication
- Isolated usage tracking and billing
- Configurable routing policies (e.g., "always use local models")
- Optional budget limits with alerts
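As a rough illustration of how these per-project settings could interact, the sketch below resolves a project from a bearer token, applies a "local-only" routing policy, and reports budget status. The `Project` shape, the policy names, the downgrade-to-MODERATE behavior, and the 80% alert threshold are all assumptions made for this example, not documented LARouter behavior.

```typescript
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";
type RoutingPolicy = "default" | "local-only"; // hypothetical policy names

interface Project {
  name: string;
  token: string; // bearer token, expected to start with "lr_"
  policy: RoutingPolicy;
  budgetCents?: number; // optional budget limit
  spentCents: number; // usage tracked per project
}

// Look up the project that owns the token in an Authorization header.
function findProject(authHeader: string, projects: Project[]): Project | null {
  const match = /^Bearer (lr_\S+)$/.exec(authHeader);
  if (!match) return null;
  const token = match[1];
  return projects.find((p) => p.token === token) ?? null;
}

// Under "local-only", cloud tiers are downgraded to the largest local tier.
function applyPolicy(project: Project, tier: Tier): Tier {
  const isCloudTier = tier === "COMPLEX" || tier === "FRONTIER";
  return project.policy === "local-only" && isCloudTier ? "MODERATE" : tier;
}

// Assumed rule: alert at 80% of the budget, exceeded at 100%.
function budgetStatus(project: Project): "ok" | "alert" | "exceeded" {
  if (project.budgetCents === undefined) return "ok";
  if (project.spentCents >= project.budgetCents) return "exceeded";
  return project.spentCents >= 0.8 * project.budgetCents ? "alert" : "ok";
}

// Hypothetical project used for demonstration.
const demoProject: Project = {
  name: "docs-bot",
  token: "lr_abc123",
  policy: "local-only",
  budgetCents: 1000,
  spentCents: 850,
};

console.log(applyPolicy(demoProject, "FRONTIER"), budgetStatus(demoProject));
```

Resolving the project before classification means policy and budget checks can run on every request without touching other tenants' data.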
Technology Stack
| Component | Technology |
|---|---|
| Backend | Bun + Hono (TypeScript) |
| Frontend | Vite + React + TanStack Router |
| Database | SQLite (sql.js) — zero infrastructure |
| Local Models | llama.cpp / MLX |
| Cloud APIs | OpenAI, Anthropic, Google Gemini, DeepSeek |
| MCP | stdio + SSE transports |