Overview

LARouter is an intelligent LLM proxy that sits between your applications and AI model providers. Instead of hardcoding model choices, LARouter classifies each request and routes it to the optimal model based on complexity, cost, and availability.

Problem

Most AI applications face a dilemma:

Use cloud APIs → Expensive, high latency, privacy concerns
Use local models → Limited capability, complex setup
Use both → Requires custom routing logic in every application

Solution

LARouter provides a single OpenAI-compatible endpoint that automatically routes requests:

Simple tasks (greetings, classification) → Free local models
Moderate tasks (conversation, summarization) → Fast local MoE models
Complex tasks (analysis, code generation) → Cloud APIs
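Because the endpoint is OpenAI-compatible, a client sends the same kind of request it would send to OpenAI. The sketch below only builds the HTTP request and does not send it. The base URL `http://localhost:3000/v1`, the `model: "auto"` placeholder meaning "let the router choose", and the helper name `buildChatRequest` are all assumptions for illustration, not documented LARouter settings.

```typescript
// Minimal OpenAI-style chat message shape.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds, but does not send, a chat completion request
// for an OpenAI-compatible endpoint. The "auto" model value is an assumed
// placeholder meaning "let the router choose", not a documented LARouter value.
function buildChatRequest(baseUrl: string, token: string, messages: ChatMessage[]): Request {
  return new Request(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Project bearer token; LARouter tokens carry the lr_ prefix.
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ model: "auto", messages }),
  });
}

// Example request from a project using a placeholder token.
const req = buildChatRequest("http://localhost:3000/v1", "lr_example_token", [
  { role: "user", content: "Summarize the following meeting notes in three bullet points." },
]);
console.log(req.method, req.url);
// Sending it is an ordinary fetch call: await fetch(req)
```

Since the wire format is the OpenAI one, existing OpenAI SDK clients can usually be pointed at such an endpoint by changing only the base URL and API key.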

Routing Tiers

Tier	Complexity	Default Model	Cost
`HEARTBEAT`	Trivial — greetings, ping	Gemma 4 E2B (local)	Free
`SIMPLE`	Low — short answers, lookups	Gemma 4 E4B (local)	Free
`MODERATE`	Medium — conversation, summaries	Gemma 4 26B-A4B (local)	Free
`COMPLEX`	High — analysis, reasoning	Gemini 2.5 Pro (cloud)	~$0.01/req
`FRONTIER`	Maximum — code gen, architecture	Claude Opus (cloud)	~$0.05/req

Default behavior: HEARTBEAT, SIMPLE, and MODERATE requests are routed entirely to local models at no per-request cost. Only COMPLEX and FRONTIER requests go to cloud APIs.
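To make the default routing concrete, here is a hypothetical routing table mirroring the tier table above, plus a rough cost estimate for a mix of requests. The model identifiers are illustrative labels rather than exact provider model IDs, and the per-request costs are the approximate figures from the table, so any result is a ballpark.

```typescript
// The five routing tiers, from least to most demanding.
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";

interface Route {
  model: string; // illustrative label, not an exact provider model ID
  location: "local" | "cloud";
  approxCostPerRequestUsd: number; // approximate figures from the tier table
}

// Hypothetical default routes mirroring the tier table.
const DEFAULT_ROUTES: Record<Tier, Route> = {
  HEARTBEAT: { model: "gemma-4-e2b", location: "local", approxCostPerRequestUsd: 0 },
  SIMPLE: { model: "gemma-4-e4b", location: "local", approxCostPerRequestUsd: 0 },
  MODERATE: { model: "gemma-4-26b-a4b", location: "local", approxCostPerRequestUsd: 0 },
  COMPLEX: { model: "gemini-2.5-pro", location: "cloud", approxCostPerRequestUsd: 0.01 },
  FRONTIER: { model: "claude-opus", location: "cloud", approxCostPerRequestUsd: 0.05 },
};

// Tiers that leave the machine under the default routes.
const cloudTiers = (Object.keys(DEFAULT_ROUTES) as Tier[]).filter(
  (t) => DEFAULT_ROUTES[t].location === "cloud",
);

// Estimate API spend given how many requests land in each tier.
function estimateCostUsd(counts: Partial<Record<Tier, number>>): number {
  let total = 0;
  for (const tier of Object.keys(DEFAULT_ROUTES) as Tier[]) {
    total += (counts[tier] ?? 0) * DEFAULT_ROUTES[tier].approxCostPerRequestUsd;
  }
  return total;
}

// Example mix: 1,000 requests, of which 150 reach the cloud tiers.
const cost = estimateCostUsd({ HEARTBEAT: 200, SIMPLE: 400, MODERATE: 250, COMPLEX: 100, FRONTIER: 50 });
console.log(cloudTiers, cost.toFixed(2)); // [ 'COMPLEX', 'FRONTIER' ] 3.50
```

In this example, 850 of the 1,000 requests stay local, so the whole estimate comes from the 150 cloud requests: 100 × $0.01 + 50 × $0.05 = $3.50.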

Hybrid Classification

LARouter uses a two-stage classification pipeline:

Heuristic classifier: instant, regex-based pattern matching. Handles obvious cases (greetings, one-liners, code markers) with negligible latency.
AI classifier: for prompts the heuristics cannot classify, falls back to a lightweight local model that analyzes the prompt and returns a tier. Adds roughly 500 ms but catches edge cases the heuristics miss.
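A minimal sketch of the two-stage pipeline, assuming the heuristic stage can abstain by returning `null` and the AI stage is passed in as an async function. The regex rules and the names `classifyHeuristic`, `classify`, and `AiClassifier` are hypothetical; LARouter's actual patterns and classification prompt are not shown in this document.

```typescript
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";

// Stage 2 contract: any async function that asks a small local model for a tier.
type AiClassifier = (prompt: string) => Promise<Tier>;

// Stage 1: cheap regex heuristics. Returns null to abstain when no rule fits.
// The patterns below are illustrative assumptions, not LARouter's real rules.
function classifyHeuristic(prompt: string): Tier | null {
  const text = prompt.trim();
  // Greetings and pings, optionally followed by punctuation.
  if (/^(hi|hello|hey|ping|thanks?)\b[!.?]*$/i.test(text)) return "HEARTBEAT";
  // Code markers: a triple-backtick fence or a function/def declaration.
  if (/`{3}|\b(function|def)\s+\w+\s*\(/.test(text)) return "FRONTIER";
  // Short single-line questions.
  if (text.length < 40 && !text.includes("\n")) return "SIMPLE";
  return null;
}

// Two-stage pipeline: heuristics first, the model only when heuristics abstain.
async function classify(
  prompt: string,
  ai: AiClassifier,
): Promise<{ tier: Tier; source: "heuristic" | "ai" }> {
  const fast = classifyHeuristic(prompt);
  if (fast !== null) return { tier: fast, source: "heuristic" };
  return { tier: await ai(prompt), source: "ai" };
}

// Stub standing in for the local model call (about 500 ms in practice).
const stubAi: AiClassifier = async () => "COMPLEX";

classify("hello!", stubAi).then((r) => console.log(r.tier, r.source)); // HEARTBEAT heuristic
```

Passing the AI classifier in as a function keeps the fast path free of model calls, so the extra latency is paid only when no heuristic rule matches, and it lets the pipeline be tested with a stub.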

Multi-Tenant Projects

Each project gets:

A unique bearer token (prefixed with `lr_`) for API authentication
Isolated usage tracking and billing
Configurable routing policies (e.g., "always use local models")
Optional budget limits with alerts
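The per-project features above can be sketched as three small checks that run before routing: resolve the project from its bearer token, clamp the tier when a local-only policy is set, and compare spend against the optional budget. Everything here is hypothetical: the `Project` shape, the `"local-only"` policy name, the 80% alert threshold, and the helper names are illustrative, not LARouter's actual schema.

```typescript
type Tier = "HEARTBEAT" | "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER";
const TIER_ORDER: Tier[] = ["HEARTBEAT", "SIMPLE", "MODERATE", "COMPLEX", "FRONTIER"];

// Hypothetical per-project record; field names are illustrative.
interface Project {
  id: string;
  token: string; // bearer token, prefixed with "lr_"
  policy: "auto" | "local-only";
  budgetUsd?: number; // optional spending limit
  spentUsd: number; // usage tracked for this project only
}

// Resolve the calling project from an Authorization header; null means reject.
function authenticate(header: string | null, projects: Project[]): Project | null {
  const match = header?.match(/^Bearer (lr_\S+)$/);
  if (!match) return null;
  const token = match[1];
  return projects.find((p) => p.token === token) ?? null;
}

// "local-only" clamps the tier to MODERATE, the highest tier served locally by default.
function applyPolicy(project: Project, tier: Tier): Tier {
  if (project.policy !== "local-only") return tier;
  const cap = TIER_ORDER.indexOf("MODERATE");
  return TIER_ORDER[Math.min(TIER_ORDER.indexOf(tier), cap)];
}

// Compare spend against the optional budget; 80% is an assumed alert threshold.
function budgetStatus(project: Project): "ok" | "alert" | "exceeded" {
  if (project.budgetUsd === undefined) return "ok";
  if (project.spentUsd >= project.budgetUsd) return "exceeded";
  if (project.spentUsd >= 0.8 * project.budgetUsd) return "alert";
  return "ok";
}

const projects: Project[] = [
  { id: "docs-bot", token: "lr_docs123", policy: "local-only", budgetUsd: 20, spentUsd: 17 },
  { id: "analytics", token: "lr_an456", policy: "auto", spentUsd: 3 },
];

const caller = authenticate("Bearer lr_docs123", projects);
if (caller) {
  console.log(applyPolicy(caller, "FRONTIER")); // MODERATE
  console.log(budgetStatus(caller)); // alert (17 >= 0.8 * 20)
}
```

In a real deployment the token would typically be checked against a stored hash rather than the raw value; the plain comparison here keeps the sketch short.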

Technology Stack

Component	Technology
Backend	Bun + Hono (TypeScript)
Frontend	Vite + React + TanStack Router
Database	SQLite (sql.js) — zero infrastructure
Local Models	llama.cpp / MLX
Cloud APIs	OpenAI, Anthropic, Google Gemini, DeepSeek
MCP	stdio + SSE transports
