API Reference
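LARouter exposes a REST API on port 18790. Unless noted otherwise, endpoints use JSON request and response bodies; streaming endpoints return Server-Sent Events (SSE), and the export endpoint returns CSV.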
Proxy Endpoints
POST /v1/chat/completions
The primary proxy endpoint. Accepts OpenAI-compatible chat completion requests and routes each one according to its model field.
Headers:
Authorization: Bearer lr_<project_token> (optional; defaults to admin)
Content-Type: application/json
Request Body:
{
"model": "auto",
"messages": [
{ "role": "user", "content": "Explain quantum computing" }
],
"stream": true,
"temperature": 0.7,
"max_tokens": 2048
}
Model field options:
| Value | Behavior |
|---|---|
| "auto" | LARouter classifies and routes automatically |
| "gemma-4-e4b" | Force a specific registered model |
| "gpt-4o" | Passthrough to cloud provider |
Response: Standard OpenAI chat completion response (JSON or SSE stream).
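The request described above can be assembled with the Python standard library. This is a minimal sketch, not LARouter client code: the helper names `build_chat_request` and `send` are hypothetical, and the host `127.0.0.1` is taken from the curl examples in this reference.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"  # host assumed from the curl examples


def build_chat_request(prompt, model="auto", token=None, stream=False):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # The Authorization header is optional according to this reference.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a non-streaming request and return the decoded JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a running LARouter instance, `send(build_chat_request("Explain quantum computing"))` should return the JSON completion. Streaming responses need an SSE reader instead of `json.load`.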
GET /health
Health check endpoint.
curl http://127.0.0.1:18790/health
# → { "status": "ok", "uptime": 3600 }
Usage & Billing
GET /api/usage
Get an aggregated usage summary for a project.
Query Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| period | string | "monthly" | One of daily, weekly, monthly, all |
| projectId | string | "default" | Filter by project |
Response:
{
"totalCost": 3.5344,
"totalCalls": 1499,
"succeededCalls": 1363,
"inputTokens": 450200,
"outputTokens": 312800,
"byModel": {
"gemma-4-e4b": { "calls": 800, "cost": 0, "tokens": 250000 },
"gpt-4o": { "calls": 150, "cost": 2.10, "tokens": 89000 }
},
"byTier": {
"HEARTBEAT": { "calls": 200, "cost": 0 },
"SIMPLE": { "calls": 600, "cost": 0 },
"COMPLEX": { "calls": 150, "cost": 2.10 }
},
"dailyCosts": [
{ "date": "2026-04-01", "cost": 0.12 },
{ "date": "2026-04-02", "cost": 0.08 }
]
}
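As an illustration of how the query parameters and the response fields fit together, the sketch below builds the request URL and derives two figures from a summary. The helpers `usage_url` and `summarize` are hypothetical names, and the host is assumed from the curl examples; only the parameter and field names come from this reference.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:18790"  # host assumed from the curl examples
VALID_PERIODS = {"daily", "weekly", "monthly", "all"}


def usage_url(period="monthly", project_id="default"):
    """Build the GET /api/usage URL with its documented query parameters."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    query = urlencode({"period": period, "projectId": project_id})
    return f"{BASE_URL}/api/usage?{query}"


def summarize(usage):
    """Derive the success rate and the costliest model from a usage summary."""
    total = usage["totalCalls"]
    success_rate = usage["succeededCalls"] / total if total else 0.0
    by_model = usage.get("byModel", {})
    # Pick the model with the highest recorded cost, or None if there are none.
    costliest = max(by_model, key=lambda name: by_model[name]["cost"], default=None)
    return {"successRate": success_rate, "costliestModel": costliest}
```

Applied to the sample response above, `summarize` reports `gpt-4o` as the costliest model, since it is the only entry with a non-zero cost.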
GET /api/usage/daily
Daily cost time series for charts.
GET /api/usage/models
Per-model usage breakdown.
GET /api/usage/export
Export usage data as CSV.
Model Management
GET /api/models/catalog
List all downloadable Gemma 4 models.
curl http://127.0.0.1:18790/api/models/catalog
POST /api/models/download
Start downloading a model from Hugging Face.
Request Body:
{ "modelId": "gemma4-e4b", "format": "gguf" }
GET /api/models/download/:id/progress
SSE stream of download progress.
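Reading an SSE stream means collecting `data:` lines until a blank line ends the event. The sketch below is a generic parser for that framing; `iter_sse_data` is a hypothetical helper, and this reference does not specify what each progress event contains, so the payload is returned as raw text.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines.

    Only "data:" fields are collected; other fields (event:, id:, retry:)
    and comment lines starting with ":" are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    # An event not terminated by a blank line is dropped, as in the SSE spec.
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response wrapped in `io.TextIOWrapper` or from a list of strings in a test.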
POST /api/models/:id/start
Start a llama-server instance for a downloaded model.
POST /api/models/:id/stop
Stop a running model server.
GET /api/models/:id/health
Health check for a running model.
GET /api/models/running
List all currently running model servers.
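The per-model endpoints share one path pattern, which a small helper can capture. This is a sketch under assumptions: `model_request` is a hypothetical name, the host comes from the curl examples, and which identifier the server expects for `:id` is not stated in this reference.

```python
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:18790"  # host assumed from the curl examples

# HTTP method for each per-model action listed in this section.
MODEL_ACTIONS = {"start": "POST", "stop": "POST", "health": "GET"}


def model_request(model_id, action):
    """Return (HTTP method, URL) for /api/models/:id/<action>."""
    if action not in MODEL_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # Percent-encode the id so it stays a single path segment.
    encoded_id = quote(model_id, safe="")
    return MODEL_ACTIONS[action], f"{BASE_URL}/api/models/{encoded_id}/{action}"
```

For example, `model_request("gemma-4-e4b", "health")` yields a GET against `/api/models/gemma-4-e4b/health`.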
Project Management
GET /api/projects
List all projects.
POST /api/projects
Create a new project. Returns a bearer token.
Request Body:
{
"name": "my-app",
"description": "Production app project",
"budgetUsd": 50.0,
"routingPolicy": "cost-optimized"
}
Response:
{
"id": "proj_abc123",
"name": "my-app",
"token": "lr_sk_abc123def456...",
"budgetUsd": 50.0,
"createdAt": "2026-04-17T00:00:00Z"
}
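The token returned here is what a client later sends in the Authorization header of proxy requests. The sketch below shows that hand-off; `new_project_payload` and `auth_header_from_response` are hypothetical helpers, and only the field names are taken from the request and response bodies above.

```python
import json


def new_project_payload(name, description, budget_usd, routing_policy):
    """Build the JSON body for POST /api/projects."""
    return {
        "name": name,
        "description": description,
        "budgetUsd": budget_usd,
        "routingPolicy": routing_policy,
    }


def auth_header_from_response(response_text):
    """Turn a project-creation response into an Authorization header."""
    project = json.loads(response_text)
    # The "token" field is the bearer token for this project.
    return {"Authorization": f"Bearer {project['token']}"}
```

Storing the token at creation time is the cautious choice, since this reference does not describe an endpoint for retrieving it later.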
PUT /api/projects/:id
Update project settings.
DELETE /api/projects/:id
Delete a project and revoke its token.
Configuration
GET /api/config
Get current runtime configuration (API keys redacted).
PATCH /api/config
Update runtime configuration.