API Reference

LARouter exposes a REST API on port 18790. Unless noted otherwise, endpoints use JSON request and response bodies; the exceptions are the SSE streams and the CSV export.

Proxy Endpoints

POST /v1/chat/completions

The primary proxy endpoint. Accepts OpenAI-compatible chat completion requests and routes them to the best model.

Headers:

Authorization: Bearer lr_<project_token>   (optional — defaults to admin)
Content-Type: application/json

Request Body:

{
  "model": "auto",
  "messages": [
    { "role": "user", "content": "Explain quantum computing" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}

Model field options:

| Value | Behavior |
| --- | --- |
| "auto" | LARouter classifies and routes automatically |
| "gemma-4-e4b" | Force a specific registered model |
| "gpt-4o" | Passthrough to cloud provider |

Response: Standard OpenAI chat completion response: a single JSON object when stream is false, or an SSE stream when stream is true.
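A minimal Python sketch of building this request with the standard library only. The helper name `build_chat_request` is hypothetical, and the base URL assumes a local LARouter instance on the default port, as in the curl examples in this reference. Streaming is switched off so the reply arrives as one JSON document.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port


def build_chat_request(prompt: str, model: str = "auto",
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": model,  # "auto" lets LARouter pick the model
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "temperature": 0.7,
        "max_tokens": 2048,
    }
    headers = {"Content-Type": "application/json"}
    if token:  # the Authorization header is optional
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("Explain quantum computing")

# Sending needs a running LARouter instance, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The commented-out lines read the reply using the standard OpenAI response shape (`choices[0].message.content`).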


GET /health

Health check endpoint.

curl http://127.0.0.1:18790/health
# → { "status": "ok", "uptime": 3600 }
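The same check can be sketched in Python, for example as a readiness probe before sending traffic. The function names are hypothetical; the only fields relied on are the `status` and `uptime` keys shown in the curl output above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port


def health_ok(payload) -> bool:
    """Return True when a decoded /health payload reports status "ok"."""
    return isinstance(payload, dict) and payload.get("status") == "ok"


def is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Fetch /health; network errors and malformed JSON count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return health_ok(json.load(resp))
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return False
```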

Usage & Billing

GET /api/usage

Get aggregated usage summary for a project.

Query Parameters:

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| period | string | "monthly" | One of daily, weekly, monthly, all |
| projectId | string | "default" | Filter by project |

Response:

{
  "totalCost": 3.5344,
  "totalCalls": 1499,
  "succeededCalls": 1363,
  "inputTokens": 450200,
  "outputTokens": 312800,
  "byModel": {
    "gemma-4-e4b": { "calls": 800, "cost": 0, "tokens": 250000 },
    "gpt-4o": { "calls": 150, "cost": 2.10, "tokens": 89000 }
  },
  "byTier": {
    "HEARTBEAT": { "calls": 200, "cost": 0 },
    "SIMPLE": { "calls": 600, "cost": 0 },
    "COMPLEX": { "calls": 150, "cost": 2.10 }
  },
  "dailyCosts": [
    { "date": "2026-04-01", "cost": 0.12 },
    { "date": "2026-04-02", "cost": 0.08 }
  ]
}
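As an illustration of consuming this summary, the Python sketch below builds the query URL from the two documented parameters and derives a success rate and a cost ranking from the response fields. The helper names are hypothetical, and the sample dictionary is a subset of the response shown above.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port


def usage_url(period: str = "monthly", project_id: str = "default") -> str:
    """Build the GET /api/usage URL with its two documented query parameters."""
    query = urlencode({"period": period, "projectId": project_id})
    return f"{BASE_URL}/api/usage?{query}"


def success_rate(summary: dict) -> float:
    """Fraction of calls that succeeded; 0.0 when there were no calls."""
    total = summary["totalCalls"]
    return summary["succeededCalls"] / total if total else 0.0


def models_by_cost(summary: dict) -> list:
    """Model names from byModel, most expensive first."""
    by_model = summary["byModel"]
    return sorted(by_model, key=lambda name: by_model[name]["cost"], reverse=True)


# Values copied from the sample response in this section.
sample = {
    "totalCalls": 1499,
    "succeededCalls": 1363,
    "byModel": {
        "gemma-4-e4b": {"calls": 800, "cost": 0, "tokens": 250000},
        "gpt-4o": {"calls": 150, "cost": 2.10, "tokens": 89000},
    },
}

print(usage_url("weekly"))
print(f"{success_rate(sample):.1%}")
print(models_by_cost(sample))
```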

GET /api/usage/daily

Daily cost time series for charts.

GET /api/usage/models

Per-model usage breakdown.

GET /api/usage/export

Export usage data as CSV.
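A short Python sketch of reading the export once it has been downloaded. The column names are not documented in this reference, so the `date` and `cost` columns in the sample are purely hypothetical; the parser itself only relies on the file having a header row.

```python
import csv
import io


def parse_usage_csv(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample: the real export's columns may differ.
sample_csv = "date,cost\n2026-04-01,0.12\n2026-04-02,0.08\n"

rows = parse_usage_csv(sample_csv)
total = sum(float(row["cost"]) for row in rows)
print(len(rows), "rows, total cost", round(total, 2))
```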


Model Management

GET /api/models/catalog

List all downloadable Gemma 4 models.

curl http://127.0.0.1:18790/api/models/catalog

POST /api/models/download

Start downloading a model from Hugging Face.

{ "modelId": "gemma4-e4b", "format": "gguf" }

GET /api/models/download/:id/progress

SSE stream of download progress.
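A minimal Python sketch of reading such a stream. It handles only the `data:` field of the Server-Sent Events format and ignores comments and other fields. The payload of each event is not documented in this reference, so the `percent` key in the sample is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):  # one leading space is not part of the value
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:  # a blank line ends the current event
            yield "\n".join(buffer)
            buffer = []
    # An unterminated event at the end of the stream is dropped.


# Hypothetical sample stream; real progress events may use other fields.
sample_lines = [
    'data: {"percent": 50}\n',
    "\n",
    ": keep-alive\n",
    'data: {"percent": 100}\n',
    "\n",
]
events = list(iter_sse_data(sample_lines))
print(events)
```

With a live server, the lines could come from iterating over the HTTP response and decoding each one as UTF-8.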

POST /api/models/:id/start

Start a llama-server instance for a downloaded model.

POST /api/models/:id/stop

Stop a running model server.

GET /api/models/:id/health

Health check for a running model.

GET /api/models/running

List all currently running model servers.
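The per-model endpoints above share one URL pattern, which the following Python sketch captures in a single hypothetical helper. It assumes that `:id` is the same identifier used when downloading the model, which this reference does not state explicitly.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port

# HTTP method for each documented per-model action.
MODEL_ACTIONS = {"start": "POST", "stop": "POST", "health": "GET"}


def model_request(model_id: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a request for /api/models/:id/<action>."""
    if action not in MODEL_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{BASE_URL}/api/models/{quote(model_id, safe='')}/{action}"
    return urllib.request.Request(url, method=MODEL_ACTIONS[action])


start = model_request("gemma4-e4b", "start")
print(start.get_method(), start.full_url)
```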


Project Management

GET /api/projects

List all projects.

POST /api/projects

Create a new project. Returns a bearer token.

{
  "name": "my-app",
  "description": "Production app project",
  "budgetUsd": 50.0,
  "routingPolicy": "cost-optimized"
}

Response:

{
  "id": "proj_abc123",
  "name": "my-app",
  "token": "lr_sk_abc123def456...",
  "budgetUsd": 50.0,
  "createdAt": "2026-04-17T00:00:00Z"
}
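A Python sketch of the create-then-authenticate flow, under the assumption that the returned `token` is what goes in the `Authorization: Bearer` header of proxy requests. Both helper names are hypothetical, and the sample response reuses the (elided) token string from the example above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port


def build_create_project_request(name: str, budget_usd: float,
                                 routing_policy: str = "cost-optimized",
                                 description: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST /api/projects request."""
    body = {
        "name": name,
        "description": description,
        "budgetUsd": budget_usd,
        "routingPolicy": routing_policy,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/projects",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(project: dict) -> dict:
    """Headers for later proxy calls made on behalf of this project."""
    return {
        "Authorization": f"Bearer {project['token']}",
        "Content-Type": "application/json",
    }


req = build_create_project_request("my-app", 50.0)

# Sample response copied from this section; the token is elided there too.
created = {"id": "proj_abc123", "name": "my-app", "token": "lr_sk_abc123def456..."}
print(bearer_headers(created)["Authorization"])
```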

PUT /api/projects/:id

Update project settings.

DELETE /api/projects/:id

Delete a project and revoke its token.


Configuration

GET /api/config

Get current runtime configuration (API keys redacted).

PATCH /api/config

Update runtime configuration.
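A hedged Python sketch of a configuration update. This reference does not list the configuration keys or say whether PATCH merges partial objects, so the `exampleSetting` key below is a placeholder and the partial-update behavior is an assumption; calling GET /api/config first shows the real keys.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"  # assumption: local instance, default port


def build_config_patch(changes: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH /api/config request with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/api/config",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# "exampleSetting" is a placeholder, not a documented configuration key.
patch = build_config_patch({"exampleSetting": True})
print(patch.get_method(), patch.full_url)
```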