API Reference

LARouter exposes a REST API on port 18790. Unless noted otherwise, endpoints use JSON request and response bodies; the exceptions are the SSE streams and the CSV export described below.

Proxy Endpoints

`POST /v1/chat/completions`

The primary proxy endpoint. Accepts OpenAI-compatible chat completion requests and routes each one to a model according to the `model` field (see the options below).

Headers:

Authorization: Bearer lr_<project_token>   (optional; defaults to admin when omitted)
Content-Type: application/json

Request Body:

{
  "model": "auto",
  "messages": [
    { "role": "user", "content": "Explain quantum computing" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}

Model field options:

| Value | Behavior |
| --- | --- |
| `"auto"` | LARouter classifies the request and routes it automatically |
| `"gemma-4-e4b"` | Force a specific registered model |
| `"gpt-4o"` | Pass through to the cloud provider |

Response: A standard OpenAI chat completion response, either a single JSON object or an SSE stream when `stream` is `true`.
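As a minimal sketch of calling this endpoint from Python, the helpers below build the request and read an OpenAI-style SSE stream using only the standard library. The names `BASE_URL`, `build_chat_request`, and `iter_stream_text` are illustrative and not part of LARouter. The stream parser assumes LARouter emits the usual OpenAI chunk format (`data: {...}` lines ending with `data: [DONE]`), as the response description above suggests.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

# Assumption: LARouter is running locally on the documented port.
BASE_URL = "http://127.0.0.1:18790"


def build_chat_request(messages: list, model: str = "auto",
                       token: Optional[str] = None,
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    headers = {"Content-Type": "application/json"}
    if token:  # the Authorization header is optional per the reference
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body, headers=headers, method="POST",
    )


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content deltas from an OpenAI-style SSE stream until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

To send a streaming request, pass the result of `build_chat_request(..., stream=True)` to `urllib.request.urlopen` and feed the response object to `iter_stream_text`; an HTTP response from `urlopen` can be iterated line by line as bytes.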

`GET /health`

Health check endpoint.

curl http://127.0.0.1:18790/health
# → { "status": "ok", "uptime": 3600 }
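A startup script may want to wait for this endpoint before sending traffic. The sketch below polls `/health` until the `status` field is `"ok"`. The function names, retry count, and delay are illustrative choices, not LARouter defaults.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: LARouter is running locally on the documented port.
HEALTH_URL = "http://127.0.0.1:18790/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict] = fetch_health,
                       attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll until the reported status is "ok" or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            pass  # connection refused, timeout, or a non-JSON body: retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running server.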

Usage & Billing

`GET /api/usage`

Get aggregated usage summary for a project.

Query Parameters:

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| `period` | string | `"monthly"` | One of `daily`, `weekly`, `monthly`, `all` |
| `projectId` | string | `"default"` | Filter by project |

Response:

{
  "totalCost": 3.5344,
  "totalCalls": 1499,
  "succeededCalls": 1363,
  "inputTokens": 450200,
  "outputTokens": 312800,
  "byModel": {
    "gemma-4-e4b": { "calls": 800, "cost": 0, "tokens": 250000 },
    "gpt-4o": { "calls": 150, "cost": 2.10, "tokens": 89000 }
  },
  "byTier": {
    "HEARTBEAT": { "calls": 200, "cost": 0 },
    "SIMPLE": { "calls": 600, "cost": 0 },
    "COMPLEX": { "calls": 150, "cost": 2.10 }
  },
  "dailyCosts": [
    { "date": "2026-04-01", "cost": 0.12 },
    { "date": "2026-04-02", "cost": 0.08 }
  ]
}
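The query parameters and response fields above can be combined as in the following sketch, which builds the request URL and derives a few figures from the summary. `usage_url` and `summarize_usage` are hypothetical helper names, and the derived metrics (success rate, cost per call, costliest model) are computed client-side rather than returned by the API.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: LARouter is running locally on the documented port.
BASE_URL = "http://127.0.0.1:18790"
VALID_PERIODS = ("daily", "weekly", "monthly", "all")


def usage_url(period: str = "monthly", project_id: str = "default") -> str:
    """Build the GET /api/usage URL with the documented query parameters."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}")
    query = urlencode({"period": period, "projectId": project_id})
    return f"{BASE_URL}/api/usage?{query}"


def summarize_usage(usage: dict) -> dict:
    """Derive simple ratios from a usage summary response."""
    total = usage["totalCalls"]
    by_model = usage.get("byModel", {})
    costliest: Optional[str] = None
    if by_model:
        costliest = max(by_model, key=lambda name: by_model[name]["cost"])
    return {
        "successRate": usage["succeededCalls"] / total if total else 0.0,
        "costPerCall": usage["totalCost"] / total if total else 0.0,
        "costliestModel": costliest,
    }
```

For the sample response above, `summarize_usage` reports `gpt-4o` as the costliest model, since the local `gemma-4-e4b` calls are listed at zero cost.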

`GET /api/usage/daily`

Daily cost time series for charts.

`GET /api/usage/models`

Per-model usage breakdown.

`GET /api/usage/export`

Export usage data as CSV.

Model Management

`GET /api/models/catalog`

List all downloadable Gemma 4 models.

curl http://127.0.0.1:18790/api/models/catalog

`POST /api/models/download`

Start downloading a model from Hugging Face.

{ "modelId": "gemma4-e4b", "format": "gguf" }

`GET /api/models/download/:id/progress`

SSE stream of download progress.
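The reference does not specify the shape of the progress events, so the sketch below only handles the generic Server-Sent Events framing (`event:` and `data:` fields, with a blank line ending each event). `SseEvent`, `parse_sse`, and `decode_json_data` are illustrative names, and treating the `data` payload as JSON is an assumption to check against real output.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SseEvent:
    event: str  # event type; "message" when no `event:` field was sent
    data: str   # joined `data:` lines, not yet decoded


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Minimal SSE parser covering the `event:` and `data:` fields."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield SseEvent(event, "\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is part of the framing
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def decode_json_data(event: SseEvent) -> dict:
    """Assumption: the progress payload is a JSON object."""
    return json.loads(event.data)
```

When reading from `urllib.request.urlopen`, decode each line from bytes to `str` before passing it to `parse_sse`.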

`POST /api/models/:id/start`

Start a llama-server instance for a downloaded model.

`POST /api/models/:id/stop`

Stop a running model server.

`GET /api/models/:id/health`

Health check for a running model.

`GET /api/models/running`

List all currently running model servers.
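The three per-model lifecycle routes above share one path shape, which the following sketch captures in a small lookup. `model_action` is a hypothetical helper. Whether `:id` is the catalog ID or the registered model name is not stated in this reference, so the value is passed through unchanged apart from URL escaping.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: LARouter is running locally on the documented port.
BASE_URL = "http://127.0.0.1:18790"

# HTTP method for each documented per-model action.
_ACTION_METHODS = {"start": "POST", "stop": "POST", "health": "GET"}


def model_action(model_id: str, action: str) -> Tuple[str, str]:
    """Return (HTTP method, URL) for a per-model lifecycle endpoint."""
    if action not in _ACTION_METHODS:
        raise ValueError(f"unknown action: {action!r}")
    # Escape the ID so characters such as "/" cannot alter the path.
    safe_id = quote(model_id, safe="")
    return _ACTION_METHODS[action], f"{BASE_URL}/api/models/{safe_id}/{action}"
```

The returned pair can be passed to `urllib.request.Request(url, method=method)` to issue the call.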

Project Management

`GET /api/projects`

List all projects.

`POST /api/projects`

Create a new project. The response includes the project's bearer token, which can be sent in the `Authorization` header of proxy requests.

{
  "name": "my-app",
  "description": "Production app project",
  "budgetUsd": 50.0,
  "routingPolicy": "cost-optimized"
}

Response:

{
  "id": "proj_abc123",
  "name": "my-app",
  "token": "lr_sk_abc123def456...",
  "budgetUsd": 50.0,
  "createdAt": "2026-04-17T00:00:00Z"
}
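A client would typically create a project once and reuse the returned token for proxy calls. The sketch below builds the creation request from the fields shown above and turns a creation response into request headers. `build_create_project_request` and `proxy_headers` are illustrative names. Treating every field except `name` as optional is an assumption, as is sending the request without credentials.

```python
import json
import urllib.request
from typing import Optional

# Assumption: LARouter is running locally on the documented port.
BASE_URL = "http://127.0.0.1:18790"


def build_create_project_request(
    name: str,
    description: Optional[str] = None,
    budget_usd: Optional[float] = None,
    routing_policy: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/projects request."""
    body = {"name": name}
    # Assumption: fields other than `name` may be omitted.
    if description is not None:
        body["description"] = description
    if budget_usd is not None:
        body["budgetUsd"] = budget_usd
    if routing_policy is not None:
        body["routingPolicy"] = routing_policy
    return urllib.request.Request(
        f"{BASE_URL}/api/projects",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def proxy_headers(project: dict) -> dict:
    """Turn a project creation response into headers for proxy requests."""
    return {
        "Authorization": f"Bearer {project['token']}",
        "Content-Type": "application/json",
    }
```

Because the token grants access on behalf of the project, it is worth storing it the same way as any other API secret.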

`PUT /api/projects/:id`

Update project settings.

`DELETE /api/projects/:id`

Delete a project and revoke its token.

Configuration

`GET /api/config`

Get current runtime configuration (API keys redacted).

`PATCH /api/config`

Update runtime configuration.
