Model Catalog

LARouter ships with a curated registry of Gemma 4 models from Unsloth, optimized for local inference via llama.cpp and Apple MLX.

Supported Models

| Model | Params | Context | Modality | Best For |
|---|---|---|---|---|
| Gemma 4 E2B | 2B Dense + PLE | 128K | Text, Image, Audio | Edge inference, ASR, speech translation |
| Gemma 4 E4B | 4B Dense + PLE | 128K | Text, Image, Audio | Fast local multimodal, laptops |
| Gemma 4 26B-A4B | 26B MoE (~4B active) | 256K | Text, Image | Best speed/quality tradeoff |
| Gemma 4 31B | 31B Dense | 256K | Text, Image | Strongest local performance |

Hugging Face Sources

GGUF (llama.cpp — All Platforms)

| Model | Repository | Quantization | Size |
|---|---|---|---|
| E2B | `unsloth/gemma-4-E2B-it-GGUF` | Q8_0 | ~2.1 GB |
| E4B | `unsloth/gemma-4-E4B-it-GGUF` | Q8_0 | ~4.3 GB |
| 26B-A4B | `unsloth/gemma-4-26B-A4B-it-GGUF` | UD-Q4_K_XL | ~16 GB |
| 31B | `unsloth/gemma-4-31B-it-GGUF` | UD-Q4_K_XL | ~19 GB |
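The repositories above can be fetched with the Hugging Face `hf` CLI from `huggingface_hub` (older releases ship it as `huggingface-cli`). The sketch below downloads only the listed quantization plus the vision projector. The helper names `gguf_source` and `download_gguf` and the model keys (`e2b`, `e4b`, `26b-a4b`, `31b`) are hypothetical. The sketch also assumes each repo contains a file named `mmproj-BF16.gguf`, matching the projector path used in the server command below.

```bash
# Hypothetical helpers; model keys and function names are illustrative only.

# Print "<repo> <quant>" for a model key, or fail for unknown keys.
gguf_source() {
  case "$1" in
    e2b)     echo "unsloth/gemma-4-E2B-it-GGUF Q8_0" ;;
    e4b)     echo "unsloth/gemma-4-E4B-it-GGUF Q8_0" ;;
    26b-a4b) echo "unsloth/gemma-4-26B-A4B-it-GGUF UD-Q4_K_XL" ;;
    31b)     echo "unsloth/gemma-4-31B-it-GGUF UD-Q4_K_XL" ;;
    *)       echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Download one quantization plus the projector into $2 (default: models).
# Assumes the repo contains mmproj-BF16.gguf. Set HF_CLI=huggingface-cli
# for older huggingface_hub releases.
download_gguf() {
  local src repo quant dir
  src=$(gguf_source "$1") || return 1
  repo=${src% *}      # text before the space
  quant=${src#* }     # text after the space
  dir=${2:-models}
  "${HF_CLI:-hf}" download "$repo" --include "*${quant}*.gguf" --local-dir "$dir" &&
    "${HF_CLI:-hf}" download "$repo" mmproj-BF16.gguf --local-dir "$dir"
}
```

For example, `download_gguf 26b-a4b` should place the UD-Q4_K_XL file and its projector in `models/`. Every repo names its projector `mmproj-BF16.gguf`, so if you keep several models, give each its own directory (for example `download_gguf 31b models/31b`). This stops the projectors from overwriting each other.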

MLX (macOS Apple Silicon)

| Model | Repository | Quantization | Size |
|---|---|---|---|
| E4B | `unsloth/gemma-4-E4B-it-MLX-8bit` | 8-bit | ~4.1 GB |
| 26B-A4B | `unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit` | 4-bit (UD) | ~15 GB |
| 31B | `unsloth/gemma-4-31b-it-MLX-8bit` | 8-bit | ~33 GB |
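On Apple Silicon, the MLX repositories can be served with `mlx_lm.server` from the `mlx-lm` package, which exposes an OpenAI-compatible API. This is a text-only sketch under stated assumptions. `mlx_repo` and `serve_mlx` are hypothetical helpers, and the sketch assumes your installed `mlx-lm` supports these Gemma 4 checkpoints. Image input would likely need a vision-capable MLX runtime instead.

```bash
# Hypothetical helpers; assumes `pip install mlx-lm` provides mlx_lm.server
# and that it can load these checkpoints.

# Print the MLX repo for a model key; E2B has no MLX build in this catalog.
mlx_repo() {
  case "$1" in
    e4b)     echo "unsloth/gemma-4-E4B-it-MLX-8bit" ;;
    26b-a4b) echo "unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit" ;;
    31b)     echo "unsloth/gemma-4-31b-it-MLX-8bit" ;;
    *)       echo "no MLX build for: $1" >&2; return 1 ;;
  esac
}

# Serve the model in the foreground on port $2 (default 8001).
serve_mlx() {
  local repo
  repo=$(mlx_repo "$1") || return 1
  mlx_lm.server --model "$repo" --port "${2:-8001}"
}
```

For example, `serve_mlx 26b-a4b 8003` should fetch the model from Hugging Face if it is not cached and serve it on port 8003.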

Default Tier Mapping

[!NOTE] Green = local (free), Purple = cloud (paid). Override any mapping via the WebUI or `config/larouter.json`.

llama.cpp Server Settings

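Each model is started with the same sampling parameters (temperature 1.0, top-p 0.95, top-k 64) and with thinking enabled through the chat template. For example, the 26B-A4B model with its multimodal projector (`--mmproj`):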

```bash
./llama-server \
    --model models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
    --mmproj models/mmproj-BF16.gguf \
    --temp 1.0 --top-p 0.95 --top-k 64 \
    --alias "gemma-4-26b-a4b" \
    --port 8001 \
    --chat-template-kwargs '{"enable_thinking":true}'
```
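LARouter starts one server per model, so the command above generalizes to a small wrapper. This is a hypothetical sketch (`run_llama_server` is not a LARouter command). It keeps the same flags and takes the model file, alias, port and projector path as arguments. Setting `LLAMA_SERVER=echo` prints the arguments instead of starting a server, which is handy for checking them.

```bash
# Hypothetical wrapper around the llama-server command shown above.
# LLAMA_SERVER points at your build (default ./llama-server).
# $1 = GGUF path, $2 = alias reported to clients, $3 = port,
# $4 = projector path (default models/mmproj-BF16.gguf)
run_llama_server() {
  "${LLAMA_SERVER:-./llama-server}" \
    --model "$1" \
    --mmproj "${4:-models/mmproj-BF16.gguf}" \
    --temp 1.0 --top-p 0.95 --top-k 64 \
    --alias "$2" \
    --port "$3" \
    --chat-template-kwargs '{"enable_thinking":true}'
}
```

For example, `LLAMA_SERVER=echo run_llama_server models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf gemma-4-26b-a4b 8001` prints the same flags as the command above. Drop the `LLAMA_SERVER=echo` prefix and append `&` to start the server in the background.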

LARouter automatically manages:

- Starting and stopping one llama-server process per model
- Port assignment (8001, 8002, 8003, 8004; one port per model)
- Health monitoring (`GET /health`)
- OpenAI-compatible proxy routing
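Two of these chores, picking a port and waiting for a server to become healthy, can be sketched in a few lines of shell. `next_port` and `wait_healthy` are hypothetical helpers. Assigning the first unused port from 8001 to 8004 is an assumed policy, not necessarily how LARouter allocates ports. `wait_healthy` relies on llama-server answering `/health` with HTTP 503 while the model loads and 200 once it is ready.

```bash
# Hypothetical helpers, not LARouter internals.

# Print the first port in 8001..8004 that is not listed in the arguments.
next_port() {
  local p
  for p in 8001 8002 8003 8004; do
    case " $* " in
      *" $p "*) ;;                 # port already taken, try the next one
      *) echo "$p"; return 0 ;;
    esac
  done
  echo "no free port in 8001-8004" >&2
  return 1
}

# Poll GET /health once per second until it returns 200 (curl -f treats
# 503 "Loading model" as failure) or $2 seconds (default 120) have passed.
wait_healthy() {
  local port=$1 limit=${2:-120} waited=0
  until curl -sf "http://127.0.0.1:${port}/health" >/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      echo "port $port not healthy after ${limit}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, with 8001 already in use, `next_port 8001` prints 8002. After starting llama-server on that port, `wait_healthy 8002 300` blocks until the server is ready or five minutes pass.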

Hardware Requirements

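| Model | Min RAM | Recommended RAM | GPU |
|---|---|---|---|
| E2B (Q8_0) | 4 GB | 8 GB | Optional |
| E4B (Q8_0) | 8 GB | 16 GB | Optional |
| 26B-A4B (UD-Q4_K_XL) | 16 GB | 24 GB | Recommended |
| 31B (UD-Q4_K_XL) | 24 GB | 32 GB | Recommended |

These figures apply to the GGUF builds. The MLX builds have different sizes (the 8-bit 31B build is ~33 GB), so check the MLX table when sizing RAM for MLX.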
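To check a machine against the table, the sketch below compares installed RAM with the Min RAM column. The helper names are hypothetical. `total_ram_gb` reads `hw.memsize` on macOS and `MemTotal` from `/proc/meminfo` on Linux. It rounds to the nearest GiB because Linux reports slightly less than the installed amount.

```bash
# Hypothetical helpers based on the Min RAM column above (GGUF builds).

# Minimum RAM in GB for a model key.
min_ram_gb() {
  case "$1" in
    e2b)     echo 4 ;;
    e4b)     echo 8 ;;
    26b-a4b) echo 16 ;;
    31b)     echo 24 ;;
    *)       echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Succeed when $2 GB of RAM meets the model's minimum.
fits_in_ram() {
  local need
  need=$(min_ram_gb "$1") || return 1
  [ "$2" -ge "$need" ]
}

# Installed RAM in GiB (macOS: bytes from sysctl; Linux: kB from /proc/meminfo).
total_ram_gb() {
  if [ "$(uname -s)" = Darwin ]; then
    echo $(( $(sysctl -n hw.memsize) / 1073741824 ))
  else
    awk '/^MemTotal:/ { print int($2 / 1048576 + 0.5) }' /proc/meminfo
  fi
}
```

For example, `fits_in_ram 31b "$(total_ram_gb)" || echo "31B needs at least 24 GB"` warns on machines below the listed minimum.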

[!TIP] On Apple Silicon Macs, you can use the MLX builds for native Metal acceleration without compiling llama.cpp.
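The tip can be expressed as a simple backend choice. Use MLX on Apple Silicon (where `uname -s` is `Darwin` and `uname -m` is `arm64`) when the catalog lists an MLX build, and use llama.cpp otherwise. E2B has no MLX build in the catalog, so it always falls back. `pick_backend` is a hypothetical helper that takes the platform as arguments, so it can be tested on any machine. A shell running under Rosetta reports `x86_64` and will get llama.cpp.

```bash
# Hypothetical helper: print "mlx" or "llamacpp" for a platform and model key.
# Usage: pick_backend "$(uname -s)" "$(uname -m)" <model-key>
pick_backend() {
  local os=$1 arch=$2 model=$3
  if [ "$os" = Darwin ] && [ "$arch" = arm64 ]; then
    case "$model" in
      e4b|26b-a4b|31b) echo mlx; return 0 ;;   # models with an MLX build
    esac
  fi
  echo llamacpp
}
```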
