Running all agent tasks through a single frontier model is the most expensive path to the same result. A support triage message doesn't need Claude Opus. A multi-document legal analysis probably does. The ClawHQ model router makes this decision automatically, per task, based on detected task type and your budget state.
This guide covers how the router works, how to configure task-type detection, how to set up provider fallback chains, and how to use the self-learning weights to let the router improve over time.
## How the Router Works

### Supported Providers
Any provider with an OpenAI-compatible API can be added as a custom entry using the `openai-compat` adapter type.
### Task-Type Detection
The router classifies each task into one of four tiers. You can tune the keywords and thresholds that drive classification, or hard-pin specific agents to a tier.
| Tier | Default Models | When Used |
|---|---|---|
| simple | claude-haiku-4-5, gemini-flash, groq-llama3-8b | Short messages, FAQ lookups, simple classification, status checks |
| moderate | claude-sonnet-4-6, gpt-4o-mini, gemini-pro | Multi-step tasks, summarization, draft generation, tool-use chains |
| complex | claude-opus-4-6, gpt-4o, gemini-ultra | Long documents, multi-document analysis, reasoning chains, ambiguous tasks |
| code | claude-sonnet-4-6, deepseek-coder, gpt-4o | Code generation, debugging, refactoring — routes to code-specialized models |
### Configuring Task Detection
```jsonc
// model-router.json (or Settings → Model Router in dashboard)
{
  "task_detection": {
    "enabled": true,
    "simple_max_tokens": 200,  // tasks under this token count start as "simple"
    "code_keywords": [
      "function", "def ", "class ", "debug", "refactor", "implement"
    ],
    "complex_keywords": [
      "analyze", "compare", "synthesize", "evaluate", "summarize"
    ]
  }
}
```
If you have an agent that always handles complex tasks (legal analysis, financial modeling), pin it to the `complex` tier so the router skips classification: set `"tier": "complex"` in the agent's config on the Team page.
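The detection settings above can be pictured as a small classifier. This is a minimal sketch, not ClawHQ's actual implementation: the real classifier and its tie-breaking rules are internal, and the 4-characters-per-token estimate is a rough stand-in.

```python
def classify_task(text: str, config: dict) -> str:
    """Sketch of how the task_detection settings could drive tier choice."""
    lowered = text.lower()
    # Code keywords win first, so code tasks reach code-specialized models.
    if any(kw in lowered for kw in config["code_keywords"]):
        return "code"
    # Complex keywords promote a task regardless of its length.
    if any(kw in lowered for kw in config["complex_keywords"]):
        return "complex"
    # Short tasks start as "simple" (rough 4-characters-per-token estimate).
    if len(text) / 4 < config["simple_max_tokens"]:
        return "simple"
    return "moderate"
```

Note how ordering matters: a short message containing "debug" should still land in the `code` tier, so keyword checks run before the length check.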
## Provider Configuration
Configure providers in Settings → Integrations. Each provider needs an API key and an optional model override. If you don't specify models, the router uses its defaults for each tier.
```json
{
  "providers": {
    "anthropic": {
      "api_key": "$ANTHROPIC_API_KEY",
      "models": {
        "simple": "claude-haiku-4-5-20251001",
        "moderate": "claude-sonnet-4-6",
        "complex": "claude-opus-4-6",
        "code": "claude-sonnet-4-6"
      }
    },
    "groq": {
      "api_key": "$GROQ_API_KEY",
      "models": {
        "simple": "llama3-8b-8192",
        "moderate": "llama-3.3-70b-versatile"
      }
    },
    "ollama": {
      "base_url": "http://localhost:11434",
      "models": {
        "simple": "mistral:7b",
        "code": "deepseek-coder:6.7b"
      }
    }
  }
}
```
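Per-request model resolution then reduces to a tier lookup with defaults. A minimal sketch, assuming a single default table; ClawHQ's actual defaults vary per tier as shown in the table earlier, and the `ROUTER_DEFAULTS` values here are illustrative only.

```python
# Hypothetical router defaults, used when a provider omits a tier override.
ROUTER_DEFAULTS = {
    "simple": "claude-haiku-4-5",
    "moderate": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
    "code": "claude-sonnet-4-6",
}

def resolve_model(provider_cfg: dict, tier: str) -> str:
    """Return the provider's model override for a tier, else a router default."""
    return provider_cfg.get("models", {}).get(tier, ROUTER_DEFAULTS[tier])
```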
## Budget Fallback
Budget fallback is the router's highest-leverage cost control. Set soft and hard thresholds per agent or globally, and the router automatically downgrades to cheaper models as the budget is consumed.
```jsonc
{
  "budget_fallback": {
    "enabled": true,
    "global": {
      "soft_threshold": 0.8,   // at 80% spent: downgrade complex → moderate
      "hard_threshold": 0.95,  // at 95% spent: downgrade all → simple
      "reset": "monthly"       // resets with the budget period
    },
    "per_agent": {
      "research-agent": {
        "monthly_limit_usd": 50,
        "soft_threshold": 0.7,
        "hard_threshold": 0.90
      }
    }
  }
}
```
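The threshold logic amounts to a small tier-downgrade function. A sketch, assuming the soft threshold only downgrades `complex` to `moderate` while the hard threshold forces everything to `simple`, as the comments in the config describe:

```python
SOFT_DOWNGRADE = {"complex": "moderate"}  # assumed soft-threshold mapping
HARD_TIER = "simple"                      # hard threshold forces this tier

def effective_tier(tier: str, spent_usd: float, limit_usd: float,
                   soft: float = 0.8, hard: float = 0.95) -> str:
    """Downgrade the detected tier based on the fraction of budget consumed."""
    frac = spent_usd / limit_usd
    if frac >= hard:
        return HARD_TIER                       # hard threshold: all tasks go simple
    if frac >= soft:
        return SOFT_DOWNGRADE.get(tier, tier)  # soft threshold: complex → moderate
    return tier
```

With the `research-agent` settings above (limit $50, soft 0.7, hard 0.90), a complex task at $40 spent would already run on the moderate tier.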
## Fallback Chains
The fallback chain defines which provider to try next when the primary fails. Chains are ordered by preference; the router works down the list on each error:
```jsonc
{
  "fallback_chains": {
    "complex": [
      "anthropic/claude-opus-4-6",
      "openai/gpt-4o",
      "anthropic/claude-sonnet-4-6"  // downgrade if both above fail
    ],
    "simple": [
      "groq/llama3-8b-8192",         // free tier first
      "ollama/mistral:7b",           // local fallback
      "anthropic/claude-haiku-4-5"
    ]
  }
}
```
If you run Ollama locally, include it in your fallback chain for simple tasks. When your cloud provider API is down or rate-limited, simple tasks keep working at zero cost. Set `"base_url": "http://host.docker.internal:11434"` when running ClawHQ in Docker.
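Chain traversal itself is straightforward. A sketch under stated assumptions: `call_model` stands in for the real provider call and is expected to raise on API errors (timeouts, rate limits, 5xx); a production router would distinguish retryable from hard failures.

```python
def run_with_fallback(chain: list[str], call_model) -> tuple[str, str]:
    """Try each 'provider/model' entry in order until one succeeds."""
    errors = []
    for entry in chain:
        provider, model = entry.split("/", 1)  # split on first "/" only,
        try:                                   # so "ollama/mistral:7b" parses
            return entry, call_model(provider, model)
        except Exception as exc:  # sketch only; narrow this in real code
            errors.append(f"{entry}: {exc}")
    raise RuntimeError("all providers in chain failed: " + "; ".join(errors))
```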
## Self-Learning Weights
The router tracks success rate, average latency, and task completion quality for every model it uses. Over time, it shifts weight toward models that perform better for your specific workload. You don't need to tune this manually — it adjusts automatically.
You can see the current learned weights in Dashboard → Routing. The initial state gives all providers equal weight. After a few hundred tasks, you'll see clear differentiation. You can also reset weights to equal or manually set them:
```jsonc
{
  "learning": {
    "enabled": true,
    "weight_decay": 0.95,  // how much older data is discounted
    "min_samples": 20,     // minimum runs before weight shifts
    "manual_overrides": {
      "anthropic/claude-sonnet-4-6": 1.2  // boost this model's weight by 20%
    }
  }
}
```
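One way these settings could interact, as a sketch only: the actual update rule is internal to ClawHQ. This illustrates how `weight_decay` discounts older observations, how `min_samples` gates movement, and how a manual override multiplies the learned weight.

```python
def update_weight(weight: float, samples: int, success: bool,
                  decay: float = 0.95, min_samples: int = 20) -> tuple[float, int]:
    """Decayed success-rate update: old weight fades by `decay` per run."""
    samples += 1
    if samples <= min_samples:
        return weight, samples  # not enough data yet: hold the current weight
    score = 1.0 if success else 0.0
    return decay * weight + (1 - decay) * score, samples

def routing_weight(learned: float, model: str, overrides: dict) -> float:
    """Apply any manual override multiplier on top of the learned weight."""
    return learned * overrides.get(model, 1.0)
```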
## Monitoring the Router
Dashboard → Routing shows a live view of routing decisions: which model handled each request, the detected task tier, cost per request, and the current learned weights. The last 24 hours of routing decisions are available via `GET /api/tool-stats`, including per-model success rates and average durations, which is useful for debugging unexpected routing behavior.
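A quick way to act on those stats, for example to flag models dragging down a fallback chain. The response shape in the docstring is an assumption for illustration, not the documented schema:

```python
def flag_underperformers(stats: dict, min_success: float = 0.9) -> list[str]:
    """Return models whose success rate over the window is below min_success.

    Assumes a payload shaped like (hypothetical, check the actual response):
    {"models": {"openai/gpt-4o": {"success_rate": 0.97,
                                  "avg_duration_ms": 1200}, ...}}
    """
    return sorted(
        model
        for model, s in stats.get("models", {}).items()
        if s.get("success_rate", 1.0) < min_success
    )
```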
## Common Tuning Situations
- **Router keeps picking the expensive model for simple tasks.** Lower `simple_max_tokens` and add more simple-task keywords to the detection config, or hard-pin the agent to the `simple` tier.
- **Quality is inconsistent after a fallback event.** Your fallback chain is hitting a model that's significantly less capable. Add a middle tier to the chain instead of jumping straight from Opus to a 7B model.
- **Budget threshold triggering too late.** Lower `soft_threshold` from 0.8 to 0.6 so the downgrade happens earlier in the month.
- **Self-learning weights converging on one provider too fast.** Increase `min_samples` so the weights only shift after more data is collected.
## Stop routing all tasks to the same model
The ClawHQ model router ships in every deployment. Configure it once and let it cut your LLM costs automatically without sacrificing quality on the tasks that need it.
Deploy ClawHQ → Read the docs