Running all agent tasks through a single frontier model is the most expensive path to the same result. A support triage message doesn't need Claude Opus. A multi-document legal analysis probably does. The ClawHQ model router makes this decision automatically, per task, based on detected task type and your budget state.

This guide covers how the router works, how to configure task-type detection, how to set up provider fallback chains, and how to use the self-learning weights to let the router improve over time.

How the Router Works

1. Task classification. The router analyzes the incoming task (message length and keyword matches) and assigns it to a category: simple, moderate, complex, or code.
2. Budget check. The router checks the current budget state for the agent making the request. If the agent is above its soft threshold (e.g., 80% of monthly budget used), it downgrades by one tier.
3. Provider selection. From the candidate models for the task tier, the router picks based on a weighted score: self-learned success rate, configured preference weight, and current latency.
4. Fallback on failure. If the selected provider returns an error (rate limit, timeout, 5xx), the router immediately tries the next candidate in the fallback chain without failing the request.
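The first three steps can be sketched in a few lines of Python. All names and fields here are illustrative assumptions, not actual ClawHQ internals:

```python
# Illustrative sketch of the routing pipeline; every name is hypothetical.
TIERS = ["simple", "moderate", "complex"]

def route(tier, budget_used, candidates, soft=0.8, hard=0.95):
    # Step 2: budget check. Past the hard threshold everything runs simple;
    # past the soft threshold, downgrade one tier.
    if budget_used >= hard:
        tier = "simple"
    elif budget_used >= soft and tier in TIERS[1:]:
        tier = TIERS[TIERS.index(tier) - 1]
    # Step 3: weighted selection. Learned success rate times configured
    # preference weight, penalized by current latency.
    return max(candidates[tier],
               key=lambda c: c["success_rate"] * c["preference"] / (1 + c["latency_s"]))
```

Note how the budget check runs before selection: the tier is settled first, then the weighted score only compares candidates within that tier.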

Supported Providers

anthropic, openai, groq, gemini, deepseek, mistral, cohere, together, fireworks, ollama, lmstudio, openrouter, azure-openai, bedrock, vertex, perplexity, xai, zhipu, moonshot, 01ai

Any provider with an OpenAI-compatible API can be added as a custom entry using the openai-compat adapter type.

Task-Type Detection

The router classifies each task into one of four tiers. You can tune the keywords and thresholds that drive classification, or hard-pin specific agents to a tier.

| Tier | Default Models | When Used |
|------|----------------|-----------|
| simple | claude-haiku-4-5, gemini-flash, groq-llama3-8b | Short messages, FAQ lookups, simple classification, status checks |
| moderate | claude-sonnet-4-6, gpt-4o-mini, gemini-pro | Multi-step tasks, summarization, draft generation, tool-use chains |
| complex | claude-opus-4-6, gpt-4o, gemini-ultra | Long documents, multi-document analysis, reasoning chains, ambiguous tasks |
| code | claude-sonnet-4-6, deepseek-coder, gpt-4o | Code generation, debugging, refactoring — routes to code-specialized models |

Configuring Task Detection

// model-router.json (or Settings → Model Router in dashboard)
{
  "task_detection": {
    "enabled": true,
    "simple_max_tokens": 200,   // tasks under this token count start as "simple"
    "code_keywords": [
      "function", "def ", "class ", "debug", "refactor", "implement"
    ],
    "complex_keywords": [
      "analyze", "compare", "synthesize", "evaluate", "summarize"
    ]
  }
}
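Under a config like the one above, classification could plausibly work as in this sketch. The function name and the precedence order (code keywords first, then complex keywords, then the token threshold) are assumptions for illustration:

```python
# Hypothetical sketch of tier classification driven by the detection config.
def classify(task_text, token_count, cfg):
    text = task_text.lower()
    # Keyword matches take precedence over the length heuristic.
    if any(k in text for k in cfg["code_keywords"]):
        return "code"
    if any(k in text for k in cfg["complex_keywords"]):
        return "complex"
    # Short tasks with no keyword hits start as "simple".
    if token_count < cfg["simple_max_tokens"]:
        return "simple"
    return "moderate"
```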
Hard-pinning agents

If you have an agent that always handles complex tasks (legal analysis, financial modeling), pin it to the complex tier so the router skips classification entirely: set "tier": "complex" in the agent's configuration on the Team page.

Provider Configuration

Configure providers in Settings → Integrations. Each provider needs an API key and an optional model override. If you don't specify models, the router uses its defaults for each tier.

{
  "providers": {
    "anthropic": {
      "api_key": "$ANTHROPIC_API_KEY",
      "models": {
        "simple": "claude-haiku-4-5-20251001",
        "moderate": "claude-sonnet-4-6",
        "complex": "claude-opus-4-6",
        "code": "claude-sonnet-4-6"
      }
    },
    "groq": {
      "api_key": "$GROQ_API_KEY",
      "models": {
        "simple": "llama3-8b-8192",
        "moderate": "llama-3.3-70b-versatile"
      }
    },
    "ollama": {
      "base_url": "http://localhost:11434",
      "models": {
        "simple": "mistral:7b",
        "code": "deepseek-coder:6.7b"
      }
    }
  }
}
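Values like "$ANTHROPIC_API_KEY" suggest env-var references that get expanded when the config loads. A minimal resolver sketch, assuming that load-time behavior (the function names are hypothetical):

```python
import os

# Minimal sketch: expand "$VAR" references in provider config values.
def resolve(value):
    if isinstance(value, str) and value.startswith("$"):
        # Fall back to the literal string if the env var is unset.
        return os.environ.get(value[1:], value)
    return value

def resolve_config(cfg):
    # Recurse through nested dicts, resolving leaf values.
    if isinstance(cfg, dict):
        return {k: resolve_config(v) for k, v in cfg.items()}
    return resolve(cfg)
```

Keeping keys as env-var references means the JSON file itself never contains secrets and can be committed or shared safely.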

Budget Fallback

Budget fallback is the highest-leverage cost control the router offers. Set soft and hard thresholds per agent or globally, and the router automatically downgrades to cheaper models as budget is consumed.

{
  "budget_fallback": {
    "enabled": true,
    "global": {
      "soft_threshold": 0.8,   // at 80% spent: downgrade complex → moderate
      "hard_threshold": 0.95,  // at 95% spent: downgrade all → simple
      "reset": "monthly"        // resets with the budget period
    },
    "per_agent": {
      "research-agent": {
        "monthly_limit_usd": 50,
        "soft_threshold": 0.7,
        "hard_threshold": 0.90
      }
    }
  }
}
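The config above implies that per-agent values override the global ones. That resolution step can be sketched as follows (the function name and the override semantics are assumptions consistent with the example config):

```python
# Sketch: resolve effective thresholds for an agent,
# with per-agent settings overriding global defaults.
def thresholds(agent, cfg):
    g = cfg["global"]
    a = cfg.get("per_agent", {}).get(agent, {})
    return (a.get("soft_threshold", g["soft_threshold"]),
            a.get("hard_threshold", g["hard_threshold"]))
```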

Fallback Chains

The fallback chain defines which provider to try next if the primary fails. Entries are ordered by preference; the router works down the list on each error:

{
  "fallback_chains": {
    "complex": [
      "anthropic/claude-opus-4-6",
      "openai/gpt-4o",
      "anthropic/claude-sonnet-4-6"   // downgrade if both above fail
    ],
    "simple": [
      "groq/llama3-8b-8192",           // free tier first
      "ollama/mistral:7b",              // local fallback
      "anthropic/claude-haiku-4-5"
    ]
  }
}
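The chain-walking behavior amounts to a retry loop over the list. A minimal sketch, using a generic exception as a stand-in for real provider errors (all names here are illustrative, not ClawHQ's actual error model):

```python
# Sketch: walk a fallback chain, trying the next entry on retryable errors.
RETRYABLE = ("rate_limit", "timeout", "server_error")  # i.e. 429 / timeout / 5xx

def call_with_fallback(chain, call):
    last_err = None
    for model in chain:
        try:
            return call(model)
        except RuntimeError as e:  # stand-in for a provider error type
            if str(e) not in RETRYABLE:
                raise  # non-retryable errors fail the request immediately
            last_err = e
    raise RuntimeError(f"all providers failed: {last_err}")
```

The request only fails outright when every entry in the chain has errored, which is why the last entry should be the one most likely to be available.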
Ollama as free fallback

If you run Ollama locally, add it as the last entry in your fallback chain for simple tasks. When your cloud provider API is down or rate-limited, simple tasks keep working at zero cost. Set "base_url": "http://host.docker.internal:11434" when running ClawHQ in Docker.

Self-Learning Weights

The router tracks success rate, average latency, and task completion quality for every model it uses. Over time, it shifts weight toward models that perform better for your specific workload. You don't need to tune this manually — it adjusts automatically.

You can see the current learned weights in Dashboard → Routing. The initial state gives all providers equal weight; after a few hundred tasks, you'll see clear differentiation. You can also reset all weights to equal values or set manual overrides:

{
  "learning": {
    "enabled": true,
    "weight_decay": 0.95,         // how much older data is discounted
    "min_samples": 20,             // minimum runs before weight shifts
    "manual_overrides": {
      "anthropic/claude-sonnet-4-6": 1.2  // boost this model's weight by 20%
    }
  }
}
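One common way to combine weight_decay and min_samples is an exponentially decayed success rate that stays neutral until enough samples accumulate. This sketch assumes that approach; it is an illustration of the mechanism, not ClawHQ's actual update rule:

```python
# Sketch: decayed per-model success tracking (all names hypothetical).
def update_weight(stats, model, success, decay=0.95, min_samples=20):
    s = stats.setdefault(model, {"score": 0.0, "norm": 0.0, "n": 0})
    # Older observations are discounted by `decay` on every update.
    s["score"] = s["score"] * decay + (1.0 if success else 0.0)
    s["norm"] = s["norm"] * decay + 1.0
    s["n"] += 1
    # Weight stays neutral (1.0) until min_samples runs are recorded.
    return s["score"] / s["norm"] if s["n"] >= min_samples else 1.0
```

Raising min_samples keeps weights neutral for longer, which is exactly the knob the tuning section below suggests for weights that converge too fast.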

Monitoring the Router

Dashboard → Routing shows a live view of routing decisions: which model handled each request, the detected task tier, cost per request, and the current learned weights. The last 24 hours of routing decisions are available via GET /api/tool-stats — this includes per-model success rates and average durations, which is useful for debugging unexpected routing behavior.

Common Tuning Situations

  • Router keeps picking the expensive model for simple tasks. Lower simple_max_tokens and add more simple-task keywords to the detection config. Or hard-pin the agent to simple tier.
  • Quality is inconsistent after a fallback event. Your fallback chain is hitting a model that's significantly less capable. Add a middle tier to the chain instead of jumping straight from Opus to a 7B model.
  • Budget threshold triggering too late. Lower soft_threshold from 0.8 to 0.6 so the downgrade happens earlier in the month.
  • Self-learning weights converging on one provider too fast. Increase min_samples so the weights only shift after more data is collected.

Stop routing all tasks to the same model

The ClawHQ model router ships in every deployment. Configure it once and let it cut your LLM costs automatically without sacrificing quality on the tasks that need it.

Deploy ClawHQ → Read the docs