ADR-004: Class-Based LLMRouter
Status: Accepted
Date: 2024-12-01
Deciders: Platform architect
Context
The engine routes LLM requests to different providers based on agent tier (strategy → xAI, writing → Anthropic, analytical → OpenAI, etc.). The original single-user system used module-level functions for routing, which created global state that was difficult to test and impossible to configure per-request.
The multi-tenant platform needed LLM routing that:
- Supports per-tenant BYOK (Bring Your Own Key) API key overrides
- Is independently testable without making real LLM calls
- Handles provider fallback chains (e.g., xAI unavailable → fall back to Claude Sonnet)
- Can be instantiated multiple times in parallel tests without state conflicts
Decision
Implement the LLM router as a class (LLMRouter) that is instantiated with configuration and optional API key overrides. Each test creates its own router instance. In production, runs that use platform keys share a singleton instance obtained via state.get_router(); runs that carry BYOK keys get a fresh router instead.
The router accepts api_key_overrides: dict to support BYOK — when a tenant has their own API keys, the router uses those instead of platform keys.
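As a rough illustration of the decision, the sketch below shows one way a class-based router could merge tenant overrides over platform keys and walk a fallback chain. Only the names LLMRouter and api_key_overrides come from this ADR; the constructor's other parameters, the Provider callable type, ProviderUnavailable, key_for, and complete are assumptions made for the example and may not match the real engine.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical provider signature: (prompt, api_key) -> completion text.
Provider = Callable[[str, str], str]


class ProviderUnavailable(Exception):
    """Hypothetical error a provider raises when it cannot serve a request."""


class LLMRouter:
    def __init__(
        self,
        providers: Dict[str, Provider],
        tier_routes: Dict[str, List[str]],
        platform_keys: Dict[str, str],
        api_key_overrides: Optional[Dict[str, str]] = None,
    ) -> None:
        self._providers = providers
        # Each tier maps to an ordered fallback chain of provider names.
        self._tier_routes = tier_routes
        # Tenant keys win over platform keys. Building a new dict means the
        # instance shares no mutable key state with its caller.
        self._keys = {**platform_keys, **(api_key_overrides or {})}

    def key_for(self, provider: str) -> str:
        return self._keys[provider]

    def complete(self, tier: str, prompt: str) -> str:
        last_error: Optional[Exception] = None
        for name in self._tier_routes[tier]:
            try:
                return self._providers[name](prompt, self._keys[name])
            except ProviderUnavailable as exc:
                last_error = exc  # try the next provider in the chain
        raise RuntimeError(f"all providers failed for tier {tier!r}") from last_error


# Usage with fake providers, so no real LLM call is made.
def fake_xai(prompt: str, key: str) -> str:
    raise ProviderUnavailable("xAI is down")


def fake_anthropic(prompt: str, key: str) -> str:
    return f"anthropic[{key}]: {prompt}"


router = LLMRouter(
    providers={"xai": fake_xai, "anthropic": fake_anthropic},
    tier_routes={"strategy": ["xai", "anthropic"]},
    platform_keys={"xai": "platform-x", "anthropic": "platform-a"},
    api_key_overrides={"anthropic": "tenant-a"},
)
result = router.complete("strategy", "hello")
```

Because all state lives on the instance, two routers built with different overrides can run side by side in parallel tests without interfering with each other.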
Alternatives Considered
1. Module-level functions with global configuration
The original pattern. Works for single-user but creates problems in multi-tenant: global API keys can't be overridden per-tenant, tests pollute shared state, and parallel execution risks race conditions on global config.
2. Dependency injection framework (e.g., python-inject, dependency-injector)
Would formalize the pattern but adds a dependency for a problem solved by basic class instantiation. Rejected as over-engineering.
3. Configuration-based routing (YAML/JSON)
Route configuration lives in a file, loaded at startup. Doesn't support per-request API key overrides or dynamic fallback chains. Rejected because BYOK requires runtime configuration.
Consequences
Positive:
- Each test creates its own LLMRouter with mock providers, so there is zero global state pollution
- BYOK support is clean: pass api_key_overrides at instantiation, and the router uses them transparently
- Fallback chains are configurable per-instance
- Production singleton via state.get_router() is lazy-loaded and thread-safe
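The ADR does not show how state.get_router() achieves lazy loading and thread safety. One common approach, sketched here under that assumption, is double-checked locking around a module-level instance. The LLMRouter class below is an empty stand-in, and the factory parameter is a hypothetical addition for the example.

```python
import threading
from typing import Callable, Optional


class LLMRouter:
    """Empty stand-in for the real router; only instance identity matters here."""


_router: Optional[LLMRouter] = None
_router_lock = threading.Lock()


def get_router(factory: Callable[[], LLMRouter] = LLMRouter) -> LLMRouter:
    """Return the shared router, creating it on first use."""
    global _router
    if _router is None:              # fast path: no lock once initialized
        with _router_lock:
            if _router is None:      # re-check: another thread may have won
                _router = factory()
    return _router
```

The second check inside the lock is what keeps two threads that both pass the first check from each constructing a router.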
Negative:
- Every module that needs LLM access must receive a router instance (parameter threading)
- The router instance is threaded through ~14 engine files — more plumbing than module-level functions
- Singleton pattern in production means the shared router doesn't support per-request BYOK; instead, BYOK overrides are applied at the run level before execution starts
Risks:
- Router instance lifecycle: if a router is created with BYOK keys and then reused for a different tenant's run, keys could leak. Mitigated by creating a fresh router per run when BYOK keys are present.
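The mitigation can be pictured as a small selection step at the start of each run. This is a minimal sketch, assuming a helper named router_for_run (hypothetical, not taken from the engine) and a simplified LLMRouter stand-in that only records its overrides.

```python
from typing import Dict, Optional


class LLMRouter:
    """Simplified stand-in that only records the overrides it was given."""

    def __init__(self, api_key_overrides: Optional[Dict[str, str]] = None) -> None:
        self.api_key_overrides = dict(api_key_overrides or {})


_shared_router = LLMRouter()  # platform keys only, never tenant keys


def get_router() -> LLMRouter:
    return _shared_router


def router_for_run(byok_keys: Optional[Dict[str, str]]) -> LLMRouter:
    """Pick the router for one run (hypothetical helper)."""
    if byok_keys:
        # Fresh instance per BYOK run, so tenant keys never reach the singleton
        # and cannot be reused by another tenant's run.
        return LLMRouter(api_key_overrides=byok_keys)
    return get_router()
```

With this shape, the shared instance never holds tenant keys, and a BYOK router goes out of scope when its run ends.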