Configuring Models

Models are the core of Lumen. Each entry tells Lumen how to reach an AI model, what it costs, and what it can do.

Basic Model Entry

Every model starts with a name and an endpoints list:

models:
  - name: my-model
    active: true
    input_cost_per_million: 0.5
    output_cost_per_million: 1.0
    endpoints:
      - url: https://example.com/v1
        api_key: sk-your-key

Field	Required	Description
`name`	Yes	Lumen's internal identifier for the model. This is what appears in the chat UI and must be unique within your config.
`endpoints`	Yes	One or more back-end servers that provide this model
`active`	No (default: true)	Whether the model is available. Set to `false` to disable without deleting.

Pricing

Fields shown to users on the Models page:

Field	Description	Default
`input_cost_per_million`	Coins charged per 1M input tokens	0.0
`output_cost_per_million`	Coins charged per 1M output tokens	0.0

See the Introduction for how coin costs are calculated.

Capabilities

These fields tell the UI what the model can do and help users pick the right one:

Field	Description
`description`	Short text shown next to the model name in the UI
`url`	Link to the model's documentation page (e.g. HuggingFace)
`context_window`	Maximum total tokens for input + output in one request
`max_input_tokens`	Maximum tokens accepted in a single request (if the backend enforces a tighter limit than `context_window`)
`max_output_tokens`	Maximum tokens the model can generate in a single reply
`knowledge_cutoff`	Month the model's training data extends to, e.g. `"2025-04"`
`supports_reasoning`	Whether the model can show step-by-step thinking
`supports_function_calling`	Whether the model supports tool/function calling via the API
`input_modalities`	What the model accepts: `["text"]`, `["text", "image"]`, `["text", "image", "video"]`, `["text", "image", "video", "audio"]`
`output_modalities`	What the model produces: typically `["text"]`
`notice`	Optional admin note shown to users on the model detail page

All fields except name, input_cost_per_million, and output_cost_per_million are optional. Everything else fills in the UI and API responses.

Endpoints

Each model can have one or more endpoints:

Field	Description
`url`	Base URL of the backend server (e.g. `https://internal-server/v1`)
`api_key`	API key required by the backend
`model`	The model name the endpoint actually expects (defaults to the parent `name` if omitted)

Setting model to a different value lets Lumen map its internal model name to whatever the endpoint calls the same model. This is useful when a single server serves multiple variants.

Round-robin distributes requests across all configured endpoints. A health checker periodically probes each endpoint and automatically routes traffic away from servers that fail.

Multiple Endpoints for Load Balancing

You can configure multiple endpoints for one model to distribute load:

  - name: phi3
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://gpu-server-1.internal/v1
        api_key: key-one
        model: phi-3-mini
      - url: http://gpu-server-2.internal/v1
        api_key: key-two
        model: phi-3-mini
      - url: http://gpu-server-3.internal/v1
        api_key: key-three
        model: phi-3-mini

The models page shows how many of those endpoints are healthy. If all endpoints for a model are down, the model shows a "down" status and the chat interface hides it.

Ollama (Local Models)

Ollama runs on your own hardware. It uses an OpenAI-compatible API at http://localhost:11434/v1 and doesn't require a real API key — any non-empty string works:

  - name: llama3.2
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    supports_reasoning: true
    input_modalities: ["text"]
    output_modalities: ["text"]
    endpoints:
      - url: http://localhost:11434/v1
        api_key: ollama
        model: llama3.2

Duplicate Names

If the same name appears twice in config.yaml, the later entry wins. This can be useful for environment-specific overrides (e.g., a local dev model vs production).