Lumen

Configuring Models

Models are the core of Lumen. Each entry tells Lumen how to reach an AI model, what it costs, and what it can do.

Basic Model Entry

Every model starts with a name and an endpoints list:

models:
  - name: my-model
    active: true
    input_cost_per_million: 0.5
    output_cost_per_million: 1.0
    endpoints:
      - url: https://example.com/v1
        api_key: sk-your-key

| Field | Required | Description |
| --- | --- | --- |
| name | Yes | Lumen's internal identifier for the model. This is what appears in the chat UI and must be unique within your config. |
| endpoints | Yes | One or more back-end servers that provide this model |
| active | No (default: true) | Whether the model is available. Set to false to disable without deleting. |
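
As a small illustration of the active flag, the entry below keeps a model in the config but marks it unavailable. It reuses the placeholder name, URL, and key from the basic entry above; none of them refer to a real deployment.

```yaml
models:
  - name: my-model
    active: false                     # disabled, but the entry stays in the file
    input_cost_per_million: 0.5
    output_cost_per_million: 1.0
    endpoints:
      - url: https://example.com/v1   # placeholder URL
        api_key: sk-your-key          # placeholder key
```

Setting active back to true (or removing the line, since true is the default) makes the model available again.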

Pricing

Fields shown to users on the Models page:

| Field | Description | Default |
| --- | --- | --- |
| input_cost_per_million | Coins charged per 1M input tokens | 0.0 |
| output_cost_per_million | Coins charged per 1M output tokens | 0.0 |

See the Introduction for how coin costs are calculated.

Capabilities

These fields tell the UI what the model can do and help users pick the right one:

| Field | Description |
| --- | --- |
| description | Short text shown next to the model name in the UI |
| url | Link to the model's documentation page (e.g. HuggingFace) |
| context_window | Maximum total tokens for input + output in one request |
| max_input_tokens | Maximum input tokens accepted in a single request (set this if the backend enforces a tighter limit than context_window) |
| max_output_tokens | Maximum tokens the model can generate in a single reply |
| knowledge_cutoff | Month the model's training data extends to, e.g. "2025-04" |
| supports_reasoning | Whether the model can show step-by-step thinking |
| supports_function_calling | Whether the model supports tool/function calling via the API |
| input_modalities | What the model accepts: ["text"], ["text", "image"], ["text", "image", "video"], or ["text", "image", "video", "audio"] |
| output_modalities | What the model produces: typically ["text"] |
| notice | Optional admin note shown to users on the model detail page |

Every field in the table above is optional. These fields only fill in the UI and API responses.
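
A sketch of one entry with the capability fields filled in may help show where they sit, which is at the model level, alongside name and endpoints. The model name, documentation link, token limits, and notice text below are illustrative placeholders, not values from a real model.

```yaml
models:
  - name: example-vision-model        # hypothetical name
    active: true
    input_cost_per_million: 0.5
    output_cost_per_million: 1.0
    description: "General-purpose chat model with image input"
    url: https://huggingface.co/example-org/example-vision-model  # placeholder link
    context_window: 131072            # illustrative limits only
    max_input_tokens: 120000
    max_output_tokens: 8192
    knowledge_cutoff: "2025-04"
    supports_reasoning: false
    supports_function_calling: true
    input_modalities: ["text", "image"]
    output_modalities: ["text"]
    notice: "Example admin note shown on the model detail page."
    endpoints:
      - url: https://example.com/v1   # placeholder URL
        api_key: sk-your-key          # placeholder key
```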

Endpoints

Each model can have one or more endpoints:

| Field | Description |
| --- | --- |
| url | Base URL of the backend server (e.g. https://internal-server/v1) |
| api_key | API key required by the backend |
| model | The model name the endpoint actually expects (defaults to the parent name if omitted) |

Setting model to a different value lets Lumen map its internal model name to whatever the endpoint calls the same model. This is useful when a single server serves multiple variants.
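
The mapping could look like the following sketch, in which two Lumen entries point at the same server but request different variants. The names llama-small and llama-large and the backend model names are hypothetical; substitute whatever your server actually exposes.

```yaml
models:
  - name: llama-small                     # name shown in Lumen (hypothetical)
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: https://internal-server/v1
        api_key: sk-your-key              # placeholder key
        model: llama-3.1-8b-instruct      # name the backend expects (hypothetical)
  - name: llama-large
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: https://internal-server/v1   # same server, different variant
        api_key: sk-your-key
        model: llama-3.1-70b-instruct     # hypothetical backend name
```

If model were omitted from either endpoint, Lumen would send the parent name (llama-small or llama-large) to the backend instead.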

When a model has more than one endpoint, Lumen distributes requests across all of them in round-robin order. A health checker periodically probes each endpoint and automatically routes traffic away from servers that fail.

Multiple Endpoints for Load Balancing

You can configure multiple endpoints for one model to distribute load:

  - name: phi3
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://gpu-server-1.internal/v1
        api_key: key-one
        model: phi-3-mini
      - url: http://gpu-server-2.internal/v1
        api_key: key-two
        model: phi-3-mini
      - url: http://gpu-server-3.internal/v1
        api_key: key-three
        model: phi-3-mini

The Models page shows how many of those endpoints are healthy. If all endpoints for a model are down, the model shows a "down" status and the chat interface hides it.

Ollama (Local Models)

Ollama runs models on your own hardware. It exposes an OpenAI-compatible API at http://localhost:11434/v1 and doesn't require a real API key; any non-empty string works:

  - name: llama3.2
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    supports_reasoning: true
    input_modalities: ["text"]
    output_modalities: ["text"]
    endpoints:
      - url: http://localhost:11434/v1
        api_key: ollama
        model: llama3.2

Duplicate Names

If the same name appears twice in config.yaml, the later entry wins. This can be useful for environment-specific overrides (e.g., a local dev model in place of a production one).
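
A minimal sketch of such an override, assuming both entries live in the same config.yaml. The production URL and key are placeholders. The source only says that the later entry wins; this sketch does not assume anything about whether individual fields from the earlier entry are merged in, so the second entry is written out in full.

```yaml
models:
  # Earlier entry: production backend (placeholder URL and key)
  - name: my-model
    input_cost_per_million: 0.5
    output_cost_per_million: 1.0
    endpoints:
      - url: https://example.com/v1
        api_key: sk-your-key

  # Later entry with the same name: this is the one that takes effect
  - name: my-model
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://localhost:11434/v1    # local Ollama server
        api_key: ollama
        model: llama3.2
```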

National Center for Supercomputing Applications
