Connecting to OSS Models

Connect to open-source models using Ollama or vLLM with LangDB AI Gateway.

LangDB AI Gateway supports connecting to open-source models through providers like Ollama and vLLM. This allows you to use locally hosted models while maintaining the same OpenAI-compatible API interface.

Configuration

To use Ollama or vLLM, you need to provide a list of models with their endpoints. By default, ai-gateway loads models from ~/.langdb/models.yaml. You can define your models there in the following format:

- model: gpt-oss
  model_provider: ollama
  inference_provider:
    provider: ollama
    model_name: gpt-oss
    endpoint: https://my-ollama-server.localhost
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
  - text
  output_formats:
  - text
  limits:
    max_context_size: 128000
  capabilities: ['tools']
  type: completions
  description: OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Configuration Fields

Field              | Description                                                      | Required
model              | The model identifier used in API requests                        | Yes
model_provider     | The provider type (e.g., ollama, vllm)                           | Yes
inference_provider | Provider-specific configuration (provider, model_name, endpoint) | Yes
price              | Token pricing (set to 0.0 for local models)                      | Yes
input_formats      | Supported input formats                                          | Yes
output_formats     | Supported output formats                                         | Yes
limits             | Model limitations (context size, etc.)                           | Yes
capabilities       | Model capabilities array (e.g., ['tools'] for function calling)  | Yes
type               | Model type (e.g., completions)                                   | Yes
description        | Human-readable model description                                 | Yes

Example Usage

Once configured, you can use your OSS models through the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
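
The capabilities array in the model definition controls which features the gateway advertises for a model. When it includes 'tools' (as in the gpt-oss entry above), you can send standard OpenAI-style function-calling requests through the same endpoint. The request below is a sketch: get_weather is a hypothetical tool, and actual tool-call behaviour depends on the underlying model and server.

# Function-calling request; assumes the model entry declares 'tools' in capabilities
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'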

Supported Providers

Ollama

  • Provider: ollama

  • Endpoint: URL to your Ollama server

  • Model Name: The model name as configured in Ollama; it must match a model pulled into your Ollama instance (see the snippet below)
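
For the gateway to route requests successfully, model_name must refer to a model that is actually present in your Ollama instance. A minimal sketch, assuming a local install (Ollama serves its API on http://localhost:11434 by default):

# Pull the model so its name matches model_name in models.yaml
ollama pull gpt-oss

# Confirm the model is available locally
ollama list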

vLLM

  • Provider: vllm

  • Endpoint: URL to your vLLM server

  • Model Name: The model name as configured in vLLM (see the configuration sketch below)
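
A models.yaml entry for a vLLM-backed model has the same shape as the Ollama example above. The sketch below uses placeholder values: a local vLLM OpenAI-compatible server on port 8000 serving meta-llama/Llama-3.1-8B-Instruct; substitute your own endpoint, model name, and limits.

- model: llama-3.1-8b                                # identifier used in API requests to the gateway
  model_provider: vllm
  inference_provider:
    provider: vllm
    model_name: meta-llama/Llama-3.1-8B-Instruct     # the model name as served by vLLM (placeholder)
    endpoint: http://localhost:8000                  # vLLM's OpenAI-compatible server (placeholder)
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
  - text
  output_formats:
  - text
  limits:
    max_context_size: 128000                         # adjust to your model's context window
  capabilities: ['tools']                            # include only if your deployment supports tool calling
  type: completions
  description: Example Llama 3.1 8B Instruct model served by a local vLLM instance.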

Best Practices

  1. Local Development: Use localhost or 127.0.0.1 for local Ollama/vLLM instances

  2. Production: Use proper domain names or IP addresses for remote instances

  3. Security: Ensure your OSS model endpoints are secured, for example with TLS and network-level access controls, since self-hosted model servers often run without authentication by default

  4. Performance: Consider the network latency between ai-gateway and your model servers

  5. Monitoring: Use the observability features to monitor OSS model performance
