Connecting to OSS Models
Connect to open-source models using Ollama or vLLM with LangDB AI Gateway.
LangDB AI Gateway supports connecting to open-source models through providers like Ollama and vLLM. This allows you to use locally hosted models while maintaining the same OpenAI-compatible API interface.
Configuration
To use Ollama or vLLM, you need to provide a list of models with their endpoints. By default, ai-gateway loads models from ~/.langdb/models.yaml. You can define your models there in the following format:
- model: gpt-oss
  model_provider: ollama
  inference_provider:
    provider: ollama
    model_name: gpt-oss
    endpoint: https://my-ollama-server.localhost
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
    - text
  output_formats:
    - text
  limits:
    max_context_size: 128000
  capabilities: ['tools']
  type: completions
  description: OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Configuration Fields
Field               Description                                                       Required
model               The model identifier used in API requests                         Yes
model_provider      The provider type (e.g., ollama, vllm)                            Yes
inference_provider  Provider-specific configuration                                   Yes
price               Token pricing (set to 0.0 for local models)                       Yes
input_formats       Supported input formats                                           Yes
output_formats      Supported output formats                                          Yes
limits              Model limitations (context size, etc.)                            Yes
capabilities        Model capabilities array (e.g., ['tools'] for function calling)   Yes
type                Model type (e.g., completions)                                    Yes
description         Human-readable model description                                  Yes
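If you maintain models.yaml by hand, a quick check that every entry carries the required fields can save a failed gateway restart. The snippet below is a minimal, unofficial sketch assuming PyYAML is installed and the default ~/.langdb/models.yaml location; the field list simply mirrors the table above.

from pathlib import Path

import yaml  # PyYAML

# Required fields per the table above; an informal check, not an official
# ai-gateway schema.
REQUIRED_FIELDS = [
    "model", "model_provider", "inference_provider", "price",
    "input_formats", "output_formats", "limits", "capabilities",
    "type", "description",
]

models = yaml.safe_load(Path("~/.langdb/models.yaml").expanduser().read_text())
for entry in models:
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    status = "missing: " + ", ".join(missing) if missing else "ok"
    print(f"{entry.get('model', '<unnamed>')}: {status}")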
Example Usage
Once configured, you can use your OSS models through the standard OpenAI-compatible API:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
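Because the gateway exposes the standard OpenAI-compatible chat completions endpoint, an OpenAI-compatible client can also be pointed at it. The snippet below is a minimal sketch using the official openai Python SDK; the base URL comes from the curl example above, and the api_key value is a placeholder; supply whatever credentials your gateway deployment expects.

from openai import OpenAI

# Base URL matches the curl example above; the API key is a placeholder for
# deployments that do not enforce authentication.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="gpt-oss",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)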
Supported Providers
Ollama
Provider: ollama
Endpoint: URL to your Ollama server
Model Name: The model name as configured in Ollama
vLLM
Provider: vllm
Endpoint: URL to your vLLM server
Model Name: The model name as configured in vLLM
Best Practices
Local Development: Use localhost or 127.0.0.1 for local Ollama/vLLM instances
Production: Use proper domain names or IP addresses for remote instances
Security: Ensure your OSS model endpoints are properly secured
Performance: Consider the network latency between ai-gateway and your model servers (see the timing sketch after this list)
Monitoring: Use the observability features to monitor OSS model performance
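To act on the latency point above, the sketch below times a single round trip through the gateway using the requests library. It assumes the same local endpoint as the curl example and no authentication; add whatever headers your deployment requires and compare local against remote model servers.

import time

import requests

# Same endpoint and payload shape as the curl example above; adjust the host,
# port, and auth headers to match your gateway deployment.
URL = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "ping"}],
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120)
elapsed = time.perf_counter() - start
print(f"status={resp.status_code} round-trip={elapsed:.2f}s")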