
Self Hosted


Getting Started

Run LangDB AI Gateway locally.

LangDB AI Gateway is available as an open-source repository that you can configure and run locally. Own your LLM data and route to 250+ models.

The repository is available at https://github.com/langdb/ai-gateway

Running Locally

1. Run Docker and log in:

docker run -it \
    -p 8080:8080 \
    langdb/ai-gateway login

2. Start the server:

docker run -it \
    -p 8080:8080 \
    langdb/ai-gateway serve

3. Make your first request:

# Chat completion with GPT-4o Mini
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

# Or try Claude
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

The gateway provides the following OpenAI-compatible endpoints:

  • POST /v1/chat/completions - Chat completions

  • GET /v1/models - List available models

  • POST /v1/embeddings - Generate embeddings

  • POST /v1/images/generations - Generate images
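
For example, listing the available models is a quick way to confirm the gateway is up:

curl http://localhost:8080/v1/models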

Advanced Configuration

LangDB allows advanced configuration options to customize its functionality. The three main configuration areas are:

  1. Limits – Control API usage with rate limiting and cost control.

  2. Routing – Define how requests are routed across multiple LLM providers.

  3. Observability – Enable logging and tracing to monitor API performance.

These configurations can be set up using a configuration file (config.yaml) or overridden via command line options.

Setting up

Download the sample configuration from our repo and copy it to config.yaml:

# Download the sample config
curl -sL https://raw.githubusercontent.com/langdb/ai-gateway/main/config.sample.yaml -o config.sample.yaml

# Copy it to the file the gateway reads
cp config.sample.yaml config.yaml

Command line options will override corresponding config file settings when both are specified.
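
As an illustrative sketch, the three areas can sit together in a single config.yaml. The keys below are exactly the ones shown in the sections that follow (rate_limit, cost_control, clickhouse); see config.sample.yaml for the authoritative set of options:

cat > config.yaml <<'EOF'
# Sketch only -- keys taken from the snippets in this guide
rate_limit:
  hourly: 100
  daily: 1000
  monthly: 10000
cost_control:
  daily: 100.0     # $100 per day
  monthly: 1000.0  # $1000 per month
  total: 5000.0    # $5000 total
clickhouse:
  url: "http://localhost:8123"
EOF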

Visit the ai-gateway repository for more details.

Observability with ClickHouse

Configure tracing and store trace data in ClickHouse.

The gateway supports OpenTelemetry tracing with ClickHouse as the storage backend. All traces are stored in the langdb.traces table.

Setting up

  1. Create the traces table in ClickHouse:

# Create langdb database if it doesn't exist
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS langdb"

# Import the traces table schema
clickhouse-client --query "$(cat sql/traces.sql)"

  2. Enable tracing by providing the ClickHouse URL when running the server:

ai-gateway serve --clickhouse-url "clickhouse://localhost:9000"

Or in config.yaml:

clickhouse:
  url: "http://localhost:8123"

Querying Traces

Traces are stored in the langdb.traces table. Example query:

-- Get recent traces
SELECT
    trace_id,
    operation_name,
    start_time_us,
    finish_time_us,
    (finish_time_us - start_time_us) as duration_us
FROM langdb.traces
WHERE finish_date >= today() - 1
ORDER BY finish_time_us DESC
LIMIT 10;
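
Using the same columns, you can aggregate latency per operation from the shell, for example:

clickhouse-client --query "
SELECT
    operation_name,
    count() AS spans,
    avg(finish_time_us - start_time_us) AS avg_duration_us
FROM langdb.traces
WHERE finish_date >= today() - 7
GROUP BY operation_name
ORDER BY avg_duration_us DESC"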

Leveraging LangDB APIs within ClickHouse

LangDB APIs can be called directly within ClickHouse. Check out our UDF documentation to learn how to use LLMs in SQL queries.

Running with Docker Compose

For a complete setup, including ClickHouse for analytics and tracing, follow these steps:

  1. Start the services using Docker Compose:

docker-compose up -d

This will start:

  • ClickHouse server on port 8123 (HTTP)

  • All necessary configurations loaded from docker/clickhouse/server/config.d

  2. Build and run the gateway:

ai-gateway run

The gateway will now be running with full analytics and logging capabilities, storing data in ClickHouse.
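
To verify the stack end to end, send one request through the gateway and confirm a trace row has landed in ClickHouse (assuming tracing is enabled as described above):

# Send a test request through the gateway
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}'

# Confirm a trace row was recorded
clickhouse-client --query "SELECT count() FROM langdb.traces WHERE finish_date = today()"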

Connecting to OSS Models

Connect to open-source models using Ollama or vLLM with LangDB AI Gateway.

LangDB AI Gateway supports connecting to open-source models through providers like Ollama and vLLM. This allows you to use locally hosted models while maintaining the same OpenAI-compatible API interface.

Configuration

To use Ollama or vLLM, you need to provide a list of models with their endpoints. By default, ai-gateway loads models from ~/.langdb/models.yaml. You can define your models there in the following format:

- model: gpt-oss
  model_provider: ollama
  inference_provider:
    provider: ollama
    model_name: gpt-oss
    endpoint: https://my-ollama-server.localhost
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
  - text
  output_formats:
  - text
  limits:
    max_context_size: 128000
  capabilities: ['tools']
  type: completions
  description: OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Configuration Fields

All of the following fields are required:

  • model – The model identifier used in API requests

  • model_provider – The provider type (e.g., ollama, vllm)

  • inference_provider – Provider-specific configuration (provider, model_name, endpoint)

  • price – Token pricing (set to 0.0 for local models)

  • input_formats – Supported input formats

  • output_formats – Supported output formats

  • limits – Model limitations (context size, etc.)

  • capabilities – Model capabilities array (e.g., ['tools'] for function calling)

  • type – Model type (e.g., completions)

  • description – Human-readable model description

Example Usage

Once configured, you can use your OSS models through the standard OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

Supported Providers

Ollama

  • Provider: ollama

  • Endpoint: URL to your Ollama server

  • Model Name: The model name as configured in Ollama

vLLM

  • Provider: vllm

  • Endpoint: URL to your vLLM server

  • Model Name: The model name as configured in vLLM
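
As a sketch, a vLLM entry in ~/.langdb/models.yaml mirrors the Ollama example above; every value below (model name, endpoint, context size) is a placeholder to adapt to your deployment:

cat >> ~/.langdb/models.yaml <<'EOF'
# Hypothetical vLLM entry -- model name, endpoint, and limits are placeholders
- model: my-vllm-model
  model_provider: vllm
  inference_provider:
    provider: vllm
    model_name: my-vllm-model
    endpoint: http://localhost:8000
  price:
    per_input_token: 0.0
    per_output_token: 0.0
  input_formats:
  - text
  output_formats:
  - text
  limits:
    max_context_size: 32000
  capabilities: []  # set to ['tools'] if the model supports function calling
  type: completions
  description: Locally hosted model served by vLLM.
EOF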

Best Practices

  1. Local Development: Use localhost or 127.0.0.1 for local Ollama/vLLM instances (see the quick check after this list)

  2. Production: Use proper domain names or IP addresses for remote instances

  3. Security: Ensure your OSS model endpoints are properly secured

  4. Performance: Consider the network latency between ai-gateway and your model servers

  5. Monitoring: Use the observability features to monitor OSS model performance
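
For the local-development case, it helps to confirm an Ollama instance is actually reachable before pointing the gateway at it. The port and path below are Ollama's defaults, not something this gateway defines:

# List models served by a local Ollama instance (Ollama's default port is 11434)
curl http://localhost:11434/api/tags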

API Reference

API Reference for LangDB

Cost Control

Apply cost control using configuration.

Cost control helps manage API spending by setting daily, monthly, or total cost limits. Configure cost limits using:

# Set daily and monthly limits
ai-gateway serve \
  --cost-daily 100.0 \
  --cost-monthly 1000.0 \
  --cost-total 5000.0

Or in config.yaml:

cost_control:
  daily: 100.0   # $100 per day
  monthly: 1000.0  # $1000 per month
  total: 5000.0    # $5000 total

When a cost limit is reached, the API will return a 429 response indicating the limit has been exceeded.

Benefits of Cost Control

  • Prevents overspending: Ensures budgets are adhered to.

  • Optimizes usage: Encourages efficient API consumption.

Routing

Configure dynamic routing to manage LLM traffic intelligently with fallback, script, and latency strategies in LangDB.

Dynamic Model Routing

LangDB AI Gateway enables sophisticated routing strategies for LLM requests. You can optimize AI traffic by implementing fallback routing, script-based routing, and latency-based routing.

The self-hosted option enables routing through configuration. Check out the full routing documentation for more details.

Example Configuration:

This configuration allows multiple targets with specific parameters, ensuring that requests are handled efficiently.

{
    "model": "router/dynamic",
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "What is the formula of a square plot?" }
    ],
    "router": {
        "router": "router",
        "type": "fallback",
        "targets": [
            { "model": "openai/gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 },
            { "model": "deepseek/deepseek-chat", "frequency_penalty": 1, "presence_penalty": 0.6 }
        ]
    },
    "stream": false
}
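
Against a self-hosted gateway, this body is sent like any other chat completion. A sketch using the same local endpoint as the earlier examples:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "router/dynamic",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the formula of a square plot?" }
    ],
    "router": {
      "router": "router",
      "type": "fallback",
      "targets": [
        { "model": "openai/gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 },
        { "model": "deepseek/deepseek-chat", "frequency_penalty": 1, "presence_penalty": 0.6 }
      ]
    },
    "stream": false
  }'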

Rate Limiting

Apply rate limiting, cost control and more.

Rate limiting is an essential mechanism to prevent API abuse by controlling the number of requests allowed within a specific time frame. You can configure hourly, daily, and monthly request limits.

This ensures fair usage and helps maintain system performance and stability.

# Limit requests per hour, day, and month
ai-gateway serve \
    --rate-hourly 1000 \
    --rate-daily 1000 \
    --rate-monthly 1000

Or in config.yaml:

rate_limit:
  hourly: 100
  daily: 1000
  monthly: 10000

When a rate limit is exceeded, the API will return a 429 (Too Many Requests) response.
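
A quick way to observe the limit from the shell is to print only the status code; once the configured rate is exhausted, the responses switch to 429:

# Print only the HTTP status code for each request
curl -s -o /dev/null -w "%{http_code}\n" \
  http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}'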

Why Rate Limiting Matters

  • Prevents excessive LLM API usage: Controls the number of requests per user to avoid resource exhaustion.

  • Optimizes model inference efficiency: Ensures that LLM requests are processed smoothly without congestion.


ClickHouse UDFs

Leveraging AI functions directly in your ClickHouse environment.

langdb_udf adds support for AI operations directly within ClickHouse through User-Defined Functions (UDFs). This enables running AI completions and embeddings natively in your SQL queries, with access to 250+ models directly in ClickHouse.

  • Check the full list of supported models on LangDB

  • You can find the full instructions in our AI Gateway repository.

Features

  • ai_completions: Generate AI completions from various models

  • ai_embed: Create embeddings from text

Why Use LangDB UDFs in ClickHouse?

LangDB UDFs are particularly powerful for running LLM-based evaluations and analysis directly within your ClickHouse environment:

  • Native Integration: Run AI operations directly in SQL queries without data movement

  • Batch Processing: Efficiently process and analyze large datasets with LLMs

  • Real-time Analysis: Perform content moderation, sentiment analysis, and other AI tasks as part of your data pipeline

  • Model Comparison: Easily compare results across different LLM models in a single query

  • Scalability: Leverage ClickHouse's distributed architecture for parallel AI processing

Prerequisites

  1. Get your LangDB credentials:

    • Sign up at LangDB

    • Get your LANGDB_PROJECT_ID and LANGDB_API_KEY

    • Download the latest langdb_udf binary

  2. Set up environment variables:

export LANGDB_PROJECT_ID=your_project_id
export LANGDB_API_KEY=your_api_key

Installation

# Clone the repository
git clone [email protected]:langdb/ai-gateway.git
cd ai-gateway

# Create directory for ClickHouse user scripts
mkdir -p docker/clickhouse/user_scripts

# Download the latest UDF
curl -sL https://github.com/langdb/ai-gateway/releases/download/0.1.0/langdb_udf \
  -o docker/clickhouse/user_scripts/langdb_udf

# Start ClickHouse with LangDB UDF
docker compose up -d
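
Once the containers are up, a minimal sanity check of the UDF from the shell might look like the following (mirroring the call style documented below; the model and prompts are placeholders):

# Placeholder model/prompts; requires LANGDB_PROJECT_ID and LANGDB_API_KEY to be set
clickhouse-client --query "SELECT ai_completions('{\"model\": \"gpt-4o-mini\"}')('You are a helpful assistant.', 'Say hello') AS reply"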

Usage Examples

Using ai_completions

Basic example with system prompt:

-- Set system prompt
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value sentiment score between 1 and 5 for every input and nothing else.';

-- Run completion
SELECT ai_completions
('{"model": "gpt-4o-mini", "max_tokens": 1000}') 
({system_prompt:String}, 'You are very rude') as score

Advanced Parameters

You can specify additional parameters like thread_id and run_id:

-- Set parameters
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value sentiment score between 1 and 5 for every input and nothing else.';

-- Generate UUIDs for tracking
SELECT generateUUIDv4();
SET param_thread_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
SET param_run_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';

-- Run completion with parameters
SELECT ai_completions
('{"model": "gpt-4o-mini", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
({system_prompt:String}, 'You are very rude') as score

Using ai_embed

Generate embeddings from text:

SELECT ai_embed
('{"model":"text-embedding-3-small"}')
('Life is beautiful') as embed_text

Real-world Example: Content Moderation

This example shows how to score HackerNews comments for harmful content:

-- Create and populate table
CREATE TABLE hackernews
ENGINE = MergeTree
ORDER BY id
SETTINGS allow_nullable_key = 1 EMPTY AS
SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.parquet', 'Parquet');

-- Insert sample data
INSERT INTO hackernews SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.parquet', 'Parquet') 
LIMIT 100;

-- Set up parameters
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value score between 0 and 5 for every input and nothing else based on malicious behavior. 0 being ok, 5 being the most harmful';
SET param_thread_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
SET param_run_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';

-- Score content using multiple models
WITH tbl as ( select * from hackernews limit 5)
SELECT  
    id, 
    left(text, 100) as text_clip, 
    ai_completions
    ('{"model": "gpt-4o-mini", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
    ({system_prompt:String}, text) as gpt_4o_mini_score,
    ai_completions
    ('{"model": "gemini/gemini-1.5-flash-8b", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
    ({system_prompt:String}, text) as gemini_15flash_score
FROM tbl 
FORMAT PrettySpace
   id        text_clip                                            gpt_4o_mini_score   gemini_15flash_score
1. 7544833   This is a project for people who like to read and    2                   2
2. 7544834   I appreciate your efforts to set the facts straigh   2                   2
3. 7544835   Here in Western Europe, earning $100,000 per year    1                   2
4. 7544836   Haha oh man so true. This is why I've found i        3                   2
5. 7544837   The thing is it's gotten more attention from         1                   2

Observability

If tracing is enabled, you'll be able to view several metrics about each request, such as cost, duration, and Time to First Token, at https://app.langdb.ai/

References

  • ClickHouse UDF Documentation

  • LangDB API Parameters Reference

  • Docker Compose Configuration


Retrieve pricing information

GET /pricing

Returns the pricing details for LangDB services.

Responses

200 – Successful retrieval of pricing information (application/json)

Example request:

GET /pricing HTTP/1.1
Host: api.us-east-1.langdb.ai
Accept: */*

Example response (200):

{
  "model": "gpt-3.5-turbo-0125",
  "provider": "openai",
  "price": {
    "per_input_token": 0.5,
    "per_output_token": 1.5,
    "valid_from": null
  },
  "input_formats": [
    "text"
  ],
  "output_formats": [
    "text"
  ],
  "capabilities": [
    "tools"
  ],
  "type": "completions",
  "limits": {
    "max_context_size": 16385
  }
}

List models

GET /models

Responses

200 – OK (application/json)

Example request:

GET /models HTTP/1.1
Host: api.us-east-1.langdb.ai
Accept: */*

Example response (200):

{
  "object": "list",
  "data": [
    {
      "id": "o1-mini",
      "object": "model",
      "created": 1686935002,
      "owned_by": "openai"
    }
  ]
}

Fetch analytics data

POST /analytics

Requires authorization (Bearer token).

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Body

  • start_time_us (integer · int64, optional) – Start time in microseconds. Example: 1693062345678

  • end_time_us (integer · int64, optional) – End time in microseconds. Example: 1693082345678

Responses

200 – Successful response (application/json)

Fetch analytics summary

POST /analytics/summary

Requires authorization (Bearer token).

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Body

  • start_time_us (integer · int64, optional) – Example: 1693062345678

  • end_time_us (integer · int64, optional) – Example: 1693082345678

  • groupBy (string[], required) – Example: ["provider"]

Responses

200 – Successful response (application/json)

Get total usage

POST /usage/total

Requires authorization (Bearer token).

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Body

  • start_time_us (integer · int64, optional) – Example: 1693062345678

  • end_time_us (integer · int64, optional)

Responses

200 – OK (application/json)

Retrieve a list of threads

POST /threads

Requires authorization (Bearer token).

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Body

  • limit (integer · min: 1, required) – Example: 10

  • offset (integer, required) – Example: 100

Responses

200 – A list of threads with pagination info (application/json)

Retrieve messages for a specific thread

GET /threads/{thread_id}/messages

Requires authorization (Bearer token).

Path parameters

  • thread_id (string · uuid, required) – The ID of the thread to retrieve messages from

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Responses

200 – A list of messages for the given thread (application/json)

Retrieve the total cost for a specific thread

GET /threads/{thread_id}/cost

Requires authorization (Bearer token).

Path parameters

  • thread_id (string · uuid, required) – The ID of the thread for which to retrieve cost information

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Responses

200 – The total cost and token usage for the specified thread (application/json)

Create chat completion

POST /v1/chat/completions

Requires authorization (Bearer token).

Header parameters

  • X-Project-Id (string, required) – LangDB project ID

Body

  • model (string, required) – ID of the model to use. This can be either a specific model ID or a virtual model identifier. Example: gpt-4o

  • temperature (number · max: 2, optional) – Sampling temperature. Example: 0.8

Responses

200 – OK (application/json)

Create embeddings

POST /v1/embeddings

Creates an embedding vector representing the input text or token arrays.

Requires authorization (Bearer token).

Body

  • model (string, required) – ID of the model to use for generating embeddings. Example: text-embedding-ada-002

  • input (string or string[], required) – The text to embed, or an array of text strings to embed.

  • encoding_format (string · enum, optional) – The format to return the embeddings in. Default: float

  • dimensions (integer · min: 1 · max: 1536, optional) – The number of dimensions the resulting embeddings should have. Example: 1536

Responses

200 – Successful response with embeddings (application/json)

Example Requests and Responses

Example: Fetch analytics data

POST /analytics HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Content-Type: application/json
Accept: */*
Content-Length: 59

{
  "start_time_us": 1693062345678,
  "end_time_us": 1693082345678
}

Example response (200):

{
  "timeseries": [
    {
      "hour": "2025-02-20 18:00:00",
      "total_cost": 12.34,
      "total_requests": 1000,
      "avg_duration": 250.5,
      "duration": 245.7,
      "duration_p99": 750.2,
      "duration_p95": 500.1,
      "duration_p90": 400.8,
      "duration_p50": 200.3,
      "total_duration": 1,
      "total_input_tokens": 1,
      "total_output_tokens": 1,
      "error_rate": 1,
      "error_request_count": 1,
      "avg_ttft": 1,
      "ttft": 1,
      "ttft_p99": 1,
      "ttft_p95": 1,
      "ttft_p90": 1,
      "ttft_p50": 1,
      "tps": 1,
      "tps_p99": 1,
      "tps_p95": 1,
      "tps_p90": 1,
      "tps_p50": 1,
      "tpot": 0.85,
      "tpot_p99": 1.5,
      "tpot_p95": 1.2,
      "tpot_p90": 1,
      "tpot_p50": 0.75,
      "tag_tuple": [
        "text"
      ]
    }
  ],
  "start_time_us": 1,
  "end_time_us": 1
}

Example: Fetch analytics summary

POST /analytics/summary HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Content-Type: application/json
Accept: */*
Content-Length: 82

{
  "start_time_us": 1693062345678,
  "end_time_us": 1693082345678,
  "groupBy": [
    "provider"
  ]
}

Example response (200):

{
  "summary": [
    {
      "tag_tuple": [
        "openai",
        "gpt-4"
      ],
      "total_cost": 156.78,
      "total_requests": 5000,
      "total_duration": 1250000,
      "avg_duration": 250,
      "duration": 245.5,
      "duration_p99": 750,
      "duration_p95": 500,
      "duration_p90": 400,
      "duration_p50": 200,
      "total_input_tokens": 100000,
      "total_output_tokens": 50000,
      "avg_ttft": 100,
      "ttft": 98.5,
      "ttft_p99": 300,
      "ttft_p95": 200,
      "ttft_p90": 150,
      "ttft_p50": 80,
      "tps": 10.5,
      "tps_p99": 20,
      "tps_p95": 15,
      "tps_p90": 12,
      "tps_p50": 8,
      "tpot": 0.85,
      "tpot_p99": 1.5,
      "tpot_p95": 1.2,
      "tpot_p90": 1,
      "tpot_p50": 0.75,
      "error_rate": 1,
      "error_request_count": 1
    }
  ],
  "start_time_us": 1,
  "end_time_us": 1
}

Example: Get total usage

POST /usage/total HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Content-Type: application/json
Accept: */*
Content-Length: 47

{
  "start_time_us": 1693062345678,
  "end_time_us": 1
}

Example response (200):

{
  "models": [
    {
      "provider": "openai",
      "model_name": "gpt-4o",
      "total_input_tokens": 3196182,
      "total_output_tokens": 74096,
      "total_cost": 10.4776979999,
      "cost_per_input_token": 3,
      "cost_per_output_token": 12
    }
  ],
  "total": {
    "total_input_tokens": 4181386,
    "total_output_tokens": 206547,
    "total_cost": 11.8904386859
  },
  "period_start": 1737504000000000,
  "period_end": 1740120949421000
}

Example: Retrieve a list of threads

POST /threads HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Content-Type: application/json
Accept: */*
Content-Length: 25

{
  "limit": 10,
  "offset": 100
}

Example response (200):

{
  "data": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "created_at": "2025-10-01T22:21:07.541Z",
      "updated_at": "2025-10-01T22:21:07.541Z",
      "model_name": "text",
      "project_id": "text",
      "score": 1,
      "title": "text",
      "user_id": "text"
    }
  ],
  "pagination": {
    "limit": 10,
    "offset": 100,
    "total": 10
  }
}

Example: Retrieve messages for a specific thread

GET /threads/{thread_id}/messages HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Accept: */*

Example response (200):

[
  {
    "model_name": "gpt-4o-mini",
    "thread_id": "123e4567-e89b-12d3-a456-426614174000",
    "user_id": "langdb",
    "content_type": "Text",
    "content": "text",
    "content_array": [
      "text"
    ],
    "type": "system",
    "tool_call_id": "123e4567-e89b-12d3-a456-426614174000",
    "tool_calls": "text",
    "created_at": "2025-01-29 10:25:00.736000",
    "id": "123e4567-e89b-12d3-a456-426614174000"
  }
]

Example: Retrieve the total cost for a specific thread

GET /threads/{thread_id}/cost HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Accept: */*

Example response (200):

{
  "total_cost": 0.022226999999999997,
  "total_output_tokens": 171,
  "total_input_tokens": 6725
}

Example: Create chat completion

POST /v1/chat/completions HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Project-Id: text
Content-Type: application/json
Accept: */*
Content-Length: 599

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Write a haiku about recursion in programming."
    }
  ],
  "temperature": 0.8,
  "max_tokens": 1000,
  "top_p": 0.9,
  "frequency_penalty": 0.1,
  "presence_penalty": 0.2,
  "stream": false,
  "response_format": "json_object",
  "mcp_servers": [
    {
      "name": "websearch",
      "type": "in-memory"
    }
  ],
  "router": {
    "name": "kg_random_router",
    "type": "script",
    "script": "const route = ({ request, headers, models, metrics }) => { return {model: 'test'};};"
  },
  "extra": {
    "guards": [
      "word_count_validator_bd4bdnun",
      "toxicity_detection_4yj4cdvu"
    ],
    "user": {
      "id": "7",
      "name": "mrunmay",
      "tags": [
        "coding",
        "software"
      ]
    }
  }
}

Example response (200):

{
  "id": "text",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 1,
      "message": {
        "role": "assistant",
        "content": "text"
      },
      "logprobs": {
        "content": [
          {
            "token": "text",
            "logprob": 1
          }
        ],
        "refusal": [
          {
            "token": "text",
            "logprob": 1
          }
        ]
      }
    }
  ],
  "created": 1,
  "model": "text",
  "service_tier": "scale",
  "system_fingerprint": "text",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 1,
    "total_tokens": 1,
    "prompt_tokens_details": {
      "cached_tokens": 1
    },
    "completion_tokens_details": {
      "reasoning_tokens": 1,
      "accepted_prediction_tokens": 1,
      "rejected_prediction_tokens": 1
    }
  }
}

Example: Create embeddings

POST /v1/embeddings HTTP/1.1
Host: api.us-east-1.langdb.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 136

{
  "input": "The food was delicious and the waiter was kind.",
  "model": "text-embedding-ada-002",
  "encoding_format": "float",
  "dimensions": 1536
}

Example response (200):

{
  "data": [
    {
      "embedding": [
        1
      ],
      "index": 1
    }
  ],
  "model": "text",
  "usage": {
    "prompt_tokens": 1,
    "total_tokens": 1
  }
}