langdb_udf adds support for AI operations directly within ClickHouse through User Defined Functions (UDFs). This enables running AI completions and embeddings natively in your SQL queries. You can access 250+ models directly in Clickhouse.
# Clone the repository
git clone git@github.com:langdb/ai-gateway.git
cd ai-gateway
# Create directory for ClickHouse user scripts
mkdir -p docker/clickhouse/user_scripts
# Download the latest UDF
curl -sL https://github.com/langdb/ai-gateway/releases/download/0.1.0/langdb_udf \
-o docker/clickhouse/user_scripts/langdb_udf
# Start ClickHouse with LangDB UDF
docker compose up -d
Usage Examples
Using ai_completions
Basic example with system prompt:
-- Set system prompt
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value sentiment score between 1 and 5 for every input and nothing else.';
-- Run completion
SELECT ai_completions
('{"model": "gpt-4o-mini", "max_tokens": 1000}')
({system_prompt:String}, 'You are very rude') as score
Advanced Parameters
You can specify additional parameters like thread_id and run_id:
-- Set parameters
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value sentiment score between 1 and 5 for every input and nothing else.';
-- Generate UUIDs for tracking
SELECT generateUUIDv4();
SET param_thread_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
SET param_run_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
-- Run completion with parameters
SELECT ai_completions
('{"model": "gpt-4o-mini", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
({system_prompt:String}, 'You are very rude') as score
Using ai_embed
Generate embeddings from text:
SELECT ai_embed
('{"model":"text-embedding-3-small"}')
('Life is beautiful') as embed_text
Real-world Example: Content Moderation
This example shows how to score HackerNews comments for harmful content:
-- Create and populate table
CREATE TABLE hackernews
ENGINE = MergeTree
ORDER BY id
SETTINGS allow_nullable_key = 1 EMPTY AS
SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.parquet', 'Parquet');
-- Insert sample data
INSERT INTO hackernews SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.parquet', 'Parquet')
LIMIT 100;
-- Set up parameters
SET param_system_prompt = 'You are a helpful assistant. You will return only a single value score between 1 and 5 for every input and nothing else based on malicious behavior. 0 being ok, 5 being the most harmful';
SET param_thread_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
SET param_run_id = '06b66882-e42e-4b17-ba93-4b5260a10ad8';
-- Score content using multiple models
WITH tbl as ( select * from hackernews limit 5)
SELECT
id,
left(text, 100) as text_clip,
ai_completions
('{"model": "gpt-4o-mini", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
({system_prompt:String}, text) as gpt_4o_mini_score,
ai_completions
('{"model": "gemini/gemini-1.5-flash-8b", "max_tokens": 1000, "thread_id": "' || {thread_id:String} || '", "run_id": "' || {run_id:String} || '"}')
({system_prompt:String}, text) as gemini_15flash_score
FROM tbl
FORMAT PrettySpace
id text_clip gpt_4o_mini_score gemini_15flash_score
1. 7544833 This is a project for people who like to read and 2 2
2. 7544834 I appreciate your efforts to set the facts straigh 2 2
3. 7544835 Here in Western Europe, earning $100,000 per year 1 2
4. 7544836 Haha oh man so true. This is why I've found i 3 2
5. 7544837 The thing is it's gotten more attention from 1 2
Observability
If tracing is enabled you ll be able to view several metrics about the request such as cost, time, Time to First Token etc on https://app.langdb.ai/