2025 LangDB. All rights reserved.

Configuring Data Retention

Control trace data retention in LangDB with scalable, cost-effective strategies using ClickHouse background TTL processes and tiered materialized views.

Overview

This document outlines LangDB's data retention strategy for tracing information stored in ClickHouse. The strategy uses materialized views to manage retention periods efficiently according to each user's subscription tier. Data eviction relies on ClickHouse's TTL (Time-To-Live) mechanism and its background processes:

  • TTL Definitions: Each table includes TTL expressions that specify when data should expire based on timestamp fields

  • Background Merge Process: ClickHouse automatically runs background processes that merge data parts and remove expired data during these merge operations

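  • Resource-Efficient: Eviction runs asynchronously as part of background merges rather than as a blocking delete, which limits its impact on query performance. Expired rows can therefore remain visible for a short time, until a merge processes the part that holds them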
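
To make the TTL behavior described above concrete, the statements below sketch how an operator might inspect and nudge TTL processing on one of the tier tables. This is an illustrative sketch, not LangDB's documented procedure: the table name is taken from later in this page, and the one-hour value is an arbitrary example.

```sql
-- Show the full table definition, including the TTL clause currently attached.
SHOW CREATE TABLE langdb.traces_professional;

-- Assumption: more frequent TTL merges are desirable for this table.
-- merge_with_ttl_timeout is the minimum delay, in seconds, between
-- TTL-driven merges (documented default: 14400, i.e. 4 hours).
ALTER TABLE langdb.traces_professional
    MODIFY SETTING merge_with_ttl_timeout = 3600;

-- Re-evaluate the TTL rule for existing parts. This runs as a mutation
-- and can be I/O heavy on large tables, so use it sparingly.
ALTER TABLE langdb.traces_professional MATERIALIZE TTL;
```

Lowering the timeout trades extra merge activity for faster removal of expired rows, so the default is usually a reasonable starting point.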

Tracing Data Architecture

LangDB stores and analyzes trace data in three layers:

  • Primary Storage: All trace data is initially stored in the langdb.traces table in ClickHouse

  • Materialized Views: Tier-specific materialized views filter and retain data based on user subscription levels

  • Retention Policies: Automated TTL (Time-To-Live) mechanisms enforce retention periods

Implementation using Materialized Views

Tier-Specific Materialized Views

Professional Tier View

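-- Create the target table first: a materialized view with a TO clause
-- needs an existing destination table.
CREATE TABLE langdb.traces_professional (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
-- Rows expire 30 days after their timestamp.
TTL timestamp + toIntervalDay(30);

-- Copies rows into the Professional tier table as they are inserted into langdb.traces.
CREATE MATERIALIZED VIEW langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces;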

Enterprise Tier View

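-- Create the target table first: a materialized view with a TO clause
-- needs an existing destination table.
CREATE TABLE langdb.traces_enterprise (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
-- Rows expire 90 days after their timestamp.
TTL timestamp + toIntervalDay(90);

-- Copies rows into the Enterprise tier table as they are inserted into langdb.traces.
CREATE MATERIALIZED VIEW langdb.traces_enterprise_mv
TO langdb.traces_enterprise
AS SELECT *
FROM langdb.traces;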
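
The view definitions above select every row from the base table. If rows need to be routed by tier, a WHERE clause on the view is one way to do it. The sketch below assumes a hypothetical subscription_tier column on langdb.traces; the source does not confirm that such a column exists. It also shows a one-off backfill, since a ClickHouse materialized view only processes rows inserted after the view is created.

```sql
-- Hypothetical variant of the Professional view with a tier filter.
-- Assumption: langdb.traces has a subscription_tier column (not confirmed by the source).
CREATE MATERIALIZED VIEW IF NOT EXISTS langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces
WHERE subscription_tier = 'professional';

-- One-off backfill of rows that were already in the base table before
-- the view existed, limited to the 30-day retention window.
INSERT INTO langdb.traces_professional
SELECT *
FROM langdb.traces
WHERE subscription_tier = 'professional'
  AND timestamp >= now() - INTERVAL 30 DAY;
```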

Data Access Flow

  1. New trace data is inserted into the base langdb.traces table

  2. Materialized views automatically filter and copy relevant data to tier-specific tables

  3. TTL rules remove data older than each table's retention period during background merges

  4. Data access APIs query the appropriate table based on the user's subscription tier
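
As an illustration of step 4, a data access API serving a Professional tier user might issue a query like the one below. It is a sketch under assumptions: user_id is treated as a String (the source does not state its type), and {user_id:String} is a ClickHouse query parameter supplied by the caller.

```sql
-- Fetch the most recent traces for one user from the Professional tier table.
-- The explicit time filter matches the 30-day retention period, so rows that
-- have expired but have not yet been merged away are never returned.
SELECT *
FROM langdb.traces_professional
WHERE user_id = {user_id:String}
  AND timestamp >= now() - INTERVAL 30 DAY
ORDER BY timestamp DESC
LIMIT 100;
```

With clickhouse-client, the parameter can be passed as --param_user_id=<value>.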

Benefits of This Approach

  • Efficiency: Only store data for the period necessary based on customer tier

  • Performance: Queries run against smaller, tier-specific tables rather than the entire dataset

  • Compliance: Clear retention boundaries help with regulatory compliance

  • Cost-Effective: Optimizes storage costs by aligning retention with customer value

Backup and Disaster Recovery

While the retention strategy focuses on operational access to trace data, a separate backup strategy ensures data can be recovered in case of system failures:

  • Daily snapshots of ClickHouse data

  • Backups retained for 365 days, long enough to cover the longest tier retention period

  • Geo-redundant storage of backups
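
A daily snapshot could be taken with ClickHouse's built-in BACKUP and RESTORE statements, as sketched below. This is not necessarily how LangDB takes its backups: the disk name backups and the archive file name are placeholders, and the disk must be declared as an allowed backup destination in the server configuration.

```sql
-- Back up the whole langdb database to a preconfigured backup disk.
-- 'backups' and 'langdb-daily.zip' are placeholder names.
BACKUP DATABASE langdb
TO Disk('backups', 'langdb-daily.zip');

-- Restore a single table under a new name so it can be inspected
-- without overwriting the live table.
RESTORE TABLE langdb.traces_enterprise AS langdb.traces_enterprise_restored
FROM Disk('backups', 'langdb-daily.zip');
```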

Monitoring and Management

The retention system includes:

  • Monitoring dashboards for data volume by tier

  • Alerts for unexpected growth or retention failures

  • Regular audits to ensure compliance with retention policies
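
The queries below sketch what could feed such dashboards and audits, using ClickHouse's system.parts table. The table name pattern and the one-day grace period are assumptions made for illustration.

```sql
-- Data volume per trace table, counting only active parts.
SELECT
    table,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'langdb'
  AND table LIKE 'traces%'
  AND active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;

-- Retention audit for the Professional tier: rows older than the 30-day
-- period plus a one-day grace window for background merges.
SELECT
    count() AS overdue_rows,
    min(timestamp) AS oldest_row
FROM langdb.traces_professional
WHERE timestamp < now() - INTERVAL 31 DAY;
```

An overdue_rows count that stays above zero across several checks suggests TTL merges are not keeping up and could back a retention failure alert.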

Future Enhancements

  • Implementation of custom retention periods for specific enterprise customers

  • Cold storage options for extended archival needs

  • Advanced sampling techniques to retain representative trace data beyond standard periods
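
ClickHouse already has building blocks that the first two enhancements could use. The statements below are two alternative sketches, not a description of planned LangDB behavior: the 180-day and 365-day periods are example values, and the volume name cold assumes a storage policy that defines such a volume.

```sql
-- Alternative 1: a custom retention period, set by changing the table TTL.
ALTER TABLE langdb.traces_enterprise
    MODIFY TTL timestamp + INTERVAL 180 DAY;

-- Alternative 2: tiered storage. Move parts to a cheaper volume after
-- 90 days and delete them after 365 days.
-- Assumption: the table's storage policy defines a volume named 'cold'.
ALTER TABLE langdb.traces_enterprise
    MODIFY TTL
        timestamp + INTERVAL 90 DAY TO VOLUME 'cold',
        timestamp + INTERVAL 365 DAY DELETE;
```

Because a TTL applies to the whole table, a per-customer period would need either a dedicated table per customer or a conditional TTL rule.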

