Evaluation

You can evaluate model interactions in LangDB by exporting message data and analyzing it.

Data Collection

To evaluate model interactions, we extract message data from LangDB. This involves:

  • Fetching all messages from conversation threads using the LangDB API.

  • Exporting the data into a structured format such as a DataFrame (df) or CSV file.

import os
from pylangdb.client import LangDb

# Initialise the client with your LangDB credentials
client = LangDb(
    api_key=os.getenv("LANGDB_API_KEY"),
    project_id=os.getenv("LANGDB_PROJECT_ID"))

thread_ids = [..., ..., ...]  # LangDB thread IDs to evaluate
df = client.create_evaluation_df(thread_ids)
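
If you want to analyze the data outside Python, you can also export it to CSV. This is a minimal sketch, assuming the returned df is a pandas DataFrame and using an example output path:

# Export the evaluation data for offline analysis
# ("evaluations.csv" is an example path; change it as needed)
df.to_csv("evaluations.csv", index=False)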

Cost Calculation

Once the data is collected, we can compute:

  • Total cost: Sum of the cost of all interactions.

  • Average cost: Total cost divided by the number of messages.

print(f"Total cost across all threads: ${df['thread_total_cost'].sum():.4f}")
thread_costs = df.groupby('thread_id')['thread_total_cost'].sum()
avg_cost = df['thread_total_cost'].sum() / len(df)
print(f"\nAverage cost per message: ${avg_cost:.4f}")

Custom Evaluations

Beyond cost analysis, the collected messages allow deeper analysis, such as topic distribution and usage trends.

# Analyze topic distribution
# (`analyzer` is a custom topic-analysis helper; a sketch is shown after the example output)
topics = analyzer.get_topic_distribution(thread_ids)
print("\nTopic Distribution Results:")
print(topics)

Example Output:

{
    "topic_distribution": {
        "Programming Languages": 5,
        "Python Concepts": 6,
        "Web Development": 2,
        "Error Handling": 1,
        "Testing": 1,
        "Optimization": 1
    },
    "total_messages": 10
}
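
The `analyzer` object used above is not defined in the snippet. A minimal sketch of such a helper, assuming the evaluation DataFrame has `thread_id` and `content` columns and using simple keyword matching (both are assumptions; your schema and classification method, e.g. an LLM classifier, may differ), could look like this:

from collections import Counter

class TopicAnalyzer:
    """Hypothetical helper that classifies messages by keyword matching.

    Assumes the evaluation DataFrame has 'thread_id' and 'content' columns;
    adjust the column names to match your export.
    """

    # Example keyword-to-topic mapping; extend as needed
    TOPICS = {
        "python": "Python Concepts",
        "javascript": "Programming Languages",
        "html": "Web Development",
        "exception": "Error Handling",
        "test": "Testing",
    }

    def __init__(self, df):
        self.df = df

    def get_topic_distribution(self, thread_ids):
        # Keep only messages from the requested threads
        messages = self.df[self.df["thread_id"].isin(thread_ids)]
        counts = Counter()
        for text in messages["content"].fillna(""):
            lowered = text.lower()
            for keyword, topic in self.TOPICS.items():
                if keyword in lowered:
                    counts[topic] += 1
        return {
            "topic_distribution": dict(counts),
            "total_messages": len(messages),
        }

analyzer = TopicAnalyzer(df)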

For more evaluations, check out the full notebook!
