Evaluation

You can evaluate model interactions in LangDB.

Data Collection

To evaluate model interactions, we extract message data from LangDB. This involves:

  • Fetching all messages from conversation threads using the LangDB API.

  • Exporting the data into a structured format such as a DataFrame (df) or CSV file.

import os

from pylangdb.client import LangDb

# Initialise the client with credentials from the environment
client = LangDb(
    api_key=os.getenv("LANGDB_API_KEY"),
    project_id=os.getenv("LANGDB_PROJECT_ID"))

thread_ids = [..., ..., ...]  # List of LangDB thread IDs to evaluate
df = client.create_evaluation_df(thread_ids)
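
Before computing metrics, it can help to inspect what the export actually contains. The exact columns depend on your project, so check df.columns rather than assuming field names:

print(df.shape)    # (number of messages, number of fields)
print(df.columns)  # available fields, e.g. thread_id, thread_total_cost
print(df.head())   # preview the first few rows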

Cost Calculation

Once the data is collected, we can compute:

  • Total cost: Sum of the cost of all interactions.

  • Average cost: Average cost per message.

print(f"Total cost across all threads: ${df['thread_total_cost'].sum():.4f}")
thread_costs = df.groupby('thread_id')['thread_total_cost'].sum()
avg_cost = df['thread_total_cost'].sum() / len(df)
print(f"\nAverage cost per message: ${avg_cost:.4f}")

Custom Evaluations

Beyond cost analysis, the message data lets you derive deeper insights, such as topic distribution and usage trends, as in the sketch below.
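
As a minimal sketch, the snippet below counts messages per keyword-defined topic and tallies daily message volume. The content and created_at column names are hypothetical and used for illustration only; substitute whichever fields your evaluation DataFrame actually exposes.

import pandas as pd

# Hypothetical column names -- adjust to match df.columns in your export
MESSAGE_COL = "content"
TIMESTAMP_COL = "created_at"

# Rough topic distribution via keyword matching (illustrative buckets)
topics = {
    "billing": ["invoice", "payment", "refund"],
    "support": ["error", "bug", "help"],
}
for topic, keywords in topics.items():
    pattern = "|".join(keywords)
    count = df[MESSAGE_COL].str.contains(pattern, case=False, na=False).sum()
    print(f"{topic}: {count} messages")

# Daily message volume to surface usage trends
timestamps = pd.to_datetime(df[TIMESTAMP_COL])
daily_counts = df.assign(ts=timestamps).set_index("ts").resample("D").size()
print(daily_counts)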

For more evaluations, check out the full notebook!
