AI Agent Evaluations

OpenTelemetry-based evaluation dashboard for bilingual AI agents

Accuracy

87.5%
240.0%

Avg Latency

1170ms
5% vs last week

Pass Rate

87.5%

Cost/1k Requests

$0.42

E-commerce Support Agent

Customer support for e-commerce queries

production

Model

GPT-4-Turbo

Version

1.2.0

Languages

English, Arabic

Last Evaluated

11/15/2024

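| Test Case | Language | Score | Latency | Status |
|---|---|---|---|---|
| Product Return Request | English | 95% | 1240 ms | Pass |
| Shipping Delay Complaint | English | 72% | 1180 ms | Fail |
| طلب استرجاع منتج (Product Return Request) | Arabic | 98% | 1320 ms | Pass |
| Product Availability Check | English | 88% | 890 ms | Pass |
| استفسار عن طرق الدفع (Payment Methods Inquiry) | Arabic | 85% | 1050 ms | Pass |
| Discount Code Issue | English | 92% | 1420 ms | Pass |
| Order Status Inquiry | English | 100% | 980 ms | Pass |
| شكوى جودة المنتج (Product Quality Complaint) | Arabic | 96% | 1280 ms | Pass |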

Total Tests

8

Passed

7

Failed

1

Avg Score

90.8%
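
The summary figures can be recomputed from the per-test results as a sanity check. The sketch below is a minimal illustration, not the dashboard's actual implementation: the `results` list is transcribed by hand from the test-case rows, and the `summarize` helper and its field names are hypothetical. The pass/fail status is read directly from each row, since the dashboard does not state the score threshold it uses.

```python
from statistics import mean

# Rows transcribed from the test-case results:
# (test case, language, score in %, latency in ms, status)
results = [
    ("Product Return Request", "English", 95, 1240, "Pass"),
    ("Shipping Delay Complaint", "English", 72, 1180, "Fail"),
    ("Product Return Request (Arabic)", "Arabic", 98, 1320, "Pass"),
    ("Product Availability Check", "English", 88, 890, "Pass"),
    ("Payment Methods Inquiry (Arabic)", "Arabic", 85, 1050, "Pass"),
    ("Discount Code Issue", "English", 92, 1420, "Pass"),
    ("Order Status Inquiry", "English", 100, 980, "Pass"),
    ("Product Quality Complaint (Arabic)", "Arabic", 96, 1280, "Pass"),
]


def summarize(rows):
    """Hypothetical helper: aggregate per-test rows into summary metrics."""
    total = len(rows)
    # Use the recorded status rather than guessing a score threshold.
    passed = sum(1 for row in rows if row[4] == "Pass")
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate": 100 * passed / total,
        "avg_score": mean(row[2] for row in rows),
        "avg_latency_ms": mean(row[3] for row in rows),
    }


summary = summarize(results)
print(summary)
```

With these rows the pass rate comes out to 87.5%, the mean latency to 1170 ms, and the mean score to 90.75, which the dashboard displays rounded as 90.8%.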