AI Agent Evaluations
OpenTelemetry-based evaluation dashboard for bilingual AI agents
Accuracy
87.5%
240.0%
Avg Latency
1170ms
5% vs last week
Pass Rate
87.5%
Cost/1k Requests
$0.42
E-commerce Support Agent
Customer support for e-commerce queries
production
Model
GPT-4-Turbo
Version
1.2.0
Languages
English, Arabic
Last Evaluated
11/15/2024
| Test Case | Language | Score | Latency | Status |
|---|---|---|---|---|
| Product Return Request | English | | 1240ms | Pass |
| Shipping Delay Complaint | English | | 1180ms | Fail |
| طلب استرجاع منتج (Product Return Request) | Arabic | | 1320ms | Pass |
| Product Availability Check | English | | 890ms | Pass |
| استفسار عن طرق الدفع (Payment Methods Inquiry) | Arabic | | 1050ms | Pass |
| Discount Code Issue | English | | 1420ms | Pass |
| Order Status Inquiry | English | | 980ms | Pass |
| شكوى جودة المنتج (Product Quality Complaint) | Arabic | | 1280ms | Pass |
Total Tests
8
Passed
7
Failed
1
Avg Score
90.8%
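
The headline figures (Total Tests, Passed, Failed, Pass Rate, Avg Latency) can be cross-checked against the table rows. The following is a minimal sketch in Python, assuming the rows are transcribed by hand into tuples. The names `RESULTS` and `summarize` are hypothetical and not part of the dashboard. Per-test scores are not visible in the table, so Avg Score is not recomputed here.

```python
from statistics import mean

# (test case, language, latency in ms, status), transcribed from the table.
# RESULTS and summarize are illustrative names, not part of the dashboard.
RESULTS = [
    ("Product Return Request", "English", 1240, "Pass"),
    ("Shipping Delay Complaint", "English", 1180, "Fail"),
    ("طلب استرجاع منتج", "Arabic", 1320, "Pass"),
    ("Product Availability Check", "English", 890, "Pass"),
    ("استفسار عن طرق الدفع", "Arabic", 1050, "Pass"),
    ("Discount Code Issue", "English", 1420, "Pass"),
    ("Order Status Inquiry", "English", 980, "Pass"),
    ("شكوى جودة المنتج", "Arabic", 1280, "Pass"),
]


def summarize(results):
    """Aggregate pass/fail counts, pass rate, and mean latency."""
    total = len(results)
    passed = sum(1 for _, _, _, status in results if status == "Pass")
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate_pct": round(100 * passed / total, 1),
        "avg_latency_ms": round(mean(latency for _, _, latency, _ in results)),
    }


summary = summarize(RESULTS)
print(summary)
```

With the eight rows above, this gives 7 of 8 tests passed (87.5%) and a mean latency of 1170ms, matching the cards at the top of the page.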