LIBAOS AI 小幫手

production env · 1 suite · 2 completed runs

Subject № 8b573a9c-28ac-4682-ad6f-ce0e52425adb PRODUCTION
Evaluation status · pending regeneration

At least 1 suite has drifted; regeneration recommended

7 scenarios · 66 KB items · 1 suite (1 stale)
FLEET READY · ALL KINDS COVERED

30 cases · LIBAOS AI 小幫手 (bulk R1)

kb_accuracy 10
scenario_funnel 10
mixed_qa 10
uncategorized 0
01

Vital signs

[KIND × DIMENSION] vital signs — this bot's per-dim clearance vs. its baseline
Knowledge base accuracy [PASS]
Retrieval 100.0% 100.0% ≥95.0% [±5pp] +5.0 ✓
Faithfulness 100.0% 100.0% ≥92.0% [±8pp] +8.0 ✓
Answer quality 99.0% 93.3% ≥91.0% [±8pp] +2.3 ✓
Scenario invocation and completion [FAIL]
Scenario 100.0% 90.0% <95.0% [±5pp] -5.0 ✗
Tool use 90.0% 90.0% ≥85.0% [±5pp] +5.0 ✓
Answer quality 71.3% 78.3% ≥70.0% [floor] +8.3 ✓
Conversational competence (mixed Q&A) [FAIL]
Retrieval 50.0% 50.0% <70.0% [floor] -20.0 ✗
Faithfulness 83.3% 83.3% ≥75.3% [±8pp] +8.0 ✓
Answer quality 82.3% 82.0% ≥74.3% [±8pp] +7.7 ✓
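
The columns in the rows above carry no headers, but the figures are consistent with one reading: the first percentage is the baseline, the second is the current run, the threshold is the baseline minus the bracketed tolerance (or a floor where the row shows [floor]), and the signed number is the current value minus that threshold. The sketch below reproduces that arithmetic as an illustration only. The function names, the 70.0 floor (taken from the two [floor] rows), and the tolerance applied on floor-limited rows are assumptions, not the dashboard's actual implementation.

```python
# Hypothetical reconstruction of the clearance check; not the dashboard's code.
ASSUMED_FLOOR = 70.0  # assumption: the value shown on both [floor] rows


def clearance(baseline, current, tolerance_pp, floor=ASSUMED_FLOOR):
    """Return (threshold, margin, passed) for one dimension, in percentage points."""
    # Threshold is the baseline minus the tolerance band, but never below the floor.
    threshold = round(max(baseline - tolerance_pp, floor), 1)
    margin = round(current - threshold, 1)
    return threshold, margin, current >= threshold


def kind_verdict(rows):
    """A kind passes only if every one of its dimensions clears its threshold."""
    return "PASS" if all(clearance(*row)[2] for row in rows) else "FAIL"


# (baseline, current, tolerance_pp) taken from the second group of rows above;
# the 8pp tolerance on the floor-limited row is an assumption.
scenario_rows = [(100.0, 90.0, 5), (90.0, 90.0, 5), (71.3, 78.3, 8)]

print(clearance(99.0, 93.3, 8))     # the row with 99.0% baseline and 93.3% current
print(clearance(71.3, 78.3, 8))     # a floor-limited row
print(kind_verdict(scenario_rows))  # one failing dimension fails the whole kind
```

Under this reading, a kind is marked [FAIL] as soon as any one dimension falls below its threshold, which matches the two failing groups: each has exactly one ✗ row.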
02

Test suites