housing_qa_knowledge_only


4 submissions

Rank  Model                      accuracy  f1_macro  Date
1     gpt-5-2025-08-07           0.715     0.705     2025-08-08
2     claude-3-haiku-20240307    0.593     0.588     2025-08-04
3     claude-3-5-haiku-20241022  0.584     0.580     2025-08-04
4     gpt-4o-mini-2024-07-18     0.544     0.544     2025-08-04