hearsay


19 submissions

Rank  Model                                          accuracy  balanced_accuracy  f1_macro  f1_micro  valid_predictions_ratio  Date
1     claude-opus-4-1-20250805                       0.915     0.919              0.914     0.915     1.000                    2025-08-05
2     claude-opus-4-20250514                         0.904     0.907              0.903     0.904     1.000                    2025-07-25
3     grok-3-mini                                    0.883     0.885              0.882     0.883     1.000                    2025-07-29
4     claude-sonnet-4-20250514                       0.862     0.871              0.860     0.862     0.926                    2025-07-25
5     grok-4-0709                                    0.851     0.838              0.844     0.851     1.000                    2025-07-30
6     o4-mini                                        0.830     0.810              0.818     0.830     1.000                    2025-07-08
7     o3                                             0.798     0.777              0.782     0.798     1.000                    2025-07-08
8     gpt-5-2025-08-07                               0.787     0.762              0.767     0.787     1.000                    2025-08-08
9     openai/gpt-oss-120b                            0.777     0.763              0.767     0.777     1.000                    2025-08-05
10    deepseek-ai/DeepSeek-V3                        0.755     0.733              0.737     0.755     1.000                    2025-07-10
11    claude-3-5-haiku-20241022                      0.745     0.768              0.743     0.745     1.000                    2025-08-01
12    deepseek-ai/DeepSeek-R1                        0.740     0.724              0.727     0.740     0.802                    2025-07-11
13    gpt-4o-mini                                    0.734     0.723              0.725     0.734     1.000                    2025-07-02
14    meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo   0.734     0.703              0.702     0.734     1.000                    2025-07-25
15    o3-mini                                        0.723     0.699              0.700     0.723     1.000                    2025-07-09
16    meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo  0.723     0.691              0.687     0.723     1.000                    2025-07-30
17    google/gemma-2-27b-it                          0.713     0.709              0.709     0.713     1.000                    2025-07-24
18    claude-3-haiku-20240307                        0.691     0.707              0.691     0.691     1.000                    2025-07-25
19    gpt-4.1-nano                                   0.617     0.636              0.615     0.617     1.000                    2025-07-03
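
The metric columns follow the usual classification definitions, and a minimal sketch of how such numbers could be computed is given here. It is not the leaderboard's own scoring code. The function name `classification_metrics`, the label set `("Yes", "No")`, and the choice to score only predictions that parse to a known label are all assumptions made for illustration; the page does not say how invalid predictions are handled.

```python
def classification_metrics(y_true, y_pred, labels=("Yes", "No")):
    """Hypothetical re-implementation of the leaderboard's metric columns."""
    # Assumption: a prediction is "valid" if it is one of the known labels,
    # and only valid predictions are scored.
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p in labels]
    valid_ratio = len(pairs) / len(y_true) if y_true else 0.0
    zero = {"accuracy": 0.0, "balanced_accuracy": 0.0,
            "f1_macro": 0.0, "f1_micro": 0.0,
            "valid_predictions_ratio": valid_ratio}
    if not pairs:
        return zero

    def f1(tp, fp, fn):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when nothing to score.
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    recalls, f1s = [], []
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        if tp + fn:  # only classes that occur in the ground truth
            recalls.append(tp / (tp + fn))
        f1s.append(f1(tp, fp, fn))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn

    return {
        # Fraction of scored examples predicted correctly.
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        # Mean of per-class recall.
        "balanced_accuracy": sum(recalls) / len(recalls) if recalls else 0.0,
        # Unweighted mean of per-class F1.
        "f1_macro": sum(f1s) / len(f1s),
        # F1 over pooled counts across classes.
        "f1_micro": f1(tp_sum, fp_sum, fn_sum),
        "valid_predictions_ratio": valid_ratio,
    }


if __name__ == "__main__":
    truth = ["Yes", "Yes", "No", "No", "No"]
    preds = ["Yes", "No", "No", "No", "maybe"]  # last one is unparseable
    print(classification_metrics(truth, preds))
```

Under these assumptions, micro-averaged F1 over single-label predictions reduces to plain accuracy, which is consistent with the f1_micro and accuracy columns being identical in every row of the table.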