LegalBench (Full)

LegalBench is a comprehensive benchmark for evaluating legal reasoning in large language models (LLMs). This leaderboard aggregates performance across all 161 tasks in LegalBench, which span a wide range of legal domains, task types, and difficulty levels. For background on the benchmark, see the LegalBench paper (Guha et al., 2023).

Note: Only models that have been evaluated on all 161 tasks in this preset are included in the leaderboard.


Rank  Model                                          Wins  Average Rank  Raw Metric Avg
1     meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo     58          2.30          0.7580
2     claude-3-5-haiku-20241022                        43          2.94          0.7249
3     gpt-4o-mini                                      24          3.02          0.7085
4     google/gemma-2-27b-it                            22          3.36          0.6638
5     claude-3-haiku-20240307                          44          3.65          0.5738
6     gpt-4.1-nano                                      6          4.83          0.5250

Tasks in This Benchmark