maud_cor_standard_(superior_offer)

Task Description: A multiple-choice task in which the model must select the answer that best characterizes the standard a merger agreement sets for the board when it determines whether to change its recommendation in connection with a superior offer.
Task Type: 10-way classification
Document Type: merger agreement
Number of Samples: 101
Input Length Range: 80-1630 tokens
Evaluation Metrics: accuracy (maximize), balanced_accuracy (maximize), f1_macro (maximize), f1_micro (maximize), valid_predictions_ratio (maximize)
Tags: interpretation, merger agreement
Paper: LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Dataset Download: https://hazyresearch.stanford.edu/legalbench/

7 submissions

Rank	Model	accuracy	balanced_accuracy	f1_macro	f1_micro	valid_predictions_ratio	Date
1	claude-3-5-haiku-20241022	0.570	0.259	0.263	0.570	1.000	2025-08-01
2	meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	0.330	0.152	0.184	0.330	1.000	2025-07-25
3	google/gemma-2-27b-it	0.150	0.056	0.078	0.150	1.000	2025-07-24
4	gpt-4.1-nano	0.040	0.016	0.025	0.040	1.000	2025-07-03
5	meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo	0.040	0.018	0.028	0.040	1.000	2025-08-03
6	gpt-4o-mini	0.020	0.007	0.011	0.020	1.000	2025-07-02
7	claude-3-haiku-20240307	0.010	0.004	0.008	0.010	1.000	2025-07-28
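
The five metric columns can be reproduced from a list of gold labels and a list of model predictions. The sketch below is a minimal standard-library illustration, not the leaderboard's actual scoring code: the function name `classification_metrics`, the `valid_labels` parameter, and the choice to count an invalid prediction as an ordinary wrong answer are all assumptions. It also shows why f1_micro equals accuracy in every row of the table: in a single-label task each wrong prediction adds exactly one false positive and one false negative, so the two quantities coincide.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, valid_labels):
    """Compute the five leaderboard metrics for a single-label task.

    Assumption: predictions outside `valid_labels` are scored as wrong
    answers and only lower `valid_predictions_ratio`.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    n = len(y_true)
    valid = set(valid_labels)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the gold class was missed
            fp[pred] += 1   # the predicted class was wrongly chosen

    def f1(label):
        denom = 2 * tp[label] + fp[label] + fn[label]
        return 2 * tp[label] / denom if denom else 0.0

    correct = sum(tp.values())
    # Balanced accuracy: mean recall over the classes present in the gold labels.
    true_classes = set(y_true)
    recalls = [tp[c] / (tp[c] + fn[c]) for c in true_classes]
    # Macro F1: unweighted mean F1 over every label seen in gold or predictions.
    all_labels = set(y_true) | set(y_pred)
    # Micro F1: pool the counts over all labels before computing F1.
    micro_denom = 2 * correct + sum(fp.values()) + sum(fn.values())

    return {
        "accuracy": correct / n,
        "balanced_accuracy": sum(recalls) / len(recalls),
        "f1_macro": sum(f1(c) for c in all_labels) / len(all_labels),
        "f1_micro": 2 * correct / micro_denom,
        "valid_predictions_ratio": sum(p in valid for p in y_pred) / n,
    }


if __name__ == "__main__":
    # Toy data with three hypothetical answer labels, not taken from the dataset.
    gold = ["A", "A", "B", "C"]
    predicted = ["A", "B", "B", "A"]
    for name, value in classification_metrics(gold, predicted, "ABC").items():
        print(f"{name}: {value:.3f}")
```

Because balanced accuracy and macro F1 weight every class equally, a model that mostly predicts the most common answers can score far higher on accuracy than on those two metrics. That pattern matches the gap between the accuracy and balanced_accuracy columns in the table.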