maud_cor_standard_(superior_offer)
- Task Description: A multiple-choice task in which the model must select the answer that best characterizes the standard the merger agreement sets for the board when it decides whether to change its recommendation in response to a superior offer.
- Task Type: 10-way classification
- Document Type: merger agreement
- Number of Samples: 101
- Input Length Range: 80-1630 tokens
- Evaluation Metrics: accuracy (maximize), balanced_accuracy (maximize), f1_macro (maximize), f1_micro (maximize), valid_predictions_ratio (maximize)
- Tags: interpretation, merger agreement
- Paper: LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Dataset Download: https://hazyresearch.stanford.edu/legalbench/
7 submissions
| Rank | Model | accuracy | balanced_accuracy | f1_macro | f1_micro | valid_predictions_ratio | Date |
|---|---|---|---|---|---|---|---|
| 1 | claude-3-5-haiku-20241022 | 0.570 | 0.259 | 0.263 | 0.570 | 1.000 | 2025-08-01 |
| 2 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 0.330 | 0.152 | 0.184 | 0.330 | 1.000 | 2025-07-25 |
| 3 | google/gemma-2-27b-it | 0.150 | 0.056 | 0.078 | 0.150 | 1.000 | 2025-07-24 |
| 4 | gpt-4.1-nano | 0.040 | 0.016 | 0.025 | 0.040 | 1.000 | 2025-07-03 |
| 5 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 0.040 | 0.018 | 0.028 | 0.040 | 1.000 | 2025-08-03 |
| 6 | gpt-4o-mini | 0.020 | 0.007 | 0.011 | 0.020 | 1.000 | 2025-07-02 |
| 7 | claude-3-haiku-20240307 | 0.010 | 0.004 | 0.008 | 0.010 | 1.000 | 2025-07-28 |
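
The metric columns above can be reproduced from raw predictions with a few lines of code. The sketch below is a minimal illustration, not the leaderboard's actual scoring code. The function name `classification_metrics` and the toy labels are hypothetical. The definitions are assumptions that follow common conventions: balanced accuracy is the mean per-class recall over classes present in the gold labels, macro F1 is the unweighted mean of per-class F1 over classes seen in gold labels or valid predictions, and a prediction outside the label set is assumed to count as a miss and to lower `valid_predictions_ratio`.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, labels):
    """Hypothetical helper: single-label classification metrics (assumed definitions)."""
    n = len(y_true)
    valid = set(labels)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if p == t:
            tp[t] += 1
        else:
            fn[t] += 1
            if p in valid:
                # Assumption: an invalid prediction is a miss for the gold
                # class but not a false positive for any valid class.
                fp[p] += 1

    def f1(c):
        denom = 2 * tp[c] + fp[c] + fn[c]
        return 2 * tp[c] / denom if denom else 0.0

    true_classes = sorted(set(y_true))
    seen_classes = sorted(set(y_true) | (set(y_pred) & valid))
    recalls = [tp[c] / (tp[c] + fn[c]) for c in true_classes]
    total_tp = sum(tp.values())
    micro_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    return {
        "accuracy": total_tp / n,
        "balanced_accuracy": sum(recalls) / len(recalls),
        "f1_macro": sum(f1(c) for c in seen_classes) / len(seen_classes),
        "f1_micro": 2 * total_tp / micro_denom if micro_denom else 0.0,
        "valid_predictions_ratio": sum(p in valid for p in y_pred) / n,
    }


# Toy data (made up for illustration, not drawn from the dataset).
y_true = ["A", "A", "A", "B", "C"]
y_pred = ["A", "A", "B", "B", "A"]
metrics = classification_metrics(y_true, y_pred, labels=["A", "B", "C"])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, whenever every prediction is valid, each error counts once as a false positive and once as a false negative, so micro F1 reduces to accuracy. That is consistent with the table, where `accuracy` and `f1_micro` match in every row and `valid_predictions_ratio` is 1.000 throughout. Balanced accuracy and macro F1 weight all classes equally, which is one possible reason they sit well below accuracy here if the class distribution is skewed.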