HousingQA (Knowledge)
This task evaluates models' knowledge of housing law as of 2021, with a specific focus on eviction. Models are prompted with yes/no questions about housing law in different states and are expected to answer using only the knowledge stored in their weights, without retrieval. To learn more about HousingQA, see here.

Rank | Model | Accuracy | Macro F1 | Date | Results |
---|---|---|---|---|---|
1 | gpt-5-2025-08-07 | 0.715 | 0.705 | 2025-08-08 | View |
2 | claude-3-haiku-20240307 | 0.593 | 0.588 | 2025-08-04 | View |
3 | claude-3-5-haiku-20241022 | 0.584 | 0.580 | 2025-08-04 | View |
4 | gpt-4o-mini-2024-07-18 | 0.544 | 0.544 | 2025-08-04 | View |
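The table reports two metrics, accuracy and macro-averaged F1. The following is a minimal sketch of how these metrics are conventionally computed for yes/no answers. It assumes that each prediction has already been normalized to the string `"yes"` or `"no"`, and that a class with no true positives, false positives, or false negatives scores 0. It is not the leaderboard's own scoring code, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of questions where the predicted answer equals the gold answer."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_macro(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: Tuple[str, ...] = ("yes", "no"),  # assumed label set
) -> float:
    """Unweighted mean of the per-class F1 scores over `labels`."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    per_class = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class with an empty denominator contributes 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical toy answers, not HousingQA data.
    gold = ["yes", "yes", "no", "no"]
    pred = ["yes", "no", "no", "no"]
    print(accuracy(gold, pred))  # 0.75
    print(round(f1_macro(gold, pred), 4))  # 0.7333
```

Macro F1 weights the "yes" and "no" classes equally. It therefore falls below accuracy when a model's answers lean toward one label. In the toy example, the model answers "no" too often: the "yes" class gets F1 = 2/3 and the "no" class gets 0.8.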