| Benchmark | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
|-----------------------|-----------|---------|------------|-----------|
| Humanity's Last Exam | 37.5% | 21.6% | 13.7% | 26.5% |
| ARC-AGI-2 | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond | 91.9% | 86.4% | 83.4% | 88.1% |
| AIME 2025 | | | | |
| (no tools) | 95.0% | 88.0% | 87.0% | 94.0% |
| (code execution) | 100% | - | 100% | - |
| MathArena Apex | 23.4% | 0.5% | 1.6% | 1.0% |
| MMMU-Pro | 81.0% | 68.0% | 68.0% | 80.8% |
| ScreenSpot-Pro | 72.7% | 11.4% | 36.2% | 3.5% |
| CharXiv Reasoning | 81.4% | 69.6% | 68.5% | 69.5% |
| OmniDocBench 1.5 | 0.115 | 0.145 | 0.145 | 0.147 |
| Video-MMMU | 87.6% | 83.6% | 77.8% | 80.4% |
| LiveCodeBench Pro | 2,439 | 1,775 | 1,418 | 2,243 |
| Terminal-Bench 2.0 | 54.2% | 32.6% | 42.8% | 47.6% |
| SWE-Bench Verified | 76.2% | 59.6% | 77.2% | 76.3% |
| t2-bench | 85.4% | 54.9% | 84.7% | 80.2% |
| Vending-Bench 2 | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| FACTS Benchmark Suite | 70.5% | 63.4% | 50.4% | 50.8% |
| SimpleQA Verified | 72.1% | 54.5% | 29.3% | 34.9% |
| MMLU | 91.8% | 89.5% | 89.1% | 91.0% |
| Global PIQA | 93.4% | 91.5% | 90.1% | 90.9% |
| MRCR v2 (8-needle) | | | | |
| (128k avg) | 77.0% | 58.0% | 47.1% | 61.6% |
| (1M pointwise) | 26.3% | 16.4% | n/s | n/s |
n/s = not supported

EDIT: formatting, hopefully a bit more mobile friendly
Brave: https://api-dashboard.search.brave.com/terms-of-service "Licensee shall not at any time, and shall not permit others to: store the results of the API or any derivative works from the results of the API"
Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes"
Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to.