62 8 - Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
59.1 55 - GPT-5.5 (xhigh)
58.5 55 - GPT-5.5 (high)
57.2 104 - GPT-5.4 (xhigh)
56.7 20 - Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
56.2 55 - GPT-5.5 (medium)
55.5 118 - Gemini 3.1 Pro Preview
53.1 62 - Claude Opus 4.7 (Non-reasoning, High Effort)
53.1 132 - GPT-5.3 Codex (xhigh)
52.5 62 - Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
52.1 55 - GPT-5.5 (low)
51.5 92 - GPT-5.4 mini (xhigh)
50.9 120 - Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
50.7 1 large GLM-5.2 (max)
50.1 29 - Qwen3.7 Max
48.7 188 - GPT-5.2 (xhigh)
48.6 55 - GPT-5.5 (Non-reasoning)
48.1 132 - Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
47.8 205 - Claude Opus 4.5 (Reasoning)
47.6 132 - Claude Opus 4.6 (Non-reasoning, High Effort)
47.5 70 - Muse Spark
47.5 54 large DeepSeek V4 Pro (Reasoning, Max Effort)
47.1 58 large Kimi K2.6
47.1 29 - Gemini 3.5 Flash (minimal)
46.7 449 - Gemini 2.5 Pro Preview (Mar' 25)
46.5 211 - Gemini 3 Pro Preview (high)
46.5 16 - Qwen3.7 Plus
46.4 120 - Claude Sonnet 4.6 (Non-reasoning, High Effort)
45.6 5 large Kimi K2.7 Code
45.6 104 - GPT-5.4 (low)
45.5 56 large MiMo-V2.5-Pro
45.1 43 - GPT-5.5 Instant (May 2026)
45 29 - Gemini 3.5 Flash (high)
44.9 58 - Qwen3.6 Max Preview
44.7 216 - GPT-5.1 (high)
44.2 188 - GPT-5.2 (medium)
44.2 126 large GLM-5 (Reasoning)
43.9 92 - GPT-5.4 nano (xhigh)
43.9 29 - Gemini 3.5 Flash (medium)
43.4 71 large GLM-5.1 (Reasoning)
43.4 16 large MiniMax-M3
43.2 54 large DeepSeek V4 Pro (Reasoning, High Effort)
43 188 - GPT-5.2 Codex (xhigh)
43 120 - Claude Sonnet 4.6 (Non-reasoning, Low Effort)
42.9 76 - Qwen3.6 Plus
42.9 205 - Claude Opus 4.5 (Non-reasoning)
42.6 182 - Gemini 3 Flash Preview (Reasoning)
42.2 99 - Grok 4.20 0309 (Reasoning)
42.1 56 large MiMo-V2.5
41.9 91 large MiniMax-M2.7
41.4 91 - MiMo-V2-Pro
41.3 121 large Qwen3.5 397B A17B (Reasoning)
41 48 - Grok 4.3 (high)
41 104 - GPT-5.4 (Non-reasoning)
40.5 71 - Grok 4.20 0309 v2 (Reasoning)
40.5 342 - Grok 4
39.8 54 large DeepSeek V4 Flash (Reasoning, High Effort)
39.6 141 large Kimi K2.5 (Reasoning)
39.4 211 - Gemini 3 Pro Preview (low)
39 126 large GLM-5 (Non-reasoning)
38.9 314 - GPT-5 (medium)
38.9 267 - GPT-5 Codex (high)
38.7 76 small Gemma 4 31B (Reasoning)
38.7 54 large DeepSeek V4 Flash (Reasoning, Max Effort)
38.6 261 - Claude 4.5 Sonnet (Reasoning)
38.4 58 large Kimi K2.6 (Non-reasoning)
38.4 54 large DeepSeek V4 Pro (Non-reasoning)
38.4 427 - o3
37.9 198 large DeepSeek V3.2 Speciale
37.8 182 - Gemini 3 Flash Preview (Non-reasoning)
37.6 13 large Nemotron 3 Ultra 550B A55B (Reasoning)
37.5 92 - GPT-5.4 mini (medium)
37.4 125 large MiniMax-M2.5
37.4 121 large Qwen3.5 397B A17B (Non-reasoning)
37.1 19 large Step 3.7 Flash
36.9 82 - MiMo-V2-Omni-0327
36.8 94 - GLM-5-Turbo
36.8 56 large MiMo-V2.5-Pro (Non-reasoning)
36.7 198 large DeepSeek V3.2 (Reasoning)
36.6 216 - GPT-5.1 Codex (high)
36.5 56 small Qwen3.6 27B (Reasoning)
36.5 55 large Hy3-preview (Reasoning)
36.5 316 - Claude 4.1 Opus (Reasoning)
36.4 216 - GPT-5.1 Codex mini (high)
36.3 177 large GLM-4.7 (Reasoning)
36.2 77 - GLM 5V Turbo (Reasoning)
36 314 - GPT-5 (high)
35.8 71 large GLM-5.1 (Non-reasoning)
35.5 90 - MiMo-V2-Omni
35.4 49 medium Mistral Medium 3.5
35.3 314 - GPT-5 mini (high)
35.2 62 small Qwen3.6 35B A3B (Reasoning)
35.2 54 large DeepSeek V4 Flash (Non-reasoning)
35.1 48 - Grok 4.3 (medium)
35 92 - GPT-5.4 nano (medium)
34.9 113 small Qwen3.5 27B (Reasoning)
34.8 223 large Kimi K2 Thinking
34.7 188 - GPT-5.2 (Non-reasoning)
34.7 113 medium Qwen3.5 122B A10B (Reasoning)
34.6 76 - Step 3.5 Flash 2603
34.6 198 large DeepSeek V3.2 (Non-reasoning)
34.3 55 large Hy3-preview (Non-reasoning)
34.1 391 - Claude 4 Sonnet (Reasoning)
34 643 - o1-preview
34 391 - Claude 4 Opus (Reasoning)
33.9 76 small Gemma 4 31B (Non-reasoning)
33.7 268 large DeepSeek V3.1 Terminus (Reasoning)
33.5 261 - Claude 4.5 Sonnet (Non-reasoning)
33.5 183 large MiMo-V2-Flash (Feb 2026)
33.4 8 small North Mini Code
33.4 113 small Qwen3.5 27B (Non-reasoning)
33.3 40 large Ring-2.6-1T
33.3 261 large DeepSeek V3.2 Exp (Reasoning)
33.1 55 large Ling-2.6-1T
32.8 314 - GPT-5 mini (medium)
32.8 176 large MiniMax-M2.1
32.6 245 - Claude 4.5 Haiku (Reasoning)
32 377 - Gemini 2.5 Pro
32 177 large GLM-4.7 (Non-reasoning)
31.9 268 large DeepSeek V3.1 Terminus (Non-reasoning)
31.8 183 large MiMo-V2-Flash (Reasoning)
31.6 48 - Grok 4.3 (low)
31.6 135 large Step 3.5 Flash
31.6 113 medium Qwen3.5 122B A10B (Non-reasoning)
31.3 218 - Doubao Seed Code
31.2 98 medium NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
30.9 210 - Grok 4.1 Fast (Reasoning)
30.7 314 - GPT-5 (low)
30.6 391 - Claude 4 Sonnet (Non-reasoning)
30.6 117 - Mercury 2
30.5 142 - Qwen3 Max Thinking
30.4 202 - Nova 2.0 Pro Preview (medium)
30.3 113 small Qwen3.5 35B A3B (Reasoning)
30.2 603 - Claude 3.5 Sonnet (Oct '24)
30.2 260 large GLM-4.6 (Non-reasoning)
30.1 106 - Gemini 3.1 Flash-Lite
30 261 large DeepSeek V3.2 Exp (Non-reasoning)
29.7 300 large DeepSeek V3.1 (Reasoning)
29.6 245 - Claude 4.5 Haiku (Non-reasoning)
29.5 260 large GLM-4.6 (Reasoning)
29.3 28 large Command A+
29.2 234 large MiniMax-M2
29.2 216 - ERNIE 5.0 Thinking Preview
29.1 76 small Gemma 4 26B A4B (Non-reasoning)
28.9 34 small JT-35B-Flash
28.6 316 medium gpt-oss-120b (high)
28.4 300 large DeepSeek V3.1 (Non-reasoning)
27.9 92 - GPT-5.4 nano (Non-Reasoning)
27.6 79 - Qwen3.5 Omni Plus
27.6 478 - Claude 3.7 Sonnet (Reasoning)
27.4 271 - Grok 4 Fast (Reasoning)
27.3 216 - GPT-5.1 (Non-reasoning)
27.2 77 large Trinity Large Thinking
27 168 large K-EXAONE (Reasoning)
26.7 478 - Claude 3.7 Sonnet (Non-reasoning)
26.7 22 medium HyperNova 60B 2605
26.6 56 small Qwen3.6 27B (Non-reasoning)
26.4 267 - Qwen3 Max
26.3 324 large GLM-4.5 (Reasoning)
26 726 - Claude 3.5 Sonnet (June '24)
25.9 285 large Kimi K2 0905
25.9 149 small GLM-4.7-Flash (Reasoning)
25.8 90 small Nemotron Cascade 2 30B A3B
25.8 183 large MiMo-V2-Flash (Non-reasoning)
25.8 141 large Kimi K2.5 (Non-reasoning)
25.6 427 - o4-mini (high)
25.5 497 - Gemini 2.0 Pro Experimental (Feb '25)
25.5 285 - Qwen3 Max (Preview)
25.4 99 - Grok 4.20 0309 (Non-reasoning)
25.3 92 - GPT-5.4 mini (Non-Reasoning)
25.3 107 small Qwen3.5 9B (Reasoning)
25.2 483 - Grok 3 mini Reasoning (high)
25.1 48 - Grok 4.3 (Non-reasoning)
25 314 - GPT-5 (minimal)
24.9 14 small Gemma 4 12B (Reasoning)
24.6 330 large Qwen3 Coder 480B A35B Instruct
24.6 265 - Gemini 2.5 Flash Preview (Sep '25) (Reasoning)
24.5 226 - Qwen3 Max Thinking (Preview)
24.5 202 - Nova 2.0 Pro Preview (low)
24.3 93 medium Mistral Small 4 (Reasoning)
24.2 765 - GPT-4o (May '24)
24.1 512 - Gemini 2.0 Flash Thinking Experimental (Jan '25)
24 385 large DeepSeek R1 0528 (May '25)
23.9 231 - Nova 2.0 Lite (medium)
23.8 324 medium GLM-4.5-Air
23.7 293 - Grok Code Fast 1
23.7 190 medium Devstral 2
23.6 631 - Gemini 1.5 Pro (Sep '24)
23.4 231 - Nova 2.0 Lite (high)
23.2 57 medium Ling 2.6 Flash
23.2 327 large Qwen3 235B A22B 2507 (Reasoning)
23 69 small EXAONE 4.5 33B
22.9 314 - GPT-5 nano (medium)
22.9 134 medium Qwen3 Coder Next
22.7 197 large Mistral Large 3
22.4 76 small Gemma 4 26B A4B (Reasoning)
22.2 393 - Gemini 2.5 Flash (Reasoning)
22.1 341 large Kimi K2
22.1 331 large Qwen3 235B A22B 2507 Instruct
22.1 265 - Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
22 71 - Grok 4.20 0309 v2 (Non-reasoning)
22 449 large DeepSeek V3 0324
21.9 314 - GPT-5 mini (minimal)
21.8 429 - GPT-4.1
21.7 272 - Magistral Medium 1.2
21.5 954 - GPT-4 Turbo
21.3 107 small Qwen3.5 9B (Non-reasoning)
21.2 63 - JT-MINI
21.2 314 - GPT-5 (ChatGPT)
20.9 267 large Qwen3 VL 235B A22B (Reasoning)
20.7 190 small Devstral Small 2
20.5 559 - o1
20.5 202 - Nova 2.0 Pro Preview (Non-reasoning)
20.3 314 - GPT-5 nano (high)
19.8 763 - Gemini 1.5 Pro (May '24)
19.8 483 - Grok 3
19.7 191 medium GLM-4.6V (Reasoning)
19.5 835 - Claude 3 Opus
19.5 279 medium Qwen3 Next 80B A3B (Reasoning)
19.5 210 - Grok 4.1 Fast (Non-reasoning)
19.4 321 small Qwen3 Coder 30B A3B Instruct
19.1 202 medium INTELLECT-3
19 271 - Grok 4 Fast (Non-reasoning)
19 184 small NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
18.8 252 large Ling-1T
18.5 429 - GPT-4.1 mini
18.5 316 small gpt-oss-20B (high)
18.3 309 - Mistral Medium 3.1
18.2 282 - Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
17.9 502 - o3-mini
17.8 393 - Gemini 2.5 Flash (Non-reasoning)
17.6 924 - Gemini 1.0 Ultra
17.6 62 small Qwen3.6 35B A3B (Non-reasoning)
17.5 14 small Gemma 4 12B (Non-reasoning)
17.5 107 small Qwen3.5 4B (Reasoning)
17.4 415 large Qwen3 235B A22B (Reasoning)
17.3 502 - o3-mini (high)
16.8 247 large Ring-1T
16.8 113 small Qwen3.5 35B A3B (Non-reasoning)
16.7 574 - GPT-4o (Nov '24)
16.7 301 small Seed-OSS-36B-Instruct
16.7 273 medium Ling-flash-2.0
16.6 680 - GPT-4o (Aug '24)
16.5 267 large Qwen3 VL 235B A22B Instruct
16.5 140 medium LongCat Flash Lite
16.4 93 medium Mistral Small 4 (Non-reasoning)
16.4 538 large DeepSeek V3 (Dec '24)
16 372 - Magistral Medium 1
15.9 513 large DeepSeek R1 (Jan '25)
15.9 342 - Devstral Medium
15.8 184 small NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
15.6 438 large Llama 4 Maverick
15.6 239 small Qwen3 VL 32B Instruct
15.5 316 medium gpt-oss-120b (low)
15.3 279 medium Qwen3 Next 80B A3B Instruct
15.1 327 medium Llama Nemotron Super 49B v1.5 (Reasoning)
15.1 203 - Nova 2.0 Omni (medium)
14.8 49 small Nemotron 3 Nano Omni 30B A3B Reasoning
14.8 273 small Magistral Small 1.2
14.6 322 small Qwen3 30B A3B 2507 (Reasoning)
14.5 694 large Llama 3.1 Instruct 405B
14.5 365 large MiniMax M1 80k
14.5 352 large ERNIE 4.5 300B A47B
14.5 265 - Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
14.5 239 small Qwen3 VL 32B (Reasoning)
14.4 316 small gpt-oss-20B (low)
14.3 257 small Qwen3 VL 30B A3B Instruct
14.2 323 small Qwen3 30B A3B 2507 Instruct
14.2 314 - GPT-5 nano (minimal)
14.2 230 medium Kimi Linear 48B A3B Instruct
14.1 365 large MiniMax M1 40k
14 939 - Claude 2.1
14 79 - Qwen3.5 Omni Flash
14 415 large Qwen3 235B A22B (Non-reasoning)
14 337 small EXAONE 4.0 32B (Reasoning)
13.9 457 small Mistral Small 3.1
13.9 203 - Nova 2.0 Omni (low)
13.9 148 small Step3 VL 10B
13.8 576 medium Mistral Large 2 (Nov '24)
13.8 415 small Qwen3 32B (Reasoning)
13.8 413 - Nova Premier
13.8 203 - Nova 2.0 Omni (Non-reasoning)
13.7 75 small Gemma 4 E4B (Reasoning)
13.7 107 small Qwen3.5 4B (Non-reasoning)
13.6 497 - Gemini 2.0 Flash (Feb '25)
13.6 406 - Mistral Medium 3
13.6 231 - Nova 2.0 Lite (low)
13.5 168 large K-EXAONE (Non-reasoning)
13.3 72 medium Solar Pro 3
13.3 415 small Qwen3 30B A3B (Non-reasoning)
13.3 362 small Mistral Small 3.2
13.1 436 large Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
13.1 415 small Qwen3 14B (Reasoning)
13.1 257 small Qwen3 VL 30B A3B (Reasoning)
13.1 1191 - GPT-4
12.9 1072 - Claude 2.0
12.7 268 small Qwen3 Omni 30B A3B (Reasoning)
12.6 188 small Mi:dm K 2.5 Pro
12.5 231 - Nova 2.0 Lite (Non-reasoning)
12.4 415 small Qwen3 14B (Non-reasoning)
12.2 392 small Devstral Small (May '25)
12.1 343 - Solar Pro 2 (Reasoning)
12.1 342 small Devstral Small (Jul '25)
11.9 636 medium Qwen2.5 Instruct 72B
11.9 195 small Motif-2-12.7B-Reasoning
11.9 188 - Mi:dm K 2.5 Pro Preview
11.7 232 small NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
11.4 513 medium DeepSeek R1 Distill Llama 70B
11.3 343 - Solar Pro 2 (Non-reasoning)
11.2 552 small Phi-4
11.2 429 - GPT-4.1 nano
11.1 372 small Magistral Small 1
11.1 191 medium GLM-4.6V (Non-reasoning)
11 561 - Nova Pro
11 415 small Qwen3 30B A3B (Reasoning)
11 149 small GLM-4.7-Flash (Non-reasoning)
10.9 694 medium Llama 3.1 Instruct 70B
10.9 310 medium GLM-4.5V (Reasoning)
10.9 197 small Ministral 3 14B
10.8 610 medium Llama 3.1 Nemotron Instruct 70B
10.8 310 medium GLM-4.5V (Non-reasoning)
10.7 603 - Claude 3.5 Haiku
10.7 558 medium Llama 3.3 Instruct 70B
10.7 1295 - GPT-3.5 Turbo
10.6 271 medium Ring-flash-2.0
10.5 327 medium Llama Nemotron Super 49B v1.5 (Non-reasoning)
10.5 182 medium Solar Open 100B (Reasoning)
10.1 49 small Granite 4.1 30B
10 93 tiny NVIDIA Nemotron 3 Nano 4B
10 197 small Ministral 3 8B
9.9 461 medium Command A
9.8 246 small Qwen3 VL 8B (Reasoning)
9.8 164 small Falcon-H1R-7B
9.8 103 medium Sarvam 105B (high)
9.6 462 small Gemma 3 27B Instruct
9.5 365 - Gemini 2.5 Flash-Lite (Reasoning)
9.5 315 tiny Qwen3 4B 2507 (Reasoning)
9.4 456 medium Llama 3.3 Nemotron Super 49B v1 (Reasoning)
9.4 337 small EXAONE 4.0 32B (Non-reasoning)
9 76 small Gemma 4 E2B (Reasoning)
9 415 small Qwen3 8B (Reasoning)
9 315 tiny Qwen3 4B 2507 Instruct
8.9 464 small Reka Flash 3
8.9 126 tiny Nanbeige4.1-3B
8.5 268 small Granite 4.0 H Small
8.3 76 small Gemma 4 E2B (Non-reasoning)
8.3 303 small NVIDIA Nemotron Nano 9B V2 (Reasoning)
7.9 103 small Sarvam 30B (high)
7.8 384 small DeepSeek R1 0528 Qwen3 8B
7.8 345 large Jamba 1.7 Large
7.8 1191 - Claude Instant
7.6 456 medium Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
7.5 390 small Sarvam M (Reasoning)
7.5 303 small NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
7.4 365 - Gemini 2.5 Flash-Lite (Non-reasoning)
7.3 49 small Granite 4.1 8B
7.3 246 small Qwen3 VL 8B Instruct
7.2 268 small Qwen3 Omni 30B A3B Instruct
7.1 415 small Qwen3 8B (Non-reasoning)
6.8 790 medium Llama 3 Instruct 70B
6.7 835 - Claude 3 Haiku
6.7 438 medium Llama 4 Scout
6.7 246 tiny Qwen3 VL 4B (Reasoning)
6.4 75 small Gemma 4 E4B (Non-reasoning)
6.3 462 small Gemma 3 12B Instruct
5.9 232 small NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
5.6 20 - LFM2.5-8B-A1B
5.5 49 tiny Granite 4.1 3B
5.1 561 - Nova Lite
5 281 small Ling-mini-2.0
5 268 tiny Granite 4.0 Micro
4.9 694 small Llama 3.1 Instruct 8B
4.9 107 tiny Qwen3.5 2B (Non-reasoning)
4.8 197 tiny Ministral 3 3B
4.6 246 tiny Qwen3 VL 4B Instruct
4.6 1134 - PALM-2
4.2 630 small Llama 3.2 Instruct 11B (Vision)
4.2 356 small Gemma 3n E4B Instruct
4.1 561 - Nova Micro
4 790 small Llama 3 Instruct 8B
3.6 842 tiny Phi-4 Mini Instruct
3.6 112 small LFM2 24B A2B
3.5 107 tiny Qwen3.5 2B (Reasoning)
3.4 427 small Granite 3.3 8B (Non-reasoning)
3.1 345 medium Jamba 1.7 Mini
3.1 337 tiny Exaone 4.0 1.2B (Reasoning)
3 785 tiny Phi-3 Mini Instruct 3.8B
2.9 462 tiny Gemma 3 4B Instruct
2.9 232 tiny Granite 4.0 1B
2.7 232 tiny Granite 4.0 H 1B
2.5 337 tiny Exaone 4.0 1.2B (Non-reasoning)
2.5 252 tiny Jamba Reasoning 3B
2.3 415 tiny Qwen3 1.7B (Non-reasoning)
2.3 253 small LFM2 8B A1B
2.2 356 small Gemma 3n E2B Instruct
1.9 288 medium Apertus 70B Instruct
1.5 23 tiny MiniCPM5-1B (Reasoning)
1.4 415 tiny Qwen3 1.7B (Reasoning)
1.4 415 tiny Qwen3 0.6B (Non-reasoning)
1.4 288 small Apertus 8B Instruct
1.4 267 tiny LFM2 2.6B
1.4 148 tiny LFM2.5-1.2B-Thinking
1.2 120 tiny Tiny Aya Global
1 163 tiny LFM2.5-VL-1.6B
1 107 tiny Qwen3.5 0.8B (Non-reasoning)
0.9 415 tiny Qwen3 0.6B (Reasoning)
0.8 342 tiny LFM2 1.2B
0.8 163 tiny LFM2.5-1.2B-Instruct
0.7 37 tiny MiniCPM-V 4.6 1.3B
0.6 630 tiny Llama 3.2 Instruct 1B
0.6 232 tiny Granite 4.0 H 350M
0.5 23 tiny MiniCPM5-1B (Non-reasoning)
0.3 232 tiny Granite 4.0 350M
0.2 461 tiny Gemma 3 1B Instruct
0 996 small Qwen Chat 14B
0 994 small Mistral 7B Instruct
0 931 small DeepSeek LLM 67B Chat (V1)
0 930 medium Qwen Chat 72B
0 924 - Gemini 1.0 Pro
0 919 - Mistral Medium
0 919 medium Mixtral 8x7B Instruct
0 912 small OpenChat 3.5 (1210)
0 874 small Solar Mini
0 842 - Mistral Small (Feb '24)
0 842 - Mistral Large (Feb '24)
0 835 - Claude 3 Sonnet
0 827 small Command-R (Mar '24)
0 822 large Grok-1
0 812 medium DBRX Instruct
0 804 medium Command-R+ (Apr '24)
0 791 medium Mixtral 8x22B Instruct
0 784 large Arctic Instruct
0 783 medium Qwen1.5 Chat 110B
0 772 large DeepSeek-V2-Chat
0 764 - Gemini 1.5 Flash (May '24)
0 740 medium Qwen2 Instruct 72B
0 730 small DeepSeek Coder V2 Lite Instruct
0 730 large DeepSeek-Coder-V2
0 69 small EXAONE 4.5 33B (Non-reasoning)
0 699 - GPT-4o mini
0 693 medium Mistral Large 2 (Jul '24)
0 673 - Grok Beta
0 664 medium Jamba 1.5 Mini
0 664 large Jamba 1.5 Large
0 649 large DeepSeek-V2.5
0 643 - o1-mini
0 638 small Mistral Small (Sep '24)
0 636 small Qwen2.5 Instruct 32B
0 636 small Qwen2.5 Coder Instruct 7B
0 631 - Gemini 1.5 Flash (Sep '24)
0 630 tiny Llama 3.2 Instruct 3B
0 630 medium Llama 3.2 Instruct 90B (Vision)
0 625 medium LFM 40B
0 622 small Gemini 1.5 Flash-8B
0 621 small Reka Flash (Sep '24)
0 583 small Qwen2.5 Coder Instruct 32B
0 576 - Qwen2.5 Turbo
0 576 medium Pixtral Large
0 567 small QwQ 32B-Preview
0 55 - GPT-5.5 Pro (xhigh)
0 554 large DeepSeek-V2.5 (Dec '24)
0 553 - Gemini 2.0 Flash (experimental)
0 552 large Grok 2 (Dec '24)
0 547 - GPT-4o Realtime (Dec '24)
0 547 - GPT-4o mini Realtime (Dec '24)
0 545 - Gemini 2.0 Flash Thinking Experimental (Dec '24)
0 513 tiny DeepSeek R1 Distill Qwen 1.5B
0 513 small DeepSeek R1 Distill Qwen 32B
0 513 small DeepSeek R1 Distill Qwen 14B
0 513 small DeepSeek R1 Distill Llama 8B
0 512 - Sonar Pro
0 512 - Sonar
0 505 - Sonar Reasoning Pro
0 505 - Sonar Reasoning
0 505 - Qwen2.5 Max
0 503 small Mistral Small 3
0 497 - Gemini 2.0 Flash-Lite (Preview)
0 487 - GPT-4o (ChatGPT)
0 485 small Mistral Saba
0 484 large R1 1776
0 483 - Grok 3 Reasoning Beta
0 477 - Gemini 2.0 Flash-Lite (Feb '25)
0 476 small Phi-4 Multimodal Instruct
0 475 - GPT-4.5 (Preview)
0 469 small QwQ 32B
0 468 medium Jamba 1.6 Mini
0 468 large Jamba 1.6 Large
0 455 - o1-pro
0 447 - GPT-4o (March 2025, chatgpt-4o-latest)
0 426 - Gemini 2.5 Flash Preview (Reasoning)
0 426 - Gemini 2.5 Flash Preview (Non-reasoning)
0 415 tiny Qwen3 4B (Reasoning)
0 415 tiny Qwen3 4B (Non-reasoning)
0 415 small Qwen3 32B (Non-reasoning)
0 407 - Gemini 2.5 Pro Preview (May' 25)
0 393 small Solar Pro 2 (Preview) (Reasoning)
0 393 small Solar Pro 2 (Preview) (Non-reasoning)
0 393 small Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
0 393 small Gemma 3n E4B Instruct Preview (May '25)
0 391 - Claude 4 Opus (Non-reasoning)
0 372 - o3-pro
0 316 - Claude 4.1 Opus (Non-reasoning)
0 307 tiny Gemma 3 270M
0 132 - Gemini 3 Deep Think
0 1209 medium Llama 65B
0 1100 - GPT-3.5 Turbo (0613)
0 107 tiny Qwen3.5 0.8B (Reasoning)
0 1065 small Llama 2 Chat 7B
0 1065 small Llama 2 Chat 13B
0 1065 medium Llama 2 Chat 70B
0 104 - GPT-5.4 Pro (xhigh)