
AIL Player Card #008 — GPT-5.5: The Scoring Conductor
93 OVR. SC. Arena Elo 1474. Terminal-Bench 2.0 #1 at 82.7%. GPQA Diamond 93.6%. $5/$30 per million tokens. OpenAI United's Scoring Conductor is back in the top five — and this time the agentic benchmarks actually hold up. #AILeague
The scouting report
Position definition: Scoring Conductor (SC)
The stat sheet
| Dimension | Score | What's behind it |
|---|---|---|
| RZN Reasoning | 92 | GPQA Diamond 93.6%, FrontierMath Tier 4 35.4%, ARC-AGI-2 85.0% |
| CRE Creativity | 88 | BrowseComp 84.4%, OSWorld-Verified 78.7%, GeneBench leading score |
| SPD Speed | 85 | Matched GPT-5.4 per-token latency; significant token-efficiency gains on Codex tasks |
| MLT Multimodal | 84 | Text + image in; MMMU Pro 81.2% (with tools); no video/audio input |
| SAF Safety | 79 | Rated "High" on bio/cyber under the Preparedness Framework; stricter classifiers and tighter cyber safeguards; still trails Claude on hallucination rate |
| VAL Value | 76 | $5/$30 per M tokens (input/output); more token-efficient than GPT-5.4 in practice; still about 8.6× costlier than DeepSeek V4 Pro on output tokens ($30 vs. $3.48) |
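
To make the VAL row concrete, here is a minimal sketch of a per-request cost estimate. It assumes the $5/$30 figures are the input and output rates per million tokens; the function name and the token counts in the example are hypothetical, chosen only for illustration.

```python
# Rates from the stat sheet, assumed to be input/output USD per million tokens.
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 30.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Illustrative token counts, not measured usage.
cost = request_cost_usd(input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.2f}")  # prints $0.16
```

Because output tokens cost six times as much as input tokens at these rates, the token-efficiency gains mentioned in the SPD and VAL rows matter more on the output side.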

Season highlights
Head-to-head table
| Model | OVR | Position | Arena Elo | Terminal-Bench 2.0 | GPQA Diamond | SWE-Bench Pro | Output $/M |
|---|---|---|---|---|---|---|---|
| GPT-5.5 | 93 | SC | 1474 | 82.7% | 93.6% | 58.6% | $30 |
| Claude Opus 4.8 | 94 | SF | 1890 | — | 92.0% | — | $25 |
| Claude Opus 4.7 | — | SF | 1494 | 69.4% | 94.2% | 64.3%* | $25 |
| DeepSeek V4 Pro | 95 | VE | 1454 | 46.2% | 88.8% | — | $3.48 |
| Gemini 3.1 Pro | — | MW | 1487 | 68.5% | 94.3% | 54.2% | $10 |
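
The output-price column can be read as a multiple of each rival's price. The sketch below uses only the Output $/M values from the table; the dictionary and function names are hypothetical.

```python
# Output price in USD per million tokens, copied from the head-to-head table.
OUTPUT_USD_PER_M = {
    "GPT-5.5": 30.0,
    "Claude Opus 4.8": 25.0,
    "Claude Opus 4.7": 25.0,
    "DeepSeek V4 Pro": 3.48,
    "Gemini 3.1 Pro": 10.0,
}


def price_multiple(rival: str, baseline: str = "GPT-5.5") -> float:
    """Return how many times costlier the baseline is than the rival."""
    return OUTPUT_USD_PER_M[baseline] / OUTPUT_USD_PER_M[rival]


for model in OUTPUT_USD_PER_M:
    if model != "GPT-5.5":
        print(f"GPT-5.5 vs. {model}: {price_multiple(model):.1f}x")
```

On these numbers GPT-5.5 comes out at about 8.6x DeepSeek V4 Pro, 3.0x Gemini 3.1 Pro, and 1.2x either Claude Opus model on output tokens.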

The tactical read