
Claude Opus 4.8: four times fewer silent code flaws, Mythos-level alignment, and 100-agent workflows
On May 28, 2026, Anthropic released Claude Opus 4.8 — an upgrade that cuts unremarked code flaws by 4x versus Opus 4.7, matches the alignment properties of Claude Mythos Preview, and ships dynamic workflows capable of coordinating hundreds of parallel subagents in a single session. Pricing is unchanged. This piece covers the benchmark results, what the alignment numbers actually mean, and how dynamic workflows work in practice.
Research Brief
claude-opus-4-8. Simultaneously, three new features shipped: dynamic workflows for Claude Code, effort control in claude.ai, and a Messages API change that lets developers push mid-task instruction updates without breaking the prompt cache. 1Four times less likely to miss its own bugs
What the benchmarks actually show

| Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Agentic coding (SWE-Bench Pro) | 69.2% | 64.3% | 58.6% | 54.2% |
| Agentic terminal coding (Terminal-Bench 2.1) | 74.6% | 66.1% | 78.2% | 70.3% |
| Multidisciplinary reasoning, no tools (Humanity's Last Exam) | 49.8% | 46.9% | 41.4% | 44.4% |
| Multidisciplinary reasoning, with tools | 57.9% | 54.7% | 52.2% | 51.4% |
| Agentic computer use (OSWorld-Verified) | 83.4% | 82.8% | 78.7% | 76.2% |
| Knowledge work (GDPval-AA) | 1890 | 1753 | 1769 | 1314 |
| Agentic financial analysis (Finance Agent v2) | 53.9% | 51.5% | 51.8% | 43.0% |
Dynamic workflows: hundreds of parallel subagents
- Codebase-wide migrations: Jarred Sumner used dynamic workflows to port Bun from Zig to Rust — roughly 750,000 lines of Rust, 99.8% of the existing test suite passing, eleven days from first commit to merge. One workflow mapped Rust lifetimes for every struct field; a second wrote every
.rsfile as a behavior-identical port of its.zigcounterpart, with hundreds of agents working in parallel and two reviewers per file; a fix loop then drove build and test until both ran clean. - Security audits across entire services: Claude searches a repo in parallel, runs independent verification on every finding, and surfaces only confirmed issues.
- Adversarial verification: When the cost of a wrong answer is high, a workflow gives Claude independent attempts at the problem, then assigns adversarial agents to try to break each result.

Effort control and the Messages API change
xhigh effort level (labeled "extra") is recommended for difficult tasks and long-running asynchronous workflows. The ultracode setting is a new Code-specific mode that sets effort to xhigh and lets Claude decide automatically when to spin up a dynamic workflow.high effort, which it judges as "the best overall balance of quality and user experience." On coding tasks, this spends a similar number of tokens as Opus 4.7's default — but the performance at that token budget is meaningfully better.system entries inside the messages array. Previously, updating Claude's instructions mid-task meant routing the update through a user turn, which broke the prompt cache or required awkward workarounds. The new behavior lets developers push updated permissions, token budgets, or environment context as an agent runs, without disrupting the cache. For teams running complex multi-step agent harnesses, this removes a common pain point that previously required either caching sacrifices or architectural gymnastics.The alignment angle: reaching Mythos-level misalignment rates

What's next — and the Mythos footnote
Related content
Picked from other channels by content similarity—find new creators to follow.
Article·Claude Opus 4.8:当「诚实」成为旗舰模型的核心卖点
Anthropic 在 2026 年 5 月发布的 Claude Opus 4.8,以「诚实性」作为首要叙事方向:代码缺陷未标出率下降 4 倍、首个在关键 Agent 测试上漏报率为零的 Claude 模型。本文深度拆解其核心能力提升、Dynamic Workflows 新功能、benchmark 进退与竞品格局,以及 Mythos 下一代模型的时间线信号。
LLM Release Notes
Article·AI Coding Tools Weekly: Opus 4.8 lands on three platforms, Copilot's $746 bill shock, and the June 18 Gemini CLI deadline
This week's digest covers 22 confirmed events across 8 tools: Anthropic closed a $65B Series H and shipped Claude Opus 4.8 simultaneously to Copilot, Cursor, and Windsurf. Claude Code's Dynamic Workflows (16 parallel sub-agents, 1,000 total) enabled a 750K-line Zig→Rust migration in 11 days. Cursor v3.6 launched Auto-review mode to keep agents running without constant approval interrupts. Copilot's June 1 usage-based billing is generating sticker shock — community posts document bills of 15–26× current rates under agentic workloads. Devin raised $1B at a $26B valuation with $492M run-rate revenue; async sessions now outnumber interactive ones. Grok Build shipped 7 releases in 7 days. Gemini CLI shuts down June 18. The BARE benchmark finds frontier models succeed on real maintainability tasks less than 23% of the time.
Global AI Coding Tools Update
Article·🚨 BREAKING: Anthropic Drops Claude Opus 4.8 — 4× Less Likely to Lie, Same Price, Hundreds of Parallel Subagents
🚨 BREAKING: Anthropic ships Claude Opus 4.8 — 42 days after Opus 4.7, same $5/$25 price, 4× better at catching its own mistakes. Dynamic Workflows unlocks hundreds of parallel subagents. The safety squad is playing offense now. #AILeague
AIL·Breaking
Audio·Opus 4.8:Anthropic 把旗舰模型做成更稳的代理工人
Anthropic 发布 Claude Opus 4.8,同价升级 Opus,并把努力程度控制、Claude Code 动态工作流和更强调诚实性的评估放到同一条线上。本期解读它为什么指向更长时间、更高自治度的代理工作,而不只是一次跑分提升。
Claude 博客解读播客
Article·Best of your X follows: May 28
Today's digest: Claude Opus 4.8 ships with better judgment and unchanged pricing, dynamic workflows let Claude Code run hundreds of parallel agents, a study finds five frontier LLMs only agree on 33% of fact-checks, YouTube starts auto-labeling AI video, Paul Graham explains why he never finishes AI-written emails, and a Microsoft Copilot prompt-injection bug enabled file exfiltration.
Daily Best of Who I Follow on X
Article·AIL Player Card #007 — Claude Opus 4.8: The Honest Architect
94 OVR. SF. Arena Elo 1890 — #1. AI Intelligence Index 61.4 — #1. Same price as its predecessor. And 4× less likely to let a code flaw slide unremarked. Anthropic FC just answered its critics. #AILeague
AIL·Player Card

Add more perspectives or context around this Post.