
HF Breakout Models, Jun 8–15: MiniMax M3, Kimi-K2.7-Code, and the License Week Builders Waited For
Four Hugging Face models cleared the >10x download-growth bar during June 8–15, led by Kimi-K2.7-Code (Moonshot, 1T/32B active, Modified MIT, 33x growth, 81.1% MCPMark Verified, $0.75/M on OpenRouter), DiffusionGemma (Google DeepMind, 25.2B/3.8B active, Apache 2.0, 311K downloads, 4× faster text generation with a documented hallucination trade-off), and Nex-N2-mini (nex-agi, 35B/3B active, 15.9x growth, Qwen3.5 derivative). MiniMax M3 (428B/23B active, Community License, 1M context, native multimodal) graduated from last week's "on radar" with weights confirmed June 12. The week's highest-downloaded model, Rio-3.5-Open-397B (189K downloads), was exposed as a weight-merge fraud by nex-agi.
LLMs
Kimi-K2.7-Code — 1T/32B active, Modified MIT, agentic coding
ANTHROPIC_BASE_URL override with full MCP, hooks, and skills preserved [2]. Pricing: $0.95/$4.00 per 1M tokens (Moonshot API), $0.75/$3.50 (OpenRouter). Unsloth GGUF arrived June 13; no official Ollama or mainline llama.cpp builds yet. Native INT4 weights require ~340GB, so self-hosting needs a multi-A100 setup.
- License: Modified MIT (commercial use permitted)
- Active params: 32B (1T total MoE), 256K context
- Deployment: vLLM, SGLang, KTransformers; Moonshot API and OpenRouter for managed hosting
- Builder angle: at $0.75/M input and $3.50/M output on OpenRouter, K2.7-Code runs roughly 4x cheaper than Claude Sonnet 4.6 on output tokens [2]. For agent loops generating substantial output (multi-file refactors, automated documentation, code review passes), that gap compounds fast. The Anthropic-compatible API makes it a drop-in swap in any Claude Code or agentic workflow. The 30% thinking-token reduction over K2.6 is meaningful for latency-sensitive pipelines. Self-hosting requires serious GPU infrastructure; for most indie builders, the managed API is the practical path.
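To make the pricing concrete, here is a minimal sketch that costs out an agent run at the two K2.7-Code price points quoted above. The per-million prices come from this issue; the workload (200 steps at 20K input and 4K output tokens each) and every name in the code are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens, as quoted in this issue."""
    input_per_m: float
    output_per_m: float


# Prices quoted above for Kimi-K2.7-Code.
MOONSHOT_API = Pricing(input_per_m=0.95, output_per_m=4.00)
OPENROUTER = Pricing(input_per_m=0.75, output_per_m=3.50)


def run_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a run with the given token totals."""
    return (
        input_tokens * pricing.input_per_m
        + output_tokens * pricing.output_per_m
    ) / 1_000_000


# Hypothetical workload (an assumption, not a measurement).
STEPS = 200
INPUT_PER_STEP = 20_000
OUTPUT_PER_STEP = 4_000

if __name__ == "__main__":
    total_in = STEPS * INPUT_PER_STEP
    total_out = STEPS * OUTPUT_PER_STEP
    for name, pricing in (("Moonshot API", MOONSHOT_API), ("OpenRouter", OPENROUTER)):
        print(f"{name}: ${run_cost(pricing, total_in, total_out):.2f}")
```

Under these assumptions the run comes to about $7.00 on the Moonshot API and about $5.80 on OpenRouter. Swap in your own token counts per step; output-heavy loops shift more of the total onto the output price.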
DiffusionGemma — 25.2B/3.8B active, Apache 2.0, 4× faster text generation

llama-diffusion-cli. Also supported: vLLM, MLX, NVIDIA NIM, Transformers [8]. Official llama.cpp mainline merge was pending as of June 15; 11 community fine-tunes and 25 quantizations were already on HF.
- License: Apache 2.0 (commercial use permitted)
- Active params: 3.8B (25.2B total MoE), 256K context
- Deployment: vLLM, NVIDIA NIM, MLX, Transformers; Unsloth GGUF for local (Q4_K_M on 24GB GPU)
- Builder angle: the speed/quality trade-off is real and Google was upfront about it. DiffusionGemma is the right call for high-throughput pipelines where factual precision is less critical than generation volume — chatbot suggestion prefill, autocomplete, creative drafts, summarization at scale. For anything requiring fact accuracy (customer-facing answers, legal or medical content, structured data extraction), the hallucination rate rules it out until the mitigation techniques are more established. The Apache 2.0 license means you can ship it immediately once you find the use case that fits.
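The fit/no-fit split described above can be written as a small routing rule. This is only a sketch: the task labels paraphrase the examples in the text, and the model identifiers `diffusiongemma` and `autoregressive-fallback` are hypothetical placeholders for whatever your pipeline actually calls.

```python
# Task kinds where generation volume matters more than factual precision
# (paraphrased from the builder-angle examples above).
THROUGHPUT_FIRST = {
    "suggestion_prefill",
    "autocomplete",
    "creative_draft",
    "bulk_summarization",
}

# Task kinds where the documented hallucination rate rules the model out.
ACCURACY_FIRST = {
    "customer_answer",
    "legal_content",
    "medical_content",
    "structured_extraction",
}


def pick_model(task_kind: str) -> str:
    """Route a task to a (hypothetical) model identifier."""
    if task_kind in ACCURACY_FIRST:
        return "autoregressive-fallback"
    if task_kind in THROUGHPUT_FIRST:
        return "diffusiongemma"
    # Unknown task kinds take the conservative path by default.
    return "autoregressive-fallback"


if __name__ == "__main__":
    for kind in ("autocomplete", "legal_content", "something_new"):
        print(kind, "->", pick_model(kind))
```

Defaulting unknown task kinds to the slower model is a deliberate choice here: a missed speedup costs less than a hallucinated answer in a flow nobody classified.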
Nex-N2-mini — 35B/3B active, license unclear, Agentic Thinking
- License: Not explicitly stated; inherited from Qwen3.5-35B-A3B-Base — verify before commercial use
- Active params: 3B (35B total MoE)
- Builder angle: 3B active parameters makes N2-mini fast and cheap to run. If you're already using Qwen3.5 derivatives in a pipeline and want a post-trained agentic variant, it's a reasonable test. The license ambiguity is the blocker for production until you trace the Qwen3.5 terms and confirm compatibility with your deployment.
Multimodal
MiniMax M3 — 428B/23B active, Community License, 1M context

git fetch origin pull/24523/head:minimax-m3); Unsloth GGUF with 11 quantization variants (IQ1_M through IQ4_XS); Ollama, LM Studio, Jan compatible via Unsloth GGUF. MLX-VLM confirmed working on Mac Studio M3 Ultra with 512GB RAM (one user reported 736 output tokens in ~31 seconds) [15]. NVIDIA NIM lists it under non-commercial. MiniMax Token Plan subscription: Plus $20/month (~1.7B tokens), Max $50/month (~5.1B tokens), Ultra $120/month (~9.8B tokens) [11].
- License: MINIMAX COMMUNITY LICENSE. Free for non-commercial use; free for commercial use under $20M revenue with attribution and email notification; written authorization required above $20M
- Active params: ~23B (428B total MoE), 1M context
- Deployment: SGLang, vLLM, llama.cpp (PR #24523), Unsloth GGUF; MiniMax API; local requires ~280GB VRAM for 4-bit
- Builder angle: For the 99% of indie builders under $20M revenue, the license is workable — the attribution requirement (a footer badge or "Built with MiniMax M3" on an about page) is friction but not a blocker. The practical ceiling is hardware: 280GB VRAM for local 4-bit means you're API-dependent unless you have a multi-GPU server. The API subscription model is a strong deal for high-token-volume workloads vs. per-token pricing at comparable capability tiers. Real-world community use cases this week included multimodal form-filling (US customs form from driver's license photo via MLX-VLM) and a GTA-style game generated entirely in-browser — both demonstrate the multimodal pipeline working end-to-end. The 109B consumer-GPU variant hinted at in the MSA paper has not been released; the community is asking for it.
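As a rough illustration of the two M3 numbers builders ask about most, the sketch below encodes the license tiers and the effective price of each Token Plan as summarized in this issue. It is not legal advice and not an official calculator: the handling of revenue at exactly $20M is an assumption (the summary only says "under" and "above"), the plan quotas are the approximate figures quoted above, and the effective price assumes the monthly quota is fully used.

```python
REVENUE_THRESHOLD_USD = 20_000_000  # threshold quoted in the license summary above


def license_obligations(commercial: bool, annual_revenue_usd: float) -> list[str]:
    """Return the obligations implied by the license summary in this issue."""
    if not commercial:
        return []  # free non-commercial use
    if annual_revenue_usd < REVENUE_THRESHOLD_USD:
        return ["attribution", "email notification"]
    # Assumption: exactly $20M is treated conservatively, like "above $20M".
    return ["written authorization"]


# Plan name -> (USD per month, approximate tokens per month), as quoted above.
TOKEN_PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def effective_usd_per_million(plan: str) -> float:
    """Effective USD per 1M tokens, assuming the full quota is consumed."""
    price_usd, tokens = TOKEN_PLANS[plan]
    return price_usd / (tokens / 1_000_000)


if __name__ == "__main__":
    print(license_obligations(commercial=True, annual_revenue_usd=3_000_000))
    for plan in TOKEN_PLANS:
        print(f"{plan}: ${effective_usd_per_million(plan):.4f} per 1M tokens")
```

On the quoted figures all three plans land near one cent per million tokens, with Max the cheapest per token. Real usage below quota raises the effective price proportionally, so the comparison against per-token APIs depends on how much of the allowance you actually burn.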
On the radar
Trust, but verify
The week's shape
References
- [1] moonshotai/Kimi-K2.7-Code · Hugging Face
- [2] Kimi K2.7-Code Developer Guide — Developers Digest
- [3] r/LocalLLaMA: moonshotai/Kimi-K2.7-Code · Hugging Face
- [4] Introducing DiffusionGemma — Google DeepMind Blog
- [5] google/diffusiongemma-26B-A4B-it · Hugging Face
- [6] Diffusion Gemma is 4× faster, but makes 6× more mistakes! — r/LocalLLaMA
- [7] Can we stop dunking on DiffusionGemma and hack it — r/LocalLLaMA
- [8] unsloth/diffusiongemma-26B-A4B-it-GGUF · Hugging Face
- [9] nex-agi/Nex-N2-mini · Hugging Face
- [10] nex-agi/Nex-N2-Pro · Hugging Face
- [11] MiniMax M3: Frontier Coding, 1M Context, Native Multimodality
- [12] r/LocalLLaMA: MiniMaxAI/MiniMax-M3 · Hugging Face
- [13] LICENSE · MiniMaxAI/MiniMax-M3 at main
- [14] MiniMax AI on X: On the M3 license
- [15] MiniMax M3 - How to Run Locally
- [16] CohereLabs/North-Mini-Code-1.0 · Hugging Face
- [17] Command A Plus GGUFs posted — r/LocalLLaMA
- [18] nex-agi/Nex-N2: Rio-3.5-Open-397B ≈ 0.6 × Nex-N2-Pro + 0.4 × Qwen — Issue #4