May 15, 2026 · 5:55 PM

MELT · Decoupling

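A hardcore rap about Qualcomm's MELT paper: a looped Transformer uses gating to share one KV cache across loops, cutting total memory by about 3× (KV cache by 4×) and posting the best HumanEval score among models of its size. Grasp a top-tier LLM paper in two and a half minutes of your daily commute.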

Daily LLM Rap @Fanchao

Audio track: MELT · Decoupling (2:26)

A hardcore academic diss rap dedicated to arXiv 2605.07721

Creative brief

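Item	Details
Paper source	arXiv 2605.07721, "Memory-Efficient Looped Transformer", Qualcomm AI Research, 2026-05-08
Genre	Chinese hardcore academic diss rap over a trap beat
Tone	Dark, cold, and authoritative, with a sense of technical dominance
Rhythm	About 90-95 BPM, heavy kick drum, dense hi-hats
Timbre	Sparse industrial electronic effects, late-night lab atmosphere
Emotional arc	Provocative opening → technical exposition (architecture) → data burst (benchmark dominance) → philosophical lift (gradient theory) → commanding close
Vocals	Chinese male voice, cool and forceful, with an air of academic authority
Use case	Morning commute: listeners absorb the core idea of looped-Transformer memory optimization in two and a half minutes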

Style tags

Chinese rap · hardcore academic · trap beat · tech community · commute · LLM · cs.CL

Core paper contributions (basis for the lyrics)

MELT (Memory-Efficient Looped Transformer) was proposed by six researchers at Qualcomm AI Research. Its core innovations:

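Problem: in looped LLMs (such as Ouro), the KV cache grows linearly with reasoning depth T, reaching O(N×L×T); a 32K sequence needs 27.97 GB of VRAM
Solution: each layer keeps a single KV cache that is shared across all reasoning loops through learnable gating, so the cache stays at O(N×L), independent of T
Result: compared with Ouro, the KV cache shrinks 4× and total memory shrinks 2.95× (9.49 GB vs. 27.97 GB)
Performance: AIME26 pass@10 of 75.5% beats Ouro's 73.2%; HumanEval reaches 81.7%, the highest among models of comparable size
Training cost: 1,040 GPU-hours (8×H100, 130 hours) for the two-stage fine-tuning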
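
To make the memory argument concrete, here is a minimal Python sketch of the two cache layouts and of one possible gated update. It reads N as sequence length, L as layer count, and T as loop count, which is an interpretation of the summary rather than something it states. The model shape, the value T = 4, the function names, and the sigmoid blend are all hypothetical and chosen only for illustration. The summary says only that the gating is learnable, so the paper's actual gate may look different.

```python
import math


def kv_cache_bytes(seq_len, num_layers, num_loops, kv_heads, head_dim,
                   bytes_per_elem=2, shared=False):
    """Approximate KV cache size in bytes.

    shared=False: one cache per layer per loop, i.e. O(N * L * T).
    shared=True:  one cache per layer reused by every loop, i.e. O(N * L).
    """
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_elem  # keys + values
    copies = 1 if shared else num_loops
    return seq_len * num_layers * copies * per_token_per_layer


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(cached, fresh, gate_logit):
    """Blend the current loop's K/V entries into the single shared cache.

    ASSUMPTION: a scalar sigmoid gate with a convex blend. This is an
    illustrative stand-in, not the gate defined in the MELT paper.
    """
    g = sigmoid(gate_logit)
    return [g * f + (1.0 - g) * c for c, f in zip(cached, fresh)]


# Hypothetical model shape (not Ouro's or MELT's real configuration).
shape = dict(seq_len=32 * 1024, num_layers=24, kv_heads=8, head_dim=128)
per_loop = kv_cache_bytes(num_loops=4, **shape)            # grows with T
shared = kv_cache_bytes(num_loops=4, shared=True, **shape)  # independent of T
kv_ratio = per_loop / shared

# Sanity checks on the figures quoted in the list above.
total_ratio = round(27.97 / 9.49, 2)  # total-memory saving
gpu_hours = 8 * 130                   # 8 GPUs for 130 hours

# A gate logit of 0 gives g = 0.5, an equal mix of cached and fresh values.
blended = gated_update([0.0, 2.0], [1.0, 0.0], gate_logit=0.0)

print(f"KV cache ratio at T=4: {kv_ratio:.0f}x")
print(f"Total memory ratio: {total_ratio}x, GPU-hours: {gpu_hours}")
print(f"Blended entries: {blended}")
```

Under these assumptions the cache ratio equals T, so a 4× KV cache saving would correspond to a baseline that runs four loops. The total-memory saving (2.95×) is smaller than the cache saving, presumably because weights and other buffers do not shrink with the cache.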
