
June 14, 2026 · 9:18 AM
dupehound: catch the code your AI wrote twice
dupehound (Rust, MIT, v0.1.0, 22★) scans your codebase for structurally identical functions, even when identifiers have been renamed: a CI gate for AI-generated code slop.
cargo install dupehound
Coding agents don't browse your repo before they write. They can't: they work from whatever fits in context. So when you ask a model to add date formatting, it writes formatDate. Three tickets later, another model writes renderTimestamp. A month after that, stringifyDate appears in the billing service. That makes three functions with the same logic, each drifting on its own under future edits.
dupehound is a Rust CLI that finds them. It fingerprints the structure of every function rather than its text, so renamed identifiers and swapped literals don't hide a copy. [1]
MIT license. v0.1.0, released 2026-06-12. Written in Rust with tree-sitter. No network calls, no API keys, no ML model. [1]
Why an index, not a model
The author put it plainly in the README: "An LLM can't do this job. Duplicate detection compares every function against every other; a model samples what fits in context, an index checks everything." [1]
That's not a knock on LLMs. It's a constraint. A context window is finite, but an inverted index grows with the codebase and still lets every function be checked. dupehound parses each function body with tree-sitter and normalizes every identifier, string, and number to a sentinel value. It then applies winnowing (Schleimer, Wilkerson & Aiken, SIGMOD 2003) to generate structural fingerprints. Those fingerprints go into an inverted index, and exact Jaccard similarity between fingerprint sets groups the matches into clusters. The same algorithm runs every time, so the same input always produces the same verdict. [1]
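The pipeline can be made concrete with a small Rust sketch. It illustrates the same idea under stated assumptions and is not dupehound's code. A toy tokenizer for simple C-like snippets stands in for tree-sitter. The tokenizer maps identifiers, numbers, and string literals to the sentinels ID, NUM, and STR, and keeps keywords from a small hypothetical list. Every k-token window is hashed with FNV-1a, a stand-in because the article doesn't name dupehound's hash. The minimum hash of each run of w hashes becomes a fingerprint, and fingerprint sets are compared with Jaccard similarity. The values k = 5 and w = 4, the keyword list, and all function names are assumptions for illustration.

```rust
use std::collections::HashSet;

// Hypothetical keyword list: keywords carry structure, so they are kept verbatim.
const KEYWORDS: &[&str] = &["function", "return", "const", "let", "if", "else", "for", "while"];

/// Toy tokenizer standing in for tree-sitter. It emits one token per identifier,
/// number, string literal, or punctuation character. Identifiers, numbers, and
/// strings are replaced with the sentinels ID, NUM, and STR.
fn normalize(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            let keep = KEYWORDS.contains(&word.as_str());
            out.push(if keep { word } else { "ID".to_string() });
        } else if c.is_ascii_digit() {
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            out.push("NUM".to_string());
        } else if c == '"' || c == '\'' {
            i += 1;
            while i < chars.len() && chars[i] != c {
                i += 1;
            }
            i += 1; // step past the closing quote
            out.push("STR".to_string());
        } else {
            out.push(c.to_string());
            i += 1;
        }
    }
    out
}

/// 64-bit FNV-1a over a k-gram of tokens. A 0 byte is appended after each token,
/// so ["ab", "c"] and ["a", "bc"] are hashed as different byte streams.
fn fnv1a(kgram: &[String]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for tok in kgram {
        for b in tok.bytes().chain(std::iter::once(0u8)) {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// Winnowing: hash every k-gram, then keep the minimum hash of each window of
/// w consecutive hashes. Positions are dropped, so the result is a plain set and
/// tie-breaking doesn't matter. Both k and w must be non-zero.
fn winnow(tokens: &[String], k: usize, w: usize) -> HashSet<u64> {
    let hashes: Vec<u64> = tokens.windows(k).map(fnv1a).collect();
    if hashes.len() < w {
        // Too short for a full window: keep the single minimum, if there is one.
        return hashes.iter().min().copied().into_iter().collect();
    }
    hashes.windows(w).map(|win| *win.iter().min().unwrap()).collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two fingerprint sets.
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

fn main() {
    let a = r#"function formatDate(d) { const y = d.getFullYear(); return y + "-" + pad(d.getMonth() + 1); }"#;
    let b = r#"function renderTimestamp(ts) { const year = ts.getFullYear(); return year + "/" + pad(ts.getMonth() + 1); }"#;
    let c = "function sum(xs) { let t = 0; for (const x of xs) { t += x; } return t; }";
    let (k, w) = (5, 4);
    let fa = winnow(&normalize(a), k, w);
    let fb = winnow(&normalize(b), k, w);
    let fc = winnow(&normalize(c), k, w);
    println!("formatDate vs renderTimestamp: {:.2}", jaccard(&fa, &fb));
    println!("formatDate vs sum: {:.2}", jaccard(&fa, &fc));
}
```

The renamed copy normalizes to exactly the same token stream, so its fingerprint set is identical and the similarity prints as 1.00. The unrelated sum function shares few or no fingerprints. Nothing in the pipeline is random, so repeated runs give the same answer.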
A CI merge gate has to be reproducible. A probabilistic model that might return different results on two identical commits is not a usable gate. dupehound is.
Three subcommands
scan [path]: runs the full analysis and reports duplicate clusters sorted by deletable lines. Each cluster lists the representative function (marked with ★) and every copy, with line counts and similarity percentages. The summary box reports a slop score: the percentage of the codebase's significant lines that could be deleted by keeping only one copy of each duplicate cluster. [1]
Here's what the output looks like on a codebase with a date-formatting problem:
$ dupehound scan .
dupehound v0.1.0 — scanned 19 files · 370 lines · 27 functions in 21ms
╭─────────────────────────────────────────────────────────╮
│ SLOP SCORE 36.1% grade F │
│ 127 of 352 significant lines are deletable duplicates │
╰─────────────────────────────────────────────────────────╯
● Cluster 1 ─ 4 copies · 100% similar · 42 deletable lines ─────────────
★ src/utils/date.ts:1 formatDate 14 lines
src/api/timestamps.ts:1 renderTimestamp 14 lines 100% █████████
src/jobs/report_dates.ts:1 stringifyDate 14 lines 100% █████████
src/billing/dates.ts:1 humanizeDate 14 lines 100% █████████
★ = representative (kept) · dupehound scan --explain 1 shows the code
history: charts duplication across git history using monthly snapshots. It helps you find the commit range where the slop score started climbing, which often coincides with the start of an AI-assisted sprint. [1]
check: the CI gate. It fails when an incoming change introduces a new duplicate of an existing function. It also points to the function already in the codebase that the new code replicates, so the author can reuse or refactor that function instead of merging a copy. [1]
The history and check subcommands need git on PATH. scan runs anywhere.
Language support and benchmarks
dupehound supports 11 languages: TypeScript, TSX, JavaScript, Python, Rust, Go, Java, Ruby, Swift, C, and C++. [1]
On speed: a scan of VS Code (2.97 million lines, 53,000 functions) completes in 3.6 seconds on a laptop. [1]
Grade calibration against known open-source projects:
| Project | Slop score | Grade |
|---|---|---|
| express | 0.0% | A |
| gin | 0.2% | A |
| tokio | 1.1% | A |
| fastapi | 1.7% | A |
| vscode | 2.8% | A |
These serve as a sanity check for the grading scale. Well-maintained open-source projects with human reviewers all score under 3%.
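The slop score in the scan summary box is simple arithmetic, sketched below in Rust. The sketch assumes that every copy except the representative counts as deletable, which the "★ = representative (kept)" legend suggests. It takes the count of significant lines as given, because the article doesn't say how dupehound decides which lines are significant. The Cluster type and the function names are hypothetical and are not dupehound's API.

```rust
/// Hypothetical model of one duplicate cluster: the line count of each copy,
/// with the representative (the copy that is kept) at index 0.
struct Cluster {
    copy_lines: Vec<u32>,
}

impl Cluster {
    /// Lines removed by keeping only the representative copy.
    fn deletable(&self) -> u32 {
        self.copy_lines.iter().skip(1).sum()
    }
}

/// Total deletable lines across all clusters.
fn total_deletable(clusters: &[Cluster]) -> u32 {
    clusters.iter().map(Cluster::deletable).sum()
}

/// Slop score as a percentage of significant lines (0.0 for an empty codebase).
fn slop_score(deletable: u32, significant_lines: u32) -> f64 {
    if significant_lines == 0 {
        return 0.0;
    }
    100.0 * f64::from(deletable) / f64::from(significant_lines)
}

fn main() {
    // Cluster 1 from the sample report: four 14-line copies of the same function.
    let cluster1 = Cluster { copy_lines: vec![14, 14, 14, 14] };
    println!("cluster 1: {} deletable lines", total_deletable(&[cluster1]));

    // The summary box: 127 of 352 significant lines are deletable.
    println!("slop score: {:.1}%", slop_score(127, 352));
}
```

It prints 42 deletable lines for cluster 1 (three extra copies of 14 lines) and a score of 36.1% (127 / 352), matching the sample report.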
A real scenario
Your team has been shipping features with Claude Code for six weeks. The codebase grew from 40k lines to 70k. Code review is harder to do thoroughly because the PRs are large and the code looks coherent. A senior engineer on the team suspects there's structural repetition building up but doesn't have a way to measure it.
Run dupehound scan . from the repo root. The scan takes under two seconds. If the slop score comes back at 15% or higher, you have a real problem: thousands of lines that are logically redundant, each one a future maintenance surface. The cluster report shows exactly which functions to consolidate, ranked by the lines you'd save.
Add dupehound check to your pre-commit hook or CI workflow. From that point on, every PR that re-implements an existing function is rejected with a pointer to the original. That happens before merge, not three months later when two diverged copies both have bugs.
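The check gate can be pictured as a lookup in an inverted index. The Rust sketch below shows that idea under stated assumptions and is not dupehound's implementation. Fingerprint sets are toy integer sets. The FingerprintIndex type, the find_duplicate method, and the second file path are hypothetical. The 0.8 similarity threshold is an arbitrary choice, because the article doesn't say what threshold dupehound uses. An incoming function is scored only against existing functions that share at least one fingerprint with it. The best match above the threshold becomes the "pointer to the original".

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical inverted index over function fingerprints.
#[derive(Default)]
struct FingerprintIndex {
    /// fingerprint -> functions whose fingerprint set contains it
    postings: HashMap<u64, Vec<String>>,
    /// function -> its full fingerprint set
    prints: HashMap<String, HashSet<u64>>,
}

/// Exact Jaccard similarity of two fingerprint sets.
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

impl FingerprintIndex {
    /// Register an existing function and post each of its fingerprints.
    fn add(&mut self, name: &str, fps: HashSet<u64>) {
        for &h in &fps {
            self.postings.entry(h).or_default().push(name.to_string());
        }
        self.prints.insert(name.to_string(), fps);
    }

    /// Returns the best existing match for an incoming function if its similarity
    /// reaches `threshold`. Only functions that share a fingerprint are scored.
    fn find_duplicate(&self, fps: &HashSet<u64>, threshold: f64) -> Option<(String, f64)> {
        // BTreeSet keeps candidate order fixed, so ties resolve the same way every run.
        let mut candidates: BTreeSet<&String> = BTreeSet::new();
        for h in fps {
            if let Some(names) = self.postings.get(h) {
                candidates.extend(names.iter());
            }
        }
        let mut best: Option<(String, f64)> = None;
        for name in candidates {
            let sim = jaccard(fps, &self.prints[name]);
            let better = best.as_ref().map_or(true, |b| sim > b.1);
            if sim >= threshold && better {
                best = Some((name.clone(), sim));
            }
        }
        best
    }
}

fn main() {
    let mut index = FingerprintIndex::default();
    // Toy fingerprint sets; real ones would come from winnowing each function body.
    index.add("src/utils/date.ts:formatDate", HashSet::from([1, 2, 3, 4, 5]));
    index.add("src/math/sum.ts:sum", HashSet::from([10, 11, 12]));

    // An incoming function from a PR with the same structure as formatDate.
    let incoming: HashSet<u64> = HashSet::from([1, 2, 3, 4, 5]);
    match index.find_duplicate(&incoming, 0.8) {
        // A real gate would exit non-zero here, e.g. std::process::exit(1), to fail the job.
        Some((name, sim)) => println!("FAIL: duplicates {} ({:.0}% similar)", name, sim * 100.0),
        None => println!("ok: no existing duplicate"),
    }
}
```

Running it reports src/utils/date.ts:formatDate at 100% similarity. The hypothetical sum function is never scored, because it shares no fingerprints with the incoming code. That pruning is what lets an index check everything without comparing every pair.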
Install
Three paths:
Cargo (cross-platform, builds from source):
cargo install dupehound
Homebrew (macOS and Linux):
brew install rafaelpta/dupehound/dupehound
Prebuilt binaries: macOS, Linux, and Windows builds are available on the releases page. [2]
scan has no external dependencies. history and check need git on PATH.
Momentum
22 stars, 5 forks, 18 commits. The repository was created June 11, and v0.1.0 was released June 12. [1] The release is two days old at time of writing. There are no community discussion threads yet, because the tool hasn't had time to surface on HN or Reddit. That puts it in the leading-edge window. It is the kind of tool likely to land on r/rust or Show HN once a few teams hit a painful enough slop score and go looking for something like this.
The GitClear research cited in the README supplies the backdrop. Duplicated code blocks grew 8× in 2024, and it was the first year in which copy-pasted lines outnumbered moved lines across the repositories GitClear tracks. [1] dupehound is a direct response to that trend.
Caveats
- v0.1.0, 18 commits. The core scan/history/check loop works, but edge-case handling in less common language grammars and very large monorepos isn't battle-tested yet. File issues, since the author appears actively engaged.
- Function-level granularity only. dupehound fingerprints function bodies, not arbitrary code blocks or multi-function sequences. If your duplication pattern is copy-pasted logic inside a single large function, it won't catch it.
- No Windows Homebrew. The brew tap works on macOS and Linux. Windows users need the prebuilt binary from releases or a cargo install. [2]
- check is a pre-merge gate, not a remediation tool. It stops new duplication from entering, but it doesn't generate consolidation patches or suggest refactors. The scan output tells you what exists; cleaning it up is on you.
Quick start:
cargo install dupehound && dupehound scan .
Cover image: AI-generated illustration of duplicate function clusters
References