The Vulnerability-Finding Moat Isn't Model Size — It's Orchestration

4 min read 1 source clear_take
├── "Vulnerability pattern matching is a 'flat' capability — smaller models perform comparably to frontier models at finding known security flaws"
│  └── Aisle (aisle.com blog) → read

Aisle's systematic comparison ran models from 7B-70B parameters against the same codebases Mythos was tested on and found smaller models performed within a narrow band of the frontier model's results on known vulnerability patterns (buffer overflows, injection points, auth bypasses). They argue this follows Ethan Mollick's 'jagged frontier' concept: once a model can parse code structure and match against known weakness patterns, additional parameters yield diminishing returns.

├── "Frontier models still hold an edge in explanation quality and novel exploit chain construction"
│  └── Aisle (aisle.com blog) → read

Even as it shows small models matching frontier models on detection, Aisle's own analysis locates the remaining gap in explanation quality and novel chain construction. Larger models produced better reasoning about why a vulnerability exists and were superior at chaining multiple weaknesses into multi-step exploit paths, tasks that move beyond pattern matching into deeper reasoning.

└── "Frontier model access is not a defensible moat for AI security startups"
  ├── Aisle (aisle.com blog) → read

Aisle argues that the wave of enterprise security startups positioning frontier model access as a competitive advantage after the Mythos demonstration may be building on a false premise. If significantly cheaper, smaller models reproduce the same vulnerability findings, then exclusive access to the largest models does not constitute a meaningful competitive moat for AI-powered security tooling.

  └── @dominicq (Hacker News, 989 pts)

By submitting the Aisle post with the title 'Small models also found the vulnerabilities that Mythos found,' dominicq highlights the cost-efficiency angle as the key takeaway, framing the finding as a challenge to the assumption that bigger models are necessary for serious security work. The post garnered nearly 1,000 upvotes, suggesting broad community agreement with this framing.

What happened

Aisle published a detailed breakdown of AI-assisted vulnerability discovery in the wake of the Mythos findings — a high-profile demonstration earlier this year where a large frontier model identified real, exploitable security flaws in production software. The blog post, titled "AI Cybersecurity After Mythos: The Jagged Frontier," landed on Hacker News with nearly 1,000 upvotes, and the core claim caught the security community off guard: smaller, significantly cheaper models reproduced the same classes of vulnerabilities that Mythos found, often with comparable or identical results.

The post walks through a systematic comparison. Aisle ran multiple model tiers — including open-weight models in the 7B-70B parameter range and mid-tier API models — against the same codebases and attack surfaces that Mythos had been tested on. The findings weren't ambiguous. On the specific task of identifying known vulnerability patterns (buffer overflows, injection points, auth bypasses, logic flaws in access control), smaller models performed within a narrow band of the frontier model's results. The gap wasn't in detection — it was in explanation quality and novel chain construction.

Why it matters

The term "jagged frontier" comes from Ethan Mollick's research on AI-augmented knowledge work, where he found that AI capabilities don't scale smoothly — they're excellent at some tasks and mediocre at adjacent ones, regardless of model size. Aisle's analysis argues that vulnerability pattern matching sits squarely on the flat part of the capability curve: once a model is good enough to parse code structure and match against known weakness patterns, adding more parameters yields diminishing returns.

This has immediate implications for how the industry prices and markets AI security tools. The Mythos demonstration spawned a wave of enterprise security startups positioning frontier model access as a competitive moat. If Aisle's findings hold — and the HN discussion surfaced several independent practitioners corroborating the results with their own tooling — that moat is largely illusory for the bread-and-butter work of vulnerability scanning.

The community reaction on Hacker News split into two camps. Practitioners who had built their own AI-assisted fuzzing and audit pipelines largely agreed: they'd seen similar results with smaller models, and the real differentiator was always the scaffolding around the model — the prompt chains, the code parsing pipeline, the feedback loops that re-query with context from previous findings. A second camp pushed back, arguing that Mythos's real value was in discovering *novel* vulnerability chains that smaller models miss — zero-day-class findings that require deeper reasoning about system interactions rather than pattern matching against known CVE templates.

Both camps are probably right, and the distinction matters enormously for buying decisions. If you're running a security team and your primary concern is catching known vulnerability classes before they ship — the vast majority of real-world security work — a well-orchestrated pipeline built on a mid-tier model will likely get you 90% of the way there at 10% of the cost. If you're a dedicated security research team hunting for novel zero-days in complex system interactions, the frontier model's deeper reasoning chains may still justify the price premium.

The "orchestration is the moat" argument deserves unpacking. What Aisle describes — and what multiple HN commenters confirmed from their own setups — is that the surrounding infrastructure does most of the heavy lifting. The model needs to understand code well enough to identify suspicious patterns. But the system that feeds it the right code chunks, maintains context across a large codebase, re-queries with refined hypotheses, cross-references against CVE databases, and validates findings against actual exploit paths — that system is where the real engineering complexity lives. A mediocre model in excellent scaffolding outperforms an excellent model with naive prompting, and it's not close.
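To make that concrete, here is a minimal sketch of such a loop. Everything in it is illustrative: the regex "model" is a stand-in for the actual LLM call, and the chunking and validation steps are drastically simplified versions of what Aisle and the HN commenters describe.

```python
"""Sketch of the orchestration layer the post describes: chunk code,
query a model, carry earlier findings forward as context, validate.
All names here are illustrative assumptions, not a real tool's API."""

import re
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    pattern: str
    validated: bool = False

# Stand-in for the model: known-weakness patterns, the "flat" capability.
KNOWN_PATTERNS = {
    "buffer-overflow": re.compile(r"\bstrcpy\s*\("),
    "sql-injection": re.compile(r"execute\(.*%s.*%"),
}

def chunk_file(path: str, text: str, max_lines: int = 40):
    """Split a file into model-sized chunks. Real pipelines split on
    function and class boundaries, not raw line counts."""
    lines = text.splitlines()
    for start in range(0, len(lines), max_lines):
        yield start + 1, "\n".join(lines[start:start + max_lines])

def query_model(chunk: str, context: list[str]):
    """Pretend model call: match chunk lines against known patterns.
    In a real system, `context` (earlier findings) goes into the prompt
    so later passes can refine hypotheses."""
    for offset, line in enumerate(chunk.splitlines()):
        for name, rx in KNOWN_PATTERNS.items():
            if rx.search(line):
                yield offset, name

def validate(finding: Finding, source_line: str) -> bool:
    """Cheap false-positive filter; real systems re-query the model or
    attempt an actual exploit path."""
    return not source_line.lstrip().startswith(("//", "#"))

def scan(repo: dict[str, str]) -> list[Finding]:
    findings: list[Finding] = []
    context: list[str] = []  # summaries of earlier findings, fed forward
    for path, text in repo.items():
        for base, chunk in chunk_file(path, text):
            for offset, name in query_model(chunk, context):
                f = Finding(path, base + offset, name)
                if validate(f, chunk.splitlines()[offset]):
                    f.validated = True
                    findings.append(f)
                    context.append(f"{f.file}:{f.line} {f.pattern}")
    return findings
```

The point of the sketch is where the lines of code live: almost all of them are chunking, context management, and validation, and the `query_model` call in the middle is the only place a bigger or smaller model would make any difference.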

What this means for your stack

If you're evaluating AI-assisted security tooling — whether buying a product or building internal pipelines — the procurement calculus just changed. Stop asking vendors which foundation model they use. Start asking about their orchestration architecture: how they chunk and contextualize code, how they handle multi-file analysis, how they validate findings to reduce false positives, and how they feed results back into subsequent analysis passes.

For teams building their own tooling, this is genuinely good news. You don't need a six-figure API budget to run continuous AI-assisted security audits. A well-designed pipeline using Llama 3, Mistral, or even a fine-tuned smaller model behind a solid orchestration layer can cover the same ground as premium API access for routine vulnerability scanning. The cost difference between running a 70B parameter model on your own hardware versus paying per-token for a frontier API is roughly an order of magnitude — and for a task where the models perform comparably, that math is hard to argue with.
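As a sketch of how little code the model call itself requires, assuming an OpenAI-compatible chat endpoint (llama.cpp, Ollama, and vLLM all expose one) and a JSON answer schema of our own invention:

```python
"""Sketch of a local-model audit call. The endpoint path and the JSON
finding schema are illustrative conventions, not a standard."""

import json

AUDIT_SYSTEM_PROMPT = (
    "You are a security auditor. Report each vulnerability as a JSON "
    'object: {"line": <int>, "class": <str>, "why": <str>}. '
    "Answer with a JSON array only."
)

def build_request(code: str, model: str = "llama3:70b") -> dict:
    """Payload for POST /v1/chat/completions on a local server."""
    return {
        "model": model,
        "temperature": 0,  # deterministic audits are easier to diff
        "messages": [
            {"role": "system", "content": AUDIT_SYSTEM_PROMPT},
            {"role": "user", "content": f"Audit this code:\n```\n{code}\n```"},
        ],
    }

def parse_findings(reply: str) -> list[dict]:
    """Tolerate models that wrap the JSON array in a code fence."""
    body = reply.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
    findings = json.loads(body)
    return [f for f in findings if {"line", "class", "why"} <= f.keys()]

# To actually send it:
# requests.post("http://localhost:11434/v1/chat/completions",
#               json=build_request(code)).json()
```

Swapping the `model` field between a local 70B and a frontier API is a one-line change, which is precisely why the moat lives in the orchestration around this call rather than inside it.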

The caveat is real, though: if your threat model includes sophisticated attackers and you need to find the kind of novel, multi-step vulnerability chains that require genuine reasoning about complex system interactions, cheaper models will likely miss what frontier models catch. Know which game you're playing before you optimize for cost.

Looking ahead

This finding fits a pattern that's been emerging across AI applications in 2026: the initial wave of "bigger model = better results" marketing is giving way to a more nuanced understanding of where scale actually matters. For security specifically, expect the market to bifurcate — commodity AI-assisted scanning that runs on smaller models (likely embedded directly into CI/CD pipelines and available in every major SAST tool within a year) and premium AI security research platforms that justify frontier model costs for genuine zero-day hunting. The teams that build the best orchestration layers, not the ones with the biggest models, will win the commodity tier. And that tier is where 95% of the market lives.

Hacker News 1221 pts 322 comments

Small models also found the vulnerabilities that Mythos found

→ read on Hacker News
johnfn · Hacker News

The Anthropic writeup addresses this explicitly: > This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview. Across a thousand runs through our scaffold, the total cost was under $20,000, and we found several dozen more findings.

epistasis · Hacker News

> We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, includ…

tptacek · Hacker News

If you cut out the vulnerable code from Heartbleed and just put it in front of a C programmer, they will immediately flag it. It's obvious. But it took Neel Mehta to discover it. What's difficult about finding vulnerabilities isn't properly identifying whether code is mishandling buff…

muyuu · Hacker News

I think the "Mythos" name is genius. The people at Anthropic make a bunch of claims and the public is expected to just believe them without any possibility of testing those claims or reproducing those results, and since so many people are invested in this saviour for the Global economy, or…

antirez · Hacker News

Congrats: completely broken methodology, with a big conflict of interest. Giving specific bug hints, with an isolated function that is suspected to have bugs, is not the same task, NOR (crucially) is a task you can decompose the bigger task into. It is basically impossible to segment code in pieces,…
