Anthropic's open-source framework for AI-powered vulnerability discovery
Posted by binyu 5 days ago
Comments
Comment by tptacek 5 days ago
It was a different situation 2 years ago, when there was significant cost to building your own harness (but then: you probably weren't doing AI vuln research 2 years ago). Today, I think your best bet is to look at something like this for ideas, and then just ask for your own, to fit your own work style, with your own interface, your own notion of target and effort specification, and your own alerting.
Comment by redfloatplane 5 days ago
0: https://redfloatplane.lol/blog/17-why-share/ (and related posts, I guess)
Comment by colmmacc 5 days ago
Something I think about a lot is: what is the equivalent for the software builders of today using AI tools? How do we make these harnesses exportable and portable? You might think employers would be against this, preferring to make it more costly to leave. But I actually think most will favor this because it makes people more productive more quickly. But we have to find ways to normalize it and show that there are no security leaks in the process (like what might make it into a set of personal steering prompts).
Comment by tptacek 5 days ago
Comment by happyopossum 5 days ago
Comment by aquajet 5 days ago
Comment by djfergus 5 days ago
What's the purpose of this? Just fun, or does it cause some desired behaviour?
Comment by ClikeX 5 days ago
Fun is desirable.
Comment by agravier 5 days ago
Comment by worldsayshi 5 days ago
Except for software gigs the software typically belongs to the customer so you'd need to rewrite it every time...
Comment by ClikeX 5 days ago
And contractually, any code I made was my employer's if I made it during office hours. Some even made a claim to code I would've written during my employ that would be "competitive". Luckily, there was a massive difference in what I would do in my own time versus what they did.
Comment by borski 5 days ago
Comment by ninjalanternshk 5 days ago
Comment by pjmlp 4 days ago
Comment by borski 4 days ago
Comment by pjmlp 4 days ago
Comment by borski 4 days ago
Some agencies do, however; it's dependent on the contract specifics.
Comment by krzyk 4 days ago
When I'm hired by a company (not on contract), they wipe the hard drive when I leave (well, I also do it before I hand it over sometimes). So they don't get the tools (I take them with me; it would be a waste to lose them).
Comment by borski 2 days ago
Comment by pjmlp 4 days ago
As per NDAs and work contracts, nothing is supposed to leave either employer or customer systems for unauthorised third parties.
Comment by jaxn 5 days ago
For me, it's not about the cost to leave, it's about lowering the cost of onboarding and change.
Comment by beezlewax 5 days ago
"It takes less effort for some parts of the software development life cycle" would be more correct.
Comment by jorl17 5 days ago
I've said many times that I believe "using the computer will transparently involve having it write and run code for you" (and if you're not technical you won't even know it!). What you're saying goes in that direction as well.
I feel that it's often better for us to create purpose-built tools for our lives, and with every model release, the complexity of those tools grows.
These are really personal tools: they solve a problem that other people might have, but are very tied to your own specific way of working, and would be hard to explain or adapt to someone else. So: shop jigs.
I have about 10 custom scripts and programs that are like this -- I haven't felt like this since college! Back then I had all the time in the world to customize my setup...now I have agents!
In a way, I want to show this to all my friends, but whenever I mentally trace how that would go, I realize they wouldn't really understand a bunch of the quirks they have, because they are _my_ quirks. They're reasonably complex pieces of tech that solve my problems very well, which are themselves particular versions of broader problems, and which I (at least for now) have no interest in supporting.
It's so clear we're heading in this direction, and yet so many people still believe code will be for the elites. Maybe production-code...As for the rest, I think soon your mom and dad are going to have their computer running code it wrote to serve them. Security-wise it's scary, but it's exciting to think about!
Comment by ashdksnndck 5 days ago
And even if you did… I spent months refining AI workflows that were just obsoleted by ultracode.
Comment by Npovview 5 days ago
Comment by hsaliak 5 days ago
I am sure that in many organizations, teams responsible for this sort of work have fewer and fewer users coming to them.
Comment by tptacek 5 days ago
Comment by flir 5 days ago
So I can definitely see the value in a library for constraining the chatbot to some well-worn paths.
Comment by AndrewKemendo 4 days ago
Comment by nbardy 5 days ago
We won't reuse open source libraries as libraries we import, but as design inspiration for the bespoke tools we make.
It's too cheap to make your own stuff and too expensive to be stuck with someone else's primitives.
But grounding AI Coding in existing tools is incredibly powerful.
Comment by borski 5 days ago
Comment by claud_ia 4 days ago
Comment by sieabahlpark 5 days ago
Comment by zuzululu 5 days ago
Comment by simonw 5 days ago
https://github.com/anthropics/defending-code-reference-harne... says:
> As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. You can scale parallelism up to your account's ITPM limit (roughly 10 agents per 100K ITPM).
My guess would be hundreds of dollars with Opus and thousands of dollars with Mythos.
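As a back-of-the-envelope sketch of the quoted guideline, the arithmetic can be written out as below. The throughput constants come from the quote; the `Pricing` values, the function names, and the run length are placeholder assumptions for illustration, not published rates or anything from the harness itself. Cached input tokens are ignored, so this is only a rough lower-level view of the uncached traffic.

```python
from dataclasses import dataclass

# Per-agent throughput from the quoted guideline (tokens per minute).
INPUT_TOKENS_PER_MIN = 10_000   # uncached input tokens
OUTPUT_TOKENS_PER_MIN = 2_000   # output tokens


@dataclass
class Pricing:
    """Hypothetical prices in USD per million tokens (assumed, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def max_agents(itpm_limit: int) -> int:
    """Number of agents that fit under an input-tokens-per-minute limit."""
    return itpm_limit // INPUT_TOKENS_PER_MIN


def estimate_cost(agents: int, minutes: int, pricing: Pricing) -> float:
    """Rough USD cost of running `agents` in parallel for `minutes`."""
    input_tokens = agents * minutes * INPUT_TOKENS_PER_MIN
    output_tokens = agents * minutes * OUTPUT_TOKENS_PER_MIN
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


if __name__ == "__main__":
    placeholder = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)  # assumed
    agents = max_agents(100_000)  # 100K ITPM account limit
    print(agents, "agents,", f"${estimate_cost(agents, 60, placeholder):.2f} per hour")
```

Swapping in real prices and a realistic scan duration gives a quick sanity check on guesses like the one above; the per-model ratio of input to output price matters as much as the headline rate.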
Comment by nikcub 5 days ago
May even be an order of magnitude more
Comment by Mtinie 5 days ago
Ensuring code isn’t bad is the expensive part.
Comment by chrisweekly 5 days ago
The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.
Comment by fny 5 days ago
Comment by chrisweekly 5 days ago
Comment by kenjackson 5 days ago
Comment by tptacek 5 days ago
Those costs can be extremely high.
Comment by andai 5 days ago
I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.
(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)
I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)
Comment by sam-cop-vimes 5 days ago
Comment by smt88 5 days ago
Comment by XCSme 4 days ago
Comment by windexh8er 5 days ago
Comment by bflesch 5 days ago
The basic security flaws with regards to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.
Comment by simonw 5 days ago
The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.
Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.
Comment by tptacek 5 days ago
Comment by chrisweekly 5 days ago
Comment by froggit 4 days ago
Vulnerability discovery has essentially moved to a "proof of work" computation model with AI that has some similarities to crypto like BTC or Ethereum 1.0. I don't see any reason a well-funded adversary couldn't use this same process on open-source code to develop exploits. I'm sure AI would be happy to try to create exploits from the results rather than fixes.
This sort of proof of work has a notable difference from crypto in the asymmetric nature of what each side is targeting. In crypto, each miner was attempting to find a solution to the same problem and they would all move on to a new one once a solution is found. However with AI vulnerability scanning, the non-deterministic nature means an adversary is likely to find different vulnerabilities. Even if it doesn't, the adversaries have a different post-discovery workflow (i.e. probably less compute intensive aka cheaper due to only needing one viable exploit to win) than the software maintainers do.
The possibility that the adversary and their target could both do all this while running Claude puts Anthropic in a real "Merchant of Death" position.
Comment by tptacek 4 days ago
Comment by bflesch 5 days ago
The goal of AI-generated code should not be that one needs an AI-based security review tool on top of it, but that the AI-generated code in itself is reasonably secure.
Comment by ethanmg 4 days ago
Comment by iammrpayments 5 days ago
Comment by bobkb 5 days ago
Comment by niros_valtos 5 days ago
Comment by pixl97 5 days ago
Comment by Quinner 4 days ago
Comment by binyu 5 days ago
Comment by eranation 5 days ago
It's an estimate, so it might be wrong, but it gives the ballpark based on our experience. Happy to hear everyone's feedback.
Comment by Terretta 5 days ago
But even this larger number, in turn, can be about 1/10th the cost of a formal engagement to discover the type of findings it seems to be going for: things that do not show up from PR reviews or even /security-review without the pre-work steps in the open-source framework guided by an expert. That's not counting the time and delay to figure out how to do that engagement.
Bluntly: if it matters, while this is a month's vibing budget for a single scan, it is also "pennies on the dollar" dirt cheap.
At the same time, its findings still need an expert. Its suggestions may be helpful or actively harmful, depending on the prework quality.
Recommendation to IT department heads: spend a couple grand on this, use the scare page to rustle up the budget to build a relationship with a red team that can find, triage, help remediate if needed, and train your in-house team to be "security minded".
Comment by Analemma_ 5 days ago
Comment by xerxes249 5 days ago
Comment by sofixa 5 days ago
Comment by beering 5 days ago
Comment by jazz9k 5 days ago
This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.
Comment by vessenes 4 days ago
Comment by vb-8448 5 days ago
Comment by mmaney13 5 days ago
Reminiscent of the early days of tax automation where importing a W2 cost hundreds of dollars until people realized typing in 6 boxes worth of data was easy and paying the automation fee ate up their entire tax return.
Comment by kolesnikov-arch 5 days ago
Comment by lanyard-textile 5 days ago
Hm :)
Comment by Hamuko 5 days ago
Comment by politelemon 5 days ago
Comment by skeledrew 5 days ago
Comment by spacebacon 5 days ago
https://github.com/space-bacon/SRT
Significantly improve every frozen model overnight. LFG.
Comment by baby 5 days ago
Every week I see bugs (as an auditor) that our own harness (https://zkao.io/) can't find, and we have to figure out pretty interesting techniques in order to make the tool find them. Mind you I'm talking mostly about cryptographic vulnerabilities, not just webapp bugs. So IMO it's going to make a lot of sense for companies to have both their own harness (as tptacek is talking about) and pay for services that focus on making a good harness from experience (and audit firms are going to be the best at doing this, as they see a lot of bugs and can spend time "teaching" their harness about these bugs)
On the other hand, you have to find equally good techniques to triage, because otherwise you just have some machinery that I call "vibe auditing" that just produces enough false positives to tire all the developers (who are already overwhelmed with crappy AI submissions in bug bounties and other AI tools that review all of their PRs).
At the end of the day, when your harness doesn't return any bug, you're left wondering "does it mean there's no bugs?" We're basically back in this reputation game, where you want to use the best tool, or the best team (that knows what the best tools are), and need to figure out which one that is.
Comment by richardbarosky 5 days ago
Something that stands out is that for the strongest use cases, AI companies will prefer to sell the technique as a service rather than its raw output. For use cases where the output is less valuable, tokens are sold. If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS software in any industry they want.
The same way as someone selling an expensive course in the stock market is signaling that they have more to gain by selling the course rather than taking their knowledge and making money in the stock market directly.
Comment by dgellow 5 days ago
Or they want to diversify
> If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly.
That requires building and selling a whole product they have little experience with, competing with their own customers. Not a great place for an AI vendor still trying to establish itself. It's a lot of distraction when you already have a lot to deal with in the existing business. And strategically not too valuable.
Comment by kenjackson 5 days ago
Comment by dgellow 5 days ago
Comment by Kiro 5 days ago
I don't understand this argument. I've run and sold a semi-successful SaaS. The exhausting and frustrating parts are all the things an LLM cannot help you with. Coding the product is not the bottleneck or what grants you success.
Comment by zuzululu 5 days ago
Comment by richardbarosky 5 days ago
Agree, and I think that's the core of my point.
Not that it's irrational or doesn't make sense to sell tokens for purposes of software dev, but that if tokens were a true game changer for success in software dev, they wouldn't be leading with token sales, the same way they're not leading with token sales for security stuff -- it's more like "Contact Sales".
Comment by hyperpape 5 days ago
This doesn't follow at all. Anthropic's revenue is growing 10x year over year selling tokens. Their tokens can be super magical, let them enter established industries and displace incumbents, and get 100% annual growth in those industries, and they would still be better off prioritizing selling tokens, because it's a great business.
What your argument shows is that there are limits. Their tokens are not quite powerful enough to make infinite money instantly in every area of software. Admittedly, that does seem true.
Comment by morpheos137 5 days ago
Comment by latentsea 4 days ago
Comment by morpheos137 4 days ago
Comment by skybrian 5 days ago
We started out with many companies forbidding their employees to use remote LLMs on their source code because of security concerns. Now many companies are starting to believe that they must analyze all their source code with remote LLMs because of security concerns. When trusting Anthropic becomes normalized, that means they can sell more services that require access to the source code.
Comment by Melatonic 5 days ago
Comment by therealdrag0 5 days ago
Comment by derf_ 5 days ago
If hardware were so magical in creating new value generally, TSMC would be designing the chips instead of selling fabrication as a service.
That is what US chip companies used to do, by the way (back when there was silicon in Silicon Valley, before they got their lunch eaten by Taiwan). If TSMC had to design all of the chips they fabricate now, they would be doing a lot less business. Conversely, if any other company that wanted to design a chip had to build their own cutting-edge fab first, NVIDIA would not exist.
Comment by energy123 5 days ago
Comment by DrewADesign 5 days ago
Why do you say that? I reckon lots and lots of companies that aren't monopolies sell software. Having competition, even stiff competition, isn't anathema to running a business.
Comment by energy123 5 days ago
But they can't do that because they aren't monopolies.
Comment by DrewADesign 5 days ago
Just to clarify, I’m not the person you initially replied to.
> "They wouldn't be selling tokens directly ... They'd hoard them" But they can't do that because they aren't monopolies.
Hoarding them— not selling any of them, but instead using them internally and selling the products created by them — doesn’t at all seem like it would require a monopoly.
Comment by dclavijo 5 days ago
Comment by majicDave 5 days ago
Comment by napoleond 5 days ago
Comment by DrewADesign 5 days ago
Comment by lateral_cloud 5 days ago
Comment by HarHarVeryFunny 4 days ago
This makes for a somewhat amusing set of product offerings given that according to Dario 90% of all software is being AI generated.
Maybe next they can sell something to find the bugs in the security scanner?
Comment by bwfan123 4 days ago
So, tokens are used to produce sloppy code, and then this thing uses more tokens to fix vulnerabilities in the slop? What's not to like in this business model? Similar to Microsoft's. Create an OS which is vulnerable, and then enable business models for anti-virus software. Everyone wins.
More seriously, linters are turned off in CI because the amount of time spent chasing false positives is prohibitive.
Comment by yalogin 4 days ago
Comment by bobkb 5 days ago
I have been working on and using a similar tool for a while now:
https://github.com/bobinson/vulture
I have been struggling with false positives and using Claude + MCP as a poor man's audit tool. In the last few days I have found better results with NVIDIA-hosted models.
Comment by newaccount12344 5 days ago
Comment by cpard 5 days ago
This is the equivalent of Claude Design but for security.
Different harness, different packaging and obviously different distribution because the persona is different.
It’s funny because from all the posts I’ve read from companies reporting on Mythos, everyone is building their own harness for it.
Cisco even published a specification for one.
But Anthropic is the one who has figured out how to package and distribute this. Great GTM!
Comment by ElijahLynn 5 days ago
Comment by Zetaphor 5 days ago
Comment by ElijahLynn 4 days ago
Comment by sciencejerk 5 days ago
Comment by crooked-v 5 days ago
Comment by napsterbr 5 days ago
Comment by trilogic 5 days ago
Be aware: the .py/s will not pass the antivirus but basically they do the job.
Comment by NotPractical 4 days ago
Comment by madduci 5 days ago
Nice
Comment by wslh 5 days ago
Comment by bigmattystyles 5 days ago
Comment by rms2ds 5 days ago
Comment by LazyR3nR3n 5 days ago
Comment by sylware 4 days ago
Comment by ElijahLynn 5 days ago
That repo is Anthropics.
This post title should clarify that it is not Anthropic (no "s").
Comment by olcay_ 5 days ago
Comment by edot 5 days ago
Comment by sumedh 5 days ago
Comment by ElijahLynn 4 days ago
Comment by gulbanana 5 days ago
Comment by SubiculumCode 5 days ago
Comment by zoobab 5 days ago
Comment by extr 5 days ago
Comment by euroderf 5 days ago
Comment by Yokohiii 5 days ago
Comment by eranation 5 days ago
tl;dr - not that it's surprising, but it's not cheap, especially if you want to do this continuously.
Comment by leetrout 4 days ago
Like others I suspect this is exactly what they are going to paywall with product features going forward.
Comment by bartoszcki 5 days ago
Are they making 8x more features or the same amount just with more code?
Comment by crooked-v 5 days ago
Comment by terekhindc 2 days ago
Comment by xuzhenpeng 5 days ago
Comment by afford-ai 5 days ago
Comment by volume_tech 4 days ago
Comment by sspoisk 4 days ago
Comment by EvanXue 5 days ago
Comment by Xotic007 4 days ago
Comment by dangrafham 4 days ago
Comment by eddysir 4 days ago
Comment by notenkidev 5 days ago
Comment by edgardurand 5 days ago
Comment by JamesConnor_Dev 2 days ago
Comment by Maya_Andersson 4 days ago
Comment by xinchen03 5 days ago
Comment by aos_architect 4 days ago
Comment by continueops_com 4 days ago
Comment by vladsiu 5 days ago
Comment by jungfty 5 days ago
Comment by dclavijo 5 days ago
Comment by zoobab 5 days ago