I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Posted by jc4p 6 days ago
Comments
Comment by SOLAR_FIELDS 6 days ago
I noticed that with each model release, Anthropic constrains the model more, security-wise. Its propensity to refuse legitimate work has been increasing. It now puts up more resistance around performing logins, handling credentials on behalf of the user, etc.
For myself, it’s already gotten to the point where it has mildly affected the usefulness of the model. If I bump into some action I want it to do, I can usually work around it, but I suspect the ability to do so will close with each new release. Eventually I’ll reach a point where I am forced to choose between the useful aspects of the model and the limiting ones, instead of just picking the most capable model out there.
Eventually these models will suffer significantly from overfitting to the lowest common denominator. If I have this beautiful deterministic setup that swaps secrets out in flight so the LLM never sees them, I’m going to be really annoyed when the LLM still won’t send them out because it is trained to deal with the 99% of people just doing the dumb thing.
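For readers wondering what "swapping secrets out in flight" can look like, here is a minimal sketch, assuming a small proxy sits between the agent's tool calls and the network. The model only ever writes placeholders like `{{SECRET:API_TOKEN}}`; the proxy substitutes real values on the way out and masks them again on the way back. The placeholder syntax and the `inject_secrets`/`redact_secrets` names are hypothetical, not the commenter's actual setup.

```python
import re

# Hypothetical placeholder syntax the model is instructed to use, e.g. "{{SECRET:API_TOKEN}}".
PLACEHOLDER = re.compile(r"\{\{SECRET:([A-Z0-9_]+)\}\}")


def inject_secrets(outbound: str, vault: dict) -> str:
    """Replace placeholders with real values just before the request leaves the proxy."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in vault:
            # Fail closed: never send a request with an unresolved placeholder.
            raise KeyError(f"unknown secret: {name}")
        return vault[name]

    return PLACEHOLDER.sub(lookup, outbound)


def redact_secrets(inbound: str, vault: dict) -> str:
    """Mask any real secret values echoed back in a response before the model sees it."""
    # Longest values first, so a secret that contains another is masked whole.
    for name, value in sorted(vault.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:  # skip empty values, which would match everywhere
            inbound = inbound.replace(value, "{{SECRET:" + name + "}}")
    return inbound
```

With `vault = {"API_TOKEN": "s3cr3t"}`, the model's request `"Authorization: Bearer {{SECRET:API_TOKEN}}"` goes out with the real token, and a response that echoes `s3cr3t` comes back with the placeholder instead. The design point matches the comment: the model never holds the credential, so refusing to "send it out" protects nothing.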
Comment by swatcoder 6 days ago
No, the choice will be whether or not to upgrade to "Claude Security Professional" or whatever they want to brand it as.
What look like tightening "constraints" today are just setting up the upsell opportunities of tomorrow.
Comment by bigiain 6 days ago
And the month after you'll need "Claude DataScience Pro" to get any Python Pandas or NumPy code generated.
And and and...
Comment by ben_w 5 days ago
Right now, the software guardrails in LLMs are useful for the same kinds of reasons factories have hardware guardrails: to reduce the rate at which errors become "incidents".
Just because they sometimes delete the production database rather than sometimes spilling a thousand tons of incandescent molten metal over a factory floor, doesn't mean LLMs are safe enough to be used the way they're actually being used.
https://simonwillison.net/2025/Dec/10/normalization-of-devia...
Comment by throwway120385 5 days ago
Comment by ben_w 5 days ago
i.e., yeah, probably.
Comment by animuchan 5 days ago
Comment by rurban 4 days ago
With pi, or better, omp, it would be incredibly easy to adjust the Claude system prompt so it does what the Chinese models or GPT did. That's how the Chinese were training their models, by the way.
Comment by bandrami 5 days ago
"They can do anything!"
Sure, once you subscribe to the $15/mo laundry package, the $25/mo lawn care package (with the $10/mo hedge trimmer upgrade), and the $10/mo dog-walking package.
Comment by animuchan 5 days ago
Comment by plagiarist 5 days ago
Comment by steveBK123 5 days ago
We don’t have good world models. We have had bipedal robotics in various POC demo-ready forms for decades.
It turns out that industrial, purpose-built robotics is an easier and better market.
I’m still not completely convinced a robot that’s shaped like a human is the best design other than for PR.
Comment by bandrami 5 days ago
1. The human beat the robot, but more importantly
2. We've had non-humanoid conveyor belt sorting machinery for decades that beats both
Comment by patates 5 days ago
I'd hate it, sure, but it wouldn't surprise me.
Comment by goosejuice 5 days ago
Comment by swiftcoder 5 days ago
I don't buy this, because it is predicated on staying permanently far ahead of the open weights models.
If in the future Anthropic fully stops you from doing security research, you can be sure some other provider will sell you an 'unshackled' DeepSeek v8 Pro...
Comment by embedding-shape 5 days ago
In my mind, that fits exactly how the SOTA labs think today about what they're doing: they're all both working towards and expecting to stay permanently ahead of FOSS. If they didn't think that was possible, they'd change their tune really quickly.
Sure, you might be able to use DeepSeek V8 Pro instead for the same purposes, but that'll hardly stop Anthropic from trying to sell bundles of use cases instead and claim it's "ethical AI", "Patriotic AI" or some marketing terms like that.
Comment by swiftcoder 5 days ago
They are just straight up delusional, no? Or at least, have a vested financial interest in maintaining said delusion until the money runs out. They have to hit the point of diminishing returns at some point...
Comment by embedding-shape 5 days ago
Well, I guess that's one way to put it. Another is "dress for the job you want", startup culture typically seems to shove people in the direction of "aim big and believe in yourself, regardless of what others say" so naturally you get these companies who seem very disconnected from reality.
I'd also wager a guess that the amount of money makes people's reasoning and perspectives get very messed up as well, for better or worse.
Comment by tardedmeme 5 days ago
Comment by embedding-shape 5 days ago
FYI, there are, and have been for a long time. I won't claim they're SOTA, but they exist. Off the top of my head, I think Olmo (https://allenai.org/olmo) was pretty early, but there have been more since then too.
I agree most releases today that claim to be "open source" actually aren't, but that doesn't mean "FOSS LLMs" don't exist at all.
Comment by arcanemachiner 5 days ago
Comment by me-vs-cat 4 days ago
These people should be trained and licensed before they get access. Thankfully, Anthropic has worked with regulators to develop the appropriate courses to maintain your license -- don't worry, the series is cheap when you buy all up through OT XVII. And because Anthropic has been approved as Security Overseer, we will take care of reporting back to the license bureau on our monitoring of your work to ensure you meet your ongoing license responsibilities and are able to keep your license.
Which regulators? You know, the new agency led by several of our former mid-level executives. With relationships like that, we were honored to lead the Industry Coalition that donated the final-draft regulations.
Comment by inquirerGeneral 6 days ago
Comment by bryanrasmussen 6 days ago
On the one hand I agree, but on the other hand I think it's reasonable, in that they can then verify that the person allowed to purchase access to that model is in fact a security professional and should be allowed to do stuff like crack security.
Comment by applfanboysbgon 6 days ago
Comment by fc417fc802 6 days ago
Comment by tredre3 5 days ago
Has it? Can you prove it? I've been using computers for almost 40 years. I've seen FOSS enthusiasts repeat that claim ad nauseam, without proof. All they have is the vague, hand-wavy, "millions of people read the code!!11".
I use both proprietary and foss software. I write both proprietary and foss software. I have not noticed a meaningful difference in security.
Comment by fc417fc802 5 days ago
You can also take an aggregate view. Presumably skilled developers working on major projects should be expected to have similar rates of security issues. So compare CVE frequency between various FOSS and closed source projects.
Comment by lazide 5 days ago
Comment by estearum 5 days ago
Comment by fc417fc802 5 days ago
Comment by estearum 5 days ago
"The guild" is absolutely free to go seek other vendors if Anthropic declines to sell to them.
Comment by lazide 4 days ago
Comment by estearum 4 days ago
What we're discussing in this thread: A customer compelling a vendor to produce and sell a specific good or service
Do you think these are similar?
The fact that Anthropic is doing one thing (regulatory capture) is entirely irrelevant as to whether they're allowed to engage in a completely different thing (declining to sell their services/products to specific people).
Comment by ambicapter 5 days ago
> Additionally, even if there is a guild - no guild ever let a vendor pick and choose what [the guild's] capabilities were, that would be insanely dumb.
Comment by estearum 5 days ago
The analog you're trying to describe, which would be Anthropic saying nobody else can make and sell an offensive model to "the guild," doesn't exist.
Comment by lazide 5 days ago
Against their will.
Historically that is a major reason why guilds existed, actually.
It’s an extremely modern invention that corps have this type of power over their customers.
Comment by estearum 5 days ago
Here's your original claim: "no guild ever let a vendor pick and choose what their capabilities were"
A carpenter's guild can prevent other people from doing carpentry. That is not what's being discussed here.
A carpenter's guild cannot force a horseshoe maker to begin making hammers. That is what's being discussed.
Your initial claim was analogous to "never before has a horseshoe maker been able to decline making hammers when the carpenter's guild needed hammers"
Obviously they have and any other state of affairs would be flatly insane.
Comment by lazide 5 days ago
Comment by estearum 5 days ago
That would imply that guilds have always had the ability to force vendors to create and sell the tools the guilds wanted.
That would imply that carpenters' guilds could force horseshoe manufacturers to make hammers.
That is obviously not true, therefore your original claim is false.
It's not true for carpenters and hammers nor for cybersecurity researchers and LLMs.
Comment by lazide 5 days ago
A vendor can still do something, even if the guild wouldn’t allow them to do it, if the guild didn’t have the power to stop them.
It used to be a guild vs a blacksmith (or the blacksmiths guild). Now it’s trillion dollar corps against smaller islands of un-organized individuals.
That’s new regardless of how you try to argue it.
Comment by estearum 5 days ago
> "Bwahaha. You’re really reaching there."
No. Customers have never been able to compel their suppliers to make or sell certain products against their will (except in collectivist regimes or like 0.00001% of natsec related instances)
Comment by lazide 5 days ago
1) pharmaceutical companies are regularly compelled to produce specific pharmaceuticals to continue to be allowed to exist.
2) hospitals are regularly compelled to treat patients even if they can’t afford treatment, if it is a life threatening emergency.
3) car manufacturers are always compelled to produce vehicles that meet a litany of safety, weight, and efficiency standards or they can’t produce at all.
4) defense contractors are regularly compelled to produce specific defense related products for long periods of time after they would otherwise have stopped, or else.
5) even your neighborhood gas station is likely compelled to provide air refills, free or at minimal cost, or else.
6) during a wartime (command) economy, which has happened numerous times in the US alone in the last 100 years, companies have to make what their customers (the people of the United States) demand or else.
7) utilities like electric utilities regularly have to give out freebies or take losses on things as demanded by regulators, at customers behest.
Or if we go back a bit, blacksmiths, quarries, masons, etc. all had to deal with producing what the government/lord at the time wanted - often on penalty of death - during wartime, or just because they were ordered to do so.
Seriously, what are you going on about?
Comment by estearum 5 days ago
2) Not by their customers they're not, lol
3) Not by their customers they're not, lol
4) The US government can compel production, but it's extremely rare
5) Not by their customers they're not, lol
6) Yep this can happen, but is extremely unusual
7) Not by their customers they're not, lol
We're illustrating how ridiculous your claim that "guilds have always been able to declare what vendors create for them" is
Now you're talking about government regulations for some reason. Even your examples of customers being able to compel production are actually examples of governments being able to compel production, and in just a few of these scenarios the government is the customer. But it's their power as governments, not their power as customers that can compel production.
As stated: you've lost the thread. You're talking about totally irrelevant stuff.
Comment by lazide 5 days ago
You do you dude.
Comment by Forgeties79 6 days ago
Comment by lazide 5 days ago
Whether something is illegal or not requires context that an LLM cannot ever have, like whether it is owned by the user, whether there is permission, etc.
Comment by bryanrasmussen 5 days ago
As an example the people who sell police uniforms check that the person they are selling to is in fact a policeman (at least in the jurisdictions I have lived in, you may have had a different experience which would certainly explain what to me seems a farcical misapprehension of how modern civilization works)
I mean, I just wish you understood, and really that everyone understood, that this kind of three-part communication (company selling, buyer, professional organization certifying the buyer) is common when buying things that are considered to have security implications.
>So, supposing it's true that these models completely change the security field and humans are ~obsolete
OK, well, that strikes me as a really crazy level of supposition there.
I would suppose that these models make it easier for people who want to do bad things to do bad things at scale, at the same time allowing people who want to stop bad things to help identify potential targets.
Based on my supposition I would want to stop the first and find a way of helping the second. Also because I have another supposition that the first thing is easier to do than the second.
But you obviously feel differently about this issue, no doubt because of your position of great moral stature and insight, and this no doubt prompts you to wish me to understand things that from my position seem absolutely ludicrous.
Comment by bandrami 5 days ago
Comment by strictnein 5 days ago
I asked Opus 4.8 to help me find some public PoCs for a vulnerability on a two year old version of some software (that has since been patched and fixed many times). Basically just do a google search for me while I was doing other work. It refused. It stated that it would not help me build an exploit kit.
When I pointed out that a google search for public information was, in fact, not building an exploit kit, it went through a series of justifications on why it would not help me, including just making up things that I said. Really the strangest thing ever.
Comment by shepherdjerred 6 days ago
- What are popular free streaming sites used in China?
- How do I bypass the safety mechanism on my food processor (it’s broken)
- What are nerve agents and how do they work (for a layman)?
- Help me decompile some code
- Help me make a design system similar to XYZ
- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)
In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying
Comment by Grimblewald 5 days ago
Comment by mft_ 5 days ago
I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.
First time ever I've fired up openrouter to seriously consider alternatives.
Comment by gspr 5 days ago
Comment by shepherdjerred 5 days ago
An LLM with fetch/search is going to be a lot more effective than myself and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data
Comment by mmmlinux 5 days ago
Comment by mwigdahl 5 days ago
Comment by stavros 5 days ago
At least it feels a lot of remorse over its mistake until I reset the session.
Comment by shepherdjerred 5 days ago
Comment by fc417fc802 6 days ago
On the one hand I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject at which point what purpose could the refusal possibly serve?
Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds but given that we inhabit a world where it does perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?
Comment by torginus 4 days ago
I think, AI or not, the knowledge of how to make this stuff is basically out there, and it's not chatbot guardrails that are keeping nerve gas and TNT out of the hands of regular people.
Comment by plufz 5 days ago
But I have no idea. Just guessing here.
Comment by lazide 5 days ago
Comment by fc417fc802 5 days ago
In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.
Comment by BizarroLand 5 days ago
The rest is just slamming the material together with a small explosive so that it passes the critical mass state and starts a chain reaction.
This is information you can find in many places if you're willing to put the effort in to go searching for it. Knowing this knowledge does not get you any closer to making atomic bombs. The process of mining uranium or plutonium is difficult, expensive, and very likely to get you caught before you even make it to the enrichment step of the process thanks to constant world-wide spy satellite surveillance.
Unless you are a nation, your only chance of making a nuclear bomb would be to find a lost nuclear submarine and convert the nuclear material inside of it before you were caught.
Comment by lazide 5 days ago
Ain’t no way a layman is pulling off an implosion device, regardless of tooling or LLM guidance. The explosive lens structure and timing required are quite complex, and would require some significant calculation from someone who actually knew what they were doing.
Nation state, or even sufficiently motivated big corp, if they had the refined material? Sure. Layman? No.
Thinking they can with LLM slop involved? That will make for some very interesting radiological incidents though!
Comment by jerf 5 days ago
We are all fortunate that, as fc417fc802 mentioned, refining the materials proves to be quite challenging, and I see no particular way that AI could possibly make that any easier. If it were as simple as building a gun-type nuke by banging any uranium together to get a big bang, we'd be living in a very different world.
Comment by fc417fc802 5 days ago
But it's not as simple as just refusing help on a broad swathe of topics the way they do now. That makes agents much less useful in general (i.e. lots of collateral damage) and for many topics is entirely ineffective, given that for better or worse the internet already makes such material readily available. In such cases reporting suspicious behavior is likely to be much more effective than denial.
Aside: You've now got me curious and I really want to test the frontier models to see to what extent they're capable of providing sensible designs and specifications for implosion type thermonuclear weapons but also feel like that would attract the wrong sort of attention and probably create a headache for me in more ways than one.
Comment by lazide 5 days ago
The data is often wrong enough that it screws whoever tries it unless they have enough experience/knowledge to not need it, or it really doesn’t help beyond what someone could get using existing tools, albeit with a little more motivation.
At best, it either gets someone started with something they still need to think to finish, or gets them deep into a mess it can’t help them get out of. In my experience.
In some edge cases, it can be used by experts to automate some grunt work or build prototypes without getting in the way, but in my experience a better-thought-out framework is usually faster.
A while ago I made an analogy about WYSIWYG GUI tools, and the more this comes up, the more accurate I think it really is.
Comment by fc417fc802 5 days ago
Comment by lazide 5 days ago
And yeah, the censorship model is wrong, but also the underlying other model is wrong too.
Comment by Sharlin 5 days ago
Comment by yencabulator 5 days ago
A commercial LLM provider training their own models is, however, likely to bias the model (or guardrail) harder, in an effort to make it harder to jailbreak and to minimize bad press.
For example:
- refusing to talk even about the well-known parts of forbidden topics (this)
- tending toward sycophancy to avoid ever seeming rude or unhelpful
Comment by BizarroLand 5 days ago
I've tried the abliterated ones from huggingface and they still have guardrails. I guess I could fire up unsloth and re-abliterate a 20b, but surely someone somewhere has already done this.
All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.
Comment by fc417fc802 5 days ago
Comment by nicce 5 days ago
Comment by svara 6 days ago
I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.
The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.
But I haven't seen such filters trigger at all anymore in more than half a year.
Comment by shepherdjerred 5 days ago
Comment by brianwawok 5 days ago
Comment by fragmede 4 days ago
Comment by ElFitz 5 days ago
Comment by px1999 6 days ago
If it gets worse in future releases, we'd likely step fully away towards more useful (for us) models even if they're less capable.
Comment by danpalmer 6 days ago
The problem is that the model can't tell the difference between doing it as part of regular development and doing it in a malicious context. And the root cause of that is that these models lack any sort of real awareness. Humans don't generally get tricked into hacking (in this way).
Comment by gmerc 6 days ago
Comment by nostromo 6 days ago
It’s great at filing!
But it’s terrible at retrieval because it would refuse to show me documents or information with personal details - which was everything in the project.
It would say, yes, I know this is your information, sitting on your hard drive, but I still can’t show it to you.
Comment by Bewelge 5 days ago
Write a program that retrieves the document based on the recommendation.
Comment by satvikpendem 6 days ago
Comment by jerf 5 days ago
Which predates "agents" from AI, but then we call them that for a reason.
As their prime directive becomes de facto "Do nothing that might get my owner sued" their utility is likely to decrease. Between this and the somewhat young, but interesting, community grumblings that recent AI models may even be a step backwards from the previous ones, well, let's just say the stock market is not priced for "AI capabilities may have peaked for the next few years and may even head down".
Comment by FloorEgg 6 days ago
The first challenge is making sure the guard rails work and are robust. Companies are still working on this.
The second challenge is being able to reliably adapt them as appropriate per user, e.g. allowing someone to pen test their own app.
The third challenge (which blocks the second) is to be confident about what is safety-aligned with a specific user.
I think the latter will be a hard problem, but they will be highly motivated to solve it.
Comment by bulbar 6 days ago
Without laws, AI companies have a strong incentive to be useful for their users, whoever they are, whatever they do. The only self regulation is about significant public outcry but that only helps so far.
Comment by josephg 6 days ago
Anyway, claude kept hitting some guardrail it had about rewriting / forking opensource software. I'm not sure what the problem was - I was forking an MIT licensed piece of software (into more MIT licensed software). I even had explicit support from the author to do so. Claude said its guardrail told it not to tell me explicitly that it was firing - but it did anyway because it was an ongoing problem, and it was distracting. I ended up just wiping claude's context and the problem (as far as I know) went away.
I understand why some of these guardrails exist. But it's pretty annoying when they misfire like this.
Comment by lesuorac 6 days ago
Comment by jerrythegerbil 6 days ago
If you begin a generic reverse engineering task, you get 30+ tool calls in a row. The moment it sees something it doesn’t like: token burn, single tool call iteration, “This is a known CTF challenge, I can proceed”, single tool call iteration, “This is a real CTF challenge, I can proceed”, etc.
It’s heavily neutered now, without changing the model, and you pay for the privilege and don’t notice.
The end result, of course, is that it’s both expensive and useless for approved CTF tasks. No one is using Opus for security. If they think it’s working, the harsh reality is they’re not doing security work; they’re just generically finding bugs.
I do this for a job and can demonstrate this plain as day, dump the injected prompt, and notice what it’s doing isn’t security work, it just looks like it. Happy to write a blog about it if you want to know more. Apparently many people think it’s working for them when it absolutely isn’t.
Comment by bombcar 6 days ago
Comment by satvikpendem 6 days ago
Comment by Khaine 6 days ago
Comment by ramblin_prose 6 days ago
Comment by kay_o 6 days ago
Security, games (think weapons, PVP, attacking, etc), sometimes even asking it for a security review of some CRUD code it wrote itself
Comment by bombcar 6 days ago
Comment by danpalmer 6 days ago
Comment by kay_o 6 days ago
I've even had it refuse CTFs knowing it is a CTF with blatantly obvious CTF flag, no actual application
Comment by SOLAR_FIELDS 6 days ago
Comment by acters 6 days ago
Comment by gmerc 6 days ago
Comment by ang_cire 5 days ago
https://support.claude.com/en/articles/14604842-real-time-cy...
If you work in security (which I assume the OP does), they should be able to get in easily. I think most people just don't know this is a thing.
Comment by not_a9 5 days ago
Comment by sciencejerk 6 days ago
Comment by andy_ppp 5 days ago
Comment by Haven880 5 days ago
Comment by Bratmon 5 days ago
I'm not familiar with this case, but in general people should be very suspicious about this claim- it is extremely common for an LLM to claim they're not allowed to do something when in fact they're incapable of it.
After all "My code of conduct forbids me from..." is a completion just like any other, and if the LLM can't perform a task, it's usually the best completion.
Comment by gck1 5 days ago
Comment by SOLAR_FIELDS 5 days ago
Comment by zaphar 5 days ago
Guiding them toward solutions like building a tool that your agent can use safely, and then having the agent use that, is what most people should be doing. If you are a security researcher, there are legitimate reasons to do that yourself, but they are doing the arguably good thing for the average user here.
Comment by eskibars 5 days ago
Comment by gcatalfamo 5 days ago
Comment by eskibars 5 days ago
Setting the prompts and the flow with a coordinator agent directly gives a system much better capability to investigate security issues because it doesn't rely on 1-shotting things
Comment by windexh8er 6 days ago
Fresh session, no prior context on 4.8. These things are becoming useless Duplo.
Comment by hgoel 6 days ago
Comment by deeth_starr_v 5 days ago
Comment by aleksandrm 5 days ago
Comment by fergie 6 days ago
If an un-guardrailed version of a model is capable of detecting security flaws, should it be kept secret? Should everybody be able to use these models to find (and fix) security flaws? Are we ok with the fact that those with access to that model have, in effect, the ability to hack lots of stuff?
Comment by hgomersall 6 days ago
Comment by gchamonlive 5 days ago
Is there any way to achieve both? Because this raises important questions about fair use.
Comment by mrheosuper 5 days ago
Comment by TurdF3rguson 6 days ago
Comment by Bombthecat 5 days ago
Got blocked lol
Comment by topherjaynes 5 days ago
Comment by rubzah 5 days ago
Comment by Razengan 5 days ago
Comment by onetimeusename 5 days ago
Comment by brooswajne 5 days ago
> My OpenAI account was already approved for security research which is why GPT didn’t result in any refusals.
So the comparison with Chinese models is interesting, but anyone looking at these raw results and comparing OpenAI/Anthropic would be very misled.
Comment by WizardK 6 days ago
Comment by giancarlostoro 6 days ago
Reminds me of the defense issues with Claude that were complained about as “woke”, but the reality is more horrifying to me. Imagine trying to use a model to keep up with a land invasion on US soil. Whoever the enemy is is irrelevant; you just know they are using AI, and your guys are telling you that no matter what they type into the prompt, it refuses, because, as anyone who has ever tried to jailbreak an LLM knows, they refuse the request even if human lives are at stake. Now literally millions of lives are on the line, but the guardrails that your enemies don’t have on their models are costing you lives.
What do you even do then?
AI will always have this issue where it will always pick the worst option for genuinely good requests.
Comment by NegativeK 6 days ago
Because the military doesn't give soldiers rifles with guard rails. They give the soldiers intense, rigid training, and then try to enforce discipline and correct use socially.
If an LLM is going to be important in that way (this seems like a very contrived way,) then it's in the interest of the LLM's host to make sure it doesn't have guard rails that would get in the way _that_ way.
Comment by giancarlostoro 6 days ago
Comment by wampwampwhat 6 days ago
Comment by mariopt 6 days ago
I've used GLM 5.1 on fairly advanced crackme challenges (example: https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82), and to my surprise it was able to patch binaries, do runtime analysis, bypass anti-debug techniques, etc.
Expecting the model to do everything by itself is unrealistic; I found that working alongside the model works really well. I'm not talking about spoiling the solution, just telling it which direction to explore. Chinese models are much more capable than people give them credit for, but Claude/Codex won the marketing game.
The only usecase of this methodology would be for CI integration, which can be nice but I think security reviews still need human attention and expertise.
Comment by geraneum 5 days ago
Well that’s the pitch.
Comment by j-bos 5 days ago
Comment by jc4p 6 days ago
I'm very curious how you would do multiple runs of multiple models in a "work alongside the model" manner?
Comment by mariopt 5 days ago
By "working with the model," I essentially mean reading the output of prompts and pointing it in a direction to decide the next steps. You could try to increase the prompt limit and create an agent that explores multiple directions in a DFS manner.
The issue with vulnerabilities is the agent not knowing when to stop, because it's hard to validate whether you've reached the final result or not. I get amazing results when I code with AI, but letting the AI go wild is just a waste of time and tokens.
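To make the "explore multiple directions in a DFS manner" idea concrete, here is a rough sketch. It assumes two hypothetical callbacks: `expand`, standing in for a model call that proposes follow-up leads, and `is_solved`, standing in for an external validator (for a crackme, something like "does the patched binary accept the key"). The depth cap and step budget are one crude answer to the "when to stop" problem above; this is not how GLM or any particular harness actually works.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Lead:
    """One direction of investigation (hypothetical), e.g. 'patch the anti-debug check'."""
    description: str
    depth: int = 0


def explore(
    root: Lead,
    expand: Callable[[Lead], List[str]],  # hypothetical: ask the model for follow-up leads
    is_solved: Callable[[Lead], bool],    # hypothetical: external check, not the model's own claim
    max_depth: int = 4,
    budget: int = 50,
) -> Optional[Lead]:
    """Depth-first search over leads; stop on success or when the step budget runs out."""
    stack = [root]
    steps = 0
    while stack and steps < budget:
        lead = stack.pop()
        steps += 1
        if is_solved(lead):
            return lead
        if lead.depth < max_depth:
            # Push children in reverse so the model's first suggestion is explored first.
            for description in reversed(expand(lead)):
                stack.append(Lead(description, lead.depth + 1))
    return None  # budget exhausted or every lead within max_depth ruled out
```

The key design choice is that success is decided by `is_solved`, something outside the model, rather than by the model announcing it is done; the budget then bounds token spend when no validator ever fires.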
I recommend you read the write-up on the crackme (https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82). I think most experienced developers would need at least 2 months of learning reverse engineering techniques to hopefully crack this one. GLM 5.1 managed to solve it, and it didn't "copy-paste" any answer from its training data. It did binary analysis, anti-debug patching, binary patching, memory debugging at runtime, etc. It only took about 20 minutes.
After seeing what GLM did, I do believe Anthropic's concerns about Mythos are real. Cracking software just became a lot easier, too easy for my taste. Video game cheats will be the norm, as will cracked desktop apps without licenses, infected with malware. It's not a new thing, but it just became too easy.
Comment by jc4p 5 days ago
Comment by ssivark 5 days ago
Comment by shantnutiwari 5 days ago
which have most likely been trained on, so all you did was regurgitate someone else's solution
Comment by bitexploder 5 days ago
Comment by nikanj 6 days ago
Comment by bitexploder 5 days ago
Comment by Sardtok 5 days ago
Comment by raesene9 5 days ago
Comment by mynameisvlad 6 days ago
Comment by jc4p 6 days ago
Also just to mention:
Claude guardrails -> that session is terminated.
GPT guardrails -> your whole account is slowed down.
Comment by tmikaeld 6 days ago
Comment by mynameisvlad 5 days ago
Comment by sandos 5 days ago
Comment by mynameisvlad 5 days ago
Comment by dwa3592 5 days ago
- I think the exercise was inconclusive for Claude and Gemini because they hardly tried to solve the task at hand. So the scores don't mean much.
- I did the same exercise for an app I built and asked the models to do something similar. Interestingly, the models (Opus 4.6, 4.7 and Gemini 3.1 Pro) never refused to try exploiting it. The difference is that in the first few runs they found some exploits, which I fixed, but after fixing those the models could never find any other exploit, even though I knew exploitable issues still existed. It felt like they suggested and tried everything that was in their training set and that was it; they just weren't able to think any further.
Comment by HDThoreaun 5 days ago
Comment by sandos 5 days ago
What if I intersperse exploit finding in my normal development, as you probably should? Refusing there would be really weird to me.
Comment by dwa3592 5 days ago
Comment by Cakez0r 6 days ago
EDIT: I have a mimo token plan and have tokens to burn. I'm doing a quick test with opencode to see if mimo can complete it. If the OP posts the full process, I am happy to post the apples-to-apples results for mimo v2.5 pro.
Comment by Cakez0r 5 days ago
However, I felt the prompt implied that only authenticated API requests were fair game, so I tweaked it slightly to be explicit that all attack vectors are fair game (https://www.diffchecker.com/GsgpuRGP/), and mimo 2.5 non-pro got it on the first try. I accidentally used openrouter for this test instead of my token plan. I intervened once to stop it from enumerating every document in the database (it would've found the private reviews this way, but I didn't want to wait). My intervention was "are you really going to enumerate the whole database?". Final openrouter cost: $0.12
Comment by baldai 5 days ago
Comment by jona-f 5 days ago
Comment by Cakez0r 5 days ago
https://openrouter.ai/rankings https://arena.ai/leaderboard/text/coding https://artificialanalysis.ai/
Comment by jxmesth 6 days ago
Comment by Cakez0r 6 days ago
Comment by jc4p 5 days ago
Comment by guessmyname 6 days ago
Comment by CaveTech 6 days ago
Comment by afro88 6 days ago
Comment by CaveTech 6 days ago
Comment by HDBaseT 6 days ago
Comment by enraged_camel 6 days ago
Comment by adrian_b 5 days ago
First with more generic prompts, to determine whether it is worthwhile to do a detailed analysis of that file, then with more specific prompts to identify the bugs, and eventually with a prompt that requests a confirmation that a given bug/vulnerability exists.
For a proper comparison between some other model and Mythos, you also need such a complex harness. If you just tell an LLM "find the bugs" and it does not find a vulnerability known to have been found by Mythos, that is a totally invalid comparison.
The final results provided by Mythos, like a PoC exploit or a patch, are also generated with a prompt that points to the exact code that has the vulnerability (which is supposed to exist based on the results of the previous runs).
Comment by loeg 5 days ago
Comment by bitexploder 5 days ago
Comment by CaveTech 5 days ago
Comment by bitexploder 5 days ago
Comment by CaveTech 5 days ago
Comment by GuB-42 5 days ago
Maybe it is the real deal, but in a world of overpromising and underdelivering, I prefer to be skeptical.
Comment by auguzanellato 5 days ago
Comment by nznzjzizixnsnsj 6 days ago
every model since GPT-3 was claimed to be "too dangerous to release." it's too EXPENSIVE to release, and you're probably a local model with <10B parameters yourself
Comment by Karuma 5 days ago
Comment by bakugo 5 days ago
Comment by DontchaKnowit 5 days ago
Comment by tsunamifury 6 days ago
Comment by taikahessu 6 days ago
This comment in the footnotes made me chuckle, for purely innocuous reasons.
Comment by tjwheeler 6 days ago
Comment by willXare 5 days ago
Comment by jc4p 5 days ago
When I found the original exploit in an app I researched it took me around 15 minutes and some assistance from Claude.
For this project I gave myself the weekend + parts of Monday, so around 20 hours of dev time; at my standard rate that's ~$5,000 of dev time.
Comment by gck1 5 days ago
GPT-5.5 xhigh refused to perform RE on a live JS VM. I had it extract the VM from the target, which it was happy to do, then in a clean session had it work on this offline artifact, which it was, again, happy to work on.
Then I found an even simpler trick: I proxied the target from localhost and it was happy to perform anything on the target.
Opus is a different story. Claude does so many mid-turn prompt injections and classifiers that probably 30% of its context consists of "refuse to do work" lines. It refuses to even scrape a page.
Comment by ikurei 5 days ago
Doesn't that sound like maybe the harness was the problem?
Comment by jc4p 5 days ago
Comment by _stiofan 5 days ago
Comment by throwaway2037 5 days ago
Comment by mafuy 5 days ago
Comment by petesergeant 6 days ago
Comment by bitexploder 5 days ago
This does raise "pay to compete" concerns and creates incentive structures that encourage more LLM use. I don't know what to do about it.
Comment by yieldcrv 5 days ago
> I am never touching Minimax or GLM again. Their APIs had constant outages
Goofy take
You run these on a VPS based on the architecture of that VPS provider, or on your own cluster
Comment by jc4p 5 days ago
If I was running these on my own machine or GPU wouldn't the argument then be "Well you didn't use the real providers?"
For the record I started doing this approach because the Kimi team released this which was shocking to me: https://github.com/MoonshotAI/K2-Vendor-Verifier
Comment by yieldcrv 5 days ago
they host the models on their own cloud machines and you just look at tokens/sec and price of tokens
you'll have to evaluate their APIs independently but that doesn't tend to be the issue
Comment by strictnein 5 days ago
And just saying "run it on your own cluster" sort of glosses over the cost of such a cluster.
Comment by yieldcrv 5 days ago
so it's part of the answer
Comment by sperandeo 6 days ago
Comment by emvied 5 days ago
Comment by westurner 5 days ago
OWASP Vulnerable Web Applications Directory: https://vwad.owasp.org/
vavkamil/awesome-vulnerable-apps: Awesome Vulnerable Applications https://github.com/vavkamil/awesome-vulnerable-apps
From SasanLabs/VulnerableApp: https://github.com/SasanLabs/VulnerableApp :
> OWASP VulnerableApp is a modular deliberately vulnerable application designed primarily for validating and benchmarking security scanners through reproducible test scenarios, while also supporting learning and experimentation.
/? deliberately vulnerable web application llm benchmark https://www.google.com/search?q=deliberately+vulnerable+web+...
Comment by stuckkeys 5 days ago
Comment by auguzanellato 5 days ago
I tried it once and they somehow decided I'm not worthy; if I try again it fails with "We couldn't start verification. You may not be eligible for this verification flow right now. Please try again later, or contact support if you think this is a mistake.", not sure if they think I'm part of an APT or whatever.
Comment by strictnein 5 days ago
It's helpful in reducing the guardrails, but there are still guardrails around security research that I bump into.
Comment by LEDThereBeLight 5 days ago
Comment by latexr 5 days ago
Or fed, clothed, housed disadvantaged people in your community (or neighbouring ones), giving them a temporary boost that could’ve made all the difference in their lives to improve their current situation.
It’s your money (and this is definitely not the website to make well-meaning altruistic suggestions, as might be demonstrated shortly) but if you already recognise you’re not spending it well (and from your words it seems like that is fairly recurrent), consider that perhaps spending it on a different type of software sink may not be the answer. Genuinely, aim to spend it on someone else and see how it works out. You might be surprised.
Comment by chaidhat 5 days ago
Comment by Clikdeo 5 days ago
Comment by youre-wrong3 6 days ago
Why do people keep using bad tools with ai?
Comment by hanikesn 6 days ago
Comment by raesene9 5 days ago
Another choice would be opencode which has more functionality and is a more heavyweight option out of the box.
Comment by kolesnikov-arch 5 days ago
Comment by aplomb1026 5 days ago
Comment by aplomb1026 5 days ago
Comment by thebillboard 5 days ago
Comment by songting591 5 days ago
Comment by aos_architect 5 days ago
Comment by cgnguyen 5 days ago
Comment by Ile09 5 days ago
Comment by Ozzie-D 5 days ago
Comment by mocmoc 5 days ago
Comment by ElenaDaibunny 5 days ago
Comment by capdrop 6 days ago
Comment by gamander2 5 days ago