Did Claude increase bugs in rsync?
Posted by logicprog 4 days ago
Comments
Comment by RustyRussell 4 days ago
https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0
(Disclosure: while I haven't talked with him in years, Tridge was my colleague and mentor for many years. I feel it is worth considering his view before joining a crusade)
Comment by jorvi 4 days ago
I don't entirely understand what this is saying. People wouldn't have been outraged if only the tests had been updated and/or he pushed solely on master - but he pushed breaking changes onto the release branch(es) too. Breaking workflows that have worked for years is a prime way to get people irate, and then seeing "Claude" in the commits just pours gasoline onto the fire.
Comment by RustyRussell 4 days ago
Rsync has many options: I can totally believe that fixing a bug in one place broke someone's usage, to be fair.
Comment by jpalomaki 3 days ago
Comment by matheusmoreira 4 days ago
I think it's pretty sad that he even had to write it. Quite a lot of judgement from people who aren't paying his bills.
Comment by Laurel1234 3 days ago
Comment by matheusmoreira 3 days ago
Comment by Laurel1234 3 days ago
Comment by rickydroll 3 days ago
Comment by noddybear 2 days ago
Comment by rickydroll 2 days ago
Also, since it's a separate entity, its lifetime is not tied to the owner. So if the owner dies, their shares are inherited by somebody else, and the company keeps operating.
It helps in raising money for business operations. A corporation raises capital by issuing and selling shares of stock. However, if a physical person did that, I think it would be called indentured servitude.
Comment by advael 19 hours ago
Comment by rickydroll 18 hours ago
I was trying to show that it is not "merely a legal expedient", that corporate personhood had a specific purpose, and that it differed from a real person. I think that the confusion about legal personhood in corporations comes from how lawyers explain its existence. A couple of lawyers I've had explained it as, it's just like a person in the law, except where it's different.
The problem is that we haven't created a clear enough distinction between a natural person and a legal person. In many cases, corporations have rights but not the responsibility. For example, they have speech rights, but they don't go to jail when the corporation commits a crime. The judicial inequalities between ordinary people and rich people are even greater between natural persons and corporations.
Comment by advael 16 hours ago
A model that treats what effectively amounts to a body of assets united by a charter as equivalent to a person - except when it isn't - is inherently confusing because these are not at all similar kinds of entities. While it's clear that this model has a purpose, I think people are right to point out that the equivalence is drawn by rather stilted logic and even more right to question whether the consequences of this legal framing are desirable from their perspective
Comment by Laurel1234 2 days ago
Comment by rickydroll 19 hours ago
Comment by anamax 2 days ago
Are AIs?
Comment by rickydroll 2 days ago
Comment by varispeed 3 days ago
Comment by dnnddidiej 4 days ago
Comment by el_io 3 days ago
Comment by guilhas 2 days ago
When you churn out more lines of code in a few days than you changed in months, and then ship them as a normal release, I'm not sure you should be expecting "constructive criticism".
Also, if I suspect the project is just slopping out a large amount of code without proper thought, I probably won't invest my time in reading those changes.
Comment by advael 19 hours ago
I think he makes a lot of good points here, but also think that kind of statement is unlikely to assuage the real concerns of people using the software. I think people are more likely to fork rsync now rather than rely on a more diverged earlier alternative implementation though
Comment by nullc 4 days ago
Comment by GodelNumbering 4 days ago
original commit: https://github.com/RsyncProject/rsync/commit/d046525de39315d...
```
-    if (!ptr)
-        ptr = malloc(num * size);
-    else if (ptr == do_calloc)
+    if (!ptr || ptr == do_calloc)
         ptr = calloc(num, size);
```
Written with Claude. This is a good example of what slips through LLM attention. It forces all allocations to be calloc as if it is a strict upgrade. For large and recursive allocations, this becomes a significant cost.
reverted in https://github.com/RsyncProject/rsync/commit/7db73ad9a1b8721...
If you read the description of the revert even half carefully, it's easy to tell that even that was written by an LLM.
I can understand the sentiment of whoever posted the original thread.
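To make the behavior change in the quoted diff concrete, here is a minimal sketch in C. It is not rsync's actual code: the function names `alloc_before` and `alloc_after`, the `DO_CALLOC` sentinel, and the `realloc` fallback branch are assumptions made for illustration, since the diff only shows the first two branches. The point is only that in the second variant a `NULL` pointer no longer reaches `malloc`, so every fresh block is zero-filled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sentinel: a caller passes this as `ptr` to request zeroed memory. */
static char calloc_marker;
#define DO_CALLOC ((void *)&calloc_marker)

/* Variant A (before the change): NULL means "new block, contents unspecified". */
void *alloc_before(void *ptr, size_t num, size_t size)
{
    /* Guard against num * size overflowing size_t. */
    assert(size != 0 && num <= (size_t)-1 / size);
    if (!ptr)
        return malloc(num * size);
    if (ptr == DO_CALLOC)
        return calloc(num, size);
    /* Assumed fallback: grow or shrink an existing block. */
    return realloc(ptr, num * size);
}

/* Variant B (after the change): NULL and the sentinel both use calloc,
 * so every new block is zero-filled whether or not the caller needs that. */
void *alloc_after(void *ptr, size_t num, size_t size)
{
    assert(size != 0 && num <= (size_t)-1 / size);
    if (!ptr || ptr == DO_CALLOC)
        return calloc(num, size);
    return realloc(ptr, num * size);
}
```

Calling `alloc_after(NULL, n, sz)` returns zeroed memory where `alloc_before(NULL, n, sz)` returns uninitialized memory. How much the extra zeroing costs depends on the allocator and on allocation sizes, so it is worth measuring rather than assuming in either direction.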
Comment by wolletd 4 days ago
That's exactly what I'd expect when someone is excited about AI usage and becomes... well, sloppy.
Comment by logicprog 4 days ago
"Like many developers of open source packages I’ve been hit by a flood of security reports lately in my role as the rsync maintainer. Many of those reports are AI generated (not all though, there are some notable ones with very careful and high quality manual analysis).
As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues (so I find at least some of them before other people!) and the addition of a whole lot of defence-in-depth hardening techniques. This is all a huge amount of work. "
Comment by Eufrat 4 days ago
I honestly think the main problem is Tridge just failed at communicating any of this correctly and I don’t think the implication he gives that all of this was due to the urgency of the impending security apocalypse really holds water.
Why was all of this written straight to the master branch? Now that the release is out, why not better explain what the urgency of this release was? Why wasn’t he proactive in communicating this and instead let the mob make up their own story? I think a lot of people are inclined to give Tridge a lot of leeway due to the fact that he literally is the reason why rsync exists, but this was avoidable and I think the comment in his response post where he mentions that, “I’d rather be out sailing than working on rsync security issues, so I have reached for several AI tools to help with what needs to be done,” speaks volumes as to what is going on.
Comment by rsc 4 days ago
Tridge doesn't owe anyone anything as far as rsync is concerned. Yet he is spending his time maintaining it, only to be attacked for his efforts.
To respond to the specific technical point, there really _is_ a flood of security reports arriving everywhere in the past few months. The jury is out on whether Mythos is that much better than alternatives, but even the publicly available models are _highly_ capable of finding real problems, and they are being employed to that end quite effectively. Here are the counts of security issues fixed in each monthly Go minor release going back to the start of 2024:
0 2024-01-09 Go 1.21.6, Go 1.20.13
0 2024-02-06 Go 1.21.7, Go 1.20.14
5 2024-03-05 Go 1.22.1, Go 1.21.8
1 2024-04-03 Go 1.22.2, Go 1.21.9
2 2024-05-07 Go 1.22.3, Go 1.21.10
2 2024-06-04 Go 1.22.4, Go 1.21.11
1 2024-07-02 Go 1.22.5, Go 1.21.12
0 2024-08-06 Go 1.22.6, Go 1.21.13
3 2024-09-05 Go 1.23.1, Go 1.22.7
0 2024-10-01 Go 1.23.2, Go 1.22.8
0 2024-11-06 Go 1.23.3, Go 1.22.9
0 2024-12-03 Go 1.23.4, Go 1.22.10
2 2025-01-16 Go 1.23.5, Go 1.22.11
1 2025-02-04 Go 1.23.6, Go 1.22.12
1 2025-03-04 Go 1.24.1, Go 1.23.7
1 2025-04-01 Go 1.24.2, Go 1.23.8
1 2025-05-06 Go 1.24.3, Go 1.23.9
3 2025-06-05 Go 1.24.4, Go 1.23.10
1 2025-07-08 Go 1.24.5, Go 1.23.11
2 2025-08-06 Go 1.24.6, Go 1.23.12
1 2025-09-03 Go 1.25.1, Go 1.24.7
10 2025-10-07 Go 1.25.2, Go 1.24.8
* 2025-10-13 Go 1.25.3, Go 1.24.9
0 2025-11-05 Go 1.25.4, Go 1.24.10
2 2025-12-02 Go 1.25.5, Go 1.24.11
6 2026-01-15 Go 1.25.6, Go 1.24.12
2 2026-02-04 Go 1.25.7, Go 1.24.13
5 2026-03-05 Go 1.26.1, Go 1.25.8
10 2026-04-07 Go 1.26.2, Go 1.25.9
11 2026-05-07 Go 1.26.3, Go 1.25.10
3 2026-06-02 Go 1.26.4, Go 1.25.11
* The Go 1.25.3 and Go 1.24.9 releases were a fast follow to fix a problem introduced by one of the security fixes the previous week.
You can see that 2026 has been quite different from the previous years. There are plenty of other contemporaneous accounts from other security teams about the load increase they've seen (which again is almost entirely not Mythos).
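For readers who want the per-year picture behind that statement, the table can be tallied with a few lines of C. This is only a sketch: the struct and function names are made up for illustration, the counts are copied from the table above, and the 2025-10-13 fast-follow row (marked `*`) is left out because it carries no count.

```c
#include <stddef.h>

/* One row of the table: release year, month, and security fixes shipped. */
struct release {
    int year;
    int month;
    int fixes;
};

/* Counts copied from the table above; the "*" fast-follow row is omitted. */
static const struct release releases[] = {
    {2024, 1, 0},  {2024, 2, 0},   {2024, 3, 5},  {2024, 4, 1},
    {2024, 5, 2},  {2024, 6, 2},   {2024, 7, 1},  {2024, 8, 0},
    {2024, 9, 3},  {2024, 10, 0},  {2024, 11, 0}, {2024, 12, 0},
    {2025, 1, 2},  {2025, 2, 1},   {2025, 3, 1},  {2025, 4, 1},
    {2025, 5, 1},  {2025, 6, 3},   {2025, 7, 1},  {2025, 8, 2},
    {2025, 9, 1},  {2025, 10, 10}, {2025, 11, 0}, {2025, 12, 2},
    {2026, 1, 6},  {2026, 2, 2},   {2026, 3, 5},  {2026, 4, 10},
    {2026, 5, 11}, {2026, 6, 3},
};

static const size_t release_count = sizeof releases / sizeof releases[0];

/* Total security fixes across all listed releases in the given year. */
int fixes_in_year(int year)
{
    int total = 0;
    for (size_t i = 0; i < release_count; i++)
        if (releases[i].year == year)
            total += releases[i].fixes;
    return total;
}

/* Number of listed releases in the given year. */
int releases_in_year(int year)
{
    int n = 0;
    for (size_t i = 0; i < release_count; i++)
        if (releases[i].year == year)
            n++;
    return n;
}
```

With the table's numbers, `fixes_in_year` gives 14 for 2024, 25 for 2025, and 37 for the six 2026 releases listed so far.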
Also, the number of reports we are receiving has gone up far faster than the number of actual vulnerabilities. Over the 75-month period from January 2020 to early April 2026, the final 30 days accounted for ~16% of the reports.
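As a rough sanity check on that figure, assume (purely for comparison) that reports had arrived at a uniform rate and treat 30 days as about one month. Then a single month would hold about 1/75 of the total:

```latex
\[
\frac{1}{75} \approx 1.3\%,
\qquad
\frac{0.16}{1/75} = 12
\]
```

Under that assumed baseline, the final 30 days saw roughly twelve times the average monthly volume.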
It is easy to believe that Tridge is seeing a similar flood of reports. More reports means more fixes means more code changes means more bugs.
Comment by wolletd 3 days ago
Which, in general, is totally legit. Doing something voluntarily doesn't relieve you from criticism if what you are doing isn't good.
Comment by mgfist 3 days ago
Comment by throwaway7356 3 days ago
Comment by laserlight 3 days ago
Comment by tumetab1 3 days ago
Recent examples are in certificate validation logic, one issue after another... because it's a mess of a thing to implement.
Comment by rakel_rakel 4 days ago
> More reports means more fixes means more code changes means more bugs.
Sounds like we'll be riding a downward spiral for the foreseeable future? It will be very interesting to see how stats like the ones you shared develop in the coming year(s).
From the article I find this a bit concerning:
> So: the Claude releases changed way more lines of code than historical ones, but didn't have more bugs. More code, same bugs. That's not what you'd expect if Claude were making things worse.
More code, same bugs, is a net negative, no? I mean unless it's strictly needed for the inherent complexity of the program. But I've seen a tokenizer written by Rob Pike and I've seen a tokenizer written by Claude.... they are not the same :D
Comment by bonzini 4 days ago
Comment by JetSetIlly 4 days ago
Much of the language from both groups is incredibly off-putting, frankly. Tridge in his blog post describes people as "foaming at the mouth"?!
The rhetoric around this has gotten way too emotional from both groups.
I'm glad I'm just a hobbyist.
Comment by lemming 3 days ago
Did you see the picture in the article where the user posted a picture of them strangling the maintainer? I think “foaming at the mouth” is probably gentler than how I would characterise that.
Comment by Eufrat 4 days ago
Comment by JetSetIlly 4 days ago
Agreed. The way to address it though, is through calm analysis and reason. The emotional language from both groups is not helping.
If there's one problem with Claude et al, it's that it's all happened way too quickly for people to keep up. We're all at different stages of acceptance and I think that's what we're seeing manifest in the various discussions.
Comment by Barrin92 4 days ago
I do hope you see the irony of accusing people of armchair psychology and then hitting us with the five stages of grief.
I trust rsync (which handles critical data on my system) because I know a veteran of 40 years wrote the code it runs. If I see code like the one above posted by the OP, that the author wouldn't have written, I start to pay attention. When I then read the blog post of him saying that he'd "rather go sailing than fix rsync issues", I start to question whether the software is still written in a way I can trust and where it's going quality wise.
The problem isn't this weird gaslighting attempt that we just haven't let Claude in our hearts and souls yet which you seem to have determined is inevitable (spoiler alert, it is not), it's that a bot wrote crappy code and I wasn't even aware I was running it and now don't know to what standard this project is held.
Comment by cthalupa 4 days ago
Except the author did write it. https://github.com/RsyncProject/rsync/issues/959#issuecommen...
Which is part of the problem with all of this nonsense right now - everyone is running off of emotion and not looking to see if what is being said is actually true. Which is somewhat ironic, considering the message of the article.
Comment by oxzidized 2 days ago
I just want to point out that those were two different commenters.
Comment by Eufrat 4 days ago
I agree that the entire episode is obscene, but I am also unsure of what to do here either. On some level this is the same problem movie stars run into. I agree that guessing or waxing about the motivations of anyone is a nosy and overall unproductive exercise (yet paparazzi exist because of this very human behavior), but I also think that there is a modest duty owed to users to explain things.
> Tridge doesn't owe anyone anything as far as rsync is concerned. Yet he is spending his time maintaining it, only to be attacked for his efforts.
I am reminded of this piece: https://mikemcquaid.com/open-source-maintainers-owe-you-noth...
Which, I empathize with, but I fundamentally disagree that maintainers owe users nothing. I will die on that hill. If you are getting to that point where you actively loathe working on the project, I agree you should be able to walk away. However, I strongly believe that when you create something for people to use that there’s an implicit social contract about how to go about doing certain things.
I suppose in a very extreme and intentionally histrionic example, having a project carry the MIT license, getting frustrated and then changing the project to delete the entire system is a crime. The average person and the courts don’t care if the license is “as-is”. There is a duty that is understood that you don’t do that and I think we need to make it clear what that duty is for OSS.
Ultimately, though, I think this is all symptomatic of the fact that the OSS model has gaps that the increase in security reports whether AI generated or not has exerted more pressure on. I have certainly been on the receiving end of a lot of frivolous security reports that were discarded because it was obvious that it was just someone with a security scanner wandering around the Internet. You still have to review that nonsense and it eats into your time. Doing this on your own time, without pay and having to listen to the peanut gallery is just infuriating.
Is any business built on top of rsync going to donate their money in a sustainable manner?
Comment by nl 3 days ago
Wow.
The entitlement in this statement is outrageous.
Comment by 0123456789ABCDE 3 days ago
> I fundamentally disagree that maintainers owe users nothing.
> I strongly believe that when you create something for people to use that there’s an implicit social contract about how to go about doing certain things.
do you realize how unhinged this all reads like?
there is no duty. nothing is owed to no one. there is no implicit anything. this is all happening in your head. you are making up things that don't exist. the social contract is not a real thing either. the only contract you can have with the author of rsync is the GNU GENERAL PUBLIC LICENSE Version 3, and then, only when you get a copy of rsync.
> getting frustrated and then changing the project to delete the entire system is a crime
boop: strawman argument — you have been disqualified
> Is any business built on top of rsync going to donate their money in a sustainable manner?
does it matter? do you have an invoice for rsync?
the author wrote it themselves, he is retired, and sailing. unless google is buying him a new boat, i doubt he gives a crap what anyone has to offer.
truly obscene is the fabricated idea that you are owed anything after downloading code from github.
> I am also unsure of what to do here either.
touch grass?
Comment by agartner 4 days ago
There isn't any case law to show that. Certainly not in the age of AI. On the criminal side, the CFAA requires "intentionally causes damage" and that's entirely impossible to prove in the age of AI. On the civil side, liability waivers and warranty disclaimers generally cannot shield intentional or willful misconduct or gross negligence.
Comment by kelipso 4 days ago
Comment by kelnos 4 days ago
But ok, let's just pretend for a second that maintainers have indeed entered into some sort of social contract that gives them an obligation to support their software, uncompensated. But if we have this contract, then it cuts both ways. The users then have entered into a social contract of their own: they agree to treat me with respect when they deal with me, to not act entitled, to not demand things of me, to not be rude, and to do their part in being a helpful, productive partner in helping to solve any issues they report.
If a user breaks their part of the contract, then I have no obligation to fulfill my side of it.
It's a bit bizarre to me that non-maintainers have decided to invent some sort of "social contract" that benefits them (while putting a sizeable burden on maintainers), but seem to think that they aren't entering into a social contract of their own when they decide to use the software. (And that there are consequences for not upholding the user side of the social contract.)
Put another way: in contract law (in the US, at least), there's the concept of "consideration". It's the idea that both parties are getting something out of a deal. Some of that can be monetary, but it can also be other things. If a contract is one sided, that is, if one party isn't getting any consideration, then the contract can often be unenforceable.
That seems to be what people like you are doing here: requiring that open source maintainers enter into a social contract, but not give them any consideration in return for it. (And no, some sort of ill-defined concepts like "reputation" or "large user base" don't pass my threshold for meaningful consideration.)
That's one more thing, even: contracts are voluntary. All involved parties must agree for there to be a contract. I don't agree to your bullshit contract of one-sided obligation, so there is no contract.
Comment by FooBarWidget 3 days ago
Comment by tptacek 3 days ago
Comment by jasonvorhe 3 days ago
Comment by tptacek 4 days ago
Comment by Eufrat 4 days ago
Selling a toaster has an implicit warranty of merchantability. Society expects that if you sell me something, it should have certain promises. Yes, there’s no monetary exchange here, the work is given gratis, but there’s still a relationship and an interaction here and I think it is clear some people, like myself, believe that there are implied expectations. Just because it is “free” doesn’t mean it allows one to have a seemingly psychopathic attitude on the matter. It doesn’t absolve people of societal obligations.
I read that article by Mike McQuaid and I don’t get the impression that, “Yes, project maintainers should be allowed to run projects as they see fit and they put up with a lot of drive-by insults and hostile users. You don’t understand how hard all of this is and I’m doing it for free.” I get, “I hate my users and you should be grateful that I give you anything.”
Comment by mikemcquaid 4 days ago
The selling metaphor doesn’t work. Homebrew is not sold and its license, effectively a EULA, disclaims all warranties because it is not sold and we are not paid a wage to build it.
I have also built a bunch of proprietary software for money where my obligations are different. I also enjoy that and my responsibilities differ there.
Users should be grateful that they are given anything. We do not get anything from their use. For the vast majority, it is a one way relationship (contributors excluded of course).
If they don’t like the choices made by me or the project: they can fork it. They won’t, though, because the closest friend of entitlement is laziness. They can use Nix or MacPorts instead which may be a better fit for them and, if they are not contributing, does not disadvantage Homebrew.
Comment by Eufrat 4 days ago
If you don’t mind me probing a little further, what is the motivation to work on it?
> they can fork it
I get that, but I also think it is too pat a narrative at the same time. I think the success of the project is a testament to the effort that you and the Homebrew team have put into it. It is also an example of just how much effort any project really takes; this stuff doesn’t set itself up, nor does it do all the patching required to make sure things behave as well as they do.
Comment by kelnos 4 days ago
Not the person you're replying to, but I do it because it's fun. Programming is a passion of mine, and has been a part of my life since my dad gave me a book on BASIC when I was 8 years old. I love solving problems with code. Giving it away as open source is, in a way, philanthropy to me, with the hope that at least some of the things I create are useful to others. There's also a bit of a "political" aspect to it, in that I think it is bad for society for all useful programs to be locked up in proprietary software, making everyone dependent on profit-seeking corporations (whose interests and incentives are often hostile to their users) to provide the software they need to use in their daily lives. My work is a small contribution to combat that.
That joy I feel hits a wall when I run into an entitled, lazy user who thinks that I owe them something more than what I've already given. If most users were like that, I just wouldn't do it. Or at most I'd do it, releasing under a pseudonym, and have no public issue tracker, pull request mechanism, or public contact information. That would make the project worse, not better, of course, but the most important thing to me is my mental health and my happiness. If that's selfish, so be it.
> > they can fork it
> I get that, but I also think it is too pat a narrative at the same time.
I'm not sure what you expect someone to do with that statement. So what if it's "too pat"? That's the reality of the situation. It's the maintainer's way or the highway. If you don't like it, then open source has a truly wonderful escape hatch that proprietary software doesn't: you can fork and go your own way with it.
Many open source communities have problems, certainly, but I think many of the better ones are some of the closest things we have to true meritocracies. If you do the work, and the work is good and valued, you get a say. If you don't, you don't. And yes, I would say "providing good, helpful, actionable feedback" can be part of "doing the work", so people who don't write code can have a say, depending on how well they are able to provide value to the process. But people who just want to take: no, they don't get a say, and that's exactly how it should -- and must -- be.
Comment by mikemcquaid 3 days ago
I use my words carefully: I don't "owe" my users anything but that doesn't mean I don't "give" them anything. It's charity as opposed to taxes; I do so freely and on my own terms rather than obligation.
On forking: yes, it's a lot of work and forking would also be a lot of work. That's exactly the point. Lots of people over the years could have forked Homebrew but no meaningful forks have taken off because those who are most dissatisfied with our decisions are least willing to do the work to solve these problems.
Hope that clarifies. Thanks for the polite discussion :)
Comment by nl 3 days ago
No there isn't.
Pay money and there's a contract.
Anything else is in your head.
Comment by wang_li 3 days ago
Comment by 0123456789ABCDE 3 days ago
anyway, to the gist of this reply: you disagree with the license conditions. an important, but rather obvious, observation to be had is that, the rights the LICENSE offers, are contingent on your acceptance of the LICENSE conditions. one cannot be had without the other.
the LICENSE is real, it's a contract, and is in effect the moment you obtain a copy.
> You can’t [un-]license your way out [of the] liability if you [copied], formally or informally, [wares] that [you have no rights to, because you disagreed with its license conditions].
Comment by wang_li 2 days ago
Regardless, contracts are not required for reliance interest to apply.
Comment by 0123456789ABCDE 1 day ago
you're right, in case law exchange of considerations matters, and licenses are treated as rights grants. however, civil law does not care about considerations, and use of the object implies consent.
but that is irrelevant to our thread, because whether you breach the terms of the contract, or violate the terms of the rights grant, the different legal systems seem to have arrived at the same conclusion: it is copyright infringement
> Regardless, contracts are not required for reliance interest to apply.
was hoping that including "some other evidence" would be enough to avoid that comment
Comment by nl 2 days ago
Reliance requires an exchange of value. If you get something for zero value without a contract it's a gift under US commercial law.
You need to provide citations if you keep insisting otherwise because every open source licence relies on this
Comment by akerl_ 3 days ago
Comment by nl 3 days ago
Comment by jasonvorhe 3 days ago
Please continue.
Comment by tpm 4 days ago
Why would you think this is worth mentioning here?
Instead of explaining, just try to do something, that people actually use, for free, in the open, for some time. It doesn't have to be software, can be work for a nonprofit or a charity etc. I'm sure you will be enlightened.
Comment by Eufrat 4 days ago
Comment by tpm 4 days ago
Comment by kelnos 4 days ago
Comment by mgfist 3 days ago
Would you continue volunteering if the beneficiaries spat in your face and cursed you out for it?
Comment by lemonad 3 days ago
Comment by tptacek 3 days ago
Comment by 0123456789ABCDE 3 days ago
Comment by celiacFun 3 days ago
Comment by kelnos 4 days ago
Because I don't. It's that simple. There is nothing that says I have a responsibility, and the license I release under even makes it clear and explicit that I have no responsibility. So I don't.
If you are going to claim that I do have a responsibility, then the onus is on you to present some solid, convincing, extraordinary evidence or argumentation to support that. And you haven't succeeded in doing so.
> Selling
That's part of it, right there. If I sell my open source software, then yes, I may have created an implied warranty of merchantability, even if my license disclaims that.
But if I haven't sold it to you, then no such warranty or obligation exists.
> Yes, there’s no monetary exchange here, the work is given gratis, but there’s still a relationship and an interaction here
So you admit that, but seem to ignore the idea that there's a difference between selling something and giving it away for free. I fundamentally disagree with that. If I give away something for free, the person accepting it has zero claim on me or my time. If I sell something, then there's some claim there, depending on the terms of sale that we both agreed to before I took payment.
> It doesn’t absolve people of societal obligations.
This is something you've invented out of whole cloth. There's no societal obligation to maintain something (for free) that you've given away for free. And on top of that, there's no societal obligation to deal with demanding, entitled, sometimes angry people, who want more of your time for free.
Let's actually look at it from a paid perspective. Let's say I release some software (open- or closed-source; I suppose the distinction doesn't matter for this example), and also offer paid support for that software. Some people use it without paying for support, some people pay for support. Let's say some of the people who are paying for support are demanding and rude when reporting issues and asking for fixes. Even then, I still don't have to put up with it. I can "fire" those customers if I want, either by cancelling and refunding their remaining support contract, or by deciding not to renew them when their current contract runs out.
I don't think anyone would reasonably require a company to continue to have a business relationship with a customer that is causing too many problems for them. I think the reason we are fine with this concept is because there's a remedy that gives both parties something: if we refund the customer some portion or all of what they've paid, we consider that a reasonable way to terminate that relationship. With gratis open source software, there's no such monetary arrangement, so it feels fuzzier what the author-user relationship even is. But to me, this makes an even stronger case for the idea that open source maintainers have no obligations to their users, aside from any that they voluntarily take on, and can also decide to terminate at any point they like.
Comment by jodrellblank 3 days ago
Just ordinary evidence. If there was a charity event which asked for a volunteer to organise drinks, and you volunteered, and then there were no drinks, and you said “I don’t owe you anything stop being entitled, if you want an event with drinks you can fork the idea and organise your own”, people would be unhappy and reasonably so. It’s not that you had a legal obligation to do that work, it’s that you told everyone you would and that stopped other people from doing it.
If rsync had no maintainer and someone publicly offered to take it on and maintain it, that would also block other people taking that spot. It stops people investing time effort and money into a fork or replacement to an abandoned project. If the volunteer then either didn’t do anything or wrecked it and said “I don’t owe you anything etc.” that would be bad in a similar way.
If you want to be able to tell people you are the maintainer, that the thing is maintained, and you get to control what happens to a widely used project, you can’t really stand by the position “why did people expect me to maintain it? I only told them I would maintain it, why would they believe me, that’s not fair”.
Make it clear that it’s abandonware and has no maintainer, and you can totally uphold the “not my problem, says so in the license, deal with it” position. But if your thing becomes popular then you should expect a company like RedHat to fork it into ‘redsync’ and run it their way as their project, not look to you as ‘upstream’ and sideline you completely. Which is what a lot of open source people say they want but don’t behave as if they want that. Probably because there actually is some prestige and power and status and reputation involved, even though people try to claim there isn’t.
Comment by tptacek 3 days ago
Comment by jodrellblank 3 days ago
Comment by tptacek 3 days ago
Comment by jodrellblank 2 days ago
And
“This is maintained and I am the maintainer”
Are different states. 'Maintenance' is not work-free or effortless, so the second sentence is explicitly volunteering to do some non-zero amount of work, right?
I don't see how it can be read any other way, you either have to argue that maintenance isn't work, or that "I am the maintainer" is not volunteering oneself into the role of doing that work.
Comment by nl 2 days ago
It's like a business asking for volunteers, you saying you will, then the business demanding that you turn up when it suits them and you not being allowed to say "no"
It's an outrageous position to take.
Comment by akerl_ 2 days ago
Comment by jodrellblank 2 days ago
If you put a note on the public noticeboard saying "I have planted some things in this area of the public commons and I am the maintainer of them" can you defend the idea that you are not voluntarily offering to maintain something?
Comment by akerl_ 2 days ago
Comment by jodrellblank 2 days ago
1. The difference between ‘abandoned’ and ‘maintained’ is that ‘maintained’ is bounded at the lower end to a greater-than-zero amount of maintenance work. Not a specific amount but necessarily >0. (Without that, “maintained” and “abandoned” become the same thing and that’s absurd).
2. “I am the maintainer” can be a voluntary statement, it’s not compelled (e.g. by a gun to the head).
3. The role of ‘maintainer’ is ‘doing that >0 amount of maintenance work’.
?
By the time we’re arguing how much maintenance, you’re agreeing with my position. In the case of your garden, if I saw it on fire I would think it reasonable to contact you about the fire given you are the gardener. I wouldn’t contact someone who was not the gardener.
Comment by akerl_ 2 days ago
That one is pretty obvious because community gardens that want to enforce a floor on amount of maintenance include that in rules that you have to agree to before they give you some of their space.
I checked the whole terms of service for GitHub and they don’t have anything about how much work I have to do on a repo once I publish it for it to stay mine.
If you’re asking me which of those statements I disagree with, 1 and 3.
Comment by jodrellblank 2 days ago
> "You are responsible for keeping your Account secure."
That is a non-zero amount of work.
> "You may not use GitHub in violation of export control or sanctions laws of the United States or any other applicable jurisdiction"
That requires you to be aware of those laws and put a non-zero amount of work into complying with them.
> "You will promptly notify GitHub by contacting us through the GitHub Support portal if you become aware of any unauthorized use of, or access to, our Service through your Account,"
That is a commitment to do some work.
> "For contractual purposes, you (1) consent to receive communications from us in an electronic form via the email address you have submitted"
That is a commitment to have a working email server/account.
If you don't do these things at times which are required, GitHub may close your account and your repo will go with it.
> "If you’re asking me which of those statements I disagree with, 1 and 3."
And on what grounds do you disagree? That "I will do something" is not saying that you will do something, or that "letting something rot" counts as "maintaining it"?
Comment by nl 2 days ago
But it is a great example of the social contract!
If you fail to keep your account secure you lose your account.
If you don't maintain your project then someone forks it.
That's the only social contract.
Comment by akerl_ 2 days ago
Comment by tptacek 2 days ago
Comment by jodrellblank 2 days ago
What I'm getting from you and akerl_ is "la la la I don't want words to have meanings so I'm just going to deny that they do".
Comment by tptacek 2 days ago
Comment by jodrellblank 2 days ago
The work need not be “for me” and nowhere did I say it was or ought to be.
Comment by tptacek 2 days ago
Comment by cap11235 3 days ago
Comment by wolletd 4 days ago
Well, then maybe it's already overdue to find a new maintainer for the project and let someone else continue it? The tool will not get better from someone working on it who doesn't want to.
Comment by monooso 3 days ago
> Luckily I’ve been joined by some other very good developers with great systems development skills and security knowledge... Watch out for some credits for some great new rsync developers in the next release.
Comment by kelnos 4 days ago
Comment by wolletd 4 days ago
That's my impression from that sentence, at least. Don't you agree?
So, why didn't he do it? Because just firing up Claude and letting it rip is way easier than finding real people and building up trust?
Did Claude increase bugs in rsync? Or did Claude just give some basically retired programmer, who doesn't even want to work on his project anymore, the impression that he can replace finding a successor with just handing it to AI?
Comment by lemming 3 days ago
Based on Tridge’s post, this seems an unfair characterisation of how he used Claude.
Did Claude increase bugs in rsync?
TFA answered this, the answer is “no”.
Comment by rossjudson 3 days ago
- generally decide to fix security issues over preserving compatibility
- rewritten an aging test suite in what appears to be a highly responsible way
- brought on additional qualified developers to help with the workload
Not bad for a guy who's retired.
You care enough to complain on HN. You could be a part of the solution.
What were you going to do differently, specifically?
Comment by aseipp 3 days ago
No. Given a choice between doing laundry and driving Lamborghinis, I would probably choose the latter. But I still have to do my laundry. I might use a washing machine to do so. It's just a responsibility among many responsibilities. It isn't that deep, really.
The reality few people want to admit is that maintaining open-source software is often closer for many people to "doing laundry" than like, being the software equivalent of Atticus Finch.
> Or did Claude just give some basically retired programmer, who doesn't even want to work on his project anymore,
The only thing Claude has "done" apparently is give a bunch of annoying people online a license to engage in armchair psychoanalysis of someone they don't know at all, from what I can tell.
Comment by prmoustache 3 days ago
He doesn't have to do that. If he ever does not care enough he can just stop maintaining it and that's it.
Comment by duskdozer 3 days ago
Comment by gverrilla 3 days ago
Comment by rossjudson 3 days ago
The person owning the project is using the master branch in the way he sees fit.
Incidentally, there is no amount of communicating "correctly" that quells a mob. There's a Venn diagram of concerns, and those with concerns not being met will generate (now infinite) outrage.
Comment by rendaw 4 days ago
Comment by ekidd 4 days ago
You can avoid this overhead if you use a language that forbids reading from uninitialized memory, but C is not that language.
Comment by kelnos 4 days ago
For some uses, you do genuinely need (specifically) zeroed-out memory before you start to use it, and that's where calloc() is truly useful. But that need not have anything to do with security.
[0] The allocator will often hold onto memory that has been freed in order to quickly service future requests for new allocations, without needing a context switch into kernel space.
[1] Granted, the correct way to handle that is to zero it out before freeing it, in a way that the compiler won't optimize out.
Comment by ekidd 3 days ago
There are at least two different ways in which memory might be semantically "uninitialized":
1. The memory was provided by the OS. On modern desktop and mobile OSes, this memory will normally be zeroed automatically.
2. The memory was provided by the language's allocator. This may contain a mix of data used by previous allocations and memory that has never been touched (perhaps because previous allocations reserved it as end-of-array "capacity" that never got used). From the perspective of a language like Rust, this memory is considered uninitialized, and safe code should never be able to read it without first setting it.
In ancient C code, it makes a fair bit of sense to preemptively calloc everything. Or better, to wrap the allocator with one that zeroes on free. Though even there, you need to be careful not to expose recycled heap block headers in the middle of newly allocated objects.
My opinion for the last 30+ years has been that C is unfit for purpose, and that using it almost inevitably introduces large numbers of dire security holes. But until the last 10-15 years, there haven't been any seriously viable alternatives.
Comment by whateveracct 4 days ago
Comment by lokar 4 days ago
Comment by gravypod 4 days ago
I wonder if the data looks worse or better when computed per commit instead of per 10 commits.
Comment by echelon 4 days ago
Start with unsafe then gradually convert into idiomatic Rust.
Comment by yubblegum 4 days ago
Comment by kajaktum 4 days ago
Comment by klabb3 3 days ago
That said, if something like rsync were written today, I still think Rust may be a better choice, mainly because a 95th-percentile Rust programmer is less dangerous than a 95th-percentile C programmer. The people who are skilled enough to be trusted with C are few and diminishing every year.
Comment by bryanrasmussen 4 days ago
LLM: this commit changes whole codebase to Rust!
Comment by whattheheckheck 4 days ago
Comment by globalnode 4 days ago
Comment by bwfan123 3 days ago
Elon announces that SpaceX wrote its AI software in C. And now suddenly, C has become the new (old) kid on the block. Now we have folks saying, let's redo this in C as it gives you full power over the machine since we are 10x engineers. Earlier it was Rust this or Rust that. So, fads work both ways.
Comment by CaliforniaKarl 4 days ago
No.
The reversion commit references https://github.com/RsyncProject/rsync/issues/959. In that GitHub issue is this comment:
> The change to zero memory was my idea and my change. It was a reaction to a security report I got which caused use of an element past the end of an array. By zeroing the allocation I could ensure that misuse of that memory if a similar bug came up in the future could only cause a null ptr deref, which is better than the chance of a valid pointer.
> It got a claude co-authored tag on it as I got it to do some tidy ups of a series of commits, and that is just what it does when it makes any modification. It doesn't mean the change was written by claude. It was written by me.
Comment by scottlamb 4 days ago
I wouldn't assume Claude made that decision; it's not as if that was some incidental thing that it snuck into a large commit. The commit message starts with "zero all new memory from allocations", and that's exactly what the commit does. What do you imagine the prompt was?
It seems totally plausible to me that a human initially thought this was an improvement, then rethought after discovering the RSS regression. And it's not a law of nature anyway that this change has to increase RSS; calloc could special-case the case in which memory was freshly returned from the OS, knowing fresh memory mappings are zeroed anyway.
I blame AI for these regressions mostly in the sense that it caused a flurry of vulnerability reports. Those led to a flurry of quick fixes. Sometimes quick fixes cause other problems.
Comment by delusional 4 days ago
> The change to zero memory was my idea and my change. It was a reaction to a security report I got which caused use of an element past the end of an array. By zeroing the allocation I could ensure that misuse of that memory if a similar bug came up in the future could only cause a null ptr deref, which is better than the chance of a valid pointer. It got a claude co-authored tag on it as I got it to do some tidy ups of a series of commits, and that is just what it does when it makes any modification. It doesn't mean the change was written by claude. It was written by me.
https://github.com/RsyncProject/rsync/issues/959#issuecommen...
Comment by jagged-chisel 4 days ago
How does that prevent reading past the end of the buffer? Or change how bytes outside the buffer are used? Are these arrays of pointers so that the “null ptr deref” comment makes sense?
Or am I the bozo and don’t know what’s happening here?
Comment by kccqzy 4 days ago
Comment by jagged-chisel 4 days ago
edit: removed unnecessary examples
Comment by davrosthedalek 4 days ago
Or it's an allocation for an arena? The zeroing might help trigger 0 derefs earlier if the overrun happens for the objects that are then allocated in the arena (and not by allocating more objects than the arena can provide)
Comment by Shish2k 2 days ago
Comment by wzdd 3 days ago
You’re not a bozo but it is helpful to read the code.
Comment by GodelNumbering 4 days ago
I hope this doesn't come across as unkind towards the dev who gives their time and energy to the project. Grateful for that.
Comment by scottlamb 3 days ago
I've said "rebase onto <newbase>" and let it handle all the merge conflicts. I wouldn't expect this particular commit to conflict with anything, but it could have been part of a big series where it'd be worth doing that instead of running the rebase command yourself. It wouldn't surprise me if I picked up some Co-Authored-By:s along the way.
Comment by wzdd 3 days ago
Comment by tom_ 4 days ago
(My own view: 10.8 GB is nothing these days. Your sprintf buffers are probably larger than that. (And if they aren't: they should be. That, or you should start using snprintf...))
Comment by baq 4 days ago
Comment by bruce343434 4 days ago
If you pass NULL as the destination pointer, it doesn't write any string. If you combine this with %n at the end of the format string, you can get the exact length that the output string would be. Then you allocate that, then you print again, into the actual destination buffer this time.
Comment by baq 3 days ago
Comment by bruce343434 3 days ago
Comment by KaiShips 3 days ago
Comment by alfiedotwtf 4 days ago
Comment by aesthesia 4 days ago
- The release with the highest number of attributed bugs is the release _right before_ the first release with Claude-coauthored commits, released in January; is there a chance that unattributed LLM-authored commits made it into this release?
- The release attribution methodology is not great, since it will tend to attribute bugs introduced in a minor version update to the longest-lived patch release of that minor version. I doubt that 3.4.1 actually introduced a lot of bugs, but since it was released a day after 3.4.0, bugs that were introduced in that release get attributed to 3.4.1.
- Relatedly, more recent releases have had less time to have bugs filed against them, so there may be a bit of a bias toward evaluating recent releases as less buggy.
Comment by theteapot 4 days ago
> Here's my favorite part, though. Digging into the data, one of the first things that jumped out at me with blinding clarity was that the worst release, by far, in rsync history was entirely prior to the introduction of Claude ... And yet nobody noticed.
Language really does suggest the article's author does have a dog in this fight and is cloaking opinion in fancy statistics jargon. "Blinding clarity"? All you have to do is draw a plot. And anyway, v3.4.1 was 2025-01-16, technically well within the AI assisted coding era and before attribution was becoming standard practice.
Comment by iandinwoodie 4 days ago
> "Claude clearly made things worse" &emdash; the main claim
This article was clearly generated by AI, yet I found no mention/attribution of that by author.
How likely is it that someone who vibe codes articles would also vibe code the underlying analysis and be eager to accept an outcome that is highly validating of that person’s workflow? I’d say very.
Comment by yorwba 3 days ago
Comment by nerevarthelame 4 days ago
> "The scripts used to fetch the data, collate it into a DuckDB database file, construct the views on that DB, and then do the statistical analysis on that data, were indeed written by GLM 5.1, as was the HTML and much of the original prose for the final report webpage you're looking at right now."
Comment by davrosthedalek 4 days ago
So rewritten in his own voice. Maybe the m-dashes are from GLM, maybe from the author.
Comment by int_19h 4 days ago
Also, humans do use em dashes, just FYI.
Comment by latexr 4 days ago
Data without interpretation is irrelevant, and correct numbers can be interpreted wrongly, either on purpose or by mistake.
I’m not saying any of that happened here, only that “are the numbers wrong” is not the only thing that is relevant.
> humans do use em dashes, just FYI.
Your parent comment is not complaining about em dashes, they are pointing out the article has a literal “&emdash;” in it.
Comment by davrosthedalek 4 days ago
And the author discussed the use of AI pretty exhaustively in point 0 of the post.
Comment by OptionOfT 4 days ago
I've seen plenty of code that was LLM generated but the commit message itself did not have the co-author attached to it. This only seems to happen when someone's interface to the codebase is completely through Claude/Codex/..., and those are usually the most verbose commits, and yet they say the least, because they just summarize the code changes, not the why.
On the other hand I've seen developers using Claude as a tool. They have VSCode open and a terminal window with Claude and go back and forth, ensuring they write correct code, and leave the plumbing to Claude.
So maybe the author of the code started off small and it grew over time?
Comment by hparadiz 4 days ago
I have been experimenting with both aforementioned styles with interesting results.
Comment by duskwuff 4 days ago
You might be surprised. C applications which interact heavily with the system - like rsync - can be tricky to test comprehensively, as it's nontrivial to inject faults into system calls. If the application is architected to support this kind of testing, or uses a HAL, that may make matters easier - but an older codebase like rsync probably isn't.
Comment by cyanydeez 4 days ago
It's amusing. It's not terrible, but tests aren't going to save you from a malicious tester.
Comment by logicprog 4 days ago
Which brings me to my overall response, which is that there is absolutely no evidence, and nothing even intimating this hypothesis, that LLM commits were secretly being added to earlier releases before they were attributed, and that's why the rate of bugs is higher. There's no reason to think that; it's an unreasonable thing to think, and there's no evidence for it whatsoever unless you beg the question and assume that higher bug counts must automatically indicate AI involvement, which is just circular reasoning. You're essentially just making up a hypothesis out of thin air to preserve your point.
Regarding your third point, that one's fair, but I've done the analysis and I can put it up if you want, as to how long it usually takes to find bugs and how far through the release cycle we are for each version.
Comment by aesthesia 4 days ago
Regarding unlabeled LLM-authored commits, I don't think it's unreasonable in general to think that an open-source project might have had unlabeled LLM-authored commits at some point before 2026. Looking more closely at rsync's recent commit history, I think it's less likely in this case. There's just a low number of commits in general, _until_ large batches of Claude-authored commits start showing up early this year. But this then raises some questions about the bugs-per-commit metric; it does correct for something like "size of release", but also obscures a significant shift in commit velocity that may be downstream of adding LLM development tools to the workflow.
Like I said, I don't have a dog in this fight, and I try not to approach these sorts of questions from a position of explicit advocacy. I do think it's an interesting question, though, and we should try to understand what the data is actually telling us.
Comment by jonquark 4 days ago
All code is technical debt.
If rsync releases used to have 500 lines changed and 5 bugs in and AI-powered rsync releases have 50000 lines and 500 bugs, it's the same bugs/line but much worse experience for the user?
I've not looked into the details of this case and I do use AI assistance coding at work but in my experience, the problem is that it's too easy to write lots of code and therefore hard to review the huge volumes of code and this analysis will ignore that?
edit: actually your table shows there weren't unusually large numbers of commits in this release, so perhaps my initial skepticism shows a bias I have?
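The hypothetical above can be put into a few lines of Python. This is only a sketch of the arithmetic: the figures are the made-up ones from the comment (500 lines with 5 bugs versus 50,000 lines with 500 bugs), not measurements from rsync, and the function name `bug_stats` is purely illustrative.

```python
def bug_stats(lines_changed, bugs):
    """Return (bugs per 1000 changed lines, absolute bug count)."""
    return bugs * 1000 / lines_changed, bugs

# Hypothetical releases taken from the comment, not real rsync data.
small_release = bug_stats(500, 5)
large_release = bug_stats(50_000, 500)

print("small release:", small_release)
print("large release:", large_release)
# The per-line rate is identical, but users of the large release
# meet 100 times as many bugs in absolute terms.
print("ratio of absolute bugs:", large_release[1] / small_release[1])
```

A normalized metric such as bugs per line (or per commit) hides exactly this difference in volume, which is why it helps to look at the absolute counts next to it.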
Comment by tolciho 4 days ago
Comment by hariseldom 4 days ago
Comment by PunchyHamster 4 days ago
Comment by logicprog 4 days ago
I really think this is a much better standard of evidence, limited though it is, than the outrage-fueled cherry-picked anecdotes that have been driving this whole thing. If you disagree, and think the outrage should go on when I've shown there's an absence of evidence entirely for it (although of course, that's not evidence of absence; maybe I'll have to eat my words 5 releases down the line, but appealing to that now feels like a Russell's Teapot), would you care to explain why?
Comment by ofjcihen 4 days ago
Comment by logicprog 4 days ago
Comment by PunchyHamster 1 day ago
> that are actually valid, safe, standard, and useful to do on such low amounts of data,
if you presented a paper with that number of data points you'd be laughed out of the room
Comment by runarberg 4 days ago
This analysis showed that there is indeed an absence of evidence, but it concludes there is evidence of absence.
Traditional p-hacking is done by oversampling and overtesting. If you do 20 analyses, on average one will show p < 0.05 by random chance. This analysis is doing the inverse of that: under-sampling, and concluding with p > 0.05.
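The "20 analyses" figure can be checked with a short calculation. This sketch assumes the 20 tests are independent and that the null hypothesis is true in every one of them, which real analyses of the same dataset usually are not; the variable names are illustrative.

```python
alpha = 0.05   # significance threshold
n_tests = 20   # number of independent analyses, all with a true null

# Expected number of tests that come out "significant" by chance alone.
expected_false_positives = alpha * n_tests

# Probability that at least one of the tests comes out "significant".
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one p < {alpha}): {p_at_least_one:.3f}")
```

Under those assumptions the expected count is one, and the chance of seeing at least one spurious result is roughly 64%.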
Comment by logicprog 4 days ago
I tried pretty hard to avoid saying that, can you point me at how to rephrase? The point I'm trying to make is just that there is absolutely no evidence at all for what people are saying with such absolutism and claimed objectivity (that Claude made rsync worse), and thus it doesn't justify the outrage.
> Under-sampling, and concluding with p > 0.05
How would I avoid under-sampling here? And if you're going to say it's because I only have 2 data points, well, the side making the positive claim — that Claude made rsync worse — only had two as well, and unremarkable ones at that, as I've tried very hard to show.
Comment by runarberg 4 days ago
> With a p-value of 74%, the answer is a decisive no. The odds ratio is 1.06 — essentially 1:1. Claude releases are no more likely to be above the median than any other releases.
are problematic in this context, as the correct conclusion here is that you just don’t have enough data to conclude whether or not you are more likely to encounter a bug after a Claude commit.
> How would I avoid under-sampling here?
You don’t. You admit that you don’t have enough data and move on. What you are trying to do here is prove a negative, which is extremely hard to do. In your discussion you claim that the users complaining had no right to, however nothing in your analysis showed they were wrong. We simply don’t have enough data (yet) to say either way. When we have enough data they may be proven right or wrong, but until then, we cannot conclude either way.
If you insist still, I recommend looking into Bayesian analysis. Theoretically at least, the posterior distribution from a Bayesian analysis can be interpreted directly and analysed on its own merits. However, I suspect your posterior will have way too much uncertainty to reach any conclusions.
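A minimal sketch of what such a Bayesian analysis could look like, to show why two data points leave the posterior wide. It assumes bug counts per release follow a Poisson distribution with a Gamma prior on the rate (the standard conjugate pair); the prior parameters and all counts below are hypothetical, not taken from the rsync data.

```python
import math

def gamma_poisson_update(prior_shape, prior_rate, counts):
    """Conjugate update: Gamma(shape, rate) prior on a Poisson rate.

    Returns the posterior (shape, rate) after observing `counts`.
    """
    return prior_shape + sum(counts), prior_rate + len(counts)

def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate

# Hypothetical bug counts per release, for illustration only.
two_releases = [4, 4]
twenty_releases = [4] * 20

# Weak Gamma(1, 1) prior, chosen arbitrarily for this sketch.
mean_few, sd_few = gamma_mean_sd(*gamma_poisson_update(1, 1, two_releases))
mean_many, sd_many = gamma_mean_sd(*gamma_poisson_update(1, 1, twenty_releases))

print(f"2 releases:  mean={mean_few:.2f}, sd={sd_few:.2f}")
print(f"20 releases: mean={mean_many:.2f}, sd={sd_many:.2f}")
```

With the same per-release count, the posterior from two releases has more than twice the standard deviation of the one from twenty, so any comparison built on it would be dominated by the prior and by noise.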
Comment by logicprog 4 days ago
Comment by runarberg 4 days ago
That you found a single pre-AI release which did not cause outrage is proof of nothing. This single release is equally anecdotal, and statistically insignificant.
So, the biggest context that is missing here is that people hate AI for various reasons, and they don’t want their favorite tools to fall victim to AI for equally many reasons. It is only natural that people who hate AI react this way when they find out their favorite tool uses AI, and doubly so when they sniff correlation between their favorite tool's use of AI and bugs.
> I'm just trying to say that these specific releases are unremarkable, and there's no evidence at all of harm currently.
Well, there is no evidence against harm either. But what you did here is a bit of a sleight of hand. In your analysis your null hypothesis is: “There is no difference in bug count between releases which include code commits from Claude Code and releases which don’t”. (You then go about doing what every psychology major is taught not to do: finding evidence for the null hypothesis, not against it.) However, what hypothesis testing is for is to use a representative sample to generalize over a wider population. You do hypothesis testing because you want to demonstrate that your sample is representative of a wider population, that you just so happened to have picked the two samples, by random chance, which shows the effect regardless of the experiment.
By calculating the p-values you were telling me that you were in fact ready to make generalizing statements over a wider population of commits, but your results were statistically insignificant, so really you should not draw any conclusions from them. You have not, in fact, shown that they aren’t different from the rest of the population.
Comment by wzdd 3 days ago
This was actually the convincing one for me though. “Did AI increase the rsync bug rate? Dunno, can’t tell yet” seems like a fine conclusion to me. Plenty of people in this thread and previous ones on the topic seem convinced one way or another, so it’s nice to see actual numbers.
Comment by runarberg 3 days ago
I think in this era of scientific literacy people tend to overcorrect in the absence of evidence. Anecdotal evidence is still evidence though, and people are right to react to it.
If we remove our frequentist hats and put on our Bayesian hats (which is a wise thing to do when n is very low) we can take into consideration evidence from multiple directions at the same time as we update our beliefs. A Bayesian might start with the prior that Claude-assisted commits have the same distribution as non-assisted commits. They would start with a Poisson distribution as their prior, and then factor in all the evidence of AI slop they have seen in their lives and update their posteriors accordingly. Claude Code has been wrong about so many things in the past, which should contribute to a smaller lambda than the control group.
Comment by xmddmx 4 days ago
The ELI5 version is that there are two mistakes you can make when looking at a P value:
Type I error, where your P value is falsely low. In the experiment being discussed here, it would lead one to conclude that AI code is worse. Otherwise known as a false positive.
Type II error, where your P value is falsely high, leading you to conclude that AI code is no different. Otherwise known as a false negative.
https://en.wikipedia.org/wiki/Power_(statistics)
One can calculate statistical power for a given experimental protocol.
My hunch is that if you did this, you would find this experiment is grossly under-powered.
This means you can't make the "absence of evidence" claim.
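One rough way to see the power problem without a full power calculation is to ask what the smallest p-value is that an exact permutation test could ever produce with only two releases in the treated group. The sketch below is an illustration under stated assumptions: the function name and every number in it are hypothetical, and it is not the analysis from the article.

```python
from itertools import combinations

def exact_permutation_p(treated, control):
    """One-sided exact permutation p-value for H1: mean(treated) > mean(control).

    Group sizes are fixed, so comparing sums of the treated group is
    equivalent to comparing differences in means.
    """
    pooled = list(treated) + list(control)
    observed = sum(treated)
    total = 0
    as_extreme = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical bug counts. The two "treated" releases are far worse than
# every control release, i.e. the most extreme outcome possible.
p_tiny = exact_permutation_p([9, 10], [1, 2, 3, 4])
p_larger = exact_permutation_p([9, 10], [1, 2, 2, 3, 3, 4])

print(f"2 vs 4 releases: p = {p_tiny:.4f}")    # 1/15
print(f"2 vs 6 releases: p = {p_larger:.4f}")  # 1/28
```

With two treated releases and four controls, even the most lopsided result gives p = 1/15, which is above 0.05, so the test can never reject. With more controls it can reject only if both treated releases are the worst in the whole history. In such a design a high p-value says very little about whether an effect exists.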
Comment by davrosthedalek 4 days ago
Comment by xmddmx 4 days ago
In an underpowered statistical study, a claim that two experimental conditions did not differ is not persuasive.
Comment by davrosthedalek 4 days ago
The claim is not "two experimental conditions did not differ". The claim is "The data do not show evidence that the experimental conditions did differ".
Comment by xmddmx 4 days ago
Of course the critical part is not the numbers, but what they mean.
So, what does the evidence mean?
The author interprets it to mean that there is no difference. They state this several times:
"46% EXACT PERMUTATION TEST P-VALUE (ONE-SIDED, H₁: CLAUDE MEAN > HISTORICAL)[...] What this p-value tells us is There's nothing unusual about the Claude group."
"74% ONE-SIDED P-VALUE (H₁: CLAUDE MORE LIKELY ABOVE MEDIAN) Fisher's exact test asks: if we split all releases at the historical median (0.74 sev/10c), are these Claude releases significantly buggy than previous releases (more likely to land above the median)? With a p-value of 74%, the answer is a decisive no. "
In an under-powered study, when a P value is above your alpha level cutoff (.05, .01, whatever was chosen) you can't distinguish between "no effect" and "could be an effect, but I didn't see one".
Comment by davrosthedalek 4 days ago
The two examples you bring are not claims of absence of evidence, but claims of evidence of absence. The author takes the result as evidence that there is no effect. As I wrote, the author shouldn't do that, because indeed you cannot distinguish between "no effect exists" and "no effect observed". But again, these are (wrong) claims for evidence of absence.
The author can absolutely claim: I did these statistical tests, and none showed evidence that there is an effect. Absence of evidence. It's not a claim that there will never be evidence. Just that there is none from these tests.
Edit: To convert the absence of evidence into evidence for absence, indeed you need to understand the statistical power of your test, and how it is affected by alternate hypotheses. And for that, without having done the math, having only two data points seems very thin.
Comment by its-summertime 4 days ago
Comment by kelnos 4 days ago
Comment by xmddmx 4 days ago
TFA is defending the use of AI, and it very clearly (to me) used AI to analyze the data and present the results.
In doing so, the author used statistics in a way they do not appear to understand, and ended up making numerous false claims (you can see the thread discussing these here https://news.ycombinator.com/item?id=48417626 )
In short, the study doesn't have sufficient statistical power, and is making "no difference" claims that aren't justified.
The meta-irony is this: the author used an LLM to interpret data in this study, and seems to have made the same category of mistake (confidently asserting falsehoods) that the study was supposed to be investigating (confidently submitting bad commits to the rsync project).
Comment by simianwords 3 days ago
Comment by classified 4 days ago
Comment by newsoftheday 3 days ago
Religion is about faith and what people feel and sense as much as believe.
Comment by Joel_Mckay 3 days ago
Comment by logicprog 4 days ago
Comment by MichaelDickens 3 days ago
Comment by runarberg 3 days ago
> Did Claude increase bugs in rsync?
>
> TFA answered this, the answer is “no”.
Comment by karagenit 3 days ago
Comment by thorum 4 days ago
Comment by zzyzxd 4 days ago
People need to be responsible for code they commit and push anyways. This has never changed. Whether the code is written by hand, by their cat walking over keyboard, or by AI, is not my concern.
A project's code quality can decline for all kinds of reasons. I don't think it's productive to laser-focus on whether it's produced by AI or not. That's a distraction. If a person just wants to find an excuse to criticize AI, and another person wants to fight back and defend AI, sure, go for it. But that's not how you would want to assess a project's code quality.
Comment by calvinmorrison 4 days ago
So - why bother forking or going upstream? Maybe it's selfish. I think publishing the patches is cool, but I feel less of a need to force other people into doing what I want or even writing every possible configuration or solution. I just hack it for me.
Comment by delusional 4 days ago
Well the GPL (which rsync is licensed under) says: "This program comes with ABSOLUTELY NO WARRANTY" so actually nobody is responsible for anything.
Comment by matheusmoreira 4 days ago
People should be doing this regardless of drama. No reason to provide free advertising for trillion dollar corporations. Generated-by trailers are only relevant when contributing to third party projects, in that case disclosure is polite.
Comment by Aurornis 4 days ago
I don't care about the advertising angle. We all know Claude by now. I want some indicator that AI was used.
Comment by block_dagger 4 days ago
Comment by Aurornis 4 days ago
I don't see a need for an attribution line in this case.
Comment by fragmede 4 days ago
Comment by block_dagger 4 days ago
Comment by fragmede 4 days ago
Comment by vips7L 4 days ago
Comment by ethagnawl 4 days ago
This is fucking insane. How does this correlate with productivity in any way? The results are all that matters, who cares how you got there?
Comment by andrekandre 3 days ago
> The results are all that matters, who cares how you got there?
i actually said this at $JOB to a manager, to which they replied "yes, but in the future all code will be ai generated, so thats the 'results' we are looking for"....
Comment by mexicocitinluez 3 days ago
That's what I can't for the life of me figure out. Bad code is bad code regardless of who is writing it. Adding a disclaimer about how it was written is meaningless. Hell, it could say "Written by the Easter bunny" and that would have 0 impact on its utility.
Comment by agentultra 3 days ago
I think many people in this camp have political or ethical concerns and want to avoid contributing to or supporting the companies behind frontier-AI tools. Or they have moral or technical concerns and want to boycott usage to maintain their principles.
It should be fairly widely known at this point.
Comment by mexicocitinluez 3 days ago
> The value of the Claude attribution is that you can tell at a glance who used AI.
Specifies none of that, which is why I was asking the question.
> technical concerns
Which is exactly why I asked what I did. What technical concerns could possibly exist if the code is good? What does adding that attribution remove or add to technical concerns that you can't already see from the code itself?
Comment by agentultra 3 days ago
I know my personal choice doesn’t make much of a difference but I refuse to own a car. I advocate at my local city council to remove car storage from streets, remove parking minimums, add better transit, make the core of our city car-free. It sometimes feels easier to join in and just accept that this is the way of the world but I refuse to believe in inevitability: building cities for the benefit of cars is a choice.
Maybe some folks want to avoid AI code because they don’t want to make that choice?
I can’t say for them. But I do know there’s no sense pretending like they don’t have a point or feigning shock that someone might not have the same view as you do.
Comment by mexicocitinluez 3 days ago
I asked a simple question: What technical concerns could possibly exist if the code is good?
I made it very clear I wasn't talking about personal, political, or ethical arguments.
> But I do know there’s no sense pretending like they don’t have a point or feigning shock that someone might not have the same view as you do.
Where am I feigning shock? Are you reading the right comment thread before you're replying?
Comment by agentultra 3 days ago
Maybe I was reading too much into this part of your comment.
Plenty of folks don’t separate the ethical, political, or moral from the technology. For them using it is condoning it. Like for me, owning a car is contributing to car culture. It might be inconvenient for me or seem backwards to others but it’s worth resisting. They want to know that something was written with AI so they can avoid supporting AI or condoning its use.
Comment by mexicocitinluez 2 days ago
Comment by matheusmoreira 4 days ago
Comment by Hammershaft 4 days ago
Comment by matheusmoreira 4 days ago
As I said, disclosure is polite when contributing code to third party projects which will undergo human review.
No need for such things in one's own projects.
Comment by Groxx 4 days ago
This can be largely assumed to be true for any open source code. It's kinda the point of open source.
Comment by matheusmoreira 4 days ago
If there's one thing I learned not to do in open source, it's to assume nonsense like that.
Comment by Groxx 4 days ago
Even with coding agents gaining popularity, many humans still look at the code at some point.
Comment by matheusmoreira 4 days ago
Comment by toofy 4 days ago
why do so many people want to hide who the real author is?
we should be very wary of anyone claiming they’re the author of something when they’re absolutely not. if jon wrote a book and i take credit, that’s shady as hell.
Comment by matheusmoreira 4 days ago
Comment by kelnos 4 days ago
Comment by Barrin92 4 days ago
Comment by matheusmoreira 4 days ago
Comment by Barrin92 4 days ago
because no person can read every line of code written in software they use, or track every commit made to a project. Integrity and authorship matters. If a person lies or obfuscates the origin of what they produce, an article, software, what have you they're doing it for a reason, otherwise they would be honest. That's not prejudice, that's recognizing deceit. And you don't eat fruit from a rotten tree.
Comment by matheusmoreira 4 days ago
Ask Claude to do it for you.
> they're doing it for a reason
And you concluded that the reason was they were pretenders who can't hack it.
That's your prejudice. Not interested in helping you categorize me, thanks.
Comment by kelnos 4 days ago
Comment by matheusmoreira 3 days ago
Comment by codygman 4 days ago
Comment by matheusmoreira 4 days ago
Comment by Aurornis 4 days ago
The tag is helpful because AI authorship is different than the human authorship. When you work with a project or team for long enough you start to trust certain people and their intuition, but when they start submitting AI-produced code you have to reset and review it like AI code.
I use these tools a lot, too. But I want to know where the code came from so I can review it accordingly. The source matters.
> Ostracize us?
I don't know why you're so defensive. If AI wrote the code just be honest about it.
If you outsourced the code writing to some guy named Bob on Fiverr, I'd want to know that too.
Comment by matheusmoreira 4 days ago
Check it out:
https://lobste.rs/s/29pm2f/llm_generated_submissions_should_...
https://lobste.rs/s/ytim7h/collection_small_low_stakes_low_e...
Comment by Aurornis 4 days ago
Comment by matheusmoreira 4 days ago
Comment by kelnos 4 days ago
Comment by matheusmoreira 3 days ago
Comment by eschaton 4 days ago
Comment by matheusmoreira 4 days ago
Comment by eschaton 4 days ago
Comment by int_19h 4 days ago
In the absence of such an entitlement, not volunteering to disclose the tools used is not fraud.
Comment by Ronsenshi 4 days ago
Comment by matheusmoreira 4 days ago
If an LLM generates some code but I edit it, does it become my own work? How much editing must be done?
How large is "largely"? Exactly how many bits of information must come from my fingers tapping the keyboard in order for me to qualify for authorship? Be precise.
If I write something but the LLM polishes it up a bit, is it still my work? Or is it AI generated?
Comment by Ronsenshi 3 days ago
There are some precedents and rulings related to copyright and AI, so we have at least some rubric by which "authorship" can be determined. But when it comes to AI doing polishing of existing code - that is less certain.
Comment by kelnos 4 days ago
I'm not going to define substantive for you. That's something you should feel obligated to research and learn about yourself; anything less is dishonest.
Comment by matheusmoreira 3 days ago
So "consider copyright" isn't really strengthening your position.
Comment by ezst 4 days ago
Comment by matheusmoreira 4 days ago
Comment by kelnos 4 days ago
Comment by matheusmoreira 3 days ago
But you and others in this thread seem hellbent on stigmatizing it to the point you take it as evidence of someone's incompetence. So I'm not at all sympathetic to your "requirements".
That's really all anyone's asking of you: enough respect for your fellow programmers that you avoid pre-judging them. If you can't do that, then what do we care about your "requirements"?
Comment by eschaton 4 days ago
Comment by julianeon 4 days ago
Comment by amiga386 4 days ago
- Sent from my iPhone
Comment by AnotherGoodName 4 days ago
— Sent from my iPhone
Comment by AlienRobot 4 days ago
I use Linux, btw.
Comment by redsocksfan45 4 days ago
Comment by trwired 4 days ago
Comment by eli 4 days ago
And I guess maybe there's no such thing as bad press, but at least in this case it doesn't seem like effective marketing for Anthropic.
Comment by eschaton 4 days ago
Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code.
Of course that fits right in with the use of an LLM to generate code in the first place, since what it’s actually doing is regurgitating its inputs stripped of any license and copyright notice.
Comment by UebVar 4 days ago
In academia this is misattribution; outside of academia this does not exist.
This is clearly not copyright infringement either, as LLMs do not claim copyright, nor could they. Just like the photograph taken by the monkey, or pictures drawn by crows. LLM output is not a creative work either.
Whether this is unethical or immoral is a totally different question. I really don't think so, and I don't think you argue that position well.
Comment by eschaton 4 days ago
It also is copyright infringement, because what the LLM “generates” are actually portions of its training set, which were covered by copyright. Just passing through an LLM does not remove that copyright from that work.
Comment by UebVar 3 days ago
In German and French (Roman) legal systems this is a "Vermögensdelikt", and explicitly about material damage and gain. Yes, common law can be more broad (in Canada it isn't really, it just also includes service, btw.), and yet it clearly does not meet the definition, as that requires both a damaged/defrauded party and a fraudulent/gaining party. We are not talking about somebody usurping somebody else's reputation, after all.
You misuse a technical term that is well established since antiquity.
You do not know what this word means. If you want to argue about semantics, look up the definition. This works especially well for legal terms as laws define them.
(That said, IANAL and there are very many different legal systems and I am not ruling out that there exists one that is completely different; laws can be changed at will, after all.)
It is also obviously not copyright infringement, because this is simply not how copyright works, at all. I cannot and will not explain all of copyright here. Instead I will point this out: Every piece of code produced by a human who read copyrighted code would fall under your definition.
Comment by eschaton 2 days ago
With respect to the former, “fraud” is a shorthand for “fraudulent misrepresentation,” which is what you’re doing when you take someone else’s IP and try to contribute it to a project without securing the right to do so. It can be read as implicit in the attempt to contribute to the project that you have secured this permission (or do not need to, because the work is original to you). Whether the code came out of an LLM or was copied from another project or Stack Overflow doesn’t matter, it’s that you’re misrepresenting the rights you have that’s the fraudulent part.
For the latter, I specifically pointed out that the gain from fraudulent misrepresentation need not be monetary. The gain can be reputational or any other sort of benefit. For example, someone pretending to be a fictional person to gain access to a space they otherwise wouldn’t is still committing fraud.
Finally, you’re wrong about whether the output of an LLM infringes copyright of material in its training set. Just running a copyrighted work through an LLM does not remove the copyright on that work if reproduced by the LLM.
Comment by jhack 4 days ago
Should there be attribution for Google or Stack Overflow copy/paste? Who should we bully about this?
Comment by umanwizard 4 days ago
Obviously, and I'm a bit taken aback that anyone thinks otherwise.
Comment by eschaton 4 days ago
They are in fact committing fraud if they do not attribute the code in their commit properly, because by committing it they’re claiming to have rights by virtue of authorship that they do not have. (Namely, the right to contribute that code to the project.) They may also be committing copyright infringement, depending on the copyright and license status of some code they found via Google or Stack Overflow.
It’s always fascinating to me to see how many people on Hacker News have such extremely poor understanding of how intellectual property actually works, and how misrepresenting themselves or their work can actually have consequences.
Comment by dml2135 4 days ago
Comment by eschaton 4 days ago
It’s clear on its face that LLMs can and do store and reproduce copyrighted works, using a form of (somewhat) lossy data compression. And using a lossy stochastic or perceptual form of compression to reproduce a copyrighted work doesn’t somehow make it not storage or reproduction, otherwise sharing MP3 files wouldn’t be copyright infringement.
Anyone engaging in responsible risk management should assume that anything LLM-generated is infringing until determined otherwise by the courts, not the other way around.
Comment by dml2135 3 days ago
Your interpretation of the law is certainly plausible, but it is clearly not a settled question.
If you really are so confident, go bet on Kalshi and make some easy money: https://kalshi.com/markets/kxnytoai/new-york-times-wins-open...
Comment by Leynos 3 days ago
Comment by eschaton 2 days ago
Comment by infamouscow 4 days ago
Their name being attached to the commit is itself irrelevant, as there is no way to submit a patch otherwise. You could use a fake name, but you're just moving this fraud problem around.
You're going to have a hard time convincing anyone that using a tool constitutes fraud. Frankly, it's silly, if not genuinely stupid.
Film photographers in the early 2000s routinely called digital "not real photography" and Photoshop "cheating" because you could delete bad shots and fix everything later. Traditional musicians and critics dismissed drum machines, synthesizers, and autotune as soulless tools.
Comment by eschaton 4 days ago
Often this is also spelled out in a project’s contribution guidelines, and some projects have even had more explicit copyright assignment policies they required contributors to agree to, but the lack of such guidelines or assignment policies does not mean the custom as normally observed in the field is irrelevant.
Comment by kelnos 4 days ago
Indeed, and I'm not aware of any (Western, at least) legal system that would consider it fraud to not disclose that an LLM had generated some code.
I'd like to gently point out that your insistence on "fraud" here is hurting your overall argument, and is causing people to focus on the language you're using instead of the substance of what you're trying to say. I do agree with you that people should disclose LLM generation when writing commits. But the way you're going about arguing this "fraud" thing is an unproductive dead end.
Comment by eschaton 3 days ago
When you send a patch or pull request to a project, you’re saying (implicitly) that you have the necessary rights to contribute the intellectual property it contains. If you used an LLM to “generate” some of it, that is not necessarily the case.
A similar situation would occur if you agreed to pay someone else to create a patch, and then submitted it under your own name without paying them. Because it’s a work for hire, it’s not yours until they’re paid for it, so you’re fraudulently misrepresenting your rights to that patch to the project. If you did pay the creator, you don’t have to attribute them unless it’s in the contract between you and the creator, or unless the project requires such attribution.
Comment by infamouscow 4 days ago
Comment by Unit327 4 days ago
Setting aside the whole AI = bad argument, let's do a metaphor. Tax evasion is bad and unethical and you should call it out where you see it. But wait, that creates an incentive for people to hide it! So I'd better not call it out, it's best to just keep my mouth shut.
Comment by mohamedkoubaa 4 days ago
Comment by eschaton 4 days ago
Comment by mohamedkoubaa 4 days ago
Comment by Daishiman 4 days ago
Comment by overgard 4 days ago
Comment by potsandpans 4 days ago
Comment by matheusmoreira 4 days ago
Comment by eschaton 4 days ago
If I contributed code to an Open Source project behind my old employer’s back, that would have been bad, because that code was owned by them and not me, even if I wrote it on my own time using my own equipment, because of the contract I signed with them.
If I copied code out of an AGPLv3-licensed codebase and contributed it to a BSD-licensed codebase without telling anyone, that would have been bad, because I did not have the right to change the license on that code to BSD (or change the license on the codebase to which I was contributing to AGPLv3).
If you use an LLM to produce code, you may well be doing the latter since an LLM is actually just regurgitating portions of its inputs. This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM and attempting to contribute code I recognized from a specific Open Source project under one license to another project under a different license, while claiming they “wrote it themselves.”
Any project that accepts contributions needs to take liability seriously and manage their risk appropriately.
Comment by red75prime 4 days ago
You say you "recognized code". Does it mean that you weren't able to find the exact match?
> an LLM is actually just regurgitating portions of its inputs
You seem to be talking about the inputs to the autoregressive pretraining stage. Correct? Then it's not how LLMs work, unless we use a definition of portions as a "few letters blocks."
Comment by eschaton 4 days ago
The LLM the person used was trained on a very large corpus of Open Source code, and reproduced that code exactly. Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.
Comment by red75prime 4 days ago
Were those functions trivial? With, say, 1% probability of someone who has not seen them writing them like that?
> Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.
Have you read the articles? As far as I remember they fed large chunks of an article multiple times to an LLM to sometimes get a not-so-long exact match. It can mean that LLMs can infer a style and humans are predictable.
Comment by Topfi 3 days ago
So they had to prompt? An LLM? I got this argument before and still don’t get what it’s trying to say. These models do not output anything unless prompted, that’s not any kind of gotcha.
On the code outputting front there is a lot of relevant evidence beyond the NYT lawsuit [0].
If I slightly modify GPL code, that doesn’t give me the right to relicense.
[0] https://arxiv.org/html/2601.02671?amp=&= and https://arxiv.org/abs/2506.12286 and https://ai.stanford.edu/blog/verbatim-memorization/
Comment by eschaton 4 days ago
Comment by red75prime 4 days ago
The possibility of SUV clearly shows that a model does more than "just regurgitating."
Comment by matheusmoreira 4 days ago
Comment by eschaton 4 days ago
Unfortunately, a large number of people are being told—and here, you can see many who believe it—that the output of an LLM either carries no copyright or is copyright by the one prompting it. In other words, even right here on Hacker News it’s widely believed that LLMs “launder” copyright.
Comment by matheusmoreira 4 days ago
Comment by eschaton 4 days ago
Comment by potsandpans 4 days ago
Comment by archagon 4 days ago
Have fun with 1000x more Buns that literally no one is using or maintaining. An entire software industry built on top of a burning garbage pile of crappy, dead code.
Comment by elnatro 4 days ago
Comment by int_19h 4 days ago
That has been the case for the last, oh, decade or so. Where do you think LLMs learned to slop code?
Comment by archagon 4 days ago
Comment by int_19h 4 days ago
Comment by archagon 4 days ago
Comment by int_19h 4 days ago
Comment by potsandpans 4 days ago
Comment by archagon 4 days ago
Consider collecting related thoughts into paragraphs.
Comment by potsandpans 4 days ago
Comment by eschaton 4 days ago
You might consider that there is a very large incentive by the large and public players in this market to promote the idea that this is not true, that they consider themselves large and powerful enough to actually flout the law, and that they plan to use the argument that enforcement will be too damaging to the economy to make their view the “new normal.”
This playbook has been run before, by Uber and Lyft, by AirBnB, by Tesla with “FSD,” and so on. It’s very clearly the approach being taken.
Comment by saagarjha 4 days ago
Comment by potsandpans 4 days ago
Comment by eschaton 4 days ago
Comment by potsandpans 4 days ago
Comment by potsandpans 4 days ago
Comment by automatic6131 4 days ago
Do you have any popular open source projects? Or are you just an Internet gremlin?
Comment by ch_fr 3 days ago
I don't know how to word this in a non-confrontational, respectful way, but this article just feels like ammo for your next "debate with your anti-AI enemies" where you get to say "look, I proved with data that those people had a disproportionate reaction and have double standards, therefore anyone who dislikes LLMs or their impact is the same!". Like, sorry, I know that sounds really reductive, but this really is the vibe I get when reading this and your other replies where you repeatedly talk about "showing the hypocrisy and double standards".
The global LLM discourse has grown massive, it spans trillions of dollars in promises and investments and affects pretty much everyone, so it's the easiest thing in the world for both sides to just find some people being assholes in the other camp and say "look, here's how [other camp] behaves".
The irrational, extreme, and heinous reactions are partly bandwagoning, and you can go about your day thinking that anyone who reacts like that is evil. But if you wanna dig a bit further, you'll notice that the entire media sphere has been screaming in everyone's ears for a few years now that they're expendable, low-value human capital. All the money in the world (exaggerating a little) is being spent on making sure to remind anyone who opens a computer, opens a website, looks at a billboard, or turns on their TV... that their boss really really really wants to replace them.
Now you'll say that the friendly rsync contributor has nothing to do with any of this and... well yeah he doesn't. You don't need to agree with an emotional response to understand where it's coming from, and even if you're still dead set on considering them "the enemy", then understanding why the anti-AI crowd reacts like that is STILL a positive for you.
Comment by pie_flavor 3 days ago
Comment by ch_fr 3 days ago
These guys on the github thread aren't my friends, I have no concern for them embarrassing themselves or leaving a bad digital footprint by drawing ms paint gore. I also have no concern for OP, but it just so happened to be the post I found, and I just so happened to be in the mood to leave a comment.
Engaging in LLM discourse is already a waste of my time, I'm not going to waste more of it just to avoid fallacious accusations of double standards because I didn't "do the same for the other side".
Comment by simianwords 3 days ago
Your problem is that this was shown? You don't value epistemics -- you care about the ideology more than truth. Even if you don't like AI you should still do it in the right way.
Your comment comes across as less self-aware and more destructive. Let's keep this place truth first and ideology second.
Comment by runarberg 3 days ago
This still leaves the anecdotal evidence. And anecdotal evidence is still evidence, and in the absence of better evidence, it is perfectly rational to react based on the evidence you do have.
Comment by ch_fr 3 days ago
OP spends a lot of time doing statistics, but when another person replies "hey, 2 claude-authored features is not really statistically significant", the author literally agrees and says "my point was to show that you can't draw conclusions". Direct quote:
> I'm only trying to show there's no evidence for the anti-AI hypothesis
---
And here's another thing. How exactly is it self-unaware to say "Hey, I get your frustration with people being assholes, I'm not excusing it, but it can never hurt to understand why some of them have become so extreme in their vitriol, here's a few reasons for their feelings".
Is "feelings" a curse word or something? What's so wrong with understanding the emotional component of the AI discourse?
Talking about "emotions" is not destructive when the topic at hand is literally people being driven by emotion under a github thread.
What the article is saying is "these people are acting irrational because they're evil and the enemy, so here's how I prove them wrong with statistics!", and my comment to the article was "hey, you seem to go with the assumption that this is all based on pure evil, here are a few reasons why people might get tired, and then angry, about this whole thing".
You quite literally exemplify my point when I said that the analysis is mostly just ammo for your next "debate with your anti-AI enemies"; it's a tool that allows you to not engage and to dismiss any argument as "not on the side of truth because not on the side of the numbers".
All of this even though, and I need to state this again, I never once rejected the analysis or the results that OP came to, all I did was point out that OP is also engaging in "us vs them" think with the occasional "wink wink, CLASSIC AI hater amirite?" sprinkled in the article.
Comment by pie_flavor 3 days ago
Comment by ch_fr 2 days ago
I'll once again have to remind you that all of what I said was just "yeah these guys are really pissed off, here's why they might be" and never once asked any of you to agree with them, or that their conclusions about rsync were right, or anything like that.
You act like "understanding" is only ever a gift/favor you do to someone else. While it's true that in many cases it is, "understanding" can also be something that just helps you and doesn't require interaction or agreement, it's when you think "oh there's no debating with these guys so it's probably not worth it to engage" or "oh these people are talking about X but it looks like the Y underlying issue is the actual problem, so talking about X might be a waste of time".
You completely disregard human emotions as if doing so makes you stronger, but all it does is make you more confused, surprised and angry whenever you're faced with irrational reactions you can't understand.
Now if you decide to interpret all of that as me saying "I am in moral agreement with the dogpiling and the witch hunts shown in that github thread", that is a failing of yours.
Comment by pie_flavor 2 days ago
Comment by ch_fr 21 hours ago
There's nothing more I can say that I haven't already said at this point. Understanding that people have emotions (and once again, understanding != agreement) literally has no downside for you when looking at any kind of discourse.
Call me preachy or whatever, I'm just trying to find a topic of discussion that's not just the usual yelling past each other: "Large providers have ROI!!" vs "Large providers are in the red!!". Not opening that can of worms.
Comment by runarberg 3 days ago
I consider all of the above trivially false. But on the other hand you have rsync users who are fed up with all the lies, all the propaganda, all the fear mongering brought up by literally the richest people the world has ever seen, who have experienced AI slop first hand, who have been tricked by AI music, are fed up with all the AI-generated posters at their local coffee stand, and have given up on trying to correct their coworkers' AI-generated code during peer review. And now these rsync users see that AI slop has been pushed to their favorite tools, and see new bugs popping up at the same time.
I consider the latter a completely understandable reaction. Yes, they jumped to conclusions, but in doing so they have evidence. They have both anecdotal and circumstantial evidence for their conclusions. In comparison, the AI singularity people have nothing but vibes and science fiction behind their conclusions.
Comment by bwfan123 3 days ago
Motivated reasoning in both camps.
There are folks incentivized by AI: engineers and managers working for AI related companies who justify their beliefs with selective facts. And then there are engineers who are threatened by AI and are extra-sensitive to slop.
The article is missing the point that once camps have made up their minds - no amount of analysis is going to change that.
Comment by davrosthedalek 3 days ago
Stay out of camps, people!
Comment by scsh 4 days ago
If by fairest you mean to say that this analysis and response is sufficient, then I'm sorry but I have to disagree. We really need to understand whether the nature of the bugs is worse from a user's perspective. Even if the rate stayed unchanged, if the result is that the perceived quality of the software declined, then I would personally consider that worse, especially if I were a project maintainer.
That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.
Comment by skeledrew 4 days ago
Comment by MostlyStable 4 days ago
Comment by ex-aws-dude 4 days ago
Comment by cobertos 4 days ago
* Why was v3.4.1 the most buggy, right before the Claude commits? Why did "nobody notice"? It's way too strange to just say welp, it must be human error.
* Why does v3.4.2 have 0 bugs, or 0 bug score? And why was such an outlier (no other commit seemingly has this??) allowed to mix into aggregate statistics and bring all the "is Claude buggy?" scores down? Tbh idk how that _wasn't_ a red flag in the author's analysis...
This article feels like half of an analysis presented as a highly complex finished product due to all the advanced stats they're running.
Comment by logicprog 4 days ago
Why wouldn't it be, other than question-begging priors that assume it couldn't be?
> Why does v3.4.2 have 0 bugs, or 0 bug score. And why was such an outlier (no other commit seemingly has this??) allowed to mix into aggregate statistics and bring all the "is Claude buggy?" scores down.
My original metrics which didn't filter out feature requests and questions had it at four bugs and prior to that it was even higher and it didn't make much of a difference to the overall analysis (fell well within the IQR, the lower end of it too). Also, removing one outlier just because it looks kind of funny to you, especially when we only have two Claude releases at all, would be worse in my opinion and more arbitrary.
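For readers unfamiliar with the "within the IQR" check mentioned above, here is a minimal sketch of what such a test could look like. The per-release bug counts are made up for illustration and are not the rsync data; the function name `within_iqr` is hypothetical, and the "inclusive" quartile method is an assumption, since the article's exact method is not stated in this thread.

```python
import statistics


def within_iqr(value, reference):
    """Return True if value lies between the first and third quartile
    of the reference sample (quartiles computed with the inclusive method)."""
    q1, _median, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    return q1 <= value <= q3


# Made-up bug counts for earlier releases (NOT real rsync numbers).
earlier_release_bugs = [2, 3, 5, 6, 8, 9, 12]

# A release with 5 bugs sits inside the middle 50% of this sample,
# while a release with 0 bugs falls below the first quartile.
print(within_iqr(5, earlier_release_bugs))
print(within_iqr(0, earlier_release_bugs))
```

With only two releases on one side of the comparison, a check like this says whether a count looks typical, not whether a difference is statistically significant.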
Comment by cobertos 4 days ago
A multitude of reasons? A change in maintainer. A change in the mental state of a maintainer. A sudden focus by the community on a given undesirable behavior. Someone else here suggested use of Claude AI before it was disclosed. The framing implies that it was human-produced coding error, but my point is it could be _any other human error_ or even just some odd benign human behavior (a stampede of bug submitters) affecting the data. Which does not lead to the conclusion that AI code > human code. Not looking at these potentials is so unsatisfying.
> My original metrics which didn't filter out feature requests...
It still feels like a lot of weight of the phrase "If that doesn't look like a red flag to you, you'd be right." hinges on the fact that one of the versions has 0 bugs and it really killed the weight of that statement for me, because the oddity of there being 0 bugs just wasn't explained.
---
Could you please post the duckdb file that has the raw bug -> severity + version mapping to the GitHub repo? I have a desire to dig into this myself
Comment by logicprog 4 days ago
Comment by Laurel1234 3 days ago
Because he didn't analyze shit, just asked a clanker to rationalize his "clankers are great" conclusion.
Comment by faitswulff 4 days ago
Bugs per commit as a metric papers over severity, both in terms of security severity as well as the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
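A minimal sketch of the difference between a raw bugs-per-commit rate and a severity-weighted one. Everything here is an assumption for illustration: the severity labels, the weights, the `Release` type, and the two releases are invented and do not come from the article or from rsync's issue tracker.

```python
from dataclasses import dataclass

# Hypothetical severity weights, chosen only for illustration.
SEVERITY_WEIGHTS = {"cosmetic": 1, "functional": 5, "crash": 20, "security": 50}


@dataclass
class Release:
    name: str
    commits: int
    bug_severities: list[str]


def bugs_per_commit(release):
    """Raw rate: every bug counts the same, whatever its impact."""
    return len(release.bug_severities) / release.commits


def weighted_bugs_per_commit(release):
    """Severity-weighted rate: each bug contributes its severity weight."""
    total = sum(SEVERITY_WEIGHTS[s] for s in release.bug_severities)
    return total / release.commits


# Two made-up releases with the same number of commits and bugs.
release_a = Release("release-a", commits=100, bug_severities=["cosmetic"] * 4)
release_b = Release("release-b", commits=100, bug_severities=["crash"] * 4)

# The raw metric cannot tell these releases apart; the weighted one can.
print(bugs_per_commit(release_a), bugs_per_commit(release_b))
print(weighted_bugs_per_commit(release_a), weighted_bugs_per_commit(release_b))
```

The choice of weights is itself a judgment call, so a weighted score moves the subjectivity into the weights instead of removing it.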
Comment by germanjoey 4 days ago
It is the exact metric you'd choose if you wanted to make the current situation of rsync look like not a big deal.
[0] https://github.com/RsyncProject/rsync/graphs/commit-activity
Comment by logicprog 4 days ago
Comment by vsundar 4 days ago
Not sure if this is mentioned somewhere else, but looks like the maintainer has a blog post that addresses this: https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0
Comment by floxy 4 days ago
Comment by logicprog 4 days ago
Comment by runarberg 4 days ago
Of interest is this post here: https://github.com/RsyncProject/rsync/issues/929#issuecommen... which echoes the same concern that was raised upthread; however, I failed to find the maintainers’ response.
EDIT: Found it! it is in the (untitled) discussion section (after the results).
https://lobste.rs/s/k1b0za/rsync_outrage#c_2iowov
EDIT 2 (and advice on design): The page design changes backgrounds after the results sections, which kind of conveys to the user that they have reached the end of what is important and can just skim over the rest (usually pages have a radical change in typography like these when you’ve reached the comment section), however this is what is analogous to a discussion in a typical paper, and is arguably the most important part. I had simply assumed that you just left it at the result and skipped the discussion as a stylistic choice.
Comment by logicprog 4 days ago
I also paraphrase Tridge himself explicitly saying that this is why commits/releases have increased:
> Essentially, this isn't a "Claude" problem, it's a "more security work" problem, something that Tridge himself confirmed in his response, describing how a flood of AI-generated CVE reports forced rapid, extensive changes to rsync's attack surface.
> The page design changes backgrounds after the results sections, which kind of conveys to the user that they have reached the end of what is important and can just skim over the rest (usually pages have a radical change in typography like these when you’ve reached the comment section), however this is what is analogous to a discussion in a typical paper, and is arguably the most important part. I had simply assumed that you just left it at the result and skipped the discussion as a stylistic choice.
Good point, I assumed everyone would read till the end, that's on me. I'll give it a heading.
Comment by ex-aws-dude 4 days ago
Why is it that some unfounded claim is made and the onus is suddenly on the project maintainer to prove it beyond all doubt?
It should be on the person making the claim to prove it
Comment by logicprog 4 days ago
Comment by bsza 4 days ago
A commit is a measure of nothing. Severity weighted bugs per unit of nothing? What does that even mean? In any repo it's trivial to achieve a sev/10c that's arbitrarily close to zero while completely ruining everything.
I suggest you practice some humility and update your conclusion instead of updating the mental gymnastics you used to arrive at the same conclusion.
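To make the denominator concern concrete, here is a minimal Python sketch. It assumes that "sev/10c" means a severity-weighted bug score per 10 commits; the weights, the function name `sev_per_10_commits`, and the bug list are made up for illustration and are not the post's actual data or method.

```python
# Hypothetical severity weights; the post's real weighting is not shown in this thread.
SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "high": 9}


def sev_per_10_commits(bug_severities, commit_count):
    """Return the severity-weighted bug score per 10 commits."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    score = sum(SEVERITY_WEIGHTS[s] for s in bug_severities)
    return 10 * score / commit_count


# Made-up release: the same three bugs in both scenarios.
bugs = ["high", "high", "medium"]

# Scenario A: the work lands as 20 commits.
squashed = sev_per_10_commits(bugs, commit_count=20)

# Scenario B: the same work is split into 200 smaller commits.
split_up = sev_per_10_commits(bugs, commit_count=200)

print(squashed, split_up)
```

With identical bugs, the score in scenario B is one tenth of scenario A purely because the same work was cut into ten times as many commits. Whether that effect matters for the rsync data depends on how commit granularity actually changed between releases, which this sketch does not measure.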
Comment by logicprog 4 days ago
Thus, if anything their sev/10c is inflated. If I changed it to lines of code changed, the relative bug ratios would be much smaller, and the conclusion wouldn't change. In fact, the conclusion would look "better" for Claude; if I was using "mental gymnastics" to come to this conclusion, I would have already used a metric other than adjusting per commits!
What different metric would you suggest that would change the conclusion?
Showing "humility", as you so moralistically and condescendingly put it, would require being wrong first.
Comment by bsza 3 days ago
No, they don't, you just made that up.
> What different metric would you suggest that would change the conclusion?
What would be a lot more useful to know is whether or not the original prompt used to generate this post instructed you to do a fair and unbiased review of these bugs, and whether or not that prompt itself was framed in a fair and unbiased way. If you take a piece of paper and write "therefore Claude is not at fault" at the bottom, then nothing you write above that line is admissible, no matter how well-reasoned.
Comment by skeledrew 4 days ago
Comment by atmavatar 4 days ago
> So my systems recently updated to rsync 3.4.3, and as soon as that happened my backup system - which does incremental backups using multiple --compare-dest= arguments - started to fail on anything but a full backup.
Incremental backups are perhaps the primary use of rsync, and they were broken for this person. That's pretty severe.
The second reply is similar:
> i wondered why my 3d printers were running like sh*t and at 100% cpu; turns out log2ram uses rsync.
This one I took with a grain of salt, since it read more like a dogpile than an actual bug report. However, if it's genuine, it's also reasonably severe.
Later in the comments, someone attempted to provide a list of issues that had been added: https://github.com/RsyncProject/rsync/issues/929#issuecommen.... The list included several failures to build or run rsync that appear to have resulted from broken backward compatibility. That seems reasonably severe. If intentional, I would have expected mention in the release notes about the removal of backwards compatibility, but none was made.
The issue comments had already degraded into a lot of unnecessary vitriol even before the above-mentioned comment and only get worse from there, so I stopped. But the fact remains that the whole issue started with a severe bug.
I applaud the attempt at dispassionately analyzing whether the recent LLM releases of rsync were normal or outliers as far as bugs are concerned, but I don't think you can do so properly without analyzing severity.
Comment by skeledrew 4 days ago
"A lot of claims in the wider discussion have treated every recent bug report as if it had the same cause. That is not accurate. Some reports were regressions from recent security hardening, some were missing historical test coverage, some were older bugs found because rsync suddenly had more eyes on it (especially by AI that can find issues quickly) and some were packaging or environment-specific failures. A Co-authored-by line is not enough by itself to establish root cause." - https://github.com/RsyncProject/rsync/issues/929#issuecommen...
Comment by KaiShips 4 days ago
Comment by e40 4 days ago
Here is the process I used to do it, which was way more complex than I thought it would be:
https://gist.github.com/e40/caa67c1b8d439a528695f996d0519d8e
Comment by jarym 4 days ago
I was an AI skeptic some months ago but truly Claude and Codex have changed my development style and velocity in a way I never imagined would ever be possible. With that, yes, I produce more code and am finding more bugs.
So looking over at comments in HN articles the amount of polarising hate to anything produced with AI is quite surprising. Just because some AI helped or even produced entirely doesn't suddenly make a project 'vibe coded' as if that's meant to be some insult levelled at users of LLMs.
It reminds me a lot of when offshore outsourcers started getting more software development work from the mid-90s, with all the derogatory remarks made towards 'Indian developers'. Now we're in the mid-2020s and similar remarks are made towards AI.
I don't get it. I really don't. What I do know for sure is more and more code will be AI generated with or without the detractors.
Comment by jiggawatts 4 days ago
People report the same “took a shortcut” issue with AI vibe coding, and I can confirm that I’ve had to rewrite practically everything the AI generated for me, despite using a frontier model dialed up to 11 thinking levels.
Having said that, AI is very useful for other activities like PR review, security vulnerability analysis, typo hunting, reverse engineering, etc.
I’m probably going to have to increase my subscription to the next tier but at the same time I still can’t use any of the code it generates.
If even one person can simultaneously experience "very useful, need to pay more for it" and "useless output code quality" then of course you'd expect a variety of opinions amongst the general user base.
Comment by albedoa 4 days ago
OP knows this but finds himself in the strange position of having to defend India slop in order to defend AI slop, totally unnecessarily and unprompted. It's baffling to you and me.
Comment by int_19h 4 days ago
Last year was the first time I saw an AI agent actually debug and fix a non-trivial bug in a satisfactory way. Even then, trying to use it on larger tasks made it clear that it wasn't something I could just hand over the issue tracker to.
Now? I've been using Codex for the past several months to work on a nontrivial project. Which was prototyped in C++ (for library reasons mostly), then had the initial version written in Haskell, and more recently I got it ported to Rust to keep memory use in check on mobile.
These things are not trouble-free, but the sheer amount of progress made in just the last year alone is astounding. Skepticism is well and good, but healthy skepticism ought to yield to tangible evidence.
Comment by Joel_Mckay 3 days ago
Most people feel more productive with chat bots, but often end up wasting more time chasing self-inflicted issues. Same clown-car of Dev-ops proponents no doubt billing by the hour. =3
Comment by nomel 4 days ago
With programming, I've always been in the latter: it's a tool that allows me to do what I actually love, which is problem solving, system level thinking, and providing some nice solution to that problem, that happens to be through software.
So, I have an absolute blast with AI, because it helps do the more boring bits. And, seeing my non-programming colleagues get excited to see their vibe coded ideas become reality has been so much fun.
I'm genuinely curious to hear the perspective of someone anti-AI, who works in software. Perhaps the impending doom/skill shift of our profession?
Comment by CapsAdmin 4 days ago
But developers also say good practices should be followed when talking to each other, and while some may do, reality is often very different.
It requires discipline, which varies a lot between developers, between projects, current mood, and so on.
In the beginning you might be careful doing small changes, but after a while you might get more tempted to accept the output for what it is, because ultimately that's much easier.
So the way I see it; the left side is harder work and potentially bigger but delayed dopamine hits, the right side is quick dopamine hits. How do we (at least those who struggle with discipline) resist just slipping to the right?
I started out carefully myself and slipped more into vibe coding, but I don't feel particularly proud of it for some reason.
Comment by Daishiman 4 days ago
> In the beginning you might be careful doing small changes, but after a while you might get more tempted to accept the output for what it is, because ultimately that's much easier.
Counterpoint: how is this any different from how things were pre-LLMs? I have seen, in the same codebase, some thoroughly well-written and tested PRs that read like Shakespeare and some of the laziest slop that even no LLM would ever write because humans have an unlimited capacity for laziness.
You catch the bad stuff through oversight, process, automated and manual checks, and the ultimate threat that your job depends on your ability to deliver so you better allocate at least enough energy into this so that you can ship moderately working code.
Comment by yw3410 4 days ago
Reviewing vibe-coded PRs and features has been utterly exhausting over the past few months.
I work on critical, mature software - a small change in behaviour can mean data loss or non-compliance with regulations for our customers. The biggest problem with AI PRs is the sheer amount of churn, extra code and lack of intent with the PRs it generates.
The only way I can describe the latter is that an AI-only PR feels to me like a painting where everything is high detail - and you have to comb over each part before you understand why it's there because so much is superfluous. A well written human PR on the other hand, is painted such that your eye naturally follows the thought process of the author so you can just nod along during the review, as if the solution was obvious.
Also when I'm _using_ the agent; at least 50 percent of my time is spent telling it to stop with its approach so it doesn't go down a useless rabbit hole and waste tokens.
Comment by Daishiman 4 days ago
But this isn't an LLM problem; this is a problem of undisciplined engineers who feel they need to cram extra stuff in a PR. If an engineer doesn't look at the output of the LLM and generate extra work then it's still on them, right?
> The only way I can describe the latter is that an AI-only PR feels to me like a painting where everything is high detail - and you have to comb over each part before you understand why it's there because so much is superfluous
This just indicates that the engineer doesn't know how to use the tool. Hell, they can ask the LLM to split the work into focused PRs and Claude will be happy to do it and the results might not even be half bad.
> Also when I'm _using_ the agent; at least 50 percent of my time is spent telling it to stop with its approach so it doesn't go down a useless rabbit hole and waste tokens.
If this is happening often then the tool is probably not fit for the job.
Comment by yw3410 3 days ago
I'm not talking about extra features; I'm talking about how, for the same single feature, the code is either convoluted because the algorithm is overly complicated or the abstractions are just wrong for the domain.
The PRs typically are already focused in that they address a single feature; or at least a single "usable" feature in a complex system which necessarily has a lot of connected parts and behaviors.
> then the tool is probably not fit for the job.
Perhaps; but with an LLM I haven't found which jobs it _does_ work for and which it doesn't. I already use planning mode extensively; and capture the major points, but then it makes a stupid decision mid implementation and just starts churning.
Comment by jarym 4 days ago
I don't have a good analogy but the immediate one that comes to mind is treating AI like a junior developer that you're mentoring. If you know what you're doing you can iterate quickly; if you don't then it's a whole other story.
Claude built me a Markdown editor - I designed it, set coding standards, etc. It coded it to my spec. The output is in my opinion not bad and is very usable (for me - I use it daily now). Probably would have cost me north of $50k to get a team of seasoned devs to build it to the current level of polish. https://github.com/emrul/md
Comment by lelanthran 4 days ago
So... you're vibing? Not looking at the code at all?
Comment by Joel_Mckay 4 days ago
For context search, I find LLM quite useful... still wrong 20% of the time... but it has some utility.
Here is a thought experiment: If "AI" will eventually generate your work, then what actual value do you bring to the table? =3
Comment by tom_ 4 days ago
Comment by albedoa 4 days ago
What was the impetus of the derogatory remarks?
Comment by kelnos 4 days ago
Some of it was genuine cultural differences. It's hard to work with people and get the results you want when you don't understand their culture, and how they communicate. (For example, people from some cultures just can't say "no" or "I don't know"; you need to learn how to communicate with them in a different way to get the understanding you need.)
Some of it was certainly a form of jingoistic or xenophobic protectionism.
Comment by Joel_Mckay 4 days ago
However, you also get the lowest common salient answer guaranteed, uncopyrightable work (differs from public domain), and potential legal peril from copyright bleed-through.
We are in the golden Napster age of isomorphic plagiarism. =3
Comment by logicprog 4 days ago
Comment by igregoryca 4 days ago
I don't have empirical evidence for this claim, but best I can tell, security patches are the principal source of observed bugs in software of a certain vintage, because they cause churn. (Just think of Windows updates that break drivers.)
Comment by geraneum 4 days ago
So the criticism was bad, and that somehow makes it ok to use a bad metric?
Comment by logicprog 4 days ago
Comment by abirch 4 days ago
I come to hn because I get very nuanced, informed information and glorious puns.
Comment by epolanski 4 days ago
Comment by lbrito 4 days ago
Comment by logicprog 4 days ago
And again, that's kind of the point. There's exactly zero actual evidence, however you slice it, that "Claude broke rsync" except cherry-picked anecdata, and the whole point of my analysis is to demonstrate the total lack of any such trend/evidence at all, and just how in-distribution/normal these releases are, to show that if people hadn't known Claude was involved in them, they wouldn't have remarked on them.
Comment by kelnos 4 days ago
> My statistics courses are far behind me, but don't you need at least 30 data points to conclude anything?
That cuts both ways. If we say that the author here can't claim any conclusion because there are only 2 Claude-authored releases, then we must also say that the people claiming "Claude broke rsync" have no statistical basis to draw that conclusion, either.
Comment by matheusmoreira 4 days ago
There is no fixed number. Sample size depends on the size of the set you're sampling, the desired margin of error, and the confidence level.
If your total set has a million items, you need ~16600 samples to draw conclusions with 99% ±1% certainty.
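For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes Cochran's formula for estimating a proportion with the worst case p = 0.5, plus the standard finite population correction; that is one common textbook approach and not necessarily the one used for the figure above. The function name `sample_size` is made up for this example.

```python
import math
from statistics import NormalDist


def sample_size(population, margin, confidence, p=0.5):
    """Return (uncorrected, corrected) sample sizes for estimating a proportion.

    uncorrected: Cochran's formula, which ignores the population size.
    corrected:   the same value after the finite population correction.
    """
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0), math.ceil(n)


uncorrected, corrected = sample_size(1_000_000, margin=0.01, confidence=0.99)
print(uncorrected, corrected)
```

Under these assumptions the uncorrected figure comes out at roughly 16,600, and the finite population correction for a million items lowers it to roughly 16,300. Note that the required sample barely depends on the population size once the population is large, and that the formula says nothing useful about a sample of two.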
Comment by wlonkly 4 days ago
Comment by gravypod 4 days ago
This would be even harder to measure.
Comment by dvt 4 days ago
I run a smallish project with ~1k stars and I've stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever. It's tiring and a complete shame that author has to make such an insane deep dive into a random accusation that just caught on social media. I want to emphasize that this has nothing to do with AI, it's just tech tourists, consumers (as opposed to creators), and engagement farmers that have taken over. AI slop probably doesn't help, but the underlying issue has been brewing for at least a decade.
Also, the "making soup for the homeless & pissing in it" is not only an off-base analogy (software is pretty low on Maslow’s Hierarchy of Needs), but also somehow looks down on both people in need and the volunteers that help them. Just absolutely gross.
Comment by matheusmoreira 4 days ago
Comment by Panino 4 days ago
Agreed, and similarly, as a hobbyist programmer who loves Rust and Go, I've always felt that the people who command others to "rewrite it in xyz" are not themselves developers, they're "ideas people." There's a mass of these people whose main interactions with the world are through the dramatic forcing of their correct opinions.
> I run a smallish project with ~1k stars and I've stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever.
That's a bummer and it's something I'm fearful of. I post some code on my website, not on a github type site, and don't interact with people about it. It's nice and plenty of people do it. Is that something you'd consider?
Comment by unphased 4 days ago
Comment by WesolyKubeczek 4 days ago
Instead we have a shitstorm over presumably legit issue, for which the only source is some mastodon post.
One command that used to work in 3.4.1 and stopped working in 3.4.3. Just one! We could have already bisected the living shit out of this and go home, but no.
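For readers who have not bisected before, the idea is a binary search over the commit history between a known-good and a known-bad revision, which is what `git bisect` automates. Below is a minimal Python sketch of that search. The commit names and the `is_bad` check are hypothetical stand-ins for building each revision and running the failing command; the sketch also assumes that once the regression appears it stays present in every later commit.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits: list ordered oldest to newest, where commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that builds/tests one commit and returns True if broken.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history of 100 commits where the regression lands in "c63".
commits = [f"c{i}" for i in range(100)]
checked = []


def is_bad(commit):
    checked.append(commit)  # record how many builds we needed
    return int(commit[1:]) >= 63


culprit = first_bad(commits, is_bad)
print(culprit, len(checked))
```

The point of the sketch is the cost: the number of builds grows with the logarithm of the number of commits, so even a range of a hundred commits needs only a handful of test runs.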
Comment by aarjaneiro 4 days ago
Even this report is full of claude-introduced bugs
Comment by runarberg 4 days ago
This mistake does exist in the wild though: https://github.com/search?q=%26emdash%3B&type=code
If I was more ambitious I would plot the dates of the blames of these results in a histogram and see if there is a significant increase in these mistakes (over a baseline —) correlating with the release of some models.
Comment by logicprog 4 days ago
Comment by MantisShrimp90 4 days ago
But the reality is that if you were already set enough to call rsync slop because of a single post, you aren't going to be more down now. Even in these responses I see everyone nitpicking and moving goalposts as if one more commit being actually claude-aided will tip the scales from stable project to "vibe coded slop".
Software has always been fuzzy, we have never come up with an objective way to handle software quality, and this uber-hatred of LLM contributions lets the humans who make egregious bugs and mistakes off the hook.
Taking a step back, we need to have more empathy and thoughtfulness toward one another in this space. It's new and people are experimenting, and there will be nothing good coming from personal insults and DDoSing a good project just because someone got ragebaited on threads, x, mastodon or whatever else.
How do we determine bugs and increase quality? It's almost like we have been grappling with this question for decades, and I still hear people fight over the best way forward. Simple design, test driven development, user surveys: all of the above have been used as proxies for software quality, and they all failed to capture everything. Back in the day we used that ambiguity to give each other grace; now we use that ambiguity to tear down other creators. Whatever, if open source software really is dying, it's because of this toxic shit just as much as the LLMs
Comment by thin_carapace 4 days ago
Comment by tptacek 4 days ago
Hey, 'logicprog, your writing is fine!
Use LLMs to critique your writing, check its structure, vet your choice of topic sentences, check flow from graf to graf and section to section, look for passive voice and overused words. LLMs are fantastic for that. But don't use a single word an LLM suggests in your actual writing. If it suggests something really fucking good, too bad, those words are disqualified. It's an easy red line to adhere to, easier than it sounds, and it'll keep your writing human.
(You ended up somewhere around here anyways, but that was after you posted something with LLM-written language because you weren't confident enough in your own writing. The things you do "worse" than an LLM are what make you you; be protective of them!)
Comment by logicprog 4 days ago
Comment by rovr138 4 days ago
Is this a configuration that's not common and thus not tested?
If people think they can do better, I want to see their forks and them keeping up with it.
https://github.com/RsyncProject/rsync/graphs/contributors?fr...
Comment by parliament32 4 days ago
Your verbosity and sentence structure are not a problem. I hope that publishing this gives you a bit more confidence in your writing, because it's legitimately good.
Comment by rswail 3 days ago
There was one regression bug apparently (related to multiple destinations and the way people do backups), but all the attention/anger has been about a test suite that makes the rsync development better, more rigorous and copes with the onslaught of both good and bad AI generated PRs as well as hardening something that has two decades of C code in it.
People need to grow up and appreciate what others in the community (especially people like tridge) have provided.
Comment by bwfan123 3 days ago
Comment by guilhas 2 days ago
If several lines of code, or critical ones, get changed quickly and keep breaking things, with or without LLMs, there will be backlash
Rsync should rightly lose reputation if the project allows the release of breaking changes to follow the latest hype trend
Comment by vlovich123 4 days ago
Comment by saagarjha 4 days ago
Comment by PunchyHamster 4 days ago
Also, if you wrote a paper where you drew statistical conclusions from a whole 2 datapoints, you'd be laughed out of the room
Comment by logicprog 4 days ago
I'm using methods appropriate to that low amount of data, first of all. Second of all, since I'm only trying to show there's no evidence for the anti-AI hypothesis (not disprove it, or prove the null hypothesis), that's sufficient in itself. Also, I wonder why nobody said things like you're saying ("there's too little data to tell") in response to all the absolutist claims that AI caused rsync to get worse?
> The fact last few commits were attributed to claude doesn't mean previous ones didn't use it.
At this point, you're just positing Russell's Teapot: you'll keep assuming more and more of the code was "secretly" Claude when there's no evidence for it and no reason to think so, just because you've started with the assumption that Claude makes things worse and you want to find a way to prove it.
Comment by vintagedave 4 days ago
Especially since if the earlier commits were so clearly AI authored yet without the Claude marker, surely you or anyone would be able to spot them. You could say, X commit does not have the Claude commit marker yet was AI written. But for all the speculation on this thread, I haven’t seen anyone actually doing that. What may be possible is that the rsync maintainers used AI to assist yet reviewed and edited themselves, as many devs do, and if so then the stats in this article are still notable: there are no poor quality outliers that can reliably be attributed to AI and if one specific release (3.4.0) was, the subsequent releases which presumably also had as much AI as this speculative hidden AI release only show improvement and thus act as a pro-AI argument.
The blog has many more datapoints than two. It compares many releases. You’re looking at 2-vs, not 2.
Comment by mikaeluman 4 days ago
I think it will be up to some group in academia to make a real full blown study across several repositories.
There must be tons to learn on how LLMs have changed software development and perhaps the cleanest separation will simply be going by what repositories declare e.g. "No LLM involved" vs those that proudly do the opposite or are neutral.
Bugs is not the only variable of interest here. I am guessing someone is already doing this as we discuss it here...
Comment by Polarity 4 days ago
Comment by gjvc 4 days ago
Comment by davrosthedalek 4 days ago
Comment by drankinatty 1 day ago
The issue that coding tools like Claude present is the sheer size and scope of the changes and commits they generate, which would take mere mortals months of careful coding to produce.
That's an issue everyone using those tools will have to confront. I don't know Andrew personally, from a "let's go have a beer" standpoint, but I've known him from the samba list and his work with rsync for a very long time.
My take on the issue is less about the regressions and Claude screw-ups and more the lesson to all about the reliability of the coding tools and the diligence required to validate what they spit out.
It's an unfortunate black-eye, no doubt, but it's not a unique one. The takeaway is if something like this can slip by somebody like Andrew, then we all need to redouble the validation effort, lest we too are destined to share an unfortunate black-eye or two.
Never forget, "to err is human, but to really foul things up requires a computer."
AI just applies that adage at industrial-scale.
Comment by throw7 4 days ago
Comment by AEVL 4 days ago
Comment by logicprog 4 days ago
Comment by moktonar 3 days ago
Comment by ladax72707 2 days ago
Comment by ltbarcly3 3 days ago
Comment by htk 4 days ago
Comment by foxes 3 days ago
Comment by Havoc 4 days ago
Comment by manlymuppet 4 days ago
HN is, relatively, a very intellectual part of the internet, yet even still, it's really common to see very uneducated opinions here. Not that everyone needs to be very educated, but posts with plainly wrong assumptions and biases shouldn't go completely unchecked so rampantly.
Comment by amluto 4 days ago
This is kind of a sad situation. Tridge is an excellent programmer and a very respected member of the community, and I totally get it. rsync, like most old C projects, has a lot of accumulated cruft, and things that would be nice to fix, and bugs. And those bugs come in at least three classes: semantic bugs, improper interactions with the OS, and memory safety bugs. And the author and long-time maintainer has the same problem as every other maintainer and team: not enough time to deal with everything.
And now LLMs come along, and they are so, so seductive. They will fix your bugs if you ask them to. They will even find your bugs. And they're right a remarkably large fraction of the time. It's magic! You can write an agent loop or magic harness or swarm and let them do this on their own if you want. And so you start getting through your backlog, and it's fun, and you feel good, and you let your guard down. And you start having problems:
- Your favorite LLM does not have the context that lives in your head. I use rsync because Tridge wrote a fine piece of software, and he knows how to write serious software, and I'm willing to accept that it's in C and therefore almost certainly has a safety bug or three. If I wanted to use claude-ersatz-rsync, I'd use that instead, but I really don't, TYVM.
- Remember how LLMs are right a remarkable fraction of the time? The fraction is remarkable, but it's nowhere close to 100%. (Yet? Who knows. Right now, it's DEFINITELY nowhere near 100%.)
- The training process for the current crop of LLMs does not adequately reinforce long-term maintainability of the outputs. And, for all the LLMs seem magic, they seem to love a workload in which they write code with poorly named functions and no docs and sort of assume that they can parse their own code down the road and figure out WTF is going on, and they are AT BEST only a tiny bit right. Because every project has interfaces where one module touches another, and every LLM has very limited context (larger than humans' in straight up verbatim working memory but MUCH MUCH WORSE than humans' (for now, anyway) in actual broad picture retention), and this workload doesn't work. If it did, we could give up on structured programming and just have the LLMs vomit up uncommented asm.
And so, where humans have conventions and decently named functions and ideas that you shouldn't churn your code just for funsies (at least not in a production context), LLMs do this:
https://github.com/RsyncProject/rsync/commit/30656c5e358b1c6...
Most of that is blindly changing calls to functions like do_foo(args) (which makes sense) to do_foo_at(the same args), which makes no sense. Sorry, but the world of POSIXish-targeting programmers (including, presumably, Claude) knows what _at means, and it means "at" the specified directory fd. Which is not specified in the call sites. It makes no sense at all.
Buried in all that mess [0] are the implementations, which are sloppy. Seriously:
- There's a function called do_utimensat_at. Is Claude stuttering?
- There's a lovely comment in syscall.c:1660-1673 that's quite bad. It's handling strings that contain "/../" and such. If there's some actual contract that the function makes to its callers (and there surely is -- this is critical security-sensitive code), then SAY WHAT THE CONTRACT IS. Don't bury a partial explanation in a comment in the middle.
- There's a repeated pattern: In do_foobar_at(path), there is, in effect: if (!path) do_foobar(path); Nice NULL pointer handling. Is NULL a valid argument or not? Why handle it by forwarding it to the less secure variant?
- Those nice, supposedly secure "at" variants check for paths that start with '/' and forward to the raw insecure syscall. And they don't check for .. in the middle. So what, exactly, is the special code for .. promising to do? (See above.)
I don't think more details are needed. But my take is that this whole thing is a mistake. I personally work on the sort of code where messes like this are entirely unacceptable. And using an LLM while maintaining the kind of oversight that prevents it is mentally taxing and not exactly fun. If you want to fix all the gunk in a C program like rsync by LLM magic, go rewrite it in Rust or something -- you're already exposing yourself to a massive rewrite and all the risks that entails, and you're pretty much guaranteeing a high level of sloppiness, so at least use a language that is more resistant to slop.
[0] Which GitHub doesn't even render by default because their diff viewer is so bad.
[There were follow-ups. See https://news.ycombinator.com/item?id=48352182]
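For readers unfamiliar with the `_at` convention discussed in the comment above, here is a minimal sketch of what "at the specified directory fd" means, written in Python because `os.open(..., dir_fd=...)` maps onto `openat(2)` on POSIX platforms. The helper `open_at` and its contract are hypothetical illustrations, not rsync's code or API; the point is only that the directory fd is an explicit argument and that the function states up front which paths it refuses.

```python
import os
import tempfile


def open_at(dir_fd: int, relpath: str) -> int:
    """Hypothetical helper (not rsync code): open relpath read-only, relative to dir_fd.

    Contract, stated up front: relpath must be non-empty, relative, and free of
    ".." components. Anything else raises ValueError; there is no fallback to
    a plain open() that ignores dir_fd.
    """
    if not relpath:
        raise ValueError("empty path")
    if os.path.isabs(relpath):
        # POSIX openat() ignores the directory fd for absolute paths.
        raise ValueError("absolute path would ignore dir_fd")
    if ".." in relpath.split("/"):
        raise ValueError("'..' component could climb out of dir_fd")
    # dir_fd= uses openat(2) where the platform supports it (see os.supports_dir_fd).
    return os.open(relpath, os.O_RDONLY, dir_fd=dir_fd)


def demo() -> bytes:
    """Create a scratch directory, then read a file through its directory fd."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("hi")
        dir_fd = os.open(root, os.O_RDONLY)
        try:
            fd = open_at(dir_fd, "a.txt")
            try:
                return os.read(fd, 16)
            finally:
                os.close(fd)
        finally:
            os.close(dir_fd)
```

Note that a lexical check like this says nothing about symlinks inside the directory, so it is not a complete confinement scheme; on Linux, `openat2(2)` with `RESOLVE_BENEATH` is one kernel-level option for that, assuming a recent enough kernel.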
Comment by j16sdiz 4 days ago
Comment by amluto 3 days ago
Comment by TZubiri 4 days ago
Can someone explain why one would ever use rsync (pre vibecode version) instead of cp and dd?
Can't we just 'apt remove rsync' and save ourselves the time even spent on evaluating this dependency?
Thanks
Comment by Joel_Mckay 3 days ago
While stuff like sshfs is great for a few small files (and win11), it will be an order of magnitude slower than an rsync task.
Most smart folks automate backup/recovery scripts, and only sometimes edit them with a new OS install. =3
Comment by Arcuru 4 days ago
Comment by int_19h 4 days ago
Comment by esailija 4 days ago
Comment by logicprog 4 days ago
Comment by nasretdinov 4 days ago
The best approach I've tried that actually increases quality (and _may_ speed up development) is to write ~80% of the code yourself and then ask an LLM to review it thoroughly. While it's doing its thing you're also thinking about the code and reviewing it yourself in parallel. You then merge the findings and fix stuff worth fixing. At this point the authorship of the code is still mostly yours, you _understand_ the system and you ship fewer bugs, slightly faster than otherwise. It's a moderate improvement to the workflow, but it actually doesn't cost nearly as much either, and definitely doesn't produce rage at the machine from the slop. The only downside is that it requires lots of discipline, and it's a relatively rare commodity among software engineers these days.
Comment by nelox 4 days ago
Comment by block_dagger 4 days ago
Comment by themafia 4 days ago
You can write for an audience or you can write for yourself. Which is fine either way but you shouldn't pass the blame for bad results on to your audience.
> and recieving almost no substantive input, discussion, or response on the actual content of the article
Well did you write it for that purpose?
> "Just wait, more bugs will surface" -- v3.4.3 has been out long enough
Wait for _more releases_. As your own data shows the bug rate is not consistent between releases. So this is probably not a worthwhile metric. Perhaps systems touched, new features included, or attempted fixes would be a better way to contextualize releases and the goals of the author.
Comment by tiahura 4 days ago
Comment by KronisLV 4 days ago
> v3.4.3 has been out long enough that its rate (5.00) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.
I mean, as someone who uses LLMs, it might be a good idea to consider how one might limit the number of bugs that will appear in the future at least a little bit: parallel iterative code review loops would probably be the easiest and most applicable to LLMs, though I guess test coverage and other code analysis tools help too.
Comment by nazgul17 3 days ago
By the way, I did find this a bit hard to read but, as instructed by OP, I'll go fuck myself.
For what it's worth, I find AI written prose easy to read, and am annoyed by all the constant HN comments which just point out the author was AI, without anything else substantive to add.
Comment by logicprog 4 days ago
Comment by overgard 4 days ago
Comment by yobid20 4 days ago
Comment by logicprog 4 days ago
Comment by noAnswer 4 days ago
Comment by WhereIsTheTruth 4 days ago
Comment by steno132 4 days ago
So what? You've saved a significant amount of time for a decent number of humans, and if those humans are working on other projects, the overall net output for the world is net positive compared to without LLMs.
You have to broaden your perspective. It's not just about how rsync was affected.
Comment by boxed 4 days ago
> ok, so I was wrong and badly, but I will double down and say I was right anyway
Comment by iainctduncan 4 days ago
Comment by int_19h 4 days ago
Comment by logicprog 4 days ago
Comment by pushcx 4 days ago
What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment.
The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits":
It spread to Hacker News and Lobsters, generating hundreds more comments.
This is false, it did not appear on Lobsters. Here is the function in the codebase that prohibits this kind of brigading: https://github.com/lobsters/lobsters/blob/main/app/models/st...
Please correct your article.
Comment by tptacek 4 days ago
Comment by logicprog 4 days ago
> On Lobste.rs, in response to the Medium essay Tridge himself posted in response, finally some users like boramalper begin to actually ask for evidence one way or another:
Comment by pushcx 16 hours ago
Comment by sspoisk 11 hours ago
Comment by nairboon 4 days ago
Comment by quentindanjou 4 days ago
Comment by overgard 4 days ago
"My honest assessment is that this is a competent calculation performed on a badly confounded measurement, followed by conclusions substantially stronger than the calculation warrants. It is useful as a rebuttal to “the Claude releases are obviously unprecedented disasters,” but not as evidence that Claude was harmless."
Comment by aplomb1026 4 days ago
Comment by eddysir 4 days ago
Comment by sathyayoshi 4 days ago
Comment by dang 4 days ago
[see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]
Comment by logicprog 4 days ago
- I used GLM 5.1 to help with the coding and math for this.
- However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions)
- Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University for what sort of statistical methodology would be justified for this very limited data set, while still giving as much information as possible.
- I know the website looks like we stereotypically consider vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty!
Comment by jchw 4 days ago
> A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.
Comment by logicprog 4 days ago
Comment by jchw 4 days ago
Comment by logicprog 4 days ago
Comment by ok_dad 4 days ago
Heck, I use LLM assistance for coding and I’ve even coded up whole features with the clankers, but giving it the right to speak for me is too much.
I should also add that I read and understand every line of clanker output that I publish for others, so I’m not a vibe coder either, just adhd.
Comment by skeledrew 4 days ago
Comment by grey-area 4 days ago
So your statement betrays a significant misunderstanding - there is no neat clean divide between style and content.
Also, LLMs often generate text that is plausible, but wrong, in ways big and small.
Comment by skeledrew 4 days ago
> Also, LLMs often generate text that is plausible, but wrong, in ways big and small.
So do humans. Always have, always will.
Comment by grey-area 4 days ago
Comment by skeledrew 4 days ago
Comment by jchw 4 days ago
Poor prose does not just make writing ugly — it creates friction, obscures nuance, and introduces ambiguity.
You can eat a gourmet meal out of a dirty paper bowl. You still get the calories, but the delivery mechanism definitely impacts the experience and the perceived value of the food. Same food, different response.
See? I can write slop too, I don't even need to burn down a forest to do it. If you are OK with every fucking thing being written exactly like this, good for you. I am not.
Comment by skeledrew 4 days ago
Comment by jchw 4 days ago
I waited a minute to make sure you weren't going to delete this post because frankly, if I had written it, I would have. Guess not, so... Here goes.
No. It is not the fault of my "attitude" that the Internet is going to suck. That is a complete reversal of the reality. The fact that even people without bad intent are already spreading slop everywhere should be enough evidence to essentially prove that there was never any hope. If this is what good actors are doing, what exactly do you expect from bad actors?
Also, to stress it yet again, I don't care if people use LLMs in general. I'll even say that I don't particularly care very much if people use them without disclosing it in most cases. If you're using it like a normal tool and not merely just dumping the output verbatim there is not any particular need to disclose it any more than you'd disclose other tools, though I think people would prefer if you did just for transparency.
My chief complaint is just how bad LLM slop writing is. It simply is not good at all. It would literally be much better for the Internet if they weren't so turboshit at writing. There is almost no writing style I don't prefer over garbage LLM writing. I'm dead serious. Early LLMs were worse at almost everything else, but they were a lot better at writing for sure. Something went wrong somewhere.
But I do also believe that it is inherently bad to dump prose as-if you are communicating as a human, but said prose isn't actually written by a human. If someone shows me a cool drawing that they made, that means that they sat there and went through the process of sketching, possibly multiple drafts, inking, coloring/shading/painting/etc. to create an expression. This involves many human skills that take years to hone, and every detail carries someone's explicit intention. I think that this is cool, and shows a great degree of skill and effort.
When you, of course, generate some crap from an image generator, it may very well look similar. It may emulate some actual defects that make it look like someone really drew it. But someone didn't. A model went directly from a text prompt and dumped out pixels on screen. No sketching. No layers. No thought processes about how to frame things or what details to include. That doesn't mean zero effort went in: I'm sure in many cases someone sat around and fudged with LoRas and inpainting for a couple hours and pulled the slot machine lever to get good seeds and etc. That doesn't mean that an AI model does not have some model for how to structure an appealing image: it does, that's obviously why the results can look decent to begin with. But when you dump out an image from an image generator and you wink wink nudge nudge present it as your own and people evaluate it as if you drew it, this is basically fraud. Everyone looking at it who doesn't know it is AI generated actually believes you went through the normal effort of drawing that image and all of the years of practicing skills and acquiring knowledge that takes. That's bullshit, and it takes away from the actual accomplishments of people who put in the work like cheating in sports does.
Like yeah, a lot of people are cheating at chess, by passing off engine play as their own, but does that really make it okay? When the entire point is using your brain and not just the raw outputs themselves, doesn't that hit you as a problem?
For generative AI, I personally draw this line at what I feel are expressions of creativity. If you use AI for drawing references, whatever. If you use AI to generate globs of repetitive code, whatever. Code can be creative but I do not view it as an expression of creativity and almost any tool is fair game. If you are using ML models for motion capture or some other data processing thing where humans had to do repetitive work before, whatever. Maybe these tools sometimes do devalue the work, but the LLMs are not doing the interesting part here, they're doing the boring part. (This is, in part, an admission that actually writing code is often pretty boring in and of itself, something that I realize programmers have been inconsistent with in an attempt to justify their value. But, I still believe it to be true.)
So okay fine. People are reluctant to disclose that they used AI to generate text because they fear the backlash that it will get them. This is understandable. What upsets me about this is that well-meaning people are apparently falling back to the idea that because LLM backlash is strong, what would be better than either trying to just simply write your own damn posts or be honest about your usage of LLMs... Is to just try to wink wink nudge nudge pass off more or less verbatim LLM writing as if it's a post that you wrote.
I am not ruining the Internet. There is literally nothing I or any group of angry mobs could do that would even remotely slow down the decay of the Internet even if we desperately wanted to.
So in fact, I'm not even trying to not ruin the Internet. I don't particularly care if my attitude is not helping or hurting. I'm not having an attitude as part of some grand strategy to save or destroy the internet. I'm having an attitude, because I am pissed off.
And I am pissed off because I am tired of reading posts the author probably only skimmed themselves.
Comment by moomoo11 4 days ago
Comment by aozgaa 4 days ago
At the time, I found this a bit irritating, but with a few weeks' time I see the merit. The informational content tends to fall into “derivative” territory when LLMs write stuff. And people are here for novelty and some socialization.
Also LLM prose seems optimized for engagement rather than concise communication. Takes longer to sift through linguistic boilerplate to get to the point. (The quoted bit being a case in point)
Comment by fireflash38 4 days ago
Comment by jchw 4 days ago
And while the comments are always flooded with people like me, the upvotes seem to tell a different story; clearly LLM writing really does appeal to some people. Or idk, maybe a lot of people who vote on stories and don't comment don't actually read them. Hard to say for sure.
Comment by grey-area 4 days ago
Comment by otabdeveloper4 4 days ago
(I need a better model to translate from llmese.)
Comment by grey-area 4 days ago
Comment by noctuid 4 days ago
Comment by CuriouslyC 4 days ago
Comment by logicprog 4 days ago
Comment by bri3k 4 days ago
Comment by ex-aws-dude 4 days ago
The author provides evidence to the contrary and the HNers won't even engage with it instead just talking about the writing of the article in classic HN bikeshedding fashion.
How about after that we talk about the formatting of the website and the colors?
This site is really going downhill
Where is the accountability for your own opinions?
Are you guys only upvoting things that confirm your existing gripes?
Comment by dang 4 days ago
It would be preferable if someone would seed a better discussion by engaging with the article's claims/observations.
Comment by ex-aws-dude 4 days ago
Is that the kind of low effort posts we want around here? Just a link to a github comment of a screenshot?
You're complicit here in fueling the harassment of an open source project
Comment by dang 4 days ago
Even if you're right, though, you shouldn't be posting comments that break the site guidelines.
Comment by ex-aws-dude 4 days ago
People opening issues just to rant against an open source project is acceptable content for HN?
How is that even allowed in the first place without getting flagged/removed?
And every time that happens the project gets brigaded from HN users
Comment by dang 4 days ago
Please follow the site guidelines from now on.
Comment by ex-aws-dude 4 days ago
How does low effort rage-bait spark curiosity?
The comment to upvote ratio clearly shows it’s inflammatory
Comment by roywiggins 4 days ago
If you want me to read your analysis, you are going to have to make it not read like Claude wrote it. What does "placement" even mean here?
Comment by rroblak 4 days ago
The use of "regime shift" is what gave it away for me. I've never seen a human write that, but Claude does from time to time.
At least they removed occurrences of "load-bearing".
Comment by roywiggins 4 days ago
Comment by genxy 4 days ago
Comment by gamegod 4 days ago
Comment by logicprog 4 days ago
If you don't want to read the LLM prose, you can just go to the GitHub of my project, grab the scripts, and run the full pipeline. It will gather the data, build the database, and run the analysis from scratch for you, and you can look at the numbers directly. It's all repeatable.
Comment by roywiggins 4 days ago
LLM output has conditioned in me a near reflex response to just close a tab as soon as I smell LLM-authored text. Like, I'm not mad or anything, I just frequently find most default LLM-voiced text very unpleasant to read so I just don't continue reading.
Comment by logicprog 4 days ago
Also, it wasn't written by Claude FWIW, GLM 5.1.
Comment by dang 4 days ago
> After posting this on Hacker News and recieving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice.
I've therefore turned off the flags and hopefully people can actually now discuss the claims/findings being reported.
Comment by hypfer 4 days ago
Soo... it didn't just sound like genai but was genai?
___
Huh. From the article:
> If anyone complains about my verbosity or sentence structure — as they usually do, which is the reason I originally let the AI write the prose, among other reasons obsoleted by templating — they can go fuck themselves.
This is kinda sad, honestly. But also should show the author that doing what people try to bully you into doing will not stop them from bullying you.
Just stick with your unique voice man. If people don't want to read that that's fine. They do not have to. You're fine
.. what are those em-dashes doing there though?
Comment by logicprog 4 days ago
You're literally doing exactly the bullying I was trying to avoid, even while denouncing it. I like em-dashes. I have AuDHD, and they help me represent how I think.
Comment by ajkjk 4 days ago
If someone gives them shit about their writing, that's on the critic for being shitty. If they use AI to write, that's on them for being fake. But, to write online at all requires being ready to have people be shitty to you and ideally not reacting in a way that makes the situation worse. Sounds like they need work on that part.
Anyway it is basically always possible for someone to find something legitimately bad about anything a person does. The question is, how much of an issue is that? Not much actually. So you have flaws. Fine, just be flawed. It has no effect on your life beyond your reaction to the attack. And putting aside that reaction is a prerequisite for learning anything useful (or discerning that there is nothing to learn) from the experience.
Good people will trust good intentions through the flaws, while shitty people will write off your work and your intentions because of the flaws (and try to make sure you feel bad about it in the process). But it's always because they're too weak to express disagreement maturely, or sometimes because they're bitter and threatened by your good intentions directly. Either way, it's their flaw, not yours.
Comment by hypfer 4 days ago
"No these are fine, now look over there!! <lotsoftext>"
Pay no attention to the man behind the curtain?
Comment by ellyagg 4 days ago
Comment by dang 4 days ago
I agree that it will be interesting to see how this develops going forward. One can imagine wildly varying scenarios.
Comment by hypfer 4 days ago
Why should I care? If it's a good thought, chances are it appears without slop around it. If it doesn't re-appear, life will still go on regardless.
No need to sift through noise just to avoid FOMO.
Comment by otabdeveloper4 4 days ago
"Claude, rewrite all of the prose in my own voice."
The funny part is that it probably works.
Comment by mschuster91 4 days ago
Please, why can't people write stuff by hand themselves any more? It's a good analysis but how can I trust it without reviewing everything myself?!
Comment by logicprog 4 days ago
Comment by sanitycheck 4 days ago
At this point we're all used to skimming through thousands of AI-generated sentences every working day and constantly thinking "this is likely to be 20% bullshit", it's hard to turn that off even if I try.
Comment by logicprog 4 days ago
Comment by JasonSage 4 days ago
This is low-quality--every single day I witness Codex and Claude misunderstand, mislead, and hallucinate responses based on "assumptions" and I have to fact-check them.
If I wanted a statistical analysis and to be the human in the loop, I would ask the LLM myself, and I would definitely NOT read an article that just dumps the LLM output as-is.
Comment by bradrn 4 days ago
(Also, I suggest clearly acknowledging where AI was/wasn’t used. I like CuriouslyC’s suggestion: https://news.ycombinator.com/item?id=48411968)
Comment by logicprog 4 days ago
Comment by bradrn 3 days ago
(For what it’s worth, I think your own writing style is quite nice, now that I can see it.)
Comment by sanitycheck 4 days ago
Comment by BigTTYGothGF 4 days ago
You didn't care enough to make a good writeup, why should we believe that you cared enough to make a good analysis?
Comment by skeledrew 4 days ago
Comment by tappio 4 days ago
Of course this is a bigger problem, as it's now harder to distinguish content that is "AI slop" from "content co-authored with AI that is carefully reviewed" at a quick glance, and the "AI smell" is quite off-putting. My initial reaction was also negative, but after skimming it and reading the summaries, I found it a decent summary, which also... speaks of this thread, of the content of the blog post and everything about the discussion and the strong feelings people have developed around the use of LLMs.
Anyhow, it would be good to disclose the repo with the code for the statistics & use of LLM in the writing right up front. Which model, and why it was used to do the writing, etc. It's enough to say "I think it writes better than I do" or "I was in a hurry, sorry" or whatever, but it really should be disclosed. It reads more honest.
ps. really... that sideways scroll? plz fix it.
Comment by JasonSage 4 days ago
The problem I see is that this is indistinguishable to a reader at a glance.
Distancing the writing from the "AI smell" not only improves the quality by dropping the unnecessary ocean of rhetorical devices, it forces the human to have real weight and agency on what's being said.
I think that act of distancing from raw LLM output through refinement is a huge quality leap. Even if you're only doing the refinement with an LLM, it forces the writing to have more voice and ideas from the author.
I can see the work that went into the analysis here but again, as a casual reader, it's impossible to tell that there were any original ideas here expressed by the author.
Comment by logicprog 4 days ago
Comment by rjh29 4 days ago
If OP had said "here's an AI summary of the data" and generated a concise summary, I think I would be fine with it. But default AI writing is really verbose -- the opposite of a compression algorithm, spewing out cliched phrases that don't add information. It's exhausting to read, and it lacks the interesting noise of a human response.
Comment by sfink 4 days ago
I am pretty insensitive to AI writing. I have never commented before about something sounding like AI, because mostly I don't notice. But this was so over the top that I spent the whole article trying to decide whether it was an intentional parody of AI writing style.
This article's language is not en-US. It's not en-BR. It's en-SLOP.
Yes, that was my clumsy attempt at AI parody. Here's another: this article doesn't just have AI tells. It is AI tells.
Every sentence is saturated with AI style. Perhaps the author is so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
As for the substance of the analysis, it seems pretty good to me but I see some flaws that weaken it a bit.
The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates."
The whole analysis has very limited data. It's necessarily based off a single pair of releases at the very end of the chronological timeline. You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. (By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.)
"The critics' claim is a simple comparison: did the rate go up?" That's reductive. For one, these releases are known to be in reaction to a flood of (AI-discovered!) security reports, which is a novel situation and in fact is a huge confound to anyone arguing about what those two releases mean -- they're both heavily AI-written, but in response to an unusual situation. When the samples are only drawn from a distinct scenario, statistical analysis can only speak to the quality of code in that scenario.
Also, another reasonable hypothesis could be: AI-written code has bugs of a different flavor that bothers users more. It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. (If true, it still doesn't support the claim that depending on AI code is a catastrophe, fwiw.)
I'm not arguing the conclusion is wrong. I'm saying the analysis proves far less than it claims to. As for whether it's a debacle for rsync to become dependent on AI code generation, I think that's a reasonable debate to have but it's not going to be resolved this reductively.
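One way to see the "very limited data" point in the comment above is to look at how little a single new observation can say when it is simply placed against a handful of historical values. The sketch below uses a standard one-sided rank p-value, (1 + number of historical values at least as large) / (n + 1); this is an illustration, not necessarily the method the article used, and the numbers are made-up toy values, not rsync bug rates.

```python
def empirical_upper_p(history: list[float], new_value: float) -> float:
    """One-sided rank p-value for new_value against history.

    Counts how many of the n+1 values (history plus the new one) are at least
    as large as new_value. Assumes the values are exchangeable under the null.
    """
    if not history:
        raise ValueError("need at least one historical value")
    at_least = sum(1 for h in history if h >= new_value)
    return (1 + at_least) / (len(history) + 1)


# Toy numbers for illustration only (NOT rsync data).
toy_history = [1.0, 2.0, 3.0, 4.0]

# Even a new value above every historical one cannot score below 1/(n+1).
p_extreme = empirical_upper_p(toy_history, 5.0)   # 1/5
p_middle = empirical_upper_p(toy_history, 2.5)    # 3/5
```

With only n historical values, the smallest possible result is 1/(n+1), so a short history cannot produce strong evidence of a shift, and a value landing in the middle of the distribution is equally unable to show that nothing changed.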
Comment by logicprog 4 days ago
It does not statistically prove anything, but as I thought I made extremely clear in the card where I discuss it, the point of bringing it up is different: to prove the hypocrisy of the anti-AI crowd.
> By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.
The entire outrage is because people noticed what they thought was an unusual number of bugs and/or regressions in the release, saw it had Claude in it, and assumed a causal link, not just "priors about AI code."
> You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis.
The point I'm trying to make is that there is no evidence, based on these two releases, to think Claude made anything worse, whatsoever, and so the outrage is unfounded. This doesn't require me to prove Claude didn't cause any problems. If I ever made the latter claim, I should clean that up.
> It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it.
Tridge actually explicitly says he made that tradeoff on purpose, not the AI.
> Every sentence is saturated with AI style. Perhaps the author so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me.
Comment by sfink 4 days ago
> I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me.
Thank you thank you thank you. I would love to be able to describe how hard it was for me to think about the actual evidence you're presenting when reading about it through the AI writing, but I suspect it's one of those things where it bothers you or it doesn't. If you'd like to empathize, maybe I'll give it one try: imagine an otherwise solid PhD thesis written in crayon. The facts and evidence and reasoning are unaffected, but it's just so hard to take it seriously.
Anyway, with the rewrite I don't have to battle my kneejerk reactivity nearly as much.
I'm no expert like she is, but based on what I know, I agree with your wife on the statistics. That style of analysis is going to be the best you can do with the data available. It's an accepted way to stretch data without being too dependent on an assumed distribution. It's a good analysis. I still don't come away with the conclusion that concerns about AI code maintenance are necessarily overblown, but that's fine. I think your analysis project is a very solid contribution, and it's a hell of a lot more evidence-based than the rants people were posting.
Comment by duk3luk3 4 days ago
Comment by volume_tech 4 days ago
Comment by perching_aix 4 days ago
Comment by jrflowers 4 days ago
Yes, it did. Here is some math showing that you shouldn’t care about that.
Comment by logicprog 4 days ago
Comment by jrflowers 4 days ago
It is like if your neighbor opens your door and a dog walks in, there’s no point in doing some weird analysis about all the times you yourself have let a dog walk in. He still did that.
Comment by logicprog 4 days ago
Comment by jrflowers 4 days ago
https://news.ycombinator.com/item?id=48419197
And your response to someone pointing out the sloppy, buggy code that Claude introduced was to just quote Tridge (which does not in any way refute the fact that you’re looking at a bug that Claude introduced to the code)
https://news.ycombinator.com/item?id=48419621
I’m not entirely sure what the purpose of this project is (maybe to “prove” Tridge’s opinions about LLMs and human intelligence that he made in the linked blog post to be right?), but it appears as though you are ignoring irrefutably true observations. You just asserted that “the data” doesn’t show Claude introducing any bugs (which is a bizarre claim) after previously responding to a documented bug with a… deferral? Do bugs not count if you can find a vague excuse for it?
There is nothing in the blog post that is evidence that Claude didn’t introduce bugs. It is a thought experiment that uses “increase bugs” and “increase bugs more than a given arbitrary statistical amount that I selected” as interchangeable statements.
Comment by logicprog 4 days ago
Additionally, I quoted Tridge in response to a comment about an increase in changes to rsync, not in response to the person pointing at one bug Claude introduced. If you actually looked at the thread, you'd see that. I didn't deny the Claude-introduced bug at all.
Comment by MagicMoonlight 4 days ago
Comment by Etheryte 4 days ago
Comment by logicprog 4 days ago
Comment by wookmaster 4 days ago
Comment by everdrive 4 days ago
"Cars are just a tool. The drivers who piloted the vehicles and weren't careful enough [are responsible for the deaths.]"
Comment by roywiggins 4 days ago
Comment by ebiederm 4 days ago
The unsolicited security reports are the issue.
Comment by Angostura 4 days ago
Comment by runarberg 4 days ago
Comment by the_real_cher 4 days ago
Comment by throwaway7356 4 days ago
So far it reintroduced several security issues and replaced the README.md.
Comment by MYEUHD 4 days ago
It's not a fork, but it's 8 years old, and is already shipped by default in OpenBSD and macOS.
Comment by logicprog 4 days ago
> As to all the people saying “I’m going to package openrsync for platform XXX and we’ll use that!”. I find that rather amusing. If you do decide to go down that path I’d suggest you try the new rsync test suite on openrsync if you can stomach something that an AI has helped write. I tried it today and openrsync currently fails 85 of 98 tests, so I’m sure it won’t take you long to get it up to speed. You run it like this “./runtests.py --rsync-bin=../openrsync/openrsync --use-tcp”. Admittedly a lot of the failures are just features openrsync doesn’t have, but still, it’s not a great result.
Comment by MYEUHD 4 days ago
Just like I have been using doas for several years.
All I need is `rsync -urvP` and I suspect the majority of users don't need the advanced features either.
The smaller code base also means fewer bugs and vulnerabilities. As an example, doas is ~1k lines vs 160k for sudo. That surely means a smaller attack surface. The same is true for openrsync and rsync at approximately 18k vs 57k lines.
Comment by atmosx 4 days ago
And here: https://marc.info/?l=openbsd-tech&w=2&r=1&s=rsync&q=b you can find quite a few other bugs...
I use OpenBSD for routers and love it :-) but it is software, hence it has bugs.
Comment by SoftTalker 4 days ago
Comment by nilslindemann 4 days ago
Comment by gadrev 4 days ago
$ apt-cache policy rsync | grep Installed
Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.
Comment by imurray 4 days ago
As usual, Ubuntu backported fixes and didn't upgrade to a new version. Whether or not they also backported regressions in edge cases that afflict the latest rsync, I don't know. Pinning the Ubuntu package may prevent further regressions, but it also prevents you from getting any future backported security fixes.
Comment by imurray 1 day ago
Comment by logicprog 4 days ago
Comment by overfeed 4 days ago
This is a terrible argument; I didn't need to have had secrets exfiltrated before applying row-hammer mitigations. If rsync is the cornerstone of my backup strategy, and has been for years, I need to trust its correctness, and trust it not to lose my data. If I wait until I "face any actual bugs or regressions" - that will be far too late.
Stability is another issue not discussed. If the error rate holds steady, but the number of significant PRs merged per release goes up from 5 to 200, that would be a huge net negative for my use case.
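To make the arithmetic behind that concrete, here is a rough sketch. The 2% per-PR regression rate is a purely hypothetical number chosen for illustration, not something measured from rsync, and the model assumes every PR carries the same independent risk.

```python
def expected_regressions(prs_merged: int, error_rate: float) -> float:
    """Expected regressions in a release if every merged PR has the same
    independent chance (error_rate) of introducing one."""
    return prs_merged * error_rate


# Hypothetical per-PR regression rate; an assumption, not rsync data.
ERROR_RATE = 0.02

before = expected_regressions(5, ERROR_RATE)    # small, slow-moving release
after = expected_regressions(200, ERROR_RATE)   # large release, same rate

print(f"5 PRs per release:   {before:.2f} expected regressions")
print(f"200 PRs per release: {after:.2f} expected regressions")
print(f"growth factor:       {after / before:.0f}x")
```

Under these assumptions the expected number of regressions per release scales linearly with the number of PRs, so an unchanged error rate still means many more breakages landing on users per upgrade.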
Comment by gadrev 4 days ago
I didn't have the time to actually think about any "arguments" at all tbh, it's just a knee-jerk reaction as I get ready to log off for the weekend. Not actually looking to argue for or against your post at all lol.
Comment by mwkaufma 4 days ago
Comment by logicprog 4 days ago
Comment by vintagedave 4 days ago
Good for you. I really mean that. I think people are winding you up in this thread, but keep your cool, and I admire publicly crediting and being proud of your wife. That’s a healthy relationship. Good for you.
Comment by bakugo 4 days ago
Do you genuinely believe an article written by AI defending itself is going to convince anyone who wasn't already on your side? All you're doing is giving more fuel to the "anti-AI crowd" you hate so much.
Comment by logicprog 4 days ago
> Your analysis was so thorough, rigorous, and objective, that you couldn't be bothered to write it yourself. Do you genuinely believe an article written by AI defending itself is going to convince anyone who wasn't already on your side?
Except that I did. I spent days comparing and manually deciding on metrics and methodology (I did not use the AI to decide what I would do or how I would do it, so it is not "the AI defending itself"), then refining things, adding more angles to analyze, and, as I literally say in the opening section, I rewrote all the prose in the entire document just to satisfy critics like you. That sounds like "could be bothered" to me. But people like you will never be satisfied.
Also, even if I hadn't done all that work, that wouldn't make it not rigorous (it clearly is) or objective (it is as objective as it can be with so little data). You're bikeshedding to avoid the point.
Comment by bakugo 4 days ago
This statement is honestly so ridiculous that I felt it didn't warrant a direct response, but here's one anyway: AI enthusiasts have been proudly proclaiming for literal years that AI makes them 10x as productive based on cherry-picked anecdotes with zero empirical evidence to back it up. It's way, way too late to claim hypocrisy here. As I stated under the original submission about this topic, irrational anti-AI behavior is usually just an equal and opposite reaction to irrational pro-AI behavior.
> I rewrote all the prose in the entire document just to satisfy critics like you.
And that doesn't help. If anything, editing the AI output to make it read less like blatant slop just comes off as deceptive, like you're trying to hide the fact that the analysis was AI generated. Looking at the commits, you were adding more AI generated text less than 2 hours ago[0] before quickly editing out one of the most blatantly sloppy sentences I've ever read[1].
Regardless, the final contents of the article are not the main issue. Even if we ignore the bias clearly on display there, the premise alone is enough to dismiss the entire thing as heavily biased and chasing a pre-determined conclusion - of course someone who is so dependent and trustful of AI that they decide such an analysis on the bugginess of AI code should itself be written by AI is going to steer the conclusion towards "actually AI code is good and you luddites are overreacting". The entire concept is so tone-deaf that failing to notice it or predict the criticism before publishing is enough to prove the bias.
[0] https://github.com/alexispurslane/rsync-analysis/commit/e029...
[1] https://github.com/alexispurslane/rsync-analysis/commit/740b...
Comment by logicprog 4 days ago
> This statement is honestly so ridiculous that I felt it didn't warrant a direct response, but here's one anyway: AI enthusiasts have been proudly proclaiming for literal years that AI makes them 10x as productive based on cherry-picked anecdotes with zero empirical evidence to back it up.
Let's go back to remedial classes on this one.
"I have found that [tool] has made me more effective" is what we call lived experience. It is an "I" statement communicating something about the person’s life. It does not require evidence by default, and you are a crazy person if you call bullshit without good reason, because many "I" statements are epistemically justified in ways that can't be empirically demonstrated or require tacit knowledge.
"[tool] has been buggier since [change]" is a falsifiable claim; you need to actually provide evidence for believing it, and what I'm showing is literally that there isn't any.
Comment by logicprog 4 days ago
I'm talking about the double standard on the anti-AI side about what evidence should count, not some vague industry-wide epistemic standard, whatever that means. I'm aware LinkedIn Lunatics and Steve Yegge are also being crazy. And it seems to me that even your response here is engaging in a bit of a double standard, or something akin to it, in that you think the irrational anti-AI behavior should be given a pass — and the conclusions perhaps even taken seriously — just because pro-AI people did it too.
> And that doesn't help. If anything, editing the AI output to make it read less like blatant slop just comes off as deceptive, like you're trying to hide the fact that the analysis was AI generated.
Okay, so, if I don't spend the time to write everything myself, that's bad because it's AI slop. If I do rewrite everything myself, then it's evidence of deceptiveness... despite being asked by multiple people to do that, and being extremely explicit about my methods and process and the commit history being (as you've shown), very public.
Also, the AI-generatedness of the text doesn't mean the analysis is AI generated, in terms of what was actually done. That's a category error.
> Looking at the commits, you were adding more AI generated text less than 2 hours ago[0] before quickly editing out one of the most blatantly sloppy sentences I've ever read[1].
The second commit literally says that that was my prose it was fucking with by adding slop. It's just that me adding my prose, and it adding slop to it, were in the same previous commit. Additionally, my process is often giving it exactly what I want to say, more or less, and having it HTML-format it and insert the templated numbers and UI widgets around that text.
But again, even if I'm spending the time to read through and edit everything it's writing to de-slop it, then I'm clearly also reading it through enough to make sure the analysis makes sense, and is accurate; how is that not enough "effort" for you, if effort is supposed to be a proxy for verification?
> Even if we ignore the bias clearly on display there, the premise alone is enough to dismiss the entire thing as heavily biased and chasing a pre-determined conclusion - of course someone who is so dependent and trustful of AI that they decide such an analysis on the bugginess of AI code should itself be written by AI is going to steer the conclusion towards "actually AI code is good and you luddites are overreacting".
That's not ignoring the bias, that's literally restating that you think the bias is there. But if you really think that my bias meaningfully "steered the results," then show me how that happened. Tell me how you would've proven the Claude releases were meaningfully worse, or unusual, at all, or how the methods I chose biased the data against that result, or literally anything except shifting the goalposts and using accusations of "bias" as a get-out-of-jail-free-card.
> The entire concept is so tone-deaf that failing to notice it or predict the criticism before publishing is enough to prove the bias.
And you're so committed to your preconceived notions that anything made with AI must be bad, wrong, or not worth your time, that you'll spend your entire time begging the question ("it's made with AI, therefore it's wrong") and shifting the goalposts instead of engaging meaningfully.
Also, I certainly predicted the criticism (in general, anyway, of the fact that it was made with AI; not of the prose being AI), but I made it this way anyway, because if someone is so AI-blinded that they can't read and evaluate the actual metrics and methodology and provide meaningful criticism of them, and instead can only see that it was made with AI, then it doesn't matter.
Nothing you have said makes the analysis wrong. At this point, you're essentially just resorting to ad hominem and begging the question.
Comment by bakugo 4 days ago
I don't know who asked you to do it. I wouldn't have done it. Personally, the original intent matters far more to me. You intended to submit an AI-generated article, defending AI, to be read by humans. Anything short of taking the article down and rewriting the entire thing from scratch doesn't meaningfully change that.
> Additionally, my process is often giving it exactly what I want to say, more or less, and having it HTML-format it and insert the templated numbers and UI widgets around that text.
Sorry but you're just further proving my point here. You are so deeply invested in AI that even just manually writing some English text into a static HTML file is something you consider to be below you.
Imagine going back in time 5 years and telling someone: "In the future, nobody uses text editors. On the rare occasion that we actually want to write something to a text file verbatim, we instead recite the text to a complex artificial intelligence algorithm that uses large amounts of computing power to process said text and then recite back a command that writes the text to a file. Sometimes the algorithm decides to be a smartass and change our words or add an extra quip, but that's all part of the fun."
> That's not ignoring the bias, that's literally restating that you think the bias is there.
I was referring to the bias within the actual text of the article vs the inherent bias displayed by the very concept of an AI-generated article defending AI. Passages like these:
> The thread did not stop at words. As is typical for anti-AI users, it eventually escalated to fantasies of violence
Make it fairly obvious that you went into this project with the primary goal of proving such people wrong, possibly backed by a sense of moral superiority relative to a few weirdos on the internet who took things too far (such individuals are present in every online discussion that gets big enough, and their actions do not represent the whole).
> And you're so committed to your preconceived notions that anything made with AI must be bad, wrong, or not worth your time
"Bad" or "wrong" may be subjective, but it's definitely not worth my time, no. If you didn't consider it worth your time to write it, why do you believe it's worth someone else's time to read it? Again, it doesn't matter if you went back to rewrite parts of it after being criticized, as that doesn't change the original intent.
Submitting an AI generated article and expecting meaningful human responses only makes sense if you consider your own time to be worth more than that of others. Do you?
Comment by shawndumas 4 days ago
i also am seeing them engage aptly with constructive criticism and adapting the material while handily dispatching the non-constructive critiques. most of which amounts to a colossal missing-of-the-point.
they have made no out of proportion claims, no non-recreate’able analysis, used exactly the correct tools, and, frankly have addressed all of your points
i am not sure you’ll agree with anything i’ve said either so feel free to misunderstand me too
Comment by logicprog 4 days ago
And once it's originally posted, it doesn't matter what lengths I go to in addressing metrics and methodological critiques to ensure that the data is as robust and helpful as possible? Nor the effort I put into writing and refining my prose and the organization of the report in response to people's complaints and criticisms, because I do value their time? And when people told me the AI prose was bad, I spent two hours making sure that it was something people would want to read; that doesn't matter at all? It's only the original intention that matters? So you just have this arbitrary cutoff point for what counts towards my intentions in the post and my character. No allowance for learning or adaptation; the fact that I'm clearly committed to putting a lot of effort into making this something that is useful and pleasant to read, and just didn't do it for the first draft, doesn't matter, only the original version matters?
And more than that, you're not going to actually deal with the substance of the issue, the actual calculations and methodology and conclusions that I came to, instead, the only semi-substantive critique you're going to make of the post is to tone police me and dance around the real issues, as if you're afraid of ever touching them?
The best argument you could make that my bias actually influenced my conclusions would be to point to where in the methodology and metrics it did that. I made it all extremely open, transparent, and auditable, both by describing it in extreme detail in the post and by providing all of my source code and the ability to build the database it runs on from scratch. If there were an actual flaw or bias that my intentions going into this created, your biggest possible smackdown, the best weapon in your arsenal, would be to actually point that out. But instead, again, you're just tone policing me. A polemical style in the presentation of an objective statistical analysis does not in the least undercut its accuracy. Have you considered that my polemic became so fiery precisely because I ran the tests and found how non-existent the evidence was for this outrage, and that's what made me angry? No, you didn't, because you saw some words that hurt your feelings and now you won't listen to facts.
Comment by bakugo 4 days ago
Ultimately, I'm just trying to get you to understand how this decision undermines the presumed goal of trying to convince the anti-AI crowd that they're wrong. It's simply not fair to expect humans to engage with the article in good faith when the article itself was not written by a human in good faith, regardless of its contents or the numbers it's based on. If you still disagree, so be it, I have nothing else to argue.
And for the record, I didn't engage with the methodology itself or its merits because I don't believe this question can be answered via an automated statistical approach, or really any sort of objective approach. The only way to truly evaluate the quality of AI generated code is for a skilled developer who is at least moderately familiar with the codebase to carefully analyze each commit, understanding what it does and looking for dumb mistakes that a human likely wouldn't have made in the same situation. But it's very unlikely that anyone will waste their time on that, and the conclusion would still be subjective anyway.
Comment by 1a527dd5 4 days ago
It's open source, no one is forcing you to use it.
If you don't trust the newer versions; use the old versions.
If you no longer like the maintainer because of reasons, fork it/start your own.
It's not that hard.
Storm in a teacup.
Comment by mmonaghan 4 days ago