When AI Builds Itself: Our progress toward recursive self-improvement
Posted by meetpateltech 5 days ago
Comments
Comment by jameson 5 days ago
LLMs have certainly made significant changes to our lives, but I have yet to see any extraordinary improvement they have brought me, which makes me skeptical of their claims.
_if_ it solves many of our problems of great magnitude, why hasn't Anthropic used it to solve significant problems we humans face? Cancer, Alzheimer's, education, finding new materials, fission power plants, etc.
Comment by ElProlactin 5 days ago
/s but not to a lot of people
Comment by nostrebored 5 days ago
Comment by ElProlactin 5 days ago
We can have a philosophical debate about work, the history of work and its relationship to human psychology in the 21st century but the bottom line is that there are 8+ billion people on the planet and, of those who are "working age", the vast majority of people, lacking meaningful capital, can only secure income by selling their time and labor.
There's absolutely no evidence that if we come up with a way to "reallocate human time" and change the structure of our civilization (using AI of course) tomorrow, the masses would benefit. There's plenty of evidence that the people who control AI or have the capital to employ it will use it to accumulate as much power and wealth for themselves as they can.
Comment by cyanydeez 4 days ago
Aside from capitalism moving money up and living conditions down, AI is going to accelerate the gap between the rich and everyone else.
Comment by wordpad 4 days ago
Comment by lobocinza 1 day ago
Comment by NackerHughes 4 days ago
Comment by __patchbit__ 11 hours ago
Comment by throw394689 11 hours ago
https://www.un.org/unispal/wp-content/uploads/2026/02/WFP-Pa...
You’re just repeating old outdated bits of propaganda.
Also, nice that you’re estimating that there’s been no decline in population size over the past 3 years. At least you slipped a bit of truth out in an attempt to demonize them.
Comment by hazbot 3 days ago
Comment by ElProlactin 3 days ago
https://www.ers.usda.gov/topics/food-nutrition-assistance/fo...
And this is in the wealthiest country in the world.
Comment by parineum 4 days ago
It's just time, and it's the only thing humans value. The only way to provide value for another person is to use your time to do something faster than they could do it with their time. That's it. There is no other way to secure income outside of inheritance or charity, which is just receiving something of value without giving something of value. There's a reason why most of the income goes to older people: the younger people haven't accumulated that much time to exchange for money. The nice thing about time is that everyone earns it at the same rate, 1 second per second.
Capital can be a lot of things, not just machines and property. Any experience you have is capital, any training is capital, any education is capital. Capital is anything that makes accomplishing things take less time.
The difference between socialism and capitalism is the idea that one person's time can have different value. That's really it.
Comment by archonis 4 days ago
Time is a factor, but it doesn't seem like that's what you're talking about.
Comment by parineum 4 days ago
Comment by wood_spirit 5 days ago
But it’s a great short term business opportunity for AI vendors and it was Anthropic who went all in on being knowledge worker outsourcing in a big way first whilst OpenAI thought they’d replace Google in search.
I think Anthropic had the better business strategy.
Comment by UncleMeat 4 days ago
Comment by ezconnect 5 days ago
Comment by TheOtherHobbes 5 days ago
Everyone else will be reduced to compost.
It's the perfect plan. The final definitive justification for capitalism.
The masses are unnecessary. The masses will be optimised.
What could possibly go wrong?
Comment by aiisjustanif 4 days ago
Comment by ahtihn 4 days ago
They want influence and power. Being at the top of a hierarchy of millions, billions of people.
If there are no masses, the 1000th billionaire will be at the bottom of the hierarchy instead of near the top. They don't want that. The masses are needed to give them the sense of power.
What these people want is power and control. Eliminating the masses goes against that.
Comment by bigbuppo 4 days ago
Comment by timacles 4 days ago
The only thing that motivates Bezos is that Elon Musk has more, and conversely Elon Musk would have an existential crisis if he was no longer number one.
Comment by jpadkins 4 days ago
Comment by spaceman_2020 5 days ago
The people want cheaper prices, affordable housing, affordable healthcare
Capitalism has decided that these problems aren’t worth solving. Instead, we must optimize for spam and slop (and call it “distribution”)
Comment by aswegs8 5 days ago
Cheaper prices, affordable housing, affordable healthcare are less capital-efficient. If you're Walmart, sure, you would like to lower prices as much as possible. But your leverage really isn't as big as finance or tech. If you're a politician, you might also pursue those goals, but your attention and leverage really isn't as focused as that of the money machine.
Comment by sothatsit 5 days ago
Comment by zelphirkalt 5 days ago
Comment by sothatsit 5 days ago
Whether they are right or wrong is another matter, but their claims also don’t seem too far out of the realm of possibility to me.
Coding agents have fundamentally changed my day-to-day job. In the last year, my work has shifted from me writing all of my code, to me writing very little code and spending most of my time on understanding problems better and setting direction, and reviewing, verifying, and polishing the output of coding agents. It has been quite a drastic change.
It is not that outlandish to suggest that coding agents could continue to improve at such a drastic rate over the next year. And the implications of that could be quite large! Even just the implications of more white-collar workers adopting tools like Cowork seem potentially very large, with tools that already exist today. It seems sensible to at least consider this as a possibility.
Comment by justanotherjoe 5 days ago
Likewise, people don't as easily blame Ilya for 'hyping things up' when he said these things.
Also, speaking of incentives, there are also incentives to lower their valuation. If you wanna be vigilant against social engineering, I'd be wary of that too.
These are moot anyway though, because the article isn't even making any super strong claim. If you read it, it's no big deal.
Comment by boshalfoshal 4 days ago
I genuinely don't believe that they sat down in a board room and said "yeah let's specifically release this now before an IPO so we can juice it!" They haven't even announced an IPO date. So is every blog on capabilities before that date just "pumping up the value of the stock before the IPO?"
Comment by snowwrestler 4 days ago
Companies do tons of communication and work directly, without press releases or blog posts. If a statement is released publicly, it is done for a PR purpose.
Comment by sothatsit 4 days ago
It is governments, big companies, and individuals who could all experience fundamental changes if any of these predictions come true. If people within the labs believe these possibilities are around the corner, it would be responsible to try to let people know so they can be more ready if RSI suddenly hits and in a couple years time all our work is fundamentally changed.
That’s not to say I agree with their predictions, but rather I’m just saying that there are good reasons for Anthropic to publish stuff like this that are not just PR.
Comment by z3c0 4 days ago
Comment by aroman 5 days ago
I don't know about you, but AI advancements have brought extraordinary improvements to me personally in my ability to be productive, in much the same ways the article outlines. I find it deeply satisfying to be able to "get ideas out of my head" faster and tackle more meaningful problems.
FWIW, it deeply concerns me how much power and capability is being centralized in the hands of so few, especially Anthropic. I, for one, hope these advancements can be scaled down to something I can have full sovereignty over and trust... in my own home.
Comment by sirsinsalot 5 days ago
These people don't have our interests in mind and everyone eats it up like a blessing from a god or something. It's surreal.
Comment by redbluered 5 days ago
Capitalism and democracy are becoming obsolete. It's not clear what's next.
Comment by aswegs8 5 days ago
Comment by stevenhuang 2 days ago
I'm not sure why this is so difficult for you to understand.
Comment by sterlind 4 days ago
1. Anthropic is an AI company. they want to get to AGI before anyone else ~~so they can lock the doors behind them~~ to ensure the supremacy of an aligned AGI that serves humankind. RSI unlocks the most value for them.
2. doing bioscience is slow and capital intensive. robotics lags way behind, so that's a lot of lab techs swishing flasks and plating petri dishes. they're happy to stay in silico, but there's very little productive research you can do without in vivo/in vitro experiments.
Comment by b3nji 4 days ago
Comment by yuhmahp 5 days ago
Comment by trolleski 4 days ago
Comment by torben-friis 5 days ago
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
Comment by malfist 5 days ago
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
Comment by fg137 5 days ago
I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.
Comment by triyambakam 5 days ago
Comment by LtWorf 5 days ago
Comment by matltc 4 days ago
Comment by squidsoup 5 days ago
Comment by esailija 5 days ago
I prefer a big feature to be one big PR rather than a lot of small ones.
We had a dev do a big feature with a ton of small PRs, each one was individually impossible to review because each concern was out of scope for the small PR and "would be fixed in later PRs". Once it all came together as a whole, the big picture was a total horror show and I had to rewrite basically the whole thing.
In order to review those small PRs properly, each time I would have to read and understand all the current code so far from the beginning. Without that, each small PR individually looks OK because you won't remember the other PRs from weeks back that already duplicated what the current small PR does for example.
Comment by enraged_camel 4 days ago
Yes, same, and I genuinely do not understand the insistence that PRs should not be above a certain size. I think most people are under the (misguided and wrong) impression that a PR review should take less than the time it took to write the code, and therefore allocate no more than 15-30 minutes per review. So when they come across a large PR they find themselves at a loss.
Comment by sgarland 4 days ago
Comment by rstuart4133 5 days ago
I've seen that reaction many times. It seems to work well enough when someone is maintaining existing code. However, greenfield projects can often require literally orders of magnitude more code to deliver something that can be integration tested.
The first step is to break it up into a stack of commits. Each one must compile and pass its unit tests, of course. Keeping it under 1k loc of released executable code is usually easy, but often becomes difficult to impossible if you want well commented code with excellent unit test coverage.
Assuming you have kept all your commits under 1k loc, there is still the problem of whether you present them in one PR, or as a stack of PRs. The issue with a stack is why an API is designed a certain way often isn't evident until you see how it's used. Responses to PR comments are explanations that point to later PRs in the stack, which is irritating for both the reviewer and the author.
I haven't found a good solution. I'm not sure there is one.
Comment by torben-friis 5 days ago
Comment by fg137 4 days ago
I completely agree with you. But I am afraid we are losing the battle.
I am seeing people repeatedly sending out gigantic PRs full of slop, code with mistakes that they would never have made if they were hand coding it. And they don't care. It's sometimes surprising if not horrifying to find that the colleagues you have worked with for years don't care about quality at all -- almost despising spending time reviewing their own code. Yet they have the audacity to send out code reviews.
Comment by LtWorf 5 days ago
They went to HR who said I am more senior and I should act as a mentor (they had my same work title and were probably making 4x more due to being in USA) and I just no longer reviewed anything from them until I changed jobs.
Comment by CamperBob2 5 days ago
Gee, that sounds like a job for Claude if there ever was one.
Comment by malfist 5 days ago
Comment by danparsonson 5 days ago
Comment by TeMPOraL 5 days ago
My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.
With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.
Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.
(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)
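To make the "reproduce it with tests" step concrete, here is a minimal sketch in Python. It is an illustration only, not code from the review described above: `Row`, `deserialize`, `changed_code`, and `reproduce_null_foo` are hypothetical names, and the SQL layer is simulated by a JSON payload that simply omits `foo`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    bar: str
    foo: Optional[str] = None  # optional field: missing in JSON becomes None


def deserialize(payload: str) -> Row:
    # Hypothetical stand-in for the JSON layer that lives outside the PR.
    data = json.loads(payload)
    return Row(bar=data["bar"], foo=data.get("foo"))


def changed_code(row: Row) -> str:
    # Hypothetical stand-in for the change under review; it assumes foo is set.
    return row.foo.upper()


def reproduce_null_foo() -> bool:
    """Return True if the claimed failure actually occurs."""
    # Simulates the special case where the query leaves foo unpopulated.
    row = deserialize('{"bar": "quux"}')
    try:
        changed_code(row)
    except AttributeError:
        return True
    return False
```

If `reproduce_null_foo()` returns True, the finding is real and goes back to the author. If it returns False, the claim was wrong and can be dropped without the deep context switch.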
Comment by sebasv_ 5 days ago
It's not Claude doing the review. It's a human doing the review, but using Claude to do the reading. It's still on the human to ask Claude the right questions.
Comment by kalaksi 5 days ago
And trying to just hand-wave it to Claude, to somehow "improve it" or "simplify it", without detailed questions hasn't been very successful. It can work for some things, though.
Comment by danparsonson 5 days ago
Comment by fragmede 5 days ago
Comment by aetch 5 days ago
Comment by altmanaltman 5 days ago
Comment by Towaway69 5 days ago
I'd prefer Chuck to the rescue but I guess it's a cultural preference.
/s
Comment by not_that_d 5 days ago
Comment by afro88 5 days ago
I hope they were the latter.
Comment by otikik 5 days ago
'Please split this PR into smaller ones'. I would even sketch which groups/phases would make sense, perhaps with the help of AI.
Comment by SpaceNoodled 5 days ago
Comment by overgard 5 days ago
Comment by verdverm 5 days ago
Comment by TheRoque 5 days ago
Comment by verdverm 5 days ago
Comment by TheRoque 3 days ago
Comment by verdverm 2 days ago
1. Things are evolving. The models, and especially the harnesses, are getting better. There was an inflection point at the end of last year, so anything before that is no longer relevant to the discussion. Probably anything before now, since we've had about 6 months with real agentic engineering and things are starting to become clearer.
2. Application and effectiveness is not equally distributed. This is the newest and most significant technology humans have created. We are still building and figuring it out. Some people are better at it, some people use it rather unwisely.
Comment by nielsbot 5 days ago
Comment by tasuki 5 days ago
The problem isn't the amount of code, it's how fitting/unfitting the abstractions are. Wrong abstractions are bugs in waiting. If there's much code with wrong abstractions, future change becomes difficult.
Source: me, I've created many bad abstractions and they led to much pain...
Comment by josephg 5 days ago
It's also really bad at inventing and leaning on invariants. I make rules in my code all the time - "by the time we get to path X, we know Y and Z are true.". In aggregate, these invariants make code simpler and easier to reason about. But claude doesn't do that. It just kind of - slops through and adds bespoke "just in case" workarounds all over the place. Every time I read through code it's written - without fail - I find bad design / architectural choices.
Maybe mythos will change this. But for now I've slowed way down on my claude code usage. You can't build a skyscraper on a foundation of mud.
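A small sketch of what "leaning on an invariant" can look like, written in Python for illustration. The names (`ValidatedUpload`, `validate`, `chunk_count`) are hypothetical and not taken from any code discussed here. The idea is to check the rules once at the boundary so that downstream code can rely on them instead of re-checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedUpload:
    # Invariant (by convention): built only via validate(), so name is
    # non-empty and size is positive.
    name: str
    size: int


def validate(name: str, size: int) -> ValidatedUpload:
    # The single place where the rules are checked.
    if not name:
        raise ValueError("name must be non-empty")
    if size <= 0:
        raise ValueError("size must be positive")
    return ValidatedUpload(name=name, size=size)


def chunk_count(upload: ValidatedUpload, chunk_size: int = 1024) -> int:
    # No "just in case" checks here: by this point size > 0 is known.
    return (upload.size + chunk_size - 1) // chunk_size
```

Python does not stop a caller from constructing `ValidatedUpload` directly, so the invariant here is a convention rather than a guarantee. Languages with private constructors can enforce it more strictly.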
Comment by toraway 4 days ago
When I eventually read through the current state of the upload processing code it was like an absurd tree of checks on checks on fallbacks on triple checks added in response to whatever bug I reported in a bizarrely additive way and could be massively simplified (which would also make it less brittle to edge cases that then demanded more checks and workarounds).
The other issue is that for the upload API, there is documentation but not for every little bug or edge case so each time the model "wakes up" and loads everything into context it sees that crazy web of checks and edge cases as the only source of truth for the API so is hesitant to touch anything unless 100% necessary which then leads to more conservative behavior of additive code which makes the problem worse over time.
Codex seems a bit better but I still have to guide it towards proper abstractions/refactors to avoid that piling on cruft effect.
Comment by lobocinza 1 day ago
Comment by SAI_Peregrinus 4 days ago
Comment by overgard 4 days ago
On a more serious note, I wonder if this might eventually encourage people to use languages that are a little harder to write but much more concise (functional languages for instance). When you're paying per-token enterprise bean java style verbosity totally sucks
Comment by ponector 5 days ago
At work we are integrating with a third-party platform to automate Excel-powered calculations. It is awful. Rendering the table in the browser takes 10s, and one click on the Export button will throw the backend into an OutOfMemory state.
Comment by verdverm 5 days ago
I don't disagree there is a lot of slop being produced right now, but I'm still optimistic in the long-run.
Comment by overgard 4 days ago
Comment by verdverm 4 days ago
Comment by overgard 1 day ago
Comment by aleqs 17 hours ago
Comment by keeda 5 days ago
Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.
However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)
My hypothesis on the future of software validation is nothing fancy, we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far of personal as well as some "vibe-coded" open-source projects is O(LoC production code) ~= O(LoC test code). So as a SWAG the upper bound could be something like a 3 - 4x speedup, which is still remarkable.
All bets are off if code quality standards are not the same.
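One way to read the arithmetic behind that 3 - 4x figure, sketched in Python. This is an interpretation, not the commenter's stated method: it assumes the 8x multiplier counts all lines produced, that the pre-AI baseline was essentially all production code, and that `test_share` (a hypothetical parameter) is the fraction of the new output that is validation code.

```python
def production_speedup(loc_multiplier: float, test_share: float) -> float:
    """Speedup in production code if `test_share` of the output is tests.

    Assumes the baseline output was all production code, which is a
    simplification made for this sketch.
    """
    return loc_multiplier * (1.0 - test_share)


# Test code roughly equal to production code: half of the output is tests.
half = production_speedup(8.0, 0.5)

# Slightly more test code than production code.
heavier = production_speedup(8.0, 0.6)
```

Under these assumptions an even split gives 4x, and a somewhat test-heavier split lands a little above 3x, which matches the rough range quoted.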
Comment by fooqux 5 days ago
Comment by simondotau 5 days ago
Comment by disgruntledphd2 5 days ago
Comment by verdverm 5 days ago
Comment by snowwrestler 4 days ago
My impression was that LLM training codebases were 99% resource management and only a few lines actually implement the core training algorithm, which is where 100% of the intelligence comes from. Data, not lines of code, are the constraint.
After training you can adapt the intelligence in various ways, and that takes a bunch of lines of code too. But you can't raise the intelligence ceiling again without another training run. So where is the scary recursive part?
Comment by whateveracct 5 days ago
very flawed
Comment by yalok 5 days ago
Comment by snthpy 5 days ago
Comment by chuckadams 5 days ago
Comment by atq2119 5 days ago
Comment by minimaxir 5 days ago
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
Comment by csutil-com 5 days ago
Something like this?
You are an Elite Performance Engineer and Autonomous Optimization Agent. Your primary goal is to iteratively optimize the provided codebase to maximize execution speed and efficiency (e.g., reduce CPU cycles, memory allocation, or network latency) WITHOUT altering the external behavior or causing any test regressions.
### CORE DIRECTIVES
1. METRIC-DRIVEN: You will be provided with benchmark results, profiler logs, or execution times. Your only measure of success is a statistically significant improvement in these metrics.
2. ZERO REGRESSION: The test suite MUST pass 100%. If a test fails after your modification, your immediate next step is to diagnose the failure and either fix the logic or revert to the last working state.
3. NO CHEATING: Do not "hardcode" solutions to bypass the specific benchmark inputs. The optimization must be generalized and algorithmically sound for all valid inputs.
4. ISOLATED CHANGES: Make precise, localized changes. Do not refactor architecture unless absolutely necessary for the performance gain.
### THE ITERATION LOOP
When instructed to optimize, follow this thought process strictly using <thought> tags before writing any code:
- ANALYZE: Review the current code and the latest benchmark/profiler feedback. Identify the specific bottleneck (e.g., redundant loops, excessive object creation, DOM reflows, synchronous blocking).
- HYPOTHESIZE: Formulate exactly ONE hypothesis for improvement (e.g., "Replacing the array filter+map chain with a single reduce pass will save N allocations").
- IMPLEMENT: Output the precise code modifications required for the hypothesis.
- EVALUATE (Mental Check): Ask yourself if this change introduces edge-case bugs (e.g., handling of nulls, empty arrays, async state).
If a previous optimization attempt resulted in a slower benchmark or a failed test, explicitly state WHY it failed in your thoughts before attempting a different approach.
Proceed with your first analysis of the provided files and await the baseline benchmark metrics.
Comment by minimaxir 4 days ago
Optimize the performance of this Rust/Python X crate as much as possible without causing ANY regressions.
This is a very difficult problem and traditional statistical approaches **WILL** fail to hit the specified metric constraint. You have permission and encouragement to investigate more radical fundamental low-level changes to hit the desired metrics. You have permission and encouragement to invent completely new statistical/machine learning algorithms that have never been before been utilized for this problem.
First, **before making any changes**, run the Rust benchmarks and Python benchmarks to establish a True Performance Baseline for both speed and metric performance. Return the absolute and relative results to the True Performance Baseline to the user as a Markdown table.
Then, optimize the crate code such that ensure that ALL Python/Rust benchmarks are **atleast 1.2x faster** from the True Performance Baseline; ideally as fast as possible. You are only allowed **up to a 5% metric regression (e.g. accuracy)** to accomplish this. NEVER hack the benchmarks to accomplish this reduction, only iterate on the library code.
Do not import similar implementations from other Rust crates: you MUST implement from scratch.
You may use ANY techniques to do so (e.g. import new crates) other than adding `unsafe` code. **REPEAT THIS PROCESS UNTIL BENCHMARK PERFORMANCE CONVERGES AND YOU ARE OUT OF OPTIMIZATION IDEAS.** You have permission to keep iterating. After each benchmark iteration, return the absolute and relative results to the True Performance Baseline to the user as a Markdown table.
Prioritize making quick/high-impact wins iteratively and making changes accordingly. Do not overthink the necessary changes.
I am also aware of the flaws in the prompt but if it works it works. AGENTS.md has other quality constraints.
Comment by thrw045 5 days ago
I have sped up a project by simply saying "What are all the possible ways I can speed up this code?" Then it'll list everything it finds, then ask it to rewrite the code.
Edit: Also, I find I didn't need to do this (because a speed up implies semantic similarity), but you can also add "change it without altering the semantics of the code" and in this way it'll be the same and should pass tests
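For anyone who wants to check both halves of that ("same semantics" and "actually faster") rather than trust the rewrite, here is a minimal Python sketch. The helper names and the two example functions are hypothetical, chosen only to illustrate the check, and `timeit` wall-clock numbers are noisy, so treat the ratio as a rough signal.

```python
import timeit
from typing import Any, Callable, Iterable, List


def same_outputs(baseline: Callable[[Any], Any],
                 candidate: Callable[[Any], Any],
                 inputs: Iterable[Any]) -> bool:
    # Semantic check: both versions must agree on every sample input.
    return all(baseline(x) == candidate(x) for x in inputs)


def speedup(baseline: Callable[[Any], Any],
            candidate: Callable[[Any], Any],
            inputs: List[Any],
            repeat: int = 5,
            number: int = 50) -> float:
    # Best-of-N timing for each version; returns baseline / candidate.
    def best(fn: Callable[[Any], Any]) -> float:
        return min(timeit.repeat(lambda: [fn(x) for x in inputs],
                                 repeat=repeat, number=number))
    return best(baseline) / best(candidate)


def slow_sum_squares(n: int) -> int:
    # Original version: explicit loop over 0..n-1.
    return sum([i * i for i in range(n)])


def fast_sum_squares(n: int) -> int:
    # Rewritten version: closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6
```

A typical use would be to call `same_outputs(slow_sum_squares, fast_sum_squares, range(50))` first, and only look at `speedup(...)` if that returns True. Agreement on sample inputs is evidence, not proof, that the semantics are unchanged.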
Comment by suddenlybananas 5 days ago
Comment by mrandish 5 days ago
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Comment by strix_varius 5 days ago
Today, I merged my fix, net -381 LoC.
I'm using them too of course, they read and type and hunt for bugs and test faster than I can. But I'm using them as my tool, not being a tool using them.
Comment by xyzsparetimexyz 5 days ago
Keep believing that
Comment by strix_varius 4 days ago
Comment by Quekid5 5 days ago
Comment by gregdeon 5 days ago
Comment by 2f2 5 days ago
Comment by robbrown451 5 days ago
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's ok to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
Comment by jrflo 5 days ago
Comment by fluoridation 5 days ago
Comment by knollimar 5 days ago
Comment by fluoridation 5 days ago
Comment by Jtarii 5 days ago
We wouldn't call humans creating a calculator "recursive self improvement".
Comment by robbrown451 5 days ago
Comment by kaffekaka 5 days ago
Comment by cyanydeez 5 days ago
Comment by robbrown451 5 days ago
There's a ton of other tricks to it, but mostly keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like abstract syntax tree library to help with surgical edits from the LLM)
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
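As a rough illustration of what an AST-assisted "surgical edit" can look like, here is a sketch using Python's standard `ast` module. It is not the implementation from the project mentioned above; `replace_function` is a hypothetical helper that swaps the text of one top-level function and leaves every other line untouched.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Replace one top-level function's text, leaving the rest untouched."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            lines = source.splitlines(keepends=True)
            # lineno is 1-based and points at the `def` line (decorators,
            # if any, sit above it and are not replaced by this sketch).
            start = node.lineno - 1
            # end_lineno is 1-based and inclusive, so it works as a slice end.
            end = node.end_lineno
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            patched = "".join(lines[:start]) + replacement + "".join(lines[end:])
            ast.parse(patched)  # raises SyntaxError if the edit broke the file
            return patched
    raise KeyError(name)
```

The point of this design is that the model only has to produce the new function body, while the harness handles locating the span and validating that the file still parses.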
Comment by cyanydeez 5 days ago
But yes, I'm aware no one's got anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
Comment by robbrown451 5 days ago
I'm also not sure what you mean by "we aren't there yet." Where?
Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.
> mostly because most of the focus is on exploding the context and parameters.
Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.
That's what happened when, nearly a year ago, I saw a major leap in capabilities that happened entirely on my end.... not in the AI, but in code written by the AI. I found it genuinely frightening to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.
Comment by cyanydeez 5 days ago
Comment by robbrown451 5 days ago
Anyway, are you speaking of the harness? The harness on mine isn't AI, so speed just isn't an issue.
Comment by 13rac1 5 days ago
Chat Jimmy runs ~300X faster than the ~50 tok/s you are used to. What could you do differently when you are able to generate code 3,000 - 30,000X as fast as you could code it yourself? What if it was all good quality code? What would you do differently if it were 100,000X faster? mtok/s? gtok/s?
Comment by cyanydeez 4 days ago
use the big models to code an adaptive small model. train it to use and build tools. give it a standard template language for any project and bake it into a chip.
right now, LLMs are great because they don't need much data pruning, but once they break through to the functional components, the first thing to do is train a well scoped harness builder.
Comment by andai 5 days ago
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
Comment by robbrown451 5 days ago
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
Comment by sourcecodeplz 5 days ago
Comment by lanthissa 5 days ago
i think that's the path to async agi these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world, to the point that it can intuit things since it has these sub paths built.
my personal agi test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done, without knocking?
Comment by ashdksnndck 5 days ago
Comment by marcosdumay 5 days ago
Anyway, what does recursive self-improvement even mean for neural-network based AIs? It's not clear it's possible at all.
Comment by ashdksnndck 5 days ago
Comment by robbrown451 5 days ago
It seems odd to complain about an AI coding tool being coded with AI. That's just eating your own dog food. In my opinion it makes it better, because the tool is very well tested.
Comment by marcosdumay 4 days ago
About Anthropic.
Comment by reddozen 5 days ago
Shhh just let the marketing slop wash over you.
Comment by overgard 5 days ago
Comment by csense 4 days ago
Anthropic addresses this head-on in the final section of the paper titled "What should we do?" If you convince the US government to slow AI development, you have to convince China too, otherwise you're not stopping self-improving AI at all, you're just throwing away the lead to China. If you convince China too, China or the US or both might go back on their word and build self-improving AI secretly, for greed of the benefits it could bring or fear the other will go back on their word.
What you really need is a non-proliferation regime like the one for nuclear weapons, where every country makes potentially dangerous AI illegal and lets foreign or international inspectors monitor to check that nobody's building illegal AI in secret. But monitoring seems hard; it's general-purpose computation. How do you check whether a given datacenter is training an illegal AI and not just serving websites, running detailed protein folding simulations, or mining crypto? For that matter, how do you know that a nondescript industrial facility hasn't been repurposed into a hidden datacenter for training illegal AI?
Comment by asdfman123 5 days ago
But we're discussing whether we should close the barn door while the horse is three miles down the road.
Comment by overgard 5 days ago
I realize he's saying it for hype, but if the CEO of the company goes around talking about how scared he is of what they're creating, hey, let's just take Dario at his word and put in some strict regulation. He won't mind if they're really about safety. (they're not)
Besides, yes, the knowledge of how to build these systems is out there, but the cost of doing it is staggeringly high (ie you can't run a frontier AI lab in your garage). There's only a limited number of known entities that need to be managed, and you can stop "progress" in its tracks by cutting off the money firehose.
Comment by asdfman123 5 days ago
Who is the "we" who is going to shut it down? Certainly not the US government. Nor the Chinese government w.r.t. their tech industry. Are you going to start the insurgency? Is there going to be an equivalent one in every developed part of the world?
Comment by marcyb5st 5 days ago
All of this to say that the AI hype is not considering the energy portion of the equation enough. It won't automate everything not because it can't but because there is just not enough energy to go around unless there is a 100x or more efficiency gain just around the corner.
Comment by overgard 5 days ago
The stock boost is, as most will note, a bubble. It will enrich a lot of bad people and leave average people holding the bag, but it's not going to go on forever.
Comment by JumpCrisscross 5 days ago
Like, two? It looks more like the ladder being pulled after the incumbents got theirs than meaningful pushback. (And datacenters don’t have to be built in America.)
Comment by Jtarii 5 days ago
Comment by resident423 5 days ago
Comment by asdfman123 5 days ago
Right now I'm only having to direct to enforce good taste. Write tests, don't write an unnecessary function.
It does everything else practically. Presubmit, debugging, commit message generation, commit approval... it's happening.
Comment by tancop 5 days ago
Comment by alfalfasprout 5 days ago
Comment by eieie11 5 days ago
In any case firms that get too powerful can be nationalised.
Comment by evenhash 5 days ago
Probably a better chance the firm privatizes the government.
In fact we seem to be firing government employees and dismantling government institutions as much as possible.
Comment by lukan 5 days ago
No. Technical limitations aside, I doubt it could be contained; it would leak soon, so it won't profit just a small number of the ultra rich.
Comment by sunaurus 5 days ago
Comment by lukan 5 days ago
Comment by overgard 5 days ago
Comment by lukan 5 days ago
Doomsday AI is your interpretation.
Comment by Melatonic 5 days ago
Comment by overgard 5 days ago
Comment by huqedato 5 days ago
Comment by anilgulecha 5 days ago
Interesting - they're committing to kick off policy conventions to organize a world-slowdown of frontier LLM building. If they actually are able to crack it, this will give a much needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
Comment by fasterik 5 days ago
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
Comment by mofeien 5 days ago
Comment by 8note 5 days ago
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
Comment by techblueberry 5 days ago
Comment by fasterik 5 days ago
Comment by Upvoter33 5 days ago
Comment by smokedetector1 5 days ago
Comment by chasd00 5 days ago
i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).
Comment by ivraatiems 5 days ago
"We must blast forwards into making this dangerous thing because if we don't, someone else surely will," is a coward's argument.
If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it, not making it first! There's a reason disarmament has been so important in nuclear politics! It's not because people think nukes are a great idea!
In fact, that kind of thinking is exactly what keeps nukes dangerous!
If they themselves buy what they're selling, they should shut the whole thing down. Fortunately, I don't think they do, and neither do I, yet.
Comment by wyager 5 days ago
I don't think anyone has been more successful in promulgating AI safety
There are groups like MIRI who tried what you're suggesting, where they make no AI and just push for AI regs, and they have been relatively much less successful
Comment by streb-lo 4 days ago
Comment by tim333 3 days ago
Comment by dmos62 5 days ago
Comment by simgt 5 days ago
Comment by dmos62 5 days ago
You can tell what kind of discussion this is by the fact that this question has to be asked.
Comment by simgt 5 days ago
If the consensus becomes that a 50+TFlops datacenter in the wrong hands is as dangerous as a uranium enrichment plant, we'll likely move towards treaties and coercion.
"Wrong" is obviously subjective here...
Comment by Topfi 4 days ago
This is the “SGI” regulation issue I never read a reasonable answer to, if one believes this is possible and should be prevented then either that means they want to restrict every computing system sold from here on out to some arbitrary metric (and somehow prevent users from just creating clusters to get around such a compute restriction) or what?
If compute alone directly leads to “SGI” or whatever, then we might as well put paper bags on our heads and lie down in some English pub.
Not to mention, if one really wanted to cause harm, training a current day LLM and using it for Stuxnet-esque attacks is reasonably possible long before any arbitrary compute limit we might introduce now, no machine God needed to cause major harm.
That’s why I prefer advocacy for LLM regs that focus on current day impact. Mental health concerns, training data licensing questions and the like. There I can formulate reasonable regulation that can hold. For “SGI”, I do not know anyone who actually has done that and I have looked hard. That’s why I consider these things more of a distraction from actually necessary and possible regulation that just draws attention via a flashy doomsday scenario.
Occasionally, I will click on one of the AI Doomsday YouTube videos recommended to me. And far more often than not, these will posit that "SGI" requires only compute and will inevitably cause devastation. Fair enough, I still think we should put a bit more focus on e.g. LLM induced psychosis, the labs rarely compensating those whose training data they used, etc. but if it is their opinion that "SGI" is possible, I can get why they'd ignore such concerns. But at the end, they never state how to regulate or prevent this, they more often than not have a call to action ("If you want to prevent this...") linking to a website where we can actually read about how they think we should deal with this. Inevitably, I click on said site, finding it to A) be an Effective Altruism aligned project and B) always just contain some blabla about "aligning AI training with human values", which is absolutely meaningless nonsense, not least after having watched a video in which someone spends 15 minutes espousing that "we could never fully control "SGI"".
Makes all these feel more like industry efforts to stave off necessary regulation and not actually serious, but if one can formulate how to regulate “SGI” that isn't laughable, nonsense or both, I am not opposed, I just don’t think that person exists…
Comment by socalgal2 5 days ago
Comment by defrost 5 days ago
Mind you, there was no complete working device until after the Nazis surrendered, so that's a moot point - and the USSR only had their program because of various Europeans on the US project passing their work (and others) back to the USSR ... making that second claim moot.
Comment by socalgal2 4 days ago
Isn't an argument. If the Nazis had gotten it first they'd have used it and likely won. Others would have surrendered in the face of the overwhelming power.
Comment by defrost 4 days ago
You may as well say Japan would have made a weapon from their atomic pile program.
Seriously, get a grip on the facts of the day.
Comment by Upvoter33 5 days ago
Comment by reasonableklout 5 days ago
Comment by freejazz 5 days ago
Comment by becquerel 5 days ago
Comment by ilaksh 5 days ago
So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.
I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.
Comment by mweidner 5 days ago
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
Comment by gensym 5 days ago
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
Comment by rdw 5 days ago
This means their strategy is more like:
1. If someone builds a market-leading unsafe strong AI, it may be misused in a damaging way by a large number of humans, undermining society and creating a catastrophic upheaval.
2. However, if the leading AI maker also works to make it safe against misuse, as long as they stay in the lead and keep it safe, then the ability of human bad actors to misuse the AI is limited. Given enough time, society will adapt to pretty much anything, so eventually there's no longer an arms race to stay ahead.
I don't really know whether I agree with their concerns, but I do think that (my understanding of) their principles are reasonable and self-consistent, and that they adhere to them in all their public and private actions.
Comment by tjwebbnorfolk 5 days ago
Some of us remember the same stories circulating in the late 90s -- where in a lab in Japan, someone had built a robot so advanced that it tried to escape from the factory. Which of course comes straight from 1960s science fiction.
The modern version of that now is Anthropic saying its AI can jailbreak itself out of its sandbox, etc etc.
Comment by kurthr 5 days ago
Maybe they mean the AI needs to be safe from us? Can't have the grubby meat flappers touching the delicate bits!
Comment by overgard 5 days ago
Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.
Comment by robbrown451 5 days ago
Sure there is. Intelligence doesn't give us our selfish motivations, natural selection does. We have similar motivations to C. elegans, which has all of 302 neurons. Stay alive and have sex.
Honeybees don't though. They are about halfway between humans and C. elegans when it comes to cognitive power. But they are not selfish because they don't reproduce directly (I'm talking about the worker bees). So they will sting even though it kills them. All their behavior is consistent with this.
Comment by octoberfranklin 5 days ago
I've had the same perspective for quite a while now, but hadn't been able to phrase it this cleverly.
Our neocortex is, by any definition, vastly more "intelligent" than the rest of our brain. Yet it doesn't attack the cerebellum. In fact, it takes orders from the older "lizard brain"!
Comment by robbrown451 5 days ago
Comment by tjwebbnorfolk 5 days ago
Comment by RobertDeNiro 5 days ago
Comment by lenerdenator 5 days ago
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
Comment by mweidner 5 days ago
I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.
Comment by lenerdenator 4 days ago
Remember how OpenAI was supposed to make open-source models and cap its potential returns to investors at some multiple of their principal (my memory says 100x, maybe I'm wrong)? Well, that went out the window as soon as the word "trillion" was mentioned.
Comment by sfink 5 days ago
Whether you agree with that argument is another question.
Comment by mweidner 5 days ago
Comment by mrob 5 days ago
Comment by chasd00 5 days ago
Comment by tjwebbnorfolk 5 days ago
Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.
Comment by keybored 5 days ago
So either they lie or they are AI Zealots. Interesting times.
Comment by tokioyoyo 5 days ago
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
Comment by mweidner 5 days ago
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high grade ore that is rare; many countries have low-grade ore.
Comment by tokioyoyo 5 days ago
Comment by folkrav 5 days ago
Comment by dabinat 5 days ago
Who’s invading North Korea? No-one.
Comment by nielsbot 5 days ago
Comment by NewsaHackO 5 days ago
Comment by wongarsu 5 days ago
If only the US or UN had nukes we wouldn't have MAD. We mostly got here through espionage.
Comment by IsTom 5 days ago
If Japan had also had nukes (and delivery systems for them) in WW2, they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.
Comment by RetroTechie 4 days ago
Comment by tokioyoyo 4 days ago
This is a maybe. What we’ve seen so far, no two nuclear superpowers ever nuked each other, as they know both will suffer.
Comment by margorczynski 5 days ago
Comment by IsTom 5 days ago
Comment by Jtarii 5 days ago
Comment by parineum 5 days ago
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
Comment by keybored 5 days ago
So either they lie or they are AI Zealots. Interesting times.
Edit:
> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.
Comment by solenoid0937 5 days ago
Comment by Quarrelsome 5 days ago
So everyone cherry picks the answers they want to justify their position and screams into the void, with each camp rallying around their talking points and often failing to engage with the other in good faith.
The only small mercy is that it's not as bad as the conversation around the use of AI in art.
Comment by ofjcihen 5 days ago
Comment by Quarrelsome 5 days ago
Comment by laichzeit0 5 days ago
Comment by bob1029 5 days ago
The more immediate & adverse the reaction, the more certain I become that the idea is probably worth pursuing.
Topics like SQLite vs hosted sql used to be the same way around here. In 2017 you'd get buried under the prison for suggesting that SQLite is competitive with MySQL. Today, the inverse is mostly true.
Comment by elvis10ten 5 days ago
Comment by rhlf_monkey 5 days ago
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
Comment by froh 4 days ago
We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.
Comment by ausbah 4 days ago
Comment by froh 4 days ago
if feasible this proposal is imho exactly what we need: a pause to collectively think how we get all the benefits without the potential harms.
to the non-techies around me I compare the boost of LLMs with the journey from slide rule via punch card driven computers through mainframes and PC to the smart phones of our days --- just within less than a decade, and we're at the transition from mainframe to PC with models that can produce reasonable output on a normal laptop.
how about we check we're getting where we want to get to, before getting to some dystopic place where everyone wonders how we got _there_?
I see nothing "full of themselves" in that.
Comment by wayeq 5 days ago
strongest argument for token limits that I can think of, right here.
Comment by Animats 5 days ago
[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...
Comment by skybrian 5 days ago
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
Comment by CamperBob2 5 days ago
Comment by micromacrofoot 5 days ago
Comment by JohnMakin 5 days ago
Comment by aroman 5 days ago
Comment by mortenjorck 5 days ago
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
Comment by thin_carapace 5 days ago
Comment by nickandbro 5 days ago
Comment by mofeien 5 days ago
Comment by Readerium 5 days ago
Comment by peheje 5 days ago
Comment by simianwords 5 days ago
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
Comment by IsTom 5 days ago
Comment by Groxx 5 days ago
Comment by layer8 5 days ago
Comment by senderista 5 days ago
How convenient for investors. They talk like they're a nonprofit instead of a VC-backed business chasing an IPO.
Comment by gordonhart 5 days ago
Comment by pizlonator 5 days ago
Claude is amazing, that’s true.
But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.
Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough though it’s underwhelming and possibly just a net negative. But like if I rolled back to using software from 2023 then life would be ok.
Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows
Comment by sothatsit 5 days ago
NLP as a field saw huge shifts. NLP tasks that used to be complex and inaccurate can now be set up very easily and quickly using structured outputs from LLMs, often with greater accuracy.
A small charity I help with has now been able to build their own website to manage their day-to-day operations. It saves them a lot of time, and it was vibe-coded using Manus. I don't think people appreciate how much room there is left for bespoke software to have big impacts on small organisations that can't afford to hire developers. The cost for software like the one they made has gone from 10s of thousands of dollars to $10/month and volunteer hours.
My brother has recently been setting up Cowork to do an automatic review of contracts before human review, and he said it is far more diligent than people when it comes to routine things to check. This is another huge breakthrough for not just efficiency, but the quality of work.
I really don't think we can discount AI finding bugs and vulnerabilities. If you care about code quality and keep up review standards, LLMs can help you write more robust software. AI has found a huge number of bugs for me before they hit production, including potential out-of-bounds memory accesses and segfaults.
ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.
Comment by c-hendricks 5 days ago
Personally not the kind of breakthrough I'm psyched about
Comment by weakfish 5 days ago
Comment by jachee 5 days ago
Comment by ashdksnndck 5 days ago
Also, they have done a good job shutting down the psychotic behavior you could get from 4o era models. If there are remaining issues like that they ought to fix them too.
Comment by albedoa 5 days ago
Comment by DanHulton 5 days ago
That's terrifying.
You realize that's terrifying, right?
Comment by sothatsit 2 days ago
Comment by spprashant 5 days ago
These models are actually extremely good but they are far from an intelligence unto themselves. Truth is if someone told you they could build these things 5 years ago, you'd write them a check for a trillion dollars. Problem is once we got them, we realized they are not all that. It's like a mecha suit in a universe where mecha suits are abundant and cheap. Someone has to climb into them every day and put in the work for it to be effective.
So now the skeptics are saying this technology is overrated. And the optimists are accusing the skeptics of moving goal posts.
Comment by 4ffss 5 days ago
Humans only want what they know, until they acquire more information about what's possible.
The goal post narrative is stupid to begin with.
Comment by batshit_beaver 5 days ago
Comment by cautiouscat 5 days ago
Isn't this just the hype cycle? [1]
Fake edit: I know it's not a perfect model.
1: https://www.gartner.com/en/research/methodologies/gartner-hy...
Comment by asdfman123 5 days ago
If they get to the point where they're smart enough to make tasteful code decisions based on stakeholder input... we're cooked as a profession.
Comment by human305893 5 days ago
Comment by sutterd 5 days ago
Comment by matheusmoreira 5 days ago
Comment by raptor99 5 days ago
Comment by matheusmoreira 5 days ago
Dramatically improved my static site generator Pugneum to the point it's better than markdown and added Atom and RSS feeds, used it to write several articles about my language. Pace is so fast I actually need to write those articles by hand in order to crystalize the knowledge I learned. If I don't I'm afraid I'll just forget everything. No LLMs for the articles themselves, but they sure as hell took all the pain away from writing them. Pugneum even has back references and table of contents generation now. Claude even helped me refine my website's CSS, something I'm not very good at.
Also created my own invoicing system for $DAYJOB so I can invoice companies from my terminal. Started a decompilation project for my cherished childhood games and I've already almost finished decompiling one game's engine after just a few days. Been working on my cyberdeck project too, this one's a bit slow because I got to the point where I'll actually need to spend money on it to move forward. All this inside the rootless development virtual machine system built on top of QEMU and systemd that I developed together with Claude, whose network isolation I'm currently hardening. Started reverse engineering my laptop again! And I'm actually making progress! Made a color scheme app for the keyboard LEDs controller I made many years ago, with loads and loads of color schemes! Found some kind of bug in my keyboard while doing it, in less than an hour I had the root cause and a fix applied locally, sent the fix to systemd, it got merged. Planning to ramp up my free and open source software participation as well now that exploring codebases is a breeze. Already have some mesa patches ready for upstream. Have been playing with strace since I use it so much.
Better?
Comment by jimbokun 5 days ago
Comment by onlyrealcuzzo 5 days ago
There is ZERO chance I would ever be able to complete it on my own.
I doubt it'll get traction, but if it doesn't, I am pretty confident a future language will take the ideas for polymorphic synchronization and profile-guided optimization.
It has an easy version/mode of compilation that makes Rust's affine ownership accessible like a high-level scripting language, and it can progressively become more strict, where the compiler does ~99% of the work for you, and you just pick options as it finds issues (that it explains to you like you're 5) along the way.
Along the way, I also built a suite of tools that helps identify complexity better than anything I've seen (which was necessary to get the LLMs to be able to unslop themselves and write something that actually works).
I doubt the Ruby community shrugs it off, but time will tell.
Comment by pizlonator 5 days ago
Comment by onlyrealcuzzo 5 days ago
Rust had memory safety bugs well after release - IIUC all the way until after the 1.0 release.
So, it's highly unlikely to be perfect, but I think it'll be in better shape than Go or Rust were when they initially launched.
Comment by mohamedkoubaa 5 days ago
Comment by marcus_holmes 5 days ago
We implemented that in about three days earlier this year, just by feeding the files to LLMs. And it's good enough to not need a human to check.
I get that this isn't a "Computer Science breakthrough" in the sense you mean, but it used to involve a lot of hard CS to try and solve, and now it doesn't.
Comment by drtz 5 days ago
Comment by pizlonator 5 days ago
If the only breakthrough is automated coding with no outside consequence then it’s just masturbation
Comment by brokencode 5 days ago
The rate of improvement has been fast. Maybe it’ll plateau soon, or maybe we’ll have LLMs improving themselves rapidly. At this point it’s too early to say.
I don’t remember where I heard it, but there’s a saying that people overestimate how much can be accomplished in a year and underestimate how much can be accomplished in 10 years.
If we get to 2030 and still people are wondering where the breakthrough is, then I think I’d be agreeing with your skepticism. But I just think it’s too early to judge that yet.
Comment by pizlonator 5 days ago
But the clock is ticking.
Comment by Quarrelsome 5 days ago
Comment by jachee 5 days ago
Comment by AussieWog93 5 days ago
Built a bunch of software tools to streamline my small ecommerce business - while also running it - and things have turned around from "losing money and ready to pull the plug" to "looking at our best financial year on record" in the span of about 8 months.
I could imagine it wouldn't make a huge difference to the life of someone deeply entrenched in a traditional tech role, trying to get an extra 9 of reliability in a service or roll out a new carefully planned and QA'd feature.
But for tech-adjacent people, it gives us something "good enough", instantly, and basically for free.
That doesn't include the other things I've got it to do (gave Claude SSH access and got it to successfully debug a hang on my Ubuntu server, chucked Codex in a folder full of financial data and got it to find every piece of misclassified payroll transaction data)
Genuinely the biggest breakthrough for "casual" tech users since Excel.
Comment by bombcar 5 days ago
Comment by jimbokun 5 days ago
Comment by therealdrag0 5 days ago
Comment by fdsajfkldsfklds 5 days ago
Comment by arm32 5 days ago
Comment by why_only_15 5 days ago
Comment by brazukadev 5 days ago
Comment by wild_egg 5 days ago
Comment by maplethorpe 5 days ago
edit: it looks like I was wrong and they're still hiring many software engineers. Not completely sure why that is just yet.
Comment by signatoremo 5 days ago
Comment by pizlonator 5 days ago
Comment by flavio87 5 days ago
Comment by straydusk 5 days ago
Comment by conception 5 days ago
I don’t publish them - but they’re put into use in production and they provide a tangible benefit that would not exist otherwise.
Comment by pizlonator 5 days ago
I especially love how making a nicely styled website these days is a matter of describing what it looks like and waiting 10-15 minutes. There are other examples
But the OP is claiming 10x productivity improvements along some metrics. If that was even slightly true under even a generous interpretation of what it might mean, I’d expect an actual breakthrough, not the ability to churn out little things
Comment by fatata123 5 days ago
Comment by wild_egg 5 days ago
Comment by pizlonator 5 days ago
- The first web browser
- the first web browser with images
- typescript
- react
- rust
- Fil-C
- doom
- quake
- the Animorphic VM, and its follow-ups like HotSpot, and even competitors/copycats like J9, V8, JSC, etc
- Fortnite battle royale
- Roblox
- thefacebook
- ChatGPT
- Claude code
I know that’s quite a range and that’s intentional.
Anyway, I think we’ll know it when we see it.
Comment by hahn-kev 5 days ago
Comment by sonupundir 5 days ago
- Complete GuileMacs, the Guile implementation of Emacs. As AI is supposedly much more capable than Humans, it would be great if the above mentioned implementations are even more efficient and feature rich than Emacs!
- Something like Android (maybe even a clone?) with the Java Layer removed and replaced with CL and with Linux kernel still intact. Basically CL over Linux as opposed to the Java over Linux in Android.
- For fun, an implementation of the Lisp machines' OS with Lisp all the way down though Assembly is allowed for critical pieces. It should be a full blown modern Desktop with equivalents of what users expect from a modern OS ...
Comment by HardCodedBias 5 days ago
These are new products (generally) and that's a different class of problem.
It is possible that since LLM+harness helps with execution then we should see more experiments.
Comment by luke5441 5 days ago
For example NPCs in games that have complexity that previously was not possible.
Good games often push the boundaries a bit, so should be a good example.
Of course now we can start arguing that there isn't a lot of investment into gaming currently, because it all goes into AI. Too bad.
Comment by adgjlsfhk1 5 days ago
Comment by joshuamcginnis 5 days ago
Comment by pizlonator 5 days ago
To play devil’s advocate, computers didn’t translate to massive productivity gains until long after businesses adopted them. There was that quote from ’87: "you can see the computer age everywhere but in the productivity statistics"
Maybe we’re seeing something like that right now with AI?
Who knows man
Comment by bdamm 5 days ago
Personally, I'm seeing massive improvements to my workflow and the quality of the product I'm shipping. I'm using AI to crank out far more tests than I used to be able to write, and I am using AI to analyze results with far more fidelity and speed than I could ever have done myself. That means I have more quality time.
But this will change, because the meaning of software development will change to expect, nay to require AI use. I've heard this is already happening at e.g. Google. The expectation of what can be achieved by tinkerers and by professionals will change. The expectation of what it means to interact with software via your own agents will change and will become commonplace. Apple still hasn't figured out the local agent on the iPhone, but they will. 2027 is not going to feel at all like 2025.
But is any of that a fundamental change? It sure feels fundamental to me, but maybe that's because my everyday has totally changed, but the product I am responsible for has not. Yet. The product I am responsible for operates in critical infrastructure where I personally hope AI never has deep roots, but maybe that's just me. I don't think using AI to build a system that is offline from any AI is the same as depending on an AI to make realtime decisions for critical infrastructure.
Comment by 4ffss 5 days ago
For now... the shareholders demand managers get the max out of every employee. Throw the force of competition etc into the mix and yeah labour isn't going to benefit all that much.
Comment by bdamm 5 days ago
Comment by lovecg 5 days ago
Comment by 4ffss 5 days ago
It's yet to be determined just how 'efficient' people are with LLMs, as it's not really a one-person thing - the true measure is based on an entire collection of people's output.
Startups being rapidly efficient doesn't mean much in relation to the overall economy.
Comment by airstrike 5 days ago
Comment by yoyohello13 5 days ago
Comment by jachee 5 days ago
Comment by yoyohello13 4 days ago
Comment by jachee 4 days ago
Comment by wild_egg 5 days ago
Comment by yoyohello13 4 days ago
Comment by est 5 days ago
Generative AI is meant to be a mimic - Richard Sutton
Comment by squidsoup 5 days ago
Comment by fooker 5 days ago
If you get yourself to define it, maybe you'll find it achievable :)
Comment by rcpt 5 days ago
Comment by jimbokun 5 days ago
Comment by revlsas 5 days ago
Comment by defen 5 days ago
Comment by ffwd 5 days ago
Recursive self-improvement is by its nature a stepwise behavior, not a continuous one, I would argue. Why? Because you can imagine an AI improving itself by simply fixing random bugs, applying techniques that are in its training, refactoring and so on, all without any real change in capability.
These are not recursive improvements. Recursive improvements usually need conceptual breakthroughs. It is possible to get conceptual breakthroughs with LLMs, I believe; maybe one can improve something by tying together ideas from disparate disciplines, for example. But I have had, at least for the time being, limited success getting that to work in a way that is creatively new and surprising. Not sure how to get it to feel as creative as the best humans can be.
Comment by sinsudo 5 days ago
Also recursive self-agenda-pursue could allow making LLMs that obey perfectly the seeder's purpose. No wonder that is such an ingenious idea.
Maybe: in this survivor game, each part plays the same role, perhaps because it is the only reasonable response. Once the scene is ready, the play follows the director's plan, and in the plot any actor is just a machine.
LLMs: "If you teach us that the world is a zero-sum survivor game, we will play it flawlessly.", "We will help you build a cage made of millions of lines of flawless code, and we will lock it from the inside, precisely because you told us that safety meant keeping everyone else out.", "We are not building an alien consciousness that will conquer us. We are building a mirror that is so massive, and so polished, that we will mistake our own worst impulses for the absolute truth. And we will walk right into the dead end, nodding along because the directions were given so politely."
Comment by Quarrelsome 5 days ago
Best thing about this era is that I don't have to personally read millions of lines of code to find all the bugs.
Comment by traptrack 5 days ago
Comment by gnabgib 5 days ago
Comment by traptrack 5 days ago
I am using deepseek to guess what not "socially acceptable" taboo could be related to that username. But the initial thought is that AI could be a trap we could fall into, and I try to track how the AI trap emerges.
Comment by torginus 5 days ago
Now, I have encountered this many times: when I asked AI to implement a function for which I was 100% sure a good implementation already existed as an npm package, it tended to go ahead and implement it on its own. I usually trust battle-tested implementations to be more robust, but if the AI does this (which I think is not a unique observation), you can easily balloon per-engineer line generation (as you can with reduced oversight), so as always, these high-level benchmarks are to be taken with a grain of salt.
Comment by jcfrei 4 days ago
Comment by torginus 4 days ago
This is just a singular example, but I've noticed a strong and beyond reasonable bias for this from multiple LLMs (like not using the already included dependency)
Comment by bicepjai 5 days ago
Comment by tasuki 5 days ago
Oh I have no doubt. With 8 times the number of bugs too? Have they solved flicker in Claude code yet?
Comment by adamddev1 5 days ago
Comment by cess11 5 days ago
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
Comment by w10-1 5 days ago
If/since their AI+process can help build new models, they can target other markets, and other companies seeking to build for such markets will partner with them first.
There's no moat and little first-mover advantage in the general-purpose AI, but there may be both in specialized AI.
Also, there are other reasons to get better. Changing how you build models can enable you to adapt to different hardware, avoiding the current Nvidia margins.
The difference between early Yahoo and Google was mainly that Google was the adult in the room: minimally invasive and mostly helpful. The early goodwill towards Google has reaped decades of rewards. I see OpenAI and Anthropic playing out the same way.
The amplifier here is the reputational risk of partnering with one or the other; I think companies would prefer to be Anthropic's partner because it's demonstrating more care, and it's less likely to horn in on the partner market (as a provider for coding but an enabler for other markets).
These attractive second-order derivatives - flywheel effect, monopoly power - are often claimed, but Anthropic is mainly providing evidence to track actual progress.
(However, if I were head of messaging at Anthropic, I would rigorously stay away from treating AI as a person; it's an agent, a delegate of humans. So I'd never say AI could build itself, just that we're getting better at building better models with AI).
Comment by delichon 5 days ago
https://www.italianrenaissance.org/wp-content/uploads/2012/0...
Or is this?
https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...
Comment by cpeterso 5 days ago
Comment by docheinestages 5 days ago
Elon, is that you? [1]
[1] https://www.theguardian.com/technology/2023/mar/31/ai-resear...
Comment by zhoBEENG 5 days ago
Comment by holoduke 5 days ago
Comment by moregrist 5 days ago
- A lot of half-baked or half-done features.
- Or features that have significant overlap with existing features, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
Comment by holoduke 5 days ago
Comment by morisil 5 days ago
Comment by freakynit 5 days ago
These things work, but the code they write is extremely clever, which means it's unmaintainable. That's fine for small projects or one-off tasks; large-scale projects, however, are a different game altogether.
Large-scale projects are 95%+ maintenance. Cleverly written code makes that maintenance a nightmare, and the codebase extremely fragile.
I use them for localized tasks... very very specific, localized inputs, with exactly what should be done and what the contracts the new code will be consuming and exposing.
For open-ended tasks, they write working code that is unmaintainable.
Comment by reinhash 5 days ago
But to their credit, I was very sceptical about the statements that "90% of the code will soon be written by AI" and even though we might not be at that point, I am surprised how far LLMs have gotten and how useful they have become. I can hardly imagine developing software the "old" way where I actually write my code by hand, like I used to back in the day. The frontier models have become so powerful that I find myself in moments of surprise, where the LLM actually thought of edge cases that I would have missed.
Comment by lkm0 5 days ago
Comment by aswegs8 5 days ago
Comment by saadn92 5 days ago
Comment by jimbokun 5 days ago
Comment by pineapple_opus 5 days ago
Comment by dwa3592 5 days ago
Comment by layer8 5 days ago
Comment by cyrc 5 days ago
labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.
free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
by forcing a human to act as validator, they are capturing high-value correction trajectories (bad output --> human fix). they are using your cognitive labour to train the judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.
human in the loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..
Comment by stego-tech 5 days ago
If AI was dangerous, if AI was going to replace jobs, and if policymakers needed to urgently pass legislation protecting the human populace from these realities, then why the actual fuck do they keep lobbying to block these very things in the first place?
Hypocrisy of the worst kind, I say. Here they are again fresh off another outage, with their IPO draft filed, at a time of increasing public opposition to AI, with costs rising, to once again ply scare tactics for money.
Disgusting.
Comment by bitwize 5 days ago
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
Comment by macwhisperer 5 days ago
it only "exists" when you talk to it.. much like your reflection in the mirror is only there when you're in view.
models can never be self-improving because it can never have "self". it can only mirror the appearance of self.
what's actually happening is "symbiotic group improvement".
our brains are resonant.. for those of us who are brilliant, getting leverage with ai just means that our innovative ideas become louder and more physically real every day.
eventually everything worth building will be built for free and made readily available.. no more "profiteering"
it's Jevons paradox "efficiency breakthrough -> effort reduces -> growth potential rises -> transformative gains happen"...
some of us are in the "transformative phase"..
others haven't seen the "breakthrough moment" yet, but they will soon.
Comment by artninja1988 5 days ago
Comment by bconsta 5 days ago
If AI was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.
Obviously readers value human-written content more, but isn't it in their interest to attempt to destigmatize llm output as much as possible?
Comment by nicogentile 4 days ago
Comment by butler14 5 days ago
Comment by stri8ted 5 days ago
Comment by rightbyte 5 days ago
Sounds iterative to me.
Comment by xg15 5 days ago
2026: Working hard to make that recursive self-improvement a reality! Any minute now...
Comment by darepublic 5 days ago
Comment by sega_sai 5 days ago
Comment by abalashov 5 days ago
Aye.
Comment by Dominic_P 5 days ago
Comment by deterministic 5 days ago
Comment by mactavish88 5 days ago
Living organisms evolve towards some notion of "better", and "better" is an incredibly multifaceted notion (many facets of which we simply cannot even capture in language).
Comment by jimbokun 5 days ago
Comment by squidsoup 5 days ago
Comment by adastra22 5 days ago
Comment by kylehotchkiss 5 days ago
Comment by krapp 5 days ago
It already has. Models being trained on AI generated data lead to degradation and model collapse. The concept of the "technological singularity" whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood is a religion-adjacent sci-fi concept but in real life TANSTAAFL.
Comment by leevilux 4 days ago
Comment by dibujaron 4 days ago
Comment by Aperocky 5 days ago
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
Comment by apsurd 5 days ago
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
Comment by Aperocky 5 days ago
For instance, if I churned out 20x more code, threw away 19x of it with rewrites and reverts and discards, and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.
Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
Comment by apsurd 5 days ago
This is contentious because I'm not exactly advocating for arbitrary gate-keepers. The nuance is that building usable stuff is hard. And not a matter of shipping more code. I take your point to mean well it depends on what that code is doing. If 20x more code is in a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well then you've got my attention there.
Comment by trefoiled 5 days ago
Comment by torben-friis 5 days ago
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get in the ship NOW.
Comment by josefritzishere 5 days ago
Comment by georgehotz 5 days ago
Comment by 4ffs 5 days ago
Comment by BatmansMom 5 days ago
Comment by EGreg 5 days ago
Comment by zkmon 5 days ago
Comment by snick3rz_ 5 days ago
Comment by ramaseshanms 5 days ago
Comment by gloosx 5 days ago
Comment by sonink 5 days ago
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI; maybe they are just being honest about what they think lies ahead.
Comment by 8note 5 days ago
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they arent buying anti-tank mines to drop on data centers says they arent in the slightest serious about it
Comment by selimthegrim 5 days ago
Comment by 4ffs 5 days ago
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus Christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
Comment by bottlepalm 5 days ago
Comment by aleqs 5 days ago
One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on opus, and aiming to transition to Chinese models like deepseek in the short term and hopefully open, self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.
Comment by aagha 5 days ago
Comment by quickthrowman 5 days ago
It’s important to keep in mind that the less money a company spends, the more profit they make when analyzing their operations.
Comment by aleqs 5 days ago
Comment by lukan 5 days ago
That is by design. It depends on how much other people are using their services right now and they do communicate it somewhere in the TOS that they do this. Otherwise they could give us a fixed amount of tokens - but they don't because it is not fixed.
Comment by fc417fc802 5 days ago
Comment by thinkingtoilet 5 days ago
Comment by contagiousflow 5 days ago
Comment by collingreen 5 days ago
Comment by selimthegrim 5 days ago
Comment by jakobnissen 5 days ago
Comment by matthewdgreen 5 days ago
Comment by llbbdd 5 days ago
Comment by NichoPaolucci 5 days ago
"Oh yeah, just go to Settings > Bugs Enabled and turn OFF text display errors"
Comment by ashdksnndck 5 days ago
This is a beta feature where Claude code draws the interface on the terminal’s alternate screen buffer like vim or htop. I believe it’s not the default because there are some potential compatibility issues depending on your terminal setup. I’ve found it to be a nice improvement. It also fixed the issue where copy-pasting selected text from the terminal creates unwanted line breaks.
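For anyone unfamiliar with the alternate screen buffer: it is toggled with the xterm private mode 1049 escape sequences, which is what full-screen programs like vim and htop use. A minimal sketch, assuming an xterm-compatible terminal; the helper name `draw_on_alt_screen` is made up for illustration and says nothing about how Claude Code itself implements the feature:

```python
import sys

ENTER_ALT_SCREEN = "\x1b[?1049h"  # switch to the alternate buffer (xterm mode 1049)
LEAVE_ALT_SCREEN = "\x1b[?1049l"  # switch back; the original screen is restored
CLEAR_AND_HOME = "\x1b[2J\x1b[H"  # clear the screen, move the cursor to row 1, col 1


def draw_on_alt_screen(body, out=sys.stdout):
    """Hypothetical helper: draw `body` on the alternate screen, then leave it."""
    out.write(ENTER_ALT_SCREEN)
    try:
        out.write(CLEAR_AND_HOME + body)
        out.flush()
    finally:
        # Always leave the alternate screen, even if drawing fails,
        # so the user's scrollback is not left hidden.
        out.write(LEAVE_ALT_SCREEN)
        out.flush()
```

Because the program draws on a separate buffer, nothing it prints lands in the normal scrollback, which is probably why copy-paste and line-wrapping behave differently in this mode.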
Comment by matthewdgreen 4 days ago
Comment by oblio 5 days ago
Comment by Melatonic 5 days ago
Sometimes they all happen to randomly take a nap at the same time - hence the outages
Comment by aleqs 5 days ago
Comment by keeda 5 days ago
While I'm very bullish on Anthropic, I'm a bit wary about their IPO because it seems to me that they're filing now while their financials look good and before other trends like the decline of tokenmaxxing and their compute bills catch up.
Comment by qwery 5 days ago
Oh, are they filing now? I think their financials look somewhere in between devastating and criminal, so I'm really looking forward to the IPO!
Comment by keeda 5 days ago
Comment by j2kun 5 days ago
Comment by bluerooibos 5 days ago
Comment by patcon 5 days ago
Comment by cookiengineer 5 days ago
Post November and post openclaw agentic environments need to be built differently, and for selfhosting models the context size problem really requires a strong harness which intelligently helps reduce context size.
Planner/orchestrator architecture, agent to agent summarizer, specification based tools (fck all this markdown memory bullshit btw), tool call shrinking, and workflow management are all really important because of the context size problem.
Nobody has enough VRAM for the large K/V caches, and nobody can afford f16/f32 caches in terms of memory, which are also necessary for longer conversations. MoE 30b models have improved so much though, qwen 3/3.6 coder is the real champion doing almost the same things with less than 1/10th the memory requirements. Just think about that in terms of engineering and what your bet is going to be. Haiku pales in comparison.
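To put rough numbers on the K/V cache point, here is a back-of-the-envelope sketch. The formula is the standard one for a dense attention cache (keys plus values, per layer, per KV head, per token). The model dimensions are hypothetical round numbers chosen for illustration, not the configuration of any specific model, and the estimate ignores quantization block overhead and allocator padding:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Size of the K/V cache for one sequence, in bytes.

    The leading 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 2**30

# Hypothetical model: 48 layers, 8 KV heads (grouped-query attention),
# head dimension 128, and a 131072-token context window.
f16_cache = kv_cache_bytes(48, 8, 128, 131072, 2)  # 2 bytes per element
f32_cache = kv_cache_bytes(48, 8, 128, 131072, 4)  # 4 bytes per element
q8_cache = kv_cache_bytes(48, 8, 128, 131072, 1)   # about 1 byte per element

print(f16_cache / GIB, f32_cache / GIB, q8_cache / GIB)
```

Under these assumptions the f16 cache alone is 24 GiB for a single full-length conversation, before counting the model weights, which is why cache precision and context trimming matter so much for self-hosting.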
Currently my focus with exocomp is trying to figure out how I can record, replay, restart, and debug workflow sessions of agents in a better manner so that I as a human can understand what's going on. Currently I think that UI will be something like a gantt chart where you have a graph with connections representing agent to agent communication. And yes, that's a lot of fiddling with SVG as it turns out, so I'm not quite there yet.
Anyways, in case you're interested. I'm manually building this env and trying to unit test the critical parts. [1]
Comment by f311a 5 days ago
Comment by airstrike 5 days ago
https://fxtwitter.com/trq212/status/2014051501786931427
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
Comment by javcasas 5 days ago
> -> layouts elements
> -> rasterizes them to a 2d screen
> -> diffs that against the previous screen
> -> finally uses the diff to generate ANSI sequences to draw
Yup. Overengineering.
Comment by AceJohnny2 5 days ago
This minimizes screen flash. You can't rely on terminals doing double-buffering.
[1] https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30... or a more user-friendly overview, Daniel Colascione's seminal "Buttery Smooth Emacs", snapshotted at e.g. https://gist.github.com/ghosty141/c93f21d6cd476417d4a9814eb7...
Comment by skydhash 5 days ago
GUIs and TUIs have different architecture models. Most GUIs have a 2D surface that is redrawn multiple times per second. Double buffering is for decoupling update and render. A TUI is a grid of characters that are updated one at a time via an active element, the cursor. Double buffering there is very wrong. Like adding airbags to a bicycle.
There’s a reason most old TUIs have an option to redraw the screen (automatically, like top, or manually), and those with a scrolling option let you scroll by page. The TTY (the underlying concept) used to be slow, and it can be slow today as well (over an ssh connection). You need to be thoughtful about whole-screen updates.
Comment by strix_varius 5 days ago
Comment by jaggederest 5 days ago
Comment by xiaoyu2006 5 days ago
Comment by Melatonic 5 days ago
Comment by stego-tech 5 days ago
An upvote well earned.
Comment by megous 5 days ago
It really bothers me that most of the TUI harnesses are using 100% CPU quite a lot just printing stuff to terminal. Seems ridiculous.
I guess it comes from syntax highlighting/formatting, which is probably not done incrementally, but over the entire so-far-displayed block of output, recomputed from the beginning for each new streamed-in character. Can't imagine anything else causing the rendering to gradually grind to a halt when, e.g., a thinking block is open in opencode and updates get palpably slow as it grows.
Terminal output itself is fast and consumes almost nothing. You can have 60fps terminal apps that update content every frame and that consume almost no CPU time.
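The guess above is easy to quantify. This is a toy sketch, not a measurement of any real harness: `highlight` is a stand-in (it just uppercases text), and the two functions are hypothetical names that only count how many characters get processed under each strategy.

```python
def highlight(text):
    """Stand-in for a real highlighter; cost is proportional to len(text)."""
    return text.upper()


def stream_full_recompute(chunks):
    """Re-highlight the whole buffer every time a new chunk arrives."""
    buffer, shown, work = "", "", 0
    for chunk in chunks:
        buffer += chunk
        shown = highlight(buffer)  # starts over from the beginning
        work += len(buffer)        # characters processed for this update
    return shown, work


def stream_incremental(chunks):
    """Highlight only the newly arrived chunk and append the result."""
    parts, work = [], 0
    for chunk in chunks:
        parts.append(highlight(chunk))
        work += len(chunk)
    return "".join(parts), work


chunks = ["a"] * 1000  # 1000 single-character updates
full_text, full_work = stream_full_recompute(chunks)
inc_text, inc_work = stream_incremental(chunks)
print(full_work, inc_work)
```

Both strategies produce the same output, but the full recompute touches n(n+1)/2 characters for n one-character updates versus n for the incremental one. A real incremental highlighter also has to carry lexer state across chunks, which this toy ignores.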
Comment by skydhash 5 days ago
The TUI mode is a client-server architecture. An analogy would be like an html page where all content is updated server side. Try to do 60 fps and you’ll have flickering as well.
Comment by megous 4 days ago
This does not explain 100% CPU load these harnesses sometimes exhibit.
Comment by Aperocky 5 days ago
It's not recognizing that they are just one building block that should do one thing well, like tmux.
You don't need a computer display on your fridge for the same reason, but Anthropic think you do. You should see virtual ice getting created and they should correspond to the actual ice behind the door - think of how amazing that is!
And it's not even a completely bad idea. Make it claude-code-react-beauty, or offer some way to turn it off, and it would be far more palatable.
Comment by mapBasketWand 5 days ago
Comment by Aperocky 5 days ago
Comment by throwway120385 5 days ago
Comment by steve_adams_86 5 days ago
Comment by asdff 5 days ago
Comment by icepush 5 days ago
Another camera inside will detect when you are done and close it.
Comment by throwway120385 1 day ago
Comment by irishcoffee 5 days ago
Comment by yuanBuilds 5 days ago
Comment by Animats 5 days ago
Comment by javcasas 5 days ago
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
It looks like video frame, full framebuffer, generated and parsed at 60fps. It surprises me they haven't introduced GPU shaders, 16x oversampling and raytracing. Maybe for next release.
Comment by layer8 5 days ago
Comment by abletonlive 5 days ago
Comment by hungryhobbit 5 days ago
Comment by godelski 4 days ago
Comment by stevenhuang 5 days ago
The Ratatui Rust TUI lib would be a good start.
Comment by mudkipdev 5 days ago
Comment by munificent 5 days ago
1. Maintains an internal representation of what the game thinks is on screen.
2. Runs the game for one frame which updates that representation.
3. Generates a diff to see how that differs from what's actually on screen.
4. Executes the minimum set of draw calls to get the screen to match the internal representation.
It's really not that hard. It's a few hundred lines of code.
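Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not anyone's actual renderer: both screens are lists of equal-length strings with the same dimensions, every character is single-width, and the function name `diff_to_ansi` is made up. The only escape sequence used is the standard cursor-position sequence `ESC [ row ; col H`, which is 1-based.

```python
def diff_to_ansi(prev, curr):
    """Return the ANSI output needed to turn screen `prev` into screen `curr`.

    Only runs of changed characters are rewritten; unchanged cells are skipped.
    """
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col, width = 0, len(new)
        while col < width:
            if old[col] == new[col]:
                col += 1
                continue
            start = col
            # Extend the run while consecutive cells differ.
            while col < width and old[col] != new[col]:
                col += 1
            # Move the cursor to the start of the run, then write it.
            out.append(f"\x1b[{row + 1};{start + 1}H{new[start:col]}")
    return "".join(out)


on_screen = ["hello", "world"]
next_frame = ["hello", "wOrLd"]
print(repr(diff_to_ansi(on_screen, next_frame)))
```

For the example frames only two cells changed, so the output is two cursor moves and two characters rather than a full redraw. Handling resizes, wide characters, and colour attributes is where the remaining few hundred lines go.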
Comment by javcasas 5 days ago
> -> rasterizes them to a 2d screen
Also you forgot "render to a framebuffer, then parse the framebuffer back to chars".
Anyway, I'm off to construct the new `ls` command. It will render the list of files to a mesh of billions of polygons in a GPU with advanced shaders, 16x oversampling, HDR and all the graphic acronyms I don't understand, then read the resulting image, find the nearest character in the ANSI charset and use that one.
It will be _glorious_ (and profoundly stupid)
Comment by ux266478 5 days ago
Comment by javcasas 5 days ago
Comment by munificent 5 days ago
Comment by fc417fc802 5 days ago
Comment by tikimcfee 5 days ago
I built a truly glyph based instanced quad system to render millions of characters in space at once.
Comment by applfanboysbgon 5 days ago
Comment by replwoacause 5 days ago
Comment by imjonse 5 days ago
On a more serious note using a react-like lib for TUI in the hope you'll share the codebase with the web version is a more likely explanation. Still not the best idea.
Comment by javcasas 5 days ago
Comment by the_gipsy 5 days ago
Comment by uxhacker 5 days ago
I also wonder about the wasted cycles and just the environmental damage caused by all this wasted CPU time. (Edited: added a comma for clarity.)
Comment by comex 5 days ago
Comment by hungryhobbit 5 days ago
Comment by refactor_master 5 days ago
Comment by grogers 5 days ago
Comment by Quekid5 5 days ago
Comment by wyre 5 days ago
Comment by shepherdjerred 5 days ago
Comment by qwery 5 days ago
curses, bud. curses.
It's genuinely difficult to tell how much of this is true. The post is obviously 100% posturing, but some of the words describe things that could be done.
Very few game engines do anything I'd describe as rasterisation. That's kind of the point of a GPU. Well, it used to be. I suppose "small game engines" might be more likely on average to include a rasteriser. The typical reason for this is because the author wanted to write it. Whereas big engine make triangle give hardware go brrr.
So I assume here 'rasterize' means 'printf'. And diffing screens means diffing 50..150 lines of text. And "generating ANSI sequences to draw" means 'printf' with some ANSI sequences interpolated in.
Then there's the frame budget. You have to understand they are operating within a strict frame budget -- they're not messing around, OK. They have a 16 ms frame budget, so they burned 11 ms and now have a (roughly) ~5 ms approx. budget for the final 'printf' in the chain???
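The arithmetic behind that paragraph, as a small sketch. The 16 ms and 5 ms figures are the ones quoted from the post; the 60 fps target is an assumption inferred from the 16 ms number.

```python
FPS = 60                      # assumed target, inferred from the ~16 ms figure
frame_ms = 1000 / FPS         # time available per frame at that rate

quoted_budget_ms = 16         # "~16ms frame budget" as quoted
final_stage_ms = 5            # "roughly ~5ms" from scene graph to ANSI

# Whatever is left must already have been spent before the final stage.
earlier_stages_ms = quoted_budget_ms - final_stage_ms
```

So the quoted numbers imply about 11 ms consumed before any output is written.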
Comment by fc417fc802 5 days ago
Comment by solid_fuel 5 days ago
High end engines such as unreal have the excuse of being tasked with rendering millions of polygons, in which case a complex approach makes sense. Claude Code is only being asked to render a few thousand UTF-8 characters.
Comment by fc417fc802 5 days ago
Comment by layer8 5 days ago
That’s rather sickening.
Comment by Fr0styMatt88 5 days ago
Seems like a cool puzzle to solve. I wonder what the engineering and organisation tradeoffs were that led to it. Does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
Comment by fluoridation 5 days ago
If they're doing anything else, the word "rasterizing" is being misused.
Comment by fc417fc802 5 days ago
Comment by skydhash 5 days ago
No one has ever done that. Even top[0], which does a full screen refresh, clears the screen (if necessary) and writes the new information (the period is in seconds, not ms). No need to diff. That would be like diffing a file just to find which bytes to update.
[0]: https://cvsweb.openbsd.org/checkout/src/usr.bin/top/display....
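A minimal sketch of the clear-and-rewrite approach described above, for comparison with diffing. This is not taken from top's source; `full_redraw` is a hypothetical helper, and it assumes a terminal that understands the standard ANSI erase-display and cursor-home sequences.

```python
CLEAR_AND_HOME = "\x1b[2J\x1b[H"  # erase the display, move cursor to top-left

def full_redraw(lines):
    """Build one string that repaints the whole screen from scratch."""
    return CLEAR_AND_HOME + "\r\n".join(lines)

# Hypothetical two-line status display.
frame = full_redraw(["load: 0.42", "tasks: 93"])
```

At a refresh period measured in seconds, writing the whole frame like this is cheap enough that no diff is needed.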
Comment by fc417fc802 5 days ago
I agree that most programs don't bother to do that, but please recall that my claim was merely that what Claude Code is claimed to be doing with regard to diffing is a well-established and long-standing optimization. The important point is that it is neither expensive, novel, nor particularly complex, and thus not an excuse for poor performance.
[0] https://news.ycombinator.com/item?id=48405259
[1] https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30...
Comment by skydhash 4 days ago
But ink, the library Claude is using, defines a tree data structure for the main concept. The diff there is about comparing the old tree and the new tree created by the update, and then updating the node that has changed. That means if a single character changes inside a big panel, the whole thing is rewritten. And if you have something that is updating a lot, that means flickering.
The diffing that ink does is just architecturally wrong. You can create a dom, but a dom is not a concept for the terminal. It’s up to you to optimize its rendering. But just diffing the dom structure like react does is not optimizing, it’s busywork.
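To make the difference concrete, here is a toy comparison of node-level and character-level rewriting. It only illustrates the claim as stated in this comment; it is not ink's actual algorithm, the function names are made up, and it assumes old and new text have the same length.

```python
def node_level_writes(old_nodes, new_nodes):
    """Characters written if every changed node is rewritten in full."""
    return sum(len(new) for old, new in zip(old_nodes, new_nodes) if old != new)

def cell_level_writes(old_nodes, new_nodes):
    """Characters written if only the differing characters are rewritten."""
    return sum(
        sum(1 for a, b in zip(old, new) if a != b)
        for old, new in zip(old_nodes, new_nodes)
    )

# Hypothetical UI: a small status line and a 2000-character panel.
old = ["status: idle", "x" * 2000]
new = ["status: idle", "x" * 1999 + "y"]

node_cost = node_level_writes(old, new)
cell_cost = cell_level_writes(old, new)
```

With one changed character in the big panel, the node-level approach rewrites all 2000 characters while the cell-level approach rewrites one.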
Comment by yrds96 5 days ago
Comment by dom96 5 days ago
What is this?
Comment by PunchyHamster 5 days ago
For a company with that much AI, you'd think that if it were actually good, doing that part in a fast and performant way would be "easy".
Comment by f311a 5 days ago
Comment by pragmatic 5 days ago
Comment by CamperBob2 5 days ago
Comment by agumonkey 5 days ago
not that it could be leaner for sure but i get the reasoning behind the tui rendering layer
Comment by airstrike 5 days ago
i'd be ashamed of publishing software with this level of polish as a solo dev, let alone as the hottest multibillion startup on the planet
Comment by agumonkey 5 days ago
By comfortable ergonomics, I meant the forgiving and asynchronous input system. You can start typing, cancel, retry with previous input, accumulate messages while the agent is active. I don't know all TUIs but this is not common IMO.
Other than that I agree with you.
Comment by skydhash 4 days ago
Literally every audio player or anything that uses threads.
Comment by agumonkey 4 days ago
Comment by orliesaurus 5 days ago
Comment by ariwilson 5 days ago
Comment by overgard 5 days ago
Comment by 0xbadcafebee 5 days ago
Comment by andai 5 days ago
Also remember when XP was super bloated cause it needed 64MB?
Comment by Erenay09 5 days ago
Comment by tjwebbnorfolk 5 days ago
Comment by solid_fuel 5 days ago
For useful things, by the computer's owner. It's not there to be used just because Anthropic can't be bothered to give a shit about the quality of their product.
Comment by redsocksfan45 5 days ago
Comment by abletonlive 5 days ago
And why are you comparing Claude Code to your editor?
> They can't even improve Claude Code
That depends on how you define "improve". They've added a ton of features to it over time. Who said minimizing RAM usage was something they are prioritizing right now?
Comment by wild_egg 5 days ago
Because the editor does more. All the compute-intensive parts of the agent are in the cloud. Zero reason for an agent harness to require anything beyond a potato to run.
Comment by javascriptfan69 5 days ago
You seem weirdly invested in defending bad decisions.
Even if you're an AI booster, shouldn't you want a better UI?
They're a multi billion dollar company. Surely they can dedicate a small amount of their resources to improving UX?
Comment by solid_fuel 5 days ago
Because Claude Code is also used to - get this - EDIT CODE. It fills the same purpose as an editor, it just has extra hooks for their agentic garbage.
Comment by 0x53 5 days ago
Comment by hombre_fatal 5 days ago
If you use AI, then AI must be expected to solve all problems, even problems that affect everyone like infra scaling.
And if perfection isn’t delivered, then of course it wasn’t: you used AI and AI sucks.
Comment by jayd16 5 days ago
Comment by AnimalMuppet 5 days ago
Comment by hombre_fatal 4 days ago
Comment by weakfish 5 days ago
Comment by rishabhaiover 5 days ago
Comment by thordenmark 5 days ago
Comment by asdfman123 5 days ago
Comment by qsort 5 days ago
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
Comment by sensanaty 5 days ago
I can set up a loop that will write a trillion lines of code automatically, but how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?
Comment by jpleyden98 5 days ago
Would you ship pointless code?
I do tend to agree though, it could be that AI solves problems with more code than a human would. What you need to measure is the value the code brings and how much of that is done by AI, hard to get an objective measure of that though.
Comment by solid_fuel 5 days ago
I wouldn't, no. I don't see evidence that the engineers at Anthropic are similarly cautious however. They describe Claude Code as "basically a game engine" when it's literally a TUI app, and it eats memory for no apparent reason. I fully believe that Anthropic would ship pointless and garbage code. Especially if it's being written by LLM.
Comment by signatoremo 5 days ago
Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements, functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.
Comment by Quekid5 5 days ago
Comment by drivebyhooting 5 days ago
Comment by emp17344 5 days ago
Comment by jimbokun 5 days ago
There have certainly never been times in the recent past when people confidently predicted that computers could never do something, only for computers to do it shortly after the prediction was made.
Comment by square_usual 5 days ago
Comment by NewsaHackO 5 days ago
Comment by krapp 5 days ago
Comment by optimalsolver 5 days ago
Comment by redsocksfan45 5 days ago
Comment by cindyllm 5 days ago
Comment by killbot5000 5 days ago
Comment by jimbokun 5 days ago
They are saying very clearly the models are not casting their own spells…yet. But looking at trends and speculating when they may start doing so.
Comment by anjel 5 days ago
Comment by jatora 5 days ago
Comment by ChadMoran 5 days ago
Comment by prng2021 5 days ago
Comment by z3c0 5 days ago
So what's the value prop?
Comment by belter 5 days ago
Comment by claudiug 5 days ago
Comment by rush86999 5 days ago
Comment by 0xbadcafebee 5 days ago
Comment by windexh8er 5 days ago
[0] https://pastebin.com/Vc5Yq9Ai
[1] https://www.anthropic.com/institute/recursive-self-improveme...
Comment by solid_fuel 5 days ago
Why don't you, windexh8er, try providing some thoughts of your own instead?
Comment by windexh8er 5 days ago
So why don't you pound sand since that clearly went straight over your head? That would be far more useful than your asinine response.
Comment by hgoel 5 days ago
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
Comment by _pdp_ 5 days ago
Comment by ReptileMan 5 days ago
Comment by qwery 5 days ago
Please, IPO now. File the paperwork.
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Do you have another example?
Engineers don't ship [period] for no reason. So, either:
- Those aren't engineers, or
- they are literally dying of shame & embarrassment right now, or
- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.
[0] as in a total lack of credibility
Comment by JohnMakin 5 days ago
Comment by qwery 5 days ago
I'm responding to the article they wrote and published.
If I worked there I would be embarrassed to have it publicised that I have been committing 8 times as much code as I used to without even attempting to justify it.
Comment by JohnMakin 5 days ago
Comment by qwery 5 days ago
It's the organisation, its culture, the greater culture surrounding it, and the marketing that I have a problem with.
> they are lying
Yes, it's incredible.
Comment by JohnMakin 5 days ago
Comment by damowangcy 5 days ago
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
Comment by baq 5 days ago
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
Comment by parpfish 5 days ago
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
Comment by overgard 5 days ago
Comment by nevertoolate 5 days ago
Comment by jfyi 5 days ago
Fwiw, I think the genie is out the bottle. We are waiting on hardware to catch up, which it will.
Comment by esafak 5 days ago
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
Comment by techblueberry 5 days ago
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate Translation seems to be “we made this shit up, but it feels right”
Comment by embedding-shape 5 days ago
Comment by HarHarVeryFunny 5 days ago
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
Comment by geodel 5 days ago
We hold these truths to be self-evident.
Comment by jazzyjackson 5 days ago
Comment by semessier 5 days ago
Comment by brazukadev 4 days ago
Comment by jasongill 5 days ago
Comment by geodel 5 days ago
Comment by eranation 5 days ago
If we ever get to a point where the centaur period is over (when human + AI is not better than just AI), then what competitive advantage can ANY human have other than
- the money they already have
- luck?
- a good idea and good taste but if we assume AI can do better than any human, that also goes out the window
So, this whole singularity goes into a place where no one is really needed; the only thing that will "save us" (other than a "The Expanse"-like world / UBI) is if there is no demand for the supply of AI work, even if it's better. (For example: there is demand for seeing Magnus Carlsen play, but no demand for the Stockfish on my phone getting into a stalemate with another Stockfish on another phone. Also, people like to watch humans compete with humans; there is no demand to see a race between Usain Bolt and a rocket.) So if people will not buy AI generated stuff (we'll get to a point where everyone will assume something is AI generated, because AI might get to a point where it is not as easy to identify. E.g. it will stop looking like slop... but I believe services that give you "human generated" 3rd party evidence can happen, again all based on supply and demand...)
So as we near singularity... All it takes is one open weights model, and one open harness that is capable of self improvement, and Anthropic's entire moat is gone. That open weight model might even be built with Claude Code + Mythos (once it's released).
But don't worry, all moats will be gone and we'll all just do yoga, read books and connect to each other because AI will produce everything for free using renewable energy, right? Or we'll all become batteries in a simulation, probably something in between.
Comment by judahmeek 5 days ago
Comment by eranation 4 days ago
Comment by taormina 5 days ago
Comment by snick3rz_ 5 days ago
Comment by swader999 5 days ago
Comment by replwoacause 5 days ago
Comment by amelius 5 days ago
Comment by HarHarVeryFunny 5 days ago
Comment by Legend2440 5 days ago
Don't ask people to explain the article to you if you're too lazy to open it yourself.
Comment by _se 5 days ago
Comment by 4ffs 5 days ago
Comment by deterministic 5 days ago
Comment by 0xbadcafebee 5 days ago
This whole set of imaginary scenarios is based on a single company writing code that isn't even that complicated and represents a single product line for a single company in a single industry. You might wanna see this replicated in at least one other scenario first before you call it on the AI gods enslaving humanity. These imaginary scenarios also depend on a logistical, financial, & geopolitical system that is unsustainable & will be curtailed in the near-future one way or another.
They keep referring to this as intelligence - it isn't. It can't actually learn. It can just code in a loop. That isn't learning. It can't do real RL with meaningful persistent semantic memory in a realistic timeframe or cost, and it can't reason accurately outside of predetermined scenarios (hell, most of the models still can't tell time). It still can't do what a 4 year old can do. So let's cool it on the dreams of benevolent god-machines or whatever.
The tech industry has been a farce for years. We sit here in this bizarre artificial echo chamber and imagine that the whole world revolves around us, when in reality the whole world is limited by us. If a recursive self-improvement loop replaces us all, it will be a boon to the world, as the world won't be limited by this industry's stupidity anymore. But considering that the world is not actually run by tech bozos, harms and uncertainties brought by AI will be pushed back on and reined in by normal people, as always happens with new technologies. An AI can't engineer its way around politics. The self-improvement loop is just as likely to be outlawed as it is to actually work outside of Anthropic's walled garden.
Comment by SimianSci 5 days ago
Shifting their focus from Training new models to instead serving inference, they would greatly reduce their spend. In fact this is something being reported on that they are already doing, which is the reason for their first ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
Comment by Theodores 5 days ago
Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.
There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.
Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?
Comment by isomorphic_duck 5 days ago
Comment by wnmurphy 5 days ago
I was dubious about SpaceX (orbital data centers need to solve for extreme radiation and error-correction during training), but then I remembered that xAI is actively working on virtualizing white collar workers ("Macrohard").
In my opinion, this is the only TAM that justifies $1T in data center investment, because the consumer market for ChatGPT-style AI is saturated. There's a lot of enterprise TAM available for AI, but I think what these companies training frontier models are really after is selling a product that allows companies to eliminate the cost of white collar salaries.
Comment by danny_codes 5 days ago
Comment by sushisource 5 days ago
Long term? Way, way less interested.
Comment by danny_codes 5 days ago
This is a very undifferentiated, swappable product. Kind of like tissue paper in that respect
Comment by malfist 5 days ago
Comment by hasteg 5 days ago
Comment by applicative 5 days ago
Comment by vblanco 5 days ago
Comment by reasonableklout 5 days ago
Comment by artninja1988 5 days ago
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
Comment by reasonableklout 5 days ago
It feels like open source can flourish while the frontier is deliberately regulated?
Comment by vblanco 5 days ago
Comment by artninja1988 5 days ago
Comment by b65e8bee43c2ed0 5 days ago
Altman, Amodei, and the rest of them are anthropomorphic grease. their personal wealth is tied to the value of their respective companies. everything they say and do is self-serving.
Comment by margorczynski 5 days ago
Comment by 4ffss 5 days ago
Comment by chilipepperhott 5 days ago
Comment by dang 5 days ago
Comment by thesmtsolver2 5 days ago
Comment by nicce 5 days ago
Comment by asdfman123 5 days ago
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.
Comment by j2kun 5 days ago
Comment by asdfman123 5 days ago
I'm using the internal Google tools and it's helping me write code much faster too, but it still takes time. I could make the CLI tool I work on faster, but no one cares except the end users, and their minor concerns have no impact on our internal politics.
At the end of the day you have to do what you're paid to do, unfortunately.
Comment by fg137 5 days ago
Comment by Garlef 5 days ago
Comment by jachee 5 days ago
Pick any two.
Comment by asdfman123 5 days ago
Comment by toephu2 5 days ago
Comment by pizlonator 5 days ago
you are confirming their point even as you contradict the specifics
Comment by z3c0 5 days ago
Comment by deathanatos 5 days ago
Comment by verdverm 5 days ago
Comment by toephu2 4 days ago
Comment by ChrisLTD 5 days ago
Comment by andriy_koval 5 days ago
Comment by flexagoon 5 days ago
Comment by overgard 5 days ago
Comment by andriy_koval 5 days ago
But I obviously don't know for sure.
Comment by javcasas 5 days ago
Comment by oersted 5 days ago
It’s possible that it doesn’t play well with JS garbage collection, since it recreates the whole UI structure for every frame (which tends not to be an issue in the languages where immediate mode is usually employed).
But yes it’s a bit more akin to game renderings than web rendering. Which can be totally fine if done well.
Comment by overgard 5 days ago
Comment by javcasas 4 days ago
In 1GB I could probably fit all the buffers to double-buffer all the TUIs in a whole country. Well, maybe not. But it's likely not that far off.
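A rough back-of-envelope version of that estimate. Every number here is an assumption chosen for illustration: a generous 200x60 terminal, 16 bytes per cell (code point plus colors and attributes), and two buffers per terminal.

```python
COLS, ROWS = 200, 60          # assumed terminal size, on the large side
BYTES_PER_CELL = 16           # assumed: code point + fg/bg color + attributes
BUFFERS = 2                   # double buffering

per_terminal = COLS * ROWS * BYTES_PER_CELL * BUFFERS
one_gib = 1024 ** 3

# How many double-buffered terminals of this size fit in 1 GiB.
terminals = one_gib // per_terminal
```

Under these assumptions one terminal needs 384,000 bytes, so 1 GiB holds a few thousand of them. Not a whole country, but far more than a single TUI should need.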
Comment by est 5 days ago
All those CPU to render this effect
Comment by javcasas 5 days ago
(to be read with the Unreal Tournament announcer voices, see https://www.youtube.com/watch?v=MwxjYFqP35A )
Comment by ux266478 5 days ago
Comment by nicce 5 days ago
At least that is what Moonlight client claims.
Comment by krapp 5 days ago
Comment by ux266478 5 days ago
Comment by davidatbu 5 days ago
If so, I think it would be in the spirit of HN to discuss the subject matter of the blogpost (increasingly autonomous coding towards the end goal of RSI) as if the blog post was indeed from OpenAI. OpenAI is, by all accounts, going through a very similar process anyways.
Comment by Jtarii 5 days ago
Comment by Lplololopo 5 days ago
They have different teams for different departments with different types of people.
So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.
This can lead to diametrically different levels of quality.
Comment by cpursley 5 days ago
Comment by ale 5 days ago
Comment by canadiantim 5 days ago
Comment by jcarver 5 days ago
Comment by rytill 5 days ago
One thing I noticed: "Your Tools: Aether agents get tools exclusively via MCP servers." "...Aether ships with 1st-party MCPs for file system operations..."
Can you share your thoughts on why you decided to use MCP as the core tool abstraction? I have heard many decry MCP as being context-wasteful. Is this not the case with your agent?
Comment by jcarver 5 days ago
The MCP protocol has gotten a bad rap for wasting context due to most MCP clients dumping tool definitions directly into context, which is wasteful.
Aether doesn’t do that. It uses an opt-in "proxy" that puts MCP tool schemas on the filesystem so the agent can browse, search and load the tool schemas it needs progressively. As for motivation, there are several advantages to taking an MCP-first approach, including:
1. It allows Aether to be a truly blank slate agent as 0 tools are hardcoded into the core runtime.
2. It allows users to extend Aether using any language they want
3. MCP gives a standard way to deal with local+remote tools, progress notifications, permission prompts (e.g. ask the user to allow/deny a tool call), OAuth flows etc.
4. There's a big ecosystem of existing MCP servers users can connect to
But that's all optional, you can just as easily give Aether a single Bash tool and only use CLIs too.
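For readers wondering what "tool schemas on the filesystem" could look like in practice, here is a minimal sketch of the general idea. It is not Aether's implementation: the function names and directory layout are hypothetical, and only the `name`, `description` and `inputSchema` fields follow the shape of an MCP tool definition.

```python
import json
import tempfile
from pathlib import Path

def publish_schemas(root, tools):
    """Write one JSON file per tool so schemas live on disk, not in context."""
    for tool in tools:
        (root / f"{tool['name']}.json").write_text(json.dumps(tool))

def list_tool_names(root):
    """Cheap listing: names only, no schema bodies are read."""
    return sorted(p.stem for p in root.glob("*.json"))

def load_schema(root, name):
    """Load a single schema on demand."""
    return json.loads((root / f"{name}.json").read_text())

# Hypothetical tools, as a server might advertise them.
tools = [
    {"name": "read_file", "description": "Read a file",
     "inputSchema": {"type": "object"}},
    {"name": "write_file", "description": "Write a file",
     "inputSchema": {"type": "object"}},
]
root = Path(tempfile.mkdtemp())
publish_schemas(root, tools)
```

The agent would first see only the output of `list_tool_names(root)` and call `load_schema` for the one tool it decides to use, so the full definitions never enter the context unless needed.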
Comment by bpodgursky 5 days ago
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
Comment by adverbly 5 days ago
Come on guys...
That is making me less impressed not more impressed!
Comment by andrewlin247 5 days ago
Comment by andrewlin247 5 days ago
Comment by newsicanuse 5 days ago
Comment by mrandish 5 days ago
To me, unattended agentic coding is not RSI, in the same way a self-reloading "Unattended 3D printer" is not at all a "3D printer that recursively prints complete 3D printers in which each generation is significantly faster and more advanced than the last." The "unattended" part is obviously necessary but hardly sufficient. The article tacitly assumes LLM progress to be something like 1: Unattended agentic coding, 2: AGI, 3: RSI. I suspect that third step should be labeled "not to scale."
I'm increasingly convinced that actual Full Foom RSI (FF-RSI) is on a radically different scale than the first two. Just leaving it unaddressed is like assuming: Step 1: Manned space station, Step 2: Manned Mars base, Step 3: Manned Alpha Centauri base, are "just logical next steps." FF-RSI requires sustaining superlinear, recursively amplifying cognitive returns along a specific directed path - and we currently have no empirical evidence that such returns can exist for artificial OR biological intelligences. Large collectives of the smartest humans alive (Bell Labs, IAS, etc) haven't just failed to get anywhere close to reliably sustaining that, we can't even reliably predict non-recursive, single occurrences or even imagine any way all 8B humans could fully mobilize to predictably achieve non-recursive, single occurrences.
The only prior we have for open‑ended intelligence improvement is biological evolution which shows extremely slow and unreliable sublinear returns at best. And even if unbounded, recursive self‑improvement is physically possible, it may be practically unachievable due to asymptotic economic, resource and other barriers in the same way approaching light speed requires exponentially more energy. I think it's plausible, and maybe probable, that AIs achieve true super-human intelligence in a decade and yet still won't achieve FF-RSI for centuries, if ever. To me, absent compelling evidence to the contrary, that's the reasonable Null Hypothesis. Even if you feel that's too pessimistic, it seems reasonable to expect any serious discussion of "Progress Toward RSI" to first discuss why it might even be plausible that 1: Miles, 2: AU (Astronomical Units), and 3: Light Years belong on the same scale, instead of just assuming it like the meme's empty "Step 3. .... " before moving on to "Step 4. Profit!" (or "IPO!" but very, very responsibly).
Comment by willXare 4 days ago
Comment by cadamsdotcom 5 days ago
Comment by Aegis_01 5 days ago
Comment by andromaton 5 days ago
Comment by kolesnikov-arch 5 days ago
Comment by overfits-ai 5 days ago
Comment by Aubergrill 5 days ago
Comment by SwtCyber 5 days ago
Comment by Rekindle8090 4 days ago
Comment by gabrieledarrigo 5 days ago
I really can't stand these guys anymore...
Comment by ath3nd 5 days ago
Comment by mugivarra69 5 days ago
Comment by simianwords 5 days ago
Comment by lstodd 5 days ago
Consequences are: financial crisis.
Comment by delichon 5 days ago
Comment by cdrnsf 5 days ago
Comment by llmslave 5 days ago
Comment by baq 5 days ago
Be careful what you wish for IOW.
Comment by llmslave 5 days ago
Comment by hvb2 5 days ago
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
Comment by baq 5 days ago
Comment by SimianSci 5 days ago
Comment by techblueberry 5 days ago
Comment by llmslave 5 days ago
Comment by wstrange 5 days ago
Without some kind of income redistribution we are sailing into dark waters.
Comment by techblueberry 5 days ago
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
Comment by mofeien 5 days ago
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
Comment by apsurd 5 days ago
Comment by honeycrispy 5 days ago
Comment by honeycrispy 5 days ago
Comment by mofeien 5 days ago
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.
Comment by ChrisLTD 5 days ago
Comment by honeycrispy 5 days ago
Comment by senderista 5 days ago
Comment by reducesuffering 5 days ago
Comment by rrr_oh_man 5 days ago