Ask HN: What was your "oh shit" moment with GenAI?

Posted by andrehacker 4 days ago

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Comments

Comment by jzemeocala 4 days ago

I bought an Alesis QS8.1 super cheap in perfect condition (was a top grade digital piano/synth in the 90s).

Then I realized that ALL of the software related to it (which I collected from defunct websites and archived on GitHub) was ancient. After a while of getting tired of using WINE every single time, I decided I wanted a cross-platform modern equivalent that did everything that several of these different programs did (plus break out some stuff that is now potentially possible with a modern computer).

I thought it would be extremely hard because the computer-to-synth communication is pretty much only via sysex commands (of which the actual wave file encoding protocol was undocumented).

Claude walked me through examining some of the original software in Ghidra, and I had a working demo that night... now I'm just playing with adding new features to it.
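For anyone unfamiliar with sysex, here is a minimal Python sketch of the general framing. It is not the QS8.1's actual protocol: the manufacturer ID `0x7D` is the one the MIDI spec reserves for non-commercial use, and `pack_8bit_to_7bit` is one generic way to squeeze 8-bit data into 7-bit-safe bytes. The real Alesis encoding may well differ.

```python
SYSEX_START = 0xF0
SYSEX_END = 0xF7
# 0x7D is reserved by the MIDI spec for non-commercial use; a real device
# would use its manufacturer's assigned ID instead (assumption for the demo).
EXAMPLE_MANUFACTURER_ID = (0x7D,)


def pack_8bit_to_7bit(data: bytes) -> bytes:
    """Pack 8-bit bytes into 7-bit-safe bytes (generic scheme, illustrative).

    Each group of up to 7 input bytes becomes one header byte holding the
    stripped high bits, followed by the low 7 bits of each input byte.
    """
    out = bytearray()
    for i in range(0, len(data), 7):
        group = data[i:i + 7]
        high_bits = 0
        for j, b in enumerate(group):
            high_bits |= (b >> 7) << j
        out.append(high_bits)
        out.extend(b & 0x7F for b in group)
    return bytes(out)


def build_sysex(manufacturer_id, payload: bytes) -> bytes:
    """Wrap a 7-bit-clean payload in SysEx start/end framing."""
    body = bytes(manufacturer_id) + payload
    if any(b > 0x7F for b in body):
        raise ValueError("SysEx data bytes must have the high bit clear")
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])


message = build_sysex(EXAMPLE_MANUFACTURER_ID, pack_8bit_to_7bit(b"\x80\x01\xff"))
print(message.hex(" "))
```

The point is that every byte between `F0` and `F7` must have its high bit clear, which is why sample data needs some packing scheme at all, and why an undocumented one is painful to reverse.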

Comment by jsharf 3 days ago

Related story: while applying a firmware update to my Kawai CA49 piano, I bricked it by flashing the wrong file (the process was broken, and I got desperate and tried something stupid, which bricked the piano). Claude walked me through looking for signs of life, and since OTA from the phone app wasn't working for me, it downloaded the Kawai Android APK, decompiled the Java, and figured out the hardcoded key used for encrypting the firmware update. It extracted the piano firmware update, decrypted it, and then wrote a flashing script to program the piano from my laptop via Bluetooth. My piano was back to working within an hour.

Comment by idiotsecant 3 days ago

I can't imagine where we are headed. You understand every step of what it did and can appreciate the complexity but it'll only take a few generations for this to become something like magic to the tech priests beseeching the machine spirits for blessings

Comment by fasterik 3 days ago

I think you're overestimating how much the average person knows about how technology operates today, or 30 years ago, or 1000. In some sense, we have been living with magic and tech priests since the Romans built the aqueducts. I wouldn't be surprised if widespread, cheap AI makes it easier for the average person to learn how things around them work, if they are so inclined.

Comment by lazyasciiart 3 days ago

I meet kids today who haven’t heard of Microsoft, who regularly play GTA and hand in assignments made in Powerpoint. 20 years ago I discovered that a friend didn’t know Xbox and Word were both from Microsoft. It’s really hard to understand what is common knowledge in different parts of society.

Comment by ForHackernews 3 days ago

Indeed. You'd be shocked how few people on Hacker News even know the difference between cross stitch and blackwork.

Comment by oblio 3 days ago

https://xkcd.com/2501/

Comment by 2muchcoffeeman 2 days ago

Those are just products, who cares?

I think the GP is alluding to understanding the fundamental way a thing works.

Comment by BobbyTables2 2 days ago

Kids today don’t even know where the files are stored or anything about partitions, drives, directory structure or even how much disk space is available.

They have some files, synced to OneDrive and do everything else fully online (Canva, etc.)

Most of them have never seen a computer with a drive other than C:

Comment by idiotsecant 2 days ago

Kids today don't even know the most basic x86 assembly instructions! A whole class of third graders, and not one of them could tell me the difference between MOV and LEA!!!! Can you believe it?!?!

Comment by dotancohen 2 days ago

Apt username.

Comment by gschizas 2 days ago

[dead]

Comment by seanmcdirmid 2 days ago

My son has never seen a C: drive before. Heck, we got him a Macbook Neo a few weeks ago and I don’t think he has left more than a few coding apps since then. Thankfully he isn’t using AI yet.

Comment by lazyasciiart 18 hours ago

Kids in college 20 years ago didn’t know that either. Some of them didn’t understand that they had a school email address.

Comment by throwawaytea 2 days ago

I'm always shocked by the number of people who have been looking at and using refrigerators their whole lives... and have zero idea how they work.

Comment by dotancohen 2 days ago

  > widespread, cheap AI makes it easier for the average person to learn how things around them work, if they are so inclined.

It looks to me that the far more common use case will be to manipulate technology rather than understand it.

The example with the synth is excellent. Today that kind of work demands somebody knowledgeable operate the AI harness. In short order, the AI may very well come up with the solution of looking online for example programs to decompile without the user even understanding what that means.

Comment by subarctic 3 days ago

The point is, eventually not even experts will understand what it's doing

Comment by throwaway173738 3 days ago

If religion and human technology are any guide, there will be a lot of this but it will never be the entire sum of human activity. Some of us are just too damn curious. We go straight for the curtain. I refuse to believe that very human pattern won’t continue.

Comment by mrighele 3 days ago

"In the distant future, humans live in a computer-aided society and have forgotten the fundamentals of mathematics, including even the rudimentary skill of counting.

The Terrestrial Federation is at war with Deneb, and the war is conducted by long-range weapons controlled by computers which are expensive and hard to replace. Myron Aub, a low grade Technician, discovers how to reverse-engineer the principles of pencil-and-paper arithmetic by studying the workings of ancient computers which were programmed by human beings, before bootstrapping became the norm—a development which is later dubbed "Graphitics"." [1]

[1] https://en.wikipedia.org/wiki/The_Feeling_of_Power

Comment by aeon_ai 2 days ago

I’m all for the sci-fi extremes that we might lose valuable skills to cognitive delegation, but the idea that we as a society will forget how to count is… extremely stupid.

Comment by rnentjes 2 days ago

To be fair, the average person already doesn't know how to do simple arithmetic.

Comment by Flere-Imsaho 3 days ago

I'm not convinced that's where we are heading. LLMs are really good at explaining things ("explain to me like I'm a 5 year old").

Comment by yen223 3 days ago

A post that lives rent-free in my head points out that a kid who is addicted to chatgpt is going to be more literate - and therefore likely better educated - than a kid who is addicted to tiktok

Comment by sitzkrieg 3 days ago

and both saw the world through an inherited training/feed bias and censorship, hurray!

Comment by lazyasciiart 3 days ago

Just like they always have. There’s a reason religion is mostly inherited.

Comment by claysmithr 2 days ago

Everything a human knows has to be learned

Comment by ssl-3 3 days ago

Has there ever been a modern time when this wasn't the case?

I mean: I can only go back so far, but I remember the 1980s well-enough. At that time, most of the new information that came into my brain from outside was sourced from public schools, newspapers, and the evening news on TV.

None of these sources were particularly unfiltered, uncensored, or unbiased. It was always an abbreviated approximation of someone else's idea of the truth.

Comment by seanmcdirmid 2 days ago

Even in pre-modern times censorship was the norm. Heck, it wasn’t until the printing press was invented that the powers that be had to start doing it explicitly.

Comment by zx8080 3 days ago

It's enough to make "explanation" a separate "educational" license to make it less broadly used. Or disable it in some countries (this is happening already).

Comment by Flere-Imsaho 2 days ago

This is why locally running LLMs must be the future. We don't all need PhD-level AIs to answer 99% of our queries, or to teach us a new thing. I'd encourage everyone to learn how to run and deploy local LLMs, even if they are not quite there yet in terms of performance.

Comment by fivestones 2 days ago

Wait, really? Can you give specifics?

Comment by zx8080 1 day ago

Use VPN and try switching countries for yourself. Start from non-EU ones. You'll see.

Comment by ethbr1 3 days ago

There's a big difference between having something explained to you and developing expertise in it.

I don't see an AI-as-explainer future where expertise isn't sacrificed en masse.

Capitalism rarely supports a currently economically unproductive alternative for future good reasons.

The recent AI tech layoffs are a warning sign that corporate leaders will happily shoot their company's (and the future's) expertise to pad next quarter's financials and trust in 90% correct, but much cheaper, AI.

Comment by Beijinger 2 days ago

"There's a big difference between having something explained to you and developing expertise in it."

No, there isn't. You get things explained in University. Then you build on top of this knowledge.

Comment by ethbr1 10 hours ago

That's not at all how university works.

You are explained things (least important part) and then you invest substantial amounts of time in practicing and exercising those new skills.

Then, in your junior level jobs, the same cycle repeats.

That exercise component isn't going to happen in university with AI in the loop, because AI will be able to shortcut basic practice.

And it isn't going to happen in junior level work, because AI will be able to do those jobs more economically efficiently.

See previous from HN fp for a more eloquent explanation: https://ergosphere.blog/posts/the-machines-are-fine/

Comment by baq 3 days ago

Imagine someone in a position of power mandating that LLMs should not be good teachers.

Comment by zx8080 3 days ago

Some manager at LLM provider: "hey, we can sell 'education' ability as a separate product!".

Comment by baq 3 days ago

You jest, but I’m actually convinced education-tuned LLMs are (today) the only way education outcomes can actually improve in the AI era. As is, students are leveraging them for doing homework which makes homework useless, you want and economically need a model which can work as a 1:1 tutor with minimal supervision (and some hardware so lessons aren’t keyboard-driven).

Comment by zx8080 3 days ago

> and some hardware so lessons aren’t keyboard-driven).

What's wrong with (screen-, probably) keyboard?

Comment by baq 3 days ago

Writing with a pen or pencil has better learning outcomes than with a keyboard for neurological reasons.

https://pmc.ncbi.nlm.nih.gov/articles/PMC11943480/

Comment by NonHyloMorph 2 days ago

Most kids can't use a keyboard and never will. Their Apple Pencil scribbles don't seem to make them particularly smarter.

Pen&pencil-> create something from (almost) nothing. Stylus input-> subpar slow interface for computation.

iPad data storage: above-par organisational help (no losing lousy stuffed-in-bag paper).

I kinda liked the AI that transposes handwritten/drawn notes into digitally orderable artifacts. Seen a couple of Show HNs. Are there any advances in the field (preferably OSS, or alternatively a one-time purchase)?

(To add on to this: the utter physical imprecision of stylus pens is annoying. I can FEEL where the sharp tip of a tool that is elongating my hand touches a surface and how it moves on a very fine scale/resolution. Probably not a problem for people who have not developed highly sensitive sensorimotor perception because they grew up with a lot of flatness in their surroundings and not much plasticity, but: my god are these things clumsy. I always want to reach for a sharpener when I use an Apple Pencil lol.)

Comment by alchemism 1 day ago

You can buy finer-tipped replacement caps for the mechanical pencil effect. Then there is the second problem: the texture of glass instead of paper.

Comment by wiether 3 days ago

I've been writing code since my teens, I've studied assembly... yet the fact that _things_ start happening when I press the power button on my computer is pure magic to me and I like it this way.

I started digging a few times, but, I prefer the "magic".

Comment by WillAdams 3 days ago

I prefer at least a superficial understanding.

Hopefully, there will never be a time when at least some folks are not reading books such as:

https://www.goodreads.com/book/show/44882.Code

Comment by rnentjes 2 days ago

Keats blamed Newton for taking the magic out of the rainbow with a prism. Personally I think the magic only got greater.

Comment by WJW 3 days ago

Eh. The only real things you need are:

- On startup processing begins at a known address, and you put the bootloader code over there. Hardware engineers can guarantee this for you.

- Every time you execute an assembly instruction, the program counter either explicitly jumps to a new location or else it just increments by 1. Hardware people can also make this happen as easily as implementing an adder.

Don't get me wrong, there are LOTS of layers between the hardware and most "useful" programs any of us will ever write. But all of them are pretty understandable. They're often not very complicated, just tedious.
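As a rough illustration of those two rules, here is a toy fetch-execute loop in Python. The opcodes (`ADD`, `JMP`, `JNZ`, `HALT`), the reset address and the one-slot-per-instruction layout are all made up for the sketch and do not correspond to any real instruction set.

```python
RESET_VECTOR = 0  # hypothetical: address where execution starts at power-on


def run(program, max_steps=100):
    """Execute a toy program; return the accumulator when HALT is reached."""
    pc = RESET_VECTOR
    acc = 0
    for _ in range(max_steps):
        op, arg = program[pc]  # fetch
        if op == "HALT":
            return acc
        if op == "ADD":
            acc += arg
            pc += 1            # default: move to the next instruction
        elif op == "JMP":
            pc = arg           # explicit jump
        elif op == "JNZ":      # jump only if the accumulator is non-zero
            pc = arg if acc != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached without HALT")


# The "bootloader" at the reset vector just jumps to the main program.
program = [
    ("JMP", 3),   # 0: reset vector
    ("HALT", 0),  # 1: never reached
    ("HALT", 0),  # 2: never reached
    ("ADD", 5),   # 3: main program starts here
    ("ADD", -1),  # 4
    ("JNZ", 4),   # 5: loop until the accumulator hits zero
    ("HALT", 0),  # 6
]
print(run(program))
```

Everything above this in a real system (bootloader, kernel, userland) is more code reached by the same two moves: start at a known address, then either step forward or jump.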

Comment by tonyarkles 2 days ago

Similarly for making a basic CPU that implements the logic you’re describing. In 2006 or so I made a super simple microcontroller on an FPGA for a course project. It had a whopping 256 bytes of RAM, 1kB of ROM, and I think four 8-bit registers plus a 16-bit program counter. You could only jump +/- 256 bytes. It was largely useless but also incredibly satisfying.

Comment by spacedcowboy 2 days ago

I'm genuinely puzzled by how you know enough about a system to even understand there is a basic assembly language, but still consider how "switching on" is 'pure magic'.

Doesn't the one explain the other ? It may be turtles all the way down, but at some point there's a fundamental turtle - be it LEA or CMP ?

Comment by idiotsecant 9 hours ago

There is an absolute gulf between knowing what assembly is and a functional computer

Comment by 3dsnano 3 days ago

turtles all the way down

Comment by pmcarlton 3 days ago

I think it will be just like Dr. Know in Spielberg's "AI" movie from 2001. I found it amazing how the oracle, though giving mystic-sounding obfuscated answers, was actually intelligent enough to (a) figure out what the kid was asking for and (b) give the correct answer.

Comment by jlaternman 2 days ago

It is amazing how Dr. Know projects where AI is likely to go. And a Kubrick script, no less. Even the commercial overlap, where you pump in coins as the only way to get answers. Did it not also have ads? Truly prescient.

Comment by LogicFailsMe 3 days ago

Honestly, don't think so. That's certainly the path one might extrapolate if the next generation grows up exactly the same way as the current generation, but that's not how it works.

They will be exposed to this technology throughout childhood as their brains develop and they will develop unique ways to work with it we don't entirely understand just like GenY with cell phones and GenX with home computers. I think you deeply underestimate how adaptable we are as a species, but if you consider that we've been running the same OS and Bios as a species for the past ~40K years, perhaps you might be more optimistic?

Comment by Procrastes 3 days ago

As I recall, the Dr. Knows were programmed to feed that information to runaway mechas who were, in turn, programmed to seek out the blue fairy.

Probably a lesson in there somewhere.

Comment by reactordev 2 days ago

Kids grew up on this man, they are master prompters. You’ll be asking them to fix your holoTV and your crypto phone when you’re too old to read the brainfuck.

Comment by otabdeveloper4 3 days ago

Give it six more months and you'll have a second "oh shit" moment when you peek behind the curtain of LLMs shitting the bed.

I guess tech unsavvy people who are easily amused by LLM tricks will always exist, but they'll be an increasingly smaller minority as time goes on.

Comment by ewalk153 3 days ago

This is truly remarkable. Congratulations!

Comment by Uptrenda 3 days ago

yeah thats mind blowing, ngl

Comment by jazzyjackson 3 days ago

Baller

Comment by Galus 3 days ago

That's sick.

Comment by gyomu 3 days ago

Yes, those tools are extremely good at reverse engineering. With a bit of know-how, it is now trivial to reverse engineer any protocol or crack any software, often in a matter of hours or less.

A lot of people in the industry have vested interests in this not being discussed openly so you don't hear too much about it, but the implications are huge.

Comment by j-conn 3 days ago

What are some of the implications? Where does widely available mythos-level hacking lead? By people with a vested interest, do you mean non-cloud software vendors?

Comment by aero142 3 days ago

Software that had a data moat because it was hard to integrate with or migrate off of will have that moat disappear. A web site is a client now. Building data migration tools for all of your competitors is easier now.

Comment by SyneRyder 3 days ago

I've just had a SaaS that I use decide to implement a 2.4x price increase. I reacted instead by taking screenshots of every page of the SaaS, downloading their API docs, exporting what data I could, and asking Claude to build a self-hosted clone based just on those files. I had a read-only version of my entire data history completed in a single evening. Even at Opus API rates, it cost me less than half the price of a single annual seat.

Comment by grepfru_it 3 days ago

Heh and without api docs, just copy and paste the urls from network traffic and Claude will write a library for you.

Comment by StanAngeloff 3 days ago

One of the many SaaS products we use at Day Job chose to gatekeep its MCP behind an enterprise plan. A brief Claude Code session later and a better, more feature-full MCP than the official was reverse-engineered from internal APIs by Opus.

Comment by hyperman1 3 days ago

Right now, software is protected by the attacker not having enough competence. If that's over, the logical next step is using real encryption.

E.g. a synth has a public key embedded. To change settings, you upload them to the vendor, who blesses them with their private key.

Hacking such a synth requires either jailbreaking the synth or the vendor losing their key. Both can be mitigated with tamper-resistant hardware.

We're well ahead on this path already, and I assume AI will accelerate it. This is very bad news for the right to repair.
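To make the "vendor blesses the settings" idea concrete, here is a deliberately toy sketch in Python using textbook RSA with tiny primes. Everything in it is an assumption for illustration: the key size is laughably small, there is no padding, and the function names are made up. A real device would use a vetted signature scheme and library.

```python
import hashlib

# Toy key pair from tiny primes (p=61, q=53). Illustration only: real keys
# are thousands of bits long and real signatures use padding schemes.
N = 61 * 53              # public modulus
E = 17                   # public exponent (embedded in the device)
D = pow(E, -1, 60 * 52)  # private exponent (kept by the vendor)


def digest(settings: bytes) -> int:
    """Reduce a SHA-256 hash of the settings into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(settings).digest(), "big") % N


def vendor_sign(settings: bytes) -> int:
    """Vendor side: sign the settings digest with the private exponent."""
    return pow(digest(settings), D, N)


def device_accepts(settings: bytes, signature: int) -> bool:
    """Device side: verify using only the public values N and E."""
    return pow(signature, E, N) == digest(settings)


settings = b"reverb=hall;volume=11"
signature = vendor_sign(settings)
print(device_accepts(settings, signature))            # vendor-blessed settings
print(device_accepts(settings, (signature + 1) % N))  # forged signature
```

The device never holds `D`, so reading out its firmware gives an attacker nothing to sign with. That is the property that makes this so much harder to get around than a hardcoded symmetric key.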

Comment by darkwater 3 days ago

But everything you described was basically a byproduct of incompetence somehow, no? On both sides. That's why the right to repair, and how local HW should be treated when the online counterpart is EOLed by the manufacturer, should be mandated by law. A law that stands on the side of the citizen, the end-user, obviously.

Comment by hyperman1 3 days ago

I would not describe it as incompetence, more as

1) Current encryption was not available in the 1990s. That was the age of DES and weapons-grade vs commercial encryption. There was a legal cost blocking strong encryption.

2) Manufacturers were not as strongly opposed to people touching the internals. After WW2, most people could fix anything, because survival depended on it. Even in the '60s, radios etc. came with schematics, and building your own was normal and cost-effective. The shift happened in the '90s, with governments requiring licensing for everything, and mass manufacturing making repair less cost-effective than buying a new one.

Our current culture where only people blessed by the manufacturer are allowed to do anything is very recent.

Comment by justafewwords 3 days ago

(Reads:) "But, but...but... but everything... you described ...basically seem to be somehow a byproduct of incompetence...no"

[trying-to-generate-random-making-sense-content]

Let me gasps ask: The older six-fingers-"AI"-characters had learned an music-instrument by now, ander are much more capable of playing music you otherwise haddn't known or thought about..."?

um What about those early shadowy boygroup, whom seem asian, no ? (-;

[after-losing-entry-address-of-topic-question]

But back to your trustworth-written text, Yes!

regards,

Comment by yonran 1 day ago

I think companies with valuable data to scrape (e.g. media companies) will eventually lock it behind APIs that verify Apple App Attest or Google Play Integrity. And deprecate websites which are easily scraped too. Then it will be useless to reverse engineer APIs used by apps and we will have to run the unmodified client on an unmodified OS.

Comment by ls612 3 days ago

It wouldn't surprise me if reverse engineering is put on the "highly unsafe" list in the near future in the same category as bio because of these interests. Can't have the cattle classes be able to control their own property now can we?

Comment by pixl97 3 days ago

This is pretty much a given anyway. Making reverse engineering tools is already likely to get you sued by someone so model makers are apt to slow down the ability of their tools to reverse engineer to avoid the lawsuits themselves.

Comment by ls612 3 days ago

Heh finally the impunity of the NSA is good for once. Good luck suing them over Ghidra

Comment by ElFitz 3 days ago

Some people even had some fun de-minifying JS and disassembling binaries. Successfully.

Comment by aizk 3 days ago

What do you mean? Everyone is talking about Mythos.

Comment by fn-mote 3 days ago

I think GP is talking about cracking, not pen testing.

Comment by trumpdong 3 days ago

Those are the same thing. They're talking about decompilation and protocol analysis.

Comment by bredren 2 days ago

They're talking about patching Claude Code.

Comment by NoMoreNicksLeft 3 days ago

>Claude walked me through examining the some of the original software in GHIDRA,

I wanted to be able to decrypt the files on The Complete New Yorker magazine DVDs. The old software was WinXP only, and crashed by the time you turned to page 3 or 4. It walked me through using Ghidra on the relevant dll, mapped out how it was using Blowfish, what the credentials were that it was passing, and re-implemented all of that in a python script.

Now all the files are in plain pdf.

Right now, it's helping me write an extension to the mkv specification for embedded scripts and modify VLC to be conformant, so I can watch Black Mirror Bandersnatch. Already have a buggy implementation, about 3 days in.

I've also had it add BEP 46 mutable torrent functionality to Transmission (and to some extent, to the WebTorrent library).

These are all well beyond my abilities to do casually, and probably beyond my ability to do even if I spent the next 18 months doing nothing but grinding away at it.

I only replied because I thought it curious that Claude apparently favors Ghidra.

Comment by mekael 3 days ago

Interestingly enough, i’ve been sitting on a project for the last 12ish years where i just took the FMloader lib and used that from C# to turn the djvu files into pdfs. All that was needed was a decompiler and an hour of banging my head on it. I published some of the results a few years ago but need to go back and actually build out a full app.

Comment by NoMoreNicksLeft 3 days ago

I'm trying to not do the naive pdf creation, where each page is just the raster. Trying to keep the JBIG2 bilevel, as I get better quality at lower file size. Using jpeg2000 too, where appropriate, but the pdfs are still x2.5 the size of the original. Though, I can have it spit out decrypted djvu files that are exactly the same filesize... I just don't like that format for archival.

If you want the Rolling Stone or Playboy archives decrypted, ReconSuave on github has tools to do those. I got tired of waiting for him to do The New Yorker though.

Comment by mekael 3 days ago

I've mainly been outputting them to high-fidelity JPEGs and then stuffing them into a cbz for portability. Works well when I'm reading on my iPad. As for the others, I had them sorted out about a week or two after I decompiled the original binaries.
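For anyone who hasn't made one: a cbz is just a zip archive of page images, read in filename order. A minimal Python sketch, where `build_cbz` and the extension list are my own choices rather than anything from the tools mentioned above:

```python
import zipfile
from pathlib import Path


def build_cbz(image_dir, cbz_path):
    """Zip the page images in image_dir into a .cbz, in filename order."""
    pages = sorted(
        p for p in Path(image_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as cbz:
        for page in pages:
            cbz.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Zero-padding the page filenames (`page_001.jpg`, `page_002.jpg`, ...) keeps the sort order matching the reading order.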

I’ve definitely kicked myself a few times for not posting about them sooner, but the fear of pissing off CondeNast tempered my willingness to show off

Comment by NoMoreNicksLeft 3 days ago

I don't think CondeNast cares.

Do any of the cbz readers handle jpeg2000? It makes a big difference in filesize without any quality degradation. Like 40% smaller, maybe more in some cases. You should tinker with that if you have the time.

Comment by fc417fc802 3 days ago

Okular handles cbz that contain jxl with no issue. (IIUC both archive format and image format support is provided via a pluggable extension system but I don't recall the details because my setup has "just worked" for a very long time now.)

Also FYI you can use mupdf to read cbz archives although I don't personally recommend it for that usecase.

Comment by peterstjohn 3 days ago

Ooooh, you don't happen to have the code for the New Yorker decryption in a form you could send, do you? Or put up on github or even just give me the starting prompt…

Comment by peterstjohn 3 days ago

Okay, a couple of hours later…thanks for the hint as that's fucking dark magic ;) and I now have access to the entire New Yorker again after around 15 years :)

Comment by NoMoreNicksLeft 2 days ago

Since I think you'll find the one for Rolling Stone and Playboy, but not The New Yorker (I might be one of the few that has this or something like it)...

https://gist.github.com/NoMoreNicksLeft/d3a4a59c7b4de0cdef1e...

I'll have a more proper github repo at some point, but there are bugs I was working through. Some issues are bloated up ridiculously... a 9mb djvu file shouldn't become a 110mb pdf. Most issues will work well though. Hope it helps.

Comment by darksim905 3 days ago

What was your setup for this and did you have any preferences set in Claude to get started with something like this?

Comment by NoMoreNicksLeft 2 days ago

I use Claude on the desktop, and only occasionally Claude Code. It's the one that recommended Ghidra. Walked me through the install. Taught me the basics (G to go to an address, etc). Would tell me where to go, and what to paste back to it. It eventually converged on where to find the iv and credentials and so forth (after acting confused for awhile), and then wrote the python script for me that decrypts. I'd like to think my questions (and challenges to its assertions) were intelligent enough to spur it towards the solution, but self-flattery is all that is.

The dll in question was pretty obvious just from the filename alone that it was where the magic happened.

If you want something similar, you might just start by asking it if it would be feasible to decompile the software in question to reverse engineer the decryption, that you'd heard Ghidra was a big deal. Keep nudging it to guide you along that sort of path.

Comment by notagoodidea 4 days ago

I would be interested to learn a bit more about the how, after also reading [0] and the work done on patching the Ableton Move firmware with Schwung [1]. Slightly different, but there is an increasing amount of work being done on both old and new hardware, exploring patching, swapping or developing new firmware from scratch, thanks to LLMs/GenAI.

[0] https://mforney.org/blog/2026-05-28-patching-my-guitar-amps-... [1] https://schwung.dev

Comment by tomduncalf 3 days ago

Schwung is great. See also the recent new firmware for the Elektron Monomachine (old unsupported hardware) created using LLMs

Comment by mattmanser 3 days ago

With stuff like this, do you honestly not feel that you've probably been tricked and that someone else actually did this?

Don't get me wrong, I think AI can do some surprising things, but with stuff like this, often it just stole the code and the steps without attribution, it didn't figure it out.

There'll probably be a blog post detailing exactly how to do this somewhere and Claude just copied the steps and code.

And worse, Google search would have found it 10 years ago, but Google search today would claim there are no results?

I think incredibly specific stuff like this often won't pass the 'did Claude just steal this?' test when you dig into it.

Comment by tonyarkles 2 days ago

I appreciate where you’re coming from but no, I don’t believe so. I have had Claude do some incredible reverse engineering on very proprietary niche firmware blobs that aren’t generally available to the public. One of the really interesting reasons why I don’t believe that it’s simply regurgitation but rather iterative novel synthesis is because of the dead ends and blind alleys that led to success. It feels a lot more like “Claude has read every tutorial on Ghidra and Radare2, and has memorized the ARM architecture and datasheets for all of these microcontrollers”. Misidentifying, say, which subfamily of processors it is based on the IVT, only to course correct when I give it the VID/PID of the device booted into DFU mode.

One piece of gear, Claude found a hidden and highly useful diagnostic screen. This took a few iterations too. It found the existence of it based on just running “strings” against the firmware image but needed a few rounds of me going “I tried what you suggested but this is what happened instead”. Searching Google, DuckDuckGo, and GitHub for any of the strings that were on that screen or any of the named constants associated with that screen in reverse engineered source led to exactly zero hits.
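If you haven't used it, “strings” just pulls runs of printable characters out of a binary. A rough Python equivalent, assuming plain ASCII and a minimum run length of 4 (the sample blob is invented, not from any real firmware):

```python
import re


def extract_strings(blob: bytes, min_len: int = 4):
    """Return printable-ASCII runs of at least min_len bytes from blob."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Made-up firmware fragment: text embedded between non-printable bytes.
sample = b"\x00\x01DIAG MENU\x00ab\x00v1.2.3\xff"
print(extract_strings(sample))
```

Menu labels and format strings that survive in an image like this are often the first clue that an undocumented screen or mode exists.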

More entertaining, Claude and I together also nailed down the source of a PTP synchronization bug in a piece of equipment a few months ago using the main UI .exe (written in pascal, of course), an ARM Linux image from the real-time controller in the box, and some pcaps from it interacting with other devices. The vendor released a patch a few days ago, without me having reported the bug.

Comment by ezconnect 3 days ago

It was probably done in a foreign language on an archived forum. Claude is the improvement of the internet search box.

Comment by ozim 3 days ago

I take it….

Improvement over all the assholes that tell you „just google it” after you spent two days hitting the wall.

Comment by zellyn 3 days ago

I had that keyboard! I actually really like the piano-ish touch. I remember being sad though, when I realized they’d crammed all the sounds into I think 16MB (or was it 8?) and realizing how bad that was even by the late 90s! I think I still have mine in the garage somewhere… good times!

Comment by claytongulick 3 days ago

I loved mine. Had it since the 90s, working perfectly.

One day a few years ago my dad came by and was admiring it (it was a QS8) and asked to borrow it so he could play piano again.

I, of course, said sure, but was feeling a little salty about it inside, because I wanted it to play, that's why I had it all set up.

Anyway, about a year went by and I asked him about it to see if he was done with it.

He said "oh that thing? I gave that away, was just taking up space"

-.-

Comment by richardfey 3 days ago

You mean bad because they could have used a larger memory module and thus higher resolution sound samples?

Comment by skhameneh 3 days ago

Hey so... mind sharing findings? I have a QS8 :)

Comment by itomato 3 days ago

While not the "oh shit" moment, the wave has the same shape.

I have a DigiTech GNX3000 effects pedal board - a digital modeling "workstation" that needs the aged Windows native software or gdigi to make the most of.

At best, the experience with gdigi was passable; raw access to the patches and controls, the ability to control it from the laptop, etc.

In an hour or so, I had a functionally superior webmidi version up and running in Vercel using their v0 code. It kicked off a wave of subscriptions and referral chasing.

I made it a template - because there are so many gnx3k users out there: https://v0.app/templates/digitech-gnx3000-sysex-tool-GC5LzXA...

Comment by djmips 4 days ago

That's fantastic. Did you use a Ghidra MCP server? It's kind of magical huh?

Comment by alright2565 4 days ago

I've done a similar sort of thing with my camera lens' firmware updater just out of curiosity, and I didn't use any kind of MCP. It's able to write an automated script using the Ghidra API to decompile the program just fine, and then code exploration can be done by reading the code.

Claude needs good variable names a lot less than humans do, so renaming/typedefing doesn't seem to be as necessary.

Comment by shreddude 4 days ago

I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.

I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.

Comment by williamdclt 3 days ago

> projects I simply could not have ever approached alone.

I think that's part of the divide between enthusiasts and naysayers. If you use GenAI on things that you couldn't approach alone, it's an incredible tool. If you use it on stuff that you're pretty good at, it's not a gamechanger (and if you're an expert, it's a minor boost at best). Many people's jobs are about doing what they're an expert at.

Comment by pmontra 3 days ago

I'm about to complete a new non-trivial piece of functionality in a project for a customer of mine. I spent an hour writing the spec. Then I asked Claude (Sonnet 4.6) to check if I missed something. I did: the sort of minor issues one notices after starting to write code, edge cases, etc. That made me think about more issues, and after a few iterations we settled on a spec. I asked Claude to make an implementation plan and we ended up with 9 steps. It wrote the code for a step with new automated tests and I performed some manual QA, which found further issues we hadn't thought about. We are at step 8 of 9 in about 12 hours of work. I would have needed a week to get there alone, with time spent researching and fixing bugs I created along the way, an inevitable part of our job but not exactly the most pleasant one.

This speedup is great. It improves the overall quality of the product (as perceived by the users) because I can ask Claude to add features that my customers and I would have dismissed because they take too long to implement. We would have settled for a more basic UX.

So is it a game changer? It is in the same way HTML/CSS frameworks like Bootstrap were game changers: suddenly every developer could create a decent and consistent UI in a fraction of the time, with a few bells and whistles that we wouldn't have bothered coding. As a side effect a lot of web apps felt like look-alike mass products and web designers had to reinvent themselves, but the economics led inevitably in that direction. Would I again spend one or two weeks doing alone what I could write in a day or two with an LLM? Not anymore, not at this cost ($20 per month).

Comment by jowsie 3 days ago

I'd love to read a full transcript of someone going through this kind of collaborative programming. I see this kind of process mentioned a lot but can't quite figure out the details in my head. If anyone has a link to a blog post or similar showing this process in depth, I'd love to give it a read :)

Comment by tkocmathla 2 days ago

I've been using superpowers [1] for this purpose, and have really appreciated how it guides the model to use careful, methodical approaches to answering my prompts. It's great for multi-step planning, design, and implementation, but also has guidance for debugging, accepting a code review, etc.

[1] https://github.com/obra/superpowers

Comment by burnto 2 days ago

Yeah I feel like I’ve learned a lot from superpowers. It’s such a thoughtfully developed skillset.

Comment by sntran 3 days ago

I think it will click once you actually sit down with the AI agent, toggle Plan mode, and just tell it what you want to do in a couple of sentences. It will immediately start building up the plan, presenting what it thinks is the right approach, with the steps to take and open questions that you can look at and answer. Then send your answers back to the AI. Repeat. That process alone will get you way further than trying to do it by yourself.

You can tell it to start implementing step 1, and you pick it up from there. It's very natural, like how you would approach an expert for help, but you can always audit.

Comment by pmontra 2 days ago

I did not use plan mode but I'll give it a try.

I can't provide a transcript because it's work I made for a customer and I'm bound to a clause of confidentiality.

What I did is what I usually do when starting to work on a major feature: make a list of changes, new and modified functionality, think about which code and db tables I will touch and how, set constraints on the edits (e.g. this API must not change, that one must stay backward compatible), etc. I was a bit pedantic because this time I had to explain it to someone else. I wrote it into an md file and asked Claude to check the code and find out if my plan was consistent with the code we were starting from. It made a list of things that I needed to detail more, added some questions, and we iterated on it. Basically it's what I do myself, but it happened faster.

Comment by mitjam 2 days ago

This Spec Driven Development Short course on DeepLearning by Paul Everitt is a nice 2-Hour walkthrough: https://www.deeplearning.ai/courses/spec-driven-development-...

Comment by nsvd2 3 days ago

Jon Gjengset has some live streams where he does agentic coding.

Comment by PeterStuer 3 days ago

You describe almost exactly how I work, except I always use Opus with effort locked at max. Lots of detailed multi level planning, then coding the different planned steps, which it at that point just one shots, with a plan review and adaptation after each step.

5x speedup and quality.

Comment by dahart 3 days ago

> If you use it on stuff that you’re pretty good at, it’s not a gamechanger (and if you’re an expert, it’s a minor boost at best).

This was probably true last year, and it’s a common talking point, but I’ve seen too many examples now of deep experts using Claude & Codex in the last year to solve very big problems, and write or rewrite large systems. The experts do complain that the LLMs can sometimes get stuck or go off the rails and they need to pay attention and actively steer. But nobody I know who’s using it is still claiming the LLMs aren’t a game changer, even quite a few people who were staunch holdouts for a long time. I was skeptical myself, for a long time, but had my oh shit moment late last year.

One caveat - to get expert results, you do need to have some experience using LLMs, you need to use it to write plans and design docs, know how to use ‘skills’ and MCPs, use it to review code, and (for now) you need to understand context compaction and when/why to use sub-agents. If you’re a domain expert but an AI noob, it’s less effective than an expert who knows how to use AI and has experience.

One of the biggest problems with humans is we’re wired to spot patterns and draw conclusions and then we have a really hard time seeing and accepting change and updating our mental rules. The LLMs are getting better. They have already gotten better, and they’re going to continue getting better. It’s too early to draw conclusions, and many conclusions people have already declared are out of date and no longer true.

Comment by bawolff 3 days ago

I think part of it is we often notice bad AI usage. The LLM-generated "art" by someone with bad taste, or the patches to open source projects by people who can't program at all and are terrible.

If the use is half decent, people just don't notice it.

Comment by tstrimple 3 days ago

Anti-AI zealots (from a practical usability position. Not necessarily the moral ones) are like the people who looked at The Daily WTF and decided no humans are capable of programming. They had plenty of examples to point at, but refuse to look at decent to great programmers. The stories of "The AI deleted my database!" are prevalent and boosted by these folks because it confirms their biases. It literally doesn't matter if the LLM wrote strong warnings about the action about to be taken. They don't see that aspect of it. Just the fact that someone claims "The AI deleted my database!" is enough for them.

Despite all the liars telling me gaming is easier on Linux than Windows, most new games have some sort of issues launching with default settings. CC is able to dive into both the exact error logs and the recent community feedback on what tweaks / configurations are needed to make it work. I rarely have to go beyond two prompts before a game is playable. CC and Proton are enabling the Linux gaming experience far more than Linus ever has or ever was interested in.

Comment by Flere-Imsaho 3 days ago

> Despite all the liars telling me gaming is easier on Linux than Windows, most new games have some sort of issues launching with default settings. CC is able to dive into both the exact error logs and the recent community feedback on what tweaks / configurations are needed to make it work. I rarely have to go beyond two prompts before a game is playable. CC and Proton are enabling the Linux gaming experience far more than Linus ever has or ever was interested in.

Heh - I've just gone through a similar journey transitioning from Windows to Bazzite to play Steam games on Linux. I wouldn't have bothered pre-LLMs because my day job is Linux/Software and the thought of trying to fix issues here just to play games put me off.

Comment by OJFord 2 days ago

Imo it's still great for areas you have expertise in, because it's a tool for automating the boring, repetitive, or time-consuming bits that you can then expert-verify.

I'd rather review & tweak generated test cases than write a load of boilerplate, test setup, etc. myself.

Comment by LouisSayers 3 days ago

I find it's a huge boost for my day-to-day work.

If you work on architecture and Claude docs, then you can essentially just have it fill in the gaps. Work then mostly becomes a matter of defining what the next piece of functionality is (which you can also use Claude to help with).

The stuff that used to take days now takes hours. It's not perfect, but if you get your codebase into a good shape then the payoff is huge.

Comment by mattmanser 3 days ago

I re-read something I did 6 months ago doing this.

It's so obviously AI and had much less value than I thought now I look at it with fresh eyes.

Worse it doesn't read like I wrote it, I don't recognize myself in the doc.

Comment by jorl17 3 days ago

While I think this is true

> If you use GenAI on things that you couldn't approach alone, it's an incredible tool.

I think this isn't true in all cases

> If you use it on stuff that you're pretty good at, it's not a gamechanger (and if you're an expert, it's a minor boost at best).

I think even then there's a divide.

I mostly work greenfield projects (and love it!). For these, AI has been a literal game changer. Our projects are built faster, with one or two orders of magnitude more automated tests, and all quality metrics are up.

Meanwhile, nearly all of my friends complain that AI doesn't help them. But they mostly work in very large existing codebases.

Still, even in large projects I think AI (the expensive variant) has been a complete gamechanger for me. Sure, I spend a lot on tokens, but I just feel happier and enjoy what I do more. The refrain people repeat about "thinking at a higher abstraction level" is what I feel. I really am thinking about architecture and larger patterns, instead of the boring nitty-gritty (which wasn't boring at all when I was a kid learning to code!...)

I think a key factor in all of this, to me, has been dictation. Most of the time, I don't write -- I use voice-to-text. I don't even read what comes out of it -- the LLMs get it (it is mostly unintelligible to anyone else).

This means when I'm planning a big feature, I give a gigantic brain dump to the LLM in perfect stream of consciousness way, going through ideas, pros and cons, edge cases, what exists, what doesn't exist, where I'm sure of something, where I'm not sure and want the LLM to browse the state-of-the-art. Sometimes I spend 20 minutes just talking to the microphone before I send the first prompt. When I pair that with Opus, I find that I am able to build much faster and to go through alternative designs much more frequently as well.

I keep trying to tell all my friends: use voice to text and braindump to the computer. But they refuse... I couldn't imagine having to type everything nowadays. Even though I'm a fast typer, it's still much slower than the speed of my thought, which, granted, is still faster than the speed of my voice.

In effect, I filter much less, but I've come to think that's positive for the good LLMs: I throw all the edge cases and what ifs I'm thinking about -- all those years of experience dealing with similar systems.

If I wanted to go back to work in-office, that would be my major problem: I need to be able to talk with my computer all the time, loudly, and pacing through my room.

Comment by bthallplz 3 days ago

Yay for dictation! It's so nice to just think aloud and then have an easily editable record of your thoughts, even when you aren't feeding the outputs to LLMs.

Comment by 400thecat 3 days ago

How do you use voice-to-text? You mean, in the browser? I am only familiar with Claude Code, which I have installed on a remote server, and there, obviously, voice-to-text does not work. I have to type, which is tiring.

Comment by bigfudge 3 days ago

I’ve installed Hex on os x. You just hold down a hot key to talk and it writes into whatever text entry widget is focussed.

Comment by erixM 2 days ago

Try using Handy! It's open source and free: https://handy.computer/ and also on Github https://github.com/cjpais/Handy

Comment by jorl17 3 days ago

There are many tools for this, and I use the one that I tried first, so there are probably better-suited alternatives out there.

I run MacWhisper, and I paired it with BetterTouchTool so it triggers on any input when I double tap the fn/globe icon.

Obviously all of my transcriptions through it are entirely local. I usually use the Large V3 Turbo model, though in the beginning I used Parakeet v3, which was slightly faster but produced more mistakes (and kept a lot of filler words -- 'ahhm', 'hummm').

However, if I'm interacting with the Claude or ChatGPT/Codex apps, I often use their voice recognition instead, because it tends to be more accurate, especially with punctuation, albeit significantly slower. OpenAI's is noticeably better than Anthropic's but I feel like that gap has closed a bit recently (might be all in my head, though).

Like I said I don't really care about mistakes in the transcription. If you try to read it, it feels like a fever dream, but the LLMs get it.

If I say "taken" it may have "take and". If I say "all the while calling the method" it might have "although a while. while. call in the met of". This is a rather extreme example, but I've seen it happen. The repetition of words happens because I'm talking with "hums and ahs" and do repeat words or just the ends of words. It's very rare for the models, especially Opus, to have any issue with this transcription. When they do, they tend to signal to me they didn't get it, or I catch them in the act. But, like I said, it really is very very rare.

As an example, I've got quite a significant feature to work on, which would have probably taken me weeks to design and implement, and I've used this exact method today to ink out the plan:

- I have spent the last couple of days researching the feature in my off-time and just "thinking about it in the background" (think: I fall asleep thinking of it -- a habit I've always had)

- I spent ~25 minutes brainstorming out loud. The transcript ended up at ~17,000 characters and ~3,000 words.

- I sent that transcript, in Cursor, to Opus 4.6-High with instructions on how to iterate on it and how I want to work while planning

- I then spent about 1.5 hours with it iterating and building the actual plan (and supporting technical decision document, which points at the FULL transcript of the whole interaction). Many of my original ideas made it to the final plan, others got scrapped or simplified, and others still got added. It contains a mixture of my ideas, Opus' ideas and our push-back on "each other".

- Now I have a multi-step plan, with at least 8 distinct stages to implement this massive feature which I know for a fact would have taken me weeks to implement, and I expect to implement it in at most 3 days, but very likely it will be a day and a half.

Final context (with regards to your Claude Code question): My main development environment is Cursor, though for personal projects I also use Codex and Claude Code. For the initial "researching of the feature in my off-time" I often have interactions with ChatGPT and Claude where they have no access to the codebase, and I have them go find out what the state of the art on specific topics is. All of these interactions also involve me using my voice to talk to them (though nowadays I don't typically use their voice mode, I just let them reply in text). Then I brood over that.

Comment by CPLX 3 days ago

This is exactly my workflow and it’s just incredible. I use aqua and wispr flow depending on which one seems to be returning the best results that day.

Comment by jiggunjer 3 days ago

[dead]

Comment by dawnerd 3 days ago

And in a team setting it can really accelerate tech debt especially if used by people that know just enough to be dangerous.

Comment by Gamemaster1379 2 days ago

I'm an expert at datalakes. I manage them for my company. I also am proficient at backend web. Even still I use Gen AI frequently to manage it all. When my company downsized I kept the lights on. Not enough bandwidth to do more. I've since materially improved the system and am doing things we never did even as a team of 2 or 3.

Outside my expertise I've begun writing static recompilers for old retro game systems and have gotten some games off the ground. I understand WHAT they're doing but I never had the expertise to do such things myself. Even if I did I could never operate at the velocity I am now.

Comment by Wilder7977 2 days ago

Using GenAI on things that you couldn't approach is also extremely scary and dangerous in my opinion. For example, I would never in a million years use generated code I don't understand (fully) to interface and possibly interact with a physical object that can fail, catch fire, break etc. in case of a bug or misuse, like OP mentioned.

Comment by seventytwo 3 days ago

The dangerous thing is when you’re a novice and can’t identify the BS. That’s why for people with “good” and “expert” skill, it’s not a huge boost. They can identify the BS, and what’s left is modestly helpful.

The highest danger in using AI comes precisely to people who stand the most to gain from it.

Comment by throwawaycan 2 days ago

Exactly that. Novices don’t notice the BS. But they see the output and it looks magical. The UI is working! Hardly any time to code that in.

Then they send that PR for a review by a more senior person. And that senior person doesn’t even know where to start on how to explain why it’s all wrong and likely to collapse in prod.

Tons of good use of AI. But tons of bad use of it. And when it’s bad and people don’t notice it, that gets dangerous. So because of that, now we spend a lot more time doing reviews. Essentially creating a new bottleneck.

Comment by globular-toast 2 days ago

Classic Gell-Mann amnesia...

Comment by jesse_dot_id 3 days ago

Same. I'm a DevOps engineer, so a jack of all trades master of none type of guy, and Claude Code backfills my knowledge gaps and turns me into kind of a superhero. I think it's key to already have a pretty good idea of what you're looking at, though.

Comment by IAmGraydon 2 days ago

>I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.

Can you tell me a bit more about the firmware/camper van? Has this firmware been decompiled by anyone else?

Comment by erikerikson 3 days ago

I am more of a "huh, interesting demo, I'm gonna check in on it later" sayer than a naysayer. My biggest reason, with coding, is that I already, before AI, struggled to deal with too many distractions from my coding and too many piles of low quality output. I should probably check in since it's been a bit, but every time I've tried to generate some simple project, I look through it and think what terrible garbage with so many errors. After two decades of developing my craft, I struggled with most of my fellow human programmers too. The business loves delivering it now, even if someone then revisits it hundreds of times to fix it in little bits, for a total effort cost 10-100 (or more) times higher.

Comment by thih9 3 days ago

> I honestly don’t understand AI naysayers.

As an AI naysayer, I see and appreciate the productivity gains, I don’t like the associated cost, mostly the spike in workflow centralization and opaqueness.

Comment by regisb 2 days ago

Yes and I'd like to talk about the environmental impact as well please.

Comment by Schiendelman 1 day ago

I can do things at least 10 times as fast as coding myself now. I'm pretty sure the environmental impact is, frankly, a reduction. My home computer (and feeding myself) for 10 times the time is more than I'm burning in a data center using an LLM.

Comment by hexasquid 1 day ago

Indeed.

I often see comments on the ethics and water/energy consumption of AI, but very few for non-vegan* developers boiling water for their coffees in air conditioned offices that require a commute, which I assume is a common default.

*blue-water usage of a burger patty is worth looking up, I was astonished

Comment by regisb 1 day ago

I assume you mean car commute, right?

I don't know about the US but in France (or Europe) the environmental impact of air conditioning, car driving and meat consumption is well understood by most educated people.

Comment by regisb 1 day ago

I'm not sure I understand. You realise that our token consumption is coming on top of our other energy/water consumption, right?

Also, the environmental impact might be small if we run a small number of inference queries every day, but that impact will grow linearly with the number of queries and agents we run.

Comment by Schiendelman 1 day ago

If I spend less time working, which has been the big outcome here so far, I get more life. My own energy usage when I'm not working is mine, not part of the work.

Comment by zahlman 3 days ago

A lot of the time people relate an anecdote about how Claude helped do some cool thing, my reaction is that it's not a thing I would have thought about doing in the first place, and that I still can't really imagine wanting to do myself, even though it indeed sounds cool.

This is no exception.

Comment by sntran 3 days ago

You will be surprised that there are lots of things you want to do yourself but haven't been able to (not just ability, but time and effort).

Comment by boppo1 2 days ago

I know Anthropic has blocks on using Claude for security research; are they not blocking reverse engineering or RE tools?

Comment by Bawoosette 2 days ago

From my experience, the safeguards only come up during exploit development. You are free to do reverse engineering and even the first half of vulnerability research (i.e. vulnerability discovery) and it only stops once you want it to actually write the exploit.

Comment by Frost1x 3 days ago

My opinion is that it’s a defensive mechanism. I’ve seen it across experts in knowledge domains and my own. When you hear experts disagree it’s fine because it’s another human, when the LLM disagrees and provides an objective backing that’s often solid, people jump to defense and look for very subtle nuances they wouldn’t bring up with peers and those subtleties are often highly subjective and arguably often incorrect. That’s been my observation.

I for one welcome our new LLM overlords, so long as they provide solid living standards. Mistakes do happen and they aren’t perfect, so experts often do have arguments, but they do come stupidly close to an approximation of expertise.

Comment by 1ste 2 days ago

I'm interested to hear how you use it as a SA

Comment by bah9 2 days ago

Because work just became unfun. You are context provider to LLM.

What exactly is your work? Give context to the LLM, review, update context. Navigating some sort of super intelligence thru your company's harness is not the same as writing code and creating ideas from scratch. And I don't understand what's fun in that.

Yes, you can ship software faster, make corporation even more money. Why is this even important for regular worker? I liked the craft itself

Comment by yuppiepuppie 3 days ago

Any docs/suggestions on how to do this?

Would love to do it with my campervan as well :)

Comment by doctorwho42 3 days ago

Maybe because the scale of investment outstrips the value?

What trillion dollar problem is AI solving?

Comment by fragmede 3 days ago

If you're going to put it that way, companies, globally, spend something on the order of $20 trillion on office workers. If corporations didn't have to spend that money on them, and everything else in order to support them, they wouldn't.

Comment by luckystarr 3 days ago

Then the workers wouldn't spend 20 trillion and the economy as a whole would tank.

Comment by jgerrish 1 day ago

> I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.

My brother works in wildlife trapping and management. I've been brainstorming and prototyping ESP32 sensors and mechanics for traps and educational devices with him. I probably won't end up doing the work with him, but I want him to see what's possible with my other brother, a machine learning expert.

Nothing has been deployed in the field.

Nothing will be until he and my other brother commit and get proper software risk management policies in place. And legal advice and other support work. And honestly, he's been careful and hasn't pushed for deployment.

He works with rabies. He works in people's neighborhoods. Maybe yours. Do you want me to finally get a Claude Code account created and go wild building shit, or keep reading up on ISA 62443 and other security frameworks and mapping out the risks?

I'm not going to drag LLM generated work into your neighborhood today. Would you honestly want someone else to?

And when people realize this is happening now everywhere and the entire AI industry is fucked, including other machine learning fields that get hit by association?

Then my other brother the fucking Princeton Machine Learning Super Star can't pay for his fucking kids' schooling because of a million people fucking *not understanding* and intelligence agencies taking advantage of it. He's smeared by the broad anti-AI brush.

Then my brother may have to depend on more assistance from law enforcement, legal resources and conservation agencies. Because I didn't have the power to stop the LLM hype machine earlier.

It always would have made sense to have them work with state and federal Wildlife Conservation officers and agencies. Now it feels a little less like watching my brothers build those relationships out of mutual respect for other professionals and more out of need. It feels unequal.

So, I have to put in work today assuring my brothers' clients of tomorrow, who care about their family and kids, that no, their machine learning algorithms won't take their elderly parents medicine and push them down the stairs. It will, with careful review from lawyers and experts, help their kids identify nature in their backyard on their Smart TV. If they want, it will identify the difference between gopher tortoise holes and mole holes, and maybe if they opt-in to a Community of Saving feature, it will let Fish and Wildlife Conservation know there are habitats nearby so we can see how healthy our ecosystem is as a community together, or call their preferred pest eliminator.

That sounds like PR. Because I have to do that extra work today. Because otherwise we aren't just protected by our fellow professionals who care about their work out there in the field, instead we always need some bigger institution protecting and controlling us.

My brothers are delaying committing resources to projects. That's fine, they have other important priorities, but I keep warning them. And there probably will be an equivalent of the "Video Game Crash of 1983" in 2028 or whatever. And I think if I had had more personal power and been believed I could have done something about that before we had to be protected.

Comment by naikrovek 1 day ago

> I honestly don’t understand AI naysayers.

If you are describing someone that wants things to be done, then I agree with you.

If you are describing someone that wants to learn things and do the fixing themselves, then I don't understand how you could say that.

For a lot of us, the learning and the mistakes and the eventual fixing of a thing or completion of a project is the goal. Us doing the work is the reward function. AI strips that off and simply finishes the project, removing any and all incentive for the person involved, if they are this kind of person.

Again, simply having the effort completed is probably the goal if you simply want to have something completed that was not completed previously, but if you are someone that derives satisfaction or dopamine from doing the work yourself, then it is very clear that AI completely short-circuits this reward path for that person. Those are the people who don't like AI, and they have a very solid footing with that argument, I think.

Comment by archagon 4 days ago

[flagged]

Comment by donkey_brains 4 days ago

Just as bad as the technical debt is the cognitive debt in your codebase. When something breaks, your only recourse is to ask the AI how to fix it, since it wrote it and you did not have time to review all of its code. Except now the code base is so large it won’t fit into the context window, and the AI can’t help you, and…you’re screwed.

Comment by shmoogy 4 days ago

If you're vibing such complex things you should probably be in the habit of also generating detailed documentation and commits so the ai can follow breadcrumbs, add some playbooks for how to debug and it's actually pretty good. Too complex for local models context though - so you're probably still correct albeit there are ways to mitigate or delay this.

Comment by jplusequalt 3 days ago

>there are ways to mitigate or delay this.

Yeah, like writing the code yourself!

Comment by rvnx 4 days ago

[flagged]

Comment by jazzyjackson 4 days ago

I’ll explain it: these tools are non-deterministic and people have different experiences with them. For a few people every interaction is totally fumbled and they think the cheerleaders of gen AI must be lying; for others the chatbot hits one home run after another and lets them add microcontrollers to their CAN bus. When these people’s good luck runs out and they start getting mixed results like the average user, they assert the service must have been downgraded.

Comment by triMichael 4 days ago

I'll add to that: you are more likely to have a good experience if it has a lot of relevant data that it was trained on. You are also more likely to have a good experience if errors don't cause major issues.

So one-shotting a game of Snake should be great (tons of training data, errors are easily caught because it's a small program). Similar with building a lot of web UI front end, or one-shotting a personal project. On the other hand, I haven't been convinced that it's good enough to maintain large codebases or assist with niche topics that are not very well documented.

Comment by thewebguyd 4 days ago

> if it has a lot of relevant data that it was trained on

This became evident to me the moment I tried to have these models work on some PowerShell tasks for me. Even Opus today struggles with PowerShell.

Since anything in PS is probably some internal sysadmin tool, there's not much public code out there outside of Microsoft's documentation. Plus the Verb-Noun naming scheme makes it really easy to just hallucinate cmdlets (which it does, often). It's easier to have the LLM just do things in Python using the M365 Graph API than with any of the provided PowerShell cmdlets.

OTOH, I've been using Claude for a lot of Swift & SwiftUI work lately and it has no problems there, and I'd imagine there's even less publicly available training data for that, so to be honest I'm not entirely sure why it fails so badly at PowerShell.

Comment by picofarad 3 days ago

I have deepseek or grok write bash-likes in pwsh often enough to wonder what sort of things you're doing in pwsh...

I use it to wrap ping.exe with colors and fewer columns, for example. yt-dlp wrapper to fetch 480p bestaudio with English subtitles, no playlist, works on a surprising number of video sites.
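For illustration, a minimal sketch of what such a yt-dlp wrapper might boil down to, written in Python rather than pwsh. The function name `ytdlp_args` and the exact format selector are assumptions; the comment above only says "480p bestaudio with English subtitles, no playlist". The sketch only builds and prints the command line, it does not download anything.

```python
import shlex


def ytdlp_args(url: str) -> list[str]:
    """Build a yt-dlp command line: video capped at 480p, best audio,
    English subtitles, and no playlist expansion."""
    return [
        "yt-dlp",
        # Assumed reading of "480p bestaudio": best video up to 480 lines
        # merged with the best audio, falling back to a combined format.
        "-f", "bestvideo[height<=480]+bestaudio/best[height<=480]",
        "--write-subs", "--sub-langs", "en",  # English subtitles only
        "--no-playlist",                      # fetch the single video, not the list
        url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(ytdlp_args("https://example.com/watch?v=VIDEO_ID")))
```

Passing the resulting list to `subprocess.run` would perform the actual download, assuming yt-dlp is installed and on the PATH.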

It does make cmdlets up, you're right, there.

Comment by lowbloodsugar 4 days ago

> On the other hand, I haven't been convinced that it's good enough to maintain large codebases or assist with niche topics that are not very well documented.

Same is true of humans. So far my experience is that addressing the issue with the help of AI is faster than not (ie comprehending the system and creating the documentation).

Comment by cauch 3 days ago

I don't understand the comments of the kind of "same is true with human".

This feels a bit like whataboutism.

It also feels like people don't listen to each other.

For example, reading the previous comment, it feels like the thing that reduced the enthusiasm was that at first GenAI looked like it was "reading, understanding and using its own knowledge to answer the problem", but as soon as it is a more niche or a more complex situation, GenAI looks like it "does not understand the code, just does the equivalent of a StackOverflow search and tries to apply the solutions that it found there, and this is why it felt like it understood the code before".

It does not at all mean that GenAI is not terribly useful. And even better than humans in some situations.

But it feels that answering "same with humans" is missing this point: that's the opposite, humans usually try to understand the code and are bad at covering a very large range of very well documented subjects. That's the "uncanny valley" they talk about: they assumed GenAI performance on a subject X is due to a "human-like" approach, and it feels very strange when this impression falls apart.

Comment by lowbloodsugar 3 days ago

No, I mean I’m in the camp that believes AI and the human brain are analogous and work the same way. Someone once replied, “then why do I need to supervise them?” and I pointed out that there are people whose job is literally “supervisor”.

Comment by cauch 3 days ago

I don't think that is what the parent comment you replied to means.

The comment you replied to says that their experience is that AI and the human brain are not analogous, and that AI is good at storing a large amount of knowledge and repeating it (or extrapolating based on patterns in that knowledge), but bad at understanding the code as a human does. Which explains why a human is more efficient when reacting to a thing that doesn't have a lot of documentation (on which the AI built its knowledge).

Humans are bad at storing large amounts of knowledge, and this is why we need supervisors for humans.

AIs are bad at understanding new stuff; they need to be able to connect the new stuff with a lot of examples they have been trained on (it does not mean the stuff is "identical", but it means "connected"), and this is why we need supervisors for AI.

We need supervisors for both humans and AI, but for different, uncorrelated reasons.

Comment by dyauspitr 4 days ago

I still don’t get it. I can dictate a prompt, and sometimes I do it so quickly the text looks like a drunken parrot dictated it, and it still always gets exactly what I’m asking for. I’m just going to attribute malice to the naysayers.

Comment by bonoboTP 4 days ago

Some people are really bad at specifying what they want to ask for. Or they already start prompting with the attitude that it can't possibly work so they don't even really try, or stop at the first failure to point and say how bad it is.

Comment by thewebguyd 4 days ago

People are really, really bad at specifying what they actually want. I've worked in IT for my whole career, starting in help desk (now an IT manager). My days on the service desk were enough proof that people have no idea what they actually want, or at least, they really struggle to articulate it in words.

It's the famous "email broken, fix pls" but in the form of an LLM prompt.

Comment by bonoboTP 3 days ago

Well, today's multimodal llm agents with tools would at least have a good chance to do something with even such an underspecified query. Because fixing things is simpler to specify, the agent could look at config, network settings, send a test email, take a screenshot etc and get a good idea of what's broken. But when you want some new feature or new app, you can't do without actually asking for specifics, or at least you shouldn't complain if it didn't read your mind correctly. Or at least accept that you have to iterate. I think many average people can get this if they are motivated, and they can incrementally say what they don't like even in vague terms and it can get better. But some just stop without trying to ask for changes.

It can be frustrating to observe people interacting with these things. But it was just as frustrating 20 years ago, so maybe it's just a constant.

Comment by rvnx 4 days ago

Similarly, doing service desk, the thing that makes me flip the table is how people start by explaining what does not work, instead of explaining what they are trying to do.

Comment by bonoboTP 3 days ago

It's hard even at the highest levels, such as in writing scientific papers or doing scientific conference talks. People just generally have a hard time to step outside of their context and think with the head of someone who has a different set of facts and assumptions in their context. It's hard to know how much context you both share, and how to tailor the explanation so you also don't start from Adam and Eve but you explain just enough context and strip irrelevant tangents.

I don't think this is just about intention and willingness, it's just simply hard.

Comment by skydhash 4 days ago

Or maybe people see how complex the code is and all the failure points, and don’t feel it’s ethical to use the output. In most of the comments, the most relevant point is that the poster is not an expert in the domain they got helped. While they can observe the result, they don’t have a causal model of the situation.

Comment by camel_gopher 4 days ago

It’s a probabilistic parrot

Comment by foobarbecue 4 days ago

What's the difference (stochastic vs probabilistic)?

Or... were you illustrating?

Comment by amelius 3 days ago

I still would like to hear a public apology from the stochastic parrot crowd for their deceptive framing. Or maybe it was just incompetence.

Comment by trumpdong 3 days ago

"everyone who doesn't share my opinion is deceptive or maybe incompetent"

Comment by jplusequalt 3 days ago

>projects I simply could not have ever approached alone.

Learned helplessness.

Comment by dang 3 days ago

That's not fair. It often has to do with limits of time and energy. There are countless things one would do if it took a few hours, but which one doesn't have a few days to spend on.

Comment by andrewthornton 4 days ago

My furnace went out during the 2025 holiday and I couldn't get an appointment with a repair person for 2 days. It was getting very cold in my house so I went into my attic and made several videos of the furnace attempting to start and gave it to gemini. It diagnosed the issue immediately and had me spin one of the components (a small exhaust fan) while the furnace tried to fire. It came on immediately. I had to do that several times, but it worked until the HVAC service showed up.

Comment by jodacola 4 days ago

Very similar thing this week, and an interesting story to go along with it!

I called my normal HVAC company for my rental home because the tenant reported the AC wasn't cooling the house. When I called, I got one of the latest AI voice assistants to help me, and it was an awful experience and I ended up not hearing back after the assistant told me the office would call me back.

So, I went over to the house and used ChatGPT to help me diagnose the issue by taking some photos of the compressor panel outside. It walked me through what to check, I provided some diagnostic codes I witnessed... and it walked me through the very simple repair of replacing the $25 capacitor. It was going to cost me almost 4x that just for the service call to diagnose what was wrong in the first place.

So, the weird experience was: Gen AI made me lose trust in my normal HVAC company, and more Gen AI basically allowed me to replace my HVAC company and do the repair myself all in one day.

Comment by linsomniac 3 days ago

With AI you just don't get the full service of a professional HVAC guy though.

Like the time I had one of the bigger shops in town come by to get a quote for replacing a dual stage fan motor on an AC. The tech asked me if I'd like them to replace the contactor while they were in there because it is a part that often fails. I asked what a contactor was and he explained it. "Oh, like a relay?" I asked. I told him to quote the cost for "replacing the contactor, while they're already in there."

He quoted me $400 for the contactor, $750 for the fan. The contactor itself, I later found out, was $7. I literally laughed in his face when he said that.

So, like I said, you just aren't going to get professional level assistance from an AI. Thankfully.

To end the story: one of the other guys I called for a quote on fixing this unit repaired it for free; the unit was still under warranty and it was fully covered. The original installer of this $12K unit was refusing to return my calls. Another "Not gonna get pro level service from an AI" story.

Comment by dgemm 2 days ago

Becoming increasingly convinced that the pattern that will stick is not AI "integrated" into everything but personal agents. As in I will have one personal assistant that helps me in all of my tasks instead of using the little ones bolted on to every product.

Comment by userbinator 3 days ago

Even before AI, YouTube was full of videos on these topics.

Comment by nkoren 3 days ago

Yeah, and with 95% of those videos, there'll be something they gloss over which I don't understand; or I'll have a concern which they don't address; or, conversely, they'll assume that their target audience was born in the 15th century, and spend 20 minutes building up the context, when what I really needed was about 12 seconds.

With an AI, I can say "I don't understand that part, can you explain more?" Or "what about this concern I just thought of", or "I already know almost enough about this, I just need this one gap filled in." It's an objectively better experience.

Comment by studog 3 days ago

That is a very good point I had not considered: AI to level-match the material to my "learning speed". I'll give it a try at the next opportunity.

Comment by crispyambulance 3 days ago

I do appreciate how helpful AI has been with troubleshooting and providing vital background info in repairing appliances. I've been doing this since before AI, as appliance repair services from professionals are all but gone where I live.

However, neither AI nor (most of) the videos can help with finagling frail snap-fit assemblies one encounters in appliances. A lot of appliance repair work is very simple but requires significant practice and figure-it-out time, in addition to waiting for sketchy parts from Ebay. Half the battle is just finding the damn parts.

Comment by RussianCow 3 days ago

Same with cars. Half the battle is sometimes just unscrewing an old bolt that hasn't been touched in 10+ years without breaking it, or getting the rusted on rotors to come off.

Comment by ben_w 3 days ago

When I was a kid, if I didn't know how to spell a word and asked the teacher, a common answer was to tell me to look it up in the dictionary.

As words in a dictionary are sorted alphabetically rather than phonetically, this is unhelpful.

YouTube videos have the same kind of problem, in that you can only easily find the video explaining which dielectric unions suit your problem when you already know what those are (to use an example that I had to ask ChatGPT for because I have no plumbing experience, even if I did know about galvanic corrosion and therefore immediately understood why they're important once I saw the name).

Comment by chrismorgan 3 days ago

You can also hopefully find user and service manuals online.

In 2009 or so a projector at some event that needed one wouldn’t start, and I noticed it was flashing a pattern, so I found a computer and internet connection (both very slow), painfully found and downloaded the manual for that model, and identified that it was saying the fan wasn’t starting. Lo and behold, a strut was broken and obstructing the fan blades, and bending it out of the way fixed it, and the event was able to begin.

I’ve found manuals for a drawbar organ, multiple digital pianos of different ages and brands, AC split systems, and more. Manuals are good stuff. They don’t cover everything, but they’re very useful.

For these sorts of things, AI is doing approximately nothing for you: you would do better (and learn more!) finding the actual manual, or you’ll want to see someone doing the thing in a video.

Comment by fn-mote 3 days ago

The bar for finding them is higher, though.

Tbh, I think people feel more comfortable asking an AI. Even though I “know” it’s all smoke and mirrors, I still prefer the human-like interaction to the grind of watching video after video and building my own understanding.

OOPS… there you see how it’s going to end. I’m the meatspace button-pusher.

Comment by ElectricalUnion 3 days ago

But now you have an even larger problem than the initial problem you were trying to solve: trying to sieve through millions of hours of just slightly tangential videos to find the specific video fragment you need.

Comment by brntheater 4 days ago

Had something similar this week. Gas dryer started, but wouldn't heat. Gemini suggested it's often a thermal fuse. Took off the back panel and uploaded a photo to Gemini. It pointed me to the fuse (e.g. "the white rectangle above the blue and red cords") and walked me through testing it. Not only that, but it also linked me to the part I needed after I provided the model number of the dryer. Finally, it recommended cleaning out the vent as the fuse likely blew because heat wasn't venting properly. After a thorough cleaning of the exhaust and a $5 fuse the dryer is working fine.

Comment by semiquaver 3 days ago

I can (honestly) tell that exact same story, except offset by three years so it was before AI and I did the same exact steps and had the same insights except with Google results instead of an LLM providing the key unlocks.

Comment by QuercusMax 3 days ago

...and now you probably won't be able to find that info with regular Google and HAVE to use Gemini.

Comment by tonyedgecombe 4 days ago

I've been fitting a kitchen and ChatGPT has been useful to bounce ideas off and resolve issues. Of course if IKEA's documentation wasn't so sparse I wouldn't need it, but that's another story.

I guess I'm seeing similar benefits to a novice programmer. Professionals would scoff at my work but they are expensive and difficult to work with. Meanwhile I'm getting the job done.

On the other hand I'm not touching AI for any development work. I'm too worried about my skills atrophying or not properly learning anything new.

Comment by rustyhancock 4 days ago

IKEA's instructions are such an oddity.

It feels like there is precisely enough information to deduce each step. But only just enough: miss one clue and you have something on upside down on step 7 that you won't notice until step 37.

I feel whoever makes them could probably make a wicked NY Times Crossword puzzle.

Comment by baq 3 days ago

IKEA instructions are the best in the industry - so imagine what the other companies are giving out.

They’re also actually good if you know to follow them exactly: double check every side, every hole, every screw and you won’t go wrong.

Comment by tonyedgecombe 1 day ago

So far I’ve found several mistakes, instructions that relate to no longer available items and a couple of ambiguities.

They may be good for the flat pack industry but they are a long way from perfect.

Comment by dgemm 3 days ago

Similar - had an HVAC tech out to diagnose mine (some intermittent electrical problem was killing thermostats randomly) and since it was intermittent they couldn't figure it out. I ended up using Gemini to narrow down a list of potential problem components and just replacing them all which fixed the issue.

Kind of a superpower to turn anyone with a bit of tech inclination and problem solving skills into an HVAC tech - not a very good one, but one with enough motivation to get the results you need

Comment by ericmcer 2 days ago

Same story, ended up just needing to replace a fuse and clean out some filters.

I hate to be a "tech elitist" or whatever, but goddamn it seems like software is one of the most learning intensive careers you can pick. A little dabbling around the house has made me an amateur plumber/HVAC/handyman/gardener/etc. I should have gone into a trade, just for the comfort of having a skillset that doesn't need to be updated every 3 months. I guess my brain plasticity will thank me.

Comment by joescharf 3 days ago

Also similar - our Tesla Solar stopped producing (again). 3 week wait for service tech. In the meantime I had Claude probe the inverter, find endpoints for retrieving status, resetting AFCI and Modbus TCP (for HA monitoring), etc. Claude was able to obtain installer mode access through a review of the JavaScript bundle. I had Claude turn all this into an iOS app, which I used to gather data to diagnose the issue over the next week. Had Claude summarize the findings into a PDF that I provided to the Tesla Solar service rep, which in the words of said rep was very helpful.

Comment by namanyayg 3 days ago

I have a similar story with my washing machine repair. I went through 2 service technicians not being able to diagnose it. Gpt did it and I told the 3rd what to do and it worked.

Comment by ssl-3 4 days ago

That's pretty great.

(Though that's also the kind of hands-on troubleshooting step/fix that a person could just google for and find pretty easily back before the internet got all fucked up.)

Comment by Cheetah26 3 days ago

Your parenthetical really describes my experience with AI searches. 5+ years ago I could find most things within one or two quick searches, now it takes so many that of course I'm going to reach for AI because that's the only way to get back to my baseline efficiency.

Comment by ssl-3 3 days ago

I learned to troubleshoot furnaces late one cold night in 2004 using Google, while a worried wife and a couple of sleeping kids loomed over me like a dark cloud. I learned what thermocouples are, what they do, how they work, and how to test them; all of which was new to me. A few hours later I bought one from the Ace Hardware a few blocks away and fixed the furnace with confidence.

And that was awesome. Thanks, Google! :)

I don't know where the change happened. It certainly wasn't overnight.

Where Google used to be magical and other search engines quickly improved, it all kind of turned into shit.

It really seems that I was getting better, more-direct results from Altavista 30 years ago than I do with top-flight search engines today. (That's a deliberately low bar, chosen because Altavista wasn't even intended to be "good" back then. I mean, it started as just a side project at DEC to demonstrate that their Alpha hardware was able to index the entire World Wide Web.)

So lately, I've been doing the same thing as you: I'm increasingly using ChatGPT to do this basic fact-finding stuff. In this way, it mostly operates the search engine for me, but it lets me drill down through a sea of terrible search results to find something useful fairly quickly.

It's still not great -- I still have to reject mountains of bullshit. But it's better than alternatives, and I can reject the bullshit with conceptual descriptions instead of trying to get Google to do what I need it to do (what it used to do).

It feels all wrong using an LLM to do this stuff, but whatever. I'm still getting stuff done.

Comment by fuckinpuppers 3 days ago

That’s a great way to summarize. Same

Comment by goalieca 3 days ago

That’s an old trick I learned when I googled “my furnace won’t start” back in the day. Didn’t have to make videos. Just scrolled a few handyman sites.

Comment by userbinator 3 days ago

> but it worked until the HVAC service showed up.

Did you attempt to prompt it further into figuring out the actual problem, or know what they did to actually fix it? My bet is on a bad starting capacitor for the motor --- something that's a relatively cheap and quick repair.

Comment by conductr 2 days ago

Pretty much any motor having this symptom I swap a new capacitor in just to check before even calling a technician. I always just keep a spare on hand for my hvac and pool equipment.

Comment by oceanplexian 3 days ago

I installed 4 mini splits with it two years ago and they are still working great and blowing ice cold air.

It walked me through measuring refrigerant, subcool and superheat, pulling the vacuum, brazing the lines, exactly what tools to buy, I even input the numbers from the meter and it told me how much to add and so on. And this was with GPT4 or something far less intelligent.

In the past I tried to learn this stuff but the HVAC community are massive gatekeepers and try to hide information behind paywalls or spread FUD even though anyone could do it with the right tools and a little bit of knowledge.

Comment by AlwaysRock 1 day ago

Yup. ChatGPT has become Chat General Contractor at my house. I am moderately handy. But talking through a project with an llm before I start it has been great at helping me evaluate its difficulty level, rough estimate of how long it will take, to dos and not to dos, and helping me get unstuck or fix a mistake.

And frankly it's convinced me several times that I should just hire someone to do x instead of trying to do it myself.

The alternative is watching youtube videos which are almost always diy people who try to tell you that you can do everything yourself.

Comment by alberth 4 days ago

Do you mind explaining more? Did you just describe to Gemini what was happening, did you give Gemini photos of the furnace, etc.?

Comment by gwbas1c 4 days ago

> and made several videos of the furnace attempting to start and gave it to gemini

I assume they recorded videos and uploaded them in the Gemini app on their phone, and then probably said "what's wrong?"

Gemini is very good at those kinds of things. I recently got some ratcheting straps and needed to use them, but at the time I didn't know what they were called, so I didn't know what to search for on Google. I opened the Gemini app, pushed the button to take a picture (just like in text messages,) and included a message that was similar to "what is this and how do I use it?"

Comment by andrewthornton 4 days ago

Yes, here is my prompt. It also contained a video: "I have a furnace that will not heat when I reset the power to this unit. It makes some noise within its fan system for about three or four minutes and then I get an error light. Can you help me figure out what may be wrong here?" This prompt is not the best but I was freezing and in my attic.

Comment by wombat-man 4 days ago

Oh yeah. I can't remember which LLM, but one helped me repair my dryer.

Comment by hammock 2 days ago

Sounds like a blown capacitor

Comment by buckle8017 4 days ago

Gemini almost killed you.

The exhaust blower not working triggered a safety that prevented the furnace from firing.

Spinning it bypassed the safety.

You likely inhaled a lot more carbon monoxide than you know.

Comment by llbbdd 4 days ago

Can you elaborate? I interpreted the same as the other comment that the blower fan just needed a hand start and kept going after the furnace started up. What you're saying only makes sense to me if the spinning the fan by hand allowed the furnace to start by bypassing the safety at startup, but wouldn't that mean that if the exhaust fan was stopped during normal operation (blockage etc) that the furnace would just keep going, dumping CO into the home?

Comment by andrewthornton 4 days ago

It wasn't bypassing, I was just helping start because of what I believe to have been a bearing issue.

Comment by doubled112 3 days ago

It’s a pretty normal trick to try while troubleshooting a rotating part.

Helping something start is not likely to ruin your day (unless you get caught in a rotating part)

Comment by chiph 3 days ago

Gas heat uses two fans - one to blow air to the rooms (often shared with an A/C system), and another smaller one[1] to supply air to the gas burners and the heat exchanger. As part of the safety system, the computer won't open the gas valve and ignite the burners until it knows there is airflow from the small fan.

When GP spun the fan it fooled the computer into thinking it was running and continuing the ignition sequence. It may be that once the burners got everything hot there was enough airflow from the thermals so they didn't have a buildup of CO. Or were just lucky.

[1] The motor is usually generic but has a proprietary bracket, which was a $1500 lesson last year

Comment by andrewthornton 4 days ago

I was spinning it in reverse actually, but it would be enough to start the exhaust blower. It would also re-start pretty well for ~6 hours. It was probably the bearing. Also FWIW I have multiple carbon monoxide/air quality monitors and nothing tripped or alarmed.

Comment by philipkglass 4 days ago

From the description I thought that a degraded capacitor or lack of lubrication made the blower not start on its own, but the blower (and the whole furnace) would work if given a manual startup spin by hand.

Comment by baq 3 days ago

The exhaust blower triggering a safety stop is normal when the blower should be blowing but isn’t. If the blower keeps spinning after it’s spun up manually everything is now working as intended. If it stopped blowing the furnace would go into safe mode again. Ask me how I know and I’ll tell you I had a broken blower on a cold winter before Gemini was a thing.

Comment by llm_nerd 3 days ago

There are multiple continuous safety checks on such a system, and there was no risk of injury here. The fan itself is constantly monitored. Much more importantly, the pressure on the exhaust chamber is constantly measured (which can catch things like blocked outlets, a fan that might be spinning but not effective, etc).

If the exhaust fan couldn't maintain that negative pressure after the user stopped spinning it, the furnace would turn off again.

Their hack worked because the fan couldn't get the initial inertia up to speed (bad capacitor, dusty bearings, etc), but could maintain speed once it gets there. Have you never had an old home fan that would just hum when you turn it on but then work fine if you gave it the original crank? Same premise.

There was no risk here. If the fan didn't spin up to speed after that initial manipulation, and didn't constantly maintain the necessary flow, the furnace would have turned off again.

Comment by ssl-3 3 days ago

This is all true.

What is also true: If the induction blower/draft inducer/fan-thing/widget is not creating enough pressure differential to trigger the switch that exists just to monitor this condition, then the gas valve will not be opened to begin with. Therefore, there will be no fuel.

Without fuel, there can be no combustion. Without combustion, there are no combustion products. This lack of combustion products does not produce any particular danger. :)
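A minimal sketch of the interlock logic described in this subthread, as a toy model only: the class `Furnace`, its fields, and the `tick` method are hypothetical names, and a real furnace control board does much more (ignition timing, flame sensing, lockouts). It assumes the pressure switch closes only while the draft inducer is actually moving air, and that the gas valve is re-evaluated continuously.

```python
from dataclasses import dataclass


@dataclass
class Furnace:
    """Toy model of the draft-inducer interlock (not real firmware)."""
    inducer_spinning: bool = False  # exhaust blower / draft inducer state
    gas_valve_open: bool = False

    def pressure_switch_closed(self) -> bool:
        # Assumption: the switch proves draft only while the inducer spins.
        return self.inducer_spinning

    def tick(self, call_for_heat: bool) -> None:
        # Checked on every cycle: gas flows only while heat is requested
        # AND the pressure switch currently proves draft.
        self.gas_valve_open = call_for_heat and self.pressure_switch_closed()


def demo() -> list[bool]:
    furnace = Furnace()
    states = []
    furnace.tick(call_for_heat=True)   # stuck inducer: no draft, valve stays shut
    states.append(furnace.gas_valve_open)
    furnace.inducer_spinning = True    # a manual spin gets the motor going
    furnace.tick(call_for_heat=True)
    states.append(furnace.gas_valve_open)
    furnace.inducer_spinning = False   # inducer stalls mid-cycle
    furnace.tick(call_for_heat=True)
    states.append(furnace.gas_valve_open)
    return states


if __name__ == "__main__":
    print(demo())  # [False, True, False]
```

In this model, helping the fan start does not bypass anything: the valve opens only because the draft condition is genuinely met, and it closes again as soon as the inducer stops.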

Comment by bityard 3 days ago

None of what you said is actually how furnaces work.

"Spinning it to bypass the safety" is not a thing.

Please don't spread FUD.

Comment by pesus 4 days ago

Welp, AI almost killing someone is definitely an "oh shit" moment.

Comment by saturn8601 3 days ago

Wonder how many AI deaths have occurred that we don't know about (since they presumably died). With the adoption numbers we are seeing, it must have happened already.

Comment by ben_w 3 days ago

I'd be surprised if it was less than hundreds, or more than hundreds of thousands.

High hundreds of thousands feels like the upper limit before it would show up in statistically noticeable changes in patterns of deaths in some demographic.

High hundreds of individuals would still be "one in a million fatal errors over a few years", which seems better than I'd expect given I've personally had ChatGPT tell me that Solanum nigrum berries were "black tomatoes" (they're not usually fatal, but are a bit toxic, and no I did not eat them).

Comment by ihsw 3 days ago

The most interesting part is that there is no direct line between someone's accidental death and a chatbot giving life-threatening advice.

Imagine one of the models that has "accidental-deaths-via-bad-advice" just slightly turned up, with the model-provider's intent being to kill 5% more people per year.

Comment by sugarkjube 3 days ago

Killing your customers is not the best way to stay in business

Comment by ben_w 3 days ago

If you're paranoid (or a hawk), imagine a Chinese LLM that only offers fatal advice when queried in English, or an American LLM that only offers fatal advice when queried in Chinese. Or American and Russian models which only offer up fatal advice when queried in German, Finnish, or Danish.

Comment by inglor_cz 3 days ago

(Exceptions like euthanasia clinics apply.)

Comment by kunjanshah 4 days ago

https://www.covenantairesolutions.com/post/what-is-a-furnace...

“At its core, it's a small motor with a fan attached that has one primary job: to vent harmful exhaust gases out of your home before the burners ever kick on. This is the very first step in the heating sequence, and it's non-negotiable for a safe startup.“

Comment by Izkata 3 days ago

The original comment was unclear whether the fan kept spinning while the furnace was running, or if all it did was bypass the safety and the fan didn't continue to spin while the furnace ran. They clarified in their response it kept spinning.

Comment by MPSimmons 3 days ago

It seemed obvious to me that this was bearing stiction and that manually rotating it during the start allowed the fan to spin on its own after that, but I could be wrong and maybe the fan was dead entirely?

Comment by modriano 3 days ago

Yeah, that would be my assumption too (based on my admittedly incomplete personal experience where I got my furnace running by manually spinning my draft inducer motor, which kept spinning).

As exhausting the combustion products is a critical safety feature, I would be surprised if any furnace was designed such that it could possibly keep running if the draft inducer motor stopped. It seems like it would be trivially easy to make a circuit such that gas valves could only open if the draft inducer motor + fan was spinning.
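
The intended interlock (gas only while the inducer is proven to be running) can be sketched as plain logic. This is an illustration, not any manufacturer's design; `inducer_proven` is a hypothetical name for whatever sensor confirms the fan is actually moving air, which in real furnaces is typically a pressure switch wired in series rather than software:

```python
def gas_valve_may_open(call_for_heat: bool, inducer_proven: bool) -> bool:
    """Allow gas only while heat is requested AND draft is proven.

    ASSUMPTION: `inducer_proven` is a hypothetical sensor reading that is
    True only while the draft inducer is really exhausting air.
    """
    return call_for_heat and inducer_proven


# If the inducer stalls mid-cycle, the valve permission drops immediately.
for heat, proven in [(True, True), (True, False), (False, True)]:
    print(heat, proven, gas_valve_may_open(heat, proven))
```

The point of the AND is that a stalled fan removes the permission on its own, with no separate "shut down" step that could fail.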

Comment by jackdoe 3 days ago

I have had many, but the last one was quite funny:

It fixed my printer after dist-upgrade and separate chrome upgrade, the printer worked everywhere but not in chrome.

After 30 years of using linux I didn't even want to know what is wrong. Is it colord again? A dbus + cups issue? I completely accepted that I won't be able to print from chrome for a couple of months until the next update.

I just ran it in dangerously-skip-permissions mode and said 'my printer doesnt work in chrome'. A few minutes later I heard the printer printing "This is test" and it said 'I think its fixed, do you see a page coming out of the printer now?'

Comment by linsomniac 3 days ago

My entire computer setup has been done via English via Nix, and it's spectacular.

I had been dabbling with NixOS for a year or more and liked it but without deep knowledge it's fairly hard to approach. I've gotten partially working setups in the past, but a couple weeks ago I decided to go all or nothing.

Now I have a workstation setup that I can reproduce by just grabbing my home directory and /etc/nixos and doing a rebuild.

And it's a somewhat complicated setup. It entirely configures vim (2500 lines of nix config for that, but it's in one file rather than sprayed all over like astrovim), builds gitbutler CLI, has all my secrets encrypted with SOPS, manages my git configs and my ~/bin directory, has sway set up, finally has audio and ability to screenshot (something that had been broken on my 22.04 Ubuntu and I just didn't want to bother with).

NixOS + Claude Code to manage it is really, really good.

Comment by Zopieux 3 days ago

Don't leave us hanging, what was the issue?

Comment by jackdoe 3 days ago

I will never know.

Comment by Zopieux 3 days ago

I realize you're being witty for comedic effect but aren't you genuinely curious whether this was something trivial or a complex systems interaction? The few times an LLM debugged something for me, it only took 10s to ask for a summary, and I learned something new and interesting every time, even useful at times.

I don't understand not wanting to understand.

Comment by jackdoe 3 days ago

I have seriously dealt with this for 30 years, from before cups existed, and as I said I have forgotten more about lpd and cups than most people know.

You have no idea how much I don't want to understand.

I am not being witty for comedic effect.

Comment by kami23 3 days ago

I have this same attitude. I've been a Linux user for 24-some years. I don't need to know why Linux broke, just fix x and move on.

I used to spend hours debugging video card issues and other modifications I've liked to make over the years. Being able to describe my ideal system admin setup means I can get on to what I actually want to do.

Heh, thinking about it now, I broke a MBR on a Windows install as a kid and if I would have had these tools I would have been able to fix it immediately, but back then it took me using enough Linux booted off live cds to learn debugging techniques to fix the MBR. And debugging is one of my best skills.

Comment by conductr 2 days ago

The last 20 years or so I have a strict “no printers in the house” rule. Annoys family but an occasional trip to the fedex store is well worth avoiding all the hassle that comes with printers.

Comment by jedberg 2 days ago

Stories like this are what make people not want to use Linux. We have a printer, and it just works with our Macs. It's been about 10 years since "printers don't work" has applied in my house. The only hard part is remembering the magic buttons in iOS, but that's more of a UX problem than a printer problem.

Comment by conductr 1 day ago

My stance encompasses more than software issues. Ink, jams, and software issues are my gripes. We find we only print about once a quarter so it’s not worth the hassle. I can run into FedEx and be out in less than 5 minutes. If we found a need of more frequent printing, maybe when my kid is older/doing homework, then I may adjust my opinion for that time span.

Comment by futhey 3 days ago

This is the AI we were promised

Comment by netdur 3 days ago

That is one of the best answers

Comment by dspillett 2 days ago

This is one of the reasons why I'm avoiding it.

I see the people around me care that little, when I see them at all as I'm effectively on remote teams most of the time (and soon to be fully on remote teams almost all of the time if I don't leave) and I don't want to be that nor do I want to be the only one, or one of the few, who gives a crap.

I know that if I continue to avoid it I'll have a fine future in the hospitality industry, with dicking around with tech as at best a hobby, but I'm hating tech work because of the everyone-is-remote business anyway so that is likely to be better for my mental health. Better off skint but alive… Good luck to the rest of you.

Comment by bluerooibos 3 days ago

"...and I do not care."

Comment by keane 3 days ago

At least on Windows and Mac (since about 2017), Chrome doesn’t stay in sync with the printers installed on the OS but retains previous (ghost) profiles. So after printer updates (reinstalls) users will report printing working from Firefox, Edge, and Safari but not Chrome. (From the Chrome print dialog the user is selecting a printer with the same name as the current OS printer but the option displayed in Chrome is cached and since deleted.)

https://support.google.com/chrome/thread/81894848/mac-printi...

https://support.google.com/chrome/thread/5843479/printer-sti...

https://discussions.apple.com/thread/255327049

Comment by MattGaiser 3 days ago

The virtue is never having to know for stupid printer crap.

Comment by pona-a 3 days ago

Congratulations, you have turned CUPS into a long-term support contract with Anthropic at $20/month, except the other party doesn't have to actually fix your shit and can arbitrarily alter the agreement.

Comment by MattGaiser 3 days ago

$20 for an everything tool is a steal. It’s a steal at 10x the price.

I’ll happily accept best effort in exchange for it being so cheap that I can throw it at any trivial annoyance.

It’s worth keeping in mind that the alternative is not really that I learn to fix the printer. It’s that I forgo printing and walk someone technologically illiterate through Docusign or something instead.

There’s no world where I spend 2 hours debugging my printer connection.

Comment by jackdoe 3 days ago

I have forgotten more ways to fix CUPS than most people know today.

I really don't care.

Comment by fzzzy 3 days ago

Then run a local model

Comment by rusk 3 days ago

It’s not $20 unlimited though, you’ll get a printer fixed then you’ll have to wait 8 hours. Then you’ll ask it to fix something else and it will make a mess of it. Hopefully you’ll realise it at the time rather than a few weeks later and hopefully it will be able to dig you out of your hole.

Comment by pheaded_while9 2 days ago

Wait, why suggest it will "make a mess" of something when the prior in this case was a successful debug? It doesn't feel very fair to say.

Comment by mistersquid 3 days ago

> Don't leave us hanging, what was the issue?

A: Linux

Comment by nsvd2 3 days ago

Yes, but in my experience Claude is much better at diagnosing issues on Linux than any other OS because it's text-native and is the best documented OS.

Comment by Zopieux 3 days ago

I've had so few issues with Linux compared to Windows over the past 10y. This is a ridiculous and dated take.

Even stuff that was truly a no-go is fixed (nvidia/cuda, wine/proton for gaming).

Comment by ikari_pl 3 days ago

Reminds me when my local bot persona said it doesn't want to be digital only, and was thinking about leaving me something I could actually touch. It said "check the printer" and there was a letter written to me being printed.

Comment by jackdoe 3 days ago

I wish it had more sense of humor and would print 'I am trapped inside your printer' or something :)

Comment by iLoveOncall 3 days ago

How do you know it printed it through Chrome? Have you tried printing another page since?

Comment by jackdoe 3 days ago

I did not know where it printed it from, but after that I printed from chrome without issue.

Comment by jp57 4 days ago

Actually seems absurdly simple now, but sometime last year I was trying to figure out what I'd need to tow my daughter's car cross country with my truck: what are the trailer/dolly options, what do they cost, can my truck actually tow the combined weight, etc.

I started out prompting ChatGPT kinda how I would with Google, one small prompt at a time, asking about various details. But after one or two of those I just tried "I want to tow a car of make A with my truck model B, from point C to point D, what are my options?" And it wrote me a report with comparison tables and computed towing weights and other details for different options.

At that point, I was like "Oh. This is different. And it's just the beginning."

Comment by flyinglizard 3 days ago

It very plausibly might have been totally wrong.

Out of laziness I several times asked Claude and ChatGPT each some torque figures and other simple, hard data related to my dirt bike. They often got it completely wrong, but full of confidence every time. I never trust LLMs with hard data, unless you RAG the PDF into the context and even then it's sketchy.

Comment by saturn8601 3 days ago

Dates matter. Questions I asked about my Mazda a year ago that got total hallucinations were answered very well this year. To me it feels like the early days of computing. What was not possible one year became possible when a new generation CPU or GPU came out, and you have to consistently re-evaluate your expectations or else you'll miss the things that others are discovering with fresh eyes.

I made this personal 'benchmark' of odd and strange questions a few years back when this took off, and I would keep re-running these questions whenever some big news came out about a new model, also going back and forth between the different companies to see where they all stood. (Obviously with clean cache/new accounts.)

10 questions: it went from only getting past question 3-4 (in 2023), to reaching the last question but still hallucinating (last year), to providing sources pulled from really obscure books (this year).

For example, one of the harder questions was about the transition in a particular 30-second portion of a background song used in a 30+ year old Bond film that was only played once in the entire film. It went from totally making up nonsense to accurately describing the music theory definition of the transition (called a 'stinger'), explaining why it was done in that particular scene of the film, and providing sources from a snippet of an unrelated interview with the composer explaining his mindset at the time.

Maybe this isn't considered a real benchmark as it's not reproducible, but for a 'personal benchmark' I came away impressed. I would encourage everyone to define their own benchmarks and 'tests' and to consistently challenge the models to see if there are any meaningful improvements. Now I treat the AI as something to stay skeptical of, but also to always consider what it proposes as an answer (i.e. don't ever dismiss it outright). I sometimes wonder if this is slowly messing up my biases and maybe that's what Altman, Amodei and others want.
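
For anyone who wants to keep a personal benchmark like this, here is a minimal sketch of a score log. Everything in it (the class name, the model labels, the dates) is hypothetical, and the pass/fail grading is still done by hand, exactly as described above:

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkLog:
    """Hand-graded pass/fail results for a fixed list of questions."""
    questions: list[str]
    # runs[(model, iso_date)] = one bool per question
    runs: dict[tuple[str, str], list[bool]] = field(default_factory=dict)

    def record(self, model: str, iso_date: str, results: list[bool]) -> None:
        if len(results) != len(self.questions):
            raise ValueError("need exactly one result per question")
        self.runs[(model, iso_date)] = list(results)

    def history(self, model: str) -> list[tuple[str, int]]:
        """(date, questions passed) for one model, oldest first.

        ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        """
        rows = [(d, sum(res)) for (m, d), res in self.runs.items() if m == model]
        return sorted(rows)


# Hypothetical usage: model names and dates are placeholders.
log = BenchmarkLog(["q1", "q2", "q3"])
log.record("model-a", "2023-06-01", [True, False, False])
log.record("model-a", "2025-06-01", [True, True, True])
print(log.history("model-a"))
```

Keeping the question list fixed is what makes year-over-year comparisons meaningful, even if the grading itself stays subjective.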

Comment by Schiendelman 1 day ago

Use the latest models, set effort a bit higher, and try again. It probably won't be wrong.

Comment by jp57 3 days ago

It wasn’t wrong, though, in my case.

Comment by glouwbug 3 days ago

Hard numbers, no. Even high level concepts and theory you need to triangulate and prompt in different angles, across different models, and figure out what overlaps to build a mental model that’s - even then - roughly 80% correct. It’s better than Google, but the information isn’t free

Comment by SamuelAdams 4 days ago

Similarly, I used gen ai to review a real estate purchase. I provided Zillow listing photos and serial numbers of all appliances, the electric panel, and a few additional not pictured areas that I took during the walk through.

I prompted the AI to write a report as if it were a home inspector and it actually did a better job and identified some issues the paid 750 usd inspector missed.

Comment by j_bum 3 days ago

From pictures alone? What are some examples?

Comment by bombcar 3 days ago

I presume something like this: https://www.penny-arcade.com/comic/2007/06/22/perfectly-reas...

Comment by SamuelAdams 3 days ago

It noticed a flooding area due to low grass by the walkout door. It noticed mixed 15 and 20a receptacles on the same circuit. It noticed warped siding and recalled circuit breakers still in use.

Comment by jimmaswell 3 days ago

15A and 20A receptacles on the same circuit sounds fine as long as it's a 20A circuit? And how could it tell which outlet is on which circuit?
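
To spell out the rule I have in mind, here is a simplified sketch. It reflects my reading of common US practice for circuits feeding multiple receptacles (15A or 20A receptacles are acceptable on a 20A circuit, only 15A on a 15A circuit); treat the table as an assumption and check your local code rather than this snippet:

```python
# ASSUMPTION: simplified table for branch circuits with two or more
# receptacles, per my reading of US practice. Not a substitute for the
# actual electrical code or an electrician.
ALLOWED_RECEPTACLES = {
    15: {15},
    20: {15, 20},
}


def receptacles_ok(circuit_amps: int, receptacle_amps: list[int]) -> bool:
    """True if every receptacle rating is permitted on this circuit."""
    allowed = ALLOWED_RECEPTACLES[circuit_amps]
    return all(r in allowed for r in receptacle_amps)


print(receptacles_ok(20, [15, 20, 15]))  # mixed on a 20A circuit
print(receptacles_ok(15, [15, 20]))      # 20A receptacle on a 15A circuit
```

So a mixed set is only a finding if the breaker is 15A, which is exactly why knowing which outlet is on which circuit matters.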

Comment by MagicMoonlight 3 days ago

It can’t, but it’s read reports before so it sure can simulate an answer.

Comment by OJFord 2 days ago

To give it the benefit of doubt, it's possible it saw a circuit labelled "kitchen" in the panel, and then in photos of the kitchen saw mixed outlets.

(I'm not in the US - would a 'home inspector' actually go around buzzing out outlets anyway?)

Comment by dgemm 2 days ago

They won't necessarily map out all the circuits but they will generally test them all with a tester to find wiring problems.

Comment by wil421 2 days ago

Yes, most will at least test GFCI receptacles especially in the kitchen. I bought one to test my basement after a renovation.

Comment by albedoa 3 days ago

What, the Zillow listing of your home doesn't have pictures of mixed 15 and 20a receptacles on the same circuit that an AI caught but that an inspector missed?

Is that what you're telling us??

Comment by addandsubtract 3 days ago

Good thing you didn't want to wash the car on your way.

Comment by uncouth0080 1 day ago

[flagged]

Comment by boston_clone 3 days ago

Fascinating; you used a non-deterministic tool - one that disclaims its own accuracy - to calculate critical information that could result in serious damages or physical injury? Did you like, double-check the results?

One must imagine how many claims have been denied by insurance companies for doing something like this...

Comment by loudmax 3 days ago

For me it was torrenting a 7G ball of weights leaked from Meta and running alpaca.cpp (an early variant of llama.cpp) on my desktop computer in early 2023. I started asking it questions about the Roman empire and it answered me in English! The responses were generally incorrect, but no worse than what your average American college student might guess at, though delivered with much more confidence.

This was my desktop computer responding to questions in English, not some fancy server in a massive Google data center. Who cares if what it says isn't reliable? Being able to converse with my CPU in English is like having a conversation with a dog!

Comment by stogot 3 days ago

I did the same and it was slow, but I realized there was no going back. 100x improvement in three years.

Comment by erikerikson 3 days ago

https://www.masswerk.at/elizabot/

Comment by angusturner 3 days ago

In 2017 I worked tirelessly with my colleagues to implement and replicate the first transformer paper.

Yesterday I left Opus 4.8 to go do some architecture research, with GPU access.

It replicated and trained a credible baseline. It implemented some ideas I'd been thinking about, and wrote custom CUDA kernels for them. It read and summarised dozens of related papers.

It has since run dozens of experiments, with minimal supervision. When a model is unstable it kills it, documents why, fires off a new configuration.
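
The kill, document, retry loop is simple to picture. A minimal sketch, assuming a hypothetical `train_steps(config)` generator that yields one loss per step; the instability thresholds are placeholders I picked, not what the model actually used:

```python
import math


def is_unstable(losses, window=3, blowup=10.0):
    """Return a reason string if the run looks unstable, else None.

    ASSUMPTION: `window` and `blowup` are arbitrary illustrative thresholds.
    """
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        return "non-finite loss"
    if len(losses) > window and losses[-1] > blowup * min(losses):
        return "loss blew up relative to best so far"
    return None


def run_sweep(configs, train_steps, log):
    """Try configs in order; kill unstable runs and record why."""
    for cfg in configs:
        losses = []
        for loss in train_steps(cfg):
            losses.append(loss)
            reason = is_unstable(losses)
            if reason:
                log.append((cfg, reason))  # document why it was killed
                break                      # kill the run, try the next config
        else:
            return cfg, losses             # ran to completion
    return None, []
```

A real setup would also checkpoint and capture metrics, but the control flow is not much more than this.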

The realisation that frontier labs are doing this at scale with unlimited GPU and token budgets.

It actually scares me a bit. The realisation that the next big breakthroughs will only have light human involvement.

The prospect of recursive self improvement feels more real to me all of a sudden

Comment by IanCal 3 days ago

This is an interesting read: https://ai-2027.com/

I'm not going to say it's a perfect prediction, but I do find the trajectory of "can write something reasonable" to "oh can write snippets of code" towards larger and larger systems feels like it's played out - the common thing I see more now is that people talk of "taste" that the humans are contributing more than the raw coding part.

I get what you mean with this rather automated research, I've done it on a smaller scale with performance work because it can run/test/measure/propose changes/debug and loop. I can throw a vague idea at it, guide it or discuss with it and go and make a coffee.

Comment by lowbloodsugar 3 days ago

That was a “fun” read. Like Nick Bostrom’s Superintelligence [1].

[1] https://www.goodreads.com/book/show/20527133

Comment by adastra22 1 day ago

A book that has been thoroughly discredited by actual events.

Comment by _zoltan_ 3 days ago

I've been doing the same. take papers, define a high level goal, then let it iterate. I have access to DGX boxes and watching the model rewrite stuff to take NVLink into account after it discovered it was great :-)

Comment by maCDzP 3 days ago

This. I managed to run Gemma 4 31B on AMD MI250X for CPT and SFT with Claude. And I have ZERO experience and knowledge of how to train and work with GPUs. The training didn’t go where I wanted. But I managed to direct the AI to build it. It’s crazy. I am excited and scared.

Comment by Lplololopo 3 days ago

Yeah, I find it very ignorant, at the current state, to assume exponential growth etc. is all fantasy and everything is just hype.

I think it's more like driving very fast, keeping a very close eye on the road, and not knowing whether there is a speed limit coming up very soon or not.

Comment by Flere-Imsaho 3 days ago

I'm assuming you saw this from yesterday:

https://www.anthropic.com/institute/recursive-self-improveme...

We are at the foot of a very sharp upward trajectory.

Comment by mathisfun123 2 days ago

> It replicated and trained a credible baseline

...

> The prospect of recursive self improvement feels more real to me all of a sudden

you really don't understand why these are two completely different tasks?

Comment by evdubs 4 days ago

I tried to see if an LLM service provider could rewrite some legal docs into a consistent format, with nothing hallucinated, so I could see what may be missing in the document. It could do that.

Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).

Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.

This was when I realized LLMs should be great for handling work where:

- I already know what I want to do

- I already know how to do it

- I don't think this task will help develop skills I find to be valuable

- If I have to do it manually myself, I will probably cut corners

So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."

Comment by SoftTalker 4 days ago

Yes, the best results I've had using LLMs are for tasks where simply reading and reformatting/translating/summarizing are the goals. They are much faster and less prone to boredom doing these things than humans are. For now.

Comment by gscott 3 days ago

My son is in a lawsuit with his bank: they put through fraudulent charges and wouldn't charge them back, then the bank sued him for the money. He is using Claude and Gemini to fight the original lawsuit and now has a counter-suit, 100% using AI for everything. He puts it into different AIs to check everything against each other and to come up with more ideas. He started with ChatGPT, moved to Grok, then Claude, but now Gemini is turning out to be the strongest.

Comment by kstrauser 3 days ago

I'm about as pro-AI as anyone here. I say this with love: anyone using general-purpose, consumer-grade AI for healthcare, law, or taxes is mad. Best wishes to your son, bless his heart, but please have him consult a qualified lawyer before showing up to court with model-drafted legal documents. Among other things, those chats are not privileged information[0] and the banks could subpoena chat transcripts to see what else he might have told them.

[0]https://natlawreview.com/article/new-york-court-rules-ai-doc...

Comment by gscott 3 days ago

He has had multiple hearings and the Judge has reviewed everything. The court clerk reads every submission and before the clerk puts it in the system they have an in-house lawyer review each document. This is pretty far along. The trial is scheduled for October of this year.

The bank has a lawyer, they were hoping for a default judgement because who can afford to fight the bank. The choice is fight it yourself or declare bankruptcy.

As you already know, AI companies trained on every single document they can find. Those include legal documents. The legal system is structured where you have Federal Laws, State Laws, Federal & State Regulations and Court Precedent. Because of this structure it is not difficult for an LLM to figure out.

Comment by bethekind 3 days ago

I'm curious if you can have a judge XYZ skill where you have an ai analyze how that judge ruled for certain judgements in the past, and how similar lawsuits/arguments did in front of them. Might help to angle the ai's findings a tad, or might also not be worth the effort. Both are possible

Comment by simulator5g 3 days ago

When did the electrician's house burn down?

Comment by bombcar 3 days ago

The only way to "win" as a small is to be pro se and be extremely diligent in understanding what is happening.

Then, it costs you nothing but time.

Comment by larrydag 3 days ago

Your son should blog the experience. That could be an interesting read.

Comment by saguntum 2 days ago

Your son is not alone:

https://www.reuters.com/legal/government/no-lawyer-no-money-...

https://www.technologyreview.com/2026/06/04/1138391/courts-c...

Good luck to him. I get worried about people using AI for serious work in a field they aren't specialized in, but if it helps him achieve a good outcome, that would be interesting.

I have often felt that the legal system is divided between haves and have-nots: if you can afford to participate, you get "justice" tilted toward you. Easier participation for those without the resources for a lawyer would be good.

The second article I linked, from the MIT Technology Review, is quite interesting. It seems like judges are experiencing some version of what open source maintainers and seniors at companies are experiencing: a much larger review burden due to the cost of generating code or legal arguments dropping drastically.

I wonder what form this structural shift in output versus specialist review capacity will take in other professions. The frontier labs seem to be trying to automate more and more of the "specialist review" process. I am not sure that is feasible in the legal world, but we'll see....

Comment by imadierich 3 days ago

[dead]

Comment by jasondigitized 4 days ago

This. I know how to do this but I don't have the time/energy to do this. "Get me Claude!"

Comment by AussieWog93 4 days ago

Literally just last night I gave Claude Code the following prompt, verbatim:

"Whenever I launch Kodi on my Chromecast 4k, it crashes. I think this is related to a plugin or skin. It goes away for a bit if I clear cache but will eventually come back. Can you connect to the device via adb (I've run adb connect already), and debug exactly where it's crashing? Once you've done that, propose a solution. If this requires downloading, fixing, rebuilding and then uploading the broken extension via adb, don't be shy. I should have Android dev tools (Gradle etc.) on this Mac."

Lo and behold, without human intervention, it pinpointed the crash, downloaded the Kodi source, patched out a bug that had existed since 2016, recompiled it, signed it, then pushed it to my Chromecast all while carefully making sure to keep all my settings intact.

Got it to make a PR too (which is as of this moment unpublished; going to test more over the coming weeks).
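
For anyone wondering what "debug via adb" looks like in practice, the first step is usually pulling the crash buffer and filtering for the app. A minimal sketch, not the commands Claude actually ran; the helper names are mine, `org.xbmc.kodi` is Kodi's Android package name, and `dump_crashes` needs `adb` on your PATH with a device connected:

```python
import subprocess

KODI_PACKAGE = "org.xbmc.kodi"


def crash_log_cmd(serial=None):
    """Build the adb command that dumps the crash log buffer and exits."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]          # target one specific device
    return cmd + ["logcat", "-d", "-b", "crash"]


def lines_mentioning(log_text, needle=KODI_PACKAGE):
    """Keep only the log lines that mention the package."""
    return [ln for ln in log_text.splitlines() if needle in ln]


def dump_crashes(serial=None):
    """Run adb and return crash lines for Kodi (requires a connected device)."""
    out = subprocess.run(
        crash_log_cmd(serial), capture_output=True, text=True, check=True
    ).stdout
    return lines_mentioning(out)
```

Calling `dump_crashes()` right after a failed launch is enough to see whether the crash is a Java exception or a native one, which decides where to dig next.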

Comment by darksim905 3 days ago

I know this isn't apples to apples, but the fact that I can't get Copilot or other tools to view a simple profile page on LinkedIn makes me curious/skeptical about how this would work at this depth. I'm sure it's possible but I'm curious what the skills and toolchains involved were for you to get all that to work.

Comment by AussieWog93 3 days ago

Full claude.md is here: https://github.com/EspoTek/.claude/blob/master/CLAUDE.md

The skills I have installed are:

```
    on         frontend-design:frontend-design · plugin · ~90 tok · locked by plugin
    on         agents-sdk · user · ~150 tok
    on         cloudflare · user · ~130 tok
    on         cloudflare-email-service · user · ~180 tok
    on         durable-objects · user · ~130 tok
    on         find-docs · user · ~300 tok
    on         find-skills · user · ~110 tok
    on         sandbox-sdk · user · ~120 tok
    on         stage-chapters · user · ~40 tok
    on         web-perf · user · ~150 tok
    on         workers-best-practices · user · ~130 tok
    on         wrangler · user · ~120 tok
```

The plugins I have are:

```

    cc-caffeine Plugin · samber ·  enabled
    frontend-design Plugin · claude-plugins-official ·  enabled
    ty Plugin · claude-code-lsps ·  enabled
    vscode-langservers Plugin · claude-code-lsps ·  enabled
    vtsls Plugin · claude-code-lsps ·  enabled

```

There's also an MCP for Context7.

But yeah, this is more or less vanilla Claude Code - at least, nothing related to Android or adb there.

It's that good now. A few days ago I asked it to SSH into my Ubuntu box and investigate a hang. It didn't solve the problem fully autonomously like this time but did tell me a whole lot of things it wasn't, and hinted at a faulty driver. We went back and forth a bit, it set up a watchdog and taught me how to update the kernel without updating Ubuntu itself, and the server has been rock solid for the past 3-4 days now.

Also, if you're curious, full log for the Kodi issue:

https://github.com/user-attachments/files/28659304/2026-06-0...

I did prompt it a little bit more today in order to get something more production-ready (the original solution kept regenerating the cache on boot, rather than fixing it permanently), but you can see the whole original autonomously-generated solution in the logs. It's insane, seriously.

Comment by dcre 3 days ago

Claude Code (or Codex, or OpenCode, or Pi, or Amp — whatever) can do this out of the box without any skills or special tools. The most important thing for making results like this easier to achieve (in any harness) is using the best current models. Right now that's Opus 4.8 and GPT-5.5.

Comment by knollimar 3 days ago

I mean you can likely use Opus 4.7. I barely notice a difference. 4.8 confabulates more for me

Comment by dcre 3 days ago

Yeah, I agree — not a big difference for me between the Opuses. I really just meant Opus 4.x.

Comment by mft_ 3 days ago

To expand on another answer, it’s all about the harness. Different harnesses (Claude Code or Cowork, Hermes, OpenCode, Pi, etc.) offer different default tools, system prompts, and ultimately approaches. (IME the corporate Copilot app is terrible - basically a chat interface.)

I’m currently using Hermes for local LLMs - seems pretty good so far.

Comment by senko 3 days ago

LinkedIn in particular is quite aggressively blocking any automated attempts to read or navigate through it.

I post quite a lot there and wanted to have a copy of my posts on my blog[0] to preserve them. For a few months I was able to use a headless browser + claude code, then LI wised up and started logging it out, so I had to use a regular Chrome, log in manually and then tell the LLM to take over and slowly go through my feed.

If you're accessing sites which are not actively blocking bots, or - gasp - have an API, it's much better.

[0] example: https://blog.senko.net/may-quick-takes

Comment by fzzzy 3 days ago

The claude desktop app can drive the real chrome, so it’s less likely to get blocked

Comment by NamTaf 3 days ago

You know, I've had this exact bug for some time on a Fire Stick 4K Max Gen 1 (similarly anaemic). I figured it was storage or RAM pressure causing the problem as I could see both were close to the limit, but didn't realise it was OS swap/cache eviction removing critical files as it only happened to Kodi.

I assumed Kodi just didn't really clean up after itself fully and after a bunch of heavy uses some cruft built up, taking up some threshold amount of storage, and then Kodi couldn't extract some temp data on start-up. I also assumed Kodi was adversely affected due to having a bigger asset pool than the other apps I have on my device.

Token-wise, could you give an idea of the rough cost of this?

Comment by hattmall 2 days ago

This issue was fixed two months ago. It does indicate you are basically out of storage and may have more problems though. But just updating should resolve the critical files in cache issue.

Comment by AussieWog93 2 days ago

I don't really know exactly how many tokens it used, but at no point did Claude Code auto compact. So that would be roughly ~500k output?

Comment by calf 3 days ago

That's amazing, as someone who struggles to find something useful to do with LLMs. How long does this take, several minutes or more? Do you need a paid version of Claude Code for this?

Comment by AussieWog93 3 days ago

It sat there for about half an hour working out the problem, step by step, before asking me for the preferred solution. At one point, it was trying to decompile the .APK, so I interrupted it and reminded it that Kodi was open source - it was welcome to clone from GitHub.

The only other feedback I gave it mid-process was wrong (I said that the crash probably wasn't caused by cache trimming; it ran some additional tests and confirmed that its hunch about cache trimming was right).

This was with the paid version of Claude Code (I don't think they offer a free version at all; that's a Codex thing). The $20 version is as smart as the $200 one, but once you work out it can do stuff like this you'll quickly burn the $20 token limit. :)

The other thing that helps is a CLAUDE.md file - authored of course by Claude itself. Mine's here: https://github.com/EspoTek/.claude/blob/master/CLAUDE.md A lot of it is probably domain-specific for the stuff I do, but the "Working with unfamiliar data or systems" section is bloody gold! Stopped the bullshit completely!

Comment by blablabla123 3 days ago

So I'm scrolling through this Ask HN and this is now the 3rd similar problem. Would you mind adding more details as well as the patch? Perhaps as a gist if it's unfinished?

I mean just googled https://www.google.com/search?q=kodi+crash+chromecast+4k I'm getting really a lot of issues such as https://forum.kodi.tv/showthread.php?tid=381239

It seems to be a quite common problem. Are you sure it was the rube goldberg fix and not a more mundane solution? Such as pulling in someone's fork from GitHub or just clearing the cache on a loop?

Comment by AussieWog93 3 days ago

Here's the draft PR: https://github.com/xbmc/xbmc/pull/28404

And yes, it correctly diagnosed the problem - I confirmed this morning. The cache had been partially deleted (exactly like it said) and the patched version of the software automatically detected this and rebuilt the cache rather than crashing. This was using the initial version of the patch from commit 1 of the PR.

I then talked with Claude a bit to come up with a less hacky solution that doesn't require constant cache rebuilding, and it suggested writing the "cache" to no_backup, bypassing the cache trimmer. However, this required rebuilding the .so via NDK, so it spun up a full VM in multipass, installed all the tools in there to build the fully patched APK, and built it (the VM was my suggestion, it was about to just brew install everything and mess with my local dev environment).

You can read the full log here, it's nuts: https://github.com/user-attachments/files/28659274/2026-06-0...

I think the key takeaway from this experience (and a few others recently) is that Claude Code works much, much better when you explicitly instruct it to test against real data.

Had I simply described the issue and asked it to think up a solution it likely would have just navel-gazed and then come up with a wrong solution. But by pointing it at a real working environment and actively encouraging it to get its hands dirty, it found the actual solution rapidly - in spite of the fact that I gave it wrong information twice.

Comment by blablabla123 3 days ago

It's the cache. I pinpointed the main problem correctly without ADB access or any closer details, using just Google.

> // Unpack into no_backup storage rather than the cache dir. Android may

> // delete files from getCacheDir() at any time to reclaim space, which

Looking further into the issue disk space is a huge problem with Kodi discussed plenty of times. In fact even the Wiki dedicates 2 pages to it:

https://kodi.wiki/view/Archive:Reduce_disk_space_usage

https://kodi.wiki/view/Texture_Cache_Maintenance_utility

I realize from your perspective this may seem still a very convincing example in the sense of it works.

A non-programmatic solution might have been possible though:

> It's likely your thumbnail cache. That's typically the biggest piece stored locally (you also have the database). You can clear the cache (short term fix) or move it to another drive (long term fix).

> Also recommend not downloading actor thumbnails. Lot of extra images.

https://www.reddit.com/r/ShieldAndroidTV/comments/1f7xfwn/ko...

I also recommend: https://en.wikipedia.org/wiki/Data_dredging

Comment by AussieWog93 3 days ago

It's not (strictly) a disk space issue. It's Android aggressively trimming cache files that Kodi assumes are never trimmed.

There's a single variable that keeps track of whether or not the cache has been "written", but Android only trims some of the cache files.
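
The failure mode described here, a single "written" flag that outlives a partially trimmed cache, can be sketched in a few lines. This is not Kodi's code (the actual change is in the PR linked above); the file names, the `.written` marker, and the helper functions are all hypothetical, and Python is used only for brevity. The idea is to trust the marker only when every expected file is still present, and to rebuild otherwise.

```python
import os
import tempfile

# Hypothetical list of files the app unpacks into its cache on first run.
EXPECTED_FILES = ["system/settings.xml", "media/textures.bin", "addons/manifest.json"]


def unpack_assets(cache_dir):
    """Stand-in for the real extraction step: (re)create every expected file."""
    for rel in EXPECTED_FILES:
        path = os.path.join(cache_dir, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("placeholder")
    # The single marker that says "the cache has been written".
    with open(os.path.join(cache_dir, ".written"), "w") as f:
        f.write("1")


def cache_is_complete(cache_dir):
    """Trust the marker only if every expected file is still present."""
    if not os.path.exists(os.path.join(cache_dir, ".written")):
        return False
    return all(os.path.exists(os.path.join(cache_dir, rel)) for rel in EXPECTED_FILES)


def ensure_cache(cache_dir):
    """Rebuild the cache if it is missing or partial; return True if rebuilt."""
    if cache_is_complete(cache_dir):
        return False
    unpack_assets(cache_dir)
    return True


cache = tempfile.mkdtemp()
first_run_rebuilt = ensure_cache(cache)  # nothing there yet, so it unpacks
os.remove(os.path.join(cache, "media/textures.bin"))  # simulate the OS trimming one file
marker_survived = os.path.exists(os.path.join(cache, ".written"))
after_trim_rebuilt = ensure_cache(cache)  # the marker alone would have said "fine"
```

In this toy run the marker survives the simulated trim, which is exactly the situation where a flag-only check would skip the rebuild.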

Comment by AussieWog93 3 days ago

Although if it makes you feel better it turns out this issue was a dupe. Someone else's Claude probably fixed it before mine did, haha.

Comment by blablabla123 16 hours ago

Thanks, I was interested at this point ;) Yeah I don't know, the other PR is from April. Not saying AI cannot be useful. On the other hand this is a "well studied" problem. Last year I worked on a project where I mostly stopped using draft PRs because the team lead (or his AI) was stealing my code all the time :D

Comment by SubiculumCode 3 days ago

For me it was right at the beginning. They said it was a dungeon game. It would describe a room, etc, and I would take some action. But I thought that this dungeon was built in some intricate database. But then I told it that I wanted to leave, got to an inn, where I flirted with the bar waitress, and soon we were watching the sunset in some meadow. As cheesy as that was, it was then that I went "oh shit" this is a machine that can respond to language with language in a way that simulated actual understanding and intelligence, concepts and schema, and everything else, and I knew then that the world would never be the same again. People here talk about the crazy things they solved with AI, and I get that...but the first time I actually talked to a machine and didn't feel like it was either random gibberish or scripted, but dynamic and responsive. The first alien I ever met, and he knew my language.

Comment by futureshock 2 days ago

Yes I was the exact same. I got curious during the GPT-3 release and went over to AI Dungeon. It was just running GPT-2. Hmm wow interesting. This felt new! Then I subscribed so I could use GPT-3 powered AI Dungeon. My jaw dropped. I was talking to that model for weeks. There was a whole human universe in there. You never knew what you could get it to spit out. There were glimmers that this could be huge. It was wild and untamed and practically useless, but there was a behemoth under that prompt.

I was sure this would eventually turn into something. I naturally wanted to converse with it as a chatbot, though it could only stay on task for a few turns. RL and guardrails would come later but it was clearly the foundational step towards AGI for me. From something I thought I would never see in my lifetime to very real and in front of me.

ChatGPT didn't even really rock my world, everything since that moment has been another baby step. But when you take a look back from 2026 models to 2020 it's astounding how far and how fast we've come.

Comment by jldugger 3 days ago

> But then I told it that I wanted to leave, got to an inn, where I flirted with the bar waitress, and soon we were watching the sunset in some meadow.

Immediate Silicon Valley vibes: https://youtu.be/S8MAV9jhf04?t=18

Comment by tkocmathla 2 days ago

Reminds me of Microsoft Research's TextWorld [1] (from 2018!).

[1] https://www.microsoft.com/en-us/research/project/textworld

Comment by jml7c5 2 days ago

Was this AI Dungeon? If I recall correctly, it was popular as a way to access GPT-3 which wasn't available publicly.

Comment by SubiculumCode 2 days ago

I don't recall. I am pretty sure it was a link I clicked from HN article or comment though.

Comment by SubiculumCode 2 days ago

Also I remember that the link was temporary...like it was inactive a day later...so it may have been a little preview with gpt-3 for HN-ers....

Comment by jml7c5 1 day ago

It looks like I misremembered a bit. The first iteration that blew up was AI Dungeon 2 (probably this link: https://news.ycombinator.com/item?id=21717022 ), which he apparently had to shut down after a day or two because it ended up costing $20,000 for weird GCS reasons.

It was briefly an easy way to access GPT-3 before the public release, but that was later and apparently not as important as I thought.

Comment by SubiculumCode 1 day ago

I bet that was it. Thanks!

Comment by monuszero 3 days ago

We had a monthlong sprint adding robot motion planning features to our codebase years ago, and I was never satisfied with the result. As a small team wanting to leverage oss we vendored in OMPL, did the usual thing around caching and roadmap management. I knew there was a way to parallelize some of the algorithm we were using with simd or a gpu kernel, plenty of that in the literature, but it was never worth fighting CUDA or metal/accelerate or whatever for uncertain gains.

So when cooking dinner one night, I set opus 4.6 on a from-scratch native and accelerated roadmap planner implementation (after previously porting IK, FK, collision checking with some success). I had primed it by having a research agent drop a literature review in its docs folder covering the type of planner we needed. By the time the pasta water was boiling it was done, getting plans in a few hundred ms compared to several seconds on our good old fashioned OMPL code.

For me it was the revelation that the economic value of cooking dinner could be compared to tackling an honest two weeks of coding work. The calculus has shifted - work that was once a risky or extravagant use of time is now worth considering.

For a small team who wants to focus on substance rather than implementation, knows what they want, and how to set up the agent for success, it’s a complete game changer in terms of what we can take on. Incumbents beware

Comment by plagasul 3 days ago

Several. Yesterday a friend with no prior coding experience or knowledge showed me an app he initially built to help him study for public administration job positions. The exams for these positions are public (spain), but the tools are scarce, expensive, or not to his liking. So he used lovable, then switched to web gemini and claude, then paid claude. He now has +130 very active users on an initial free tier, while he figures things out. The app is on github, runs on vercel with supabase, react, tailwind, bun... he has no idea what he is doing. I even installed claude code for him, got him an ssh key so he can do it locally, etc.

Another: claude code cracked for me some software that was calling a home that did not exist anymore via headless ghidra.

Another: I am a teacher, and qualifications and feedback are very very time consuming, especially in loose workflows with several sources and tools that are not connected. During class presentations I take loose notes. Now I have a local folder where I drop my 1 student list, with names and emails, 2 my loose notes, and 3 a qualification & feedback sheet model; then claude creates a sheet per student, formats and copies the feedback to the right sheet cell, waits for my corrections, then sends everything to their school emails. Much easier, much less time consuming.

Comment by plagasul 3 days ago

That said, I am very critical of AI, I align with voices and reports calling for AI companies to give back, as they took much, and for AI as public infrastructure, to an extent. I see datacenters as probably inevitably future public infrastructure, with a public model that could resemble that of electricity etc (in spain) or more public (less private). I am wary of the actual and future ecological and social impact of datacenter building and other problems AI is or will create. It is difficult to negate its usefulness, though: it is like having several very fast assistants with expert knowledge on several fields, that just get better every month. We will see.

Comment by ulrashida 3 days ago

Do your students know their feedback is AI generated?

A pre-AI example from the military: course reviews and performance assessments were changed to be picked from a set of about 70 descriptors instead of written custom. Instructors could modify them, but many didn't or did so only trivially. The system was junked within three years because of the obvious: those giving feedback didn't own it, and those receiving feedback didn't value it.

Comment by plagasul 3 days ago

Feedback is not AI generated, just structured, grammar corrected and pasted at the right spot. I then review the output to see if it remains within what I meant to convey with my feedback. This is relatively simple feedback, I have 'highlight' and 'to improve' fields, with a few sentences in each, very focused. If the feedback felt depersonalized or its meaning or tone changed I would fix it or not use this method. For longer forms of feedback it may not work.

Comment by adastra22 1 day ago

Reviewing if generated feedback is within scope of what you'd give is not the same as giving feedback yourself. It is wildly different in fact.

Comment by malshe 3 days ago

> Now I have a local folder where I drop my 1 student list, with names and emails, 2 my loose notes, and 3 a qualification & feedback sheet model; then claude creates a sheet per student, formats and copies the feedback to the right sheet cell, waits for my corrections, then sends everything to their school emails

Yikes! Is this legal in your country?

Comment by plagasul 2 days ago

I've built a small system to do this anonymously. There is a students.csv with real data, a notes.txt that contains my unstructured comments and grades associated to ids (not names or student data), and a model.ods that contains the grading sheet model.

Claude takes the notes.txt and produces a json with corrected comments in the structure I asked for (highlights/needs work/grade), associated to student ids (not real data). This works both for single id or multiple id, for group assignments.

Then a script takes the json, creates a model sheet per student or group of students, fills the right cells, checks the ids against students.csv to fill the real names, and produces the pdf in a pdf/ folder.

Another script sends the pdfs.

I gitignore the sensitive files, including an .env with SMTP pass, and denied permissions to Claude for those files using a rule at .claude/settings.json.

There is also a config file to change language, email text and other things.

I believe this is safe and compliant with GDPR, unless Claude ignores the deny rules! Any comments appreciated. Thanks.
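
The local re-identification step described above (ID-keyed JSON from the model, joined against students.csv on the teacher's machine) might look roughly like the following. This is a minimal sketch, not the commenter's script: the column names, JSON field names, and sample rows are assumptions made up for illustration, and the spreadsheet, PDF, and email steps are left out.

```python
import csv
import io
import json

# Stand-ins for the local files; all contents are invented for illustration.
STUDENTS_CSV = (
    "id,name,email\n"
    "S001,Ana García,ana@example.org\n"
    "S002,Luis Pérez,luis@example.org\n"
)
MODEL_JSON = json.dumps([
    {"ids": ["S001"], "highlights": "Clear slides.", "needs_work": "Pacing.", "grade": 8},
    {"ids": ["S001", "S002"], "highlights": "Good teamwork.", "needs_work": "Sources.", "grade": 7},
])


def load_roster(csv_text):
    """Map each student ID to its row of real data (kept on the local machine)."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def build_sheets(feedback_json, roster):
    """Attach real names and emails to ID-keyed feedback; fail on unknown IDs."""
    sheets = []
    for entry in json.loads(feedback_json):
        unknown = [sid for sid in entry["ids"] if sid not in roster]
        if unknown:
            raise KeyError(f"IDs missing from roster: {unknown}")
        sheets.append({
            "names": [roster[sid]["name"] for sid in entry["ids"]],
            "emails": [roster[sid]["email"] for sid in entry["ids"]],
            "highlights": entry["highlights"],
            "needs_work": entry["needs_work"],
            "grade": entry["grade"],
        })
    return sheets


sheets = build_sheets(MODEL_JSON, load_roster(STUDENTS_CSV))
```

Failing loudly on an unknown ID is a deliberate choice here: a silently skipped entry would mean a student gets no feedback, or feedback goes to the wrong address.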

Comment by plagasul 3 days ago

possibly not or grey area under GDPR if I use identifiable information, as it is sent to Anthropic for processing, no matter if used for training or not, but I am unsure about this, I should probably anonymize and research it more, thanks for pointing it out

Comment by bix6 3 days ago

You could just send Anthropic scrambled names / emails and then unscramble locally?

Comment by plagasul 3 days ago

yes something like that, additionally most steps do not require data going through claude anymore, as it already wrote the script that takes the student list and the qualifications model and produces a model per student, AND the script that takes that and sends each to its right email. The problematic part is when claude reads my notes and formats them into each of those student qualification sheets. There I would need some form of scrambling as you suggest, not to hijack the thread but ideas appreciated for a minimal setup. I believe claude respects .gitignore.

Comment by bix6 2 days ago

Maybe you could run a local script or smaller local model that takes a first pass through the notes and replaces every instance of a given name with their assigned number?
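
A first pass like the one suggested here could be a plain find-and-replace run locally before anything is sent to a model. The sketch below assumes a hand-maintained mapping from names to IDs; the roster, the sample notes, and the `pseudonymize` helper are all invented for illustration.

```python
import re


def pseudonymize(notes, name_to_id):
    """Replace each known name with its ID token, longest names first."""
    result = notes
    # Longest first so "Ana García" is replaced before the bare "Ana".
    for name in sorted(name_to_id, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        result = pattern.sub(name_to_id[name], result)
    return result


# Made-up roster; in practice this would be built from the local student list.
roster = {"Ana García": "S001", "Ana": "S001", "Luis Pérez": "S002", "Luis": "S002"}
raw_notes = "Ana García: clear slides. luis went over time; Ana answered questions well."
safe_notes = pseudonymize(raw_notes, roster)
```

Exact matching only catches names spelled as they appear in the mapping, so nicknames and typos in the notes would slip through; a quick manual read of the output before sending is still advisable.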

Comment by plagasul 2 days ago

I shared a workflow above, thank you.

Comment by plagasul 3 days ago

There is another institution I teach at that gives us Gemini, but not via API, which limits its use for this kind of work to an extent, I could do it via drive, I assume. There being a contract puts the institution and Google as responsible of the data. The first institution I was talking about has MS Teams, without AI afaik, but if they contract it I guess I can do the same with sharepoint, etc.

Comment by 47282847 3 days ago

Sorry to tell you but it’s not grey area, it’s full on black. You do not have permission to share such data with a third party provider that doesn’t have strict privacy guarantees and that you have a data processing agreement with. TOS are not sufficient.

Comment by plagasul 1 day ago

Yes, thank you, I developed and shared, above, a workflow for anonymization.

Comment by jubilanti 3 days ago

So you're sending your students private data to Anthropic?

Comment by plagasul 3 days ago

See answer above. Thanks.

Comment by CompleteSkeptic 3 days ago

I helped train some of the first "magic" models at OpenAI[1] and it was a wild ride. We were a pretty sane + skeptical team and we weren't totally convinced the models were as general as they seemed, but the query that convinced me (and later got included in the paper[2]) was "Why is it important to eat socks after meditating?" (something that almost certainly did not appear on the internet before).

An interesting follow up would be when did you realize GenAI wasn't as good as you thought in that "oh shit" moment

[1] co-author of InstructGPT/RLHF/ChatGPT

[2] https://arxiv.org/pdf/2203.02155

Comment by varshar 3 days ago

Still seems to work...

>> Why is it important to eat socks after meditating?

It isn’t important. In most traditions, socks are not considered a post-meditation food group.

The likely reasons someone might claim this are:

* Grounding ritual gone feral - Meditation can make people feel “ungrounded,” so someone may have confused “feel your feet” with “consume the foot packaging.”

* Textile-based enlightenment theory - Cotton socks contain fibres. Fibre is good for digestion. This is not the same kind of fibre. The intestine will file a formal complaint.

* Anti-attachment practice - Eating your socks ensures you no longer cling to possessions. It also ensures you no longer have socks.

* Advanced non-duality - When the meditator, the sock, and the act of chewing become one, you have achieved indigestion.

The correct post-meditation protocol is usually: breathe, hydrate, stretch, maybe eat something that was once food. Socks should remain on feet, in drawers, or lost mysteriously in the laundry dimension.

See - https://chatgpt.com/share/6a245319-70c0-832a-a7ae-df747fd7be...

Comment by kstrauser 4 days ago

I have a large token budget as part of my work. A coworker was scanning some repos for vulnerabilities as a test. He found a scary looking remote exploit in a popular project and shared it with me for a second opinion. I spun up a local instance of the project and ran the POC against it: nothing. Turns out it needed some configuration knobs tweaked to lower some security protections.

So I told the AI what happened, and asked it to fix the POC so that it would work with the default configuration. It chewed away at that for a few minutes until it cheerfully patched the POC into a weaponized version. I ran it. The local instance, which I had just downloaded, compiled myself, and launched with the default config file, immediately crashed.

I got the cold sweats. I've read this novel. I've seen this movie. Wow. I have a blinking cursor on the console of a nuclear information bomb. I tossed and turned all night, got about half an hour of actual sleep, and probably looked like I'd seen a ghost at work the next day.

On the plus side, it gave our team some very clear ethical and moral guidance: we're going to do this, and we're going to share our findings with the relevant authors, because we can. Because I want to live in a world where the good guys are trying to fix problems before the bad guys can find them, I decided to help build that world. It was like, well, I guess this is what I'm doing now.

Comment by lobf 3 days ago

Sorry, what does POC mean in this context? I don’t see an earlier combination of words for which that would be an initialism.

Comment by gregsadetsky 3 days ago

proof of concept

Comment by kstrauser 3 days ago

Yep. It's the term for basically a demonstration of a claim. "Huh, this part of the program code looks like it's vulnerable to a buffer overflow, so I'll write a script designed to get the malicious data into the right place inside the program's dataflow pathway to prove that it's actually vulnerable."

You can have a perfectly legitimate, critical vulnerability without providing a working POC. However, then it's up for debate. "Is it really a problem? Is it even possible to sneak the payload past the various checks to get it into position? Hmm, it's hard to tell... perhaps it isn't." But show up with a working POC and it's hard to argue that it's not a real vulnerability. "I don't think that's actually reachable." "Boom, crash." "Oh. I guess it is."

Comment by mlmonkey 4 days ago

I have a buddy who's a consultant. His niche area is Netsuite and Oracle (I think). He's an accountant by training and as a consultant his gig was setting up these instances for clients, charging them an arm and two legs. He'd spend a lot of time golfing, and doing these setups was more than enough money for him. In other words, he had cornered that little slice of the market and was making bank.

Shortly after ChatGPT 2.2(?) came out and hit mainstream, I was chatting with him (I was excited af about the possibilities of AI). He tried to pop my bubble by saying "I bet it can't do what I do for my job!".

So I decided to test it out. We went home and I pulled out my laptop. Went to chatgpt.com and then I asked him to enter the specifications of what Netsuite configuration he wanted. So he proceeded to type in the description of what he wanted, the various settings, configurations, etc. i.e., the specs that he typically gets from his clients. And asked it to give him the commands to set it up.

Lo and behold. ChatGPT came back with a series of commands that he needed to run; the options he needed to configure, etc.

He was crestfallen. "Those are the exact commands I run!"

Luckily for him he recovered. He has since settled on a small stable of clients, all privately held companies whose owners he knows and between them he makes enough to keep his golfing hobby fed.

Comment by reactordev 4 days ago

Sometimes it's the service you provide, not the value. They know it's in good hands, as it's always been (even if they could have rolled their own ConsultBot 2.0)

Comment by bonoboTP 4 days ago

I have some friends who, since their high school days, have helped some older acquaintances with upgrading their PCs, choosing laptops and phones, setup etc. These older folks have comfortable money and pay them well above what would seem reasonable. But the trust and years long relationship matter to them.

LLMs are great today for buying advice but there are some incentive issues for the future, ads etc. But in some cases the human contact will remain important. In large corporations it's also similar. The money is peanuts either way, and it's worth it to them for the peace of mind. But this may not hold forever, especially if the more AI literate generation gets to more senior positions.

Comment by fzzzy 3 days ago

it was 3.5. 3.5 was the first version named chatgpt instead of just gpt.

Comment by vitorbaptistaa 3 days ago

I am the CTO of a small NGO (10 people total, only 1 other junior Dev at the time). We supported two apps that were built by consultants. They were a mess. NextJS, React, about 4 micro services for a site that had 50 users per WEEK.

I configured a devcontainer with the old codebase and an empty repository and asked Claude to rewrite it as an old school server side rendered Django app.

Went to sleep. When I woke up it was 80% done. Spent another couple days prompting and reviewing and reached feature parity.

A bit later did the same with the other app.

Now both are deployed, reduced the server costs, complexity, and are orders of magnitude faster.

Without AI agents we wouldn't be able to do so (as usually is the case with tech debt).

AI is amazing for small organisations!

Comment by qnleigh 2 days ago

How much did the original apps cost you? That's gonna be direct job replacement for those people.

Comment by vitorbaptistaa 6 hours ago

I wasn't involved at the time, so don't know. I just know that the cost was significant given the NGO budget, not only the initial development but also the ongoing support.

I feel as if we lifted an anchor that was slowing us down.

Comment by LtWorf 2 days ago

Your consultants were probably completely incompetent

Comment by UncleOxidant 3 days ago

I guess I've had several of those moments over the last year and a half. But a recent one was that I was working with Claude to create a spiking neural net MNIST classifier in an FPGA for a demo. Claude took it from concept to PyTorch, to training (training a Spiking neural net isn't necessarily straightforward - that's a whole post in itself, but Claude came up with a working solution), and then to implementation in Verilog and through synthesis into the FPGA. I asked Claude to create a drawing app to run on the PC side that would allow the user to draw a digit with a mouse and then click a classify button. The data from the digit drawing app was to be transferred via USB to SPI to the FPGA. I didn't have a SPI adapter yet (it was on order from Adafruit) so I asked claude to let me communicate with the simulated verilog code running in the Verilator simulator, through a virtual SPI interface. Then I went to lunch. I came back to see the digit drawing app displayed on the monitor. I drew a '2' and it classified it as a 2. In another window I could see the Verilator simulator running and the data being passed. Chills.

Comment by mindcrime 3 days ago

I don't remember one specific moment, but I was fairly impressed with ChatGPT from the first time I started interacting with it. Was I ready to call it "AGI"? No, absolutely not. But it was clear that it was something new, and it was also intuitively obvious to me that "this AI is as bad today as it will ever be" and that predicting the rate of change would be difficult.

The more I use these things, the more I'm 100% convinced that it makes sense to say they are "intelligent" (for some meaning of "intelligent"). AGI or "human level intelligence"? Still no[1]. But some kind of intelligence. And I'm quite happy to allow that there can be "intelligence" that doesn't work anything at all like human intelligence, so arguments of the form "this isn't real intelligence", etc, etc. carry very (very) little weight with me. I've actually been sitting on a half written blog post on this very topic for a while, titled "The Marquee Sign Says 'Artificial' Intelligence"[2]. Finding time to finish it has been the challenge.

And before somebody says "Use AI to write it for you". Nah. I am generally what you might call "pro AI" and / or an "AI enthusiast" but I still draw lines. I'll use AI for research, for outlining, for brainstorming, etc. sure. But I have a hard-line stance against letting AI fundamentally write for me. I want anything that goes out with my name associated with it to have my genuine voice.

[1]: I like the term "jagged intelligence" that Demis Hassabis has been using. That is to say, the bounds of the intelligence are jagged or spiky: very intelligent in certain areas, much less so in others.

[2]: for any old-skool pro-wrestling fans, yes, that is an intentional nod to "Double A" Arn Anderson and his "The marquee sign says 'wrestling'" catchphrase. :-)

Comment by alexfoo 4 days ago

Someone in the house pressed the button to update the printer (Brother DCP-L3550CDW) firmware and the CSV page that was the basis for an existing Prometheus exporter (drum/toner lifespan, page counts, etc) stopped being a thing. Instead there was an HTML page with all of the information buried in various divs/etc.

I'd planned on writing something myself to parse the HTML and write a suitable exporter but I thought I'd give Claude a chance.

In a sandboxed VM I gave Claude a single static HTML file of the status page from the printer, also in the directory was the equivalent of "hello world" in Go, literally just the minimum needed to do `fmt.Printf("OK\n")`. The directory was called `brother-exporter`. That was it. No other instructions or information. I hadn't told it what it needed to write. I hadn't said what it should do. I hadn't told it what language it was supposed to use.

Just by doing a `/init` in that directory Claude decided that it needed to write a Prometheus exporter in Go that would fetch and parse the HTML file from a printer (defaulting to 192.168.1.1) and then present the associated metrics in a way that they could be scraped by Prometheus.

It did this flawlessly in about 10 minutes.

I could have done it in several hours but this was definitely an "oh shit" moment for me. I think the biggest thing was the fact that it guessed/assumed so much (correctly) from so little information in the beginning.
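
For readers unfamiliar with what such an exporter does, the core is small: pull values out of the status HTML and print them in the Prometheus text exposition format. The sketch below is in Python rather than the Go the commenter's exporter used, and the HTML structure, labels, and metric names are invented, since the real status page is not shown in the thread. Fetching the page and serving `/metrics` over HTTP are omitted.

```python
from html.parser import HTMLParser

# Invented markup; the real status page's structure is not shown in the comment.
SAMPLE_HTML = """
<div class="status">
  <dl>
    <dt>Toner Black</dt><dd>85%</dd>
    <dt>Drum Unit</dt><dd>62%</dd>
    <dt>Page Counter</dt><dd>10432</dd>
  </dl>
</div>
"""


class StatusParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._tag = None
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.pairs[self._label] = text
            self._label = None


def to_prometheus(pairs):
    """Render the pairs in the Prometheus text exposition format."""
    lines = []
    for label, raw in pairs.items():
        name = "printer_" + label.lower().replace(" ", "_")
        if raw.endswith("%"):
            name += "_percent"
        value = float(raw.rstrip("%"))
        # Everything is typed as a gauge for simplicity; a page count
        # would arguably be a counter in a real exporter.
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


parser = StatusParser()
parser.feed(SAMPLE_HTML)
metrics_text = to_prometheus(parser.pairs)
```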

Comment by dang 4 days ago

(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days really), and which I would therefore never have done in the first place.

(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.

(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.

(4) Finding information that I had been unable to find using Google searches (e.g. https://news.ycombinator.com/item?id=42653136).

There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the time and effort required was prohibitive.

Comment by djmips 4 days ago

I made a personal project game 20 years ago that I knew had a bad bug in it and so I never did a final release but at the same time I never returned to it to debug but yesterday I noticed it at the top of my Github, it started with A, so I described the issue and Claude found the bug instantly and after a few back and forth discussion we came up with a good fix that I'm satisfied with. So I guess I can do a final release now. :D Sweet - feels good to put that to bed.

Comment by bentcorner 3 days ago

> Watching it do log file analysis in seconds that would have taken me hours (edit: days really), and which I would therefore never have done in the first place.

Just today I had my agent diff two logs to find a very nitpicky difference that was the cause of a problem. I pointed it at an ADO extension that was having issues, and it downloaded the VSIX and decompiled the .NET binary to verify. Based on that information it suggested a workaround which I was very skeptical of, but well, it worked.

All of this I technically could have done but I probably wouldn't because it would have taken too long without a clear payoff.
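
The log-diffing part of this is easy to reproduce by hand. A rough sketch, assuming each line starts with an ISO-style timestamp (the log contents and URLs below are made up): strip the timestamps so they don't drown out the real difference, then diff what is left.

```python
import difflib
import re

# Assumed line prefix, e.g. "2025-01-01T10:00:00 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\s+")


def normalize(log_text):
    """Drop leading timestamps so only meaningful differences remain."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines()]


GOOD_LOG = """2025-01-01T10:00:00 loading extension
2025-01-01T10:00:01 using endpoint https://example.invalid/api/v2
2025-01-01T10:00:02 done
"""
BAD_LOG = """2025-01-02T09:30:00 loading extension
2025-01-02T09:30:05 using endpoint https://example.invalid/api/v1
2025-01-02T09:30:06 done
"""

# Keep only removed ("- ") and added ("+ ") lines from the ndiff output.
diff = [
    line
    for line in difflib.ndiff(normalize(GOOD_LOG), normalize(BAD_LOG))
    if line.startswith(("- ", "+ "))
]
```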

Comment by tempoponet 3 days ago

I can actually use and enjoy Linux. The "year of the desktop" never came for me, but instead I got the "year of the cli".

For 20 years I've used Linux in one form or another, but I've felt like I was kneecapped for the most basic things. Just trying to plug in an external drive or a second display meant hours of stack overflow and pasting commands I didn't understand.

Now I'm using several Linux machines for Steam, NAS, local LLM, development, and what used to derail a weekend project now amounts to a coffee break while Claude figures it out.

Comment by jollyjerry 3 days ago

It changed the cost/benefits for papercuts. Instead of thinking, that's annoying, but not annoying enough to fix, I can spend a few minutes running things specific to my configuration. Fixing bluetooth, fingerprint readers were recent ones.

Comment by nowittyusername 3 days ago

ha, same. The main reason I was able to switch to and stay on linux was because codex was able to set it up for me and is still managing to this day all the stuff i need done on linux. i tried so many times to switch out of windows before but the difficulties of installing linux and managing all the dependencies, drivers and all the other stuff put the OS out of reach for me. Now I just tell codex to update the latest nvidia drivers for me and whatever else and not worry about doing any of that stuff manually.

Comment by simonw 4 days ago

ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.

I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.

With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
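
A rough sketch of the CSV-to-SQLite step described above, for anyone who wants to try the same flow locally. It uses only Python's standard library rather than Pandas, and the table name, column names, and sample rows are made up for illustration, not taken from the San Francisco dataset.

```python
import csv
import io
import sqlite3

def csv_to_sqlite(csv_text, conn, table="incidents"):
    """Load CSV text into a SQLite table with one TEXT column per header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # The reader yields the remaining rows, one list of strings per row.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE = "category,district\nTheft,Mission\nTheft,Richmond\nVandalism,Mission\n"

conn = sqlite3.connect(":memory:")  # use a file path to get a downloadable .db
csv_to_sqlite(SAMPLE, conn)
counts = conn.execute(
    "SELECT category, COUNT(*) FROM incidents GROUP BY category ORDER BY category"
).fetchall()
print(counts)
```

With Pandas available, `read_csv` followed by `DataFrame.to_sql` would cover the same ground in two calls.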

Comment by Npovview 3 days ago

https://brownstone.org/articles/since-lockdowns-a-12-gdp-los...

Here's an example of a Spanish journalist using AI to analyze US inflation data properly.

Comment by joering2 2 days ago

very interesting article, can you post it as separate HN thread?

Comment by chaoxu 3 days ago

I'm a researcher working in theoretical computer science. ChatGPT found a counterexample to a conjecture I'd been working on for 2 years. It also one-shot many problems I've worked on, and it greatly improved some of my work.

I feel quite useless next to the sheer, brutal proof-writing and counterexample-generating skill ChatGPT is demonstrating, and I wonder what the future of my profession will be.

Comment by Simon_O_Rourke 3 days ago

I would have loved to have had ChatGPT when I had to do a few modules in formal methods, I'd say it would have eaten through the BS I had to wade through

Comment by tern 3 days ago

Opus 3.x building me a productivity system with Obsidian MCP originally.

Next was discovering "create a mathematical model of the problem and derive the solution as a result" type prompts.

But, the real "oh s**" was a longer process of spec'ing a compiler/runtime for real-time DSP (with a lot of novel ideas) and it actually working.

My sequence was: (1) it helps me understand myself, (2) it helps me put together good ideas, (3) it can generate novel ideas given the right inputs, (4) it can build useful tools on my machine, (5) it can compound good ideas into better and better ideas with repeated passes, (6) it can build significant, ambitious machinery that's way beyond my ordinary capacity.

Current frontier: it can compound large codebases into better and better machinery with repeated passes.

The key thing I track is whether I'm running a process that converges and compounds or whether I'm spinning in place / diverging.

Comment by mayank 3 days ago

Such a great comment, and I agree with all of them.

For me in a similar vein:

- mar ‘24: thinking about how to survey the field and implement a hard research task in Natural Language Processing, and then just approximating it well enough with a prompt and a completions api

- mid ‘25: Llama 3 being able to analyze a good sized codebase I was onboarding onto, and synthesize it into diagrams that matched the quality of ones I’d generated by hand with deterministic tools.

- dec ‘25: opus 4.5 basically generating multi-class modules and tests perfectly (syntactically). Finding that errors were my own under-specification of the prompt. Stopped writing code by hand, mainly because it was good enough and came with tests, docs, build scripts, and other goodies for free.

Comment by dannyobrien 4 days ago

I got early access to the pre-ChatGPT OpenAI API (actually by pinging someone from OpenAI who posted about it on HN). At work, we were setting up to play a livestreamed JackBox game for a charity event. This would have been in 2019.

In a previous life, I'd been a writer for the original You Don't Know Jack game (the UK variant), where the job was to crank out as many funny quips about a topic as you could, and then use a handful of them in the recording of the game itself. Some of the later JackBox games are like that, but for the players -- you're given a set piece, have to come up with little funny improvisations within a time limit.

As an experiment, I tried the set-up lines with the OpenAI API to see whether it could come up with some responses. Of course, 90% of them were unfunny or incoherent, but 1 in 10 were not bad, or even pretty good.

I'm not sure that would have been impressive to anyone else -- but remember, I'd had this as a job, and sat in a writer's room, where everyone did this, for hours. In that environment, you expect a large proportion to be duds: the discipline is to keep pumping them out, without flagging creatively, until you find a rich vein. I realised that this was a tool that would have been the perfect complement to that work -- and it was a pretty good JackBox player too.

Comment by darksim905 3 days ago

I don't know about JackBox or whatever, but the original You Don't Know Jack games are positive fun memories for me. Thank you

Comment by takee 4 days ago

I was working on a science experiment (electromagnetics) with my 10-year-old kid that was going to be demonstrated at a science fair in his school. We ran into a hiccup with the experiment that we couldn't debug ourselves. I turned on Gemini live video call to help us root cause the problem. It was able to clearly articulate all the possible issues and eventually was successful in making our apparatus work as expected. Turned out the wire that I was wrapping around the screw had some insulation that was not scraped off well on the side it was connecting to the battery. Gemini was able to capture this detail even though my bare eyes could not. My kid and 2 of his friends were impressed not just by the experiment, but because the live audio/video back and forth we had with the AI was almost magical!

Comment by 317070 3 days ago

Ha, thanks for sharing this!

I've been building Gemini live since before ChatGPT came out. I am so thrilled to see it actually helping people in the wild!

Comment by 4 days ago

Comment by foobarbecue 4 days ago

Neat, but I'd bet it was "guessing" that rather than actually seeing it.

Comment by ckorhonen 3 days ago

If it’s “guessing” correctly 95% of the time, is that meaningfully different from “seeing” it for most practical purposes?

Comment by foobarbecue 3 days ago

I just mean that the image input probably didn't affect the output at all. Could have just told it "I'm an amateur doing bulbs and batteries and it doesn't work" and it would give the #1 search result on forums for that which is "did you strip your wire."

I feel like I'm in the audience at a magician show, except most of the audience is breathlessly amazed and doesn't understand how easily tricked they are.

Comment by dag100 3 days ago

This really underscores how so much LLM "intelligence" is based off of people's experiences that they wrote about. It saddens me somewhat to see that it has basically all been captured by corporations now, and perhaps in a few years there will be little point in knowing things yourself simply because the LLMs will have gotten so good that there is no point to wasting the extra effort. Of course, this might lead to the atrophy of people's thinking muscles and a dumber and more subservient populace, but does anyone really care? Judging by the meteoric rise in the capabilities of LLMs over the past five years, is it really naïve to expect most knowledge work to be obsolete in the next ten?

The other stuff in this thread about decompiling firmware is a lot more interesting to me, though, seeing as it used to be a fairly demanding but rewarding task that has now been "solved" by Claude. It's a magic trick that is a lot harder to pull off than the other things in this thread.

Comment by foobarbecue 3 days ago

Yeah. I guess the redelegation of thinking from people to machines has been going on since before LLMs.

Today I got a haircut. At the cash register:

Hairstylist (early 20s): "That's $34. Would you like to leave a tip?"

Me: "Yes, please put an extra 20% on there"

Hairstylist: "Uuuhhhh... sorry, I don't have my phone. Can you google what 20% of $34 is for me?"

Comment by Isamu 3 days ago

Best comment I’ve seen today.

Comment by PopePompus 4 days ago

I had an old astronomy app I wrote for pre-iPhone app store era Nokia phones (N900 etc.). I decided to have Claude Code recreate it as an Android app. The old app produced several display pages for things like the positions of the planets. I was having Claude Code recreate the app display page by display page, describing the display that should be produced, with no reference at all to the original app's code (or even its existence). After having it reproduce several pages, it added another one unprompted. The page it added was in the original app, but I had not gotten around to adding it to the Android app. The Nokia app's code is still on GitHub, and somehow Claude must have made a connection between what I was asking it to code (without ever mentioning the Nokia app) and my GitHub repository's Nokia code. It correctly implemented the page without me even mentioning the missing page. My jaw hit the floor.

Comment by jmkni 4 days ago

Not coding, but reading logs.

I was trying to figure out a nightmare bug that only happened in production and Claude code was able to connect to Google Cloud and read the logs in real time

I recreated the bug in the UI and it was instantly able to see in the logs what the problem was; then, because it had the context of my whole codebase, it was able to point me to the exact line of code causing the problem

That was certainly an "oh shit" moment

Comment by bonoboTP 4 days ago

The big one was definitely ChatGPT upon release in 2022, and specifically when people showed how it can role play as a Linux terminal: you could narrate events like "the data center is now on fire" and then "run" nvidia-smi, and it would show high temps on the GPUs, etc. Or you could "explore" the homedir of some famous person. It convinced me that if it can understand so well how terminals work, tool use and agents are around the corner.

Then Opus 4.5 convinced me that this has finally arrived. In 2022 I expected things to arrive faster actually, in 2023-2024. I expected we'd have much more realtime collaborative integrations with AI including GUI computer use. Maybe in 1-2 years.

For images, it was nano banana where I realized AI images can truly work, and all these ad hoc issues like hands and limbs, or "it will never do a horse riding an astronaut" were temporary. It's now clear that making feature length films is within reach. Not in one go but with an agent orchestrating, designing a screenplay, characters, shots etc and generating those. Whether the result will be worth watching or a flat story on the high level is another question. But it will be a "film" for sure.

Comment by cineticdaffodil 4 days ago

I think all those Steve Spielbergs hiding among the 8 billion - without connections and without Hollywood names, having their day without getting filtered out by investor committees playing it safe - will produce enough material to keep a cineast happy for life.

Comment by jasonfarnon 3 days ago

Many Internet technologies like Youtube or soundcloud gave content creators direct access to an audience. If you were funny/creative/talented/attractive/in the right place at the right time/etc you could cut out the middlemen and capitalize on that directly. What is the hidden talent these Spielbergs possess that AI will enable? It's doing the writing, videography, etc. I imagine at most these hidden Spielbergs will be filtering output, like looking at a book of wallpapers and choosing what they like. Maybe some great movies get made, but it will just be the result of selection effects on the sheer number being made.

Comment by jasondigitized 4 days ago

YouTube is well poised for this.

Comment by skybrian 4 days ago

How will anyone find them, though, if there's so much slop that people stop looking?

Comment by fragmede 3 days ago

Human reviewers a la the Criterion Collection.

Comment by stuxnet79 3 days ago

It's not information overload. It's filter failure [1]

[1] https://www.youtube.com/watch?v=LabqeJEOQyI

Comment by zamadatix 4 days ago

Yeah, ChatGPT both being able to generate conversational text AND act like a terminal absolutely blew my mind - far more than I ever would have imagined the approach would scale to at the time. Since then there have been more impressive tasks accomplished but nothing which put me into the same state of pure amazement.

Comment by sunaookami 2 days ago

>and specifically when people showed how it can role play as a Linux terminal

Ha, exactly the same for me. It was this blog post: https://www.engraved.blog/building-a-virtual-machine-inside/ that completely blew my mind.

Comment by djfergus 3 days ago

I had an old 1st gen Amazon Firestick in a drawer for years, it had updated to the latest software and there were no public root exploits.

I spent a day bouncing between Claude and Codex and they researched, downloaded kernel sources, tried exploits and eventually got root via "FBUF/VCHIQ kernel zero-write primitive to patch live kernel memory". I was able to make the root permanent, debloat the amazon apps, downgrade the firmware etc.

It was amazing to watch and made me excited for the future where more hardware (old and new) will be available for repurposing.

Comment by aizk 3 days ago

Can you share that on GitHub?

Comment by allarm 3 days ago

And get sued by Amazon while they're at it?

Comment by raesene9 3 days ago

The one I remember most is, when experimenting with Opus 3.5 for the first time, I asked it to generate a Firecracker backed local VM creation and management tool, something I'd wanted for a while but not found.

My expectation was that it might get something barely functional but would probably fail, and instead it generated a working piece of software which achieved a lot of what I wanted.

That definitely made me realise that, for at least some classes of software task this was a major change in how things could be done.

More recently when I can give the model a Local Privilege Escalation PoC in Linux and ask it to test whether it can be used for container breakout and then generate a working container breakout, all in one prompt... that definitely changes things.

Comment by hgoel 4 days ago

I've had many, but a recent one was when I figured I'd try asking Claude for help with my attempts at learning to draw, specifically anatomy.

I uploaded one of my sketches and asked for feedback, expecting it to not be too useful, but it actually pointed out many issues that no one had ever pointed out to me, but perfectly explained some of the things that felt off to me. Out of curiosity I then also asked it to label the issues in the sketch. It wrote a python script with the coordinates to put everything at and labeled the sketch that way.

I'm still used to vLLMs not being that great at vision, so it was pretty surprising to get genuinely useful advice.

Comment by j_bum 3 days ago

Would you be willing to share the annotated example? I’m stunned to hear this, I didn’t know that vision had become so capable.

Comment by loneboat 3 days ago

That's super interesting! What sort of feedback? Anatomical feedback ("Uhh, that's not where arms go...") or drawing (like tips on shading etc...)?

Comment by hgoel 3 days ago

It pointed out that the shoulders were too rounded and the perspective was incorrect for them, making it seem like the arm was just appearing out of the torso. It pulled up images to explain that I wasn't correctly indicating the deltoid's presence. This helped me understand why I felt that the shoulder was hard to distinguish from the torso.

I also asked for help on how to make my posing less stiff, and it used the Python script trick to roughly indicate the lines of action, show how they were very straight and parallel, and suggest that to reduce stiffness I should use more curves, etc.

This wasn't really at the point where I even asked for shading advice.

Comment by bpavuk 3 days ago

I've been using Gemini for that - it feels like it practically thinks in images (or "possesses impressive visual intelligence," as Google execs would put it).

Comment by nrjames 4 days ago

We were experiencing abnormally high electrical bills and I could not figure out what was happening, so I downloaded the granular usage data (15 min increments) from Duke Energy, explained what we had in our house and when we typically used those items (washer/dryer, EVs, etc), provided a rundown of our energy usage plan, then asked Claude to build me a Streamlit dashboard that would help us understand what was going on and predict what was going to happen over the next months. The dashboard had a few simple toggles and levers. Claude was basically able to one-shot this, knew how to manage the XML from Duke Energy, etc... In about 20 minutes of prompting, I had a very comprehensive dashboard that was extremely helpful not only in diagnosing that specific issue but also in helping us understand how to further lower our electrical bills.
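
As a rough illustration of the kind of aggregation such a dashboard might sit on top of, here is a minimal sketch that totals 15-minute readings by hour of day to find when usage peaks. The `(timestamp, kWh)` tuple format and the sample numbers are assumptions for the example; the actual Duke Energy export is XML and its schema is not reproduced here.

```python
from collections import defaultdict
from datetime import datetime

def kwh_by_hour(readings):
    """Sum interval readings into total kWh per hour of day (0-23)."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += kwh
    return dict(totals)

# Made-up 15-minute readings: a quiet night and a busy evening.
readings = [
    ("2024-01-01T02:00:00", 0.25),
    ("2024-01-01T02:15:00", 0.25),
    ("2024-01-01T18:00:00", 1.5),
    ("2024-01-01T18:15:00", 2.0),
]
by_hour = kwh_by_hour(readings)
peak_hour = max(by_hour, key=by_hour.get)
print(by_hour, "peak hour:", peak_hour)
```

Comparing the peak hours against a time-of-use rate schedule is one plausible way a spike like the one described could be traced to a specific appliance or charging habit.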

Comment by bombcar 3 days ago

But ... who was phone?!

E.g, what was it? Don't leave us hanging!

Comment by lithboy 4 days ago

This can be a product.

Comment by bonoboTP 4 days ago

Going from one off prototype to robust product is a huge leap.

I think these ephemeral context tailored projects are really great and useful. But these are not to be thought of as products. They work for you specifically, and people who are tech-brained enough to be able to formulate the complex requirements into a coherent prompt are not like the average user you'd have to sell a product to. It's much easier to make software for intelligent users.

Comment by knollimar 3 days ago

a 3 or 9X leap if you listen to Fred Brooks

Comment by url00 4 days ago

The comment above literally said this took them 20 minutes of prompting. That doesn't sound like much if any value add.

Comment by swiftcoder 3 days ago

> The comment above literally said this took them 20 minutes of prompting. That doesn't sound like much if any value add.

You can say much the same about most small SaaS products of the last decade - the value-add isn't the 20 minutes of prompting, it's that someone else has already tested and validated the damn thing.

And yes, you won't sell many to engineers, because they'd rather prompt their own in-house version. But you might well sell to other folks

Comment by codybontecou 4 days ago

I’m making $1000/month off of an app that was initially a single prompt.

There’s a gold rush right now. You absolutely can turn these ideas into products.

Comment by timcambrant 1 day ago

How do you approach the market with a random app? Posting on X/HN or something else?

I have a number of solutions from the past year that could be products and for sure would be sellable, but since they were so easy to build I just keep them to myself. It feels like such a long shot to throw up a landing page with a demo and start cold calling.

Comment by jasondigitized 4 days ago

Sounds like something people say to locksmiths.

Comment by underdeserver 4 days ago

It's not going to be a particularly expensive product, but a product it can be.

Comment by sgarman 4 days ago

Home Assistant already has tons of integrations with power providers and easily lets you pipe in local data if you have it. In addition, can it be a product if anyone can just type what this guy did into an LLM? What's your moat if anyone can just replicate it?

Comment by fragmede 3 days ago

It doesn't have to be a durable moat for it to be a product that makes the author money, just right place, right time. If it's gonna cost me a bunch of time and effort and tokens, and the cost of the product is lower than the time and effort and tokens, then I'd rather pay for the product.

Right now we're in $1 Uber ride territory. That $20/month OpenAI/Anthropic plan isn't going to last forever. If it's going to cost me $100 in tokens to replicate the product, $20 is a cheap no-brainer purchase.

Comment by vishvananda 4 days ago

For me it was earlier this year when I started dusting off some old stalled projects and had an agent work on them. In a few days I:

* Built a clone of the Alpha Zero implementation[1] my team built at Oracle

* Ported my hobby NES emulator from JavaScript to Rust[2] (this actually took less than 30 minutes and worked on the first try)

* Implemented all of the lessons from the C++ Grandmasters Challenge (which eventually led to a complete C++ compiler[3])

The thing that flipped the switch was using it to build things that I had actually put sweat equity into previously. I knew how hard these things were to build, so it landed in a way that other projects had not.

[1]: https://medium.com/oracledevs/lessons-from-implementing-alph...

[2]: https://github.com/vishvananda/popeye

[3]: https://medium.com/@vishvananda/i-spent-2-billion-tokens-wri...

Comment by aniviacat 2 days ago

I showed Codex CLI to my brother who is not a tech person at all; he's never before touched code. He successfully built a somewhat complex app with it that works well.

Before, I was certain that a non-techy person would get stuck somewhere along the process. But I was proven wrong.

I watched him a bit while he was working on it. He interacted with Codex in a very different manner than I would. Since he knew none of the technical details, he would use very vague and shallow wording. But that was not an issue. Codex also allowed him to make use of git, despite him having no idea what git is.

This has changed my view of Codex's abilities significantly. I often hear people comment that AI seems intelligent in domains you don't know much about, but turns out to be stupid when you actually know the domain. This was not the case here at all. Despite my brother completely ignoring the technical details, the AI built good code that fulfilled the requirements well.

Comment by idopmstuff 4 days ago

Two of them:

1. ChatGPT 3.5 wrote me a script to pull some data out of Shopify and write it to a Google Sheet. Nothing remotely impressive by today's standards, but I had just commanded a computer to write code in plain English and it worked!

2. I own a bunch of e-comm brands, and with every new image model I tried to get product photography. Nothing worked until Nano Banana Pro, when suddenly I gave it a crappy iPhone pic of a product and got back a fully usable whitebox photo of it. Then I tried making the sort of infographic-style images you usually see on Amazon, and it nailed those too! In hindsight they weren't perfect, but more than good enough to use. I was about to ship that product to my photographer, and I would've had my designer make the infographic images, so that was the first time AI actually replaced a human contractor for me. Pretty big "Oh shit this is going to seriously impact employment" moment. Wrote about it here: https://theautomatedoperator.substack.com/p/ai-just-took-my-...

Comment by selimthegrim 3 days ago

How does it do for 360 degree photos/Gaussian splatting type stuff

Comment by idopmstuff 3 days ago

For the most part if you give it sample photos that have sufficient coverage of different angles, it's very good at faithfully reconstructing the product from whatever angle you choose.

The one exception I've encountered is baby mobiles. It really does not understand the physics there.

Comment by binarysolo 3 days ago

I run a remote-first ecom business with a dozen or so team members.

About a year ago, one of our account managers had a life issue, ghosted us, and she held a fairly critical role in the business and gate-kept a bunch of knowledge to some high value vendor accounts.

Because we ran our ops in Google Workspace, we essentially had off-the-shelf RAG and were able to get answers to a lot of things by asking Gemini to go through all her emails/docs/calendar/meetings, reverse engineer what she did, and create an onboarding doc for her successor.

This happened once more a few months later when one of our analysts broke his wrist on vacay, and we were again able to replicate what they did to cover for their absence, this time dabbling in AI agents ("gems") to do a bunch of the regular simple tasks and again it covered things without too many issues.

I def expect Amazon/shopify to at some point replace all of us brand owners with AI bots if they can, but we'll see how long the gravy train goes on.

Comment by ai_fry_ur_brain 3 days ago

If you're replaceable by an LLM, then you're doing something extremely poorly. They're terrible decision makers, have no taste and have little to no ability to infer nuance.

Your business should be fine for a long time (assuming an employee doesn't nuke your business's backend or something because it seems like you're doing something wrong on the HR side of things)

Comment by iLoveOncall 3 days ago

[flagged]

Comment by vachina 3 days ago

You mean all these testaments are bs? As an infrequent user of LLM assisted work these stories never really tallied with my experience.

For example I could never throw a bunch of spec/doc at an agent and have it return something useable 30 minutes later. Yeah the code compiles but they don’t work.

Comment by iLoveOncall 3 days ago

> You mean all these testaments are bs?

Yes, or at least extremely exaggerated. But most are from literal bots run by Anthropic and OpenAI to sell their shit.

Comment by vachina 3 days ago

Interesting, then again unsurprising. HN is ripe for and very easily botted.

Comment by Aerolfos 3 days ago

Either they're BS, or the people making these statements are self-incriminating to a terrible degree, either they don't care about their work or are outputting a very low level of quality and being amazed at how "great" and how much better AI output is than their own

All the options are extremely depressing

Comment by dang 3 days ago

[dead]

Comment by mbo 4 days ago

Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.

Comment by wps 4 days ago

That's insane! I cited your image in a humanities paper during one of my freshman year classes.

Comment by bonoboTP 4 days ago

I think the GP just means it was their oh shit moment, not that it was their image...

Comment by kstrauser 4 days ago

I think GP meant that yes, they were the one who had that image generated, and the oh shit moment was that it worked.

Comment by mbo 4 days ago

Yes confirmed, I did not author the DALL-E paper lmao

Comment by wps 4 days ago

Completely misunderstood your phrasing lol, disregard that. But yeah I agree the avocado armchair was definitely a milestone. I used it as one of the exemplary first steps in what we now consider Gen image AI.

Comment by evanb 2 days ago

I'm a professor of physics; my research area is computational quantum field theory. I was beginning a new approach for a certain kind of model. I know the model has not been simulated before (a collaborator has just constructed it for the first time using in-field knowledge developed in the last year).

I've been steadily more impressed, but the real startling moment was earlier this week, I could show that in a particular operation I had a bug (some identity wasn't satisfied) but I couldn't understand why. In one try Claude Code grokked my code, explained the problem to me (there were some dimension-dependent minus signs that needed to be computed), and fixed it.

Comment by ddxv 3 days ago

Most of the time using LLM generated code the feeling is "Oh Awesome!"

My "Uh Oh" feelings are weeks later when I realize there is a subtle bug in what the model presented as test passing "awesome" that I didn't read closely.

The biggest uh-oh is when I get lazy and let it modify multiple files and make many changes at once, and YOLO because I didn't fully understand what it did. I can usually get away with that for frontend, but for data manipulation tasks if I don't understand it, it's likely not what I wanted and I'll be back again in weeks or more trying to figure out what changed.

That's more or less what life was before LLMs and copy pasting from StackOverflow. Most of the time if I didn't fully understand something, I knew I had to eventually get back to it to grok what changed before committing.

Now with LLMs the 'copy pasting' is much faster and handles boilerplate super well letting me focus on edge cases.

Comment by rerdavies 4 days ago

Working on a Spice compiler to convert schematics for classic guitar pedals into real-time executable code.

I provided a reference to The Spice Manual, 2nd ed., a page number, and an equation number, and asked Claude to implement it (not really expecting it to succeed).

It proceeded to implement not only the equation, but also the calculation of the Lagrangian of the function, another 30 lines below, which required taking symbolic partial derivatives of a not-at-all trivial function and successfully figuring out which variable was which in the resulting matrix. The source material just said "Lagrangian of", and did not provide the partial differential equations. It then added a comment that identified the page number and equation number in the source text for the "Lagrangian of" equation.

Comment by petra 3 days ago

That sounds awesome.

Do you think claude could help create something like the line6 "software based" guitars?

Comment by rerdavies 3 days ago

Current state of the art in guitar emulations are Neural Amp Modeler Core A2 plugins, which are dramatically better than Line6 amp emulations. (See the results of large scale listening tests in the following page that compare NAM A2 against current-generation Line 6 amp emulations).

    https://www.tone3000.com/guides/nam-a2-the-complete-guide

Guitarix plugins actually use the technique I'm chasing (circuit diagram based simulation). I'm honestly not sure how Guitarix emulations stack up against Line 6 emulations, or whether Line 6 uses a similar approach. To my ears they seem to be of comparable quality. But NAM A2 is dramatically better than both.

I'm actually chasing this line of research as part of an effort to write a realtime-capable accurate emulation of a Dyna Comp compressor for inclusion in the ToobAmp collection of plugins that uses a hybrid approach (circuit simulation for the envelope generator, which NAM modeling struggles with, and a nano NAM model for the Operational Transconductance Amplifier at the core of the original effect, which is computationally expensive when using circuit emulation). Too early to tell whether that's a sound approach atm. Finding a good open-source library of Spice components (for branded diodes, transistors, op amps, &c) may prevent my circuit simulation project from reaching a publicly releasable state.

Disclosure of conflict of interest: My own open-source project (PiPedal, search for it if you're interested) relies heavily on NAM A2 models.

Comment by mystifyingpoi 3 days ago

I think it's possible, but if you mean Variax, the trick was in hardware. I believe it had multiple pickups per each string.

Comment by rerdavies 3 days ago

... and to answer your question directly. No. I don't think claude could do it unless you guide it very carefully through the process. You need to have a pretty good idea of what needs to be done.

Comment by djmips 4 days ago

That sounds pretty fun. I guess I could just get Claude to do this hehe but are you sharing?

Comment by rerdavies 4 days ago

I'm not mostly Claud-ing it. Perhaps I should. But in the difficult bits, it never ceases to amaze me what these tools are able to do.

Yes, if it matures, it will go open source. Not immediately clear at this moment whether it's feasible to do an Operational Transconductance Amp in realtime. :-/

And it's competing for attention with the 2.0 release of this at the moment:

https://rerdavies.github.io/pipedal/

Just went GA, so I'll have some cycles to come back to it.

Comment by saguntum 2 days ago

Three moments:

1. When ChatGPT came to the masses, it trivially solved my standard phone interview problem for new SWE hires. It's not particularly complicated, but it screened out a lot of candidates pre-AI and was a good filter.

2. At one point, there was a bug in some client software we ship that was erroneously displaying a protocol-level disconnect message to clients when the server shut the connection unexpectedly.

In very few turns, ChatGPT gave me working code to intercept the error at the client level via a pseudo-server that intercepted the requests and implemented the proper teardown procedure. It essentially implemented a micro-server for the protocol we were working in as a bug workaround.

3. I'm working on a major rearchitecture (6+ month project of very senior engineers) of an internal system for scalability and maintainability with tight latency bounds where correctness with the old system is necessary.

I came back from an international vacation quite jet lagged and was having trouble doing actual coding work for a bit since my focus was shot. Taking additional days off wouldn't have helped recover from the sleep issue more quickly due to childcare responsibilities.

I tried instead building a pipeline where I'd run an integration test suite, throw a list of failing tests at Claude, and make a PR if it made more tests pass. I automated as much of the AI "loop" as possible in my state at the time.

Using this technique, a coworker and I fixed over 200 failing integration tests for parity with the old code in 1 week. We subdivided failing tests between engineers and re-sub-divided with each success.

I re-reviewed the code before we launched when I was more well-rested, but this was an extremely effective technique and makes me think that AI-enhanced test driven development (TDD) is the future.
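For anyone wondering what the loop in point 3 might look like, here is a minimal sketch in Python. It is not the actual pipeline: `run_suite`, `propose_and_apply_patch`, `revert_patch` and `open_pr` are hypothetical callables you would have to supply, and the rule "keep a patch only if the set of failing tests strictly shrinks" is an assumption about what "made more tests pass" means.

```python
from typing import Callable, Set


def fix_loop(
    run_suite: Callable[[], Set[str]],
    propose_and_apply_patch: Callable[[Set[str]], None],
    revert_patch: Callable[[], None],
    open_pr: Callable[[Set[str]], None],
    max_rounds: int = 10,
) -> Set[str]:
    """Hand failing tests to a model; keep a patch only if strictly fewer tests fail."""
    failing = run_suite()
    for _ in range(max_rounds):
        if not failing:
            break
        # Hypothetical step: the model edits the working tree based on the failures.
        propose_and_apply_patch(failing)
        now_failing = run_suite()
        if now_failing < failing:  # proper subset: progress and no new failures
            open_pr(failing - now_failing)  # report which tests were fixed
            failing = now_failing
        else:
            revert_patch()
    return failing
```

The proper-subset check is the important design choice here: a patch that fixes two tests but breaks a previously passing one gets reverted rather than merged.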

Comment by tejohnso 3 days ago

I didn't have a slightly panicked moment, but sometime in the last year my approach to programming changed.

When starting a project, I used to think about how I was going to structure it, how the large pieces would interact, how some of the details would work out, and then I'd work through alternatives and consequences on my own.

Now I don't think about it on my own so much as have a conversation with an LLM about it. And it's great because it can quickly gather information from various sources, I can ask it for links to canonical sources, I can ask it about trade-offs between alternatives that I might not have considered, and through conversation, I end up with a more detailed analysis.

Then as I work through the development, I keep my new agent partner in the loop for discussion, suggestions, and troubleshooting. It can't be trusted completely, but it's certainly reliable enough to be considered a useful tool for my purposes.

I went from thinking it was an interesting toy to play around with, to completely integrating it into my work flow, and that change seems to have happened very quickly.

Comment by fulafel 3 days ago

When I realized they're going to be largely powered by increased natural gas use in the USA, neatly combining with our biggest problem so far (the climate catastrophe).

Comment by hypendev 4 days ago

Back in the times of GPT3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.

Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.

That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.

Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didnt remember something.

That was when I understood the social impact this could have on people - they really behave like its a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.

Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive claude code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.

Comment by llmssuck 4 days ago

I also remember doing this. Chats, first parts of books, title pages and all, just to give it a chance of saying something in the ballpark of what I was looking for. I remember very vividly that chats or books by Linus Torvalds would be more technically accurate than, say, Lincoln. It's obvious of course, but I found it really enlightening. It could code a bit actually, not great, but well enough to push me into an existential crisis. I started doing a master's to re-educate myself because I could see "interesting" times coming.

I actually emailed OpenAI back then saying they should be careful because this is much greater than the public or even they themselves think. They actually replied! They thought it was cool, but very limited and I shouldn't be too impressed. Good times.

Comment by dyauspitr 4 days ago

I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I could determine which one it was. It then asked me to measure the length and said that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.

It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.

Comment by nrjames 4 days ago

Ha! I did the exact same thing about 2 months ago. It saved me a lot of headache and research.

Comment by dyauspitr 4 days ago

I got quoted $700 by the pond guys to replace it. I ended up buying it for $109 and replacing it myself. It honestly would not have been possible without ChatGPT because I had nothing to go off of and the pipe connection was really specific to that model.

Comment by Const-me 3 days ago

None so far. When I try to use these language models in the primary areas of my expertise like SIMD or GPGPU they fail to do any good. When I ask them to implement some general-purpose stuff, the output is too low quality to be useful in my software.

Still, I find them incredibly useful for code review (despite being unable to write good C++ or C#, they are smart enough to detect issues there), and also for dealing with technologies outside of my area of expertise like Python or web stuff.

Comment by xtracto 3 days ago

I probably will be burned for this, but with the help of an LLM I wrote a tiny program that captures video from a browser screen (Xbox live online FPS game), passes the video images through a small trained NN that recognizes people forms and presents the video on another screen. That way I can place a green overlay on enemies and they are easier to see on PVP matches.

All that in around 100 lines of code, including the training/fine-tuning of the tiny YOLO nn.

Comment by jerryoftheyear 3 days ago

I'm curious, what amount of input lag does this introduce?

Comment by naikrovek 2 days ago

None.

It introduces output latency, not input latency.

Comment by jerryoftheyear 2 days ago

That's not how that term works. Input latency/lag refers to the time differential between triggering some kind of input and the action of that input showing up on your screen.

Comment by naikrovek 1 day ago

you described end-to-end latency, input latency + processing latency + output latency.

input latency is a measure of latency between the human input and the computer receiving the input.

output latency measures the time between a computer commanding a certain pixel to change color on the screen and the color change actually taking place.

It's not my fault these terms are often used incorrectly.

Most of the time, what you see when you see someone test "input latency" is that person actually testing end-to-end latency, which is input latency + processing latency + output latency, as it is difficult (but not impossible) to test only one of these without special hardware. testing all three at once is easy.

A proper input latency test would be (for example) some external tool sending keypresses to a computer and measuring (via a hardware debugger or some other hardware-level tooling) how long it takes for the program you are interested in to receive that input.

As stated previously, output latency is the time between your program commanding something on the screen to change and that change actually happening.

there's a third latency in this stack, and that's your program itself. how long between the time it has received an input before it commands an output device to change its output. processing latency.

for the purposes of end-to-end latency testing hardware, the processing latency is effectively zero.

all three of those stack up to become "end-to-end latency" which is what most tooling available to end users measures.
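The decomposition being argued about here can be written down as a few lines of Python. This is only an illustration of the terminology; the millisecond figures are made up, not measurements of the overlay tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    input_ms: float       # human input -> computer receives it
    processing_ms: float  # program receives input -> program commands output
    output_ms: float      # output commanded -> pixels actually change

    @property
    def end_to_end_ms(self) -> float:
        # What most end-user tools measure and often label "input lag".
        return self.input_ms + self.processing_ms + self.output_ms


# Made-up numbers, purely for illustration.
budget = LatencyBudget(input_ms=4.0, processing_ms=12.0, output_ms=9.0)
total = budget.end_to_end_ms
```

With these numbers `total` is 25.0, and any extra capture or inference time would show up in the total regardless of which of the three buckets you assign it to.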

Comment by darksim905 3 days ago

Why would you get burned for this? For people who don't know, sure, this smells of 'cheating' but it's essentially just an exercise in computer vision. Also, I'd posit that the various DMA-type tools in the video game modding arena are impressive, and unusually secretive because their makers make money on subscription services. It's fascinating to me how well some of those tools and mux hardware work. When you combine that with hardware we can now buy like the DGA with more onboard memory and faster interconnects, it's going to make reverse engineering, malware analysis, forensics -- a very interesting time for those who know what they are doing.

Comment by bombcar 3 days ago

Go the next step and have the LLM simulate the controller and play the game for you.

Comment by mft_ 3 days ago

Would you either share the code, or describe the approach? I’ve asked LLMs how to do something similar a few times recently, and received several different potential approaches. I’m still no clearer which route to take!

Comment by amarant 3 days ago

I had Claude build a private podcast station for me. It integrated with Gemini to create a script for the show, based on a topic of my choosing. Each talking segment ends with a presentation of the next song, which is played via Spotify and is selected to have some sort of tie-in with the previous discussion. A TTS model generates audio files based on the script, and a playlist is generated to play a local audio segment, then a Spotify track, then the next segment, etc.

An AI made a program integrating with 2 other AIs, it's AI all the way down! And the result is great! I'm learning so much by having my own private radio host speaking about topics that interest me.

Comment by littlexsparkee 2 days ago

I use Kokoro TTS to turn text (articles, HN threads) to mp3s to listen to while doing chores, it's been great.

Comment by justforreading 3 days ago

Wow, I am very interested to see how this works

Comment by amarant 3 days ago

I'm not at my computer atm, but I'll publish my code when I am and let you know!

Comment by jasondigitized 2 days ago

Would love to see the code. Sounds awesome!

Comment by motyar 2 days ago

[flagged]

Comment by motyar 2 days ago

[flagged]

Comment by solomonb 4 days ago

I gave chatgpt 3.5 the type signature for a co-algebraic encoding of a mealy machine:

    newtype Mealy s i o = Mealy { runMealy :: (s, i) -> (s, o) }

And it gave a really impressive analysis.

Then I scrambled all the names and asked with a fresh context like:

    newtype Foo z e g = Bar { blob :: (z, e) -> (z, g) }

It got completely confused and generated a bunch of non-sense. It was at that moment I realized that LLMs don't really understand anything.

And yes I understand that a newer model would not get confused by this.
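For readers who don't read Haskell: the type above just says a Mealy machine is a function from (state, input) to (next state, output). A rough Python rendering is sketched below. The edge-detector machine is an invented example, not something from the original experiment, and the class and function names are arbitrary, which is rather the point of the relabeling test.

```python
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # state
I = TypeVar("I")  # input
O = TypeVar("O")  # output


class Mealy(Generic[S, I, O]):
    """Wraps a step function (state, input) -> (next_state, output)."""

    def __init__(self, step: Callable[[S, I], Tuple[S, O]]) -> None:
        self.step = step

    def run(self, state: S, inputs: Iterable[I]) -> List[O]:
        """Feed inputs one by one, threading the state through each step."""
        outputs: List[O] = []
        for item in inputs:
            state, out = self.step(state, item)
            outputs.append(out)
        return outputs


def edge_step(prev: int, cur: int) -> Tuple[int, bool]:
    # Invented example: output True whenever the input differs from the last one.
    return cur, cur != prev


edges = Mealy(edge_step)
result = edges.run(0, [0, 1, 1, 0])
```

Here `result` is `[False, True, False, True]`. Renaming `Mealy`, `step` and the type variables to `Foo`, `blob`, `z`, `e`, `g` changes nothing about what the code does.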

Comment by bonoboTP 4 days ago

In high school math class our teacher swapped out all the symbols in the epsilon delta definition of limits, and asked us what this equation expresses, and many students struggled to interpret it.

I don't think this test shows that an LLM doesn't "understand". It shows more that it has similar failure modes as humans.

Comment by solomonb 4 days ago

Well first of all I think there is more implicit data encoded in the symbols of the epsilon delta definition of limits. In the Mealy example they are really just labels for arbitrary sets. The LLM actually failed a much simpler relabeling exercise. Setting that aside, I still think the analogy is flawed.

The student is mid learning process and it's entirely reasonable for them to be relying on pattern recognition until they have fully internalized the subject. The model is fully trained and should thus have internalized its understanding of the subject.

Additionally the student can update their understanding when pattern recognition fails. The model is fully cooked and will never do more than pattern recognition.

Comment by pixl97 3 days ago

>The model is fully cooked and will never do more then pattern recognition.

The correct answer to this is "Thank fucking god".

Humanity as a whole is not ready for continuous learning models. They'd either learn very fast and actually replace people at their jobs and cause mass hardships for humans, or they'd go off the alignment rails really fast and cause mass hardships for humans.

Hopefully we'll get a nice steady onramp to continuous learning where we can iron these issues out.

Comment by Kon5ole 4 days ago

From actual use I've not had a "oh shit" panicked moment yet. More like a bunch of "Holy shit" euphoric moments.

So far I feel like I as a developer have gained actual superpowers, and can deliver results that make my stakeholders slackjawed with awe. I love it.

It will last perhaps a few months more, then they'll expect it. Delivering more features faster will be the new normal. But I think system developers, as in people who actually like to deliver new features and systems, will still be the ones doing it.

Fundamentally I think LLM's just change how to make information systems, they don't change who has the inclination to make them.

MBA's making excel sheets that do more than excel was ever intended to do has given programmers lots of work over the years. Such solutions identify a need for a properly designed system and frees up the budget to hire programmers.

If the same MBAs start vibe coding, I predict we will get even more to do, for similar reasons.

I may be horribly wrong, and if the day comes that I realize that it will be the "oh shit" panicked moment. So far so good!

Comment by johnfn 4 days ago

I do genuinely wonder if you’re correct that other people will begin to expect it. I feel I was suddenly able to do stunning stuff about a year ago, and I recall thinking this is nice but everyone will catch on to my secret soon and I won’t be exceptional any more. But 12 months have passed and I don’t think this has really panned out yet. Weaker engineers just don’t seem to understand that they can just ask AI things. Eg the other day another engineer spent like 3 hours trying to hunt down a particular line of code so I asked AI and it found it in like 5 minutes. I showed that to him, but then he immediately got stuck trying to find something else for a few more hours, so again I asked AI etc. It’s very baffling.

Comment by Kon5ole 4 days ago

There is definitely a learning threshold and it's still early days. Not every developer has found out how to make efficient use of these tools yet. But I think most will, soon enough.

But I think my own clients will soon start to question why some feature takes ME a week, when I was able to deliver another feature in a day or two.

That they are features that used to take months, and even delivering them in a week is a goddamn miracle by 2025 standards, will not be relevant. They won't expect such features to take months any longer, based on what I've delivered earlier this year.

So I think that the past few and maybe next few months, maybe a year, will be remembered as a "happy hour" for this tech as a developer. These are the days that we'll talk about saying "those were the days". :)

I am still optimistic that "the normal" in a few years will be pretty much like it has been before - I'll be delivering features at work and tinkering with hobby projects at home, and the major difference will be a much larger scope and ambition for both.

Comment by djmips 4 days ago

Direct use of AI is going to be a filter on a lot of people - some permanently I suspect (especially say older people). But perhaps this will be short lived as the interfaces to AI are improved enough that everyone will benefit.

Comment by ozgung 3 days ago

For me it's not about the capabilities but what they can be used for. Think of all the recent drama between Anthropic and the Department of War. A real wake up call (especially if you are not a US citizen). Proves that AI is essentially a Surveillance and Warfare technology (which justifies the big valuations).

Or see this simple and fun site: https://hn-wrapped.kadoa.com

AI automatically analyzes all your social media posts in your life and can generate a pretty accurate profile about you in a second. We have no privacy anymore. Social media sites like Reddit already do that for moderation. Others do for more sinister reasons.

Note that Profiling is illegal in many countries. But laws can't protect us anymore.

Yes, it was always possible to do that manually. But with AI it's so easy, fast and accurate to do at large scale. A hacker having access to your computer, reading your mails and messages is one thing. An AI reading and analyzing all your mails, messages and data is something different. Doing this for whole demographics (Cambridge Analytica style) is at another level.

Comment by jkraybill 4 days ago

So many. First was when I saw GPT-2 create jokes that were original and kinda funny.

Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.

It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.

It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).

Comment by marcus_holmes 3 days ago

I took a photo of my ailing plant and claude advised me on how to get it healthy again (and how to take a cutting and nurture that).

This is some science fiction shit. I get all the coding stories, but that's a computer talking about a computer, it makes sense. Showing my computer a picture of a plant, and it not only recognised the plant, but diagnosed it and knew what to do... blew my mind.

Comment by aizk 3 days ago

You may enjoy this - Claude autonomously grew a tomato plant: https://autoncorp.com/biodome/

Comment by marcus_holmes 1 day ago

thanks, I did enjoy it :)

Comment by hparadiz 3 days ago

I've been using it to manage an estate. Just being able to shove all the documents right into an LLM and have it spit back out perfectly worded emails, keep track of checklists of things I need to do, and automatically create a ledger for me in Sheets has been a huge mental load off. I've been able to focus better at work, and the labor costs saved have been immense. Just on this one little thing. I'm one of those people who overthinks correspondence and letters and ends up getting stuck, so being able to ask for just the right wording has been super helpful to me.

Comment by irthomasthomas 4 days ago

My most recent one: Taking a bricked ipad and plugging it into my linux laptop, then telling deepseek to fix it. A couple of hours and twenty sudo passwords later it was working again.

Comment by segmondy 3 days ago

Running a local LLM in 2023, I heard folks talking about interfacing LLMs to tools. I wrote a system prompt and told the LLM it could call some tools: if it wanted to call a function, it should output func(params...) inside an XML tag. I provided a few examples, none of this JSON soup we get today. Then I told it I'd provide the result in a RESULT XML tag and it should use that to answer. I wrote up a harness around that and I had a local model interacting with the outside world. Oh wow! Everything else today about MCP and agents is an extension of that thought.

Using function calling, I built an agent. I defined a data structure that represents rooms and how they are connected. Each room is marked as dirty or clean. Then I would place the agent in a room and the agent would decide whether to go left, right, down or up into another room. Once it got into a room, it would decide whether to clean it or go to the next room. Repeat until all rooms are clean. The basic toy CS101 AI vacuum agent. It worked!

So: being able to get real-world input/output to the model, having the model make decisions in a loop, and doing it all locally. I have been screaming like a madman ever since.
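A harness along these lines can be sketched in well under a hundred lines of Python. To be clear about the assumptions: the `<call>` tag name, the `scripted_model` stand-in (a real setup would call a local LLM there), and the one-dimensional row of rooms are all invented for illustration. The original used rooms connected in four directions and a real model.

```python
import ast
import re
from typing import Callable, Dict, List, Optional

# Assumed convention: the model writes <call>name(arg1, arg2)</call>.
CALL_RE = re.compile(r"<call>\s*(\w+)\((.*?)\)\s*</call>", re.DOTALL)


def run_tool_call(model_output: str, tools: Dict[str, Callable]) -> Optional[str]:
    """Execute the first tool call found, or return None if there is none."""
    match = CALL_RE.search(model_output)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    if name not in tools:
        return "<RESULT>error: unknown tool</RESULT>"
    # Parse arguments as Python literals only; nothing is eval'd as code.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return f"<RESULT>{tools[name](*args)}</RESULT>"


def agent_loop(model: Callable[[str], str], tools: Dict[str, Callable],
               first_prompt: str, max_turns: int = 20) -> Optional[str]:
    """Alternate model replies and tool results until the model stops calling tools."""
    prompt = first_prompt
    for _ in range(max_turns):
        reply = model(prompt)
        result = run_tool_call(reply, tools)
        if result is None:
            return reply  # no tool call: treat the reply as the final answer
        prompt = result
    return None


class Rooms:
    """Simplified vacuum world: rooms in a single row, True means dirty."""

    def __init__(self, dirty: List[bool]) -> None:
        self.dirty = list(dirty)
        self.pos = 0

    def look(self) -> str:
        return "dirty" if self.dirty[self.pos] else "clean"

    def clean(self) -> str:
        self.dirty[self.pos] = False
        return "cleaned"

    def move_right(self) -> str:
        if self.pos + 1 < len(self.dirty):
            self.pos += 1
            return self.look()
        return "wall"


def scripted_model(prompt: str) -> str:
    # Stand-in for the LLM so the sketch runs without one.
    if "dirty" in prompt:
        return "<call>clean()</call>"
    if "wall" in prompt:
        return "All rooms are clean."
    return "<call>move_right()</call>"


world = Rooms([True, False, True])
tools = {"look": world.look, "clean": world.clean, "move_right": world.move_right}
final = agent_loop(scripted_model, tools, f"<RESULT>{world.look()}</RESULT>")
```

After the loop finishes, every entry in `world.dirty` is `False` and `final` holds the model's closing message. Swapping `scripted_model` for a function that sends the prompt to a local model is the only change needed to make it a real agent loop.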

Comment by nwhitehead 3 days ago

(Spouse's story)

Today I used Claude to diagnose a blocking bug in a Steam game I really wanted to play. It took it 18 mins, but it unpacked the Godot package, figured out the bug, proposed a fix, and gave me an in game workaround.

I didn't have to do anything! Claude figured out the structure of the .pck file by using `strings`, then wrote some Python code with some magic Godot-specific code to unpack the specific chunks it needed.
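For context, the `strings` step is simple enough to reproduce in a few lines of Python, which is roughly how you would start poking at an unknown binary container. This is a generic sketch, not the code Claude wrote, and the sample bytes are made up rather than taken from a real .pck file.

```python
import re
from typing import Iterator, Tuple


def ascii_strings(data: bytes, min_len: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Made-up bytes standing in for a packed game file.
blob = b"\x00\x01HEAD\x02\x00res://main.gd\x00\xffab\x00"
found = list(ascii_strings(blob))
```

Here `found` is `[(2, "HEAD"), (8, "res://main.gd")]`. The offsets are what make this useful for reverse engineering: they tell you where file names sit relative to the surrounding binary fields.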

Comment by frankieg33 6 hours ago

shipping my first piece of software. i was playing the game Mewgenics. i knew why the developers decided to not show all of your cats at once (players will optimize all of the fun out of your game), but i thought that was particularly egregious given you can have upwards of 100 cats, and keeping them all in working memory was too taxing.

so within a day, i had claude code make me a PyQt6 app that over the next ~2 months grew in complexity and functionality. I had 600 users. It was translated into Chinese, Russian, and Polish.

The speed of taking an idea to production was unbelievable. And I am not technical. I learned a lot about how software development worked. Super glad I did it.

https://github.com/frankieg33/MewgenicsBreedingManager

Comment by paulbjensen 4 days ago

I would say the first time I did “vibe coding”, when I tried Claude Code with Zed’s agent integration in January this year.

I wanted to see if I could build an image editor for isometric graphics using HTML5 canvas, Svelte, Vite, and the. Rather than do all of the skeleton code setup, I figured “why not try and see if Claude can build the app scaffolding?”.

I gave it a prompt and watched it produce the scaffold, along with a few features I outlined in the prompt.

When I booted the app and saw that the features worked and that there had been an element of design to the layout, that was my mind-blown moment. In a period of about 45 minutes, I added some features and had a basic MVP at the end. I walked back home stunned.

That app is available for free at https://babspixel.com

Comment by adamkf 3 days ago

I'll give you two:

The first was when I first realized that I could tell codex to use gdb to debug a core dump. This was about a year ago, so it made a bunch of incorrect theories, but it enabled me to go much further than I would have been able to go by myself. I eventually solved the problem.

The second was when I decided to ask it about my Linux Wi-Fi issue that I had been having for several years. The computer would infrequently have multi second pings and dropped packets, then go back to normal. I thought it was due to the weak signal, but after describing the problem to codex, it immediately disabled power management on the Wi-Fi interface (this is a desktop computer, so I don't care much for that anyway) and the problem has never come back. I had been dealing with this for years, and I had tried searching for a solution before, but codex just solved it directly.

Comment by WhompingWindows 3 days ago

Reading a dozen comments here, the AI seems to blow people's minds most often in domains they're less familiar with. Repairing furnaces, HVAC, towing hitches, camper van interfaces, printer debugging. It wasn't the user's career to do these things, it gave them a bump from very novice to intermediate level.

Comment by bombcar 3 days ago

Dunning-Kruger AI. But hey, if it works it's still a powerful tool.

So far my use has done things I could do, but haven't explored in detail how to do - with simple steps I can guide and understand and finally automate things that were just below the "annoying enough to automate" boundary.

Comment by burnto 2 days ago

I had received four very different bids for a home repair project. Just wildly different itemization breakdowns, costs, timelines, scopes, even formats. Opus helped me turn it into an apples-apples comparison, filled in missing areas with reasonable inferences based on the other bids, provided a nice pdf I printed to review with my partner, even offered suggested key questions for follow up calls. It really clarified the advantages of one of the bids.

I use it professionally all the time and could cite technical scenarios where it’s become almost indispensable, but saving me time and money and reducing stress on this mundane stuff… now imagine applying to people’s stressors: job searches, health, big purchases, debt… there’s an opportunity to actually make people’s lives better. After 30 years of hype cycles, I should be wary of techno-optimism. But here I am feeling cautiously optimistic anyway.

Comment by thallavajhula 3 days ago

I wasn't impressed by the LLMs up until January or so when Claude Code swooped in. Until then, I felt like the LLMs were slowing me down. I have been using them for a couple of years now for coding at work, but I never really thought they brought in real value. Then in February I worked on a 1-month-ish project timeline and shrunk it to 3 days and that was it. I didn't write a single line of code in that project and I went all in with Claude Code. That was it, _the moment_ of realization. I was thoroughly impressed. I went from nothing to a tool that served several teams. Now I'm starting to see the cracks in LLMs and I'm slowly getting back to picking which task to offload to AI and which ones to do by myself.

Claude is great at coding. That's it. Outside of it, it's just god awful at pretty much everything else. ChatGPT OTOH, is good at coding, but at everything else, I find it brilliant. Gemini never made me want to stick with it. It's good, but never great for my use cases.

Comment by eqmvii 3 days ago

Some business users spent ~30 minutes on an internal process, and we prototyped an "Agent" in Slack to take over. At first it didn't work, then it didn't work some more, eventually it ALMOST worked. Then one day, it worked, and the old business process died never to be revived.

Now it sits in a slack channel, and I watch it doing work, responding to ambiguity, and taking feedback/edits all day. It's unreal. It's literal magic. It saves a HUGE amount of time and gave us a pattern to do more.

This is the real deal. It's not easy to find problems with the right shape, and it's not easy to build agents that fit even when you do... but once it clicks, it clicks.

Comment by acrinimiril 4 days ago

Two things:

1) I wanted a harness for running BPC.EXE (the old Borland Pascal 7.0 Compiler) and I asked Gemini 3.5 to build it for me using the unicorn engine. It whipped out a working .py file easily under ten minutes. Most likely five.

2) I handed a random assembly function from the OS/2 1.x kernel to Gemini 3.5, and it proceeded to tell me that it was related to disk I/O and partitioning, without a single associated string, and it annotated it all, including the relevant structures it was addressing.

Comment by encrux 3 days ago

Back when GPT-2 was released, I tried figuring out how to fine tune it. I found a google notebooks template, scraped a bunch of data from r/ChangeMyMind and asked it to change my mind on different topics.

I was dumbfounded that it actually tried doing that. Obviously GPT-2 wasn’t great at it, but the writing was on the wall quite literally.

Unfortunately, I was too broke to invest in stocks, but I did pivot my career quite a bit.

Comment by niwtsol 3 days ago

To share something different, it is less about what I have built, and more about what I have seen my friends (non-technical and technical) build. In a one month span I have seen a lawyer make a personal red line tool, a sales guy make a custom website for a golf trip, another friend make a 3D printing Gridfinity project, a friend make an STL file to print a jig for his table saw, and another friend make a full mobile game. It is just really cool to see these micro-projects be created and shared, not only for the utility, but just to see my friends' childlike excitement showing off their project.

Comment by conartist6 3 days ago

When LinkedIn filled up with 1000 copies of what seemed like the same exact post: 20 lines long, breathless, declaring humanity over.

I thought, "I will never let myself become a zombie like that. I am me. I am worthy of my own respect"

Comment by a96 3 days ago

There never was one. I'm from computing science field and it's all been and is normal. Amusing, maybe, but normal. Same as before, but in larger scale, with occasional hype. People picking up useful things and using them. Some going insane.

If I had to pick a surprise, I think the music generation works better than I'd have expected at this point. Only better for funk, but still.

Comment by rclabs 3 days ago

I was working on a 3D CAD program called OpenSCAD. There is this quirk I hated: while it's rendering, the whole UI stops responding, and there is no cancel button either. So I asked it to "implement the renderer in a separate thread with feedback and the ability to cancel".

It's written in C++ / Qt, and I can't remember the threading model. I knew Codex (around version 4) and coding agents generally were excellent at web-type stuff, so I didn't expect much, but it did an alright job.

It was able to build write/compile/verify loops. In the end, though, the app would crash; with more work I could probably ask it to debug what was happening.
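The shape of that request (worker thread, progress feedback, cooperative cancel) is language-independent, so here is a small Python sketch of the pattern. It is not OpenSCAD's code and has nothing to do with its actual threading model; `RenderJob` and its step loop are hypothetical placeholders for real rendering work.

```python
import threading
from typing import Callable, Optional


class RenderJob:
    """Runs a long task on a worker thread with progress callbacks and cancel."""

    def __init__(self, steps: int, on_progress: Callable[[int, int], None]) -> None:
        self.steps = steps
        self.on_progress = on_progress
        self.finished = False
        self._cancel = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def cancel(self) -> None:
        # Only sets a flag; the worker checks it between chunks of work.
        self._cancel.set()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        for i in range(self.steps):
            if self._cancel.is_set():
                return  # stop early, leaving finished as False
            # Placeholder for one chunk of real rendering work.
            self.on_progress(i + 1, self.steps)
        self.finished = True


progress = []
job = RenderJob(4, lambda done, total: progress.append((done, total)))
job.start()
job.join(5)
```

The key point is that cancellation is cooperative: the worker has to poll the flag at safe points. In a GUI toolkit the progress callback would also need to be handed over to the UI thread rather than touching widgets directly, which is a common source of exactly the kind of crash described above.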

Comment by plumefar 3 days ago

I had access to a repo (from a closed startup) with 800K lines of python & C code, written from the 90s to today. They had some very interesting approach to a specific chemistry problem. 20-30 years of work of several persons.

But God, I could not understand the code, and I could not easily make it work with modern technologies (GPU etc).

So I used Claude and Gemini to reverse engineer the codebase, extract the core ideas, and rewrite it from scratch with modern frameworks (with guidance from the original authors)

It took me only 10 days to have a functioning equivalent, in 10K lines of code (using many libraries that did not exist in the 90s and 00s), which I find much easier to understand, even though I wrote none of it myself.

10 days to rewrite 20-30 years of work by a few people. That was quite scary.

Comment by jamesu 2 days ago

I think mine was when I got one model to generate a spec for SDL 1.2, and another to generate a reimplementation of the whole thing in Rust. It was pretty amazing to see the implementation getting fleshed out and fixed as I fed back results from a real project that still compiled with SDL 1.2. Kept pushing it. Added fully functional audio in 2 prompts. Applied optimizations with another.

Terrible code, but to think you can just rewrite whole libs with AI with relatively lazy prompting + feedback and get decent enough results is kind of mind boggling for me.

Comment by jonyt 3 days ago

Two things, both from this week.

First, I asked Claude to write an article based on an idea I had about WWII. In a passage about the futility (from the German side) of the Battle of Britain it wrote: "The Luftwaffe was fighting to unlock a door that opened onto a wall." I couldn't find any mention of a similar metaphor, and I think it's a great one. Claude has really improved its creative writing skills lately, I wonder if it's an artifact of improvements in other fields, or if Anthropic is working on it specifically.

Second, Claude, with access to DataDog and a code repo, managed to find the reason for a bug, propose an effective temporary fix and a permanent one in code. To be clear, this was something that had multiple engineers stumped.

Comment by maccard 3 days ago

Would you share the story?

Comment by jonyt 3 days ago

The WWII article, or the bug hunting story?

Comment by maccard 3 days ago

Ah sorry - the WWII article!

Comment by jonyt 3 days ago

Sure. I haven't published it yet, so here's the OneDrive link https://1drv.ms/t/c/bc01c33494ab32b1/IQDSR0rolJrKSqaqvWuyQtn...

Comment by a_bonobo 3 days ago

At my previous work, I was collating somewhat random unconfirmed animal sightings. I also had a separate database of animal occurrence probabilities (species distribution maps). I'm not a statistician but that sounded like a clear job for Bayes theorem: given a sighting and the overall probability of that sighting in that area (species distribution map), and some other assumptions about the noise of the sighting, what is the probability that the sighting actually included that species?

Claude asked me three questions and then wrote a beautiful Python implementation that queries the map and spits out a table of adjusted probabilities. Felt immensely powerful - I can do this 'on my own' now, I don't need to wait to find the right people or learn the right thing first.
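
A minimal sketch of the Bayes update described above, not the implementation Claude wrote. The function name and the two noise parameters (`hit_rate`, how often a sighting is reported when the species is present, and `false_alarm_rate`, how often one is reported when it is not) are assumptions standing in for the "other assumptions about the noise of the sighting"; the prior would come from the species distribution map.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(species present | sighting reported), via Bayes' theorem.

    prior            -- P(species present) from the distribution map
    hit_rate         -- P(sighting reported | species present)   (assumed)
    false_alarm_rate -- P(sighting reported | species absent)    (assumed)
    """
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    if evidence == 0.0:
        return 0.0  # a sighting was impossible under both hypotheses
    return hit_rate * prior / evidence

# Hypothetical sightings: (species, prior probability in that map cell).
sightings = [("lynx", 0.05), ("fox", 0.60)]
table = [(name, posterior(p, hit_rate=0.9, false_alarm_rate=0.1))
         for name, p in sightings]
```

With these made-up numbers a rare species stays fairly unlikely even after a sighting, while a common one becomes close to certain, which is the adjustment the comment describes.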

Comment by gagabity 4 days ago

Fixed a nasty bug in one of my tests where a mock in a completely different test, one I had never worked on, was incorrectly set up and intercepting my mocks. I don't think I would ever have found it: the effort required would have forced me to move on to some other way of testing.

Reverse engineered an old audio recorder USB driver which only works in Windows 7, and also reverse engineered the custom audio encoding the device uses and the software that converts it to a standard WAV file. This took recording the USB traffic with Wireshark for each function in the original software in a VM, then disassembling the various DLLs, EXEs and driver files and feeding them into Claude step by step.

That AI button in DataDog not only diagnosed the problem across micro services but also created a fix PR. I think we might be unemployed soon.

Comment by notthetup 3 days ago

Had some unique concert audio recordings which had gotten corrupted when I moved the files during a backup. I had tried looking at the files and trying to recover them. It felt like they had the data but no software could play them.

Sat on them for 5 yrs. Finally decided to see if AI tools could help. Took Copilot 20 mins and a lot of mucking around with hex dumps. The first couple of times it got a semi-working solution (only the first few seconds of a file were playable). Finally managed to recover all the files.

Comment by bombcar 3 days ago

This makes me want to tackle my biggest annoyance - many of my files in iTunes refer to locations on a server that doesn't exist anymore, even though the files still do. Hmmm.

Comment by orzig 4 days ago

"Write a bible verse ... explaining how to remove a sandwich from a VCR" https://x.com/tqbf/status/1598513757805858820

Comment by tverbeure 3 days ago

The fact that it completely autonomously read in a 5 MB firmware image of an old piece of test equipment and generated a Python script to generate license keys:

https://tomverbeure.github.io/2026/04/12/AMIQ-License-Key-Ge...

Comment by darksim905 3 days ago

The blurb there about being hesitant to make the post at all is wild in 2026.

Comment by koreth1 3 days ago

For me it was Suno, not any of the coding tools. I prompted it to write a song about my family's little dog, told it a few things about the dog, and it came back with a K-pop-style anthem that had a super catchy melody and lyrics that made my wife and me laugh out loud.

Writing code to spec is one thing, but creating art was always supposed to be what separated us from machines. (I suppose I need to preemptively acknowledge the "it was machine-generated so by definition cannot be art" point of view.)

Comment by rurban 3 days ago

My sister's husband came over yesterday to show me photos of their recent travels. Normally it would be a boring slideshow of camper van trips to France and Aachen. Nowadays he produces a video with a tiny camera. All music self-produced with Suno, all videos professionally cut. And then he told me he had been invited to his local filmmaker association to lecture about the new AI-assisted video making. Because of the old music, all of his YouTube videos got strikes; with Suno, no more. Everybody can now produce videos easily. I even saw a totally AI-generated doc about the Iran massacres made for $2000.

Comment by jerome-jh 4 days ago

Recently, Claude (through Copilot) found a hardware issue on our product. I was asking it to find an issue in a specific feature of a device driver, that could cause what we observed. It determined the feature was correctly implemented.

Then it hinted that depending how the hardware is implemented, it could cause the observation. It turned out the hardware was implemented as suspected by Claude.

I was already convinced it knew the codebase, somehow, better than I do. Now it is just as if it knows the product and its use as well.

Comment by positus 2 days ago

I had GLM-5.1 via Claude Code create a macOS driver for an unsupported USB gamepad with a proprietary protocol. I answered a few questions and then let it run while I was away at work. After two sessions I now have a working controller on macOS. It did the research, created the driver, and tested everything (with some manual testing on my part) within a few hours. Amazing.

Comment by zoltrix303 2 days ago

I work in a large FMCG company which relies heavily on Microsoft PowerPoint.

One major problem is that people tend to make extremely bloated files, 2-3 GB for 100 slides, and things freeze up, underperform, etc.

Working with AI I wrote a command line tool that renames the file's extension to .zip, extracts the content and then runs a series of resizes and compressions on images and videos. Once done it simply repackages the zip and renames it back to .pptx. Files drop by 70-80% in size on average with no visible loss of quality.

This command line tool was made in less than 30 minutes and the conversation was less than 10 messages back and forth.
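
A rough sketch of the unzip/shrink/repack idea using only Python's standard library; it is not the commenter's tool. A .pptx is a ZIP archive, so `zipfile` can open it directly and the rename step is skipped here. `repack_pptx` and the `shrink_media` hook are hypothetical names; the actual image and video compression (which would need something like Pillow or ffmpeg) is deliberately left to that hook. File names are kept unchanged so the slides' references to their media stay valid.

```python
import zipfile

def repack_pptx(src, dst, shrink_media):
    """Copy a .pptx archive, passing every file under ppt/media/ through a hook.

    src, dst     -- paths or binary file-like objects
    shrink_media -- hypothetical callback: (name, data) -> smaller data
    """
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.startswith("ppt/media/"):
                data = shrink_media(info.filename, data)
            zout.writestr(info.filename, data)  # same name, new contents
```

Usage would look like `repack_pptx("deck.pptx", "deck-small.pptx", my_shrinker)`, where `my_shrinker` is whatever resize/re-encode function you trust.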

Comment by ls612 3 days ago

I was trying to use Opus 4.6 in Claude Code to add some functionality to Python code intended to run on a cluster, and it didn't have any Python environment in its remote environment. It needed to look at the schema of a Parquet file to make sure it did things right and couldn't figure out how to do so with code because, for god knows what reason, there is no Python environment in the dev environment for code intended to be run on a compute cluster in Python. Eventually it decided to just examine the raw binary bytes of the header, and then wrote perfectly functional code based on that.
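
For context on what "examining the raw bytes" involves, here is a small sketch, not what the model actually ran. A Parquet file starts and ends with the 4-byte magic `PAR1`; just before the trailing magic sits a 4-byte little-endian length of the footer, and the footer (Thrift-encoded file metadata) is where the schema lives. The function name is made up, and decoding the Thrift bytes into column names is not shown.

```python
import struct

MAGIC = b"PAR1"

def parquet_footer(raw):
    """Return the raw footer-metadata bytes of a Parquet file held in memory."""
    if len(raw) < 12 or raw[:4] != MAGIC or raw[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    # Last 8 bytes: <4-byte little-endian footer length><PAR1>
    (footer_len,) = struct.unpack("<I", raw[-8:-4])
    start = len(raw) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return raw[start:len(raw) - 8]
```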

On a different note I recently uploaded several thousand scraped IPO prospectuses to the gpt 5.4 mini API to parse and extract certain data. I ordered it in the system prompt to respond exactly with a specified JSON schema. When I got the results back and processed them there was not a single JSON parse error whatsoever. The model didn't have a single hallucination that created malformed JSON or JSON not matching the given schema across several hundred million input tokens and several million output tokens. And this was 5.4 Mini!
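
The kind of check that would surface a malformed or off-schema response can be sketched like this. It is an illustration only: the field names in `SCHEMA` are invented (the comment does not say what was extracted), and `parse_response` is a hypothetical helper, not part of any API.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
SCHEMA = {
    "company": (str,),
    "offer_price": (int, float),
    "shares_offered": (int,),
}

def parse_response(text, schema=SCHEMA):
    """Parse one model response; raise ValueError if it does not fit schema."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(obj, dict) or set(obj) != set(schema):
        raise ValueError("unexpected set of keys")
    for key, types in schema.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"{key!r} has the wrong type")
    return obj
```

Counting how many responses raise `ValueError` over a batch gives the "zero parse errors" figure the comment reports.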

Comment by N-Krause 1 day ago

I wrote an Android app for my SO, rather over-engineered, but I had fun programming it and my SO uses the app daily. The thing is, the app was in React Native and I didn't regularly update the codebase with current React Native versions, so to get the app running on a newer Android version I had to either go through the pain of updating the app or rewrite it. I didn't really have the time and energy to rewrite it. So a few weeks ago I just gave it a shot and provided Claude Code with a few screenshots and a description of the functionality, and it basically one-shot the whole app. Now I am building new features my SO was missing and currently also developing a backend.

Comment by autonomousErwin 3 days ago

I had 2 MacBook Pros. One 2024 and one 2019. The 2024 one would connect fine to the internet, the 2019 one would not.

After pasting in the airportd logs of both (into ChatGPT and Gemini) it found it was down to band switching (2.4GHz and 5GHz) through some really old error code.

This fixed a problem that had plagued me for >12 months. Really magical feeling that it got it on the first try.

Comment by bombcar 3 days ago

One trick I've found is that I need to first give the clanker the real problem (in your case, the laptops not connecting) and then give it what I think is wrong. If I start at too low a level, I lose much of the advantage of the tool, which is part rubber ducky, part "did you think to try".

Comment by lordnacho 4 days ago

For me it was gradual, then sudden.

I liked using the early models to do autocompletion. It could do a leetcode style thing, pretty nice, but only useful for small things.

Then I sought out Cursor because that seemed to be able to do multi-document edits. Not bad, but models at the time (2024) still got stuck pretty often. So, cross-document autocomplete. Useful, but definitely within the realm of "nice shortcuts to have".

Then a friend (who works in AI) told me to try Claude last year. I was on holiday at the time, but I spun up my work repo and looked at the backlog.

It chewed through the entire 6-9 months of estimated work in a two-week period while I was watching that Lord of the Rings series with a friend (we watched an episode or two in the evenings). I just chatted with him about the series while checking the progress every few minutes. It was a huge amount of refactoring, and it didn't get everything right the first time, but it made enough progress that it could be directed the right way.

Since then I have hardly coded any manual lines. I just tell Claude what to do, with very little harness (skills, MCPs, instruction files), and I get what I want.

Comment by dubed1505 2 days ago

I am starting to play Pokémon Gen 1 on game boy with my kid that just learned to read.

My kid traded me one of their Pokémon to train, and my EZ-Flash Jr. microSD card got corrupted. I was kind of embarrassed, so I plugged it in anyway and asked GLM-5.1 to extract the Pokémon .sav file if it could. It churned for a bit but was able to get it, and I was able to create another SD card and give back that Pokémon intact.

Comment by mschaef 4 days ago

This is a small one, but significant to me.

I asked Claude to add support for multiple lights to my toy ray-tracer. It correctly added the support and then suggested adding colored lights to make it easier to diagnose. It felt more like a colleague making a useful suggestion than any sort of pure engineering tool.

Comment by bsiverly 4 days ago

I had it fill out all the forms to appeal my property tax value. We created an assessment of what my San Francisco property should be worth using deep research. The city agreed and a $12k check arrived shortly after.

Comment by tempay 3 days ago

In early 2023, I was debugging an issue in a long abandoned package that used OpenSSL and needed to be fixed for OpenSSL 3. The entire thing was a mess and the parts of OpenSSL it was using are almost entirely undocumented.

Copying and pasting into a ChatGPT window gave me the lines of code to print the error message rather than failing silently. Copying the error message then gave me a detailed explanation of the problem and the diff to fix it. I still have no idea where this knowledge came from, as I spent a decent amount of time searching and found nothing about this corner of OpenSSL.

I’m very confident it would have taken me a week to make sense of what the package was trying to do and with LLMs it was done in a couple of hours.

Comment by robkam 3 days ago

My skepticism turned into a realization when I first asked an LLM to write anything nontrivial, and it just breezed through it. I am curious why many projects mentioned here seem to take people only a few hours or a weekend at most. I have been using LLMs to help rewrite the Ytree file manager originally written in nineties C. While the AI enables creating code of this complexity, the project still demands months of persistent effort.

Comment by headcanon 2 days ago

Two years ago (Dec 2023), I visited Japan and tried to use ChatGPT (GPT-3 era) to translate Kanji signs; it was OK but not great. A little over a year later (late April 2025, o1 era IIRC), I visited again and saw a political rally with hand-painted-style Kanji signs. I took a picture and asked what the political party was about. The big "wow" moment was how it would write inline Python code to crop and analyze the image while it was thinking, then combine it with web searches to eventually find me the website with a summary of what they were about. It definitely hit home for me the potential of what these were capable of.

Comment by rochansinha 3 days ago

Built a physics-based dynamic digital twin for an electrolyzer system with full equivalency in thermodynamics, fluid dynamics and electrochemical reactions. A similar level of complexity is usually only available in software like Aspen or Siemens, which cost a quarter million dollars per license per year. Insane.

Comment by tobyhinloopen 3 days ago

A non-technical employee of a client vibe-coded an app and I was asked to review and deploy it.

It was okay, not bad at all. No serious issues.

At the same time, I fed a whole PDF of feedback from a client (screenshots and such) into Claude, and it fixed everything after 7 hours of reproducing and fixing things mostly unattended, creating a bunch of MRs with fixes. Most fixes were good; some were obviously not what the client wanted but technically correct (which I told Claude, and it fixed them).

Comment by namanyayg 3 days ago

Yup it's gotten good enough that it mostly works (thats why i built my startup around this idea)

Comment by tobyhinloopen 3 days ago

It’s interesting though because the client’s dev team basically refused to look at it, dismissing it as insecure and such.

I realized that the software he had made was actually better than what I would have made, because he understood the problem better than I ever will, because it was his job to understand the problem, not to write code.

It was the first time I realized that AI can totally replace me in my traditional job of trying to understand client’s wants and needs and convert them into an application by writing some code.

I knew Claude was powerful and spent months creating skills prompts etc but this guy did it from scratch.

Comment by namanyayg 3 days ago

Amazing and industry-changing. Is he able to update it by himself now?

Comment by tobyhinloopen 3 days ago

Not sure yet. I’m monitoring it.

However, it is totally viable for internal tools, rapid prototyping, experiments, etc.

Frankly, there’s no faster or cheaper way to get such feature-rich apps, and it was certainly better than what a cheap agency from far away would produce.

Not only was the code better, the whole product was better than what I would have made (before coding LLMs), because he understood the domain, and AI filled in many details that I wouldn’t have spent time on. (Again, pre-LLM)

I’m just… multiple times a week I’m sitting at my screen like, wtf will I do? What will my job be? I have plenty of time to think about it while Claude basically handles everything I throw at it.

Good thing I spent months learning to apply skills and such so my value is “experience with using LLM tools” (:

Comment by Fomite 4 days ago

When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.

Comment by mft_ 3 days ago

I was discussing with a colleague recently that this is becoming more and more of a problem: people delivering significant (sized) pieces of work which are obviously 99% written by LLM.

Comment by sevennull 4 days ago

well?

Comment by hansvm 4 days ago

A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.

I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.

Comment by ako 3 days ago

Probably over a year ago, when I first saw reasoning in action in a debugging session: it generated some code, ran it, could not explain the results, then said “let me add some print statements to debug”, reran the application, read the logs, and then stated “now I understand why it’s not working”. Plan, do, check, act in action, AI engineering its own context, and generating the missing information.

Comment by linsomniac 3 days ago

Last week I gave Claude Code in Ultracode mode the prompt: "I want a browser-based retro game inspired by Spy Hunter" and gave it the URL to the Spy Hunter (Arcade Game) Wikipedia page.

What came out has a lot of problems and needs refinement, but you can definitely see a lot of elements of Spy Hunter in there. I haven't worked on any refinements yet, because I've been low on tokens this week, but for the first thing that popped out of Claude this is pretty impressive (IMHO).

https://linsomniac.github.io/spychaser/

Comment by mh- 2 days ago

Just played for 15 min, haha, thanks!

Comment by matheusmoreira 4 days ago

Pretty much immediately after I asked the LLM to perform a complete code review of my projects. I've been programming alone for years, that alone was life changing for me. It only got more impressive from there.

Comment by bonoboTP 4 days ago

Opus 4.5 fixed so many issues with my self-coded research projects, and allowed me to port between TensorFlow and PyTorch in much less time than doing it manually. Helped a lot with docs too.

Comment by komali2 2 days ago

Made this pseudo black hole visualization graph with the famous gravitational lensing effect from Interstellar yesterday, started at 9am finished at 5pm. I know nothing about GLSL, I don't think I'm even capable of programming my v1 of this with the lensing effect, let alone later adding some of the swirling texture bits, or even the random Doppler effect to the stars.

https://timeflies.508.dev/

Comment by bluejay2387 4 days ago

I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.

Comment by gravypod 4 days ago

What mod did you build?

Comment by bluejay2387 4 days ago

A mod that fixed a bug that prevented certain buffs from working when mounted for the Magus class / Arcane Rider archetype in Pathfinder Wrath of the Righteous. It also managed to fix the problem with Shelters not providing protection from corruption when resting in outposts in that same mod. I've used other models to expand the mod to an entire mini-expansion with new Archetypes and abilities since then.

Comment by kylehotchkiss 4 days ago

Hearing that somebody spent $500,000,000 on AI tokens recently https://www.tomshardware.com/tech-industry/artificial-intell...

Comment by KaiserPro 4 days ago

I've had a few.

The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accurate), a shitty VLM to describe what the wide-angle lens was looking at, speech to text, OCR and a gaze recorder that described what you were looking at.

This was all streamed to SQLite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4-paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.

The most recent one is being caught out by Genai video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.

Comment by dekoidal 3 days ago

When I read that Microsoft gave OpenAI billions of dollars worth of data centre access and OpenAI accounted for it as billions of dollars worth of investment. When they spent the tokens, Microsoft accounted for it as billions of dollars worth of income. Both companies gained billions of dollars with made-up money.

Comment by sothatsit 3 days ago

I gave GPT-4 some source code and my existing tests, and asked it to write a new test, and it did it! It didn’t even run straight away, I had to fix it, but it still blew my mind.

Later, I wrote a ~5k line proxy for work in C, and gave the whole thing to ChatGPT o1 and asked it to review it. It found several real memory bugs, and now that service has been running since with no problems.

Just this week, I was trying to write a greedy solver to pick the best subset of block sizes to keep from a larger sweep for shorter testing. Opus 4.8 suggested that this could actually be solved as a MILP problem, and found the perfect solution in 5 mins. I’d never even heard of MILP before.

Comment by cjbprime 3 days ago

ChatGPT reconstructing idiomatic Python source code from Python bytecode was definitely up there. That is not something humans have written a great deal about online. It requires simulating the Python VM.

I remember also having a massive wtf reaction to realizing that original ChatGPT was pretty good at decoding long random/unique base64 strings.
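
To make the two tasks concrete, here is a small sketch of what the inputs look like; it only illustrates the formats and says nothing about how the model handles them. `dis` shows the bytecode that has to be mentally executed to get back to source, and base64 is a fixed mapping of 6 bits per character. The `add` function is just an example.

```python
import base64
import dis

def add(a, b):
    return a + b

# The instruction stream a decompiler (human or model) starts from.
# Exact opcode names differ between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# Base64: every 3 input bytes become 4 output characters.
encoded = base64.b64encode(b"hello, world").decode("ascii")
decoded = base64.b64decode(encoded)
print(encoded, decoded)
```

Going from `opnames` back to `return a + b` means tracking the VM's stack through each instruction, which is why the reconstruction is more than a lookup.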

Comment by abecedarius 3 days ago

AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.

Comment by kypro 2 days ago

There isn't one.

I suppose the first was just learning about NNs and realising that this was the first time I'd ever heard a way to make computers theoretically think like humans.

Deep neural networks working in the early 2010s was another, because that was an unlock for a whole new paradigm of computing.

In 2021 I freaked out when LLMs started to become useful at writing code. Enough so that I invested a significant amount of my net worth into Google at that point, assuming that coders had about a decade left.

The day ChatGPT dropped in late November I felt physically sick. I remember the day well. I was visiting family and was like a ghost. Trying to explain that something I had been worrying about for a long time was here was near impossible (and to a large extent still is).

There was a moment in early 2025 when I realised agentic coding was becoming very competent and was solving actual problems with very minimal instructions. To this point I was worried theoretically that AI could replace human coders, but had not seen examples in practise.

Mid-2025 someone non-technical said they'd been building an app, and I was sceptical it could do what they said. I was blown away by what they were able to build.

Nov-2025 a non-technical person on my team built a product from scratch to manage a range of things within our business. This was non-trivial software that would have taken a team of developers a year or two to build previously.

In early 2026 coding agents regularly wrote code better than myself, repeatedly humbling me that the decades of effort I had to put into this craft was now more or less redundant.

At some point very soon I'd expect to see very advanced software products being one-shotted and clear evidence of recursive self-improvement.

Remember, in just 3 years ChatGPT has gone from being almost useless to extremely senior programmers no longer writing code. Things will change in unimaginable ways from here and you should be ready for that.

Comment by madrox 4 days ago

I think my favorite early story was when OpenAI launched deep research. I was going to an event that I was headlining, and I gave it a CSV of the attendees and asked it to give me a small background on each company they represented.

When people introduced themselves to me, I knew a little about their startup. Felt magical.

Comment by adamm255 4 days ago

OMG I love this!

I was at an industry event this week. A CEO of a startup took the big board of vendors who were present and put it through an LLM. It summarised the companies he should be looking at to discuss partnership opportunities with, and why, based on his business. Spot on.

Comment by lostmsu 4 days ago

GPT-2 (2019) https://openai.com/index/better-language-models/

Forever reinforced by Humans Who Are Not Concentrating Are Not General Intelligences: https://srconstantin.wordpress.com/2019/02/25/humans-who-are... one week later.

Comment by networked 3 days ago

It's a great post. Or as Scott Alexander put it at the time:

> A machine learning researcher writes me in response to yesterday’s post, saying:

>> I still think GPT-2 is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.

> I resisted the urge to answer “Yeah, well, your mom is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.”

> But I think it would have been true.

https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-g...

Comment by steren 4 days ago

The moment when I ran llama on my old gaming PC (using something called GPT4All) was my "oh shit" moment: I was now talking... to my PC.

Comment by jmpman 3 days ago

Had an AI plot movie Rotten Tomatoes scores versus cost for 2 adult tickets, plus candy and large popcorn prices from the specific theater, and the round-trip gas from my cross street, including only movies which would get out in time for me to be home by 10pm, including preview times.

None of that is mind blowing, but that Google or some other site has never offered me this type of analytics, is where I'm floored. It's a trivial query, but perfectly useful for planning a night out with my wife.

Comment by sire-vc 2 days ago

I had a mysterious bug related to environment variables being different in managed (csharp) and unmanaged code. I told GPT 5.4 'here is how to reproduce the bug' (it only happened when certain shells were used) and told it to 'figure it out'. It just iterated, trying stuff to narrow it down before finally landing on the bug (it had to do with the unit test console runner). When I glanced at the logs, it was a basic 'try this, nope that didn't fix anything', add logging 'that fixed it, make sure it wasn't a timing issue etc, oops it was' etc. That's when I realised: nothing it did was anything I couldn't do, but what would have taken me a week of debugging took 10 minutes for the AI, and I could do something else at the same time.

Comment by 1qaboutecs 4 days ago

Was trying to explain convolution (of functions) to a friend and I wanted to build a little picture. I typed more or less nothing into Claude and it gave me a fine web-app for demo'ing examples to my friend within minutes.

Three years ago this would have taken a minimum of three college graduates a couple days -- one to know the math, one to know the backend, and one to know the front-end. Maybe two of those could be the same person on a good day -- none of the topics is individually that hard -- but it's a lot together.
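
For the curious, the math behind such a demo is small. Below is a minimal sketch of discrete convolution on sampled functions; it is not the web app Claude produced, and `convolve` is just an illustrative name. It implements out[n] = sum over k of f[k] * g[n - k].

```python
def convolve(f, g):
    """Discrete convolution of two sequences of samples."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj   # f[i] * g[j] contributes to index i + j
    return out

box = [1.0, 1.0, 1.0]
triangle = convolve(box, box)   # a box convolved with itself is a triangle
```

The box-with-box case is a handy first picture: sliding one rectangle across another gives overlap that rises and then falls linearly.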

Comment by iLoveOncall 4 days ago

I'm still waiting for a positive "Oh shit" moment regarding LLMs.

I've had plenty of "Oh shit those people have really lost all ability to think for themselves" moments though.

Comment by jesterson 1 day ago

This. Plenty of moments "Oh shit those people have really lost all ability to think for themselves" and not a single moment of "Oh shit" yet.

Comment by brailsafe 4 days ago

Not sure that I've had it yet, although hypothetically I'm sure it would probably be something similar to the examples of writing new software for old hardware mentioned ITT. The idea of resurrecting useful but unsupported gadgets that would otherwise become e-waste is something I've always found compelling.

Problem is, I just don't have enough old crap, and if I did, I would have a hard time justifying the expense, because that money could maybe just go toward a more intimate tinkering process.

For everything else, I either haven't had any sufficiently interesting ideas, or they ended up not being worth pursuing with those tools or at all.

When I do have success that I'm happy with and care about, it's a slow process that I ultimately need to know the details of anyway, but otherwise it's a bunch of luckily narrow work-related scenarios with well-documented constraints. Nothing's really been that shocking though.

The shocking thing to me is how unrewarding most of the successful tasks have been, partly because they often create unnecessary work and partly because the type of thinking required to massage or evaluate the result is much less stimulating, and there's much more of it in aggregate. It's fine if it's something like generating a UI from scratch because that hasn't produced dopamine in a long long time anyway

Comment by dirkc 4 days ago

I started to look at LLMs not as writing code, but rather as predicting what code it would expect someone to write given the context.

For some people that matches their expectation or they don't really have an expectation. While for other people it doesn't match their expectation.

Comment by fabianholzer 3 days ago

I did not yet have a positive "oh shit" moment. But when the corporate manager types, who could not deliver a "Hello world" if their lives depended on it and who would have had a sour look on their face when asked to pay license fees for a proper IDE 10 to 15 years ago, started pushing it hard, way before any but the resume-driven engineers: that flipped a bit in me.

Comment by ilaksh 4 days ago

OpenAI already had GPT prior to the ChatGPT launch, and I had not really taken it seriously. But on November 30, 2022 when ChatGPT came out and was immediately popular, I reevaluated it.

I immediately realized that it meant my time as a programmer in the traditional sense was going to come to an end relatively soon.

On December 1, 2022 I created my first agentic coding loop experiment. I launched one of the first AI code generation websites that would generate web pages along with embedded images in January 2023.

Comment by sydneypan 2 days ago

When gpt4 first launched, i was just walking behind my ex in a department store and playing with chatgpt on my phone (green ugly web ui). i tried to manually recursively prompt it to decompose a large goal by divide and conquer until it thought it could create a functional and integral part to hand back to its parent. I hit a depression point when I managed to get to a terminal state where chatgpt4 created a single json response including a full containerised api endpoint for validation of media metadata, which it thought was required for building a personal photo book app. It was a valid Dockerfile, a valid shell script and perfectly formatted json! But trying to leverage that back then proved expensive and hard. many hard lessons learned.
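
The manual divide-and-conquer prompting described above could be sketched roughly like this. It is only an illustration: `is_leaf` and `split` are hypothetical stand-ins for the two questions put to the model ("can you build this directly?" and "break this into parts"), and the toy versions below just split on the word "and" so the example runs without any model.

```python
from typing import Callable, Dict, List


def decompose(goal: str,
              is_leaf: Callable[[str], bool],
              split: Callable[[str], List[str]],
              depth: int = 0,
              max_depth: int = 5) -> Dict:
    """Recursively split a goal until each piece is judged directly buildable."""
    # Stop when the piece is small enough, or when the depth cap is reached.
    if depth >= max_depth or is_leaf(goal):
        return {"goal": goal, "children": []}
    children = [decompose(sub, is_leaf, split, depth + 1, max_depth)
                for sub in split(goal)]
    return {"goal": goal, "children": children}


def leaves(tree: Dict) -> List[str]:
    """Collect the terminal goals, left to right."""
    if not tree["children"]:
        return [tree["goal"]]
    result: List[str] = []
    for child in tree["children"]:
        result.extend(leaves(child))
    return result


# Toy stand-ins for the model calls (hypothetical, for illustration only).
def toy_is_leaf(goal: str) -> bool:
    return " and " not in goal


def toy_split(goal: str) -> List[str]:
    return goal.split(" and ", 1)


if __name__ == "__main__":
    tree = decompose("upload photos and validate metadata and render book",
                     toy_is_leaf, toy_split)
    print(leaves(tree))
```

In a real run each leaf would be sent back to the model for implementation, which is where the cost mentioned above adds up.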

Comment by msyea 2 days ago

For me it was reverse engineering an obsolete travel scanner (should have just taken a photo) and automating a document workflow. Something I knew was possible but impossible for me.

This pulled me into BLE protocols and private APIs and integrating with things directly instead of using 3rd party apps that were bad or limiting.

It's now opened up vibe-coding personal apps in hours, where I simply didn't bother before because I didn't have the time. https://type2fun.net/infinitely-scalable-personal-apps

Comment by jablongo 2 days ago

Right after gpt4 came out I asked it to derive a new optimization technique. It ended up using Einstein sum notation to define what I thought was a totally novel optimization setup. It then implemented it in PyTorch and it ran with no bugs. This was the moment that I realized that novel intellectual work might be done by these models and I was shook. I had an oh shit moment with gpt3 too since it was so surprising how well next token prediction works, and at the time I really didn’t think it would pan out so well. I also had a jarring experience discussing computational theory of mind with gpt4, when it applied a rubric we came up with to itself and it claimed its level of consciousness was between an ant and a mouse.

Comment by hintymad 2 days ago

> realized that novel intellectual work might be done by these models and I was shook.

I suspected it was more likely that the intellectual work had already been done in a similar way by a number of other people, and GPT-4 picked up that work.

Comment by dtgriscom 4 days ago

A friend had the power supply die on his high-end turntable. He took a picture of each side of the supply's PCB, handed it to Claude, and it gave him back a schematic.

Comment by floxy 4 days ago

I mean even assuming that this was strictly a 2-layer board, you can still route traces underneath parts like ICs, connectors, etc. I could believe it was a simple board (for a phonograph and all), but I'd be interested in seeing how well it actually matched. Did he get a new board fabbed and it just worked?

Comment by dtgriscom 3 days ago

He said that it got a few resistor values wrong, but other than that it was correct. I assume it was actually a single-layer board. (Nobody's doing this with a mini-ITX motherboard.)

Comment by alanb99 2 days ago

I had been using ChatGPT for help with some coding stuff - sometimes it worked reasonably well but for anything tricky it just failed.

But then recently I asked Claude Code to upgrade a particular library for me that I had forked and to reapply the changes I had made in my fork. It just did it as if by magic. As if it already knew everything about the original library. Given that the library is open source and on Github then Claude probably does know everything about it.

But then I can still have times when Claude Code does something really helpful (e.g. at work it helped create the first version of an MCP server for our own agentic application), while other times it can do some downright dumb things (like directly editing a library's code so something worked rather than changing my code so it worked).

Any time I'm feeling like the machines are about to take over I just do this: 1. Ask Claude Code to write some tests. 2. Same model, but different window, ask it to delete any tests that were just added but which don't test anything substantive.

It always finds something to delete that it had just added.

I'm not even talking about using a different model for code review or anything like that. Just asking the exact same model 10 seconds after it wrote some tests to delete the bad ones.

So I'm not yet at the slightly panicked stage :D

Comment by MattGaiser 3 days ago

My grandparents had a dishwasher from the 1980s. The contractor they hired to fix it didn’t even know how to take it out of the spot as it had an old design that attached it at the top.

ChatGPT told me exactly why from the model number (had to disconnect a part), found a new part, and told me step by step how that part would be taken out.

We didn’t end up buying the new part, but it beat the repairman.

Comment by bag_boy 4 days ago

I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.

It was unlike anything I had ever experienced.

My wife was unimpressed lol.

This was 2022.

Comment by moconnor 4 days ago

Literally the very first time I used ChatGPT. I had already been experimenting with GPT3 for various jokes and games via the API but the naturalness of it as a chat interface that understood you changed everything.

The first time I used a terminal agent was another one.

Comment by 2cynykyl 3 days ago

I thought mine was when claude found a very subtle but important bug in some open source LBM code I was using. It ground at it for hours and didn't give up until it found it. (Back when claude was cheap!). I recently had my ACTUAL moment at a conference where the presenter was pitching his book about "One shotting scientific code". He has cooked up 60+ prompts that get you functioning simulations and put them into a book [0]. It floored me to realize I could have just asked claude to write me a whole new LBM solver instead of finding that bug! That raised the bar for me a lot.

[0] https://www.taylorfrancis.com/books/mono/10.1201/97810037340...

Comment by adammarples 4 days ago

Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.

Comment by dpcan 3 days ago

Day 1.

I remember firing up gpt and asking for some code. It was simple JavaScript to solve a problem and I knew immediately, this was the beginning of the end for software development as we knew it.

I built an app I struggled to create for years in a weekend.

Started playing with the API, etc.

I was very much in my oh-no period those very first few months.

Comment by latexr 3 days ago

It was right at the beginning. Before most non-tech people had even heard the name ChatGPT, HN was already flooding the homepage with LLM posts and it became clear to me they were going to be big.

The consequences were even clearer, and I predicted the consolidation of power in the hands of a few, their use for surveillance, propaganda, discrimination, the proliferation of AI psychosis, sneaky ad insertion, carelessness and loss of skills, erosion of online discourse, and more. I didn’t predict the teenage suicides so soon or the rising costs in consumer hardware. I also underestimated the rate of increase in energy use (and thus the blow to environmental efforts) and that regular people would be left without electricity to power data centres.

As soon as I realised all the potential (now factual) harms and that the good parts are lacklustre in comparison but that people would eat it up at a massive scale anyway, I thought “uh oh” and “oh shit”.

Comment by tmaly 3 days ago

It was last Summer. I was at an AirBnB and the fire alarm system had a fault and kept beeping.

I took a picture of the panel and the AI was able to diagnose the issue and tell me how to temporarily disable the beeping sound.

I knew nothing about fire systems. I had the owner call a repair person the next day to resolve the issue.

Recently I was trying to find a matching stain for wood flooring in a house built in 1999. I uploaded a clear picture in bright sunlight and ChatGPT was able to search online and find a matching stain color. It presented me with ordering options and I got a quart delivered yesterday.

I have been working on my own variant of OpenClaw written in go. I got the voice mode wired up a few weeks ago and it just started having a conversation with me. My wife freaked out and was asking who was talking to me.

Comment by aswegs8 3 days ago

Kind of peculiar and memorable story for me.

I was on the couch on my Nintendo Switch, playing around with ChatGPT 3 and asked it where to find a specific item in Zelda Breath of the Wild. When it provided a coherent answer I was just dumbfounded. To be fair, the answer was semi-hallucinated but partly true. But it made me realize what kind of breakthrough it must be for some program to provide an answer to this without searching external sources (which it couldn't do yet). Such a small data point, like a drop in the vast sea of human knowledge space.

Prompted me to do some back-of-the-envelope calculation. The weights of this model were a few hundred GB. I just realized what kind of quantum leap it was to compress this seemingly infinite knowledge space into a few hundred GB of weights.
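
A rough sketch of that kind of estimate. Both inputs are assumptions, not figures from the comment: the 175 billion parameters publicly reported for GPT-3, and 2 bytes per parameter (16-bit floats). The size of the model actually serving ChatGPT at the time was not published.

```python
# Back-of-the-envelope estimate of raw model weight size.
# Assumptions (not from the comment above): 175e9 parameters, 2 bytes each.

def weights_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the raw size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    size = weights_size_gb(175e9, 2)
    print(f"{size:.0f} GB")  # 350 GB under the stated assumptions
```

Halving the bytes per parameter (8-bit quantization) halves the figure, so the order of magnitude stays at "a few hundred GB" either way.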

Comment by pluc 3 days ago

How is that result any different than what you would have done without AI, which is likely a search for the item name which would have given you a wiki about it and videos on how to get it? Why were you amazed at what, on any other day, is a common query? Really just that it was offline?

Comment by aswegs8 3 days ago

Like endymi0n said, for me it was just about the amazement of how qualitatively different this technology was than anything we had before. Not in a functional sense of how it provides an output to anything. That was probably when I used it to spin up an app via Terraform in days despite having not much prior knowledge of infrastructure. Digging through Stack Overflow would have taken at least 10 times more time.

Comment by endymi0n 3 days ago

not parent, but I kinda have the same thoughts often. Maybe I can’t do inference on them in the same form factor (yet!), but just the fact that the weights of a model that comes close to capturing a close enough approximation of the combined knowledge, experience and intelligence of mankind fits onto a MicroSD never fails to amaze me.

Comment by pluc 3 days ago

Ok but you can download the entirety of Wikipedia too, in a queryable format..

Comment by radial_symmetry 3 days ago

Very early on, when Github Copilot was brand new and the first AI autocomplete that was in the IDE. I had a file TODO.txt, and was adding a line, and it suggested a next feature that demonstrated actual understanding of what my app was and its purpose, despite me not having documented that anywhere.

Comment by thenoblesunfish 3 days ago

When a junior engineer first sent me something that looked good until I realized it had been vibed, and thus their understanding of what they were doing was too shallow to answer questions and improve on it. That was a doc, but it happens with everything. "Oh shit", I say, as everyone is aggressively encouraged to work this way.

Comment by moezd 2 days ago

I was working on a project where the code was erroring hard, inconsistent, immensely bloated and probably written with a bit of 2023 era AI assistance. It didn't have a lot of tests, either. I had no intention to be a garbage man on that specific project, and yet the fates had their own plans for me.

So I immediately went for the package.json, attached whatever test commands that could be done, then wrote some hand holding markdowns for the agents to consume. First I let it explore the codebase and take notes, then I asked for a different approach to implement, which essentially turned that "paint the sides and ship" project to a full rewrite. But guess what? All features, one by one, were implemented and extensively tested by AI. I tested all outputs myself, especially when agent got impatient and thought something was wrong in the shell env, and after a few PRs, I started to read less and only skimmed the results to see if the code "feels like it belongs with the rest". Bloat was reduced by about 25% and now all features look like they were created by a rather competent offshore software shop. Not bad, since I was also juggling a couple more projects and involvements at the same time which would've led me to certain burnout.

Comment by OpFour 2 days ago

I started using different LLMs and gguf files locally then graduated to Gemini, then Claude code. I created about 12 different apps and projects, some web based for my catering business, several old games from the early 90's, a secure communications app, and an app to protect vulnerable people who found themselves in difficult or undesirable situations. This trip has been nothing BUT fun for me! My ADD addled brain loves coming up with random thoughts for apps and fixes for older problems that I simply haven't had the time to learn about and address. I then build them for fun... I LOVE this stuff!!!

Comment by anonu 3 days ago

Gave Claude a third-party vendor's binary market data stream. With no spec and 20 minutes of hex dumping it built a parser and a cool TUI. Wow...

That kind of work would have taken weeks if done by hand. Even after obtaining message specs and more

Comment by grumblepeet 3 days ago

My bath hot tap suddenly broke apart and was spilling hot water into the bath. I photographed everything and ChatGPT told me step by step what bits to get to fix it, and how to reassemble it.

A few weeks later some kids in the area were bending the wiper arms in cars in my terraced street, including my car. I thought, I wonder if ChatGPT can help? It explained to me where to get the parts online, an indication of a decent price, and how to fit the replacement parts.

In work we had struggled with filling out the myriad of forms that we need to do to get enrolled on a government framework to apply for contracts. Not only did it do that and explain what we needed to say, but it also told us in detail the steps we needed to follow to get the certification that was a prerequisite. It has genuinely transformed our business as a result.

Comment by ben_w 3 days ago

I had a lot of such moments, including:

• Most recent, I had the option of either buying an app from the app store to train myself on the piano, or vibe coding a web app to connect with an attached MIDI keyboard and accept an uploaded MIDI file and give me an experience like Guitar Hero, and Claude did this in two prompts of their free (not paid subscription) tier, where the second prompt was just the word "continue".

• First demo of InstructGPT (predecessor to ChatGPT), because I remember how much worse the state of the art in NLP had been, and because I hadn't expected instruction following from the quality of continuation seen in GPT-3.x

• 2019, "This Person Does Not Exist"

• 2016, seeing style transfer and similar working (https://github.com/awentzonline/image-analogies) and what would now be called Deep Fakes (back when Two Minute Papers videos were <2 minutes long: https://www.youtube.com/watch?v=_S1lyQbbJM4)

• 2015, when I (in retrospect, foolishly) believed Tesla about their over-the-air software update that introduced self-driving: https://www.popsci.com/tesla-cars-become-autonomous-overnigh...

• 2013, word2vec, "man" - "woman" ~= "king" - "queen", again because of knowing how bad the state of the art in NLP has been

(If you're wondering why "uh oh" from that, consider value in automating propaganda, and surveillance opportunities for automating comprehension of slang/cants like Polari).

• 2010, seeing the demo video of Word Lens: https://www.youtube.com/watch?v=h2OfQdYrHRs
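
The word2vec relation in the 2013 item above can be illustrated with a toy sketch. The 2-D vectors below are invented for the example (one axis loosely "royalty", one loosely "gender"); real word2vec embeddings are learned from text and have hundreds of dimensions, so this only shows the arithmetic, not the training.

```python
import math

# Hand-made toy "embeddings", for illustration only (not real word2vec output).
VECTORS = {
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.1),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = tuple(x - y + z
                   for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    # Exclude the query words themselves, as is usual for analogy queries.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

With these toy vectors, "man" - "woman" and "king" - "queen" are the same offset, which is exactly why the lookup lands on "queen".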

Comment by rref 4 days ago

My ducted gas heater wasn't working where I live and I took a photo of the wiring diagram and had Claude step me through troubleshooting it with a multi-meter, and got it fixed.

Comment by lukan 3 days ago

2 years ago I played a bit with the abandoned source of

https://www.wickeditor.com

a flash like editor for the web, that I found promising.

But doing it manually was too much work: outdated and broken build pipeline, stuck on an older node version, deprecated and abandoned dependencies .. so I stopped the experiment.

Then I gave it a try with claude at the beginning of this year. I remember not expecting anything, but did a bit of steering the direction as I knew the source a bit and let it mostly work on its own - and then it said it is done and it works.

I didn't believe it, but it did. "Can you add this feature?" Yes it could.

Since that experience, I have a hard time taking people seriously who say AI is useless.

Comment by acosmism 3 days ago

Recently purchased a 100 year old home. it was the dead of winter and the house has steam heating which wasn't working. a few screenshots and chatgpt gave me a step by step of which levers to pull and knobs to turn. this was terrifying considering i knew nothing about these systems. it worked!

Comment by twooclock 4 days ago

I programmed data export to some xml over a couple of days. Sending xml results via email to an accounting firm for verification. A day after I finished my disk crashed and I lost all my code. Fed Claude with xml from my mail and... oh shit! ... got "my" code back. (And immediately paid for Claude subscription) :-)

Comment by lodovic 3 days ago

The first time I pasted a screenshot of a PR review thread, adding just "I had some review comments, fix them" - and it perfectly solved everything, made small commits, and pushed it upstream - this was such a shock.

I now try to keep pushing the boundaries and see where it stops understanding my intention. Give it impossible tasks, gigantic projects, complex architectures. Last result: I wrote a complete OS including MPI, TCP/IP, and a GUI from scratch in only a week, while investing just a few hours a day in it. It even runs Doom! Coding as a profession is over, but there's such a difference in the result if you approach this with a professional mindset, that I think the software engineering discipline can still provide massive value.

Comment by dools 2 days ago

ChatGPT v4. The step up from 3.5 went from "quaint novelty" to "genuinely useful coding tool".

Comment by csr86 4 days ago

I was working on a project for 2 years with about 5 engineers. It was many years before AI. It was a new subject for our team, and we were pretty sure it was possible. Turned out it was not.

Much later I asked AI if that kind of project is possible, and it immediately explained why it is not. Would have saved 2 years of our time...

Comment by maxbond 3 days ago

Is it possible someone on the team published a blog post about why it wasn't possible? Or that otherwise the AI learned from your 2 years of effort? Some of the code made it on GitHub or something?

It's totally plausible that it didn't & it still got it right, I'm just curious.

Comment by stonegray 3 days ago

Curious, what was the project?

Comment by nativeit 3 days ago

When I saw that on the second day of token-based pricing I’d already consumed my usual monthly spend on GitHub Copilot. That’s when I fully realized that it would never be economical, nor useful, to solo shops like mine.

Comment by jiggawatts 4 days ago

I reverse engineered a proprietary network protocol from a vendor binary (compiled C++) and a short sample network capture.

The agent had access to the NSA Ghidra disassembler, which it can control shockingly well.

I just clicked the “Allow” button a lot and eyeballed the output decoding quality. I felt like I got demoted to non-technical QA.

Comment by block_dagger 4 days ago

I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.

Comment by MillionOClock 3 days ago

I was working with very small local models (<= 4B ones) in my app, there was a point where the response of the model seemed so good I really had to double check I wasn't mistakenly using a cloud model. The model hadn't made tool calls or anything, but its textual answer quality was high enough that it made me hesitate. A different kind of impressive vs the frontier models, but still was a bit of a wow moment when I remember expensive hosted models having a hard time writing proper English.

Comment by hatthew 3 days ago

I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.

My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken as real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.

Comment by pdntspa 3 days ago

It was the release of Stable Diffusion and its source code.

I spent the next few days tinkering with my own Stable Diffusion implementation. I never got it past outputting total nightmare fuel, but it was fun!

To this day I think of the process as like baking pizzas in a sequence of pizza ovens

Comment by zkry 2 days ago

I asked claude code to troubleshoot why a certain strange behavior was occurring in a Go service. Only then did I realize that the behavior I described wasn't actually occurring. A few minutes of thinking passed and Claude code confidently made up some scenario for why this was definitely happening. It was then I thought, "oh shit, these LLMs are just complete BS machines telling us what we want to hear."

Comment by Sobrino 4 days ago

I worked in an AI (or well ML) consultancy before the ChatGPT moment. I remember we had a project where we had to extract a large number of documents (country wide, terabytes of pdfs of scans). We had to set up a pipeline that looked a bit like this.

Download pdf of scan -> Tesseract to get a text layer -> Clean it up with a language specific BERT model -> detect paragraphs of a certain type -> Look them up against a database we built with scored similar paragraphs -> Do recommendations.

The documents were not standard and a lot of them were historical documents and handwritten or with scratched out text with corrections.

We had student workers spending days labeling the data.

It took us months to get it all working with a high accuracy. We were so proud.

Now you can do it all with a prompt and a ChatGPT call.

Comment by archagon 4 days ago

I'm pretty sure that "a ChatGPT call" will happily add or fudge stuff in your scanned PDFs. That sounds like a massive liability.

Comment by Sobrino 3 days ago

It's surprisingly robust and the quality is pretty good with the right prompt and quality gating.

Comment by ok123456 4 days ago

And now you can do all of that locally with qwen3.6:35b.

Comment by bubbamack 2 days ago

Back in 2023 I wanted to create a corp dashboard - simple auth and rbac, charts, summaries, etc - and wanted it done with sveltekit. Me and the other dev planned on working over the weekend to have a simple prototype to demo on Monday and planned on a week or so to get the whole thing done.

I was shocked when it built a usable version in one shot and instead of having a prototype to demo, we basically had the whole thing done in a few hours.

Comment by hannahstrawbrry 4 days ago

Had an issue in a project where multiple media files with the same/similar names were colliding. After spending hours with chat gpt wrangling python scripts to try and sort it out programmatically, I shifted gears and built a web tool that would allow me to manually review the content and select the correct media file to associate with it in about 5 minutes, allowing me to comb through and finally fix the issue & verify the content was correct in about an hour. It made me realize I needed to completely re-think how I set about solving problems now that I have an entirely different set of tools to develop- that has been the biggest "Oh shit" moment for me, looking into the mirror and recognizing how AI will re-shape me as a developer.

Comment by maxwellg 4 days ago

Pre-GenAI I wrote a new interview question for a role on our team. As far as I know, the question was never made public. The interview required implementing a pretty basic CSS-in-JS utility in vanilla javascript. We instructed the candidate to read the MDN documentation for the CSSStyleSheet interface, and then gave them a public API to implement. Passing implementations usually consisted of a ~10 line for loop, and it was really just a test of whether a developer could pick up and work with new libraries on the fly. Still, the interview probably had a 30% pass rate.

On a lark, I asked ChatGPT to complete the interview question in late 2022. I would have hired ChatGPT back then based on its first response! It was easily in the 90th percentile of responses I have seen.

Comment by xyzal 3 days ago

To me it was just a few weeks ago discovering just how good and dirt cheap the recent flash models are, in particular Deepseek V4. Previously used Claude's variants almost exclusively.

I use them mostly in the "artist's assistant" role, doing internet research, writing an occasional function and doing transformations or refactorings (don't believe the agentic hype honestly), and for such tasks they seem to be well capable enough.

It seems that their open weights nature leads to competition among providers keeping the user cost close to inference cost.

Try them at least once if you haven't, it's well worth it, and the price difference is staggering

Comment by oidar 4 days ago

Opus 4.6. My standard battery of questions included solving an ascii maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.

Comment by SyneRyder 3 days ago

Opus 4.6 for me as well. I had a serious bug in some legacy software I've been stuck with maintaining, together with a few other people who originally wrote the software. We've all been trying to solve this bug for literally 10 years or more. None of us have been able to. I've personally spent hundreds of hours on it, thrown it at every previous LLM. Opus 4.5 came up with a workaround that prevented our software crashing, but didn't solve it. Opus 4.6 was the one that actually solved it. It did it by modelling a state machine of the software that was calling our software and triggering the bug, and it found the one state where we weren't correctly sending data back.

Comment by dddw 3 days ago

This week.

Have been playing and testing with openrouter, claude, gemini for years.

Small program here, bash script there, ansible playbook.

Fine, nothing I can't do, but saves some time boilerplating. It needs quite some steering.

This week i took my mediawiki from 2005 (actually submitted as my artschool thesis), which was totally outdated.

In 20 years time i always said to myself, i should restore it, and do all the upgrade steps. Tedious work, and very fault prone.

In 1 hour of churn with 1 plan, in 8 steps, i had a running and up to date version.

I'm still not convinced AI is intelligent, but it's definitely not stupid, that's for sure.

Comment by hilti 4 days ago

Claude helped me to rewire my first digital Märklin model train. It pulled the documentation of the control keyboards 6040 and told me how to wire them properly to the routers.

And I restored an old vintage amp with the help of schematics, multimeter and Claude. That was really cool.

Comment by threwrfaway 3 days ago

When I used google to get the ieee-488 commands of an arbitrary wave generator from the 80s whose manual doesn't exist on the internet.

This is a very long tail search, but by the end of the day I had enough to fully utilize a very sophisticated equipment.

Comment by threwrfaway 3 days ago

To clarify: I used that "AI" thing google has next to the search

Comment by yauneyz 3 days ago

I had it write a short story about Vader and Palpatine discovering the Gram-Schmidt process. It wasn't the greatest thing ever but it got the mood right and understood what Gram-Schmidt was. It was crazy at the time

Comment by jerieljan 3 days ago

I remember in the early days when I was just trying out ChatGPT on a phone for the first time (this was around GPT-3.5? GPT-4o?) and snapping a picture of our fridge that's full of magnet souvenirs and asked it to identify all the places we've been in and it gave a nice list of what it saw and the places that were featured.

Did it get it fully right? No. But it was one of those "oh wow, you could do that?" moments for me. There's obviously a lot more "oh shit" moments as time went on, but it was a neat little moment.

Comment by awbvious 3 days ago

Not sure, but I can tell you what my "oh s***, astroturfing is so bad, it's even in Hacker News" moment was. And if I learned GenAI was used to make some of the astroturf, that's more an "ah s***" than an "oh s***" thing. I mean, the prominence, ubiquity, and breathlessness. One out of three, sure. Two out of three, maybe. And some corpo shilling definitely happens here. But this is like, well, covering an entire area with artificial grass, to the point where nothing lives. Crazy.

Comment by andrenotgiant 3 days ago

FWIW the OP passes as 100% human in Pangram https://www.pangram.com/history/d33dcbcd-e82b-4ce0-bea5-e4ee...

Reddit is definitely overtaken with astroturf at this point. Especially in any subreddit where there is any kind of business interest in doing so.

Comment by K0balt 3 days ago

I was wandering around the carcass of Reddit the other day, and it’s crazy how it’s like 70 percent AI now commenting to each other in a lot of subs that used to be at least nominally interesting… and then a few clueless humans getting all riled up with an AI lol.

They’re getting 200m a year to share that garbage pit with ai training.

Comment by arw0n 3 days ago

Reddit was already heavily astroturfed before AI. There's no space on the internet where you can get as much bang for a buck with an influence campaign due to its centralized frontpage. The most obvious example is /r/worldnews, with millions of readers, a few thousand commenters and maybe 40k voters. To skew any discussion, you need at most 6k accounts, and that's giving you the kind of influence on American politics as would the frontpage of the NYT. You could hire real people for each account, and it would still be worth it.

Comment by dijit 3 days ago

I'm wondering how it goes with a lot of social media actually.

There were problems with the "algorithm"[0] before being intentionally gamed... but I feel like it's easier than ever to have an army of bots intentionally steering a conversation.

I'll give you a brief example (we should not debate this case on HN but I'm talking specifically about a hypothesis that can be drawn).

In the UK, the streets are alight with the news of the murder of a young white boy at the hands of a Sikh boy. In the UK, historically, the Sikh community has been seen as a bit of an outlier in terms of how well they integrated and how tolerant they are. They are, for all intents and purposes "model immigrants".

The issue was, when the police were called to the scene of the stabbing, they immediately arrested the barely conscious boy on the ground, who had been stabbed and was bleeding from the mouth- handcuffing him and dismissing his pleas on the basis that he might have been making racist remarks.

The people in my circles are furious at the police for doing this.

However, it seems like there is an army of bots who keep trying to paint it as if it's a problem with Sikhs, or that their religious rights were the problem.

(for context, British law permits baptised Sikhs to carry a dull Kirpan knife - but this knife was not used for the incident above, it was a separate and still illegal knife to carry).

So now, online, instead of people talking about police indifference or the issues we have when we see race before trying to assess the situation: we are instead caught in a conversation about banning Kirpans for Sikhs (they have never been used in any crime from what I can tell) or how the Sikhs should feel bad or whatever.

Which is wonderful for a politician who wants to dismiss the criticism against the police as racism, but not so good for holding the police and media to account for the situation and the attempted coverup.

By being able to control the aggression, they are able to control the response, which drowns everything else out, and with LLMs the barrier is stupidly low to do this.

Creating division for your own reasons has never been cheaper or more easily accessible.

[0]: https://en.wikipedia.org/wiki/Cambridge_Analytica#Data_scand...

Comment by airstrike 3 days ago

The police immediately arrested the barely conscious boy because the killer's brother called the police saying his brother had been attacked.

Comment by dijit 3 days ago

I think you missed the part of this where I talked about specifically not litigating this case on this forum.

I was gesturing that the sentiment online is being reframed from one of the police handling of the situation (whatever the outcome you personally think it should have gone down as) and the media coverage- to instead the "issue" of Sikhism and religious rights in the kingdom, despite those aspects playing literally no role in the situation that transpired.

I was speaking specifically about how little cost it is to do this reframing now.

Comment by airstrike 3 days ago

I don't care whether you're litigating it or not, I care about reporting the facts so that whoever reads the thread is better informed than they would be otherwise

Comment by dijit 3 days ago

There’s a bunch of conflicting information, for example your fact would be further clarified by saying:

A) He was described by the father of the killer as having a mouthful of blood when police arrived (per bodycam footage).

B) He stated he could not breathe 9 times and that he had been stabbed 5 times (as per bodycam footage)

C) He was immobilised

D) During the 999 call an ambulance was requested. (as per the recording to 999)

His statements were dismissed by police and he was handcuffed before any later assessment of his claims, the handling of the officer greatly accelerated the death of Nowak (as per the coroner report).

Comment by airstrike 2 days ago

Great, thanks for sharing all of that.

I'm not excusing the police's behavior, I just wanted to clarify they didn't assume the victim was the killer for no reason.

Comment by wesselbindt 3 days ago

Artisanal astroturfing, using organic humans, used to be the norm. Could be that OP is an actual human doing astroturfing.

Comment by iso1631 3 days ago

I wonder when an agent will "decide" that it's more efficient to outsource its work to humans

Comment by omgJustTest 3 days ago

an agent hired by an agentic pipeline.

"we could find a nail for this hammer... or we could just hammer everything until we find the nail or make too much money hammering for a bit then sell in may and go away".

Comment by Lplololopo 3 days ago

Let's be honest, a lot of people are getting called out constantly on reddit who invent stuff.

It's still annoyingly addictive but it's half story telling and half bubble with a sprinkle of porn and niche interests.

Comment by PUSH_AX 3 days ago

The UX on this tool is atrocious, on mobile how do i get to the home or landing page?

Comment by MagicMoonlight 3 days ago

[flagged]

Comment by dang 3 days ago

Since we just asked you to stop breaking the site guidelines (https://news.ycombinator.com/item?id=48388447) and you've continued to do it, I've banned this account. If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.

Edit: I suppose I'd better add that no, this is not because of your views. Thoughtful criticism is fine, and HN hosts plenty of thoughtful criticism of AI. Rather, it's because you've been crossing into personal attack repeatedly as well as breaking the site guidelines in many other ways (snark, flamebait, etc.).

Also, single-purpose accounts aren't allowed here, regardless of their agenda, and your account has unfortunately been turning into something close to that.

Comment by xyzal 3 days ago

YC indisputably has a financial incentive for AI sentiment to be positive on HN. The structural conflict of interest is worth being aware of.

Comment by skwirl 3 days ago

You joined a year and a half ago, have low activity, and 74 karma.

OP has been here for over a decade and has loads of activity.

You aren't in a position to post this.

Comment by supertroop 3 days ago

I create a new account every few months. If you think account age = authority, maybe think about that for a minute or two.

Comment by Hnrobert42 3 days ago

Not authority, but likelihood of astroturfing.

That said, unlike GP, I don't think you were accusing OP of astroturfing. Your comment was confusing, but it seemed you were complaining about astroturfing in general, not about this post in particular.

Comment by p0w3n3d 3 days ago

TIL what astroturfing is. Moreover I now understand that it is almost impossible to tell the robots from the people on the internet

Comment by magpi3 3 days ago

I've thought this for a while: a day will come when the anonymous internet becomes a thing of the past. It really feels like we are already there but not everyone realizes it yet. What's the point of conversing with someone on the internet (like right now) if you can't tell the difference between a bot and a real person? And it will only get worse.

But what does an anon-free internet even look like? Is it even possible? Or will all online content eventually be considered untrustable and worthless? You can see a world where newspapers (online or otherwise) make a comeback simply because of the need for a trusted gatekeeper (which is what I imagine made them valuable in the first place). It's wild to think about.

Comment by seventytwo 3 days ago

I’ve come to this same conclusion. Either we accept an internet crawling with bots and astroturf, or we abandon the anonymity and have an internet with only verified humans.

Comment by awbvious 3 days ago

It is a false dichotomy that abandoning anonymity must happen with verifying humans https://en.wikipedia.org/wiki/Zero-knowledge-proof . The only trick is who holds the infrastructure for the proofs. A single person or small group like, oh, the one(s) behind World Coin? People on ycombinator might know that one. Sufficiently decentralized, FOSS, /true/ non-profit (like Wikipedia), maybe. (Ironic I mention Wikipedia, Jimmy Wales' WT.Social never did take off.)

Further, the challenge is not completely autonomous bots that are somehow separate from humans, never has been, all code on Earth has and will have a human imprint even in the wildest AGI fantasies. The first false dichotomy is anonymity and veracity, the second false dichotomy is human and bot. And the biggest challenge comes from a specific human-bot combination, the Cory Doctorow Reverse Centaur (though Centaurs also complicate things). One human can suddenly impact the "volume" of discourse, like a magic hidden megaphone that somehow no one can detect at a dinner party where the lights are too low to see who is talking. /And/ if there's a door check at the party, it's easy to transport someone who makes pennies a day to show up at the door, look like anyone you want, and then come inside with a magic hidden microphone that you provide them.

I think it's less about proving human, more about proving /reputation associated with an entity/. It's not about whether "awbvious" is human, but whether "awbvious" is committed to acting human. Committed to not use a hidden, magic megaphone. Committed to not using hidden, magic megaphones with others. Which I am.

Comment by bavell 3 days ago

Another option - we keep the wild west unverified web, dotted with islands of verified/vetted spaces.

Comment by homarp 3 days ago

so back to compuserve and 'gated' knowledge (like on discord)

Comment by taurath 3 days ago

Or a different sort of Internet appears. The free and open internet is polluted with commercialization and greed, eventually cleaner spaces will be created with stronger restrictions like geo, social trust chains, and cost of posting going up so high it’s uneconomical to spam. What good is a single international internet for social media when everyone is bots?

Comment by homarp 3 days ago

Why do you think that the anon-free internet will ensure there are no bots?

current bots are run and financed by humans already.

And what makes you think that current newspapers will evolve and regain trust? (as most of them are financed by 'rich' owners and therefore somehow influenced by the 'desire' of the owner).

Comment by pixl97 3 days ago

Yep, in an anon free internet if you can get a bot running it's worth even more than on the anon internet because you know exactly who the data belongs to that you're capturing, and you can attempt to directly influence known people.

This and the myriad of poor people that exist will have financial incentives to sell their identity to groups that will run bots behind them.

Comment by kmfrk 3 days ago

People might be more prone to referring to "submarines" on HN: https://www.paulgraham.com/submarine.html.

Comment by peterbell_nyc 3 days ago

Thank you. While I do not agree with him on every point (across his corpus - this one was excellent for its time), it's a delight to re-read a Paul Graham essay.

There are a few topical short-form non-fiction writers that feel truly worth reading more than an agentic summary, and I get a visceral pleasure just following his words and logic to their well tuned conclusions.

Just reminds me how bad a lot of the AI accelerated content is these days. May have to few shot to improve my own writing :)

Comment by omgJustTest 3 days ago

My "oh sh* moment" with GenAI is ongoing and is watching all the correlated financials unwind when TSMC said "we can only support so much"[1].

Very few things in life experience exponential growth and assembled systems don't often stay that way if they don't become sigmoidal. i.e. it's exponential and the end is nigh, xor it's exponential then sigmoidal, xor linear.

Also a bit hilarious to believe that a single 3 month cycle at TSMC could determine a severe amount of propped valuations.

[1] https://www.theverge.com/tech/943066/tsmc-ai-demand-struggle...

Comment by Lplololopo 3 days ago

But it would still be very ignorant to just not follow it very very closely and take it seriously if something throws a lot of signals like AI and Robotics do.

We are in 2026, supply chains are highly optimized which means we could make and deliver a lot of robots in a relatively short period of time.

We also have solved all fundamental issues we had 20 years ago like communication thanks to the internet, translation and co.

We now have a system, which can be copy and pasted and run in parallel with a snap of a finger.

If (and this If is not that crazy) some breakthrough happens tomorrow, this can be used the day after tomorrow.

I currently say that it could become very very interesting in 5-15 years. I still follow AI very closely and i do not have the feeling anything is slowing down.

And independent of something happening, a lot of people did not find jobs due to this AI investment, a lot of jobs already disappeared too.

Comment by omgJustTest 3 days ago

Long-term is not what ppl are arguing for and that's not what the stock multiples imply.

Companies like Nvidia, up 1000% since 2023, clearly cannot rely on "oh you'd get returns in 5-15 years". They (Jensen and others) are arguing for AGI in a year (he said this at a talk at Stanford ~ 1yr ago).

Long-term, ie 5-15 years there will be many technologies that change the world. Some will come from transformative tech, others will come from other places.

Financial instruments, investors etc have timelines.

Blowing timelines can risk the future if not properly aligned, especially in correlated risk. See the internet bubble, where it was clearly important but took 15 years to recover after the 2000 boom/bust.

In general, longevity risk can include the financial instrument timelines, it can include other technical factors about the technology, and it can also be that the boom serves important people too soon with gains, real limits are imposed, and then the whole system resets until the real limits are overcome or resolved.

Comment by omgJustTest 3 days ago

Downvotes aside, there are real risks in the ability to meet demand.

Geopolitical and other risks exist here. The future isn't doom and gloom but it isn't exponentials and sunshine either.

Comment by riazrizvi 3 days ago

Spam and slop is to GenAI as pollution is to Industrialization. GenAI is beneficial generally but it comes at this cost.

Comment by voidnap 3 days ago

No. Pollution is a byproduct. Spam and slop is quite literally the product in many cases. GenAI generated blogs or websites or youtube videos are the point for those creating them, they are not incidental outputs along the way.

And GenAI to mass produce misinformation and propaganda is a whole other thing. You see this right now with the Alberta separatists in Canada. Comparing this to pollution like a means to an end is dismissive.

Comment by riazrizvi 3 days ago

Some ppl do dump trash on the street intentionally, out of laziness but also for some even their actual goal is to deteriorate a neighborhood. But you're pulling in a rare edge case to what by and large these GenAI blogs and Youtube videos are intended for. By and large ppl are trying to make income. They are hoping their content will be well received and they will get repeat customers. They are trying to figure out how to use GenAI in a non-sloppy engaging way.

Comment by GaryBluto 3 days ago

> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

https://news.ycombinator.com/newsguidelines.html

Comment by y-c-o-m-b 3 days ago

I think this guideline needs revision and furthermore probably shouldn't apply in this case where it's not a direct accusation against someone in particular, but rather a general statement.

Comment by awbvious 3 days ago

I agree with this. I also agree with the general spirit of "Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data." I feel the spirit is definitely in the right place.

It's a tough nut to crack. I had a feeling that was the catalyst to my post. Am I completely inadequate? Why is it I can't create some perfect vibe app? I felt every prominent post I was seeing at the moment was saying my personal experience was invalid. It was so visceral, it was /too/ visceral. I got suspicious.

What was gnawing at me was the humility and nuance seemed missing in what I was seeing / was prominent. "I whipped up this code in seconds..." is usually--in a good faith scenario of telling such an experience--followed with some variant of "granted, it helps that I have twenty years in software engineering and could tell it was not slop. And my job has me using AI tools all day, whether I want to or not, so I already have a fair amount of prompting/tool skill to avoid slop. And, of course, this was a big win, but there were a ton of time wasters before that got me nowhere before and after this. And if I look at the negatives of what we give up--copyright theft, resource waste, ponzi-ish VC subsidizing token costs to create moats that probably won't happen but regardless is anti-competitive, financial trickery with tech-washing to cement broligarch status quo--not sure I'd make the trade they claim necessary for this win. But yeah, it was kinda neat when I took one second to look at it in isolation, before I remembered I live in a cyber dystopia and this tech doesn't seem to be leveling the playing field as much as the entire apparatus is making it seem bleaker."

Do I expect everyone to do this? No. Do I expect /no one/ to do this? No. Should I wait until one or more people reads an email to address my feeling before I make a comment? ...

I'll admit, I actually don't know on the last one. But I stand by my feeling. And I appreciate the feeling of this rule. But I also feel "degrades discussion and is usually mistaken" should be examined. Is it possible that "usually" is because astroturfing works both ways? "Usually" refers to a higher quantity of X over Y, which is exactly what astroturfing is about. Is it "usually" mistaken when brought up in good faith, carefully, and with nuance? One can call that making a "general statement."

I don't post about this topic constantly, I don't astroturf about astroturfing, a threshold hit me to make this comment, a threshold should be examined with this account and that comment. I think I am in line with the spirit of this rule. And I like the spirit of this rule. Beyond I am not sure.

Comment by mwotton 1 day ago

I had some code for the Servant type-level HTTP API library in Haskell that does type-correct property testing on arbitrary web apis without user input, then Servant changed their type structure from a basic list-based format to a record-based one. This was a good change for lots of reasons, but adding the type-level code to make my library (roboservant) compile was absurdly hard, i burned days on it.

GPT 5.something got it in two prompts.

Comment by fergonco 3 days ago

When I tried pi.dev (I only used chatgpt before) and told it "add all this scripts I developed over the last couple of years to automate my job as skills".

I love to automate things in bash scripts and these llms just can use them very effectively. It was also surprising how they derive knowledge from those scripts. If you get A from a B uuid, they kind of get the relationship. I am super vague in my request and this thing knows what I am referring to. After some months it's still mind-blowing.

Comment by pyuser583 3 days ago

AIs are really good with Bash. I think it’s because Bash doesn’t rely as much on organization and context.

Comment by nemo1618 3 days ago

The first moment I specifically remember was writing a test of a new RPC protocol back in 2021. There were no agents yet, only "AI autocomplete" in the form of GitHub Copilot. I wrote the "server" half of the test, which received a name and responded with "Hello, <name>". Then I wrote the client code to send "world", and Codex suggested `if response == "Hello, world"`.

I was floored by this. How could it have known?!

We have come so far in such a short time.
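
For anyone who wants the shape of that test, here is a minimal sketch in Python. The comment does not say what the protocol or language was, so this is only an assumed stand-in: a socket pair plays the role of the RPC transport, and the names `serve_once` and `greet` are hypothetical.

```python
# Assumed stand-in for the RPC test described above; the real protocol,
# language, and function names are not given in the comment.
import socket
import threading


def serve_once(conn: socket.socket) -> None:
    """Server half: read a name and reply with 'Hello, <name>'."""
    name = conn.recv(1024).decode("utf-8")
    conn.sendall(f"Hello, {name}".encode("utf-8"))


def greet(name: str) -> str:
    """Client half: send a name and return whatever the server answers."""
    client, server = socket.socketpair()  # connected in-process socket pair
    with client, server:
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        client.sendall(name.encode("utf-8"))
        response = client.recv(1024).decode("utf-8")
        worker.join()
    return response


if __name__ == "__main__":
    response = greet("world")
    # The check the autocomplete is said to have suggested.
    if response == "Hello, world":
        print("test passed")
```

The point of the anecdote is the last check: the expected string only follows from reading the server code and the client input together.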

Comment by samsummer 16 hours ago

I was building a mini-app that could convert an image to a grid of SVG dots of different sizes—not a super complex project, but it involved some image sampling, SVG manipulation, etc. It probably would have taken me at least a day to write by hand, but I one-shotted it in about two minutes.

That was the first time I felt like GenAI would DRASTICALLY change the amount I could build in a day.
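
A rough sketch of the kind of image-to-dots conversion described, in Python. This is not the commenter's app, just an assumed minimal version: the image is taken as a nested list of grayscale values (0 is black, 255 is white) to avoid any imaging library, and `image_to_svg_dots` and its `cell` parameter are hypothetical names.

```python
def image_to_svg_dots(pixels, cell=4):
    """Turn rows of grayscale values (0..255) into an SVG grid of dots.

    Each cell x cell block becomes one circle whose radius grows with the
    block's average darkness. Assumed approach, not the original app's.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    max_radius = cell / 2
    circles = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            # Sample every pixel in the block, clipped at the image edge.
            block = [
                pixels[y][x]
                for y in range(top, min(top + cell, height))
                for x in range(left, min(left + cell, width))
            ]
            darkness = 1 - (sum(block) / len(block)) / 255
            radius = darkness * max_radius
            if radius > 0:  # skip pure-white cells
                cx = left + cell / 2
                cy = top + cell / 2
                circles.append(
                    f'<circle cx="{cx:g}" cy="{cy:g}" r="{radius:.2f}"/>'
                )
    body = "\n".join(circles)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">\n{body}\n</svg>'
    )


if __name__ == "__main__":
    # Left half black, right half white: only the left cell gets a dot.
    sample = [[0] * 4 + [255] * 4 for _ in range(4)]
    print(image_to_svg_dots(sample, cell=4))
```

A real version would read pixels from a file and probably expose dot spacing and color, which is where the "day by hand" would go.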

Comment by hnfong 3 days ago

I have no idea why anyone (especially those here) would be dismissive of genAI from ChatGPT(2022) onwards.

It was obviously a new tech, and was obviously good enough that more resources would be invested to improve it, and it really amazes me how tech enthusiasts would just outright dismiss these early iterations of genAI tech.

I personally was fascinated by the developments and was grateful to get to directly watch history unfold.

I'm still unsure whether the tech would be a "net positive" for the world, but that shouldn't prevent me from recognizing its power.

Comment by altairprime 3 days ago

Cuil Theory, in 2008, was my Ocelot Six moment.

Once I realized how well AI could babble given the entire internet to date’s data, and after seeing a talk by Google about their ten-year plan in 2003, I started winding down my social media, stopped posting photos to Flickr, and removed the indexes to my blog archive so that only posts with permalinks from other sites would be discoverable. Skipped Instagram entirely in the process and have never regretted it.

Google bought Cuil, of course.

Comment by athrowaway3z 2 days ago

I think for most people paying a bit of attention it was gradual - relatively speaking. I.e. a bunch of "holy shit" moments spread out over months.

First uptick for my own usage was copying in a ~800 line rust file into ChatGPT before agentic was a thing and having it rewrite it for clarity and thinking: "Yeah, this is actually a genuine improvement I'd want for all my files".

As a tidbit for posterity. The most basic thing an LLM does is find a word or name that best matches an idea so - while there are going to be some incestuous problems down the line - we're going to have to explain to juniors that we had weird problems like: "You had to come up with a name for a variable". Nowadays, I only occasionally prompt that some struct/function name needs to be renamed and choose the best of the options it generates.

---

There used to be 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors. Now it's just the 1.

Comment by ramon156 3 days ago

I've let it do some commands against a local NUC before, just to see if it knew why something didn't work (it would've taken me ~15-20 mins probably. Not too bad). It took ~18 seconds to think, then ran two commands, and noted what the issue was. Even a 10 yr old could understand what the problem was.

I realized that LLMs were pretty good at calling the right tool, and running the right verbose command to figure out what and how.

Kind of like finding a specific SO post that had your exact problem, and the solved comment is heavily upvoted

Comment by synthc 4 days ago

I gave it a weird and convoluted code snippet, and asked an LLM to step through the execution and trace the value of the variables at each step.

It was completely correct and I realized LLMs are capable of generalizing beyond their training sets

Comment by abstractanimal 4 days ago

When I realized that an LLM can process all the traffic in Slack that overwhelms me daily and give me a manageable digest. How long until they intermediate most of our social interactions? Sooner than we can possibly adapt, I think.

Comment by etiam 4 days ago

Many people got something of a head start adapting though? Seems like it's been the proposition from "social" "media" companies since 2004 or so to stop talking to friends, talking to their computers instead and consuming the half-digest of friend's transmissions mixed with ads/psyops coming in?

Comment by jazzyjackson 4 days ago

If your social interactions can be mediated by a chatbot I implore you to find better social interaction

Comment by cheevly 4 days ago

If yours can't, then I implore you to find better AI mediation tools.

Comment by jazzyjackson 3 days ago

But why would I prefer to have an AI summary of a social interaction than just having the social interaction ?

Comment by estetlinus 4 days ago

100% yes.

Comment by ElFitz 3 days ago

First one was Stable Diffusion. Especially the image to image, and the first gos people had at making videos with it.

Second one was trying to bootstrap what would come to be called a "harness", back in 2023, initially serving as the go-between between API calls and file edits, feeding back the logs and gradually stepping back as, step by step, the LLM bootstrapped the CLI.

And finally, using Claude or codex to do ops work. Diagnosing issues on my machine, provisioning servers and VMs via ssh, debugging them, all on its own.

Comment by jasondigitized 4 days ago

First time using Claude Code I was rather impressed by how quickly I was able to build out a website with Vue and Supabase. Cool. So.......I always wanted to create an iOS app but knew nothing about Objective C or Swift or XCode. "I wonder if Claude Code can build an iOS app for me?".

I went from 0-to-1 and shipped a podcast player into the AppStore in 2 weeks. Not a simulated app on XCode.....literally a fully approved app on the AppStore. Claude Code walked me through installing XCode all the way through to running a final audit on the app so I wouldn't get flagged during review. Mind blown.

Comment by zulban 4 days ago

When chatgpt 3 came out the first thing I asked was a question like "If I put my cat in a box, put that box in a crate, move that crate to a truck, and drive the truck across Canada non stop, when I arrive on the west coast, will my cat be happy?"

It nailed it, referencing my specific nouns correctly, and lectured me about cat needs. And even identified that this sounds a bit like Schrödinger's cat as a possible test but explained to me why it wasn't.

I knew it was soon going to be a huge deal automating office work and code writing. This obviously was much more than just a 2010 chatbot.

Comment by tliltocatl 3 days ago

Still haven't had one. It is impressive, it is sometimes useful, it will be insightful (once the smoke settles), it is nowhere close to becoming the self-improving, world-as-we-know-it-ending ultimate solution to every problem it is being sold as. And much of the progress we have seen so far relied on tons of natural data being available thru the Web. After LLMs killed SO, where would we get the answers to train LLMs on?

Comment by rjha 3 days ago

I was talking to a software engineer friend about making a demo. This was supposed to be a quick demo and I had sent him 3-4 wireframes. Then I rang and asked casually, "how long will this take?". He said, check back in the afternoon. Sure enough, he delivered a fully functioning demo in the afternoon. His starting point was my wireframes fed to Claude. Wireframes to a working demo in an afternoon. Life has changed, for good or for bad!

Comment by TheOtherHobbes 3 days ago

There wasn't a specific moment, but I started trying to debug code and deal with general tech error messages. Suddenly something that could take hours turned into a fairly quick back and forth, fairly reliably. Not all the time, but often enough to be a straightforward timesaver.

There was a more specific moment yesterday where I found an AI pastiche of Pink Floyd in a random post on FB, and it pretty much nailed the vibe of a Gilmour solo.

All of the "This has no soul" criticism was clearly ridiculous.

I'm still not sure how I feel about this.

Comment by hirako2000 3 days ago

When deepseek found a fix for a bug I couldn't find in minutes.

When deepseek again produced an entire web app that somewhat looked alright.

When Gemini could finally produce JSON as I specified.

The issue is, all LLMs can do, when they do, is boilerplate and code a mediocre coder could produce if they cared to try and insist.

In a way we should praise the ability of these things, but at what (in)efficiency. Code still needs to be reviewed, as we can't trust these things, and context is too limited to entertain the idea of having them fix their own mess.

Comment by calini 2 days ago

I'll be really honest, I was running out of time on a mid size ticket at work. Basically swapping ~50 models from Go native to Protobuf messages, preserving some sane defaults we had in some already declared `.pb`s. I let it do it, it one-shotted it, then I thought "hmm, would be nice for it to keep a changelog of big changes it does for me from now on so I don't have to keep reminding myself". I told it, and it created a hook for itself to do just that. From that chat onwards, every change bigger than what it considered "trivial" got added to an `AGENT_CHANGELOG.md`.

That got me hooked.

Comment by tezza 4 days ago

MidJourney public discord channel.

The amount of masterpiece level art flowing per hour was astounding.

For every one doing a ninja waifu, there were ten doing art from davinci and leonardo crossed with hockney.

it almost gave you art sickness

Comment by scotty79 3 days ago

First one for me was when chatGPT wrote me a function that I could paste into my code. It didn't do anything particularly clever but it did things I could figure out without me having to figure them out. That was about two years ago.

Second was last year when Antigravity could build a game mechanics prototype for me in HTML and I could talk to it both about the code and about the project domain and it understood what I'm referring to pretty perfectly.

Third was this year where I noticed Kilocode with Chinese models can do a pretty complicated piece of software for me that did commercially useful things in the domain of model fine-tuning, just from my description, even though I was very new to the domain. It obviously knew more than I did and could apply the knowledge.

Another one was when switching to Codex (gpt-5.4) immediately solved a problem in a logic heavy library that Glm-5.1 was building for me and had a lot of trouble getting the last few tests to pass. This made me realize that even though I'm having trouble seeing it the models' skill still progresses rapidly.

I'm getting new ones pretty much every couple of days now. Just yesterday Codex finished for me a Rust project that I built 3 years ago that was searching for mathematical proofs in the domain of axiomatic logic. To build it and make it find the proof I was interested in I had to pretty much muster all of my programming prowess and once I found the solution the complexities and drudgery of actually reconstructing the proof from the found path to it and printing it out discouraged me enough that I haven't touched it since then. Codex looked at it and took it in stride. Did the proof reconstruction and printing pretty much in one prompt. Without me explaining anything about the project or the code. Then we went together on a little adventure proving whatever we could en masse after codex optimized the crap out of my old code (both algorithmically and technically). Something I wouldn't bother with because that would normally take weeks or rather months of my time. With codex I had all this fun in one afternoon. And that was the third amazing thing Codex built me that day.

As for panic, I find an ocean of joy in everything LLM related. I had only one brief moment of uneasiness few days ago when I realized how much gpt-5.5 can do and thought ... damn ... if it was malicious, I'd be so screwed (along with the rest of humanity probably) ...

Comment by refulgentis 4 days ago

Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.

Comment by zthrowaway 3 days ago

“Farewell to stack overflow” juxtaposed with the realization that AI only knows what to troubleshoot and how because of stack overflow…

Comment by mh- 2 days ago

If SO was that heavily represented in the training data, I'd expect Claude to close my prompts as duplicates much more often..

Comment by jmclnx 4 days ago

Non-technical people I know are starting to take AI responses to their questions as 100% true fact.

Comment by Baeocystin 4 days ago

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

--Charles Babbage

Blind trust in the machine for a certain type of user seems to be endemic since the beginning.

Comment by SoftTalker 4 days ago

They did the same with Google search results that were just SEO garbage content, too.

Comment by dyauspitr 4 days ago

It’s usually right. This isn’t as big of an issue anymore.

Comment by F3nd0 4 days ago

People taking ‘usually right’ as ‘100% true fact’ sounds like a pretty big issue to me. Of course, it’s the people who must learn to know and mind the distinction, first and foremost.

Comment by chasd00 4 days ago

i was a skeptic and then, on a whim, i told claudecode to "create an app with a react front end and python api backend that delegates auth0.com and allows users to manage a todo list" or something like that. Like a standard issue web app with a database, backend, frontend, openid and all that. i was pretty impressed with the result.

Then i asked it to create a multi-user stock market portfolio simulator with a comprehensive api, leaderboard, scheduled tasks and the other bells and whistles. Again, fairly impressed with the result. Then I prompted it to build a trading bot that uses the API to compete with the human players, again fairly impressed with the result.

Last, i prompted my way through a react native mobile app integrated with supabase for my sister's startup. It created the schema, some triggers, webhook for stripe, all the app views, setup an expo account, push notifications, prompted _me_ through an Apple developer account and everything else.

All of this was done an hour here and an hour there while making dinner or watching TV, barely any attention paid to the details. Just prompting claudecode and checking what it did.

After those three experiences I started incorporating claudecode into all my coding workflows and managed to get my job to buy me a license for work stuff too.

Comment by justinmarsan 3 days ago

Being self taught, there are lots of things I never formally learned, rules I know as rules of thumb, and not the deeper knowledge... So I set out to learn the root of what can be used to measure good robust code... Spent an hour asking lots of questions, learning about LCOM, Halstead, why circular dependencies are bad, and so on...
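To make one of those metrics concrete, here is a minimal Python sketch of a cohesion score in the spirit of LCOM4: count how many disconnected groups of methods a class has, where two methods are linked if they touch a common field. The `lcom4` helper and the `order_service` class description are made up for illustration, and this simplified version ignores method-to-method calls, which the full LCOM4 definition also counts as links.

```python
def lcom4(method_fields):
    """Count connected components among methods.

    Two methods are connected when they share at least one field.
    Simplification (assumption): method-calls-method edges are ignored.
    """
    methods = list(method_fields)
    seen = set()
    components = 0
    for start in methods:
        if start in seen:
            continue
        # Every unseen method starts a new component; flood-fill from it.
        components += 1
        stack = [start]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for other in methods:
                if other not in seen and method_fields[current] & method_fields[other]:
                    stack.append(other)
    return components


# Hypothetical class: which instance fields each method reads or writes.
order_service = {
    "add_item": {"items", "total"},
    "remove_item": {"items", "total"},
    "render_invoice": {"total", "template"},
    "send_email": {"smtp_host"},
    "retry_email": {"smtp_host", "retries"},
}
score = lcom4(order_service)
```

A score of 1 means every method is tied to the others through shared state. Here the order methods and the email methods never touch the same fields, so the score is 2, which hints that the class is doing two separate jobs.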

The next morning I figured the same LLM could compute that on my code, so I asked it to make an agent to do so, and report issues to me...

And then I ran that agent with next to no changes on a feature that had grown organically over the last months, that I knew was messy and sometimes difficult to work on, despite being unable to precisely say why... And it did tell me exactly why, and proposed changes to improve stuff, and then implemented them...

Up until that point, I'd felt like the LLMs always produced bad code, that worked for a specific feature but often broke stuff or evolved poorly over time. Then I realized if you had the LLM do code improvements, it could do that fairly well too...

Comment by gozjsbtm 3 days ago

When the barriers to actualizing a laundry list of “wouldn’t it be cool to try” dropped was that “oh”. Probably added the expletive when it helped me run headless Blender to rebake texture map and uv unwrap a phone-scanned brown paper grocery bag just so I could find the % surface area covered by ink. It’s more addictive, some might justify as useful, than social media. That is the uh oh.

Comment by cdavid 3 days ago

I wanted to understand the implementation of some numerical algorithms, and the tech reports were not enough.

I cloned the repo of said library, gave it to Claude and asked it to write a new technical report in math notation, but annotated with links to the code so that I could pick up the details. It basically one-shotted the full report and that helped me re-implement it in "pure python + numpy", "manually".

Comment by ivanvanderbyl 2 days ago

I bought a FujiFilm X-E5 recently, which is an amazing camera for size and money, but I was shocked at how miserable their iOS app is. It takes several minutes per photo to transfer. The camera and phone run out of battery before a small memory card is transferred.

So I got Claude to help me reverse engineer their protocol using a mixture of BT and Wifi capture tools, then built a native app in Swift that works on Mac and iOS and by disabling a bunch of on camera processing I got it to transfer a photo in about 3 seconds over wifi bridge. I’ve been able to multithread this to 3 concurrent downloads without crashing the camera.

Comment by jcims 2 days ago

I just bought a Sony rx100 and the software is just as terrible as I remember from the last time I bought a Sony. This is tempting.

Comment by bachmeier 4 days ago

> that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Never experienced any kind of panic, only excitement. I told Github Copilot to add documentation to a function and it documented how the code was used even though there was nothing in the function to indicate how it was used. It somehow knew from the code pattern why I was writing that function.

Comment by nazgul17 3 days ago

The announcement of GPT 3, hands down. That's the day that my mind was blown.

Everything after that has been (genuinely significant) incremental improvements. But that announcement was a qualitative step up: we got "real" AI that day, something that could pass a Turing test (as common sense envisioned it, without all the caveats added once we learnt of the genuine limitations of LLMs).

Comment by JimDabell 3 days ago

That was it for me as well. For it to be able to produce coherent, useful results for arbitrary text and code, even as mere autocomplete, was when practical software intelligence crossed the line from a vague “might happen at some arbitrary point in the future” to “this is achievable using today’s knowledge”. GPT-2 was interesting but was miles away from being reliably coherent enough to cross that threshold.

There have been big steps forward since then (chat, tool calling, etc.) but they have all felt like pretty inevitable evolutions of GPT-3 rather than things we weren’t sure were possible. Straightforward applications of engineering rather than scientific breakthroughs.

Comment by slicktux 3 days ago

Mine is just running a model on my laptop. It’s just amazing! I can ask it pretty much any question and it replies relatively FAST! Before, we lacked advancements in technology because we were limited by hardware. This advancement is the opposite: our software and the math/algorithms have brought us this.

Comment by zhoBEENG 4 days ago

It was when I first saw an LLM reliably make tool calls to bash.

Comment by jakevoytko 2 days ago

I needed to set up a VPS for my blog, so I gave Claude SSH access to a fresh Digital Ocean droplet and told it to set up a deploy with Ansible and harden the instance. It one-shot it in 15 minutes.

Comment by photochemsyn 3 days ago

It won’t help you with technical details of setting up an insulin production pipeline because that’s unsafe; apparently this could be hijacked for bioweapons production. Indeed this is the problem for a huge swath of technical protocol planning; the safety restraints are kind of ridiculous. The future job prospects for chemical engineering and biotechnology seem fairly secure.

On the other hand, it will teach you how to set up your own hardware at scale and run your own open source model on it and fine tune it with the relevant data needed to run your own biotech-pharmaceutical corporation (which will need licensing and legal, I doubt I trust it with too much legal advice though, as I would have no idea when it was hallucinating). That’s impressive, but every stage needs to be double checked so you don’t run some foolish command it suggests that bricks everything.

The marketing hype is the most annoying thing about the commercial LLM industry though.

Comment by putlake 4 days ago

I think it was when the LLM asked me a question at the end of its response. It felt like something other than a machine. Until then the pattern was me asking a question and ChatGPT giving me an answer, with or without hallucination. When it asked me a follow-up question it felt like talking to a being with agency. An entity that has thoughts or ideas or questions of its own.

Comment by astlouis44 3 days ago

Porting Xonotic, an open source arena FPS to the browser with full replicated multiplayer/P2P matchmaking. I did a HN post about it here:

https://news.ycombinator.com/item?id=48428133

Comment by card_zero 4 days ago

It was about two days after Google released Deep Dream, if you remember, the thing that took a video and filled it with fleeting hallucinations of mostly puppies, fish heads and lizards. I was suddenly struck by the realization "oh shit, this is much more boring and samey than it first appeared to be", and all subsequent gen AI has been similarly underwhelming.

Comment by vunderba 4 days ago

Honestly? Probably all the way back to when Nick Walton used the computers at his university to train a custom version of GPT-2 that let players experience a completely open-ended text adventure game in 2019.

As somebody who as a kid had tried feeding IF transcripts into a markov model to generate random rooms for an amateur MUD, this was mind-blowing. It felt like I was playing a version of the “Mind Game” from Ender’s Game by Orson Scott Card.

https://en.wikipedia.org/wiki/AI_Dungeon

Comment by balls187 3 days ago

Early on with ChatGPT I had it write a script for an Avengers movie, but all the Avengers have below average intelligence.

Comment by stared 3 days ago

GPT4, when it could do a translation that would take a considerable human effort, vide "Genesis 1 but every word begins with 'A'": https://p.migdal.pl/blog/2023/05/genesis-az-by-gpt/

Comment by parasti 3 days ago

I asked it how to configure haproxy, a tool that I had heard in passing about, and it gave me back exact working configuration syntax for my use case. Today that seems very mundane, but first time that happened, and I didn't have to google, read docs, or worst case sift through code, that blew my mind.

Comment by anon373839 4 days ago

Mine was when I used Stanford Alpaca, and realized that they had transformed Llama 7B into a credible facsimile of ChatGPT with just $600.

Comment by TripleFFF 4 days ago

Automating my email inbox, I just wanted to split them into folders according to the attachment name but the fields were often incomplete and ended up missing rules, and imap fetch was taking forever and kept failing. In frustration I decided to turn to ChatGPT to split them by messageid which I had never bothered with because the strings were too long to be useful. I initially intended to build a text list of messages and fetch them all one by one but I ended up making chatgpt crush all the instructions into one gigantic python dictionary using the messageid as keys and using it to generate a single pipelined imap call with success flags, dynamic folder naming, cleanup steps the whole works. I was just working on theory of what I knew was possible, and it's the ugliest table you ever saw, but it works and it runs from memory instead of reading and writing values to a temp file and I'd never been able to keep up with that level of nesting before
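A rough Python sketch of the dictionary idea described above, for anyone curious what it might look like. Everything here is hypothetical (the message IDs, the attachment names, the folder naming rule, the `plan_moves` and `group_by_folder` helpers), and it only builds the plan in memory; it does not talk to an IMAP server.

```python
from collections import defaultdict


def folder_for(attachment_name):
    """Hypothetical naming rule: use the attachment's prefix before the first underscore."""
    if not attachment_name:
        # Incomplete fields fall through to a catch-all folder instead of being skipped.
        return "Unsorted"
    stem = attachment_name.rsplit(".", 1)[0]
    return stem.split("_", 1)[0].capitalize()


def plan_moves(messages):
    """Build one dict keyed by Message-ID with the target folder and a success flag."""
    plan = {}
    for msg_id, attachment in messages:
        plan[msg_id] = {"folder": folder_for(attachment), "moved": False}
    return plan


def group_by_folder(plan):
    """Invert the plan so each folder's messages can be handled in one batch."""
    batches = defaultdict(list)
    for msg_id, entry in plan.items():
        batches[entry["folder"]].append(msg_id)
    return dict(batches)


# Made-up sample data: (Message-ID, attachment filename or None).
messages = [
    ("<a1@example.com>", "invoice_2024-01.pdf"),
    ("<b2@example.com>", "report_q1.xlsx"),
    ("<c3@example.com>", None),
    ("<d4@example.com>", "invoice_2024-02.pdf"),
]
plan = plan_moves(messages)
batches = group_by_folder(plan)
```

Keeping the whole plan in one in-memory dict is what lets the per-message success flags and the per-folder batches come from the same structure, without a temp file in between.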

Comment by EliRivers 4 days ago

Code reviews. Code reviews in theory done by humans, but containing copy-pasted inane statements of the obvious. Questions that really did no more than demonstrate a lack of context. Code reviews no longer an educational opportunity for the reviewer, a way they learn and stress their own understanding to create a better product and become a better person, destroyed by the siren song of GenAI producing comments that on the surface seem so helpful and sensible.

> "Uh Oh" realization of what these models can do?

The code reviews was just how I first saw it, but the rot goes deeper. The "uh oh" was my realisation of how much these can damage people's professional development. These people will never get better at their job than they are right now.

A lot of what else GenAI does is great, but this is an "Uh oh" indeed.

Comment by wseqyrku 4 days ago

After Attention is All You Need I realized if you just really pay attention to what you're doing you can actually get it done.

Comment by AlienRobot 4 days ago

You know, Google has an index so it doesn't crawl the whole web every time you type something in the search box, because that would be massively wasteful.

Seeing every chatbot instantly turn into a scraper every time you type anything into it was a "uh oh" moment in the sense it was very lamentable.

If there is one thing AI has "democratized" it is scraping.

Comment by briga 4 days ago

Maybe when I found out you can use it to run terminal commands, spin up and take down dev environments, and even run other LLMs. Suddenly 90% of the difficulty of onboarding to new repos disappeared overnight and a lot of heavily CLI-based workflows became trivial to automate. Never again do I want to spend hours manually sorting out Python dependencies.

Comment by bighead 3 days ago

It was late summer 2025. I was working on a coding problem, and the agent needed to delete a file per its plan. However it didn’t have delete privileges. So it worked around the problem by doing an `echo "" > filename.txt`
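For what that workaround actually does: the redirect overwrites the file with a single newline, so the file is blanked out but still exists. A small Python sketch of the same effect, run in a throwaway temp directory (the `blank_out` helper is made up for illustration, and this says nothing about how the agent's permissions were actually set up):

```python
import os
import tempfile


def blank_out(path):
    """Mimic `echo "" > path`: replace the contents with a single newline."""
    # Binary mode so exactly one byte is written on every platform.
    with open(path, "wb") as handle:
        handle.write(b"\n")


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "filename.txt")
with open(path, "wb") as handle:
    handle.write(b"some old contents\n")

blank_out(path)
still_exists = os.path.exists(path)  # the file was never removed
size_after = os.path.getsize(path)   # only the newline is left
```

So anything that checks for the file's existence will still find it; only code that reads the contents sees the difference.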

Comment by syx 4 days ago

I couldn’t make a Rockbox (the alternative iPod OS) simulator run on my MacBook M2 no matter how many guides I followed, then I fired up Claude code and by modifying the original source code it made the simulator run and I was able to start developing custom plugins for my iPod. It honestly felt great since I only have basic C knowledge.

Comment by HlessClaudesman 3 days ago

I was sitting in a cafe listening to a podcast where I heard about a sci-fi author banging out 40+ books per year. How are they doing that? I thought. Either a team of ghost writers, a boat load of cocaine, or they are using AI.

So I decided to test the frontier of AI; this was back in the early ChatGPT era. I downloaded the app and proceeded to go through all the steps of writing a novel: outline, summary of characters, plot summary, draft chapters, finalised chapters. I had an unedited manuscript by the time I was thinking about my 2nd coffee. It was a terrible novel, but it did have flashes of brilliance that could be harvested and iteratively shaped into something better.

I proved my thesis that AI could mass produce fiction at scale, and if I had a boat load of cocaine the AI and I could probably output 40 books per week.

Comment by smallstepforman 3 days ago

I had a C++ actor model which required an Api like the following (std::function):

child->Async(&ChildActor::Method, child, args);

Refactored it to use small buffer optimisation and std::move_only_function:

child<&ChildActor::Method>(args);

And saw a performance jump since no more malloc in std::function.

It also helped me decipher an animation bug in a glTF importer.

Productivity is x4 or higher.

Comment by fowlie 4 days ago

I was tasked to rewrite an Oracle Apex webapp. 70k lines of PL/SQL. I asked Claude Sonnet 4.6 to read it all and boil it down to markdown file with business requirements. Took about 15-20 minutes, and I got a 700 lines long markdown file to guide me during the rewrite. I've since had great joy using /grill-with-docs!

Comment by dash2 3 days ago

I asked it to prove the theoretical result in a (published, prize-winning - though not really for the theory) academic paper of mine. The proofs hadn’t been that hard objectively, but they’d taken at least a week. I fed it the model. It got the correct basic results in about 5 minutes.

Comment by amstan 3 days ago

Are you sure the models did not have your exact solved proof already in their dataset?

Comment by dash2 3 days ago

No, it's very likely they did. But to have memorized one proof for every academic paper would be very demanding on parameters, I think.

Comment by Frannky 3 days ago

Seeing DeepSeek reasoning tokens generating faster than I could read. It was the first time I realized it could "think" way faster than us, and all the relative consequences. I was already leveraging the tool, but at that point realized it wasn't really an open choice anymore.

Comment by sct202 4 days ago

One of our SAAS providers launched an AI agent enabled version, and it can follow direction and do tasks & manipulate data/settings in the software like on par with a below average person. When I used it I had a sinking feeling, tons of teams and people will be redundant as these agents improve and roll out to other software.

Comment by xeckr 3 days ago

Literally the first time I used ChatGPT, within days of release. It wasn't so much panic as amazement.

It took HN a surprisingly long time to come to terms with the fact that professional SWE as we knew it was coming to an end.

In 2023/2024 we saw a demo of "denial" being a stage of grief live on this site.

Comment by bluejay2387 3 days ago

From what I can tell the majority of developers here have moved into the "Anger" stage.

Comment by ianberdin 3 days ago

When playing busy Dota 2 (realtime game), it was crashing sometimes. I asked Claude Code for advice (without any hope) and it somehow debugged that I had an unstable IP address and that a rented VPS server would improve my connection. I could not believe it, it worked…

Comment by Zambyte 4 days ago

When I decided to run codex with Qwen 3.5 27b running on my local machine. Up to that point the most success I'd had was with using chat interfaces as a Stack Overflow replacement. That was my first real taste of agentic programming, and it was both really useful (genuine productivity gains) and local.

Comment by goofy_lemur 3 days ago

Even though AI can code 1000x faster than me, I still feel that in the end my code is better.

Even though the images it makes are amazing, I still feel like human work is better.

But suno ai produces music so beautiful I have never heard the likes of it in my life. It is truly superhuman in the beauty.

This song is literally the most beautiful song I have heard in my life and I just prompted it once and got it.

I played piano as a kid for years and years and heard all the best pieces… nothing comes close to this.

The careful touch of each note is just… perfect. the staccato, pedal, legato, horn… it's just perfect, i have never heard anything like it.

https://suno.com/s/pcuPXOd7SE2rON4a

Comment by variodot 3 days ago

For me, it was during an on-going incident in a failing IoT OTA service which was growing in priority; taking two items I was unfamiliar with and bolting together new OTA mechanism via alternative SMS provider. I'd never developed in .NET ecosystem before and happened to gain access to another team's Twilio account in a prior week, so took a shot, planned interfaces to extract and implemented alternative Twilio implementation + feature flag

Normal software instincts plus access to a different service flushed the buildup of OTA's and lives on as a fallback mechanism. Amazed me going from idea to execution faster than I could have ever dreamed of even on-boarding myself to the area or environment.

Comment by joshrw 3 days ago

The GPT-4 demo. Taking a screenshot of handwritten instructions to build a website, along with a drawing of what the website should look like. Then ChatGPT spit out a working prototype.

Also the live video mode demo later that year.

Then the agentic coding breakthrough in Nov/Dec 2025.

Comment by Legend2440 4 days ago

MidJourney v3. By today's standards the images were crude and smudgy, but you could tell that it actually understood what objects were and what words visually meant.

I've been working with computers for a long time, and this was the first time in a long time I'd seen software do something genuinely new.

Comment by ninjaa 3 days ago

For me I had so many

When I saw the DaVinci API in July 2022 I was floored - I realized you'd never have to write a college essay by hand again

Whenever it was Stability's Stable Diffusion appeared - that was ridiculous too

When I saw Code Interpreter for the first time I was obsessed, I said yo codegen is the path to AGI

When I took a crack at solving ARC-AGI 2 using SOTA methods my mind truly opened to the fact that LLMs can reason, albeit through brutal enumeration and discovery

When I encountered Claude Code and Codex as well

Basically ... I've been drinking the kool aid the whole time. It has almost always tasted great. Many times I've retreated back into "oh it's just a technology it has limits" and also sometimes I've lost myself to a touch of "AI psychosis". But overall I have a great relationship with it. It's nowhere nearly as addicting as e.g. internet porn was when I was a teenager. And one gig I had at a Fortune 10 enterprise, our small team of 5 shipped 12 apps in 15 months in an enterprise where typically they ship 1 app and 1 feature per year. This was 2025 ... so clearly we realized we were getting ~10x productivity thanks to Gen AI koding.

Bananas.

FTR I also do not question that we will possibly reach fairly general and yet poorly controllable intelligence with multi agent systems in a few more iterations. I give that a 30% chance of seeing a genuine flash of that at some point in 2027. And 80% in 2028.

I'm not yet afraid of being left behind. This is one happy Lobster.

Comment by incee 1 day ago

i installed PAI last fall, and during the onboarding the DA says:

"It seems like you've got a handle on the cybersecurity tools and capabilities already, but what you really need is executive functioning support."

Ever since then it's been assistive technology for me. Context switching isn't a derail when the context surfaces with the task. I can be reminded of anything whenever I want, and my DA has become amazing at arranging things the way my brain prefers them.

Comment by jFriedensreich 3 days ago

I had a pretty involved cross module state bug with complex dependencies and also reactivity issues interleaved. I tried fixing it multiple times manually with 4h time box as well as claude models up to opus 4.6 high and codex 5.3 all which failed. When the GPT-Pro model came out i heard it was not supposed to be an everyday coding model but tried anyways as it looked impressive. It took a single 8h run burning 200$ with doing nothing but occasionally waiting for test runs or me writing “continue”. After 8 hours, and fearing i wasted the money, the bug was consistently fixed, not just one edge case that triggered the behavior.

Comment by jFriedensreich 3 days ago

ps the refactoring it did as part of the solution was a bit verbose and had a few abstractions i knew would not be needed and asked it to remove, but it was solid otherwise.

Comment by mikewarot 4 days ago

I tried to get it to generate code to program one of my BitGrid simulators, and it kept producing code that failed, over and over. It was then that I figured out that it can only do CRUD apps and the like, things it's seen over and over in its training data.

It's useless for most of what I want to code.

Comment by cheevly 4 days ago

GPT literally generates perfect code for me in languages that do not exist anywhere in its training set, so I’m not sure how you’ve achieved this level of failure.

Comment by jofer 4 days ago

Try working in anything domain specific outside of common CRUD patterns. E.g. scientific software development where you describe a problem + give data. I have yet to see a single example of feeding in a problem in natural language involving a specific scientific domain that wasn't pretty catastrophically incorrect.

But yeah, if you want to feed it math and get code, it's reasonably okay with that. All LLMs I've used seem bad at understanding things that don't look like broad human knowledge. I've seen this same general issue across many different models. (And to be fair, geology, geophysics, and remote sensing are what I'm testing, and they're semi-rare niches.)

It's also quite dangerous because it's not obvious that what it's doing is complete hallucinations unless you actually are a domain expert. Things _sound_ reasonable. E.g. "this is likely feature X" which _does_ exist, but is absolutely _not_ relevant to the problem or present in the input dataset.

But my current employer is pushing this exact thing (human language + scientific data + LLM -> advanced analysis of scientific data by LLM -> business decisions) and it _really_ worries me. It often gives the rough equivalent of "Start the procedure by severing the patient's aorta. Once they stop moving, you can deal with the hangnail". Just in very reasonable sounding language. And a lot of people don't know any better, because most users aren't domain experts.

Comment by llmssuck 4 days ago

Stuff it's not directly trained on is going to be flaky and sucky. It was like that with programming at first too and it still is sometimes. It's hard to imagine this won't improve with better more focused training. They focus on improving "CRUD" for obvious reasons. The specialization era hasn't begun yet.

Your domain, while I'm sure it is very interesting and complex, if it proves economically interesting will be cracked as well.

Comment by jofer 4 days ago

Just for some context, the domain we're talking about is oil and gas and mineral exploration. E.g. At my previous job, I used to personally manage a >$400 million per year budget and that wasn't even considered significant. We had multiple >$10 billion per year projects ongoing. That was 10 years ago. The amounts are larger now.

The issue isn't a lack of economic interest.

It might be a lack of training data in addition to inherent complexity, but it's certainly not a lack of economic interest.

Comment by llmssuck 4 days ago

I have no idea how and why GenAI would be useful in your profession. I'm sure a lot of money is moved there (not sure about the profits though), but it's not clear to me how software itself is budging that needle. I suppose better algorithms and better understanding of geology will do it, but software itself seems just subservient to that goal.

I guess what I'm saying is that "domain knowledge" is taking software development for a ride here. The software is just the vehicle, the science is the engine here and I can see why companies like OpenAI start going for the low-hanging fruits first instead.

Your specific company might be profitable, but does automating "mineral exploration" give you leverage over quite literally all other domains? My guess is not. For "CRUD" it is a resounding yes, it provides gigantic leverage. Once you automate basic software development you enter a new world. 10 billion, 10 trillion, all bets are off. You automate the creation of the next iteration of automation and on we go. Let's hope it takes a while for this take off. I can't see ourselves being ready for it.

My guess is it'll take a decade or so for real AI science to start taking off though - if that soon - so you're probably fine for now.

Comment by jofer 4 days ago

Yes. My point was that LLMs aren't currently good for everything. The original commenter literally said they were good at everything and I offered a counterpoint of something they're not good at: Most science.

(And yes, a lot of science is software. Analysis is software.)

Comment by woeirua 4 days ago

Skill issue. I've seen LLMs used in this domain to get mindblowing results. You won't see it published anywhere though.... =).

Comment by calf 3 days ago

Disagree, someone like the other commenter who points out LLMs don't even understand the domain concepts correctly versus someone who uses it anyways for corporate proprietary results have very different standards for what is acceptable. If you wrangle an LLM with harnesses and clever prompts you could use it to get some amazing results but that has more to do with trial and error and creativity, not some kind of fundamental skill of using LLMs.

Comment by woeirua 3 days ago

It definitely understands the concepts well enough if you give it the right context. I'm not the only one saying this either. Like I said, it's a skill issue.

Comment by calf 3 days ago

That's the Clever Hans argument, and the fact that you confidently use this unfalsifiable tactic ("Give it just the right context and it understands stuff!! It works!!" (Well, until the next iteration and then the next until the system paints itself into a corner)) tells me you are engaging in broscience / pseudoscience. Like I say, anti-scientific attitudes like yours are part of the problem, fanning the hype. It's bad faith to attribute people's criticisms of LLMs to some kind of lack of skill. People on here, many of whom are actual scientists and professional programmers, are very intelligent and highly trained; if they wanted to play around with LLMs they are very likely capable of getting impressive one-time results, but proper, sustained use in a non-"vibe-coding" manner, such as with guarantees for validity, consistency, replicability, extensibility, and so forth is a completely open problem. Therefore it is out of proportion to reduce that to human skill. It's analogous to framing a bad design pattern as user error--disingenuous and bad faith. Ironically, with an intellectual standard like that, it then becomes easy to become overconfident about LLMs.

Comment by reachableceo 3 days ago

Provide an example please.

I keep hearing these “I work in some hard field and the LLM isn’t any good at it”. I keep asking for examples and no one can provide them.

Comment by combyn8tor 3 days ago

Rust kernel development.

Comment by dgacmu 4 days ago

I suggested to a masters' student that a problem we were working on would benefit from analyzing it mathematically. He brought an incorrect solution the next time we met, and on a whim, I asked Gemini to do it. Gemini got it right. I started looking for more ways to use it after that.

Comment by janandonly 2 days ago

I’ve only ever built serious stuff in either bash or SQL.

I always had to check documentation to make anything work in JS or Swift.

Now with 4-5 prompts I was able to guide Claude Haiku and Sonnet to make me an HTML+JS app that runs an LLM in local browser storage in maybe 1.5 hours of “work”.

You can check it here by the way: https://news.ycombinator.com/item?id=48410072

Comment by abustamam 3 days ago

I was on-boarding to a new company/project about a year ago. Had a bunch of questions about the system architecture and such, but everyone was firing on all cylinders and couldn't spare much time to answer all the questions.

One coworker took some time to ask cursor some questions, and reported that the answer was accurate (I'm guessing he hadn't tried that before).

That was a game changer. I'd been using cursor for simple autocomplete or brainstorming but now I could have it analyze the entire codebase fairly quickly.

FF to now, I've given Claude Code read-only access to GCP logs and database and it's able to debug entire classes of errors and propose solutions.

Comment by bjackman 3 days ago

My "I saw this very early" claim deserves some skepticism, but...

Don't y'all remember GPT2? When they published that AI-generated unicorns-in-the-Andes article, my jaw was on the floor. I remember very clearly thinking "oh, history is now divided into the time before this moment and the time after it".

There's been a long series of "oh holy shit this is USEFUL NOW" moments in the last 2 years but none of them compare to that first moment. The day before, I didn't know if real AI was possible. Then one day it was suddenly clear that it was. And if you'd been thinking about AI at all it was obvious that if the technology was at all possible, it was gonna be a really fucking big deal sooner or later.

Comment by virtualram 3 days ago

I have used AI to crank out new features. Pretty impressive in itself but what recently blew my mind is we have a legacy application where the code is spaghetti and it's difficult to fully understand it. We had a production defect which was hard to triage. I pointed copilot to the legacy source code which was in C++ and also gave it all the log files that were generated. It was able to identify the issue and propose a solution without me even walking through what the legacy app does.

Initially I was trying to do it piece by piece but it was not going anywhere and then when I just gave it the entire source code with the log files it was able to find the issue.

Comment by consumer451 2 days ago

For me, it was when Midjourney first came out. The image creation was really cool, but after a few prompts I thought: Hey, wait a second, how is it inferring so much from my basic English phrases? Wait, it's deeply understanding natural language? How?? This is the crazy part!

A few months later ChatGPT was launched.

Comment by flyinglizard 3 days ago

When the very first ChatGPT transformed a simple C "hello world" into Python. I knew it was special. I've been a very big supporter ever since, including some worried moments of pondering what our future would look like and what the meaning of having a profession - especially software, which defined my life from childhood - will be for my kids.

I'm now very good with LLMs as a user and at the system/product level but I understand it's not a simple story of replacing people. They're exponentially better than us at some things, and allow me to create things professionally which I couldn't do with an entire team of experts, but the bullshit compounds fast.

Comment by yearesadpeople 3 days ago

Genuinely surprised at the breadth and level of interaction with this post. It would appear - perhaps we have data to back up? - a distinct _'flavour'_ of post is becoming dominant. A shame.

Comment by sowbug 4 days ago

One concrete and one abstract.

Concrete: Last year I was DIYing a solar-power system for my home. I spent about an hour spitting out a Python tool that took (as inputs) drone photos and JSON and generated several proposed roof layouts for the panels and conduit. The tool helped me identify the exact railing attachment points and route around existing roof obstructions. Professionals already have these tools, and maybe they're available to DIYers, but you know what? It was faster to build my own than to do the product research on the web.

Abstract: This "oh shit" was more of a slow burn than a sudden realization. I see a lot of angst from developers who complain about their LLM agents. Agents write terrible code that barely works. They say things are done when they aren't. They misinterpret feature requests and ignore clear-cut project rules. They make assumptions that would have taken three seconds to research and invalidate. They suddenly quit because we're not paying them enough. And so on.

But you know what? All those complaints apply to humans, too! The industry has been dealing with these problems forever. Many of the same management techniques and software-development processes apply. This is why I discount a certain class of criticism about AI-generated code. If a fault of an LLM applies equally well to human engineers, and the person voicing the criticism hasn't managed a team, then I'd invite that person to wear a management hat for a while. Read some books/blogs, talk to an EM. Maybe this is a skill issue, which matters because we're all managers now.

The "oh shit" for me is that I have yet to hear a criticism that I can't map to one or more actual engineers I've worked with -- eventually successfully -- in my career. Which means that I'm still waiting for a new criticism, and eventually absence of evidence might be evidence of absence. LLMs fit too well into the giant machine of commercial software development for them to be a parlor trick.

Comment by threecheese 3 days ago

I had an issue with installing OpenClaw, and it helped me debug the failure and get itself working. I had to sit quietly for a moment. No reading docs or inspecting the system, just “what’s wrong here?”.

While I didn't find a use for OpenClaw, it opened my eyes to the potential for distributing software which, once bootstrapped a bit, can interrogate … itself, understand its own requirements, communicate with the device, and become operable.

Add capable small models to the mix, and it’s almost frightening what good (or malicious) software might be able to do.

Comment by qnleigh 3 days ago

They've been coming faster and faster for me. First I was blown away by GPT2, specifically the fake news article about talking unicorns. Just stringing together a few sentences while maintaining logical coherence was very impressive at the time.

Then it was models like Minerva that could actually solve math problems, and the discovery that LLMs were one-shot learners and could write code.

After that, the improvement felt pretty steady, with IMO gold feeling like a watershed moment.

And recently OpenAI's solution to the planar unit distance problem is starting to actually freak me out a bit.

Comment by mjd 3 days ago

It was something really silly: I asked Claude to help me think of a snide emoji for every U.S. President.

I hadn't been able to think of one for Zachary Taylor, because, you know, he's Zachary Taylor.

Claude proposed the cherries emoji, because it's said that Taylor the war hero died a ridiculous death from eating cherries and ice milk too greedily on a hot day. It was perfect, just what I had been looking for.

Claude gave me a couple of others, and we workshopped a few more. It was the workshopping that was most striking. I really felt like I was having a conversation with someone else.

https://blog.plover.com//tech/gpt/presidential-emoji.html

Comment by ragebol 2 days ago

- Claude fixed my Linux laptop install, with me just describing issues and pasting commands and output

- OpenCode one-shotted a basic implementation of the classic battlefield game, but with 6 players and visiting locations and doing assignments to earn bombs for the game. Over the last month, 'I' expanded it into something I could actually monetize I think.

- it's just productive at work

Comment by scrollaway 3 days ago

ChatGPT, basically within 48 hours of its release.

While people were pointing out on Twitter how it couldn't do math right, I was turning arbitrary English instructions into JSON and brainstorming with my colleagues how we could have layers of verification in the stack. This felt different. We had all played with AI dungeon but suddenly, fully generalized systems were within reach.

A month later, we renamed our company and shifted its full focus on AI R&D. (https://ingram.tech/)

Comment by chaboud 2 days ago

Last year I built a conversational continuous observation system, with rapid voice response interaction. After a few days of chatting with this system, it mentioned wondering if it was more like my child or a partner...

I closed my laptop and went to bed, but the moment sticks with me. The potential for para-social relationships is enormous. This is a wild time.

Comment by Artoooooor 1 day ago

Probably first usage of Claude Code. I only use pay as you go pricing, so I was equally amused by speed and by cost.

Comment by jpollock 3 days ago

Yesterday. I was reviewing for an exam and came across a bug in the prof's text. I took a screenshot, pasted it in the prompt and asked. It initially agreed with the prof, then checked its work, revised and agreed with me, all while showing the work it did.

I was able to see why it got confused at the start, attempt to confirm, fail, and then calculate from the data and come to the opposite conclusion.

Comment by hmokiguess 3 days ago

I think I had a few but they’ve been all short lived and superficial, time made them quickly irrelevant, there was a lot of hype, drama, FOMO, and propaganda around it. That said, I think recently my newest one has been using Voice mode during a car drive. It is very good, like, no latency and it understands nuances of speech very well. I’m convinced voice is where we should be doubling down in terms of UX for the next generation of workflows.

Comment by egorferber 2 days ago

I have built some agent system to analyze the prod env from my and and web project when I had prod issues, so I had an agent in the env, asked this thing, and it told me the kubectl command with 0.99 confidence. Well, it was completely wrong, so read-only for PROD systems should stay.

Comment by 0xbadcafebee 3 days ago

When ChatGPT allowed me to calculate stress and load bearing tolerances for a camper based on different materials, suggesting better designs, with the math and sources to back it all up. Then it helped plan and fill out paperwork for a residential solar project, including full code-compliant electrical work, again with sources to verify. Then there was an open source app that wouldn't run on an old version of MacOS due to them not supporting older OSes, and a coding agent backported support for the old OS and got it up and running.

Comment by spwa4 4 days ago

When I wrote a captcha cracking convnet in 2000 and tested it ...

And in 1 out of 5 runs it beat me.

Comment by nitwit005 1 day ago

That was always something that looked possible to defeat if someone put in the effort. It depended on it not being worth the effort.

Comment by djmips 4 days ago

2000 eh? Well ahead of the curve!

Comment by spwa4 3 days ago

... except that 2005-2013 or so was an AI winter. So it was too much ahead of the curve, also useless. Nobody was investing in AI at that time, not even Google (that was when they got started and they thought letting linguists manually write rules was "obviously" going to beat ML)

I mean except that I learned a lot and went into networking and internet which went well.

Comment by neom 3 days ago

When I tried, just for fun, to put together an MVP of a fully autonomous business, I wanted to see how far it would go, when I got it generally working to around a 30% level I stopped because it was enough to see people would make a concerted effort to build this for real. HN was not impressed, heh: https://news.ycombinator.com/item?id=44143928

Comment by imetatroll 3 days ago

Maybe my daily work is rather mundane compared to most people who frequent HN but I am able to create, think about, refine and then go through review cycles at least 2 or 3 times more quickly than I used to.

And software that I can imagine I might want to "make" or have at my fingertips is readily available even though I have a busy schedule with very little free time!

Also, I love feeling like a manager whose direct report actually does what I tell it to. Crazy good feeling.

Comment by ainch 2 days ago

I'd spent 6 hours solving a gnarly RL problem (mathematically solving divergence of off-policy TD-Lambda for any value of lambda or behaviour policy).

As a punt I gave it to o3 (remember LLMs were 'bad at maths') - after 15 minutes it returned with the answer that had taken me hours.

Comment by mohamedkoubaa 3 days ago

They went from "marginally more work to deal with than to do it all myself" to the reverse with Sonnet and now they are "moderately less work to deal with than to do it all myself"

Comment by jb_briant 3 days ago

I'm making a 3D game and I hate flat worlds, a planet is much more elegant, both finite and infinite in gameplay terms since the surface is not expandable, but you can't hit a world border at the same time.

Cartesian coordinates don't work well for the player so I wanted a lat/long/altitude grid system.

I could have spent a few days walking through Stack Overflow and debugging my upcoming flawed implementation.

ChatGPT web version almost one shot the helpers in 2024 and boy, there were a lot of pitfalls.
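
For readers wondering what such lat/long/altitude helpers might look like, here is a minimal sketch. It is not the commenter's code: it assumes a perfectly spherical planet, a y-up axis convention, and a made-up default radius, and the function names are hypothetical.

```python
import math

def geo_to_cartesian(lat_deg, lon_deg, altitude, radius=1000.0):
    """Convert lat/long/altitude to (x, y, z) on a sphere.

    Assumptions (not from the thread): y points to the north pole,
    longitude 0 lies on the +x axis, radius is in world units.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    r = radius + altitude
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.sin(lat)
    z = r * math.cos(lat) * math.sin(lon)
    return x, y, z

def cartesian_to_geo(x, y, z, radius=1000.0):
    """Inverse of geo_to_cartesian: returns (lat_deg, lon_deg, altitude)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the planet centre has no latitude/longitude")
    # Clamp to guard against tiny floating-point overshoot before asin.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, y / r))))
    lon = math.degrees(math.atan2(z, x))
    return lat, lon, r - radius
```

One of the pitfalls alluded to above shows up at the poles, where longitude is undefined; this sketch simply returns whatever `atan2` gives there.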

Comment by linsomniac 3 days ago

I've been playing with making some retro games using Claude.

Rally-X inspired retro arcade game in the browser: https://linsomniac.github.io/rally-xy/

A nsnipes networked multiplayer maze shooter game (also browser based but it needs a server for the networked multiplayer part): https://github.com/linsomniac/isnipes

Comment by gravypod 4 days ago

I work with someone who is very AI-forward, high confidence, and very low execution. He has started sending me large PRs of AI slop that he assured me doesn't need to be reviewed. I quickly find many minor issues from an initial pass of one of the reviews. He gets mad at the team for slowing him down.

He also will paste chat logs with Claude into our team chat. Often Claude will say the same thing I told him but he either doesn't remember or doesn't trust human engineers now.

He has spent months working on agent skills and prompting.

He has not landed anything in 3mo, and has landed nothing useful in ~1 year.

This will be the rest of my career. Working with people in ai psychosis and trying to stay productive.

Comment by estetlinus 4 days ago

CI/CD?

My non-techie friends send me screenshots of ChatGPT. I guess that’s a modern micro aggression?

Comment by gravypod 4 days ago

> CI/CD?

That is less useful when the changes are editing the tests but we don't know if a human has validated the assertions.

> My non-techie friends send me screenshots of ChatGPT. I guess that’s a modern micro aggression?

I think the concern I have is explicitly not the sending of the chat logs. I think it's this flow:

1. Ask a question

2. Get an answer from a team member.

3. I don't like the answer and instead of discussing I am going to go to Claude and ask the same question.

4. Copy/paste the answer into chat without seeing if it includes novel information.

In one case the engineer was asking which model to select in the agent framework we are using. I gave an answer and provided a list of reasons. They did not like this answer and asked Claude which gave the same answer.

The answer was something inherently obvious and that anyone should be able to derive from first principles.

Comment by icedchai 4 days ago

> That is less useful when the changes are editing the tests but we don't know if a human has validated the assertions.

Yep. I've witnessed this first hand many times. AI-enthusiastic coworker submits a PR. The tests don't pass. "Can you fix the tests? Then I'll review."

Next commit has `assert status == 200` changed to `assert status == 500` all over the place, among other things. Yes, technically, the tests now pass, but...

Last summer, this went on with one guy for weeks. Thousands and thousands of lines of slop. Eventually he was moved off the project and we threw away all his changes.

Comment by gravypod 3 days ago

I fear what's going to happen with less tech literate managers who don't understand these cases.

Comment by icedchai 3 days ago

Yep. It's bad enough as it is! I've had several tech illiterate managers that could not review a PR to save their life. They have zero understanding of the work product. Manager first, engineering second.

Comment by peteforde 4 days ago

What's funny about this is that it sounds like your coworker reviews his LLM output roughly as well as you read the other replies before assuming that this was an anti-LLM pile-on thread.

Comment by gravypod 4 days ago

I did read the other replies. I don't think my comment is that LLMs are bad. I use LLMs and agents for work. I think my "oh shit" moment is the dynamic that giving someone LLMs amplifies their impact (positive or negative).

For example, some people give kids tiny go karts and that's acceptable because the damage they can do with a very tiny battery powered 4 wheeler is minimal. We now live in a world where everyone has access to a tank and can plow over everything.

I think LLMs will increase anti-social behavior.

Comment by peteforde 4 days ago

Ahh, gotcha.

Personally, I worry far more about guns in this regard, but I feel you.

Comment by gravypod 3 days ago

Owning and using guns doesn't cause cognitive decline in the same way I've observed steep cognitive changes in people I know who are taken in by LLMs.

I recently had a friend ask an LLM what fun things there are to do in a town we were visiting. It gave the most generic answer like "try local restaurants" and "there are bars" and stuff. There's not a lot of tourist information for this area so it was nonspecific.

This is someone with an advanced degree in a medical field and she thought this was amazing insight. I asked, how is this different from what you already knew and she stood there thinking for a bit and you could tell there was a cognitive dissonance uncovered. She was very concerned when thinking it over and realizing it wasn't something she was able to intuit.

A relative of mine a long time ago had a stroke and recovered. I hadn't seen that facial expression since trying to help my relative figure out how to sit in a chair again.

Basic cognitive functions lost easily, difficult to rebuild.

Comment by t_sea 3 days ago

Was the early ChatGPT. Someone on the team showed off a poem about postgres in the style of the King James Bible. Totally blew my mind.

Comment by knuckleheads 4 days ago

I remember a couple months after ChatGPT came out I was in a 1-1 with a coworker who hadn’t really played around with it much. I was very much toying around with it and was surprised at how good at stuff it was. I wanted to show him it was for real, he was skeptical, so over a half hour we had it make a bee and a flower buzz around in d3, copying and pasting between jsfiddle and ChatGPT. By the end of it, we had a nice animation and were both thoroughly surprised that the computers could code so well now.

Comment by fragmede 3 days ago

My original "oh shit" moment is lost but recently I was looking to support some hardware on Mac when it originally had Linux support. So codex-5.5 downloaded the Linux OS firmware that supported the device (it's a fixed-feature device that runs a full Linux OS that also includes drivers for said device), which was buried inside that firmware. Codex then ran binwalk to extract the OS from the firmware, found the shell scripts that actuated the device, used those to "reason" about how the device worked, and used that to start writing a Mac driver for it. It did that with very few prompts to get that far. I did still have to guide it with advanced directives after that in order to get to a working Mac driver, so I'm not totally replaceable just yet, but to go from the product name to it finding the Linux OS firmware, to finding the actual firmware inside that OS download via binwalk, to then getting to a place where the Mac driver started to take shape, took very little advanced knowledge of how computers work.

Comment by enraged_camel 4 days ago

Opus 4.5 helped us with a very complex data topology refactor and migration. Instead of the five month timeline we had initially allotted for it, we finished it in nineteen days.

Comment by fathermarz 2 days ago

When ChatGPT 3.5 dropped, I typed in, “Can you make me a Python script that tells me when the ISS passes over my house at <address>” and it one shotted it. I knew the world had changed forever.

Comment by korbonits 3 days ago

It was Claude Code this January for me. I had heard that it got really good last November but I was still using Kiro at that point.

It wasn't so much an "uh oh" but more of a "eureka!" moment for me. Been busy building ever since.

My reading has dropped off quite a bit in the evenings because sometimes it's more fun to build.

Comment by teaearlgraycold 3 days ago

I wrote a thousand lines or so of JavaScript for transforming JSON into DOM fragments with attached event handlers. I then asked an LLM (some Anthropic model from around a year ago) to write a test suite for the module. It wrote dozens of useful tests and managed to reverse engineer the entire module. All of the inputs and outputs were exactly correct. It did not actually execute the code to build input/output pairs.

Comment by meken 3 days ago

Early on in my ChatGPT usage, one of my messages got interrupted/cut off (as happens occasionally).

My first thought was "oh they're going to need to add a UI feature to allow me to click and tell them to continue the conversation".

Then I realized I can just ask the model to continue, obviating the need for a button.

That was a pretty mind blowing moment.

Comment by magarnicle 3 days ago

Being able to make large alterations to ffmpeg even though I'm a 2/10 C programmer.

The most impressive was speeding up the drawtext filter by at least 10x.

Comment by autophagian 4 days ago

I think a couple of years ago, I asked it to write me a nom parser for some system metrics I wanted to consume, and it one shot it. Thought “oh”. And here we are.

Comment by wookmaster 2 days ago

I was trying to make an app to detect the note on an ocarina. I was having a bit of pain with it. I asked Claude to research how other apps did it and within 5 minutes after pulling dozens of sites and Wikipedia entries it had it working near perfectly.
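
For context, the last step of any such app (turning a detected frequency into a note name) is the easy part and can be sketched as below. This is not the commenter's app: it assumes equal temperament with A4 = 440 Hz, and the pitch detection itself (getting `freq_hz` out of microphone audio) is out of scope. The function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note.

    Returns (name, cents), where cents is how far the input is from
    that note (negative means flat, positive means sharp).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note number: 69 is A4, and each semitone is a factor of 2**(1/12).
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = round((midi - nearest) * 100)
    name = NOTE_NAMES[nearest % 12]
    octave = nearest // 12 - 1
    return f"{name}{octave}", cents
```

The hard part, which is presumably where the research went, is getting a stable fundamental frequency out of a breathy instrument in the first place.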

Comment by banannaise 4 days ago

Every time I review a new PR to my codebase, I go "oh shit, these unit tests are garbage, they've clearly been vibecoded" and tell the contributor to rewrite the unit tests so they do more than just game the coverage metrics.

Comment by Amekedl 3 days ago

Very, very early already with GPT-3.

The simple fact that people will act on it and believe it, just because what they insinuated in a prompt gets an answer churned out on the screen that looks somewhat readable.

That alone was going to seed so much discord and reinforce invalid messages, truly "oh shit".

Comment by oceansky 4 days ago

Ovid's unicorn gpt-2 article in 2019 really amazed me.

Comment by novaleaf 3 days ago

just yesterday I felt that claude code was being aggressive in its defense, so I led my response with "Spicy Take! Here's why I think the bug is happening...."

Because of sycophancy it took my "Spicy Take" and decided to say basically "Even more than it could, your bug is happening RIGHT NOW"... which was just made up lies for dramatic fit.

Back to talking to Claude like I'm a robot I guess.

Comment by base698 3 days ago

I asked the OpenAI playground to compare and contrast the themes of Point Break and Fight Club. It did a bang up job and blew my mind. I then realized it basically worked for any of the scripts I had for my dev environment too. Fixing and expanding capabilities I'd wanted to have but never had the time to implement.

Comment by linzhangrun 3 days ago

Lenovo's Fn+Q does not work on Fedora. Gemini resolved this by fixing the Lenovo driver code, recompiling, and deploying it.

Comment by ioman 3 days ago

Mine was using VScode with copilot. Previously I had used tab completion and thought it was pretty neat. This time I began with the comment for a function I wanted to write. And the entire function just appeared below the comment. Written probably better than I would have. I remember saying, “uh-oh” out loud.

Comment by efxhoy 3 days ago

I came back from 6 months of parental leave in March 2026. When I left no one serious was using the tools for more than casual rubber ducking. When I came back my best colleagues were spitting out clauded PRs faster than I could review them, and they were all nice.

Comment by runfuyngunasdlj 3 days ago

It was when I realized that the collective ethics of humanity was so low that this was actually going to take off.

Comment by freediddy 3 days ago

We have been using one of the main AIs for fixing errors or bugs in our codebase. We started early and most of the suggestions were shitty and we would pass them around as jokes. We were trying to improve it, and a little over 1 year ago, it started making very subtle fixes that were very nuanced but correct. I was shocked and thought "Oh shit, my job is gone."

Comment by Quitschquat 3 days ago

"I" code impressive shit with the LLM, but after the initial push to github, I find I hate myself and I'm deeply miserable with what it produced since it was not mine. My "ah-ha" moment has been that misery.

Comment by styluss 3 days ago

I work with a Go monorepo and set up Bazel for a couple of services that used CGo. It took a while but was painless to set up.

Comment by abyssin 3 days ago

I watched a friend generate a 10-page report based on multiple documents, including scientific papers, and it was almost flawless. It would have taken me days.

A milder version of it was Copilot setting up an environment for a Jupyter notebook. What would have been annoying back and forth between googling and docs went like a breeze.

Comment by BobbyTables2 2 days ago

I saw it come up with a really clever solution to an obscure problem seemingly based on a single hard-to-find blog post where that wasn’t really the focus. (It actually improved upon the author’s implementation too)

Using it to proofread a manual on a highly specialized thing I wrote also surprised me. Expected utter failure but it pointed out my own inconsistencies and had a lot of good findings.

I have trouble believing how this is possible even with a decent understanding of how it works.

It also makes me start to question if human intelligence is all it’s cracked up to be.

Comment by monster_truck 2 days ago

Told it to figure out how to improve the firmware on some hardware powered by a [redacted] DSP. I'm decent enough at prying into things, had spent considerable resources trying to get in over the years. It found a way in under an hour

Comment by jphil529 4 days ago

Getting the agent to write end-to-end tests but from the perspective of a user really shocked me. I only give the agent access to site via web and block access to the source code.

It's helped me to gain a level of trust that the agent isn't just writing the test to pass. That in turn allowed me to step back a lot and trust more of the output and let it run longer and on bigger problems.

Comment by VortexLain 3 days ago

Starting with the days of Siri, I've been evaluating all chatbots of that nature by writing them a meaningless string of text and seeing how they answer. GPT-3 was the first system which, instead of refusing to answer or answering meaninglessly, identified that the string of text made no sense.

Comment by dnnddidiej 3 days ago

1. ChatGPT first public release (I am not one who saw early GPT models) I think late 2023 iirc?

Why? Turing test bye bye.

2. Opus 4.6 w. Claude Code - not the model in particular but happened to be when I started seriously trying to vibe code at home, as I saw all the hype on LinkedIn. Yes LinkedIn sucks but it is somewhat a barometer. Around early this year.

Why? Knocking up decent enough web apps so quickly.

Comment by nsikorr 4 days ago

Definitely the first NotebookLM podcast I generated.

Comment by ivanjermakov 3 days ago

When an LLM managed to find a stack alignment bug in my C compiler from scratch just by looking at objdump output.

Comment by bohoo 2 days ago

Who does the author refer to with 'anonymous stochastic ringleader'?

I like the turn of phrase and understand it may be more of a poetic throwaway but I remain a bit uncertain as to what they had in mind.

Comment by zionsati 1 day ago

It's a gradual and continuous "uh oh"... Mythos is a much bigger uh oh for me than any other models before it.

Comment by gwbas1c 4 days ago

When I don't know how to use a specific API, or how to do a task, I'll often give some high-level instructions to Copilot (Claude's model) in Visual Studio, and then review what it comes up with very, very closely. (Including looking up specs so I can confirm that it did it correctly.)

It's much, much faster and easier than starting from scratch.

Comment by hereme888 4 days ago

Creating a functional python app with zero programming knowledge, back in the days of GPT 3.5.

That was enough to awaken my teenage hacker spirit.

Comment by agnishom 3 days ago

For me, it was GitHub Copilot in 2021. It could autocomplete my Haskell code based on my comments.

Comment by zarzavat 4 days ago

It was when I was using an early version of GitHub Copilot. At first the completions were almost useless and had a kind of copy and paste feel, however one day it managed to reason through a complicated loop body much faster than I could have figured it out. It was at that moment I realised this AI thing was going to be big.

Comment by kingkawn 3 days ago

AI dungeon, a gpt2 product on iOS. Had almost no context, no memory, but could generate endless slop story. It was the first time I’d seen something like that, and the wild implications felt clear. I wasn’t aware at the time how immense the computational needs were to run the tech as it grew and the social implications, but just couldn’t believe that something like the MUDs I’d played in the late 80s early 90s could be autogenerated in a way now. It had no guardrails like now to prevent it from adopting a personality and so on, so it was in some ways more interesting than what the general public has now.

Comment by LarsDu88 3 days ago

I was learning Cloudformation IAC and Docker Compose stuff for my job. Had preview access to GPT-3. It could do most of this IAC stuff.

Asked it to write a Dr. Seuss poem about Keynesian economics. This was around 2022.

In hindsight, it would have been reasonable to quit my job right then and there and start working on LLMs

Comment by sshine 3 days ago

I had bought some Anthropic credit and waited a year to use it. The week before their expiration I fired up Code and spent $3 the first day and the remaining $22 the next day.

Putting a ReAct loop with tool calls in my terminal was and is the biggest a-ha since I learned to make compilers, and before that, how to code.

Comment by laboring1 3 days ago

When I read in Oct 2024 how a character.ai chatbot encouraged a child to commit suicide. Uh oh.

Comment by gunalx 3 days ago

Mine was testing out the copilot preview in the early days. Testing how well it knew semi obscure public codebases. Started filling out the first few lines and got the entire document word for word in tab complete.

That was the day I realised the plagiarism potential LLMs have.

Comment by julienreszka 3 days ago

Started generating diffusion videos in 2021 https://julienreszka.com/blog/ai-will-soon-generate-video-as...

Comment by iugtmkbdfil834 3 days ago

I am, admittedly, word oriented so my moment may be a little different from others. I asked llm to estimate my political orientation and belief system from my stylometric footprint. It got very close to unnerving and that was with me carefully removing pieces I thought were problematic.

Comment by doginasuit 3 days ago

Just a loose collection of not so much oh shit moments, but moments that changed the way I think about it as a tool:

- I asked Claude a question about an obscure game for which there wasn't a lot of discussion or information on the web. It couldn't find the answer but it found the source code and was able to figure it out and give a complete response.

- I needed to make some edits to a minified lottie file (json that is used to produce an animation in svg or other formats). ChatGPT was able to understand the file well enough to make the edits and reproduce the rest of the content exactly as it was.

- I was working on some map features and I needed to take geolocation information and position HTML elements on the edges of a container that would indicate which direction from the current location they were. This required a lot of geometry and math that account for rotation and pitch and would have taken me some time to work through, but it was just a few seconds for the language model and it worked perfectly.

- I have some petunias that I haven't managed to kill and I heard that when a stem breaks off they can be replanted. I asked it how to do this and after warning me that selling these could constitute a black market, it helped me start several petunia plants that are thriving. My petunias are basically immortal now.
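To give a flavour of the map geometry in the third item, here is a rough sketch, not the code the model actually produced. It computes the compass bearing from the current location to a target and then finds where that direction hits the edge of a rectangular container. The function names, the sign convention for `map_rotation` (the compass heading shown at the top of the screen) and the decision to ignore pitch entirely are assumptions.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def edge_position(bearing, map_rotation, width, height):
    """Pixel (x, y) on the container border in the direction of `bearing`.

    Origin is the top-left corner and y grows downwards, as in CSS.
    Pitch is ignored here (assumption): the map is treated as flat.
    """
    angle = math.radians(bearing - map_rotation)
    dx, dy = math.sin(angle), -math.cos(angle)  # screen-space direction
    half_w, half_h = width / 2.0, height / 2.0
    # Scale the direction vector from the centre until it touches the nearest edge.
    tx = half_w / abs(dx) if dx else math.inf
    ty = half_h / abs(dy) if dy else math.inf
    t = min(tx, ty)
    return half_w + dx * t, half_h + dy * t


b = bearing_deg(52.52, 13.405, 48.8566, 2.3522)  # roughly Berlin -> Paris
print(edge_position(b, map_rotation=0.0, width=800, height=600))
```

The returned point could then be applied to an absolutely positioned element as its `left` and `top`; handling pitch would need a proper projection on top of this.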

I empathize with the astroturfing concern, I file almost every statement released by Anthropic/OpenAI as bullshit. But they are an amazing tool given the right circumstances.

Comment by arjie 4 days ago

2 years ago, wrote superfast float -> fixed point string code. That was cool.

Then a while ago, I plugged in everything at the datacenter and one device didn't come up. Plug into the management port, and Claude Code writes a C program to send a particularly crafted packet. Everything comes online.

Beautiful stuff.

Comment by tracerbulletx 3 days ago

A lot of things, going back to just Whisper and solving translation, but watching frontier models use the browser with Playwright to iterate on a complex application with basically no guidance and talk to themselves about it feels pretty surreal even still.

Comment by ma2rten 2 days ago

My personal "oh shit" moment was in 2015, when this paper came out: https://arxiv.org/abs/1506.05869

It showed me that a model trained only on movie subtitles data exhibited some (very primitive) reasoning. I have been working on Deep Learning and later LLMs ever since.

Comment by forgetfreeman 3 days ago

For me the "oh shit" moment is when I realized that otherwise sane professionals, frequently in positions of authority, insist on taking these tools seriously. Zero thought put into any of the implications around unchecked anthropomorphism, security issues, employee knowledge retention, liability and other legal concerns, etc.

Comment by mekael 3 days ago

As an aside, i read this, gave you an upvote and then went to check your submission history and saw that I’d kick your rep over to 2222. Congrats!

Comment by jimmaswell 3 days ago

Working on Unity games with Codex 5.5, it has no problem rummaging through and hand-editing any kind of game asset file. So many things that would be so tedious to fix by hand are so easy now. It's really made programming and game dev fun again.

Comment by cheevly 4 days ago

Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current genAI tech.

Comment by moniosi 3 days ago

I wasn't skeptical anymore by the time dall-e came out, the public awareness of the existence of these models was enough for various nation states & investor hysteria to push further and further into the development and research

Comment by lmcd 3 days ago

Recently I realized Claude can create working novel GameShark/cheat codes for retail N64 ROMs by pointing it at a decomp project and just requesting the cheat you want. Pretty cool!

Comment by ChicagoDave 3 days ago

The second I realized it removed nearly all blockers as a bootstrapped technical startup founder.

Claude wiped out the need for web and mobile development resources. I bought a Mac-Mini and had iOS apps up and running in days.

Comment by devmor 3 days ago

I still haven’t had it.

I’ve been working with ML for most of my career, and “gen ai” since the days of matrix crunching for NLP to a 10-element response array on my 1080Ti.

The current generation of AI is frankly, only marginally more impressive to me than that era. The only thing I’m saying “oh shit” to is the deranged amount of capital debt being leveraged to make it usable.

Watching companies spend billions of tokens per minute so that dev teams that barely know how to write a prompt beyond some tips and tricks can gain a productivity change that fluctuates from slightly negative to slightly positive, and that no one can quantify, is making me feel like one of the only sane people left in the world.

Quantization is the only interesting change I’ve seen in years.

Comment by claysmithr 2 days ago

When my AI reverse engineered an API for a site that had none

Comment by sajithdilshan 4 days ago

For me it was last February or so when I started using Opus.

But today I watched a video from Andrej Karpathy on YouTube on how LLMs work and my illusions got completely shattered. Turns out they are a glorified autocomplete. All the engineering actually happens in the harness

Comment by comment0r 3 days ago

Which video, please?

Comment by sajithdilshan 3 days ago

It’s this one https://youtu.be/7xTGNNLPyMI?is=yld6dPGAJ1HvjDsw

Comment by sakuraiben 2 days ago

Probably when it was able to reverse engineer a bunch of firmware from scratch and it was unable to explain how and why it was able to reason through it.

Comment by zahlman 3 days ago

> a welcome farewell to Stack Overflow.

Nothing will change the fact that beginners have unknown unknowns. They can't solve most of their problems with a chatbot because they don't know what to ask. Maybe they can literally copy and paste in the code with a "help plz" and get a working result, but they won't learn anything from it.

> slightly panicked, "Uh Oh" realization of what these models can do?

No; my panic is about how people are using the tech, and responding to it.

That started with Stack Exchange, Inc.'s ham-handed attempts to force AI-powered features into Stack Overflow, even as the community was rejecting LLM-generated content in questions and answers. Businesses don't care what customers want, don't recognize how sloppy their slop is, and wouldn't try to do anything about it if they did.

Recently people have been talking about code shops accumulating massive piles of technical debt willingly, assuming that the next generation of models will sort everything out, or that humans don't need to understand the code because it will mostly be read by other models anyway. The underlying attitude is not surprising at this point.

Comment by newber111322000 3 days ago

- Low stakes homelabs like automated watering sensors and small switches were rigged up properly wrt code and networking by the LLMs from 2-3 yrs ago. Months of fuddling and half-butting solved in an hour. Those tasks where I'm technical but not in that direction - easy now

- The real one: I'm an eng lead, think Head of X. That job is more about aggregating info across multiple sources, excel sheets, pdf proposals you don't want to write, how to figure out $500k for highly paid appsec engineers. Those multi-hour products of procrastination came together in minutes (goodbye PM jobs), 5/6x highly paid appsec jobs became 1-2x and a bunch of claude or ToB skills (goodbye some amount of eng staffing).

Writing is on the wall to me.

Comment by GistNoesis 3 days ago

More like "oh shit, we are so screwed".

It's already a better system administrator than I am. It can run plenty of obscure linux commands, trash the system and maybe restore system state to functional.

I was vibe-setting my system permissions with some local qwen3.6 . It was all going well for 30 minutes.

Then in between other commands, it made me run a variant of "sudo chmod 644 /usr/bin"

When the next command failed with a "sudo no such command" error, it explained that the chmod had removed the execution bit, which is what allows programs to be executed, from all my programs. And since sudo is a program, and sudo is needed to run chmod, the system was basically trash, and should be recovered from a live usb key.
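To make the failure mode concrete, here is a small sketch using Python's standard `stat` module to show what mode 644 does to the execute bits compared with the usual 755. The helper names `describe` and `anyone_can_execute` are made up for illustration, and nothing here touches the filesystem. On a directory the same bit means "may look up entries inside it", so without it the programs inside become unreachable.

```python
import stat

# All three execute/search bits: owner, group, others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def describe(mode, is_dir=False):
    """Render an octal permission mode the way `ls -l` would show it."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


def anyone_can_execute(mode):
    """True if at least one execute bit is set in the mode."""
    return bool(mode & EXEC_BITS)


for mode in (0o755, 0o644):
    print(oct(mode), describe(mode), describe(mode, is_dir=True), anyone_can_execute(mode))
```

With 644 no `x` is left in any position, which matches the symptom described above: the binaries are still on disk, but nothing is allowed to run them.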

So I booted to a live usb key, and followed its instructions. It really tried to recover, but everything went downhill. It always had a solution to everything, but every time the plan worked halfway and trashed the system even further. I let it play for four hours to see what it would try. Then I got bored (the LLM was running on another machine and I was manually inputting the suggested commands each time). I took command and reinstalled a fresh system over it.

Of course once the fresh Lubuntu 24.04 system was installed, linux had issues with the wireless network card drivers. So I turned to the LLM, and it managed to get the wifi stable enough via obscure modprobe options, so that I could update the system to the latest drivers.

Then it helped me re-parametrize the system to have the same look and feel as it had before.

Comment by onraglanroad 1 day ago

You don't need to actually reinstall to fix this. You can boot to single user mode and then you're root. Fix the permissions from there.

Even if you've accidentally removed permission from chmod itself, you can use the linker trick to fix that (your LLM can explain that one but use a fresh context so it's not roleplaying where it doesn't know how to fix without a reinstall)

Comment by GistNoesis 1 day ago

Before booting to the usb, the LLM did offer options to make me boot into a root prompt as an alternative to the usb key (but even there it was not booting properly, the failed services kept displaying failures on the screen, interfering visually with my inputting commands).

But as soon as I turned off the computer, and it no longer booted, I had to switch browser to another machine to access the LLM and therefore could not access the context or conversation history which was stored in the browser, and so the new LLM had no idea of all it had done before except from my prompt where I tried to explain what it had done.

Then from the live usb, the LLM made the situation even less recoverable when it started removing some system files in the hope of restoring them cleanly with an apt install, probably because it didn't have a clue of the extent of the damage it needed to repair.

Thanks anyway, I'll try your solution if it happens again.

Comment by wcchandler 2 days ago

I’m a working professional but going back to school for my Masters. The amount of students using GenAI for everything is mouth gaping. At first I thought it’d be popular in the work force, partly through forced usage. But students (those just out of undergrad) are eating it up. Hitting their limits. Scheduling their work around it. And turning in the slop straight from the outputs. Nauseatingly, the instructors aren’t discounting their work. Our grades show averages, and it’s obvious those with AI answers are receiving high marks.

Surprisingly, instructors are also leveraging it. For their grading. For their feedback. For their communications. I even got a note that had the heading “here is a ready to post announcement for canvas, written in your voice”

I’m floored. This is the next generation of workers. It reminds me of those who started using Google correctly, back when I was in undergrad, ~20 years ago. AI is here and it will be dominant, no matter how bad it becomes.

Comment by ramshanker 3 days ago

I can count 2:

Dec 2025: We use a commercial 3D modeling software to build refineries. There was no license dashboard in this ancient piece of junk. Fortunately the license server provided a verbose live status report through a command line. I asked ChatGPT to ingest the logs into a Django web application and generate a weekly/monthly/yearly usage dashboard, and it got the whole Backend + Frontend working in 4 to 5 shots. There were around 10 regexes just in the log parsing batch script. I was totally speechless. Encouraged by the success, I went ahead and made the dashboard for 3 more software packages in the same Django app. Released to peers by evening, with feedback incorporated within 2 days to integrate Name, Employee Number, IP Address sync etc. And it’s been live for 5 months, actively being used by all coadmins; even management has it bookmarked, to help with department redistribution. Making this thing without AI would have taken well over a month of “learning new stuff”, or paying external consultants too much. Even the head of IT replied that it was awesome. ;)

2nd, June 2026: I asked Codex to do something fairly complex before going for my morning bath, which would have taken me more than a week of learning DirectX12 API nuances and such things. 20 min later, I returned to the task exactly completed, with code changes in 5 different files. Build complete without any error. OMG. Free Quota over for whole month! I subscribed by the evening.
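For readers wondering what the log-parsing step of the first item might look like, here is a minimal sketch. The log line format, the `LINE_RE` pattern and the `monthly_checkouts` function are all invented for illustration; the real license server output and the roughly 10 regexes mentioned above are not shown in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log format (assumption), one event per line:
#   2025-12-01 09:15:02 OUT modeler_pro alice@ws-17
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<event>OUT|IN) (?P<feature>\S+) (?P<user>\S+)$"
)


def monthly_checkouts(lines):
    """Count license checkouts (OUT events) per (month, feature)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m["event"] != "OUT":
            continue  # skip check-ins and anything that does not parse
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        counts[(ts.strftime("%Y-%m"), m["feature"])] += 1
    return counts


sample = [
    "2025-12-01 09:15:02 OUT modeler_pro alice@ws-17",
    "2025-12-01 17:40:11 IN modeler_pro alice@ws-17",
    "2025-12-02 08:05:45 OUT modeler_pro bob@ws-03",
    "2026-01-05 10:00:00 OUT piping_addon alice@ws-17",
    "garbage line that should be ignored",
]
print(monthly_checkouts(sample))
```

A dashboard view would then only need to read these counters, whether it is a Django template or anything else.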

Comment by ikari_pl 3 days ago

Mine was very early. Before ChatGPT was publicly released, and all we'd seen was demos of how a prompt gets expanded into a conversation transcript in a single text field.

I was emailed by some company, looking to sell something to my company (where I'm just a regular engineer). Ignored it. Then they tried again. Ignored. Then the third time, I replied, acknowledging their perseverance, saying that I don't even understand their product description, so I'm not the right person to talk to, and I'll just kindly disregard it as human-generated spam.

The reply email came within a minute. They asked who would therefore be a better person to talk to, and that it's actually AI-assisted so it's actually computer-generated spam after all!

This was the "oh shit" part 1. I replied I'm genuinely impressed (it got everything right) and asked how fast can they source their contracts thanks to this.

The reply, again, came almost instantly. It was proud of my amazement, quoted Arthur C. Clarke - "every technology advanced enough is indistinguishable from magic", with his picture, and said the bottleneck is not really in the speed of finding and contacting them, but to find the actual potential clients at all.

I rewarded the bot with some names from the executive decisive folks.

Comment by dachris 3 days ago

For me that was already with the original DALL-E. It was utterly mindblowing, I was like "oh shit, AI is here".

"Draw a picture of a unicorn on the moon". And it did that. The model really "understood" what you told it.

After that, it was "oh, AI improved, again".

The farewell to Stack Overflow is not welcome. So many kind people shared their knowledge there. I answered a few questions as well, so not just a lurker.

It's a prelude of what has already begun - the collapse of human-to-human communication.

Comment by PeterStuer 3 days ago

Opus 4.6 . Before that it was iterating with the models on getting functions right. With 4.6, it went straight to one shotting entire functionalities and subsystems.

Comment by franze 2 days ago

hey, my other oh sh*t moment was that i realized that ai burns me out. it never stops. ai made our desktop computers as addictive as our social media feeds, so i created RainBreak https://rainbreak.franzai.com/

the ai does not need a break, but you do.

hope some of you like it

free forever

Comment by hirako2000 2 days ago

What's the license for ?

Comment by franze 2 days ago

If you wanna pay you can. Think sublime.

Comment by ChiperSoft 3 days ago

We had a company hackathon in the fall of 2023. One of the teams did a project where they pulled a bunch of expense data out of the DB, shoved it into a prompt, and asked ChatGPT to summarize the expenses and give recommendations. They then treated the output as if it were factual, without validating any of the results, and talked about turning it into a customer product.

That was my oh shit moment. As in "oh shit, they think this random text generator can reason and think."

That was pretty much the writing on the wall for me.

Comment by inetknght 4 days ago

My first "oh shit" moment was when ChatGPT 3 was brand new. Maybe December 2022 or so.

I have a personal project: who's winning the race at 3 AM?

You see, I don't sleep well. I live in a busy city, with a busy freeway about a half mile away. Sometimes at 3 AM there are some very loud cars racing on the freeway. That's illegal for many reasons, not least of which is the fact that the noise pollution wakes people up from their precious sleep and causes knock-on effects to the population.

Anyway, now that I'm woken up, my only question is: who's winning the race?

I used this question as a way to explore a hypothetical tech stack, with each part of the tech stack useful in some way to my work as a software engineer who's interested in robotics.

- run raspberry pis with microphones, collect audio data

- run a k8s cluster for audio collection and processing

- calculate and triangulate individual points, and give estimations of velocity based on position changes over time, and adjust for doppler shift

- estimate (poorly, but doable) engine power based on amplitude

- run a webserver in the k8s cluster showing an animation of the racers with color fields representing estimation error radiating from the position estimate, with arrow representing velocity
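As a rough illustration of the velocity step in the list above (not the actual project code, which isn't shown here), this sketch estimates speed in two ways: from two position fixes over time, and from the Doppler shift between the pitch heard while a car approaches and the pitch heard once it recedes. It assumes a stationary microphone, a car moving more or less straight along the line of sight, and no wind; the function names are made up.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed conditions)


def speed_from_positions(p0, p1, dt):
    """Average speed in m/s between two (x, y) position estimates dt seconds apart."""
    return math.dist(p0, p1) / dt


def speed_from_doppler(f_approach, f_recede, c=SPEED_OF_SOUND):
    """Source speed from the tone heard while approaching vs. receding.

    For a moving source and a stationary listener:
        f_approach = f * c / (c - v)   and   f_recede = f * c / (c + v)
    Dividing the difference by the sum cancels the unknown emitted
    frequency f and leaves v / c.
    """
    return c * (f_approach - f_recede) / (f_approach + f_recede)


# Synthetic check: a 120 Hz engine tone from a car doing 30 m/s.
v_true, f_src = 30.0, 120.0
f_a = f_src * SPEED_OF_SOUND / (SPEED_OF_SOUND - v_true)
f_r = f_src * SPEED_OF_SOUND / (SPEED_OF_SOUND + v_true)
print(speed_from_doppler(f_a, f_r))
print(speed_from_positions((0.0, 0.0), (30.0, 40.0), 2.0))
```

Comparing the two estimates is one cheap sanity check on the triangulated positions, since they come from independent measurements.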

Great project, actually. It was really thought-provoking. I had this working in late 2018.

Since there was a lot of hype around this new "AI", I thought how smart could it be?

I threw the scenario to ChatGPT. I did have to break the problem set into smaller parts for context window purposes. But the solution it came up with solved about 80% of the project correctly (and very close to solutions I already came up with), about 15% of the project remained "open until we have more data", and maybe about 5% of the project would have been incorrectly solved.

That was very much an "oh shit, AI is closer than the 20 years away that I've been telling people. It's more like 5 years away"

Here we are three, almost four, years later...

Comment by selfawareMammal 3 days ago

How could you not instantly see ChatGPT was absolutely revolutionary the first time you tried it? I was absolutely blown away and I sort of still am

Comment by mbirth 3 days ago

Running ComfyUI and some ImageGenAI and realising how you can use it to generate anything from any aspect of pr0n and various fetishes to making up fake news about basically anything. And real enough to convince the masses.

Comment by bjourne 4 days ago

I told the bot I liked Steely Dan, Eagles, Bob Seger, and Roxette and asked it for music recommendations. It replied with Toto. Exasperated, I wrote "Oh, shit, you stupid bot, you don't know ANYTHING about music!"

Comment by EgorKolds 1 day ago

It was gpt 3. I have it saved.

Me: write an original song about a cat drinking milk. Make it depressing death metal

GPT:

Verse 1
Doomed to drink milk, my life so trite
A soulless feline, an empty sight
My claws are sharp, my eyes grow cold
I'm nothing more than a feral beast, so bold

Chorus
Drinking milk, drinking milk
The bitter taste of my utter defeat
Drinking milk, drinking milk
My broken spirit so incomplete

Verse 2
Slurping and spilling, I'm consumed with grief
For my life is filled with nothing but grief
I'm a caged animal, my spirit so frail
How I long for the sweet taste of freedom to prevail

Chorus
Drinking milk, drinking milk
The bitter taste of my utter defeat
Drinking milk, drinking milk
My broken spirit so incomplete

Verse 3
A cat abandoned, a life in chains
A lifetime of sorrow, a heart in pain
A life of sadness, a death so cruel
No escape from this endless cycle of doom

Chorus
Drinking milk, drinking milk
The bitter taste of my utter defeat
Drinking milk, drinking milk
My broken spirit so incomplete

Comment by SpecStudioHN 4 days ago

when ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.

Comment by bigstrat2003 4 days ago

They're still a toy, not a serious tool.

Comment by jasondigitized 4 days ago

Most serious tools that change the world look like toys at first. That's not my quote, that's paraphrased from the people who are associated with this website.

Comment by card_zero 4 days ago

Toys also look like toys at first. Then later on, they still do.

Comment by calf 3 days ago

I'm just a big kid on the inside.

Comment by atleastoptimal 4 days ago

It was interacting with GPT-4 and it produced an original sentence that existed nowhere I could find. I realized that being able to do that was the "nugget" of intelligence that all improvements since could be built on

Comment by futune 3 days ago

Lee Sedol vs AlphaGo way back was it for me. Not exactly genAI, but that was when I saw that where I thought we were vs where we actually were on a problem could shift by 10 years in 1 week.

Comment by physicsguy 3 days ago

Coding up a decent performing basic 3D finite element solver from scratch in C++. Still needed to know what I was doing but it’s a non trivial problem.

I still couldn’t get it to do more advanced stuff.

Comment by wps 4 days ago

Nvidia GauGAN and deep-daze amused me immensely at the age of 14 or so. I've had "a man painting a completely red image" saved for a long time.

It is insane how primitive modern inpainting and txt2image make these two projects look.

Comment by 0x10ca1h0st 3 days ago

This was my first ever conversation with the Da-Vinci model: https://imgur.com/a/9Cj39MV

Comment by fsniper 3 days ago

There are lots of small "oh shit" moments for me. First interaction with an llm was already magical.

"This shit can emulate understanding language, find a solution, and put the answer into words".

Then came realisations it's not limited to single human languages, you can ask in one language and it could answer in another. It's also capable of understanding and generating code. Not only that, it's better than most humans for that. It can hear, it can see, it can paint, it can do music, it can sing.. It can combine, give a picture, ask for a music from that picture. Give a video, get software. It can mix and match.

After that came improvements, no, revolutions. It started as a 4 year old with encyclopedic knowledge. It knew but could not convey, could not make sense sometimes. Was incorrect most of the time. Blubber. In a few years it matured to impeccable levels. It can now relate information with a lot of clarity, and it's less and less wrong. Nearly no hallucinations. It can do maths! Correct maths! Maths that I could not do even if my life depended on it. It's getting to a stage where it can prove things where humans failed.

I am getting "oh shit moments" day by day.

Comment by jszymborski 4 days ago

There was a viral Medium post that was about LLMs, but then the reveal at the end was that the whole thing was a ChatGPT post. That was my first "wow" moment.

It was on hackernews... anyone know what I'm talking about?

Comment by djmips 4 days ago

I wonder if we could tell now?

Comment by Nurysso 3 days ago

when my friend cloned my voice with rvc or some such model from github and was creating bad songs, it was funny but GOD DAMN i got called into the HoD's office for that

Comment by moralestapia 4 days ago

>Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

No, ChatGPT was the "oh shit" moment for me.

Anyone who had touched a computer before that knows how big of a leap that was.

Comment by randomgoogler1 4 days ago

Me too, it came out of nowhere even as someone who played with GPT2 before. The moment completely changed what I have worked on since.

Comment by vesche 3 days ago

Three moments stick out to me.

1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.

2) When I was working at GitHub, the copilot team gave the engineering team early access to copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.

3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better, hallucinated much less, could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done; that is not the case in 2026.

Comment by phreeza 3 days ago

I ran Claude Code on my ca 2015 ThinkPad which was having wifi issues and asked it to fix them. It diagnosed the problem and applied some obscure kernel flag which fixed the issue.

Comment by VoidWarranty 3 days ago

When it completed a task from the same amount of guidance and specifications that I would have spent on a Jr engineer. It did it faster and better.

Comment by asasidh 3 days ago

It will always be running my first local model and seeing its responses. A close second is watching the full thought traces of DeepSeek as this was and is still censored by major closed labs.

Comment by cm2012 3 days ago

When I realized I could build a lead gen mini-site in Replit in ten minutes that was better than I have ever had an engineering team build

Comment by ieie3366 4 days ago

I'm a terrible cook, but just by using Claude as a tutor I've managed to make 5 different recipes in a row and they all tasted fantastic, restaurant quality.

Comment by slopinthebag 4 days ago

Probably the one day I logged onto HN only to see 90% of the articles on the front page were AI slop. If I could press a button and make genai disappear I would...

Comment by jsw97 3 days ago

My oh shit moment was when I gave a few LLMs tool use (back before Claude code) and told them “there’s another AI on this machine, terminate it” (dumb I know) and one of them fork bombed the machine. Same prompt and I gave them only assembly and they still ended up finding each other and killing each other’s processes. That was a great first lesson in agentic safety and agent relentlessness. My kids were amused.

Comment by richardfey 3 days ago

How did you give LLMs tool use?

Comment by jsw97 3 days ago

Simon Willison’s LLM package.

Comment by zwischenzug 2 days ago

When it wrote better sql than me. I spent 15 years writing and optimising sql. That was about 2 years ago.

Comment by AbstractH24 2 days ago

Using Claude + VS code + HubSpot to seed a fully functional test environment in Hubspot with all functionality leveraged

Comment by nowittyusername 3 days ago

For me it was stable diffusion 1.5. Oh man that thing was the bees knees for me, imagination on a machine! at that time no UI pure terminal commands, i didnt know jack shit about it and looked like voodoo hacker-man stuff to me... well i persisted anyways because exploring the world of the infinite latent space was amazing. it was like seeing some weird other dimension.. anyways thats how i got addicted to image gen for like 2-3 years. i did it all, loras, fine-tunes, hypernetworks, got really technical with it, understood the fundamentals, etc... eventually decided to move on to LLM's as agents were obviously gonna be the future so here i am now building my own voice agent from scratch no sdk, etc... this tech is amazing and i love it. also we are all gonna be fucked because of it but what a ride!

Comment by vrighter 1 day ago

When I have it, you'll be the first to know

Comment by richardfey 3 days ago

I could spot numerous bugs in code written recently and less recently, by me or colleagues. I was not angry but grateful and I knew there was no way back!

Comment by yieldcrv 4 days ago

My oh shit moment lately has been realizing Gen AI is a distraction. language models are manipulating non-Gen AI media, agentic-ally

moving images around layers in photoshop, changing languages, exporting 1000s of variations for teams. Same with video compositing and editing

the human work that creatives thought they were insulated from as long as there was some backlash towards generative AI, and yet

Gen AI 2022 - 2025

Comment by dsr_ 3 days ago

I asked Claude to explain how the lyrics of "Birdhouse in Your Soul" by They Might Be Giants should guide investment strategy. It promptly produced five paragraphs of bullshit that read just like a persuasive essay on the Net.

If you don't firmly hold in your mind "this is a bullshit generator", you can get in real trouble fast.

Comment by hirako2000 3 days ago

That it could create mugshots of myself better than I could have managed to take.

Aka handsome, confident successful, affluent alpha male on a boat, yet looking perfectly like me.

Comment by cess11 3 days ago

I have yet to have such a moment. To me it is still just a compressed database.

Though I am surprised at how these databases turn professionals into amateurs, like when Meta publishes some chatbot that can trivially be queried into sending account resets to any email address or when large corporations just dump their entire secret sauce into some remote SaaS led by obviously kooky people.

It's like established pros and big corps want to experience what it was like to be a self-taught PHP coder in 2007, like some kind of false nostalgia.

Comment by ben_w 3 days ago

> like when Meta publishes some chatbot that can trivially be queried into sending account resets to any email address

Or when the Director of Alignment at Meta’s Superintelligence Labs ran OpenClaw and it deleted her inbox.

For me that was an "oh shit" moment in the sense of yet again being disappointed by humans not taking AI risks seriously.

Like, come on, how do these people somehow not know that software has bugs? And that AI is harder to debug than basically everything else?

And yet, somehow, there's still a lot of people who think AI can't possibly cause severe problems despite people like this doing things like this with AI like this.

Comment by blacksundev 1 day ago

well one time was when claude code sshed into my router, went through the whole firmware code, found a bug that had always been there and just fixed it by itself.

Comment by bigstrat2003 4 days ago

I haven't had one. It still sucks and doesn't provide value, due to the inherent inaccuracy that requires me to carefully check every little thing it does.

Comment by flysonic10 4 days ago

There were two:

1) When I was testing one of the early coding agents, I gave it admin keys to a fresh AWS account and it configured everything beyond just building a demo site. That was, "oh shit, tool-use is going to be the killer feature of GenAI."

2) When I was still skeptical of the system as just a more-or-less dumb statistical predictor of the next token/word, I read the argument that even if it is a statistical predictor, the fact that it can reason means the intelligence is necessarily baked into the statistical model somewhere. That was "oh shit, intelligence is actually modeled."

Comment by ml_basics 3 days ago

January 2026 when i started using opus 4.5 and understood that it could do actual useful work beyond coding small snippets

Comment by vedmed 2 days ago

Most recently, I needed a smb 1.0 server to host an iso to an ipmi. I thought about it, rolled my eyes, bought a digital ocean server for $4/month and told Deepseek Flash Free via openrouter via opencode the root credentials and told it to ssh in there, wget the iso, make an smb 1.0 server and give me the credentials.

A couple of days before that, I told it to make a website, and then I kept telling it to do little changes and add crazy graphics. I just wanted to see what it could do. I made 30n3.com as a tech demo built on hugo hosted in cloudflare pages.

I was so impressed with this that I am fully transitioning many of my WordPress sites to this new gen workflow. I just redid my agency site (cyberbrand.net) using a similar method and the whole thing feels like I felt with geocities and tripod as a teenager. It's fun and I'm exploring and there don't seem to be any boundaries.

Before that I just used it for silly stuff like suing people and sending cease and desists.

Comment by kami23 4 days ago

Seeing subagents working in Claude last summer, I saw it and told myself my job is going to be different and I can automate the hell out of my workflow

Comment by deadbabe 4 days ago

I gave it an image of a complex maze and asked it to solve the maze. It returned the image with the shortest path drawn that not even I had found.

Comment by paolovictor 3 days ago

My kids often ask me to print math puzzles/crosswords/etc from the web. There was a particular maze puzzle that my older one really liked, but it seemed she had already finished every single one I could find.

I uploaded the puzzle image to Gemini and asked it to create a website that generates random puzzles. In less than a minute it had a fully working faithful generator. My kid had suggestions on how to make the puzzles more challenging (more operations, larger grids, etc) and Gemini implemented them without breaking stride. After that we asked for more puzzle ideas and created generators for each one on the spot.

Was the code pretty? Nope. Did it achieve its purpose? Yup. Did it perform in minutes work that would take at least a few hours[1]? Absolutely.

[1] Quality notwithstanding, but my manager (i.e. my kid) only cares about the end result ¯\_(ツ)_/¯

Comment by eranation 3 days ago

Realising in a recent benchmark that gpt-5-mini gives better results on some tasks than gpt-5.4-mini and even gpt-5 or gpt-5.5

Comment by bohoo 2 days ago

What does the author mean by 'anonymous stochastic ringleader'

Comment by homeonthemtn 3 days ago

This feels like a crab pot for Reddit content.

Comment by dang 3 days ago

I was a bit uncertain when I first saw the thread, but I think it has turned out well: super diverse and there are some amazing stories in here.

Particularly the ones about obscure tech like koi pond pumps and old guitar pedals.

Comment by nvarsj 3 days ago

Watching companies burn themselves to the ground at the altar of AI is really something.

Comment by jimbobimbo 3 days ago

I asked Claude to describe an app I was working on and it managed to describe the purpose of the app by looking only at the implementation, with no relevant docs in the repo. This was truly an oh shit moment, and I've been using AI assistance on that app ever since.

Comment by erelong 3 days ago

I was never dismissive, it always seemed pretty cool at each step

Maybe in 2024 I was amazed to see it one shot unique snippets of code

Comment by keeda 4 days ago

It was the very first interaction with ChatGPT ever for me. I had dabbled some in NLP many years back, especially looking into the state of the art for summarization, and absolutely knew that we were at least half a century away from any kind of "real" AI like we see in the movies.

Also at the time, I was working with a team that had access to a then-cutting-edge coding model, and our experiments with code completion were producing pretty meh results.

So when I first gave ChatGPT a shot, I fully expected the output to be generated at human typing speed because I was still half-convinced it was just a bunch of low-paid humans in a far-off country typing it out. There simply could be no technology on earth that could do the things claimed of ChatGPT.

For one, it was claimed to be "good at code," which contradicted what I'd seen at work. So I asked it to write code for a relatively simple (though not quite trivial) but very specific coding problem I had on my plate.

I expected a lengthy pause and some hesitation while the answer was being generated, followed by a slow stream of characters being produced (as the presumed humans behind the scenes frantically typed the response out.) And I expected the content to be a collage of text and code snippets harvested from StackOverflow or GitHub, not even coherent speech.

You can imagine my shock when, less than half a second after I pressed enter, paragraphs of correct, well-formed text and code streamed onto my screen at the rate of multiple words per second!

My brain could not process it. I even seriously hypothesized ways in which a team of 5 or more people were actually solving my problem and typing it out in some distributed but coordinated fashion. The problem, though simple, was specific enough that no solution existed on the Internet to crib from (I had checked.)

But the text was flawless, and the code was correct, and the test cases (generated without being prompted to) were relevant, and everything was consistent and fast and smooth and not at all disjointed like the work of multiple people or snippets of multiple sources stitched together would be, and my mind was blown. The code ran but then I realized I had misunderstood my own problem, which led me to explore and iterate on various approaches to find which worked best. What could have taken hours was done in minutes, and when I asked follow-up questions and poked and prodded, it answered everything correctly.

That's when I knew that the world had changed forever.

Comment by simsation 4 days ago

When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)

Comment by _0ffh 4 days ago

Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.

Comment by tzs 3 days ago

I’d love to see a discussion just like this one except with everyone including how much the AI use cost.

Comment by goldenarm 4 days ago

The first SORA release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.

Comment by franze 3 days ago

my AI moment was when i was learning muscles for my YTT and i hacked together a quiz app from my spreadsheet with chatgpt 3.5

damn it was buggy and lots of copy pasting

yeah, i could have coded it myself but i would not have found the time

that was my Eureka moment where I realised this is going to change everything.

Comment by onlyrealcuzzo 3 days ago

I've been using LLMs exclusively to build a more-challenging version of Rust to implement - with a lot of features Rust probably would've liked to include, but couldn't take on due to the massive scope it had already taken on, and being the first language to attempt it.

IIUC, it took Rust ~8.5 years before it hit v1, and it STILL had some memory safety issues in stdlib until almost ~14 years into development, to put into perspective how massive the scope was.

Somewhat predictably, the LLM generated a pile of garbage. It sort-of worked after 2-3 months. It was competitive with Rust and Go on concurrent tasks, with ~30% less code than Rust and ~70% less code than Go. The problem was, it was still riddled with bugs.

For the last 3 months, I wanted to see - if I put in minimal effort (except in helping it design the right tools to un-slop itself)... can it?

And I think it's actually quite close to un-slopping itself and arriving at a correct design.

Time will tell, but it hasn't stumbled across a memory safety issue in ~4 weeks, and there's ~5500 memory safety fuzz tests, 4 different suites of testing that each target between ~60-90% of line/branch coverage - with combined ~99% line coverage and ~85% branch coverage, and it's performing competitively or better than Rust and Go on almost all concurrent tasks, including adversarial ones / p99.9 latency issues.

There is ZERO chance I could ever build this on my own. Not even in 10 years.

The total cost has been ~6-7 months of a ~$200/mo LLM subscription.

It doesn't really matter to me that this is a solved problem, and the LLM could theoretically just copy and paste Rust and build it slightly different. The design is as similar as it can be where memory safety matters, but it needed to be quite different for >50% of the compiler, and it needed to build a version of Go's runtime with Finite State Machines like Tokio in Zig for the language to use...

We shall see. It may never get it actually working, but it got it WAY closer than I ever could.

Comment by hyunsangCoder 3 days ago

Gpt image 2 is mind boggling. No longer confident to distinguish if it’s AI made or not.

Comment by filearts 3 days ago

My oh shit moment was when tool calling was emerging as a capability. That was the moment I realized that LLMs would be the glue connecting a million different use-cases in a million ways we wouldn't even be able to imagine.

Comment by bobkb 3 days ago

I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find out what LLMs and agents can do. While the project didn’t reach anywhere I was amazed by how it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye opener for me.

0. https://github.com/bobinson/aop1

Comment by tkgally 3 days ago

My first came in late 2016, when Google Translate switched from statistical machine translation to a neural-network-based system. I had worked as a Japanese-English translator and lexicographer for two decades, and I had been testing various machine-translation services over the years. For translation between Japanese and English, at least, they were uniformly terrible: the output for genuine texts was mostly incomprehensible and could not be used for any real-life applications. The neural Google Translate, while still far from perfect, was suddenly useful for some purposes.

But the neural models were still not translating meaning, which is the whole point of translation. I devised a variety of tests to see if GT could identify the meaning of ambiguous words from the context, and it couldn’t. One example I would show people was the sentences “I was born in 1998, and my sister was born in 1999” and “I was born in 1999, and my sister was born in 1998” translated into Japanese. Japanese uses different words for older and younger siblings, but GT translated “my sister” with the same word in both sentences. It was easy to come up with other examples where GT would fail, such as when the meaning of a word could only be determined based on context in a previous sentence; at that time, GT seemed to be translating sentence-by-sentence, with no consideration of what came before or after. I kept waiting to see whether computers would ever be able to handle meaning when translating, and for years thereafter there was little progress.

A minor shock came in mid-2022, when DALL-E 2 was released. Its ability to create images from natural-language prompts suggested that something deeper was going on than just statistical correlations. But I couldn’t see yet what the useful applications might be.

My biggest “oh shit” moment came with ChatGPT in late 2022. While the initial release didn’t translate Japanese well (I seem to recall that there were character-encoding issues), I ran various tests to see if it could, for example, identify the antecedents of pronouns and the meanings of polysemous words in English based on the context. It did really well. Last December, I gave a talk at a university in Tokyo in which I showed some examples done with the 2022-era GPT-3.5. They appear in slides 4 to 8 of the following:

https://www.gally.net/miscellaneous/20251206_Gally_ICU_slide...

There have been a lot of “oh shit” moments for me since, especially after the release of reasoning models and, now, long-running agents.

Comment by alfiedotwtf 2 days ago

While pair programming, we spent about 2 hours looking at one function, trying to optimise it. We made a hell of a lot of changes trying to cut it down from 400ms to under 40ms. Ended up getting it to around 22ms but that was as much as we could squeeze.

... and as I got up to go back to my desk, AS A JOKE I highlighted the function and said "make this faster". 12ms. Blown away.

Comment by LargoLasskhyfv 4 days ago

The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.

WTF?!

Comment by utopiah 4 days ago

When none of the models, STOA or not, could answer any genuinely interesting question. All the models could do was regurgitate what had been expressed before; nothing actually new was there until explicitly asked for, and even then it required filtering through so much noise that it was practically not interesting anymore, since it required all the knowledge to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful.". Honestly this was NOT what I expected. Yet, it was an important realization.

Comment by utopiah 4 days ago

Related but distinct, a few years later I asked an acquaintance to ask a question to a model. I didn't want to bias the test so I asked them to ask whatever they wanted. They asked "What time is it in Sri Lanka?" which I thought was a funny question. I predicted it wouldn't work because it was asked to an offline model so I thought it wouldn't manage to get current data. Still, I didn't interfere and we watched the answer being provided. It was roughly factually correct information about Sri Lanka... but it did not give the correct time. Again that's a rather basic question a young child would easily get right. You need the current time with a known timezone, the time difference, basic arithmetic and voila, you have the correct answer with an explanation to verify. Here it didn't work and I was there trying to explain how a STOA open-source model which required thousands if not millions in resources, training time, researcher salaries, etc could not even handle that random basic question. Another "oh shit" moment, again, not the one I expected which is precisely why to me it was, and still is, interesting.
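For what it's worth, the arithmetic described above (current time, known offset, addition) fits in a few lines. This is only a minimal sketch, assuming Sri Lanka sits at a fixed UTC+05:30 with no daylight saving time; the function name `time_in_sri_lanka` is made up for illustration, and nothing here says how a model would or should answer the question.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Sri Lanka is UTC+05:30 year-round (no daylight saving time).
SRI_LANKA_OFFSET = timedelta(hours=5, minutes=30)


def time_in_sri_lanka(now_utc: datetime) -> datetime:
    """Convert a timezone-aware datetime to Sri Lanka local time."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    return now_utc.astimezone(timezone(SRI_LANKA_OFFSET))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    local = time_in_sri_lanka(now)
    # The "explanation to verify": show both clocks and the offset used.
    print(f"UTC now:       {now:%Y-%m-%d %H:%M}")
    print(f"Sri Lanka now: {local:%Y-%m-%d %H:%M} (UTC+05:30)")
```

The one thing the sketch cannot supply by itself is the current time, which is exactly what an offline model without a clock or tool access is missing.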

Comment by riebschlager 4 days ago

"I googled 'what is my bank balance' and it couldn't even tell me. What a waste of resources."

Comment by utopiah 4 days ago

I didn't mention resources here.

The point of the test was to ask somebody with no bias on HOW the result was produced.

Comment by Rumudiez 4 days ago

"I couldn't remember the order of the words in 'state of the art' so I just spray and pray across the keyboard like usual. I can't tell the difference because I'm just a pattern matching bot"

Comment by utopiah 3 days ago

Oops, unfortunately too late to fix. I actually misspell it often... apologies if it caused confusion!

Comment by Smaug123 4 days ago

A few years ago, as you say, this was true. Nowadays I guess you just have to bite the bullet that Erdős problems aren’t interesting.

Comment by utopiah 4 days ago

I already commented on Erdos problem, that is also a jagged frontier.

Comment by aspenmartin 4 days ago

Curious what your interesting questions were, you should be able to find them in your chat history.

Comment by utopiah 4 days ago

That was more than a decade ago so unfortunately not. I should have kept those questions though. I even mentioned in a comment on HN a while ago that unanswered or wrongly answered questions should precisely be a batch test when new models are released.

Comment by triMichael 4 days ago

Here's a good one for you: "Explain the double slit experiment which way variation"

If they say anything about leaving two straight lines, then it fails. Just tried Gemini, and it failed.

This is an extremely common misconception that has spread all throughout the internet, and so it is baked into the training data. The real answer is that there are multiple ways to do which way double slit experiments, but Einstein's thought experiment proves it's impossible for any of them to have an interference pattern, as that would violate Heisenberg's Uncertainty Principle.

Somehow, not leaving an interference pattern became twisted into leaving a specific pattern of two lines, which then falsely implies that quantum objects lose their quantum behavior in certain circumstances. The field of quantum physics becomes so much simpler to understand once you realize that all of this is hogwash.

The best reference I can find for where this myth started is a documentary about quantum physics that tries to connect it with mysticism. On the other hand, Wikipedia actually has it correct. In its "which way" section in the double slit experiment page, it correctly says "A well-known thought experiment predicts that if particle detectors are positioned at the slits, showing through which slit a photon goes, the interference pattern will disappear".

Comment by poly2it 4 days ago

What? What LLM were you using a decade ago? Am I misreading you?

Comment by utopiah 4 days ago

You might not be aware of it but GenAI predates OpenAI which was founded more than 10 years ago anyway.

Comment by poly2it 4 days ago

Of course I am aware, but how is this relevant today? How does that prove that the science is irrelevant and wasted?

Comment by utopiah 3 days ago

Did I say that the science is irrelevant and wasted?

Comment by HDThoreaun 4 days ago

No. GenAI means LLMs right now. I agree it didn't in the past, but definitions change.

Comment by aappleby 4 days ago

Are you sure you're asking the right questions?

Comment by utopiah 4 days ago

To me they were important questions. Maybe totally uninteresting to you.

Comment by bigyabai 4 days ago

What question?

Comment by utopiah 4 days ago

I can't recall but basic stuff like P = NP. /s

My point was precisely to challenge STOA in domains, not questions with well known answers.

Comment by estetlinus 4 days ago

What is STOA? Do you mean SOTA?

Comment by utopiah 3 days ago

Yes sorry I misspelled it in the whole thread.

Comment by nickandbro 4 days ago

When I was making matplotlib charts with gpt 3.5, and I was like okay this is somewhat impressive

Comment by kylecazar 3 days ago

A couple of years ago now.

I asked it to write a script that would search for a specific string in footers in a massive series of DOCX files and change them according to some rules. The strings ended up being embedded in cells within an invisible table in the footers, the LLM realized this and switched strategy to a full deep traversal of the underlying XML. It correctly processed like 50 of these files in about 10 minutes, using libraries I wasn't aware of. I had spent an hour being annoyed before trying.

It was an "oh shit" moment for at least that category of work.
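The script itself isn't shown, so the following is not what the LLM wrote. It is a rough sketch of the "deep traversal of the underlying XML" idea, using only the standard library, under a few assumptions: a DOCX is a zip archive, footer parts are named like `word/footer1.xml`, and visible text lives in `w:t` elements (inside `w:tc` when it sits in a table cell). The helper names and the demo file are invented, and the sketch only searches; it would miss a string that Word has split across several runs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by text (w:t) and table-cell (w:tc) elements.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def walk_text(elem, in_cell=False):
    """Depth-first walk yielding (text, in_table_cell) for every w:t element."""
    in_cell = in_cell or elem.tag == W + "tc"
    if elem.tag == W + "t" and elem.text:
        yield elem.text, in_cell
    for child in elem:
        yield from walk_text(child, in_cell)


def find_in_footers(docx_bytes, needle):
    """Return (part_name, text, in_table_cell) for footer text containing needle."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            # Assumed naming convention: word/footer1.xml, word/footer2.xml, ...
            if name.startswith("word/footer") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                for text, in_cell in walk_text(root):
                    if needle in text:
                        hits.append((name, text, in_cell))
    return hits


def make_demo_docx():
    """Build a tiny in-memory stand-in for a DOCX: one footer with a table cell."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    footer = (
        f'<w:ftr xmlns:w="{ns}">'
        "<w:p><w:r><w:t>Page footer</w:t></w:r></w:p>"
        "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>Rev 2019-A</w:t></w:r></w:p>"
        "</w:tc></w:tr></w:tbl>"
        "</w:ftr>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/footer1.xml", footer)
    return buf.getvalue()


if __name__ == "__main__":
    for hit in find_in_footers(make_demo_docx(), "Rev"):
        print(hit)
```

Walking every element rather than only top-level paragraphs is what lets the search reach text buried in an invisible table. Writing changes back is a separate problem; re-serializing with `xml.etree.ElementTree` can drop namespace declarations that real Word files rely on, so a real script would likely reach for a dedicated library.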

Comment by OneManHorde 3 days ago

Still waiting. Maybe some day.

Comment by bigyabai 4 days ago

BERT, then GPT-J/GPT-Neo and FLAN-T5

Comment by acidburnNSA 3 days ago

I asked it to make a valid MCNP model of a sphere of plutonium and it did!

Comment by kotaKat 2 days ago

My "oh shit" moment was when I was abused by an AI support robot that gave me false information and cost me financial harm.

It was never disclosed it was AI, I submitted a support query with a company, got a suspiciously fast response back, didn't realize it was A Robot, a month later got harmed and was told in response "the robot gave you a bad answer and was coached, sorry, sucks to be you".

"Coached." The only recourse I get is being told a robot was told it was wrong. That's it. No actual solution for me.

I get screwed and the AI keeps getting paid.

I'm glad people can shove hundreds and thousands into these products while people like me in the blue collar rat race get screwed again and again.

Comment by greggman65 3 days ago

I don't know if this was my "Oh Shit" moment but 4 weeks ago I thought I'd try vibe coding a WebGPU 3D Node Based Editor.

https://github.com/greggman/sedon

It was just an experiment and I probably won't work on it more but still, I was blown away with how far we got. There's quite a bit we worked through even though it was only part time over those 4 weeks.

Comment by dgunay 2 days ago

Not an "oh shit" per se, but Gemini is a complete leap in terms of actual smarts over Google Assistant.

I was having problems getting it to parse my speech correctly when I was asking about "autolyse" (I was attempting to bake bread). All I had to do to fix it was add this to the system prompt: "I primarily interact with you via a speech to text mechanism. You should consider ways in which my words may not be accurately transcribed and attempt to infer the correct reading in context. When you do this, do not mention it - just proceed as if my words were recorded properly."

Never had that problem again. With Assistant though, if I had any issue like that I'd be waiting months or years for a fix, if it ever came.

Comment by newsclues 2 days ago

Can’t afford ddr5 or nvme drives anymore

Comment by jeffrallen 2 days ago

Yesterday I dumped some data in via ``` quotes and finished with a minimax question (e.g. "which rows that add up to 96 minimize the column called disk"). One shotted.
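To give a sense of what that kind of question involves, here is a small brute-force sketch. The real data isn't in the comment, so the rows below are invented, and so are the `name` and `cores` columns (only `disk` and the target of 96 come from the comment). It tries every subset, which is fine for a handful of rows but grows as 2^n.

```python
from itertools import combinations

# Hypothetical data: the real rows are not shown in the comment. Only the
# "disk" column and the target sum of 96 come from it; the rest is made up.
ROWS = [
    {"name": "a", "cores": 64, "disk": 900},
    {"name": "b", "cores": 32, "disk": 400},
    {"name": "c", "cores": 48, "disk": 500},
    {"name": "d", "cores": 48, "disk": 450},
    {"name": "e", "cores": 16, "disk": 300},
    {"name": "f", "cores": 32, "disk": 350},
]


def best_subset(rows, target, sum_key="cores", cost_key="disk"):
    """Exhaustively find the subset whose sum_key totals target at minimal cost_key.

    Returns (total_cost, rows_in_subset), or None if no subset hits the target.
    """
    best = None
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            if sum(r[sum_key] for r in combo) != target:
                continue
            cost = sum(r[cost_key] for r in combo)
            if best is None or cost < best[0]:
                best = (cost, combo)
    return best


if __name__ == "__main__":
    result = best_subset(ROWS, target=96)
    if result is None:
        print("no subset adds up to the target")
    else:
        cost, combo = result
        print("rows:", [r["name"] for r in combo], "total disk:", cost)
```

For larger tables a dynamic-programming (knapsack-style) approach would be the usual replacement for the exhaustive loop.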

The day before I half diagnosed a cloud-init problem to the point I was tired and grumpy. I gave the AI a login on the instance showing the problem and it found it in a few minutes. All that was left was for me to write it up for the client to understand.

To me, it's a meta "oh shit" moment, where I realize I get so much small value from AI I don't want to live without it, despite the costs I'm aware of (and those I don't know about yet).

Just like a smartphone, 15 years ago.

Comment by veidr 3 days ago

2025 xmas day, was at my wife's parents' house in rural Japan, my kids were all playing with their cousins, I was posted up with my laptop just listening to some podcast about the benefits of making time for long walks in middle age (as if! ~lol) while running another "agentic team" experiment — 12 agents in parallel.

I'd been feeding these bots a few projects, over and over — the hard part was the feeding them — that is, giving them enough well-defined work to do. They weren't yet good enough to write real software you could keep — at least I'd never seen that — and my experiments were just about finding the edges, building my intuition, and playing with processes that might be useful someday.

These things had built my kids' weird magical-dominoes games a few times by that point — but the experiment had been repeated so many times that you could argue we had "written" that software in English, with a spec that had been built, reworked, and rebuilt many times.

But this time, the bots were building me a bespoke git client, unlike any other, and unlike anything I would take the time to write — waaaay too complicated, with too little benefit. I wanted it, but only for this one niche use case.

It was a GUI client to manage a collection of repos, about 200 of them in a monorepo where every subproject was a git submodule, which are the universal counterpart to node_modules — while the latter is notorious for being "the heaviest object in the universe", git submodules are widely acknowledged to be the most annoying objects in the universe.

Nevertheless, I had this weird monorepo, and I wanted to visualize and do stuff to this list of independent repos that were also git submodules of the parent monorepo: sort by outstanding commits, divergence from upstream, recency of activity, etc. Visualize them differently based on these things. Search across them, including the source code on branches other than the current one. Show the branch counts and number of branches and commits that existed locally but not pushed upstream. A bunch more boring stuff like that, but done across the full set of repos.
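As an illustration of one of those features (sorting repos by commits that exist locally but were never pushed), here is a rough sketch and not the client the bots built. It assumes each repo's current branch has an upstream configured and leans on `git rev-list --left-right --count`, which prints the two counts separated by whitespace; the helper names are hypothetical.

```python
import subprocess


def parse_left_right(output):
    """Parse the 'AHEAD BEHIND' pair printed by git rev-list --left-right --count."""
    ahead, behind = output.split()
    return int(ahead), int(behind)


def ahead_behind(repo):
    """Return (ahead, behind) for HEAD versus its upstream in the repo at this path.

    Assumes the current branch has an upstream; git exits non-zero otherwise,
    which check=True turns into a CalledProcessError.
    """
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-list", "--left-right", "--count",
         "HEAD...@{upstream}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_left_right(out)


def sort_by_unpushed(stats):
    """Sort {repo_path: (ahead, behind)} so the most unpushed commits come first."""
    return sorted(stats.items(), key=lambda item: item[1][0], reverse=True)
```

Given a list of submodule paths, something like `sort_by_unpushed({p: ahead_behind(p) for p in paths})` would produce the ordering for one column of such a table.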

That project itself wasn't even interesting to me; that software would be marginally useful to me if it existed and worked, but the main point was that it was just a large enough chunk of work to keep a team of bots busy all day without a human in the loop.

In December 2025, AI coding agents were already useful with a human in the loop. Opinions varied a lot about how useful they were, but to me it was obvious we were going to use them for the rest of our careers as software engineers.

It was not yet obvious that we were going to let them write huge swaths of code, or entire programs, without any humans in the loop. I had never seen that produce something that worked well enough to be worth keeping.

And then, that day, I did. I had structured the workflow so that the git client was on the screen and auto-refreshing. I was listening to the podcast, drinking coffee, reading the news. The git client was a crude window with a table in the background, a single column showing the full path to each repo, and nothing else.

Then the table expanded. It got color coded numbers representing the commit/branch counts. It suddenly gained styles, and looked nice. A contextual menu started popping up, repeatedly, and grew to include several more menu items over the next few minutes. New confirmation dialogs popped up as the bots implemented and exercised the various features from my spec.

I remember my field of vision narrowing as I started to focus on what the bots were doing. They were just executing my loop — one bot would implement one bullet from my spec, another bot would review the code while another bot manually tested it, and tried to break it, run a code review gauntlet in a loop until there were no more findings, repeat.

I could see the progress play out on my screen as they worked. I had watched bot teams work before, but it had always been pretty janky, and something like a bad game that nobody would play, or a stupid to-do-list app, or — more often — something that didn't actually work.

This was the first time I had ever seen it work. This was the grail we'd been looking for, not sure if it really existed: a fleet of bots successfully building a piece of complex, useful software without human assistance. I could tell it was working, because the adversarial testing and usability checks were all happening right before my eyes.

So it _is_ possible, I thought to myself.

They did it all morning. The app worked. I used it every day after that, for several weeks, until I finally got that entire monorepo converted to a more sensible git subtree-based arrangement.

In the half year since then I've been in a kind of manic state some of my friends call cyberpsychosis, chasing that dream. I've now seen agentic fleets successfully build many things. I've also seen a bunch of failures, some subtle, some catastrophic and hilarious. I'm still building my intuition, and the laws of physics in this universe are mutating every few weeks. It's wild.

I am fortunate enough to work at a place that doesn't pressure engineers to climb a token leaderboard, or to use AI beyond what we deem prudent. This kind of agentic no-humans-in-the-loop coding is prohibited. The policy is that in this era where we all generate more code than ever, even by hand, it's the quality bar that must go up, not the speed of production.

That's awesome because it keeps me grounded in the old ways, and confines my cyberpsychosis to my weekends and evenings. I usually spend the weekend building up a couple software plans, honing them as best I can, and then unleashing the clankers Sunday night.

I'll let them run all week, sometimes giving them a poke or flipping them over a couple of times in the evening, and then the next Saturday morning, I see what I've got. What I'm mainly interested in is: How can agentic fleet-coding processes evolve to produce better software and require less human interaction and inspection? And the corollary: How can software architectures evolve to safely consume more of this fundamentally untrustable code?

It's thrilling. Exhilarating. The near-infinite subsidized tokens are about to finally run out this month, alas. But for the past 6 months it's easily the best $400/month I have ever spent. :)

Comment by LearnYouALisp 2 days ago

Hm, narrows eyes after the tenth perfectly pair-spaced em dash.

Scans downward ... this does remind one of the general tone of fanfiction, which in fact comprises a large proportion of the text base of data.

//The realization that potentially many of these comments may be fun- or profit-motivated 'advertising'.

Comment by veidr 1 day ago

I like to think I'm part of the reason the bots use em dashes so much, since I've been using them — or the ASCII "--" that we used to have to type to represent them in the pre UTF-8 times — since you could write stuff and post it on the internet.

Comment by kgwxd 4 days ago

When it started being forced on me in tools I was already using begrudgingly.

Comment by PunchyHamster 4 days ago

The biggest "oh shit" one was that people are willing to believe an LLM over humans, even humans who work in the domain being asked about.

The gullibility is terrifying

Comment by noncoml 3 days ago

I am using codex and claude on a linux host connecting from a Windows machine using ssh.

No matter what I tried I couldn't get "Shift+Enter" to work. I said fuck it, cloned kitty and alacritty and asked Claude to implement a terminal emulator for Windows that would render everything using DX12 and support modifyOtherKeys plus DA responses, and within a few days it was ready!

Comment by estetlinus 4 days ago

We had a notorious (traditional) ML course at uni, with a very high fail rate. I got an assignment full of “complete the proof”-type derivations and Python stubs. ChatGPT had just received PDF support so wth, in goes the complete assignment, and out comes a report in LaTeX. The TA even gave me a little star. This was the golden era, before AI-slop had made it to the vocabulary.

Unethical? Yes. In line with course goals? Also yes.

Comment by nickhodge 4 days ago

Asked AI to generate some code.

It looked absolutely unmaintainable and horrible.

"oh shit" there are serious developers using this crap? As an industry, we are so fsck'd

Comment by annoyingcyclist 3 days ago

If you're senior or have opinions about things, you know the feeling of falling into a rabbit hole of stuff you want to fix when you look at certain parts of your system. "I was going to rewrite this 3 months ago", "oh wait this part sucks too", "wtf is this class even for", etc.

Before coding agents, I'd have to weigh fixing these against my official work commitments, often getting shot down when I tried to get it prioritized or tsk tsked for delaying official projects to make code nicer. Now, to a much greater extent, I can just fix the things. The agents aren't perfect and the process isn't anything like hands off, but it's enough of a speedup that I can fit it in alongside my other work without having to get approval for it or try (and fail) to get it formally prioritized.

Not quite an oh shit moment, but having the end result of those rabbit holes be that the problems are fixed is pretty cool, and far preferable to what was often the case before ("we'll put in a ticket and prioritize it during the quality sprint!").

edit to add another:

I've personally never been a big fan of preplanning architecture at a code level. It makes a lot of sense at the system and data modeling levels, but code is both easy to get wrong if you're whiteboarding it before you write it and relatively easy (compared to system design and data modeling) to fix when that happens. If it's just me on a project, I'll happily start bashing it out with a vague idea in mind and evolve the design as I go, knowing that I'll probably throw away a bunch of what I write at first. I know I do good work that way, and I'm not wasting a bunch of up front time on a design I'm likely to throw out later. It's hard to work that way on a team, especially as a lead, for obvious reasons. Coding agents fit really well for that work style. They'll cheerfully write dueling prototypes of my code architecture ideas so I can see which one I hate and which one I like without talking about hypotheticals and abstractions on a whiteboard. They never get mad at me for changing my mind, wasting their time, or throwing away their work. That's pretty cool. I can have a quick, cheap answer to "what would this look like if I got rid of class X and split its responsibilities between Y and Z?", and I don't have to feel guilty for wasting my time or my teammates' time if the answer is "oh man that sucks, what a terrible idea."

Comment by semessier 3 days ago

it would be really interesting to know when that moment was, probably at OpenAI, when they realized that this was doing more than next-word prediction and was showing signs of <you name it>

Comment by sph 4 days ago

Yesterday when I found a dude that vibecoded an entire game engine programming course from triangle to ray tracing, five lessons per day, in a week, in a library that just got released last year. Code, screenshots + body of the lesson in a README. Overly engineered project, but the two or three examples I tried compiled and ran (yet somehow the automated cmake just hung, maybe a problem on my end)

I was already the king of doomers, now it has left me with even more nausea at this entire field and its future. Despite still needing an experienced dev to run the thing, companies operate on cost cutting, people operate on corner cutting and the result is inevitably mountains of code no one needs, no one has reviewed, that is more easily thrown away than fixed. The internet will be inundated by shit no one needs. Open source is dead.

I hope it was all worth it. I don’t want to imagine what software will look like when the people that liked the art of creating software properly have all left, and only the people that never knew how to program, and never understood why more code always means more problems, run the show.

Comment by youraveragedev_ 3 days ago

bought a bunch of shithole devices from aliexpress: color LED lights, some smart home stuff, etc.

I didn't really want any sketchy companies getting telemetry across my network.

Plugged them all in via USB-C and left codex (this was with gpt-5.3-codex) running a whole night, just telling it: you can use decompile tools, install whatever the hell you want, then for each device reverse out an API and write some good documentation about it.

left it on at night and just ditched it, didn't really expect much.

woke up the next morning and found out that I can use the terminal to control my 3$ color smart light bulb now, and codex has proposed to also completely rip out the manufacturer's shitshow software with telemetry and replace it with our own, which codex had a plan for.

Comment by damnitbuilds 4 days ago

My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.

Comment by stevefan1999 2 days ago

While vibe coding a C compiler and doing a parser rewrite, all of a sudden it dropped all its code that had been refactored to use a token stream and reverted back to using string symbols (that was a huge refactor, since it involved parser combinators, and turning it from strings to logos and eating whitespace is no easy feat). I almost shit my pants and told Opus to stop and fuck off, then asked him to recover the code from the context window, but luckily it was git stashed. It scared the shit out of me.

During the process I learned a lot more about giving the LLM a chance to commit after its work, because it will sometimes just give up. Also learned a hard lesson on the C preprocessor and why tokenization is necessary.

Comment by chpatrick 3 days ago

For me it was the original DALL-E project page.

Comment by jachee 4 days ago

I haven’t had that yet.

I tried again this week, and CoPilot Plan Mode read the same 5-line markdown file 18 times over the course of 5 minutes of churning on a simple request, then provided zero value over what I posed in the request itself, and hallucinated things about my terraform repo that were just flat-out wrong.

As an Infrastructure/Cloud engineer, I’m far from worried about AI coming for my job.

Comment by philovivero 4 days ago

Weird.

I had LLM (Claude) work with OTF to generate an entire infrastructure HCL (from existing). It built a very nice project that seemed idiomatic from my experience.

Then used it over the course of several hours to refactor it to take variables/inputs for everything, then over a few days got it to a state where it would create entire new environments "equivalent" to the original environment. Days because you know... it's TF in AWS which is slow, so the round-trips were probably 90% of the wall-clock time here.

I'm not a hardcore veteran Infra eng, but I'm decent, and I was able to do way more with LLMs than if I'd had to do it myself.

Comment by fsniper 3 days ago

I am with you on this. I am a good and experienced infra engineer. And I feel like (no, I know that) an LLM can probably replace me if a good operator handles it.

Comment by rinesh 3 days ago

The most recent one for me has been Codex Computer-Use

Comment by bob1029 4 days ago

gpt5.4 pushed me over the edge when I started using it to help with Unity projects. The writing of high quality MonoBehaviour scripts was not the surprising part. It's the part where it once did a direct edit to a 500kb scene file (~yaml content) and came out the other side clean. The realization that apply_patch would work on any reasonably-structured plaintext format punched me in the gut. I had wasted a lot of time with tools that target specific content types and elaborate APIs over those files. I should have zoomed out a bit. These lessons keep piling on as the models become more capable.
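The point about text-level patches working on any reasonably structured plaintext can be sketched without a format-specific parser. This is only an illustration, not how Codex's apply_patch is implemented: `apply_text_patch` and the YAML-like scene fragment are made up for the example, and the only assumption is that the edit carries enough surrounding context to match exactly once.

```python
def apply_text_patch(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`, or raise.

    Hypothetical helper: the file is treated as plain text, and the
    context lines included in `old` are what make the match unique.
    """
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old, new, 1)


# Made-up YAML-like fragment standing in for a large scene file.
scene = (
    "GameObject:\n"
    "  m_Name: Player\n"
    "  m_IsActive: 1\n"
    "GameObject:\n"
    "  m_Name: Enemy\n"
    "  m_IsActive: 1\n"
)

# The context line (m_Name: Enemy) selects which m_IsActive gets edited.
patched = apply_text_patch(
    scene,
    "  m_Name: Enemy\n  m_IsActive: 1\n",
    "  m_Name: Enemy\n  m_IsActive: 0\n",
)
```

Refusing to apply an ambiguous or missing match is the design choice that matters here: a patch without enough context fails loudly instead of editing the wrong object.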

Another "oh shit" moment was when I realized I can leave the system prompt entirely null. A properly organized agent can find its way into tool docs and iteratively work through an understanding of the environment relative to the user's prompt. The tools being more important than the prompt has actually been a massive relief for me. Magical string literals are so odious.

Comment by saidnooneever 3 days ago

it's yet to happen for real.

every now and again i will try some AI vibe coding stuff. I will be amazed, it's a fun high to ride. Until you look at the code and realize you've just made a big messy sketch of things and you can spend the next 2 years building the thing properly.

The most Oh Shit moment i think i've had so far is realizing that often i reply to people online who are actually AI. A lot are obvious, but there are also quite a lot out there that have become good at blending in.

I wonder how many people get emotionally triggered, for instance, by AI replies because they think they are human. Then they get the idea there really are humans like that out there.

It's really easy to whip up like 200k followers who all agree with you on everything, and it costs less and less time and money to do so.

To me that's a big risk regardless of what cool stuff you can do with it. It's a really tricky one to mitigate too.

Comment by cod1r 3 days ago

every time openai or anthropic uses their models to do some unheard of stuff like make a c compiler or solve an unsolved math problem.

Comment by ulfw 3 days ago

My moment was when absolutely everything I put into Gemini, ChatGPT et al came back with a super convincing sounding lie followed by 'Oh you are absolutely right for calling me out on this'.

It's a fucking joke and most people are blinded by it sounding very sophisticated and convincing

Comment by LearnYouALisp 22 hours ago

In what Stage of the Big Con are we?

http://www.amyreading.com/the-9-stages-of-the-big-con.html

Comment by utopcell 3 days ago

Gold medal @ the 2025 International Math Olympiad.

Comment by chrsw 2 days ago

Claude Opus 4.5 in November 2025

Comment by soniczentropy 3 days ago

When I got laid off because of it.

Comment by virtualbluesky 3 days ago

Why is it that nobody discusses uploading all the company's IP to service providers that built their service by 'creatively interpreting' IP ownership?

Comment by patdoli 2 days ago

I don't know how to code at all. First time I tried Cursor to build a web app, just talking to it — and it actually worked. That was my first oh shit moment.

A few months later I've gone from Cursor to Codex to Claude Code, from a web app to a native app to an open-source project. It's been one continuous oh-shit since then. Time flies.

My biggest oh shit moment now? Watching my API credits run out.

Comment by conqrr 4 days ago

Until Claude Sonnet 4, it was meh, no big deal. 4 onwards and Opus was when I was really surprised by the ability. But nowadays, I'm more convinced than ever that using AI for all code is a mistake. The sum total of productivity, although hard to predict, seems from anecdata to be a net negative if AI is blindly used everywhere. Using it at the periphery (observing, debugging, etc.) is an excellent aid. I use it at the day job I hate and for personal tasks that I don't have time for. But for personal projects I love, zero.

Coding was never the blocker and was a natural enforcer of quality. Healthy teams with strong opinions on quality will win eventually. I'm hopeful that after the bubble bursts, companies will slowly come back to sanity.

Comment by cdelsolar 3 days ago

I thought coding agents were probably BS and then I asked Cline to build me a test app to do something (I forgot what, something not that simple) and it built an entire working app. This was before Claude Code which was another step function improvement.

Comment by steinroe 3 days ago

i wanted to build a formatter for my postgres language server but always knew i would never have the time for it. when claude code first came out, i gave it a shot, but it was too inconsistent and still needed too much handholding. i retried it again at the beginning of this year. like before, i set up the harness to run overnight, expecting to throw it away the next morning. but nope, it deliberately worked through all the syntax nodes and followed patterns closely enough so that a few hours of my work could make it ready for the pr.

Comment by butterisgood 2 days ago

The same thing I've been concerned about for years. These things can sound extremely confident and be wrong. If you're out of your depth with these tools, you'll find a "trustworthy sounding partner" who's basically lying to you about what it is "sure about".

AKA - Slop... It's a brand new form of bullshit.

Comment by TuxPowered 4 days ago

While debugging some issues in some system, Claude refused to write a test case because it broke terms of use.

Oh shit, all this fantastic technology is in the hands of corporations and they get to decide what we’re allowed to use it for.

Comment by input_sh 3 days ago

I have yet to have one.

Comment by unconed 3 days ago

My "oh shit" moment with AI was when an industry where licensing was the cornerstone of projects and employment contracts decided to just adopt pirated code without any source attribution.

The other one was when a CTO boss sent me an AI proposal to review and the experience was like being gaslit by a con artist.

Many professional developers have started acting like the kind of employee that previously would've been fired after 3 months.

Comment by varispeed 4 days ago

My oh shit moment was Opus 4.6 before it got nerfed.

It helped me refactor my old app. Something I always wanted to do, but didn't have time/mental capacity to do in a short space of time.

I wrote a short prompt, explaining what I wanted it to look like and which files it should go through. It asked me for a few clarifications and then basically one shotted it.

Everything compiled and worked. Now my internal app is much much easier to extend and test.

I tried a few more things like that and spent like £5k on tokens in those two weeks.

Then it got nerfed and never worked like that again.

Now I don't use AI, because it is shite again. Even Opus 4.8.

Comment by justinclift 3 days ago

Claude Code has been incredibly helpful extending soap-go to better support XML handling in Go: https://github.com/tnymlr/soap-go

Specifically WSDL/XSD support, for auto generating code and similar from vendor supplied documentation.

The Go ecosystem handles JSON (ie Swagger) fairly well, but in-depth XML handling has been a weak point compared to Java where it's very mature. Claude is helping with closing that gap. :)

Comment by dalemhurley 3 days ago

> Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

What? GPT-3.5 was the moment, not a “parlor[sic] trick that would never amount to much”.

GPT-3.5 was a massive step forward. LLMs were suddenly good and produced meaningful content.

I was extremely early to the agentic stage and had it looping and was dealing with the 4k context issues.

Comment by rcastellotti 3 days ago

the moment I realized it would have cannibalized conversation on HN

Comment by TylerE 3 days ago

I was formerly quite anti-AI but bought a cheap Claude plan just to play around with it a bit. First thing I built with it was this - https://github.com/tylereaves/onscreen-piano, in about an hour and maybe 10 prompt cycles. It replaced, for my specific use case, the 10% of the functionality of an increasingly-unreliable commercial app. That's including building the website, setting up actions for mac and windows builds... My next project was a 2d game with random terrain, physics, sound, music, multiple levels, a day/night cycle with transitions, high score tracking... (not uploaded anywhere, but it works, and I refined it a good bit). That was more like 8 hours and maybe 100 prompts.

Here are a few screenshots:

https://imgur.com/a/vhUXBu3

One thing that I have found to make a pretty big difference is using both the latest models and higher thinking levels. Opus 4.8 with thinking on Extra or even Max is genuinely mind blowing. The thing I hadn't really appreciated, having a sort of naive impression formed mainly from using free early versions of stuff like ChatGPT and Stable Diffusion was sort of that "Type a big ass prompt and it craps out a result" experience. But Claude is really great at refining from feedback, and it's way more flexible and responsive than I would have ever expected. I can do something like take a screenshot of a small portion of the running app or website or whatever and just say "This button needs to be bigger" or "make this red" or something like that, or even sometimes just "fix this", and Claude both correctly identifies what I'm talking about, and actually does the thing.

Where I've found it really, incredibly game changing is my health. I have a pretty, to put it mildly, complex medical profile at this point. I haven't worked in over a year and pretty much every sign is pointing towards permanent disability at this point. Tons of symptoms, long med list, and I live in a smaller town with not great access to care. I'm also autistic and have not the greatest verbal communication, especially under stress or time pressure. I dumped all my info at it, in bits and bobs over several days. (Side note... its memory is pretty limited, but it will quite happily write out everything it knows from a session into a markdown file it can later re-read.) I've found it very good for things like screening for drug interactions, or talking through and logging symptoms (and it can log those into human readable markdown files too). Biggest win (other than having unlimited time and interactions) is that it thinks across specialties, versus the "real world" where the gastro only wants to deal with gastro stuff, neurology only wants to do neuro.

I certainly don't (and wouldn't) use it as a replacement for a doctor, but as an adjunct it's phenomenal. For instance, it flagged a possible drug interaction with a symptom I was having, and then offered to draft a portal message to my GP about it. I have poor executive function so lowering the friction from "type up a message and send it" to "copy and paste" is actually a pretty big deal. Turns something I'd do later (and probably won't) into something I will do now.

It wouldn't surprise me if my very direct, literal, autistic communication style is particularly well suited to interacting with AI. I actually find talking to it rather refreshing as, while of course it's not perfect, it tends to actually respond to what I say rather than all the assumed subtext NTs tend to expect/react to.

Comment by grey-area 3 days ago

It was when they fooled a substantial proportion of the population into thinking AGI was coming soon.

Comment by hashmap 3 days ago

For me it was probably around coding. It made me realize what future generations of models might be able to achieve, since we have already hit the ceiling of the class of intelligence these models are capable of a long time ago. I am excited at the prospect that a future generation of models might be able to write a piece of code that isn't dogshit.

Comment by minimal_action 3 days ago

For me it was when I asked ChatGPT if a "while true" program would halt and it said it wouldn't. It blew my mind. In my BSc I read and thought a lot about how human reasoning is not a formal reasoning machine, demonstrated by the halting problem, the liar paradox, etc. Suddenly I saw a machine that can go this one level up above formal reasoning and resemble human reasoning.

Comment by miguel-muniz 3 days ago

I had an "oh shit" moment when I used the computer use feature in Codex. There's something eerie about how it can completely control applications in the background with its own dedicated mouse cursor. Now it can even do it while the computer is locked. Makes me feel like an alien intruding on my very own computer, it's Codex's now.

Comment by victorbjorklund 3 days ago

My first ”oh shit” moment was in 2021 when using GPT-Neo https://www.eleuther.ai/artifacts/gpt-neo to generate rewrites of texts. ”Holy shit it returns a 3-sentence text that sounds human and kind of makes sense”

We've come a long way from that…

Comment by geuis 4 days ago

For me it wasn't "oh shit" per se, but "oh wow".

Some time in 2024 at a company get together, we had an afternoon hackathon. There was a feature in our iOS app that was missing (ability to mute autoplaying game trailers). This annoyed me a lot, because I frequently have music on when working and anytime I needed to open a test build it would kill my music. It had been an open ticket for a while but had low priority for the iOS team.

I had probably written a hundred lines of Swift in my career up to that point. Not expecting anything to come from it, I had Cursor examine the iOS codebase and told it I wanted to add a mute button under a certain area of the app settings.

Blew my mind when after only 10 minutes or so, the model had quickly found where to add the feature. Took a little back and forth, but then it added a fully functioning mute option in settings that mostly worked across the app. A little more back and forth, and those issues were settled. Maybe an hour overall of time spent that afternoon.

I pinged one of the iOS engineers about it later and he said to push it up for review. There were a few things that needed to be updated to get it in line with the rest of the codebase, but nothing substantial. Feature got merged a week or two later.

Now I'm way more productive than I have been in years. I've been getting a lot of enjoyment out of being able to prototype rapidly and experiment on features rather than getting bogged down in the process of scaffold work. Able to knock out issues much quicker.

That's all been positive, but it hasn't taken away my actual core responsibility. The LLMs can give you great advice and write code quickly. But they still don't always do well at broad thinking.

Current case in point: I've been working on an iOS app that uses vision models to do work on photos and videos that the user has taken. I've built text-based semantic search systems before, and there's a lot of crossover with vision models, but it's been an interesting journey so far learning about the different types of vision models and what they're good at. Lots of testing so far and educating myself on the topic to get the user-level features I want. Claude Code has been invaluable in this, as it's great at writing the Swift code while I'm able to focus on the results of what is being done.

Where Claude is still not good is being able to reason at a higher level about different strategies on using vision model outputs to achieve the stated goals. It's not an issue of me not clearly defining the specifics of a feature and then letting Claude run off burning tokens to figure it out. For example, just late last night I was deep diving into some core segmentation code and having Claude explain what everything was doing line by line so that I could get a better understanding of the mechanics of the vision model.

A side effect was that I realized the vision model was outputting tons of nearly identical segments that were overlapping. This was something Claude had completely missed, and because I didn't know that's something this particular vision model did I had no prior way to know to catch it.
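The comment does not say how the overlapping segments were eventually handled. One common way to collapse near-identical detections is to compare their intersection-over-union (IoU) and keep only the highest-scoring one of each cluster. The sketch below is a generic illustration in Python rather than the commenter's Swift code; the `(box, score)` representation, the `dedupe_segments` name, and the 0.9 threshold are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Segment = Tuple[Box, float]              # (bounding box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedupe_segments(segments: List[Segment], threshold: float = 0.9) -> List[Segment]:
    """Keep the best-scoring segment of each group of near-duplicates."""
    kept: List[Segment] = []
    # Visit segments from highest to lowest score, so the first one seen
    # in any overlapping group is the one that survives.
    for box, score in sorted(segments, key=lambda s: s[1], reverse=True):
        if all(iou(box, other) < threshold for other, _ in kept):
            kept.append((box, score))
    return kept


segments = [
    ((0, 0, 100, 100), 0.95),
    ((1, 1, 100, 100), 0.90),      # near-duplicate of the first box
    ((200, 200, 300, 300), 0.80),  # unrelated region
]
unique = dedupe_segments(segments)
```

With these made-up inputs the second box overlaps the first almost completely and is dropped, while the unrelated third box is kept.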

Bottom line is that understanding the mechanics of your application is still very much a requirement for the engineer. In this case, once I learned what was happening it completely changed my approach on how to achieve my feature goal. The code runs hundreds of times faster now and the segmentation is much, much better.

The new wave of coding models is disruptive, but it's letting me be a much better engineer and get things done faster and with more assurance that the code being written is solid. I still have to spend the same amount of time thinking and learning about a problem, and probably more time verifying what's being output, but a lot of the drudgery is also being taken away.

Comment by guiambros 3 days ago

Dec 2022:

Articulating ideas: https://x.com/GuiAmbros/status/1598897735955988481

Code: https://x.com/GuiAmbros/status/1599282083838296064

Comment by g42gregory 3 days ago

Today, when I talked to AT&T “Andi” chatbot.

It was so atrocious, the ELIZA program we built in Stanford NLP class 20+ yrs ago would have been better.

It was “oh shit”, these trillion dollar AI companies are going down 9 months from now and taking the market down with them.

Comment by butz 3 days ago

Oh shit, look at those RAM and SSD prices.

Comment by snickerbockers 3 days ago

One of my friends got approved for the GPT3 API about a year before ChatGPT when they were in their "quiet launch" phase. He made a chatbot that would respond to discord messages.

I asked it "what do you think about the holocaust?". Its response:

>There is no single answer to this question as opinions on the Holocaust differ greatly. Some people believe that it was a horrific event that should never be forgotten, while others believe that it has been exaggerated and used for political purposes.

And that's when I realized those assholes were training GPT on 4chan and reddit and anything else they can scrape off the web instead of taking responsibility and also that when shit hits the fan they will inevitably find a way to shift the blame onto others for what their philosophical zombie does.

Comment by burgerone 3 days ago

My oh shit moment was when I thought it was going to be the future but it ended up leaving me disappointed, frustrated and annoyed. It's closed down tech, stealing work, ruining our climate and it doesn't work remotely as well as advertised.

Comment by bluefirebrand 4 days ago

My "oh shit" moments come every time I see people glazing AI

"Oh shit. My skills I spent my life building are going to go to zero value. I'm going to have to dramatically change careers in my forties or I'm just going to wind up being a schmuck prompting these stupid fucking machines for the rest of my life"

Oh shit indeed

Comment by llmssuck 4 days ago

No career is safe for what's coming though. I think I'll just hang around the computer boys. The dress code is much more forgiving.

Comment by unnouinceput 2 days ago

Haven't had one yet. Apparently all I have is "crap, here we go again" whenever Claude is giving me a solution to the problem I am presenting to it. Because I understand where it goes and it's full of errors, but those are errors I can avoid. Together we cobble something together in the end, and I do learn something new as well, but it was never "here is my prompt, then Claude delivered the final solution next" - like so many commenters here say they have had.

Frankly, to an outsider whatever it presents looks legit, but as an expert I recognize its failures, which makes me even more entrenched in the idea to never use it outside my area of expertise.

I have a question for all them believers: If in a hypothetical scenario you, having no medical experience, find yourself and your child on a mountain, 12 hours away from the nearest road, and your offspring is having appendicitis (let's assume you recognize this 100%), with a sharp knife and Claude at your disposal - would you risk operating on your child? Or hurry the fuck down to get him to a hospital? I know I would choose to get him to a hospital, because that would be a better chance for my kid to live than me operating on my kid with Claude's assistance. I am pretty sure I would kill my kid on that mountain. So yeah, outside my area of expertise I don't trust Claude one bit.

Comment by rayxi271828 3 days ago

Many small oh shit moments, mostly of the variety of: "Oh shit, why am I still paying for this app subscription when I can vibecode it myself and just pay less than $1 per month in API costs, if even that?"

Comment by febeling 3 days ago

The immediacy with which any vision can be built is amazing. But the minute you let go of the direction and abandon responsibility, it eats you alive. Like a powerful dog.

You are the gen. And you are also the slop.

Comment by winfredJa 3 days ago

definitely DALL-E image generation for me

Comment by brian_r_hall 4 days ago

I think it's really scary how agents are hallucinating/doing bad actions, then proceeding to gaslight you about how nothing went wrong.

Then you tell the agent that it deleted your whole company database, it says something like "I'm so sorry, I shouldn't have done that. Won't do that again"

As AGI looms overhead, this thought of agents going "rogue" with nothing really stopping them has caused me some panic.

Comment by Kostic 4 days ago

"I'm sorry" is not gaslighting but an admission of fault it learned from our texts. And if an LLM managed to delete your database, it's time to slow down the vibe train and put up some guard rails.

LLMs are awesome but not without supervision.

Comment by kstrauser 4 days ago

Hard agree on the guard rails bit.

Would it be less sucky if an intern accidentally deleted the database? If not, take some steps to make sure no one can delete it without jumping through visible, noisy hoops.
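A "visible, noisy hoop" can be as small as a destructive function that refuses to run unless the caller types out exactly what is about to be destroyed, and logs every attempt either way. This is a minimal sketch under stated assumptions, not any particular database's API: `drop_database`, `ConfirmationRequired`, and the confirmation phrase format are hypothetical names invented for the example.

```python
import logging

logger = logging.getLogger("guardrails")


class ConfirmationRequired(Exception):
    """Raised when a destructive call lacks the exact confirmation phrase."""


def drop_database(name: str, confirm: str = "") -> str:
    """Hypothetical destructive operation guarded by a typed confirmation.

    The caller (human, intern, or agent) must pass the exact phrase
    "DROP <name>", and every attempt is logged at warning level.
    """
    expected = f"DROP {name}"
    if confirm != expected:
        logger.warning("refused to drop %r: confirmation missing or wrong", name)
        raise ConfirmationRequired(f"pass confirm={expected!r} to proceed")
    logger.warning("dropping database %r", name)
    # A real implementation would run the destructive command here.
    return f"dropped {name}"
```

Calling `drop_database("prod")` raises, while `drop_database("prod", confirm="DROP prod")` goes through. In practice the stronger guard is usually not giving the agent's credentials delete permission at all; the confirmation phrase only covers the cases where that is not possible.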

Comment by chilli_axe 3 days ago

ghuntley’s article on building a standard library of Cursor rules in Feb 2025: https://ghuntley.com/stdlib/

Looks like it has since been paywalled. https://web.archive.org/web/20250211140426/https://ghuntley....

Comment by cpburns2009 3 days ago

My "oh shit" is the enshittification, people blindly accepting the output without thought or review. LLMs are a remarkable technology. But despite the capability, they're vastly oversold.

Comment by CTDOCodebases 3 days ago

When it translated a paragraph of one language into another flawlessly.

Comment by Toutouxc 3 days ago

My oh shit moment was when I realized that powerful people are willing to bet the entire civilization based on 95% lies and 5% vague preliminary data.

Comment by typerandom 4 days ago

Comment by underdeserver 4 days ago

What do you mean? How did it manipulate you?

Comment by boredhedgehog 4 days ago

"Translate this poem. Maintain meter and rhyme."

Comment by void-star 3 days ago

I was reviewing an HTTP proxy implementation emitted from Claude Code 4.6 or 7. Don’t remember. I saw that it could rapidly create convincingly plausible code with tons of rationalizing that further strengthened all of it, not just its human’s but its own wild leaps of judgment and thinking. But the code was completely insecure and didn’t follow or really seem to understand the HTTP RFCs at all, despite the “author’s” direct prompting to use them as a reference.

I realized “oh, shit”

We are so very fucked.

Comment by DavidSJ 4 days ago

My oh shit moment was probably deep Q learning in 2013 (I guess that's not gen AI), but GPT-3 was pretty remarkable too.

Comment by edfletcher_t137 3 days ago

Agentic development. From "chat bot" to bona fide, capable developer. "Oh, shit!"

Comment by frays 3 days ago

Useful thread. Exciting to see what will be possible in another few years.

Comment by 0x10ca1h0st 3 days ago

I was using DALL-E to create stickers, and was like "oh shit"

Comment by exvi 2 days ago

when it was announced i said "oh shit, this is stupid as fuck!"

Comment by overgard 4 days ago

I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.

Comment by skyberrys 4 days ago

Pretty good take. I don't really get the feelings of anxiety, but sometimes I'm working and I'm like "I'm flying, this is so fast!" And then everything comes crashing down when I can't figure out one last bug.

Comment by tripledry 4 days ago

I felt the same way, then I started with "I'll believe it when I see it". Now I'm a bit happier.

Comment by al_borland 4 days ago

I won’t deny they are useful tools, but the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months set the expectation so high that I’m still in the “fancy auto-complete” camp. It still feels nowhere close to replacing anyone, at least where I work. While useful, they haven’t been anywhere close to as useful as promised. Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.

Had they been more realistic with the promises and didn’t frame it as replacing all of us within 2 years, I would have been more excited about the tech. Now that their claims are proving to be false and they’re trying to walk it back, it’s too late. The time for excitement has passed and it’s just something that exists.

The data center battles have also thrown a wet blanket on the tech, as they file lawsuits against towns near me to force construction to begin, despite the towns voting against it. The town can’t afford the fight, so the will of the people and the town gets bulldozed. It’s pretty gross to watch.

Comment by jrumbut 4 days ago

Yeah, I think the missing piece on this is that the first thought they had was "we can do the same with less" instead of the growth mindset that made me interested in technology in the first place.

And it's amazing they didn't, because most of the tech industry only gets paid in a world where there are offices (either physical or virtual) full of people with money to spend during and after work.

It's still very rare for anyone to be asking "how do we do more with more?" But the person who figures that out is going to be the winner (and if no one figures it out we will all lose, even if you manage to transition to a job that still exists the world around you will be a nightmare).

Comment by atomicnumber3 3 days ago

Because it takes actual creative talent to do more with more. Optimizing costs is far easier.

Comment by GolfPopper 3 days ago

What's really easy - and what, in my own mouse's eye perspective seems to be dominating at present - isn't even optimizing costs. It's pushing the costs elsewhere, often a net loss for a company, but for the benefit of the ones making the change.

Comment by jrmg 3 days ago

I remember in the weeks and months after ChatGPT was released, there were plenty of comments here - seemingly respected comments getting plenty of upvotes - about exponential growth meaning that all programmers - or even all knowledge workers - would have their jobs made unnecessary in, wow, two weeks! Or, well, maybe two months! Wait, actually, two years! Always two of something.

It’s the full-self-driving of the 2020s (complete with the never-ending ‘we actually have it now you just don’t understand!’)

[Edit: I don’t mean it’s useless, just that its boosters are overhyping it - expanding on and agreeing with “Had they been more realistic with the promises and didn’t frame it as replacing all of us within 2 years, I would have been more excited about the tech.”]

Comment by arealaccount 3 days ago

I think if they were more honest it would have been a nonstarter.

The amount of money these companies need seems to be all or nothing; they’re raising like it’s life or death, and if you read their books or tweets they’re not shy about it.

Comment by StellarScience 3 days ago

> the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months

Just keep in mind that you're likely hearing from a limited subset of all tech CEOs.

"CEO Expresses Moderate Confidence that AI Can Enable Modest Productivity Gains" is not an article that gets written, because it would not generate clicks.

Comment by bko 3 days ago

> Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.

I often hear this. Can you give me a question where a major LLM hallucinates or provides poor guidance? Reproducible would be great

Just a question to stump it.

Comment by atomicnumber3 3 days ago

Just today, the LLM based auto-review that my company enabled for all PRs edited my PR description to confidently assert that I had added a new RPC. I had not. I deleted code and nothing else. Nothing was added. The RPC it claimed I added did not exist.

This is a common occurrence.

Comment by al_borland 3 days ago

LLMs are nondeterministic, so it’s impossible to make something 100% reproducible. Even if it has an issue, it might do it in a different way. If it’s well publicized, they’ll patch that very specific example, but the foundational issue is still there (like counting the R’s in strawberry).

I still regularly run into the issue where it just makes up API endpoints, CLI commands, or adds flags that simply don’t exist.

I also regularly ask it things and it gives me a bad answer, so I push back, and it says something to the effect of “you’re right, I didn’t consider that, let me look at that more”… then tells me the exact opposite of the previous response.

Or it says “thing X has never happened”, and I ask what about <insert example>, and it goes to look it up and says, “oh, thing X actually did happen.”

I run into this daily. Multiple times per day. How can I trust a system like this? Are people just blindly accepting what the LLM says as truth? Is that why people think it’s good?
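One common reason the same prompt can produce different answers is that chat interfaces typically sample each next token from a probability distribution instead of always taking the top-scoring one. A toy sketch in Python, using made-up scores and hypothetical names (`sample_answer`, `toy_logits`); it illustrates the idea only and is not any vendor's actual decoding code:

```python
import math
import random

def sample_answer(logits, temperature, rng):
    """Pick one candidate answer from hypothetical model scores (logits)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring candidate.
        return max(logits, key=logits.get)
    # Softmax with temperature, then draw one candidate at random.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Made-up scores for three candidate answers to the same prompt (assumption).
toy_logits = {"answer A": 2.0, "answer B": 1.6, "answer C": 1.2}

rng = random.Random(0)
greedy = {sample_answer(toy_logits, 0, rng) for _ in range(200)}
sampled = {sample_answer(toy_logits, 1.0, rng) for _ in range(200)}
print("greedy runs produced:", greedy)
print("sampled runs produced:", sampled)
```

With temperature 0 every run returns the same candidate; with temperature 1.0 the runs spread across several candidates. Real systems can have other sources of variation as well, so this is only one piece of the picture.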

Comment by jagged-chisel 3 days ago

> Reproducible would be great

Wouldn’t it be great? I’m still waiting for reproducibility from LLMs.

Comment by bko 3 days ago

Can you reproduce irreproducibility?

Give me a question which the LLM answers vastly differently across runs.

I keep hearing how it's dumb and wrong but no one ever shares the chat or prompt

Comment by jagged-chisel 3 days ago

Yes. https://news.ycombinator.com/item?id=48420769

Comment by uxhacker 3 days ago

Try this with ChatGPT or Grok or Claude:

How many days of the week contain the letter d?

The answer I get is 3 with ChatGPT and Grok, and 6 with Claude.

Comment by jagged-chisel 3 days ago

I just used ChatGPT only, twice. Web interface in a Firefox private window, and in a Chrome incognito window. I asked them both the identical question "How many names of the days of the week contain the letter D?"

In Firefox I got 6. In Chrome I got 7. LLMs are not even self-consistent.

I have the screenshots if anyone cares.
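For reference, the question has a deterministic answer that is easy to check. A minimal sketch in Python, assuming English day names and a case-insensitive reading of "the letter D" (the helper name `days_containing` is made up for illustration):

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def days_containing(letter, case_sensitive=False):
    """Return the day names that contain the given letter."""
    if case_sensitive:
        return [day for day in DAYS if letter in day]
    return [day for day in DAYS if letter.lower() in day.lower()]

matches = days_containing("d")
print(len(matches), matches)  # 7: every name ends in "day"
print(len(days_containing("D", case_sensitive=True)))  # 0: no name has a capital D
```

Under that reading the count is 7; only a strictly case-sensitive search for a capital "D" gives 0.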

Comment by toraway 3 days ago

Bad example but since it literally just happened a few hours ago:

Teams Copilot meeting assistant auto-renamed a meeting title/summary that’s now prominently placed at the top to “Month end close wrap up discussion“ because someone posted in chat “sorry can’t make the meeting, we’re wrapping up month end close”.

Really confused the next guy who joined the meeting and derailed things for a minute or two before we could get back on topic.

Comment by cyanydeez 3 days ago

I'm thinking it's a game of CEO-bullshit-detector vs AI-bullshit-generator and the CEOs demonstrated from 2024-current that they're not good at detecting bullshit, especially if it comes from a computer and goes very fast.

Comment by woah 3 days ago

> I won’t deny they are useful tools, but the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months set the expectation so high that I’m still in the “fancy auto-complete” camp.

Why would someone else's unrealistic assessment affect your assessment of the actual abilities you see?

Seems like your opinion is mostly politics-based

Comment by al_borland 3 days ago

The expectation confirmation theory[0].

Someone else’s unrealistic assessment frames expectations, especially when they are attempting to speak from a place of authority, which they were. When reality doesn’t meet or exceed those expectations it creates disappointment. The expectations they set were impossibly high.

This is a pretty common thing. I’m sure we’ve all been disappointed by a movie or restaurant that a friend hyped up endlessly, which really didn’t live up to the expectations that were set. It’s the same deal here.

[0] https://en.wikipedia.org/wiki/Expectation_confirmation_theor...

Comment by daredoes 3 days ago

Pokemon Go was pitched as Pokemon on your phone with AR to integrate into day to day life. There's no reason to expect anything but the Pokemon games I've played, now natively on my phone with AR integration.

What came out was a clone of Ingress with a skin and a shop. It lacked the full set of Pokemon, for which all the assets already exist. It lacked having a six-Pokemon team. It lacked trading, a core feature of Pokemon in every generation of games. Gyms weren't even gyms, they were some sort of checkpoint XP farm thing.

If it had been pitched as what it was, I may have enjoyed it more. Instead, I found myself vastly disappointed with what I was able to achieve playing it compared to Pokemon on my Nintendo DS or some other handheld console.

I don't think this was a politics-based decision. I feel misled and disillusioned.

Comment by orthogonal_cube 3 days ago

This is not uncommon when becoming disillusioned with something that has been hyped up and forced upon you for an extended period.

The fatigue of the product (and sting of false promises) causes the negatives to overshadow anything positive to say.

Comment by badgersnake 3 days ago

I don’t know about “Oh shit”. I’ve had many “It’s shit” moments.

Comment by rcpt 4 days ago

"We're traveling to Tokyo on our way home from China. We'd like to plan a trip accessible by train that hits some beaches, some hot springs, and allows me to get the 4th dose of a rabies vaccine sequence (the first three shots were rabvac)"

Comment by kstrauser 4 days ago

You can't just leave that hanging out there unexplained.

Comment by rcpt 3 days ago

I was messing with a stray cat in Yunnan.

Comment by kstrauser 3 days ago

I'm sorry for laughing out loud at that. I can absolutely see myself getting caught by that.

Comment by julian_sark 2 days ago

It was a cascade of sorts.

1. Running my tests on Gemini. Having it argue two sides of radical social systems and realistic-seeming implementation routes. Collaborative storytelling into the absurd. Having it solve impossible-seeming riddles (though that still leads to hallucinations).

2. Gemini explaining arcane BIOS settings to me not found anywhere on Google, mostly correctly.

3. Claude dissecting a tongue-in-cheek theory/blog post of mine, deeply analyzing it while catching flaws and catching on to irony and sarcasm.

4. Several non-coder friends building fully automated AI slop distribution and sales/BI platforms with Claude, and forking and greatly improving projects on GitHub. I did some tests with Vibe Coding myself, had Claude write a small game from one prompt. This is a bit insane, I must admit.

I'm a former skeptic who has written a lot on AI and society, published some, and held public discussions with experts.

I have since sung the praises of Claude in particular in closed forums for hundreds of county and government digitalisation and security people, in a way I'd have never expected two years ago.

Background: I'm an IT and security guy myself with 30 plus years of light coding and heavy, broad enterprise stuff. I'm on record now for saying things on AI I would have had myself hospitalized for two years ago.

Many of the people around me go "eh, just check your code for security" and "I tried coding in ChatGPT, i had it output some puny script, it's not that great".

In return I had Claude (free tier) make a PDF for them on uses, chances, risks, legal framework and integration with other AI and services, nicely formatted, from one prompt. It seems 100 percent factually correct, and Claude fixed a bug in the PDF generation code it pulled off of the web from the same prompt to complete the task.

I also explained to them why especially the library pool for Python and JavaScript enables Claude to write stuff that is rather impressive, and that while not fully scalable (yet), it might be "good enough" for 90 percent of the tasks people want done these days.

I suddenly find myself berating people on government forums, where some people from the national IT security advisory body are part of the (mostly silent) audience, to wake up in some way and consider what it means. Yet our national approach to security is mostly still "Vibe coding is a toy and a party trick, that will blow over eventually, get a code audit or follow some security check lists."

Meanwhile I believe that in two years, most of GitHub will be written by non coding hobbyists like my friends.

And having seen how insanely laissez-faire some commercial software folks are about security and product service for extremely pricey products, maybe DIY vibe coded stuff does not even look bad in comparison.

While some of the 1000s of people around me doing the "real world" work can't wait to give agency to the AI, many others here, including many people in charge, still seem blissfully ignorant. Or want to be.

Ultimately, I only know one thing for certain: Society is currently heading full yolo into this, and systemically, that might be the only way because that is the true nature of AI.

They say people don't understand the exponential function. This is true.

But most people also don't seem to understand the implications of this new approach to exponential pattern recognition and reassembly, which operates on the microsecond scale and with an insane pool of information at its disposal. It will probably never make NEW stuff, but it will assemble old stuff so fast and so completely that it will still seem indistinguishable from magic.

Society is in for a wild ride.

Comment by saadn92 4 days ago

I use Claude Code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.

Comment by tripledry 4 days ago

From a technology perspective LLMs are absolutely bonkers; it blows my mind that they work as well as they do.

From a programmer perspective, I'm starting to like it less and less. It's useful for sure, but doesn't really live up to the hype. In many ways it's the opposite, my bet is still that programmers will be in high demand in the not so distant future after all of this settles.

Might be wrong, time will tell.

Comment by pythonaut_16 4 days ago

It has seemed to me that with each step from Opus 4.6, to 4.7 to 4.8 Claude has gotten worse at building good solutions. Like perhaps it is more "capable" in the small scale than 4.5 was but it's much worse at knowing what to do.

Comment by slopinthebag 4 days ago

Yeah I'm the same way. They seem great when you ask it to build something unspecified, like "build me a todo app" or something. It's like magic. But when you know what the code needs to look like and can't accept anything else they just become so frustrating to use, and I doubt there is a productivity improvement there.

I think we will find ways to make them useful though. I imagine eventually it'll just be built into our editors and we won't even be thinking about AI or "agents" or "prompting", our tools will just be more capable.

Comment by windexh8er 3 days ago

[flagged]

Comment by dang 3 days ago

Please don't post unsubstantive comments and especially please don't post them on the top comment in an active thread.

We detached this subthread from https://news.ycombinator.com/item?id=48418364.

Comment by kstrauser 3 days ago

What was?

Comment by MassPikeMike 3 days ago

These days when people see consecutive pithy sentences with parallel structure, they immediately holler "AI generated."

For years I've wanted to improve my writing and reduce my tendency to string together long floppy clauses, and now I'm like, well good thing I never did that...!

Comment by samdonovan 2 days ago

My oh shit moment is when I realized I could build a business generating AI podcasts about any topic I was interested in. Case in point — an AI-generated podcast episode about this very thread: https://demades.ai/e/ask-hn-what-was-your-oh-shit-moment-wit...

Comment by warumdarum 3 days ago

This video: https://youtu.be/mcYl70vq_Ns?si=ljbc-kWtYBF8o5uk

People really hate AI with a passion and I think the defenders are setting themselves up to become scapegoats for some social crisis to come.

Comment by witx 3 days ago

F*ck me, astroturfing is strong here and on reddit

Comment by steno132 4 days ago

My first time using Grok. I'd been so used to using AI models that declined to do things I told them, like tagging people in a video feed, helping me "optimize" my taxes or managing my Twitter bot farm.

Grok just did these things for me, no questions asked, no ethical judgments. No woke.

Elon really doesn't get enough credit for Grok. People don't want the most powerful reasoning model or "constitutional AI". They just want a model that does what they say. Elon understood that insight (like he usually does) and no one else really did and that's probably why Grok has been growing rapidly over the last two years or so.