What it feels like to work with Mythos
Posted by swolpers 6 hours ago
Comments
Comment by eithed 4 hours ago
These questions are not even about AI: if I gave money to a human agency and was handed something they told me works, I would ask the same questions. If I did not know how to evaluate it, I would hire people who do. With LLMs, the verification part is what bothers me the most.
Comment by an0malous 3 hours ago
The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.
Comment by jimbokun 2 hours ago
They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.
Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.
Comment by an0malous 1 hour ago
Imagine not being an architect and using Claude to put together a building plan, then concluding it’s basically done but we might need a real architect to double check the measurements. It may even be true but I’d be skeptical if it’s always non-architects saying this.
Comment by shimman 1 hour ago
Comment by cgearhart 3 hours ago
The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.
Comment by qaq 3 hours ago
Comment by rpdillon 1 hour ago
Comment by acedTrex 3 hours ago
this doesn't really work in the real world. There are many things that actually matter, engineering is fundamentally about handling them.
Comment by hypfer 4 hours ago
Comment by coldtea 3 hours ago
I clicked one of his examples intrigued "a snake game where the snake is self-aware and crazy things happen;". Played for 1-2 minutes, and it's the classic 1980s snake game. Am I missing something? What is "self-aware" about it? Some funny messages at the bottom of the screen? And what are the "crazy things"?
Comment by starshadowx2 2 hours ago
Comment by vunderba 2 hours ago
I will say, the way the act of eating creates a "bulge distortion" that flows down the length of the snake is a nice touch though.
Comment by jstummbillig 4 hours ago
Comment by eithed 4 hours ago
Comment by munk-a 3 hours ago
Comment by unholiness 3 hours ago
The lack of downvotes on posts on HN has always felt like more of a bug than a feature to me.
Comment by nomel 3 hours ago
Comment by chickensong 1 hour ago
Comment by eithed 1 hour ago
But yes, you are right - I don't build roads, I don't know what it costs to build one or how to judge the quality of a correctly built one, and I will never care or learn.
Comment by jimbokun 2 hours ago
Comment by eithed 1 hour ago
Comment by adamtaylor_13 4 hours ago
Yet, I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.
Comment by grafporno 3 hours ago
Comment by JumpCrisscross 4 hours ago
It also burned through my usage quota like a late-90s Hummer.
Comment by matheusmoreira 2 hours ago
Yeah. I have a Max 5x subscription and Fable burned through 16% of my weekly quota in a 40 minute code review session. It didn't even finish the review, it switched back to Opus 4.8 in the critical memory safety parts where I actually needed Fable.
I feel like I'm going to get priced out of these models soon. I should probably try to get the most out of Fable until June 22nd.
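A quick way to put the burn rate above in perspective is a linear extrapolation. This is only a sketch: it assumes usage scales linearly with session time, which may not hold, and the function name is made up for illustration.

```python
def minutes_until_quota_exhausted(percent_used: float, minutes_elapsed: float) -> float:
    """Linear extrapolation: minutes of similar work that would use 100% of the quota."""
    if percent_used <= 0:
        raise ValueError("percent_used must be positive")
    return minutes_elapsed * 100 / percent_used

# Figures from the comment above: 16% of the weekly quota in a 40 minute session.
total = minutes_until_quota_exhausted(percent_used=16, minutes_elapsed=40)
print(f"{total:.0f} minutes (~{total / 60:.1f} hours) of similar work per week")
```

At that rate the whole weekly quota would last a little over four hours of comparable review work.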
Comment by cyanydeez 4 hours ago
Comment by Ferret7446 3 hours ago
It's not just salary, but also safety/labor regulation, legal risk, vacations, sick time, personal conflicts, HR, benefits.
Even when automation is more expensive on paper, it's generally still cheaper
Comment by rstuart4133 1 hour ago
You underestimate what these models cost. Uber's budget is $1,500/dev/month. I gather that was put in place because the devs were going through $6,000/dev/month, which Uber decided could not be cost justified.
Fable costs at least twice as much, or $12,000/dev/month.
Fable can apparently work for hours without supervision, which means a skilled engineer can now have it working on many tasks concurrently. I would not be at all surprised if they can put a nought or two on that number. If you do that, you are well out of "what a human costs" territory.
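The arithmetic above can be sketched as follows. Every dollar figure is the claim made in this comment, not verified pricing, and reading "a nought or two" as 10x or 100x concurrency with linearly scaling cost is an assumption.

```python
# Back-of-the-envelope cost sketch. All figures are the claims from the
# comment above, not verified pricing.
UBER_BUDGET = 1_500      # $/dev/month, claimed budget cap
OBSERVED_SPEND = 6_000   # $/dev/month, claimed spend before the cap
FABLE_MULTIPLIER = 2     # "at least twice as much"

def monthly_cost(base_spend: float, price_multiplier: float, concurrency: int = 1) -> float:
    """Monthly spend for one developer running `concurrency` parallel sessions."""
    return base_spend * price_multiplier * concurrency

fable_single = monthly_cost(OBSERVED_SPEND, FABLE_MULTIPLIER)
print(f"single stream: ${fable_single:,.0f}/dev/month "
      f"({fable_single / UBER_BUDGET:.0f}x the claimed budget)")

# "A nought or two" is assumed here to mean 10x or 100x concurrent tasks.
for concurrency in (10, 100):
    cost = monthly_cost(OBSERVED_SPEND, FABLE_MULTIPLIER, concurrency)
    print(f"{concurrency}x concurrent: ${cost:,.0f}/dev/month (${cost * 12:,.0f}/year)")
```

Even the 10x case works out to well over a million dollars per developer per year under these assumptions.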
Comment by gopher_space 1 hour ago
As far as I can tell this part of the job isn't really on anyone's radar anymore.
Comment by TheOtherHobbes 2 hours ago
I can't help thinking there might be some kind of strategic issue here.
Perhaps someone should ask Mythos about it.
Comment by warkdarrior 3 hours ago
If you get $100,000 per year as a SWE, and Anthropic offers a coding model for $100,000 per year (but working 24/7), then you'll have to give up all of those addons that make the fully burdened cost of the employee. Say goodbye to vacation, sick time, benefits, etc.
Comment by Qhemlomo 4 hours ago
We know this model will be cheaper and faster with time.
And we have not even reached the timeframe where we have ASIC-style models.
OpenAI has to do something that beats Fable, otherwise Anthropic has won. China is currently overtaking in cars, PV, batteries, and very soon silicon chip making; it has every incentive to take over AI as well.
Comment by camillomiller 3 hours ago
Comment by Qhemlomo 3 hours ago
Comment by throw939494555 3 hours ago
I find it good for code reviews.
Comment by Our_Benefactors 3 hours ago
Comment by PunchyHamster 4 hours ago
Comment by cyanydeez 2 hours ago
Comment by olafmol 4 hours ago
Every sw dev knows this is a very dangerous, and unrealistic, assumption.
Comment by ecocentrik 4 hours ago
"Posterior beliefs about market demand are purely referencedependent: holding dollars raised constant, they track only performance relative to the founder’s self-chosen goal—jumping half a standard deviation at the threshold, responding steeply for the first ten points past it, and flattening thereafter"
Humans generally don't verbalize data this way. The summary document is also very fluffy.
Comment by huflungdung 3 hours ago
Comment by mohsen1 4 hours ago
In a project like mine (https://github.com/tsz-org/tsz) I was constantly frustrated that models were not doing enough research and were not taking other situations into account. Again and again models would produce code that would fix one thing and break 2 other tests that were "unrelated".
With Fable it seems like tasks are taking much longer (I have not seen a pull request from Fable sessions yet) but reading the transcripts of those sessions I can see how it is doing the right thing by not leaving any stone unturned.
As the article says, it's hard to communicate this "feeling" about models because it is very project specific, but I thought I'd share.
Comment by anematode 4 hours ago
Comment by layer8 3 hours ago
Comment by mohsen1 3 hours ago
But overall, it is pretty normal for compilers to have this sort of "unexpected" test failure due to work in some other area. It happened to me when I was coding everything manually back in the day too.
Comment by anematode 2 hours ago
That's not what a clean setup means... I mean good separation of concerns, established invariants, etc.
Comment by nxmxksisksnssb 4 hours ago
Comment by selfawareMammal 4 hours ago
Comment by jenniferhooley 3 hours ago
Personally I don't really care, because I like coding and learning myself and DeepSeek Flash is all I really care about. But it's really easy to have a ton of benchmarks where the top models can't get anywhere close - and I like to test them on these problems to see how good they are getting.
Fable 5 is def a little better than 4.8 btw.
Comment by ianm218 4 hours ago
A small portion of this effort is having a high-quality Lua in Rust repo. I’m using Mythos to fix some of the performance issues with my Lua interpreter that GPT 5.5 / Opus 4.8 had stonewalled on.
Not sure if Mythos will be able to crack this but it has been running for a couple hours now with some promising results.
Performance charts linked here if you're curious: https://github.com/ianm199/lua-rs
Comment by mplanchard 1 hour ago
Comment by ianm218 1 hour ago
The other reason is that because mlua is just a wrapper around the C code, it has unsafe you can't really get around. So for example Lua is used in Redis, which has this critical CVE https://github.com/redis/redis/security/advisories/GHSA-4789... that a memory safe version of Lua wouldn't have to deal with.
Mlua is still fine or even better for many other cases though!
Comment by mplanchard 53 minutes ago
It just seems like a lot of hassle to write a lua interpreter, although it would be nice to see a high quality one in Rust :)
Hematita was promising, but looks abandoned.
Comment by ianm218 46 minutes ago
And yes, it seems like there have been many attempts to get a solid Rust Lua over the years and most never reached parity, so I'm hoping some people can find a use case for it! This one is at full parity in terms of behavior, and performance is getting to within striking distance.
Comment by mplanchard 39 minutes ago
Comment by jstummbillig 2 hours ago
On the margins, suppose the prompt is literally: "Build a feature complete, high polish Facebook clone". Facebook is complex but likely not super complicated tech, and still I would assume that (after having burned through a substantial amount of tokens) you would find substantial enough differences in the outcomes between different models on that prompt on various fronts.
The above ask is obviously not useful, but what's preventing you from taking on bigger chunks until you approach the limit? At some point you would hit a boundary, where the diff will be obvious.
Comment by mervz 4 hours ago
Comment by Our_Benefactors 3 hours ago
Myth. Total myth! I recently had to beg for more RAM after continually hitting swap space which causes tools like dictation to stop working, failure to load certain websites without rebooting, and so on. Devs do in fact need powerful machines and the ~$500-1000 an employer saves upfront in machine costs is dwarfed by productivity losses.
Giving your engineering employees new machines in a 2-year cycle that are between the middle and high end is one of the cheapest ROI decisions that a tech org can make.
Comment by oarsinsync 2 hours ago
Comment by mohsen1 4 hours ago
Comment by gopalv 6 hours ago
> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct
That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.
My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.
At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.
Comment by matneyx 6 hours ago
We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."
Comment by giancarlostoro 4 hours ago
Will Claude's code be perfect in one shot? Probably not, will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.
Comment by toss1 2 hours ago
Sounds like we've nearly reached in coding the point where Paul Bunyan [0] has his epic competition with the chainsaw... and loses by 1/4" and history forever changes...
Comment by torginus 1 hour ago
It's some prompt-engineered AI harness that guides the AI to create stats after it researches a subject and ingests the data, but I'm not sure what the tool actually does on top of this.
Comment by neogodless 6 hours ago
Comment by petesergeant 4 hours ago
Comment by ModernMech 4 hours ago
Comment by mattbettinson 3 hours ago
Comment by giancarlostoro 4 hours ago
At this point, pay me significantly more, and I'll do it.
Comment by warkdarrior 3 hours ago
Ha ha, that's how you negotiate yourself out of a job!
Comment by giancarlostoro 1 hour ago
Comment by PeterStuer 6 hours ago
Comment by ASalazarMX 5 hours ago
Comment by wongarsu 4 hours ago
Comment by hypfer 4 hours ago
There are people that almost feel physical pain if something is unnecessarily incorrect.
+ That if the mental model of something is accurate, it is actually _more_ work to say something that is incorrect than just saying the correct thing.
Comment by wongarsu 4 hours ago
Comment by hypfer 4 hours ago
Similar to "My game just crashed".
Jira otoh is not yours, because it's in the cloud. It might be "my internet connection", "my browser" or "my account" that is having trouble.
___
Hm. "My train got delayed" is interesting in this context. I don't find that offensive. But that also might be because trains don't seek rent the way SaaS does? Not sure.
I guess trains do not hold me hostage. They might just be a container in which someone does that.
Jira, cloud LLM inference or similar otoh..
Comment by ASalazarMX 2 hours ago
I guess the main difference is that TAAS has many different trains where the experience varies wildly, so it helps to be specific on which train you're licensing; but LLMs are the same product for everyone, and you can't stay with say, ChatGPT 1.0, you get the same choices as everyone else.
Comment by ASalazarMX 4 hours ago
If you had your own on-premises LLM, that would indeed be your LLM, and it would make sense to compare it to the on-premises LLMs of other people, as your setup particulars would affect the result.
Comment by dasyatidprime 3 hours ago
Comment by hypfer 3 hours ago
There was a time where one actually bought software to own it.
This time is.. actually it is right now. Please leave at once.
Comment by RugnirViking 3 hours ago
That's ridiculous. You wouldn't respond to "I went to visit my doctor yesterday" with "but slavery has been illegal since forever!" Similarly it would be foolish to respond to "where should we meet? my place or yours" with "but we both rent!"
Comment by calvinmorrison 4 hours ago
Comment by w4yai 4 hours ago
Comment by ASalazarMX 2 hours ago
Comment by giancarlostoro 4 hours ago
Comment by throw939494555 3 hours ago
Comment by giancarlostoro 1 hour ago
Comment by hedgehog 6 hours ago
Comment by cyanydeez 4 hours ago
I'm amazed we're so far into SOTA bloat that the Chinese will kill once they start etching silicon with these models.
Comment by theturtletalks 4 hours ago
https://isochronic-passage-chart.netlify.app/
Doesn’t work too well on mobile but looks interesting
Comment by skipants 4 hours ago
Comment by neom 3 hours ago
Comment by jampa 4 hours ago
I also see some logic flaws. It overlooks the option of going to a major hub to access faster aircraft, rather than hopping on local hubs.
Also, immigration and customs are cleared at the first airport you arrive at in the country, not at the last one.
In some countries, you need to clear immigration even while going to a third country, so 1 hour is not enough to do it.
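The hub-versus-local-hops point above can be sketched as a shortest-path search in which every change of aircraft costs a fixed connection time. This is a minimal illustration, not how the linked chart is implemented: the airport names, flight times and `connection_hours` values are invented, and where immigration is cleared is not modelled beyond that single penalty.

```python
import heapq

def fastest_hours(flights, origin, destination, connection_hours=1.0):
    """Dijkstra over a flight graph; each change of aircraft adds connection_hours."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, airport = heapq.heappop(queue)
        if airport == destination:
            return elapsed
        if elapsed > best.get(airport, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for next_airport, flight_hours in flights.get(airport, []):
            # No connection penalty on the first leg out of the origin.
            layover = 0.0 if airport == origin else connection_hours
            total = elapsed + layover + flight_hours
            if total < best.get(next_airport, float("inf")):
                best[next_airport] = total
                heapq.heappush(queue, (total, next_airport))
    return float("inf")

# Hypothetical network: slow local hops versus one detour through a major hub.
flights = {
    "SMALL": [("REGIONAL_A", 2.0), ("HUB", 1.5)],
    "REGIONAL_A": [("REGIONAL_B", 3.0)],
    "REGIONAL_B": [("DEST", 3.0)],
    "HUB": [("DEST", 5.0)],
}

print(fastest_hours(flights, "SMALL", "DEST"))                        # 1 hour connections
print(fastest_hours(flights, "SMALL", "DEST", connection_hours=3.0))  # slow immigration
```

In this toy network the hub route wins in both cases, and a longer connection time penalises the multi-hop route even more, which is the kind of effect the comment says the chart overlooks.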
Comment by neaden 4 hours ago
Comment by thepasch 4 hours ago
> Switched to Opus 4.8: Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.
Comment by matheusmoreira 2 hours ago
Comment by SupremumLimit 23 minutes ago
Other commenters have pointed out that his isochrone map contains a lot of nonsense as well.
So the most charitable interpretation here is that this is a case of Gell-Mann amnesia.
Comment by ElijahLynn 2 hours ago
And I'm excited to try it, but also have a fear that I will like it too much and then won't have access to it in 2 weeks... But maybe I will and maybe it will be worth it and I'll just pay a bunch of extra for it and it'll be great!
I think the article could be improved by actually sharing more feelings. I clicked on the article for feelings but I didn't see that many feelings described.
Comment by pu_pe 3 hours ago
[1] https://isochronic-passage-chart.netlify.app/
[2] https://mapitout.welcome-to-nl.nl/
Comment by recursivedoubts 5 hours ago
Comment by asdK120 6 hours ago
He is a professor but sadly also an AI shill. He should switch to advertising washing powder.
Comment by MostlyStable 6 hours ago
Comment by dthread3 6 hours ago
Comment by Philpax 2 hours ago
Comment by cadamsdotcom 5 hours ago
Comment by lijok 6 hours ago
Comment by fdsdfsdfzxczxc 6 hours ago
Comment by whyenot 6 hours ago
Comment by CuriouslyC 4 hours ago
Comment by wxw 3 hours ago
I don’t see why working longer is a pro. The results don’t seem much better than you’d get from putting Opus in a long loop.
Comment by warkdarrior 3 hours ago
Care to share the results you got from Opus working on the same prompt? It should be easy to compare quality.
Comment by Aperocky 4 hours ago
The first item on the article, the first thing it showed, was wrong though.
It is 100% faster to go from London to New York in 1881 than to Volgograd, or to any of the Russian hinterland colored green, or to Turkey or Egypt.
Comment by patcon 4 hours ago
the map is for 2026, yeah?
Comment by Aperocky 37 minutes ago
Comment by mjamesaustin 3 hours ago
Comment by mawadev 4 hours ago
Comment by ComplexSystems 2 hours ago
Comment by vb-8448 3 hours ago
There is only one hint: 475k tokens in the screenshot when OP asked the model to fix some behaviour, but it would be fascinating to know the total token count.
Comment by ElijahLynn 2 hours ago
Comment by steve1977 3 hours ago
Is it a hard problem or is it just labor intensive?
Comment by warkdarrior 3 hours ago
Comment by 382hi 6 hours ago
Comment by giancarlostoro 4 hours ago
Comment by PaulHoule 3 hours ago
Comment by root_axis 6 hours ago
Comment by LogicFailsMe 3 hours ago
Edit: A couple hours in and I just got my first gaslighting attempt from the model. Good times!
Comment by catigula 4 hours ago
Just an FYI this guy is an AI hype-beast. Some of his tweets are truly out there.
Comment by dogmayor 3 hours ago
Comment by zb3 4 hours ago
Comment by zuzululu 4 hours ago
What makes me excited is that GPT 5.6 (it's actually GPT 6) is going to be crazy
Comment by ThejaCH 4 hours ago
Comment by the_doctah 6 hours ago
Comment by boringg 4 hours ago
Comment by younglunaman 4 hours ago
What?
Comment by warkdarrior 3 hours ago
Comment by honeycrispy 4 hours ago
Comment by et-al 6 hours ago
Comment by astrange 6 hours ago
Comment by 0x1ceb00da 6 hours ago