Is This the Dawn of the Tokenpocalypse?

Posted by pseudo-usama 1 day ago

Comments

Comment by motbus3 1 day ago

Without going into details, a little bird told me of a project estimated to use several million tokens per day to automate the work of a team that got laid off. The operation is now a mess, no one is willing to be held liable, and since the cheap model they used is about to be retired, the company is going to see at least a 4x price increase.

I have the feeling that the age of 'I can't be blamed for AI stuff' will turn into "this was the computer guy's mistake" for a while.

PS. I've been using Claude Opus 4.8 and it is worse than 4.6, and I will say that even Sonnet 4.6 is better. PhD level of software and engineering, I believe! I know many PhDs who never coded or worked anyway

Comment by RamblingCTO 1 day ago

Glad I'm not the only one. Almost every factual thing with new opus is wrong (and it now even happens with 4.6?). I asked it about car stuff yesterday and it totally misrepresented what a car axle even looks like fundamentally. Today I talked about my CV and it was just plain wrong. I don't know what happened, it wasn't like this a few weeks back and I'm even considering cancelling claude altogether. GPT 5.5 for coding is fine and way more stable, but regular work is just broken.

Comment by motbus3 1 day ago

Going by the gap between the release dates of 4.7 and 4.8, it seems more likely it was an attempted bugfix

But 4.8 still underperforms on most tasks. I have things running where 4o-mini does it considerably better repeatably.

They might have tuned it for a particular reason and I would not doubt that the harness has been made worse.

Sometimes I'm tempted to think it does wrong things on purpose

Comment by user_7832 1 day ago

On the topic of older (Claude) models being better... does anyone know of anything close to 3.5 (or 3.6) era Sonnet? It was by far the best LLM I had ever asked my doubts to. It actually explained in a human way, not like some AI I need to re-read thrice to understand.

(I've used modern Gemini 3.1 pro & claude too. Modern ChatGPT is just as useless, I've never heard a human speak in points. The human brain never encounters that irl.)

Comment by Chu4eeno 1 day ago

This was obviously a conscious choice from the leadership at the frontier labs, and especially OpenAI, considering how 4o turned out.

I don't think they expected the ELIZA effect [0] to explode as much as it did when they started including feedback directly from users into posttraining the next generation, so to be safe they've likely added several regimens of synthetic data ensuring ChatGPT tries to steer away from ELIZA.

[0]: https://en.wikipedia.org/wiki/ELIZA_effect

Comment by picofarad 1 day ago

I'd have to see representative examples but there are thousands of models available, obliterated, remixed, distilled, cloned, compressed, and so many more.

I really liked the way copilot was last year, but I switched to deepseek because I don't trust MS.

Grok cracks me up, but I refuse to give elon more money than I'm already forced to by circumstance outside my control and budget.

Comment by motbus3 1 day ago

It is hard to say because there is "affection" memory that it was better than what we had before so it seems it was better.

In my humble opinion, for what it's worth, it improved gradually, not exponentially, up to 4.5

4.6 seems to be a minor step and the latest 2 are pure rubbish

Comment by prodigycorp 1 day ago

To me this is clearly a skill issue. Several millions of tokens per day is peanuts, even if uncached. gpt-5.5 is $5 per million of input tokens.

Anybody doing things seriously understands how to optimize their workflows for smaller models once they start to lock in processes.
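As a rough check on the "peanuts" claim, here is a minimal sketch of the arithmetic at the $5 per million input tokens quoted above. The two volumes are assumptions, not figures from the thread: "several millions" is read as 5 million tokens per day, and the second case takes the reply's "millions of tokens per minute" as 1 million per minute. It covers input tokens only, since no output price is given here.

```python
def daily_input_cost(tokens_per_day: float, usd_per_million: float) -> float:
    """USD cost of one day's input tokens at a flat per-million price."""
    return tokens_per_day / 1_000_000 * usd_per_million


PRICE_USD_PER_MILLION = 5.0  # the input price quoted in the comment

# Assumed volume 1: "several millions of tokens per day" read as 5 million.
per_day_case = daily_input_cost(5_000_000, PRICE_USD_PER_MILLION)

# Assumed volume 2: 1 million tokens per minute, running around the clock.
MINUTES_PER_DAY = 24 * 60
per_minute_case = daily_input_cost(1_000_000 * MINUTES_PER_DAY, PRICE_USD_PER_MILLION)

print(f"5M tokens/day:    ${per_day_case:,.2f} per day")
print(f"1M tokens/minute: ${per_minute_case:,.2f} per day")
```

Under these assumptions the first case is $25 per day and the second is $7,200 per day, so whether the bill is "peanuts" depends mostly on which volume is the real one.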

Comment by motbus3 1 day ago

You talk without even knowing what the thing is about. It is easy peasy to spend millions of tokens per minute if you have the content for it.

This is not about you chatting with your ChatGPT window for sure.

Comment by zozbot234 1 day ago

The expensive tokens are output, not input. A useful rule of thumb is that a million tokens per day works out to roughly 10 tok/s on a 24/7 basis.
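The rule of thumb can be checked with one division. A minimal sketch, where the function name is just illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tokens_per_second(tokens_per_day: float) -> float:
    """Average sustained rate needed to emit tokens_per_day around the clock."""
    return tokens_per_day / SECONDS_PER_DAY


rate = tokens_per_second(1_000_000)
print(f"{rate:.1f} tok/s")
```

This comes out a little under 12 tok/s, so "about 10" is a fair round number for a single stream running nonstop.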

Comment by prodigycorp 1 day ago

Even then, I highly doubt any sort of automation is producing on the order of several millions of tokens daily. The issue I see with the org in parent comment seems to stem from management and not any sort of token repricing.

Comment by motbus3 1 day ago

I can't say more. But it is totally possible.

Comment by platinumrad 1 day ago

I don't doubt that the operation as a whole is a disaster, but they should be able to avoid the price increase by using one of the many other cheap models like DeepSeek V4 Flash right?

Comment by motbus3 1 day ago

DeepSeek V4 Flash and Pro are insanely good. Even if it were the same price

Comment by tabs_or_spaces 1 day ago

For me it depends on who you listen to

If you're following a bunch of people who are from LLM labs, you're going to be more incentivised to tokenmaxx because it's in the labs' best interest to get you to behave that way.

Practically, many companies aren't labs with endless runway. Companies hopefully follow a PnL model. And when you look at things through that lens, many times the LLM use case falls apart.

You're seeing a bunch of companies starting to realise that tokenmaxxing yields very little ROI.

Even at the LLM labs, the guy that spent $1+ mil on tokens has nothing to show for it in terms of revenue to the company. And you have to keep sinking that much into AI for ... "features".

There are some good use cases for AI. I ended up with a positive ROI on a greenfield project myself, albeit on a small scale.

The way that AI has been making people make totally irrational decisions from executive, pure business and technical standpoints is simply mindblowing. I don't understand how people can't take a step back and see what's actually happening from a macro perspective.

Comment by elictronic 1 day ago

Gambling. Crypto. Tulips. Ponzi Schemes. Easy money always nets the suckers. You see it enough times and you just sigh.

This too shall pass. After enough bullshit people will become fed up and enforcement of existing laws will start breaking up the most egregious items. New laws will pass. People will make and lose fortunes, and we will live on.

Comment by __alexs 1 day ago

Human systems are not good at rapidly adapting to change.

AI could be absolutely perfect and we'd still struggle to deploy it in a value generating way simply because it will exceed our ability to adapt.

So tokenmaxxing might be the wrong thing to do, but only because it's focussing on the wrong problem rather than because it doesn't actually work.

Comment by vineyardmike 1 day ago

I'll take the contrary position and say that I think the "tokenmaxxing" we've previously seen was useful (but shouldn't continue indefinitely). My TLDR position is that TokenMaxxing was a way to force discovery of Product Market Fit.

The push by companies to incorporate AI into everything is (depending on the company) either hype and cargo-culting or it was an attempt by management to (1) try and discover if/what new workflows or tools could use it and (2) force the haters to use it as it got better.

Where I work, there is an obvious split between people who have been willing to use AI, and those that hated it from day 1 and mocked the "stochastic parrots". Senior folks were disproportionately haters, and generally didn't see much productivity lift from early AI stuff. They strongly resisted the mandates to use AI, and completely missed the "agentic" inflection point that other colleagues experienced. The more willing users saw Claude Code/agents and were able to experience this as the genuine benefit it can be. Now that the more senior folks are using agentic programming, they're genuinely able to maintain code quality and see meaningful speed improvements in coding tasks.

Today, tokenmaxxing doesn't make sense because we found the product-market-fit of agentic coding. Now that most (?) employees are onboard with using it, the industry can shift focus to cost-effective usage and positive-ROI usage. For example, Uber shifting to a fixed per-employee token budget.

Comment by red-iron-pine 1 day ago

> The push by companies to incorporate AI into everything is (depending on the company) either hype and cargo-culting or it was an attempt by management to (1) try and discover if/what new workflows or tools could use it and (2) force the haters to use it as it got better.

"we need to figure out if we can replace you with AI, or if it just extends your abilities"

Comment by apothegm 1 day ago

Usually it’s the seller whose responsibility it is to find PMF, not the buyer.

Comment by vineyardmike 1 day ago

Most tech companies are buyers of AI, many are also sellers of AI in their own products, and a few are also building it.

Comment by apothegm 14 hours ago

Sure. But the ones who are “tokenmaxxing” (I hate that term) are generally maximizing their usage as consumers.

“Try and discover if/what new workflows or tools could use it” is something that’s supposed to be done by the companies selling a product so they can then convince people to buy and use it — not something that the buyers are supposed to do.

Comment by utopiah 1 day ago

> Companies hopefully follow a PnL model.

Eh... this is HN. The goal is precisely to reach BS escape velocity and SpaceX is the model to follow. It's not healthy IMHO (I'm not an economist) but that's definitely the arms race VCs actually fund. Lose for years if not decades, achieve market dominance, squeeze. Very very few winners and for those the path is precisely NOT to follow PnL.

Comment by gitgud 1 day ago

> Kirsten: All of this, to me, illustrates how quickly things are moving. I mean, when you really think about it, the whole tokenmaxxxing thing has become a thing, peaked, and now is seen disfavorably, within six months

Pretty sure from inception the phrase “tokenmaxxing” was never seen in a positive light…

Comment by leoncos 1 day ago

Perhaps advanced AI isn't cheaper than humans.

Assuming the intelligence of a model continuously improves with scale, the token price of the best model will become increasingly expensive.

I know that tokens are currently experiencing rapid price drops, but they will eventually encounter physical limitations.

Comment by unglaublich 1 day ago

Why? What physical limitation will dictate that we can't have 1B tokens for cheap?

Comment by drcxd 1 day ago

Assume you build a machine that can simulate some system 1:1. Then it means the machine is exactly the same as the system, and the cost of running it will be no less than that of the system itself.

If you want to reduce the cost but still get something useful, you have to make some abstraction, and we all know that any abstraction is leaky.

Comment by dwattttt 1 day ago

Thermodynamics is a harsh mistress

Comment by Tuna-Fish 1 day ago

We are very, very far from thermodynamic limits. Lots of people have done the math, and current-gen systems use ~1000000000x more power than the Landauer limit, and ~100000x more power than ideal digital implementation on existing CMOS.
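For a sense of scale, the Landauer limit is the minimum energy needed to erase one bit, k_B * T * ln 2. A minimal sketch of that number follows. The 300 K operating temperature is an assumption, and the 10^9 factor is simply the ratio quoted above taken at face value, not something derived here.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact by the 2019 SI definition


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy to erase one bit at a given temperature: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


limit = landauer_limit_joules(300.0)  # 300 K is an assumed room temperature
implied = limit * 1e9                 # the comment's ~10^9 ratio, unverified

print(f"Landauer limit at 300 K: {limit:.3e} J per bit")
print(f"Implied by a 10^9 gap:   {implied:.3e} J per bit operation")
```

At 300 K the limit is on the order of 10^-21 J per bit, so a gap of nine orders of magnitude corresponds to picojoules per bit operation.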

Currently, most AI systems work so that there is a large pool of memory on one side, compute on other side, and a very fat pipe between them. 90%+ of all energy goes into moving data from one side to the other, and selecting the specific element you wish to use from the large pool of ram. The energy cost of holding that data in memory and reading it from the memory cells, and the energy cost of doing the actual computation with low-precision FP are both trivial in comparison.

The systems are built this way because this is the most flexible architecture, and can be used for many different kinds of workloads. But the workload of a transformer in no way requires this flexibility. All the data is fairly local to the execution units that consume it. If you design a system as full PIM, where each ALU is associated and located with the small storage pool that contains only the elements used by that ALU, and then tile that out to implement the full model, you cut out most of the energy cost of moving data. The cost is you need much more silicon to implement a working system, but the benefit is not just improved energy-efficiency, but also token speed and silicon efficiency.

The industry is moving towards such designs, with many startups working towards it with different approaches, Nvidia's recent acquisition* of Groq, etc. There is a well-understood path towards ~1000x higher token speeds at ~1000x better energy efficiency, that requires no new innovations, just investment of money into specialization.

There are even more gains if you move the weights into ROM, but that would require you to specialize not just for a specific type of model, but also for a specific set of model weights, à la Taalas.

I find the AI discourse is diseased because on one side you get people breathlessly overestimating the current state of the industry and progress that's going to happen in the next ~2 years, and on the other side people assume that the technology as is is what it will always be and completely ignore that the industry is aware of and actively working towards many ways to improve hardware, it's just that complex leading edge silicon chips take years to take from idea to working products, and transformer inference was only very recently proven to be a market large enough to specialize for.

Comment by red-iron-pine 1 day ago

the first law of thermodynamics is you do not talk about thermodynamics

Comment by Oras 1 day ago

I might have a different take: I’m happy with this price per token, because then only those who get real value from it will use it.

There are so many useless cases such as people bragging about their token consumption that has no product and no value add, or those with OpenClaw doing useless automation that could be a Python script.

Comment by sankaritan 11 hours ago

Agree on the point that the shortage of tokens is causing somewhat more responsible behavior when it comes to AI use, which is not necessarily a bad thing. The scarcity is a bit sad for solo-devs, but there's also hope this may encourage less slop and more thoughtful use of the tools while society adjusts.

There was an interesting discussion with the creator of PI about how, even if LLMs produce fewer errors than humans, they produce them 10x faster, and issues can compound a lot faster too. Introducing intentional breaks, even if by necessity, can help with that, as can not taking shortcuts by throwing millions of tokens at any problem.

Comment by mullingitover 1 day ago

I'm guessing it's going to be an absolute banger of a month for forge[1] and the like.

[1] https://github.com/antoinezambelli/forge

Comment by xvxvx 1 day ago

The linked Reddit thread is quite hilarious. Earlier this year my company hired a new CEO and his first company address was solely to tell everyone to use AI or they’d lose their job and become unemployable in general.

I knew right there and then that he was a moron. There’s something about American companies where the best and brightest rarely show up in senior management. It seems to be populated by some weird class of golf playing NPCs that figured out how to game the system and bring all their cult members along for the ride.

My own company spent 2+ years enforcing extreme austerity, to the point of firing the very people who built everything, only to run wild with AI spending and seeing little results from it.

Surely, out there in the wilderness, there is a company staffed by intelligent, skilled people. Right?

Comment by cultofmetatron 1 day ago

> Surely, out there in the wilderness, there is a company staffed by intelligent, skilled people. Right?

of course there are but you don't hear about them.

Comment by lifestyleguru 1 day ago

Intelligent skilled people had been ghosted for so long that they don't bother applying anymore. Now the economy is just tree shaking, watching who will fall down. Personally I'm still irritated by the blockchain bubble and haven't even noticed when AI made me unemployable. Once in an airplane I overheard two kids from two different countries. One's job was to figure out where AI can be used, other's job was to figure out where AI can be used.

Comment by npodbielski 1 day ago

For a long time this worked all right: get to know the right people, wipe some asses, lick some others, play some golf and be a sociopath. But right now that does not cut it anymore. You have to be either smart, skilled, or know your business and IT well enough to know how, to what extent, or whether at all you can use so-called AI in your company. People like the ones you described are entirely out of their league.

Comment by vrganj 1 day ago

I beg to differ. Look at our overlords today.

Musk. Zuck. Bezos.

All three are buddying up with government officials, all three routinely embarrass themselves when they try to talk shop.

Only difference is they're much more socially awkward and less superficially charming than the stereotype would suggest.

Comment by npodbielski 16 hours ago

I was talking more about middle management. Some smaller CEOs like the parent comment suggested. Those people, as you said, are not our overlords, because they lack intelligence and technical skills.

Comment by aianus 1 day ago

If you genuinely refuse to use AI for any part of your job in 2026 you are the moron.

Of course, you can go too far in the other direction but that's not what your comment is describing.

Comment by teejmya 1 day ago

Found one of the cult members.

Comment by operatingthetan 1 day ago

Even on the consumer side AI providers are enshittifying the plans. Everyone saw this coming three-plus months ago.

The corporate side seems to be well... stupid? Execs asking their people to burn tokens do not understand the politics and cadence of business. Corporations do not actually demand more work to be completed in the way we traditionally think. Creating a lot of stuff in a corporation tends to naturally banish most of it to the void because that stuff requires other people to exist and engage with it in order to use it, deploy it, get customers using it, etc. AI does not take up that slack in the way that we are being told because it lacks agency. For most people in corporations the problem is not that they can't do their work, their real jobs are mostly being political nodes in a vast system. There is no solution on the table to change that at all.

Comment by somewhereoutth 1 day ago

Yes. As makers we tend to assume that the more that is made the better, and that simply by having lots of (shiny!) stuff we will get paid/honored/favored etc, whereas in fact often this stuff becomes someone's problem somewhere.

Comment by captainbland 1 day ago

Probably the cost model for LLM providers for consumers will be somewhat subsidised by providers basically linking up extremely specific profiles about users and using these to sell products directly in an agentic pipeline which includes agentic commerce. Maybe it's less one click purchases and more one prompt purchases. Of course this stands to be pretty bad for consumers in a lot of ways (deeply invasive marketing, possibly being missold products).

Of course the question remains, who is supposed to be buying products through this system if AI systems continue to displace jobs?

Comment by raffael_de 1 day ago

I think this is bigger news than many here care to admit. Most companies' AI "strategy" is very naive and depends on subscription-based cost calculations. Token-based billing will force those companies to actually think deeper about costs versus benefits. And that's going to be a rude awakening. Other AI providers will follow suit sooner or later. Most probably already did, by means of opaque compute cuts and token throttling.

Comment by rsolva 1 day ago

This might help Mistral sell more on-prem solutions. Not only do you get to keep your data, it might make more financial sense too.

Comment by somesortofthing 1 day ago

it'd be really funny if we got the RSI -> ASI world, all human labor became worthless, etc., but everyone with any money in the labs also lost their shirts because OSS is maximally good for most inference anyway.

Comment by hanzeweiasa 1 day ago

The token consumption concern is real but I think the framing misses a key trend: specialized models for specific domains.

In legal tech, we run domain-specific models for contract review that use 90% fewer tokens than general-purpose LLMs because they understand legal document structure natively. The token cost per document dropped from dollars to cents.

The real "tokenpocalypse" is for use cases that try to do everything with one general model. As the ecosystem matures toward specialized tools (similar to how we got specialized IDEs for different programming languages), token efficiency improves dramatically.

The analogy holds: general-purpose models are like Swiss Army knives — useful but inefficient. Domain-specific models are like proper tools — more expensive upfront but vastly more efficient for their domain.

Comment by lukas221 1 day ago

it's simple, how many dollars you get out for every dollar put into tokens

as Jensen said, get ready for $1000 per mil token

those for which this price makes sense will push out those for which it doesn't - to lower models or to local models

but those who want to run local models need to compete for hardware with the data centers, which have strong scale effects thus will always be able to out price local hardware allocations - can already be seen now as hardware makers get out of retail business

Comment by anonzzzies 1 day ago

But that will tank literally all AI companies immediately as, sure, some will pay it, but by far most won't. Anthropic will be gone in 1 day, so will OpenAI.

Comment by lukas221 1 day ago

you allocate tokens from top down - first exclusivity deals - Citadel pays $10 bil to get exclusivity access to GPT-6 for 3 months before anyone else, then you price it $1000/mil, then whatever compute is not used you sell GPT-5.9 at $500/mil...

Comment by elictronic 1 day ago

AI companies are speed running the old Cable company model.

Hoping your customer base is so old they forget to cancel the subscription might not work so well this time. “Popcorn eating ensues”

Comment by picofarad 1 day ago

[dead]

Comment by owebmaster 1 day ago

You are hallucinating, my friend.

Comment by yfontana 1 day ago

Useful context for this is that token usage keeps rising at an exponential pace. I mean, we don't have numbers for the big labs, but OpenRouter's numbers are quite telling (can't post link because corporate decided to block all "non-validated AI tools"), and I think they're probably representative of the global trend. +500% year to date, +50% over the month of May alone. It's unsurprising that providers are struggling to find and pay for the compute.
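The two growth figures can be sanity-checked against each other. A minimal sketch, assuming "year to date" spans five full months (January through May) and that growth compounds at a constant monthly rate. Both are assumptions, not something stated above.

```python
def monthly_growth_factor(total_factor: float, months: int) -> float:
    """Constant monthly multiplier that compounds to total_factor over `months`."""
    return total_factor ** (1.0 / months)


ytd_factor = 1.0 + 5.0  # +500% means usage is 6x its starting level
months_elapsed = 5      # assumption: January through May

monthly = monthly_growth_factor(ytd_factor, months_elapsed)
print(f"implied average monthly growth: {(monthly - 1) * 100:.1f}%")

# For comparison: what +50% per month would compound to if held for a full year.
yearly = 1.5 ** 12
print(f"+50% per month for 12 months: {yearly:.0f}x")
```

Under those assumptions the year-to-date figure implies roughly 43% average monthly growth, in the same range as the +50% quoted for May. Sustained for a year, +50% per month would compound to about 130x.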

Comment by Gigachad 1 day ago

A lot of it feels very wasteful currently. The providers are giving out incredibly subsidised services so consumers are consuming incredible amounts. Once the prices go up to cover the costs people will re evaluate what’s actually generating value and what was just waste.

Comment by prodigycorp 1 day ago

using openrouter leaderboards is no good. being on top of the board is marketing, so some labs are gaming that number. all marketing spend.

Comment by dude250711 1 day ago

Isn't it weird not to see a wave of C-level firings for making such a basic mistake?

Comment by worldsayshi 1 day ago

I see at least two potential positives in this:

- The frontier AI companies have realized they won't be able to count on gaining ground and earning more in the future through sheer moat. They have to start earning right now.

- The playing field on the market got a whole lot more even as a result. Now everyone is competing on cost and quality - while there is still a lot of competition. AI suppliers can't easily get away with subsidizing their own product and enshittify later.

I might be missing something obvious here? It feels to me that if the frontier AI companies thought they could gain a lot more moat they wouldn't raise their prices this much this early? And their current moats/head start doesn't seem insurmountable?

Comment by mrweasel 1 day ago

The idea probably was to pour billions into technologies powering these LLMs, and gain a moat. It then turns out that this isn't as hard a problem as expected, it's just very expensive. So as long as you have money, you can be an AI company, the money is the moat (unless you take a shortcut, like DeepSeek) and money is running out.

I don't think you're missing anything, but I am surprised that the forces behind the AI companies did. They do need to start making money, but I don't think anyone has a plan as to how they are going to do this. As for enshittification, that was always on the table for the free tier, it was also going to be the drug deal strategy, where the first hit is free.

The cost of AI is still too high, datacenters aren't being completed, the hardware is too expensive, electricity is too expensive, the technology is good, but requires hand-holding. We're going to see AI being deployed more sparingly and in a more targeted way, so the cost is justified.

Comment by lenkite 1 day ago

> They do need to start making money, but I don't think anyone has a plan as to how they are going to do this.

Doesn't this just mean price increase ? What is not clear is how much the price needs to increase for AI companies to break even some time. 3x increase ? 10x increase ? Even more ? No one seems willing to give a clear number.

Comment by mrweasel 1 day ago

You can only increase the price so much. With every price increase you're going to lose customers, which could lead to further price increases.

I'm not entirely convinced that the AI companies can raise prices and keep enough of their customer base to make their current strategy commercially viable.

They could also lower their production cost, but that runs counter to building/buying new datacenter capacity. Realistically I think they need to look for applications where cheaper models are just as good and niches where the ROI on AI is more clear.

Comment by worldsayshi 1 day ago

Realistically they can't go that much above the actual cost for inference since the customer can always switch to self hosted or inference only providers. Or their models have to be significantly better than the open source models for the foreseeable future. They will never be able to charge much more for their lower tier models.

Comment by Yizahi 1 day ago

They already do it. Anthropic hiked prices several times, Google just hiked prices in April/May by a lot (by lowering limits in plans a lot). It will continue regularly. I remember when the 200$/m plan was first unveiled there were screams about the insanity of it all. Today, if anyone complains about their own LLM experience, the first question in the comments would be "are you "at least" paying for the 200$/m plan, for the poors?" like that is the baseline now and 1000$/m is a serious consideration. And looking at Google, they are slowly shifting whole features upwards in tiers. As does Anthropic.

Comment by owebmaster 1 day ago

> Doesn't this just mean price increase ?

Have you heard about DeepSeek? In a world where it (and other Chinese open models) didn't exist, OpenAI and Anthropic would be profitable already

Comment by mrweasel 1 day ago

> In a world where it (and other Chinese open models) didn't exist, OpenAI and Anthropic would be profitable already

How so? The existence of e.g. DeepSeek doesn't lower the cost for OpenAI. OpenAI have almost a billion users, or so they claim. Adding even another billion users isn't going to help them, unless they can keep cost under control.

Comment by operatingthetan 1 day ago

>AI can't easily get away with subsidizing their own product and enshittify later.

They have to do it in reverse order which seems to be maybe impossible. I contend that SOTA models are still quite bad at what their companies claim them to be good at. They remain confidently wrong more often than they should be. The public also is tired of 'slop' and will continue to push back on it.

Comment by lukas221 1 day ago

the moat always has been and will be compute

and we are fast approaching limits which will be hard to overcome - electricity, chips

Comment by worldsayshi 1 day ago

Don't you think that we will eventually get more specialized hardware that will greatly improve efficiency? Running neural networks on GPUs seems quite wasteful?

Comment by lukas221 1 day ago

we will, but data centers will compete for that hardware with retail consumers

Comment by worldsayshi 1 day ago

Such competition will only turn into a moat if the hardware suppliers never manage (or choose) to fully adapt to increased demand?

But then the real moat should be on the hardware side anyway?

Comment by mortar 1 day ago

https://archive.is/bUjP4

Comment by yuppiepuppie 1 day ago

Betteridge's law [0] - NO

Anecdotal experience - my coworkers will use the "max-think" and the most expensive model on every change they do with Claude, pumping out 100k's of tokens just because they can (and brag about hitting the limits).

I suspect this kind of behaviour will need to change in the very near future.

[0] - https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...

Comment by GreenSalem 1 day ago

We need cheap tokens from China...

Comment by red-iron-pine 1 day ago

they're not gonna stay cheap, mate. there is a price to pay there, too.

Comment by cultofmetatron 1 day ago

deepseek pro and flash are dirt cheap.

kimi-k2.6 can do a pretty damn good job with vision for optimizing ui design workloads in a loop. not cheap but significantly cheaper than anthropic.

mimo 3 is just pretty damn good when you need a high end reasoning model - also relatively affordable.

I was able to run gemma and do some coding locally on a 32 gb machine. it was slow as molasses but the fact that it worked at all on a local machine that wasn't designed around AI workloads is great.

It's only a tokenpocalypse if you rely on these closed and frankly overpriced american models. is opus better than kimi-k2.6? arguably yes but not 16 times better from what I've been seeing.

Comment by rvz 1 day ago

> “Can these AI labs collapse that cost [and] progress the tech enough in a way that it eventually meets in the middle with customers’ appetite for spending?” Sean wondered.

It depends where you buy the tokens from. Jevons paradox exists in China and not in the US for now.

> In just a few months, companies became obsessed with “tokenmaxxxing,” then turned against it due to the high costs.

Casinos (in the US) tell customers to spend more on tokens and introduce free spins, discounts and limit resets at peak hours. Then they introduce a new slot machine that promises to give gamblers better odds, but is instead more expensive to use.

The ones in China did the opposite and made their discount on tokens permanent.

All this 'tokenmaxxing' was an outright scam. Now the AI companies want you 'tokenmaxxing' your agents on loops as the token prices increase.

Comment by ReptileMan 1 day ago

Why does everyone insist that API inference is sold at cost and not at 1000% margin? This genre of article is based on that assumption.

Comment by vrganj 1 day ago

How would one position one's portfolio if one was worried things are about to start crashing hard?

Comment by simianwords 1 day ago

I think it is easy to make some ragebait doomer articles with eye catching headlines. There are a lot of people who are ready to eat up AI catastrophism because something about apocalyptic predictions and catastrophism seems to attract certain kind of doomer-pilled people.

Here are my concrete predictions

1. Token costs will come down and performance will go up

2. Everyone will spend even more on LLMs not less - the article points at small blips but if anyone thinks it will go down from now, you are mistaken

3. AI Companies will be profitable

If anyone wants to counter bet on me, please go ahead.

Comment by Quarrel 1 day ago

> 3. AI Companies will be profitable

but many of the current crop will never return money to investors.

I largely agree with you, but the huge investments currently being made will be very hard to get a return on. Token costs will come down, performance will go up, and you want to be in the business of selling the picks & shovels, not doing the mining.

Which is of course why nvidia, google & TSMC are in pretty good positions, but even their valuations have some bubble in them.

Comment by simianwords 1 day ago

Respectfully, do you want a bet that AI companies like OpenAI and Anthropic can't become profitable?

I mean this is a sort of conspiracy theory and I genuinely don't know why people think AI is particularly hard to get money back from?

> I largely agree with you, but the huge investments currently being made will be very hard to get a return on.

Why do you find it huge? Anthropic went from $1B to $44B revenue in a few months and this is unprecedented.

1. The margins on inference are huge

2. There is a genuine moat because AI models have personalities, strengths and weaknesses, so they are definitely not fungible

I think a lot of handwaving goes on but it comes in the form of some latent concern that AI might just not be profitable. But the reality is that it will be.

None of the "selling picks and shovels" analogies will stick.

Comment by somewhereoutth 1 day ago

1. Both costs (going down) and performance (going up) likely are approaching, or will shortly approach, asymptotic limits.

2. CFOs are seeing the token spend on the bottom line, and are not happy. CFOs don't care about 'the next big thing', they just count beans, and they are coming up short. CFOs tend to be the grown-ups in the C-suite, they will shut things down if they need to.

3. See 1. and 2.

Comment by zombot 1 day ago

As an exception to Betteridge's law of headlines: Yes.

Comment by Natalia724 1 day ago

One thing that seems under-discussed is how quickly token cost changes product behavior once people move from demos to recurring workflows.

When the interaction is exploratory, the marginal cost feels invisible: ask again, summarize again, try another agent. In a business workflow, the same pattern becomes a metering problem. You have to decide which parts actually need a frontier model, which can use a smaller/local model, and which should not be generated at all.

That probably pushes AI products away from "chat with everything" and toward much narrower tools with explicit ROI: less open-ended generation, more constrained pipelines, caching, evaluation, and human review at the points where mistakes are expensive.
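The metering decision described above can be sketched as a small router plus a cache. This is only an illustration: `Tier`, `Step`, `choose_tier`, `CachedGenerator`, and the $100 threshold are all hypothetical names and numbers invented for the sketch, and the "model" is a stand-in function so it runs offline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Tier(Enum):
    NONE = "do not generate"
    SMALL = "smaller/local model"
    FRONTIER = "frontier model"


@dataclass(frozen=True)
class Step:
    name: str
    scriptable: bool         # True if plain code can do the job without a model
    mistake_cost_usd: float  # rough, hand-estimated cost of a wrong answer


def choose_tier(step: Step, frontier_threshold_usd: float = 100.0) -> Tier:
    """Pick the cheapest tier that fits; the threshold is an arbitrary placeholder."""
    if step.scriptable:
        return Tier.NONE
    if step.mistake_cost_usd >= frontier_threshold_usd:
        return Tier.FRONTIER
    return Tier.SMALL


class CachedGenerator:
    """Wraps any prompt -> text callable so repeated prompts are paid for once."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.paid_calls = 0

    def __call__(self, prompt: str) -> str:
        if prompt not in self._cache:
            self.paid_calls += 1
            self._cache[prompt] = self._generate(prompt)
        return self._cache[prompt]


steps = [
    Step("rename files by date", scriptable=True, mistake_cost_usd=0.0),
    Step("tag support tickets", scriptable=False, mistake_cost_usd=2.0),
    Step("draft contract clause", scriptable=False, mistake_cost_usd=5000.0),
]
for step in steps:
    print(f"{step.name}: {choose_tier(step).value}")

# Stand-in for a real model call; upper-casing keeps the sketch runnable offline.
generate = CachedGenerator(lambda prompt: prompt.upper())
generate("summarize ticket 1")
generate("summarize ticket 1")  # served from the cache, no second paid call
print(f"paid calls: {generate.paid_calls}")
```

The high-mistake-cost steps are also the natural places to put the human review mentioned above. In a real pipeline the threshold would come from measured error rates rather than a guess.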

Comment by NurcanPYSBG 1 day ago

[flagged]

Comment by josefritzishere 1 day ago

Other than being very expensive, unprofitable, environmentally destructive, and projected to cause mass unemployment... what's wrong with AI?
