Cloudflare's AI Platform: an inference layer designed for agents
Posted by nikitoci 1 day ago
Comments
Comment by mips_avatar 19 hours ago
Comment by bryden_cruz 8 hours ago
Comment by handfuloflight 7 hours ago
Comment by jonfromsf 12 hours ago
Comment by mips_avatar 11 hours ago
Comment by ascorbic 4 hours ago
Disclaimer: I work at Cloudflare, but not on this.
Comment by vladgur 18 hours ago
Comment by mips_avatar 17 hours ago
Comment by embedding-shape 16 minutes ago
How fast is "super fast" exactly, and with what runtime+model+quant specifically? Curious to see how 4x 3090s compare to 1x Pro 6000. You could probably put together 4x 3090s for a fraction of the cost of the Pro 6000, but every time I've seen the tok/s in/out numbers for multi-GPU setups, my heart drops a little.
Comment by whereistejas 1 day ago
Comment by brikym 8 hours ago
Comment by kentonv 53 minutes ago
Also note that as of recently, the concurrent limit applies only up to the point that response headers are received, not during body streaming.
Comment by vjerancrnjak 3 hours ago
Comment by ncrmro 5 hours ago
Comment by mikeocool 22 hours ago
But in practice, it's basically impossible to use it that way in conjunction with Workers, since you have to bind every database you want to use to the Worker, and binding a new database requires redeploying the Worker.
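The constraint above shows up in the Worker's config file: every D1 database the Worker touches has to be declared ahead of time, and a new binding only takes effect on redeploy. A minimal sketch of such a binding (the name and id here are placeholders):

```toml
# wrangler.toml -- each D1 database must be declared up front;
# adding another one means editing this file and redeploying the Worker.
[[d1_databases]]
binding = "DB"                 # exposed to the Worker as env.DB
database_name = "my-database"  # placeholder name
database_id = "xxxx-xxxx"      # placeholder id
```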
Comment by AgentME 20 hours ago
Comment by eis 8 hours ago
Comment by eis 21 hours ago
But even without the network issues that have plagued it, I would hesitate to build anything for production on it, because it can't even do transactions, and the product manager for D1 has openly stated they won't implement them [0]. Your only way to ensure data consistency is to use a Durable Object, which comes with its own costs and tradeoffs.
https://github.com/cloudflare/workers-sdk/issues/2733#issuec...
The basic idea of D1 is great. I just don't trust the implementation.
For a hobby project it's a neat product for sure.
Comment by ignoramous 17 hours ago
How did you work around this problem? As in, how do you monitor for hung queries and cancel them?
> D1 reliability has been bad in our experience.
What about reads? We use D1 in prod & our traffic pattern may not be similar to yours (our workload is async queue-driven & so retries last in order of weeks), nor have we really observed D1 erroring out for extended periods or frequently.
Comment by eis 9 hours ago
You just wrap your DB queries in your own timeout logic. You can then continue your business logic, but you can't truly cancel the query, because the communication layer for it is stuck and you can't kill it via a new connection. Your only choice is to abandon that query. Sometimes we could retry and it would immediately succeed, suggesting that the original query probably hit something like packet loss that wasn't handled properly by CF. That's easy when it's a read, but with writes it gets complicated fast and you have to ensure your writes are idempotent. And since they don't support transactions, it's even more complex.
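The timeout wrapper described above can be sketched as a small helper (the `withTimeout` name is hypothetical; the D1 query in the usage note is illustrative). Note that rejecting the race does not cancel the underlying query, it only lets your code abandon it and move on:

```javascript
// Race a query promise against a timer. A hung D1 query can't actually be
// cancelled -- on timeout we reject and abandon it, so any write it might
// still perform needs to be idempotent.
function withTimeout(promise, ms, label = "query") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it doesn't keep the runtime alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look something like `await withTimeout(env.DB.prepare("SELECT 123").all(), 2000)`, with a retry (for idempotent queries) in the catch branch.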
Aphyr would have a field day with D1 I'd imagine.
> What about reads? We use D1 in prod & our traffic pattern may not be similar to yours (our workload is async queue-driven & so retries last in order of weeks), nor have we really observed D1 erroring out for extended periods or frequently.
We have reads and writes which most of the time are latency-sensitive (direct user feedback). A user interaction can usually involve 3-5 queries, and they might need to run in sequence. When queries take 500ms+ the system starts to feel sluggish. When they take 2-3s it's very frustrating. The high latencies happened for both reads and writes; you can do a simple "SELECT 123" and it would hang. You could even reproduce that from the Cloudflare dashboard when it's in this degraded state.
From the comments of others who had similar issues, I think it heavily depends on the CF locations or D1 hosts. Most people are probably lucky and don't get one of the faulty D1 servers. But there are a few dozen people who were not so lucky; you can find them complaining on GitHub, on the CF forum, etc., but simply going unheard. And you can find these complaints going back years.
This long timeframe without fixes to their network stack (networking is CF's bread and butter!), the refusal to implement transactions, the silence in their forum in response to cries for help, the absurdly low 10GB limit for databases... it all just adds up. We made the decision not to implement any new product on D1 and to just continue using proper databases. It's a shame, because workers + a close-by read replica could be absolutely great for latency. Paradoxically, the outcome was the opposite.
Comment by kylehotchkiss 23 hours ago
Comment by Normal_gaussian 16 hours ago
No-downtime snapshots would be best, but I'd be quite happy with a blocking backup on a set schedule that can be configured from the GUI, the CLI, or a config file. It's a huge PITA having to play 'trust me bro' with clients and their admins using custom workers and backups.
I currently stream a D1 dump -> worker (encrypt w/ key wrapping) -> R2 on a schedule, then have a container spin up once a day and create changesets from the dumps. An external tool pulls the dumps and changesets.
Comment by rs_rs_rs_rs_rs 21 hours ago
Comment by jillesvangurp 7 hours ago
Comment by dpark 20 hours ago
Comment by BoorishBears 21 hours ago
Cloudflare seems to be building for lock-in and I don't love it. I especially don't understand how you build an OpenRouter and only have bindings for your custom runtime at launch.
Comment by switz 21 hours ago
Comment by eis 8 hours ago
Comment by sf_tristanb 1 hour ago
Comment by minglu 14 minutes ago
Comment by james2doyle 21 hours ago
Yes, you can see the same "hosted" ones on there, but when you look at the models endpoint, there are far fewer options under the "workers-ai/*" namespace. Is that intentional?
Comment by james2doyle 21 hours ago
Comment by samjs 20 hours ago
Thanks for the feedback, and good catch. Looks like that endpoint is pulling from a slightly out of date data source. The docs/dashboard currently are the best resources for the full catalog, but we'll update that API to match.
Comment by VikRubenfeld 1 hour ago
Comment by godzillabrennus 1 hour ago
Comment by kol3x 2 hours ago
Comment by strimoza 5 hours ago
Comment by __jonas 5 hours ago
Comment by dalenw 45 minutes ago
Comment by RITESH1985 18 hours ago
Comment by datadrivenangel 21 hours ago
Comment by hemangjoshi37a 6 hours ago
Comment by lateral_cloud 5 hours ago
Comment by TheServitor 7 hours ago
Comment by bm-rf 1 day ago
[1] https://developers.cloudflare.com/ai/models/
[2] https://developers.cloudflare.com/ai-gateway/features/unifie...
Comment by samjs 23 hours ago
We'll be adding prices to the docs and the model catalog in the dashboard shortly.
In short: currently the pricing matches whatever the provider charges. You can buy unified billing credits [1], which carry a small processing fee.
> Finally, would be great if this could return OpenAI AND Anthropic style completions.
Agreed! This will be coming shortly. Currently we'll match the provider themselves, but we plan to make it possible to specify an API format when using LLMs.
[1]: https://developers.cloudflare.com/ai-gateway/features/unifie...
Comment by agentifysh 21 hours ago
Comment by yoavm 1 day ago
Comment by bm-rf 1 day ago
Comment by ashleypeacock 23 hours ago
Comment by Invictus0 3 hours ago
Comment by minglu 5 minutes ago
Comment by messh 19 hours ago
Comment by pizzly 19 hours ago
Comment by mips_avatar 19 hours ago
Comment by pprotas 1 day ago
Comment by yoavm 1 day ago
Comment by indigodaddy 1 day ago
Comment by ramesh31 1 day ago
Comment by Jack5500 1 day ago
Comment by pjmlp 22 hours ago
Comment by throwpoaster 1 day ago
Comment by kylehotchkiss 23 hours ago
Comment by neya 1 day ago
Comment by kinnth 16 hours ago
I love everything about openrouter. So kinda a fan boy.
Comment by 6thbit 1 day ago
Rant aside, they are well positioned network-wise to offer this service. I wonder about their pricing and potential markup on top of token usage?
I presume they won't let you "manage all your AI spend in one place" for free.
Comment by koolba 1 day ago
Of course they will. In return they get to control who they're routing requests to. I wouldn't be surprised if this turns into the LLM equivalent of "paying for order flow".
Comment by 6thbit 1 day ago
Comment by nhecker 1 day ago
Comment by 6thbit 18 hours ago
Comment by nubg 22 hours ago
Comment by wahnfrieden 1 day ago
edit: Why downvote? It's correct, and it's a risk that competitors handle better, including for their CDN products (compared to Bunny CDN). Maybe you are just used to the risk and haven't felt the burn yourself yet. Or you have the mistaken notion that there is no price at which temporary downtime is worthwhile to avoid paying.
Comment by rl3 15 hours ago
Speaking of:
https://news.ycombinator.com/item?id=47787042
I really hope that person gets a resolution from Cloudflare that doesn't financially ruin them.
Comment by james2doyle 21 hours ago
Comment by mbtrucks 1 day ago
Comment by james2doyle 21 hours ago
Comment by ernsheong 1 day ago
Comment by charcircuit 1 day ago
Comment by PUSH_AX 1 day ago
Comment by reconnecting 18 hours ago
Comment by mbtrucks 1 day ago
Comment by tln 21 hours ago
Comment by stult 1 day ago
I immediately pulled all my sites off of Cloudflare and I will never use that godawful nightmare of a company for anything ever again. If they can't even host a generic help bot without screwing it up that badly, why would I ever use them for anything at all, never mind an AI platform?
Comment by allthetime 21 hours ago
Comment by stult 2 hours ago
There's not a lot of UI surface area that a user can touch that can even theoretically affect the NS detection process because that process happens in CF entirely "under the hood" as it were. You more or less just have to wait for CF to detect the DNS changes. That said, I tried everything I could think of to try to trigger their detector to reset, including deleting and recreating the site from scratch in CF. After another few days of combing through CF docs and forums, and after changing and reverting every setting I possibly could, I concluded there was no workaround available to me as a user and tried to reach CF as I described above.
Having done this many times before, I am quite certain that I set the nameservers correctly. I even had two other very experienced engineers review what I had done to make sure I wasn't falling victim to some mental blind spot that prevented me from recognizing the problem. Every SWE has had the experience of spending an enormous amount of time debugging only to realize they mistyped a magic string somewhere that their brain just straight up refused to recognize. Unfortunately, that was not the case here: the other engineers saw what I saw and were also unable to fix the problem.
I was subsequently able to set DNS up on Vercel without any trouble at all. Bottom line, the issue was almost certainly a bug in Cloudflare's code, which indicates a code quality problem to me. Combine that with the reckless incompetence of automating customer support with a chatbot that lacks accurate information about their own processes and basic contact information, never mind a reasonable escape hatch to actual human support in unusual cases (even for a paying customer), and I simply no longer trust them to deliver a reasonable quality product.
They didn't even maintain any mechanism for reporting bugs to them, which is just insanity because it means there is no way to inform them even in extreme cases like a critical security bug. I get that they want to cut costs by reducing the employees needed to deal with customer service complaints, but it costs practically nothing to have a little feedback form somewhere, especially now that an LLM can handle most user feedback processing. Or failing that, a functioning support email address or phone number. But they can't even clear that incredibly low bar.
All of these issues could have been avoided with a very limited application of ordinary common sense and foresight. Whoever programmed their chatbot did not take the time to set up a decent RAG system with up-to-date information about their support processes and how to contact them, even though that is an obvious requirement for a tech support chatbot. They should also have recognized the business risks posed by exposing their customers to a system which lacks any escape hatches for outlier cases requiring actual human support, which risks alienating customers like me by forcing us to jump through Kafkaesque bureaucratic hoops just to get simple problems addressed, and--even worse--making it impossible to resolve such problems after jumping through all their hoops. The team implementing this chatbot didn't even think to include a contact form as a last resort method for reporting problems to them when the chatbot gets in over its head.
Most people hate this kind of LLM-provided customer support without any human escalation options, because the bots often end up uselessly looping through some debugging steps that simply do not work for the customer's specific issue for whatever reason, which feels like slamming your head against a wall repeatedly. It's a truly infuriating user experience and is practically guaranteed to destroy the business's public goodwill and reputation.
All of which means they are gutting their customer service department following some process that lacks access to these very basic insights, which screams mismanagement to me.
I'm not exactly a huge customer, but between my personal and business sites, I plowed $45k into CF last year, and I will not spend another penny on them this year, or ever again. Maybe that's not huge spend in the grand scheme of the tech industry, but at a minimum that amount of money should entitle me to some human-provided support. My annual spend alone could fund multiple offshored CSRs. If I'm spending enough money to buy a car, the least they can do is let me send them an email when I have a problem instead of throwing me to the wolves.
Ultimately, they have a much weaker moat now than at any point in the past, because LLMs make it so much easier to build out critical functionality in-house that previously would have been worth paying someone else to manage via a SaaS. And while I may not be a big enough customer for them to worry about in and of myself, I am also not the only person affected by these business practices. Every affected person increases the reputational harms suffered by Cloudflare, with another alienated customer like me bashing CF in posts like this or in conversations with their friends and colleagues in the industry. Those harms should be very concerning to CF's management because it is extremely difficult to recover lost goodwill.
Comment by redoh 6 hours ago
Comment by kantaro 16 hours ago
Comment by ZihangZ 15 hours ago