Tell HN: I cut Claude API costs from $70/month to pennies
Posted by ok_orco 3 days ago
The first time I pulled usage costs after running Chatter.Plus - a tool I'm building that aggregates community feedback from Discord/GitHub/forums - for a day, I saw $2.30. Did the math. $70/month. $840/year. For one instance. Felt sick.
I'd done napkin math beforehand, so I knew it was probably a bug, but still. Turns out it was only partially a bug. The rest was me needing to rethink how I built this thing. Spent the next couple of days ripping it apart. Making tweaks, testing with live data, checking results, trying again. What I found: I was calling the API too often and not optimizing what I sent or got back.
Here's what moved the needle, roughly big to small (besides the bug, which was costing me a buck a day on its own):
- Dropped Claude Sonnet entirely - tested both models on the same data, Haiku actually performed better at a third of the cost
- Started batching everything - hourly calls were a money fire
- Filter before the AI - "lol" and "thanks" make up a lot of online chatter, and I was paying an AI to tell me that's not feedback. I do still process agreements like "+1" and "me too". (Rough sketch of the filter after this list.)
- Shorter outputs - "H/M/L" instead of "high/medium/low", 40-char title recommendation
- Strip code snippets before processing - they mostly just reiterate the issue and bloat the call
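For the curious, the pre-filter is roughly this. A minimal sketch, not the production code - the regexes, the prepare() name, and the length threshold are illustrative:

    import re

    # Low-signal chatter that never needs to hit the API.
    NOISE = re.compile(r"^(lol|lmao|nice|cool|thanks|thank you)\W*$", re.I)

    # Agreements still carry signal (they bump an issue's count), so keep them.
    AGREEMENT = re.compile(r"^(\+1|me too|same|same here)\W*$", re.I)

    # Fenced code blocks mostly restate the issue in expensive tokens.
    CODE_FENCE = re.compile(r"```.*?```", re.S)

    def prepare(message: str) -> str | None:
        """Return a trimmed message worth sending to the model, or None to skip."""
        text = CODE_FENCE.sub("[code omitted]", message).strip()
        if AGREEMENT.match(text):
            return text  # cheap to process and still useful signal
        if NOISE.match(text) or len(text) < 5:
            return None  # don't pay the model to classify noise
        return text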
End of the week: pennies a day. Same quality.
I'm not building a VC-backed app that can run at a loss for years. I'm unemployed, trying to build something that might also pay rent. The math has to work from day one.
The upside: these savings let me 3x my pricing tier limits and add intermittent quality checks. Headroom I wouldn't have had otherwise.
Happy to answer questions.
Comments
Comment by LTL_FTC 1 day ago
My old threadripper pro was seeing about 15 tps, which was quite acceptable for the background tasks I was running.
Comment by 44za12 3 days ago
I map them by task type:
- Tiny (<3B): Gemma 3 1B (could try 4B as well), Phi-4-mini - good for classification
- Small (8B-17B): Qwen 3 8B, Llama 4 Scout - good for RAG/extraction
- Frontier: GPT-5, Llama 4 Maverick, GLM, Kimi
Is that what you meant?
Comment by ok_orco 3 days ago
Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.
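Since a few people asked about the batching: with the anthropic Python SDK it looks roughly like this. A sketch, not my exact code - pending_messages and handle() are placeholders, and double-check the batch calls against the current docs:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Queue up filtered messages for a while, then submit one batch.
    # Batches process asynchronously at half the per-token price.
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": f"feedback-{i}",
                "params": {
                    "model": "claude-3-5-haiku-latest",
                    "max_tokens": 64,  # short outputs: "H/M/L" plus a short title
                    "messages": [{"role": "user", "content": text}],
                },
            }
            for i, text in enumerate(pending_messages)  # pending_messages: your queue
        ]
    )

    # Later, once the batch's processing_status is "ended":
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            handle(entry.custom_id, entry.result.message)  # handle() is yours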