Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens
Posted by zdkaster 4 days ago
Hi HN, not sure if anyone would be interested, but I wanted to share a small tool I've been maintaining called 'lowfat' that helps me filter some of my verbose CLI output. It's a single binary that works as an agent hook or a shell wrapper, and it has a plugin system to customize filters per command.
The idea is pretty simple: agents don't need the full kubectl get -o yaml or any 10k-line dump to make decisions. So lowfat sits in between, strips the noise, and passes through what matters. Here's my real report after 2 months of personal use:
lowfat history --all
lowfat plugin candidates
─────────────────────────────────────────────────────────
#   command           runs  avg raw  cost    savings  source    status
1   kubectl get       101x  14.4K    1.5M    93.9%    plugin    good
2   grep              103x  13.5K    1.4M    96.2%    plugin    good
3   git diff          81x   995      80.6K   57.9%    built-in  good
4   kubectl           90x   485      43.6K   33.6%    plugin    good
5   docker            127x  5.5K     693.6K  96.1%    built-in  good
6   ls                489x  117      57.3K   56.2%    built-in  good
7   find              30x   16.5K    495.0K  95.5%    plugin    good
8   git show          63x   490      30.9K   38.0%    built-in  good
9   git               177x  368      65.2K   76.1%    built-in  good
10  git log           86x   556      47.8K   78.5%    built-in  good
11  kubectl logs      5x    3.6K     17.8K   43.0%    plugin    good
12  git status        86x   152      13.1K   58.0%    built-in  good
13  docker ps         20x   467      9.3K    52.8%    plugin    good
14  kubectl describe  6x    656      3.9K    1.2%     plugin    weak
15  docker images     9x    940      8.5K    61.8%    built-in  good
16  k get             2x    2.1K     4.2K    35.9%    plugin    good
17  terraform         10x   395      3.9K    32.1%    plugin    good
18  git commit        32x   77       2.5K    0.0%     built-in  weak
19  docker build      8x    487      3.9K    37.6%    built-in  good
20  docker compose    22x   979      21.5K   89.4%    built-in  good
total: 4.4M raw → 4.1M saved (91.8%)
My toolset above is kind of limited, but it works pretty well for my use case without any interruption. It helps me stay under my company's Bedrock usage limit, and I keep optimizing the savings as I go.
But why not the alternatives (https://github.com/zdk/lowfat#alternatives)? My answers:
- My goal is to keep the core lightweight but extensible via plugins, i.e. not to bundle every command into the installed binary, so that people own their output filters.
- It is customizable per use case via plugins or filter pipelines, since I am using my own toolset.
- It is customizable for non-public CLI tools; for example, some enterprises have internal CLI tools that the public has no access to.
- People should own their data, so the design is local-first, with no telemetry, ever.
- I love UNIX-style composable pipes, so lowfat-filter is implemented in this style.
- The aggressiveness of the filter is adjustable, so we can make sure we don't strip something the agent needs.
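To make the "composable pipes" and "adjustable aggressiveness" ideas concrete, here is a minimal Python sketch. It is not lowfat's actual plugin API: the filter names, the `LEVELS` table, and `apply_filter` are all hypothetical, chosen only to illustrate how small line filters could be chained and how a level could select a stricter chain.

```python
from typing import Callable, List

# A filter maps a list of lines to a list of lines, so filters chain like pipes.
Filter = Callable[[List[str]], List[str]]


def drop_blank(lines: List[str]) -> List[str]:
    """Remove empty or whitespace-only lines."""
    return [ln for ln in lines if ln.strip()]


def dedupe_consecutive(lines: List[str]) -> List[str]:
    """Collapse runs of identical adjacent lines into one."""
    out: List[str] = []
    for ln in lines:
        if not out or out[-1] != ln:
            out.append(ln)
    return out


def head(n: int) -> Filter:
    """Keep the first n lines and note how many were omitted."""
    def _head(lines: List[str]) -> List[str]:
        if len(lines) <= n:
            return lines
        return lines[:n] + [f"... ({len(lines) - n} more lines omitted)"]
    return _head


def pipeline(*filters: Filter) -> Filter:
    """Compose filters left to right, like `a | b | c` in a shell."""
    def run(lines: List[str]) -> List[str]:
        for f in filters:
            lines = f(lines)
        return lines
    return run


# Hypothetical aggressiveness levels: a higher level strips more.
LEVELS = {
    "light": pipeline(drop_blank),
    "normal": pipeline(drop_blank, dedupe_consecutive, head(50)),
    "aggressive": pipeline(drop_blank, dedupe_consecutive, head(10)),
}


def apply_filter(raw: str, level: str = "normal") -> str:
    """Run raw command output through the chain for the given level."""
    return "\n".join(LEVELS[level](raw.splitlines()))
```

Calling `apply_filter(output, "light")` only drops blank lines, while `"aggressive"` also deduplicates and keeps the first 10 lines, so a user worried about losing something the agent needs can start at the lightest level and tighten from there.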
GitHub: https://github.com/zdk/lowfat
Anyway, if anyone is interested, feedback and questions are welcome!
Thanks!
Comments
Comment by alex7o 4 days ago
Comment by onlyrealcuzzo 4 days ago
It's a massive red flag to me when you could get decent data to see if your thing actually works, and they don't even attempt to...
Have the LLM use your tool, run it on several of the coding benchmarks. If you're stingy, run it on the ones that don't cost much.
Otherwise, I'm going to assume it doesn't actually work. If it did - Claude, Antigravity, Codex, Pi, or some major player would bundle tools like this into the CLI / harness.
AFAIK, none of the major players do. That's a sign to me these don't work in general.
I've tried building some tools specific to bug fixing. Intelligently feeding context massively helps smaller models. But what I've found, surprisingly, is that a smaller, much better-focused context, even one that includes a lot of helpful data, has almost no impact on larger models compared to what they do by default.
You do save some tokens, though, which is what they're claiming - but not ~99%...
Comment by hansvm 4 days ago
None of the major players are incentivized to care about this, especially not over other opportunities. Why would you expect them to integrate it?
One of the biggest wins you can institute for your own codebase if you use agents is writing your own harness, by a huge margin. The defaults are fine, but you can do better.
Comment by Cpoll 3 days ago
Comment by onlyrealcuzzo 3 days ago
Why can I do better than Pi?
I don't want to build my own harness and deal with the bugs... I want to build my project...
My understanding is that Codex / Claude / Gemini subscriptions don't work with custom harnesses.
It's pretty hard to beat 5x more usage if you have the $200/mo subscription by using the API instead.
Comment by smallerize 3 days ago
Comment by hboon 3 days ago
Comment by Bnjoroge 3 days ago
Comment by hboon 2 days ago
Comment by Bnjoroge 5 hours ago
Comment by doix 4 days ago
Your suggestion to use coding benchmarks doesn't really capture the whole picture. I haven't seen a benchmark using kubectl.
> AFAIK, none of the major players do. That's a sign to me these don't work in general.
It's a lose/lose for major players. If it works well, it will lower their revenue. Also there's a high risk it'll significantly worsen results for some people, even if it improves results for others.
Comment by taude 4 days ago
Comment by no-name-here 4 days ago
VS Code launched it as a feature in their bundled AI functionality last month: https://code.visualstudio.com/updates/v1_121
Comment by onlyrealcuzzo 4 days ago
Defaults imply working...
Comment by irthomasthomas 4 days ago
Comment by unphased 3 days ago
There is definitely tons of value to extract from this line of thinking.
Comment by varispeed 4 days ago
Comment by jahala 4 days ago
Reducing tokens, and also turns, is quite worthless if the LLM doesn’t solve what you set it to do.
Comment by esafak 4 days ago
Comment by jahala 3 days ago
Comment by alex7o 3 days ago
Comment by onlyrealcuzzo 3 days ago
Unless something is like 25%+ more cost effective on Gemini for a task, I would not assume those savings are going to transfer to GPT.
If you need to run a test this expensive and slow for every release, hobbyists aren't going to do it.
And if you wanted to show any broad improvements to coding like they all claim, the costs would be in the thousands per release even for a single model.
And they almost certainly would not be eye popping.
If the models could be SUBSTANTIALLY better, Google and Anthropic and OpenAI wouldn't be finding that out from a hobbyist making wildly unscientific claims.
Comment by jahala 3 days ago
On the previous large benchmark run, I proved a 40-50% cost reduction per correct answer.
I'm not sure why the vendors aren't using token filtering/compression more in their tooling, but perhaps they don't mind users feeding them more data and using more data.
Comment by poelzi 3 days ago
Comment by zdkaster 4 days ago
Comment by zdkaster 3 days ago
Comment by giancarlostoro 4 days ago
Comment by jemmyw 4 days ago
I don't know about cost saving, but if it's keeping the context size down I've had a lot better results using subagents to keep a higher order conversation clean for longer.
Comment by lxn 4 days ago
Comment by exitb 4 days ago
Comment by mywittyname 4 days ago
Comment by threecheese 4 days ago
What would be useful:
- examples of text that can be filtered, and why that would be valuable
- a data flow diagram of runtime behavior, showing how filtering removes unnecessary context
Comment by zdkaster 4 days ago
Comment by mbreese 4 days ago
But the one thing I expected to see in the Readme was an example of: takes this tool run output: XXXXXX and converts it to: XX for a savings of 40% of tokens.
This looks like a nice (and useful) project, so thanks for sharing!
Comment by naorsabag 1 day ago
Comment by alkh 4 days ago
Comment by zdkaster 4 days ago
Comment by threecheese 2 days ago
Comment by wood_spirit 4 days ago
Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt. That way, the LLM will be able to query and read more using built-in tools without reissuing the big tool call.
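The tip above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `truncate_with_pointer`, the 40-line default, and the wording of the marker line are all made up here, and the temp file location is whatever the standard `tempfile` module picks on your system.

```python
import tempfile


def truncate_with_pointer(output: str, max_lines: int = 40) -> str:
    """Return at most max_lines of output; if truncated, save the full text
    to a temp file and append a marker line telling the reader where it is."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output  # short enough, pass through untouched

    # delete=False keeps the file on disk so the agent can read it later
    # with its ordinary file tools instead of re-running the command.
    with tempfile.NamedTemporaryFile(
        "w", prefix="tool-output-", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(output)
        path = f.name

    omitted = len(lines) - max_lines
    kept = lines[:max_lines]
    kept.append(f"[truncated: {omitted} more lines; full output saved to {path}]")
    return "\n".join(kept)
```

The marker line is the important part: it costs a handful of tokens but gives the model a cheap way to grep or page through the saved file if the truncated view turns out to be insufficient. Cleaning up the saved files is left out of this sketch.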
Comment by unphased 3 days ago
Comment by zdkaster 4 days ago
Comment by devdoc83 4 days ago
Comment by zdkaster 4 days ago
Comment by itsthecourier 4 days ago
Comment by zdkaster 4 days ago
Comment by ramon156 4 days ago
Comment by nixpulvis 4 days ago
Comment by itsdesmond 4 days ago
Comment by mf_kevintruong 14 hours ago
Comment by fcanesin 4 days ago
Comment by zdkaster 4 days ago
Comment by jondwillis 4 days ago
Comment by cityofdelusion 4 days ago
A proper benchmark will compare a large sample of identical prompting with and without the tool, against a specific harness. Once you apply Amdahl’s law, there is no way this saves 91% of tokens holistically, which the title implies.
I work in a non-tech company and these sorts of things keep going viral, with no understanding and with no comprehension of what is actually going on. Engineering is gone and cargo cult magical incantations are in.
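The Amdahl's law point can be made concrete with a small calculation. This is a simplified sketch: it assumes a session's tokens split cleanly into "CLI tool output" and "everything else", and that the filter only touches the first part. The tool-output shares used below (10%, 30%, 60%) are hypothetical, not measured; only the 91.8% figure comes from the post.

```python
def overall_saving(tool_fraction: float, filter_saving: float) -> float:
    """Fraction of ALL session tokens saved when a filter removes
    `filter_saving` of the tool-output tokens, and tool output makes up
    `tool_fraction` of the session's total tokens."""
    if not (0.0 <= tool_fraction <= 1.0 and 0.0 <= filter_saving <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    # Tokens outside tool output (prompts, code, model replies) are untouched,
    # so the total saving is capped by the tool-output share.
    return tool_fraction * filter_saving


# Hypothetical tool-output shares; 0.918 is the filter-level saving from the post.
for share in (0.1, 0.3, 0.6):
    saved = overall_saving(share, 0.918)
    print(f"tool output = {share:.0%} of tokens -> {saved:.1%} saved overall")
```

Under these assumptions the end-to-end saving can never exceed the share of tokens that pass through the filter, which is why a with/without comparison on identical prompts is needed to know the real number.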
Comment by zdkaster 3 days ago
The target users here on HN should be tech-savvy, and this tool is not designed for non-technical users because it requires heavy customization from the user to get the result they want.
Anyway, would you mind suggesting a more accurate title here? I will consider updating it.
Comment by rahulyc 4 days ago
Comment by 0xCAP 2 days ago
Comment by clutter55561 4 days ago
LLMs were trained on the typical full-fat output found everywhere on the internet, and all of a sudden they get a slightly different response that may look like nothing they have seen before.
Does that really save tokens in the long run?
Comment by zdkaster 3 days ago
Comment by pradeep1177 3 days ago
Comment by tegiddrone 4 days ago
Comment by zdkaster 4 days ago
Comment by avocadoking 4 days ago
Comment by tim-projects 4 days ago
Comment by KuhlMensch 3 days ago
But it'll be a case of measuring first, then perhaps a staged integration of a tool like this.
Comment by tuo-lei 4 days ago
Comment by CuriouslyC 4 days ago
Comment by urax 1 day ago
Comment by davidetroiani 4 days ago
Comment by neuralkoi 4 days ago
Comment by sakuraiben 4 days ago
Comment by pradeep1177 4 days ago
Comment by CharlesW 4 days ago
Comment by zdkaster 4 days ago
Comment by pradeep1177 4 days ago
Comment by zdkaster 4 days ago
Comment by pradeep1177 3 days ago
Harness: I'm about to commit. Good use case.
Harness: What has changed from X to Y. Bad use case, no?
Comment by anoop4bhat 3 days ago
Comment by zdkaster 3 days ago
Comment by joud_hq 1 day ago
Comment by jazzen 1 day ago
Comment by xuanlin314 21 hours ago
Comment by winphoto 3 days ago
Comment by songting591 3 days ago
Comment by keenseller709 4 days ago
Comment by xuanlin314 2 days ago
Comment by xuanlin314 3 days ago
Comment by bonigv 2 days ago
Comment by Wenary 3 days ago
Comment by sikamikanikobg 3 days ago