The local LLM ecosystem doesn’t need Ollama
Posted by Zetaphor 1 day ago
Comments
Comment by cientifico 1 day ago
One command, and you are running the models, even with the ROCm drivers, without knowing it.
If llama.cpp provides such a UX, they failed terribly at communicating that. Starting with the name. Llama.cpp: that's a cpp library! Ollama is the wrapper. That's the mental model. I don't want to build my own program! I just want to have fun :-P
Comment by anakaine 1 day ago
Comment by nikodunk 1 day ago
brew install llama.cpp
llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000
Go to localhost:8000 for the Web UI. On Linux it accelerates correctly on my AMD GPU, which Ollama failed to do, though of course everyone's mileage seems to vary on this.
Comment by teekert 1 day ago
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
llama_model_load_from_file_impl: failed to load model
Edit: @below, I used `nix-shell -p llama-cpp`, so not brew related. Could indeed be an older version! I'll check.
Comment by adrian_b 1 day ago
There are 2 main reasons. One is the tokenizer, where new tokenizer definitions may be mishandled by the older tokenizer parsers.
The second reason is that each model may implement differently the tool invocations, e.g. by using different delimiter tokens and different text layouts for describing the parameters of a tool invocation.
Therefore running the Gemma-4 models hit various problems during the first days after their release, especially the dense 31B model.
Solving these problems required both a new version of llama.cpp (also for other inference backends) and updates in the model chat template and tokenizer configuration files.
So anyone who wants to use Gemma-4 should update to the latest version of llama.cpp and to the latest models from Huggingface, because the latest updates were only a couple of days ago.
Comment by roosgit 1 day ago
Comment by teekert 1 day ago
I'm now on:
$ llama --version
version: 8770 (82764d8) built with GNU 15.2.0 for Linux x86_64
(From Nix unstable)
And this works as advertised, nice chat interface, but no OpenAI API I guess, so no opencode...
Comment by homarp 1 day ago
Comment by teekert 1 day ago
Comment by zozbot234 1 day ago
Comment by Eisenstein 1 day ago
Comment by cyanydeez 1 day ago
Comment by Eisenstein 1 day ago
Comment by cyanydeez 21 hours ago
Comment by Eisenstein 20 hours ago
Comment by OtherShrezzing 1 day ago
Comment by eterm 1 day ago
In fact the first line of the wikipedia article is:
> llama.cpp is an open source software library
Comment by RobotToaster 1 day ago
Comment by gettingoverit 21 hours ago
Comment by homarp 1 day ago
Comment by adrian_b 1 day ago
On non-Apple PCs, "llama-server" is what you use, and you can connect to it either with a browser or with an application compatible with the OpenAI API.
Perhaps using "llama-server" as the name of the project would have been less confusing for newbies than "llama.cpp".
I confess that when I first heard about "llama.cpp" I also thought that it was just a library and that I would have to write my own program to implement a complete LLM inference backend.
Comment by mastermage 7 hours ago
Comment by figassis 1 day ago
Comment by mijoharas 1 day ago
It makes a bunch of decisions for you so you don't have to think much to get a model up and running.
Comment by zombot 1 day ago
Comment by throwa356262 23 hours ago
If you visit a model's page on Huggingface today, the site will show you the exact one-liner you need to run it on llama.cpp.
I didn't measure it, but both download and inference felt faster than Ollama. One thing that was definitely better was memory usage, which may be important if you want to run small models on an SBC.
Comment by JKCalhoun 1 day ago
Plenty of alternatives listed. Can anyone with experience suggest the likely successor to Ollama? I have a Mac Mini but don't mind a command-line tool.
I think, as was pointed out, Ollama won because of how easy it is to set up, pull down new models. I would expect similar for a replacement.
Comment by Zetaphor 15 hours ago
Comment by samus 1 day ago
Re curation: they should strive to not integrate broken support for models and avoid uploading broken GGUFs.
Comment by omgitspavel 1 day ago
And you can blame docker in a similar manner. LXC existed for at least 5 years before docker. But docker was just much more convenient to use for an average user.
UX is a huge factor for adoption of technology. If a project fails at creating the right interface, there is nothing wrong with creating a wrapper.
Comment by ekianjo 1 day ago
This does not absolve them from the license violation
Comment by well_ackshually 1 day ago
>One command
Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`, and that running things in the terminal is already a gigantic UX blocker (Ollama's popularity comes from the fact that it has a GUI), why are you putting the blame back on an open source project that owes you approximately zero communication?
Comment by zozbot234 1 day ago
It's not the GUI, it's the curated model hosting platform. Way easier to use than HF for casual users.
Comment by Eisenstein 1 day ago
There is a TON of difference. Ollama downloads the model from its own model library server, sticks it somewhere in your home folder with a hashed name and a proprietary configuration that doesn't use the built-in metadata specified by the model creator. So you can't share it with any other tool, you can't change parameters like temp on the fly, and you are stuck with whatever quants they offer.
Comment by alzoid 1 day ago
The current offerings have interfaces to HuggingFace or some model repo. They get you the model based on what they think your hardware can handle and save it to %user%/App Data/Local/%app name%/... (on Windows). When I evaluated running locally I ended up with 3 different folders containing copies of the same model in different directory structures.
It seems like HuggingFace uses %user%/.cache/... however, some of the apps still get the HF models and save them to their own directories.
Those features are 'fine' for a casual user who sticks with one program. It seems designed from the start to lock you into their wrapper. In the end they are all using llama.cpp, ComfyUI, OpenVINO etc. to abstract away the backend. Again, this is fine, but hiding the files from the user seems strange to me. If you're leaning on HF, then why not use their .cache?
In the end I get the latest llama.cpp releases for CUDA and SYCL and run llama-server. My best UX has been with LM Studio and AI Playground. I want to try Local AI and vLLM next. I just want control over the damn files.
Comment by ryandrake 20 hours ago
sqlite3 my database.db
Comment by Eisenstein 23 hours ago
Comment by croes 1 day ago
Comment by amelius 1 day ago
Comment by mech422 1 day ago
Comment by FrozenSynapse 1 day ago
Comment by UqWBcuFx6NV4r 1 day ago
Comment by Zetaphor 1 day ago
Comment by brabel 1 day ago
It’s truly open source, backed by Mozilla, openly uses llama.cpp and was created by wizard Justine Tunney of Cosmopolitan Libc fame.
Comment by cachius 1 day ago
Comment by kashyapc 1 day ago
Comment by Mario9382 1 day ago
Comment by robot-wrangler 1 day ago
Comment by kelsolaar 1 day ago
Comment by julien_c 1 day ago
uh actually, _we_ did (generates a Docker-style manifest on the fly)
Comment by 0xbadcafebee 1 day ago
I started with Ollama, and it was great. But I moved to llama.cpp to have more up-to-date fixes. I still use Ollama to pull and list my models because it's so easy. I then built my own set of scripts to populate a separate cache directory of hardlinks so llama-swap can load the gguf's into llama.cpp.
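A minimal sketch of that hardlink step, in case anyone wants to replicate it (the blob path, cache path, and name-to-blob mapping here are assumptions for illustration, not Ollama's documented layout):

```shell
#!/bin/sh
# Link Ollama's content-addressed blobs into a cache of human-readable
# .gguf filenames that llama-swap / llama-server can load directly.
# Defaults below are illustrative; override via environment variables.
OLLAMA_BLOBS="${OLLAMA_BLOBS:-$HOME/.ollama/models/blobs}"
GGUF_CACHE="${GGUF_CACHE:-$HOME/gguf-cache}"
mkdir -p "$GGUF_CACHE"

# Hardlink one blob under a friendly name; hardlinks cost no extra disk,
# so the same file serves both Ollama and llama.cpp.
link_model() {
  name="$1"
  blob="$2"
  ln -f "$OLLAMA_BLOBS/$blob" "$GGUF_CACHE/$name.gguf"
}
```

In practice you'd read the friendly-name-to-blob mapping out of Ollama's manifest files rather than hardcoding it.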
Comment by AndroTux 1 day ago
I’m open to suggestions, but the alternatives outlined in the blog post ain’t it.
Comment by mentalgear 1 day ago
> LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in.
> Jan(https://www.jan.ai/) is another open-source desktop app with a clean chat interface and local-first design.
> Msty(https://msty.ai/) offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.
API wise: LM Studio has REST, OpenAI and more API Compatibilities.
Comment by shantnutiwari 1 day ago
So no, they are not alternatives to ollama
Comment by Zetaphor 15 hours ago
Comment by adrian_b 1 day ago
As other posters report, llama-server now implements an OpenAI-compatible API, and you can also connect to it with any Web browser.
I have not yet tried the OpenAI API, but it should have eliminated the last Ollama advantage.
I do not believe that the Ollama "curated" models are significantly easier to use for a newbie than downloading the models directly from Huggingface.
On Huggingface you have many more details about models, which can help you navigate the jungle of countless model variants to find what is most suitable for you.
The fact criticized in TFA, that the Ollama "curated" list can be misleading about the characteristics of the models, is a very serious criticism from my point of view, and enough for me not to use such "curated" models.
I am not aware of any alternative for choosing and downloading the right model for local inference that is superior to using directly the Huggingface site.
I believe that choosing a model is the most intimidating part for a newbie who wants to run inference locally.
If a good choice is made, downloading the model, installing llama.cpp and running llama-server are trivial actions, which require minimal skills.
Comment by justinclift 1 day ago
For a (brand new!) newbie, it's very, very likely to be information overload.
They're still at the start of their journey, so simple tends to be better for 90% of users. ;)
Comment by Philip-J-Fry 1 day ago
LMStudio is listed as an alternative. It offers a chat UI, a model server supporting OpenAI, Anthropic and LMStudio API interfaces. It supports loading the models on demand or picking what models you want loaded. And you can tweak every parameter.
And it uses llama.cpp which is the whole point of the blog post.
Comment by AndroTux 1 day ago
Comment by homarp 1 day ago
llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000 (with MCP support and web chat interface)
and you have OpenAI API on the same 8000 port. (https://github.com/ggml-org/llama.cpp/tree/master/tools/serv... lists the endpoints)
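For anyone wondering what that looks like from a client, a minimal request against the OpenAI-compatible chat endpoint might be (port follows the example above, the model name is illustrative, and the server must already be running):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-E4B-it",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

Any OpenAI-compatible client (opencode included) can point its base URL at `http://localhost:8000/v1`.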
Comment by AndroTux 1 day ago
That's what I meant by model management. I'm too tired to scroll through a bazillion models that all have very cryptic names and abbreviations just to find the one that works well on my system with my software stack.
I want a simple interface that a tool like me can scroll through easily, click on, and then have a model that works well enough. If I put in that much brain power to get my LLM working, I might as well do the work myself instead of using an LLM in the first place.
Comment by throwa356262 23 hours ago
2. Choose the model they recommend
3. Run the one-liner the site gives you
Bonus: faster access to latest models and better memory usage
Comment by AndroTux 6 hours ago
Do you think that this 229B parameter model will work on my consumer PC?
Stop pretending like HF is in any way beginner friendly.
Comment by kgeist 1 day ago
I remember that changing the context size from the default unusable 2k to something bigger the model actually supports required creating a new model file in Ollama if you wanted the change to persist (another alternative: set an env var before running Ollama; although, if you go that low-level route, why not just launch llama.cpp). How was that easier? Did they change this?
I remember people complaining model X is "dumb" simply because Ollama capped the context size to a ridiculously small number by default.
IMHO trying to model Ollama after Docker actually makes it harder for casual users. And power users will have it easier with llama.cpp directly
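For reference, persisting a larger context in Ollama meant writing a Modelfile like this (`FROM` and `PARAMETER num_ctx` are Ollama's Modelfile syntax; the model name and value are illustrative):

```shell
# Write an Ollama Modelfile that raises the context window.
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER num_ctx 8192
EOF
# Then register it as a new model (requires the Ollama daemon running):
# ollama create llama3-8k -f Modelfile
```

Compare that with llama.cpp, where context size is just a server flag.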
Comment by flux3125 1 day ago
Just in case you haven't seen it yet, llama.cpp now has a router mode that lets you hot-swap models. I've switched over from llama-swap and have been happy with it.
Comment by throw9393rj 1 day ago
Comment by ottah 1 day ago
Comment by BrissyCoder 1 day ago
Easier than what?
I came across LM Studio (mentioned in the post) about 3 years ago, before I even knew what Ollama was. It was far better even then.
Comment by rowendduke 1 day ago
Resumable downloads seem to work better in llama-cpp.
I love the inbuilt GUI.
I used ollama first and honestly, llama-cpp has been a much better experience.
Maybe given enough time, I would have seen the benefit of ollama but the inability to turn off updates even after users requested it extensively made me uninstall it. Postman PTSD is real.
Comment by kashyapc 1 day ago
The point of the article is not to expound on how user-friendly "Ollama" is. It's about exposing the deception and shameful moral low ground they took.
Comment by myfakebadcode 1 day ago
Agreed, Ollama is a good intro, but once you move beyond it, it starts to be a pain.
Comment by Eisenstein 1 day ago
Comment by u1hcw9nx 1 day ago
1. MIT-style licenses are "do what you want" as long as you provide a single line of attribution. Including building big closed source business around it.
2. MIT-style licenses are "do what you want" under the law, but they carry moral, GPL-like obligations to think about the "community."
To my knowledge Georgi Gerganov, the creator of llama.cpp, has only complained about attribution when it was missing. As an open-source developer, he selected a permissive license and has not complained about other issues, only the lack of credit. It seems he treats the MIT license as the first kind.
The article has other good points not related to licensing that are good to know. Like performance issues and simplicity that makes me consider llama.cpp.
Comment by maybewhenthesun 1 day ago
A license is what it says in the license, nothing extra. It's a legal document not a moral guideline.
I do think it's a very good idea to always use the GPL (even though commercially minded types always get their panties in a bunch about the GPL) for any user-facing software, to force everybody to 'play fair and share'. The only reason to use MIT imho is for a library implementing some sort of standard where you want that standard used by as many people as possible.
I don't understand people who use MIT for their project and then complain some commercial firm takes their contributions and runs with it. If that's not what you want don't use that license.
Apart from license terms and moral obligations being a bad mix, companies don't have morals. Don't get me wrong, I think they should have! But they don't.
People have morals. Groups of people (a company, a country, a mob) not so much. Sadly.
Comment by WobblyDev 1 day ago
Comment by duskdozer 1 day ago
Comment by maybewhenthesun 16 hours ago
Comment by WobblyDev 1 day ago
Comment by usernomdeguerre 1 day ago
At the time I dropped it for LMStudio, which to be fair was not fully open source either, but at least exposed the model folder and integrated with HF rather than a proprietary model garden for no good reason.
Comment by zozbot234 1 day ago
Actually they do. It's environment variable OLLAMA_MODELS in the server configuration file.
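For example (the path here is arbitrary; `OLLAMA_MODELS` is the real variable, and on systemd installs it goes into the service unit instead of your shell):

```shell
# Point Ollama's model store at a directory you control.
export OLLAMA_MODELS="$HOME/llm-models/ollama"
mkdir -p "$OLLAMA_MODELS"
# On systemd installs, set it on the service instead, e.g. via
# `systemctl edit ollama.service` adding:
#   Environment="OLLAMA_MODELS=/srv/llm-models"
# then restart the service.
```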
Comment by ekianjo 1 day ago
Comment by zozbot234 1 day ago
Comment by trinix912 5 hours ago
Comment by andreidbr 1 day ago
Had to go down the same rabbit hole of finding where things are, how they're sorted/separated/etc. It was unnecessarily painful
Comment by dizhn 1 day ago
This is the reason I had stopped using it. I think they might be doing it for deduplication, but it makes it impossible to use the same model with other tools. Every other tool can just point to the same existing gguf and go. Whether it's their intention or not, it's making it difficult to try out other tools. Model files are quite large, as you know, and storage and download can become issues. (They are for me)
Comment by zxcholmes 1 day ago
Comment by kgwgk 1 day ago
Edit: or maybe that was your point. I guess that for historical reasons this is a kind of generic name for local deployments now (see https://www.reddit.com/r/LocalLLaMA) just like people will call anything ChatGPT.
Comment by abhikul0 1 day ago
Comment by denismi 1 day ago
pacman -Ss ollama | wc -l
16
pacman -Ss llama.cpp | wc -l
0
pacman -Ss lmstudio | wc -l
0
Maybe some day.
Comment by mongrelion 1 day ago
There are packages for Vulkan, ROCm and CUDA. They all work.
Comment by yjftsjthsd-h 22 hours ago
Comment by rf15 22 hours ago
Comment by yjftsjthsd-h 21 hours ago
Edit: Or perhaps put differently: If ollama includes a copy of llama.cpp and has a non-AUR package, why can't there be a non-AUR package that's just llama.cpp without ollama?
Comment by FlyingSnake 1 day ago
I just installed llama.cpp on CachyOS after reading this article. It’s much faster and better than Ollama.
Comment by tmtvl 22 hours ago
zypper --no-refresh search llamacpp | tail -n5 | wc -l
5
Sometimes Arch has the software you want at the version you want, other times it doesn't but other distros do. That's why there's half a billion distros instead of just one.
Comment by zarzavat 1 day ago
Comment by blueybingo 1 day ago
Comment by flux3125 1 day ago
Comment by FeepingCreature 1 day ago
Comment by fy20 1 day ago
Comment by wolvoleo 1 day ago
Comment by kgwgk 1 day ago
One week, really, if we consider the "public" availability.
Llama announced: February 24, 2023
Weights leaked: March 3, 2023
Llama.cpp: March 10, 2023
(Ollama 0.0.1: Jul 8, 2023)
Comment by Maxious 1 day ago
Ollama v0.0.1 "Fast inference server written in Go, powered by llama.cpp" https://github.com/ollama/ollama/tree/v0.0.1
Comment by em-bee 1 day ago
doing what?
trying to build themselves what llama.cpp ended up doing for them?
Comment by saghul 1 day ago
Comment by song 1 day ago
Comment by wrxd 1 day ago
If someone has opinions please let us know!
Comment by Zetaphor 15 hours ago
Comment by osmsucks 1 day ago
Comment by g023 8 hours ago
Comment by speedgoose 1 day ago
I will switch once we have good user experience on simple features.
A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.
Comment by kennywinker 1 day ago
Seems like maybe, at least some of the time, you’re being underwhelmed by Ollama, not the model.
The better performance alone seems like it's worth switching for.
Comment by speedgoose 1 day ago
Comment by Maxious 1 day ago
Comment by derrikcurran 1 day ago
`rm [FILE_NAME]`
With Ollama, the initial one-time setup is a little easier, and the CLI is useful, but is it worth dysfunctional templates, worse performance, and the other issues? Not to me.
Jinja templates are very common, and Jinja is not always losslessly convertible to the Go template syntax expected by Ollama. This means that some models simply cannot work correctly with Ollama. Sometimes the effects of this incompatibility are subtle and unpredictable.
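For what it's worth, llama-server can execute the model's own Jinja chat template directly via the `--jinja` flag, so no Go-template conversion is involved (reusing the model from earlier in the thread; it is downloaded on first run):

```shell
llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --jinja --port 8000
```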
Comment by pheggs 1 day ago
Comment by speedgoose 1 day ago
Comment by dminik 1 day ago
https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
Comment by speedgoose 1 day ago
Comment by ekianjo 1 day ago
Comment by speedgoose 1 day ago
I see quite a few versions, and I can also use hugging face models.
Comment by TomGarden 1 day ago
Comment by MyUltiDev 20 hours ago
Comment by tosh 1 day ago
Both llama.cpp and ollama are great and focused on different things and yet complement each other (both can be true at the same time!)
Ollama has great ux and also supports inference via mlx, which has better performance on apple silicon than llama.cpp
I'm using llama.cpp, ollama, lm studio, mlx etc etc depending on what is most convenient for me at the time to get done what I want to get done (e.g. a specific model config to run, mcp, just try a prompt quickly, …)
Comment by matja 1 day ago
Not really, because Ubuntu has always acknowledged Debian and explicitly documented the dependency:
> Debian is the rock on which Ubuntu is built.
> Ubuntu builds on the Debian architecture and infrastructure and collaborates widely with Debian developers, but there are important differences. Ubuntu has a distinctive user interface, a separate developer community (though many developers participate in both projects) and a different release process.
Source: https://ubuntu.com/community/docs/governance/debian
Ollama never has for llama.cpp. That's all that's being asked for, a credit.
Comment by UqWBcuFx6NV4r 1 day ago
Comment by oefrha 1 day ago
According to the article, ollama is not great (that’s an understatement), focused on making money for the company, stealing clout and nothing else, and hardly complements llama.cpp at all since not long after the initial launch. All of these are backed by evidence.
You may disagree, but then you need to refute OP’s points, not try to handwave them away with a BS analogy that’s nothing like the original.
Comment by operatingthetan 1 day ago
Comment by carlostkd 1 day ago
Comment by damnitbuilds 1 day ago
So it is more like saying "Stop using SCO Unix, use Linux instead".
Comment by yuppiepuppie 1 day ago
Comment by cadamsdotcom 1 day ago
Comment by damnitbuilds 1 day ago
" This isn’t a matter of open-source etiquette, the MIT license has exactly one major requirement: include the copyright notice. Ollama didn’t.
The community noticed. GitHub issue #3185 was opened in early 2024 requesting license compliance. It went over 400 days without a response from maintainers. When issue #3697 was opened in April 2024 specifically requesting llama.cpp acknowledgment, community PR #3700 followed within hours. Ollama’s co-founder Michael Chiang eventually added a single line to the bottom of the README: “llama.cpp project founded by Georgi Gerganov.” "
Comment by shantnutiwari 1 day ago
NO, it is not simpler or even as simple as Ollama.
There are multiple options (llama-server and the CLI), and it's not obvious which model to use.
With Ollama, it's one file. And you get the models from their site; you can browse an easy list.
I don't have the time to go through 20 billion Hugging Face models and decide which one is for me.
Thanks, but I'm sticking with Ollama
Comment by 4k93n2 19 hours ago
Comment by dragochat 1 day ago
- vLLM https://vllm.ai/ ?
- oMLX https://github.com/jundot/omlx ?
Comment by bashbjorn 20 hours ago
In contrast to Ollama, this is a self-contained library, not a server.
I wrote some quick notes on this blogpost, just to jot down how we think about good open-source citizenship: https://www.nobodywho.ai/posts/notes-on-friends-dont-let-fri...
Comment by tyfon 1 day ago
Due to this post I had to search a bit and it seems that llama.cpp recently got router support[1], so I need to have a look at this.
My main use for this is a discord bot where I have different models for different features like replying to messages with images/video or pure text, and non reply generation of sentiment and image descriptions. These all perform best with different models and it has been very convenient for the server to just swap in and out models on request.
[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...
Comment by majorchord 1 day ago
The article mentions llama-swap does this
Comment by hacker_homie 1 day ago
Comment by segmondy 1 day ago
Comment by ekianjo 1 day ago
Comment by mrkeen 1 day ago
% ramalama run qwen3.5-9b
Error: Manifest for qwen3.5-9b:latest was not found in the Ollama registry
Comment by mrkeen 1 day ago
--
% ramalama run qwen3.5
> hi
Server or container exited. Shutting down client.
--
% ramalama run gemma4:e2b
> hello
Server or container exited. Shutting down client.
--
Comment by utopiah 1 day ago
It's a joke... but also not really? I mean VLC is "just" an interface to play videos. Videos are content files one "interacts" with, mostly play/pause and a few other functions like seeking. Because there are different video formats, VLC relies on codecs to decode the videos, basically delegating the "hard" part to codecs.
Now... what's the difference here? A model is a codec, the interactions are sending text/image/etc to it, output is text/image/etc out. It's not even radically bigger in size as videos can be huge, like models.
I'm confused as to why this isn't a solved problem, especially (and yes, I'm being a bit sarcastic here, can't help myself) in a time where "AI" supposedly made all the smart wise developers who rely on it 10x or even 1000x more productive.
Weird.
Comment by sudb 1 day ago
I think the codec analogy is neat but isn't the codec here llama.cpp, and the models are content files? Then the equivalent of VLC are things like LMStudio etc. which use llama.cpp to let you run models locally?
I'd guess one reason we haven't solved the "codec" layer is that there doesn't seem to be a standard that open model trainers have converged on yet?
Comment by imtringued 1 day ago
Comment by pplonski86 1 day ago
The fact that they are trying to make money is normal - they are a company. They need to pay the bills.
I agree that they should improve communication, but I assume it is still a small company with a lot of different requests, and some things might be overlooked.
Overall I like the software and services they provide.
Comment by rrhjm53270 1 day ago
Comment by erusev 1 day ago
Comment by mastermage 7 hours ago
Comment by hojinkoh 9 hours ago
Comment by endymion-light 1 day ago
When I'm using Ollama, I honestly don't care about performance. I'm looking to try out a model, and then if it seems good, place it onto a more dedicated stack built specifically for it.
Comment by brabel 1 day ago
And anyway this thread has lots of alternatives that are even easier to use and don’t shit on the open source community making things happen.
Comment by endymion-light 1 day ago
Currently i've found Ollama to have the best intuitive experience for trying new models. Once i've tried those models and decide on something to use for a project, I can deploy them, and not need to use a UI again.
I'll be trying out the other options in this thread, but my point is that ease of use is going to triumph over the other points the original post made, and some of the alternatives mentioned in the original post miss why Ollama is so popular.
Comment by brabel 1 day ago
Comment by endymion-light 1 day ago
Comment by Zetaphor 15 hours ago
Comment by NamlchakKhandro 1 day ago
Comment by thot_experiment 1 day ago
FWIW llama.cpp does almost everything ollama does better than ollama with the exception of model management, but like, be real, you can just ask it to write an API of your preferred shape and qwen will handle it without issue.
Comment by thot_experiment 15 hours ago
Comment by iib 1 day ago
Comment by ekianjo 1 day ago
Comment by mentalgear 1 day ago
The progression follows the pattern cleanly:
1. Launch on open source, build on llama.cpp, gain community trust
2. Minimize attribution, make the product look self-sufficient to investors
3. Create lock-in, proprietary model registry format, hashed filenames that don’t work with other tools
4. Launch closed-source components, the GUI app
5. Add cloud services, the monetization vector
Comment by san_tekart 1 day ago
Comment by rothific 1 day ago
Comment by rement 1 day ago
Comment by dhruv3006 1 day ago
Comment by Zetaphor 15 hours ago
If you only needed a single reason, how about kneecapping your performance by choosing ollama?
Comment by abhikul0 1 day ago
I guess if you're not frustrated with things like this then sure, no need to stop using it.
Comment by nextlevelwizard 1 day ago
What is the llama-cpp alternative?
Comment by ipeev 1 day ago
llama.cpp was already public by March 10, 2023. Ollama-the-company may have existed earlier through YC Winter 2021, but that is not the same thing as having a public local-LLM runtime before llama.cpp. In fact, Ollama’s own v0.0.1 repo says: “Run large language models with llama.cpp” and describes itself as a “Fast inference server written in Go, powered by llama.cpp.” Ollama’s own public blog timeline then starts on August 1, 2023 with “Run Llama 2 uncensored locally,” followed by August 24, 2023 with “Run Code Llama locally.” So the public record does not really support any “they were doing local inference before llama.cpp” narrative.
And that is why the attribution issue matters. If your public product is, from day one, a packaging / UX / distribution layer on top of upstream work, then conspicuous credit is not optional. It is part of the bargain. “We made this easier for normal users” is a perfectly legitimate contribution. But presenting that contribution in a way that minimizes the upstream engine is exactly what annoys people.
The founders’ pre-LLM background also points in the same direction. Before Ollama, Jeffrey Morgan and Michael Chiang were known for Kitematic, a Docker usability tool acquired by Docker on March 13, 2015. So the pattern that fits the evidence is not “they pioneered local inference before everyone else.” It is “they had prior experience productizing infrastructure, then applied that playbook to the local-LLM wave once llama.cpp already existed.”
So my issue is not that Ollama is a wrapper. Wrappers can be useful. My issue is that they seem to have taken the social upside of open-source dependence without showing the level of visible credit, humility, and ecosystem citizenship that should come with it. The product may have solved a real UX problem, but the timeline makes it hard to treat them as if they were the originators of the underlying runtime story.
They seem very good at packaging other people’s work, and not quite good enough at sounding appropriately grateful for that fact.
Comment by aquir 1 day ago
I was using LM Studio since I've moved to MacOS so that's fine I guess
Comment by opem 1 day ago
Comment by yokoprime 1 day ago
Comment by dnnddidiej 1 day ago
Comment by DeathArrow 1 day ago
Comment by StrauXX 1 day ago
Comment by Havoc 1 day ago
Comment by renierbotha 1 day ago
Comment by alfiedotwtf 1 day ago
Comment by damnitbuilds 1 day ago
So given, as the author states, Ollama runs the LLMs inefficiently, what is the tool that runs them most efficiently on limited hardware ?
Comment by asim 1 day ago
This is the game. We shouldn't delude ourselves into thinking there are alternative ways to become profitable around open source, there aren't. You effectively end up in this trap and there's no escape and then you have to compromise on everything to build the company, return the money, make a profit. You took people's money, now you have to make good, there's no choice. And anyone who thinks differently is deluded. Open source only goes one way. To the enterprise. Everything else is burning money and wasting time. Look at Docker. Textbook example of the enormous struggle to capture the value of a project that had so much potential, defined an industry and ultimately failed. Even the reboot failed. Sorry. It did.
This stuff is messy. Give them some credit. They give you an epic open source project. Be grateful for that. And now if you want to move on, move on. They don't need a hard time. They're already having a hard time. These guys are probably sweating bullets trying to make it work while their investors breathe down their necks waiting for the payoff. Let them breathe.
Good luck to you ollama guys!
Comment by tasuki 1 day ago
It seems to me the epic open source project was given to us by Georgi Gerganov. These people just tried to milk it for some money, and made everything a little worse in the process.
Comment by ontouchstart 1 day ago
UX is where the money is, it is in the wrapper, not the core.
Unfortunately, the core is the most valuable and labor intensive part of it.
With agentic coding, the gap between solid core and shitty wrapper is going to be wider and wider.
Comment by alzoid 1 day ago
Comment by dackdel 1 day ago
Comment by sudb 1 day ago
also you might be the only person in the wild I've seen admit to this
Comment by sminchev 1 day ago
Clients get disappointed, alternatives have better services, and more are popping out monthly. If they continue that way, nothing good will happen, unfortunately :(
Comment by WhereIsTheTruth 1 day ago
It is a parasitic stack that redirects investment into service wrappers while leaving core infrastructure underfunded
We have to suffer with limits and quotas as if we are living in the Soviet Union
Comment by goodpoint 1 day ago
Comment by eternaut 1 day ago
Comment by NamlchakKhandro 1 day ago
Comment by stuaxo 1 day ago
At the top there could have been a link to llama.cpp workflows equivalent to Ollama's.
I wish the op had gone back and written this as a human, I agree with not using Ollama but don't like reading slop.
Comment by audience_mem 1 day ago
Comment by IshKebab 1 day ago
It's like those cliche titles - for fun and profit, the unreasonable effectiveness of, all you need is, etc. etc. but throughout the prose. Stop it guys!
Comment by audience_mem 1 day ago
Comment by IshKebab 1 day ago
Not-this-but-that like "The local LLM ecosystem doesn’t need Ollama. It needs llama.cpp."
Weird signposting: "Benchmarks tell the story."
Heres-the-rub conclusion: "The Bigger Picture"
Starting every title with "The ...".
It's definitely largely human-written, but there are enough slop-isms to make it annoying to read. And of course it's totally possible for a human to write in an AI style, but that doesn't make it any less annoying.
Comment by Zetaphor 15 hours ago
Probably a side effect of using them so much
Comment by _bobm 1 day ago
Comment by paganel 1 day ago
Comment by cowartc 1 day ago
Comment by hani1808 1 day ago
Comment by eddie-wang 1 day ago
Comment by arcza 1 day ago
Comment by Karuma 1 day ago
No wonder I get downvoted to hell every time I mention this... People here can't even tell anymore. They just find this horrible slop completely normal. HN is just another dead website filled with slop articles, time to move on to some smaller reddit communities...
Comment by arcza 1 day ago
Comment by holliplex 1 day ago