Many Small Queries Are Efficient in SQLite
Posted by tosh 5 days ago
Comments
Comment by daitangio 5 days ago
SQLite is an embedded database: there is no socket to open; you access it directly via the file system.
If you do not plan to use Big Data with a high number of writers, you will have a hard time beating SQLite on modern hardware for average use cases.
I have written a super simple search engine [1] using Python asyncio, and SQLite is not the bottleneck so far.
If you are hitting the SQLite limit, I have happy news: a PostgreSQL upgrade will be enough for a lot of use cases [2]. You can use it to play with a schemaless Mongo-like database, a simple queue system [3], or a search engine with stemming. After a while you can decide if you need a specialized component (e.g. Kafka, Elasticsearch, etc.) for one of your services.
[1]: https://github.com/daitangio/find
[2]: https://gioorgi.com/2025/postgres-all/
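To make the "embedded" part concrete, here is a minimal Python sketch (file and table names are made up): opening the database is just opening a file, and every query is an in-process function call.
```python
import sqlite3  # stdlib; SQLite runs inside your process

# No server and no socket: "connecting" just opens (or creates) a local file.
conn = sqlite3.connect("search.db")  # hypothetical file name
conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs(body) VALUES (?)", ("hello world",))
conn.commit()
print(conn.execute("SELECT count(*) FROM docs").fetchone())
```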
Comment by CuriouslyC 5 days ago
The nice thing about this pattern is that you can create foreign data wrappers for your per-customer SQLite databases and query them as if they were in Postgres. Cross-customer aggregations are slow, but individual customer analytics are quite fast, and this gives you near-infinite scalability.
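As a sketch of the pattern, assuming the third-party sqlite_fdw extension is installed in Postgres (all names and paths below are invented):
```python
import psycopg2  # third-party driver; assumes Postgres with the sqlite_fdw extension

conn = psycopg2.connect("dbname=analytics")  # hypothetical database
cur = conn.cursor()
# Mount one customer's SQLite file as a foreign server, then expose one of its tables.
cur.execute("CREATE EXTENSION IF NOT EXISTS sqlite_fdw")
cur.execute("""
    CREATE SERVER customer_42 FOREIGN DATA WRAPPER sqlite_fdw
    OPTIONS (database '/data/customers/42.db')
""")
cur.execute("""
    CREATE FOREIGN TABLE orders_42 (id bigint, total numeric)
    SERVER customer_42 OPTIONS (table 'orders')
""")
# Per-customer analytics now run as ordinary Postgres SQL.
cur.execute("SELECT count(*), sum(total) FROM orders_42")
print(cur.fetchone())
conn.commit()
```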
Comment by storystarling 5 days ago
Comment by liuliu 5 days ago
(I am just asking: are you sure WAL is on?)
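(Journal mode is stored in the database file itself, so any connection can inspect or set it; a quick Python sketch, file name hypothetical:)
```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical file
print(conn.execute("PRAGMA journal_mode").fetchone())      # e.g. ('delete',) if WAL was never enabled
print(conn.execute("PRAGMA journal_mode=WAL").fetchone())  # ('wal',); the setting persists in the file
```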
Comment by conradkay 5 days ago
Comment by adityaathalye 5 days ago
I'm seeing these numbers on my current scratch benchmark:
- Events append to a 10M+ record table (~4+ GiB database).
- Reads are fetched from a separate computed table, which is trigger-updated from the append-only table.
- WAL-mode ON, Auto-vacuum ON
{:dbtype "sqlite",
:auto_vacuum "INCREMENTAL",
:connectionTestQuery "PRAGMA journal_mode;",
:preferredTestQuery "PRAGMA journal_mode;",
:dataSourceProperties
{:journal_mode "WAL",
:limit_worker_threads 4,
:page_size 4096,
:busy_timeout 5000,
:enable_load_extension true,
:foreign_keys "ON",
:journal_size_limit 0,
:cache_size 15625,
:maximumPoolSize 1,
:synchronous "NORMAL"}},
- 1,600 sequential (in a single process) read-after-write transactions, append-only, no batching.
- With a separate (sequential) writer process and, concurrently, two reader processes, I'm seeing 400+ append transactions/second (into the append-only table, no batching), and a total of 41,000 reads per second, doing `select *` on the trigger-updated table.
My schema is (deliberately) poor --- most of it is TEXT.
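The shape is roughly this (a minimal sketch with an invented schema, nothing like the real benchmark):
```python
import sqlite3

conn = sqlite3.connect("events.db")  # hypothetical file
conn.executescript("""
PRAGMA journal_mode=WAL;
CREATE TABLE IF NOT EXISTS events (
  id INTEGER PRIMARY KEY,            -- append-only event log
  entity TEXT NOT NULL,
  payload TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS latest (  -- computed read-side table
  entity TEXT PRIMARY KEY,
  payload TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS events_ai AFTER INSERT ON events
BEGIN
  INSERT OR REPLACE INTO latest(entity, payload)
  VALUES (NEW.entity, NEW.payload);
END;
""")
conn.execute("INSERT INTO events(entity, payload) VALUES (?, ?)", ("user:1", "{}"))
conn.commit()
print(conn.execute("SELECT * FROM latest").fetchall())  # reads hit only the computed table
```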
(edit: add clarifying text)
Comment by TylerE 5 days ago
Comment by adityaathalye 5 days ago
This is deliberate, to emulate "whoops, if I screw up my types, how bad does it get?".
However, when written into the DB with some care, each value is stored per the following storage classes:
https://sqlite.org/datatype3.html
Quoting...
```
Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
NULL. The value is a NULL value.
INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
BLOB. The value is a blob of data, stored exactly as it was input.
A storage class is more general than a datatype. The INTEGER storage class, for example, includes 7 different integer datatypes of different lengths. This makes a difference on disk. But as soon as INTEGER values are read off of disk and into memory for processing, they are converted to the most general datatype (8-byte signed integer). And so for the most part, "storage class" is indistinguishable from "datatype" and the two terms can be used interchangeably.
Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, may be used to store a value of any storage class.
All values in SQL statements, whether they are literals embedded in SQL statement text or parameters bound to precompiled SQL statements have an implicit storage class. Under circumstances described below, the database engine may convert values between numeric storage classes (INTEGER and REAL) and TEXT during query execution.
```
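A quick sketch of what that flexibility means in practice: the declared column type is only an affinity, not a constraint, so a "screwed up" value is stored anyway, just under a different storage class.
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # INTEGER is an affinity, not a constraint
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES ('not a number')")  # accepted anyway
for value, storage in conn.execute("SELECT x, typeof(x) FROM t"):
    print(value, storage)  # 1 integer / not a number text
```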
(edits: formatting, clarify what I'm doing v/s what SQLite does)
Comment by appplication 4 days ago
My data scale is quite small (hundreds of MB), so you'd think SQLite would be perfect, but Postgres really was just a lot simpler to spin up in a Docker container, and the performance difference on a 2G VPS is not noticeable. I'm sure the above issues were solvable, but it was easier for me to just use Postgres and move on.
Comment by cyanmagenta 5 days ago
It seems like it mostly comes down to how likely it is that the site will grow large enough to need a networked database. And people probably wildly overestimate this. HackerNews, for example, runs on a single computer.
Comment by andersmurphy 5 days ago
That's before you even get into sharding sqlite.
[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...
Comment by bastawhiz 4 days ago
Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero downtime deployments essentially requires it. Vertically scaling is generally a whole lot less cost effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.
Comment by andersmurphy 4 days ago
Zero-downtime deploys have been solved for single machines. But even then, I'd argue most businesses can tolerate an hour of downtime a month. I mean, that's the same reliability as AWS these days.
Really, there are a handful of cases where you need multiple servers:
- You're network limited (basically you're a CDN).
- You're drive-limited: you need to get data off drives faster than their bandwidth allows.
- Some legal requirement.
This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
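A sketch of how simple per-customer sharding can be (file layout and pragmas are illustrative):
```python
import sqlite3
from pathlib import Path

SHARD_DIR = Path("shards")  # hypothetical layout: one database file per customer
SHARD_DIR.mkdir(exist_ok=True)

def customer_db(customer_id: int) -> sqlite3.Connection:
    # The shard key is simply the file name; shards never contend for the same write lock.
    conn = sqlite3.connect(SHARD_DIR / f"customer_{customer_id}.db")
    conn.execute("PRAGMA journal_mode=WAL")  # each shard gets its own WAL
    return conn

with customer_db(42) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")
```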
Comment by graemep 4 days ago
Comment by andersmurphy 4 days ago
Comment by luckylion 5 days ago
Network adds latency, and while it might be fine to run 500 queries with the database on the same machine, adding 1-5 ms per query makes it feel not okay.
Comment by magicalhippo 5 days ago
Or going from ~1ms over a local wired network to ~10ms over a wireless network.
Had a customer performance complaint that boiled down to that, something that should take minutes took hours. Could not reproduce it internally.
After a lot of back and forth I asked if the user's machine was wired. Nope, wireless laptop. Got them to plug in like their colleagues and it was fast again.
Comment by fwip 5 days ago
Comment by magicalhippo 5 days ago
We're slowly rewriting the application, batching in the core logic will absolutely be high up on the board.
Comment by fwip 4 days ago
Comment by cyanmagenta 5 days ago
This isn’t an sqlite-specific point, although sqlite often runs faster on a single machine because local sockets have some overhead.
Comment by itopaloglu83 5 days ago
LLMs also have this tendency toward premature optimization, where they start writing very complex classes for users who only want to extract some information to resolve a quick problem.
Comment by 63stack 5 days ago
Comment by 9rx 5 days ago
They call it the n+1 problem. 200 queries is the theoretically correct approach, but due to the high network latency of networked DBMSes you have to hack around it. But if the overhead is low, as when using SQLite, then you would not introduce hacks in the first place.
The parent is saying that if you correctly design your application, but then move to a system that requires hacks to deal with its real-world shortcomings, you won't be prepared. Although I think that's a major overstatement: if you have correctly designed the rest of your application too, introducing the necessary hacks into a couple of isolated places is really not a big deal at all.
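Concretely, the two shapes look like this (a hedged sketch; the users/orders tables are made up):
```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database

# The n+1 shape: one query for the parents, then one per parent.
# Over a network that is n+1 round trips; in-process it is n+1 cheap function calls.
for (uid,) in conn.execute("SELECT id FROM users").fetchall():
    conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,)).fetchall()

# The usual workaround for a networked DBMS: collapse it into one round trip.
conn.execute(
    "SELECT u.id, o.* FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
```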
Comment by PaulHoule 5 days ago
Part of the "object-relational mapping" problem has always been that SQL is superior to conventional programming languages in many ways.
Comment by 9rx 5 days ago
SQL was originally designed to run on the same machine as the user, so this was never envisioned as a problem. It wasn't until Oracle decided to slap networking protocols on top of an SQL engine that it became one. Ideally, they would have exposed a language more conducive to the limitations of the network, performing the mapping in the same place as the database. But such is the life of commercial computing.
Oracle has that now; it was just several decades too late, and by that time everyone else had copied their bad ideas.
Comment by ctxc 5 days ago
Comment by anamexis 5 days ago
Comment by Kinrany 5 days ago
Comment by direwolf20 5 days ago
Comment by Gabrys1 5 days ago
Comment by CuriouslyC 5 days ago
Comment by bastawhiz 4 days ago
And what happens if you add a new server or scale down? You need to re-shard your data?
Comment by rustybolt 5 days ago
Comment by password4321 5 days ago
> For stored procedures that contain several statements that don't return much actual data, or for procedures that contain Transact-SQL loops, setting SET NOCOUNT to ON can provide a significant performance boost, because network traffic is greatly reduced.
Comment by Neywiny 5 days ago
Comment by direwolf20 5 days ago
Comment by ahartmetz 5 days ago
Comment by zffr 5 days ago
Maybe the page could have been shorter, but not by much.
Comment by sodapopcan 5 days ago
Comment by jstummbillig 5 days ago
Comment by chuckadams 5 days ago
Comment by mickeyp 5 days ago
Comment by ogogmad 5 days ago
Comment by Polizeiposaune 5 days ago
Always send "pragma foreign_keys=on" first thing after opening the db.
Some of the type sloppiness can be worked around by declaring tables to be STRICT. You can also add CHECK constraints to verify that a column value is consistent with the underlying representation of the type; for instance, if you're storing IP addresses in a column of type BLOB, you can add a CHECK that the blob is either 4 or 16 bytes.
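A small sketch combining both suggestions (STRICT requires SQLite 3.37+; the schema is invented):
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys=ON")  # must be re-sent on every new connection
conn.execute("""
    CREATE TABLE hosts (
      id INTEGER PRIMARY KEY,
      -- 4 bytes for IPv4, 16 for IPv6; the CHECK guards the representation
      addr BLOB NOT NULL CHECK (length(addr) IN (4, 16))
    ) STRICT
""")
conn.execute("INSERT INTO hosts(addr) VALUES (?)", (bytes(4),))  # ok: 4-byte blob
try:
    conn.execute("INSERT INTO hosts(addr) VALUES ('10.0.0.1')")  # TEXT into a STRICT BLOB column
except sqlite3.IntegrityError as e:
    print(e)  # rejected instead of silently stored as sloppy data
```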
Comment by BenjiWiebe 5 days ago
Still doesn't have a huge variety of types though.
Comment by mikeocool 5 days ago
I understand maintaining backwards compatibility, but the non-strict behavior is just so insane I have a hard time imagining it doesn't bite most developers who use SQLite at some point.
Comment by sgbeal 4 days ago
SQLite makes strong backwards-compatibility guarantees. How many apps would be broken if an Android update suddenly defaulted its internal copy of SQLite to STRICT? Or if it decided to turn on foreign keys by default?
Those are rhetorical questions. Any non-zero percentage of affected applications adds up to a big number for software with SQLite's footprint.
Software pulling the proverbial rug out from under downstream developers by making incompatible changes is one of the unfortunate evils of software development, but the SQLite project makes every effort to ensure that SQLite doesn't do any rug-tugging.
Comment by sethops1 5 days ago
Comment by andersmurphy 4 days ago
Comment by jerf 5 days ago
Comment by jstummbillig 5 days ago
Comment by conradkay 5 days ago
Comment by jstummbillig 5 days ago
Comment by conradkay 5 days ago
And stuff like https://litestream.io/ or SQLite adding STRICT mode
Comment by Kerrick 5 days ago
Comment by skrebbel 5 days ago
Maybe this has been solved though? Anybody here running a serious backend-heavy app with SQLite in production and can share? How do you remotely edit data, do analytics queries etc on production data?
Comment by Sammi 5 days ago
Comment by Cthulhu_ 5 days ago
Comment by Kerrick 5 days ago
Comment by dahart 5 days ago
Comment by charcircuit 5 days ago
And instead were spent blocking on the disk for all of the extra queries that were made? Or is it trying to say that concatenating a handful of strings takes 22 ms? Considering how much games can render with a 16 ms budget, I don't see where that time is going when rendering HTML.
Comment by simonw 5 days ago
Update: Actually it looks like I was wrong about TH1: https://fossil-scm.org/home/doc/tip/www/th1.md
The timeline appears to be constructed by C code instead: https://www.fossil-scm.org/home/file?name=src/timeline.c&ci=...
Update 2: Here's the timeline code from September 2016: https://www.fossil-scm.org/home/file?name=src/timeline.c&ci=...
Back then it had some kind of special syntax for outputting HTML:
```
sqlite3_snprintf(sizeof(zNm),zNm,"b%d",i);
zBr = P(zNm);
if( zBr && zBr[0] ){
  @ <p style='border:1px solid;background-color:%s(hash_color(zBr));'>
  @ %h(zBr) - %s(hash_color(zBr)) -
  @ Omnes nos quasi oves erravimus unusquisque in viam
  @ suam declinavit.</p>
  cnt++;
}
}
```
That @ syntax is used in modern-day Fossil too. Maybe that adds some extra overhead?
Comment by sgbeal 4 days ago
(Long-time Fossil dev here.)
The @ syntax is pre-processed, transformed to printf()-like calls, the destination of which depends on whether fossil is currently running (to simplify only slightly) from the CLI or as a CGI/server process.
That is: @ itself has no runtime costs, but does transform into calls which do have runtime costs. (printf() and its ilk aren't cheap!)
Comment by hnlmorg 5 days ago
Which should be obvious. But I could see some reading this blog post and jumping to the wrong conclusion.
Comment by PaulHoule 5 days ago
In the bad old days you had to wait for a lever to move and for the disk to rotate at least once!
Comment by hnlmorg 5 days ago
I know it’s not and never suggested it was.
I was making the point that writes contain more overhead than reads (which should be obvious) so people should bear that in mind when reading this blog post.
Edit: is it “bear” or “bare”? I’m never sure with that phrase haha
Comment by wussboy 5 days ago
Comment by hahahahhaah 5 days ago
I.e. sometimes one query is cheaper; the cost just isn't the network anymore.
Also you can run your "big" DB like postgres on the same machine too. No law against that.
Comment by wenc 5 days ago
Most SQLite queries, however, are not analytic queries. They're more like record retrievals.
So hitting a SQLite table with 200 "queries" is similar to hitting a webserver with 200 "GET" requests.
In terms of ergonomics, SQLite feels more like an application file format with a SQL interface (though it is an embedded relational database).
Comment by sgbeal 4 days ago
Let's also not forget that db servers can have a memory, in that they can tweak query optimization based on previous queries or scans or whatever state is relevant. SQLite has no memory, in that sense. All query optimizations it makes are based solely upon the single query being processed.
Comment by dahart 5 days ago
Comment by Kinrany 5 days ago
Comment by hahahahhaah 5 days ago
Each query needs to navigate the index, then read. Two queries do that twice.
Is it faster to read pages 30-50 of a book by:
a) going to page 30 and reading until page 50, or
b) going to page 30, reading that page, closing the book, opening it again, going to page 31, and so on?
Each time you open the book you have to binary-search your way back to the page.
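In SQLite terms, the same contrast (a sketch; the pages table and its index are hypothetical):
```python
import sqlite3

conn = sqlite3.connect("book.db")  # hypothetical 'pages' table indexed on 'page'

# (a) One range scan: descend the index once, then read sequentially.
conn.execute("SELECT * FROM pages WHERE page BETWEEN 30 AND 50").fetchall()

# (b) 21 point lookups: each one re-descends the index from the root.
for p in range(30, 51):
    conn.execute("SELECT * FROM pages WHERE page = ?", (p,)).fetchone()
```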
Comment by Kinrany 1 day ago
Comment by silon42 5 days ago
Unless you have toy amounts of data... or are doing batch operations, which is not typical (and can be problematic for other transactions due to locking, etc.).
Comment by hahahahhaah 5 days ago
Comment by maxpert 5 days ago
Comment by ai-christianson 5 days ago
Comment by yomismoaqui 5 days ago
Comment by lifetimerubyist 5 days ago
I had to build some back-office tools and used Ruby on Rails with SQLite and didn't bother with "efficient" joins or anything. Just index the foreign keys and do N+1s everywhere; you'll be fine. The app is incredibly easy to maintain and add features to because of this, and the db is super easy to back up: literally just scp the SQLite db file somewhere else. Couldn't be happier with this setup.
Comment by beagle3 5 days ago
If there's a chance someone is writing to the database during the copy, you should run "sqlite3 database.sqlite .backup" (or ".dump") first. Or, alternatively, on a new enough sqlite3 you have the built-in sqlite3_rsync, which is like rsync except it coordinates with sqlite3 updates to guarantee a good copy at the other end.
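The ".backup" route is also available from Python's stdlib, which wraps SQLite's online backup API (file names here are illustrative):
```python
import sqlite3

# The online backup API yields a consistent copy even if another
# connection is writing to the source while the backup runs.
src = sqlite3.connect("database.sqlite")
dst = sqlite3.connect("backup.sqlite")
src.backup(dst)
dst.close()
src.close()
```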
Comment by lifetimerubyist 5 days ago
We just flip into an app-side maintenance mode before we run the backup so we know there are no writes, scp the file, and then flip it back. We only do nightlies so it's not a problem. The shell script is super simple, and we've only needed nightly backups so far, so we run it in a cron at midnight when no one is working. Ezpz. Literally took us an hour to implement and it's been chugging along without issues for nearly 2 years now without fail.
If we ever need more than that I'd probably just set up Litestream replication.
Comment by gcbirzan 5 days ago
How much faster are the better queries?
Comment by nchmy 5 days ago
Or am I mistaken in thinking that communicating with MySQL on localhost is comparable in latency to SQLite?
Comment by Cthulhu_ 5 days ago
Of course, SQLite and client/server database servers have different use cases, so it is kind of an apples and oranges comparison.
Comment by nchmy 4 days ago
Comment by Neywiny 5 days ago
Comment by nchmy 4 days ago
Comment by wild_egg 5 days ago
SQLite is embedded in your program's address space. You call its functions directly like any other function. Depending on your language, there is probably some FFI overhead, but it's a lot less than an external localhost connection.
Comment by Sesse__ 5 days ago
Comment by polyrand 5 days ago
I'm saying this as a huge SQLite fan, but also beware of what kind of storage you're using in your instance.
Comment by andersmurphy 4 days ago
Comment by philipodonnell 5 days ago
Comment by sgbeal 4 days ago
I'm not sure whether this might be helpful to you, but 3.52 will include a revamped "kvvfs" which (A) also works (non-persistently) in Worker threads and (B) supports callbacks to asynchronously send all db page writes to the client.
Comment by delbronski 5 days ago
I can think of many hacks to do this, but is there a best practice for this kind of stuff? I’m curious how people do this.
Comment by dtkav 5 days ago
I use it with PocketBase and it is a delightful and very productive setup.
This guide [2] is for an older version of PocketBase and Litestream, but I can update it if it would be helpful/interesting for anyone.
[1] https://github.com/benbjohnson/litestream/
[2] https://notes.danielgk.com/Pocketbase/Pocketbase+on+Fly.io
Comment by dansult 5 days ago
Comment by delbronski 5 days ago
Comment by NorwegianDude 5 days ago
Comment by adzm 5 days ago
Comment by NorwegianDude 4 days ago
Comment by yencabulator 4 days ago
Comment by NorwegianDude 3 days ago
Comment by yencabulator 3 days ago
1M 64-bit integers is only 8 MB; that's still a small keyspace.
Comment by adzm 3 days ago
Comment by jacobobryant 5 days ago
Comment by solumunus 5 days ago
Comment by meken 5 days ago
Comment by nefarious_ends 5 days ago
Comment by meken 5 days ago
Comment by causalscience 5 days ago
So the SQLite developers use their own versioning system, which uses SQLite for storage. Funny.
Comment by kmeisthax 5 days ago
Comment by phendrenad2 4 days ago
Comment by flipped 5 days ago
Comment by otoolep 5 days ago
As for reliability - it's a fault-tolerant, highly available system. Reliability is the reason it exists. :-) If you're asking about quality and test coverage, you might like to check out these resources:
- https://rqlite.io/docs/design/
Comment by pmbanugo 5 days ago