Many Small Queries Are Efficient in SQLite
Posted by tosh 5 days ago
Comments
Comment by daitangio 5 days ago
SQLite is an embedded database: there is no socket to open; you access it directly via the file system.
If you do not plan to use Big Data with a high number of writers, you will have a hard time beating SQLite on modern hardware for average use cases.
I have written a super simple search engine [1] using Python asyncio, and SQLite is not the bottleneck so far.
If you are hitting the SQLite limit, I have happy news: a PostgreSQL upgrade will be enough for a lot of use cases [2]. You can use it to play with a schemaless Mongo-like database, a simple queue system [3], or a search engine with stemming. After a while you can decide if you need a specialized component (e.g. Kafka, Elasticsearch, etc.) for one of your services.
[1]: https://github.com/daitangio/find
[2]: https://gioorgi.com/2025/postgres-all/
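To make the "embedded" part concrete, here is a minimal Python sketch (file and table names are made up): opening the database is just opening a file, and every query is an in-process function call.
```python
import sqlite3  # stdlib; SQLite runs inside your process

# No server and no socket: "connecting" just opens (or creates) a local file.
conn = sqlite3.connect("search.db")  # hypothetical file name
conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs(body) VALUES (?)", ("hello world",))
conn.commit()
print(conn.execute("SELECT count(*) FROM docs").fetchone())
```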
Comment by CuriouslyC 5 days ago
The nice thing about this pattern is that you can create foreign data wrappers for your per-customer SQLite databases and query them as if they were in Postgres. Cross-customer aggregations are slow, but individual customer analytics are quite fast, and this gives you near-infinite scalability.
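As a sketch of the pattern, assuming the third-party sqlite_fdw extension is installed in Postgres (all names and paths below are invented):
```python
import psycopg2  # third-party driver; assumes Postgres with the sqlite_fdw extension

conn = psycopg2.connect("dbname=analytics")  # hypothetical database
cur = conn.cursor()
# Mount one customer's SQLite file as a foreign server, then expose one of its tables.
cur.execute("CREATE EXTENSION IF NOT EXISTS sqlite_fdw")
cur.execute("""
    CREATE SERVER customer_42 FOREIGN DATA WRAPPER sqlite_fdw
    OPTIONS (database '/data/customers/42.db')
""")
cur.execute("""
    CREATE FOREIGN TABLE orders_42 (id bigint, total numeric)
    SERVER customer_42 OPTIONS (table 'orders')
""")
# Per-customer analytics now run as ordinary Postgres SQL.
cur.execute("SELECT count(*), sum(total) FROM orders_42")
print(cur.fetchone())
conn.commit()
```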
Comment by storystarling 5 days ago
Comment by liuliu 5 days ago
(I am just asking: are you sure WAL is on?)
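(Journal mode is stored in the database file itself, so any connection can inspect or set it; a quick Python sketch, file name hypothetical:)
```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical file
print(conn.execute("PRAGMA journal_mode").fetchone())      # e.g. ('delete',) if WAL was never enabled
print(conn.execute("PRAGMA journal_mode=WAL").fetchone())  # ('wal',); the setting persists in the file
```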
Comment by conradkay 5 days ago
Comment by adityaathalye 5 days ago
I'm seeing these numbers on my current scratch benchmark:
- Events append to a 10M+ record table (~4+ GiB database).
- Reads are fetched from a separate computed table, which is trigger-updated from the append-only table.
- WAL-mode ON, Auto-vacuum ON
{:dbtype "sqlite",
:auto_vacuum "INCREMENTAL",
:connectionTestQuery "PRAGMA journal_mode;",
:preferredTestQuery "PRAGMA journal_mode;",
:dataSourceProperties
{:journal_mode "WAL",
:limit_worker_threads 4,
:page_size 4096,
:busy_timeout 5000,
:enable_load_extension true,
:foreign_keys "ON",
:journal_size_limit 0,
:cache_size 15625,
:maximumPoolSize 1,
:synchronous "NORMAL"}},
- 1,600 sequential (in a single process) read-after-write transactions, append-only, no batching.
- With a separate (sequential) writer process and, concurrently, two reader processes, I'm seeing 400+ append transactions/second (into the append-only table, no batching), and a total of 41,000 reads per second, doing `select *` on the trigger-updated table.
My schema is (deliberately) poor --- most of it is TEXT.
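The shape is roughly this (a minimal sketch with an invented schema, nothing like the real benchmark):
```python
import sqlite3

conn = sqlite3.connect("events.db")  # hypothetical file
conn.executescript("""
PRAGMA journal_mode=WAL;
CREATE TABLE IF NOT EXISTS events (
  id INTEGER PRIMARY KEY,            -- append-only event log
  entity TEXT NOT NULL,
  payload TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS latest (  -- computed read-side table
  entity TEXT PRIMARY KEY,
  payload TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS events_ai AFTER INSERT ON events
BEGIN
  INSERT OR REPLACE INTO latest(entity, payload)
  VALUES (NEW.entity, NEW.payload);
END;
""")
conn.execute("INSERT INTO events(entity, payload) VALUES (?, ?)", ("user:1", "{}"))
conn.commit()
print(conn.execute("SELECT * FROM latest").fetchall())  # reads hit only the computed table
```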
(edit: add clarifying text)
Comment by TylerE 5 days ago
Comment by adityaathalye 5 days ago
This is deliberate, to emulate "whoops, if I screw up my types, how bad does it get?".
However, when written into the DB with some care, each value is stored per the following storage classes:
https://sqlite.org/datatype3.html
Quoting...
```
Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
NULL. The value is a NULL value.
INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
BLOB. The value is a blob of data, stored exactly as it was input.
A storage class is more general than a datatype. The INTEGER storage class, for example, includes 7 different integer datatypes of different lengths. This makes a difference on disk. But as soon as INTEGER values are read off of disk and into memory for processing, they are converted to the most general datatype (8-byte signed integer). And so for the most part, "storage class" is indistinguishable from "datatype" and the two terms can be used interchangeably.
Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, may be used to store a value of any storage class.
All values in SQL statements, whether they are literals embedded in SQL statement text or parameters bound to precompiled SQL statements have an implicit storage class. Under circumstances described below, the database engine may convert values between numeric storage classes (INTEGER and REAL) and TEXT during query execution.
```
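A quick sketch of what that flexibility means in practice: the declared column type is only an affinity, not a constraint, so a "screwed up" value is stored anyway, just under a different storage class.
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # INTEGER is an affinity, not a constraint
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES ('not a number')")  # accepted anyway
for value, storage in conn.execute("SELECT x, typeof(x) FROM t"):
    print(value, storage)  # 1 integer / not a number text
```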
(edits: formatting, clarify what I'm doing v/s what SQLite does)
Comment by appplication 4 days ago
My data scale is quite small (hundreds of MB), so you'd think SQLite would be perfect, but Postgres really was just a lot simpler to spin up in a Docker container, and the performance difference on a 2G VPS is not noticeable. I'm sure the above issues were solvable, but it was easier for me to just use Postgres and move on.
Comment by cyanmagenta 5 days ago
It seems like it mostly comes down to how likely it is that the site will grow large enough to need a networked database. And people probably wildly overestimate this. HackerNews, for example, runs on a single computer.
Comment by andersmurphy 5 days ago
That's before you even get into sharding sqlite.
[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...
Comment by bastawhiz 4 days ago
Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero downtime deployments essentially requires it. Vertically scaling is generally a whole lot less cost effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.
Comment by andersmurphy 4 days ago
Zero-downtime deploys have been solved for single machines. But even then, I'd argue most businesses can tolerate an hour of downtime a month. I mean, that's the same reliability as AWS these days.
Really, there are a handful of cases where you need multiple servers:
- You're network limited (basically you're a CDN).
- You're drive-limited: you need to get data off drives faster than their bandwidth allows.
- Some legal requirement.
This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
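A sketch of how simple per-customer sharding can be (file layout and pragmas are illustrative):
```python
import sqlite3
from pathlib import Path

SHARD_DIR = Path("shards")  # hypothetical layout: one database file per customer
SHARD_DIR.mkdir(exist_ok=True)

def customer_db(customer_id: int) -> sqlite3.Connection:
    # The shard key is simply the file name; shards never contend for the same write lock.
    conn = sqlite3.connect(SHARD_DIR / f"customer_{customer_id}.db")
    conn.execute("PRAGMA journal_mode=WAL")  # each shard gets its own WAL
    return conn

with customer_db(42) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")
```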
Comment by graemep 4 days ago
Comment by andersmurphy 4 days ago
Comment by luckylion 5 days ago
Network adds latency, and while it might be fine to run 500 queries with the database on the same machine, adding 1-5 ms per query makes it feel not okay.
Comment by magicalhippo 5 days ago
Or going from ~1ms over a local wired network to ~10ms over a wireless network.
Had a customer performance complaint that boiled down to that, something that should take minutes took hours. Could not reproduce it internally.
After a lot of back and forth I asked if the user's machine was wired. Nope, wireless laptop. Got them to plug in like their colleagues and it was fast again.
Comment by fwip 5 days ago
Comment by magicalhippo 5 days ago
We're slowly rewriting the application, batching in the core logic will absolutely be high up on the board.
Comment by fwip 4 days ago
Comment by cyanmagenta 5 days ago
This isn’t an sqlite-specific point, although sqlite often runs faster on a single machine because local sockets have some overhead.
Comment by itopaloglu83 5 days ago
LLMs also have this tendency toward premature optimization, where they start writing very complex classes for users who only want to extract some information to resolve a quick problem.
Comment by 63stack 5 days ago
Comment by 9rx 5 days ago
They call it the n+1 problem. 200 queries is the theoretically correct approach, but due to the high network latency of networked DBMSes you have to hack around it. But if the overhead is low, as when using SQLite, then you would not introduce hacks in the first place.
The parent is saying that if you correctly design your application, but then move to a system that requires hacks to deal with its real-world shortcomings, you won't be prepared. Although I think that's a major overstatement: if you have correctly designed the rest of your application too, introducing the necessary hacks into a couple of isolated places is really not a big deal at all.
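Concretely, the two shapes look like this (a hedged sketch; the users/orders tables are made up):
```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database

# The n+1 shape: one query for the parents, then one per parent.
# Over a network that is n+1 round trips; in-process it is n+1 cheap function calls.
for (uid,) in conn.execute("SELECT id FROM users").fetchall():
    conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,)).fetchall()

# The usual workaround for a networked DBMS: collapse it into one round trip.
conn.execute(
    "SELECT u.id, o.* FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
```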
Comment by PaulHoule 5 days ago
Part of the "object-relational mapping" problem has always been that SQL is superior to conventional programming languages in many ways.
Comment by 9rx 5 days ago
SQL was originally designed to run on the same machine as the user, so this was never envisioned as a problem. It wasn't until Oracle decided to slap networking protocols on top of an SQL engine that it became one. Ideally, they would have exposed a language more conducive to the limitations of the network, performing the mapping in the same place as the database. But such is the life of commercial computing.
Oracle has that now; it was just several decades too late, and by that time everyone else had copied their bad ideas.
Comment by ctxc 5 days ago
Comment by anamexis 5 days ago
Comment by Kinrany 5 days ago
Comment by direwolf20 5 days ago
Comment by Gabrys1 5 days ago
Comment by CuriouslyC 5 days ago
Comment by bastawhiz 4 days ago
And what happens if you add a new server or scale down? You need to re-shard your data?
Comment by rustybolt 5 days ago
Comment by password4321 5 days ago
> For stored procedures that contain several statements that don't return much actual data, or for procedures that contain Transact-SQL loops, setting SET NOCOUNT to ON can provide a significant performance boost, because network traffic is greatly reduced.
Comment by Neywiny 5 days ago
Comment by direwolf20 5 days ago
Comment by ahartmetz 5 days ago
Comment by zffr 5 days ago
Maybe the page could have been shorter, but not by much.
Comment by sodapopcan 5 days ago
Comment by jstummbillig 5 days ago
Comment by chuckadams 5 days ago
Comment by mickeyp 5 days ago
Comment by ogogmad 5 days ago
Comment by Polizeiposaune 5 days ago
Always send "pragma foreign_keys=on" first thing after opening the db.
Some of the type sloppiness can be worked around by declaring tables to be STRICT. You can also add CHECK constraints to verify that a column value is consistent with the underlying representation of the type; for instance, if you're storing IP addresses in a column of type BLOB, you can add a CHECK that the blob is either 4 or 16 bytes.
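A small sketch combining both suggestions (STRICT requires SQLite 3.37+; the schema is invented):
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys=ON")  # must be re-sent on every new connection
conn.execute("""
    CREATE TABLE hosts (
      id INTEGER PRIMARY KEY,
      -- 4 bytes for IPv4, 16 for IPv6; the CHECK guards the representation
      addr BLOB NOT NULL CHECK (length(addr) IN (4, 16))
    ) STRICT
""")
conn.execute("INSERT INTO hosts(addr) VALUES (?)", (bytes(4),))  # ok: 4-byte blob
try:
    conn.execute("INSERT INTO hosts(addr) VALUES ('10.0.0.1')")  # TEXT into a STRICT BLOB column
except sqlite3.IntegrityError as e:
    print(e)  # rejected instead of silently stored as sloppy data
```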
Comment by BenjiWiebe 5 days ago
Still doesn't have a huge variety of types though.
Comment by mikeocool 5 days ago
I understand maintaining backwards compatibility, but the non-strict behavior is just so insane I have a hard time imagining it doesn't bite most developers who use SQLite at some point.
Comment by sgbeal 4 days ago
SQLite makes strong backwards-compatibility guarantees. How many apps would be broken if an Android update suddenly defaulted its internal copy of SQLite to STRICT? Or if it decided to turn on foreign keys by default?
Those are rhetorical questions. Any non-zero percentage of affected applications adds up to a big number for software with SQLite's footprint.
Software pulling the proverbial rug out from under downstream developers by making incompatible changes is one of the unfortunate evils of software development, but the SQLite project makes every effort to ensure that SQLite doesn't do any rug-tugging.
Comment by sethops1 5 days ago
Comment by andersmurphy 4 days ago
Comment by jerf 5 days ago
Comment by jstummbillig 5 days ago
Comment by conradkay 5 days ago
Comment by jstummbillig 5 days ago
Comment by conradkay 5 days ago
And stuff like https://litestream.io/ or SQLite adding STRICT mode
Comment by Kerrick 5 days ago
Comment by skrebbel 5 days ago
Maybe this has been solved though? Anybody here running a serious backend-heavy app with SQLite in production and can share? How do you remotely edit data, do analytics queries etc on production data?
Comment by Sammi 5 days ago
Comment by Cthulhu_ 5 days ago
Comment by Kerrick 5 days ago
Comment by dahart 5 days ago
Comment by charcircuit 5 days ago
And instead were spent blocking on the disk for all of the extra queries that were made? Or is it trying to say that concatenating a handful of strings takes 22 ms? Considering how much games can render with a 16 ms budget, I don't see where that time is going when rendering HTML.
Comment by simonw 5 days ago
Update: Actually it looks like I was wrong about TH1: https://fossil-scm.org/home/doc/tip/www/th1.md
The timeline appears to be constructed by C code instead: https://www.fossil-scm.org/home/file?name=src/timeline.c&ci=...
Update 2: Here's the timeline code from September 2016: https://www.fossil-scm.org/home/file?name=src/timeline.c&ci=...
Back then it had some kind of special syntax for outputting HTML:
```
sqlite3_snprintf(sizeof(zNm),zNm,"b%d",i);
zBr = P(zNm);
if( zBr && zBr[0] ){
  @ <p style='border:1px solid;background-color:%s(hash_color(zBr));'>
  @ %h(zBr) - %s(hash_color(zBr)) -
  @ Omnes nos quasi oves erravimus unusquisque in viam
  @ suam declinavit.</p>
  cnt++;
}
}
```
That @ syntax is used in modern-day Fossil too. Maybe that adds some extra overhead?
Comment by sgbeal 4 days ago
(Long-time Fossil dev here.)
The @ syntax is pre-processed, transformed to printf()-like calls, the destination of which depends on whether fossil is currently running (to simplify only slightly) from the CLI or as a CGI/server process.
That is: @ itself has no runtime costs, but does transform into calls which do have runtime costs. (printf() and its ilk aren't cheap!)
Comment by hnlmorg 5 days ago
Which should be obvious. But I could see some reading this blog post and jumping to the wrong conclusion.
Comment by PaulHoule 5 days ago
In the bad old days you had to wait for a lever to move and for the disk to rotate at least once!
Comment by hnlmorg 5 days ago
I know it’s not and never suggested it was.
I was making the point that writes contain more overhead than reads (which should be obvious) so people should bear that in mind when reading this blog post.
Edit: is it “bear” or “bare”? I’m never sure with that phrase haha
Comment by wussboy 5 days ago
Comment by hahahahhaah 5 days ago
I.e. sometimes one query is cheaper; the cost just isn't the network anymore.
Also you can run your "big" DB like postgres on the same machine too. No law against that.
Comment by wenc 5 days ago
Most SQLite queries, however, are not analytic queries. They're more like record retrievals.
So hitting a SQLite table with 200 "queries" is similar to hitting a webserver with 200 "GET" requests.
In terms of ergonomics, SQLite feels more like an application file format with a SQL interface (though it is an embedded relational database).
Comment by sgbeal 4 days ago
Let's also not forget that db servers can have a memory, in that they can tweak query optimization based on previous queries or scans or whatever state is relevant. SQLite has no memory, in that sense. All query optimizations it makes are based solely upon the single query being processed.
Comment by dahart 5 days ago
Comment by Kinrany 5 days ago
Comment by hahahahhaah 5 days ago
Each query needs to navigate the index, then read. Two queries do that twice.
Is it faster to read pages 30-50 of a book by:
a) going to page 30 and reading until page 50, or
b) going to page 30, reading that page, closing the book, opening it again, going to page 31, and so on?
Each time you open the book you have to binary-search your way back to the page.
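In SQLite terms, the same contrast (a sketch; the pages table and its index are hypothetical):
```python
import sqlite3

conn = sqlite3.connect("book.db")  # hypothetical 'pages' table indexed on 'page'

# (a) One range scan: descend the index once, then read sequentially.
conn.execute("SELECT * FROM pages WHERE page BETWEEN 30 AND 50").fetchall()

# (b) 21 point lookups: each one re-descends the index from the root.
for p in range(30, 51):
    conn.execute("SELECT * FROM pages WHERE page = ?", (p,)).fetchone()
```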
Comment by Kinrany 1 day ago
Comment by silon42 5 days ago
Unless you have toy amounts of data... or are doing batch operations, which is not typical (and can be problematic for other transactions due to locking, etc.).
Comment by hahahahhaah 5 days ago
Comment by maxpert 5 days ago
Comment by ai-christianson 5 days ago
Comment by yomismoaqui 5 days ago
Comment by lifetimerubyist 5 days ago
I had to build some back-office tools and used Ruby on Rails with SQLite and didn't bother with "efficient" joins or anything. Just index the foreign keys and do N+1s everywhere; you'll be fine. The app is incredibly easy to maintain and add features to because of this, and the db is super easy to back up: literally just scp the SQLite db file somewhere else. Couldn't be happier with this setup.
Comment by beagle3 5 days ago
If there's a chance someone is writing to the database during the copy, you should run "sqlite3 database.sqlite .backup" (or ".dump") first. Or, alternatively, on a new enough sqlite3 you have the built-in sqlite3_rsync, which is like rsync except it coordinates with sqlite3 updates to guarantee a good copy at the other end.
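The ".backup" route is also available from Python's stdlib, which wraps SQLite's online backup API (file names here are illustrative):
```python
import sqlite3

# The online backup API yields a consistent copy even if another
# connection is writing to the source while the backup runs.
src = sqlite3.connect("database.sqlite")
dst = sqlite3.connect("backup.sqlite")
src.backup(dst)
dst.close()
src.close()
```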
Comment by lifetimerubyist 5 days ago
We just flip into an app-side maintenance mode before we run the backup so we know there are no writes, scp the file, and then flip it back. We only do nightlies so it's not a problem. The shell script is super simple, and we've only needed nightly backups so far, so we run it in a cron at midnight when no one is working. Ezpz. Literally took us an hour to implement and it's been chugging along without issues for nearly 2 years now without fail.
If we ever need more than that I'd probably just set up Litestream replication.
Comment by gcbirzan 5 days ago
How much faster are the better queries?
Comment by nchmy 5 days ago
Or am I mistaken in thinking that communicating with MySQL on localhost is comparable in latency to SQLite?
Comment by Cthulhu_ 5 days ago
Of course, SQLite and client/server database servers have different use cases, so it is kind of an apples and oranges comparison.
Comment by nchmy 4 days ago
Comment by Neywiny 5 days ago
Comment by nchmy 4 days ago
Comment by wild_egg 5 days ago
SQLite is embedded in your program's address space. You call its functions directly like any other function. Depending on your language, there is probably some FFI overhead, but it's a lot less than an external localhost connection.
Comment by Sesse__ 5 days ago
Comment by polyrand 5 days ago
I'm saying this as a huge SQLite fan, but also beware of what kind of storage you're using in your instance.
Comment by andersmurphy 4 days ago
Comment by philipodonnell 5 days ago
Comment by sgbeal 4 days ago
I'm not sure whether this might be helpful to you, but 3.52 will include a revamped "kvvfs" which (A) also works (non-persistently) in Worker threads and (B) supports callbacks to asynchronously send all db page writes to the client.
Comment by delbronski 5 days ago
I can think of many hacks to do this, but is there a best practice for this kind of stuff? I’m curious how people do this.
Comment by dtkav 5 days ago
I use it with PocketBase and it is a delightful and very productive setup.
This guide [2] is for an older version of PocketBase and Litestream, but I can update it if it would be helpful/interesting for anyone.
[1] https://github.com/benbjohnson/litestream/
[2] https://notes.danielgk.com/Pocketbase/Pocketbase+on+Fly.io
Comment by dansult 5 days ago
Comment by delbronski 5 days ago
Comment by NorwegianDude 5 days ago
Comment by adzm 5 days ago
Comment by NorwegianDude 4 days ago
Comment by yencabulator 4 days ago
Comment by NorwegianDude 3 days ago
Comment by yencabulator 3 days ago
1M 64-bit integers is only 8 MB; that's still a small keyspace.
Comment by adzm 3 days ago
Comment by jacobobryant 5 days ago
Comment by solumunus 5 days ago
Comment by meken 5 days ago
Comment by nefarious_ends 5 days ago
Comment by meken 5 days ago
Comment by causalscience 5 days ago
So the SQLite developers use their own versioning system, which uses SQLite for storage. Funny.
Comment by kmeisthax 5 days ago
Comment by phendrenad2 4 days ago
Comment by flipped 5 days ago
Comment by otoolep 5 days ago
As for reliability - it's a fault-tolerant, highly available system. Reliability is the reason it exists. :-) If you're asking about quality and test coverage, you might like to check out these resources:
- https://rqlite.io/docs/design/
Comment by pmbanugo 5 days ago