PgDog is funded and coming to a database near you
Posted by levkk 4 hours ago
Comments
Comment by eikenberry 51 minutes ago
I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it was a page and manually failing over to the spare then manually replacing the spare. The manual tooling was very finicky but at least it worked, no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.
Comment by tempest_ 49 seconds ago
Comment by levkk 44 minutes ago
Load balancer with health checks and failover, works out of the box. :) Battle-tested at this point too, so could be worth a look.
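To make "health checks and failover" concrete, here is a minimal, generic sketch of the routing side of that idea. It is not PgDog's implementation: the `Backend` and `FailoverBalancer` names and the `check` probe are all hypothetical, and a real failover also has to promote the replica inside Postgres itself, which this toy does not do.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    name: str
    role: str  # "primary", "replica", or "failed"
    healthy: bool = True


class FailoverBalancer:
    """Toy balancer: tracks health and decides which backend receives writes."""

    def __init__(self, backends: List[Backend], check: Callable[[Backend], bool]):
        self.backends = backends
        # Hypothetical health probe, e.g. a "SELECT 1" round trip with a timeout.
        self.check = check

    def run_health_checks(self) -> None:
        for backend in self.backends:
            backend.healthy = self.check(backend)

    def primary(self) -> Optional[Backend]:
        for backend in self.backends:
            if backend.role == "primary" and backend.healthy:
                return backend
        return None

    def failover(self) -> Optional[Backend]:
        """Return a healthy primary, promoting the first healthy replica if needed."""
        current = self.primary()
        if current is not None:
            return current
        for candidate in self.backends:
            if candidate.role == "replica" and candidate.healthy:
                # Mark dead primaries so they are not routed to if they come back stale.
                for old in self.backends:
                    if old.role == "primary":
                        old.role = "failed"
                candidate.role = "primary"  # routing state only, not a real promotion
                return candidate
        return None
```

Calling `run_health_checks()` on a timer and `failover()` before routing a write is the whole loop; the hard parts in production (split-brain, replication lag at promotion time) are outside this sketch.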
Comment by eikenberry 36 minutes ago
Comment by doctorpangloss 30 minutes ago
Comment by dev-ns8 10 minutes ago
Comment by gchamonlive 11 minutes ago
Comment by globular-toast 45 minutes ago
Comment by codegeek 1 hour ago
Couldn't be a better why us :)
Comment by paoliniluis 57 minutes ago
Comment by yabones 3 hours ago
Comment by tempest_ 2 hours ago
This is for DBs that are ~1-1.5TB but don't have a huge amount of churn/QPS.
Effectively what is described here https://www.pgedge.com/blog/always-online-or-bust-zero-downt...
Comment by tux3 2 hours ago
If you use something like CloudNativePG they automate parts of the process with cli tools and declarative syntax. Otherwise you take the time to figure it out by hand. It might sound complicated, but just practice on your staging DB, and if all goes well you do the same procedure in prod.
Edit: Apparently Postgres 19 has a patch for one-shot logical replication of sequences! https://www.depesz.com/2025/11/11/waiting-for-postgresql-19-...
Comment by paulryanrogers 1 hour ago
Comment by znpy 2 hours ago
At this point I wonder if I'll ever see that.
Comment by jjice 2 hours ago
Comment by hylaride 51 minutes ago
Multi-master is hard. The main issue is what to do with commit/replication lag. It's far "easier" if support for eventual consistency is ok with your use case. In some cases it's not. Also, the problems related to read-only lag can happen on multi-master instances. If somebody does a giant long running query on one of the masters, the target instance needs to hold the data state for the query, even if the underlying DB is getting updates. It also needs to still keep up with other masters. This means the whole cluster can slow down if the multi-master replication is synchronous. Depending on a variety of factors, that can chew up disk space, memory, etc.
There are ways of dealing with these issues (and others), but it comes with tradeoffs with performance, etc.
Comment by timacles 1 hour ago
Multi master, from even a conceptual perspective, is incredibly complicated. Databases, transactions, consistency, parallelism are all very complicated.
It’s something that always seems promising at the start, but as soon as maintenance and long-term improvements enter the picture (i.e., integrating new Postgres versions), the complexity becomes too much.
Comment by tschellenbach 2 hours ago
Comment by briffle 1 hour ago
Comment by boxed 3 hours ago
Comment by jeltz 2 hours ago
For both MySQL and PostgreSQL you will need to use some kind of logical upgrades if you want no downtime.
Comment by boxed 1 hour ago
Comment by tomnipotent 2 hours ago
Comment by jeltz 2 hours ago
Comment by Blackthorn 2 hours ago
Comment by jeltz 2 hours ago
Comment by redmonduser 6 minutes ago
Comment by aejm 39 minutes ago
Comment by levkk 33 minutes ago
1. Control plane to manage multi-node deployments; "works out of the box" experience to make PgDog easy to deploy and use
2. QoS (quality of service): automatically block bad queries from taking down the database
Last but not least, you get SLA-backed support from us (up to P0).
New features are broken down into two categories:
1. Sharding / running Postgres at scale: always open source.
2. Infra management / making it easy to run PgDog at scale: enterprise.
Comment by aejm 29 minutes ago
Comment by chrisvenum 3 hours ago
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
Comment by levkk 1 hour ago
That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what currently works (or doesn't) so you know what your options are.
Comment by tschellenbach 2 hours ago
Comment by paulryanrogers 1 hour ago
Comment by welder 2 hours ago
Comment by Ozzie_osman 3 hours ago
We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
Comment by singron 1 hour ago
Comment by rbranson 2 hours ago
Comment by GiorgioG 2 hours ago
Comment by mplanchard 2 hours ago
Comment by returningfory2 2 hours ago
Comment by jeltz 2 hours ago
Comment by happyopossum 2 hours ago
Comment by tingletech 2 hours ago
Comment by jeremyjh 1 hour ago
This is years of product development with a three person team. If Enterprise sales and support are a big part of your business plan it will suck up a lot more than that.
Comment by pphysch 56 minutes ago
Comment by welder 1 hour ago
1. pool exhaustion from idle connections inside open long-running transactions
2. SQLAlchemy's client-side pool using dead connections that PgBouncer had already killed, causing periodic request errors
3. Some tasks have to bypass PgBouncer when they use SET or prepared statements
I've already sharded large datasets at the application layer, but looks like PgDog solves the above problems for any future work?
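For the second problem in the list above (a client-side pool handing out connections the pooler has already killed), the usual mitigation is to validate a connection before use. SQLAlchemy's `create_engine` has a `pool_pre_ping` option in this spirit. The sketch below shows the idea only: `PrePingPool` is a hypothetical name, and `sqlite3` stands in for a Postgres driver so the example runs with the standard library.

```python
import sqlite3
from collections import deque
from typing import Callable, Deque


class PrePingPool:
    """Tiny client-side pool that validates a connection before handing it out."""

    def __init__(self, connect: Callable[[], sqlite3.Connection], size: int = 2):
        self._connect = connect
        self._idle: Deque[sqlite3.Connection] = deque(connect() for _ in range(size))

    @staticmethod
    def _alive(conn: sqlite3.Connection) -> bool:
        # The "ping": a trivial query that fails fast on a dead connection.
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def acquire(self) -> sqlite3.Connection:
        conn = self._idle.popleft() if self._idle else self._connect()
        if not self._alive(conn):
            conn = self._connect()  # silently replace the dead connection
        return conn

    def release(self, conn: sqlite3.Connection) -> None:
        self._idle.append(conn)
```

The cost is one extra round trip per checkout, which is usually cheaper than a periodic request error. It does nothing for the first and third problems (idle-in-transaction exhaustion, `SET` and prepared statements).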
Comment by tempest_ 1 hour ago
I had to disable application pooling, as it was causing read-only transactions and I couldn't pin down the cause.
Comment by kjuulh 3 hours ago
Also had an issue with it because it seems to cache authentication requests when doing passthrough: I'd changed the role's password, but it kept using the old one, which was no bueno ;).
PgDog seems to make more sense when you really care about a few databases that need massive scale, rather than a simple proxy in front of postgres. I'll keep following the development though, it is much needed in this space, postgres can use all the investment it can get to get it past the single machine scale that it excels at currently.
Comment by levkk 1 hour ago
We'll get there.
Comment by apt-get 1 hour ago
Comment by maherbeg 3 hours ago
You could also build a watcher sidecar that watches for changes to pgdog_users.toml and has PgDog refresh itself then too, using this combination. We thought about that but prefer to control the reloads for our needs.
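A watcher sidecar of that kind can be very small. The sketch below polls a file and fires a callback when it changes. `FileWatcher` is a hypothetical name, and `on_change` is a placeholder: how PgDog is actually told to reload is not covered here, so wire in whatever reload mechanism your deployment uses.

```python
import os
import time
from typing import Callable, Tuple


class FileWatcher:
    """Polls a file and calls `on_change` when its mtime or size changes."""

    def __init__(self, path: str, on_change: Callable[[], None]):
        self.path = path
        self.on_change = on_change  # hypothetical hook that triggers the reload
        self._last = self._signature()

    def _signature(self) -> Tuple[int, int]:
        st = os.stat(self.path)
        # Size is included so a rewrite within one mtime tick is still noticed.
        return (st.st_mtime_ns, st.st_size)

    def poll_once(self) -> bool:
        current = self._signature()
        if current == self._last:
            return False
        self._last = current
        self.on_change()
        return True

    def run(self, interval: float = 1.0) -> None:
        while True:
            self.poll_once()
            time.sleep(interval)
```

Usage would be something like `FileWatcher("pgdog_users.toml", reload_fn).run()`, where `reload_fn` is yours to supply. Polling is crude next to inotify-style notifications, but it has no dependencies and works the same on every platform.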
Comment by valorzard 1 hour ago
My question is: has there been any talk of upstreaming any of them to Postgres itself, or of adding a custom built-in feature to Postgres itself?
Comment by levkk 1 hour ago
Comment by mamcx 1 hour ago
Would this kind of tool help in this case?
Comment by levkk 1 hour ago
Comment by drchaim 3 hours ago
If you’re already sharding by tenant for other reasons, OK… But I see CDC to a true OLAP system as more scalable.
PostgreSQL still needs real columnar tables in the core, hopefully one day
Comment by levkk 2 hours ago
SELECT tenant_id, COUNT(clicks)
FROM users
GROUP BY tenant_id
ORDER BY 2 DESC
LIMIT 25;
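As a rough illustration of what a query like this involves once `users` is spread over several shards: each shard can only produce partial counts, so something has to sum them and apply the `ORDER BY` and `LIMIT` afterwards. The sketch below is a generic scatter-gather, not PgDog's query engine. `top_tenants` is a hypothetical helper, and in-memory `sqlite3` databases stand in for Postgres shards.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple

# Sent to every shard. ORDER BY / LIMIT are deliberately left out: if a tenant's
# rows can live on more than one shard, a per-shard LIMIT could drop rows that
# matter for the global top N.
PER_SHARD_SQL = """
SELECT tenant_id, COUNT(clicks)
FROM users
GROUP BY tenant_id
"""


def top_tenants(shards: Iterable[sqlite3.Connection], limit: int = 25) -> List[Tuple[int, int]]:
    totals: Counter = Counter()
    for shard in shards:
        for tenant_id, partial_count in shard.execute(PER_SHARD_SQL):
            totals[tenant_id] += partial_count  # merge partial aggregates
    # Equivalent of ORDER BY 2 DESC LIMIT n; tenant_id breaks ties deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

If the table is sharded by `tenant_id`, each tenant lives on one shard and the merge step degenerates into a plain top-N merge, which is much cheaper.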
Performance is a side effect - definitely needed and we'll do everything we can, but we are not competing with ClickHouse or Snowflake - just trying to make sharded Postgres work with your app.
Comment by christoff12 1 hour ago
Comment by htrp 3 hours ago
Still trying to figure out how this works technically. Is the performance gain really just from the rewrite in Rust?
Comment by levkk 3 hours ago
Edit:
Performance gains are from having the ability to load balance reads (horizontal scaling for read queries) and scale out writes (with sharding). The single-instance bottleneck in Postgres has many faces:
1. Behind schedule vacuums because of too many dead tuples (too many writes)
2. The WALWriter is single-threaded and IO-bound - Postgres can only do about 200-300MB/sec in writes per instance (real prod numbers on EC2 with NVMes and ZFS, basically best case scenario).
3. Bulkheading: single primary is a single point of failure. With 12 primaries, if one fails, 91% of your customers don't notice.
The list goes on. Rust is just a side effect. We love it because it's fast and correct - the perfect match for a database product.
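The arithmetic behind the bulkheading and write-throughput points can be sketched as follows. Both helpers are hypothetical and rest on simplifying assumptions that are not in the comment: customers are spread evenly across primaries, and write throughput adds up linearly across shards (a best case).

```python
def unaffected_fraction(primaries: int, failed: int = 1) -> float:
    """Share of customers untouched when `failed` of `primaries` go down.

    Assumes customers are spread evenly across primaries.
    """
    if primaries <= 0 or not 0 <= failed <= primaries:
        raise ValueError("need primaries > 0 and 0 <= failed <= primaries")
    return 1 - failed / primaries


def aggregate_write_mb_s(shards: int, per_instance_mb_s: float) -> float:
    """Best-case combined write throughput, assuming perfectly linear scaling."""
    return shards * per_instance_mb_s
```

With 12 primaries and one failure this gives 11/12, about 91.7%, which is where the 91% figure comes from. At the low end of the quoted range, `aggregate_write_mb_s(12, 200)` is 2400 MB/sec in the ideal case.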
Comment by hylaride 2 hours ago
Comment by levkk 2 hours ago
Comment by VeninVidiaVicii 3 hours ago
Comment by levkk 3 hours ago
Comment by Wonnk13 1 hour ago
Comment by maherbeg 3 hours ago
Comment by mnbbrown 2 hours ago
Comment by ParadisoShlee 3 hours ago
Comment by jeremyjh 3 hours ago
Comment by levkk 3 hours ago
The same old processes vs. threads debate, plus having the ability to scale the coordinator past a single machine. So, if you're OLTP, definitely consider PgDog. OLAP - Citus still wins because of its advanced query engine. We'll get there.
Comment by simonw 3 hours ago
Is there a binary I can run directly?
Comment by e12e 3 hours ago
Then again, sharding on a single host probably isn't very useful anyway - but it might work with docker in swarm mode?
Comment by levkk 2 hours ago
Comment by levkk 3 hours ago
Comment by simonw 3 hours ago
Comment by frogbydjsd 3 hours ago
Comment by fulafel 3 hours ago
Comment by levkk 3 hours ago
Comment by danielheath 3 hours ago
You _could_ make that ACID, but it's not going to be faster than a single machine.
Comment by bourbonproof 3 hours ago
Comment by saghm 3 hours ago
Expanding on that a bit, mongo drivers even have a shared specification of the state machine for monitoring topology changes[1] and algorithm for selecting the server to send an operation to[2] (along with various declarative test cases that the drivers use to validate them alongside the specs in the repo). I think people sometimes underestimate how important the client-side work is to this sort of experience; for all of the faults mongo has had over the years, the amount of investment that they put into the client libraries is something I've never seen anywhere else (although having spent several years working on some of these libraries, my take is likely very biased).
[1]: https://github.com/mongodb/specifications/blob/master/source...
[2]: https://github.com/mongodb/specifications/blob/master/source...
Comment by sandeepkd 1 hour ago
Surfacing where and how PG is better than Dynamo or any other database is probably a good starting point instead of calling PG a silver bullet for everything. At the end of the day it's all a trade-off.
Comment by levkk 1 hour ago
Comment by melon_tsui 3 hours ago
Comment by levkk 3 hours ago
Comment by parthdesai 2 hours ago
Comment by levkk 2 hours ago
1. Let it crash. Increase the RAM, try again.
2. Page to disk (swap), make it slow but ultimately work.
Both have their trade-offs. There is no free lunch here.
Comment by Pet_Ant 3 hours ago
Comment by levkk 3 hours ago
Comment by faangguyindia 3 hours ago
Comment by rswail 2 hours ago
This solves the thousands of clients case for read in a way that is transparent to the clients.
Yes it's required at large scale, especially if you want to distribute reads or shard to a particular geographical area.
Comment by orliesaurus 1 hour ago
Comment by christoff12 1 hour ago
Comment by xenophonf 1 hour ago
https://github.com/pgdogdev/pgdog/commit/36434f93f03dec1d7d4...
I want to have as much fun as the next developer, but that makes me worry, what with supply chain attacks in the news and all.
Comment by levkk 58 minutes ago
In all seriousness, we review every single line of code that goes in and only people who work for PgDog Inc are allowed to merge.
Comment by 999900000999 3 hours ago
Comment by codegeek 1 hour ago
Comment by pantulis 2 hours ago
Comment by rswail 2 hours ago
As long as they don't get undercut by the equivalent of AWS https://aws.amazon.com/rds/proxy/ which is a managed pgbouncer.
Comment by 999900000999 2 hours ago
You’d need a ton of faith in these 3 people.
Feels more like it would work better inside of a bigger organization.
The QA tester in me is kinda risk averse.
Comment by skiwithuge 3 hours ago
Comment by moralestapia 3 hours ago
Wrt. the pooler, how do you compare with pgbouncer?
I'm interested because I have a postgres instance, low-traffic but still like ... tens of r(eads)ps. I was not running anything close to the machine limits but still added pgbouncer to improve performance and didn't see a noticeable difference. I was stress-testing the machine obv., I'm not talking about the 10 rps, lol.
For context, my numbers were something like 10k rps +/- 1k vanilla postgres and like 9k rps +/- 1k with pgbouncer in front of it. So ... slightly slower but big error bars so I wouldn't say for sure. I ended up not using pgbouncer as the benefit was immaterial.
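A quick way to sanity-check "big error bars so I wouldn't say for sure" is to express the gap in units of the combined uncertainty. This is only a sketch: `difference_in_sigmas` is a hypothetical helper, and it assumes the two ± values are independent standard deviations, which the numbers above do not state.

```python
import math


def difference_in_sigmas(a: float, a_err: float, b: float, b_err: float) -> float:
    """Gap between two measurements, in units of their combined uncertainty.

    Assumes a_err and b_err are independent standard deviations.
    """
    combined = math.hypot(a_err, b_err)  # sqrt(a_err**2 + b_err**2)
    return abs(a - b) / combined
```

For 10k ± 1k versus 9k ± 1k the gap is 1000 / 1414, roughly 0.7 sigma, so under these assumptions the two setups are not distinguishable.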
Also yeah, in case you want to check it out, it's the db that backs this project: https://httpstate.com.
Comment by levkk 3 hours ago