pg_durable: Microsoft open sources in-database durable execution
Posted by coffeemug 4 days ago
Comments
Comment by levkk 4 days ago
As an ex-app engineer though, I kind of prefer my queue logic to be in code, in Git, but maybe with the right tooling, you can change my mind. :)
Comment by babhishek21 4 days ago
That said, we did hand-build a simple job queue (just lock, poll, reserve on a column, poll and update reservation to mark job done) on top of postgres at my previous startup. Something like pgque would have made that much more polished.
Comment by dietr1ch 4 days ago
What's the story for version control, debugging, testing, releasing? It'd be cool to have everything together for data locality and simplifying the stack, but it feels you'd lose a lot of useful knowledge about how to do stuff "properly".
Comment by gdecandia 4 days ago
https://github.com/microsoft/duroxide - also OSS, the durable execution framework pg_durable is built on itself supports function versions. We can leverage that to get similar support in pg_durable.
Comment by vault 4 days ago
why is Azure/durabletask disabled? ':)
Comment by nextaccountic 4 days ago
Comment by CuriouslyC 4 days ago
Comment by moomoo11 4 days ago
i have always had maintenance packages for this type of stuff. if i could deploy them alongside the database itself that could be kind of cool.
but yeah i agree with you that i do prefer having this in the code layer.
Comment by hmaxdml 3 days ago
Comment by giancarlostoro 4 days ago
I mean, we used to keep our SQL code in git too for projects where we had DB triggers. I think some were even shoved in there via Django migrations just to let someone setup locally and have the triggers available in their local database.
Comment by jrumbut 4 days ago
With some cleverness you could even introduce some testing that way. Not perfect but better than nothing.
Comment by SoftTalker 3 days ago
Comment by tmpz22 4 days ago
Comment by KaiShips 3 days ago
Comment by hanzeweiasa 3 days ago
Comment by gitscope_ai 4 days ago
Comment by jraedisch 4 days ago
Comment by snqb 4 days ago
but I might not know all the details, I'm genuinely curious
Comment by CuriouslyC 4 days ago
Comment by kilobaud 4 days ago
How is this project at all comparable to something like Temporal? Am I misunderstanding the limitation implied by this particular recommendation?
Comment by faxmeyourcode 4 days ago
It's an interesting technical achievement I guess, but it's very bizarre to try and read this
SELECT df.start(
@> (
($$SELECT ... FROM demo.invoices WHERE status = 'pending'$$ |=> 'inv')
~> df.if_rows('inv',
$$UPDATE ... SET status = 'processing'$$
~> (df.http(...) |=> 'resp')
~> df.if($$SELECT $r.ok$$,
-- classify, branch, wait for signal ...
),
df.sleep(5)
)
),
'invoice-approval-pipeline'
);Comment by gdecandia 4 days ago
Comment by faxmeyourcode 4 days ago
Not trying to dismiss the project - it looks like a lot of hard work has gone in and somebody has a use for it. I just come from an airflow style external orchestrator frame of mind that manages durability state in postgres but keeps the control flow out. Sorry if I came off as a bit snarky
Comment by Onavo 4 days ago
Comment by rswail 4 days ago
The steps are:
1. Get all the pending invoices
2. Set their state to "processing"
3. Call out to an external service/process to do the actual processing, wait for a response.
4. If the response is OK, do something
5. Wait 5 seconds and then start again.
Not sure I love the syntax and the way SQL is embedded between the $$
But it is in the database, can be updated and modified in the same way as all the other stored procedures/functions, allows job control, I assume other control structures for parallel steps etc.
Gonna go read the doco now.
Comment by franckpachot 4 days ago
Comment by miohtama 4 days ago
Comment by pokstad 3 days ago
Comment by junto 4 days ago
Good for local database only jobs though. Niche use cases.
Comment by dpark 4 days ago
Say what? Stored procedures are awesome when used correctly.
Versioning is straightforward. You stick any sort of monotonically increasing id at the end of the name. Whenever you need a breaking change, you bump the id. You also leave the old version with the old id, retiring it only after it’s no longer used. You do need a real story for DB upgrades for this to work well. If your story is that someone on the team executes some random SQL migration as root, you’re gonna have a bad time.
You can unit test stored procedures in exactly the same way you could test any other SQL. You have to spin up a DB to do it. But if you can’t test your stored procedures, you’re admitting you have no way to test your SQL which is your real problem.
> Business logic in the database, (hidden brain problem)
Ok? How much you shove into your stored procedures is up to you. In my experience the real alternative to stored procedures is not zero business logic in the DB. It’s SQL code sprinkled throughout the codebase, where it’s harder to test, poorly versioned, and poorly encapsulated. And also often needlessly slow.
> harder to isolate noisy workloads
Dunno what this means
> no observability
Maybe some truth here. It is more work to inspect issues in SQL than most programming languages.
> scaling pressure lands solely in Postgres, lack of IO, especially API calls.
If stored procedures are causing IO problems and scaling issues then you are using them wrong.
Stored procedures often drastically reduce IO when used correctly and thereby improve scalability.
Comment by mattdeboard 3 days ago
Comment by dpark 3 days ago
I get the distinct impression that many teams have very weak engineering rigor around their DBs, which leads to a lot of avoidable pain.
Comment by otterley 4 days ago
Comment by dpark 4 days ago
Pulling the data out of the DB to do computation in a higher layer cannot be more efficient in terms of any I/O unless your stored procedure is just poorly written. It might be a win if your DB is compute bound, though.
Comment by otterley 3 days ago
Comment by pjmlp 4 days ago
You can certainly unit test them, good databases have telemetry and metrics.
Version control is no different from using containers instead of VMs.
Any database change goes through CI/CD pipelines and regular devs cannot edit code directly on the DB.
In fact the biggest issue with databases is like debugging, some devs rather not learn how to use them properly.
In one they never go beyond printf, in the other, they only know what an ORM looks like, and the command line applications for basic SELECTs.
Comment by TuringNYC 4 days ago
For example, you cant use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...
Also for example, you dont get ultra-wide high dimensionality vectors.
It is nice they are open sourcing pg_durable, but how about adopting table stakes I'd get with AWS?
Comment by tjgreen 4 days ago
I didn't quite follow your comment about vector support, are you asking for something beyond what pgvector + diskann provide (both available on Azure)?
Comment by philippemnoel 3 days ago
Fyi, we are in discussion with some hyperscalers on making this possible.
Comment by TuringNYC 3 days ago
You dont support ultra-wide vectors from the largest embeddings models. We have to wierd stuff like chop up vectors across fields.
Comment by 0xCMP 3 days ago
I had to switch off pgvecto.rs at some point and figured that out.
I don't have specific experience with the Azure environment here, but this probably applies if you have access to pgvector.
[0]: Types of indexes + number of bits supported at bottom of this section: https://github.com/pgvector/pgvector#hnsw
Comment by moron4hire 4 days ago
Comment by oofbey 4 days ago
Comment by FuriouslyAdrift 4 days ago
Comment by eddythompson80 4 days ago
Comment by antonkochubey 4 days ago
Comment by jiggawatts 4 days ago
Comment by abeomor 4 days ago
Hybrid search (BM25 + vector): Worth noting that ParadeDB's pg_search isn't an AWS-native feature either, you'd need to self-host it on EC2. On Azure PostgreSQL, we built pg_textsearch which provides the same BM25 ranking model (term frequency saturation, document-length normalization, IDF) natively. Fun fact, the main contributor of pg_textsearch is now on the Azure Postgres team :)
Docs: https://learn.microsoft.com/en-us/azure/horizondb/ai/full-te...
High-dimensional vectors: This is actually an area where we're ahead. pgvector with HNSW caps at 2,000 dimensions. We support pgvector for vector storage and search, and for high-dimensional / large-scale workloads we ship pg_diskann — Microsoft's graph-based vector index that supports up to 16,000 dimensions and also does advanced in-index filtering (your WHERE clauses get evaluated during graph traversal, so you don't lose recall on selective predicates).
pgvector: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...
DiskANN high-dimension support: https://learn.microsoft.com/en-us/azure/horizondb/ai/vector-...
These are available today on Azure PostgreSQL, specifically Azure HorizonDB (Preview). Happy to dig into specifics if you have a particular workload in mind.
Comment by jbonatakis 4 days ago
Maybe you meant to word this differently and I’m nitpicking, but didn’t TJ Green build this while he was still at Tiger Data?
Comment by abeomor 3 days ago
Comment by lxdlam 4 days ago
Comment by faxmeyourcode 4 days ago
Why would I want to store my control flow in the database and not in code? It feels strange.
Not trying to dismiss the project, I'm just not getting it yet I think.
Comment by daxfohl 4 days ago
This one seems to be more database-specific use case. The advantage is probably that you can track the exact state of the job in the database itself, rather than having to cross-reference the workflow log with the codebase and trace through it line by line to figure out what the state is. Plus I assume it's less overhead and latency, and operationally one less thing to spin up.
[1] https://learn.microsoft.com/en-us/azure/durable-task/common/...
Comment by affandar 4 days ago
Indeed Durable tasks is an exceptional project and was a unique innovation at the time.
pg_durable brings the same reliability and durablity semantics to long running operations within the database.
We have tons of interesting scenarios on the roadmap. Stay tuned! :)
Comment by alex_hirner 4 days ago
Comment by abeomor 4 days ago
ai.backfill() ignores that row-level state entirely and reprocesses everything from scratch. https://learn.microsoft.com/en-us/azure/horizondb/ai/ai-pipe...
pg_durable answers "did this workflow instance finish, and if it crashed, where do I resume?", completed/running/pending/failed per node + checkpoint replay. https://github.com/microsoft/pg_durable/blob/main/USER_GUIDE...
If you want this problem addressed better, please add an issue to the open-source repo, we would love to dig in. https://github.com/microsoft/pg_durable/issues
Comment by sgarland 4 days ago
Comment by booi 4 days ago
Comment by CharlieDigital 4 days ago
df.wait_for_schedule()
How does this call work? Is it idempotent if I call it from an application? If I run it 2x with the same parameters, does it double tick? Am I invoking this manually from a query console to only do this one time? Am I running this as part of a migration script?For this[0]:
-- Wait for human signal (5 minute timeout)
~> (df.wait_for_signal('approval', 300) |=> 'sig')
~> df.if(
$$SELECT NOT ($sig::jsonb->>'timed_out')::boolean
AND ($sig::jsonb->'data'->>'approved')::boolean$$,
Is the `timed_out` a fixed constant that is returned on timeout?Also not immediately clear: how to handle errors/exceptions?
[0] https://github.com/microsoft/pg_durable/blob/main/examples/i...
Comment by affandar 4 days ago
Within this durable function you are calling df.wait_for_signal(<signal_name>). This call is exactly once within this function instance. There are no duplicates possible. Your df.start() call might get duplicated if it times out and you re-run it, but in this case it would end up creating a different function instance.
Any 'unhandled' errors in executing SQL will fail the function instance. Its status would bubble up the exact error being raised.
Comment by oa335 4 days ago
Comment by rswail 4 days ago
Don't need to synchronize the backups with anything else that is part of the same data store, good for ETL pipelines and other state machine type jobs.
If your ETL is mostly SQL anyway, then having the actual job being run on the same server helps as well.
Comment by regularfry 4 days ago
Comment by gdecandia 4 days ago
Comment by guhidalg 4 days ago
Comment by oofbey 4 days ago
Comment by sgarland 4 days ago
The other one (also related) is normalization. They’ll have hundreds of millions of rows of duplicated, low-cardinality strings, because “joins are expensive,” while somehow missing the fact that the increased I/O from reduced page-packing also has a performance cost.
Comment by Kaliboy 4 days ago
And I don't think it's as simple as you make it. So where I work we use amongst other things Rails. There are places in our codebase where using joins just isn't feasable cause the database would use too much memory and let's just say N is large.
But since we use Rails we can have it query the tables apart and join the tables through the defined model. We literally save like 20 seconds in some cases because 1 HUGE query becomes 8 straightforward ones with maximum index usage.
And because we have this capability in Rails we would never use something like this, cause that would neccesate us holding two mental models and have a clear "what do we run where" directive which honestly is a PTA.
Comment by sgarland 2 days ago
That sounds like you’re either joining on an unindexed column, or have outdated statistics. 8 queries implies 8 tables, which is well under the limit for both Postgres and MySQL where the planner may give up and choose a suboptimal plan. You may have something like “SELECT * FROM foo JOIN bar ON foo.id = bar.foo_id WHERE bar.baz = 'qux'”, and if there’s no composite index on (bar.foo_id, bar.baz), the planner will choose whichever column it thinks is more selective; it then has to go get the value for the other one, and that can be quite expensive at scale. Even if you have a separate index on each of those, there’s no guarantee the planner will decide to merge them.
> Yeah but the increased I/O is cheaper. It's easier to add another webserver as opposed to upgrading your db server.
I’m referring to I/O on the DB. Rows are stored in pages that are generally 8 KiB (Postgres and MS SQL Server default) or 16 KiB (InnoDB default). If you can fit 200 rows per page, a given query will probably have to fetch fewer pages than if you can only fit 100 rows per page.
Comment by oa335 1 day ago
I'm not sure thats true for Postgres. optimal join ordering is np-hard, and finding an optimal join requires exhaustive search through n! combinations (n=number of joins) - thats why postgres generally uses heuristics to figure out join order. 8 is also the default value of "join_collapse_limit" setting in postgres, so it can't ever reliably optimize over 8 joins at a time. Additionally, postgres starts using "genetic algorithms" aka testing random combinations of joins with 12 joins by default (geqo_threshold setting).
I generally agree its better to use database to its fullest, but I would say 8 joins is probably the "limit". Internally at work I've advised teams to try to avoid anything more than 6 joins for "hot-path" queries.
Comment by setr 4 days ago
I don’t understand how splitting a query up would have any relationship to index utilization; the planner should trivially pick up on it?
Also are you sure you’re not solving a different problem[0]? Doing joins manually being faster doesn’t smell right, except in the case of data duplication increasing total resultset size substantially
Like the cost of increased network load from not filtering through the join should outweigh anything else in the equation
https://learn.microsoft.com/en-us/ef/core/querying/single-sp...
Comment by jappgar 3 days ago
But this applies to both camps.
Comment by nextaccountic 4 days ago
Comment by jpalomaki 4 days ago
Also if all the "state" is in one database, then you have better chance of getting consistent backups.
Comment by thibaut_barrere 4 days ago
We use Postgres for that on https://transport.data.gouv.fr (Elixir app which does a fair bit of processing), and it helps.
Not familiar yet with pg_durable though, but I have used or implemented similar solutions and can relate.
Comment by hmaxdml 3 days ago
Comment by keynha 4 days ago
Comment by joelthelion 4 days ago
Comment by gdecandia 4 days ago
Comment by greenavocado 4 days ago
Comment by hmaxdml 3 days ago
Comment by 7373737373 4 days ago
Comment by ijustlovemath 4 days ago
Comment by cpursley 4 days ago
Comment by affandar 4 days ago
The provider is an extensibility point. We just shipped the simplest version of it. Happy to take contribs if someone sends a pgmq based provider!
Comment by cpursley 4 days ago
Comment by evntdrvn 4 days ago
Comment by efitz 4 days ago
For better or worse, they “understand” and have seen a lot of message queuing code and read lots of message queue support discussions.
Comment by advertum 3 days ago
Comment by rastignack 4 days ago
One would be able to trigger maintenance jobs via simple lambda functions whose duration is capped.
Comment by gdecandia 4 days ago
Is the proposal to be able to export pg_dump formatted data on some schedule or trigger, entirely hosted in PostgreSQL and with timeouts? There are already extension that can export to blob/file storage and can be combined with pg_durable or pg_cron, so I assume the challenge is pg_dump compatible data export from SQL running in the database?
Comment by fragmede 4 days ago
Comment by mikey_p 4 days ago
Comment by gdecandia 4 days ago
Comment by jiggawatts 4 days ago
What has Microsoft done to work around this?
Comment by mikey_p 4 days ago
Comment by redmonduser 4 days ago
Comment by linuxhiker 4 days ago
Comment by viveknathani_ 3 days ago
Comment by alik75 15 hours ago
Comment by eddysir 3 days ago
Comment by sathyayoshi 3 days ago
Comment by sathyayoshi 4 days ago
Comment by ryanshrott 2 days ago
Comment by steno132 4 days ago
I would propose a rewrite of Postgres in another language like Rust, introducing a pluggable application layer on top. While ambitious in scope I think it would be helpful and even necessary.
Comment by redmonduser 4 days ago
Comment by steno132 4 days ago
Comment by belinder 4 days ago
Comment by dalberto 4 days ago
PostGIS. pgvector. TimescaleDB. Citus. pg_cron. pgmq. Apache AGE. ParadeDB. hstore. plv8. postgres_fdw. pg_partman. pg_stat_statements...
The extension API is the thing making your thesis possible. Rewriting it away would mean deleting the exact feature you're asking for.
Comment by linuxhiker 4 days ago
Comment by dsr_ 4 days ago
I won't be running that, though.
Comment by reactordev 4 days ago
pgrust.
Comment by steno132 4 days ago
Comment by reactordev 4 days ago
Comment by sgarland 4 days ago
As to the age statement, why on earth does that matter? HAProxy launched in 2001, and despite it being 25 years old, you’ll struggle to find anything faster or more stable in its field. Then there’s, you know, Linux - 1991. I suppose you want to see it rewritten?
Comment by sbuttgereit 4 days ago
This is a pretty poor take. Sure the software that we call "PostgreSQL" started to be developed in the 80's... but they didn't stop there. PostgreSQL has been in continuous development, including improvements, changes, and additions, and by some very smart people at that. It's not static and as long as I've been a professional user of the database, decades, it has continually evolved and in some cases even led the way. If we were to survey the software, wouldn't you at least be interested to know how much of code base actually dates back to those long ago decades and how much is more modern before making such statement?
It would be a mistake to take what PostgreSQL actually offers: an excellent database that has be continuously developed and updated over many years (i.e. "maturity"), for some arbitrary idea and evidently baseless idea that somehow "new" must be better.
If new is better, say why; and do so with more actually true statements than it's not extensible. Want it in rust? Well, OK, sure you can give hand-wavy reasons about security and such for why that might be beneficial; but if you want to be convincing you need to be much more specific about the problem in PostgreSQL and the specific way in which your recommendation actually and convincingly moves the needle. If you can't do that, you're simply giving us an emotional outpouring rather than a rational one.