Oban, the job processing framework from Elixir, has come to Python

Posted by dimamik 6 hours ago

Comments

Comment by mperham 5 hours ago

I wrote Sidekiq, which Oban is based on. Congratulations to Shannon and Parker on shipping this!

I had to make this same decision years ago: do I focus on Ruby or do I bring Sidekiq to other languages? What I realized is that I couldn't be an expert in every language, Sidekiq.js, Sidekiq.py, etc. I decided to go a different direction and built Faktory[0] instead, which flips the architecture and provides a central server which knows how to implement the queue lifecycle internally. The language-specific clients become much simpler and can be maintained by the open source community for each language, e.g. faktory-rs[1]. The drawback is that Faktory is not focused on any one community and it's hard for me to provide idiomatic examples in a given language.

It's a different direction but by focusing on a single community, you may have better outcomes, time will tell!

[0]: https://github.com/contribsys/faktory
[1]: https://github.com/jonhoo/faktory-rs

Comment by sorenone 4 hours ago

Thanks Mike! You are an inspiration. Parker and I have different strengths both in life and language. We're committed to what this interop brings to both Python and Elixir.

Comment by ai_critic 4 hours ago

"Based on" is sort of a stretch here.

Sidekiq is pretty bare bones compared to what Oban supports with workflows, crons, partitioning, dependent jobs, failure handling, and so forth.

Comment by mperham 4 hours ago

By “based on” I don’t mean a shared codebase or features but rather Parker and I exchanged emails a decade ago to discuss business models and open source funding. He initially copied my Sidekiq OSS + Sidekiq Pro business model, with my blessing.

Comment by sorentwo 3 hours ago

This is absolutely true (except we went OSS + Web initially, Pro came later). You were an inspiration, always helpful in discussion, and definitely paved the way for this business model.

Comment by ai_critic 2 hours ago

Thank you for the clarification!

Comment by sorenone 3 hours ago

You got the beer. We got the pen. ;)

Comment by semiquaver 5 hours ago

Isn’t it more accurate to say that they are both based on Resque?

Comment by mperham 3 hours ago

Resque was the main inspiration; Sidekiq still provides compatibility with some of its APIs to this day.

https://github.com/sidekiq/sidekiq/blob/ba8b8fc8d81ac8f57a55...

Comment by simonw 5 hours ago

Sidekiq credits BackgrounDRb and Delayed::Job and Resque as inspiration here: https://www.mikeperham.com/2022/01/17/happy-10th-birthday-si...

Comment by bdcravens 4 hours ago

The API is very close, but architecturally it's different.

Additionally, delayed_job came before resque.

Comment by enraged_camel 5 hours ago

Maybe you didn’t intend it this way, but your comment comes across as an attempt to co-opt the discussion to pitch your own thing. This is generally looked down upon here.

Comment by BowBun 5 hours ago

Knowing Mike and his work over the years, that is not the case. He is a man of integrity who owns a cornerstone product in the Ruby world. He is specifically the type of person I want to hear from when folks release new software having to do with background jobs, since he has 15 years of experience building this exact thing.

Comment by mperham 3 hours ago

It was an off-the-cuff comment and probably not worded ideally but the intent was to discuss how Oban is branching off into a new direction for their business based on language-specific products while I went a different direction with Faktory. Since I came to the exact same fork in the road in 2017, I thought it was relevant and an interesting topic on evolving software products.

Comment by simonw 5 hours ago

> Oban allows you to insert and process jobs using only your database. You can insert the job to send a confirmation email in the same database transaction where you create the user. If one thing fails, everything is rolled back.

This is such a key feature. Lots of people will tell you that you shouldn't use a relational database as a worker queue, but they inevitably miss out on how important transactions are for this - it's really useful to be able to say "queue this work if the transaction commits, don't queue it if it fails".

Brandur Leach wrote a fantastic piece on this a few years ago: https://brandur.org/job-drain - describing how, even if you have a separate queue system, you should still feed it by logging queue tasks to a temporary database table that can be updated as part of those transactions.
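A minimal sketch of what that buys you, using sqlite3 as a stand-in for Postgres (the table and column names here are made up for illustration): the user row and the job row commit or roll back as a unit, so a failed signup can never leave an orphaned "send email" job behind.

```python
# Transactional job enqueueing, sketched with sqlite3 standing in for
# Postgres. Hypothetical schema; not Oban's actual tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT, arg TEXT)")

def signup(email):
    # "with conn" opens a transaction: commits on success, rolls back on error.
    with conn:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute("INSERT INTO jobs (kind, arg) VALUES (?, ?)",
                     ("send_confirmation_email", email))

signup("a@example.com")

# A failing signup (duplicate email) rolls back the job insert too:
try:
    signup("a@example.com")
except sqlite3.IntegrityError:
    pass

users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(users, jobs)  # 1 1 — no orphaned job from the failed transaction
```

With a separate queue system you have to handle the window where the transaction committed but the enqueue failed (or vice versa); with the database as the queue, that window doesn't exist.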

Comment by zie 1 hour ago

I agree this is an awesome feature. I use pg_timetable instead of Oban though: https://cybertec-postgresql.github.io/pg_timetable/v6.x/

Comment by elfly 9 minutes ago

Lots of people are also trying to just use Postgres for everything, though.

Comment by nhumrich 4 hours ago

This is called the "transactional outbox pattern"!

Comment by simonw 3 hours ago

Good name! Looks like SeatGeek use that naming convention here: https://chairnerd.seatgeek.com/transactional-outbox-pattern/

This looks like a good definition too: https://www.milanjovanovic.tech/blog/outbox-pattern-for-reli...

Comment by airocker 26 minutes ago

Debezium was built exactly for that to power a queue based on WAL.

Comment by sieep 2 hours ago

Excellent point. Never thought of transactions in this way.

Comment by TkTech 4 hours ago

The Oban folks have done amazing, well-engineered work for years now - it's really the only option for Elixir. That said, I'm very confused at locking the process pool behind a pro subscription - this is basic functionality given CPython's architecture, not a nice-to-have.

For $135/month on Oban Pro, they advertise:

    All Open Source Features

    Multi-Process Execution

    Workflows

    Global and Rate Limiting

    Unique Jobs

    Bulk Operations

    Encrypted Source (30/90-day refresh)

    1 Application

    Dedicated Support

I'm going to toot my own horn here, because it's what I know, but take my 100% free Chancy for example - https://github.com/tktech/chancy. Out of the box the same workers can mix-and-match asyncio, processes, threads, and sub-interpreters. It supports workflows, rate limiting, unique jobs, bulk operations, transactional enqueuing, etc. Why not move these things to the OSS version to be competitive with existing options, and focus on dedicated support and more traditional "enterprise" features, which absolutely are worth $135/month (the Oban devs provide world-class support for issues). There are many more options available in the Python ecosystem than Elixir, so you're competing against Temporal, Trigger, Prefect, Dagster, Airflow, etc etc.

Comment by sorentwo 4 hours ago

> It supports workflows, rate limiting, unique jobs, bulk operations, transactional enqueuing, etc. Why not move these things to the OSS version to be competitive with existing options, and focus on dedicated support and more traditional "enterprise" features, which absolutely are worth $135/month (the Oban devs provide world-class support for issues).

We may well move some of those things to the OSS version, depending on interest, usage, etc. It's much easier to make things free than the other way around. Some Pro only features in Elixir have moved to OSS previously, and as a result of this project some additional functionality will also be moved.

Support only options aren't going to cut it in our experience; but maybe that'll be different with Python.

> There are many more options available in the Python ecosystem than Elixir, so you're competing against Temporal, Trigger, Prefect, Dagster, Airflow, etc etc.

There's a lot more of everything available in the Python ecosystem =)

Comment by TkTech 3 hours ago

> Support only options aren't going to cut it in our experience; but maybe that'll be different with Python.

That's totally fair, and I can only speak from the sidelines. I haven't had a chance to review the architecture - would it possibly make sense to swap from async as a free feature to the process pool, and make async a pro feature? This would help with adoption from other OSS projects, if that's a goal, as the transition from Celery would then be moving from a process pool to a process pool (for most users). The vast, vast majority of Python libraries are not async-friendly and most still rely on the GIL. On the other hand, Celery has absolutely no asyncio support at all, which sets the pro feature apart.

On the other hand, it's already released, and as you said, it's much harder to take a free feature and make it paid.

Thanks again for Oban - I used it for a project in Elixir and it was painless. Missing Oban was why I made Chancy in the first place.

Comment by sorentwo 3 hours ago

> The vast, vast majority of Python libraries are not async-friendly and most still rely on the GIL. On the other hand, Celery has absolutely no asyncio support at all, which sets the pro feature apart.

That's great advice. Wish we'd been in contact before =)

Comment by markbao 34 minutes ago

Is Postgres fast enough for job processing these days? We do hundreds of millions of jobs now and even years ago when our volume was a fraction of that, we got a huge performance boost moving from Postgres + Que to Redis + Sidekiq. Has that changed in the intervening years?

Comment by choilive 12 minutes ago

Hundreds of millions over what time frame? I've got a system with Rails/Solid Queue + Postgres doing about 20M jobs/day on a $45/mo VM with plenty of room to spare.

Comment by dec0dedab0de 2 hours ago

OSS Oban has a few limitations, which are automatically lifted in the Pro version:

Single-threaded asyncio execution - concurrent but not truly parallel, so CPU-bound jobs block the event loop.

This makes it not even worth trying. Celery's interface kind of sucks, but I'm used to it already, and I can get infinite parallelism, expanding vertically and horizontally for as long as I can afford the resources.

I also don't particularly like asyncio, and if I'm using a job queue I wouldn't expect to need it.

Edit: I looked into it a bit more, and it seems we can launch multiple worker nodes, which doesn't seem as bad as what I originally thought
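For what it's worth, the standard CPython workaround for the "CPU-bound jobs block the event loop" limitation (independent of Oban's Pro backend) is to ship that work to a process pool from within the async worker:

```python
# General CPython pattern (not Oban-specific): keep CPU-bound work off
# the event loop by running it in a process pool, which also sidesteps
# the GIL for true parallelism.
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound: run inline, this would block a single-threaded asyncio worker.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call runs in a separate OS process; the event loop stays
        # free to service IO-bound jobs in the meantime.
        futures = [loop.run_in_executor(pool, crunch, 100_000) for _ in range(4)]
        return await asyncio.gather(*futures)

results = asyncio.run(main())
print(results[0] == sum(i * i for i in range(100_000)))  # True
```

That's roughly what a "multi-process execution" backend automates for you: pool lifecycle, serialization, and crash handling around this same primitive.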

Comment by airocker 36 minutes ago

We had considered Oban when deciding whether or not to go with Kafka/Debezium. We sided with Kafka because it can do high-throughput ingestion and it is easier to maintain with cursor in today's world. Postgres is meant for heavy querying, not heavy writes. You could fix that with a lot of care, but then it does not scale multi-master very well either. Kafka scales much better for heavy writes.

Comment by hangonhn 5 hours ago

This is something my company has been considering for a while. We've been using Celery and it's not great. It gets the job done, but it has its issues.

I'd never heard of Oban until now; the one we'd considered was Temporal, but that feels like so much more than what we need. I like how light Oban is.

Does anyone have experience with both and is able to give a quick comparison?

Thanks!

Comment by tecoholic 18 minutes ago

We migrated from Celery to Prefect a couple of years back and have been very happy. But ours is a small op which handles tasks in the 1000s, not millions. It's been night and day in terms of visibility and tracking. I would definitely recommend it.

It's heavyweight and covers a lot of use cases. But we just run simple ProcessWorkers for our regular needs and an ECS worker for heavier ML tasks.

Comment by BowBun 5 hours ago

Very, very different tools, though they cover similar areas.

Temporal - if you have strict workflow requirements, want _guarantees_ that things complete, and are willing to take on extra complexity to achieve that. If you're a bank or something, probably a great choice.

Oban - DB-backed worker queue, which processes tasks off-thread. It does not give you the guarantees that Temporal can, because it has not abstracted every push/pull into a first-class citizen. While it offers some similar features with workflows, to get multiple 9s of reliability you will be hardening that yourself (based on my experience with Celery+Sidekiq).

Based on my heavy experience with both, I'd be happy to have both available to me in a system I'm working on. At my current job we are forced to use Temporal for all background processing, which for small tasks is just a lot of boilerplate.

Comment by owaislone 5 hours ago

I'm just coming back to web/API development in Python after 7-8 years working on distributed systems in Go. I just built a Django+Celery MVP given what I knew from 2017, but I see a lot of "hate" towards Celery online these days. What issues have you run into with Celery? Has it gotten less reliable? Harder to work with?

Comment by TkTech 4 hours ago

Celery + RabbitMQ is hard to beat in the Python ecosystem for scaling. But the vast, vast majority of projects don't need anywhere near that kind of scale and instead just want basic features out of the box - unique tasks, rate limiting, asyncio, future scheduling that doesn't cause massive problems (Celery schedules future tasks in-memory on workers), etc. These things are incredibly annoying to implement on top of Celery.

Comment by hangonhn 4 hours ago

Yeah that list right there. That's exactly it.

We don't hate Celery at all. It's just a bit harder to get it to do certain things, and it requires a bit more coding and understanding of Celery than we want to invest time and effort in.

Again, no hate towards Celery. It's not bad. We just want to see if there are better options out there.

Comment by alanwreath 4 hours ago

I like Celery, but I started to try other things when I had projects doing work from languages in addition to Python. Also, I prefer the code to work without having to think about queues as much as possible. In my case that was Argo Workflows (not to be confused with Argo CD).

Comment by offbyone 3 hours ago

Ooof. I don't mind the OSS/pro feature gate for the most part, but I really don't love that "Pro version uses smarter heartbeats to track producer liveness."

There's a difference between QoL features and reliability functions; to me, at least, that means that I can't justify trying to adopt it in my OSS projects. It's too bad, too, because this looks otherwise fantastic.

Comment by sorentwo 2 hours ago

With a typical Redis or RabbitMQ backed durable queue you’re not guaranteed to get the job back at all after an unexpected shutdown. That quote is also a little incorrect—producer liveness is tracked the same way, it’s purely how “orphaned” jobs are rescued that is different.
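For readers unfamiliar with the mechanism: orphan rescue in DB-backed queues is commonly built on heartbeats. A rough sketch of that pattern (hypothetical schema and threshold; this is the generic technique, not Oban's actual implementation):

```sql
-- Producers refresh heartbeat_at periodically while they run. A sweeper
-- returns any job still marked executing whose producer has gone silent,
-- so it gets picked up again after a crash or hard shutdown.
UPDATE jobs
SET state = 'available'
WHERE state = 'executing'
  AND producer_id IN (
    SELECT id
    FROM producers
    WHERE heartbeat_at < now() - interval '60 seconds'
  );
```

The trade-off being debated in this subthread is how precisely that sweeper distinguishes "producer is dead" from "job is just long-running."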

Comment by offbyone 2 hours ago

"jobs that are long-running might get rescued even if the producer is still alive" indicates otherwise. It suggests that jobs that are in progress may be double-scheduled. That's a feature that I think shouldn't be gated behind a monthly pro subscription; my unpaid OSS projects don't justify it.

Comment by dec0dedab0de 2 hours ago

Agreed. I try to avoid using anything that has this freemium model of opensource, but I let it slide for products that provide enterprise features at a cost.

This feels like core functionality is locked away, and the open source part is nothing more than shareware, or a demo/learning version.

Comment by Arubis 5 hours ago

While this is a Cool Thing To See, I do wish things would go the other way: take all the BI/ML/DS pipelines and workflows folks are building in Python and bring them to the BEAM (and, as would follow, Elixir). I get where the momentum is, but having something functional, fault-tolerant, and concurrent underpinning work that's naturally highly concurrent and error-prone feels like a _much_ more natural fit.

Comment by cpursley 40 minutes ago

Agree, and Claude Code does very well with Elixir despite TS/Python getting all the hype:

https://youtu.be/iV1EcfZSdCM?si=KAJW26GVaBqZjR3M

This helps with keeping it on track writing idiomatic elixir and using good patterns: https://skills.sh/agoodway/.claude/elixir-genius

Comment by qianli_cs 4 hours ago

Thanks for sharing, interesting project! One thing that stood out to me is that some fairly core features are gated behind a Pro tier. For context, there are prior projects in this space that implement similar ideas fully in OSS, especially around Postgres-backed durable execution:

1. DBOS built durable workflows and queues on top of Postgres (disclaimer: I'm a co-founder of DBOS), with some recent discussions here: https://news.ycombinator.com/item?id=44840693

2. Absurd explores a related design as well: https://news.ycombinator.com/item?id=45797228

Overall, it's encouraging to see more people converging on a database-centric approach to durable workflows instead of external orchestrators. There's still a lot of open design space around determinism, recovery semantics, and DX. I'm happy to learn from others experimenting here.

Comment by sorentwo 4 hours ago

There are other projects that implement the ideas in OSS, but that's the same in Elixir. Not that we necessarily invented DAGs/workflows, but our durable implementation on the Elixir side predates DBOS by several years. We've considered it an add-on to what Oban offers, rather than the entire product.

Having an entirely open source offering and selling support would be an absolute dream. Maybe we'll get there too.

Comment by qianli_cs 3 hours ago

That's fair, the idea itself isn't new. Workflows/durable execution have been around forever (same story in Elixir).

The differences are in the implementation and DX: the programming abstraction, how easy recovery/debugging is, and how it behaves once you're running a production cluster.

One thing that bit us early was versioning. In practice, you always end up with different workers running different code versions (rolling deploys, hotfixes, etc.). We spent a lot of time there and now support both workflow versioning and patching, so old executions can replay deterministically while still letting you evolve the code.

Curious how Oban handles versioning today?

Comment by tnlogy 2 hours ago

Looks like a nice API. We have used a similar pattern for years, but with SQLAlchemy and the same kind of SQL statement for getting the next available job. I think it's easier to handle worker queues with just PostgreSQL rather than keeping some other queue system supported and updated for security fixes etc.

Comment by dfajgljsldkjag 5 hours ago

I have fixed many broken systems that used redis for small tasks. It is much better to put the jobs in the database we already have. This makes the code easier to manage and we have fewer things to worry about. I hope more teams start doing this to save time.

Comment by BowBun 5 hours ago

Traditional DBs are a poor fit for high-throughput job systems in my experience. The transactions alone around fetching/updating jobs are non-trivial and can dwarf regular data activity in your system, especially for monoliths, which Python and Ruby apps by and large still are.

Personally I've migrated 3 apps _from_ DB-backed job queues _to_ Redis/other-backed systems with great success.

Comment by asa400 1 hour ago

How high of throughput were you working with? I've used Oban at a few places that had pretty decent throughput and it was OK. Not disagreeing with your approach at all, just trying to get an idea of what kinds of workloads you were running to compare.

Comment by BowBun 1 hour ago

Millions of jobs a minute

Comment by brightball 4 hours ago

The way that Oban for Elixir and GoodJob for Ruby leverage PostgreSQL allows for very high throughput. It's not something that easily ports to other DBs.

Comment by BowBun 1 hour ago

Appreciate the added context here, this is indeed some special sauce that challenges my prior assumptions!

Comment by owaislone 4 hours ago

Interesting. Any docs that explain what/how they do this?

Comment by TkTech 4 hours ago

A combination of LISTEN/NOTIFY for instantaneous reactivity (so periodic polling is only a fallback), and FOR UPDATE ... SKIP LOCKED, which makes it efficient and safe for parallel workers to grab tasks without coordination. It's actually covered near the bottom of the article.
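The fetch side of that usually looks something like this (hypothetical schema; not Oban's actual SQL, just the canonical Postgres pattern):

```sql
-- Workers LISTEN on a channel; producers NOTIFY after INSERT, so idle
-- workers wake immediately instead of waiting for the next poll tick.
-- SKIP LOCKED means concurrent workers never block on, or double-claim,
-- rows another worker has already locked.
WITH claimed AS (
  SELECT id
  FROM jobs
  WHERE state = 'available' AND queue = 'default'
  ORDER BY id
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET state = 'executing'
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
```

Each worker atomically claims a batch in one round trip, which is what makes a plain Postgres table competitive as a queue.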

Comment by owaislone 4 hours ago

Thank you

Comment by brightball 4 hours ago

GoodJob is a strong attempt. I believe it's based around advisory locks, though.

https://github.com/bensheldon/good_job

Comment by sorentwo 3 hours ago

Transactions around fetching/updating aren't trivial, that's true. However, the work that you're doing _is_ regular activity because it's part of your application logic. That's data about the state of your overall system and it is extremely helpful for it to stay with the app (not to mention how nice it makes testing).

Regarding overall throughput, we've written about running one million jobs a minute [1] on a single queue, and there are numerous companies running hundreds of millions of jobs a day with Oban/Postgres.

[1]: https://oban.pro/articles/one-million-jobs-a-minute-with-oba...

Comment by BowBun 1 hour ago

Appreciate the response, I'm learning some new things about the modern listening mechanisms for DBs which unlock more than I believed was possible.

For your first point - I would counter that a lot of data about my systems lives outside of the primary database. There is however an argument for adding a dependency, and for testing complexities. These are by and large solved problems at the scale I work with (not huge, not tiny).

I think both approaches work and I honestly just appreciate you guys holding Celery to task ;)

Comment by pawelduda 5 hours ago

In Rails at least, aside from being used for background processing, Redis gives you more goodies. You can store temporary state for tasks that require coordination between multiple nodes without race conditions, cache things to take some load off your DB, etc.

Besides, a DB has a higher likelihood of failing you once you reach certain throughputs.

Comment by shepardrtc 3 hours ago

> Inaccurate rescues - jobs that are long-running might get rescued even if the producer is still alive. Pro version uses smarter heartbeats to track producer liveness.

So the non-paid version really can't be used for production unless you know for sure you'll have very short jobs?

Comment by sorentwo 2 hours ago

You can have jobs that run as long as you like. The difference is purely in how quickly they are restored after a crash or a shutdown that doesn’t wait long enough.

Comment by sieep 2 hours ago

Oban is incredible and this type of software will continue to grow in importance. Kudos!

Comment by sergiotapia 3 hours ago

Python dudes are in for a treat: Oban is one of the most beautiful, elegant parts of working with Elixir/Phoenix. They have saved me so much heartache and tears over the years working with them.

Comment by owaislone 5 hours ago

I don't know how I feel about a free open source version plus a commercial version that locks features. Something inside me prevents me from even trying such software. Logically I'd say I support the model, because open source needs to be sustainable and we need good-quality developer tools and software, but when it comes to adoption I find myself reaching for purely open source projects. I think it has to do with features being locked behind a paywall. I'd be far more open to trying products where the commercial version offered enterprise-level features like compliance reports, FIPS support, professional support, etc., but didn't lock features.

Comment by sanswork 5 hours ago

For most of its history the main locked feature was just a premium web interface (there were a few more, but that was the main draw), which is included in the free version now. I think the locked features are now primarily the more specialised job-ordering engines: things that, if you need the free tier, you almost certainly don't need. Oban has been very good about deciding which features to lock away.

(I've paid for it for years despite not needing any of the pro features)

Comment by tinyhouse 4 hours ago

How is this different than Celery and the like?

Comment by deeviant 4 hours ago

I can't imagine why you would want a job processing framework limited to a single thread, which makes this seem like a paid-version-only product.

What does it have over Celery?

Comment by sorenone 3 hours ago

The vast majority of tasks you use a job processing framework for are IO-bound side effects: sending emails, interacting with a database, making HTTP calls, etc. Those are hardly impacted by the fact that it's a single thread. It works really well embedded in a small service.
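The single-thread point is easy to demonstrate: with asyncio, IO-bound jobs overlap their waits, so concurrency doesn't require parallelism. A minimal sketch (the `send_email` stand-in just sleeps to simulate a network round trip):

```python
# Ten "jobs" that each wait 0.1s on simulated IO finish in roughly 0.1s
# total, not 1s, because the single-threaded event loop overlaps the waits.
import asyncio
import time

async def send_email(n):
    await asyncio.sleep(0.1)  # stands in for an SMTP/HTTP round trip
    return n

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(send_email(i) for i in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(len(results), elapsed)  # 10 jobs, elapsed well under 1 second
```

CPU-bound work is the exception, which is exactly where the multi-process backend discussed elsewhere in the thread comes in.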

You can also easily spawn as many processes running the CLI as you like to get multi-core parallelism. It's just a smidge more overhead than the process pool backend in Pro.

Also, not an expert on Celery.

Comment by dec0dedab0de 2 hours ago

I use celery when I need to launch thousands of similar jobs in a batch across any number of available machines, each running multiple processes with multiple threads.

I also use celery when I have a process a user kicked off by clicking a button and they're watching the progress bar in the gui. One process might have 50 tasks, or one really long task.

Comment by nodesocket 2 hours ago

Is there a web UI to view jobs, statuses, queue length, etc.?

Comment by cpursley 3 hours ago

Oban is cool, but I really like the idea of pgflow.dev, which builds on the pgmq (Rust) Postgres extension to do the heavy lifting. That makes it language-agnostic, since all the important parts live in Postgres. I've started an Elixir adapter, which really is just a DSL and a poller; you could do the same in Python, etc.

https://github.com/agoodway/pgflow

Comment by waffletower 1 hour ago

No offense to all of the effort referenced here; I understand that there are many computing contexts with different needs. However, I really need to ask: am I the only one who cringes at the notion of a transactional database being a job processing nexus? Deadlocks, anyone? Really sounds like asking for serious trouble to me.