The pitfalls of partitioning Postgres yourself

Posted by abelanger 4 days ago

Comments

Comment by abelanger 1 day ago

I mentioned this towards the bottom of the post, but to reiterate: we're extremely grateful to Laurenz for helping us out here, and his post on this is more than worth checking out: https://www.cybertec-postgresql.com/en/partitioned-table-sta...

(plus an interesting discussion in the comments of that post on how the query planner chose a certain row estimate in the specific case that Laurenz shared!)

The other thing I'll add is that we still haven't figured out:

1. An optimal ANALYZE schedule here on parent partitions; we're opting to over-analyze than under-analyze at the moment, because it seems like our query distribution might change quite often.

2. Whether double-partitioned tables (we have some tables partitioned by time series first, and an enum value second) need analyze on the intermediate tables, or whether the top-level parent and bottom-level child tables are enough. So far just the top-level and leaf tables seem good enough.

Comment by mnahkies 1 day ago

I'd consider myself pretty familiar with postgres partitioning, and even worked with systems that emulated partitioning through complex dynamic SQL through stored procs before it was supported natively.

But TIL, I didn't realize you could do multiple levels of partitioning in modern postgres, found this old blog post that touches on it https://joaodlf.com/postgresql-10-partitions-of-partitions.h...

Something that stresses me is the number of partitions - we have some weekly partitions that have a long retention period, and whilst it hasn't become a problem yet, it feels like a ticking time bomb as the years go on.

Would a multi level partitioning scheme of say year -> week be a feasible way to side step the issues of growing partition counts?

Comment by cldellow 1 day ago

Ha, what a coincidence. Just today I was reading a three year old Stackoverflow discussion about this [1].

It prompted Laurenz to submit the documentation patch that is cited in the article. In the discussion of the patch itself, people seem to conclude that it's a good improvement to the docs, but that the behaviour itself is a bit of a footgun. [2]

[1]: https://stackoverflow.com/questions/73951604/autovacuum-and-...

[2]: https://www.postgresql.org/message-id/Y8cQJIMFAe7QT73/%40mom...

Comment by wreath 17 hours ago

I ran into the exact same problem few weeks ago too with around 1k partitions but they were small. Ended up adding cronjob to run analyze on the partitioned table (not the partitions!) once a day. I hope this gets fixed in future version of PG.

Comment by groundzeros2015 1 day ago

They didn’t say why they didn’t use the built-in partitioning system.

Comment by Tostino 1 day ago

They are using the built-in partitioning. They just ran into one of the "you gotta know this" pitfalls.