I dug into the Postgres sources to write my own WAL receiver
Posted by alzhi7 3 days ago
Comments
Comment by smj-edison 2 days ago
I eventually did end up getting Jimtcl to be multithread-safe, but it ended up being slower than the naive approach of serializing and deserializing between threads. I've been seriously nerd-sniped since, and have slowly been building my own thread safe interpreter, but I still have to cross check with Jimtcl constantly.
Comment by alzhi7 1 day ago
Comment by JSR_FDED 2 days ago
Given all the corner cases he describes, it seems like a good example of something you would never ever want to vibe code.
Comment by pierrekin 2 days ago
I would absolutely still bring a coding agent with me for a project like this, but I would be in the mindset of “I need to understand and be familiar with every line” rather than say, every function signature or every service behaviour.
So it is almost like vibe coding but the abstraction level is lower?
The question I’ve been asking myself recently is whether the act of thinking through the code from scratch is somehow more valuable than the potential benefit of letting that mechanical part be handled by something else, be it another human or an agent.
To be specific, I’m referring to prompts like “next, add a for loop which iterates over the elements in the array with an enumerated index, then call our function $func by reference for each element”, or “Is there a more idiomatic way of doing this in $lang?”, etc.
This has the advantage to me of letting me code in languages whose syntax I don’t know or have forgotten, but I’m not sure yet whether this trades a short-term gain for a long-term cost.
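For what it's worth, a prompt like the one above maps to only a couple of lines in most languages. A sketch in Python (function and variable names are illustrative, not from the thread):

```python
def apply_with_index(items, func):
    """Call func(index, element) for each element of items.

    In Python, "passing $func by reference" is just passing the
    function object itself; no pointer syntax is needed.
    """
    for i, item in enumerate(items):
        func(i, item)


def apply_collect(items, func):
    """The more idiomatic variant when you want the results collected:
    a list comprehension over enumerate()."""
    return [func(i, item) for i, item in enumerate(items)]
```

The “more idiomatic way” answer differs per language, which is arguably where the prompt earns its keep: in Go you'd use `for i, item := range items`, in Rust `.iter().enumerate()`.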
Comment by globular-toast 2 days ago
When you're fluent in a programming language it's quicker to just type that directly in said programming language...
Instead you're training yourself to be able to say this stuff in English which will never be as powerful.
Comment by cyberpunk 2 days ago
Comment by globular-toast 2 days ago
The example I replied to was more the nuts and bolts of programming. It's the thing you're doing 90% of the time. Changes here have a big impact. That's why for almost my entire career I've had expandable "snippets" in my editor to automatically expand, say, "for" to a for loop skeleton where I fill in the variable names. It's like using an electric screwdriver. You don't lose touch with the screws, it just saves you time and effort.
Typing the entire for loop into an LLM in pseudocode actually seems like a regression compared to that. You don't save any typing. But you lose the ability to work independently. You become dependent on a paid subscription and/or powerful hardware just to do what I've been able to do with a keyboard and hardware you can find for free.
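To make the comparison concrete, an expandable snippet of the kind described might look like this, using VS Code's snippet format as one example (the commenter's actual editor and snippet definitions are unknown; the body shown is a generic JavaScript-style skeleton):

```json
{
  "For loop": {
    "prefix": "for",
    "body": [
      "for (let ${1:i} = 0; ${1:i} < ${2:array}.length; ${1:i}++) {",
      "\t$0",
      "}"
    ],
    "description": "Expand 'for' into a loop skeleton with tab stops"
  }
}
```

Typing `for` and hitting Tab drops you into the `${1:i}` placeholder, then `${2:array}`, then the body — deterministic, offline, and free.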
It's similar to writing a letter to someone and having it translated into French when the reader understands English. Why would you do that?
This changes if you go higher level, of course. This is the temptation that LLMs give us. First it's a for loop, then it's an entire class, then entire modules, then it's only a small step to "vibe coding". What we're still figuring out is where this is actually a benefit. Where can we save effort without compromises? I don't think it's typing out code in English, and I don't think it's vibe coding either. Is there something in between? It's too soon to tell.
Comment by zhainya 2 days ago
Comment by victorbjorklund 2 days ago
Comment by j45 2 days ago
Comment by samokhvalov 2 days ago
I wonder if you considered WAL-G, which is also written in Go and has this: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.m...
Comment by alzhi7 1 day ago
Yes, I know about this tool, it's great. I watched videos about how it was developed, what difficulties there were in achieving delta backups, and how the developers also spent a ton of time studying the PostgreSQL source code. And I studied the WAL-G source code myself. I just never had to use it at work, since I was used to pgBackRest and, a bit later, to Barman. WAL-G focuses on the cloud and on universality (i.e., it's not only used for PG, but has a unified interface for many different storage systems).
Initially, I didn't even have the idea of making a complete, reliable tool. Over time, I started striving toward exactly that. When there was an available hypervisor at work, I set up k8s there and ran my receiver for several dev databases, just to test its operation 24/7, setting aggressive config parameters (frequent compression, unloading, cleanup, frequent backups, etc.). At the same time, I was choosing not small databases, but quite real production ones, with various nightly integrations for data population (external APIs, Airflow, and all that), blobs/tablespaces.
And of course I've read your articles and watched a lot of videos.