Defrag.exfat Is Inefficient and Dangerous

Posted by dxdxdt 3 days ago

Comments

Comment by ycombinatrix 3 days ago

>We prioritized simplicity and correctness first, and plan to incrementally introduce performance optimizations in future iterations.

Sir, this is a correctness issue.

Comment by zapzupnz 3 days ago

Does nobody else think the responses from the person who wrote the code read like the usual sycophantic “you’re absolutely right!” tone you get from AI these days?

Comment by rcxdude 2 days ago

It's plausible that they are using an LLM for translation, which would create the tone but not necessarily mean that they are delegating all thought to it.

Comment by pajko 3 days ago

There at least 2 AIs.

Comment by zapzupnz 2 days ago

Probably. The way nobody is calling that out in the thread is wild.

Comment by burnt-resistor 3 days ago

Sigh. Piss poor engineering, likely by humans. For the love of god, do atomic updates by duplicating data first such as in a move-out-of-the-way-first strategy before doing metadata updates. And keep a backup of metadata at each point of time to maximize crash consistency and crash recovery while minimizing the potential for data loss. An online defrag kernel module would likely be much more useful but I don't trust them to be able to handle such an undertaking.

If a user has double storage available, it's probably best to do the old-fashioned "defrag" by single-threaded copying all files and file metadata to a newly-formatted volume.

Comment by doubled112 3 days ago

That last paragraph sums up the ZFS defrag procedure at one shop I worked at. Buy new disks and send/receive the pool.

At our size and use case the timing was usually close to perfect. The pools were getting close to full and fragmented as larger disks became inexpensive.

Comment by dxdxdt 2 days ago

Yeah. Pretty much.

Read the defrag code in other well-established fs like ext4 or btrfs. They all have limitations(or caveats, if you will). It's one of those problems where you just have to throw money at it and hope for the best. Even Microsoft kinda just gave up on it because it's really a pointless exercise at this point in time and age.

Comment by forgotpwd16 3 days ago

>After reviewing the core defrag logic myself, I've come to a conclusion that it's AI slop.

Will call it a human slop. AI may've given them some code but they certainly haven't use it fully. I uploaded the defrag.c in ChatGPT asking to review on performance/correctness/safety and pointed the sames issues as you (alongside bunch of others but not interested at the moment to review them).

Comment by dxdxdt 2 days ago

I did the same. Was genuinely curious. Didn't get much from it. I'm still confused.

The code base is huge for an LLM to handle, perhaps it was generated over multiple prompts idk. Not sure if someone can train a model on the kernel code or exfatprogs and generate the code. I doubt someone with such expertise would even go through the process when they can just write the code themselves which is much easier.

Comment by forgotpwd16 2 days ago

Multiple prompts are mandatory for anything non-trivial or/and larger in scope. That said, the exfatprogs repo is ~60k tokens (in 8k LOC) and Linux's exfat driver* is ~40k tokens (in 6k LOC). So directly relevant code is ~100k tokens (in 14k LOC). Not that extensive.

>Not sure if someone can train a model on the kernel code or exfatprogs and generate the code.

They can certainly finetune such a model. Not a crazy idea, just computationally expensive. (But less expensive than training from scratch.)

*Of course Linux driver also uses many includes so if consider those alongside linked code the number goes significantly up.

Comment by dxdxdt 2 days ago

> just computationally expensive. (But less expensive than training from scratch.)

Model training requires GPUs w/ 1kW TDP. I can shit out code on noodles and red bulls. Not sure about the quality, but still way less energy :)

Jokes aside, the defrag program probably was a slob to some extent.

Comment by stuaxo 3 days ago

Talk about a baptism of fire for the dev.

Seems like they are very new to tbings and didn't expect it to be adopted, but were hoping for a bit of feedback.