Show HN: Free PDF redactor that runs client-side
Posted by MrGuacamole 1 day ago
I recently needed to verify past employment by uploading paystubs from a previous employer, but I didn't want to share my salary in that role. A quick search online showed that most sites required sign-up or weren't clear about document privacy. I gave in and signed up for a free trial of Adobe Acrobat so I could use its PDF redaction feature. I figured there should be a dead simple way of doing this that's private, so I decided to create it myself.
The tool rasterizes each page to an image with your redactions burned in, then rebuilds the PDF from those images, so the text layer is permanently destroyed rather than just covered up and easily retrievable.
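To illustrate why a burned-in redaction can't be peeled off the way an overlay can, here is a toy sketch. It is not the tool's actual code: the page is modeled as a hypothetical grid of grayscale bytes, and `blank_page` and `burn_in` are names made up for this example. The point is only that the pixels under the rectangle are overwritten, so nothing remains beneath it to recover.

```python
def blank_page(width: int, height: int) -> list[bytearray]:
    """Hypothetical raster page: one bytearray per row, 255 = white."""
    return [bytearray([255] * width) for _ in range(height)]


def burn_in(pixels: list[bytearray], rect: tuple[int, int, int, int]) -> None:
    """Overwrite the pixels inside rect (x0, y0, x1, y1) with black, in place.

    The rectangle is clamped to the page so a row never changes length.
    """
    if not pixels:
        return
    height, width = len(pixels), len(pixels[0])
    x0, y0, x1, y1 = rect
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x1 <= x0 or y1 <= y0:
        return  # rectangle lies entirely off the page
    for row in pixels[y0:y1]:
        # The original pixel values are replaced, not covered by a layer.
        row[x0:x1] = bytes(x1 - x0)  # bytes(n) is n zero bytes, i.e. black


page = blank_page(4, 4)
burn_in(page, (1, 1, 3, 3))
```

After this runs, the four center pixels hold 0 and the original values are gone from the data, which is the property the rasterize-and-rebuild approach relies on.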
I welcome any and all feedback as this is my first live tool, thanks!
Comments
Comment by moritzwarhier 1 day ago
1. Export as PNG (or whatever you prefer)
2. Add a black rectangle (redact) and save again as a raster image, preferably in a lossless format
3. Export as PDF, if you need that. Make sure that you've checked and/or erased all metadata from step 1 that is easily found as text (hidden layers or text in metadata, for example). For common raster formats such as PNG or JPG, this should amount to briefly checking metadata and/or strings output.
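A rough sketch of the metadata check from step 3, for the PNG case only. It walks the PNG chunk list and reports the chunk types that commonly carry text or EXIF data. The function name `metadata_chunks` is made up for this example, chunk CRCs are not verified, and an empty result does not rule out data hidden elsewhere (for instance in the pixels themselves).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Standard PNG chunk types that can hold textual or EXIF metadata.
METADATA_CHUNK_TYPES = {b"tEXt", b"zTXt", b"iTXt", b"eXIf"}


def metadata_chunks(png: bytes) -> list[tuple[str, bytes]]:
    """Return (chunk type, raw chunk data) for each metadata chunk found."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if chunk_type in METADATA_CHUNK_TYPES:
            found.append((chunk_type.decode("ascii"), data))
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length, type, data, and CRC
    return found
```

Usage would be something like `metadata_chunks(open("page.png", "rb").read())`, where `page.png` is a placeholder file name. Anything it returns is worth reading before the image goes into the final PDF.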
Is there anything else that a "PDF redactor" should do?
And are we sure that this one does all the steps?
If you like to be paranoid: a universal removal tool for steganographically stored info is theoretically impossible.
Comment by MrGuacamole 1 day ago
The point about metadata is a good one. I checked a test file and none of the original PDF's metadata is visible; you only see basic info about the new PDF file and that it was produced by pdf-lib.
There definitely could be other things that a redactor should do, but for most use cases I think steganographically stored info falls outside the threat model.
edit: ran strings on the output file, nothing but PDF structure and compressed image data, no original text content - thanks for the suggestion.
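The `strings` check above could be scripted roughly as follows. One caveat: PDF streams are often Flate-compressed, so text inside them will not show up in a scan of the raw bytes. This sketch therefore also tries to inflate each stream before scanning. The function names and the list of sensitive terms are assumptions for the example, and the stream detection is a crude regex rather than a real PDF parser. A clean result is not proof either, since text in a PDF can be encoded with font-specific glyph codes that never appear as readable ASCII.

```python
import re
import zlib

# Crude stream finder: a "stream" keyword (not the tail of "endstream"),
# a line break, then everything up to the next "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Roughly what `strings` does: runs of printable ASCII of min_len or more."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def inflated_streams(pdf_bytes: bytes):
    """Yield the contents of streams that decompress as zlib data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or not a real stream


def find_leaks(pdf_bytes: bytes, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms found in raw or inflated printable text."""
    haystacks = [pdf_bytes, *inflated_streams(pdf_bytes)]
    text = "\n".join(run for h in haystacks for run in printable_runs(h))
    return [term for term in sensitive_terms if term in text]
```

For example, `find_leaks(open("redacted.pdf", "rb").read(), ["85000"])` (placeholder file name and term) should come back empty for a properly rasterized file.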
Comment by ranguita 19 hours ago
Comment by MrGuacamole 15 hours ago
Comment by gisanokharu 1 day ago
Comment by MrGuacamole 1 day ago
It seemed that these were already removed when the PDF was rasterized, but now they're explicitly being removed.
Comment by colesantiago 1 day ago
I have a friend who works in an air gapped environment, and this would work for him.
Can't use this if it isn't open source.
Comment by MrGuacamole 1 day ago
For your friend's air gapped environment, the file works offline after the libraries cache on first load, but it does pull PDF.js and pdf-lib from a CDN, so a one-time internet connection is needed.
To run it fully offline you'd need to download those two libraries separately, transfer them to the air gapped machine, and swap the CDN links in the HTML to point to the local files instead.
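The link swap described above could be scripted along these lines. This is a sketch under assumptions: it assumes the libraries are loaded through plain `<script src="https://...">` tags, the CDN URLs in the example are placeholders rather than the tool's real ones, and `localize_scripts` and the `vendor` directory are names invented here. If the page also points PDF.js at a separate worker script (PDF.js supports this through `GlobalWorkerOptions.workerSrc`), that reference would need the same treatment by hand.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches the src attribute of script tags that load from http(s) URLs.
SCRIPT_SRC_RE = re.compile(
    r'(<script\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE
)


def localize_scripts(html: str, local_dir: str = "vendor") -> tuple[str, list[str]]:
    """Rewrite remote script URLs to local paths.

    Returns the rewritten HTML and the list of URLs that must be downloaded
    separately and placed in local_dir on the offline machine.
    """
    urls: list[str] = []

    def swap(match: re.Match) -> str:
        url = match.group(2)
        urls.append(url)
        filename = PurePosixPath(urlparse(url).path).name
        return f"{match.group(1)}{local_dir}/{filename}{match.group(3)}"

    return SCRIPT_SRC_RE.sub(swap, html), urls


# Placeholder URLs for illustration only.
page = (
    '<script src="https://cdn.example.com/pdfjs/pdf.min.js"></script>\n'
    '<script src="https://cdn.example.com/pdf-lib/pdf-lib.min.js"></script>'
)
new_page, to_download = localize_scripts(page)
```

`to_download` then lists the files to fetch on a connected machine, and `new_page` is the HTML to carry over alongside the `vendor` directory.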