Show HN: Free PDF redactor that runs client-side

Posted by MrGuacamole 1 day ago

I recently needed to verify past employment and planned to upload paystubs from a previous employer, but I didn't want to share my salary in that role. I did a quick search online, and most sites required sign-up or weren't clear about document privacy. I gave in and signed up for a free trial of Adobe Acrobat so I could use its PDF redaction feature. I figured there should be a dead simple way of doing this that's private, so I decided to create it myself.

It rasterizes each page to an image with your redactions burned in, then rebuilds the PDF from those images, so the text layer is permanently destroyed rather than just covered up and easily retrievable.
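
To illustrate what "burned in" means, here is a minimal TypeScript sketch that paints redaction boxes directly into a page's pixel data. It is not taken from the tool's source, which the post does not show. The `RedactionBox` shape and the `burnRedactions` name are hypothetical, and the sketch assumes the page has already been rendered to an RGBA buffer (in a browser this would typically come from a canvas).

```typescript
// A redaction rectangle in pixel coordinates (hypothetical shape for this sketch).
interface RedactionBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Overwrite every pixel inside each box with opaque black, in place.
// `pixels` is RGBA data, 4 bytes per pixel, row-major.
function burnRedactions(
  pixels: Uint8ClampedArray,
  imageWidth: number,
  imageHeight: number,
  boxes: RedactionBox[],
): void {
  for (const box of boxes) {
    // Clamp the box to the image bounds so partially off-page boxes are handled.
    const x0 = Math.max(0, Math.floor(box.x));
    const y0 = Math.max(0, Math.floor(box.y));
    const x1 = Math.min(imageWidth, Math.ceil(box.x + box.width));
    const y1 = Math.min(imageHeight, Math.ceil(box.y + box.height));
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = (y * imageWidth + x) * 4;
        pixels[i] = 0; // red
        pixels[i + 1] = 0; // green
        pixels[i + 2] = 0; // blue
        pixels[i + 3] = 255; // alpha, fully opaque
      }
    }
  }
}
```

Because the original pixel values are overwritten rather than covered by a separate layer, nothing remains underneath the box to recover from the image itself.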

I welcome any and all feedback as this is my first live tool, thanks!

Comments

Comment by moritzwarhier 1 day ago

Why jump through these hoops instead of:

1. Export as PNG (or whatever you prefer)

2. Add black rectangle/redact, and save again as raster image, preferably in a lossless way

3. Export as PDF, if you need that. Make sure that you've checked and/or erased all metadata from step 1 that is easily found as text (hidden layers or text in metadata, for example). For common raster formats such as PNG or JPG, this should amount to briefly checking metadata and/or strings output.
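
For the PNG case, the metadata check in step 3 could be sketched as below, in TypeScript. This is only an illustration: `findPngMetadataChunks` is a hypothetical helper, the list of chunk types is a reasonable but not exhaustive assumption, and CRCs are not verified.

```typescript
// Chunk types that can carry text, EXIF, or timestamp metadata in a PNG file.
const METADATA_CHUNKS = new Set(["tEXt", "zTXt", "iTXt", "eXIf", "tIME"]);

// Walk the PNG chunk list and return the types of metadata-bearing chunks found.
function findPngMetadataChunks(png: Uint8Array): string[] {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (png.length < 8 || signature.some((b, i) => png[i] !== b)) {
    throw new Error("not a PNG file");
  }
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const found: string[] = [];
  let offset = 8;
  // Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
  while (offset + 8 <= png.length) {
    const length = view.getUint32(offset);
    const type = String.fromCharCode(
      png[offset + 4],
      png[offset + 5],
      png[offset + 6],
      png[offset + 7],
    );
    if (METADATA_CHUNKS.has(type)) found.push(type);
    if (type === "IEND") break;
    offset += 12 + length;
  }
  return found;
}
```

An empty result means none of the listed chunk types are present. It says nothing about information hidden in the pixel data itself.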

Is there anything else that a "PDF redactor" should do?

And are we sure that this one does all the steps?

If you like to be paranoid: a universal removal tool for steganographically stored info is theoretically impossible.

Comment by MrGuacamole 1 day ago

Appreciate the feedback. The steps you listed are essentially what the site is doing. Upload a PDF, add the black boxes, it gets converted to PNG and back to a new PDF. The value of this tool is just to streamline that process to make it quicker and easier.

The point about metadata is a good one. I checked a test file that I used and you can't see metadata from the original PDF; you only see basic info about the new PDF file and that it was produced by pdf-lib.

There definitely could be other things that a redactor should do, but for most use cases I think steganographically stored info lives outside of the threat model.

edit: ran strings on the output file, nothing but PDF structure and compressed image data, no original text content - thanks for the suggestion.
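
A rough equivalent of that `strings` check can be written as a small TypeScript sketch. The function names and the four-character minimum are assumptions for illustration; this is not how the tool itself verifies its output.

```typescript
// Return runs of printable ASCII at least `minLength` bytes long,
// similar in spirit to the Unix `strings` utility.
function extractStrings(bytes: Uint8Array, minLength = 4): string[] {
  const runs: string[] = [];
  let current = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b >= 0x20 && b <= 0x7e) {
      // Printable ASCII: extend the current run.
      current += String.fromCharCode(b);
    } else {
      // Any other byte ends the run; keep it only if long enough.
      if (current.length >= minLength) runs.push(current);
      current = "";
    }
  }
  if (current.length >= minLength) runs.push(current);
  return runs;
}

// True if any extracted run contains one of the given terms (case-insensitive).
function leaksAny(bytes: Uint8Array, terms: string[]): boolean {
  const runs = extractStrings(bytes).map((s) => s.toLowerCase());
  return terms.some((t) => runs.some((r) => r.includes(t.toLowerCase())));
}
```

One caveat: PDF streams are often compressed, so text inside a compressed stream would not show up in a scan like this. A clean result is reassuring for a rasterized file but is not proof on its own.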

Comment by ranguita 19 hours ago

Congratulations, your tool loads PDFs very fast.

Comment by MrGuacamole 15 hours ago

Thanks, glad it worked well for you!

Comment by gisanokharu 1 day ago

curious whether metadata survives the PNG roundtrip. things like original creation dates, software used, or embedded thumbnails can still leak info even in rasterized PDFs. might be worth adding a strip step if you aren't already doing it

Comment by MrGuacamole 1 day ago

Good point, just pushed a fix. Title, author, subject, keywords, producer, creator, creation date, and modification date are now explicitly stripped from the output file metadata.

It seemed that these were already removed when the PDF was rasterized, but now they're explicitly being removed.
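
A hedged sketch of what an explicit strip step might look like in TypeScript follows. The actual fix is not shown in this thread. pdf-lib's `PDFDocument` exposes setters with the names used below, but the `MetadataWritable` interface is declared locally so the example has no dependency, and the choice of empty strings and a fixed epoch date as neutral values is an assumption.

```typescript
// The subset of setters used here, declared locally for this sketch.
interface MetadataWritable {
  setTitle(title: string): void;
  setAuthor(author: string): void;
  setSubject(subject: string): void;
  setKeywords(keywords: string[]): void;
  setProducer(producer: string): void;
  setCreator(creator: string): void;
  setCreationDate(date: Date): void;
  setModificationDate(date: Date): void;
}

// Overwrite all eight document-information fields with neutral values.
// Dates cannot be blanked through these setters, so a fixed date is written instead.
function stripMetadata(
  doc: MetadataWritable,
  neutralDate: Date = new Date(0),
): void {
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]);
  doc.setProducer("");
  doc.setCreator("");
  doc.setCreationDate(neutralDate);
  doc.setModificationDate(neutralDate);
}
```

Note that this overwrites the fields rather than deleting the entries, so a viewer may still show them as present but empty.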

Comment by colesantiago 1 day ago

Is this open source?

I have a friend who works in an air-gapped environment, and this would work for him.

Can't use this if it isn't open source.

Comment by MrGuacamole 1 day ago

Just open sourced it: github.com/mr-guac/redactpdf

For your friend's air-gapped environment, the file works offline after the libraries cache on first load, but it does pull PDF.js and pdf-lib from a CDN, so a one-time internet connection is needed.

To run it fully offline you'd need to download those two libraries separately, transfer them to the air gapped machine, and swap the CDN links in the HTML to point to the local files instead.
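
The link swap could be scripted rather than done by hand. Below is a minimal TypeScript sketch. The `localizeScripts` name, the CDN URL, and the `./vendor/` path are placeholders rather than the repository's real values, and a regex is only adequate for simple, well-formed script tags.

```typescript
// Rewrite <script src="..."> URLs that appear in `mapping` to their local paths.
// Script tags whose src is not in the mapping are left untouched.
function localizeScripts(html: string, mapping: Map<string, string>): string {
  return html.replace(
    /(<script\b[^>]*\bsrc=)(["'])(.*?)\2/gi,
    (match: string, prefix: string, quote: string, url: string) => {
      const local = mapping.get(url);
      return local === undefined
        ? match
        : `${prefix}${quote}${local}${quote}`;
    },
  );
}

// Example mapping with placeholder values (not the tool's actual URLs).
const exampleMapping = new Map<string, string>([
  ["https://cdn.example/pdf-lib.min.js", "./vendor/pdf-lib.min.js"],
]);
```

PDF.js typically also loads a separate worker script, so if the page points that at a CDN too, that URL may need the same treatment.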