war story: censoring PDF files

Sometimes you want to remove confidential parts of a PDF file. Governments do this all the time. I wanted to do this too.
It turns out to be hard. Governments get this wrong too. It is easy to scribble over something, but skilled readers can remove these scribbles.
What I ended up doing was using GIMP, the best open source tool. I sure didn't want to put my confidential stuff through an internet tool.
When loading the document into GIMP, it naturally puts each page in a different layer. That's not a great model but it kind of works. If your document is too big, break it up.
In the Layers menu, you will see each layer listed, back to front.
For each layer:
  - look at the Layers menu
  - make sure that it is the only visible layer: the only eyeball icon in the first column should be the one for the layer of interest
  - click on the layer's image icon
  - move to the image window
  - select the rectangle selection tool: Tools: Selection Tools: Rectangle Select (or another suitable tool)
  - for each area to censor:
      - select the area
      - censor it by your choice of Edit: Fill With FG color or Fill With BG color (you will type these a lot so learn the shortcut)
Select all the layers (in the Layers menu: all eyeballs on).
File: Export as: select PDF file type/suffix. Pick a new name to not overwrite your original.
The result may be a very large file. I reduced the size (and quality?):
    $ pdf2ps doc.pdf skinny.ps
    $ ps2pdf skinny.ps skinny.pdf
This probably destroys resolution. You can use Ghostscript directly and have more control.
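As a rough illustration of "use Ghostscript directly", here is a minimal Python sketch that builds a single `gs` invocation instead of the pdf2ps/ps2pdf round trip. The function names and the default preset are assumptions made for this example, not something from the original message. `-dPDFSETTINGS` is a standard option of Ghostscript's pdfwrite device; the presets trade size against image resolution, so the right choice depends on the document.

```python
import shutil
import subprocess

# Presets accepted by Ghostscript's pdfwrite device, roughly smallest to largest.
PRESETS = ("/screen", "/ebook", "/printer", "/prepress")


def gs_shrink_command(src, dst, preset="/ebook"):
    """Build a Ghostscript command that rewrites src as a smaller PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",       # write PDF directly, no PostScript step
        "-dNOPAUSE", "-dBATCH",    # run non-interactively and exit when done
        "-dQUIET",
        f"-dPDFSETTINGS={preset}", # controls image downsampling and compression
        f"-sOutputFile={dst}",
        src,
    ]


def shrink(src, dst, preset="/ebook"):
    """Run the command; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(gs_shrink_command(src, dst, preset), check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(gs_shrink_command("doc.pdf", "skinny.pdf")))
```

Comparing the output of two presets on the same censored file is probably the quickest way to see how much resolution each one costs.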

D. Hugh Redelmeier via talk wrote on 2025-03-23 13:00:
What I ended up doing was using GIMP.
File: Export as: select PDF file type/suffix. Pick a new name to not overwrite your original.
I'm quite shocked that The GIMP reads / writes PDF files.
Seems to me I had to use some LibreOffice tool last time I wanted to edit a PDF, although that was long ago and things change, including my memory fuzzing it.
Was this with the GIMP v2 or the new v3?
PS Anyone tried the new, v3 yet?

From: Ron via talk <talk@gtalug.org>
I'm quite shocked that The GIMP reads / writes PDF files.
Yes, but they end up as image files, which is the clearest way to be sure that there is no residue of what you deleted.
Seems to me I had to use some LibreOffice tool last time I wanted to edit a PDF, although that was long ago and things change, including my memory fuzzing it.
I tried LibreOffice. It popped me into Draw, if I remember correctly. And then I could not get it to do what I needed.
Was this with the GIMP v2 or the new v3?
I just asked the GNOME store for GIMP and got a Flatpak of V3.0.0 RC3. I was able to crash it a couple of times, probably because I didn't know how to use it and went down a previously untrodden path through the code. It does mean that googling for help produced results that didn't work.
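To make the "residue" point concrete, here is a small Python sketch of why painting over text is not enough. It is not a complete PDF, only a hand-written fragment of a page content stream, and the sample text and coordinates are invented for the example. The stream draws some text and then fills a black rectangle over it; the text operand is still in the data and comes back out after decompression.

```python
import zlib

# A hand-written page content stream: show text, then paint a black box on top.
content = (
    b"BT /F1 12 Tf 72 700 Td (SECRET 123-456) Tj ET\n"  # the confidential text
    b"0 0 0 rg 70 695 120 16 re f\n"                    # black rectangle over it
)

# PDF writers commonly store content streams Flate-compressed.
stored = zlib.compress(content)


def recover_text_operands(stream_bytes):
    """Decompress a Flate stream and return the literal strings shown by Tj."""
    raw = zlib.decompress(stream_bytes)
    found = []
    for line in raw.splitlines():
        if b"Tj" in line and b"(" in line:
            start = line.index(b"(") + 1
            end = line.rindex(b")")
            found.append(line[start:end].decode("latin-1"))
    return found


print(recover_text_operands(stored))  # prints ['SECRET 123-456']
```

A page that has been turned into an image has no such text operators left, which is why the GIMP export leaves nothing to recover.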

On Sun, Mar 23, 2025, 16:05 Ron via talk <talk@gtalug.org> wrote:
I'm quite shocked that The GIMP reads / writes PDF files.
Well, it has become an official standard (ISO 32000), meaning its specs must be open. There might be Acrobat enhancements but the format itself is supposed to be pretty stable.
PS Anyone tried the new, v3 yet?
Downloaded, not yet installed.
- Evan

From: Val Kulkov via talk <talk@gtalug.org>
Thanks for the tip and thanks for reminding the world that adding a black rectangle to "hide" confidential information is just not enough in a PDF file. But unfortunately, the proposed method requires a lot of labour.
There is commercial software called PDF Studio <https://www.qoppa.com/files/pdfstudio/guide/#t=redaction.htm> that claims to completely remove all traces of the redacted content from the document. Redacting content in PDF Studio is as easy as adding a redacting rectangle. A desktop licence of PDF Studio used to cost US $79, but the [small] company that made PDF Studio was acquired by Apryse a couple of years ago, and now the same licence costs US $149. However, I have seen a deal for US $79 (if I recall correctly) during last year's Black Friday and Cyber Monday event.
PDF Studio is written in Java, so it works natively on Linux.

From: Mojtaba Moodi via talk <talk@gtalug.org>
Hello,
I have written a little Python script that uses grep to find the string I want in the PDF and redact it.
The other way is to use Xournal++. Export your changes to a PDF file, but to make it irreversible, you need to convert the PDF to images and then convert it back to a PDF file, all of which can be done using a simple script.
Thanks,
Mojtaba
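The script mentioned above was not posted, so the following is only a guess at what the "PDF to images and back" step could look like, not the actual script. It assumes two external tools that the message does not name: `pdftoppm` (from poppler-utils) to render pages and `img2pdf` to reassemble them. The function names and the default resolution are likewise made up for the example.

```python
import re
import subprocess
from pathlib import Path


def rasterize_command(src_pdf, prefix, dpi=150):
    """pdftoppm writes one PNG per page, named <prefix>-<page number>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(src_pdf), str(prefix)]


def page_number(path):
    """Extract the page number from a name like page-07.png (0 if absent)."""
    match = re.search(r"-(\d+)\.png$", Path(path).name)
    return int(match.group(1)) if match else 0


def rebuild_command(page_images, dst_pdf):
    """Order pages numerically, so page-10 follows page-9 rather than page-1."""
    ordered = sorted((str(p) for p in page_images), key=page_number)
    return ["img2pdf", *ordered, "-o", str(dst_pdf)]


def flatten(src_pdf, dst_pdf, workdir, dpi=150):
    """Render every page to an image, then build a new image-only PDF."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(src_pdf, workdir / "page", dpi), check=True)
    pages = list(workdir.glob("page-*.png"))
    subprocess.run(rebuild_command(pages, dst_pdf), check=True)
```

The redaction marks have to be in the PDF before `flatten` is run (for example, exported from Xournal++); the round trip only makes whatever is already painted on the page permanent. A higher `dpi` keeps small text readable at the cost of a larger file.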
On Mar 23, 2025, at 16:19, Val Kulkov via talk <talk@gtalug.org> wrote:
Thanks for the tip and thanks for reminding the world that adding a black rectangle to "hide" confidential information is just not enough in a PDF file. But unfortunately, the proposed method requires a lot of labour.

From: Mojtaba Moodi via talk <talk@gtalug.org>
I have written a little python script that uses grep to find the string I want in the pdf and redact it.
That's great. I was tempted to do that but I had no idea how long that would take.
The other way is to use xournal++. Export your changes to a pdf file, but to make it irreversible, you need to convert the pdf to images and then convert it back to a pdf file, which all can be done using a simple script.
That's similar to what I did with GIMP. I have no idea which would be easier.

From: Val Kulkov via talk <talk@gtalug.org>
But unfortunately, the proposed method requires a lot of labour.
Yes, it does. And it is error-prone, repetitive work.
There is commercial software called PDF Studio
I'm not going to buy commercial software unless I have to. Waiting 6 months to save US$70 isn't going to work. For US$149 I would rather buy yet another computer :-)
PDF Studio is written in Java, so it works natively on Linux.
That's nice.

On Sun, Mar 23, 2025 at 04:00:15PM -0400, D. Hugh Redelmeier via talk wrote:
What I ended up doing was using GIMP.
File: Export as: select PDF file type/suffix. Pick a new name to not overwrite your original.
Doesn't doing that turn the PDF into images? As far as I am concerned, a PDF document with text in it that isn't searchable or doesn't allow copying text is ruined. I suspect people with screen readers would be more upset than me.
-- Len Sorensen
participants (6)
- D. Hugh Redelmeier
- Evan Leibovitch
- Lennart Sorensen
- Mojtaba Moodi
- Ron
- Val Kulkov