PDF Redaction

Permanently obscure any text matching a specific pattern in a document.

Any valid regular expression can be provided to the node. All matches, across all pages, are permanently removed from the document, including any text layer data in the PDF. Use this node to obscure data that follows a consistent format in your documents. Documents must be text-searchable PDFs to be processed by this node.

Depending on the application that generated the PDF, the text layer may not line up properly with the page image. In such cases, use a PDF Convert node. Square 9's PDF generator reliably aligns the text layer with the image in image-based PDF files.

Example Patterns

Data type                  Format            Regular expression
Social Security Number     ###-##-####       \d{3}-\d{2}-\d{4}
US Phone Number            (###) ###-####    \(\d{3}\)\s?\d{3}-\d{4}
US Federal Tax ID          ##-#######        \d{2}-\d{7}
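
Before configuring the node, it can help to try a pattern against sample text. The following is a minimal sketch using Python's standard re module. The sample string and the find_matches helper are made up for illustration, and the node's own regular expression engine may differ from Python's in edge cases.

```python
import re

# Example patterns copied from the list above.
PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "us_phone": r"\(\d{3}\)\s?\d{3}-\d{4}",
    "federal_tax_id": r"\d{2}-\d{7}",
}

# Made-up sample text, for illustration only.
SAMPLE = "SSN 123-45-6789, phone (555) 123-4567, tax ID 12-3456789."


def find_matches(pattern: str, text: str) -> list[str]:
    """Return every substring of text that matches pattern."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for name, pattern in PATTERNS.items():
    print(name, find_matches(pattern, SAMPLE))
```

These patterns are unanchored, so they also match inside longer digit runs. For example, the Social Security Number pattern matches part of 9123-45-67890. If that is a concern, wrapping a pattern in word boundaries, as in \b\d{3}-\d{2}-\d{4}\b, is one way to tighten it.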

Configuration

Provide a pattern in the Regular Expression input. Every page of the document is processed against that pattern.

Note that vector-based (all-text) PDF files and image-over-text PDF files present redactions differently. With a vector-based PDF, the data is removed and the area appears blank. With an image-over-text PDF, the data is removed from the text layer, and a black redaction is added to the image layer to obscure the text in the image itself.

Workflow

The node has two output paths. A Match status means the document was processed and at least one occurrence of the pattern was found and redacted. A No Match status means the pattern was not found on any page.
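
The choice between the two paths can be pictured with a short sketch. This is not the node's implementation. It assumes the text of each page has already been extracted into a list of strings, and the function name redaction_status is hypothetical.

```python
import re


def redaction_status(pages: list[str], pattern: str) -> str:
    """Return "Match" if the pattern occurs on any page, else "No Match"."""
    regex = re.compile(pattern)
    # A single occurrence on any page is enough for the Match path.
    if any(regex.search(page) for page in pages):
        return "Match"
    return "No Match"


# Made-up page text, for illustration only.
pages = ["Cover page with no sensitive data.", "Applicant SSN: 123-45-6789"]
print(redaction_status(pages, r"\d{3}-\d{2}-\d{4}"))
```

A document follows the No Match path only when every page fails to match, so a pattern that is too strict sends documents down that path with nothing redacted.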

 

Date          Version    Description
01/06/2022    1.0        Initial release.