PDF Redaction
Permanently obscure any text matching a specific pattern in a document.
Any valid regular expression can be provided to the node. Every match, on every page, is permanently removed from the document, including the corresponding text layer data in the PDF. Use this node to obscure data that follows a common format across your documents. Documents must be text-searchable PDFs to be processed by this node.
Depending on the application that generated the PDF, the text layer may not line up properly with the page image. In such cases, run the document through a PDF convert node first. Square 9’s PDF generator reliably aligns the text layer with the image for image-based PDF files.
Example Patterns
Data Format | Regular Expression |
---|---|
Social Security Number ###-##-#### | \d{3}-\d{2}-\d{4} |
US Phone Number (###) ###-#### | \(\d{3}\)\s?\d{3}-\d{4} |
US Federal Tax ID ##-####### | \d{2}-\d{7} |
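Before configuring the node, it can help to try a pattern against sample text. The following is a minimal sketch in Python, not part of the product: the names `PATTERNS`, `SAMPLE`, and `find_matches` are hypothetical, the sample values are made up, and Python's `re` module may differ in small ways from the regular expression engine the node uses.

```python
import re

# Patterns copied from the table above. The dictionary keys are
# hypothetical labels used only for this illustration.
PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "us_phone": r"\(\d{3}\)\s?\d{3}-\d{4}",
    "federal_tax_id": r"\d{2}-\d{7}",
}

# Made-up sample text; none of these values are real.
SAMPLE = "SSN 123-45-6789, phone (555) 123-4567, EIN 12-3456789."


def find_matches(text):
    """Return every match of each example pattern found in the text."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    for name, matches in find_matches(SAMPLE).items():
        print(name, matches)
```

Each pattern here picks out only its own value from the sample line, which is a quick way to confirm that a pattern is neither too broad nor too narrow before it is used for permanent redaction.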
Configuration
Provide a pattern in the Regular Expression input. All pages of the document are processed against that pattern.
Note that vector-based (all text) PDF files and image-over-text PDF files present redactions differently. In a vector/text-based PDF, the matched data is removed and the area appears blank. In an image-over-text PDF, the matched data is removed from the text layer and a black redaction is added to the image layer to obscure the text in the image itself.
Workflow
The node has two output paths. A Match status means the document was processed and at least one occurrence of the pattern was found and redacted. A No Match status means the pattern was not found on any page.
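The routing rule can be illustrated with a short Python sketch. This is an assumption-laden illustration of the logic described here, not the node's implementation: `redaction_status` is a hypothetical helper, and the list of page strings stands in for the text layer of each page.

```python
import re


def redaction_status(pages, pattern):
    """Return "Match" if the pattern occurs on any page, else "No Match".

    `pages` is a list of strings, one per page (a stand-in for the
    PDF text layer in this illustration).
    """
    regex = re.compile(pattern)
    return "Match" if any(regex.search(page) for page in pages) else "No Match"


if __name__ == "__main__":
    pages = ["Invoice for services rendered", "Employee SSN: 123-45-6789"]
    print(redaction_status(pages, r"\d{3}-\d{2}-\d{4}"))
```

A single occurrence on any page is enough to send the document down the Match path; only a document with no occurrences at all takes the No Match path.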
Date | Version | Description |
---|---|---|
01/06/2022 | 1.0 | Initial release. |