Convert Node
The Convert node is available in GlobalCapture only.
Use the optional Convert Node to automatically transform your office documents into multi-page TIF or text-searchable PDF files prior to release.
Node Properties
Title
Add a title for this node. Titles are useful when reading the history of a workflow, because they make the overall process easier to understand.
Description
Provide a synopsis of what this node is doing, or make note of any important details. This is useful for providing additional information such as workflow details and use case information. A good description is helpful when returning to modify the workflow in the future.
Conversion Type
TIF – Select to convert documents to a multi-page TIF file format used for images. The TIF file format tends to be fast and efficient for general image-processing applications.
Text Searchable PDF – Select to convert documents to a text-searchable PDF file format. This option uses multithreaded OCR to convert image pages to text. To improve OCR performance, increase the number of licensed GlobalCapture Cores. When a Classify Node immediately follows a Convert Node with Text Searchable PDF enabled, the OCR results are preserved across nodes, which significantly improves performance when your process includes multiple OCR-related steps. If you are going to convert all documents to text-searchable PDF and will perform one or more Classify steps to extract text, it is strongly recommended that the Convert Node precede the Classify Node.
When Text Searchable PDF is enabled, the OCR Extraction Profile option is available. Extraction profiles let you customize OCR settings to maximize extraction and accuracy. The available profiles vary depending on those that have been created. If none is selected, the default profile is used.
Converting documents to text-searchable PDF requires a Text Searchable PDF Creator license.
PDF/A compliance
Use the Text Searchable PDF Creator module to automatically convert records in the TIF or non-text-based PDF file formats into a text-based PDF/A format as they are imported during a Workflow. The PDF/A file format is recognized by government and business as the standard for long-term document retention.
Documents in the PDF/A file format are designed to be self-contained to ensure that records can be reproduced in exactly the same way for years. All information necessary for displaying the document in the same manner every time is embedded within the file. This information includes, but is not limited to, all content (text, raster images, and vector graphics), fonts, and color information. By using the PDF/A file format, GlobalCapture and Text Searchable PDF Creator become an integral component of your overall compliance strategy for records retention.
Configure PDF/A
GlobalCapture supports the following PDF/A formats:
PDF/A-1a
PDF/A-1b
PDF/A-2a
PDF/A-2u
PDF/A-3a
PDF/A-3u
When converting to any PDF/A format, the workflow designer or administrator must ensure that no processing nodes that might change the file are introduced into the workflow after the Convert Node. File modifications made after conversion cannot be guaranteed to preserve the integrity of the PDF/A standard. This includes validation steps where users might perform any type of document transformation, such as page rotation or deskew. Ensure that PDF conversion happens after any nodes where document transformation might occur.
Customers wishing to verify a document's compliance with a particular standard can use this utility.
Documents converted from edoc formats to PDF will not be PDF/A compliant. If the goal is to convert edocs to PDF/A, two Convert Nodes are required: the first creates the PDF, and the second converts that PDF to PDF/A.
FullPageOCR.cfg Settings
FullPageOcr.cfg controls the PDF/A output format. This file will exist in the GlobalCapture Engine's running directory.
[PDFExportParams]
PDFAComplianceMode=PCM_Pdfa_2a
TextExportMode=PEM_ImageOnText
Colority=PCM_KeepColority
In the PDFExportParams section, the PDFAComplianceMode key may be set to any one of the following values:
PCM_Pdfa_1a
PCM_Pdfa_1b
PCM_Pdfa_2a
PCM_Pdfa_2u
PCM_Pdfa_3a
PCM_Pdfa_3u
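As an illustration only, the following Python sketch shows one way to change the PDFAComplianceMode value in the text of FullPageOCR.cfg while checking it against the values listed above. It assumes the file is plain INI-style text as in the sample shown earlier; the function name set_compliance_mode and the idea of editing the file with a script are hypothetical and are not part of GlobalCapture. Editing the file by hand in a text editor achieves the same result.

```python
import re

# Values this section lists as valid for the PDFAComplianceMode key.
ALLOWED_MODES = {
    "PCM_Pdfa_1a", "PCM_Pdfa_1b",
    "PCM_Pdfa_2a", "PCM_Pdfa_2u",
    "PCM_Pdfa_3a", "PCM_Pdfa_3u",
}


def set_compliance_mode(cfg_text: str, mode: str) -> str:
    """Return cfg_text with PDFAComplianceMode set to mode.

    Hypothetical helper: only the PDFAComplianceMode line inside the
    [PDFExportParams] section is rewritten; every other line is kept.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported PDFAComplianceMode: {mode}")

    lines = cfg_text.splitlines()
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are inside the section of interest.
            in_section = stripped == "[PDFExportParams]"
        elif in_section and re.match(r"PDFAComplianceMode\s*=", stripped):
            lines[i] = f"PDFAComplianceMode={mode}"
            return "\n".join(lines) + "\n"

    raise KeyError("PDFAComplianceMode not found in [PDFExportParams]")


if __name__ == "__main__":
    sample = (
        "[PDFExportParams]\n"
        "PDFAComplianceMode=PCM_Pdfa_2a\n"
        "TextExportMode=PEM_ImageOnText\n"
        "Colority=PCM_KeepColority\n"
    )
    print(set_compliance_mode(sample, "PCM_Pdfa_3u"))
```

Rejecting values outside the listed set guards against typos in the key value. Back up the original file before changing it.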