Convert Node

 

Use the optional Convert Node to automatically transform your office documents into another file format, such as multi-page TIF or text-searchable PDF, prior to release. (Note that converting documents to text-searchable PDFs requires a Text Searchable PDF Creator license.)


The Convert Node Settings dialog.


  1. Drag the Convert Node from the Nodes Pane to the Design Canvas.

  2. In the Convert Node Settings dialog, enter a unique name and a description for the Node.

  3. Select the file format to which the document will be converted. Choices include:

    • TIF – Select to convert documents to a multi-page TIF file format used for images. The TIF file format tends to be fast and efficient for general image-processing applications.

    • RTF – Select to convert documents to the RTF file format used in word processing applications.

    • XLS – Select to convert documents to the XLS file format used in spreadsheet applications.

    • DOCX – Select to convert documents to the DOCX file format used for Microsoft Word.

    • XLSX – Select to convert documents to the XLSX file format used for Microsoft Excel.

    • PPTX – Select to convert documents to the PPTX file format used for Microsoft PowerPoint.

    • EPUB – Select to convert documents to the EPUB file format used in e-books.

    • ODT – Select to convert documents to the ODT file format used in word processing applications.

    • Text Searchable PDF – Select to convert documents to a text-searchable PDF file format. This option uses multithreaded OCR to convert image pages to text. Increasing the number of licensed GlobalCapture Cores improves the performance of your OCR process. When a Classify Node immediately follows a Convert Node with Text Searchable PDF enabled, the OCR results are preserved across the two nodes, which significantly improves performance when your process includes multiple OCR-related steps. If you are going to convert all documents to text-searchable PDF and will perform one or more Classify steps to extract text, it is strongly recommended that the Convert Node precede the Classify Node.



  4. Click Save.

PDF/A

Use the Text Searchable PDF Creator module to automatically convert records in the TIF or non-text-based PDF file formats into a text-based PDF/A format as they are imported during a Workflow. The PDF/A file format is recognized by government and business as the standard for long-term document retention.

Documents in the PDF/A file format are designed to be self-contained to ensure that records can be reproduced in exactly the same way for years. All information necessary for displaying the document in the same manner every time is embedded within the file. This information includes, but is not limited to, all content (text, raster images, and vector graphics), fonts, and color information. By using the PDF/A file format, GlobalCapture and Text Searchable PDF Creator become an integral component of your overall compliance strategy for records retention.

Configure PDF/A

GlobalCapture supports the following PDF/A formats:

PDF/A-1a
PDF/A-1b
PDF/A-2a
PDF/A-2u
PDF/A-3a
PDF/A-3u

When converting to any PDF/A format, the workflow designer or administrator must ensure that no processing node that might change the file is placed in the workflow after the Convert Node. Modifications made to a file after conversion cannot be guaranteed to preserve its PDF/A compliance. This includes validation steps in which users might perform any type of document transformation, such as page rotation or deskew. Ensure that the PDF conversion happens after every node where document transformation might occur.
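The ordering rule above can be expressed as a simple check over an ordered list of node types. The following is only an illustrative sketch: the node-type labels, the set of "modifying" nodes, and the function name are all hypothetical, and GlobalCapture is not assumed to expose workflows in this form.

```python
# Hypothetical node-type labels for illustration only; the real node names
# and any workflow export format in GlobalCapture may differ.
MODIFYING_NODE_TYPES = {"Validate", "Rotate", "Deskew"}


def nodes_modifying_after_convert(node_types):
    """Return (index, type) pairs for modifying nodes placed after the last Convert node."""
    if "Convert" not in node_types:
        return []
    # Position of the last Convert node in the ordered workflow.
    last_convert = len(node_types) - 1 - node_types[::-1].index("Convert")
    return [
        (i, t)
        for i, t in enumerate(node_types)
        if i > last_convert and t in MODIFYING_NODE_TYPES
    ]


# A Validate step after Convert is flagged; moving it before Convert is not.
risky = nodes_modifying_after_convert(["Import", "Convert", "Validate", "Release"])
safe = nodes_modifying_after_convert(["Import", "Validate", "Convert", "Release"])
```

An empty result means no node from the assumed "modifying" set follows the final Convert Node. It does not by itself confirm that the output is PDF/A compliant.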

Customers wishing to verify a document's compliance with a particular standard can use this utility.


Documents converted from edoc formats to PDF will not be PDF/A compliant. If the goal is to convert edocs to PDF/A, two PDF conversion nodes are required: the first creates the PDF, and the second converts that PDF to PDF/A.


FullPageOcr.cfg controls the PDF/A output format. This file is located in the GlobalCapture Engine's running directory.

Configuration
[PDFExportParams]
PDFAComplianceMode=PCM_Pdfa_2a
TextExportMode=PEM_ImageOnText
Colority=PCM_KeepColority

In the PDFExportParams section, the PDFAComplianceMode key may be set to any one of the following values:

PCM_Pdfa_1a
PCM_Pdfa_1b
PCM_Pdfa_2a
PCM_Pdfa_2u
PCM_Pdfa_3a
PCM_Pdfa_3u
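
As a minimal sketch of how the PDFAComplianceMode key could be changed safely, the snippet below validates a requested value against the list above before rewriting the configuration text. It assumes FullPageOcr.cfg parses as a standard INI file, which the excerpt suggests but does not guarantee. The function name set_pdfa_mode is hypothetical and not part of GlobalCapture. Note that Python's configparser drops comments when it rewrites a file, so back up the original before applying anything like this.

```python
import configparser
import io

# Values listed in this section for the PDFAComplianceMode key.
ALLOWED_MODES = {
    "PCM_Pdfa_1a", "PCM_Pdfa_1b", "PCM_Pdfa_2a",
    "PCM_Pdfa_2u", "PCM_Pdfa_3a", "PCM_Pdfa_3u",
}


def set_pdfa_mode(cfg_text, mode):
    """Return cfg_text with PDFAComplianceMode set to mode (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported PDFAComplianceMode: {mode}")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the mixed-case key names
    parser.read_string(cfg_text)
    if not parser.has_section("PDFExportParams"):
        parser.add_section("PDFExportParams")
    parser.set("PDFExportParams", "PDFAComplianceMode", mode)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # keep the key=value style
    return out.getvalue()


sample = (
    "[PDFExportParams]\n"
    "PDFAComplianceMode=PCM_Pdfa_2a\n"
    "TextExportMode=PEM_ImageOnText\n"
    "Colority=PCM_KeepColority\n"
)
updated = set_pdfa_mode(sample, "PCM_Pdfa_3u")
```

Rejecting values outside the documented list guards against typos such as a wrong suffix, which could otherwise leave the engine with an unrecognized compliance mode.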