Comment: Added PDF/A Definitions.

...

  • TextOCR.cfg: YOURGETSMARTDRIVE:\GetSmart\CaptureServices\GlobalCapture_1
  • FullPageOCR.cfg: YOURGETSMARTDRIVE:\GetSmart\CaptureServices\GlobalCapture_1

Info: Please Be Advised

If the TextOCR.cfg files differ between the Template Designer and the capture engine, the template will read differently in the Designer from what the engine returns.
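
One way to confirm that the two copies match is to compare them with a short script. The following is a minimal Python sketch, not a Square 9 tool: the function names and the example paths are hypothetical, and it assumes both files are readable from the machine running the script. Blank lines and leading or trailing whitespace are ignored, so only differences in the settings lines themselves are reported.

```python
from pathlib import Path


def normalized_lines(text: str) -> list[str]:
    """Return the non-blank lines of a config with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def configs_match(designer_text: str, engine_text: str) -> bool:
    """True when both configs contain the same lines in the same order."""
    return normalized_lines(designer_text) == normalized_lines(engine_text)


if __name__ == "__main__":
    # Hypothetical locations; substitute the real paths for your installation.
    designer_cfg = Path(r"C:\path\to\TemplateDesigner\TextOCR.cfg")
    engine_cfg = Path(r"C:\path\to\GlobalCapture_1\TextOCR.cfg")
    if designer_cfg.exists() and engine_cfg.exists():
        same = configs_match(designer_cfg.read_text(), engine_cfg.read_text())
        print("TextOCR.cfg files match" if same else "TextOCR.cfg files differ")
    else:
        print("One or both files were not found; check the paths.")
```

Note that this comparison treats `CorrectSkew = true` and `CorrectSkew=true` as different lines, because spacing inside a line is left untouched.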

...

  • FullPageOCR.cfg – TextPDF/Full Page OCR Configuration settings.
    • These settings are used when converting a document to a text searchable PDF or other electronic formats.
  • TextOCR.cfg
    • These settings are used when extracting data using Zonal OCR.
  • FullPageBaseSettings.cfg
    • These settings contain a profile of commonly used settings present in version 4.1, which are customized further by FullPageOCR.cfg.

Info: Please Be Advised

Changes to the default settings of these configuration files are not supported and are made at your own risk. Generally speaking, changes to these files are made by, or on the advice of, a Square 9 technician.

...

If your TextOCR or FullPageOCR configuration files are lost or corrupted, you can construct new ones using the settings directly below. Additionally, Square 9 offers an Aggressive OCR set of parameters in the second code box below, which increases OCR accuracy at the cost of longer processing time. The third code box contains a "High Performance" set of parameters that is a good compromise between speed and accuracy, at the expense of conversion in color. This set of parameters also excels at reading sub-optimal text, so if your text comes from a low-quality source (such as an old dot matrix printer), these settings may be a good option. The second and third configurations should not be used on burdened or slower servers. Any modification made to one configuration file should be made to all configuration files to maintain parity and consistency.

Original OCR Settings

[PDFExportParams]
PDFAComplianceMode=PCM_Pdfa_1b
TextExportMode=PEM_ImageOnText
Colority=PCM_KeepColority

[PagePreprocessingParams]
CorrectOrientation = true

[PrepareImageMode]
CorrectSkew = false

[PageAnalysisParams]
ProhibitModelAnalysis=true

[ObjectsExtractionParams]
FastObjectsExtraction=true

[RecognizerParams]
FastMode=true

[DocumentStructureDetectionParams]
ClassifySeparators=false
DetectFootnotes=false
DetectTableOfContents=false

Aggressive OCR Settings (Square 9 Tested)

[PDFExportParams]
PDFAComplianceMode=PCM_Pdfa_1b
TextExportMode=PEM_ImageOnText
Colority=PCM_KeepColority

[PagePreprocessingParams]
CorrectOrientation = true

[PrepareImageMode]
CorrectSkew = true

[PageAnalysisParams]
ProhibitModelAnalysis=false
EnableTextExtractionMode=true

[ObjectsExtractionParams]
FastObjectsExtraction=false
EnableAggressiveTextExtraction=true

[RecognizerParams] 
FastMode=false

[DocumentStructureDetectionParams]
ClassifySeparators=false
DetectFootnotes=false
DetectTableOfContents=false

High Performance OCR Settings (Square 9 Tested)

[PDFExportParams]
PDFAComplianceMode=PCM_Pdfa_1b
TextExportMode=PEM_ImageOnText
Colority=PCM_KeepColority

[PagePreprocessingParams] 
CorrectOrientation = true 

[PrepareImageMode]
CorrectSkew = true
DiscardColorImage = true
PhotoProcessingMode = PPM_TreatAsPhoto
ImageCompression = IC_NoCompression

[PageAnalysisParams]
ProhibitModelAnalysis=false
EnableTextExtractionMode=false
DetectTables=false

[ObjectsExtractionParams]
FastObjectsExtraction=false
DetectTextOnPictures=true
EnableAggressiveTextExtraction=true
DetectPorousText = true
RemoveGarbage = true
RemoveTexture = true

[RecognizerParams]
FastMode=false
LowResolutionMode = true
TextTypes = 487

[DocumentStructureDetectionParams]
ClassifySeparators=false
DetectFootnotes=false
DetectTableOfContents=false
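
To see exactly which settings change between two of the parameter sets above, the files can be compared key by key. The sketch below is illustrative only and uses Python's standard `configparser` module, on the assumption that the .cfg files follow the plain INI layout shown above; the function names are hypothetical and the two sample strings are short excerpts from the Original and Aggressive settings, not complete files.

```python
import configparser


def parse_profile(text: str) -> dict:
    """Map (section, key) to value for an INI-style profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (the default lowercases keys)
    parser.read_string(text)
    return {
        (section, key): value
        for section in parser.sections()
        for key, value in parser.items(section)
    }


def diff_profiles(old_text: str, new_text: str) -> dict:
    """Return {(section, key): (old_value, new_value)} for every setting that differs."""
    old, new = parse_profile(old_text), parse_profile(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Excerpts only, copied from the Original and Aggressive settings.
ORIGINAL = """
[PrepareImageMode]
CorrectSkew = false

[RecognizerParams]
FastMode=true
"""

AGGRESSIVE = """
[PrepareImageMode]
CorrectSkew = true

[RecognizerParams]
FastMode=false
"""

if __name__ == "__main__":
    for (section, key), (old, new) in diff_profiles(ORIGINAL, AGGRESSIVE).items():
        print(f"[{section}] {key}: {old} -> {new}")
```

A setting that exists in only one of the two files is reported with `None` on the side where it is missing, which makes added keys such as EnableAggressiveTextExtraction easy to spot.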


Configuring Full Page OCR PDF Export Format

In some cases you may need to change the PDF/A standard used when documents are converted to text-searchable PDFs. To do so, edit FullPageOCR.cfg and set "PDFAComplianceMode" under the "PDFExportParams" section to one of the available values listed below.

  • PCM_Pdfa_1a
  • PCM_Pdfa_1b
  • PCM_Pdfa_2a
  • PCM_Pdfa_2u

Note: By default, Square 9's OCR engine uses PDF/A-1b (PCM_Pdfa_1b).
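
The change described above can also be scripted. The following Python sketch is a hedged illustration rather than a supported Square 9 utility: the function name and the `KNOWN_MODES` tuple are hypothetical, and the tuple holds only a subset of the values listed above, used to catch typos. It works on the file contents as text, touches only the PDFAComplianceMode line inside the [PDFExportParams] section, and leaves every other line as it was.

```python
import re

# A subset of the values listed in this article, used here only to catch typos.
# Add further values from the list as needed.
KNOWN_MODES = ("PCM_Pdfa_1a", "PCM_Pdfa_1b", "PCM_Pdfa_2a")


def set_pdfa_mode(cfg_text: str, mode: str) -> str:
    """Return cfg_text with PDFAComplianceMode under [PDFExportParams] set to mode."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"Unrecognized compliance mode: {mode}")
    out, in_section, replaced = [], False, False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether the current section is the PDF export section.
            in_section = stripped == "[PDFExportParams]"
        elif in_section and re.match(r"PDFAComplianceMode\s*=", stripped):
            line = f"PDFAComplianceMode={mode}"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("PDFAComplianceMode not found under [PDFExportParams]")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "[PDFExportParams]\n"
        "PDFAComplianceMode=PCM_Pdfa_1b\n"
        "TextExportMode=PEM_ImageOnText\n"
    )
    print(set_pdfa_mode(sample, "PCM_Pdfa_2a"))
```

When applying this to a real FullPageOCR.cfg, keep a backup copy of the original file first, since changes to these files are made at your own risk.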