As an administrator working on extraction rules, you should limit document sample pages to the pages you intend to extract data from. When you upload documents to build extraction rules, every page is read and converted to text. On larger documents of tens or hundreds of pages, this process is resource intensive: it can take a long time, and it can degrade the general performance of your capture environment for other users. If you must upload large file sets, do so sparingly, or do so during off hours so you don't impact other processing on your system. In most cases, a single page or a small subset of pages is far more efficient and makes rule building much faster. Most Tiff and PDF document editors (such as Adobe Acrobat) include features to extract a single page or a range of pages from a file. Alternatively, if you have access to the paper copy, scan only the pages of interest when planning your rule builds.
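If you prefer to trim sample PDFs with a script rather than a desktop editor, the following is a minimal sketch. It assumes the third-party `pypdf` library (`pip install pypdf`), which is not part of GlobalCapture; the function names and the "1-3,7" page-range syntax are hypothetical choices for illustration. Page numbers in the range string are 1-based, as they appear in a viewer.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Convert a 1-based spec such as "1-3,7" into sorted, zero-based page indices."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range {part!r} for a {page_count}-page document")
        pages.update(range(start - 1, end))  # convert to zero-based indices
    return sorted(pages)


def extract_sample_pages(src_path: str, dst_path: str, spec: str) -> int:
    """Write only the selected pages of src_path to dst_path; return how many were written."""
    # pypdf is an assumed third-party dependency; it is imported here so that
    # parse_page_ranges can be used on its own without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    indices = parse_page_ranges(spec, len(reader.pages))
    for index in indices:
        writer.add_page(reader.pages[index])
    writer.write(dst_path)
    return len(indices)
```

For example, `extract_sample_pages("invoice_batch.pdf", "invoice_sample.pdf", "1-2")` would write a two-page sample file suitable for uploading while you build rules (the file names are placeholders).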

Targeted OCR

Targeted OCR, or OCR that looks at a specific area or at specific characteristics of a page to capture data, is the most efficient way to extract text from a page. When defining extraction rules, you can isolate each extracted field to a specific page. This lets GlobalCapture execute extraction rules very quickly, because it doesn't need to wait for the OCR engine to read every page of a document. It's helpful to identify which pages each piece of data can appear on so you can build rules accordingly. To improve the system's per-page processing speed, OCR only the pages you intend to extract text from.
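The saving from page-isolated rules can be illustrated with a small model. This is a hypothetical sketch, not GlobalCapture's rule format or API: `FieldRule`, `pages_to_ocr`, and the field names are invented for illustration. It assumes each rule is either tied to one page or unrestricted, and computes which pages actually need to be read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical extraction rule: a field name plus the page it is isolated to."""
    field_name: str
    page: Optional[int]  # 1-based page number; None means the field may be on any page


def pages_to_ocr(rules: list[FieldRule], page_count: int) -> list[int]:
    """Return the 1-based pages that must be read to satisfy every rule."""
    pages: set[int] = set()
    for rule in rules:
        if rule.page is None:
            # One rule without a page restriction forces every page to be read.
            return list(range(1, page_count + 1))
        if 1 <= rule.page <= page_count:
            pages.add(rule.page)
        # A rule pointing past the last page is skipped: that page isn't in this document.
    return sorted(pages)


rules = [
    FieldRule("InvoiceNumber", 1),
    FieldRule("Vendor", 1),
    FieldRule("InvoiceTotal", 3),
]
print(pages_to_ocr(rules, 40))  # only pages 1 and 3 of a 40-page document are read
```

The design point mirrors the paragraph above: a single unrestricted rule turns targeted OCR back into reading every page, so pinning each field to the pages it can actually appear on is what keeps processing fast.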

Full Page OCR

As the name implies, Full Page OCR performs an image-to-text conversion on an entire page. It's important to understand when and how to use this feature to ensure optimal performance. Unless a customer wants to perform text-based searching (Content Search or Find In File), a full-text PDF is generally not necessary. Regardless of the type of file being stored (Tiff, PDF, etc.), GlobalSearch can output a PDF when requested. For many customers, therefore, conversion to PDF is an unnecessary step that slows down per-page processing. There are valid use cases where Full Page OCR is required: for example, customers whose compliance requirements mandate that all documents be stored as PDF/A, and any customer that wants to use the content-based search features native to GlobalSearch mentioned above.
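The decision described above can be summarized as a checklist. The sketch below is hypothetical: `StorageRequirements` and `full_page_ocr_reasons` are illustrative names, not GlobalCapture or GlobalSearch settings. It encodes only the criteria stated in this section, and deliberately treats "users want PDFs" as not, by itself, a reason to enable Full Page OCR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRequirements:
    """Hypothetical summary of a customer's needs; not an actual product setting."""
    uses_content_search: bool = False  # Content Search in GlobalSearch
    uses_find_in_file: bool = False    # Find In File in GlobalSearch
    requires_pdf_a: bool = False       # compliance mandates PDF/A storage
    wants_pdf_output: bool = False     # users want to view or export documents as PDF


def full_page_ocr_reasons(req: StorageRequirements) -> list[str]:
    """List the reasons Full Page OCR is warranted; an empty list means skip it."""
    reasons = []
    if req.uses_content_search or req.uses_find_in_file:
        reasons.append("text-based search needs a full-text PDF")
    if req.requires_pdf_a:
        reasons.append("compliance requires PDF/A storage")
    # wants_pdf_output is intentionally not checked: GlobalSearch can output
    # a PDF on request regardless of the stored file type.
    return reasons


print(full_page_ocr_reasons(StorageRequirements(wants_pdf_output=True)))  # no reasons
```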
