Template Validation

Template Validation allows administrators working with complex templates, or templates that work across a large set of document formats, to understand the impact of template changes being made to the system. Template Validation is most useful in scenarios where a template is working on "common" fields, like Invoice Number and Amount. It is a common pattern for a single template to extract an Invoice Number value from a number of different vendor invoices. Since each invoice can have a unique layout, there may be a number of rules that define how "Invoice Number" is defined, located, and extracted. Changes to those rules to account for a new vendor's format have the potential to negatively impact extraction for the existing formats. Historically, an administrator has had to either reprocess vendor samples to ensure the system continues to operate normally, or wait until an end user reports a problem and then react. Template Validation allows the administrator to be proactive by making it much more efficient to test template changes.

Template Validation works by creating a database of "known good" process results. This database allows GlobalCapture to quickly reference and report on changes to the template as they happen, without needing to load the documents as samples. Administrators of large and complex template sets can save hours or even days of time by leveraging this feature, available to all GlobalCapture customers on version 2.2 or greater.

Setup

To begin using Template Validation, users must validate documents to generate a "known good" data set. Create a workflow that processes all document samples into a validation node for manual user validation of the extracted field data. Ideally, this workflow will first run each sample through an initial template that was designed for the desired document set.

A System Actions setting (Template Validation) is available that tells GlobalCapture to retain the details of a document for analysis. When enabled on a workflow Action, it allows the Action to continue processing the document as normal and also creates a copy of the document in the validation database. This copy includes the OCR files, document pages, thumbnails, and field information. To be placed in this database, the document must have previously been sent through a Classify node. If the workflow does not contain a Classify node but does contain a Validation Action with Template Validation enabled, the workflow will error with "Template Validation Action Requires a Classify Node".
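The Classify prerequisite described above can be pictured as a simple check. The following is only an illustrative sketch: the node dictionaries, the "type" and "template_validation" keys, and the function name are assumptions made for this example and do not reflect how GlobalCapture stores or validates workflows internally.

```python
# Illustrative sketch only. The workflow model below (a list of node dicts)
# is hypothetical and is not GlobalCapture's real workflow schema.

ERROR_MESSAGE = "Template Validation Action Requires a Classify Node"


def check_template_validation(nodes):
    """Return the error message if the Classify prerequisite is missing, else None."""
    has_classify = any(node["type"] == "Classify" for node in nodes)
    # True when any Validation node has an Action with Template Validation enabled.
    needs_classify = any(
        node["type"] == "Validation"
        and any(action.get("template_validation") for action in node.get("actions", []))
        for node in nodes
    )
    if needs_classify and not has_classify:
        return ERROR_MESSAGE
    return None
```

A workflow without any Template Validation Actions passes this check whether or not it contains a Classify node.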

Performance Note

Any workflow containing a Validate node with Template Validation enabled will require all pages of all documents be OCR'd. In general, GlobalCapture only runs OCR processing on pages where it expects to find data. On a 20-page invoice where Invoice Number is always extracted on Page 1, only Page 1 would ever be OCR'd. Template Validation alters that behavior, so all 20 pages would be OCR'd. While this doesn't present any technical or licensing problems, it can have an impact on overall throughput and should be considered when planning for document throughput in a production environment. Be sure the system is scaled accordingly to handle the proposed document volume.
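For capacity planning, the difference can be estimated with simple arithmetic. The sketch below is a rough planning aid, not a product feature: the function name is hypothetical, and the daily volume of 1,000 documents is an assumed figure chosen only to illustrate the 20-page example.

```python
def ocr_pages_per_day(docs_per_day, pages_per_doc, pages_with_data, template_validation):
    """Estimate how many pages are OCR'd per day.

    With Template Validation enabled, every page is OCR'd; otherwise only
    the pages where data is expected are processed.
    """
    pages = pages_per_doc if template_validation else pages_with_data
    return docs_per_day * pages


# Assumed volume: 1,000 twenty-page invoices per day, data on Page 1 only.
without_tv = ocr_pages_per_day(1000, 20, 1, template_validation=False)
with_tv = ocr_pages_per_day(1000, 20, 1, template_validation=True)
print(without_tv, with_tv)  # 1000 20000
```

In this assumed scenario the OCR workload grows twentyfold, which is the kind of change worth accounting for when sizing the system.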

Validation Database

As mentioned above, the validation database collects documents and data to optimize downstream analysis for the administrator. Depending on the documents and the workflow, this can result in a large amount of historical data being maintained over time. By default, all data sent into the database is retained, but it is possible to configure the system to automatically prune itself. The default storage location for these files is C:\GetSmart\CaptureProcessing\TemplateDB.

This is configured in the Batch Portal's config file (by default: C:\GetSmart\BatchPortal\ssBatchPortal.exe.config).  Find or add the following key to the appSettings section of the file:

<add key="ValidationRetentionLimit" value="0" />

If the key is missing altogether, or if the value is zero, all documents are retained indefinitely.  Setting a value will tell the system to retain only that number of documents, and the oldest documents in the system will be purged once that threshold is reached.
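The retention rule can be sketched as follows. This is a minimal illustration of the behavior described above, not GlobalCapture's actual pruning code: the function name and the list of document IDs are hypothetical, and the exact moment at which the purge runs is not specified here.

```python
def documents_to_purge(doc_ids_oldest_first, retention_limit):
    """Return the document IDs that would be purged, oldest first.

    A limit of zero (or a missing key, treated here as zero) means
    everything is retained indefinitely.
    """
    if retention_limit <= 0:
        return []
    excess = len(doc_ids_oldest_first) - retention_limit
    if excess <= 0:
        return []
    # Only the oldest documents beyond the limit are removed.
    return doc_ids_oldest_first[:excess]


print(documents_to_purge(["d1", "d2", "d3", "d4", "d5"], 3))  # ['d1', 'd2']
print(documents_to_purge(["d1", "d2", "d3"], 0))              # []
```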

Note that validation databases are stored per batch portal.  In environments with multiple batch portals, be aware you may have multiple validation databases.
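When several batch portals each need the same retention setting, the key can be applied to each config file with a small script. The sketch below uses only Python's standard library and assumes the file is a standard .NET-style config with appSettings directly under the root element. The function name is hypothetical, XML comments in the file are not preserved by this approach, and the file should be backed up before it is modified.

```python
import xml.etree.ElementTree as ET

KEY = "ValidationRetentionLimit"


def set_retention_limit(config_path, limit):
    """Find or add the ValidationRetentionLimit key in appSettings and set its value."""
    tree = ET.parse(config_path)
    root = tree.getroot()
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    for entry in app_settings.findall("add"):
        if entry.get("key") == KEY:
            entry.set("value", str(limit))
            break
    else:
        # Key not present yet, so add it.
        ET.SubElement(app_settings, "add", {"key": KEY, "value": str(limit)})
    tree.write(config_path, encoding="utf-8", xml_declaration=True)


# Example call against the default location (run once per batch portal):
# set_retention_limit(r"C:\GetSmart\BatchPortal\ssBatchPortal.exe.config", 500)
```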

Usage

Template Validation Actions are an effective way for users to send documents that are new to a process (such as those from a new Vendor in an AP processing scenario) to an administrator for review. Users will need to be trained on when and how to use Actions with Template Validation enabled. Only the Actions configured with Template Validation will trigger saving to the database. It may be helpful to name Validation Actions in a way that lets the user know how their behavior will impact the process. For example, you might want to create two actions, one labeled "Save" and the other labeled "Save & Retain Data", to help users make the right choice for the documents being validated.

Template Analysis

Once documents have been validated and processed with an Action that includes Template Validation, analysis can be performed in the designer. Template Analysis gives active feedback on how the current template in the designer (saved or not) will perform against the verified set of documents. This comparison is done on a per-field basis: any validated document containing one or more of the same fields extracted by the current template will be analyzed.
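The per-field comparison can be illustrated with a short sketch. This is a conceptual model only: the dictionaries of field values, the extract callable, and both function names are assumptions made for the example, and the real analysis runs inside GlobalCapture and may compare values differently.

```python
def compare_fields(known_good, extracted):
    """Compare validated ("known good") values with the current template's output.

    Only fields present in both are compared, mirroring the per-field basis
    of the analysis.
    """
    return {
        field: extracted[field] == expected
        for field, expected in known_good.items()
        if field in extracted
    }


def analyze(validated_docs, extract):
    """Build a report for every validated document that shares at least one field.

    `extract` is a hypothetical callable that returns the current template's
    field values for a given document ID.
    """
    report = {}
    for doc_id, known_good in validated_docs.items():
        result = compare_fields(known_good, extract(doc_id))
        if result:  # documents with no fields in common are skipped
            report[doc_id] = result
    return report
```

For example, a validated document holding Invoice Number and Amount would be reported field by field, while a document holding only fields the current template does not extract would be left out of the report.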

In the toolbar of the Template Designer, click the Test button.

Test is available to anyone with access to the Template Designer when documents exist in the template validation database (otherwise the Test option is grayed out). Clicking it prompts the user to choose between two options. "All Documents" compares the current template to everything in the validation database that uses at least one of the same fields. "Documents Matching Template" restricts the results to validation documents that were previously classified using the current template (based on the Template's ID). The latter option prevents validation documents using the same fields from showing in the results of a different template. In either case, clicking the button will trigger the analysis process. Analysis happens on the server, and performance can vary depending on the number of documents in the database. The Analyzing window will display while the system is calculating results.
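The difference between the two options can be sketched as a filter over the validation database. The record layout (a "template_id" and a list of "fields" per document) and the function name are hypothetical and are used only to make the two selection rules concrete.

```python
def select_documents(docs, template_id, template_fields, mode):
    """Pick the validation documents that an analysis run would consider."""
    if mode == "Documents Matching Template":
        # Strict: only documents previously classified with this template's ID.
        return [d for d in docs if d["template_id"] == template_id]
    if mode == "All Documents":
        # Broad: any document sharing at least one field with the template.
        wanted = set(template_fields)
        return [d for d in docs if wanted & set(d["fields"])]
    raise ValueError(f"Unknown mode: {mode}")
```

With this model, a document classified by a different template that also extracts Invoice Number appears under "All Documents" but not under "Documents Matching Template".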

Note that analysis runs against the current state of the template. Changes to the template, saved or not, will be considered in this step. This allows administrators to gauge the effectiveness of templates without committing those changes to production. Of course, it is always recommended that any changes to production capture processes be tested in an appropriate test environment. Once analysis completes, results will be displayed.