Multi-Value QR Code Extraction
This example demonstrates how a cover sheet with QR Codes can be used for both separation and indexing in a capture process. Users can use any tool for cover sheet creation that fits their workflow process. The example cover sheet referenced here was built using Square 9's GlobalForms.
Cover Sheet Layout
The cover sheet, linked here for reference, contains several barcodes on the page. Multiple, identical barcodes are used to increase overall accuracy when scanning.
The first block of barcodes, highlighted in red, contains the data used for indexing. All 5 of these barcodes are identical and are spread across the page to increase the chance that at least one of them reads correctly if there is a problem scanning (the page skews, the quality is low, etc.). In this case, each barcode holds 3 separate index values, separated with a pipe character ("|"). The actual values in the barcode are listed at the top of the document, in this case: IN8872545|Gibson USA|Invoice
The second block of barcodes, highlighted in blue, contains the data used for separation (and optionally, cover sheet deletion). These barcodes contain the value: S9COVERSHEET
Template Setup
To process this type of document, use a Pattern Match Zone in the extraction template. Because the barcode contains 3 distinct pieces of data, 3 zones are created in the template. The configuration for all 3 zones is identical, with the exception of the Search String property.
Zone Settings
- Each Zone should have a unique and distinct name. In this example, Value 1, Value 2, and Value 3 are used to represent the first, second, and third data item in the barcode.
- Each Zone's Type should be Pattern Match.
- Search String is the only value that is different for each zone. Search String is the RegEx pattern used to identify the data to extract. While these patterns are specific to this example, they do illustrate how to use regular expressions to extract text before, in between, or after a specific character that was extracted.
- Value 1 (the first zone) should use:
([^|]+)
- Value 2 (the second zone) should use:
(?<=\|)(.*)(?=\|)
- Value 3 (the third zone) should use:
[^\|]*$
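As a quick sanity check, the three Search String patterns can be run against the sample barcode value outside of the product. The sketch below uses Python's re module purely as a stand-in; it is an assumption that the template's regex engine treats these three patterns the same way, and the helper name first_match is made up for this illustration.

```python
import re

# Sample payload taken from the cover sheet described above.
payload = "IN8872545|Gibson USA|Invoice"

# Search String patterns for the three zones, copied from the list above.
patterns = {
    "Value 1": r"([^|]+)",
    "Value 2": r"(?<=\|)(.*)(?=\|)",
    "Value 3": r"[^\|]*$",
}


def first_match(pattern: str, text: str) -> str:
    """Return the first match of pattern in text, or an empty string (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


results = {name: first_match(p, payload) for name, p in patterns.items()}
for name, value in results.items():
    print(f"{name}: {value}")
# Value 1: IN8872545
# Value 2: Gibson USA
# Value 3: Invoice
```

Note that the Value 2 pattern is greedy: with more than 3 values in the barcode it would return everything between the first and last pipe, which is why it only suits a 3-value layout.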
A More Generic Pattern
When working with regular expressions, you will find that there are often many ways to achieve the same result. The regex patterns shown above specifically target (a) data before the first |, (b) data between the first and last |, and (c) the data after the last |. A more generic pattern would be:
(?<=(.*?\|){0})((.*?(?=\|))|(.*))
In this pattern, the number between curly braces selects the data point to extract. The number is zero-based, meaning the first data point is 0. So if a QR code had 5 data points separated by | and you wanted to read the 3rd, the pattern would read:
(?<=(.*?\|){2})((.*?(?=\|))|(.*))
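The generic pattern relies on a variable-length lookbehind, which Python's built-in re module rejects, so it cannot be tested there directly. To illustrate what the pattern selects, the sketch below gets the same zero-based data point by splitting on the pipe character. This is a swapped-in technique for illustration, not the regex itself; the function name nth_field and the 5-value sample string are hypothetical.

```python
def nth_field(payload: str, index: int, delimiter: str = "|") -> str:
    """Return the zero-based data point at index, or "" if it does not exist.

    Hypothetical helper: mirrors what the number in curly braces selects
    in the generic pattern, using a plain split instead of a regex.
    """
    fields = payload.split(delimiter)
    return fields[index] if 0 <= index < len(fields) else ""


sample = "A|B|C|D|E"  # hypothetical QR payload with 5 data points
print(nth_field(sample, 2))  # C (the 3rd data point, index 2)
print(nth_field("IN8872545|Gibson USA|Invoice", 0))  # IN8872545
```

Returning an empty string for a missing data point is a choice made for this sketch only.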
- In the Field section, the template should target a process field from the workflow. The example workflow includes 3 fields for this purpose, named "Barcode Value 1", "Barcode Value 2", and "Barcode Value 3". For simplicity, administrators looking to repurpose this process may want to keep these fields and map the data to a field of their own creation using a Set Process Field node.
- In the Limits section, set Max Lines to 1.
- In the Barcode section, check Barcode. The barcode options will expand. In Orientation, choose East, and in Symbology, choose QR.
Workflow Setup
The workflow for managing this process is mostly linear. The only decision point arises when a document does not extract correctly (for example, if a user scanned a document without the cover sheet).
- Documents are imported from a network location, or scanned directly into the process.
- A Document Separation step is included for end users who want to scan multiple documents in a single scan batch. This step specifically targets the separation barcodes (highlighted in the blue section above) and only separates when it reads a barcode with the value "S9COVERSHEET". If users do not intend to scan multiple documents in one scan batch, it's advisable to remove this step from the process. Leaving it in place will not cause any problems, but will slow the overall process down unnecessarily.
- A Classify step performs OCR on the document and extracts each of the 3 data elements to its corresponding Process Field.
While not illustrated in this example, the end user may be using a cover sheet only for indexing and separation. It might be desirable to remove the cover sheet from the document once it is no longer needed. To achieve this, a Delete node can be inserted between steps 4 and 5, set to delete pages where the QR Code containing S9COVERSHEET is found.
- A Release node outputs the document. In this workflow, documents are released to a local drive, and the file is named with the data from each process field.
- A Validation step is included to handle any scenarios where the document is not extracted correctly. If, for example, a document was scanned without a cover sheet, it will flow to this step. Users have the ability to delete the document, or to manually index the document and release it.
- If a user chooses to discard the document in validation, all pages are deleted from the process.
- The workflow ends.
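The separation and routing logic in the steps above can be sketched in a few lines. This is only a conceptual model under stated assumptions, not how GlobalCapture implements its nodes: the Page class, the function names, and the second index payload in the sample batch are all hypothetical, and barcode reading is assumed to have already produced a list of values per page.

```python
from dataclasses import dataclass, field

COVER_SHEET_VALUE = "S9COVERSHEET"  # separation barcode value from the cover sheet


@dataclass
class Page:
    """A scanned page and the barcode values read from it (hypothetical model)."""
    number: int
    barcodes: list[str] = field(default_factory=list)


def separate(pages: list[Page]) -> list[list[Page]]:
    """Start a new document each time a separation barcode is read."""
    documents: list[list[Page]] = []
    for page in pages:
        if COVER_SHEET_VALUE in page.barcodes or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


def index_values(document: list[Page]) -> list[str]:
    """Return the pipe-separated index values from the first barcode that has them."""
    for page in document:
        for value in page.barcodes:
            if "|" in value:
                return value.split("|")
    return []


def route(document: list[Page]) -> str:
    """Send documents with all 3 index values to release, the rest to validation."""
    return "release" if len(index_values(document)) == 3 else "validation"


# Sample batch: page 1 was scanned without a cover sheet, page 2 is a cover sheet.
batch = [
    Page(1),
    Page(2, ["IN8872545|Gibson USA|Invoice", COVER_SHEET_VALUE]),
    Page(3),
]
for doc in separate(batch):
    print([p.number for p in doc], "->", route(doc))
# [1] -> validation
# [2, 3] -> release
```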
Workflow Package
GlobalCapture administrators can import this workflow as is. Once imported, you may want to adjust the Import (Green Circle) and Release (Orange Square) nodes to better reflect the environment's import and release locations. Click here to download the workflow package, which includes the Workflow, OCR Template, and Fields described above. The extraction and separation zones are not fixed position. Any barcode coversheet that uses QR codes and contains up to 3 index fields in the QR Code separated with a pipe character will work for this workflow.
Click here to download an example PDF. When processed, this document separates into three documents.