Find the Largest Amount on a Page
Extracting a value like a total amount can be a complex task, especially on a document like an invoice. In this type of document, the invoice total can be in close proximity to other amounts like subtotals, taxes, and freight amounts. This approach attempts to simplify the process by making an assumption that the largest amount found on a document is the invoice total. The setup for this process is very straightforward:
- Create three Process Fields to support both the extraction and Workflow that finds the highest amount.
- Create a Template that locates and extracts all amounts on a document.
- Create a GlobalCapture Workflow process that will intelligently identify the largest amount from all amounts extracted.
The sample document used to build this example can be downloaded here: Brew_Haven_Vendor_Invoice_MM.tif.
Create Process Fields
Start with a new GlobalCapture Workflow in the Designer. Set up the necessary Process Fields and add them to the Workflow. For this example, create three Fields:
- Create a Field to track the count called "Amount Count." Assign it the Numeric data type.
- Create a Field to track the highest amount called "Highest Amount." Assign it the Decimal data type.
- Create a Field to store all extracted amounts from a page/document. Call this Field "Amounts (MV)" and configure it as a Multi-Value Field of the Decimal data type.
Once the Fields are created, add them to your Workflow. The Process Fields list should look like the image below. (The order is not important.)
Create a Template
With the Process Fields created, create a Template that extracts all occurrences of dollar amounts on your documents. The sample document used for the example has several dollar amounts on the page.
You can use a simple Pattern Match Zone that will match all the amounts found on the document. Create a new Template, upload a sample document, and add a Pattern Match Zone. Give it a meaningful name that supports your process. In this example, the Template is named "All Amounts."
Patterns and Multi-Value Fields
When a Pattern Match Zone is assigned to a Multi-Value Field, the Zone will extract all values found matching the pattern and add them all to the Field. This behavior differs from a Zone that is assigned to a regular Field, where only the first-matched value is extracted.
There are a few properties of the Zone that should be modified from the default settings. These are:
- Name – Give the Zone a meaningful name, like "Amounts."
- Type – Set the Zone Type property to Pattern Match. (Note that the Pattern Match Zone Type requires an Unstructured Data Extraction license to be used in production. Demos and unregistered versions will not be restricted.)
- Search String – Define a regex (regular expression) pattern that correctly matches all amount variants on your document. For this example, use the pattern below, but it is important to recognize that no one pattern will meet the needs of every amount you may encounter.
[+-]?[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}$
- Word Spacing – Set both Vertical and Horizontal Word Spacing values to "1."
- Coordinates – It is always a good idea to regionalize a Zone to an area of a page whenever possible to eliminate the potential for incorrect reads in the wrong location. In this example, being confident in the documents and their format, there is not a need to provide specific coordinates for the Zone.
Save the Template.
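To check the Search String pattern before relying on it, it can help to try it outside GlobalCapture. The sketch below is only an approximation: it assumes the pattern is applied to one whitespace-separated word at a time, which may not match how the Zone evaluates OCR text. The function name extract_amounts and the sample text are hypothetical. It collects every match, as a Multi-Value Field would, where a regular Field would keep only the first.

```python
import re
from decimal import Decimal

# The Search String from this example, unchanged.
AMOUNT_PATTERN = re.compile(r"[+-]?[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}$")


def extract_amounts(text):
    """Return every amount found in the text, in reading order.

    Assumption: the pattern is tested against one word at a time.
    """
    amounts = []
    for word in text.split():
        match = AMOUNT_PATTERN.search(word)
        if match:
            # Drop thousands separators so the value parses as a decimal.
            amounts.append(Decimal(match.group().replace(",", "")))
    return amounts


# Hypothetical text, not taken from the sample invoice.
sample = "Subtotal $1,234.56 Tax 98.77 Freight $45.00 Total $1,378.33 Qty 12"
all_amounts = extract_amounts(sample)   # what a Multi-Value Field would hold
first_amount = all_amounts[0]           # what a regular Field would hold
print(all_amounts)
```

The quantity "12" is skipped because the pattern requires a decimal point followed by exactly two digits.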
Create a Workflow
Create a Workflow for this process, starting with a very basic layout:
- Add an Import Node to ingest documents at your desired frequency.
- Add a Classify Node and assign the "All Amounts" Template you have just created to the Node.
- Initialize the "Amount Count" and "Highest Amount" Process Fields to zero by using two Set Process Field Nodes.
The Workflow should begin to take shape and look like this:
A few notes about what has been built so far:
- It's always a good idea to specifically set the Templates you wish to apply to the documents in the Classify Node properties. This prevents any possibility of the wrong Template getting applied.
- Because you are assigning only a single Template and that Template has no required Fields, there is no chance a document would ever be in an Unclassified state. For this reason, it is safe to just end the process on the Unclassified path of the Classify Node.
- It's a best practice to initialize any Process Fields to baseline values, but this is not always necessary.
Next, build the logic that identifies the highest amount extracted in the Classify step. The process loops through the list of extracted amounts in order. Use the "Amount Count" Process Field as a variable to track the current position in the list of extracted amounts in the "Amounts (MV)" Process Field. Use the "Highest Amount" Process Field as a variable to hold the highest known value. On the first pass, "Highest Amount" still holds its initial value of zero, so the first extracted amount, provided it is greater than zero, is assigned to "Highest Amount." The "Amount Count" counter is then incremented by one, and the process moves to the next extracted amount in the list. On each later pass, the current amount in the list is compared to "Highest Amount." If it is larger, it is assigned to "Highest Amount." This looping pattern continues until the process has moved through the entire list. The value of "Highest Amount" when the loop exits (when "Amount Count" equals the number of items in "Amounts (MV)") is the highest value in the list.
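For readers who think in code, the loop described above can be sketched in Python. This is an illustration of the logic only, not how GlobalCapture executes a Workflow; the function name find_highest_amount and the sample values are hypothetical.

```python
from decimal import Decimal


def find_highest_amount(amounts_mv):
    """Walk the list once, keeping the largest value seen so far."""
    amount_count = 0                # plays the role of "Amount Count"
    highest_amount = Decimal("0")   # plays the role of "Highest Amount"

    # Exit once the counter equals the number of extracted items.
    while amount_count != len(amounts_mv):
        current = amounts_mv[amount_count]
        # Replace the stored value only when the current item is larger.
        if not (highest_amount >= current):
            highest_amount = current
        amount_count += 1
    return highest_amount


# Hypothetical extracted values.
amounts_mv = [Decimal("1234.56"), Decimal("98.77"), Decimal("45.00"), Decimal("1378.33")]
print(find_highest_amount(amounts_mv))  # 1378.33
```

Because "Highest Amount" starts at zero, a list that is empty or contains only negative amounts leaves the result at zero.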
Multi-Value Array Index
For those familiar with basic programming concepts, a multi-value field can be considered an array. GlobalCapture references items in the array by their numeric index. It is important to remember that the numeric index is zero based. This means the first item in the list has an index of "zero." "One" would represent the second item in the list.
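As a small illustration of zero-based indexing, here is the same idea with a Python list standing in for a Multi-Value Field. The values are hypothetical.

```python
# A Python list standing in for the "Amounts (MV)" Field.
amounts_mv = ["1234.56", "98.77", "1378.33"]

first_item = amounts_mv[0]            # index zero is the first item
second_item = amounts_mv[1]           # index one is the second item
item_count = len(amounts_mv)          # three items in total
last_index = item_count - 1           # so the last valid index is two
```

Since the last valid index is one less than the item count, a counter that starts at zero has visited every item by the time it equals the count.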
Create a loop using Condition Nodes, and use Set Process Field Nodes to control the conditions as follows:
- Add a Condition Node called "Loop Check." It should be based on the "Amount Count" Process Field, have a condition of Equals, and its Value should contain the S9 Notation {p_Amounts (MV)_length}. This S9 Notation identifies the number of items that were extracted and assigned to the "Amounts (MV)" Field, and it controls how many times the process should loop.
- Add a Condition Node called "Compare Amount." It should be based on the "Highest Amount" Process Field, have a condition of Greater Than or Equal To, and its Value should contain the S9 Notation {p_Amounts (MV)[{p_Amount Count}]}. This S9 Notation identifies the value of a specific item in the "Amounts (MV)" Field. Think of a Multi-Value Field as an ordered list of values, where each value has an ID. The first value has an ID of zero, the next value is one, and so on. Using the S9 Notation defined here, you can access a specific value from the Field based on its ID. This step determines whether the value currently stored in "Highest Amount" should be replaced with the current item in the list.
- Add a Set Process Field Node called "Step Count." This Node will set the value of the "Amount Count" Field, so choose this Field from the Process Field list. For Value, increment the Process Field's current value by one; this repeats on every pass through the loop. Do so by referencing the Field with S9 Notation and a basic arithmetic expression that adds one to its value: {p_Amount Count}+1.
- Add a Set Process Field Node called "Set Highest Amount." This Node will set the value of the "Highest Amount" Field, so choose that Field from the Process Field list. This Field is always set to the value of an item in the "Amounts (MV)" Field, which can be referenced with the S9 Notation {p_Amounts (MV)[{p_Amount Count}]}.
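To see how the four Nodes work together, the sketch below simulates them in Python and records the order in which they are visited. The connections between Nodes are an assumption inferred from the descriptions above (for example, that "Compare Amount" proceeds to "Set Highest Amount" only when its condition is false), and run_loop is a hypothetical name. It is not a representation of the GlobalCapture engine.

```python
from decimal import Decimal


def run_loop(amounts_mv):
    """Simulate the loop Nodes; return the final Fields and the visit order."""
    fields = {"Amount Count": 0, "Highest Amount": Decimal("0")}
    trace = []
    node = "Loop Check"

    while node != "End":
        trace.append(node)
        index = fields["Amount Count"]

        if node == "Loop Check":
            # Equals {p_Amounts (MV)_length}: every item has been visited.
            node = "End" if index == len(amounts_mv) else "Compare Amount"
        elif node == "Compare Amount":
            # Greater Than or Equal To the current item: keep stored value.
            keep = fields["Highest Amount"] >= amounts_mv[index]
            node = "Step Count" if keep else "Set Highest Amount"
        elif node == "Set Highest Amount":
            fields["Highest Amount"] = amounts_mv[index]
            node = "Step Count"
        elif node == "Step Count":
            # {p_Amount Count}+1
            fields["Amount Count"] = index + 1
            node = "Loop Check"

    return fields, trace


# Hypothetical extracted values.
fields, trace = run_loop([Decimal("45.00"), Decimal("12.00")])
print(fields["Highest Amount"])
print(" -> ".join(trace))
```

In this run, "Set Highest Amount" is visited only on the first pass, because the second item is smaller than the stored value.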
If you continue adding these new Nodes to the Design Canvas, the completed process should look something like the example below. In a live scenario, you would expect to see some type of export step, along with validation steps and any other business logic required for the process. Even as defined here, the Workflow demonstrates the power of GlobalCapture's rules engine.
Use the Process Fields tab of a process in the Batch Manager to see the outcome. You can also view all of the data in the "Amounts (MV)" Field on this screen.