Configure Directional Zones
Directional Zones are always assigned as a “child” of another Zone. They allow you to locate the first occurrence of specific data using a parent Zone and from there look in a specific place relative to the parent Zone’s location. In the Directional Zone settings, you can specify which edge of the parent zone should be used to calculate the position. This can be helpful in scenarios where a specific edge might shift from the variable data within it. You can configure the Region to continue in that direction to the edge of the page or restrict its minimum and maximum distances from the parent Zone for pixel-precise control.
You can use repeated Directional Zones for extracting Table Field and Multi-Value Field data, which is useful for line-item entries on an invoice, or invoice numbers from a check remittance.
- In the Properties Pane, enter a name for the new Zone.
- From the Type drop-down list, select Directional.
- To assign the required parent Zone, select it from the Parent drop-down list. Once selected, you can constrain the child Zone to extract on the same page as its parent Zone using the Parent button in the Pages section. Note that Directional Zones are set relative to parent Zones using the From Parent Edge setting in the Limits group.
- To choose the portal, which determines the Fields into which data can be extracted, in the Field group, do the following:
- Choose a portal from the Portal list.
- If the selected portal is a standalone GlobalCapture portal, from the Field list, choose the Process Field into which the extracted data will be placed.
- If the selected portal is a Square 9 portal (if GlobalSearch is installed), choose a database from the Database drop-down list and then from the Field drop-down list, choose an Index Field in that database into which the extracted data will be placed.
- To set the direction to look for data in relation to the parent Zone, in the Direction group, choose one or more directions from the selection of Left, Up, Right, Down.
- To set corresponding limits to the Zone, in the Positions group, enter the number of pixels within which data must be found in the Left, Top, Right, or Bottom text boxes.
By default, the Directional Zone is set to the confines of the parent Zone that was found. For example, if your parent Zone is 40 pixels wide and the Directional Zone is configured to move down from its parent, by default the Directional Zone will be 40 pixels wide. This can be adjusted using the Position settings to make the Directional Zone wider than the parent Zone.
Note that Directional Zones depend on the Zones they are positioned from. If a Zone is located, for example, to the left of another Zone and that Zone contains no data to extract, the next Zone to the left of it will not extract any data either. There might be data there, but the link to that next Zone has been broken.
Use the Position settings to define the Search Region of a Zone. These settings can be configured for all Zone types and they are required for Positional or Directional Zones. The location and dimensions of the Search Region can be specified by document page, by coordinates on the document page, the distance from the edge of another Zone (Zone Anchoring), or a combination of coordinates and Zone Anchoring.
With Zone Anchoring, the coordinates are relative to up to four other Zones. Instead of using coordinates to define the Search Region to extract data, you can create completely dynamic Regions. Additionally, one Anchor can be chained to another. With multiple Anchors, extraction areas will adjust even more dynamically. Note that you cannot anchor a parent Zone to its child Zones.
Position properties are measured from the top-left corner of the page or parent Zone’s Search Region. Left and Right coordinate settings are the number of pixels over from the left edge of the page or Search Region. Top and Bottom are measured from the top edge. Configure these settings by entering a number, using the scroll arrows, or by drawing the Region using the Locator icon on the Zone’s menu bar. You can use the Measure tool on the toolbar to determine how many pixels to enter.
- Coordinates – Use the Coordinates and Zone Anchor settings to set the dimensions and position of Search Regions. The starting point for measuring the edges of a Search Region depends upon how a Zone is configured:
- Zones with No Parent Zone – Coordinates are measured in pixels from the top-left corner of the document page. For Zones where Position settings are optional, if no coordinates are set, the entire page is searched for data. If an individual position is not set, the page is searched to the edge of the page in that direction.
- Child Zones – Coordinates are measured in pixels from the top-left corner of the parent’s found data, except for Directional Zones. Coordinates for Directional Zones (which are always child Zones) depend on the direction of search. When configuring Zones, it is recommended that you set the parent Zone’s properties first, and then the child Zone’s properties.
- Zone Anchors – Click the Anchor icon next to one or more of the Left, Top, Right, or Bottom text boxes and, in the Anchor menu that appears, select a Zone from the list. The text box will then show the name of the anchored Zone. Any sides of the Search Region which are not defined by anchoring should be defined by coordinates. Note that if you set coordinates first and then select Anchors, you can revert to the coordinates by clicking Anchor again.
Consider Atypical Documents When Configuring Zone Properties
Since the dimensions and position of a Zone Search Region can affect the extraction outcome, consider how to configure a “fallback position” if your Template encounters a document with non-standard content or formatting. For example, Zone Anchors are very dynamic, but if anchor text is not found, the drawn coordinates of the Zone are respected. If there are no drawn coordinates, the Zone checks the entire page. This may be the result that you want, but if you prefer a more specific Search Region for those times when the anchor text is not found, first set coordinates (either by drawing the Search Region or by entering coordinates) and then select the anchoring Zone or Zones.
- Overlap – Sometimes text is only partially in a Search Region. In those cases, you can use Overlap to ensure that it is also picked up for the Zone. To set the directions outside of the drawn Zone Search Region to look for data to extract, in the Overlap subgroup, enable one or more of the Left, Top, Right, Bottom checkboxes. All four are enabled by default. Text must reside at least one pixel within the Search Region for overlap to pick it up, regardless of the direction. A setting of 0 disables overlap in any direction, while a setting of 15 enables overlap in all directions.
- Orientation – Use the Orientation slider to set the direction in which data is read for extraction. Move the slider from the default of 0° (no rotation) to the right to set the orientation to 90°, 180°, or 270°.
- Pages – Use Pages to specify which page to search for data to extract. (Note that settings are unavailable for Separator Zones.) In the Page subgroup, select one of the following:
- In the Page subgroup, choose the First, Last, or All pages.
- Or, if there is a designated Parent Zone, to ensure that a child Zone’s data is extracted from the intended location in a multi-page document, you can specify that it must be found on the same page where the parent Zone’s data was found. For repeating Zones, note that the Search Region for their child Zones will be on the same page as the one where the repetition was found, regardless of their Pages setting, with one exception: if child Zones that are footer Zones are found and their Pages setting is Last, the repetition will be stopped for the current and any subsequent pages.
- Specific – To extract from specified pages of the document, enter one or more page numbers in the Specific text box. For more than one page, use a comma to delimit the page numbers (1,3,5) or use a hyphen to indicate a range of pages (3-5).
- Optionally, additional Zone Properties can be configured.
- Click the Apply icon.
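The default geometry described in this procedure (a Directional Zone confined to the extent of its parent and running toward the page edge, optionally restricted by minimum and maximum distances) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not GlobalCapture's implementation: the `Rect` type, the `directional_region` function, and the `min_dist` and `max_dist` parameters are hypothetical names, and the page size is an assumed value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Pixel rectangle measured from the top-left corner of the page."""
    left: int
    top: int
    right: int
    bottom: int


def directional_region(parent: Rect, direction: str, page: Rect,
                       min_dist: int = 0,
                       max_dist: Optional[int] = None) -> Rect:
    """Hypothetical model of a Directional Zone's default Search Region.

    The region keeps the parent's extent across the search direction and
    runs from the parent's edge toward the page edge, optionally limited
    to the band between min_dist and max_dist pixels from that edge.
    """
    if direction == "down":
        start = parent.bottom + min_dist
        end = page.bottom if max_dist is None else min(page.bottom, parent.bottom + max_dist)
        return Rect(parent.left, start, parent.right, end)
    if direction == "up":
        end = parent.top - min_dist
        start = page.top if max_dist is None else max(page.top, parent.top - max_dist)
        return Rect(parent.left, start, parent.right, end)
    if direction == "right":
        start = parent.right + min_dist
        end = page.right if max_dist is None else min(page.right, parent.right + max_dist)
        return Rect(start, parent.top, end, parent.bottom)
    if direction == "left":
        end = parent.left - min_dist
        start = page.left if max_dist is None else max(page.left, parent.left - max_dist)
        return Rect(start, parent.top, end, parent.bottom)
    raise ValueError(f"unknown direction: {direction!r}")


page = Rect(0, 0, 2550, 3300)        # assumed page size in pixels
parent = Rect(100, 200, 140, 230)    # parent data found, 40 pixels wide
region = directional_region(parent, "down", page)
width = region.right - region.left   # same 40-pixel width as the parent
```

In this model, widening the Zone beyond the parent would correspond to overriding the `left` and `right` values of the returned rectangle, which is the role the Position settings play in the Template Designer.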
Create Line Item Data Extraction Zone
To extract data using the optional Line Item Data Extraction module, you must have both the Structured Data Extraction Zones and the Unstructured Data Extraction Zones available. For multiple-page line-item extraction, you will also need Classification licensing and a header Zone to indicate the start of the table of data.
- Create a Directional Zone.
- In the Repetitions group, enable the Repeating checkbox. A repeating Zone cannot be the child of another repeating Zone.
- Optionally, additional Zone Properties can be configured.
- Click the Apply icon.
Additional Zone Properties
Field Zone Properties
You can map data extracted from Zones to indexing fields using the Field settings. This may be configured for any Zone type. Note that with the Line Item OCR Extraction license, you can extract Table Field data from a single page (such as invoice line items) or across multiple pages. The repeating Zone can be set per document, per page, or per document region. Use this option in Directional Zones.
General Zone Properties
Required Zones should not be used if GlobalCapture is not licensed for Classification.
Note that if there are any non-header/footer Zones also defined in the Template, any required Zones must also be found in order for classification and separation to occur.
Data will only be extracted if the average confidence for the words within the extracted data is above the set confidence threshold. The ideal level will be high enough to allow for acceptable accuracy in your Workflow automation while not being so high as to cause a lot of batch errors that users will have to correct manually. Around 80% to 85% is usually a good threshold to start with. If it is likely that there will be no data in the Zone to extract, leave the threshold at 0%.
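The threshold rule can be illustrated with a short Python sketch. It is a simplified model, not the product's code: the function name `passes_confidence` is hypothetical, confidences are assumed to be percentages from 0 to 100, and the handling of an empty Zone and of a value exactly equal to the threshold are assumptions.

```python
from statistics import mean
from typing import Sequence


def passes_confidence(word_confidences: Sequence[float], threshold: float) -> bool:
    """Return True when the average word confidence is above the threshold.

    Assumption: a threshold of 0 accepts anything, including a Zone that
    contains no words; otherwise an empty Zone fails the check.
    """
    if threshold <= 0:
        return True
    if not word_confidences:
        return False
    return mean(word_confidences) > threshold


# Three words averaging about 80.3% confidence.
words = [92, 88, 61]
accepted_at_80 = passes_confidence(words, 80)   # average is above 80
accepted_at_85 = passes_confidence(words, 85)   # average is below 85
```

A single low-confidence word pulls the average down, which is why raising the threshold from 80% to 85% is enough to reject this sample.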
Limits Zone Properties
Use the Limits settings to specify parameters for data to extract. If a Zone on the current page does not meet the parameters (such as not enough digits for a Social Security Number), then the Template will search any subsequent specified pages for data that does. Data in the Search Region must contain at least the minimum specified elements or it will not be validated. Results will be truncated for any characters, words, or lines past the maximum specified. Note that the Min setting for the Characters limit is enforced per line read within the entire Zone, while the Min setting for the Words limit is enforced for the entire Zone. Also note that the Lines setting must be either zero (meaning no limit) or a number greater than one in order to configure Word Spacing in Marker and Pattern Match Zones. Elements include:
Control Extraction by Controlling Gaps
The OCR engine treats a group of words as a “line,” although the words may not necessarily all be on the same horizontal plane, as one might think of a line of text. To the OCR engine, both examples shown are two lines. Use the Word Spacing settings to control your extraction results, based on the empty spaces between words. In this example, Horizontal Word Spacing has been set to 200 pixels. There is a 150-pixel-wide gap between the right edge of the first line and the left edge of the second line. Since this falls within the specified spacing distance, the pattern match for the Zone is successful. If the document has a larger, 500-pixel gap between lines, the pattern match is not found.
Use Field Limits and Zone Limits Together
If you set an Index Field for an invoice description using the default maximum of 50 characters and then set a Search Region of 75 characters for a Zone, only the first 50 characters will be extracted. (This is true for all the repeating Zones that follow the first one as well.) So, either set your Index Field to a maximum number of characters high enough to encompass most scenarios or use it knowingly to eliminate some invoice descriptions that you do not want to capture.
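The interaction between the two limits amounts to taking the smaller one. A minimal Python sketch, assuming a hypothetical `extracted_text` helper and treating `None` as "no Zone limit set":

```python
from typing import Optional


def extracted_text(zone_text: str, zone_max_chars: Optional[int],
                   field_max_chars: int) -> str:
    """Truncate Zone data to the smaller of the Zone and Field limits."""
    if zone_max_chars is None:
        limit = field_max_chars
    else:
        limit = min(zone_max_chars, field_max_chars)
    return zone_text[:limit]


description = "x" * 75                       # 75 characters found in the Zone
stored = extracted_text(description, 75, 50)
stored_length = len(stored)                  # the 50-character Field limit wins
```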
When you set your Zone to extract from different blocks of text, you can control what should be considered for the Zone and what should not, using the Word Spacing setting. This appears when the Lines Max setting is zero or greater than one. You can configure your multi-line pattern matching to specify the vertical and horizontal distances allowed between lines in one long, searchable string. When you specify the size of gaps, you can set two paragraphs to be extracted together, for example, or words in a fully justified paragraph, where the space between words may be larger than normal.
Setting either Vertical or Horizontal spacing to zero will bridge gaps of any size. Both settings at zero will combine all available words into a single searchable string. You can use the Measure feature to help determine the gap settings. To measure the distance, click the Measure icon in the Template Designer toolbar and then drag your mouse pointer on the Design Canvas from one point to another to create a line. The Measurement dialog will appear to display the line’s X and Y coordinates.
To specify limits to the spaces between consecutive words and lines, in the Word Spacing subgroup that appears, select Vertical to specify the maximum height in pixels, and Horizontal to specify the maximum width in pixels, of the space allowed between valid words or lines to extract.
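The horizontal case of Word Spacing can be modeled as a simple grouping pass. The sketch below is an illustration under assumptions, not the OCR engine's algorithm: the `Word` tuple layout and the `join_by_horizontal_gap` function are hypothetical, only the horizontal axis is considered, and the pixel positions are made up to reproduce the 150-pixel and 500-pixel gaps from the example above.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, int, int]  # (text, left edge, right edge) in pixels


def join_by_horizontal_gap(words: List[Word], max_gap: int) -> List[str]:
    """Group words, read left to right, into searchable strings.

    Neighbouring words are joined when the gap between them is at most
    max_gap pixels. A max_gap of 0 bridges gaps of any size.
    """
    groups: List[List[str]] = []
    previous_right: Optional[int] = None
    for text, left, right in sorted(words, key=lambda w: w[1]):
        joinable = previous_right is not None and (
            max_gap == 0 or left - previous_right <= max_gap
        )
        if joinable:
            groups[-1].append(text)
        else:
            groups.append([text])
        previous_right = right
    return [" ".join(group) for group in groups]


near = [("Invoice", 100, 260), ("Number:", 410, 560)]   # 150-pixel gap
wide = [("Invoice", 100, 260), ("Number:", 760, 910)]   # 500-pixel gap

joined = join_by_horizontal_gap(near, 200)    # gap fits within 200 pixels
split = join_by_horizontal_gap(wide, 200)     # gap exceeds 200 pixels
bridged = join_by_horizontal_gap(wide, 0)     # zero bridges any gap
```

With a 200-pixel setting the first pair forms one searchable string and the second pair stays as two, which mirrors why a pattern match spanning both pieces succeeds in one document and fails in the other.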
- Dynamic Line Extraction – Use the Dynamic Line Extraction property for variable-height line extraction. This checkbox is contextual and only appears when a Positional Zone is assigned a repeating Directional Zone as a parent Zone. It is used most often for extracting repeating rows of table data where the number of lines of text in the row might vary (such as the description field of a line-item invoice where the text may wrap to multiple lines).
- Repetitions – The Repetitions subgroup is contextual and appears when configuring a Directional Zone. To limit the number of Zone repetitions for repeating Zone types, enable the Repeating checkbox and then enter a number for the minimum and/or the maximum number of repetitions of the Search Region required for data extracted from the Zone.
- From the Parent Edge – The From the Parent Edge subgroup, like Repetitions, is contextual and appears when configuring a Directional Zone. It is required by a Directional Zone. To locate the child Directional Zone relative to the parent Zone, enable one or more of the Left, Top, Right, and Bottom checkboxes.
Barcode Zone Property
To specify a Zone for BCR (Barcode Character Recognition) in the Barcode group, enable Barcode and then select: