
The Marker Zone is the most basic Zone type. With it, the Template bases document validation on finding a match for a particular word or phrase in a designated location on the page. If your Validate Node needs to process only structured forms, one or more Marker Zones may be all you need for a simple Template. The Marker Zone and Positional Zone are the Structured Data Extraction Zone types that are core to any GlobalCapture installation. Note that OCR options might be unavailable for bundled versions of GlobalCapture, but licensing can be added to any installation type.

You can use Marker Zones to detect any shifts in the image position and automatically register the Template to the proper location. You can also use Marker Zones to classify a document’s layout. Using an exact string match of characters introduces a high level of accuracy, which allows you to draw tighter Zones for text-heavy documents. Some things to keep in mind about the Marker Zone search string:

  • The search string is not case sensitive.

  • It is a good idea for a Marker to be unique in the region of the page it is searching.

  • You can specify part of a word, one word, or more than one word. Use part of a word for “contains” matching. For example, specify “appl” to return both “application” and “applied.”

  • You can specify more than one search string to perform an OR-based search for multiple values in the Zone. The result is the first search string in the list that is found exactly once on the document. If a search string occurs more than once or not at all, the Marker Zone moves on to the next search string in the list to look for a unique match. If none of the search strings produces a match, the “Marker not found” message appears. Adjust your Search String entries to achieve successful results.
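The search-string rules in the list above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not GlobalCapture's implementation; the function names `count_occurrences` and `resolve_marker` and the sample text are hypothetical.

```python
from typing import List, Optional


def count_occurrences(text: str, needle: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of needle in text."""
    return text.lower().count(needle.lower())


def resolve_marker(region_text: str, search_strings: List[str]) -> Optional[str]:
    """Return the first search string that occurs exactly once, else None."""
    for candidate in search_strings:
        # Zero hits or several hits both mean "try the next search string".
        if candidate and count_occurrences(region_text, candidate) == 1:
            return candidate
    return None  # stands in for the "Marker not found" message


region = "Credit Application - Applicant Name"
# "appl" occurs twice ("Application", "Applicant"), so the next string is tried.
print(resolve_marker(region, ["appl", "credit"]))  # credit
print(resolve_marker(region, ["invoice"]))         # None
```

In this sketch a partial string such as "appl" fails because it is not unique in the region, which is why a Marker that is unique in its search area is recommended.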

Tip

Choosing Between Marker and Pattern Match Zones

When choosing between a Marker Zone and a Pattern Match Zone, consider these factors:

  • Case – Pattern Match Zones are case sensitive, although you can add (?i) at the beginning of the pattern to ignore case. Marker Zones are not case sensitive.

  • RegEx Search Strings – Pattern Match Zones support the use of Regular Expressions to define search strings. Marker Zones do not.

  • Licensing – Pattern Match Zones require an Unstructured Data Extraction license.
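The case-sensitivity difference can be illustrated with Python's `re` module. This is a sketch only: the RegEx flavor used by GlobalCapture may differ from Python's, and the sample text and patterns are made up for the example.

```python
import re

text = "Purchase ORDER No. 48213"

# Without a flag, the pattern is case sensitive and misses "ORDER No.".
case_sensitive = re.search(r"order no\. (\d+)", text)

# Prefixing (?i) makes the whole pattern ignore case.
case_insensitive = re.search(r"(?i)order no\. (\d+)", text)

# A Marker-style check: a plain, case-insensitive "contains" test, no RegEx.
marker_style = "order no" in text.lower()

print(case_sensitive)             # None
print(case_insensitive.group(1))  # 48213
print(marker_style)               # True
```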

Configure Zone

  1. In the Properties Pane, enter a name for the new Zone.

  2. From the Type drop-down list, select Marker (the default Zone).

  3. To select the text used to register the image position, enter part of a word, a word, or a phrase in the Search String text box. To add more search strings, press Enter after each one.

  4. To designate an area of the page to search for a match, click the Locator icon, and then use your mouse to drag a box over the area to be searched to create the Search Region.

  5. Optionally, additional Zone properties can be configured.

  6. Click the Apply icon. The Zone will be added to the Zones Pane and the matching text will be indicated by a salmon-colored box on the document image around the extracted search string.

Figure: The Template Designer with a Marker Zone.

Additional Zone Properties

...

Parent Zone Property

Field Zone Properties

You can map data extracted from Zones to indexing fields using the Field settings. This may be configured for any Zone type.

Note that with the Line Item OCR Extraction license, you can extract Table Field data from a single page (such as invoice line items) or across multiple pages. The repeating Zone can be set per document, per page, or per document region. Use this option in Directional Zones.

  1. Field or Table Field – Choose a Field or Table Field in the Field group to map a Zone to. If you select a Table Field, you can assign a Table Field to the parent/header Zone of a table. As you cannot assign a Table Field and a Field simultaneously, your Header Zone should not be used for extracting data that feeds a Field.

    1. Select a portal in GlobalCapture or GlobalSearch (if available).

    2. Select a database in the portal.

    3. Select either a Field or a Table Field in the database.

  2. Replacement – After mapping a Field or a Table Field to a Zone, the Replacement group appears. Use the Replacement settings to clean up extracted text. For example, if your OCR engine often interprets a zero as an O on a particular document, you can replace the letter O with the number 0. You could also use this to remove the dollar sign or comma from a number by replacing them with nothing.

    1. Click Add Replacement.

    2. Enter the text to be matched. You can use a string or a RegEx. This text box cannot be empty.

    3. Enter the replacement text. You can use a string or you can leave Replacement empty to strip out text, such as removing commas from a number.

    4. Select either Word to configure a text match or Pattern to configure a regular expression.

    5. To configure additional replacements, click Add Replacement again. To delete a replacement, click the Delete (X) icon next to the selected replacement settings.
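As a rough illustration of the Replacement settings above, the following Python sketch applies an ordered list of Word and Pattern replacements to extracted text. The rule format, the `apply_replacements` function, and the sample value are assumptions made for the example, and Python's RegEx syntax may not match the product's exactly.

```python
import re
from typing import List, Tuple

# Each rule is (match, replacement, mode); mode is "word" or "pattern".
Rule = Tuple[str, str, str]


def apply_replacements(text: str, rules: List[Rule]) -> str:
    """Apply each rule in order and return the cleaned-up text."""
    for match, replacement, mode in rules:
        if not match:
            raise ValueError("match text cannot be empty")
        if mode == "word":
            text = text.replace(match, replacement)  # literal text match
        elif mode == "pattern":
            text = re.sub(match, replacement, text)  # regular expression
        else:
            raise ValueError(f"unknown mode: {mode}")
    return text


rules = [
    ("O", "0", "word"),        # OCR read a zero as the letter O
    (r"[$,]", "", "pattern"),  # empty replacement strips "$" and ","
]
print(apply_replacements("$1,2O4.5O", rules))  # 1204.50
```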

General Zone Properties

  • Required – To ensure that the Zone contains data before the template is used for extraction, enable Required. This may be assigned to any Zone type, but it must be assigned for document classification. All required Zones in a Template must occur on a page for a document to be classified. A document will always classify with the first Template that matches all required Zones or the first Template that has no required Zones, whichever comes first. (Note that in some situations, enabling a Zone as a header or a footer may in fact make that Zone required. This primarily happens when one of these Zone types is a separator.)

    Required Zones should not be used if GlobalCapture is not licensed for Classification.

  • Separator – Advanced document separation can be performed with a Template. On one or more Zones, enable Separator and either Header or Footer to burst documents when appropriate data is found. If Separator and Header are enabled, the Template will identify the first page of a document. If Separator and Footer are enabled, the Template will identify the last page. When either combination is enabled, the Group property becomes available. When Separator and Header are enabled, the Separate on Change property appears.

    Note that if there are any non-header/footer Zones also defined in the Template, any required Zones must also be found in order for classification and separation to occur.

    • Separate on Change – If identifying data appears on all pages and the documents should be burst when data in a Zone changes, also enable the contextual Separate on Change checkbox for that Zone.

    • Group – You can group Zones together so that all of the Zones in the Group must be found on the same page for the Template to match. This makes it possible to add “OR” logic to separation when using Header or Footer groups. Assign a Zone to a group by number. If more than one group of Zones is configured for separation, the group with the highest Group number is evaluated first. If all Zones in that group match a page, that Header or Footer group is used to separate the document on that page. If not all Zones in that group match, the group with the second-highest Group number is evaluated, and so on until a matching Header or Footer group is found. If none of the defined Header or Footer groups matches a page, the document will not be separated on that page.

  • Confidence – When OCR engines extract data there can be a margin of error in the interpretation. You can set the minimum accuracy for data extracted using the Confidence slider. Move it to the right to increase the threshold and to the left to decrease it. By default, it is set to 0%, which allows all results, including no data at all.

    Data will only extract if the average confidence for the words within the extracted data is above the set confidence threshold. The ideal level will be high enough to allow for acceptable accuracy in your Workflow automation while not being so high as to cause a lot of batch errors that users will have to correct manually. Around 80% to 85% is usually a good threshold to start with. If it is likely that there will be no data in the Zone to extract, leave it at 0%.

  • Priority – In some situations, two or more Zones could extract data for the same Field. To resolve extraction conflicts, enter a number in the Priority text box that appears if Separator is not enabled. The data extracted from the Zone which has the largest Priority numerical value will be retained. If the Zones have the same Priority setting, then the data extracted with the higher Confidence setting will be retained. This can be optionally configured for any Zone type.

  • Header – Enable the Header checkbox to assign the Zone as a header for a repeating or separator Zone. This can be assigned to go across multiple pages, from the same location on each page. Use headers to indicate the start of a document, thus allowing for separation on the start of a new document. Headers may also be used to trigger the start of a table of data, and are particularly relevant when attempting to extract tables of data across multiple pages. Note that if no header Zones are defined for a repeating Zone, then searching will continue at the edge of the following pages.

  • Footer – Enable the Footer checkbox to assign the Zone as a footer for a repeating or separator Zone. This can be assigned to go across multiple pages. Use a footer to indicate the last page of a document when separating, or to indicate the stopping point for table extraction when using repeating Zones.
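Two of the selection rules in the list above, Group evaluation for separation and Priority for Field conflicts, can be sketched as follows. This is a simplified model under stated assumptions, not the product's code: the function names, the Zone names, and the representation of confidence as a number between 0 and 1 are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_separator_group(groups: Dict[int, Set[str]],
                         matched: Set[str]) -> Optional[int]:
    """Return the highest-numbered group whose Zones all matched on this page."""
    for number in sorted(groups, reverse=True):
        if groups[number] <= matched:  # every Zone in the group was found
            return number
    return None  # no group matched: the page is not a separation point


def pick_field_value(candidates: List[Tuple[str, int, float]]) -> Optional[str]:
    """candidates are (value, priority, confidence) tuples.

    The highest priority wins; confidence breaks ties between equal priorities.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c[1], c[2]))[0]


# Group 2 is checked first; it needs both Zones, so only group 1 matches here.
groups = {1: {"InvoiceLabel"}, 2: {"InvoiceLabel", "PageOneOfN"}}
print(pick_separator_group(groups, {"InvoiceLabel"}))  # 1
print(pick_separator_group(groups, {"Logo"}))          # None

# Two Zones share priority 2, so the more confident of the two is kept.
values = [("1204.50", 1, 0.97), ("1284.50", 2, 0.81), ("1204.58", 2, 0.90)]
print(pick_field_value(values))  # 1204.58
```

Sorting the group numbers in descending order is what produces the "OR" behavior: each group is an alternative, and the first alternative that is fully satisfied decides the separation.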

Position Zone Properties


Limits Zone Properties


Barcode Zone Property


Tip

Plan and Test Marker Zones

Choose and Test Marker Text Carefully. When selecting text for your Marker, choose text which is most likely to return accurate OCR results. For example, while the Marker text is not case sensitive, the text you select should be clear and without attributes. If possible, try to avoid characters that can be suspect during OCR, such as L/1, O/0, B/8, S/5, Z/2, and so on.
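One way to screen candidate Marker text before testing it is a small helper that flags the characters named above. This is a hypothetical planning aid written for this page, not a GlobalCapture feature; the `suspect_characters` name and the sample strings are made up.

```python
# Character pairs from the tip above; OCR can confuse either member of a pair.
SUSPECT = set("L1O0B8S5Z2")


def suspect_characters(marker_text: str) -> list:
    """Return the suspect characters in marker_text, in order, without duplicates."""
    found = []
    # Marker text is not case sensitive, so compare in upper case.
    for ch in marker_text.upper():
        if ch in SUSPECT and ch not in found:
            found.append(ch)
    return found


print(suspect_characters("Payment Due"))  # []
print(suspect_characters("PO Box 2051"))  # ['O', 'B', '2', '0', '5', '1']
```

A flagged character does not make a Marker unusable; it only suggests testing that text against several scanned samples before relying on it.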

Use Distinct Markers. It is a good idea to give all your Templates distinct Marker Zones. Use more than one if your Workflow has Templates that are similar but need to classify documents differently.

Use Required Zones for classification. If you are using a Zone in Templates which will be used to select the correct Template for a document, be sure that at least one Zone is configured as “Required.” If all required Zones are not found on a document, then that Template will not be a match for classifying that document. The first Template found that matches, or the first Template found with no required Zones, will be the active Template.
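The classification rule described above can be modeled in a few lines of Python. This sketch assumes each Template is reduced to a name and a set of required Zone names, and that the set of Zones found on the document is already known; the `classify` function and all names in the example are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (Template name, names of its required Zones), in evaluation order.
Template = Tuple[str, Set[str]]


def classify(templates: List[Template], zones_found: Set[str]) -> Optional[str]:
    """Return the first Template whose required Zones were all found."""
    for name, required in templates:
        # An empty set is a subset of anything, so a Template with no
        # required Zones always matches when it is reached.
        if required <= zones_found:
            return name
    return None


templates = [
    ("Invoice", {"InvoiceMarker"}),
    ("PackingSlip", {"SlipMarker", "ShipTo"}),
    ("CatchAll", set()),
]
print(classify(templates, {"SlipMarker", "ShipTo"}))  # PackingSlip
print(classify(templates, {"SlipMarker"}))            # CatchAll
print(classify(templates[:2], {"SlipMarker"}))        # None
```

The second call shows why a Template without required Zones acts as a catch-all: it matches any document that reaches it, so its position in the order matters.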
