Processing Results

Step 1 - Extract

ArticulateML functionality is broken down into a few processing steps. The first, which is largely unattended, sends the document to Square 9’s hosted AI extraction services. These cloud-hosted services receive document pages from GlobalCapture and perform the OCR steps. The OCR result is returned to GlobalCapture, where that data can be parsed and set to header fields, table fields, or both in the active capture Process.

The Extract step is initiated by the ArticulateML Parser Node:

image-20240328-124732.png

Using this processing node in your workflow will send the document pages to the AI services and collect the results.

Step 2 - Inspect Response

The second processing step involves parsing the OCR response and setting the correct data into the correct fields. Data passed back from the OCR services is in JSON format. If you are unfamiliar with JSON, we strongly encourage you to learn the basics, as this will help significantly in understanding the OCR response and how to parse it. A basic introduction to JSON and how it is intended to work can be found here.

Results returned by the ArticulateML process will always be in a consistent layout. Refer to the Response Syntax documentation to learn more about the format of any response coming back from ArticulateML.
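When first looking at an unfamiliar response, it can help to print an outline of its keys and value types before writing any parsing rules. The following is a minimal Python sketch of that idea. The helper name outline and the sample payload in the usage note are hypothetical and do not reflect the actual ArticulateML response layout, which is described in the Response Syntax documentation.

```python
import json


def outline(value, depth=0, max_depth=3):
    """Return indented lines describing the shape of a parsed JSON value.

    Hypothetical helper for illustration; it makes no assumption about
    the real ArticulateML response layout.
    """
    indent = "  " * depth
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            # Record each key with the Python type of its value.
            lines.append(f"{indent}{key}: {type(child).__name__}")
            if depth < max_depth:
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is described, to keep the outline short.
        lines.append(f"{indent}[{len(value)} item(s), first shown]")
        if depth < max_depth:
            lines.extend(outline(value[0], depth + 1, max_depth))
    return lines
```

As a usage example, calling print("\n".join(outline(json.loads(text)))) on the text of a response file lists its nested keys one per line. Raising max_depth shows deeper levels of the structure.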

OCR responses are cached on the GlobalCapture server. Future versions of Capture will support a more streamlined interface for presenting and working with response output. For the initial release, response output must be collected manually from the CaptureProcessing directory for inspection.

The capture processing folder for a given process will include both the AI-derived JSON response and a base64-encoded file containing the page data for every page inspected in the process. This output can be helpful for support and troubleshooting efforts, in addition to the initial setup of profiles for a given document format.

aiResponse_XXX.json is the AI OCR output, where XXX represents the page of the file.

encodedFilePage_XXX.txt is the base64 encoded page data, again where XXX represents the page.
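For troubleshooting, the two files for each page can be paired up and loaded with a short script. The Python sketch below is one way to do that, under some assumptions: the folder passed in is whatever capture processing folder you have collected the output from, the files are UTF-8 text, and the XXX portion is treated as an opaque page token. The function names collect_pages and load_page are hypothetical.

```python
import base64
import json
import re
from pathlib import Path

# File name patterns taken from the naming convention described above.
# The captured group is the XXX page token, treated here as opaque text.
PATTERNS = {
    "response": re.compile(r"^aiResponse_(.+)\.json$"),
    "page_data": re.compile(r"^encodedFilePage_(.+)\.txt$"),
}


def collect_pages(folder):
    """Group the response and page data files in a folder by page token."""
    pages = {}
    for path in Path(folder).iterdir():
        for kind, pattern in PATTERNS.items():
            match = pattern.match(path.name)
            if match:
                pages.setdefault(match.group(1), {})[kind] = path
    return pages


def load_page(files):
    """Return (parsed JSON response, decoded page bytes) for one page.

    Either value is None when the corresponding file is missing.
    Assumes both files are UTF-8 encoded text.
    """
    response = None
    raw = None
    if "response" in files:
        response = json.loads(files["response"].read_text(encoding="utf-8"))
    if "page_data" in files:
        encoded = files["page_data"].read_text(encoding="utf-8")
        # Strip surrounding whitespace before decoding the base64 text.
        raw = base64.b64decode(encoded.strip())
    return response, raw
```

Calling collect_pages on a capture processing folder returns a dictionary keyed by page token, and load_page on one of its values gives the parsed response alongside the raw page bytes, which can then be written to disk and opened for visual comparison.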