Expression Exercise
In this exercise, we will attempt to illustrate just a sampling of some of the powerful options available to capture AI extraction results.
Setup
Start by opening a new browser window and navigating to https://try.jsonata.org. The page is broken down into 3 main sections. The left-side panel contains the JSON being inspected. The right top panel is the JSONata expression. The right bottom panel is the JSON output, based on the expression and JSON provided.
Basic Usage
The default page setup includes some JSON data to get started, and a sample expression as well.
$sum(Account.Order.Product.(Price * Quantity))
You will notice that a basic expression will resemble a spreadsheet formula.
The dollar sign ($) implies a function is being called, in this case, a built in function called sum. As the name implies, the sum function will provide a mathematical sum of the data points within parentheses.
Account.Order.Product is referring to the JSON in the left panel. If you explore that JSON, you will see a tree of data, logically broken down by Account (in this case there is only one), Order (of which there are two), and Product.
By expressing (Price * Quantity) within parentheses, mathematical order of operations will apply and the value returned by the inner expression will be the product of that operation, and this will apply for each item in the set.
By using the sum function against that inner expressions, you wind up adding all of the Price * Quantity operations found, and producing a total amount of 336.36. In the example case, we are operating against the following:
Product | Price | Quantity |
---|---|---|
Bowler Hat | 34.45 | 2 |
Trilby Hat | 21.67 | 1 |
Bowler Hat | 34.45 | 4 |
Cloak | 107.99 | 1 |
Deconstruct expressions to see their behaviors
It can be helpful to deconstruct expressions to better understand how they work.
In the example above, remove “$sum(“ from the start of the expression, and remove the last “)” from the end of it. You will notice the output changes significantly from a single value to an array of values.
We now see expressed output for each product element.
To understand specifically which items are being considered in the mathematical formula, remove that part of the expression. Remove “.(Price * Quantity)” from the expression and observe the output.
It should now be apparent which elements the (Price * Quantity) operation is working against, and further, the values being calculated by the original $sum operation.
Square 9 Sample Setup
The default example provided is great for exploring basic functionality, but it’s not consistent with the JSON format provided by FTE. Let try working with sample output that matches the FTE response syntax.
Clear the expression in the top right panel.
Clear the JSON in the left panel.
To clear the panels, simply select all (Ctrl+A) and hit the delete key.
Browse to the expression exercise sample data, and copy / paste it into the JSONata Exerciser’s JSON input panel (left side panel). You can now write expressions that should help you better understand Square 9’s specific implementation of this technology.
Parsing Headers Versus Tables
Data is returned in two discrete sections of a singular JSON response. Header / Footer data (also known as Key / Value pairs) is returned in an element aptly name “header_footer”. This section represents and array of identified keys and values on a page. When you compare the data in this section to the source image, you can quickly identify matching keys and values on the source image to better understand what is found and where it exists.
In the JSONata expression panel (top right), type the text “header_footer”. You will immediately notice the JSON output changes, and you now see an array of objects that is reflective of the header_footer section alone.
Tables are more deeply nested due to their complex nature, but targeting the tables section is no different.
In the JSONata expression panel (top right), type the text “tables”. You will notice the JSON output changes, this time showing you only an array of table objects.
Tables are more complex in that a single page may contain multiple tables. Each table is assigned its own index, starting with zero.
You can target a specific table in a number of ways:
Specify the index of the table in square brackets: tables[0]
Your output JSON will present only the first table.Specify the tab_index value of the table: tables[table.tab_index=0]
The tables object in the response is an array of table objects. Each table object has an element for tab_index, that corresponds to it’s numerical ID. The list of tables and associated id’s always starts at zero.Target a specific table object by altering the expression: tables.table[tab_index=0].
With this expression, you again get only one table, but pay careful attention to the JSON output. The way this example is expressed will result in slightly different output. It’s important to understand completely how your expressed output is formed, as it will impact how you write expressions to map any data points.
Find A Specific Header Value
Write an expression that will return the Due Date.
Filter by header/footer values with the expression: header_footer
Further filter the result by expressing only the key Due Date: header_footer[key='Due Date']
Further filter to get the value: header_footer[key='Due Date'].value
To summarize, the complete expressions finds the header_footer section, filters for a key of Due Date, and outputs the value of that key.
It can be useful to “generalize” your key targets with partial matches and case sensitivity. This is handled with regular expressions. If for example you wanted your lookup to be case insensitive, you would provide a case insensitive pattern. Regex is expressed as key ~> pattern.
header_footer[key~>/due date/i].value would match “Due Date”, “due date”, or “dUe dAtE”.