JSON, JavaScript Object Notation

Tech2Tech

Hands On

Monetize the Internet of Things

JSON turns a flood of data into business actions and results.

The world around us is humming with data. No longer are everyday devices isolated islands quietly doing their tasks to make our personal and business lives easier, safer and more efficient. All kinds of devices are now interconnected in the Internet of Things, communicating status, environmental facts, and all sorts of transaction and interaction data. The tremendous volume is overwhelming clutter unless it can be easily processed. By combining this data with other sources in an integrated data warehouse (IDW), it can be successfully stored, organized and analyzed to enrich the business with new revenue streams, valuable services and remarkable operational efficiencies.

JSON has become a de facto standard format for exchanging machine-generated data. JSON, or JavaScript Object Notation, is an open-standard format for human-readable text and often consists of key-value pairs in a hierarchical structure. The Teradata® Database can either shred the data for storage in individual structured columns or natively store it in its complete and original JSON structure, allowing users to process and query the data with
sophisticated SQL.

JSON at Work

Companies can receive supplies orders for a device they manufacture from several channels: the Web, a self-service kiosk in a store or directly from the device in the form of an automatically generated order to refill a supply, like printer toner, that is running low. The device, as well as the other channels, can send all of the information related to the toner order in a JSON file that looks like this:

{
	"customer" : "CustomerName",
	"customer_number" : 375,
	"order_date" : "2014/03/12",
	"items" : 
		[
			{"ID" : 123, "name" : "black toner", "amt" : 1},
			{"ID" : 234, "name" : "blue toner", "amt" : 1}
		]
}

The data can be kept in the JSON format in the IDW by creating an order table with a JSON data type column. Each row contains the information pertinent to one order, but all of the elements are in the JSON structure within the single column rather than in a separate column for each element of the order. The order table can be created with
this statement:

CREATE TABLE orders (order_
num INTEGER, order_data
JSON(250));

Once loaded, the incoming order is stored in a single row in the table. All data about the order is in the order_data column of the single row. This easily accommodates the array of line items included in the order as described in a standard JSON file.

Using the data is simple with an easily read JSONPath extension to Teradata SQL that mirrors how JSON is used in its native JavaScript environment. A “dot notation" identifies the hierarchy and keys in the JSON structure, much like table.column name identifies columns in a table. Several methods on the JSON data type and regular expressions in the JSONPath syntax make powerful queries easy to express. For example, the items ordered by customer 375 within the past 30 days can be found with the following query working directly on the JSON data in the orders table:

SELECT order_data.customer_
number, order_data.
JSONExtractValue('$.
items[*].ID')
FROM orders WHERE order_data.customer_
number = '375' AND
order_data.order_date >
(CURRENT_DATE-30);

Analytical and Data Flexibility

Storing data in its original JSON structure provides advantages inherent in using JSON as a data interchange format. Since it uses key-value pairs and is completely self-describing, it offers the flexibility of “late binding” or schema-less processing. This flexibility is maintained in the integration of JSON multi-structured data into the Teradata Database.

Expect Performance

Many of the standard Teradata® Database performance techniques are just as valid when using JSON data as using other data. For instance, collecting statistics on portions of the JSON data, similar to collecting statistics on a single column within a table, enables effective query planning by the optimizer.

Join indexes (JI) can also be defined with portions of the data in the JSON column, and statistics can be collected on the JI. The JI is used like any other join index. The optimizer will decide whether it is more efficient to access the data in the JI or to scan the JSON text in the base table.

Another physical database design technique is to “shred” a limited amount of the data from the JSON column and duplicate it in individual structured columns. This allows the database to directly access those data elements. For example, if queries on an order often qualify to access data based on customer_number, the user can make a customer_number column of type INTEGER and use it as part of the WHERE clause, or even include it in the table’s primary index.

Shredding and publishing commands are available to extract data elements into columns or to build a JSON file with data from separate structured columns. Using an ELT data acquisition design, JSON data can be loaded into a staging table, then MERGEd INTO the production table while simultaneously shredding key data into structured columns.

With JSON, the data elements and attributes are not defined in the table structure or schema. In the printer toner example, a single 250-character column includes all of the data for an order. The data and structure are defined when a query is run and the database looks at the keys in the key-value pairs within the stored data itself (e.g., order date, items, ID).

Discovering the structure at query time when the database looks at the information is called late binding, schema-less processing or query on read. This enables business flexibility by accommodating dynamic data situations. The information elements included in records can change without a database change. New elements that would require a new column in a traditionally modeled and structured environment could be added simply by inserting a new key-value pair in the next input data loaded into the table. The table itself does not have to change.

Integrating multi-structured JSON data into the Teradata® Database offers new flexibility and enables organizations to monetize the flood of data coming from the Internet of Things and elsewhere.

If new firmware on installed printers or a new printer model was able to include a “required delivery date” in the order information, for example, that data could immediately flow into the IDW with no physical table change. The business user or order processing application developer would only have to reference the new key in a query to understand how urgent an order is and ensure customer satisfaction through on-time deliveries. The data warehouse management, design and testing necessary for the business process change are eliminated. There are even built-in commands to find the keys contained within the data through a query.

This query retrieves the required delivery date along with order information:

SELECT order_data.customer_
number, order_data.
JSONExtractValue('$.
items[*].ID'),
order_data.
required_delivery_date
FROM orders

Rows without the required_delivery_date key-value pair return a NULL for that value. Data in a JSON column can be used along with all of the other information in the database. Additional columns with structured types can be in the same table as the JSON column, or data from the column can be joined with data from other tables. The user has analytical freedom to work with all of the data in the IDW, whether it is semi-structured or structured.

Make the Most of All Data

JSON is a very common data interchange format. It can be stored along with structured data in the Teradata Database and used directly in queries along with the other data in the data warehouse. Integrating multi-structured JSON data into the Teradata Database offers new flexibility and enables organizations to monetize the flood of data coming from the Internet of Things and elsewhere.

Alan Greenspan joined Teradata in 1988 and is the product marketing manager for the Teradata Database and other core software products.

Cameron Lewis has been a Teradata software engineer since 2010. He has extensive experience in Java and JSON.


Your Comment:
  
Your Rating:

Comments
 
This article provides a hands-on example of how to use JSON for a sample use case scenario. Add your JSON use case ideas and plans in a comment so that other readers benefit from a diverse collection of use case ideas along with the code examples and scenario in the article. Alan Greenspan Teradata Product Marketing

5/21/2014 5:13:22 PM
— Anonymous
 
This article provides a hands-on example of how to use JSON for a sample use case scenario. Add your JSON use case ideas and plans in a comment so that other readers benefit from a diverse collection of use case ideas along with the code examples and scenario in the article. Alan Greenspan Teradata Product Marketing

5/21/2014 4:00:36 PM
— Anonymous
Fuzzy Logix