Applied Solutions

Exceed Your XML Expectations

A new database release offers improved capabilities for the popular data formatting language.

The most popular data formatting language in use today is the eXtensible Markup Language, better known as XML. Primarily used for application integration to format messages being passed between applications, it also serves as a document format for storing data.

In either case, XML adds structure and metadata, making the data more robust and self-descriptive. Its popularity has resulted in an abundance of XML-formatted data that needs to be stored and analyzed in databases. In response, Teradata® Database 14.10 supports XML processing in several areas. (See figure.)
The capabilities include:

Native Storage

The latest release of the Teradata Database provides a data type named XML that can be used for storing XML values. These values are housed in the database in a compact binary representation that is smaller in size than the original string representation, resulting in significant space savings. XML values can be made more compact through algorithmic compression (ALC). Common processing operations such as validation, query and transformation are all supported on XML data type instances.

From a string representation of an XML document, users can construct an XML type instance using the new XML constructor or the CREATEXML function. Tables can be created with up to 15 XML type columns, which can hold documents that are each up to 2GB. Several methods are available that allow for validation, transformation and query using XML Query Language (XQuery).

XQuery for Access

XQuery is the standard query language for XML. The implementation provided by the Teradata Database supports XQuery 1.0 with minimal conformance. XQuery is used in multiple ways in the database. The XMLEXTRACT method can evaluate an XQuery expression and return the results as XML data type instances.

The EXISTSNODE method indicates whether an XML value contains a node matching an XQuery expression. The XMLEXTRACT method and the ANSI SQL/XML functions XMLQUERY and XMLTABLE also use XQuery to extract information from XML data type instances.

Parsing, Validation and Transformation

The XML type supports common processing operations, including parsing, validation and transformation. String representations of XML values are parsed to create XML type instances. These values can be validated against an XML schema. Either the method ISSCHEMAVALID or the SQL/XML function XMLVALIDATE can be used for validation. XML values can be transformed by applying XSLT stylesheets using the XSLTTRANSFORM method.

Shred Documents

Shredding is the process of extracting data values from XML documents to populate tables in the database. XML shredding capabilities in Teradata Database 14.10 have the advantage of allowing users to specify declaratively how the values are mapped to the database tables.

Teradata Database 14.10 provides users with a number of options for storing, processing and analyzing XML data.

The benefit of the declarative approach—which lets users indicate what they want to happen—is that it’s easier to implement than the procedural approach that requires users to specify the details of how something should happen, such as by writing code. The declarative approach also offers the flexibility to handle changes, like in XML and
relational schema.

To shred, users first acquire a schema for the documents they want to shred. An XML schema codifies the rules that a set of documents should follow, such as a purchaseorder.xsd describing the rules that control all purchase order XML documents in an enterprise.

Delivering the Goods

Teradata Database 14.10 supports XML in several ways by offering:

  • The XML data type
  • The XQuery query language
  • XML shredding (automated extraction of data from XML documents)
  • XML publishing (formatting query results
    as XML documents)
  • XML schema support for validating documents
  • XSLT support for transforming documents
  • ANSI SQL/XML operator support

Users then annotate this schema with instructions on how the data from document elements and attributes will map to tables in the database. The user documentation in Teradata Database 14.10 contains details of how this mapping is defined, along with examples. Once these annotations are made, the schema, along with the items to be shredded, is provided to the database stored procedures that shred the documents.

Publishing to Create Documents

Publishing is used to generate XML formatted documents from data in the relational database. Users write an SQL query based on the results they want published in XML format. Similar to the process for shredding, users then create mapping—from relational to XML—as an XSLT stylesheet.

In this mapping, users can perform simple operations such as grouping and ordering to rearrange the XML tree to fit their requirements. XML publishing stored procedures are used with the mapping and the original SQL query to produce XML formatted results.

ANSI Operators

The ANSI SQL-2008 specification lists several functions that perform operations such as XML construction and query within the context of SQL queries. The construction functions include XMLELEMENT, XMLPI, XMLCOMMENT and others that create different types of XML nodes.

Functions like XMLCONCAT, XMLFOREST and XMLAGG are used for composing more complex structures. XMLQUERY is a function that can be used to evaluate XQuery queries within an SQL query. XMLTABLE is a table function used to turn an XML tree into rows and columns.

Range of Options and Capabilities

With the XML data type along with related functions and stored procedures, Teradata Database 14.10 provides users with a number of options for storing, processing and analyzing XML data. Users can store XML documents in columns of XML data type to achieve compact storage that preserves their identity.

Alternately, users have the option to shred documents and extract the data values for database tables to enable superior query performance. Implementations of standards such as XQuery, XML schema and XSLT are included in this database release for querying and processing. These features, combined with XML publishing, provide a wide spectrum of capabilities for working with XML data in the Teradata environment. 

Srinivas Pandrangi is the architect of Teradata XML Services. He has more than 14 years of experience implementing industry standards related to XML.

Your Comment:
Your Rating:

Fuzzy Logix