Applied Solutions 1

Planning a route to location intelligence

Geospatial capabilities create new opportunities for enriched analytics.

Traditional location data—customer address, shipping address, sales region, etc.—has long been part of data models. While database platforms have offered powerful business analytics based on these postal locations, much more accurate geographic location data is now easily available with Global Positioning System (GPS) technology.

The ability to capture geospatial data across almost every aspect of business should prompt companies to re-evaluate their analytics and processes to find ways to leverage this capability. With Teradata 13, geospatial is brought to the enterprise data warehouse, and powerful location analytics expands out of specialty applications and becomes available throughout the organization.


Click to enlarge

New opportunities

The Teradata geospatial feature is based on the SQL/MM standard and includes a new geometric data type, along with a library of spatial functions. The ST_GEOMETRY type supports a large spectrum of shapes, from simple points, lines and curves to complex polygons. This allows representation of real-world objects in the database, such as:

  • Locations
  • Streets and railroad tracks
  • Rivers and lakes
  • Neighborhoods, towns and cities

In essence, any shape that supports your particular business, such as a complex sales territory, can be identified by the ST_GEOMETRY type.

Beyond capturing point locations, a new level of analysis can be achieved as third-party geospatial reference data is added alongside your business data. For instance, insurance companies can add flood-zone maps to their data models to help analyze insurance policy risk, and retail companies can integrate demographic data to target customers.

Adding geospatial data

Location information already exists in data models. Generally, tables that capture addresses are candidates for adding the ST_GEOMETRY field. The following are examples of how geospatial data is added to the database using SQL. (Click here for a database script for these examples.)

In the first example, three tables are created to represent stores, customers and competitors. A minimum set of fields captures entity ID and name with location as a geospatial field:

CREATE TABLE customers (
name VARCHAR(50),
location ST_GEOMETRY )

name VARCHAR(50),
location ST_GEOMETRY )

CREATE TABLE competitors (
name VARCHAR(50),
location ST_GEOMETRY )

In the next step, known as geocoding, the street addresses are translated into longitude and latitude coordinates. These coordinates are used to create the SQL insert statements. Geocoding can be done manually with mapping tools or through services from an outside vendor, such as Google Earth:

INSERT INTO Stores (id,name,location)
VALUES (1,’Store #1’,’Point(-82.440 27.997)’);

INSERT INTO Competitors
(id,name,location) VALUES (1,’Competitor #1’,’Point(-81.300 28.577)’);

INSERT INTO Customers (id,location)
VALUES (1,’POINT (-82.218 28.163)’);

Geospatial analytics

With this data loaded in the database, value is added to the existing business intelligence (BI) analysis as new ways to harness the power of location intelligence (LI) are understood. Generally, capitalizing on the benefits of geographic location data is accomplished by starting with the basic extension of existing location-based queries and working up the ladder to answer questions like, “How many customers are within three miles of our stores?” and “What is the average distance from each store to our customers?” The paramount advantage comes with integrating location with business data to answer questions like, “What’s the average purchase amount for customers from a selected area?”

Points and Distatance

The Teradata geospatial library of spatial functions includes three distance methods: ST_Distance, ST_SphericalDistance and ST_SpheroidalDistance. With geolocation coordinates, a much more detailed analysis between customer and store location can be calculated than with only ZIP codes. For example, this query can be run to find customers within a certain distance from the stores:

FROM customers C, stores S
WHERE C.location.ST_Distance
(S.location) < [X units]

Though simplified, analyses using these types of queries are easy to use and understand. However, these queries have some drawbacks. For instance, it can be confusing as to which units of measurement—miles, meters or kilometers—are used for the comparison. When using longitude and latitude coordinates, some conversions from degree units to meters and miles must be done. While this is not overly complicated, it does call for some additional work. Another drawback is that these types of scalar queries require all rows to be calculated. This could result in full table scans—a process database programmers try to avoid. The next section describes a better approach.

Power of the polygon

Geospatial is much more than location points. Lines, curves and especially polygons add tremendous value. In the customer distance example, a better approach than basing the query on addresses is to use area polygons. These polygons can be created dynamically with each query. This rewrite uses the ST_Buffer method to create a polygon around the store and the ST_Within method to find matching customers (ST_Within returns 1 or 0 for true/false):

SELECT  S.store_id,,
FROM customers C, stores S
WHERE C.location.ST_Within
(S.location.ST_Buffer(x))= 1

This type of query can be optimized so that the database is not required to compare all of the records—something that cannot be avoided with the scalar ST_Distance comparisons. In addition, the ST_Geometry polygon can be added to the data model for future queries, thus reducing the need to recalculate the ST_Buffer polygon each time.

Geospatial regions

Creating and adding regions as polygons to the data model enables using geospatial features to empower better analytics. The data can be from derived objects, such as the previous query, or geometries from third-party sources such as government agencies and commercial vendors on flood zones, census information and marketing demographics. These geospatial polygons can then be used to analyze risk exposure and help make better business decisions.

How an insurance company can use the geospatial functions to estimate a storm’s impact on its existing policy holders is shown below. After adding predicted storm paths to the data model as LineString objects, the ST_Buffer method can populate the regions table with a polygon representing the high-impact zone around the storm’s path. (See figure.)

name VARCHAR(100),

CREATE TABLE Storm_Impact_Zones (
storm_id INTEGER,
INSERT INTO Storms VALUES(1,’Hurricane Geo’,
‘LineString(-76.86 27.01,-77.42 27.13,-78.08 27.31,
-78.66 27.44,-79.11 27.59,-79.48 27.72,-79.89 27.86,
-80.27 28.01,-80.73 28.23,-81.03 28.39,-81.45 28.65,
-81.78 28.92,-82.05 29.14,-82.22 29.36)’);

INSERT INTO Storm_Impact_Zones
SELECT 1,, S.path.ST_Buffer(.15)
FROM Storms S WHERE = 1;

The value .15 gives an estimated 10-mile buffer, using degrees. The distance between degrees of longitude varies depending on where the location is on the globe. The distance between longitude degrees at the equator is approximately 69 miles but reduces to zero as the location nears the poles.

The map highlights the power and value of integrating geospatial data with business data in the EDW. Enabling the query ST_Within to include customer location data, as shown below, will help calculate estimated risk exposure. This type of analysis empowers better business decisions by providing true LI.

FROM Customers C, Storm_Impact_Zones Z
WHERE = 1 AND C.location.ST_Within(Z.area) = 1

From this step, calculating insurance exposure and potential losses is a simple query away.

Data model dimensions

Fully leveraging geospatial analytics is reached when data models are extended with new information based on location dimensions. This can be demonstrated with the previous retail example.

The location of the stores and customers, which has already been added to the database, can be extended by creating polygons to represent zones. These zones will be created based on the company’s requirements. For this demonstration, the “sales territory” regions around each store will be used.

With the regions added to the data model, the analyses for trends—number of customers, average purchase data, etc.—within these zones can be run using SQL as shown:

CREATE TABLE sales_territory (
name VARCHAR(100),

INSERT sales_territory(id,name,area)
values(1,’Store #1 Sales Territory’,
‘Polygon((-81.258 27.713,-81.755 28.317,
-82.665 28.418,-82.736 28.151,-82.362 27.797,
-81.258 27.713))’);

INSERT sales_territory(id,name,area)
values(2,’Store #2 Sales Territory’,
‘Polygon((-81.575 29.105,-81.739 28.298,
-81.262 27.719,-80.450 27.719,-80.627 28.341,
-80.582 28.493,-80.967 29.086,-81.575 29.105))’);

INSERT sales_territory(id,name,area)
values(3,’Store #3 Sales Territory’,
‘Polygon((-82.683 29.136,-82.668 28.420,
-81.739 28.324, -81.583 29.112,-82.683 29.136))’)

In addition, these aggregation values can become the basis to populate new summary tables, enhancing the data model well beyond the original location points. This derived data, calculated in-database, can uncover valuable trend information, which can then be used by BI tools for deeper analysis.

The following SQL represents the use of in-database analytics to create new aggregate data. Reporting can now be done that integrates these geospatial sales territory regions with customer YTD_sales data to create calculated data per region. Then, if summary tables are created within the database to store these results, they would be available for other reporting tools to use in an analysis:

CREATE TABLE customer_territory_stats (
territory_id INTEGER NOT NULL,
customer_count INTEGER,
avg_purchase NUMERIC )

INSERT customer_territory_stats
FROM customers C, sales_territory S, customer_sales Sales
WHERE S.area.ST_Contains(C.location) = 1
AND = Sales.cust_id

These numbers show the results:

territory_id  customer_count  avg_purchase
1 795 851
2 1579 781
3 884 756

The dynamics of geospatial

With advances in GPS technology, geospatial location can now become a valuable extension to data models. The examples have demonstrated how simple location-based queries can bring new value to business analytics.

By including third-party geospatial reference data, integrated location analytics can be easily added to the organization’s decision-support toolbox. This can lead to the creation of new dimensions of data that might uncover new opportunities.

The power of the Teradata geospatial functions can change the way business location data is viewed and will add LI to business decisions.

Your Comment:
Your Rating:

Fuzzy Logix