
You need your data, your applications and your service level agreements to conduct a proper benchmark, says Sid Adelman, principal of Sid Adelman and Associates.

Make the Most of Your Benchmark

Avoid these 9 ploys vendors use to tilt the playing field in their favor.

When choosing a data warehouse appliance, organizations have several options, and they can’t afford to make a bad decision. One effective way to test-drive the options in an actual environment is through benchmarking.

By using real-world data, applications and service level agreements (SLAs), benchmarks provide assurances that the system will scale to your intended volumes of data, complexity of workload and number of concurrent users. The purpose is to identify the best choice and to reduce the risk associated with it.

Benchmarking Right

The right way to execute a benchmark is with your data, your applications and your SLAs.

Benchmarks can serve as an insurance policy to help guarantee that the system will:

  • Perform as expected
  • Handle the volumes of data for your target environment
  • Support your specific workload and concurrent use requirements

—S.A.

It is, therefore, critical to establish fair and objective requirements and measurements that do not favor one vendor over another. To yield comparable and defensible results from the benchmark, vendors should play by the same rules. Each should work with the same machine power, lead time, implementation time and level of data quality.

On Guard

However, vendors may attempt to change, skirt, interpret or ignore the rules to skew the results in their favor. Look out for these ploys when a vendor suggests:

“Don’t waste your time; we’ll take care of the benchmark and let you know the results.” You should be involved in all phases, including the design of the databases, the initial database loading, the monitoring, administrative activities and any tuning that may be required. You want to see how much effort is involved in these processes and how much your people will need to know to make the system work. Some might suggest that you don’t want to see what’s going on in the back room, but in this case, you do.

“You don’t need to run with your projected volumes of data. You’ll be able to extrapolate from the 10% we are suggesting.” Some performance measures scale linearly with data volume, but many do not, so don’t assume linear performance. One of your primary reasons for considering an appliance is its speed in accessing large amounts of data. Run the benchmark with your entire projected data volume.
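
To see why a 10% sample can mislead, consider a minimal Python sketch. It assumes, purely for illustration, a query whose cost grows as n log n (as a large sort might) and a hypothetical 60-second timing on the sample. The row counts, the timing and the cost model are all assumptions, not measurements from any product.

```python
import math

def linear_extrapolation(sample_seconds, sample_rows, full_rows):
    """Scale a sample timing to full volume, assuming cost grows linearly with rows."""
    return sample_seconds * full_rows / sample_rows

def nlogn_projection(sample_seconds, sample_rows, full_rows):
    """Scale a sample timing assuming cost grows as n * log(n) (an assumed model)."""
    sample_cost = sample_rows * math.log2(sample_rows)
    full_cost = full_rows * math.log2(full_rows)
    return sample_seconds * full_cost / sample_cost

# Hypothetical figures: a 1-billion-row target and a 10% sample that ran in 60 seconds.
full_rows = 1_000_000_000
sample_rows = full_rows // 10
sample_seconds = 60.0

linear = linear_extrapolation(sample_seconds, sample_rows, full_rows)
nlogn = nlogn_projection(sample_seconds, sample_rows, full_rows)
print(f"linear guess: {linear:.0f} s, n log n guess: {nlogn:.0f} s")
```

Even under this mild assumption the linear guess comes in low, and real systems can diverge far more sharply. Only a run at full volume settles the question.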

“We will tune the system to give you the best performance.” A database that’s tuned to perform well for known queries may be tuned terribly for others. Each vendor should fully disclose its tuning efforts, including its use of indexes, partitioning, caching, data ordering, query rewrites, summary tables and workload management.

“The important part of the extract, transform and load (ETL) process is the load, so that’s what we will be measuring.” Appliance performance is not just load time. The full set of ETL metrics should be included. The extract, sorting, splitting and pre-processing, as well as the index builds and summary table creation, will be a major portion of the ETL elapsed time. Be sure that these are all included.
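
One way to keep every ETL step on the scorecard is to time each phase separately and report the total. The Python sketch below is a minimal illustration under stated assumptions: the phase names follow the list above, and the `pass` statements are placeholders for whatever real work each step performs in your environment.

```python
import time
from contextlib import contextmanager

timings = {}  # elapsed seconds per ETL phase

@contextmanager
def phase(name):
    """Record the wall-clock time spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Placeholder phases; replace each `pass` with the real step being benchmarked.
PHASES = ["extract", "sort", "split", "pre-process", "load",
          "index build", "summary tables"]
for name in PHASES:
    with phase(name):
        pass

total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name:15s} {seconds:10.6f} s")
print(f"{'total':15s} {total:10.6f} s")
```

Reporting the phases side by side makes it obvious when a vendor's headline number covers only the load.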

“You don’t need to include all of your data. We can generate data that will reflect your real environment.” It is likely that your real data has some interesting and challenging outliers. These skews can cause performance problems for some systems. Use as much of your real production data as possible.

“We will take care of the hardware configuration.” Be sure that the configuration running the benchmark is the same as the one you are considering purchasing. If you’re going through the effort of running the tests, run them on the configuration you are considering.

Play Fair with the Vendors

Another practice that can undermine the benchmark process is often suggested by customers, rather than vendors: “We won’t give you any time to prepare.” In this type of benchmark, sometimes called a “black bag” benchmark, customers show up with their tapes of data and expect the vendor to load data, run queries and demonstrate performance. Vendors receive no lead time to deal with bugs, data-quality problems, bad SQL or issues involving tool integration. This is a useless exercise in which the customer learns nothing.

—S.A.

“Let’s run with a few of your queries. We don’t need to run them all.” Even if your primary plan for the appliance is full-table scans, there will be other activity, so be sure to include those other queries in the benchmark. Include as much of your projected workload as possible.

“The most effective way to measure query performance is to run a single query at a time.” This is not true. Systems behave very differently as concurrent workload increases. The mechanisms for handling concurrent workloads also vary considerably from system to system, with significant implications for the variability and predictability of end-user response time. The benchmark should measure “queries in flight,” not queries that are queued, waiting their turn.
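
A concurrency test can be sketched in a few lines of Python. This is an illustration only: `run_query` is a hypothetical stand-in that merely sleeps, and the concurrency levels are arbitrary. In a real benchmark it would submit your SQL to the system under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    """Hypothetical stand-in for submitting a query to the system under test."""
    time.sleep(0.01)  # replace with a real database call
    return sql

def timed(sql):
    """Return the seconds one query spends in flight, from dispatch to completion."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start

def measure(queries, concurrency):
    """Run the queries with at most `concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, queries))

queries = [f"SELECT {i}" for i in range(8)]  # placeholder workload
results = {level: measure(queries, level) for level in (1, 4, 8)}

for level, times in results.items():
    print(f"{level} in flight: slowest query {max(times):.3f} s")
```

Comparing the per-query times across levels shows how response time holds up as the number of queries in flight grows. With the sleeping stand-in the numbers stay flat; against a real system they generally will not.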

“We will provide you with throughput numbers to measure our results.” Throughput is the amount of work completed in a given period of time, for example, the number of queries per hour. Throughput is an effective measure for batch systems but not for systems with end users. It does not reflect response time, because it does not account for the time that submitted queries spend waiting in queues.
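
The gap between the two measures is easy to show with a small Python sketch. The figures are invented for illustration: four queries submitted together to a hypothetical system that executes them one at a time, 30 seconds each.

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    submitted: float  # seconds into the test when the user submitted the query
    started: float    # when the system began executing it
    finished: float   # when results came back

def throughput_per_hour(records):
    """Queries completed per hour over the whole test window."""
    window = max(r.finished for r in records) - min(r.submitted for r in records)
    return len(records) * 3600 / window

def response_times(records):
    """What each user experiences: queue wait plus execution time."""
    return [r.finished - r.submitted for r in records]

def execution_times(records):
    """Time spent actually running, ignoring the wait in the queue."""
    return [r.finished - r.started for r in records]

# Illustrative numbers only: all four queries are submitted at time 0.
records = [QueryRecord(0, 30 * i, 30 * (i + 1)) for i in range(4)]

print(throughput_per_hour(records))  # 120.0 queries per hour
print(execution_times(records))      # [30, 30, 30, 30]
print(response_times(records))       # [30, 60, 90, 120]
```

The throughput figure and the execution times look healthy, yet the last user waits four times as long as the first. Ask for response times measured from the moment of submission.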

Just One Aspect

Even a well-designed, well-executed and well-measured benchmark should be only one part of the evaluation process. Be sure to assess benchmark results in the context of in-depth reference investigations and other indicators of a vendor’s capabilities and track record.

