R you ready?

An open-source programming language empowers non-technical users.

If the mysteries of cancer are solved in our lifetime, we might have R to thank. At research centers and bioscience companies worldwide, the open-source programming language R is helping researchers and statisticians who are not computer programmers perform innovative statistical analysis and data visualization. R also sits at the heart of a growing number of applications in industries ranging from finance to engineering.

For those without deep programming skills, R packages offer a quick path to sophisticated analysis. By downloading and installing special-interest packages written by university researchers, users can gain the benefits of advanced applications without needing to understand how they were written or how they work. Users skilled in coding can easily modify these packages to perform specialized tasks.

R’s graphics capabilities are also attractive to users. “It’s easy to create sophisticated graphics in R that would require extensive programming efforts in any other language,” says Rob Kabacoff, vice president of research at Management Research Group in Portland, Maine, who uses and writes about R. “With R, you can create graphics exactly the way you want them and present the data visually.”

Compared with commercial statistics software, the open-source language is also budget-friendly. As part of the Free Software Foundation’s GNU Project, it can be downloaded at no cost. “I often hear from users in Third World countries who couldn’t afford to buy commercial statistics software,” says John Verzani, a professor of mathematics at the College of Staten Island in New York and author of the book “Using R for Introductory Statistics.” He adds, “R is a free download, but it has a wide range of packages available to meet specific analysis needs.”

Academia and beyond

R has long been popular in academic settings. Developed in the early 1990s by statistics professors Ross Ihaka and Robert Gentleman, then of the University of Auckland in New Zealand, R was designed to give students an easy-to-use but comprehensive environment for statistics and graphics. Many university statistics departments and other academic organizations still use it, and it runs on Windows, Mac and UNIX platforms.

But thanks to its affordability, ease of use and data-visualization capabilities, the language has also gained a foothold in finance, biosciences, engineering and commercial enterprises. Data analysts, statisticians, engineers and scientists in both university and corporate settings use R to solve complex problems. For example, Gentleman is using R-based software for computational biology applications at the Fred Hutchinson Cancer Research Center in Seattle. R helps him and other researchers analyze large amounts of data from biological experiments and generate new insights.

Tracking the deployment of open-source software is nearly impossible, but proponents estimate that R has more than 250,000 users. With so many enthusiastic users and a committed community of developers, the range of R packages continues to grow. The Comprehensive R Archive Network (CRAN) offers nearly 1,800 contributed packages.

“R is always under active development,” says Kabacoff. “It’s probably the most comprehensive statistical platform that exists right now. Because there are so many people contributing so many packages for specialized data analysis, it’s always very cutting-edge.”


Analytic applications

The increasingly sophisticated uses for R include analysis within data warehousing and business intelligence (BI) environments. For example, one well-known pharmaceutical firm uses the language to develop new drugs more quickly, while a major media company uses it to track trends in advertising pricing.

Yet R is rarely used as a data warehousing platform in its own right; instead, it more often supports focused analysis of data held in a database management system. “R handles everything in memory, so it can be difficult to use it for expansive volumes of data or large data structures,” says Verzani. “R does have solutions to this problem, but S+, R’s commercial cousin, is better designed to handle large data volumes from the ground up.”

Fortunately, R offers numerous interfaces to Microsoft Excel, SAS and SPSS products, and SQL databases, giving users the best of both worlds. “Typically, I see people running their data warehouses in other packages and then using R to perform high-quality, focused analysis,” says Kabacoff.

R is also useful for sandbox applications, in which users analyze limited quantities of detailed data. “R is very big in the analysis of genome data, for example,” says Kabacoff. “It has very sophisticated routines for cluster analysis and tremendous resources for data segmentation. If you’re looking to analyze qualitative data, R is very good at that, too.”

Future of R

As university students graduate and enter the workforce, some are taking R with them. Although most companies prefer commercially developed and supported software for routine data warehousing and BI tasks, R is making inroads at select organizations. Because it is easy to add to existing IT environments, statisticians and analysts use it for specialized, analytics-intensive projects.

Where the language falls short, programmers come to the rescue. For example, because R lacks a state-of-the-art graphical user interface (GUI), developers are connecting GUIs to the language’s analytic capabilities. “I’m seeing more attractive, easy-to-use front ends being built for packages that use R as the processing back end,” explains Kabacoff. “For example, the data mining tool kit Rattle [R Analytical Tool To Learn Easily] was developed in Gnome as a front end to R.”

With plenty of prolific and enthusiastic developers, the number of packages for R is expected to grow tremendously. Statisticians and analysts using these packages will find innovative ways to use data to answer their research and business questions. And as organizations become more willing to rely on open-source software for mission-critical tasks, R is poised to become an essential tool for analyzing our complex world.

