September 18, 2013

Teradata Gets In Deep with R

Alex Woodie

Because the demand for data scientists has far outstripped supply, software vendors have stepped up to fill the gap. Teradata did its part when it announced today that it has parallelized the full body of R statistical functions from partner Revolution Analytics and made them available in the new 14.10 release of its eponymous database.

Existing Teradata customers are the big winners with the new R package, which will become available later this year as a built-in part of the Teradata data warehouse. Customers will not only be able to apply the full library of Revolution Analytics’ R statistical algorithms (specifically the HPA package) against data already stored in their database, but they can also be assured that it’s done quickly and accurately, according to Imad Birouty, a program marketing manager at Teradata.

“R, as fantastic as it is, has some shortcomings. In itself, it’s not a parallel language. It’s limited to running on a single server,” Birouty says. “We went ahead and did this extra work [to parallelize R] because we’re focused on this stuff. You’ll hear about other database vendors talk about running R in-database. They’ll talk about parallelism. They’re not doing that. They’re doing node-level. We’re doing true parallelism.”

The difference between running R algorithms at the node level and running them at the cluster level can mean the difference between being right and being wrong.

“Let’s say we have data spread across four servers, and that you want to figure out what’s the median of house prices in Arizona, for example,” he says. “If you run the median on each server separately, you’re going to get a median for the data on that server. If you bring the medians back from all four servers and now you try to take the median of the medians–that’s bad math. It’s not going to work. You’re going to get a wrong result.

“Whereas with this system parallelism,” Birouty continues, “you can ask the same question, it’ll run out there, look at all the data across all the servers, and bring it back and give you one answer that’s accurate across the entire data set.”
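Birouty’s point can be checked with a few made-up numbers. The Python sketch below uses hypothetical house prices (in thousands) spread unevenly across four “servers” and compares the node-level median of medians with the true median over all rows. The third function shows one possible way to get an exact answer without shipping raw rows: each server sends a value-to-count summary that a coordinator merges. That merge strategy is an illustrative assumption, not a description of how Teradata or Revolution Analytics actually implement it.

```python
from collections import Counter
from statistics import median

# Hypothetical house prices (in thousands), split unevenly across four servers.
# The numbers are invented purely for illustration.
partitions = [
    [100, 110, 120],
    [130, 140, 150, 900, 950, 1000],
    [160],
    [170, 180],
]

def median_of_medians(parts):
    """Node-level approach: a median per server, then a median of those results."""
    return median(median(p) for p in parts)

def global_median(parts):
    """Reference answer: the median over every row on every server."""
    return median(x for p in parts for x in p)

def distributed_median(parts):
    """Exact median from per-server summaries (an assumed approach).

    Each server reports value -> count; the coordinator merges the counts and
    walks the values in sorted order to find the middle position(s).
    """
    merged = Counter()
    for p in parts:
        merged.update(Counter(p))
    n = sum(merged.values())
    lo_idx, hi_idx = (n - 1) // 2, n // 2  # equal when n is odd
    seen = 0
    lo_val = hi_val = None
    for value in sorted(merged):
        count = merged[value]
        if lo_val is None and lo_idx < seen + count:
            lo_val = value
        if hi_idx < seen + count:
            hi_val = value
            break
        seen += count
    return (lo_val + hi_val) / 2

print(median_of_medians(partitions))   # 167.5 (wrong)
print(global_median(partitions))       # 155.0
print(distributed_median(partitions))  # 155.0
```

The per-server medians here are 110, 525, 160 and 175, so their median (167.5) lands well away from the true value of 155. The skew comes from the uneven partition sizes and the outliers on the second server, which is exactly the kind of distribution a node-level shortcut cannot see.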

Teradata is considering bringing the R package to bear on its Aster Hadoop appliance, Birouty says. That would give Teradata a one-two combination that enables customers to explore their data with R on Hadoop, and then put any statistical deliverables into production on the Teradata Database proper. It would also give the company a capability similar to what Revolution Analytics unveiled earlier this summer, when it announced that it parallelized its Revolution R Enterprise 7.0 R package to run in Cloudera’s Hadoop distribution.

Teradata 14.10–the first major release of the analytical database since version 14.0 shipped in early 2012–brings several other major new capabilities, including 615 analytical functions from Fuzzy Logix built into the Teradata Database; support for new XML data types; and support for temporal and geospatial data.

As part of the deal with Fuzzy Logix, the full breadth of its analytical and statistical functions can run, in parallel, on both the Teradata relational data store and the Aster Hadoop file system. The Fuzzy Logix routines are available as an add-on package and are accessible through standard SQL, Birouty says.

“Imagine doing things like a moving average, a median, mode, very advanced hypothesis testing, financial functions, and time series analysis. All of those are built-in functions that now run deep within our database,” he says. “Together with what Fuzzy brings us, and what we already had, we’re at over 1,000 database functions.”

The new XML functions in 14.10 will be particularly beneficial to Teradata customers in healthcare and financial services, which have standardized on XML for data exchange.

The database has supported the capability to “shred” XML documents for some time. But now, entire XML documents can be stored in a column in the relational data store and queried with XQuery. “We’re making it that much easier for them to store, publish, or hold” XML documents, Birouty says.
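To make the distinction concrete, “shredding” means breaking an XML document apart into relational rows and columns, whereas the new approach keeps the whole document in a single column. The Python sketch below shreds a small, entirely hypothetical healthcare claim document into one row per claim line using the standard library’s `xml.etree.ElementTree`. The element names and layout are assumptions made for illustration and do not reflect any Teradata schema or function.

```python
import xml.etree.ElementTree as ET

# A hypothetical claim document; the structure is invented for illustration.
claim_xml = """<claim id="C-1001">
  <patient>P-77</patient>
  <line code="99213" amount="125.00"/>
  <line code="85025" amount="32.50"/>
</claim>"""

def shred_claim(doc):
    """Flatten one claim document into (claim_id, patient, code, amount) rows."""
    root = ET.fromstring(doc)
    claim_id = root.get("id")
    patient = root.findtext("patient")
    return [
        (claim_id, patient, line.get("code"), float(line.get("amount")))
        for line in root.findall("line")
    ]

for row in shred_claim(claim_xml):
    print(row)
# ('C-1001', 'P-77', '99213', 125.0)
# ('C-1001', 'P-77', '85025', 32.5)
```

Shredding repeats the claim-level fields on every row and discards anything the target columns do not capture, which is one reason keeping the original document intact and querying it with XQuery can be attractive.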

The new temporal functions will make the Teradata database more time-aware than it previously was. They give developers the capability to build applications that can format data by segments of time without having to write pages of SQL, which is what it would previously have taken. “It’s like the database can go through a time warp and go back in time to say ‘What did things look like on January 1, 2010?’ That’s very hard and very few databases have this capability, and fewer are doing it the right way,” Birouty says.
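The “time warp” Birouty describes is usually built on versioned rows: each row carries the period during which it was valid, and an as-of query keeps only the versions whose period contains the requested date. The Python sketch below models that idea in application code with half-open `[valid_from, valid_to)` intervals. The `VersionedRow` type, the `as_of` helper, and the customer data are hypothetical; Teradata performs this inside the database in SQL, and none of its actual temporal syntax is shown here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class VersionedRow:
    key: str
    value: str
    valid_from: date  # inclusive start of the validity period
    valid_to: date    # exclusive end; date.max means "still current"

def as_of(rows, when):
    """Return each key's version that was valid on `when`."""
    return {r.key: r.value for r in rows if r.valid_from <= when < r.valid_to}

# Hypothetical history: customer 42 moved from Phoenix to Tucson in mid-2011.
history = [
    VersionedRow("cust-42", "Phoenix", date(2008, 3, 1), date(2011, 6, 15)),
    VersionedRow("cust-42", "Tucson", date(2011, 6, 15), date.max),
    VersionedRow("cust-7", "Mesa", date(2009, 1, 1), date.max),
]

print(as_of(history, date(2010, 1, 1)))  # {'cust-42': 'Phoenix', 'cust-7': 'Mesa'}
print(as_of(history, date(2012, 1, 1)))  # {'cust-42': 'Tucson', 'cust-7': 'Mesa'}
```

Using half-open intervals means the change date (June 15, 2011) belongs to exactly one version, so an as-of query never returns two values for the same key.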

Lastly, the new geospatial indexing capabilities will allow Teradata customers to add location data as another dimension in their databases. This will be particularly useful to customers in the utility and insurance industries, Teradata says. For example, utility companies can use this function to identify and respond to outages by tracking the location of available employees, parts, and equipment. Similarly, property and casualty insurers can use it to develop better storm damage projections.
