ParAccel Lifts Hood on CARFAX Overhaul
ParAccel is one of only a handful of analytical platform and database vendors that haven’t been snatched up in the big data analytics free-for-all that prompted big purchases from HP, IBM, Oracle and others.
Today the company announced a new Hadoop connector, along with a fresh crop of user stories that include a web services company, Evernote, and a healthcare information company, Alliance Health Networks, which chose ParAccel in place of Hadoop to handle the large, diverse data it uses to connect patients who share similar conditions.
In advance of the announcement, we had an extended conversation with the company’s Chief Operations Officer, Paul Zolfaghari. While the connector story is noteworthy for the company’s overall strategy, we wanted to back up and better understand how companies contending with large, complex datasets are hitting a performance wall with traditional database and appliance offerings from the titans, as well as the more general role of high performance hardware for analytical database customers.
To sharpen the lens on ParAccel’s approach, we can look at their recent overhaul at CARFAX. The vehicle history giant houses over 10 billion records, the result of data that’s been snapped up from over 34,000 data sources, including all U.S. and Canadian vehicle agencies, most auto auctions, police and fire departments, collision repair facilities, and rental agencies. This granular vehicle history data is then turned around quickly to suit consumers and dealers, a task that requires a heavy-duty database and some high-performing hardware if customers are to get their data in a reasonable amount of time.
The problem, however, is that with those roughly 10 billion records, CARFAX hit a performance and scalability wall. The company wasn’t delivering on its SLAs, says Zolfaghari, which meant a serious overhaul of its approach to handling and processing data. CARFAX had to look outside its legacy Oracle databases to keep up with increasing data volumes, data complexity, and demands on the overall speed of data delivery.
As with any large-scale analytics installation, this wasn’t a simple matter of finding one problem and plugging in a new component. Even if it were that simple, there were some stiff requirements that wouldn’t have lent themselves to the use of an appliance or some other database solutions.
First, CARFAX told the several companies scrambling for the contract (Vertica was among them; Zolfaghari was mum on the others) that it needed at least a 10x performance gain to meet its SLAs. Further, it stipulated that whatever it chose had to run on commodity hardware so it wouldn’t hit another scaling wall. It also demanded ecosystem friendliness: the solution would have to operate with ease inside the existing environment (an Informatica and Cognos blend).
Zolfaghari says that the challenges CARFAX faced when it realized it wasn’t able to scale with the growth of its data mirror the challenges that others with strained, legacy operations are trying to address. He says that all the traditional database and management systems that handle mixed workloads are designed for transactional, OLTP-type activity and were never built with analytics in mind.
CARFAX vetted several companies, including Vertica, before settling on ParAccel. Zolfaghari says that on the simplest operations, CARFAX got a greater than 10x performance boost, but on the tough, complex problems (where it really needed something that went beyond a traditional database) it got a 240x performance increase, meaning that some of the serious crunching that used to take days finished in hours.
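To put those factors in perspective, here is a minimal back-of-the-envelope sketch. The article reports only the speedup factors, not actual runtimes, so the ten-day starting point and the function name `runtime_after_speedup` are assumptions chosen purely for illustration.

```python
def runtime_after_speedup(hours_before: float, speedup: float) -> float:
    """Return the new runtime in hours, given the old runtime and a speedup factor."""
    return hours_before / speedup


# Hypothetical job that used to take ten days (not a figure from CARFAX).
ten_days_in_hours = 10 * 24

print(runtime_after_speedup(ten_days_in_hours, 240))  # 1.0 hour at 240x
print(runtime_after_speedup(ten_days_in_hours, 10))   # 24.0 hours at 10x
```

Under that assumption, a 240x gain turns a multi-day job into roughly an hour, while the 10x floor CARFAX asked for would still leave it running for a full day.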
Zolfaghari says this shows how having a “large-scale legacy relationship as with the IBMs and Oracles of the world means that as your data volumes and the complexity of the questions you’re asking of it grow, the legacy technologies stop stretching.” He says that this is why his company’s platform, which is “massively parallel, columnar and hardware agnostic,” is finding converts.
With so many similar analytical platforms finding a home in the open arms of industry titans (as was the case with Vertica, for instance), one has to wonder how ParAccel has maintained its independence. Zolfaghari says that their lack of acquisition power means that they are a standalone “best of breed” company. He told us that “if you look across the enterprise software industry, there are always a few standalones that represent the best, the companies offering something that really matters.” In this case, he says what “matters” is their integration and openness. He says ParAccel’s ability to work with “all the BI tools, all the ETL tools and all the hardware backends” is what gives them their zing.
His contention is that while the Oracles of the world and appliance vendors are trending toward Exadata-like platforms, they are overlooking something important that he says customers are clamoring for: the ability to choose their own hardware and software, and to run on commodity systems that can scale across the board as data and business grow.
The CARFAX example does a little more than highlight the general flaw of legacy systems as data complexity and size grow unchecked, says Zolfaghari. He says that on a more granular level, it shows how some ultra-complex algorithms, such as those used in fraud detection, respond to columnar, MPP approaches.
Really, what CARFAX is doing on some levels is big data fraud detective work. Its algorithms crawl through the 10 billion records to determine whether a vehicle’s history is accurate, a core part of the “certification” service that’s central to its business model. This requires combining two large tables, one of which is on the order of 70 million records and another that’s in the 220 million record range and steadily growing.
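The article does not describe CARFAX’s algorithms or ParAccel’s execution engine, so the following is only a toy sketch of what “combining two large tables” to check a vehicle’s history might look like. It uses a plain in-memory hash join; the table names, the field names (`vin`, `title_odometer`, `service_odometer`) and the odometer rule are all hypothetical and are not drawn from either company.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Join two lists of dicts on `key`: index the smaller table, then probe with the larger."""
    index = defaultdict(list)
    for row in small:
        index[row[key]].append(row)
    for row in large:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Hypothetical toy data standing in for the two large tables.
titles = [
    {"vin": "VIN001", "title_odometer": 50000},
    {"vin": "VIN002", "title_odometer": 80000},
]
service_events = [
    {"vin": "VIN001", "service_odometer": 52000},
    {"vin": "VIN002", "service_odometer": 61000},
    {"vin": "VIN003", "service_odometer": 10000},
]


def suspicious_vins(titles, service_events):
    """Flag vehicles whose service reading is below the title reading.

    Assumes, for illustration only, that every service event postdates the title record.
    """
    return sorted({
        row["vin"]
        for row in hash_join(titles, service_events, "vin")
        if row["service_odometer"] < row["title_odometer"]
    })


print(suspicious_vins(titles, service_events))  # ['VIN002']
```

On a single machine this approach is bounded by the memory needed to index the smaller table. An MPP system of the kind Zolfaghari describes would instead spread both tables across many nodes and join the pieces in parallel, which is where the scaling argument comes in.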
On the hardware side, Zolfaghari says that when Dell or others come out with a new higher-core, high-memory server, users want the flexibility to tap those benefits right away. This isn’t possible with a proprietary hardware/software package.
The venture-backed company, which was founded in 2005, seems to have a clear story for large-scale analytics customers, but only time will tell how that message plays out against the din from the titans and the appeal of open source frameworks.