This Week’s Big Data Big Ten
Welcome to this week’s Big Data Big Ten, which is wrapping up what has seemed like the quickest five days in recent memory–perhaps a sign of the impending summer.
Here at the week ending May 11th, we look back at some of the biggest stories. Looking ahead, we’ll be on site in San Jose this coming week for the GPU Technology Conference.
There are a number of interesting stories that will certainly emerge on the big data, visualization and analytics fronts, and we’ll bring them to you in next Friday’s Datanami Weekly Update.
Let’s move ahead with the top stories of the week, starting with the big bold Hadoop future IDC sees ahead…
Analysts Map the Hadoop Ecosystem
This week IDC announced that it had delved into the Hadoop ecosystem to dig for trends and insights. Its final estimate is that revenues for the worldwide Hadoop and MapReduce software market were $77 million in 2011 and will grow to $812.8 million in 2016, a compound annual growth rate of 60.2%.
According to IDC, much of the activity is due to the ability to make use of new sources of data, including the social web and applications that are better at collecting and vetting massive datasets that can be tapped in unique ways.
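For readers who want to check the arithmetic, the growth rate can be reproduced from the two revenue figures. The following is a minimal sketch, not IDC's methodology: the function name cagr is ours, and it assumes five compounding years between 2011 and 2016.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50%)."""
    return (end_value / start_value) ** (1 / years) - 1


# IDC's figures as quoted above, in millions of US dollars.
revenue_2011 = 77.0
revenue_2016 = 812.8
years = 2016 - 2011  # assumption: five compounding periods

rate = cagr(revenue_2011, revenue_2016, years)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the result rounds to the 60.2% that IDC reports, which means the market would grow a little more than tenfold over the forecast period.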
While the forecast is rosy overall, the research group says that there are a number of challenges ahead. For instance, IDC cites scarcity of tools and qualified staff, a factor expected to inhibit growth during the forecast period.
They also say that on the business side, competition between open source vendors and their closed source counterparts may force lower license fees from the latter group, resulting in somewhat slower software revenue growth than would be the case if open source projects did not represent so large a component of this market space.
“The Hadoop and MapReduce market will likely develop along the lines established by the development of the Linux ecosystem,” added Dan Vesset, vice president, Business Analytics Solutions at IDC. “Over the next decade, much of the revenue will be accrued by hardware, applications, and application development and deployment software vendors – both established IT providers and start-up, which in aggregate have raised more than $300 million in venture capital funding.”
SAS Adds High Performance Text Mining
SAS announced this week that it will add high-performance text mining to its powerful in-memory analytics software in the third quarter of 2012. SAS High-Performance Analytics, appliance-ready software for Teradata and EMC Greenplum, performs complex analytics on big data and the addition of text mining will allow more diverse uses of the platform.
SAS High-Performance Analytics, which began shipping last year, will be enhanced in August with new text-mining support to provide companies with insight gained from unstructured data in emails, social media, call center logs, documents and other information.
Unstructured data, accounting for more than 80 percent of today’s data, can be difficult to analyze, often swamping traditional computer architectures. SAS High-Performance Analytics will provide new capabilities to address even the largest text repositories, known as “big data,” revealing hidden relationships within unstructured data.
“Enterprise data assets have grown exponentially in recent years, and these massive amounts of structured and unstructured data require two things to be valuable for decision making,” said Surya Mukherjee, Senior Analyst at Ovum. “First, the data volume must be abstracted from users so that they can analyze very large data sets without having to worry about the plumbing. And second, these insights must be delivered quickly to decision makers before a business opportunity is lost or a risky situation develops into a catastrophe.”
“High-performance analytics is the most significant SAS technology advance in at least 10 years,” said SAS CEO Jim Goodnight. “We realized that organizations were accumulating massive amounts of data that could provide answers to questions they could never ask before. The analysis took so long to process, answers were irrelevant by the time the computer spit them out. High-performance analytics provides answers when the information is still useful and leaves time to explore multiple possibilities.”
LexisNexis Adopts Own Big Data Platform
This week LexisNexis Risk Solutions, the company behind the HPCC Systems big data platform, announced it would be moving its insurance solutions over to the HPCC Systems arm.
According to LexisNexis, the platform will be used to help insurers assign premium more accurately, better understand their customers and risk throughout the policy lifecycle, and drive a more profitable book of business.
The company is showing that the platform, which was originally built to handle the complex financial services data requirements internally, is capable of being extended to other industries. In other words, by leveraging their own platform they’re giving it the ultimate vote of confidence since many insurers already rely on the parent company, LexisNexis Risk Solutions.
At the core of the announcement is the proprietary auto insurance loss underwriting database package C.L.U.E. Auto. To put the value of the package in context, LexisNexis says that a recent evaluation of insurance applications found that 20 percent said they had no prior claims. However, when LexisNexis ran these “clean” applicants through C.L.U.E. Auto for the first time, it found that 49 percent had prior claims history. The lifetime premium leakage associated with those missed claims totals over $1 million.
“In today’s competitive insurance environment, carriers continue to see increases in unreported claims that can represent millions of premium dollars and places additional risk on their book,” said Bill Madison, SVP and general manager, insurance, LexisNexis.
HPC Leaders Showcase Big Data Appliance
A partnership between STEC, Mellanox, AMAX and Zettaset has produced a new appliance designed to deliver performance boosts for Hadoop users.
The PHAT-DATA40G, as it’s known, is powered by four STEC 980GB PCIe storage accelerators, Mellanox’s 40GbE interconnect solutions, AMAX’s Intel Xeon E5-2600-based servers and Zettaset’s installation, management, and security tools.
Mellanox’s 40GbE solutions provide AMAX’s Intel Xeon E5-2600 based servers and STEC’s 960GB PCIe SSD with leading throughput and latency performance that minimizes response times and delivers the highest of IOPS performance over the lifetime of the device. Powered by Zettaset’s installation and management tools, the PHAT-DATA40G provides customers with what the companies call a “one-stop shop for a performance-leading Hadoop solution.”
According to Jean Shih, president of AMAX, “Data is gold if it’s organized and analyzed correctly, and in highly competitive markets, this translates into reaching more revenue faster and on a larger scale, yet with precise incision. PHAT-DATA40G powers Hadoop with the most powerful and user-friendly engine on the market for this very goal, to give companies a real competitive edge quickly and dynamically using data-driven business intelligence.”
When compared with other Hadoop solutions, Mellanox says that the PHAT-DATA40G delivers a minimum 20 percent performance enhancement, well above existing available solutions. Using Terasort, PHAT-DATA40G is able to sort a 1TB dataset in less than 27 minutes using only 5 machines, which the company says is 20 percent faster than previously possible. They say this could mean users will be able to upload their datasets into PHAT-DATA40G and immediately source business intelligence information from their data, saving time and money on lengthy experimental deployments.
TIBCO Shines New Light on Spotfire
This week TIBCO announced updates to its Spotfire analytics platform with the release of Version 4.5, which emphasizes data discovery, visualization and collaboration for big data industries.
The company is touting a “dimension-free” data exploration approach to structured and unstructured data with the new release, including enhanced predictive analytics capabilities and the ability to create private-branded Spotfire analytics solutions for the iPad.
According to Lars Bauerle, vice president of product strategy at TIBCO Spotfire, the variety of data and the visualization capabilities are part of what make this release notable. As he said in a statement this week, “We’re now allowing organizations to pull back the curtain on every data type, regardless of volume, variety, velocity or complexity.”
“In order to master big data you need to couple analysis with visualization-based data discovery like Spotfire’s,” said Dr. Matt Hahn, senior vice president and chief technology officer at Accelrys, a leading provider of scientific enterprise R&D software and services to companies that differentiate themselves through scientific innovation. “Working with our partner SCYNEXIS, we support big data analysis within a number of neglected disease research networks, utilizing the powerful visualization of Spotfire in combination with Accelrys scientific analytics. The more data you can visualize and analyze, the greater the potential for ground-breaking innovation. With Spotfire we’ve expanded the boundaries for discovery.”
Hortonworks and Kognitio Tie Knot
This week Hadoop software vendor Hortonworks and in-memory data analytics company Kognitio announced a partnership that brings the two companies’ platforms together for more scalable processing of large datasets.
The companies say the partnership is aimed at simplifying the movement of data between multiple, heterogeneous enterprise data systems and Apache Hadoop, which they claim will be faster and more cost-effective.
Kognitio says that its standards-driven, mature platform stands apart by enabling users to query what they want, when they want, no matter how complex, granular or voluminous the data, returning responses in seconds. By leveraging Apache Hadoop, the Kognitio Analytical Platform runs data transformations directly inside Hadoop for maximum scalability.
“Kognitio is making it possible for more companies than ever before to implement in-depth analytics on vast amounts of data via the cloud,” said Eric Baldeschwieler, CTO and co-founder of Hortonworks. “Apache Hadoop gives organizations the power to integrate and refine enterprise data from multiple sources, delivering more insight, and driving more intelligent decisions. Kognitio’s power, coupled with our knowledge, will help enable Hadoop to become the next-generation enterprise data platform and achieve our vision of processing half of the world’s data within the next several years.”
Actuate Climbs Aboard Hadoop Express
Open source business intelligence vendor Actuate, which makes the BIRT platform, has signed on with Hadoop, specifically the distribution of Hadoop that Cloudera produces.
The company’s platforms, ActuateOne and BIRT, offer access to data stored in Cloudera’s Distribution including Apache Hadoop (CDH) for commercial and development environments using Hive Query Language (HQL).
In light of the new partnership, BIRT-based information and business intelligence applications are able to natively access Hadoop as a data source for analysis, dashboards, reporting and interactive visualization. BIRT can be used to build data sets or data visualizations that combine Hadoop data with other data sources including SQL databases, XML data, document archives, print streams and flat files.
Actuate says that these capabilities will allow users to build information applications that facilitate more effective and collaborative decision making, and to tap into data that is too large to access and interpret using existing database management tools, whether that data is structured, unstructured, device- or activity-generated, for example data from social networks.
ActuateOne provides out-of-the-box access to Apache Hadoop data. The bundled JDBC driver provides access to Hadoop data via the Hive interface using Hive Query Language (HQL). BIRT developers will be able to easily connect to and access massive amounts of data stored in Hadoop and generate BIRT displays such as reports and dashboards using this data.
Zettaset, Hyve Partner on Hadoop OS
This week Zettaset and Hyve Solutions announced a collaboration that creates the first deployment of Hadoop on an enterprise-grade platform and operating system.
“In the Hadoop community, there are a lot of wasted resources that go into figuring out what is the best configuration of hardware, operating system and Hadoop distribution for the best use case for the enterprise. There are a lot of wasted cycles, a lot of headaches and a lot of pain,” said Brian Christian, CTO for Zettaset. These are the problems that the partnership aims to solve.
Zettaset’s Orchestrator open source Hadoop platform was designed to try to make deploying Hadoop easier and less labor intensive. Zettaset says that companies that use the platform don’t have to rely on an entire stable of IT experts to manage their big data needs. It is also one of the first so-called “self-healing” data management environments within Hadoop that identifies potential system failures and corrects them automatically.
Several of Zettaset’s enterprise Hadoop components are not found in free open-source versions of Hadoop, and automatically maintain the health and welfare of the user’s cluster, making administration of a Hadoop cluster seamless.
The collaboration runs Zettaset’s Hadoop big data management platform, Zettaset Orchestrator, on RHEL 6.2, and for now the solution is offered only through Hyve Solutions.
Big Data Analytics Startup Looks to Clouds
Big data analytics startup Nuevora entered the scene this week with an announcement of a first round of funding. The company plans to use the money to develop a suite of cloud-based business-processes-as-a-service (BPaaS) analytics applications.
Nuevora develops business analytics solutions for major corporations in retail, financial services, insurance, high technology, and travel services markets. In doing so over the past few years, the company has built a proven analytics platform and targeted analytics solutions that can be delivered as Business-Processes-as-a-Service (BPaaS) via the cloud.
“Just as software-as-a-service (SaaS) disrupted the traditional enterprise software business models over the past decade, Nuevora envisions a new wave of business processes-as-a-service (BPaaS) applications taking root in major corporations that drive smarter decisions through real-time and continuous analytics,” said Phani Nagarjuna, founder & CEO of Nuevora. “It’s clearly becoming an apps-driven world, whether at the consumer level or within the corporates and we intend to be among the leaders in the new BPaaS market space. This new funding will be instrumental in helping us execute that strategy.”
Nagarjuna noted that innovative BPaaS business models will combine the benefits of globalization inherent in knowledge process outsourcing (KPO) models with those of scalability available through SaaS.