Hadoop and NoSQL Now Data Warehouse-Worthy: Gartner
Not long ago, the rules for what constituted a data warehouse were fairly well defined. The schema was fixed, you could say, and was based primarily on relational database technology designed to process structured data. My, how times have changed. Last week, Gartner for the first time accepted non-relational technologies–including those based on Hadoop and NoSQL–in its annual Magic Quadrant for Data Warehouses report.
Gartner isn’t the end-all, be-all when it comes to what constitutes a data warehouse. The definition of a data warehouse has been in flux for the past several years, largely as a result of the rise of Hadoop and the capability it provides customers to process and store large amounts of structured and unstructured data on low-cost commodity hardware. For a barometer of Hadoop’s disruptive influence on this space, one needs to look no further than Teradata, which suffered a sales decline on traditional data warehouse technologies, largely attributable to Hadoop.
Despite the hype, Hadoop’s bark is worse than its bite when it comes to market impact, Gartner says. “Distributed processing on commodity clusters (that is, Hadoop, NoSQL, NewSQL) created more confusion than revenue, with only a fraction of revenue actually going toward newly emergent vendors compared with the value of the overall market,” the analyst firm says in its report. “Clearly, while some organizations were shifting to experiment with new approaches, others were simply vacillating amid indecision.”
The shift to include non-relational data warehouse products opens Gartner’s report to three new vendors: Cloudera, MarkLogic, and Amazon Web Services. The first two are in the lower left “Niche Players” quadrant, while AWS is in the upper left “Challengers” quadrant. Teradata, by the way, remained the king of the data warehouse hill, with the best rating in the upper right “Leaders” quadrant, along with other established giants like Oracle, IBM, Microsoft, SAP, and HP, which crawled up from the “Visionaries” quadrant to sit with the big dogs.
The data warehouse market is in a state of flux, and will be for some time, as established vendors move to adopt some of the new Hadoop and NoSQL/NewSQL tricks, and the offerings from upstart vendors continue to mature. Gartner notes that most of the data warehouse establishment has already adopted new engines, primarily HDFS.
“Over the next two to three years, the market will witness the fruition of the struggle…in which the industry will see all sorts of competing architectures and strategies for addressing data management for analytics,” Gartner says. “Acquisitions and business failures among new vendors will result in two or three emerging as viable companies in analytics data management.”
In other words, few of the upstart data warehouse vendors will survive past 2016; the rest will either be eaten by bigger companies or “burned away” in the fiery forge that is the Trough of Disillusionment in Gartner’s Hype Cycle.
Gartner bases this slightly cynical view on its assessment of early Hadoop deployments, which it says were very large but were maintained manually. “Implementing organizations began to realize the true cost of these manually managed and maintained clusters and it is now becoming clear that this shift was really a return to the previous personnel- and skills-based model–and has sustainability issues,” it writes.
Enterprise software is a rough and tumble game, and the best technology does not always win. The folks at Cloudera seem to understand this, which is why they have taken a more aggressive business stance. That doesn’t sit well with their fellow members of the Hadoop community, but as a survival tactic it makes sense.
Gartner says Cloudera’s advantages include a capability to analyze all types of data, a broad feature set (including prepackaged machine learning libraries and connections to SAS and R for statistical analysis), and good customer reviews (it has about 1,000 customers, Gartner says). Challenges include threats from the rich and powerful megavendors; perhaps too much diversity across shrink-wrapped and service-based delivery models; and security, functionality, and skills issues.
NoSQL vendor MarkLogic (at age 13, practically an old man in the big data world) was hailed by Gartner for its ACID-compliant transactions; how it stores indexes alongside data; the capability to read HDFS data and emulate MapReduce-style processing; a broad customer base; its scalability; and its flexibility to go schema-less or treat data as schema-less with its semantic (triple store) capabilities.
A low customer count for MarkLogic may be reason for caution (or maybe not, Gartner says). It also wonders whether the semantic approach that MarkLogic is now offering will translate into growth. The company’s database also requires considerable skill to use, and that skill can be hard to find in the market, Gartner says.
AWS made the Quadrant for three offerings: Redshift, AWS Data Pipeline, and Elastic MapReduce (EMR). AWS had the highest customer satisfaction of any vendor in the report, Gartner says. Not surprisingly, the strengths of AWS’ approach are fast deployments, low cost, and flexibility. Red flags for AWS are a lack of advanced functionality (such as support for stored procedures and integrity constraints in Redshift); awkwardness at mixing cloud and on-prem data (which isn’t really Amazon’s fault); and a traditional view of data warehousing that may put it in competition with traditional vendors.
Returning to the list this year are Kognitio and Actian, the latter of which jumped from the “Niche Players” quadrant into the “Visionaries” quadrant, largely due to its acquisition of ParAccel (which disappeared from the list). InfiniDB (formerly Calpont), Exasol, and Infobright remained in the “Niche Players” quadrant, while 1010data remained in the “Challengers” quadrant.
Hadoop vendor Pivotal was also added to this list, but it basically replaced EMC, from which it was spun out. Other big data vendors that didn’t make the cut this year but were mentioned by Gartner clients include Hortonworks, MapR Technologies, MongoDB, BMMsoft, Hitachi, Objectivity, ParStream, RainStor, and XtremeData.