May 3, 2013

Big Data Big Five

Isaac Lopez

Hadoop distro vendors stole the big headlines this week when Cloudera released Impala and MapR dropped its M7 edition on the market, but plenty of other worthwhile news broke as well. Please keep your hands inside the cart at all times as we tour this week's Big Data Big Five.


IBM Intros MessageSight for Internet of Things

The so-called Internet of Things is said to be the real data tsunami of the future, one expected to raise the worldwide data tide to the brontobyte level (that’s a 1 followed by 27 zeroes). IBM announced this week that it will be getting in on the action with its own appliance aimed at helping organizations manage this flood of data, where mobile devices and sensors are doing most of the talking.

The new platform, dubbed IBM MessageSight, is said to process large volumes of events in near real time, support one million concurrent sensors or smart devices, and scale up to thirteen million messages per second.

Underpinning this capability is MQTT, a machine-to-machine telemetry protocol designed as a simple, lightweight publish/subscribe messaging transport. The protocol has been around for over a decade (Eurotech has implemented it in many of its products), but is now being shoehorned in as an Internet of Things chauffeur. MQTT has recently been proposed as an OASIS standard.
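
For readers unfamiliar with the publish/subscribe model, the idea can be sketched in a few lines of Python. This is a toy, in-memory illustration only: it does not speak the MQTT wire protocol and has nothing to do with how MessageSight is built. The `ToyBroker` class, the `topic_matches` helper, and the `city/...` topic names are all hypothetical. The one detail borrowed from MQTT is its topic-filter wildcards, where `+` matches a single topic level and `#` matches all remaining levels.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a topic name.

    Simplified: ignores special handling of topics that begin with '$'.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent and everything below it.
            return True
        if i >= len(topic_levels):
            # The filter is longer than the topic, so it cannot match.
            return False
        if level != "+" and level != topic_levels[i]:
            # Literal level differs ('+' would have matched any single level).
            return False
    # Without a trailing '#', both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory broker that routes messages by topic filter."""

    def __init__(self):
        self.subscriptions = []  # list of (topic_filter, callback) pairs

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        """Deliver payload to every matching subscriber; return the count."""
        delivered = 0
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = ToyBroker()
    broker.subscribe("city/+/temperature", lambda t, p: print(t, p))
    broker.publish("city/sensor42/temperature", "21.5")  # delivered
    broker.publish("city/sensor42/humidity", "40")       # no subscriber
```

The point of the model is that sensors publishing readings never need to know who is listening. The broker alone decides which subscribers receive each message, which is what lets a single appliance sit between very large numbers of devices and the applications that consume their data.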

IBM expects MessageSight to gain acceptance among governments and organizations looking to connect and infuse intelligence into cities, and across industries such as automotive, healthcare and finance.

NoSQL Analytics Startup, Precog, Launches into GA

Startup NoSQL analytics vendor Precog announced that its Precog analytics platform is ready for prime time. Touted as an analytics platform that bypasses ETL and allows companies using non-relational operational data stores to perform analytics directly on their data, Precog is said to be designed to integrate with a company’s existing NoSQL platform and pick up where it leaves off.

“NoSQL databases such as MongoDB, Riak, Couchbase, and Cloudant have experienced a surge in popularity over the past few years, with companies of all sizes adopting the technologies to store NoSQL data,” said the company in their release. “As operational databases, these solutions are not designed to perform analytics, and do not offer analytic capabilities beyond simple aggregation.”

The company says that it has formalized pricing for its cloud offering, starting at $500 on the low end (10 GB of data on SSDs) and $5,000 on the high end (up to 500 GB). It is also offering deployment options that include virtual and hardware appliances for enterprise customers, with prices that range from $5k to $60k.

SAS Announces New High Performance Analytics Tools

Business analytics and intelligence vendor SAS announced this week that it is releasing six new high-performance analytics products this summer, aimed at delivering the speed and flexibility of in-memory big data analytics. The six new analytic products focus on an array of analytic processes including analytic technique, data mining, text mining, optimization, forecasting, statistics and econometrics.

Additionally, SAS says that the new analytic products will include high-performance procedures to streamline data preparation and modification in order to decrease the time analysts spend preparing for analytical modeling, an activity said to take the majority of allocated project time.

The new software packages will operate in MPP environments, distributing in-memory tasks across server blades, and will be configurable for Teradata, Greenplum/Pivotal, Hadoop, and Oracle.

Skytree Passes Go, Collects $18 Million in Funding Round

Machine learning company Skytree announced this week that it has closed $18 million in Series A funding with U.S. Venture Partners (USVP) and an investor syndicate that includes UPS and Scott McNealy.

CEO Martin Hack told Datanami that the company realized last year, given the customer uptick it was seeing, that it would need a Series A. Demand was so strong that the round was completely oversubscribed, he said, but the company settled on USVP and Scott McNealy because of their successful track records. Hack noted that he shares Sun roots with McNealy.

“The main area that we are going to be focusing on is going to be sales and engineering,” said Hack, who noted that there is a hiring crunch happening right now, which he likened to 1999 and 2000, when companies were trying to get engineers by any means necessary. “We are trying to recruit pretty heavily on the engineering front, spec’ing out sales to make sure that we have the coverage there, and then, of course, being aggressive in the market in going after the areas that we’re focusing on.”

Those are the traditional areas where you would expect analytics to be relevant, including retail, financial services, government, and life sciences. Skytree offers a suite of general-purpose machine learning and advanced analytics systems designed for processing massive datasets at high speeds.


1010data Releases Version 6 of Cloud Based Analytics

Analytics company 1010data announced this week that it is releasing a cloud-based version of its big data analytics platform. Dubbed “Version 6,” the update is said to improve the ability of business analysts to churn out insights from large volumes of data through ad hoc querying and improved reporting and sharing tools. The new platform also adds controls for cloud environments.

The company says that the update includes additions to their machine learning capabilities, including new statistical functions such as logistic regression, K-means clustering, Markov chains and linear recursion – all designed to work on virtually unlimited data sets (“trillions of rows”) with no programming or sampling required.