November 3, 2012

This Week’s Big Data Big Seven

Datanami Staff

This week’s top seven stories from the realm of big data include some funding news from a graph database startup as well as an established analytics company. We also check in on how big data is solving problems in the wake of Hurricane Sandy and point to a few new releases that up the data-intensive computing ante.

Without further delay, let’s dive in…

More Funding for Graph Database Developer Neo

Neo Technology, creator of Neo4j, announced this week that it has raised $11 million in Series B funding.

The round, led by Sunstone Capital, brings the total amount raised by Neo Technology to more than $24 million. The investment will be used to fuel product development, meet the needs of Neo’s expanding customer base, and deliver on the company’s vision of mass adoption of graph databases.

Startups and emerging companies such as Squidoo, FiftyThree, GlassDoor, Teachscape, Gamesys, Viadeo, MoviePilot, InfoJobs, and Woozworld are using Neo4j to bring innovation to mobile and social applications.

The Neo4j Partner Graph is also expanding to include leaders in NoSQL and graph database applications. New partners include HP, OpenCredo, Open Software Integrators, InfoClear, McKnight Consulting Group, Xnlogic, CodeCentric, Instil Software, JayWay, Lateral Thoughts, Morgner, SPP42, Xebia, Zenika, Centrum Systems, Readify, SpringSource, Heroku, ThoughtWorks, and Capgemini.

Graph database and Neo4j users, including Intuit, ThoughtWorks, FiftyThree, Squidoo, Adobe, Telenor, and Cisco, will present at Neo Technology’s GraphConnect 2012 conference on November 5-6 in San Francisco.



Drawn to Scale Releases Limited Availability Spire

Drawn to Scale, Inc., creator of Spire, announced that Spire is now in limited availability. Spire supports massively scalable SQL operational workloads for web, mobile, and enterprise applications.

Spire allows developers to interact with Hadoop as if it were a traditional RDBMS, with real-time reads and writes. Spire leverages a fully distributed architecture similar to Google’s F1 database, built for commodity infrastructure or the cloud.

Spire’s goal is to make it easy for developers to connect new and existing applications with thousands of simultaneous users, without having to wait for batch-oriented MapReduce jobs. According to Bradford Stephens, CEO and founder of Drawn to Scale, Spire is helping telecom, mobile analytics, social networking, and financial companies looking to scale applications and drive business decisions from ever-growing volumes of data.

There are three core components to Spire’s real-time performance: two-layer optimization (on a global and local level), a distributed indexing engine, and a Virtual Query Machine.

Using the global optimization engine and distributed index, Spire builds a map of the underlying data, allowing it to route requests only to the servers in the cluster that hold the needed data. This results in orders-of-magnitude better throughput than classic sharded databases. As a SQL query runs, Spire then applies local optimization tactics on every node to manage memory, disk, and network resources. The Spire Virtual Query Machine performs low-level computations while abstracting away cluster and network access, making it easy to add features and tune performance without server-side changes.
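Drawn to Scale has not published Spire’s internals, but the general idea of index-aware routing can be sketched in a few lines of Python. The cluster map, key ranges, and node names below are hypothetical illustrations, not Spire code:

```python
import bisect

# Illustrative only: a global "map" of the data tells the query layer which
# servers own which key ranges, so a request touches only the nodes that
# actually hold relevant rows instead of fanning out to the whole cluster.
class ClusterMap:
    def __init__(self, range_starts, servers):
        # range_starts is sorted; servers[i] owns keys in
        # [range_starts[i], range_starts[i + 1]).
        self.range_starts = range_starts
        self.servers = servers

    def servers_for(self, lo, hi):
        """Return only the servers whose key ranges overlap [lo, hi]."""
        first = max(bisect.bisect_right(self.range_starts, lo) - 1, 0)
        last = bisect.bisect_right(self.range_starts, hi) - 1
        return self.servers[first:last + 1]

cluster = ClusterMap([0, 1000, 2000, 3000], ["node-a", "node-b", "node-c", "node-d"])
# A predicate on keys 1500-2500 is routed to two nodes, not all four.
print(cluster.servers_for(1500, 2500))  # ['node-b', 'node-c']
```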

Joins are an essential requirement of ANSI SQL, and executing joins on data spread across hundreds of nodes is complicated and expensive to compute. Joins in real time require careful measurement of the amount of data to be joined and careful memory management. Drawn to Scale says Spire can support many applications that previously relied on traditional databases, as well as BI tools that require complex aggregation of disparate pieces of data into a single result set.
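As a rough, generic illustration of why that sizing step matters (this is not Spire’s planner), the choice of distributed join strategy often hinges on whether the smaller input fits within each node’s memory budget:

```python
# Hypothetical sketch of the decision a distributed SQL engine makes before
# joining two tables: broadcast the smaller side if it fits in each node's
# memory budget, otherwise repartition both sides by the join key.
def choose_join_strategy(left_bytes, right_bytes, per_node_memory_budget):
    smaller = min(left_bytes, right_bytes)
    if smaller <= per_node_memory_budget:
        return "broadcast the smaller side to every node, hash-join locally"
    return "repartition both sides on the join key, join partition by partition"

# 50 MB joined against 10 GB with a 256 MB per-node budget -> broadcast join.
print(choose_join_strategy(50 * 2**20, 10 * 2**30, 256 * 2**20))
```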



Simba Releases SimbaO2X 4.5

Simba Technologies Inc. announced the immediate availability of SimbaO2X 4.5, which offers usability and performance enhancements enabling data access and high-performance analytics in popular ODBO-based business intelligence tools such as Microsoft Excel 2013.

SimbaO2X 4.5 includes Excel 2013 and Windows 8 support, leveraging enhancements and new functionality in Microsoft’s latest Office release and OS.

Excel 2013 allows users to explore data and create richer data visualizations, using the enhanced PivotTable interface and features such as Excel’s new Timeline, which makes it easy to view data trends over time. SimbaO2X is the de facto platform for connecting OLE DB for OLAP (ODBO) based applications, such as Microsoft Excel, to XML for Analysis (XMLA) based data sources from multiple vendors.

SimbaO2X is used by both proprietary and open source software companies to enable data analysis with popular applications, such as Microsoft Excel, that use the ODBO data standard to communicate with other software products. SimbaO2X enables ODBO-based applications like Excel to be used in a web services environment to query and analyze data over networks or the Web. The data being queried is stored in multi-dimensional data stores that use the widely accepted XMLA and Multidimensional Expressions (MDX) standards.
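To make that plumbing concrete, the sketch below shows roughly what an ODBO-to-XMLA bridge puts on the wire: a SOAP Execute request carrying an MDX statement. The endpoint URL and the Adventure Works catalog are placeholders for illustration, not details from Simba’s announcement:

```python
import requests

XMLA_URL = "http://olap.example.com/xmla"  # hypothetical XMLA endpoint

# An MDX query against the (assumed) Adventure Works sample cube.
mdx = """
SELECT {[Measures].[Sales Amount]} ON COLUMNS,
       [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]
"""

# XMLA is SOAP over HTTP: the Execute method wraps the MDX statement.
envelope = f"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command><Statement>{mdx}</Statement></Command>
      <Properties><PropertyList>
        <Catalog>Adventure Works</Catalog>
        <Format>Multidimensional</Format>
      </PropertyList></Properties>
    </Execute>
  </soap:Body>
</soap:Envelope>"""

response = requests.post(
    XMLA_URL,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml",
             "SOAPAction": "urn:schemas-microsoft-com:xml-analysis:Execute"},
)
print(response.status_code)  # the response body is an XMLA cellset to parse
```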

SimbaO2X is fully compliant with ODBO, XMLA and MDX query language data standards for wide interoperability with numerous products and solutions in the marketplace.



NEA Leads $23 Million C Round of Funding for Pentaho

Pentaho Corporation announced that it has raised a $23 million C round led by New Enterprise Associates (NEA), with participation from additional existing investors Benchmark Capital, Index Ventures and DAG Ventures. Pentaho will use the funds to expand its development, engineering, services, sales and marketing efforts.

“Pentaho has a proven big data strategy with over 300 percent increase in big data sales for the first nine months of 2012 over the same period in 2011,” said Quentin Gallivan, chairman and CEO at Pentaho Corporation. “This Series C investment allows Pentaho to keep pace with fast-moving technology innovations, recruit the necessary talent to execute on our big data strategy, and to expand our leadership in big data analytics.”

Pentaho’s first-mover advantage enabled it to engage with big data customers early and to continually roll out technology updates that keep its users ahead of the big data curve. Today Pentaho provides a comprehensive platform spanning big data preparation, ingestion and integration through interactive visualization, analysis and prediction. Pentaho also reduces the time to design, develop and deploy big data analytics solutions by as much as 15x, and is part of a partner ecosystem that includes technology companies such as Cisco, Cloudera, DataStax, Dell, EMC Greenplum, HP Vertica, MapR, Netezza, 10gen/MongoDB, and Teradata.

“Pentaho has carved out a massive opportunity as an analytics and intelligence layer for a wave of web-scale, open-source data solutions,” said Harry Weller, general partner at NEA.

NEA, one of the world’s largest venture capital firms, is among the most active investors in enterprise software and systems.



Splunk Launches Enterprise 5

Splunk Inc. announced the general availability of Splunk Enterprise 5, the fastest, most resilient version of the company’s flagship product. The latest release includes added features to create a platform for developers building big data applications.

“Technology needs to provide answers as quickly as users think of questions, regardless of the speed, complexity and scale of the underlying data,” said Guido Schroeder, senior vice president of products, Splunk. “We need to put that technology in the hands of developers and IT professionals so they can innovate and drive new ideas. It is for these reasons and more that we created Splunk Enterprise 5.”

“With the added pressure on IT professionals to meet performance goals and rapidly introduce new services and control costs, buyers have shifted towards technologies that provide real-time operational visibility and analytics across their mission-critical infrastructures,” said Jonah Kowall, research director, IT operations management, Gartner.

Reports are up to 1,000 times faster and dashboards are easier to navigate and share with Splunk Enterprise 5. Drilldowns integrate simple workflows, providing a more intuitive user experience. Integrated PDFs enable reports or dashboards to be shared with anyone on demand or on a scheduled basis.

Splunk Enterprise 5 introduces patent-pending Index Replication, which delivers built-in high availability and enterprise-class resilience while scaling on commodity servers and storage. As data is collected and indexed, multiple identical copies are maintained. During an outage, incoming data continues to be indexed and indexed data continues to be searchable. Setup is simple, and management is done through the Splunk Manager user interface.

Splunk Enterprise 5 also contains platform features to drive greater extensibility, modularity and interoperability. For developers, this release includes a versioned API and JavaScript SDK. SDKs are also available for Java, Python and PHP. Each SDK includes documentation, resources and tools to help developers through application development and testing.
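For a flavor of what coding against the new API looks like, here is a minimal sketch using the Python SDK (splunklib); the host, credentials, and search string are placeholders:

```python
import splunklib.client as client
import splunklib.results as results

# Connect to the Splunk management port (8089 by default) with placeholder
# credentials.
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Run a one-shot search over Splunk's internal index and print each result.
stream = service.jobs.oneshot("search index=_internal | head 5")
for result in results.ResultsReader(stream):
    print(result)
```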

Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk Enterprise and Hadoop. This makes it easier to stand up reliable, secure, enterprise-grade big data projects in days instead of months.



Actuate Announces BIRT iHub

Actuate Corporation, The BIRT Company, announced the BIRT iHub, a next-generation deployment framework for delivering applications that keep end users up to date on the insights they need to run their businesses. As the latest addition to ActuateOne, BIRT iHub helps organizations detect and operationalize data-derived insights while addressing today’s key deployment issues.

These issues include staying abreast of Big Data, and data in all of its forms, without increasing investments in data warehouses; meeting end users’ growing requirement to access and respond to data anywhere on their mobile touch devices; and increasing the value of the information available to those users by providing deeper means of analysis.

The BIRT iHub broadens the ActuateOne infrastructure to let organizations nimbly deploy solutions that access and combine increasingly diverse data sources, including Big Data, cloud and social media, and that cater to increased consumption of information on touch devices such as tablets and smartphones. These solutions can visualize, profile, associate and mine huge sets of information to quickly and easily discover, analyze and even predict key business outcomes; allow information-centric applications to evolve with user needs; accelerate development cycles and the creation of applications; and take advantage of flexible virtualization to scale and contract applications as appropriate in public, private and hybrid clouds and via SaaS and PaaS environments.

“With the BIRT iHub, ActuateOne becomes even better equipped to deploy any application that delivers insight derived from Big Data – all the way to touch devices,” said Pete Cittadini, CEO and President, Actuate Corporation. “ActuateOne with BIRT iHub is ready to meet customer needs today and tomorrow, no matter what follows Big Data.”



Big Data Improves Sandy Preparations, Response

As Hurricane Sandy barrels up the East Coast, Direct Relief is working with technology partners at Palantir to assess needs, determine likely emergency scenarios, and mount a targeted response for health clinic partners in the path of the storm. 

Because of the huge geographic area and population potentially affected by the storm, an estimated 60 million people, a key priority is to identify the people and communities most at risk.

Palo Alto-based Palantir specializes in data integration, visualization and analysis tools that allow Direct Relief to pull massive amounts of information from many sources into a common framework to understand, visualize, plan for, and manage complex emergencies, including Hurricane Sandy.

A critical component of Direct Relief’s ability to mount an effective response is its understanding of social vulnerability: who is at risk, where, and why? Not everyone within a hurricane’s path is equally at risk.

Extensive research by Dr. Susan Cutter of the University of South Carolina on past hurricanes and other emergencies has identified over 30 factors that affect communities’ vulnerability in such events, including an area’s natural and built environment, its rural or urban character, and the demographic composition and income levels of its population. In general, vulnerability is greater among people at age extremes (young and old), people with low incomes, members of minority populations, and those with special health or medical needs.

“Palantir’s analytical and data visualization tools are helping Direct Relief pinpoint our clinic partners located in socially vulnerable populations and in flood risk zones as they relate to Hurricane Sandy, allowing us to better anticipate the needs for essential medicines in emergencies,” said Andrew Schroeder, Director of Research and Analysis at Direct Relief.

“In addition, we can assess the likely scenarios for population movement, which may stretch the resources of inland primary care health centers in the event of evacuation. The better the information, the better we can understand and manage complex problems in near real time.”

The general principle that good information is needed to make good decisions is particularly true in emergencies, as situations can shift rapidly and information flow can be interrupted.

Among the challenges in emergencies is that the need to make rapid decisions about deploying resources (equipment, personnel, money, food, water, shelter supplies, and health resources) arises just as current, precise information becomes harder to obtain and distribution channels become damaged.

A “fog of war” analog exists in emergencies. Direct Relief’s preparedness and response efforts therefore focus on building information channels, working consistently with the nonprofit health centers and clinics that serve vulnerable people in high-risk hurricane areas, pre-positioning essential health resources in those areas, and maintaining a robust distribution channel to infuse additional resources as circumstances warrant.

Rather than treating these dimensions of Direct Relief’s disaster response as separate and distinct, Palantir pulls them all into a continuous braid of analytic workflows to improve the overall intelligence and efficiency of the response.

Starting, for example, with a geographic layer showing county-level values of the social vulnerability index and flood-related damage estimates from the University of South Carolina’s Hazards and Vulnerability Research Institute, Direct Relief builds statistical correlations with data from the Centers for Disease Control and Prevention (CDC) on disease prevalence rates to score counties in terms of their health risks, population needs and disaster impacts.
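A simplified sketch of that kind of county-level scoring, with entirely hypothetical file and column names, might look like this:

```python
import pandas as pd

# Hypothetical inputs: a county-level social vulnerability/flood table and a
# CDC-style disease prevalence table, both keyed by county FIPS code.
svi = pd.read_csv("county_social_vulnerability.csv")  # fips, sovi_score, flood_damage_est
cdc = pd.read_csv("cdc_disease_prevalence.csv")       # fips, diabetes_rate

counties = svi.merge(cdc, on="fips")

# How strongly does social vulnerability track chronic-disease prevalence?
print(counties["sovi_score"].corr(counties["diabetes_rate"]))

# A simple composite risk score: percentile-rank each factor, then average the
# ranks so no single scale dominates; higher means more at risk.
factors = ["sovi_score", "flood_damage_est", "diabetes_rate"]
counties["risk_score"] = counties[factors].rank(pct=True).mean(axis=1)
print(counties.nlargest(10, "risk_score")[["fips", "risk_score"]])
```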

Pulling that analysis directly into a map view to see where the most at-risk counties are located, Direct Relief can cross-reference clinic addresses and storm scenarios to prioritize problem areas and response requirements. Rapid, highly targeted analysis of historical product flows for health centers in these risk zones focuses attention on specific material needs.

In addition to its emergency response efforts, Direct Relief pre-positions Hurricane Preparedness Packs ahead of each hurricane season across the nine U.S. states and seven countries most likely to be affected.

The Hurricane Preparedness program, the largest such nonprofit program in the U.S., pre-positions large quantities of medicines and supplies at health centers, clinics and hospitals in at-risk areas to treat vulnerable people during emergencies. The pre-positioning of these medical resources is another key component of Direct Relief’s emergency relief efforts and ongoing assistance to partner clinics, facilitating a fast, efficient response when a disaster strikes. In the U.S., 50 Hurricane Packs are currently in place and stand ready to be deployed in an emergency.

The contents of the prep packs are versatile and can be used for acute care as well as to treat patients with chronic diseases should they become displaced by storms and lose access to their medications or medical care. Each U.S.-bound pack contains enough medicines and supplies to treat 100 patients for three to five days after a hurricane hits. The modules shipped internationally are much larger, containing enough supplies to treat 1,000 people for a month following a disaster.

Direct Relief supplies the Hurricane Prep Packs with donations from pharmaceutical and medical corporations and through a long-standing relationship with FedEx to assist in shipping and logistics. The Prep Packs are provided free of charge to the healthcare facilities.
