Tackling the MDM Challenge of Big Streaming Data
Objectivity, an established Silicon Valley firm with experience in high-performance distributed object-oriented databases, today debuted a new Hadoop-based product that addresses one of the looming challenges in the Internet of Things (IoT): How to handle metadata management of big and fast streaming data.
It may not seem obvious from the outside, but one of the challenges in tackling big streaming data is metadata management. Much of the unstructured and semi-structured data that flows (or will flow) across the IoT is not readily usable in its raw form. Time-series data, in particular, often needs to be transformed before it can be consumed, by analytic applications or otherwise.
Marking up one data stream wouldn’t be so bad. The associated metadata can be cataloged and stored without too much trouble. But as organizations mix and match multiple streams, the whole pipeline threatens to become a messy quagmire.
That’s roughly the challenge that Objectivity hopes to address with ThingSpan, the new YARN-certified Hadoop application that the company unveiled this morning. Inspired by Objectivity’s success in the object-oriented database (Objectivity/DB) and graph database (InfiniteGraph) spaces, and borrowing technology from those two products, the new ThingSpan product should help keep customers’ IoT and streaming data projects on the straight and narrow, says Jin Kim, vice president of marketing and partner development for Objectivity.
“A lot of this sensor data comes in time-series form, and time-series data has high dimensionality that’s not well suited for many of the analytic algorithms. So one of the key aspects is dimensionality reduction,” Kim tells Datanami.
“But when we do that kind of dimensionality reduction, we need to create a lot of metadata, because some of the analytics is run on the metadata and not the actual raw data,” Kim continues. “What we’re trying to do is to basically create the frameworks so we can enrich it with semantic technology so it can be more ontology-driven. It’s about MDM and the automatic creation of metadata and the data model that is necessary for complex fusion processes.”
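Objectivity does not say which reduction technique ThingSpan relies on. Purely as an illustration of Kim’s point that reducing dimensionality produces metadata of its own, the sketch below uses Piecewise Aggregate Approximation (PAA), a common time-series reduction method chosen here as an assumption, and emits a small summary record for each segment. The `SegmentSummary` record and `paa_with_metadata` function are hypothetical names, not part of any ThingSpan or Objectivity API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentSummary:
    """Hypothetical metadata record describing one reduced segment of a stream."""
    stream_id: str
    start_index: int  # inclusive
    end_index: int    # exclusive
    mean: float
    minimum: float
    maximum: float


def paa_with_metadata(stream_id: str, values: List[float], segments: int) -> List[SegmentSummary]:
    """Reduce a time series to `segments` means (Piecewise Aggregate Approximation),
    emitting one metadata record per segment. Segment boundaries use integer
    division, so segments differ in length by at most one sample."""
    if segments <= 0 or segments > len(values):
        raise ValueError("segments must be between 1 and len(values)")
    n = len(values)
    summaries: List[SegmentSummary] = []
    for i in range(segments):
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = values[start:end]  # never empty because segments <= n
        summaries.append(
            SegmentSummary(
                stream_id=stream_id,
                start_index=start,
                end_index=end,
                mean=sum(chunk) / len(chunk),
                minimum=min(chunk),
                maximum=max(chunk),
            )
        )
    return summaries
```

For example, eight readings reduced to four segments yield the means 1.5, 3.5, 5.5 and 7.5, plus the index range, minimum and maximum of each segment. Downstream analytics can then run on these records rather than on the raw samples, which is the pattern Kim describes.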
ThingSpan is meant to be the Hadoop-based repository for the metadata created from that streaming data. The product won’t do any of the actual analytics; Objectivity will leave that to individual customers, who typically have strong preferences. “We kind of want to keep it analytics agnostic so they can bring their favorite buffet of analytic tools with them,” Kim says. “We’re not about to tell our customer that we have a better set of enrichment or clustering or anomaly detection techniques than they do.”
With that said, the software is being developed to work with the graph analytic and machine learning tools available in Apache Spark. It’s also being developed to work with Apache Kafka, as well as Project Apex, a streaming analytic application developed by DataTorrent.
Objectivity thinks it can offer companies that are building streaming analytics and IoT applications a better and more scalable MDM framework than what is currently available, which consists mostly of modified NoSQL databases, Kim says. Objectivity has been solving these sorts of problems for customers in the intelligence and military sectors for the past decade, and now sees an opportunity as real-time analytic applications become more common in the commercial market.
“Objectivity has been dealing with the domain of how to integrate and fuse fast time-series data from sensor networks and enrich them with contextual information for a long time,” Kim says. “It’s been doing this on beyond-petabyte-scale data, approaching data ingestion rates well over 1 billion events per second.”
The popularity of object-based technologies has come and gone over the years, but the folks at Objectivity see the technology now being used to bring performance and scalability advantages to the burgeoning field of IoT and streaming analytic applications.
“One of our intelligence customers told us recently that for every piece of data they ingest, they generate six separate metadata items for all the relationships they need to maintain,” Kim says. That’s why “object-based technology is coming into vogue again, [because] as people introduce concepts like data lakes and the idea of ingesting multi-various types of data…you are beginning to deal with much more complex metadata.”
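The fan-out Kim describes, several metadata items for every ingested record, is essentially a graph of relationships. The toy sketch below, assuming a simple (subject, relation, object) edge model, shows how each ingested record adds one edge per metadata item and how the reverse direction can then be queried. The `MetadataGraph` class and the relation names in the usage note are hypothetical illustrations; this is not how Objectivity/DB, InfiniteGraph, or ThingSpan actually store relationships.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MetadataGraph:
    """Toy in-memory graph of (subject, relation, object) metadata edges."""

    def __init__(self) -> None:
        self._out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self._in: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Store every edge in both directions so lookups are cheap either way.
        self._out[subject].append((relation, obj))
        self._in[obj].append((relation, subject))

    def ingest(self, record_id: str, metadata: Dict[str, str]) -> int:
        """Attach one edge per metadata item to a newly ingested record.
        Returns the number of edges created for that record."""
        for relation, value in metadata.items():
            self.relate(record_id, relation, value)
        return len(metadata)

    def outgoing(self, subject: str) -> List[Tuple[str, str]]:
        """All (relation, object) pairs attached to a subject."""
        return list(self._out.get(subject, []))

    def subjects(self, relation: str, obj: str) -> List[str]:
        """All subjects linked to `obj` through `relation`, e.g. every record from one sensor."""
        return [s for rel, s in self._in.get(obj, []) if rel == relation]
```

Usage: after `g.ingest("rec-1", {"from_sensor": "sensor-7", "at_site": "site-A"})` and a second record from the same sensor, `g.subjects("from_sensor", "sensor-7")` returns both record IDs. With six metadata items per record, the edge count grows six times faster than the data itself, which is why the relationships, not the raw records, become the hard part to manage.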
ThingSpan has been certified to run on the Hadoop distributions from Hortonworks and Cloudera, and Objectivity is working with MapR Technologies, Kim says. ThingSpan will jump-start the MDM efforts of companies that are developing IoT and streaming analytic applications on Hadoop, without subjecting them to the steep learning curve that organizations in the intelligence and oil and gas fields had to deal with, and without incurring the high price tags that accompany enterprise streaming analytic products from big-name vendors like IBM and Software AG, Kim says.
“The industry needs a standard stack for running advanced and streaming analytics,” says Kim, who previously worked at Skytree. “Intel‘s Trusted Analytics initiative is a good [start]. But we need more standards…To effectively run complex analytics, you need to automatically generate and maintain the complex metadata and relationships. We think more and more that metadata structure will be ontology-driven. It has to be as the data set gets richer and just from a provenance point of view. You have to do it.”
ThingSpan will become generally available in October. The company will be showcasing the product next week at the Strata + Hadoop World conference in New York City.