
Seeing Graphs in Smart Data Lakes

Data lakes are synonymous with Hadoop to many people grappling with the promise and the peril of big data. That’s not surprising, considering Hadoop’s unparalleled capability to gobble up petabytes of messy data. But for Barry Zane and other folks at Cambridge Semantics, data lakes are taking on a decidedly graph-like appearance.
Cambridge Semantics, which acquired Zane’s latest startup SPARQL City earlier this year, is beginning to talk about its concept of the smart data lake. The data lake concept is a well-worn one by now. The “smart” part, you may have guessed, owes to the semantic aspect of how the data is stored, how it’s connected to other data in the lake, and the way it impacts how people can extract meaningful information from it.
To Zane’s way of thinking, those who can get the most insights with the least amount of effort have an advantage. Of course, this has always been the case. But the telling part is the fact that Zane, who was founder and CTO of ParAccel (acquired by Actian) and a co-founder and VP of architecture at Netezza (acquired by IBM), sees graph databases and graph analytic technology as the best way to get there for at least the next 10 years.
“We strongly believe that this is an extremely effective approach, a future-proof approach,” Zane tells Datanami. “Just as Hadoop basically came to maturity because relational just wasn’t able to work with a certain class of question and wasn’t able to work at a certain scale, we pursue those classes of questions and scale using the graph standards, at an incredible cost and performance advantage, as compared to hiring programmers for every question and analytic you want to perform.”
From Relational to Graph

Barry Zane, vice president of engineering for Cambridge Semantics
Zane, who is Cambridge Semantics’ vice president of engineering, sees graph databases, such as the Anzo Graph Query Engine, as a natural evolution from relational databases, which he says have developed some pretty powerful analytic capabilities themselves over the past 40 years.
“Without a doubt what we’re doing is educated by learning from Netezza, educated from learning from ParAccel. So I really see it as just an evolution,” Zane says. “The difference is you’re able to ask more interesting questions of your data. You’re able to find relationships that are otherwise nearly impossible to find.”
The core problem with relational database technologies, even the massively parallel processing (MPP) technologies that he championed at ParAccel (which powers Amazon’s Redshift data warehousing service) and Netezza (which IBM has renamed into something that nobody can ever remember), is the difficulty of performing advanced analytics and the length of time it takes to get answers back.
“Being a longtime relational guy, one of the great things about the relational database is that you don’t need to be a programmer. You’re able to work with the database through either a set of application layer tools or in the SQL language,” he says.
“The best way to think of SPARQL and RDF is that they’re just the next evolution of relational database SQL,” he continues. “That’s the way I think about it, and that’s what got me excited, because you can have people who are not super highly trained programmers pose queries of the data in a matter of minutes or hours and get back responses in a matter of seconds or minutes, as opposed to hiring very highly trained and expensive programmers for any given query.”
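To make the SQL-to-SPARQL analogy concrete, here is a minimal, illustrative sketch (using hypothetical example.org predicates, not Anzo-specific syntax). A question that would require explicit multi-table joins in SQL reads as a set of triple patterns in SPARQL, with the joins implied by shared variables:

```sparql
# Find customers and the products they bought costing over 100.
# In SQL this would join customers, orders, and products tables;
# in SPARQL the joins are implicit in the repeated ?customer,
# ?order, and ?product variables.
PREFIX ex: <http://example.org/>

SELECT ?customerName ?productName ?price
WHERE {
  ?customer ex:name     ?customerName .
  ?customer ex:placed   ?order .
  ?order    ex:contains ?product .
  ?product  ex:name     ?productName .
  ?product  ex:price    ?price .
  FILTER (?price > 100)
}
```

Because the schema lives in the data itself as triples, adding a new relationship to query against does not require altering tables first, which is part of what Zane means by lowering the barrier for non-programmers.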
Graph As Oracle
Zane sees graph databases giving us oracle-like powers to start with one set of facts and to drill outward to ask innumerable other questions from huge and connected data sets. That kind of power has never been available on a widespread basis, but graph is quickly making it possible, and having an impact in multiple industries.
For example, say you’re a retailer selling sweaters, and you want to know how many sweaters to stock. A graph database can let you easily add other data sets, such as weather forecasts or social network data, which can let you spot trends and adapt to changing demand.
“Graph databases, and the technology behind graph databases and graph analysis, is all about working with that kind of stuff and being able to add in additional graphs of information, like demographics, weather, geographic information, and so forth,” Zane says. “It’s obviously very relevant in the life sciences space, where you might be relating genetic aspects to drug effectiveness to drug marketing, clinical trials and so forth. Likewise in financial services around trades. In national security [it helps to] find who the bad guys are.”
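Zane’s point about layering in additional graphs can be sketched in SPARQL itself. The following is a hypothetical example (the graph names and predicates are invented for illustration): the retailer’s sales data and a separately loaded weather forecast graph are queried together, joined only on shared region and date identifiers, with no schema migration needed to bring the weather data in.

```sparql
# Hypothetical sketch: relate sweater sales to forecast temperature.
# Assumes a sales graph and a weather graph that happen to share
# region and date identifiers; neither graph needed restructuring
# to be queried alongside the other.
PREFIX ex: <http://example.org/>

SELECT ?region ?date (SUM(?units) AS ?sweatersSold) ?forecastTempC
WHERE {
  GRAPH ex:sales {
    ?sale ex:product ex:Sweater ;
          ex:region  ?region ;
          ex:date    ?date ;
          ex:units   ?units .
  }
  GRAPH ex:weather {
    ?fc ex:region ?region ;
        ex:date   ?date ;
        ex:tempC  ?forecastTempC .
  }
}
GROUP BY ?region ?date ?forecastTempC
```

Dropping in a demographics or social-sentiment graph would follow the same pattern: another GRAPH clause joined on whatever identifiers the data sets share.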
Data Rich, Insight Poor
Most of us are data rich these days, but insight poor, says Cambridge Semantics vice president of marketing John Rueter.
“The explosion of data is causing a great deal of pain to organizations,” he says. “Most organizations have been very good at collecting and storing information, but really have not done a very good job of making sense of that data and then being able to perform analytics on top of it.”
End users who are accustomed to having practically limitless amounts of data available to them will eventually come to depend on the capability of graph analytics to navigate it and make sense of it, he says.
“Everybody thought that big data would make everyone’s job easier, when in fact we know it’s made everybody’s job a lot harder,” he says. “End users are demanding and asking for the ability to have interactive data they can work with and go beyond just a traditional query, which proceeds in almost linear fashion, whereas here with the graph technology, you’re able to traverse all of the data and on a dime spin and ask new questions…It mimics the way we think and the way we want to ask questions of our data.”
Product Positioning
While Hadoop-based data lakes compete on some level with Cambridge Semantics’ graph offering, called the Anzo Graph Query Engine, the relationship is mostly complementary. In many instances, HDFS will be the repository for unstructured data sets before they’re loaded into the in-memory graph database.
The marriage of Cambridge Semantics and SPARQL City makes a lot of sense when viewed through a technological lens. SPARQL City provided an in-memory graph database that could scale to great heights, while Cambridge Semantics provided the tooling that made it more useful.
“As a standalone company, SPARQL City had a great massively parallel database architecture, and likewise Cambridge Semantics has a great architecture and product for doing knowledge and data management and the associated visualizations, ETL, and so forth,” Zane says. “So it was just plain very natural that we combined. That way as a single company we could provide the entire stack.”
Yesterday Cambridge Semantics announced that customers can now buy the various big data products, including the Anzo Graph Query Engine and Anzo Smart Data Manager, as stand-alone products. Users can also buy them as part of Cambridge Semantics’ Anzo Smart Data Lake offering.
Related Items:
Cambridge Semantics Buys Graph Database Specialist
The Bright Future of Semantic Graphs and Big Connected Data
Hadoop, Triple Stores, and the Semantic Data Lake