October 26, 2016

This Catalog Recommends Data with Machine Learning

Alex Woodie

Finding the right piece of data can be a big challenge. With the latest release of Collibra’s data governance and data catalog solution, machine learning algorithms help the product learn what types of data you use, with the goal of surfacing and recommending new data sources that are appropriate for your job.

This use of machine learning is one of the features in Collibra 5.0, the latest release of the company’s flagship data governance solution that was formally announced yesterday. The Collibra Catalog is one of several applications built into the platform that hundreds of companies use to keep track of big data sitting in Hadoop, Hive, and other locations.

“We have a technology platform that has the capability to keep track of processes around data, the metadata and organizations and roles and who has responsibility for data,” says Daniel Sholler, director of product marketing for Collibra. “We keep track of all the technical connections in all the data because you need to know that stuff. But it turns out that stuff isn’t the interesting stuff.”

The interesting stuff for data scientists, Sholler says, is the data itself. While CIOs and auditors are interested in ensuring there’s security around data and the lineage is accurately tracked, data scientists want to get their hands on the right piece of data as quickly as they can, without going on an extensive hunting expedition.

“With big data, a lot of folks are familiar with the data at a conceptual level,” Sholler said during a briefing at the recent Strata + Hadoop World conference in New York City. “They know there’s a customer churn data set out there. What they don’t know is which of the 37 data sets that are labeled ‘customer churn’ is the one they ought to use for their own purpose.”

Collibra is hoping to accelerate data scientists’ access to data by implementing an Amazon-style shopping experience with the latest release of the Collibra Catalog. The Java-based software, which can run on-premises or in the cloud, enables data analysts and scientists to search for data using an intuitive user interface and to select multiple data products to “purchase” through a check-out function.

Collibra’s Data Catalog can recommend new data sources to data scientists

As users build up a search history, algorithms monitor the activity with the goal of recommending other data products that the user may be interested in, just like the real Amazon website.
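Collibra has not said how its recommendation algorithms work, so the following is only a minimal sketch of the general idea, assuming a simple "users who used this data set also used that one" co-occurrence count. The function names, user names, and data set names are all hypothetical and are not part of the Collibra product.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often two data sets appear together in one user's history."""
    counts = {}
    for datasets in histories.values():
        for a, b in combinations(sorted(set(datasets)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(user, histories, top_n=3):
    """Suggest data sets the user has not used yet, ranked by co-occurrence."""
    seen = set(histories.get(user, []))
    counts = build_cooccurrence(histories)
    scores = Counter()
    for dataset in seen:
        for other, count in counts.get(dataset, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical usage histories, invented for illustration only.
histories = {
    "ana": ["churn_official", "crm_accounts", "billing_2016"],
    "ben": ["churn_official", "crm_accounts"],
    "cy": ["churn_official"],
}

print(recommend("cy", histories))
```

In this toy data, a user who has only touched the churn data set is pointed first at the data set most often used alongside it. A production system would presumably weigh many more signals, such as role, data quality, and certification status.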

The Collibra software does this while tracking all the data on the backend to ensure that tight controls are kept on the security, lineage, and quality of the data. In this sense, it’s all about putting guardrails around data tasks that are often neither automated nor centralized.

“In a typical data lake, there’s zero marginal cost to put more data into the data lake, so everybody just throws it in there,” Sholler says. “So now we’ve got the pit. Which [data set] is the official one? That’s a business process that we would automate…It means there’s a certain level of trust around the data set. You know the source, and where it came from.”

The idea is to provide a system of record for critical data-oriented functions similar to what corporations have in place for sales, HR, and finance functions, says Collibra co-founder and CEO Felix Van de Maele.

“Collibra 5.0 serves as that ‘system of record,’ providing a data governance backbone that helps organizations increase the value of their data and eliminate data silos,” Van de Maele says. “Our new Collibra Catalog helps eliminate one of the major pain points with which data scientists and business analysts grapple—namely, the time-intensive and tedious process of finding data—and enables them to work more quickly and strategically to solve critical business challenges.”

Collibra, which was founded in Belgium eight years ago, recently moved its headquarters to New York City. The company has customers in heavily regulated industries like healthcare and financial services, as well as higher education and high tech. In addition to a catalog, the Collibra platform includes business glossary, data dictionary, data helpdesk, policy manager, reference data, and stewardship applications.

Applications: Artificial Intelligence

Technologies: Middleware

Sectors: Financial Services, Healthcare

Vendors: Collibra

Tags: Collibra, data catalog, Data Governance, machine learning
