October 4, 2013

Driving MapReduce into the Semantic Web

Isaac Lopez

At the turn of the century, a revolution on the infant World Wide Web transformed static pages into interactive, collaborative portals, spawning the social media era that pervades the Web today. That transition was called Web 2.0, and there’s an open discussion about how semantic data will drive the next one. A group at the University of Freiburg is working on ways for MapReduce to pave the way.

At the 12th International Semantic Web Conference (ISWC) in Sydney, Australia later this month, researcher Alexander Schätzle says his group will present PigSPARQL, a SPARQL query processing system built on top of MapReduce. The group hopes the new system will open up the semantic Web to people who are not MapReduce programmers.

For those who aren’t familiar with the concept, the semantic Web is essentially about making the content of the World Wide Web understandable to machines. With metadata and algorithms that help computers understand what content people are searching for, the machines can then MapReduce their way to delivering it.

It does this using a family of W3C specifications known as the Resource Description Framework (RDF), essentially a metadata framework for identifying data and linking it to other data, giving machine intelligence a handle on it. While the details of RDF and its building block, the RDF triple (a subject-predicate-object statement), are outside the scope of this article, the important thing to understand is that the volume of RDF data is growing rapidly, and the potential for the semantic Web is burgeoning before us (despite a few flies in the ointment, which we’ll discuss soon).
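As a rough illustration of what RDF data looks like to a program, the sketch below models a triple as a subject-predicate-object record and looks up the data linked from one resource. The `Triple` type, the `objects_of` helper, and the `ex:` names are hypothetical, invented for this example rather than taken from any RDF library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; the "ex:" identifiers are made up for illustration.
triples = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:knows", "ex:carol"),
    Triple("ex:alice", "ex:livesIn", "ex:Freiburg"),
]


def objects_of(subject: str, predicate: str, data: List[Triple]) -> List[str]:
    """Return every object linked from `subject` via `predicate`."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


print(objects_of("ex:alice", "ex:knows", triples))
```

Because every statement has the same three-part shape, datasets from different sources can be linked simply by reusing the same identifiers.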

To illustrate the trend, consider that the Linking Open Data community project has grown from 12 datasets in 2007 to about 300 datasets today. These open datasets contain over 31 billion RDF triples, connected by around 504 million RDF links. This is truly big data on a grand scale, and MapReduce happens to be both the enabler of and the inhibitor to progress in this arena.

While it takes MapReduce to manage this enormous amount of data, it’s no surprise to anyone that MapReduce is hard to use. “MapReduce means writing a lot of code – especially Java code – and you have to reinvent the wheel because common operations like joins do not exist out of the box,” Schätzle says. This problem is as old as MapReduce itself, which is why Yahoo! developed Pig back in 2006 to hide much of that complexity, making MapReduce easier to use and opening it up to a wider user base.
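To give a sense of why joins count as reinventing the wheel, here is a minimal sketch of a reduce-side join written in the map/shuffle/reduce style. It is plain Python that simulates the three phases in memory, not Hadoop code, and the datasets and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key group, in sorted key order."""
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


# Hypothetical inputs: (user_id, name) and (user_id, item).
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]


def tag_mapper(record):
    """Emit the join key with the value tagged by its source dataset."""
    source, (key, value) = record
    yield key, (source, value)


def join_reducer(key, values):
    """Pair every 'users' value with every 'orders' value sharing the key."""
    left = [v for s, v in values if s == "users"]
    right = [v for s, v in values if s == "orders"]
    for name in left:
        for item in right:
            yield (key, name, item)


tagged = [("users", u) for u in users] + [("orders", o) for o in orders]
joined = reduce_phase(shuffle(map_phase(tagged, tag_mapper)), join_reducer)
print(joined)
```

Even this toy inner join needs source tagging, grouping, and a nested loop; higher-level languages such as Pig Latin provide a built-in `JOIN` operator so that none of this has to be hand-written.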

The semantic Web movement now finds itself at a similar juncture. While SPARQL is the query language best suited to RDF data, it does not translate easily to Pig, and thus to an easy MapReduce programming experience. Pig Latin is an imperative language, whereas SPARQL is more declarative, similar to SQL.

However, Schätzle says his research group has developed an answer to this, which it calls PigSPARQL. The project, he says, makes it possible to express every SPARQL operator as an equivalent sequence of Pig Latin expressions, thus making the SPARQL query language executable (through Pig) on Hadoop out of the box.
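The general idea of turning a declarative query into step-by-step data-flow operations can be sketched as follows. This is only an illustration under stated assumptions, not PigSPARQL's actual translation: the query, the data, and the variable names are hypothetical, and the steps are written in plain Python where a Pig Latin script would use operators such as `FILTER`, `JOIN`, and `FOREACH ... GENERATE`.

```python
# Declarative form (SPARQL), shown here only as a comment:
#   SELECT ?person ?city
#   WHERE { ?person ex:knows ex:bob .
#           ?person ex:livesIn ?city }

# Hypothetical RDF data as (subject, predicate, object) tuples.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:livesIn", "ex:Freiburg"),
    ("ex:carol", "ex:knows", "ex:bob"),
    ("ex:carol", "ex:livesIn", "ex:Sydney"),
    ("ex:dave", "ex:livesIn", "ex:Sydney"),
]

# Step 1 (filter): bindings of ?person for the first triple pattern.
knows_bob = [s for (s, p, o) in triples if p == "ex:knows" and o == "ex:bob"]

# Step 2 (filter): bindings of (?person, ?city) for the second pattern.
lives_in = [(s, o) for (s, p, o) in triples if p == "ex:livesIn"]

# Step 3 (join + project): combine both on the shared variable ?person.
results = [
    (person, city)
    for person in knows_bob
    for (subject, city) in lives_in
    if subject == person
]

print(results)
```

Each triple pattern becomes a filter over the data, and each variable shared between patterns becomes a join key, which is the kind of work a high-level data-flow language hands off to MapReduce.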

If it works, it could open up a whole new world of Web development, taking MapReduce from a metaphorical dirt road to a four-lane highway by giving developers in the semantic Web space broader access to its functionality.

The group will present its project findings later this month at the ISWC show in Sydney.

