March 7, 2014

Customizing the Internet, One User at a Time

Tiffany Trader

You’ve no doubt heard the statistic that a full 90 percent of all the data in the world has been generated in the last two years. In 2012, humans created 2.5 quintillion bytes of data every day. Every minute of every hour, Google’s servers process more than 2 million search queries, Facebook scans 3.5 terabytes of data, and some 277,000 tweets are sent. The global Internet population now represents 2.1 billion people all contributing to the ever-widening digital footprint.

With great swarms of data coming and going in all directions, how does one go about finding the content they want to see? How do sites like Yahoo and Facebook know what articles and ads will get people’s attention?

To help address questions like these, Lehigh University researchers developed a technique that studies the behavior of social media users. Based on a small sample of online activity, the researchers were able to predict the types of content users would like to see.

“We process terabytes of data every hour,” says Liangjie Hong, who earned his PhD in computer science at Lehigh University and is now a research scientist at Yahoo Labs. “You cannot consume it all.”

Brian Davison, associate professor of computer science and engineering and head of Lehigh’s Web Understanding, Modeling and Evaluation (WUME) laboratory, concurs. For the engaged social media user, it’s nearly impossible to keep up with all the feeds and messages coming in, he says. Davison knows about social media overload firsthand. Currently on sabbatical from Lehigh, he is working in the data science group at Facebook, where being a social networking power user comes with the territory.

The project has been evolving since 2010. First, Davison and Hong developed an algorithm to predict how often the recipients of tweets would pass along (aka “retweet”) messages to their own followers. For this effort, the researchers were awarded best poster paper at the 2011 World Wide Web Conference.

They then changed their focus to analyzing how a user responds to incoming information. “If we could record a user’s activities for 24 hours,” says Hong, “we would know exactly what they are looking for.”

They developed co-factorization machines that use a mathematical analysis method to examine how social media users interact with tweets. For example, do they reply? Do they retweet? Which tweets do they mark as favorites? The technique would also expose user interests based on, for example, the frequency with which certain terms appeared in their feeds.
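To give a rough sense of the family of models involved, the sketch below scores one user-tweet pair with a plain second-order factorization machine. It is not the researchers’ co-factorization method, whose details this article does not describe; the function name `fm_score`, the feature layout, and all weights are hypothetical and chosen only for illustration.

```python
from typing import List


def fm_score(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Score one feature vector with a generic second-order factorization machine.

    x  : feature values (hypothetical layout: user indicator, term indicators, ...)
    w0 : global bias
    w  : one linear weight per feature
    v  : one latent vector (length k) per feature

    Assumption: this is the standard factorization machine formula, not the
    co-factorization variant mentioned in the article.
    """
    n = len(x)
    k = len(v[0])

    # Linear part: each feature contributes independently.
    linear = sum(w[i] * x[i] for i in range(n))

    # Pairwise part: sum over i<j of dot(v[i], v[j]) * x[i] * x[j],
    # computed with the usual O(n*k) identity instead of a double loop.
    pairwise = 0.0
    for f in range(k):
        total = sum(v[i][f] * x[i] for i in range(n))
        total_of_squares = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_of_squares)

    return w0 + linear + pairwise


# Toy input: features 0 and 1 are active, feature 2 is not.
example_x = [1.0, 1.0, 0.0]
example_w0 = 0.1
example_w = [0.2, 0.3, 0.4]
example_v = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
example_score = fm_score(example_x, example_w0, example_w, example_v)
```

With these toy numbers the score is about 1.1: the bias (0.1), the two active linear weights (0.5 together), and the single interaction between features 0 and 1 (0.5). In a real system the weights and latent vectors would be learned from logged replies, retweets, and favorites rather than written by hand.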

“If we can better understand what you are interested in,” says Davison, “we can decide what to filter, rank higher, or flag for your attention.”

Davison and Hong developed the algorithms using a machine learning approach. Instead of explicitly programming rules to assess how users respond to tweets, the algorithms are trained with data sets. In this case, the algorithms used past interactions to build and refine rules for individual users. A published description of their work was a finalist for best paper award at ACM’s Sixth International Conference on Web Search and Data Mining (WSDM) in Rome in 2013.
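The idea of learning from past interactions rather than hand-written rules can be illustrated with a very small per-user model. The sketch below trains a logistic regression by stochastic gradient descent on a made-up interaction log; it is an assumption-laden stand-in, not the algorithm from the WSDM paper, and the feature names, the data, and the functions `train_user_model` and `predict_engagement` are all hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_engagement(weights: List[float], bias: float, features: List[float]) -> float:
    """Estimated probability that the user engages with a tweet."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_user_model(
    interactions: List[Tuple[List[float], int]],
    n_features: int,
    epochs: int = 200,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit a per-user logistic regression from (features, engaged) pairs.

    Assumption: 'engaged' is 1 if the user replied, retweeted, or favorited
    the tweet, and 0 otherwise.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, engaged in interactions:
            error = engaged - predict_engagement(weights, bias, features)
            # Gradient step on the log-likelihood of this single example.
            bias += learning_rate * error
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
    return weights, bias


# Hypothetical log for one user. Features: [mentions_sports, mentions_politics].
past_interactions = [
    ([1.0, 0.0], 1),
    ([0.0, 1.0], 0),
    ([1.0, 1.0], 1),
    ([0.0, 0.0], 0),
]
user_weights, user_bias = train_user_model(past_interactions, n_features=2)
sports_probability = predict_engagement(user_weights, user_bias, [1.0, 0.0])
politics_probability = predict_engagement(user_weights, user_bias, [0.0, 1.0])
```

After training on this toy log, the model rates a sports-only tweet as more likely to draw engagement than a politics-only one, which is the kind of signal a feed could use to filter or rank incoming messages.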

The project has many implications, from refining social media feeds so you don’t miss the messages you really want to see, to helping news outlets provide personalized content.

Hong explains that while finding patterns in terabytes of data is a tremendous challenge, mining smaller data flows for individual patterns can be done more easily. A specific user is only interested in a tiny fraction of the Webosphere, so why not navigate the problem one narrow slice at a time? That’s what their approach does.

Recognizing the potential for customization to intensify the “filter bubble” effect, in which personalized news feeds create an echo chamber that omits important information and differing viewpoints, Davison is also undertaking another study to examine the way that people perceive bias in online news.

For now, Web companies will continue to deliver content based on a mixture of personalization and popularity, and it will be up to users to diversify their news feeds.

Applications: Artificial Intelligence, Data Mining

Technologies: Middleware

Sectors: Academia

Tags: algorithm, machine learning
