Comcast Develops Advanced Advertising Platform to Handle Real Time Big Data
In conjunction with MapR, Datanami presents Comcast with this month’s “Big Data All Star” award.
For Nathaniel Auvil, a Distinguished Engineer with the company’s Engineering and Platform Services Group, the work is about applying the latest in high performance computing (HPC) capabilities to Comcast’s advertising offerings.
Specifically, he has been designing and developing systems that enable Comcast to analyze data. For example, he has developed systems that enable advertising on Comcast’s IP-based systems, as well as a frequency capping system that limits the number of times an advertisement is played to a household. Another recent project was building a subscriber information service for use by Comcast’s video on demand (VOD) services.
Advanced Ad Platforms
“A lot of what we do is enabling delivery of those advertisements, as well as gathering and analyzing the operational data that this activity generates,” Auvil says. “The group’s biggest area of growth is focused on developing advanced advertising platforms. For example, we are moving beyond limitations such as showing the same ad to everyone in a specific region. With data analytics we learn more about our audiences, which allows our advertisers to tailor their offerings to appeal to those more finely honed consumer segments and boost their ad effectiveness.”
Many homes have set top boxes in multiple rooms. These devices generate terabytes of unstructured data that provides anonymized viewing preferences. The data can be aggregated and enriched with demographics to be more meaningful to advertisers while the privacy of audience members is protected.
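As a rough illustration of the aggregation and enrichment step described above, the sketch below rolls anonymized viewing events up by demographic segment. The event layout, the segment lookup table, and every name in it are hypothetical; the article does not describe Comcast's actual schema or code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (anonymized household key, program, seconds watched).
ViewingEvent = Tuple[str, str, int]


def aggregate_by_segment(
    events: Iterable[ViewingEvent],
    segment_of: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Sum seconds watched per (demographic segment, program).

    Individual household keys are dropped from the output, so only
    segment-level totals are exposed downstream.
    """
    totals: Counter = Counter()
    for household_key, program, seconds in events:
        # Households with no demographic record fall into an "unknown" bucket.
        segment = segment_of.get(household_key, "unknown")
        totals[(segment, program)] += seconds
    return dict(totals)


if __name__ == "__main__":
    sample_events = [
        ("hh-a1", "news", 600),
        ("hh-b2", "news", 300),
        ("hh-a1", "sports", 1200),
        ("hh-zz", "news", 60),
    ]
    sample_segments = {"hh-a1": "segment-1", "hh-b2": "segment-1"}
    print(aggregate_by_segment(sample_events, sample_segments))
```

Keying the output on segment rather than household is one simple way to keep the result useful to advertisers without exposing per-household viewing.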
The big challenge, he says, is to ingest, process and analyze all this data in close to real time in order to make ad delivery decisions on the spot.
Moving to the Cloud
Comcast is transitioning to a cloud-based delivery platform. The move will push the associated data flows into the hundreds of terabytes per day, approaching a petabyte per day in the not too distant future.
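To put those daily volumes in perspective, the short calculation below converts a daily total into the average sustained ingest rate it implies. It assumes decimal units (1 TB = 1,000 GB) and a perfectly even load over 24 hours, which real traffic will not have; the sample volumes are illustrative, not figures reported by Comcast.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate_gb_per_s(terabytes_per_day: float) -> float:
    """Average GB/s needed to ingest the given daily volume (1 TB = 1,000 GB)."""
    return terabytes_per_day * 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1,000 TB/day is one petabyte per day.
    for tb_per_day in (100, 500, 1000):
        rate = sustained_rate_gb_per_s(tb_per_day)
        print(f"{tb_per_day} TB/day is about {rate:.2f} GB/s sustained")
```

At a petabyte per day the average works out to roughly 11.6 GB/s, and peak rates would be higher than the average.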
Auvil and his team are building platforms to handle these constantly growing data sets in order to work on the fly with streaming data describing targeted customer demographics and preferences.
“We built a custom data ingestion layer that takes all posts and writes them to a local server-based shared nothing architecture so the data can be scaled horizontally,” he says. “The data is moved into Apache Kafka, a highly scalable distributed messaging system. Apache Storm allows us to process immense amounts of data in real time. The data can then be analyzed and subjected to business logic, targeting, frequency capping (a feature that limits the number of times the same ad is shown to a person on the network), and other solutions that drive the Comcast advertising platform.”
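Frequency capping, as described above, can be sketched as a sliding-window counter per viewer and ad. This is a minimal in-memory illustration, not Comcast's implementation: the class name, the parameters, and the choice of a sliding window are all assumptions, and a production system would presumably keep this state in a shared store rather than in process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class FrequencyCapper:
    """Allow at most `max_impressions` of an ad per viewer within a time window."""

    def __init__(self, max_impressions: int, window_seconds: float) -> None:
        self.max_impressions = max_impressions
        self.window_seconds = window_seconds
        # (viewer_id, ad_id) -> timestamps of impressions still inside the window
        self._shown: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def try_show(self, viewer_id: str, ad_id: str, now: float) -> bool:
        """Record an impression and return True if the cap allows it, else False."""
        times = self._shown[(viewer_id, ad_id)]
        # Discard impressions that have aged out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_impressions:
            return False
        times.append(now)
        return True


if __name__ == "__main__":
    capper = FrequencyCapper(max_impressions=2, window_seconds=3600)
    for t in (0, 10, 20, 3600):
        print(t, capper.try_show("household-1", "ad-42", now=float(t)))
```

Passing the timestamp in explicitly keeps the logic deterministic, which suits replaying events from a stream.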
Part of the data flow stores data in the platform’s MapR Distribution for Hadoop that runs Hive and Oozie (a job scheduler that supports ETL [extract, transform and load] database functions). This allows Comcast analysts to perform business analytics against the MapR Hadoop cluster. The group also uses MapR-DB – a key piece of the MapR implementation – to query a massive data store in real time to fine tune the ad delivery system.
Auvil adds, “We were already using Hadoop and MapR for our data warehouse so implementing those solutions for our advertising platform worked out very well, especially with the addition of HBase and MapR-DB. The combination gives us very high throughput and easily scales out. What is unique to us was tying together all these Big Data tools such as Apache Kafka, Apache Storm, the MapR distribution for Hadoop, and most importantly, MapR-DB.”
There is No Try
Beyond the ability to process massive amounts of data, Auvil cites the platform’s cost effectiveness and ease of development, along with confidence that it can scale as the organization’s needs ramp up. “This space that we operate in is constantly changing; we are continually adding new features, new data points, and constantly evolving the platform. And this activity will accelerate as the IP-based delivery systems take hold and allow us to do a whole host of things we couldn’t before.”
When asked what advice he would like to pass on to other technologists contemplating immersing themselves in the world of Big Data, Auvil recommends diving right in – don’t hesitate. He quotes Star Wars’ little green guru, Yoda: “Do or do not. There is no try.”