Big Data • Big Analytics • Big Insight

February 12, 2014

How SQL-on-JSON Analytics Bolstered a Business

Alex Woodie

Running a multi-channel loyalty and engagement platform requires a lot of data. With 18,000 loyalty programs running across Web, mobile, social, and email channels, the folks at PunchTab are up to their eyeballs in data for customers like Arby’s, Green Day, ConAgra Foods. But when it came to allowing customers to analyze all that data, its NoSQL-based transactional system came to a grinding halt.

If you have ever loaned your personal information to a retailer with the hopes of getting a discount or winning a prize, you may have worked with a PunchTab client. The company manages the loyalty and engagement campaigns for retailers and other companies, and along the way helps their clients stand out by their targeting their promotional efforts to specific groups.

From an operational point of view, MongoDB, a document-based NoSQL database, was an excellent choice for hosting this workload. PunchTab has several Mongo clusters that have performed very well at storing the millions of JSON files that PunchTab collects on behalf of its customers every day. That type of workload is right up MongoDB’s alley. But when the company tried to analyze the data in Mongo–such as trying to identify the most loyal customers or the biggest social influencers–the database failed.

“I love Mongo for the flexibility and the fact that it stores unstructured data. A document-based approach works very well,” says Mehdi Ait Oufikir, CTO and co-founder of PunchTab. “But as soon as we wanted to run analytics and reporting and BI on top of the Mongo infrastructure, then it broke. It basically was not able to deliver what we were expecting it to deliver.”

With PunchTab’s business growing, Ait Oufikir set out to find a solution that would give his clients the analytics and complex reporting capabilities they needed. While the data PunchTab stores is semi-structured binary JSON (or BSON), Ait Oufikir wanted to enable his customers to analyze it using traditional SQL techniques. And because all of PunchTab’s IT infrastructure is hosted in the cloud, it was also a requirement that the data warehouse live in the cloud.

Ait Oufikir’s searches led him to BitYota, a fellow tech startup based in Mountain View, California. BitYota is a hosted data warehouse service founded by former Yahoo employees with the aim of delivering SQL-based analytics across mounds of semi-structured data–often out of Mongo databases, which are popular for hosting Web 2.0 workloads.

BitYota says it designed its own hosted data warehouse from scratch, and that it’s differentiated by having a JSON access layer atop the data store. “We have some uniqueness where we operate SQL directly on JSON,” says BitYota CEO Dev Patel. “We don’t need to translate that data into a structured format like a CSV. We believe that if you transform the data, you will lose some of the data quality. And once that’s transformed, you won’t get it back.”

BitYota’s ability to analyze JSON from MongoDB was critical for PunchTab. Every time one of its customers’ customers participates in a loyalty campaign by downloading an app, responding to an email, or pressing a Facebook “like” button embedded on a Web page, it generates another JSON document. Mongo did an admirable job of storing the JSON documents, but analyzing the data was a challenge.

Once PunchTab started loading its Mongo-based JSON data into BitYota (using Amazon S3 as a conduit), the company developed various Web-based dashboards that allowed their customers to query their data. “All our reporting and analytics are built on top of BitYota and goes through a simple ODBC and JDBC APIs,” Ait Oufikir says. “They’re able to see who’s doing what, who are the influencers in the program, what are they’re doing.”

With just 1TB of data, PunchTab’s is not necessarily a classic “big data” problem. But when one considers that the 1TB of data is composed of more than a billion individual records that need to be mined, and that millions of new JSON documents are added every day, you understand why the company needed a solution that grew out of big data analytics technology.

Ait Oufikir would have stuck with MongoDB if it had delivered the goods. But the lack of an SQL interface in MongoDB would have required his customers to adopt different techniques to get the data they needed, such as writing MapReduce jobs. But adding MapReduce to the mix was “out of the question,” he says.

BitYota is similarly dedicated to pure, unadulterated SQL. “The focus on SQL,” Patel tells Datanami. “Not SQL-like or another new language that a developer or an analyst or a BI tool needs to learn. Rather, the language is SQL, and I mean ASNI SQL, and that’s what we conform to. Those are the principals in how we started the company.”

Patel is adamant that writing queries in a declarative language like SQL is preferable to writing them in a procedural language like Java or C++. The “time to analytics” is too long with procedural language because “every time your query changes, you have to write your code again.”

In the future, PunchTab may be looking to bring ad hoc analytics into the mix. BitYota operates on a pay-as-you-go model, so any increase in customer workloads translates into an incremental uptick in their bill. For now, PunchTab has been running BitYota’s data warehouse and analytics service for about a year, and is “super happy” with the service, Ait Oufikir says. “It solved my problem. Our customers use it every day. I don’t interact with them, because they’re always up and running.”

Related Items:

MapReduce Alternatives in Bioinformatics

I Didn’t Know Big Data Could Do That!

Alpine Debuts ‘Chorus’ Line for Big Data

Tags: , , ,