Training Firm’s Unexpected Growth Exposes Weakness of Relational Tech
When CPL Training Group started providing Web-based training several years ago, the UK company expected strong but steady growth. But when unexpected demand threatened to freeze the relational database serving the application, the company turned to big data tech for solutions.
If you work in certain fields in the UK, you’re required by law to receive a certification, or a “personal license,” that shows you have received a minimum level of training. So bartenders, for instance, must possess personal licenses to assure the public that alcohol sales will be handled professionally, and there are similar standards to ensure food safety as well.
CPL Training Group is one of the biggest providers of personal license training in the UK, serving companies not only in the retail and hospitality industries, but also security and care sectors. Companies like Britannia Hotel Group and Admiral Taverns rely on CPL Training to ensure their workers are trained and licensed in accordance with the law.
This training was done mostly in a face-to-face manner until 2010, when the company started offering training over the Web through a new subsidiary called CPL Online. The new offering started out slowly the first year, when the Microsoft (NASDAQ: MSFT) SQL Server-powered system delivered 20,000 completed courses. By 2011, that number was up to about 35,000—a nice growth curve, but nothing the system and the IT team couldn’t handle.
But things were about to change.
The relational database started showing signs of stress in 2012, when the company doubled the size of its business compared to the prior year. David Dasher, the managing director of CPL Online, explains the impact.
“Coming into 2012, we were starting to see not just the number of people doing courses going through the roof, but we were also seeing a lot more people looking at compliance and getting visits from the legal authorities,” Dasher tells Datanami. “We were getting hit with questions such as ‘What’s the status of your staff? Are they compliant? Can you prove it?’ So we were doing much more analysis and reporting.”
Handling the daily production side of the business, as well as the reporting aspects, proved to be too much for the single SQL Server database to bear. In particular, the daily update process, which involved uploading personnel changes from every client’s payroll system, had to be handled with great care.
“We were starting to see problems on the SQL Server side, where the database was grinding to a halt and we had to be really careful about when we imported clients,” Dasher says. “Each of those tasks has a weight to it in terms of doing a write to the database, updating indexes, etc. So when you’ve got 100 to 200 clients all trying to import data…just the sheer number of requests going into the database, we were starting to see record locks.”
Dasher and his team of SQL experts worked diligently to make their relational database run as smoothly as possible. “We tried pre-calculating data. We tried better procedures,” he says. “We tried everything we could to make it as efficient as possible, but at the end of the day, we’re using a transactional database to do everything in one go. And it’s almost like we were victims of our own success. We were growing that quickly.”
Struggle for Survival
CPL Training and its online subsidiary had found a winning business model. The combination of clever and engaging training videos and a sophisticated reporting system gave all parties involved—the workers, their supervisors, and the authorities—what they wanted.
On the surface, the business appeared to be thriving, but under the surface was a struggle for survival. “Our motto is ‘Exceed everybody’s expectation.’ We pride ourselves on good customer service and reliability,” Dasher says. “But behind the scenes we were really, really struggling. We had new clients to set up, but the database was struggling. We can’t turn it off to do database maintenance because we’re open 24/7.”
One of the options available to Dasher and his team was to simplify the data-gathering process. Depending on the size of the course, CPL Online records anywhere from 200 to 3,500 rows of data, essentially every click of the mouse and tap of the keyboard. The company could whittle that IoT-like data stream down to one row of data (pass or fail, essentially), but that was a compromise the company wasn’t willing to make.
The fine-grained detail was essential to the CPL Online business model, Dasher says. “A big part of what we’re selling is we could say to our clients, ‘This person really struggled, they changed their mind three times on each answer, they’re taking twice as long as everybody else, and that person might need a bit of help,’ which is really important,” he says. “So we generally weren’t willing to compromise on that.”
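To make the value of that per-click detail concrete, here is a minimal Python sketch of how the two signals Dasher mentions (repeated answer changes and taking twice as long as everybody else) could be derived from raw event rows. It is an illustration only: the event layout, the function name `flag_struggling`, and the thresholds are assumptions made for this example, not CPL Online’s schema or code, and the company’s own analysis runs on HPCC rather than in Python.

```python
from collections import defaultdict
from statistics import median


def flag_struggling(events, change_threshold=3, slow_factor=2.0):
    """Return {learner_id: [reasons]} for learners who may need help.

    `events` is an iterable of (learner_id, question_id, action, seconds)
    tuples. This layout is hypothetical, not CPL Online's actual schema.
    """
    total_time = defaultdict(float)  # seconds spent per learner
    changes = defaultdict(int)       # answer changes per learner
    questions = defaultdict(set)     # distinct questions seen per learner

    for learner, question, action, seconds in events:
        total_time[learner] += seconds
        questions[learner].add(question)
        if action == "answer_change":
            changes[learner] += 1

    if not total_time:
        return {}

    # Median course time across the cohort stands in for "everybody else".
    typical_time = median(total_time.values())

    flagged = {}
    for learner, spent in total_time.items():
        reasons = []
        # Average number of answer changes per question (assumed threshold).
        if changes[learner] / len(questions[learner]) >= change_threshold:
            reasons.append("changed answers repeatedly")
        # Took at least `slow_factor` times the cohort median (assumed threshold).
        if typical_time > 0 and spent >= slow_factor * typical_time:
            reasons.append("much slower than the cohort")
        if reasons:
            flagged[learner] = reasons
    return flagged
```

Neither signal can be computed from a single pass-or-fail row per course, which is the trade-off the company declined to make.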
High Performance Offload
Instead of dumbing down the system, the company looked to a clustered big data system that could handle the granularity of data collection it desired, as well as the scalability it demanded. It looked at several open source offerings, including one based on Hadoop and another from HPCC Systems.
In the end, the company chose HPCC, the big data platform that information services company LexisNexis has been developing for more than 15 years. “We certainly looked at Hadoop,” Dasher says. “But compared to HPCC, we felt it was a little bit too complicated. We didn’t feel as comfortable. Roxie [the big data delivery system of HPCC] was a big selling point for us. It just fit in with our way of thinking.”
In 2014, the company turned on its new HPCC cluster, a 24-node affair that included Roxie and Thor, HPCC’s data preparation component. The cluster handles CPL Online’s reporting workload, as well as data aggregation for clickstream analysis, which also involves a hosted Microsoft Azure environment.
The HPCC system is hooked into the SQL Server via APIs, and has offloaded 70 percent of the work from the old relational database, according to Dasher. “It gave us a huge amount of breathing space to actually just stop and think,” he says. “Since then, we have grown the size of the SQL Server, which we would normally do. But most of the work is done now on HPCC.”
CPL Online is still growing quickly, and the demands on the database are extensive. It delivered about 4.5 million session starts last year, and expects to deliver about 6 million this year, with about 1.3 million completed courses. In addition to the Web interface, it’s offering mobile interfaces too, and it has also expanded into other areas besides e-learning, including conducting appraisals, hosting events, and taking surveys.
The team possibly could have made this all work with traditional relational technology, but it would have required hiring many more SQL coders, not to mention lowering expectations, Dasher says. “Writing in SQL can be very efficient. However, if you use ECL [Enterprise Control Language] you can just do things so much more efficiently,” Dasher says. “I can tell you, without a doubt, that ECL is the best language I’ve written with in my entire career. You can do things that are just not physically possible with SQL, things that would take 10 times longer.”
A year after first going live on the big data cluster, CPL Online expanded the HPCC environment to 70 nodes, and recently it was increased to 120 nodes. CPL Online’s customers still require individual attention to address their specific reporting needs, but thanks to the power of ECL, those one-off reports are no longer such a big deal.
“It’s a great revenue stream because we charge for that service, and it pays for the team,” Dasher says. “We’ve really been able to say with a smile on our face, ‘Keep throwing what you need at us.’”
With so much testing data housed in HPCC, CPL Online has begun harnessing it for predictive capabilities, including detecting suspicious activity. If a test taker flies through questions that trip others up, it indicates the possibility of cheating. “We’ve caught a few situations where they take every question and put all the possible answers on the wall,” Dasher says. “If they have cheated, that puts our company at risk. We don’t want to check the box and say you passed.”
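The heuristic described here, fast correct answers on questions that trip others up, can be sketched in a few lines of Python. This is a hedged illustration, not CPL Online’s detection logic: the response layout, the function name `suspicious_learners`, and every threshold below are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def suspicious_learners(responses, hard_accuracy=0.5, fast_fraction=0.25, min_hits=2):
    """Return a sorted list of learners with unusually fast correct answers
    on hard questions.

    `responses` is an iterable of (learner_id, question_id, correct, seconds)
    tuples. This layout and all thresholds are hypothetical.
    """
    by_question = defaultdict(list)
    for learner, question, correct, seconds in responses:
        by_question[question].append((learner, correct, seconds))

    hits = defaultdict(int)
    for rows in by_question.values():
        # A question counts as "hard" when at most `hard_accuracy` of the
        # cohort answered it correctly.
        accuracy = sum(1 for _, correct, _ in rows if correct) / len(rows)
        if accuracy > hard_accuracy:
            continue
        typical = median(seconds for _, _, seconds in rows)
        for learner, correct, seconds in rows:
            # Correct and far quicker than the typical time on this question.
            if correct and seconds < fast_fraction * typical:
                hits[learner] += 1

    # Require several hits so a single lucky guess does not flag anyone.
    return sorted(learner for learner, count in hits.items() if count >= min_hits)
```

A flag from a rule like this is only a prompt for human review; a quick, correct answer may simply mean the learner already knows the material.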
On the flip side, this predictive power also enables CPL Online to identify people who are good candidates to skip the e-learning component and go straight to the testing. For some of CPL Online’s larger clients, who have 20,000-plus employees, this has the potential to save the client tens of thousands of British pounds per year.
And in addition to being an HPCC Systems customer, CPL Online is also a partner. “We have a lot of big clients, all these massive companies who vet us very thoroughly, who are now looking at us saying ‘What’s the secret? Show us your infrastructure,’” Dasher says. “Customers are coming to us saying ‘We have this problem on our system, can you help us?’ We’re hosting multiple instances now, where we’re offering those services to clients.”
Like many companies that have pursued digital business strategies, CPL Online struggled to store and process data efficiently. But somewhere along its journey, big data went from being a challenge that threatened the company’s existence to a business opportunity to be exploited.