Big Data University Programs Get Real
The sky-high demand for data scientists is driving universities and big data companies to launch new educational and certification programs. What’s different about the programs recently launched by the University of Southern California, Stanford University, Colorado Technical University, and Hadoop vendor Cloudera is the focus on giving people real-world business skills, such as creating big data business strategies in financial services or detecting fraud in Medicare data.
Last week the USC Marshall School of Business announced its Master of Science in Business Analytics. The one-year program will be open to about 30 to 40 students, and will center on creating the next generation of data scientists who are ready to apply their big data skills the first day on the job, says professor Yehuda Bassok, chair of USC Marshall’s Department of Data Sciences and Operations.
“When I speak with companies, they say, ‘You have good students, but when they come to work for us, they are paralyzed. They don’t know how to analyze the data, how to structure it, and find patterns in it,'” Bassok tells Datanami. “At the end of the day, we have an issue of big data. There is a lot of data which didn’t exist earlier. The data is not structured, and the question is, what do you do with it?”
Bassok hopes his new program will produce business-oriented data scientists who have the requisite technical abilities but also the business savvy to work in the real world. Finding that balance is not easy. Other university-level graduate programs focus too much on the technical side, on programming databases and working with products, Bassok says. The USC program will include some of that too, with big doses of math, statistics, and programming, plus classes in machine learning and optimization. Students will learn to program in R, which is “essential,” and they will learn how to work with databases and visualization tools.
But the new USC program aims to equip data scientists with big-picture capabilities. “I think most universities are looking at the more [technical] micro-level, which is important, and we will do it too,” Bassok says. “But I think very few are looking into the higher level of developing a big data strategy at a company, and we are teaching these classes.”
The demand for data-savvy MBAs took USC by surprise. When the Marshall School of Business offered a class in R recently, the professors didn’t think many people would show up, so they scheduled it for a Saturday in a small computer lab and charged a $100 fee. The class filled up in just two hours. Bassok expects MBA students will make up about a third of the class (with four-year degree holders and working professionals making up the balance), and some of the course work will apply to the school’s MBA programs.
Along with the technical courses will be courses in applying big data analytics in specific fields, including marketing, healthcare, and financial services. “We will give it the flavor of business, without compromising on the technical ability,” Bassok says. “It’s not a new MBA. It’s not soft. We’re a business school and we want to concentrate on the business application.”
Cloudera also hopes to highlight data scientists with the right stuff for business through its new Cloudera Certified Professional: Data Scientist (CCP:DS) certification program. Announced in late March, the CCP:DS program is designed to help developers, analysts, statisticians, and engineers get experience with big data tools and techniques as used in the real world.
The first step in getting the CCP:DS certification is passing the Data Science Essentials certification exam. The company offers three-day Introduction to Data Science courses that focus on teaching how to build machine learning models and to implement recommender systems using tools like Hadoop, Python, and Mahout. The company also provides practice tests to help students get ready.
The second and final step in the certification process is completing a Cloudera Data Science Challenge, a “live data” challenge that the Hadoop distributor offers twice a year. The latest challenge, which opened for three months on March 31, is called “Detecting Anomalies in Medicare Claims.” The challenge, designed by Cloudera’s director of data science Sean Owen (with whom we recently spoke about Oryx), asks aspiring data scientists to detect possible errors and anomalies in Medicare claims using a massive set of anonymized healthcare data.
This type of simulated big data training prepares data scientists for real-world challenges far better than answering a multiple-choice test could. “From my perspective, this makes the exercise much more compelling, valuable, and meaningful than any other certification available today. You are actually solving problems through data analysis in a full simulation of situations data scientists face in the field,” says Luis Quintela, a Samsung data scientist who is one of the early CCP:DS holders.
Stuart Horsman, a Cloudera data scientist, says he’s “pumped” to earn the CCP:DS credential. “It holds true weight in the market because it replicates a real, sufficiently difficult big data scenario I would see on the job and requires a professional-level approach to solving problems,” he says. “The exam captured all the relevant elements of data science and machine learning, and the challenge made the experience completely non-trivial.”
The real-world demand for data scientists recently drove Stanford University’s Center for Professional Development (SCPD) to expand two graduate-level big data programs: Mining Massive Datasets, which is offered through the university’s computer science department, and Data Mining and Applications, which is offered through its statistics department.
As SCPD program manager Lewis Kaneshiro explains, the capability to offer these programs online via the Stanford OpenEdx platform helps people who are already working in the real world get the same experience as fully matriculated Stanford students attending class in Palo Alto, California.
“We feel the opportunity cost of pursuing graduate studies full-time is prohibitively high for many working professionals,” Kaneshiro says via email. “The flexibility of pursuing graduate courses alongside fully-matriculated Stanford students while applying concepts directly to real-world problems immediately is attractive.”
While many university programs focus on building hands-on, practical experience, others continue to push academic research. For example, Colorado Technical University is finishing up the first year of its Ph.D.-level big data program, which it launched last fall.
According to Myles Vogel, the dean of the school of Information Technology, Computer Science, and Engineering, the new Doctor of Computer Science–Big Data Analytics (DCS-BDA) program is one of the first such doctorate-level big data degrees available in the country.
The program, which is conducted entirely online, is based on the practitioner scholars model, and requires successful defense of a research proposal and final dissertation. The for-profit school’s master’s-level classes are more focused on problem-based learning, as well as solving business problems with data analytics.