January 27, 2014

A Deluge of Data and a Dearth of Data Scientists

SDSC’s 2014 Series of Data Mining Boot Camps Kicks Off February 26-27


Each day, our society creates 2.5 quintillion bytes of data (that’s 25 followed by 17 zeros). With this glut of data, the need to make sense of it becomes more acute, and the demand for data scientists steadily increases. Conventional statistical analysis and business intelligence software, although useful, are not designed to capture, curate, manage and process the large quantities of data generated by most enterprises. Data mining and predictive modeling, now commonly referred to as data science, can automatically extract knowledge hidden deep in the data, enabling the discovery of new insights not otherwise attainable.

According to the Harvard Business Review and Fortune magazine, a career as a data scientist is “the” job to have in the 21st century. At the same time, the McKinsey Global Institute’s Big Data Report notes that by 2018, the U.S. alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of Big Data to make effective decisions.

Today, data is growing rapidly, doubling every nine months, and the scarcity of qualified data scientists is already cause for concern. To extract meaningful information from large amounts of data, researchers must be able to review and analyze it automatically. Large data sources create great opportunities to apply data mining methods that inspire new inquiries, both to solve many of society’s most pressing challenges and to speed understanding and innovation in the marketplace.

To be concise, we have a deluge of data and a dearth of data scientists.

Who are these mysterious data scientists? What kinds of skills should they have? What are their characteristics and qualities? Where can you find them?

Keenly aware of the importance of answering these questions and more, the San Diego Supercomputer Center (SDSC), an Organized Research Unit of UC San Diego, established a new resource for the Big Data community: the Predictive Analytics Center of Excellence (PACE). The mission of PACE is to help create the 21st century workforce of data researchers by leading a collaborative, nationwide education and training effort among academia, industry and government via a novel, multi-level curriculum.

SDSC is considered a leader in data-intensive computing and all aspects of Big Data, e.g., data integration, performance modeling and data mining. SDSC established PACE to share its expertise and best practices and to prepare professionals to innovate and compete in the global digital marketplace. PACE leverages both the center’s data-intensive expertise and the capabilities of Gordon, SDSC’s data-intensive supercomputer, to build new Big Data partnerships with science, health and business enterprises.

As part of its commitment to education and training, PACE developed a comprehensive, two-part series of two-day “Data Mining Boot Camps” held at SDSC. Launched in October 2012, the Boot Camps (BCs) help organizations expand the analytical skills of their own subject matter experts to develop a built-in pool of talented data scientists, and prepare managers and analysts to dive deeply into their Big Data to make wise business decisions. There has been a tremendous amount of interest from both industry and academia, including many UC San Diego colleagues. The BCs have also attracted a range of industry participants, including some from unanticipated business sectors such as the utilities, food services and gaming industries.

Participants gain a clear understanding of how to rapidly interpret the burgeoning amounts of data flowing from an array of technologies, from smartphones to smart grids, and learn the critical skills to design, build, verify and test predictive data models. The BCs provide conceptual and hands-on training with critical predictive analytic tools and techniques that non-computer science professionals can use to discover patterns and relationships, which in turn translate to accurate, actionable and agile data.

The comprehensive hands-on BC curriculum is an outgrowth of the data mining certification courses offered through the UC San Diego Extension. The instructors for the data certification courses also lead the BC training. The BCs cover basic data mining, data analysis, pattern recognition concepts and predictive modeling algorithms so that a participant can explore and implement analyses on their data. Moreover, participants are able to sharpen their skills, apply data mining algorithms to real data and interpret the results.

PACE BCs are unique in their teaching methods, format and size. The classroom setting allows the instructors to work one-on-one with participants during the hands-on training. Boot Camp training includes:

  • Overview: Data Mining, Machine Learning and Statistics
  • Overview of CRISP-DM: Cross Industry Standard Process for Data Mining
  • Introduction to Data Mining Tools
  • Data Preparation
  • Learning Algorithms Implementations
  • Model Evaluation and Validation
  • Data Mining Trends, Applications and Guidelines
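For readers unfamiliar with these topics, the following is a minimal sketch of how data preparation, a learning algorithm and model evaluation fit together. It is not taken from the Boot Camp materials: the nearest-centroid classifier, the function names and the tiny made-up customer data set are all assumptions chosen for illustration, using only the Python standard library.

```python
import random

def min_max_scale(rows):
    """Data preparation: rescale each feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def train_centroids(features, labels):
    """Learning: compute the mean feature vector (centroid) of each class."""
    grouped = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(col) for col in zip(*xs))
            for y, xs in grouped.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def holdout_accuracy(features, labels, test_fraction=0.25, seed=0):
    """Evaluation: train on one part of the data, score on the held-out rest."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    centroids = train_centroids([features[i] for i in train],
                                [labels[i] for i in train])
    hits = sum(predict(centroids, features[i]) == labels[i] for i in test)
    return hits / n_test

# Hypothetical data: (monthly usage, support calls) per customer, invented
# for this example only.
raw = [(100, 1), (120, 2), (110, 1), (130, 2),
       (900, 8), (950, 9), (880, 7), (920, 9)]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]

# For brevity the scaling uses all rows; in practice it would be fitted on
# the training rows only, so the test rows stay unseen.
scaled = min_max_scale(raw)
accuracy = holdout_accuracy(scaled, labels)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The point of holding out part of the data is that a model is judged on records it did not see during training, which is the core idea behind the model evaluation and validation topic listed above.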

PACE also conducts knowledge mining workshops and onsite training tailored to meet an organization’s core research and business objectives so that their professionals can quickly gain relevant insight to reach better decisions, faster.

