So You Want To Be a Data Scientist: A Guide for College Grads
Congratulations, recent college graduate, and welcome to the workforce! Of all the jobs that you’ll apply for, the one with the sexy title “data scientist” may be the toughest to get, and potentially the most rewarding, too. But never fear: Datanami is here with advice from actual data scientists on how to become one of them.
The first piece of advice for budding data scientists is not to get frustrated by the job requirements. No recent college grad is simultaneously a math/statistics genius, an expert in marketing/derivatives/cybersecurity, and a pro Python/Java/R coder. (Hint: That’s why data scientists are called unicorns: they don’t exist!)
“There are many skills under the umbrella of data science, and we should not expect any one single person to be a master of them all,” says Kirk Borne, a data scientist with Booz Allen Hamilton. “The best solution to the data science talent shortage is a team of data scientists. So I suggest that you become expert in two or more skill areas, but also have a working knowledge of the others.”
According to Borne, you’ll do yourself a favor by boning up on core data science skills such as machine learning, information retrieval, statistics, and data and information visualization. You’ll also want to know your way around databases and data structures, and to have a programming language or two under your belt, such as Python, R, or SAS, along with big data tools such as Spark. Familiarity with graph analysis, natural language processing, and optimization also looks good on your data science resume, as do data modeling and simulation.
“The good news for physics, biology, astronomy, chemistry, and other science students is that they can easily translate their science skills into a data science profession,” he says.
Should You Go Back to School?
While a number of doctoral programs in data science have popped up recently to help ease the unicorn shortage, you won’t want to stay in school for too long. According to Borne, a master’s degree is ideal.
“These days more and more organizations are willing to hire data scientists with little course work and with some experience, without an advanced degree,” he tells Datanami. “The degree will eventually be very important for career advancement (perhaps most importantly an MBA, many of which now include business analytics), so don’t avoid getting your degree; it just doesn’t have to come before your first data science job.”
That assessment is echoed by Ashish Thusoo, the CEO of Qubole, a hosted Hadoop service provider. While a solid background in math, data mining, statistics, probability theory, and SQL is required, data scientists will eventually need to venture forth from the ivory tower into industry to get their hands on the most important element: interesting data.
“Learning these skills in industry is very important,” says Thusoo, who is also the co-creator of Apache Hive. “You [need to] have strong fundamentals. But in order to apply those skills, you need to get access to data. A lot of interesting data sets are tied up in industry. This was not true 20 or 30 years ago, when a lot of interesting data sets would be in academia.”
Many of today’s top data scientists didn’t go to school to become data scientists. Instead, they went to school to become computer scientists, astrophysicists (like Borne), chemical engineers, or theoretical physicists. As the world evolved, those hard science and math skills proved invaluable in manipulating the ever-growing wave of data.
“More important than anything else is being able to think around data,” Thusoo says. “I think the tools and languages, those things you can pick up. A random forest algorithm is a random forest algorithm, whether it’s implemented in Python or Scala or Java or any other language. You need to understand where to use that particular technique, rather than how to code that technique.”
Statistics also plays a critical role in big data, says Dr. Monnie McGee, associate professor of statistical science at Southern Methodist University and program director for the university’s Master of Science in Data Science Program.
“Both statistics and computer science are important skills,” she says. “However, it’s my belief, and perhaps this shows my bias as a statistician, that individuals with statistics training are sorely needed in the field of data science.”
Having statistics training doesn’t just mean being able to apply the correct statistical method or run the software, she says. “I mean the ability to formulate a hypothesis that can be tested, to gather the data properly, to design a plan for estimating what is signal and what is noise, and to interpret the result in terms of the context of the problem,” she says.
Patience, Young Grasshopper
Don’t expect to solve the world’s data science problems when you’re 22. Becoming a data scientist takes years of training and experience, and a good amount of failure and perseverance doesn’t hurt!
Shlomo Engelson Argamon, a professor of computer science and director of the Master of Data Science program at Illinois Institute of Technology, says being successful in data science requires expertise in a number of different areas. “This is an enormous number of diverse skills and tools,” Argamon continues. “It takes many years of experience to develop real depth in them. The key for the budding data scientist, however, is to have a strong grasp of fundamental principles in each area, as well as the ability to use one or two methods and tools. Other tools and techniques can be easily picked up, provided one’s understanding of the fundamentals is good.”
It’s been predicted, in this publication and elsewhere, that advances in software will eventually replace the need for skilled data scientists. Opinions on that among our ad hoc group of actual data scientists are mixed.
According to Borne, the fact that data scientists have a solid base of experience and are literate in things like statistics, machine learning, and data manipulation gives them an advantage that’s separate from, and above, what any software package can bring to the table. “Armed with these talents and aptitudes, the agile data scientist can learn and apply new software packages, can learn and apply new programming skills, and can learn and apply new approaches that are created by the brilliant analytic minds in numerous organizations,” he says.
“Therefore,” Borne continues, “advances in analytic packages will not replace the need for data scientists, as some folks have predicted. But these advances will definitely replace the need for some of the data scientist’s skills (such as Java or Hadoop), though not all of their skills: I think that we all will need to know a programming language (Python, R, or SAS) and also SQL for the foreseeable future.”
Soft Skills Matter
You may be a hard-core quant able to leap billions of rows of data in a single bound. But that doesn’t automatically translate into success on the data science circuit. Beyond the fundamental math and technology skills, there are also “soft skills” that come into play, such as humility, curiosity, and determination, according to SMU’s McGee.
“Humility is necessary because often the data do not tell us what we want to hear,” she says. “We have to be humble enough to accept that and interpret what the data are actually telling us. Curiosity because it is important to keep asking questions about the world around us, and about how to find the answers to those questions. And determination because those answers aren’t readily available, and sometimes the data needed to answer the questions aren’t available either. The data scientist has to keep telling herself, ‘I know there is a way to do this,’ and to keep plugging away until she figures it out. And if the data scientist was wrong? There isn’t a way? See trait one.”
IIT’s Argamon encourages budding data scientists to persevere through the tough parts of the job. “The vast majority of work in any data analysis is ‘data drudgery’: reformatting messy data sets, figuring out how to combine different data formats, dealing with erroneous or missing data elements, exploring the overall shape of the data, testing and discarding different models, and so on,” he says. “If you want to be successful at finding the insights that the data are hiding, you must be energetic and tenacious. This is a quality that cannot be taught or learned in any educational program, though it can be developed through assiduous practice.”
“My number one piece of advice always is to follow your passions first,” says Borne, who recently left George Mason University to practice data science in the private sector at Booz Allen Hamilton. “Know what you are good at and what you care about, and pursue that.”
Figuring out which direction to take is a challenge that all budding data scientists must tackle. You might be naturally blessed with talent in math, problem-solving, and communications, or have refined your programming and data manipulation skills in college or grad school. Luckily, you can put these skills to use in a variety of industries, from scientific research and cybersecurity to marketing and finance.
Become A Data Scientist, Be Thankful
Being a data scientist is a dream job for a lot of people (maybe even you!), and Borne says to keep that privilege in mind. “As a successful data scientist, your day can begin and end with you counting your blessings that you are living your dream by solving real-world problems with data,” he says.
Borne is reminded of a Fast Company story about Jeffrey Hammerbacher, the gifted data scientist who left Facebook to help found Cloudera: “If you think your scarce [data science] skills could be better used elsewhere, be bold and make the move.”