August 31, 2017

Continuing Your Data Science Education


“What do you want to be when you grow up?” It’s a question kids around the world are asked a million times a day. Until a few years ago, the odds are good that none of them answered “I want to be a data scientist.” How, then, did we get any data scientists? For many, the answer lies in continuing education.

Over the last few years, as demand for data scientists heated up, universities added full-fledged post-graduate programs in data science. You can now get a PhD in data science, which is certainly a great way to kick-start your career as a data scientist, although Master’s-level programs are more prevalent and can also get you where you want to go.

Having a framed data science degree from a prominent university hanging on your wall is certainly nice, but it’s definitely not the only available path for those looking to become data scientists. In fact, recent years have seen a proliferation of online courses and bootcamps dedicated to helping further the education and training of prospective data scientists.

Datanami recently interviewed a pair of students of the data science training site DataCamp to get their perspective on the data scientist’s job, the importance of continuing education in data science, and which courses have been the most beneficial.

Fruit of One’s Labor

Cameron White earned a degree in mathematics from Western Kentucky University, and was on a path to become a community college instructor. One day, out of the blue, he got an email from his former statistics professor asking whether he’d be interested in an internship at Fruit of the Loom.

He joined the clothing giant and, after completing the internship, was offered a full-time job as a business analyst. White was eventually asked to join the company’s fledgling data science group, so he looked to bolster his skills in that area. With his mathematical background, R was a natural fit, and he sought more training with the language.

White found DataCamp to be a good source of education and training for R. As data science took off around 2015, the number of R packages exploded, and selecting which tools to use became tough. DataCamp helps to winnow that huge pool of possibilities down into something more manageable.

“At DataCamp, the courses will use specific packages, so I at least have a foundation for how I might solve a certain problem,” he says. “It’s much easier knowing at least what tools to use. Without DataCamp, I’d be completely lost.”

One of the useful tools that White learned about at DataCamp is ggvis, a data visualization package for R. “I did the first chapter on ggvis and it was really great,” he says. “I went to work the next day and made a lot of graphs that my co-workers were really impressed with. From then, I decided DataCamp is the one for me, because I started applying the things I was learning immediately at work. It really helped a lot.”

White is making his way through the courses, and gradually adding data science skills to his resume. The fact that DataCamp courses are often taught by the developers of the tools counts as a big plus in his book.

Cameron White, business analyst, Fruit of the Loom

“It’s great to take a short four to five hour course and learn a totally new package that I’ve never learned before,” he says. “A lot of time it’s taught by the package creator, so they show all features and things that would be kind of tricky to discover.”

Balancing the demands of work and home is tough for anybody, but it’s even tougher for a young father like White. With an unlimited package that costs $30 per month, White, who is 30 years old, can fit in data science courses when he can.

“If I only have 20 minutes, then I can spend 20 minutes and actually get things done,” he says. “I don’t have to devote an hour a week to this or that. There are no deadlines.”

For White, who admittedly didn’t particularly excel in high school, finding his place in the world of data science has been a blessing.

“Up until the day I got that email, I had no idea I’d be working in Bowling Green, Kentucky using my math degree. I didn’t even realize that Fruit of the Loom was headquartered in my town,” he says. “I feel like I won the lottery.”

Striving to Learn

Godefroy Clair was studying for his PhD in economics in a French university about six years ago when something both horrible and wonderful happened: he got bored. So he decided to do something else.

Clair had taken some computer science classes before, and that seemed to be the right direction. So he signed on to continue his education at the Conservatoire National des Arts et Métiers (National Conservatory of Arts and Crafts). As it happens, the institute was involved in some pioneering work in applied statistics, and the field quickly appealed to him.

Godefroy Clair is CTO at Flylab

“I thought to be a data scientist would be a great place, because it’s a mix of all the fields that I have experience in, even though I didn’t really know what it was at the beginning,” Clair tells Datanami. “There’s some math, some statistics, there’s some computer science, and you have to bring a lot of passion.”

As he began studying data science, the technology took off. “Suddenly a lot of libraries emerge and a lot of new tools came up and made everything more sexy,” the Frenchman says. “It was more and more possible to use black box libraries to help you analyze quickly a lot of data in a much more efficient way.”

Clair – who is currently CTO at Flylab, a drone analytics company based in Paris – sought more sources to continue his data science education. But he was growing frustrated at the lack of educational material around many of the new packages that were emerging, such as ggplot2, a data visualization package for R created by Hadley Wickham.

“There was a lot of new stuff coming up but you couldn’t find any training for it,” he says. “You could only get a little bit of it.”

At this point, having selected R as his primary data science language, Clair discovered DataCamp. He was happy to find courses that included material on the latest tools, often taught by the creators of the tools themselves. That meant Clair could learn ggplot2 tips and techniques directly from Wickham, who is chief scientist at RStudio and also an adjunct professor at the University of Auckland, Stanford University, and Rice University.

“It was super up-to-date with what was new in the language,” Clair says of the DataCamp courses. “All the best R developers are giving course on the platform. Obviously it’s a plus to learn from the people who make the thing.”

While he prefers R, Clair is also learning some Python, and appreciates the partnership that DataCamp has with Anaconda (formerly Continuum Analytics), which distributes the popular Python package of the same name.

Clair dedicates one hour a day, every day, to DataCamp. In particular, he appreciates the way the website presents data science exercises to students to help them practice the discipline. For Clair, who learned Japanese as a foreign exchange student for a couple of years, this approach has worked well.

“When you learn a foreign language, you really understand that practice is the most important thing,” he says. “Your brain, by doing and redoing and redoing, always with small variations, is going to train the neurons on how to build a sentence in Japanese. I cannot emphasize enough how it’s the same in data science and computer science.”

Having a broad foundation of education and experience to draw from is also an important trait that will help your data science career, Clair says. “Every new case is going to draw you into a new field or a new domain. Today I’m working with wine and spirits. I have to understand how to make champagne and improve it. And tomorrow, I’ll work with drones in warehouses and maybe work in a new field.”

Clair, who is 37 years old, doesn’t seem to have any misgivings about giving up a PhD in economics to take a chance on data science. DataCamp has helped Clair further his career in data science, but his advice is applicable to all fields: “In the future,” he says, “everybody will continue training through his life and will have to mix different kinds of mediums to build his continuous training.”
