People to Watch 2017
Co-Founder and Chief Data Scientist
Travis Oliphant achieved some fame as the creator of NumPy, a popular Python library for numerical computing. But he became a bona fide data science visionary for his recognition that the Python community needed a unified package, which he and Continuum co-founder Peter Wang delivered in Conda. Today, the Anaconda distribution is a must-have tool for anybody doing data science in Python.
Datanami: Hi Travis. Congratulations on being named a Datanami Person to Watch in 2017!
To take advantage of the nature of open-source projects, users need a unique blend of stability and agility – the ability to rely on production deployments as well as the power to quickly take advantage of the latest software developments. How do you see Continuum’s solutions changing the way the modern business is run?
Travis Oliphant: The modern business must use all the data relevant to its customers that it can access to stay competitive. Accessing that data, creating insightful reports from that data, creating production-ready predictive models from that data, and creating production-ready micro-services and additional data-products are essential to keep pace.
At Continuum, we provide Anaconda to help data-science teams take full advantage of the incredible pace of innovation in open source in order to accomplish these data-driven goals. Anaconda and its Enterprise solution help data-science teams by giving their IT counterparts the tools to manage and govern company access to and usage of open source. This includes providing indemnification and infringement assurance for their use of popular open-source tools.
Anaconda Enterprise also helps people securely collaborate on, scale, and then deploy into production the data-science solutions they can now produce quickly. This combination changes how quickly a data-science team can go from raw data to discovering an idea to deploying that idea as a production application, dashboard, report, micro-service, or derived data-set. This entire process might have taken days to weeks in the past and can now be completed in minutes to hours with Anaconda.
Datanami: What goals has Continuum set for 2017? What should our readers look forward to seeing from you?
Travis Oliphant: 2017 will be a big year for Continuum and for data-science. We brought in a new CEO to help drive company organization to meet the enterprise demand for Anaconda and the partnership opportunities around it. We expect to double product sales again this year around Anaconda Enterprise, with the GA release in June of Deploy features and Fusion (our Excel integration solution). Our enterprise product will then continue to improve in its ability to automatically understand the data throughout your organization as well as build beautiful and interactive visual presentations and reports from that data (more on those ideas to come).
With the additional leadership, I will be able to focus my attention on data science and the libraries in Anaconda itself. After several years of exploring, I now understand how to refactor the concepts currently in NumPy into separate libraries that will allow those concepts to influence even more users across multiple languages. I’m very excited to see these ideas finally start to emerge in two libraries that will be first released this year: ndtypes and gumath. These libraries will first impact Numba and Dask, which should themselves reach 1.0 releases this year. But these ideas should then influence other libraries across the ecosystem.
Bokeh and a sister library, HoloViews, should also continue to advance to help people rapidly build beautiful interactive data-science applications on very large data-sets using our datashader library. In addition, JupyterLab will be released by the summer of this year, and will be a very exciting next step for the rapidly growing ecosystem around Jupyter notebooks. JupyterLab is a browser-based interactive data environment that is easily extended through a powerful plug-in system. It comes out of the box with a terminal, file-browser, and code-editor in addition to the notebook. In fact, the next version of Anaconda Enterprise, to be released in June, is based on JupyterLab and will be able to take advantage of all the extensions that will be written for this powerful next generation of the notebook experience.
Datanami: Generally speaking, on the subject of big data, what do you see as the most important trends for 2017 that will have an impact now and into the future?
Travis Oliphant: With tools like the incredibly well-documented scikit-learn and Anaconda to bring it to the masses, machine learning has already become extremely accessible. However, new scalable tools with Python interfaces like TensorFlow, CNTK, Torch, and XGBoost will allow more automatic model discovery. This has caused some to believe that the demand for data-scientists will drop. This is not true. Instead, data-scientists will be freed to do more important things like ensuring better data collection, model evaluation, data-product construction, and high-level communication.
As access to these powerful tools becomes ubiquitous, people will move beyond the concept of a data-lake to understand that it doesn’t matter exactly where your data is stored as long as it is accessible, catalogued, relatively near compute resources, and easily provided to the powerful libraries emerging for getting insight from the data. I have been talking about the concept of a virtual data-lake consisting of a data-catalogue which maps URLs to meta-data and then a data-structure available in Python or R. With Python, for example, the URL could map to a data-frame or array (including dask-versions of those data-structures for data too large to fit in memory).
With powerful engines of compute at your disposal through Anaconda and the concept of a virtual data-lake, you can turn any collection of files or data-base tables into a data-insight engine complete with the ability to rapidly build applications, workflows, notebooks, reports, micro-services, and other data-products.
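The virtual data-lake idea described above, a catalogue that maps URLs to metadata and then to an in-memory data structure, can be sketched roughly as follows. This is only an illustration and not Continuum's implementation: the names CatalogEntry and DataCatalog, the catalog:// URL scheme, and the sample sales data are all hypothetical. To stay self-contained it returns plain lists of dicts; in practice the loader would more likely return a data-frame or array, or a Dask equivalent for data too large to fit in memory.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalogue record: metadata plus a loader for the data."""
    description: str
    fmt: str
    loader: Callable[[], List[dict]]


class DataCatalog:
    """Hypothetical catalogue mapping URLs to metadata and loadable data."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, url: str, entry: CatalogEntry) -> None:
        self._entries[url] = entry

    def describe(self, url: str) -> dict:
        # Return only the metadata, without touching the data itself.
        entry = self._entries[url]
        return {"description": entry.description, "format": entry.fmt}

    def load(self, url: str) -> List[dict]:
        # Where the data physically lives is hidden behind the loader.
        return self._entries[url].loader()


# An in-memory CSV stands in for a file or database table stored elsewhere.
_SALES_CSV = "region,units\nnorth,12\nsouth,7\n"


def load_sales() -> List[dict]:
    reader = csv.DictReader(io.StringIO(_SALES_CSV))
    return [{"region": r["region"], "units": int(r["units"])} for r in reader]


catalog = DataCatalog()
catalog.register(
    "catalog://sales/2017",  # made-up URL scheme for illustration
    CatalogEntry(description="Units sold by region", fmt="csv", loader=load_sales),
)

rows = catalog.load("catalog://sales/2017")
total_units = sum(r["units"] for r in rows)
```

The point of the design is that callers refer to data by URL only, so the storage location can change without changing the analysis code.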
Datanami: Outside of the professional sphere, are there any fun facts about you that your colleagues may be surprised to learn?
Travis Oliphant: My wife, Amy, and I have six children: three daughters (all now in college) and three sons. I started what became SciPy in graduate school while I had three little girls at home. I play the piano and enjoy singing. My wife and I met after a BYU University Singers choral concert. We sang together in the group the next year. Before starting Continuum, I seemed to have more time for playing sports like flag football, basketball, volleyball, and softball, which I have enjoyed for decades.
More about Travis Oliphant:
Travis has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding author of the SciPy package. He is also the author of the definitive Guide to NumPy.
Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001 to 2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of the Biomedical Imaging Lab, where he researched satellite remote sensing, MRI, ultrasound, elastography, and scanning impedance imaging.
As president and chief data scientist of Continuum Analytics, Travis ensures Anaconda meets the needs of data science teams, fosters the open source community and is dedicated to furthering the company’s open source projects. He also engages customers in all industries and helps guide technical direction of the company. He has served as a director of the Python Software Foundation and as a founding director of NumFOCUS.