August 4, 2016

Real Machine Intelligence Making its Way into the Enterprise

Michael Schmidt

(fatmawati achmad zaenuri/Shutterstock)

Hundreds of billions of dollars lie dormant in data. From retailers and healthcare providers to government agencies, untapped data has the ability to generate savings, drive efficiencies, and enable entirely new paradigm-changing discoveries. And while companies almost universally accept that data is the new gold, they still haven’t managed to consistently put it to use and unlock its value. Doing so requires the ability not only to crunch data and build algorithms, but also to deeply understand the data’s significance and convey it to decision-makers. And as business problems become increasingly complex and volumes of data explode, it makes sense that industry first-movers are turning to artificial intelligence (AI) to tackle the analytics problem head-on.

The first attempt at leveraging new-age artificial intelligence to problem-solve was IBM’s (NYSE: IBM) Jeopardy-acing robot, Watson. Watson was one of the most interactive AI systems to date, supposedly showing that a machine could actually “think.” Watson could interpret a human question, search for a relevant response, and return a logical answer based on the available sources across the internet. Watson’s interactivity and quick “answers” inspired similar technologies like Apple’s (NASDAQ: AAPL) Siri and Microsoft’s (NASDAQ: MSFT) Cortana. And to its credit, unlike Apple, Google (NASDAQ: GOOG), Facebook (NASDAQ: FB) and others, IBM had the brass to step out of the safe confines of a specific application – answering game show trivia questions – and attempt to work with unstructured business data to solve real-world business problems, with a particularly heavy emphasis on healthcare.

Unfortunately, the results have been lackluster. It turns out that taking a natural language processing system and adapting it to solve enterprise data science problems is non-trivial – something even some of the most brilliant start-ups and data scientists have failed to simplify. It’s no coincidence that even the companies who seemingly do it well – Google, Facebook – have teams of data scientists burning monthly cycles to answer a single problem…and none of those problems comes with an answer that is already known.

On top of that, much of the data companies are working with is structured, not unstructured. Despite the massive increase in unstructured data, such as text messages and speech, the large majority of data science applications deal almost exclusively with structured data. Unsurprisingly to me and many other AI experts, IBM announced it’s splitting the different components inside Watson into individual services, instead of trying to map a complete solution for its customers. It’s an apparent acknowledgement that their offering was overly ambitious and under-scoped.

What’s required of an artificial intelligence technology to make the leap from science project to a dynamic, automated scientist that can churn through data to elucidate problems and make actionable recommendations? Let’s start with the types of data it needs to play well with. Most businesses today have an assortment of log data, event data, sensor data, sales data, and other numeric data that has thousands of potential factors affecting outcomes. An enterprise data science AI must be able to transform this noisy, chaotic data into meaningful insights and action. What causes things to happen, what triggers – or blocks – certain outcomes, and even most simply, what’s possible with the data I currently have?


IBM has split Watson up into individual services, which shows the difficulty in creating enterprise data science services

The way we can best understand the relationships in our data, the driving variables and the causes of events, is through analytical models. Analytical and predictive models are essentially mathematical blueprints for how a system or problem “works.” They show you which variables matter and to what extent, and how all of the inputs interact with one another. If we’re trying to figure out whether a tornado is likely to develop over the flatlands of the United States, a model would tell us all of the contributing meteorological factors we should pay attention to, and whether or not the current conditions fit those criteria. If we’re a car manufacturer and want to know if our materials will withstand enough pressure to earn a five-star crash test rating, we can build a model for aluminum strength and predict if it will meet the test – and if not, what to change so it will.
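The idea of a model as a “blueprint” can be made concrete with a toy example. The sketch below is my own illustration, not taken from the article: it fits a linear model to synthetic data with hypothetical variable names, and the fitted coefficients reveal which inputs actually drive the outcome and to what extent.

```python
import numpy as np

# Toy illustration of a model as a "mathematical blueprint":
# the fitted coefficients show which variables matter and how much.
rng = np.random.default_rng(0)
n = 200
pressure = rng.normal(size=n)    # hypothetical driving input
humidity = rng.normal(size=n)    # hypothetical driving input
irrelevant = rng.normal(size=n)  # a variable with no real effect

# The "true" system: the outcome depends only on pressure and humidity.
outcome = 3.0 * pressure - 1.5 * humidity + 0.1 * rng.normal(size=n)

# Fit a linear model by least squares; its coefficients are the blueprint.
X = np.column_stack([pressure, humidity, irrelevant])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)

for name, c in zip(["pressure", "humidity", "irrelevant"], coef):
    print(f"{name}: {c:+.2f}")
# pressure and humidity recover coefficients near +3.0 and -1.5,
# while the irrelevant variable's coefficient lands near zero.
```

A real enterprise model would of course involve thousands of candidate factors and nonlinear interactions, but the principle is the same: the model’s structure tells you what matters.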

The most overwhelming part of data science is figuring out which data is relevant and how relevant it actually is. While AI that can parrot back answers to previously-answered human questions is fun and useful in certain contexts, it does little to advance modern data-driven applications and discoveries. Businesses need smart machines that are really that – smart – machines that can think like a scientist, churning through the data and explaining what it means. We need automated modeling engines that can build predictive and analytical models from raw data, and explain them to us in plain English so we can act on them.
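What an “automated modeling engine” does can be sketched in miniature. The following is my own simplified illustration – not Eureqa’s actual algorithm – of the basic loop: search a space of candidate formulas against the data, keep the one that best explains it, and report the result in readable form.

```python
import itertools
import math

# Minimal sketch of automated model discovery: brute-force search over
# a few candidate formula shapes and their parameters.
data = [(x, 2.0 * x * x + 1.0) for x in range(-5, 6)]  # hidden law: y = 2x^2 + 1

candidates = {
    "y = a*x + b":      lambda x, a, b: a * x + b,
    "y = a*x^2 + b":    lambda x, a, b: a * x * x + b,
    "y = a*sin(x) + b": lambda x, a, b: a * math.sin(x) + b,
}

def squared_error(f, a, b):
    return sum((f(x, a, b) - y) ** 2 for x, y in data)

best = None  # (error, formula name, a, b)
for name, f in candidates.items():
    # Crude grid search over the parameters a and b in steps of 0.5.
    for a, b in itertools.product([i / 2 for i in range(-8, 9)], repeat=2):
        e = squared_error(f, a, b)
        if best is None or e < best[0]:
            best = (e, name, a, b)

e, name, a, b = best
print(f"Best model: {name} with a={a}, b={b} (squared error {e:.2f})")
```

A production engine would search a vastly larger model space with far smarter optimization, but the output is the point: a compact, human-readable formula rather than a black-box score.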

The hype needs to end. Instead of wasting another breath reinforcing the hype of tech giants like IBM, we should be paying attention to the real innovation that’s happening right now in the venture-backed startup offices of Boston and Silicon Valley. The innovations that bring data to life, that will make it as powerful to the world as the internet, will be developed by unique companies that can elegantly overlap data science, statistics and programming, and synthesize it into easy-to-use applications for anyone who can use a basic tool like Excel.

The next stop on the data train is machine intelligence. Machine intelligence is the newest subfield of AI that automates the discovery and explanation of answers from data. Forget big data. Lots of data doesn’t guarantee anything if you can’t light it up. Forget hardcore statistics, forget data visualization. Stats wizards are too hard to come by and take weeks to deliver meaningful answers, and the graphs and charts within visualization tools only tell stories about past occurrences, offering no predictive insight or understanding of why something happened. Machine intelligence leverages AI to do the unglamorous part of data analysis – data crunching and model creation – and allows the human to manipulate the wand, incorporating his or her domain expertise into the problem at hand and recommending the best course of action. That is a smart machine.


Log files represent a potential source of insight for organizations looking to leverage machine intelligence (Bildagentur Zoonar GmbH/Shutterstock)

The growing shortage of data scientists is taking a toll on companies looking to scale their data science practices. Machine intelligence will not replace data scientists or analysts, but it will make them orders of magnitude more productive. Model-building that would normally require weeks or months of effort from a data scientist takes minutes with machine intelligence. On top of that, machine intelligence, by leveraging massive distributed compute, finds the simplest models that explain the problem at hand, so they can be easily interpreted by the end user and decision-makers. And unlike other machine learning technologies, which are often preconfigured black boxes, machine intelligence enables analysts and data scientists to stitch in their domain expertise by interactively excluding any variables they know are irrelevant, highlighting the relationships they already know exist, and selecting the best model from a handful of options. That is, machine intelligence is data-driven but depends on human expertise for context and implementation.
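The “stitching in domain expertise” step above can be illustrated with a small sketch. This is a hypothetical example of my own (the variable names and workflow are assumptions, not a real product’s interface): an analyst excludes a column they know is irrelevant, the engine refits, and the simpler model explains the data essentially as well as the full one.

```python
import numpy as np

# Hypothetical sketch of an analyst pruning a known-irrelevant variable
# and comparing the refit against the full model.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))  # hypothetical columns: temp, load, day_code
y = 1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.05 * rng.normal(size=n)

def fit(X, y, keep):
    """Least-squares fit using only the columns the analyst kept."""
    cols = [i for i, k in enumerate(keep) if k]
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    mse = float(np.mean((y - X[:, cols] @ coef) ** 2))
    return coef, mse

# Full model vs. a model with the known-irrelevant third column excluded.
_, mse_full = fit(X, y, keep=[True, True, True])
coef, mse_expert = fit(X, y, keep=[True, True, False])
print(f"full-model MSE: {mse_full:.4f}, expert-pruned MSE: {mse_expert:.4f}")
# The pruned model is simpler yet fits the data about as well,
# which is exactly what an interpretable model should look like.
```

The design point is the interaction loop itself: the machine does the fitting, while the human decides which variables deserve a seat at the table.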

Machine intelligence has very real applications in the enterprise. It’s currently used in critical projects such as forecasting municipal electricity demand, identifying global climate change patterns, developing new materials for jet engines, understanding how galaxies are created, and optimizing corn yield. It’s time to stop giving credit to IBM’s witty TV ads and tune into what could be the most important technological development in years.

Michael Schmidt

About the author: Michael Schmidt is the founder and CTO of Nutonian. Michael’s research focuses on “machine science” – a direction in artificial intelligence research to accelerate data-driven discovery. Over the past six years, Michael has worked on algorithms and techniques to automate knowledge discovery from data. In particular, he has published extensively on identifying mathematical relationships (such as laws of physics) in experimental data, and on algorithms in evolutionary computation. Michael is the creator of the Eureqa project – a popular software program for discovering hidden mathematical relations in experimental data. His research has appeared in several news outlets, from the New York Times to NPR’s RadioLab and Communications of the ACM. Currently, Michael runs Nutonian Inc., which specializes in scientific data mining and cloud computing for data analysis. In 2011, Michael was featured in the Forbes list of the “Most Powerful Data Scientists” by Tim O’Reilly.