How 3 Startups Are Tackling Machine Learning Challenges
Machine learning is the secret sauce that allows us to use computers to automate tasks in powerful new ways. However, there are a lot of steps that must go right for the ML to work: ideas must be mined from huge amounts of data, clean sample data must be provided to train models, and models must be managed and maintained over time. Here are three startups looking to simplify some of these aspects of the ML lifecycle.
Before a data scientist can build a machine learning model, she must first identify patterns in the real world, and there are various ways to do that. One approach involves putting a visualization tool in front of a data analyst and relying on humans’ intrinsic ability to detect differences in data. On the other end of the spectrum are unsupervised neural networks that can find the pertinent trends by analyzing millions of samples.
One startup working in this space is Gamalon. The Cambridge, Massachusetts company last month received $20 million in Series A funding led by Intel Capital to streamline the identification of ideas from large amounts of data.
Gamalon says it’s taking a Bayesian approach to “idea learning.” Instead of relying on large server farms, teams of data scientists, and complex deep learning models to identify and label objects from huge amounts of text data, the company says its Bayesian Program Synthesis (BPS) running on a tablet can identify the same objects using a much smaller dataset – perhaps just one example.
The idea is to help companies do more with the huge amounts of unstructured data — mail conversations, surveys, feedback forms, phone transcripts, and product reviews — that can’t be processed in a fast and efficient manner by today’s deep learning models. The company’s technology utilizes decision trees running in a database, a relatively lightweight approach compared to the massive GPU farms being built to run deep learning workloads.
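Gamalon has not published the internals of BPS in the material described here, so the following is only a toy illustration of the general Bayesian idea of updating a belief from very little data, not the company's method. It uses a standard Beta-Bernoulli update; the function names and the example numbers are assumptions made for illustration.

```python
def beta_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing Bernoulli outcomes.

    alpha and beta describe the prior belief; each observed success adds
    one to alpha and each observed failure adds one to beta.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative numbers only: start from a uniform prior, Beta(1, 1),
# then observe a single positive example.
prior = (1, 1)
posterior = beta_update(*prior, successes=1, failures=0)
shift = beta_mean(*posterior) - beta_mean(*prior)  # belief moves after one example
```

Because the prior already encodes a belief, a single observation is enough to move the estimate, which is the intuition behind learning from "perhaps just one example."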
Gamalon’s mission is “to accelerate human understanding by combining human and machine learning,” says company founder and CEO Ben Vigoda. “When Gamalon’s Idea Learning technology reads large amounts of text, and forms ideas, the AI becomes an extension of you — allowing you to read and respond to huge volumes of messages.”
Before founding Gamalon in 2013, Vigoda was the CEO of Lyric Semiconductor, where he developed microprocessor architectures for statistical machine learning based on his PhD research at MIT. Lyric was acquired in 2011 by Analog Devices, the $3.4 billion chip company based in Norwood, Massachusetts, and today Vigoda’s technology is deployed in all manner of consumer electronics, including smartphones, medical devices, wireless base stations, and cars.
To date, Gamalon has brought in $32 million in funding, which includes seed money from DARPA. The company reportedly has a handful of big customers, including telecom giant Avaya.
Labeling A Solution
One startup taking a slightly different approach to tackling big data challenges is LightTag. The Berlin-based startup has created a text annotation platform that it claims can significantly speed up the data labeling process, which is often the major bottleneck preventing organizations from using deep learning.
LightTag’s approach to the labeling dilemma is to make it as easy as possible for the humans who are driving the product, while using AI to guide the process as much as possible. “At LightTag we provide our customers with a platform to execute and manage large scale annotation projects,” the company says in an April blog post.
The LightTag platform automatically allocates annotation work across teams of users, and keeps track of the markup. The company’s UI and use of keyboard shortcuts are designed to let users quickly label text data. In a demo on the company’s website, LightTag co-founder and CTO Tal Perry, who previously was a natural language processing (NLP) researcher at Citi, demonstrated a markup of President Trump’s tweets – always a fun task.
LightTag’s platform utilizes various techniques to keep labelers engaged, including overlapping the work assigned to workers and using ML to learn from previous labeling actions. While “active learning” techniques promise to boost the accuracy of labels with fewer samples, the company is wary about the approach, as it explains in its blog post.
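To make the idea of overlapping assignments concrete, here is a minimal sketch of how a labeling tool might hand each item to more than one annotator and then measure how strongly they agree. It is not LightTag's implementation; the function names, the round-robin scheme, and the majority-vote rule are all assumptions chosen for illustration.

```python
from collections import Counter


def assign_with_overlap(items, annotators, overlap=2):
    """Give every item to `overlap` distinct annotators, round-robin."""
    if overlap > len(annotators):
        raise ValueError("overlap cannot exceed the number of annotators")
    assignments = {name: [] for name in annotators}
    n = len(annotators)
    for i, item in enumerate(items):
        # Consecutive positions modulo n are distinct while overlap <= n.
        for j in range(overlap):
            assignments[annotators[(i + j) % n]].append(item)
    return assignments


def majority_label(labels):
    """Return (winning label, share of annotators who chose it)."""
    winner, votes = Counter(labels).most_common(1)[0]
    return winner, votes / len(labels)


# Hypothetical annotators and items.
work = assign_with_overlap(["t1", "t2", "t3"], ["ann", "bob", "cy"], overlap=2)
label, agreement = majority_label(["POS", "POS", "NEG"])
```

Items with a low agreement share are natural candidates for review, since they are either ambiguous or were labeled carelessly by someone.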
LightTag is available in hosted and on-premise versions. The hosted version starts at $100 per user per month.
The GitHub of AI?
Data scientists can use Comet.ml, a service for tracking machine learning experiments, with a variety of machine learning environments and frameworks, including TensorFlow, Keras, Theano, scikit-learn, and PyTorch. By adding the Comet.ml tracking code to their projects, data scientists can begin tracking various aspects of their machine learning work.
“We allow data science teams to automatically track their data sets, model changes, and experimentation history, creating efficacy, transparency, and reproducibility,” the company says in a video.
Comet.ml users can view live updates of machine learning experiments, such as accuracy and loss curves, and the hyperparameter settings used for each experiment. Users can view machine learning source code, share experiments with others, compare machine learning experiments, and add documentation to review the experiments later. The software also integrates with GitHub (which is being acquired by Microsoft), an integration that should streamline work across dispersed data science teams.
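As a rough picture of what experiment tracking involves, the sketch below records hyperparameters and per-step metrics for each run and then ranks runs by their best score. It is a hypothetical in-memory stand-in, not the Comet.ml API; the class name, method names, and example values are all assumptions made for illustration.

```python
import json


class ExperimentLog:
    """Hypothetical in-memory tracker; NOT the Comet.ml API."""

    def __init__(self, name, hyperparameters):
        self.name = name
        self.hyperparameters = dict(hyperparameters)
        self.metrics = {}  # metric name -> list of (step, value) pairs

    def log_metric(self, metric, value, step):
        """Append one observation of a metric, e.g. accuracy at a step."""
        self.metrics.setdefault(metric, []).append((step, value))

    def best(self, metric, higher_is_better=True):
        """Best value seen for a metric across all logged steps."""
        values = [value for _, value in self.metrics[metric]]
        return max(values) if higher_is_better else min(values)

    def to_json(self):
        """Serialize the run so it can be shared or reviewed later."""
        return json.dumps(
            {
                "name": self.name,
                "hyperparameters": self.hyperparameters,
                "metrics": self.metrics,
            },
            sort_keys=True,
        )


def compare(experiments, metric, higher_is_better=True):
    """Rank experiments from best to worst on a metric."""
    return sorted(
        experiments,
        key=lambda e: e.best(metric, higher_is_better),
        reverse=higher_is_better,
    )


# Two illustrative runs that differ only in learning rate.
run_a = ExperimentLog("lr-0.1", {"lr": 0.1})
run_a.log_metric("accuracy", 0.80, step=1)
run_a.log_metric("accuracy", 0.85, step=2)
run_b = ExperimentLog("lr-0.01", {"lr": 0.01})
run_b.log_metric("accuracy", 0.90, step=1)
ranked = compare([run_a, run_b], "accuracy")
```

Keeping hyperparameters and metrics together in one record is what makes a run reproducible and comparable after the fact; a hosted service adds storage, live charts, and sharing on top of the same basic bookkeeping.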
Comet.ml, which went through the Amazon Alexa Accelerator program, has raised $2.3 million in seed funding. The service went through a private beta, and now the company is making Comet.ml available as a free service for public projects. Organizations that want to host private projects can purchase subscriptions from the company, starting at $49 per user per month.