June 25, 2018

Machine Teaching Will Drive Crowdsourced Cognition into the AI Pipeline

James Kobielus


Building high-quality artificial intelligence (AI) is hard work. It’s a specialized discipline that historically has required highly skilled specialists, aka data scientists.

Any time you require some highly skilled, highly paid practitioner to accomplish something of value, you’ve introduced a bottleneck into that process. That explains why there’s been such a huge push for machine learning (ML) automation.  It also explains why many organizations are seeking to democratize these functions to less skilled personnel.

Building ML and other AI models is increasingly automated, but, apart from crowdsourced labeling of example data, this pipeline has so far resisted democratization out to non-data-scientists. One reason is that feature selection—the step that kickstarts any ML modeling exercise—has proven amenable to a high degree of automation. If data scientists can rely on a steady stream of example data that has been reliably labeled at the source, by automated tools, or by themselves through highly efficient manual methods, there is little need to bring non-experts into the curation process.

However, supervised learning falters if ML models lack a valid feature model of the “ground truth” source data. This can happen when the subject domain is so complex or unfamiliar that no pre-agreed semantic model exists. To the extent that domain experts haven’t yet clarified the underlying entity-relationship graph to their own satisfaction, data scientists risk building ML models that produce an excess of false-positive and false-negative classifications, predictions, and other algorithmic inferences.


Clarifying an ML model’s underlying semantic model might require giving domain experts the ability to interactively visualize, explore, and query ground-truth data for themselves. As knowledgeable individuals crystallize their specification of the underlying concepts, the resultant semantics, schemas, and labels should drive downstream feature engineering in Spark, TensorFlow, PyTorch, or some other ML modeling framework.
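As a hedged illustration of that handoff, the sketch below shows how a teacher-defined semantic schema might drive downstream feature encoding. The `SCHEMA` contents and the `encode` helper are illustrative assumptions, not the API of Spark, TensorFlow, or PyTorch:

```python
# Illustrative sketch: a teacher-defined semantic schema (concepts and their
# categories) drives feature encoding. When a teacher revises the schema, the
# feature layout changes with it, and downstream models retrain on the new one.

SCHEMA = {
    "sentiment": ["positive", "neutral", "negative"],
    "topic": ["billing", "shipping", "support"],
}

def encode(example: dict) -> list:
    """One-hot encode an example's labels against the teacher-defined schema."""
    vec = []
    for concept, categories in SCHEMA.items():
        vec.extend(1 if example.get(concept) == c else 0 for c in categories)
    return vec

features = encode({"sentiment": "positive", "topic": "billing"})
```

The point of the sketch is that the schema, not the code, is the artifact the teacher owns: adding a category or a concept reshapes the feature vector without any data-scientist intervention.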

This is the essence of an emerging feature-engineering approach known as “machine teaching.” As discussed in this Microsoft Research paper and this heavy-going academic research paper, machine teaching relies on data exploration tools that enable domain experts—the so-called “machine teachers”—to interactively clarify the conceptual model expressed in a ground-truth data set. As explained in the paper, the efficacy of this approach depends on how one structures the interactions that the “teacher” has with the subject-matter data. As the authors state: “The role of the teacher is to transfer knowledge to the learning machine so that it can generate a useful model that can approximate a concept.”

That, in turn, relies on an acceptance that the teachers may not, at the start of the exercise, have a fully baked conceptual approach to the topic. In other words, they may not be “experts” at all, but simply knowledgeable individuals who are trying to refine their own approach to the topic into something more concrete and specific. Depending on the subject domain of the ML initiative, this is a function that could be democratized out to any intelligent person who can, say, assess the sentiment expressed in a snippet of spoken language or the activities taking place in a complex video clip.

Machine teaching proceeds from the premise that “concept evolution” may prevent premature closure of the feature engineering phase in ML model development. In other words, the eventual feature set emerges as the teacher decomposes the subject matter into its semantic constituents—such as the meanings of spoken language or the sequence of events in a video feed—through exploration of the source data.

The machine teachers may in fact be experts in the domain of a specific ML initiative. For example, the machine teachers on a life-sciences ML modeling initiative may include molecular biologists who are being asked to classify and label diverse proteins based on their structures, properties, and behaviors relevant to a particular disease. However, the teachers, though experts on the topic, may not have a clear idea how to carry out this task without inline tools for visualizing, querying, simulating, and otherwise exploring different proteins. In addition, individual teachers may benefit from reviewing other teachers’ classification and labeling of this same data, as a cross-check on the validity of their own decisions about the appropriate ML feature set for this common exercise.
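One lightweight way to operationalize that cross-check is a chance-corrected agreement score such as Cohen’s kappa, computed across pairs of teachers labeling the same examples. Below is a minimal, self-contained sketch; the protein labels are invented for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two teachers, corrected for the
    agreement their label frequencies would produce by chance alone."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each teacher's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

teacher_1 = ["enzyme", "receptor", "enzyme", "enzyme"]
teacher_2 = ["enzyme", "receptor", "receptor", "enzyme"]
kappa = cohens_kappa(teacher_1, teacher_2)  # low kappa flags a concept dispute
```

A low kappa on a slice of the data is a signal that the underlying concept definitions, not just individual labels, need to be renegotiated among the teachers.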


Another possibility is that machine teaching might be useful for building ML that addresses domains such as “AI safety” for which there are few well-established domain experts or consensus paradigms. As I was reading this recent OpenAI blog, I couldn’t help thinking that any system which “trains [ML-driven] agents to debate [AI safety] topics with one another, using a human to judge who wins,” is ripe for machine teaching.

In this scenario, the best teachers would come from a crowdsourced constellation of experts in every branch of this emerging field, including AI ethics, privacy, algorithmic accountability, bias mitigation, adversarial robustness, and so on. As these experts attempt to teach ML algorithms how to do their jobs, the experts themselves will almost certainly have to keep evolving their perspectives on their own subject domains as they find some approaches well suited to algorithmic expression and others too vague to encode in ML-model feature sets.

When deployed within a crowdsourced environment that involves humans at any expertise level, machine teaching can help to ensure that a sufficient number of individuals input their choicest “cognition” into the feature engineering process, and even into the ongoing review and refinement of the associated feature sets. As the cited Microsoft paper states, democratizing feature engineering in this way may be essential for ML initiatives in which:

  • Documentation is lacking, sparse, or confusing regarding the rationale for the original ML feature set;
  • The domain expert who defined the original feature set is no longer available;
  • The ground-truth data distribution has changed, causing the associated ML model’s predictive accuracy to decay;
  • The ML subject domain’s underlying causal variables have evolved, rendering the previously built-out feature set a less valid predictor; or
  • The ML model’s existing feature set hasn’t been decomposed to a sufficient degree of modularity to permit easy modifications to diagnose the root causes of deficiencies in its predictive accuracy.
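The drift scenario in the third bullet is often flagged with a heuristic such as the Population Stability Index (PSI). Below is a minimal sketch; the label-share numbers are invented, and the 0.2 cutoff is a conventional threshold for a significant shift:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index over pre-binned proportions: a common
    heuristic for detecting ground-truth distribution drift. ~0 means no
    shift; values above ~0.2 are conventionally treated as significant."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline = [0.5, 0.3, 0.2]   # label shares at training time (illustrative)
current  = [0.2, 0.3, 0.5]   # label shares observed in production
drift = psi(baseline, current)
NEEDS_RETEACHING = drift > 0.2
```

A drift alarm like this is what would trigger recalling the machine teachers to re-examine and relabel the shifted ground-truth data.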

To drive highly efficient feature engineering in these sorts of interactive and collaborative ML initiatives, a machine-teaching environment should:


  • Provide a user interface that is simple, easy to learn, and expressive enough to enable teachers to distinguish example data in a meaningful way;
  • Enable teachers to define, revise, and refine the semantic model as they interact with the data;
  • Allow teachers to change their underlying concept definitions, schemas, and labels as they discover atypical or anomalous examples that cause them to reshuffle how they classify instances into various categories;
  • Give teachers the ability to add features, ignore ambiguous patterns, and affix meaningful labels to the data sets with which they’re interacting;
  • Let teachers interactively decompose source data into semantic sub-concepts that can be recomposed, linked, labeled, and manipulated in a manner that’s easy, interpretable, and reversible;
  • Record all teacher actions into a shared repository for collaboration, versioning and governance;
  • Enable review, correction, and refinement of those actions by other teachers; and
  • Use teacher-developed feature models as inputs to drive the automated compilation of ML models into execution formats for back-end ML frameworks such as TensorFlow, MXNet, and PyTorch.
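The shared-repository and peer-review requirements above could be satisfied with something as simple as an append-only log of teacher actions that other teachers replay, review, and amend. A hedged sketch, with invented action names and teachers:

```python
import time

# Minimal sketch of an append-only teacher-action log: every labeling or
# schema decision is recorded, so it can be versioned, governed, and
# corrected by other teachers. All names here are illustrative.

LOG = []

def record(teacher: str, action: str, payload: dict):
    entry = {"ts": time.time(), "teacher": teacher,
             "action": action, "payload": payload}
    LOG.append(entry)
    return entry

record("alice", "add_label", {"example_id": 42, "label": "receptor"})
record("bob", "revise_label", {"example_id": 42, "label": "enzyme"})

# Replaying the log yields the current consensus label per example,
# while the log itself preserves the full history for governance.
state = {}
for e in LOG:
    state[e["payload"]["example_id"]] = e["payload"]["label"]
```

Because the log is append-only, a disputed relabeling never destroys the earlier decision; reviewers can always reconstruct who changed what, and when.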

Looking at machine teaching in the broader context of ML automation, it’s clear that this approach can become a core “human in the loop” mechanism for continual maintenance of ML models’ accuracy. It might be integrated into an ML automation pipeline in several ways:

  • Automatically generating visualizations of model feature sets for machine teachers to explore, manipulate and assess trade-offs between interpretability and accuracy;
  • Automatically identifying predictive features in the data that a machine teacher may have never considered;
  • Automatically refreshing a front-end curational interface in which machine teachers review and correct specific false-negative or false-positive instances of data that had been ML-scored as anomalous; and
  • Automatically feeding back machine-teacher-corrected scores in real time to update an ML model’s feature set in order to improve its accuracy on future runs.
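The last two bullets amount to a human-in-the-loop correction cycle: surface low-confidence or anomalous scores for teacher review, then fold the corrected labels back into the data that drives the next training run. A minimal sketch under those assumptions; all field names are illustrative:

```python
# Illustrative correction loop: model-scored items below a confidence
# threshold go to a teacher review queue; teacher corrections overwrite
# the model-assigned labels before the next retraining run.

def review_queue(scored, threshold=0.5):
    """Surface low-confidence scores for machine-teacher review."""
    return [s for s in scored if s["confidence"] < threshold]

def apply_corrections(training_set, corrections):
    """Overwrite model-assigned labels with teacher-corrected ones."""
    fixed = {c["id"]: c["label"] for c in corrections}
    return [{**row, "label": fixed.get(row["id"], row["label"])}
            for row in training_set]

scored = [{"id": 1, "label": "anomaly", "confidence": 0.3},
          {"id": 2, "label": "normal",  "confidence": 0.9}]
queue = review_queue(scored)                     # only id 1 needs review
corrected = apply_corrections(scored, [{"id": 1, "label": "normal"}])
```

In a production setting, the same loop would run continuously, with the corrected records streamed back into the feature store rather than rebuilt in memory.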

Before long, we’ll find that the ground truth for any ML initiative isn’t in a specific example data set. Instead, it resides in the very human minds of the machine teachers who interactively curate and refine the feature sets that drive automated ML modeling, training, and operationalization.

About the author:  James Kobielus is SiliconANGLE Wikibon‘s lead analyst for Data Science, Deep Learning, and Application Development.

Related Items:

Blockchain Starting To Feel Its Way into the Artificial Intelligence Ecosystem

Training Your AI With As Little Manually Labeled Data As Possible

Ensuring Iron-Clad Algorithmic Accountability in the GDPR Era