Data Fingerprinting: An Innovative Lean AI Approach for Startups
Technology startups are under immense pressure to adopt machine learning and AI to stay relevant. Machine learning, if implemented well, can have a direct impact on a startup’s ability to grow and raise the next round of funding.
However, early-stage companies face unique challenges in developing machine learning driven features and products, because the quality of the solutions depends on the quality of the domain-specific data available to train the models. Startups often lack data specific to the business problem they are tackling, and generic datasets are of little use for such problems. Yet to generate the data they need, the product or feature has to be live. This dilemma delays, or rules out entirely, the use of machine learning for many startups.
To overcome this obstacle, startups can take a “lean AI” approach: begin with simple algorithms and incrementally introduce machine learning complexity into the product as they collect more data. For this to work, the data must be matched to the algorithms that are able to process it.
Let’s take a closer look at how a lean AI approach can work.
Algorithms Before Data
Although the prevailing belief is that data comes first and dictates the algorithm’s requirements, it is possible to work the other way round and let the algorithm define the data requirement. This approach allows developers to gradually work their way toward a more sophisticated algorithm for a product feature. We refer to this technique of defining the data requirement specific to an algorithm as data fingerprinting.
Data fingerprinting can be thought of as a pattern recognition engine that recognizes certain classes of data in the context of the underlying processing algorithm. By starting with a simple algorithm and limited data, it is possible to build a more complex algorithm incrementally, by computationally controlling the nature of the data that reaches it.
Using the data fingerprinting approach, companies can restrict the input to the classes of data that the selected algorithm can handle. If the fingerprint of the input data is not recognized, default logic that bypasses the main logic is invoked. Ideally, the fingerprint generation system would be able to transform any input into a fingerprint, regardless of the type of data. In practice, this is not achievable without data-specific processing logic that runs before the data is fingerprinted.
For example, if the input data is text, we need a text processing module that converts text into a standard form; if the input data consists of images, we need an image processing module that converts images into a standard form.
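The gating described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author’s implementation: the normalizers, the fingerprint classes (`short_text`, `small_gray_image`), their thresholds, and the keyword rule standing in for the main logic are all hypothetical. Images are assumed to arrive as nested lists of 8-bit grayscale values.

```python
import re
from typing import Any, Optional

# Hypothetical data-specific processing modules: each converts raw input
# of one modality into a standard form before it is fingerprinted.

def normalize_text(raw: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw).strip().lower()

def normalize_image(raw: list) -> list:
    """Scale 8-bit grayscale pixel values (0-255) into [0, 1]."""
    return [[p / 255.0 for p in row] for row in raw]

NORMALIZERS = {str: normalize_text, list: normalize_image}

def fingerprint(data: Any) -> Optional[str]:
    """Map normalized data to a class label, or None if unrecognized.
    The classes and thresholds are illustrative assumptions."""
    if isinstance(data, str):
        n_words = len(data.split())
        return "short_text" if 0 < n_words <= 20 else None
    if isinstance(data, list) and data and all(len(r) == len(data[0]) for r in data):
        height, width = len(data), len(data[0])
        return "small_gray_image" if height <= 64 and width <= 64 else None
    return None

# Classes the current (simple) algorithm is known to handle.
SUPPORTED = {"short_text"}

def main_logic(text: str) -> str:
    # Stand-in for the simple algorithm: a keyword rule on short text.
    return "positive" if "good" in text.split() else "neutral"

def default_logic(data: Any) -> str:
    # Default path that bypasses the main logic for unrecognized input.
    return "unhandled"

def process(raw: Any) -> str:
    normalize = NORMALIZERS.get(type(raw))
    if normalize is None:
        return default_logic(raw)  # no processing module for this data type
    data = normalize(raw)
    if fingerprint(data) in SUPPORTED:
        return main_logic(data)
    return default_logic(data)
```

In this sketch an image is recognized as `small_gray_image` but still routed to the default logic, because the current algorithm has not been extended to that class yet. Adding a class to `SUPPORTED` only makes sense once the main logic can actually process it.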
A second requirement is that the fingerprint system be able to evolve incrementally as the underlying algorithm grows in complexity.
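One way to meet this requirement, assuming each recognizer is a plain function that labels one class of normalized text, is a registry that grows alongside the algorithm. The `FingerprintSystem` class and the `short_text` and `long_text` recognizers below are hypothetical names used only for illustration.

```python
from typing import Callable, Optional

# A recognizer returns a class label for the input, or None.
Recognizer = Callable[[str], Optional[str]]

class FingerprintSystem:
    """Hypothetical fingerprint system whose recognized classes grow
    as the underlying algorithm becomes more capable."""

    def __init__(self) -> None:
        self._recognizers: list = []
        self.version = 0

    def register(self, recognizer: Recognizer) -> None:
        """Add a recognizer for a newly supported data class."""
        self._recognizers.append(recognizer)
        self.version += 1

    def fingerprint(self, text: str) -> Optional[str]:
        # Return the first class label any registered recognizer assigns.
        for recognize in self._recognizers:
            label = recognize(text)
            if label is not None:
                return label
        return None

def short_text(text: str) -> Optional[str]:
    return "short_text" if 0 < len(text.split()) <= 20 else None

def long_text(text: str) -> Optional[str]:
    return "long_text" if len(text.split()) > 20 else None
```

Here the first version of the product would register only `short_text`; when a model able to handle longer documents ships, `long_text` is registered with it, and inputs that previously fell through to the default logic start being recognized.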
Start Simple, Grow Complex
One of the most attractive aspects of data fingerprinting is that it lets organizations start small, with a simple algorithm and whatever modest amount of data they have for the product feature they want to bring machine learning into. They can begin by identifying the machine learning technique the feature requires, such as image recognition or text processing.
The next step is to devise a basic algorithm that supports the development of this feature. This algorithm could use an unsupervised learning approach such as clustering, or a simple supervised learning technique. The critical consideration, however, is pinpointing what type of data the algorithm can use to achieve this objective. This is how the algorithm defines the data requirement, rather than the other way round. Once developers understand that, they need only feed the algorithm the data it can handle to get a basic working version of the desired feature.
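To make “the algorithm defines the data requirement” concrete, suppose the simple algorithm is k-means clustering over numeric feature vectors (an assumption; the article does not name a specific method). The clusters learned from the available data can then double as a fingerprint: an input close to a learned cluster resembles data the algorithm has seen, while an input far from every cluster does not and would go to the default logic. The `ClusterFingerprint` class, the radius rule, and the `slack` factor are illustrative choices, not a prescribed method.

```python
import math
from typing import Optional

Point = tuple  # a numeric feature vector, e.g. (x, y)

def mean(points: list) -> Point:
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points: list, k: int, iters: int = 20) -> list:
    """Plain k-means; centroids start at the first k points so the
    result is deterministic."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

class ClusterFingerprint:
    """Accept an input only if it lies close to one of the clusters
    learned from the available data (hypothetical design)."""

    def __init__(self, points: list, k: int, slack: float = 1.5) -> None:
        self.centroids = kmeans(points, k)
        self.slack = slack
        # Radius of each cluster: distance to its farthest training point.
        self.radii = [0.0] * k
        for p in points:
            i, d = self._nearest(p)
            self.radii[i] = max(self.radii[i], d)

    def _nearest(self, p: Point) -> tuple:
        dists = [math.dist(p, c) for c in self.centroids]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i, dists[i]

    def fingerprint(self, p: Point) -> Optional[int]:
        """Return the cluster id for p, or None if p falls outside
        every cluster's slack-scaled radius."""
        i, d = self._nearest(p)
        return i if d <= self.radii[i] * self.slack else None
```

For example, trained with `k=2` on points around (0, 0) and (10, 10), this sketch assigns (0.5, 0.5) to cluster 0 and returns `None` for (5, 5), which sits between the two clusters and is therefore outside what the simple algorithm has data for.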
Once the feature is built, they can introduce a more advanced algorithm that refines the requirements for the input data and improves the feature’s capabilities over time.
Implications for Startups
Data fingerprinting enforces a match between data and the algorithms that are able to process it. For startups, the implications are significant. The methodology lets them use machine learning in their initial product offerings by starting with simple algorithms and incrementally increasing algorithm complexity, which can become a differentiator. It also helps them overcome the shortage of domain-specific training data they frequently face, so they can still build credible machine learning features with the data they do have.
About the author: Manjusha Madabushi is the Co-Founder and CTO of Talentica Software, the global leader in outsourced product development services for early and growth-stage technology companies. Talentica has helped 150-plus startups successfully develop innovative technology products through a combination of deep technology expertise, startup DNA, and focus on client outcomes. Follow Manjusha on LinkedIn.