Dogged Determination: How Trupanion Pulled AI Across the Finish Line
David Jaw had reason to be excited. As a data scientist at Trupanion, Jaw had just put the finishing touches on the prototype of a machine learning model that could replicate the actions of a human claims adjuster with a high degree of accuracy. He had overcome some big data and data science challenges in creating the system, and couldn’t wait for the pet medical insurance company to put it into action.
“I expected it to be deployed. This will be in production generating value within weeks!” Jaw remembers thinking. “But it took years of going back and forth with engineering and infrastructure, and I didn’t really understand what the holdup was.”
Jaw had just run headfirst into one of the inconvenient truths about AI today: the hardest part is rarely the data science itself, but all the other tasks that must be completed before one can find success with machine learning in the real world.
Trupanion eventually put the model into production, and today it automates 30% to 40% of the claims submitted through the company’s patented software, with an accuracy level in excess of 99%. The system tells Trupanion customers whether or not a given procedure or prescription is covered within a matter of seconds, while the customers stand at the counter of their veterinarian’s office.
The machine learning model is generating real value for Trupanion, as measured both in customer satisfaction and in cost savings. But getting to this point was not a straight line for Jaw or his team at the Seattle, Washington company.
Dog Data Days
Jaw and his data science colleagues overcame some significant challenges in building the system, not all of which had to do with data science. For example, there were significant hurdles with the raw veterinary data, which would provide the basis for training the machine learning models. Jaw likens the vet data to the Wild Wild West.
“There are few standardized rules that are true for all veterinary clinics across the US and Canada,” he tells Datanami. “If you get a procedure done for your dog or medication prescribed in New York City and get the exact same prescription or procedure done in Seattle, they’re going to look entirely different.”
Just about anything might appear in the veterinary invoices. We all appreciate the personal attention that veterinary technicians give to our pets, especially when they’re injured or sick. But there’s no standard procedure code for “hugs and kisses,” which shows up quite a bit in veterinary records, according to Jaw (they are covered as a no-cost item, he adds).
“We’ll see spelling errors, or the pet just wasn’t feeling well. That’s a medical diagnosis,” he says. “What data science can do there is standardize these procedures into actual discrete buckets of medical conditions that we’re treating.”
Trupanion has a relatively straightforward insurance business: It pays 90% of all legitimate medical claims, as long as the treatment isn’t for a preexisting condition, preventive care, or for non-medical items. Since vet clinics often sell items like dog food and dog beds in addition to providing medical care and medicine, Jaw’s model had to take that into account.
“That’s a very simple task that you just use machine learning to comb the text,” he says. “Does this text contain dog bed, or things like dog bed?”
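The check Jaw describes can be illustrated with a minimal sketch. This is not Trupanion's model: it swaps in simple fuzzy string matching from Python's standard library for a trained classifier, and the item list, the 0.8 threshold, and the function name are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical examples of non-medical retail items; not Trupanion's actual list.
NON_MEDICAL_ITEMS = ["dog bed", "dog food", "leash", "chew toy"]


def looks_non_medical(line_item: str, threshold: float = 0.8) -> bool:
    """Return True if any word window in the text resembles a known non-medical item."""
    words = line_item.lower().split()
    for item in NON_MEDICAL_ITEMS:
        n = len(item.split())
        # Slide a window with the same word count as the item across the text.
        for i in range(max(len(words) - n + 1, 1)):
            window = " ".join(words[i:i + n])
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
            if SequenceMatcher(None, window, item).ratio() >= threshold:
                return True
    return False
```

Because the comparison is fuzzy, a misspelled invoice line such as "Large dgo bed" is still flagged, while a medication line such as "Amoxicillin 250mg tablets" is not. A production system would more plausibly learn these patterns from labeled invoices than rely on a fixed list.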
However, detecting preexisting conditions is a little trickier. To solve this, Jaw built a system that can ascertain whether a pet was suffering from a condition, even when there isn’t a specific diagnosis of that condition in the pet’s medical history.
“If your pet already had kidney disease when they signed up for the policy, then we don’t cover the kidney disease,” Jaw says. “At that point we have to convert the free text to kidney disease and then link to all pre-policy history. We have to search the historical medical records of the pet, and the history may not contain ‘kidney disease.’ It will contain a bunch of information that points to kidney disease being a very likely cause of a set of symptoms.”
It would be almost impossible to automate this process with a rules-based approach. But using the power of machine learning and tools like TensorFlow, Scikit-learn, Pandas, and Jupyter, Jaw created a machine learning model that could replicate the work of a human claims adjuster to a high degree of accuracy.
“It’s really replicating the knowledge of the claims adjuster,” he says. “We make the joke that everybody who can be a claims adjuster who lives in Seattle, we’ve already hired them. They’re all former veterinarian technicians. They know exactly what medication treats what and what symptoms mean what conditions.”
A Data Scientist’s Best Friend
That first machine learning prototype that Jaw created in 2017 was a monolithic application, which perhaps was part of the problem with deploying it. Since then, Jaw has broken that application up into 15 independent models, which run in parallel on the company’s AWS instances.
There are multiple benefits to modularity, including elimination of tedious code merges and the ability to isolate problems, Jaw says. “Each model gets its own code repository and own AWS hardware, so they’re isolated from each other,” he says. “There’s very little chance for this claim automation process to have a catastrophic failure.”
But getting the models deployed in the first place, the challenge that Jaw ran headfirst into at the beginning of our story, proved to be another matter altogether. Jaw knew roughly what he needed, but didn’t find a lot of solutions on the market.
“Surprisingly there were few [pre-built commercial] products that I could find,” he says. “There was of course the homegrown solution, just using Docker and containers, where you just create a container and deploy it to a Linux machine yourself. You just manually write up the machine code that’s required to spin up in AWS that’s capable of deploying your models. But that was painful and wasn’t something that data scientists are natively good at or enjoy.”
Jaw eventually found a solution in the guise of Domino Data Lab. The San Francisco company develops a product that automates the management of many aspects of machine learning models, including managing the environment. Jaw did a proof of concept against one other similar product and selected Domino in the end.
“It gives us easy ownership over the non-data science modeling parts of putting models into production,” Jaw says of Domino. “It’s making sure the hardware is spec’d correctly. It’s making sure we’re not using a really expensive, really powerful machine when we don’t need to. We can spin down models that are no longer in use. We can spin up new models if we want to try something different.”
Domino is not just a place where Trupanion promotes models to production, but where the data scientists do their actual work, Jaw says.
“That’s useful because everything is in a controlled environment,” he says. “You can log in from any computer and have this workspace ready for you, and everybody has the same workspace with the same version of all the tools we use. Not having a controlled environment to do prototyping work is painful. You have to make sure everyone’s tool is the same tool.”
The 15-part prediction system is used to evaluate every claim that’s processed through Trupanion’s app (claims received via fax or mail are not automated).
But as good as it is, the model approves or rejects only three or four out of 10 claims. That’s because the model is expected to meet the same 99% accuracy threshold that Trupanion sets for human claims adjusters, Jaw says. If the accuracy of the prediction is below that level, the decision is handed off to a human adjuster.
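The routing rule described here can be sketched in a few lines. The article does not say how Trupanion scores an individual prediction, so this sketch assumes each prediction carries a confidence value between 0 and 1; the `Prediction` class, its field names, and the `route` function are hypothetical.

```python
from dataclasses import dataclass

AUTOMATION_THRESHOLD = 0.99  # the 99% bar cited in the article


@dataclass
class Prediction:
    claim_id: str
    decision: str      # "approve" or "reject"
    confidence: float  # assumed model score in [0, 1]; not described in the article


def route(prediction: Prediction) -> str:
    """Automate the decision only when the score clears the threshold."""
    if prediction.confidence >= AUTOMATION_THRESHOLD:
        return prediction.decision
    # Anything less certain goes to a human claims adjuster.
    return "human_review"
```

Under this kind of rule, raising the automation rate means making the model more certain on more claims, not lowering the bar.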
Jaw and his colleagues are constantly refining the model in the hopes of increasing the automation level. The company actually maintains two identical streaming data pipelines – one for the production inference engine, and one for testing.
“As they make predictions, they differ,” Jaw says. “There’s a difference between test and prod because we’re constantly making improvements to test. Once we make improvements, we monitor the results as they come in, and once we’re convinced that our test version of automation is better than prod, then we’ll just promote test to production.”
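One way to picture the comparison Jaw describes is a side-by-side tally of the two pipelines on the same claims. The article does not state what convinces the team that test is better than prod, so the promotion rule below is an assumption, as are the names `Outcome`, `summarize`, and `should_promote`.

```python
from typing import List, NamedTuple, Optional, Tuple


class Outcome(NamedTuple):
    prediction: Optional[str]  # None means the pipeline deferred to a human
    truth: str                 # final decision, assumed to be known after the fact


def summarize(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Return (automation_rate, accuracy_on_automated_claims) for one pipeline."""
    automated = [o for o in outcomes if o.prediction is not None]
    automation_rate = len(automated) / len(outcomes) if outcomes else 0.0
    correct = sum(o.prediction == o.truth for o in automated)
    accuracy = correct / len(automated) if automated else 0.0
    return automation_rate, accuracy


def should_promote(prod: List[Outcome], test: List[Outcome],
                   min_accuracy: float = 0.99) -> bool:
    """Hypothetical rule: promote test if it automates more claims while staying above the accuracy bar."""
    prod_rate, _ = summarize(prod)
    test_rate, test_accuracy = summarize(test)
    return test_accuracy >= min_accuracy and test_rate > prod_rate
```

Running both pipelines against the same stream of claims is what makes such a comparison fair: any difference in the tallies comes from the model changes, not from the data.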
The creation of this duplicate system represented another valuable lesson in his career as a data scientist at Trupanion, says Jaw, who recently spoke at a Domino conference.