August 10, 2016

TaxBrain Brings Open Data Science to Federal Models


Some of the most important decisions made in Washington D.C. are influenced heavily by sophisticated computer models refined over the course of decades. However, most of the models are proprietary, which hinders the ability to improve them and to weigh underlying assumptions that affect the results. Now, a new initiative dubbed TaxBrain seeks to jumpstart an era of government transparency and openness with a set of new open source models written in Python that anyone can access over the Web.

TaxBrain was launched in April by the Open Source Policy Center (OSPC) at the American Enterprise Institute, a Washington D.C.-based think-tank that has historically supported conservative causes. The idea behind the initiative, which was developed with software and assistance from Continuum Analytics, is to encourage government transparency and democratic participation by lowering the barriers to accessing and understanding how many of the most critical forecasts are made.

OSPC founder and managing director Matt Jensen explains the significance of the secret models–mostly written in SAS and Fortran–that are currently used by dozens of analytic groups working in Congress, the White House, and the Treasury Department, not to mention independent think-tanks such as the AEI.

“These are probably the most critical models known to mankind,” Jensen tells Datanami. “It sounds crazy to say that, but the actions of government and tax policy and spending policy affect millions of people on a daily basis.”

Whenever a bill gets passed, people want to know how much money it raises, how much money it spends, and how it affects the rich and the poor. “Those numbers come from models,” he says. “Every important expected outcome of a policy is being estimated by these models, and those estimates absolutely affect policy decisions.”

Cracking Black Boxes

The problem with the status quo around these models is two-fold. First, most of them are proprietary, which means the creators of the models aren’t sharing them. Second, even if they did share them, a third party would need a license to the runtime environment, and SAS isn’t known for giving its runtime away.

But with TaxBrain, the OSPC hopes to get government modelers and the populace at large hooked on economic models based on open source data science tools. OSPC chose Python because it’s open and widely adopted in the data science community, and it chose Continuum Analytics to provide the Python runtime with its open source Anaconda suite of tools.


TaxBrain hopes to bring openness to the many variables and assumptions that underlie federal economic models (SFIO CRACHO/Shutterstock)

Continuum CEO Travis Oliphant is enthusiastic about the idea of shining a light on the computer models the government uses to create public policy.

“One of the most interesting aspects of the open data science movement was really getting the models accessible to broader group,” he says. “The challenge there is, of course, the people doing the models don’t want to change the way they write the models. If I have a SAS model, I don’t really want to spell it differently. I spent a lot of time trying to figure out how to make it work in this language. The nice thing about Python is it’s very flexible and you can create interfaces that are closer to what you’re used to.”

With TaxBrain, the OSPC has started the process of creating new economic models written in Python and accessible as a Web application. Since the economic models themselves are open source, they can be freely downloaded and analyzed by academics and data scientists who understand Python and are therefore qualified to judge whether the models were constructed correctly.

Pesky Variables and Assumptions

There can be hundreds of variables in a single model that influence its outcome in ways both big and small, so getting more eyeballs on the code is a critical aspect of ensuring that the variables are correctly coded and implemented. The same goes for underlying assumptions that get baked into the models, which are important for understanding the functioning of feedback loops, such as whether and how a tax decrease impacts employment.
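To see how a single baked-in assumption can move a headline number, consider a minimal sketch in Python. It is a toy illustration, not TaxBrain's code or any official model: the flat tax rates, the three incomes, the function names, and the constant-elasticity response of taxable income are all assumptions made up for this example.

```python
def revenue(incomes, rate):
    """Total revenue from a flat tax on each income (hypothetical toy model)."""
    return sum(income * rate for income in incomes)


def revenue_after_reform(incomes, old_rate, new_rate, elasticity):
    """Revenue under a new flat rate, letting taxable income respond.

    Assumption for illustration only: income scales with the change in the
    net-of-tax share (1 - rate) raised to `elasticity`. An elasticity of 0
    means no behavioral response at all (a purely "static" estimate).
    """
    response = ((1 - new_rate) / (1 - old_rate)) ** elasticity
    return sum(income * response * new_rate for income in incomes)


# Made-up incomes and rates, chosen only to make the arithmetic easy to follow.
incomes = [30_000, 60_000, 120_000]
old_rate, new_rate = 0.25, 0.20

baseline = revenue(incomes, old_rate)
for elasticity in (0.0, 0.25, 0.5):
    reformed = revenue_after_reform(incomes, old_rate, new_rate, elasticity)
    print(f"elasticity={elasticity}: revenue change = {reformed - baseline:,.0f}")
```

The same tax cut produces a different estimated revenue loss for each elasticity value, which is why an outside reader needs to see the assumption, not just the result.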

“I can tell you there are a whole lot of assumptions that are being made [in the models] that aren’t being disclosed,” OSPC’s Jensen says. “Many of these assumptions are being made by government modelers, but they don’t disclose what the assumption that they make are, so an external [party] can’t say whether the assumptions line up with the empirical literature.”

This level of secrecy is anathema to the democratic ideals of transparency and participation in government, Continuum’s Oliphant says.

“If you’re a financial hedge fund, maybe it’s a good deal that nobody can read your models unless they have your internal structure,” he tells Datanami. “Maybe that’s a good thing because you don’t want people to share what you’re doing unless they have your secret sauce. But for a government or for a scientist, whose purposes are to communicate about how the law translates to a mathematical model,” that’s not such a good thing.

It’s a lot to ask government modelers who have been using SAS or Fortran for decades to suddenly switch to TaxBrain’s Python-based tools. That’s why another important aspect of the project is a Web interface in front of the models, which are accelerated using Anaconda’s N-Python compiler.

“They don’t even know that Anaconda runs things,” Oliphant says. “They just see a Web application where they can explore and make adjustments and then fire off a model that runs across cluster of machines that all use Anaconda.”

Forecasting Model Success

So, what does winning look like for TaxBrain? Surprisingly, it doesn’t require convincing every government modeler that Python is the superior language for writing models. According to Jensen, TaxBrain will succeed through openness and trust.

“The official modelers don’t have to abandon what they’re doing,” he says. “The policy maker and public have to get to the point where they say ‘We trust the open thing that we know what it’s doing and why, where we can see the development and we can actually work on it and make it better and understand it. We trust that more than we trust the black box.’ When you get to that point, you’ve won.”

It may be surprising to know that the most important tax policy decisions are made under the cloak of computer secrecy. In a democracy known for its ideals of freedom of information and transparency of government officials, it’s curious that we trust unseen bureaucrats to use secret codes to forecast what impact laws will have.

“The wheels of government move slowly, but I think we’re headed in the right direction,” Oliphant says. “Data science is really starting to help businesses…[and] improve service to customers with data. Government has exactly the same opportunity. We have data and we can improve our administration and democracy, and improve how well we deliver services to the country using these data.”
