April 15, 2013

IRS to Utilize Big Data to Improve Returns

Ian Armas Foster

Today, April 15, 2013, is tax day in the United States. Millions of Americans are assuredly scrambling to file their returns today, and most will be doing so electronically. Specifically, the IRS expects 80 percent of tax returns, or 250 million, to be filed online. Verifying the accuracy of that many filings is a significant big data problem.

With Internal Revenue Service (IRS) personnel reduced by this year’s budget cuts, the agency is relying more heavily on computer-generated audits to track down and recover an estimated $300 billion in revenue lost each year to errors and evasion.

As detailed in this report, the IRS will be taking advantage of an extensive big data infrastructure to utilize financial and social information made available to the government in finding errors.

Reportedly, the IRS is looking to use this information, culled with the help of companies like IBM and EMC, in various ways, though these may not amount to more than a research effort this year. Those efforts include charting social media data from Facebook, tracking internet addresses and email patterns, and finding relationships between social security numbers and their respective spending patterns.

The notion is to generate and track one million ‘unique’ attributes that would form a personalized code, letting the IRS tap into each individual’s financial behavior. Testing of such a system reportedly started last year, when the agency created tax profiles of 1,500 test subjects and recovered $200 million.

The process of expanding that system across the entire population of the United States has been aided by a dramatic increase in the IRS’s processing power. According to Jeff Butler, the Director of Research Databases for the IRS, the agency can now load all tax returns in just ten hours, compared with the four months it took in 2005.

Those online filings amount to 15 terabytes, a mere fraction of the 1.2 petabytes the agency has reportedly collected through various social and financial means.

So what does this all mean? On the one hand, the government is using this data in much the same way that private financial companies collect and process data: to better understand its constituents and ultimately increase revenue. On the other hand, using what the general populace believes to be private social information against people in potential computer-generated audits raises questions.

For example, it is unclear whether current IRS regulations are properly adapted to a world where the agency can access vast swaths of electronic information. “I don’t really see strong legal regulation in place to manage something of this magnitude,” argued Paul Schwartz, University of California law professor and co-director of the Berkeley Center for Law & Technology.

As it stands, this trial period for how the IRS uses big data to find lost tax revenue will be crucial as standards are set and regulations are formed.
