May 25, 2017

Under Contract: Crowd-Sourced Zillow Algorithm


Zillow, the home sales tracking web site, has transformed the real estate business, but its proprietary algorithm for estimating home values hasn’t always measured up.

In tacit acknowledgment of that shortfall, the Seattle-based company launched a machine learning competition this week that will award $1 million to the individual or team that comes up with the best improvement to its “Zestimate” home valuation tool. Data scientists would compete to improve the accuracy of property valuations on about 110 million homes in the U.S.

The company said the competition marks the first time part of its proprietary data behind the Zestimate home valuation will be available to outside researchers.

The metric represents an estimated market value based on public and user-submitted data, including location, lot size, square footage and the number of bedrooms and bathrooms. Historical data like real-estate transfers and tax information are also factored in, as are sales of comparable houses in a neighborhood.

Given the volatility in the U.S. housing market, punctuated by the collapse of the late 2000s, the accuracy of the Zillow valuation algorithm has continued to draw the attention of home sellers and buyers.

For its part, the company claims its current “U.S. median absolute percent error” rate stands at 5 percent, down from 14 percent in 2006 when the service was launched.
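For readers unfamiliar with the metric, a median absolute percent error can be sketched as follows. This is a minimal illustration, not Zillow's code: the function name and the five sample homes are hypothetical, and it assumes the error for each home is the gap between estimate and sale price taken as a percentage of the sale price.

```python
from statistics import median

def median_absolute_percent_error(estimates, sale_prices):
    """Median of |estimate - sale price| / sale price, as a percentage."""
    if len(estimates) != len(sale_prices):
        raise ValueError("estimates and sale_prices must be the same length")
    errors = [100.0 * abs(e - s) / s for e, s in zip(estimates, sale_prices)]
    return median(errors)

# Hypothetical estimates and sale prices, for illustration only.
estimates = [210_000, 295_000, 480_000, 152_000, 330_000]
sale_prices = [200_000, 300_000, 500_000, 160_000, 300_000]

mdape = median_absolute_percent_error(estimates, sale_prices)
print(f"Median absolute percent error: {mdape:.1f}%")
```

Because the figure is a median, half of the estimates in a sample miss by less than this amount and half miss by more, so a single badly mispriced home does not move it much.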

Still, it acknowledges there is room for improvement. “We know the next round of innovation will come from imaginative solutions involving everything from deep learning to hyper-local data sets—the type of work perfect for crowdsourcing within a competitive environment,” noted Stan Humphries, Zillow Group’s chief analytics officer and creator of the Zestimate home valuation.

The Zillow estimate is combined with other market data such as recent home sales and market intelligence gathered by local real estate agents to help home sellers establish sales prices and buyers determine what’s selling and for how much.

Kaggle, a platform designed to connect data scientists with complex machine learning problems and host to data science competitions, is administering the competition. Zillow said Wednesday (May 24) the contest would consist of two rounds: a public qualifying round that extends through Jan. 17, 2018, and a private final round that runs from Feb. 1, 2018, to Jan. 15, 2019.

Competitors in the qualifying round have until Oct. 16, 2017, to register, download the competition data set and develop a model to improve the Zillow estimate residual error. The top 100 teams able to reduce the difference between the Zestimate home valuation and the actual sale price of the homes within the dataset would then compete for the $1 million prize.
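The qualifying task can be pictured as predicting how far off each Zestimate is. The sketch below is an assumption-laden illustration rather than the contest's scoring code: it assumes the residual is measured on a log scale (log of the estimate minus log of the sale price) and that predictions are scored by mean absolute error, neither of which is spelled out above. The function names and sample figures are hypothetical.

```python
import math

def log_residual(zestimate, sale_price):
    """Assumed residual: log(estimate) - log(sale price).
    Positive means the estimate was too high, negative too low."""
    return math.log(zestimate) - math.log(sale_price)

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual residuals."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical (estimate, sale price) pairs, for illustration only.
homes = [(210_000, 200_000), (295_000, 300_000), (480_000, 500_000)]
residuals = [log_residual(z, s) for z, s in homes]

# Baseline "model" that assumes the estimate is always exactly right.
baseline = [0.0] * len(residuals)
baseline_mae = mean_absolute_error(baseline, residuals)
print(f"Baseline mean absolute error: {baseline_mae:.4f}")
```

Under these assumptions, a competing model would need to predict the residuals more closely than the all-zero baseline to show that it has found a pattern in where the existing estimates go wrong.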

The winning team must build an algorithm to predict the actual sale price itself, using data sources to design new features that will give the model an edge over other competitors, Zillow said.

Additional details on the Zillow Prize competition are available here.
