April 18, 2022

Frictionless Shopping at the Enterprise Data Store

(unicro/Shutterstock)

Storage is no longer a big problem at this stage in the data game. Neither is processing power. But finding the right piece of data and getting it to where it can have impact remains an unresolved challenge. One company with a novel solution is TickSmith, a Databricks-funded firm that some have called “the Shopify of data.”

TickSmith CEO and co-founder Francis Wenzel has been at the big data integration game longer than some folks in this business have been alive. While he was still in high school, way back in 1986, he started a software company that sold stock closing-price data.

“We had some teachings on the stock market, thought it was interesting, and decided to build some software–started investing ourselves,” Wenzel said. “Every month we would ship out floppy disks that contained the month’s trading activities or end-of-day quotes.”

If a customer was tracking IBM shares, for instance, they would receive four floppy disks containing the trading data from Wenzel's firm, load them onto a PC, and then print out the data. It wasn't a huge amount of data by today's standards, but hard disks back then were small and expensive. And obviously you couldn't get this data off the Internet, which was still in its infancy.

By 1994, Wenzel was helping to develop an online trading platform at Exchange Market Systems, which would be used by millions of Canadians through major banks. That company was acquired by SunGard, where he led development of its financial products until 2011.

His next data integration challenge was at TickSmith, which Wenzel co-founded in 2012, near the start of the high-frequency trading craze.

“A lot of firms were asking for tick data, basically, to be able to explore getting into the game, and there were very few alternatives out there, simply because it was a big data problem and people didn’t know about big data technology,” Wenzel said.

TickSmith had access to 400 TB of tick data, the fine-grained up-and-down movements that individual stocks make throughout a trading day. The challenge, Wenzel said, was that this data came in 300 different formats.

“We had to take it in, normalize it, run analytics, and deliver it back out,” he said. “We ended up building a full software stack on top of Hadoop at a time when big data and Hadoop were unknown.” (Unknown outside of Datanami, of course, which has been tracking the evolving big data stack since it started way back in 2011.)
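The article doesn't describe TickSmith's actual Pig and MapReduce jobs. As a minimal sketch of what "normalize it" means for multi-format tick data, assuming two made-up source formats (the `csv_iso` and `pipe_epoch` layouts, the `Tick` record, and the `normalize` helper are all hypothetical), each format gets its own parser that emits one common record type, and the results are merged into a single time-ordered stream:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Tick:
    """Hypothetical common schema that every source format is normalized into."""
    symbol: str
    timestamp: datetime  # always timezone-aware UTC
    price: float
    size: int


def parse_csv_iso(line: str) -> Tick:
    # Made-up format A: "symbol,ISO-8601 timestamp with explicit offset,price,size"
    symbol, ts, price, size = line.split(",")
    return Tick(symbol, datetime.fromisoformat(ts).astimezone(timezone.utc),
                float(price), int(size))


def parse_pipe_epoch(line: str) -> Tick:
    # Made-up format B: "size|price|epoch milliseconds|symbol" (different field order and clock)
    size, price, epoch_ms, symbol = line.split("|")
    ts = datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)
    return Tick(symbol, ts, float(price), int(size))


# One parser per source format; a real feed collection would need far more entries.
PARSERS: Dict[str, Callable[[str], Tick]] = {
    "csv_iso": parse_csv_iso,
    "pipe_epoch": parse_pipe_epoch,
}


def normalize(raw: Iterable[Tuple[str, str]]) -> List[Tick]:
    """Route each (format, line) pair to its parser, then merge into one time-ordered list."""
    ticks = [PARSERS[fmt](line.strip()) for fmt, line in raw]
    return sorted(ticks, key=lambda t: t.timestamp)
```

Once every format lands in the same record type, downstream analytics and delivery only have to understand one schema. On Hadoop, the same per-record parsing would run in parallel across the cluster rather than in a single Python list.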

Wenzel and his partners coded their data processing system in Pig and MapReduce, which were the earliest programming frameworks available to Hadoop users. That initial Hadoop cluster ran in the basement of one of the company's founders.

“We literally had to build our own hardware, because you couldn’t buy a Hadoop node back then,” he said. “When we started all of our servers, it sounded as if a plane was taking off. And his electricity bill sure took off.”

TickSmith CEO and co-founder Francis Wenzel

TickSmith eventually re-developed the data processing stack in Apache Spark and moved it to AWS, which cut down on the shuddering floorboards and sky-high utility bills. The company also started building other data solutions around it, including an online data store and compliance solutions.

Around 2015, the Montreal-based company pivoted to focus exclusively on the data store opportunity. Today, TickSmith develops tools that allow a company to set up its own enterprise data store, which it can host on its own servers or in the cloud, and start monetizing its data.

The company’s solution, which runs exclusively on AWS, automates a range of tasks for the customer, including developing the e-commerce GUI that delivers a point-and-click experience for data shoppers, as well as the data plumbing that moves data behind the scenes.

Moving the data around via various data pipelines is the big challenge today, Wenzel said. Whether it's exposing an API, querying data via SQL, moving it via automated FTP, or copying it directly into a cloud bucket, TickSmith's tools are designed to handle the data engineering automatically, taking the development and operational burden off the customer.

TickSmith relies on Databricks' Delta Lake technology to automate some of the transformation tasks. In fact, it was the first company to get Delta Sharing up and running after Databricks announced it just over a year ago. Databricks' venture capital arm has also invested in TickSmith.

“It’s really the convenience of just making the data available the way that people need it,” Wenzel told Datanami. “The concept of the data store is not just a storefront that shows the data. It’s all of the data pipeline behind it that takes the data and transforms it into what we call data products.”

Data sellers rarely just offer a giant lump of data. Instead, they offer more granular data products, such as certain historical data sets, or even just individual fields contained within a database table. Selling that fine-grained detail can be more valuable, and also help avoid privacy and security issues.
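To make the idea of a "data product" concrete, here is a minimal Python sketch, not TickSmith's implementation: a product is modeled as a named selection of columns plus a date window over a source table. The `DataProduct` class, its fields, and the `trade_date` column are all hypothetical, chosen only to show how a seller can offer a slice of a table without exposing the rest of it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical product definition: the fields and date window a buyer pays for."""
    name: str
    fields: Sequence[str]
    start: date
    end: date  # inclusive

    def extract(self, table: List[Row]) -> List[Row]:
        # Assumes every source row carries a "trade_date" column (hypothetical schema).
        # Keep only rows inside the purchased window, and only the purchased columns;
        # anything not listed in `fields` (e.g. an account identifier) is never copied out.
        return [
            {f: row[f] for f in self.fields}
            for row in table
            if self.start <= row["trade_date"] <= self.end
        ]
```

A buyer of a product defined as `DataProduct("soy-2021-settles", ("trade_date", "symbol", "settle_price"), date(2021, 1, 1), date(2021, 12, 31))` would receive only 2021 dates, symbols, and settlement prices, while other products could be cut from the same table with different field lists or windows.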

CME Group’s data-selling site is powered by TickSmith

One of TickSmith’s customers is CME Group, the world’s largest provider of financial derivatives. According to The Economist, CME Group is “the biggest financial exchange you have never heard of.”

When customers shop for derivatives data on the Chicago, Illinois-based company's website, they're interacting with the TickSmith platform. CME Group has been a TickSmith customer for the past six years, and over that time its data store has attracted a wide variety of customers, Wenzel said.

“It’s anything from financial institutions and hedge funds that are looking for data, academics. But even farmers,” he said. “I’m farming soy and I need some background as to how the soy traded in the past. It’s going to cost me a little bit of money, but I’ve got the best content available for them.”

But it’s not just industry giants like CME Group that need an online data store. Companies across all types of industries are finding that they’re sitting on a potential gold mine in their data; they just need a better way to bring it to market.

“Ultimately it just goes back to the fact that there’s a need to distribute data and there’s a need to access data,” Wenzel said. “We have investors who have called us ‘Shopify for data.’ But in the end we want to ensure that getting access to data and consuming it is as simple as consuming any physical good or service that you can buy on an Amazon or any other type of online store.”

Don’t be fooled into thinking only the big dogs can host a data marketplace. Sure, AWS and Snowflake are making big moves to facilitate the buying and selling of data. But those marketplaces come with certain limitations, whereas data marketplaces set up with TickSmith’s software remain under the full control of its customers.

“If I sell data through AWS Data Exchange, the customer has to be an AWS customer, because he can only receive it into his S3 bucket,” Wenzel said. “If I sell my data using the Snowflake Data Marketplace, well, the customer has to have a Snowflake account because that’s the only way he can reach the data.”

Wenzel supports customers selling their data through whatever means they can, including the big data marketplaces. In fact, TickSmith offers tools that make that easy to do.

“But your customers should not have to be locked into a specific technology stack to have access to your data,” Wenzel said. “Ultimately, if I’m selling this as a product, there shouldn’t be any limits as to where I sell it from.”


Datanami