Data Startup Aims to Make S3 ‘Work Like Dropbox’
Quilt Data emerged from stealth today with a new service that aims to make S3 work more like Dropbox, the handy file-sharing service. For about $500 per month, Quilt Data allows teams to securely share large files that are too big to distribute via FTP or Web archives, and simultaneously get visibility into the contents of those files through its preview feature.
Quilt Data’s two founders, Aneesh Karve and Kevin Moore, met while studying in the graduate computer science program at the University of Wisconsin-Madison, which Karve says is one of the best database schools in the country.
“While we were there, we realized that there’s a ton of data out there and a ton of benefit to be gained from database and data technology,” says Karve, who is Quilt Data’s CTO. “But data was too difficult to access.”
After graduating in 2007, Karve and Moore took jobs and internships at firms including Microsoft, Nvidia, and Google. It wasn’t until 2015 that they joined forces to found Quilt Data, with a plan to radically simplify data access. With $4 million in seed funding from Y Combinator and others, the pair started developing a new system that would change people’s relationship with large files.
“We really weren’t happy with any of the alternatives for sharing even moderately sized data,” Karve tells Datanami. “We looked at GitHub. We looked at Google Drive. And there are these quotas and limitations and we found S3 to be a much simpler system — as long as we could bring a simple user experience and convenience to Quilt.”
Launched today, Quilt Data essentially turns Amazon S3 into a virtual private cloud where groups of trusted collaborators can work with files of all sizes. In addition to automatically configuring S3 security policies on behalf of the owner, it provides a preview feature that lets users see some of the contents of a large file before they decide to interact with it or download it.
The company currently offers browsers for previewing image files, Parquet files, CSVs, and any type of JSON data. The service can also work with other file types, such as Avro or ORC, but no browsers are available for them yet.
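The article does not say how Quilt builds its previews. As a hypothetical sketch of the general idea, a service could preview a large CSV without downloading it by fetching only the first few kilobytes of the object and parsing the complete rows in that chunk. The `preview_csv` helper below is an illustrative assumption, not Quilt’s code.

```python
import csv
import io


def preview_csv(head_bytes: bytes, max_rows: int = 5) -> list[list[str]]:
    """Parse up to max_rows rows from the first bytes of a CSV file.

    Hypothetical helper: head_bytes is assumed to be a prefix of the full
    object, e.g. fetched with an HTTP Range header such as "bytes=0-65535".
    """
    text = head_bytes.decode("utf-8", errors="replace")
    # A prefix usually ends partway through a row, so drop everything after
    # the last newline to avoid showing a truncated record.
    last_newline = text.rfind("\n")
    if last_newline != -1:
        text = text[: last_newline + 1]
    rows = []
    for row in csv.reader(io.StringIO(text)):
        rows.append(row)
        if len(rows) == max_rows:
            break
    return rows


# The partial fourth row ("3,ga") is discarded.
print(preview_csv(b"id,name\n1,alpha\n2,beta\n3,ga"))
```

If the file lives in S3, boto3’s `get_object` accepts a `Range` argument, so the prefix could be fetched with something like `s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-65535")["Body"].read()` before calling the helper. This simple approach can still cut a quoted field that spans multiple lines; it is a sketch, not a robust parser.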
Quilt customers can share data in S3 by simply entering the email address of the recipient, who then receives an email containing a link to open the data in S3. Getting the permissions straight in vanilla S3 can be difficult for the average user, Karve says.
“If you want to use Lake Formation, you’ll be forced to permission people using IAM [Identity and Access Management], which is complex and takes time, whereas with Quilt you can literally invite somebody with an email address,” Karve says. “We do all the permissions management for you transparently.”
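For a sense of what permissioning through IAM involves, here is a minimal sketch of the kind of S3 bucket policy an administrator might otherwise write by hand to give one collaborator read-only access. The bucket name and role ARN are hypothetical, and this is not a description of how Quilt manages permissions internally.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy granting one IAM principal read-only access.

    bucket and principal_arn are supplied by the caller; the values used
    below are hypothetical examples.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys is granted on the bucket itself.
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = read_only_bucket_policy(
    "example-research-data",  # hypothetical bucket name
    "arn:aws:iam::123456789012:role/collaborator",  # hypothetical role ARN
)
print(json.dumps(policy, indent=2))
```

Applying the policy would take a further call, such as boto3’s `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`, and each new collaborator needs an IAM identity plus a policy update. That is roughly the overhead Karve contrasts with sending an email invitation.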
While the company has just launched publicly, it has already been working in beta with customers in the biotechnology and financial services industries. Data scientists and analysts often work with files in excess of 1GB, which can be hard to manage using traditional tools.
“Quilt makes S3 look a lot more like Dropbox,” Karve says. “So our goal in making S3 accessible is they’ll be able to replace document stores like Dropbox with S3, and now they have a single store of ground truth.”
The company is also offering access to large data sets that Amazon makes available via public S3 buckets. These buckets contain billions of pieces of data, including Amazon product reviews, satellite imagery, IRS data, Jupyter data science notebooks, and genomic data. Customers can work with this public data and even combine it with their private data in their Quilt Data environments. The company also offers Quilt Packages, which allow users to view snapshots of managed S3 buckets at specific points in time.
Data scientists and analysts are particular about the tools they use, and they often want to get their hands on the data. To that end, Quilt allows them to download files from S3. However, for data over 1GB in size, Quilt recommends leaving it in S3 and using AWS services such as SageMaker, Amazon’s managed machine learning service.
“Because SageMaker runs in the cloud, transfers are super-duper fast,” Karve says. “With the SageMaker notebook instance, you can get a lot of disk and you’ll be talking to Quilt on S3 at a much higher rate than your laptop.”
The company is charging $500 per month for a single Quilt bucket. For $1,000 per month, customers can get access to up to five buckets.