July 9, 2020

cnvrg.io and NetApp Partner to Deliver MLOps Dataset Caching

SAN FRANCISCO, July 9, 2020 – cnvrg.io, the data science platform simplifying model management and introducing advanced MLOps to the industry, announced its partnership with NetApp, the first to leverage the cnvrg.io dataset caching tool, a unique set of capabilities for immediately pulling datasets from cache for any machine learning job. cnvrg.io is the first ML platform to use dataset caching for end-to-end machine learning development. Caching allows datasets to be ready to use in seconds rather than hours, and cached datasets can be authorized and used by multiple teams in the same compute cluster connected to the cached data. Dataset caching is already used by cnvrg.io customers at production level.

It’s not uncommon to have hundreds of datasets feeding models. However, those datasets may live far away from the compute that is training the models, such as in the public cloud or in a data lake. With NetApp and cnvrg.io’s dataset caching capability, users can cache the needed datasets (and/or their versions) and make sure that they’re located in the ONTAP® AI storage attached to the GPU or CPU compute cluster that is running the training. Once the needed datasets are cached, they can be used multiple times by different team members.

The cnvrg.io dataset caching feature can be used by any cnvrg.io user with the ONTAP AI storage server. Once the storage server is connected to an organization, data scientists can cache commits of their datasets on that Network File System (NFS) storage. When a commit is cached, users can attach it to jobs for immediate, high-throughput access to the data, and the job will not need to clone the dataset on start-up. cnvrg.io’s dataset caching feature creates the following business advantages:

  • Increased productivity – Datasets are ready to be used in seconds rather than hours.
  • Improved sharing and collaboration – Cached datasets can be authorized and used by multiple teams in the same compute cluster connected to the cached data.
  • Reduced cost – Models pull datasets from the cache, reducing per-download payments.
  • Operationalizing hybrid cloud – The dataset cache provides on-premises, high-performance mirror storage.
  • Multi-cloud dataset mobility – The on-premises cache serves as the control point for the data.

“Deep Learning workloads are unique in that they need access to random data samples from a large dataset that may be sourced from diverse data sources and dispersed locations,” said Santosh Rao, Senior Technical Director, NetApp AI & Data Engineering. “Further, Deep Learning requires high performance data close to the GPU Compute clusters and this requires the combination of High Performance Flash Storage Systems, Connectors into Edge, Core and Cloud for dispersed data location access and the support of widely used Data Source formats across NFS or other filesystems on a unified Data Platform. NetApp and cnvrg.io form a first of its kind partnership to bring these capabilities to customers worldwide adopting Deep Learning to transform their business.”

“Our partnership with NetApp drives productivity and efficiency for data teams,” says Yochay Ettun, CEO & Co-founder of cnvrg.io. “We’re excited to launch our dataset caching for machine learning, to offer NetApp users and cnvrg.io users faster and simplified access to their datasets with tools for advanced data management and data versioning features that will allow data teams to focus on data science over technical complexity.”

To read more about the partnership and the dataset caching, visit https://cnvrg.io/solutions/netapp/.

About cnvrg.io

cnvrg.io is an AI OS, transforming the way enterprises manage, scale and accelerate AI and data science development from research to production. The code-first platform is built by data scientists, for data scientists, and offers unrivaled flexibility to run on-premises or in the cloud.


Source: cnvrg.io
