October 10, 2018

Kubernetes Is a Prime Catalyst in AI and Big Data’s Evolution

James Kobielus


Kubernetes is becoming synonymous with cloud-native computing. As an open-source platform, it enables development, deployment, orchestration, and management of containerized microservices across multicloud ecosystems.

Kubernetes is the key to cloud-native microservices that are platform-agnostic, dynamically managed, loosely coupled, distributed, isolated, efficient, and scalable. The platform continues to mature as it leverages containers, orchestration, service meshes, immutable infrastructure, and declarative APIs.
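To make the declarative model concrete, here is a minimal sketch using the official Kubernetes Python client (the open-source kubernetes package); the service name, labels, and container image are hypothetical placeholders. The code declares a desired state of three replicas, and the Kubernetes control plane continuously reconciles the cluster toward that state.

```python
# A minimal sketch of Kubernetes' declarative API, using the official
# Python client (pip install kubernetes). The deployment name, labels,
# and container image below are hypothetical placeholders.
from kubernetes import client, config

def main():
    # Loads credentials from ~/.kube/config (use load_incluster_config()
    # when running inside a pod).
    config.load_kube_config()

    labels = {"app": "scoring-service"}  # hypothetical microservice
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="scoring-service"),
        spec=client.V1DeploymentSpec(
            replicas=3,  # desired state; Kubernetes reconciles toward it
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="scorer",
                        image="example.com/scoring-service:1.0",
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]),
            ),
        ),
    )

    # Declare the desired state; the control plane schedules, restarts,
    # and scales containers to match it.
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=deployment
    )

if __name__ == "__main__":
    main()
```

The same manifest could equally be written as YAML and applied with kubectl; either way, the declarative contract is what makes the workload portable across the multicloud deployments described above.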

One clear indicator of Kubernetes’ maturation is the rich ecosystem of other open-source projects that have grown up around it. These include a remote procedure call framework (gRPC), container network interface (CNI), DNS-based service discovery (CoreDNS), packaging environment (Helm), service mesh and proxy (Istio and Envoy), serverless interface (Virtual Kubelet), VM-coexistence environment (KubeVirt), stateful workload interface (StatefulSets), transaction-troubleshooting interface (Jaeger), debugging environment (Linkerd), object storage system (Rook), and messaging environment (NATS).
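As an illustration of how one of these pieces works in practice, the snippet below sketches DNS-based service discovery as CoreDNS provides it inside a cluster. It assumes the code runs in a pod, and the service and namespace names are hypothetical.

```python
# A minimal sketch of DNS-based service discovery as CoreDNS provides it
# inside a cluster. Assumes this runs in a pod; the service and namespace
# names ("scoring-service", "default") are hypothetical.
import socket

# Kubernetes services resolve at <service>.<namespace>.svc.cluster.local;
# CoreDNS answers the lookup with the service's cluster IP.
addrinfo = socket.getaddrinfo("scoring-service.default.svc.cluster.local", 8080)
for family, _, _, _, sockaddr in addrinfo:
    print(family, sockaddr)
```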

Another sign of Kubernetes’ growing dominance is rapidly expanding enterprise adoption. The Cloud Native Computing Foundation’s recently released biannual enterprise user survey reports that use of cloud-native technologies in the enterprise has grown more than 200 percent since December 2017. Kubernetes is far and away the top choice for container management, with 83 percent of respondents saying they use it in private, public, hybrid, or multicloud deployments. Around 40 percent of enterprise respondents report that they are now running Kubernetes in production.

Among cloud-computing vendors, Kubernetes is a key competitive focus. All of the leading public-cloud providers have made sizable investments in this open-source technology. AWS, Microsoft, Google, IBM, Oracle, and Alibaba all have their respective Kubernetes engines, as do Red Hat, Cisco, VMware, and others.

Kubernetes is the foundation for the new generation of artificial intelligence (AI), machine learning (ML), data management, and distributed storage in cloud-native environments. In this regard, the most noteworthy development over the past several months has been the recrystallization of the data ecosystem around Kubernetes. The most significant Kubernetes-related vendor announcements were the following:

  • Announcement of the Open Hybrid Architecture Initiative by Hortonworks, IBM, and Red Hat, an effort to modularize and containerize Hadoop in its entirety, orchestrate Hadoop-based DevOps pipelines and workloads over Kubernetes, and evolve those vendors’ respective solution portfolios toward full implementation of the emerging framework for hybrid, edge, and streaming deployments;
  • Launch of the new Apache Hadoop Ozone subproject, which is developing a scalable distributed object store designed for containerized environments such as Kubernetes, in which storage and compute have been decoupled;
  • NVIDIA’s release of its TensorRT inference server, a containerized software solution that uses Kubernetes to orchestrate deployment of TensorRT, TensorFlow, or ONNX models to heterogeneous clusters of GPUs and CPUs in on-premises data centers and clouds (a sketch of GPU-aware scheduling on Kubernetes follows this list);
  • Dataiku’s release of version 5 of its ML pipeline automation tool, which, among other new features, adds full containerization of in-memory Python and R processing, with automatic deployment to Kubernetes clusters for elastic computation, greater scalability, and easier isolation of resources;
  • Announcement of IBM’s deal with MayaData to bring OpenEBS, which provides hyperconverged block storage for stateful applications on Kubernetes, to its Cloud Private environment; and
  • Announcement by Lightbend of a Kubernetes-optimized version 2.0 of its Fast Data Platform for designing, building, and deploying Reactive streaming microservices on Kafka, Spark, Akka, HDFS, and other environments.
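To make the orchestration angle concrete, here is a hedged sketch, not NVIDIA’s actual deployment spec, of how an inference container gets scheduled onto GPU nodes with the official Kubernetes Python client. The image and names are placeholders, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed on the cluster’s GPU nodes.

```python
# A hypothetical sketch of scheduling an inference container onto GPU
# nodes, using the official Kubernetes Python client. The image and names
# are placeholders; the "nvidia.com/gpu" resource is exposed by the
# NVIDIA device plugin on GPU-equipped nodes.
from kubernetes import client, config

config.load_kube_config()

container = client.V1Container(
    name="inference-server",
    image="example.com/inference-server:1.0",  # placeholder image
    ports=[client.V1ContainerPort(container_port=8000)],
    # Request one GPU; the scheduler places the pod on a node that has one.
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="inference-server"),
    spec=client.V1PodSpec(containers=[container]),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Because the resource request is part of the pod spec, the same container image can land on CPU-only or GPU-equipped nodes as workloads demand, which is exactly the heterogeneous-cluster story these announcements describe.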

These and other innovations will further accelerate the decoupling, containerization, and orchestrated deployment of every component of the AI ecosystem all the way to the edge. This trend has been building for some time. As I discussed in this Wikibon research note from last year, AI developers are increasingly building these capabilities as functional primitives to be containerized, orchestrated, and managed as microservices.

Before long, the AI functional primitives exposed as microservices will include both the coarse-grained capabilities of entire AI models (e.g., classification, clustering, recognition, prediction, natural language processing) and the fine-grained capabilities (convolution, recurrence, pooling, etc.) of which those models are composed. In the emerging world of platform-agnostic Kubernetes-based AI pipelines, these functional-primitive microservices will have the following core capabilities:

  • Orchestrate complex patterns within a distributed AI control plane;
  • Expose independent, programmable RESTful APIs, so that they can be easily reused, evolved, or replaced without compromising interoperability (a minimal sketch of such a primitive follows this list);
  • Support development in different programming languages, algorithm libraries, cloud databases, and other enabling back-end infrastructure;
  • Rely on a back-end middleware fabric for reliable messaging, transactional rollback, and long-running orchestration capabilities;
  • Expose stateless, event-driven, and serverless interfaces that execute transparently on back-end cloud infrastructures, without developers needing to know where or how the underlying IT resources are provisioned; and
  • Support accelerated AI development through an abstraction layer that compiles declarative program specifications down to AI model assets at every level of granularity.
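Here is a minimal sketch of what one such functional primitive might look like: a classification model behind a stateless RESTful endpoint, written with Flask. The model, route, and payload shape are hypothetical placeholders, not a standard from any of the projects above.

```python
# A minimal sketch of an AI functional primitive exposed as a RESTful
# microservice, using Flask (pip install flask scikit-learn). The model,
# endpoint, and feature layout are hypothetical placeholders.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Stand-in for a trained model artifact baked into the container image.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/v1/classify", methods=["POST"])
def classify():
    # Expects JSON like {"features": [5.1, 3.5, 1.4, 0.2]}.
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"class": int(prediction)})

if __name__ == "__main__":
    # Stateless: each request is independent, so Kubernetes can scale
    # replicas of this container horizontally behind a service.
    app.run(host="0.0.0.0", port=8080)
```

Because the endpoint holds no session state, the orchestrator can add or remove replicas freely, which is precisely the loose coupling and elasticity the list above calls for.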

We are sure to see growing industry adoption of the AI pipeline frameworks defined under the Kubeflow and Seldon projects. They are designed to simplify and scale the framework-agnostic modeling, training, serving, and management of containerized AI models across Kubernetes/Istio multiclouds and edge environments. They support continuous integration and deployment of loosely coupled AI applications into production cloud-native environments, with full lifecycle updating, scaling, monitoring, and security over the deployed microservices.
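For a flavor of how such a pipeline is driven programmatically, the sketch below submits a Kubeflow TFJob, the project’s custom resource for distributed TensorFlow training, through the official Kubernetes Python client. The API group version varies across Kubeflow releases, and the job name and image are hypothetical.

```python
# A hedged sketch of submitting a Kubeflow TFJob (a Kubernetes custom
# resource for distributed TensorFlow training) via the official Python
# client. The apiVersion shown here is release-dependent, and the job
# name and image are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()

tfjob = {
    "apiVersion": "kubeflow.org/v1beta1",  # check your Kubeflow release
    "kind": "TFJob",
    "metadata": {"name": "train-classifier"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,  # two distributed training workers
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",
                            "image": "example.com/train-classifier:1.0",
                        }]
                    }
                },
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org",
    version="v1beta1",   # must match apiVersion above
    namespace="default",
    plural="tfjobs",
    body=tfjob,
)
```

The training job is just another Kubernetes object, so the same tooling that schedules, monitors, and scales microservices handles the model pipeline too.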

Before long, AI functionality will be so easy to decouple that you’ll be able to embed whole models, or even the tiniest pieces of them, on the edge in mobile devices, Internet of Things endpoints, and so on. Across the Kubernetes-based cloud ecosystem, AI-driven intelligence will be thoroughly embedded in every edge, hub, and cloud service. AI functionality will soon be decoupled so thoroughly and disseminated so broadly that it will seem to disappear.

About the author: James Kobielus is SiliconANGLE Wikibon’s lead analyst for Data Science, Deep Learning, and Application Development.

Related Items:

Cloud Looms Large at Strata, and So Does Kubernetes

One Containerized View of Data Science’s Future

Developers Will Adopt Sophisticated AI Model Training Tools in 2018
