September 27, 2017

Five Companies to Watch at Strata Data Conference

Just like the Big Apple, the big data industry doesn’t sleep, and innovations are constantly occurring. Here are five companies exhibiting at the Strata Data Conference this week that are either new or doing something new and innovative with big data.


Lightbend, formerly known as Typesafe, originally created a development platform for building microservices and distributed Web apps, but so many people asked it about streaming analytics apps that it decided to build the Fast Data Platform.

Led by VP of Fast Data Engineering Dean Wampler, the San Francisco company analyzed the major streaming engines and settled on four to support within its Fast Data Platform: Spark Streaming, Kafka Streams, Akka Streams, and Flink. Its platform provides a framework for developing streaming analytics apps using these engines, as well as for pulling in data from surrounding repositories like HDFS.

Developers don’t need to know Scala, but they do need to know how to work with JVMs. The platform itself runs on DC/OS containers, and it features management and monitoring by way of OpsClarity, which the company bought. The company has received $52 million in funding since being founded in 2011, and officially launched its product this week at the Strata Data Conference.


Immuta is a Maryland-based data virtualization startup that’s building software that provides a single point of control for accessing data from a variety of downstream systems. Instead of giving users direct access to HDFS or MongoDB or S3, Immuta’s software brokers the connection on behalf of the user.

The product also serves as a data catalog to facilitate better data sharing in a multi-source environment, and it supports governance features to enable better data tracking.

The system also enforces security controls to prevent unauthorized users from viewing sensitive data. Instead of configuring access controls on each data source, Immuta provides a centralized point to define who can access what data, thereby simplifying the data security landscape.


Skymind is a big data startup out of San Francisco that provides a deep learning library. The company was founded by deep learning expert Adam Gibson, the creator of the Deep Learning 4 Java (DL4J) library, and has a simple goal: “to make deep learning simple and accessible to enterprises.”

In addition to developing the open source DL4J library, Skymind develops the Skymind Intelligence Layer (SKIL) and the SKIL Model Server, which are designed to make it easier for users to run and take advantage of the DL4J library in Hadoop and Spark environments.

The company has received $3 million in seed funding and has more than a dozen customers in a variety of industries who are trying to apply deep learning for anomaly detection, computer vision, natural language processing, recommender systems, machine transcription, face and voice recognition, and time series predictions.


Bigstream is a big data startup based in Mountain View, California, that’s looking to make it easier for users to take advantage of hardware accelerators like GPUs, FPGAs, and ASICs without coding their applications specifically for these hardware types.

The company says its patented software, called Bigstream Hyper-acceleration Layer (HaL), will eventually allow customers to get the maximum performance out of applications built using big data frameworks like Hadoop, Spark, Hive, Storm, and TensorFlow. Currently it only supports Spark.

The big benefit of using HaL is that customers don’t need to utilize compiler technology like OpenCL or CUDA to take advantage of the 30x or more performance boosts that GPUs and FPGAs can provide compared to running on CPUs. Its optimization software can also provide up to a 3x boost running on plain CPUs, the company says.


RapidsData is a Chinese company that’s building an integrated big data software stack that includes relational, analytics, and streaming data technology. The company’s RapidsData Platform (RDP) offering consists of several integrated components built upon a Hadoop/YARN base, dubbed Rapids Hadoop.

Sitting atop this base are RapidsDB, a SQL-compliant relational database; Rapids StreamDB, a stream processing system with millisecond latency; Rapids ParallelR, a distributed R computational engine; and Rapids Manager, a console for managing the cluster and developing queries. Layered over the platform is Rapids Federation, which provides external data connectors.

The company, which is a subsidiary of the Boray Data Technology Co. Ltd. of Beijing, China, has a number of customers in China and is now looking to break into the North American market. It lists offices in California and Washington and is targeting the telecommunications and financial services industries, among others.

Related Items:

Former Yahoo Unit Releases Vespa Engine

Cloudera Eyes Uniform Data Experience for All

Strata Data Conference Kicks Off in New York