Spring Strata 2018

Making Hadoop Relatable Again

There has been much debate over the future of Hadoop in recent months. Should it work more like a cloud object store? Should it support GPUs and FPGAs, Docker or Kubernetes (or both)? Should compute and storage be separated in Hadoop? Is it even necessary anymore? The folks at Splice Machine have their own take: If you make Hadoop look more like a relational database, then people will do more with it. Read more…

Feature Articles from Spring Strata 2018

What Kind of Data Scientist Are You?

(3/15/2018)

If you’ve worked with the data science community, you’ve probably interacted with data scientists and formed a definition for the increasingly popular position. But it turns out, not all data scientists are alike, and according to a recent analysis by researchers at UCLA and Microsoft, there are actually nine different types of data scientists. Read more…

Top 3 New Features in Apache Spark 2.3

(3/14/2018)

It’s tough to find a big data project that’s had as much impact as Apache Spark over the past five years. The folks at Databricks, who contribute heavily to Spark (along with the wider Spark community), are keeping the project on the cutting edge with version 2.3.

Apache Spark 2.3, which was unveiled by the Apache Spark project on February 28, also forms the underpinning for version 4.0 of the Databricks Runtime, which Databricks unveiled last week during the Strata Data Conference. Read more…

Jeff Dean Thinks AI Can Solve Grand Challenges–Here’s How

(3/13/2018)

In 2008, the National Academy of Engineering presented 14 Grand Challenges that, if solved, had the potential to radically improve the world. Thanks to recent breakthroughs in artificial intelligence, specifically the advent of deep neural networks, we’re on pace to solve some of them, Google Senior Fellow Jeff Dean said last week at the Strata Data Conference. Read more…

Strata Speakers Drop Clues on Winning with Data Science

(3/08/2018)

The rapid pace of technological innovation is giving organizations amazing new capabilities in the field of data science. These advances are lowering the barrier to entry and super-charging data science capabilities for organizations around the world. With the playing field leveled somewhat, we look to the Strata Data Conference this week for clues on what will separate the data science winners from losers. Read more…

Blowing Up Silos In a Big Data World

(3/06/2018)

A startup named data.world is embarking upon a grand experiment to build a collaborative data platform that links together data, people, and their analytic tools. By eradicating data silos and building a social community around data, the firm is betting that it can grease the wheels on insight discovery and unleash a network effect on data. Read more…

Streamlio Claims Pulsar Performance Advantages Over Kafka

(3/06/2018)

Streamlio, a startup that created a real-time streaming analytics platform on top of Apache Pulsar and Apache Heron, today published the results of a stream processing benchmark that claims Pulsar has up to a 150% performance improvement over Apache Kafka. The company also unveiled a new processing framework called Pulsar Functions.

In the battle for stream processing supremacy, there’s one platform that has developed an early advantage over all the others: Read more…

Ben Lorica on What to Expect at Strata Data Conference

(3/05/2018)

Thousands of data-obsessed technologists will descend on Silicon Valley this week to take part in the Strata Data Conference’s annual West Coast swing. Datanami caught up with O’Reilly Media’s Chief Data Scientist Ben Lorica, who’s also Strata’s program chair, to get the lowdown on the show’s high-tech expectations. Read more…

News in Brief from Spring Strata 2018

Syncsort Tacks to Address 4 ‘Megatrends’

(3/20/2018)

Syncsort’s ship got bigger last year when it acquired Vision Solutions, a provider of data availability and security tools. Now the New York software company is bringing that cargo to bear for its next voyage: helping customers cope with the data management ramifications of four converging “megatrends.”

Syncsort CTO Tendü Yoğurtçu shared some of her company’s plans with Datanami at the recent Strata Data Conference in San Jose, California. Read more…

StreamSets Balances Streaming Data Demands for Security, Access

(3/07/2018)

It can be difficult to find the right balance between protecting data and utilizing it. With its new Data Protector offering unveiled yesterday at Strata Data Conference San Jose, StreamSets thinks it has found a happy medium, at least for data in motion.

StreamSets develops tools to help organizations create and manage data pipelines, and make that data available for processing, analysis, and other uses. Read more…

Confluent Adds KSQL Support to Kafka Platform

(3/07/2018)

The latest version of Confluent’s Kafka-based platform incorporates an open source streaming engine for Apache Kafka designed to allow developers using SQL to build real-time, streaming applications.

Confluent, the company behind open source Kafka and developer of the Confluent Platform, announced the general availability of its KSQL streaming engine on Wednesday (March 7). Read more…

Inside MapR’s Support for Kubernetes

(3/06/2018)

MapR today announced that customers can now run and deploy applications on MapR’s big data cluster utilizing the Kubernetes containerization technology. In addition to providing data statefulness, it also gives MapR customers a new way to move workloads from on-prem to cloud platforms.

The delivery of a native Kubernetes (K8s) volume driver seems like a relatively simple thing, acknowledges Jack Norris, MapR’s Senior Vice President of Data and Applications. Read more…

Orchestrator Emerges to Speed ML Models to Production

(3/06/2018)

As the pace of machine learning model development accelerates, vendors are beginning to offer orchestration tools designed to help data scientists manage the testing, retraining and redeployment of predictive analytics models with short shelf lives. The latest entrant is Hitachi Vantara Labs, which unveiled a model manager this week designed to speed the deployment of “supervised” models in production. Read more…

Cloudera’s Vision for Cloud Coming Into Focus

(3/06/2018)

Cloudera today unveiled a host of new cloud-based offerings — including Cloudera Altus Shared Data Experience (SDX), a cloud-based machine learning offering, and a cloud-based SQL data warehouse offering — that get it one step closer to meeting its vision for the type of secure yet flexible, cloud-based data processing capabilities that its clients demand. Read more…
