Spring Strata 2018
There has been much debate over the future of Hadoop in recent months. Should it work more like a cloud object store? Should it support GPUs and FPGAs, Docker or Kubernetes (or both)? Should compute and storage be separated in Hadoop? Is it even necessary anymore? The folks at Splice Machine have their own take: If you make Hadoop look more like a relational database, then people will do more with it.
Feature Articles from Spring Strata 2018
If you’ve worked with the data science community, you’ve probably interacted with data scientists and formed a definition for the increasingly popular position. But it turns out, not all data scientists are alike, and according to a recent analysis by researchers at UCLA and Microsoft, there are actually nine different types of data scientists.
It’s tough to find a big data project that’s had as much impact as Apache Spark over the past five years. The folks at Databricks, who contribute heavily to Spark (along with the wider Spark community), are keeping the project on the cutting edge with version 2.3.
Apache Spark 2.3, which the Apache Spark project unveiled on February 28, also forms the underpinning for version 4.0 of the Databricks Runtime, which Databricks unveiled last week during the Strata Data Conference.
In 2008, the National Academy of Engineering presented 14 Grand Challenges that, if solved, had the potential to radically improve the world. Thanks to recent breakthroughs in artificial intelligence, specifically the advent of deep neural networks, we’re on pace to solve some of them, Google Senior Fellow Jeff Dean said last week at the Strata Data Conference.
The rapid pace of technological innovation is giving organizations amazing new capabilities in the field of data science. These advances are lowering the barrier to entry and supercharging data science capabilities for organizations around the world. With the playing field leveled somewhat, we look to the Strata Data Conference this week for clues on what will separate the data science winners from the losers.
A startup named data.world is embarking upon a grand experiment to build a collaborative data platform that links together data, people, and their analytic tools. By eradicating data silos and building a social community around data, the firm is betting that it can grease the wheels on insight discovery and unleash a network effect on data.
Streamlio, a startup that created a real-time streaming analytics platform on top of Apache Pulsar and Apache Heron, today published the results of a stream processing benchmark that it claims shows up to a 150% performance improvement for Pulsar over Apache Kafka. The company also unveiled a new processing framework called Pulsar Functions.
In the battle for stream processing supremacy, there’s one platform that has developed an early advantage over all the others.
Thousands of data-obsessed technologists will descend on Silicon Valley this week to take part in the Strata Data Conference’s annual West Coast swing. Datanami caught up with O’Reilly Media’s Chief Data Scientist Ben Lorica, who’s also Strata’s program chair, to get the lowdown on the show’s high-tech expectations.
News in Brief from Spring Strata 2018
Syncsort’s ship got bigger last year when it acquired Vision Solutions, a provider of data availability and security tools. Now the New York software company is bringing that cargo to bear for its next voyage: helping customers cope with the data management ramifications of four converging “megatrends.”
It can be difficult to find the right balance between protecting data and utilizing it. With its new Data Protector offering unveiled yesterday at Strata Data Conference San Jose, StreamSets thinks it has found a happy medium, at least for data in motion.
The latest version of Confluent’s Kafka-based platform incorporates an open source streaming engine for Apache Kafka that lets developers build real-time streaming applications using SQL.
Confluent, the company behind open source Kafka and developer of the Confluent Platform, announced the general availability of its KSQL streaming engine on Wednesday (March 7).
MapR today announced that customers can now deploy and run applications on MapR’s big data cluster using Kubernetes container orchestration. In addition to providing data statefulness for those containerized applications, the integration gives MapR customers a new way to move workloads from on-premises systems to cloud platforms.
As the pace of machine learning model development accelerates, vendors are beginning to offer orchestration tools designed to help data scientists manage the testing, retraining and redeployment of predictive analytics models with short shelf lives. The latest entrant is Hitachi Vantara Labs, which unveiled a model manager this week designed to speed the deployment of “supervised” models in production.
Cloudera today unveiled a host of new cloud-based offerings, including Cloudera Altus Shared Data Experience (SDX), a cloud-based machine learning offering, and a cloud-based SQL data warehouse offering. The new offerings bring the company one step closer to its vision of the secure yet flexible cloud-based data processing capabilities that its clients demand.
This Just In from Spring Strata 2018
- MemSQL Establishes a New Baseline for Database Speed – (3/08/2018)
- Trifacta Available for Deployment Through Microsoft Azure – (3/06/2018)
- Hitachi Vantara Labs Introduces Machine Learning Model Management – (3/06/2018)
- OneClick.ai to Launch Flagship AI Platform – (3/06/2018)
- StreamSets Debuts First Solution to Discover, Secure and Govern Personal Data in Motion – (3/06/2018)
- Neo4j to Speak at Strata Data Conference San Jose 2018 – (3/06/2018)
- MapR Extends Data Fabric for Kubernetes – (3/06/2018)