Tag: Spark

Esri Melds GIS with AI, Graph, and Analytics

Esri has long been the industry leader in geographic information systems (GIS), which are used by urban planners, building engineers, and landscape designers around the world. At its UC 2022 conference this week, the com Read more…

NetApp Spots a Data Platform Opportunity in the Cloud

The market for spot instances in the cloud is, well, spotty. Some days you can count on a 90% discount by buying excess capacity from the public clouds, and other times you can't. NetApp has turned that cloud unpredictab Read more…

EMR Serverless Now Available from AWS

Amazon EMR, which ostensibly is the world’s most popular hosted Hadoop environment, is now generally available as a serverless offering, AWS announced today. Amazon EMR Serverless will save customers time and money Read more…

In Search of the Data Dream Team

When it comes to succeeding at big data, the people you put in place are just as important as, if not more important than, the products and technologies you use. One of the folks exploring the intersection of people and dat Read more…

Kubernetes Adoption Widespread for Big Data, But Monitoring and Tuning Are Issues, Survey Finds

Kubernetes may be a complex piece of software that can be difficult to monitor and manage. But the benefits of running applications in the popular container orchestration system appear to outweigh the disadvantages, beca Read more…

Alluxio Nabs $50M, Preps for Growth in Data Orchestration

Data orchestration software provider Alluxio today announced the close of an oversubscribed $50-million Series C round, which its CEO plans to spend on a global expansion. It also launched version 2.7 of its software, wh Read more…

Aerospike Turbocharges Spark ML Training with Pushdown Processing

Companies that need to access a lot of data in a hurry, such as when retraining a machine learning model in Spark, have traditionally had to move that data from the edge to a central repository, such as a cloud data lake. But Read more…

Informatica Accelerates DataOps with Spark, GPUs

Informatica today announced that customers can see up to a 5x performance boost for ETL and data management workloads when they run them under its new cloud-based data integration engine that’s powered by Apache Spark Read more…

Prophecy Spins Up Low-Code Data Pipeline Tool

In recent years, the shortage of data engineers has at times exceeded the shortage of data scientists. To help close the gap, a Silicon Valley startup called Prophecy today unveiled a low-code data engineering tool that Read more…

LinkedIn’s Translation Engine Linked to Presto

An SQL translation engine unveiled this week by LinkedIn is integrated with other open-source SQL query engines like Presto in a combination aimed at bulging data lakes. The Microsoft unit’s Coral engine handles ana Read more…

Data Exchange Maker Harbr Closes Series A

Harbr, a London startup that helps organizations like Moody’s Analytics to create their own custom data exchanges, yesterday announced that it has completed a Series A round of financing, netting $38.5 million for the Read more…

The Past and Future of In-Memory Computing

When Nikita Ivanov co-founded GridGain Systems back in 2005, he envisioned in-memory computing going mainstream and becoming a massive category unto itself within a few years. That obviously didn’t pan out, but on the Read more…

Aerospike Gives Legacy Infrastructure a Real-Time Boost

A database connector upgrade released this week by Aerospike Inc. links open source frameworks like Apache Spark data streaming to existing enterprise data infrastructure. Among the goals is providing backward compati Read more…

Microsoft Now Developing Its Own Hadoop

Hadoop might be dead, but that’s not stopping public cloud providers from using it. The latest to make a move is Microsoft Azure, which in July announced that it would begin developing its own distribution under its HD Read more…

To Centralize or Not to Centralize Your Data–That Is the Question

Should you strive to centralize your data, or leave it scattered about? It seems like it should be a simple question, but it’s actually a tough one to answer, particularly because it has so many ramifications for how d Read more…

Intel Updates Optane, Expands NAND SSD Offerings

Intel Corp. remains persistent in upgrading its Optane persistent memory series. The chip maker (NASDAQ: INTC) said this week its second generation Optane series is tuned to the latest version of its Xeon Scalable proce Read more…

Staying On Top of ML Model and Data Drift

A lot of things can go wrong when developing machine learning models. You can use poor quality data, mistake correlation for causation, or overfit your model to the training data, just to name a few. But there are also a Read more…

Will Databricks Build the First Enterprise AI Platform?

Ali Ghodsi might have one of the best jobs in technology right now. As the CEO of Databricks, Ghodsi just completed an oversubscribed $400 million round of funding that gave the company a $6.2 billion valuation. Better s Read more…

Simplifying the Big Data Lake Experiences in the Cloud

The cloud is a hot spot for big data lakes these days, thanks largely to the greater technological simplicity and lower upfront costs of getting started in the public cloud. But as organizations grow their cloud data lak Read more…

Presto Moves Under Linux Umbrella

An SQL query engine developed by Facebook and moved earlier this year to a non-profit development group is now being hosted by the Linux Foundation. The new Presto Foundation is seen as a way to scale the popular dist Read more…

Datanami