
Tag: storm

Yahoo’s Vespa Takes a Whack at CORD-19 Data

Verizon Media (formerly Yahoo) is giving its new Vespa search engine a chance to show what it can do against CORD-19, the collection of scholarly articles about COVID-19. The company is inviting the public to try using V Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks. While Hadoop is no longer the conference headliner that it once Read more…

Hortonworks Shifts Focus to Streaming Analytics

Hortonworks started life providing a Hadoop distribution that allowed customers to process big data at rest. But these days, the company has shifted much of its attention and resources to streaming analytics, or proc Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. But appare Read more…

Learning from Your Data: Essential Considerations

For any organization undergoing digital transformation, a primary consideration is how to find, capture, manage and analyze big data. They are looking to big data and data science to facilitate the discovery of analytics Read more…

Concord Claims 10x Performance Edge on Spark Streaming

Organizations that are looking for a stream processing engine upon which to build fast data applications featuring high throughput and low latency may want to check out Concord, a new framework that emerged from the ad-t Read more…

Wanted: A Plug-In Architecture for Hadoop Development

Hadoop is hard. There's just no way around that. Setting up and running a cluster is hard, and so is developing applications that make sense of, and create value from, big data. What Hadoop really needs now, says former Read more…

Deep Dive Into Oracle’s Emerging Big Data Stack

Oracle has a lot of turf to protect in the multi-billion-dollar relational database market, where it holds a dominant share. That creates a natural tension when it comes to big data technologies like Hadoop Read more…

Big Data So Easy a Caveman Could Do It?

Let's face it: big data isn't easy. If you're building a big data application today, you're up to your eyeballs in things like R and Java, MapReduce and Pig, and Storm and Kafka. There's a reason data scientists are so h Read more…

Hortonworks Goes Broad and Deep with HDP 2.2

From full support for Apache Spark, Apache Kafka, and the Cascading framework to updated management consoles and SQL enhancements in Hive, there's something for everybody in Hortonworks' latest Hadoop distribution, which Read more…

Cascading Now Supports Tez–Spark and Storm Up Next

Concurrent, the company behind the open source Cascading framework, today unveiled a major update that will allow its customers to migrate their Hadoop applications from using MapReduce to use the new Apache Tez engine, Read more…

Crossing the Big Data Stream with DataTorrent

Enterprises eager for a competitive edge are turning to in-memory stream processing technologies to help them analyze big data in real time. The Apache Spark and Storm projects have gained lots of momentum in this area, as have some analytic NoSQL databases and in-memory data grids. Another streaming technology worth keeping an eye on is DataTorrent. Read more…

How Fast Data is Driving Analytics on the IoT Superhighway

The promise of big data is morphing into the fast data opportunity. Unless you have the capability to respond to the Internet of Things and the trillions of data points generated by smartphones, sensors, and social media, the business opportunities of fast data can pass you by. Read more…

Hortonworks Drives Stinger Home with HDP 2.1

Hortonworks today unveiled a major new release of its Hadoop distribution that puts significant new capabilities into the hands of its customers. The speed and scale of SQL processing in Apache Hive were improved with the final phase of the Stinger initiative, while the additions of Apache Storm and Apache Solr in HDP 2.1 open up new ways for customers to manipulate their data. Security and data governance were bolstered with Apache Knox and Apache Falcon, respectively, while Apache Spark is now available as a tech preview. Read more…

Shining a Light on Hadoop’s ‘Black Box’ Runtime

Let's face it: Writing MapReduce processes is not very fun. That's the main reason that the Cascading framework is gaining such a big following--because it abstracts away the difficult part of MapReduce with an easy-to-use Java API and library. With today's launch of a new product called Driven, the company behind Cascading is enabling users to instrument the data analytic apps developed with Cascading, in pursuit of faster troubleshooting and higher performance. Read more…

Zooming Through Historical Data with Streaming Micro Queries

Stream processing engines, such as Storm and S4, are commonly used to analyze real-time data as it flows into an organization. But did you know you can use this technology to analyze historical data too? A company called ZoomData recently showed how. Read more…

Yahoo Unveils SAMOA to Mine Multiple Data Streams

Yahoo last month unveiled a new stream processing framework called Scalable Advanced Massive Online Analysis (SAMOA) that it says will simplify the process of developing and executing machine learning algorithms against multiple data streams. The open source software works with individual stream processing engines, such as Storm and S4, and is available for download now. Read more…

HDP 2.0: Rise of the Hadoop Data Lake

Hortonworks became the first Hadoop distributor to ship the new Hadoop version 2 software today when it announced the general availability of Hortonworks Data Platform (HDP) 2.0. The update will enable customers with small Hadoop clusters to upgrade their big data platform into a shared Hadoop service, or a data lake, a Hortonworks executive explains. Read more…

Hadoop Version 2: One Step Closer to the Big Data Goal

The wait for Hadoop 2.0 ended yesterday when the Apache Software Foundation (ASF) announced the availability of the new big data platform. Among the most anticipated features is the new YARN scheduler, which will make it easier for users to run different workloads--such as MapReduce, HBase, SQL, graph analysis, and stream processing--on the same hardware, and all at the same time. Better availability and scalability, and a smoother upgrade process, round out the new platform, as Hadoop creator Doug Cutting explains, but still not everybody is happy with Hadoop. Read more…

Apache Takes Storm Into Incubation

On Wednesday night, Doug Cutting, Director for the Apache Software Foundation (ASF), announced that the organization will be adding the distributed real-time computation system known as Storm as the foundation's newest Incubator podling. Read more…
