Tag: batch

Real-Time Data Streaming, Kafka and Analytics, Part 2: Going Beyond Pure Streaming

Feb 25, 2020

Data transaction streaming is managed through many platforms, with one of the most common being Apache Kafka. In our first article in this data streaming series, we delved into the definition of data transaction and streaming and why it is critical to manage information in real time for the most accurate analytics. Read more…

Real-Time Data Streaming, Kafka, and Analytics Part One: Data Streaming 101

Feb 10, 2020

The terms “real-time data” and “streaming data” are the latest catchphrases being bandied about by almost every data vendor and company. Everyone wants the world to know that they have access to and are using the latest, greatest data for making business decisions. Read more…

First Look at Scio, a Scala API for Apache Beam

Nov 15, 2017

Apache Beam has emerged as a powerful new framework for building and running batch and streaming applications in a unified manner. In its first iteration, it offered APIs for Java and Python. Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Jun 16, 2017

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks.

While Hadoop is no longer the conference headliner that it once was, the platform is still critical for the daily operations of Yahoo, which officially became part of Verizon Communications this week when the $4.5 billion acquisition finally closed. Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

May 19, 2017

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. Read more…

Flink: Worth a Second Look

Sep 16, 2016

The big data ecosphere has evolved to the point where there are clear technology leaders. In the category of SQL engines that run on Hadoop, Hive and Spark are clearly the dominant products among open source developers. Read more…
