
Tag: batch
Data transaction streaming is managed through many platforms, one of the most common being Apache Kafka. In the first article in this data streaming series, we covered the definitions of data transactions and streaming, and why managing information in real time is critical for accurate analytics. Read more…
The terms “real-time data” and “streaming data” are the latest catchphrases being bandied about by almost every data vendor and company. Everyone wants the world to know that they have access to, and are using, the latest and greatest data for making business decisions. Read more…
Apache Beam has emerged as a powerful new framework for building and running batch and streaming applications in a unified manner. In its first iteration, it offered APIs for Java and Python. Read more…
Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks.
While Hadoop is no longer the conference headliner that it once was, the platform is still critical for the daily operations of Yahoo, which officially became part of Verizon Communications this week when the $4.5 billion acquisition finally closed. Read more…
Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. Read more…
The big data ecosphere has evolved to the point where there are clear technology leaders. In the category of SQL engines that run on Hadoop, Hive and Spark are clearly the dominant products among open source developers. Read more…