Tag: pig

LinkedIn’s Translation Engine Linked to Presto

An SQL translation engine unveiled this week by LinkedIn is integrated with other open-source SQL query engines like Presto in a combination aimed at bulging data lakes. The Microsoft unit’s Coral engine handles ana Read more…

Google Releases Cloud Processor For Hadoop, Spark

Google took the wraps off of its managed Apache Hadoop and Spark service this week, saying its cloud data processing platform is intended to reduce the cost and ease management of processing big datasets. Cloud Datapr Read more…

Top 10 Netflix Tips on Going Cloud-Native with Hadoop

Four years ago Netflix made the decision to move all of its data processing--everything from NoSQL and Hadoop to HR and billing--into the cloud. While going "cloud native" on Amazon Web Services hasn't been without its challenges, the move has benefited Netflix in multiple and substantial ways. Here are 10 tips from Netflix on making the cloud work. Read more…

Has Dirty Data Met Its Match?

One of the dirty little secrets about big data is the amount of manual effort it takes to clean the data before it can be analyzed. You may have the best and brightest data scientists on your team, but unless you liberate them from the drudgeries of digital janitorial work, you aren't getting their best work. Today, the data cleansing startup Trifacta launched its first product aimed at relieving data professionals of the burden posed by traditional data cleansing processes. Read more…

The Future of Hadoop Runs on Tez, Hortonworks Says

The Hadoop community has spent much energy over the past two years trying to make Hadoop faster, simpler to program, and easier to extend to other systems. While the introduction of YARN in Hadoop version 2 helped to unhook the framework from its MapReduce roots, the folks at Hortonworks say the next step of the Hadoop journey will ride atop the Apache Tez engine. Read more…

Datanami Dishes on ‘Big Data’ Predictions for 2014

This space was going to feature a "Top 10 Big Data Predictions for 2014" story. But considering the large number of such stories currently in circulation, a different tack was in order. Instead, you'll find a selection of pertinent predictions from players in the "big data" software industry, followed by Datanami's opinion as to whether each will be spot on or whether the soothsaying will miss the mark. Read more…

Intel Goes Graph with Hadoop Distro

Intel will be targeting big retail operations with a new graph database that it unveiled today as part of its Intel Distribution for Apache Hadoop version 3 announcement. The graph engine will enable customers to make product or customer recommendations in real time, a la Netflix or Amazon, based on existing data. The chip giant also fleshed out its Hadoop distro with a 20x speedup in encryption functions, a data tokenization option, and a handful of new machine learning algorithms aimed at solving common problems. Read more…

Syncsort Siphons Up Legacy Workloads for Amazon EMR

Syncsort is bringing its flavor of super-charged MapReduce code generation capabilities to Amazon's Elastic MapReduce cloud, the companies announced today. The IronCluster ETL as-a-service offering will allow Amazon EMR customers to generate faster MapReduce jobs from a GUI, which the companies say will make it easier to migrate expensive data warehouse workloads from Teradata or the IBM mainframe into Amazon's incredibly inexpensive cloud. Read more…

OLTP Clearly in Hadoop’s Future, Cutting Says

Think Hadoop is just for analytics? Think again, says Hadoop creator Doug Cutting, who last week predicted that, in the future, organizations will run all sorts of workloads on their Hadoop clusters, even online transaction processing (OLTP) workloads, the last bastion of the relational legacy. Read more…

Hortonworks Reaches Out to SAS and Storm

Hortonworks this week revealed a new partnership with SAS that will enable the analytics giant to use its tools to analyze data stored in Hortonworks' Hadoop distribution. It also announced plans to integrate the Apache Storm stream processing engine into its distribution, and to ship a preview by the end of the year. Read more…

Spelunking Shops and Supercomputers

While this might come as a surprise to those outside the bubble, the majority of data that is collected within an enterprise setting is machine-generated. That covers everything from operational data (messaging, web services, networking and other system data) to customer-facing systems and beyond. This week we talk with Splunk about its role in IT shops to supercomputers and... Read more…
