Tag: ETL

Taking the Pain Out of Buying and Selling Data

Jul 16, 2020

We’re well into big data’s second decade, and we’ve made a ton of progress on many fronts. We have cloud-based systems with infinite storage capacity, sophisticated machine learning software that improves by the month, and powerful clusters turbo-charged with GPUs. Read more…

To Centralize or Not to Centralize Your Data–That Is the Question

Jul 14, 2020

Should you strive to centralize your data, or leave it scattered about? It seems like it should be a simple question, but it’s actually a tough one to answer because it has so many ramifications for how data systems are architected, especially with the rise of cloud data lakes. Read more…

Reproducibility in Data Analytics Under Fire in Stanford Report

May 27, 2020

Armed with the same data and told to test the same hypotheses, dozens of independent researchers instead came to widely different conclusions using a variety of analytics techniques, according to a new report from Stanford University that pushes the reproducibility crisis in science into a new realm. Read more…

Spark 3.0 to Get Native GPU Acceleration

May 14, 2020

NVIDIA today announced that it’s working with Apache Spark’s open source community to bring native GPU acceleration to the next version of the big data processing framework. With Spark version 3.0, which is due out next month, organizations will be able to speed up all of their Spark workloads, from ETL jobs to machine learning training, without making wholesale changes to their code. Read more…

Top 12 Datanami Stories of 2019

Jan 3, 2020

2019 was an eventful year in the big data space, with enough intersecting story lines to keep a big data watcher enmeshed for hours, if not days, on end. Read more…

Beyond BI: Looker Seeks Bigger Role for Data

Nov 7, 2019

Looker is best known as a business intelligence platform, which it definitely is. But with today’s release of Looker 7, the company is making a strong case that it’s much more than that. Read more…

How ML Helps Solve the Big Data Transform/Mastering Problem

Oct 10, 2019

Despite the astounding technological progress in big data analytics, we have largely yet to move past manual techniques for important tasks such as data transformation and master data management. As data volumes grow, the productivity gap created by manual methods grows wider, putting the dreams of AI- and machine learning-powered automation further out of reach. Read more…

Dremio Noses Into Cloud Lakes with Analytics Speedup

Sep 17, 2019

Most of today’s big data action is occurring in the cloud, where companies are building massive data lakes atop object storage systems like AWS S3 and Microsoft ADLS. While object stores offer tremendous scalability, they’re notoriously slow. Read more…

StreamSets Eases Spark-ETL Pipeline Development

Sep 5, 2019

Apache Spark gives developers a powerful tool for creating data pipelines for ETL workflows, but the framework is complex and can be difficult to troubleshoot. StreamSets is aiming to simplify Spark pipeline development with Transformer, the latest addition to its DataOps platform. Read more…
