Tag: Spark

DataTorrent Glues Open Source Componentry with ‘Apoxi’

Feb 22, 2018

Building an enterprise-grade big data application with open source components is not easy. Anybody who has worked with Apache Hadoop ecosystem technology can tell you that. But the folks at DataTorrent say they’ve found a way to accelerate the delivery of secure and scalable big data applications with Apoxi, a new framework they created to stitch together major open source components like Hadoop, Spark, and Kafka, in an extensible and pluggable fashion.

The Hybrid Database Capturing Perishable Insights at Yiguo

Feb 22, 2018

Yiguo.com is the largest B2C fresh produce online marketplace in China, serving close to 5 million users and more than 1,000 enterprise customers. We have long devoted ourselves to providing fresh food for ordinary consumers and have gained popularity since our founding in 2005.

ParallelM Aims to Close the Gap in ML Operationalization

Feb 21, 2018

A startup named ParallelM today unveiled new software aimed at relieving data scientists of the burden of manually deploying, monitoring, and managing machine learning pipelines in production.

Dubbed MLOps, ParallelM’s software helps automate many of the operational tasks required to turn a machine learning model from a promising piece of code running on the Spark, Flink, TensorFlow, or PyTorch processing engines into a secure, governed, and production-ready machine learning system.

Snowflake Taps Qubole for Deep Machine Learning in the Cloud

Feb 13, 2018

Organizations storing big data in Snowflake’s cloud data warehouse can now run machine learning and deep learning algorithms against that data thanks to a new partnership with Qubole.

The two companies today announced a partnership that will allow Qubole’s big data processing engines, including Apache Spark and TensorFlow, to read data from and write data to Snowflake’s data warehouse.

Dr. Elephant Leads the Performance Parade

Jan 12, 2018

I started working on big data infrastructure in 2009 when I joined Cloudera, which at the time was a small startup with about 10 engineers. It was a fun place to work.

Databricks Puts ‘Delta’ at the Confluence of Lakes, Streams, and Warehouses

Oct 25, 2017

Databricks today launched a new managed cloud offering called Delta that seeks to combine the advantages of MPP data warehouses, Hadoop data lakes, and streaming data analytics in a unified platform designed to let users analyze their freshest data without incurring enormous complexity and costs.

Containerized Spark Deployment Pays Dividends

Aug 7, 2017

Hadoop has emerged as a general-purpose big data operating system that can perform a range of tasks and run all kinds of processing engines. But all that power and flexibility comes at a cost, and one prominent healthcare analytics firm decided it didn’t want to pay it anymore.

DataRobot Reaches Out to SAS, Financial Services

Jul 24, 2017

Companies that use DataRobot’s software to automate data science tasks can now output models directly from SAS, the dominant analytics company whose software is widely deployed in enterprises around the world.

Taking the Data Scientist Out of Data Science

Jul 21, 2017

If you were a data scientist three years ago, you could pretty much write your own ticket. Everybody in the industry, it seemed, either wanted to hire a data scientist or wanted to be one.

IBM Bolsters Spark Ties with Latest SQL Engine

Jul 18, 2017

IBM is extending its commitment to Apache Spark as a key component of in-memory analytics with the latest release of its SQL engine for Hadoop.

The new version of IBM Big SQL released last week also solidifies the company’s joint distribution deal with Hortonworks, announced last month, which includes Hortonworks’ Hadoop and stream processing distributions.