Big Data • Big Analytics • Big Insight



When Big Data Becomes Too Much Data

Oct 27, 2014

About 2.5 exabytes of data will be generated today, roughly the amount of data that was generated from the dawn of time until 2004. What’s in there, and will any of it be useful? The reality is that the amount of data is so vast, its quality so dubious, and our abilities so relatively weak that most of it will have no impact whatsoever. In a perfect world, each additional byte of data we generate and absorb would shave a…

Today’s Baseball Analytics Make Moneyball Look Like Child’s Play

Oct 24, 2014

Baseball has always been a game of numbers and statistics. But thanks to an explosion of data over the past seven years and the advent of new analytic software running on supercomputers, the game is on the cusp of changes that will make Moneyball look like it belongs in the minor leagues. When the San Francisco Giants take the field against the Kansas City Royals in game three of the World Series tonight, you can bet that the choices made…

Spark Smashes MapReduce in Big Data Benchmark

Oct 10, 2014

Databricks today released benchmark results for Apache Spark running the Sort Benchmark, a competition for measuring the sorting performance of large clusters. Spark running on Hadoop sorted 100 TB of data in 23 minutes, three times faster than the previous record held by Yahoo using MapReduce on Hadoop. The result, Databricks says, is due to targeted performance improvements made by the Spark community, and should lay to rest any concerns about Spark’s scalability. Databricks, which is the commercial outfit…
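For a rough sense of scale, the two figures in the teaser (100 TB in 23 minutes) imply an aggregate sort throughput of a few terabytes per minute. The sketch below only does that arithmetic. It assumes decimal units (1 TB = 10^12 bytes), which the teaser does not specify, and the constant and function names are illustrative.

```python
# Back-of-the-envelope throughput for the reported 100 TB / 23 minute sort.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).

TB = 10**12
GB = 10**9

DATA_BYTES = 100 * TB   # data volume reported in the teaser
ELAPSED_S = 23 * 60     # 23 minutes, in seconds


def tb_per_minute(data_bytes: int, elapsed_s: int) -> float:
    """Aggregate cluster throughput in terabytes per minute."""
    return (data_bytes / TB) / (elapsed_s / 60)


def gb_per_second(data_bytes: int, elapsed_s: int) -> float:
    """Aggregate cluster throughput in gigabytes per second."""
    return (data_bytes / GB) / elapsed_s


if __name__ == "__main__":
    print(f"{tb_per_minute(DATA_BYTES, ELAPSED_S):.2f} TB/min")  # 4.35 TB/min
    print(f"{gb_per_second(DATA_BYTES, ELAPSED_S):.1f} GB/s")    # 72.5 GB/s
```

These are whole-cluster figures; the teaser gives no node count, so per-machine throughput cannot be derived from it.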

Top Three Things Not To Do in Excel

Oct 2, 2014

Let’s face it: We all have seen a crazy Microsoft Excel spreadsheet or encountered one of its dreaded “Not Responding” messages. Unfortunately, Excel’s flexibility and ease of use make it the ideal candidate for inappropriate use and widespread abuse. As the most widely used analytical tool in the world, Excel has indeed come a long way since the days of VisiCalc and Multiplan. Modern Excel 2013 and the latest Power BI add-ins do sizzle in demonstrations, but there are analyses…

Five Steps to Running ETL on Hadoop for Web Companies

Sep 1, 2014

Mention ETL (Extract, Transform and Load) and eyes glaze over. The thought goes: “That stuff is old and meant for clunky enterprise data warehouses. What does it have to do with my Internet/Web/ecommerce application?” Quite a lot, actually. ETL did originate in enterprise IT, where data from online databases is Extracted, then Transformed to normalize it, and finally Loaded into enterprise data warehouses for analysis. Although Internet companies feel they have no use for expensive, proprietary data warehouses, the fact…
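To make the three ETL stages concrete, here is a minimal Python sketch of the extract, transform, load sequence described in the teaser. It is not the article’s five-step Hadoop recipe: the raw log format, the field names and the in-memory SQLite database (standing in for a data warehouse) are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw web-log lines standing in for an online source system.
RAW_EVENTS = [
    "2014-09-01T10:15:00Z,u1,PageView,/home",
    "2014-09-01T10:16:30Z,U1,purchase,/checkout",
    "bad line",
    "2014-09-01T11:00:00Z,u2,pageview,/products",
]


def extract(lines):
    """Extract: split each raw line into fields, skipping malformed rows."""
    for line in lines:
        parts = line.split(",")
        if len(parts) == 4:
            yield parts


def transform(rows):
    """Transform: normalize user IDs and event names, parse UTC timestamps."""
    for ts, user, event, path in rows:
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        yield (when.isoformat(), user.lower(), event.lower(), path)


def load(records, conn):
    """Load: insert normalized records into an analysis table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (ts TEXT, user_id TEXT, event TEXT, path TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    load(transform(extract(RAW_EVENTS)), conn)
    for row in conn.execute("SELECT event, COUNT(*) FROM events GROUP BY event ORDER BY event"):
        print(row)  # ('pageview', 2) then ('purchase', 1)
```

Chaining generators keeps each stage independent, so any one of them can be swapped out; in a Hadoop setting those roles would typically be played by distributed jobs rather than in-process functions.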

News In Brief

Dairy Industry Asks: Got Big Data?

Oct 31, 2014

The bucolic days of the family dairy farm are long gone. Even in places like “America’s Dairyland” (aka, Wisconsin), huge dairy operations that milk thousands of Holstein cows twice a day are an increasingly common sight. These days, the dairy industry is all about production. Enter big data technology as “big dairy” becomes the primary supplier of milk, cheese and, at least in the author’s home state, fried cheese curds. Among the big data applications being embraced by corporate farmers…

S. Korea Eyes Big Data to Reduce Car Accidents

Oct 8, 2014

Among the many striking features of the bustling city of Seoul, South Korea, are its connectedness, a function of its extensive deployment of broadband networks, and its roaring, non-stop traffic. An electronic sign in the city’s shopping district actually keeps track of the number of Korean auto fatalities. The total seemingly increases by the minute. Hence, the Korean government wants to use spatial, weather and other big data sources to provide drivers with what officials call an accident forecast service…

Twitter Funds MIT ‘Social Machines’ Effort

Oct 3, 2014

Machines could become more social thanks to a new Twitter-funded initiative at the Massachusetts Institute of Technology’s vaunted Media Lab that will seek to develop new technologies to make sense of social chatter ranging from tweets to data streams to digital content. MIT Media Lab announced Oct. 1 the creation of a Laboratory for Social Machines using $10 million in funding from Twitter over the next five years. Twitter said it also would provide the new MIT lab with full…

MongoDB Teams with Weather Channel on Digital Alerts

Oct 1, 2014

Somebody is finally trying to do something about the weather. The Weather Channel has begun transitioning its digital platforms, including its mobile apps running on iOS and Android, to MongoDB’s database. The new platform will allow the Weather Channel to serve weather alerts and other real-time information to an estimated 40 million users, the database specialist announced Oct. 1. MongoDB said it would serve as the “data store” for all Weather Channel feeds and user information delivered by its digital…

Stinger Initiative Prepares for .next Phase

Sep 4, 2014

The Hadoop developer community that recently delivered the final tweaks to the Stinger Initiative, an effort to bring SQL capabilities to Apache Hive, said its next phase would focus on further SQL enhancements to support real-time access in Hive, along with support for transactional capabilities. In a blog post, Hortonworks developers Alan Gates and Raj Bains reported that 145 developers from 44 companies have contributed 390,000 lines of code over the last 13 months to the Stinger Initiative. They…

This Just In

WANdisco Announces Integration with Cloudera Manager and Ambari

Nov 20, 2014

BARCELONA, Spain, Nov. 20 — At Strata + Hadoop World Barcelona, WANdisco, a leading provider of continuous availability software for global enterprises to meet the challenges of Big Data, today announced that its Non-Stop Hadoop products for Cloudera and Hortonworks now offer seamless integration with Cloudera Manager, Cloudera’s administration console, and with the Apache Ambari management console used by Hortonworks. WANdisco also announced it has greatly simplified the installation process. Now its Non-Stop Hadoop solution can be deployed across multiple data centers any distance apart…

GoGrid Announces New Partnership with Cloudera

Nov 18, 2014

SAN FRANCISCO, Calif., Nov. 18 — GoGrid, an infrastructure-as-a-service leader specializing in multi-cloud solutions, today unveiled its new partnership with Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop, providing companies with a fast and easy way to evaluate and run the market-leading platform for Big Data through Cloudera Live. To run the program, Cloudera has taken advantage of GoGrid’s 1-Button Deploy orchestration process, empowering companies to get started on CDH in a matter of hours, rather than weeks or…

HPCwire Reveals Winners of the 2014 Readers’ and Editors’ Choice Awards at SC14

Nov 17, 2014

NEW ORLEANS, La., Nov. 17 — HPCwire, the leading publication for news and information for the high performance computing industry, announced the winners of the 2014 HPCwire Readers’ and Editors’ Choice Awards at the Supercomputing Conference (SC14) taking place this week in New Orleans, LA. Tom Tabor, CEO of Tabor Communications Inc., unveiled the list of winners just before the opening gala reception. “HPCwire readers are among the most informed in the HPC community and these awards are given to the organizations that are making…

Big Data Expert Joins DataRPM Advisory Board

Nov 14, 2014

FAIRFAX, Va., Nov. 14 — DataRPM, the award-winning provider of Smart Machine Analytics for Big Data, announces that renowned Big Data expert, author and 30-year industry veteran, Bill Schmarzo, has joined its advisory board. “The industry is desperate for tools that help accelerate the data science discovery and analytic model development process, and DataRPM provides the first industry product that addresses the gap in helping BI-centric organizations transition to data science,” said Bill Schmarzo. “DataRPM’s ‘smart insights’ jump starts the data…

Databricks Launches Spark Certification Program for Systems Integrators

Nov 12, 2014

BERKELEY, Calif., Nov. 12 — Databricks, the company founded by the creators of the popular open-source Big Data processing engine Apache Spark, today announced a certification program to enable all enterprises to quickly find qualified resources and support for their Apache Spark-based data analytic projects. The System Integrators Certification Program will foster the growth of an ecosystem of qualified resources, with validated expertise, that provide Spark-based professional services to organizations looking to use and implement the platform. Databricks is releasing the…