Five Steps to Running ETL on Hadoop for Web Companies

Sep 1, 2014

Mention ETL (Extract, Transform and Load) and eyes glaze over. The thought goes: “That stuff is old and meant for clunky enterprise data warehouses. What does it have to do with my Internet/Web/ecommerce application?” Quite a lot, actually. ETL did originate in enterprise IT where data from online databases is Extracted, then Transformed to normalize it and finally Loaded into enterprise data warehouses for analysis. Although Internet companies feel they have no use for expensive, proprietary data warehouses, the fact …

Hadoop Labor Update: Cloudera Talks Impala 2.0 as Hortonworks Previews Kafka

Aug 29, 2014

Say what you will about Hadoop (and we do), the big data platform is evolving at an incredible rate. This week, two of the biggest Hadoop distributors, Hortonworks and Cloudera, shared how they’re working to improve two key aspects of the platform: real-time data pipelining via Apache Kafka and SQL-based data warehousing via Impala. Let’s start with Cloudera. This week, the Hadoop distributor announced that the upcoming release of Impala 2.0 will add much more complete SQL functionality to CDH, …

Why Hadoop Isn’t the Big Data Solution You Think It Is

Aug 26, 2014

Hadoop carries a lot of promise in the IT world for the way it has democratized access to massively parallel storage and computational power. But the level of hype that surrounds Hadoop is disproportionate to its present capabilities, raising the possibility of a big data letdown of elephantine proportions. The emergence of Hadoop as a next-generation platform for parallel computing has piqued the interest of customers and investors alike. What mid-sized company looking for a big data edge wouldn’t want …

How to Move 80PB Without Downtime

Aug 25, 2014

When the online photo company Shutterfly decided to move its entire data center recently, the possibility of downtime was a big issue. After all, the company had 80 petabytes of customer data spread across tens of thousands of spinning disks, and those disks wouldn’t be spinning while being physically moved. Months later, after the last delivery made its way to Shutterfly’s new data center, not one piece of data was lost or even temporarily unavailable from the company’s website. How …

If You’re Missing Fast Data, Big Data Isn’t Working for You

Aug 19, 2014

Big Data analytics are all the rage. There is little doubt some great things can be accomplished when an organization takes to mining its data to produce meaningful change in the business. Yet, 64 percent of enterprises that invest in Big Data projects struggle to unlock the value of their Big Data insights, according to analyst firm Gartner. A recent survey on Big Data by my company revealed that 72 percent of respondents could not utilize the majority of the …

News In Brief

Data Startup Targets Machine Learning for Healthcare

Aug 27, 2014

A medical diagnostic startup is attempting to use recent advances in machine learning as a way to make it easier for doctors to sort through medical information in the form of images, unstructured data like notes on a patient’s history and structured laboratory test results. “Medical diagnostics is, at its heart, a data problem,” notes Jeremy Howard, founder and CEO of Enlitic, a San Francisco-based startup that wants to use machine-learning technology to transform diagnostic healthcare. “Recent applied machine learning …

Poll: SAS Use Surges for Data Mining

Aug 26, 2014

A recent poll querying data scientists on which programming and statistics languages they used in 2014 for analytics, data mining and data science found that four main languages dominated. The data mining community web site KDnuggets reported earlier this month that respondents identified R, Python, SAS and SQL (in that order) as their preferred programming languages. Fully 91 percent of respondents used one of the four languages. The R programming language led the way, cited by 49 percent of respondents …

Performance Analytics Tackles Sports Injuries

Aug 25, 2014

In the human demolition derby known as the National Football League, season-ending, often career-threatening, injuries are already piling up like linemen on a loose football. And it’s only preseason! An injury to a star player (the St. Louis Rams lost their starting quarterback Sam Bradford to his second torn ACL in as many years over the weekend) can ruin a team’s season before it begins. That translates into empty seats in stadiums, lost revenues from missing the playoffs and potentially years of …

Gauging Human Emotions at the Stroke of a Key

Aug 22, 2014

Indian researchers using text pattern analysis and “keystroke dynamics” claim they have designed a computer program that can accurately recognize a computer user’s emotions. “Depending on the emotion,” the researchers claimed, they accurately recognized “emotional states” most of the time, indeed with great precision: 87 percent. Writing in the journal Behaviour & Information Technology, the team reported asking volunteer typists to note their emotional state after typing “fixed” text along with prescribed intervals of regular computer use.

Automakers Embrace Predictive Analytics to Boost Sales

Aug 20, 2014

The use of predictive analytics to discern consumer preferences is expanding to car dealerships, where manufacturers led by Ford Motor Co. are trying to convince local dealers that big data can help them reduce the number of days cars sit unsold on their lots. While some dealers are resisting new analytical tools like Ford’s Smart Inventory Management System, rolled out in 2009, industry analysts recently told the publication Automotive News that predictive analytics could help save dealerships $100 or more …

This Just In

Apache Unveils Hadoop 2

Oct 17, 2013

Apache Software Foundation, which oversees the 150 or so open source projects under the famous Apache umbrella, this week announced Hadoop 2 – the latest version of the popular software framework for distributed computing.