Features

Analyzing Video, the Biggest Data of Them All

May 26, 2016

Video data is the fastest-growing data type on the Internet, and arguably one of the fastest growing, period. Thanks to a proliferation of high-definition video cameras, the volume of video data is exploding. While Web-based delivery of video for entertainment purposes is projected to become a $100 billion industry unto itself, another big opportunity revolves around video analytics. We’re in the midst of a hockey-stick-like curve with respect to the growth of video data. Consider that, in Read more…

Smart Billboards Powered by Data Analytics

May 25, 2016

If you thought it was a coincidence that a digital billboard displayed an advertisement for a handbag you just checked out at a high-end retailer downtown, think again: that digital billboard advertisement was, indeed, meant for you. Well, maybe not for you, in particular. But somebody like you. Thanks to companies’ unprecedented capability to track people’s movements via cell phones and categorize their interests, the age of the smart billboard is here. It may seem a little creepy to some people, Read more…

Kafka Creators Tackle Consistency Problem in Data Pipelines

May 24, 2016

One of the big questions surrounding the rise of real-time stream processing applications is consistency. When you have a distributed application involving thousands of data sources and data consumers, how can you be sure that the data going in one side comes out the other unchanged? That’s the challenge that Confluent is addressing with today’s launch of new software for Apache Kafka. If you’re moving big data today, you’re probably using Apache Kafka, or at least looking at it. The Read more…

Big Data Doesn’t Always Mean Better Business

May 24, 2016

There is an unprecedented volume of data being created, with an unprecedented number of people around the world regularly producing and storing data. Research shows that 90 percent of the data in the world today was created in the last two years alone. This may not be news to those of us who plan for, manage, or process this barrage of data, but questions still remain about best practices when taking on infrastructure changes to address big data in a big Read more…

How Spark and Hadoop Are Advancing Cancer Research

May 23, 2016

The combination of Spark and Hadoop has supercharged big data analysis across many industries and use cases by lowering the barrier to entry to advanced analytics and thereby enabling data scientists to create data-driven products that weren’t previously possible. But one area where Spark and Hadoop are having an especially strong impact revolves around cancer research. Cancer killed about 590,000 Americans last year, according to the Centers for Disease Control. That makes it the second leading cause of death in Read more…

Apache Foundation Keeps Eyes Wide Open with ODPi

May 20, 2016

If you’re looking for controversy in the Apache Hadoop community, you need look no further than the 2015 launch of the Open Data Platform Initiative (ODPi), which some perceived as an attempt to wrest control of Apache Hadoop from its open source roots. In fact, some Apache Software Foundation (ASF) leaders see potential good coming out of the ODPi, although there are valid concerns about negatives too. Jim Jagielski, a founding member of the ASF and a member of its Read more…

Biotech Crop Discovery Poised for Fast Growth Thanks to Big Data

May 19, 2016

Big agriculture companies have been using HPC techniques to understand and manipulate the genes of food staples like corn and soy for many years. Now, thanks to the big data revolution, that kind of fine-grained genetic control will soon be wielded by smaller firms targeting a much wider swath of biotech crops. One of the companies on the cutting edge of biotech crops is Benson Hill Biosystems. The company, which came out of the Donald Danforth Plant Science Center in Read more…

Hadoop 3 Poised to Boost Storage Capacity, Resilience with Erasure Coding

May 18, 2016

The next major version of Apache Hadoop could effectively double storage capacity while increasing data resiliency by 50 percent through the addition of erasure coding, according to a presentation at the Apache Big Data conference last week. Apache Hadoop version 3 is currently being developed by members of the Apache Hadoop team at the Apache Software Foundation. Akira Ajisaka, who is an Apache Hadoop committer and a PMC member, shared information about the next major release at last week’s Apache Read more…

Hadoop Past, Present, and Future

May 17, 2016

Every few years the technology industry seems to be consumed with a shiny new object that gets hyped far beyond reality. At worst, the inevitable bursting of the hype bubble leads to the disappearance of the technology from relevance (remember Internet browsing on your TV?), but more often the hype subsides until a real but narrower focus for the technology is found. It’s been a decade since Hadoop was first created as an Apache top-level project, and during that decade Read more…

Data Gravity Pulls to the Cloud

May 16, 2016

Last month, Spotify grabbed headlines by announcing plans to get rid of its data centers and move onto Google’s Cloud Platform (GCP), claiming that the storage, compute and network services in the cloud are as high quality as on-premises alternatives. While few people take a second look at a digital-native company choosing to store data in the cloud, it seems to be generally accepted that for certain companies and industries, the cloud just isn’t a fit. Hadoop distribution vendors like Cloudera and Read more…