Tag: Hadoop

What’s Hot This Summer: Data Science Bootcamps

Jun 23, 2016

Summer is here and temperatures are rising. While some of us take vacations or cool off at the beach, prospective data scientists are heating up their job prospects by participating in one of a growing number of data science bootcamps. Bootcamps of all types are growing quickly. According to Course Report, a website that tracks bootcamps, the number of graduates from the country’s 91 full-time coding bootcamps will grow by 60 percent this year, increasing to 17,966 graduates accounting for Read more…

Avoid These Five Big Data Governance Mistakes

Jun 22, 2016

If you’re embarking upon a big data project, then you’re likely running into one or more data management challenges. The decisions you make regarding how you enforce data governance and how you control data flows can make or break your project. Here are five data governance mistakes you should avoid: 1. You Have No Data Governance Strategy If you said to yourself, “Huh, what’s data governance?” then you’re likely making this mistake. Data governance refers to an overarching strategy that Read more…

The Growing Menace of Data Hoarding

Jun 13, 2016

One of the downsides of living and working in a data-rich environment is the desire to squirrel away every last bit and byte for future use. Thanks to cheap storage systems such as Amazon S3 and Hadoop, it’s technically possible to store every piece of data you’ve collected. But going too far down that path can lead to a perilous condition known as data hoarding. While data hoarding may not be as great a threat as physically hoarding real-world items, Read more…

Big Data Benchmark Gauges Hadoop Platforms

Jun 1, 2016

In another indication of a maturing technology and growing demand, an industry group has released a big data analytics benchmark designed to gauge the performance of Hadoop-based systems. The Transaction Processing Performance Council said this week its TPCx-BB benchmark for big data analytics systems covers systems such as MapReduce, Apache Hive, Apache Spark and Machine Learning Library, or MLlib. According to the TPC website, the “express” benchmark measures the performance of Hadoop-based systems, including hardware and software components. The benchmark Read more…

Trifacta Brings Partners Into Data Prep Fold

May 24, 2016

The market for self-service data preparation tools is having a golden moment in the sun, with analyst firms like Gartner deciding that it does, in fact, have legs to stand on its own. The health of that market is also why Trifacta today launched a formal business partner program. With the new Wrangler Partner Program, Trifacta aims to bring various types of firms into the self-service data prep fold, including system integrators, consulting firms, software vendors, and Hadoop Read more…

Kafka Creators Tackle Consistency Problem in Data Pipelines

May 24, 2016

One of the big questions surrounding the rise of real-time stream processing applications is consistency. When you have a distributed application involving thousands of data sources and data consumers, how can you be sure that the data going in one side comes out the other unchanged? That’s the challenge that Confluent is addressing with today’s launch of new software for Apache Kafka. If you’re moving big data today, you’re probably using Apache Kafka, or at least looking at it. The Read more…

How Spark and Hadoop Are Advancing Cancer Research

May 23, 2016

The combination of Spark and Hadoop has supercharged big data analysis across many industries and use cases by lowering the barrier to entry for advanced analytics and thereby enabling data scientists to create data-driven products that weren’t previously possible. But one area where Spark and Hadoop are having an especially strong impact is cancer research. Cancer killed about 590,000 Americans last year, according to the Centers for Disease Control. That makes it the second leading cause of death in Read more…

Skills Gap Also Includes ‘Failure to Communicate’

May 17, 2016

The data science skills gap continues to widen, with emerging automation tools like machine learning only just now starting to take up some of the slack. PayScale, the online salary database, released a report Tuesday (May 17) on the state of the “skills economy” that ranks data analytics, programming and cloud computing skills among the most sought-after by U.S. employers. Nevertheless, the skills survey also highlights a continuing lack of writing and other communications skills among recent college graduates along Read more…

Hadoop Past, Present, and Future

May 17, 2016

Every few years the technology industry seems to be consumed with a shiny new object that gets hyped far beyond reality. At worst, the inevitable bursting of the hype bubble leads to the disappearance of the technology from relevance (remember Internet browsing on your TV?), but more often the hype subsides until a real but narrower focus for the technology is found. It’s been a decade since Hadoop was first created as an Apache top-level project, and during that decade Read more…

Data Gravity Pulls to the Cloud

May 16, 2016

Last month, Spotify grabbed headlines by announcing plans to get rid of its data centers and move onto Google’s Cloud Platform (GCP), claiming that the storage, compute and network services in the cloud are as high quality as on-premises alternatives. While few people take a second look at a digital-native company choosing to store data in the cloud, it seems to be generally accepted that for certain companies and industries, the cloud just isn’t a fit. Hadoop distribution vendors like Cloudera and Read more…