Big Data Outlier Detection, for Fun and Profit

Sep 30, 2014 |

As we discussed in the first part of this series, how you handle data outliers can determine whether your big data project ends with a bang or flames out in failure. But before you even decide what to do with outliers, you need to be able to detect them. That is easier said than done. Because they can mean different things at different times, outliers can be extremely challenging to deal with in a big data context. On the one Read more…

How a Facebook-Like Graph Powers Drug Discovery

Sep 29, 2014 |

Researchers have long sought to identify the key proteins involved in the development of diseases like cancer. However, the time and effort required to check each combination of proteins can be daunting. But thanks to the advent of graph analytics, researchers can now build models of protein networks, thereby enabling mass parallelization of the protein problem and powering a more efficient drug discovery process. One of the companies employing advanced graph analytics in drug discovery is e-Therapeutics, a British biotech Read more…

Hadoop Alternative Is Faster and Lighter, Proponents Say

Sep 29, 2014 |

For all its benefits, Hadoop has its drawbacks. The actual movement of data can be complex, and does not lend itself to efficient execution. Discouraged by Hadoop’s heaviness, an online advertising company developed and released as open source an alternative called Cluster Map Reduce (CMR) that it says is lighter, faster, and simpler to program. CMR was developed by Chitika, an online advertising company based in Massachusetts. Chitika was using the Hadoop Distributed File System (HDFS) as the core of Read more…

Hadoop Data Virtualization from Cask Now Open Source

Sep 25, 2014 |

Continuuity, a big data startup that seeks to drive complexity out of Hadoop by virtualizing data and applications, today announced that it’s changing its name to Cask and making its software open source. The company also open sourced a streaming engine now named Tigon, and announced the hiring of a former Intel executive as COO. Former Facebook engineer Jonathan Gray co-founded Continuuity with former Yahoo engineer Nitin Motgi about three years ago to address the challenges they saw enveloping the Hadoop Read more…

Hortonworks Hatches a Roadmap to Improve Apache Spark

Sep 24, 2014 |

Hortonworks today issued a broad and detailed roadmap outlining the investment it would like to see made to Apache Spark, the in-memory processing framework that has become one of Hadoop’s most popular subprojects. The plan focuses on improving how Spark runs with YARN, enabling monitoring and management of Spark, and ensuring that Spark plays nicely with Hive and other Hadoop engines. In the blog piece, titled “An investment in Apache Spark for the Enterprise,” Hortonworks director of product management Vinay Read more…

News In Brief

Fast Data Specialist Tibco Goes Private

Sep 29, 2014 |

Fast data specialist Tibco Software said Sept. 29 it is being acquired by Vista Equity Partners for about $4.3 billion. The deal is reportedly the largest such tech buyout this year. Under terms of the acquisition, Palo Alto-based Tibco said Vista Equity Partners would acquire all outstanding Tibco common stock for $24 per share in cash. Vista also will assume the enterprise software firm’s net debt. The purchase price represents a 26.3 percent premium to the closing price of Tibco Read more…

States Seek to Backstop U.S. Data Privacy Laws

Sep 25, 2014 |

California is poised to become the first state to restrict the use of student data by third-party technology vendors. Two student data privacy bills were sent to California Gov. Jerry Brown earlier this month, including one that prescribes privacy guidelines for contracts between school districts and technology vendors. In addition to contracts, California State Assembly Bill 1584 covers the privacy of student records and digital storage services, as well as educational software that could be used with data analysis tools for marketing Read more…

U.S. Cracking Down on Data Brokers

Sep 22, 2014 |

The U.S. is stepping up scrutiny of big data companies that regulators increasingly view as “stewards of information detailing nearly every facet of consumers’ lives.” The U.S. Federal Trade Commission (FTC) has been leading the charge with tougher enforcement of consumer protection laws. Earlier this year, it reached settlements with two data brokers for violations of the Fair Credit Reporting Act. The web site Instant Checkmate and InfoTrack Information Services both agreed to pay civil fines and to permanent injunctions against continuing Read more…

Concerns About Big Data Abuses Grow

Sep 18, 2014 |

The tension between the rise of big data and concerns over privacy and fairness continues to mount as federal regulators convened this week to ponder whether big data is a “tool for inclusion or exclusion.” That was the title of a Sept. 15 Federal Trade Commission workshop examining the impact of big data on U.S. consumers, particularly the poor and underserved. “A growing number of companies are increasingly using big data analytics techniques to categorize consumers and make predictions about Read more…

IBM Moves to Make Watson Accessible to the Masses

Sep 17, 2014 |

IBM is promising data crunching for the masses with its Watson Analytics natural-language cognitive service. The big data leader said Sept. 16 the extended release of the cloud-based analytics service promises to broaden access to predictive and visual analytic tools. The free version 1 release will run on desktops as well as mobile devices, IBM said. The self-service analytics package includes data refinement and warehousing services that would allow users to move beyond simple spreadsheets to analyze and visualize data. Read more…

This Just In

Apache Unveils Hadoop 2

Oct 17, 2013 |

The Apache Software Foundation, which oversees the 150 or so open source projects under the famous Apache umbrella, this week announced Hadoop 2, the latest version of the popular software framework for distributed computing.