Tag: hpc

Accelerate Hadoop MapReduce Performance Using Dedicated OrangeFS Servers

Recent tests performed at Clemson University achieved a 25 percent improvement in Apache Hadoop TeraSort run times by replacing the Hadoop Distributed File System (HDFS) with an OrangeFS configuration using dedicated servers. Key components included an extension of the Hadoop MapReduce “FileSystem” class and a Java Native Interface (JNI) shim to the OrangeFS client. No modifications to Hadoop itself were required, and existing MapReduce jobs run unchanged on OrangeFS. Read more…
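
The plumbing is worth sketching: Hadoop resolves each URL scheme to a FileSystem implementation, so a subclass whose reads and writes drop through a JNI shim into the native OrangeFS client is all a MapReduce job ever sees. Below is a minimal sketch of the shim side; the method names and the "orangefs-jni" library name are hypothetical stand-ins, not the actual OrangeFS bindings.

    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical JNI shim to a native OrangeFS client library. The native
    // method names and library name are illustrative assumptions.
    class OrangeFsShim {
        static { System.loadLibrary("orangefs-jni"); } // assumed native library
        native int open(String path);                   // returns a file handle
        native int read(int handle, long offset, byte[] buf, int len);
        native void close(int handle);
    }

    // A Hadoop FileSystem subclass would wrap a seekable version of this
    // stream in an FSDataInputStream and hand it to MapReduce.
    class OrangeFsInputStream extends InputStream {
        private final OrangeFsShim shim;
        private final int handle;
        private long pos = 0;

        OrangeFsInputStream(OrangeFsShim shim, String path) {
            this.shim = shim;
            this.handle = shim.open(path);
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n = shim.read(handle, pos, one, 1);
            if (n <= 0) return -1; // end of file
            pos += n;
            return one[0] & 0xFF;
        }

        @Override
        public void close() {
            shim.close(handle);
        }
    }

Because jobs program only against the FileSystem interface, mapping a scheme to the implementation class (an fs.<scheme>.impl entry in core-site.xml) is all it takes for existing MapReduce code to run unchanged.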

IDC Talks Convergence in High Performance Data Analysis

At the International Supercomputing Conference (ISC’13) this week, convergence is in the air, with many discussions turning to the merger of traditional high performance technical computing and the rising data tides of the enterprise. Putting data and use cases on display, the analysts at IDC gave their view of this convergence space – and shared their own name for it: High Performance Data Analysis (or HPDA). Read more…

Why Big Data Needs InfiniBand to Continue Evolving

Increasingly, it’s a Big Data world we live in. Just in case you’ve been living under a rock and need proof of that, a major retailer can use an unimaginable number of data points to predict the pregnancy of a teenage girl outside Minneapolis before she gets a chance to tell her family (http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/). That’s just one example, but there are countless others pointing to the same idea: mining huge data volumes can uncover actionable gold nuggets (although sometimes they freak people out...). Read more…

Big Data & Virtual Prototyping Changing Auto Design Culture

Design and engineering teams at Jaguar Land Rover say that big data and virtual prototyping are changing the culture and way they think about their work. Read more…

Sharing Infrastructure: Can Hadoop Play Well With Others?

A lot of big data/Hadoop implementations are swimming against the current of what recent history has taught us about large-scale computing, and the result is a significant amount of waste, says Univa CEO Gary Tyreman, who believes that shared-infrastructure Hadoop environments are on the rise. Read more…

Study: Reverse Debuggers Can Decrease Debugging Time by an Average of 26%

According to a recent study at the University of Cambridge (http://www.roguewave.com/company/news-events/press-releases/2013/university-of-cambridge-reverse-debugging-study.aspx), respondents who used reverse debugging tools, like Rogue Wave’s ReplayEngine (http://www.roguewave.com/products/totalview/replayengine.aspx), decreased their debugging time by an average of 26%. Developers can reinvest that saved time in additional products, features, and capabilities. Read more…

Python Wraps Around Big, Fast Data

Python is finding its way into an ever-expanding set of use cases that fall into both the high performance computing and big data buckets. Read more…

How Ford is Putting Hadoop Pedal to the Metal

Ford Motor, like any other company at its scale, has been contending with a slew of big data problems, complicated by ever-growing data sources that range from internal feeds to the terabytes of machine data generated by its vehicles. We talked in depth with Ford's data science lead about the company's tough vendor and technology decisions around Hadoop and where the real value of such.... Read more…

Juneja: HPC, Cloud, and Open Source the Nexus of Big Data Innovation

It should come as no surprise that Intel is converging HPC, cloud and open source, said Intel's big data chief at the Strata 2013 conference this week. Read more…

SGI Plants Big Data Seeds in HPC

Mannel and Conway talked this week about SGI and its various big data efforts, focusing specifically on data-intensive high performance computing, otherwise known as high performance data analytics (HPDA). The talk touched on both the rise of big data within HPC and how SGI has adapted to it. Read more…

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real-Time

Continuous analysis of fast-changing operational data unlocks the potential to extract important patterns. Big data systems such as Hadoop are not well suited to this challenge; in-memory data grids (IMDGs), however, offer breakthroughs that enable real-time analysis of fast-changing data. Recent measurements demonstrate that an IMDG can deliver a complete map/reduce analysis every four seconds across a terabyte data set that is being updated continuously. Read more…
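
The pattern behind that number is easy to show in miniature: keep the operational data in memory and re-run the map/reduce pass on a short timer while writers keep mutating it. This single-JVM sketch stands in for a real grid, where the data is partitioned across hosts and each host reduces its own partition locally; the account/exposure example is an illustrative assumption, not from the article.

    import java.util.concurrent.*;

    public class ContinuousAnalysis {
        // Stand-in for one IMDG partition: key = account id, value = position size.
        static final ConcurrentHashMap<String, Double> positions = new ConcurrentHashMap<>();

        public static void main(String[] args) {
            // Re-run the full map/reduce pass every four seconds against live data.
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(() -> {
                double exposure = positions.values().parallelStream() // "map" over entries
                                           .mapToDouble(Math::abs)
                                           .sum();                    // "reduce" to one total
                System.out.printf("accounts=%d total exposure=%.2f%n",
                                  positions.size(), exposure);
            }, 0, 4, TimeUnit.SECONDS);

            // Simulated writers keep updating the data set while analysis runs.
            ExecutorService writers = Executors.newFixedThreadPool(4);
            for (int w = 0; w < 4; w++) {
                writers.submit(() -> {
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    while (!Thread.currentThread().isInterrupted()) {
                        positions.put("acct-" + rnd.nextInt(100_000),
                                      rnd.nextDouble(-1_000_000, 1_000_000));
                    }
                });
            }
            // Demo runs until interrupted; a real grid service would expose shutdown hooks.
        }
    }

In a production IMDG the same two steps run cluster-wide: each node reduces its own partition in place and only small per-node partial results cross the network, which is what keeps a terabyte-scale pass inside a few seconds.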

Marking Spikes for the Enterprise Graph

In some ways, the story of Sun Microsystems closely parallels the current merger of high performance computing with the more enterprise-geared momentum behind big data. According to a former Sun R&D lead who is now a research and development exec at Cray's big data arm, this story can be.... Read more…

BioInformatics: A Data Deluge with Hadoop to the Rescue

Apache Hadoop-based massively parallel processing is well suited to address many challenges in the growing field of BioInformatics. BioInformatics is not a “spectator sport”; this article explains how to get started via hands-on experience with the FDA Adverse Event Reporting System (FAERS). Read more…
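
As a taste of that hands-on experience, here is a minimal Hadoop mapper for one natural FAERS question: how many adverse-event records mention each drug? The '$' delimiter matches the FAERS quarterly ASCII extracts, but the column index of the drug name below is an assumption; check the layout notes shipped with the files you download, and pair the mapper with a standard summing reducer.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (drugName, 1) for each record in a '$'-delimited FAERS DRUG file.
    // Assumes the drug name sits in column 3; verify against the actual layout.
    public class FaersDrugCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text drug = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\$");
            if (fields.length > 3 && !fields[3].isEmpty()) {
                drug.set(fields[3].trim().toUpperCase());
                context.write(drug, ONE); // a word-count-style reducer sums these
            }
        }
    }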

Using In-Memory Data Grids for Global Data Integration

By enabling extremely fast and scalable data access even under large and growing workloads, in-memory data grids (IMDGs) have proven their value in storing fast-changing application data, such as financial trading data, shopping data, and much more. As organizations work to efficiently access their critical business data across multiple sites or scale their processing into the cloud, the need to migrate data quickly and seamlessly to where it is needed has grown rapidly. IMDGs create an exciting opportunity for organizations to employ powerful global strategies for data sharing: federating IMDGs across multiple sites enables seamless, transparent access to data from any site and provides an ideal solution to the challenge of global data integration. Read more…
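
The read-through pattern that federation enables is simple to sketch: serve reads from the local grid when possible, fall back to peer sites on a miss, and cache what comes back so data migrates to where it is used. RemoteSite below is a hypothetical stub for whatever transport links the sites; real IMDG products expose their own federation APIs.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical hook to a peer site (REST, sockets, or a vendor API).
    interface RemoteSite {
        byte[] fetch(String key); // null if the site does not hold the key
    }

    class FederatedGrid {
        private final Map<String, byte[]> local = new ConcurrentHashMap<>();
        private final Iterable<RemoteSite> peers;

        FederatedGrid(Iterable<RemoteSite> peers) { this.peers = peers; }

        byte[] get(String key) {
            byte[] value = local.get(key);
            if (value != null) return value;      // local hit: memory-speed access
            for (RemoteSite site : peers) {       // miss: consult federated sites
                value = site.fetch(key);
                if (value != null) {
                    local.put(key, value);        // migrate data to where it is used
                    return value;
                }
            }
            return null;                          // not present at any site
        }
    }

    public class FederatedDemo {
        public static void main(String[] args) {
            Map<String, byte[]> siteB = new ConcurrentHashMap<>();
            siteB.put("order-42", "{\"total\": 99}".getBytes());
            FederatedGrid siteA = new FederatedGrid(java.util.List.of(siteB::get));
            // First read is fetched from site B, then served locally thereafter.
            System.out.println(new String(siteA.get("order-42")));
        }
    }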

TIBCO CTO Flashes R&D Future for Fast Data

This week we spent quality time with software giant TIBCO's CTO, Matt Quinn, to discuss what the future of big data research and development looks like--and what trends are on the horizon in the post-Hadoop ecosystem. From network and switch tweaks to getting a handle on... Read more…

The GPU “Sweet Spot” for Big Data

GPUs have made serious waves in the supercomputing community, and these same performance boosts are being explored for large-scale data mining by a number of enterprise users. During our conversation with NVIDIA's Tesla senior manager for high performance computing, Sumit Gupta, we explored how traditional data... Read more…

MapReduce Makes Further Inroads in Academia

Most conversations about Hadoop and MapReduce tend to filter in from enterprise quarters, but if the recent uptick in scholarly articles extolling their benefits for scientific and technical computing applications is any indication, the research world might... Read more…

Using an In-Memory Data Grid for Near Real-Time Data Analysis

With the ever-increasing explosion of data for analysis and the need for fast insights into emerging trends, in-memory data grids (IMDGs) offer a highly attractive platform for hosting map/reduce analysis. In comparison to disk-based map/reduce platforms such as Hadoop, IMDGs cut analysis times by reducing data motion while simplifying the development model. For applications that need to analyze fast-changing data, such as shopping or financial trading data, IMDGs can provide near real-time results. Read more…
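
The simplified development model is the easiest part to demonstrate. When the operational data already lives in memory, one analysis pass is an ordinary grouped aggregation, with no job submission, serialization, or disk I/O; a grid runs the same two steps partition by partition. The shopping-cart example below is an illustrative assumption, not from the article.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class CartAnalysis {
        // Illustrative event type for in-memory shopping data.
        record CartEvent(String category, double amount) {}

        // One pass: "map" each event to (category, amount), "reduce" by averaging.
        static Map<String, Double> averageByCategory(List<CartEvent> events) {
            return events.parallelStream()
                         .collect(Collectors.groupingBy(
                                 CartEvent::category,
                                 Collectors.averagingDouble(CartEvent::amount)));
        }

        public static void main(String[] args) {
            List<CartEvent> live = List.of(
                    new CartEvent("books", 12.50),
                    new CartEvent("books", 30.00),
                    new CartEvent("games", 59.99));
            System.out.println(averageByCategory(live)); // e.g. {books=21.25, games=59.99}
        }
    }

Re-running a pass like this every few seconds against the live in-memory data, rather than exporting it to a batch platform first, is what produces the near real-time results described here.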

Top 5 Challenges for Hadoop MapReduce in the Enterprise

Reporting and analysis tools help businesses make better decisions faster, and the information that enables those decisions comes from data, which broadly falls into two types: structured and unstructured. Recently, IT has struggled to deliver timely analysis using data warehousing architectures designed for batch processing. These architectures can no longer meet demand, owing to rapidly rising data volumes and new data types that call for a continuous approach to data processing. Read more…

Midwest Growing Hub for HPC, Big Data

From the outside, it might seem like most technology innovation filters through Silicon Valley and other large tech hubs on America's east coast. But as Gary Stiehr argues, midwest centers like St. Louis are growing into big data and high performance computing hot spots, with startup activity that... Read more…
