Tag: Spark

FPGA System Smokes Spark on Streaming Analytics

Technologists with decades of experience building field-programmable gate array (FPGA) systems for the federal government today unveiled a commercial FPGA offering they claim holds a 100x performance advantage over Apache S Read more…

Tachyon Support Coming to Big Data Hypervisor

Organizations that are deploying Apache Spark to do data science on big data may be inclined to invest in Tachyon, the in-memory file system that was developed alongside Spark at the AMPLab. Getting Spark and Tachyon spun Read more…

Three Ways Big Data and HPC Are Converging

Big data is becoming much more than just widespread distribution of cheap storage and cheap computation on commodity hardware. Big data analytics may soon become the new “killer app” for high performance computing Read more…

What You May Have Missed at Strata + Hadoop World 2014

Talk about information overload. If you were one of the lucky 5,000 to attend the Strata + Hadoop World conference last week, then you were subject to a marathon session of big data keynotes delivered continually for the Read more…

A Storyboard Approach to Big Data Insights

Big data by itself is just a worthless collection of numbers and characters. To make big data work, you need to show how the information is meaningful. Taking a storytelling approach to analytics is one way to put big da Read more…

Hadoop ISVs Break Away from MapReduce, Embrace Spark, In-Memory Processing

Big data analytic software vendors who run on Hadoop are increasingly replacing their MapReduce engines with Apache Spark and other in-memory analytic engines as the runtime of choice. Many of these next-gen Hadoop vendo Read more…

Hortonworks Goes Broad and Deep with HDP 2.2

From full support for Apache Spark, Apache Kafka, and the Cascading framework to updated management consoles and SQL enhancements in Hive, there's something for everybody in Hortonworks' latest Hadoop distribution, which Read more…

How Streaming Analytics Helps Telcos Overcome the Data Deluge

Real-time streaming analytics is all the rage these days, as organizations seek to wring value from their data as quickly as possible. While the technology is bleeding edge for many, it's commonplace in the telecommunica Read more…

Apache Spark Gets YARN Approval from Hortonworks

Hortonworks today announced that Apache Spark is certified to work with YARN, the quarterback calling plays in next-gen Hadoop v2 clusters. The YARN stamp of approval clears the way for Hortonworks to fully support Spark Read more…

Cascading Now Supports Tez–Spark and Storm Up Next

Concurrent, the company behind the open source Cascading framework, today unveiled a major update that will allow its customers to migrate their Hadoop applications from MapReduce to the new Apache Tez engine, Read more…

Crossing the Big Data Stream with DataTorrent

Enterprises eager for a competitive edge are turning to in-memory stream processing technologies to help them analyze big data in real time. The Apache Spark and Storm projects have gained lots of momentum in this area, as have some analytic NoSQL databases and in-memory data grids. Another streaming technology worth keeping an eye on is DataTorrent. Read more…

Hortonworks Keen on Cascading-Tez Combo

In the future, it will be easier to build big data applications, and they'll run faster and utilize more real-time data than today's apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they'll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark. Read more…

How Fast Data is Driving Analytics on the IoT Superhighway

The promise of big data is morphing into the fast data opportunity. Unless you have the capability to respond to the Internet of Things and the trillions of data points generated by smartphones, sensors, and social media, the business opportunities of fast data can pass you by. Read more…

Glimpsing Hadoop’s Real-Time Analytic Future

There's been a lot said about the need to move Hadoop away from the batch paradigm and remake it as a real-time system. But how will that work when it comes to heavy-duty machine learning models and predictive analytics? That area hasn't been fully fleshed out, and it's one where Cloudera may help pave the way with its Oryx project. Read more…

Databricks Moves to Standardize Apache Spark

Databricks, the company behind open source Apache Spark, today rolled out a certification program that creates a Spark standard that big data analytic application developers can write to, and that customers can rely on. It's a smart move by Databricks, which is looking to avoid the forking that has clouded Hadoop's march into the enterprise. Read more…

Astronomical Algorithm Powers Data Analytics Startup

Astronomers at the national labs have enjoyed a handy fallback plan when faced with a glut of images that need analysis: grad students. So when researchers at UC Berkeley developed machine learning algorithms that could automatically scan these images, not only did grad students need something else to do between classes, but the developers realized they might have the makings of a winning big data analytics business plan on their hands. Read more…

Picking the Right Tool for Your Big Data Job

There is a lot of debate in the big data space about tools and technology, and which ones are best. Is SQL better than NoSQL? Hadoop or Spark? What about R or Python? Of course, no single tool or technology is best for all situations, and you would do well to pick the right one for the job at hand. Read more…

Lessons In Machine Learning From GE Capital

The financial services industry is always on the cutting edge, and so it is with machine learning at GE Capital, the lending and leasing arm of the industrial giant. Read more…

Apache Spark: 3 Real-World Use Cases

The Hadoop processing engine Spark has risen to become one of the hottest big data technologies in a short amount of time. And while Spark has been a Top-Level Project at the Apache Software Foundation for barely a week, the technology has already proven itself in the production systems of early adopters, including Conviva, ClearStory Data, and Yahoo. Read more…

Spark Graduates Apache Incubator

As we've touched on before, Hadoop was designed as a batch-oriented system, and its real-time capabilities are still emerging. Those eagerly awaiting this next evolution will be pleased to hear about the graduation of Apache Spark from the Apache Incubator. On Sunday, the Apache Spark Project committee unanimously voted to promote the fast data-processing tool out of the Apache Incubator. Read more…
