October 31, 2016

RISELab Replaces AMPLab with Secure, Real-Time Focus

UC Berkeley AMPLab will shut its doors in December after six years of delivering major technological innovations like Apache Spark, Apache Mesos, and Alluxio. Taking its place is the RISELab, which will focus its efforts on delivering a secure real-time decision stack, dubbed SRDS.

Ion Stoica, the co-founder and executive chairman of Spark backer Databricks and a professor of computer science and electrical engineering at Berkeley, discussed the new RISELab last week during the Spark Summit EU 2016 conference in Brussels, Belgium.

Stoica says RISELab, which stands for Real-time Intelligent Secure Execution (or RISE), is positioned to tackle the next phase in distributed computing.

“Committed to the goal of building open-source frameworks, tools, and algorithms that make building real-time applications decisions on live data with stronger security,” according to the Databricks blog, “this new phase is set to innovate and enhance Spark with two projects—Drizzle and Opaque—Stoica said.”

Bolstering Spark’s security and real-time capabilities will be the early focus of RISELab. To that end, the Drizzle project will aim to reduce Spark Streaming’s latency by a factor of ten and to improve its fault tolerance, according to Databricks. Opaque, meanwhile, will bring stronger data encryption capabilities to Spark, for both data in motion and data at rest.

References to a number of other research projects that make up the SRDS can be found in this document on the Berkeley website. For instance, there’s something called Arx that would appear to enable queries to be performed on encrypted data stored in HDFS, S3, and NoSQL databases like MongoDB and Apache Cassandra. Ray, Clipper, and Succinct also hold places next to the Spark engine in the stack.

LatticeFlow, LatticeKVS, and Bedrock are the working names of three other projects that show up in the document. LatticeFlow provides a core programming API for a new asynchronous data coordination framework; LatticeKVS would be a key-value store for storing data; while Bedrock would provide immutable, never-forget versioned “underground storage.” Ground, a “data context system” that’s currently under development at Berkeley, also appears to be one of the early RISELab projects.

RISELab has some big shoes to fill as the follow-on to AMPLab, which delivered major open source successes with the development of Apache Spark and Apache Mesos. Databricks co-founder and CTO Matei Zaharia had a hand in both of those influential projects while a PhD student under Stoica’s tutelage. (Mesos, in fact, started under the AMPLab’s predecessor, the RADLab.) Alluxio, the distributed in-memory file system originally created to serve data to Spark, is also gaining traction, and is currently the most popular open source project when measured by the number of contributors.

Thirty lucky undergraduate students at UC Berkeley will get to participate in the RISELab class, which will be held Mondays from 3:30 to 5:30 in Soda 405 on the UC Berkeley campus, according to this description of the class posted to GitHub.

Stoica will be joined in teaching the class by three other faculty members in the electrical engineering and computer science department: Professor Joe Hellerstein, who is also a co-founder of Trifacta, and assistant professors Joseph Gonzalez and Raluca Ada Popa, who are co-founders of machine-learning software company Turi and security software startup PreVeil, respectively.

Only time will tell what kind of mark the new RISELab will leave on the open source software community, or what commercial projects, such as Trifacta’s Wrangler or the machine learning framework from Turi (formerly Dato and GraphLab), will come out of the project. But judging by the information available, the RISELab is looking to tackle some pretty major problems, including the building of large-scale machine learning systems that pull data from the sensors all around us and power a new era of intelligent systems.

The SRDS envisioned by RISELab will deliver:

  • An analytic tool that provides 100x lower latency and 1,000x higher throughput than Spark;
  • Machine learning algorithms that produce reliable results in real-time on noisy data with unforeseen inputs;
  • Safeguards that ensure user privacy and application security.

People have built such real-time decision-making systems for live data, most notably in the areas of high-frequency trading and ad bidding. However, the resources required to build such highly specialized, one-off solutions present a formidable barrier to wider adoption.

“The goal of the RISE Lab,” the class description posted to GitHub reads, “is to dramatically lower the barrier of building such solutions by developing a general-purpose secure real-time decision stack (SRDS). SRDS will enable many more people to build sophisticated decision and predictive analytics applications which will fundamentally change the way we interact with our world, and unlock massive value from the ever increasing amount of data collected by individuals and organizations alike.”

Delivering robust inference models appears to be one of the areas that RISELab will focus on, particularly as the world moves toward automated systems, such as self-driving cars and artificial intelligence (AI) chat-bots. Closing the loop between the decision-making engine and feedback from those actions is a particularly thorny area that the SRDS will attempt to address, with an inference engine that works in less than 10 milliseconds.

“Enabling real-time decisions on live data will lead to a phase transition in data processing, similar to the transition from small to big data,” the class description on RISELab reads. “Just as big data led to dramatically better results even when using traditional algorithms, we believe that real-time processing on live data will lead to qualitatively superior results, by enabling rapid exploration of the search space and continuous adaptation to changes in the environment.”
