June 29, 2020

LLNL Data Science Challenge Continues Amidst COVID-19 Lockdowns

June 29, 2020 — The COVID-19 pandemic and its subsequent restrictions on gatherings and travel have forced institutions and companies around the world to rethink how they offer their summer programs and internships, and Lawrence Livermore National Laboratory (LLNL) is no exception.

Due to the COVID-19 pandemic and shelter-in-place restrictions, this year’s Data Science Challenge with the University of California, Merced was an all-virtual offering. The two-week challenge involved 21 UC Merced students who worked from their homes through video conferencing and chat programs to develop machine learning models capable of differentiating potentially explosive materials from other types of molecules. Photo courtesy of Marisol Gamboa/LLNL.

This year’s Data Science Challenge with the University of California, Merced, the second such event of its kind, was an all-virtual offering and an experiment in distance learning and online collaboration. The two-week challenge involved 21 UC Merced students (16 undergraduate and five graduate students) who worked from their homes through video conferencing and chat programs to develop machine learning models capable of differentiating potentially explosive materials from other types of molecules.

While the unique circumstances forced organizers to adapt, the goals of the challenge remained the same: to encourage students to pursue graduate degrees, expose them to real-world Laboratory data science projects, provide the experience of working in a multidisciplinary team and make students aware of LLNL as a possible career opportunity.

“It was a really heavy lift this time around,” said lead organizer Marisol Gamboa. “The times are definitely challenging, but I’m really excited about the leaps and bounds we made to create this cohesive team environment. I’m even more proud of the Lab for allowing us to take a stab at this. It really says a lot about Livermore. We decided this was important to do for these students and that the benefits would outweigh the risks. The two weeks we selected may be our only opportunity to have the students’ attention.”

Organizers of the challenge, which is offered through the Lab’s Data Science Institute (DSI) and the Center for Applied Scientific Computing (CASC), selected the students just prior to the lockdown restrictions. When the Lab site shuttered, Gamboa, along with administrator Jennifer Bellig and UC Merced applied math professor Suzanne Sindi, had to quickly brainstorm how to make the challenge work virtually, essentially rebuilding it from the ground up.

“We had all these flowcharts and task lists from the previous year and basically we had to get rid of them and start over,” Bellig said. “We just got creative. Everybody had this mindset of ‘we’re going to will this to happen; let’s figure this out.’ It’s cool to be part of something like that and see it come to fruition. It’s really been an encouragement to me during this down time.”

UC Merced provided support to students to ensure they had the tools they needed to conduct research in the new virtual environment. The university supplied laptops for students and shipped hotspots to two team leads with unstable Wi-Fi connections. UC Merced IT System Administrator Sarvani Chadalapaka also created guest accounts on the university’s MERCED high-performance computing cluster so students could scale up their computational analyses.

“It was definitely a challenge, but Marisol and Jennifer were fantastic partners,” Sindi said. “We had so many planning meetings, and as we worked through each potential problem, we gained more and more confidence that not only would the program work this year, it would be awesome. Everyone I worked with at UC Merced was deeply committed to making this year’s challenge a success.”

Organizers distributed the five graduate-level student team leads among five teams of undergraduate and recently graduated students. Each team also was paired with one Lab mentor who was available for guidance and to answer any questions the students had along the way. Prior to the challenge, the mentors met virtually with student team leads to address any anxieties and preconceptions they had.

Facing the challenge

On the first day of the challenge, June 1, students and mentors introduced themselves via WebEx, and Lab employees presented overviews of the Lab, CASC (by center director Jeff Hittinger) and the DSI (by institute director Mike Goldman). The mentors then discussed the challenge problem, centered on using machine learning to develop a classifier for potential explosive materials given only their molecular structures.

The machine learning algorithms were to be trained on a dataset of 400 known explosive compounds and about 5,000 pharmaceutical drug compounds, with students tasked to calculate chemistry features and come up with models capable of differentiating them. The problem is tied to actual Lab work and is difficult, mentors said, because drug molecules can look very similar to explosives but behave differently based on how they are bonded or where they are positioned in 3D space.
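One consequence of those dataset sizes is worth spelling out: with more than 12 drug compounds for every explosive, a model can score well on accuracy without finding a single explosive. The short Python sketch below illustrates that arithmetic. It is not the students’ or mentors’ code; the counts for the “hypothetical” classifier are invented purely for illustration, and the drug count uses the article’s approximate figure of 5,000.

```python
# Class sizes from the article; the drug count is approximate ("about 5,000").
N_EXPLOSIVE = 400
N_DRUG = 5000


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the 'explosive' class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Baseline: label every molecule "drug", so no explosive is ever flagged.
baseline = metrics(tp=0, fp=0, fn=N_EXPLOSIVE, tn=N_DRUG)

# Hypothetical classifier (made-up counts, for illustration only):
# it finds 300 of the 400 explosives and wrongly flags 100 drugs.
hypothetical = metrics(tp=300, fp=100, fn=100, tn=4900)

for name, (acc, prec, rec) in [("baseline", baseline),
                               ("hypothetical", hypothetical)]:
    print(f"{name:12s} accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Under these assumptions the do-nothing baseline reaches about 92.6 percent accuracy with zero recall, which is one reason precision and recall are commonly reported alongside accuracy when one class is much rarer than the other.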

“At the Lab, we’ve been working on machine learning applied to materials science and we wanted to include a chemistry component to make the program more interdisciplinary,” said mentor and LLNL computer scientist Brian Gallagher. “It’s nice that it’s a real problem, it’s compelling and easy to understand. It gives students a chance to try something where we don’t really know the answer, so we can work on it together. From my perspective, it worked out as well as I could’ve hoped. By the end of the first day I started to get a good feeling because I could tell everybody was going to get something out of it.”

The Lab mentors, including Gallagher, Donald Loveland, Phan Nguyen, Piyush Karande and Anna Hiszpanski, told students that while there were numerous ML techniques and models to choose from, there was no “silver bullet” known to best solve the problem. Students were encouraged to experiment with a range of techniques to find out what worked best for them, and to try to understand why the models made the predictions they did. After the mentors introduced a few general machine learning approaches and different aspects of machine learning, they set the students loose.

The UC Merced students were challenged to use machine learning to develop a classifier for potential explosive materials given only their molecular structures. The algorithms were trained on a dataset of 400 known explosive compounds and about 5,000 pharmaceutical drug compounds, with students tasked to calculate chemistry features and come up with models capable of differentiating them. Photo courtesy of LLNL. 

Each day began with status updates and check-ins with mentors, followed by working sessions over Zoom, WebEx and Microsoft Teams. Students used an open-source cheminformatics toolkit called RDKit to package the molecules and coded in Python. To recreate the experience of in-person collaboration and stimulate engagement, students were encouraged to use their webcams as much as possible. Algorithm code and other files were uploaded daily through Box, a shared collaborative space where students and team leads could interact in real time, and teams reported on their progress at each day’s end. Students said while the online-only interaction took some getting used to, it didn’t take long to get into a flow.

“This whole thing has been a process of understanding how to collaborate,” said Maia Powell, a third-year Ph.D. student in applied math at UC Merced and one of the team leads. “It’s always difficult working with code (in an online environment) but we’ve made it work. Everyone at Livermore made it a point to say ‘It’s not about the results that you get, it’s about what you learn.’ I feel like we got so much done over two weeks and learned so much in a short amount of time.”

Powell, a recipient of a National Science Foundation Graduate Research Fellowship, said she had always wanted to explore machine learning and applied for the challenge hoping to gain leadership experience. The challenge not only “demystified” ML for her, but also taught her a lot about teamwork.

“Working on this interdisciplinary team has been really interesting and what I imagine working at the Lab is like,” Powell said. “I’ve met a lot of Livermore employees at conferences, and I feel like they all really like their jobs and are excited to talk about it, so interacting with them has been the most helpful in terms of what to expect if I were to have an internship or eventually work at the Lab.”

Learning made easy

UC Merced student Arianna Malakis, who grew up and resides in Livermore, graduated in the spring with a degree in cognitive science and is seeking a career in data science. She said she was drawn to the challenge to learn more about the Lab and big data. Through the support of her teammates and mentor, Malakis said she was able to work through any pitfalls and discovered machine learning wasn’t as scary as she’d been led to believe.

“I came into this absolutely terrified that I would have to learn all these equations that I was never exposed to before,” Malakis said. “Then I’m reading all the guides and doing my research and realized it’s manageable — it’s not typing in lines and lines of math I’ve never taken. It’s also made me realize I’m on the right path. I’ve loved this challenge and I’ve loved every problem I’ve encountered, because when I get past it, I feel so satisfied, and when my team makes a breakthrough, I feel happy. This challenge has shown me that machine learning is easy to learn and that anyone can do it, and that I’m doing exactly what I want to be doing.”

Pedro Torres, a first-generation college student going into his senior year at UC Merced in computer science, also was new to machine learning but said he now knows how to implement it. Although, like many of the students, Torres was initially disappointed that he didn’t get to work on site, he found the online collaboration engaging and educational.

“I’ve been having fun with my own team and getting to know them and collaborating together,” Torres said. “Even if it’s only two weeks, I feel like there’s a sense that you get some form of experience before it becomes your actual job. I’m glad that programs like this exist so I can get a better idea of where I stand, how I can improve and, going forward, what I need to study more. It gave me a good idea of what the Lab is doing.”

With access to the Lab restricted, this year’s tour of the National Ignition Facility was virtual. In addition to the daily working sessions, organizers held online workshops on professional development, including job qualifications, how to prep for interviews and resume writing. And to recreate the social dynamic that students had from working in the same building last summer, organizers held virtual social times, where students played trivia and participated in a scavenger hunt, collecting items from their own homes. The students also attended online lectures from Lab scientists Laura Kegelmeyer and Marisa Torres, who inspired the students to think about what they wanted from their own careers.

“Talking to someone professionally who works at the Lab and hearing about their job and seeing their happiness just radiate through was the most impactful thing for me,” Malakis said. “The entire time I thought, ‘wow she loves her job.’ I want to have that much love for whatever I do when I have a career.”

For the results and full story, visit https://www.llnl.gov/news/lockdown-doesnt-hinder-annual-data-science-challenge


Source: Lawrence Livermore National Laboratory
