As Principal Member of Technical Staff at Sandia National Labs, I work on large-scale data management issues and collaborate with people from around the world. I am the Students@SC Chair this year, and Students@SC is the home of this new program within the overall SC organization. My first SC was in 2006, as a grad student, and I have attended every year since. At SC17, Michela Taufer asked me to work with the lead student volunteers (now SCALE) for SC19, pulling me into the planning and organization side of Students@SC, where I have worked each year since.
Why did we create this new competition?
There were two motivations. The first is that data science has been incorporated into SC for several years now, but it hasn’t been represented well within the Students@SC program. It is increasingly clear that no matter what students get involved in, they will have to analyze large data sets to learn something. This is true for the humanities, social sciences, physical sciences, and clearly many commercial applications. Building a stronger program for students in this area would both bolster the Students@SC program with a broader audience we have not focused on as much and help emphasize the rightful place of Data Science at SC.
Second, when I attended the CHPC annual conference in December 2019, I saw the Student Datathon Challenge and realized it was the answer. The students were given a single data set in which all teams had to find answers. Then they had to find a second data set that had something to do with South Africa (their home country) and find an answer there. Seeing this, I knew I had to bring it to SC. Unfortunately, DIRISA has not been able to participate in the SC version of the event due to the impact of COVID-19 in South Africa. The two programs that already have data science components are Computing4Change and HPC in the City. They are focused on first-exposure students. Once students have participated in these programs, they have no further engagement with SC unless they make a concerted effort themselves. The Data Science Competition offers a way for the alumni of these programs to stay involved for another year and brings them more strongly into the Students@SC program.
When will it be held?
We will hold the Data Science Competition immediately before the SC conference and are aiming to bring the competitors to SC to attend the full conference with partial support. Should they wish to have a more integrated experience, they could also apply to be a student volunteer without any schedule conflicts with the competition. In this way, we are looking to incorporate a new group of students into the SC community.
How will it be structured?
The currently proposed structure of the Data Science Competition is similar to the DIRISA event, but split across nine days. This will enable undergraduate students to do the substantial work on the weekends and spread the rest of the slower work across the weekdays. Consequently, the competition should not interfere with their studies, and it is less intense than the original 4- or 5-day event. Over the first weekend, all teams will work on a fixed data set. Over the subsequent weekdays, the teams have to find a data set related to the region in which SC is being held, St. Louis, and gain approval by Friday afternoon. Over the next two days, the teams will solve a data science question and build their presentations about what they have learned and the impacts. The teams will be judged based on these presentations, with the results shared on GitHub, via a journal paper, or both. The specific structure is still being finalized and is subject to change as we work out the details of what is possible given available resources and the number of participating teams.
Who is chairing this new program?
Kristen Brown was chosen as the chair for this competition based on her interest, the alignment with her day job, and her several years of exceptional performance as a volunteer for Students@SC. With Kristen at the helm, I am confident that the competition will be successful. She has assembled a small team of dedicated experts to help her make this first event one to remember.
I started attending SC as a student volunteer in 2016 after one of my HPC & scientific visualization professors, Bruce Loftis, recommended me for the program. I continued as a lead student volunteer (known as the SCALE program today) and eventually joined the planning committee. As a data scientist, I was excited to learn from Jay Lofstead that there was interest in adding this new program and bringing more students interested in the field into the HPC community.
What is this new program all about?
With the growing number of data-science-focused HPC workloads, we’re interested in expanding the opportunities for students to learn about these kinds of projects and building additional educational paths into the SC programs. We’re introducing a new, remote data science competition for teams to compete in before the conference this year as part of the Students@SC program. Holding the competition before the conference will enable the competitors to fully participate in and attend SC21, and we intend to involve them more broadly in SC events as well. Finally, we also want to continue engaging with students interested in solving problems that are important to them and to the local community.
Applications are tentatively set to open mid-April via the SC Submissions Site (view a sample form). One team member should fill out the form for their team. The deadline to apply will be early August.
Source: Christine Baissac-Hayden, SC21