August 23, 2021

The Data Science of Digital Alchemy


In nature, particles will spontaneously self-assemble into crystalline structures that have certain properties. Humans have learned how to leverage this behavior to create beneficial materials in the lab, but we’re limited by the physical nature of experimentation. A group of researchers at the University of Michigan under the leadership of “digital alchemist” Sharon C. Glotzer are using data science and high-performance computing resources to predict which nano-particles will undergo self-assembly, thereby accelerating the creation of novel materials.

Glotzer, one of the world’s leading researchers in the field of nanoparticle self-assembly, heads the Glotzer Group, a group of 30 or so researchers at the University of Michigan’s Department of Chemical Engineering and its Biointerfaces Institute. During the ACM SIGKDD 2021 conference last week, Glotzer described how her team uses data science software, HPC hardware, and human ingenuity to make the magic work.

Crystals are ubiquitous in nature. Ice and table salt are examples of crystal structures that form spontaneously when the right elements are presented with the proper conditions. But nature has far more complicated crystals hiding from view. At the microscopic level, crystal structures will self-assemble from various elements into extremely complex repeating units that feature tens of thousands of atoms each. The possible number of combinations is too great to fathom, but this is the field that Glotzer and her colleagues have dedicated themselves to understanding.

“We’re obsessed with understanding how does such complexity arise? How did the system figure out to self-organize into different crystal structures? Why does it prefer one crystal structure over another? And how does it get there? How does it do it?” Glotzer asked during her session at KDD2021, which was held virtually due to the COVID pandemic.

“We know that quantum mechanics explains a lot about bonding. Thermodynamics is important in dictating what the stable phase will be. And any crystal structure that you get must obey the laws of statistical thermodynamics,” Glotzer continued. “But what we don’t have a theory for is understanding the microscopic factors that lead from disorder to order.”

(Image courtesy Glotzer Group)

Glotzer and her colleagues attack the problem logically, using data science and HPC resources. The goal is to widen our understanding of the assembly pathways through which nanoparticles will self-assemble, that is, create stable crystal structures all by themselves, with a minimum of human encouragement. The end result, they hope, is the creation of novel materials with beneficial uses across a variety of applications.

“It’s truly an infinite design space of possibilities,” said Glotzer, who was dubbed a “digital alchemist” in a 2017 Quanta Magazine article. “Computer simulations are the perfect tool to explore design space because we can do it faster than experiments, and we can keep certain things constant and vary other things in ways that maybe experiments can’t.”

Glotzer’s team isn’t focused on metallic structures, but rather “soft matter,” things like proteins, DNA, virus capsids, and gamma particles, she said. One of the key aspects of the research is knowing which organic molecules will function as the bonding agent, or ligand, that connects the building blocks together. DNA is one example of a ligand.

The researchers work backwards from where they want to be. “We want to start from ‘Here are the properties of behavior we want our material to have.’ Based on that, this is the structure of the crystal we need,” Glotzer said. “So based on that, what nanoparticles should we make and what bonding elements should we use, so that when we throw these particles into a bucket of water, they will self-assemble into exactly the structure that we want.”

Biochemists today have a great deal of control over the nano-particle manufacturing process. According to Glotzer, it is possible today “to make essentially any kind of nano-particle shape out of many, many types of materials, with great uniformity, so that all the particles are roughly the same size and the same shape.”

Presented with a large and compelling space filled with building blocks and glue, it is Glotzer’s job to figure out how this can all come together, and come together in the most advantageous manner.

(Image courtesy Glotzer Group)

“If I gave you a crystal structure and said, tell me what nanoparticle shape I should use? You’d be hard pressed to say what that shape should be,” she said. “If you have a bunch of different shaped particles, say, and they can all self-assemble into structures, like that clathrate structure–which one does it the best? Which one makes the best crystal with the highest yield and the highest quality? Those are the kinds of questions that we use computer simulations to try to answer.”

There are a couple of basic approaches that Glotzer takes. One approach is to create a simulation that uses molecular dynamics to predict the forces that various particles will have on each other, and the resulting structure that might form from it. The other is a Monte Carlo simulation, whereby the system mimics the Brownian motion of nanoparticles in a fluid.
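To make the Monte Carlo approach concrete, here is a minimal sketch of a hard-particle Monte Carlo simulation in two dimensions. It is not the Glotzer Group's code and makes several simplifying assumptions: the particles are equal-sized disks rather than shaped nanoparticles, the box is periodic, and a trial move is accepted whenever it creates no overlap. All function names and parameter values are hypothetical, chosen only for illustration.

```python
import random


def min_image(d, box):
    """Shift a coordinate difference to the nearest periodic image."""
    return d - box * round(d / box)


def overlaps(i, trial, positions, box, diameter):
    """Return True if disk i, placed at `trial`, would overlap any other disk."""
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx = min_image(trial[0] - xj, box)
        dy = min_image(trial[1] - yj, box)
        if dx * dx + dy * dy < diameter * diameter:
            return True
    return False


def run_hard_disk_mc(n_side=4, box=8.0, diameter=1.0, max_step=0.3,
                     sweeps=200, seed=0):
    """Run a small hard-disk Monte Carlo simulation (illustrative values only)."""
    rng = random.Random(seed)
    spacing = box / n_side
    # Start from a square grid so that no two disks overlap initially.
    positions = [((ix + 0.5) * spacing, (iy + 0.5) * spacing)
                 for ix in range(n_side) for iy in range(n_side)]
    attempts = sweeps * len(positions)
    accepted = 0
    for _ in range(attempts):
        # Pick a random disk and propose a small random displacement,
        # a crude stand-in for the random kicks of Brownian motion.
        i = rng.randrange(len(positions))
        x, y = positions[i]
        trial = ((x + rng.uniform(-max_step, max_step)) % box,
                 (y + rng.uniform(-max_step, max_step)) % box)
        # Hard particles: reject any move that causes an overlap.
        if not overlaps(i, trial, positions, box, diameter):
            positions[i] = trial
            accepted += 1
    return positions, accepted / attempts


if __name__ == "__main__":
    final_positions, acceptance = run_hard_disk_mc()
    print(f"{len(final_positions)} disks, acceptance ratio {acceptance:.2f}")
```

A production code would add particle shape, neighbor lists, and parallelism; the sketch only shows the propose-and-reject loop at the heart of the method.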

“When we’re studying the systems, we don’t know what they’re going to do,” she said. “When we start with a shape, we don’t know what they’re going to make, or if they’re even going to make anything. We don’t know if they are going to self-assemble, at what concentration or pressure or temperature are they going to self-assemble into that crystal structure. We don’t know any of that, and so we have to do a lot of simulation to hope that we start to see something self-assemble.”

The Glotzer Group developed its own piece of code called HOOMD-blue to run the simulations. She said her team has run hundreds of thousands of simulations over 50,000 particle shapes. Because there’s so much uncertainty over what, if any, structures will form, her team needs access to a lot of computing power to make it worthwhile. That includes Summit, the 200-petaflops supercomputer installed at the Oak Ridge National Laboratory.

Glotzer Group shares the tools it developed to study self-assembly of nanoparticles (Image courtesy Glotzer Group)

“We don’t know what the size of the unit cell is, and we don’t want to influence what it would be because we have too small of a plot of particles, and so we have to have really big systems,” she said. “All of that together means that we generate boatloads of data on a daily basis, terabytes and terabytes of data, and so we need a way to organize that data so that we can do science with it.”

One of the tools that Glotzer’s team uses is signac, a lightweight, application-agnostic framework written in Python that helps users manage and scale file-based workflows. According to Glotzer, signac is the glue that connects the various components in her team’s HPC workflow together, and is critical for ensuring that the data generated is transparent, reproducible, usable by others, and extensible.

“What signac is great for is managing file-based heterogeneous data on a local file system, for searching for data, and accessing it,” she said. “You can do it within Python or on the command line, developing scalable and reproducible computational workflows, including very complex workflows.”
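The general idea of managing file-based simulation data by its parameters can be sketched with the Python standard library alone. The example below does not use signac and is not its API; it is a hypothetical illustration in which each set of simulation parameters (called a state point here) gets its own directory, named by a hash of those parameters, so results can be stored and searched later. The function names, file names, and parameters are all assumptions made for the sketch.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def job_id(state_point):
    """Derive a stable ID from the parameters; sorted keys make it order-independent."""
    blob = json.dumps(state_point, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:32]


def init_job(workspace, state_point):
    """Create (or reuse) the directory that holds data for one parameter set."""
    job_dir = Path(workspace) / job_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "state_point.json").write_text(
        json.dumps(state_point, sort_keys=True))
    return job_dir


def find_jobs(workspace, **criteria):
    """Return directories whose stored parameters match every given criterion."""
    matches = []
    for sp_file in sorted(Path(workspace).glob("*/state_point.json")):
        state_point = json.loads(sp_file.read_text())
        if all(state_point.get(key) == value for key, value in criteria.items()):
            matches.append(sp_file.parent)
    return matches


def demo():
    """Populate a temporary workspace and search it by particle shape."""
    with tempfile.TemporaryDirectory() as workspace:
        for shape in ("cube", "tetrahedron"):
            for pressure in (1.0, 2.0):
                job_dir = init_job(workspace,
                                   {"shape": shape, "pressure": pressure})
                # A real workflow would write simulation output here.
                (job_dir / "log.txt").write_text(
                    f"simulated {shape} at P={pressure}\n")
        return [path.name for path in find_jobs(workspace, shape="cube")]


if __name__ == "__main__":
    print(demo())
```

Because the directory name is derived from the parameters themselves, rerunning the same parameter set lands in the same place, which is one simple way to keep a large batch of simulations reproducible and searchable.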

Picking the relevant patterns out of the morass of data created by the simulations is also important, and for this, Glotzer’s team uses unsupervised machine learning algorithms to generate the descriptors.

“The idea is to develop a microscopic understanding of assembly pathways using machine learning,” she said. “So we’re starting off with particles, and we need to have some descriptors that tell us what the local particle environment is, so that we can distinguish one crystal structure from another, and crystal structures from a liquid, or parts of structure from other parts of crystal structure.”
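What a local-environment descriptor does can be shown with a classic hand-built example. The group's descriptors are generated by unsupervised machine learning; the sketch below swaps in something far simpler, the sixfold bond-orientational order parameter for points in two dimensions, which is close to 1 for a particle in a perfect hexagonal crystal and much lower, on average, in a disordered arrangement. The helper names, lattice size, and point counts are hypothetical.

```python
import cmath
import math
import random


def nearest_neighbors(center, points, k=6):
    """Return the k points closest to `center`, excluding the center itself."""
    others = [p for p in points if p != center]
    others.sort(key=lambda p: (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2)
    return others[:k]


def psi6(center, neighbors):
    """Magnitude of the sixfold bond-orientational order parameter for one particle."""
    total = 0j
    for nx, ny in neighbors:
        # Angle of the "bond" from the center to this neighbor.
        theta = math.atan2(ny - center[1], nx - center[0])
        total += cmath.exp(6j * theta)
    return abs(total) / len(neighbors)


def hexagonal_lattice(n=3):
    """Points of a triangular (hexagonally packed) lattice with unit spacing."""
    return [(i + 0.5 * j, j * math.sqrt(3) / 2)
            for i in range(-n, n + 1) for j in range(-n, n + 1)]


def random_points(count=300, size=10.0, seed=1):
    """Uniformly random points, a crude stand-in for a disordered fluid."""
    rng = random.Random(seed)
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(count)]


def mean_psi6(points):
    """Average the descriptor over every particle in the configuration."""
    return sum(psi6(p, nearest_neighbors(p, points)) for p in points) / len(points)


if __name__ == "__main__":
    lattice = hexagonal_lattice()
    center = (0.0, 0.0)
    print("crystal-like center:", psi6(center, nearest_neighbors(center, lattice)))
    print("disordered average: ", mean_psi6(random_points()))
```

A single number like this separates only one crystal type from disorder, which is part of why higher-dimensional, learned descriptors are attractive when many different crystal structures are possible.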

University of Michigan professor Sharon C. Glotzer received her Ph.D. in theoretical soft condensed matter physics from Boston University in 1993.

One big challenge that arises here is the high dimensionality of the data. Glotzer’s group uses an algorithm called Uniform Manifold Approximation and Projection (UMAP) to reduce the dimensionality of the data while preserving its native shape in the reduced dimensional space. UMAP also offered good performance on GPUs using Nvidia’s RAPIDS libraries for CUDA, she said.

Armed with the continuous topological order parameter from UMAP, Glotzer’s group now has insight into how the nanoparticles will self-assemble. The color-coded results from that analysis also provide insight into the nature of the packing of the nanoparticles. “We can follow these down the pathway from fluid to crystal and know how every particle environment changes in time along this pathway,” she said.

When Glotzer assembles the various UMAP embeddings, a picture begins to emerge. “We can see the whole structure of the manifold with all of the different crystal structures that can form from all of the different systems that we’re looking at, from the start as a fluid and end as a crystal,” she said. “This information helps us to engineer new assembly pathways.”

For more information on Glotzer’s work, you can visit her team’s web page at
