February 12, 2021

MIT’s New ‘SpAtten’ Tool is Paying Attention to Your Sentences


In an episode of The Office, the character Kevin Malone famously opined: “Why waste time say lot word when few word do trick?” Indeed, language can be inefficient, leading to bloated and less-accurate natural language processing (NLP) models. This has given rise to attention mechanisms, which help NLP models identify key words, in popular models like OpenAI’s GPT-3. These tools are now also at the heart of MIT’s new “SpAtten” model, a combined hardware-software system for streamlining NLP through a robust attention mechanism.

The most powerful NLP models are robust, but their attention mechanisms come at extraordinary computational expense. “This part [the attention mechanism] is actually the bottleneck for NLP models,” said Hanrui Wang, a PhD candidate at MIT and lead author of the paper presenting SpAtten, in an interview with MIT’s Daniel Ackerman. “We need algorithmic optimizations and dedicated hardware to process the ever-increasing computational demand.”

Enter SpAtten, which delivers both as a single, integrated platform. SpAtten, for instance, uses a technique called “cascade pruning” to jettison vestigial words – and all computation associated with those words – once key words are identified. SpAtten is also able to use lower-precision computation for simpler sentences, only breaking out the big guns when faced with a complex sentence. And, on the hardware front, SpAtten is highly parallelized, allowing it to simultaneously assess every word in a given sentence. “Our system is similar to how the human brain processes language,” Wang said. “We read very fast and just focus on key words. That’s the idea with SpAtten.”
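
To make the pruning idea concrete, here is a minimal toy sketch in Python. It is not SpAtten’s actual implementation: it assumes a word’s importance can be approximated by the total attention it receives from the other words, and the function names, the two-number word vectors and the keep_ratio parameter are all hypothetical. What it does show is the “cascade” part: a word dropped at one layer is never processed again by any later layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(queries, keys):
    """Scaled dot-product attention probabilities, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def cascade_prune(tokens, layer_vectors, keep_ratio):
    """Toy cascade pruning (hypothetical helper, not SpAtten's code).

    layer_vectors[l][i] is the vector of original token i at layer l.
    After each layer, only the top keep_ratio share of surviving tokens
    (ranked by accumulated attention received) moves on to the next layer.
    """
    alive = list(range(len(tokens)))      # indices of tokens still in play
    importance = [0.0] * len(tokens)      # accumulated attention per token
    for vectors in layer_vectors:
        active = [vectors[i] for i in alive]
        probs = attention_probs(active, active)
        # A token's importance grows by the attention it receives (column sum).
        for col, idx in enumerate(alive):
            importance[idx] += sum(row[col] for row in probs)
        keep = max(1, math.ceil(len(alive) * keep_ratio))
        ranked = sorted(alive, key=lambda i: importance[i], reverse=True)
        alive = sorted(ranked[:keep])     # survivors, in sentence order
    return [tokens[i] for i in alive]


# Made-up vectors for illustration only; both layers reuse the same values.
tokens = ["the", "movie", "was", "truly", "great", "!"]
layer = [[0.1, 0.0], [1.0, 0.5], [0.1, 0.1], [0.3, 0.2], [1.2, 0.9], [0.0, 0.1]]
kept = cascade_prune(tokens, [layer, layer], keep_ratio=0.5)
print(kept)  # ['movie', 'great']
```

With these made-up numbers, six words shrink to three after the first layer and to two after the second, so the second layer computes attention over a 3-by-3 grid instead of a 6-by-6 one. That shrinking workload is the kind of saving the article describes.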

For now, SpAtten’s hardware exists only in simulations, but in those simulations it ran over a hundred times faster than an Nvidia Titan Xp GPU (the next-fastest hardware tested) and, according to MIT, was more than a thousand times more energy-efficient. Combined, the speed and efficiency advantages have serious implications for reducing the energy demands of advanced NLP models in the future – assuming SpAtten’s hardware performs similarly in real life.

“Our vision for the future is that new algorithms and hardware that remove the redundancy in languages will reduce cost and save on the power budget for data center NLP workloads,” said Wang, who went on to imagine the kinds of impacts that SpAtten-like technology could have on other major AI- and NLP-driven sectors. “We can improve the battery life for mobile phone or IoT devices. That’s especially important because in the future, numerous IoT devices will interact with humans by voice and natural language, so NLP will be the first application we want to employ.”