Memray Shines a Light on Python-C/C++ Memory Problems
Data scientists and other developers who are frustrated with the inability to see what’s going on with their hybrid Python-C/C++ applications will appreciate Memray, a new open source memory profiler created by Bloomberg’s Pablo Galindo that crosses the code boundary to show developers exactly what’s going wrong.
Python stands out among scripting languages for how readily it binds to compiled languages such as C and C++. This is helpful for developers who want the low-level performance of C and C++ without giving up the readability and simplicity of Python and its API.
There are plenty of memory profilers for Python and plenty for C and C++, but until now there hasn’t been one that can work with both Python and C/C++ simultaneously, says Galindo. He is a core Python developer, a Python Steering Council member, and the release manager for versions 3.10 and 3.11 of the world’s most popular computer language.
“It was a problem,” Galindo says, “but nobody had tried to solve it.”
The key is being able to follow the application as it moves from Python into C or C++ code. The developer needs to see what’s happening as it’s crossing the boundary. When most Python memory profilers enter the C++ world, they cannot tell you anything worthwhile, he says.
“They say ‘We know there is something over here, but we cannot see,’” Galindo tells Datanami. “You hear a sound in the other room, but you can’t hear. Our profiler is the only one that can hear in the room.”
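The blind spot Galindo describes can be reproduced with nothing but the standard library. The sketch below is an illustration, not part of Memray: it assumes a Linux or macOS system where `ctypes.CDLL(None)` exposes the C library, and the `measure` helper and the 10 MB size are invented for the example. It allocates the same amount of memory once through Python and once through C `malloc`, and asks Python’s built-in `tracemalloc` profiler how much it saw each time.

```python
import ctypes
import tracemalloc

# Assumption: on Linux/macOS, CDLL(None) gives access to the C library
# already loaded into this process.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

SIZE = 10 * 1024 * 1024  # 10 MB, an arbitrary size for the demonstration


def measure(allocate):
    """Return (bytes tracemalloc saw appear, handle) for one allocation."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    handle = allocate()  # keep the result alive while we take the reading
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, handle


# Allocation made through Python's own allocator: visible to tracemalloc.
python_growth, buf = measure(lambda: bytearray(SIZE))

# Allocation made directly by C code: tracemalloc only sees the small
# Python objects involved in the call, not the 10 MB block itself.
native_growth, ptr = measure(lambda: libc.malloc(SIZE))
libc.free(ptr)

print(f"Python allocation seen by tracemalloc: {python_growth} bytes")
print(f"Native allocation seen by tracemalloc: {native_growth} bytes")
```

The second number stays tiny even though the process really did reserve another 10 MB, which is the “sound in the other room” that a Python-only profiler cannot account for.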
Memray, which runs only on Linux, is available for download from Bloomberg’s GitHub repository. Beyond C and C++ bindings, it can also follow allocations into native code, giving users the ability to profile their NumPy and Pandas code, for example. It also sports a live mode that lets the user run code in the background and watch how memory is being used.
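For readers who want to try it, here is a minimal sketch of capturing a profile from inside a Python program. It relies on Memray’s `Tracker` context manager and its `native_traces` option; the `profile_with_memray` wrapper, the sample workload and the file name are hypothetical, and the wrapper simply skips profiling when Memray is not installed.

```python
import importlib.util
import os
import tempfile


def profile_with_memray(workload, output_path, native=True):
    """Run workload() under Memray; return True if a capture file was written.

    Hypothetical helper for illustration. Returns False without running
    the workload when the memray package is not available.
    """
    if importlib.util.find_spec("memray") is None:
        return False
    import memray

    # native_traces=True asks Memray to record C/C++ stack frames as well
    # as Python frames. output_path must not already exist.
    with memray.Tracker(output_path, native_traces=native):
        workload()
    return os.path.exists(output_path)


def sample_workload():
    # Stand-in for real application code: build and discard some buffers.
    chunks = [bytearray(64 * 1024) for _ in range(200)]
    return len(chunks)


if __name__ == "__main__":
    capture = os.path.join(tempfile.mkdtemp(), "capture.bin")
    if profile_with_memray(sample_workload, capture):
        print(f"Capture written to {capture}")
    else:
        print("Memray is not installed; nothing was profiled")
```

The resulting capture file can then be turned into a report with Memray’s command-line tools, for example `memray flamegraph capture.bin`.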
The core function of the profiler is to examine memory allocation in the Python application and tell the user what the problems are. The software does this well, Galindo says. “It tells the developer not only where the problems are, but how they are appearing, how they happened and what they can do to solve the problem,” he says.
Galindo, whose day job is managing the Python infrastructure at Bloomberg, started creating Memray about a year ago after Bloomberg developers came to his team asking for a solution for Python application memory leaks. Bloomberg, which was a C++ shop before transitioning to become a Python shop, still has a lot of C++ around.
“The problem is that you have a script or a gigantic application and suddenly it’s using a lot of memory and you don’t know why,” Galindo says. “This is quite common these days especially with Python because Python is very far from where the memory management happens.”
After researching the issue with hybrid Python-C/C++ applications and finding that there was nothing available in the commercial or open source markets, Galindo and his team took it upon themselves to build one. Memray was not easy to build, Galindo says, but it was worth it because of how useful it will be to Python developers around the world.
“We put a lot of effort into making it easy to use,” he says. “You need experience from both sides, from the Python world and the C++ world, and you need to merge them into one, and that is extremely difficult to do because they are such a different two languages and they have such a different set of problems in the same space.”
With Memray following application threads from Python into C++, there is no longer a black hole of memory leaks costing Bloomberg money, either in the form of overprovisioned machines or the need to continuously restart machines to clear the memory. Galindo’s hope is that others will see a benefit from Memray, too.
Editor’s note: This article has been corrected. Galindo is not the release manager for Python version 3.12. Datanami regrets the error.