Memory reclaiming in Python

June 14, 2013

Running GC-based languages on embedded systems always makes it a challenge to limit the physical memory used by processes. Long-running Python processes are a good example of what can happen and which problems you may face. Here is my research and the approach I used to fix the memory consumption.

First, a trivial example that is widely used on the Internet, but which is actually wrong and doesn't show the problem:

import gc
import os

iterations = 1000000
pid = os.getpid()

def rss():
    with open('/proc/%d/status' % pid, 'r') as f:
        for line in f:
            if 'VmRSS' in line:
                return line

def main():
    print 'Before allocating ', rss(),

    l = []
    for i in xrange(iterations):
        l.append({})

    print 'After allocating ', rss(),

    # Ignore optimizations, just try to free whatever possible

    # First kill
    for i in xrange(iterations):
        l[i] = None

    # Second kill
    l = None

    # Control shot
    gc.collect()

    print 'After free ', rss(),

if __name__ == '__main__':
    main()

Running it shows that everything is fine (here and below I use Python 2.6.8):

Before allocating VmRSS: 3344 kB
After allocating VmRSS: 149216 kB
After free VmRSS: 4748 kB

But let's use a dictionary object instead of the list now:

import gc
import os

iterations = 1000000
pid = os.getpid()

def rss():
    with open('/proc/%d/status' % pid, 'r') as f:
        for line in f:
            if 'VmRSS' in line:
                return line

def main():
    print 'Before allocating ', rss(),

    l = {}
    for i in xrange(iterations):
        l[i] = {}

    print 'After allocating ', rss(),

    # Ignore optimizations, just try to free whatever possible

    # First kill
    for i in xrange(iterations):
        l[i] = None

    # Second kill
    l.clear()

    # Third kill
    l = None

    # Control shot
    gc.collect()

    print 'After free ', rss(),

if __name__ == '__main__':
    main()

Let's run it:

Before allocating VmRSS: 3348 kB
After allocating VmRSS: 179800 kB
After free VmRSS: 155300 kB

That doesn't look good, does it? Obviously Python handles dictionaries differently, which unfortunately is not good news for us.

The first guess is that Python uses PyMalloc, which doesn't free memory but keeps it for later reuse. That's fine for desktop and server systems, but not so good for embedded systems, where other processes need memory too. Note that the operating system on an embedded device can behave differently and might not send processes any signal asking them to release memory (as in my case). Also, the free lists kept for integers and floats are never returned to the operating system at all.

The second attempt is to recompile Python without PyMalloc:

$ ./configure --without-pymalloc
$ make -sj4
$ ./python test.py

Before allocating VmRSS: 3304 kB
After allocating VmRSS: 180112 kB
After free VmRSS: 155748 kB

Mostly the same numbers, so it looks like PyMalloc has nothing to do with it. And that's actually true: this is not PyMalloc behaviour, but glibc's. See the bug: http://bugs.python.org/issue11849. In short, if the process allocates a lot of small objects, glibc uses a different approach, and to force it to release the memory pool one should call malloc_trim(). A patch has been applied to Python 3.3 to improve the situation (the sample code needs slight modifications to be compatible with Py3k; I skipped them here):

$ python3.3 test.py

Before allocating VmRSS: 4780 kB
After allocating VmRSS: 193776 kB
After free VmRSS: 83288 kB
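
The Py3k modifications are small. As a rough sketch of what a port of the dictionary test might look like (this is a reconstruction, not necessarily the exact script behind the numbers above; the names rss_kb and measure are made up for illustration, and the helper returns None where /proc is not available):

```python
import gc
import os


def rss_kb():
    """Return this process's resident set size in kB, or None if unavailable."""
    try:
        with open('/proc/%d/status' % os.getpid()) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    # The line looks like "VmRSS:     3344 kB"
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def measure(iterations=1000000):
    """Fill a dict with small dicts, drop everything, report RSS at each step."""
    before = rss_kb()

    d = {}
    for i in range(iterations):  # xrange() became range() in Python 3
        d[i] = {}
    allocated = rss_kb()

    # Drop every reference we hold, then force a full collection
    d.clear()
    d = None
    gc.collect()
    freed = rss_kb()

    return before, allocated, freed
```

Calling measure() and printing the three returned values gives the same "before / after allocating / after free" picture as the Python 2 script.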

But as you can see, the problem still exists. Calling malloc_trim() manually in the Python memory allocator doesn't sound like a good solution. Another solution is to replace the glibc memory allocator with a third-party one. In my case it's jemalloc:

$ sudo apt-get install libjemalloc1

$ LD_PRELOAD=/usr/local/lib/libjemalloc.so ./python test.py

Before allocating VmRSS: 3692 kB
After allocating VmRSS: 197780 kB
After free VmRSS: 3984 kB

Presto, problem solved! It won't be this easy in every environment, especially where the allocator uses a prefixed API, but it's a start. A custom memory allocator can also be applied to other processes, not only Python, and eventually it can save a lot of memory in your embedded system.
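
Where swapping the allocator is not an option, the malloc_trim() route mentioned earlier can at least be tried from a script without patching the interpreter. A minimal sketch, assuming a glibc-based Linux system; trim_memory is a hypothetical helper name, and whether the call actually shrinks RSS depends on how fragmented the heap is:

```python
import ctypes
import ctypes.util


def trim_memory():
    """Ask glibc to hand free heap memory back to the operating system.

    Returns 1 if some memory was released, 0 if none was, and None when
    malloc_trim() is not available (non-glibc libc or non-Linux system).
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library('c') or 'libc.so.6')
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # The argument is the amount of free space to leave at the top of the heap
    return malloc_trim(0)
```

It would typically be called right after gc.collect(), once a large batch of objects has been dropped. It only affects glibc's own heap, so it is pointless when a different allocator such as jemalloc has been preloaded.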

Comments

Greg, October 24, 2013 at 10:00 AM
Bravo! Very interesting.

I ran your second snippet, minus the first and second kills, on Python 2.7.3 with libjemalloc and got the same results. This probably doesn't surprise you, but I thought I'd mention it.