TL;DR: With debug on, Django leaks memory badly in any long-running task that runs many SQL operations.


Solution: Turn debug off.




So I just spent a long time digging this up. We have a management command that runs a ton of SQL statements, querying and inserting all kinds of objects in the database. I was attempting to run it locally, but it kept getting slower and slower over time and its memory usage just kept growing linearly. I naively started with a really simple heapy profile. This told me that I had a whole lot of strings taking up a whole lot of memory. My first inclination was that maybe Python was doing something dumb by not interning strings properly or by not garbage collecting properly. I added a few intern() calls on what I thought might be candidates for duplication, and I explicitly ran the garbage collector throughout the script. It had zero effect.
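If you don't have heapy handy, here is a minimal sketch of the same kind of "where is the memory going" check using tracemalloc from the Python 3 standard library. To be clear, this is a swapped-in tool, not what I actually ran; the leaky_log list and the fake INSERT strings are hypothetical stand-ins for the real workload.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return the source lines whose allocations changed the most between two snapshots."""
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for the leaking workload: a list that keeps every string it sees.
leaky_log = []
for i in range(50_000):
    leaky_log.append("INSERT INTO example VALUES (%d)" % i)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Each entry names a file and line number along with how much its memory grew.
for stat in top_growth(before, after):
    print(stat)
```

The line that builds the strings should show up at or near the top of the list, which is roughly the "whole lot of strings" signal I got from heapy.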


So I had to dig deeper with heapy. I found this really good heapy tutorial, which I barely understood but could map onto my own memory situation to dig into what was actually causing the issue. Finally I traced it down to a single giant dictionary associated with the following class: django.db.backends.postgresql_psycopg2.base.DatabaseWrapper. At this point I immediately googled “django orm memory leak”, which quickly led me to a few random blog posts indicating that in debug mode Django stores every query it has ever run until you explicitly call db.reset_queries(). You don’t bump into this normally because it’s called automatically whenever a new HTTP request starts.
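To make the mechanism concrete, here is a toy model in plain Python. It is not Django's code: ToyConnection and run_batch_job are hypothetical names I made up for illustration, and the only thing they mimic is "append every query to a log while debug is on, clear the log on reset".

```python
class ToyConnection:
    """Toy stand-in for a database connection that logs queries in debug mode."""

    def __init__(self, debug):
        self.debug = debug
        self.queries = []  # grows without bound while debug is on

    def execute(self, sql):
        # A real connection would run the SQL; the toy only records it.
        if self.debug:
            self.queries.append({"sql": sql})

    def reset_queries(self):
        self.queries = []


def run_batch_job(conn, total, reset_every=None):
    """Run `total` fake statements and return the peak length of the query log."""
    peak = 0
    for i in range(total):
        conn.execute("INSERT INTO example VALUES (%d)" % i)
        peak = max(peak, len(conn.queries))
        if reset_every and (i + 1) % reset_every == 0:
            conn.reset_queries()
    return peak


print(run_batch_job(ToyConnection(debug=True), 10_000))                     # 10000
print(run_batch_job(ToyConnection(debug=True), 10_000, reset_every=1_000))  # 1000
print(run_batch_job(ToyConnection(debug=False), 10_000))                    # 0
```

In a real management command, the equivalent of the middle case would presumably be calling db.reset_queries() every so often inside the loop, which is an option if you need to keep debug on for some other reason.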


Anyway, debug off, memory is now beautifully constant.