(Re)Importing in Python - don't touch sys.modules
So I am sitting here watching David Beazley’s PyCon talk about modules, packages and imports, and he is talking about how sys.modules sort of guards against multiple imports, which inspired me to fire up the Python interpreter myself and start messing about.
Reimporting in Python - Basics
As David mentions, in Python you cannot just import module again to reload the module. The canonical, albeit still bad, way is to import importlib and use importlib.reload(module).
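To make the basics concrete, here is a minimal sketch. It assumes a throwaway module with the hypothetical name demo_basics, written to a temporary directory, and it turns off bytecode writing so that rewriting the file within the same second cannot be masked by a stale .pyc file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files: a source file rewritten within the same second
# could otherwise be served from stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical throwaway module "demo_basics" in a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
source = tmp_dir / "demo_basics.py"

source.write_text("def hello():\n    return 'old'\n")
importlib.invalidate_caches()  # make sure the path finder notices the new file
import demo_basics
first = demo_basics.hello()

# Edit the file and simply import again ...
source.write_text("def hello():\n    return 'new version'\n")
import demo_basics  # ... which changes nothing: the cached module is reused
second = demo_basics.hello()

# importlib.reload re-executes the edited source in the existing module object.
importlib.reload(demo_basics)
third = demo_basics.hello()

print(first, second, third, sep=" | ")  # old | old | new version
```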
I’d like to take this opportunity to issue a disclaimer right away and spoil the conclusion: you should absolutely avoid reimporting any modules in Python. As you’ll see towards the end, things can get very messy very quickly when you reimport modules, and the resulting bugs can be severe and virtually impossible to track down.
If you find yourself toying with the idea of reimporting modules in software that will be used in production, consider alternatives such as writing unit tests (if you’re reloading to test code while developing), using multiprocessing or subprocess to run the code in a separate interpreter, refactoring, or simply restarting the interpreter.
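As an illustration of the subprocess route, here is a small sketch assuming a hypothetical module demo_fresh that you keep editing during development. Every run starts a brand-new interpreter, so the latest source is always picked up without any reloading.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module "demo_fresh" that we keep editing.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_fresh.py"


def run_in_fresh_interpreter():
    """Import demo_fresh in a new interpreter and return what hello() printed."""
    result = subprocess.run(
        # -B: don't write .pyc files; -c: run the given code string
        [sys.executable, "-B", "-c", "import demo_fresh; print(demo_fresh.hello())"],
        cwd=tmp_dir,  # with -c, the current directory is put on sys.path
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


module_file.write_text("def hello():\n    return 'old'\n")
before = run_in_fresh_interpreter()

module_file.write_text("def hello():\n    return 'new version'\n")
after = run_in_fresh_interpreter()  # nothing is cached between runs

print(before, "->", after)
```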
Also, in this article I deliberately try to make programs fail and break. This is meant to explore features of the interpreter and the standard library, not something to do in production software.
Messing with sys.modules
David Beazley also mentions in his talk that the object that actually records imported modules is sys.modules, which happens to be a standard Python dict.
The interesting thing about that is that, unlike mappingproxy (the read-only, dict-like wrapper that built-in types such as dict itself expose as their __dict__, imitating a dict while preventing modification[1] and infinite recursion[2]), sys.modules is a standard dict and supports item assignment (__setitem__) as well as deletion (__delitem__).
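A quick way to see the difference for yourself, as a sketch: a class’s __dict__ refuses item assignment, while sys.modules happily accepts a hand-made module object (fake_module is a purely hypothetical name) that a subsequent import statement then hands back as-is.

```python
import sys
import types

# The __dict__ of a class such as dict is a read-only mappingproxy ...
print(type(dict.__dict__))  # <class 'mappingproxy'>
proxy_is_read_only = False
try:
    dict.__dict__["spam"] = 1
except TypeError:
    proxy_is_read_only = True  # item assignment is rejected

# ... whereas sys.modules is an ordinary, mutable dict.
print(type(sys.modules))  # <class 'dict'>

# Whatever is stored in it is returned by the import statement without
# looking at any file.
sys.modules["fake_module"] = types.ModuleType("fake_module")
import fake_module
print(fake_module.__name__)  # fake_module
del sys.modules["fake_module"]  # tidy up again
```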
This got me thinking: how much does the sys.modules dict actually influence the import process? As it turns out, a lot, and it lets you mess with that process.
If you import a module, let’s call it test, modify the file and import it again (in the same interpreter instance), nothing changes: you’re still running the old code. But what happens if you delete the module from sys.modules first?
The answer: nothing at first. The code still runs, and all functions that were in the module previously are still there, as is the module itself. But something odd happens if you execute import test again: it reimports the module.
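The experiment might look roughly like this. The module is called demo_test here, a hypothetical stand-in for test that avoids clashing with the standard library’s test package; bytecode writing is disabled so that quick rewrites are always picked up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale .pyc files when rewriting quickly

# Hypothetical throwaway module "demo_test".
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
source = tmp_dir / "demo_test.py"

source.write_text("def hello():\n    return 'old'\n")
importlib.invalidate_caches()
import demo_test
old_module = demo_test  # keep a second reference to the original module object

source.write_text("def hello():\n    return 'new version'\n")
del sys.modules["demo_test"]  # the module object itself is still alive ...
still_works = demo_test.hello()  # ... and still runs the old code

import demo_test  # not in sys.modules any more, so the file is imported again
print(still_works, demo_test.hello(), old_module.hello())  # old new version old
print(demo_test is old_module)  # False: a brand-new module object
```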
Consequences
This would not be all that bad per se; however, this hacked reload does not behave the same way as importlib.reload.
The difference between reloading the module this way (which I do not recommend anyone actually do) and using importlib.reload is that this particular way of reloading only updates the module in the current namespace.
Let’s suppose we have two modules, foo.py and bar.py, where bar imports foo and uses a function defined therein:
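Hypothetical versions of the two files could look like this: foo.py defines hello, and bar.py imports foo and calls foo.hello() through its module-level reference. The script writes both files to a temporary directory so it can run on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of foo.py and bar.py.
FOO_SOURCE = '''\
def hello():
    return "hello from foo"
'''

BAR_SOURCE = '''\
import foo


def greet():
    # bar looks hello up through its module-level reference to foo
    return foo.hello()
'''

tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "foo.py").write_text(FOO_SOURCE)
(tmp_dir / "bar.py").write_text(BAR_SOURCE)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import bar
print(bar.greet())  # hello from foo
```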
We can then do the following experiment:
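A sketch of such an experiment with importlib.reload, using hypothetical foo.py and bar.py files that are recreated in a temporary directory so the snippet is self-contained:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # rewrite files freely without stale .pyc caches

# Hypothetical foo.py / bar.py pair.
tmp_dir = Path(tempfile.mkdtemp())
foo_file = tmp_dir / "foo.py"
foo_file.write_text("def hello():\n    return 'old foo'\n")
(tmp_dir / "bar.py").write_text("import foo\n\ndef greet():\n    return foo.hello()\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import bar
import foo
before = bar.greet()

# Edit foo.py and reload it with importlib.
foo_file.write_text("def hello():\n    return 'new foo'\n")
importlib.reload(foo)
after = bar.greet()

print(before, "->", after)  # old foo -> new foo
print(bar.foo is foo)       # True: bar and we share one module object
```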
As you can see, importlib.reload reloads the module, and references to the module are updated as well.[4] This behavior is different if you reload using our dirty little trick.
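Repeating the experiment with the sys.modules trick, again as a self-contained sketch with hypothetical foo.py and bar.py files, shows the split: our namespace gets the new module, while bar keeps the old one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # rewrite files freely without stale .pyc caches

# Hypothetical foo.py / bar.py pair.
tmp_dir = Path(tempfile.mkdtemp())
foo_file = tmp_dir / "foo.py"
foo_file.write_text("def hello():\n    return 'old foo'\n")
(tmp_dir / "bar.py").write_text("import foo\n\ndef greet():\n    return foo.hello()\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import bar
import foo

# Edit foo.py, then "reload" by deleting the cache entry and importing again.
foo_file.write_text("def hello():\n    return 'new foo'\n")
del sys.modules["foo"]
import foo

print(foo.hello())     # new foo: our namespace sees the new module
print(bar.greet())     # old foo: bar still holds the old module object
print(bar.foo is foo)  # False
```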
Here the reference to foo in bar is not updated, which seems to indicate that this import overwrites the definition of the module wherever it is being kept, while the old version of the code remains in the globals() dicts of the modules using it.
What is sys.modules?
As we have seen, deleting entries in sys.modules causes the interpreter to reload modules in import statements, but why is that, and what are the entries in sys.modules?
Well, sys.modules contains references to already imported modules. If you query the type of its entries, it tells you that they are actual modules, the same class/type you’d get when querying the module directly.
In fact, the module reference in sys.modules is the exact same object as your module itself.
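You can check both claims directly; json is just an arbitrary, always-available standard-library module used for illustration:

```python
import json
import sys
import types

entry = sys.modules["json"]
print(type(entry))                          # <class 'module'>
print(isinstance(entry, types.ModuleType))  # True
print(entry is json)                        # True: the very same object
```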
Knowing all this, here is a very crude sketch of how Python’s __import__ function, which implements the import statement, works.[5]
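One way such a sketch could look, under heavy assumptions: crude_import is a hypothetical name, it handles top-level modules only, and it leans on importlib.util.find_spec and importlib.util.module_from_spec rather than the real machinery in importlib._bootstrap. The point is the order of operations: look in sys.modules first, and only do the expensive work on a miss.

```python
import importlib.util
import sys
import types


def crude_import(name: str) -> types.ModuleType:
    """Very rough stand-in for __import__; top-level modules only."""
    # Cheap path: the module was imported before, so reuse the cached object.
    if name in sys.modules:
        return sys.modules[name]

    # Expensive path: locate the module, create a module object, run its code.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # registered before executing, like the real import
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[name]  # don't leave a half-initialised module behind
        raise
    return module


first = crude_import("colorsys")
second = crude_import("colorsys")  # served from sys.modules
del sys.modules["colorsys"]
third = crude_import("colorsys")   # imported from scratch again
print(first is second, first is third)  # True False
```

colorsys is only an arbitrary pure-Python standard-library module; deleting its sys.modules entry reproduces the situation described next.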
Now, if we were to delete the entry from sys.modules, __import__ would do the expensive import of the file again, since it cannot find the module in sys.modules. It then returns the new module and adds a reference to it to sys.modules, so that entry also points to the new module. However, any module that imported the module previously still has a reference to the old module object in its globals() dict (or __dict__, if you prefer) and as such runs the old code.
As for the behavior of importlib.reload, it reloads the module back into the original module object and ‘fixes’ (though ‘changes’ might be the better term here) the references in place.[6] As a result, any module that imported it using import module and then uses module.attribute or module.function(), instead of rebinding with from module import attribute or myattribute = module.attribute, will now have the updated, reimported version of the code.[7]
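A small sketch of this difference, assuming a hypothetical module demo_config with a single attribute: the name looked up through the module sees the reloaded value, while the from-import and the manual copy keep the old binding.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # rewrite the file freely without stale .pyc caches

# Hypothetical module "demo_config" with one attribute.
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
source = tmp_dir / "demo_config.py"
source.write_text("attribute = 'old value'\n")
importlib.invalidate_caches()

import demo_config
from demo_config import attribute    # copies the current binding into our namespace
myattribute = demo_config.attribute  # same effect as the from-import

source.write_text("attribute = 'new value'\n")
importlib.reload(demo_config)

print(demo_config.attribute)  # new value (looked up on the module each time)
print(attribute)              # old value (stale binding)
print(myattribute)            # old value (stale binding)
```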
What it doesn’t do, however, is remove any keys. This means that if you imported a module bar with a function hello, edited the file to remove the function entirely or comment it out, and then reimported the module using importlib.reload, the reloaded module object bar would still have the hello attribute with the original function in it.
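This can be reproduced with a hypothetical module demo_stale standing in for bar: after commenting out hello and reloading, the redefined function is updated, but hello is still there.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # rewrite the file freely without stale .pyc caches

# Hypothetical module "demo_stale" standing in for bar.
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
source = tmp_dir / "demo_stale.py"
source.write_text("def hello():\n    return 'hello'\n\ndef other():\n    return 1\n")
importlib.invalidate_caches()
import demo_stale

# Comment out hello, change other, then reload.
source.write_text("# def hello():\n#     return 'hello'\n\ndef other():\n    return 2\n")
importlib.reload(demo_stale)

print(demo_stale.other())            # 2: redefined names are updated
print(hasattr(demo_stale, "hello"))  # True: the removed function survives
```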
Conclusions
What should one take away from it? Don’t reimport modules.
It does not matter whether you use importlib.reload or something worse: unless you know exactly what you’re doing and act very cautiously, you’re very likely to end up with code in a state where some parts of the program hold older and some parts hold newer references to the code, and there’s no way for you to predict the outcome of a particular computation. Write unit tests instead.
However, if you feel pathologically adventurous or absolutely require dynamic reloads, try to keep references only to top-level modules and reload them individually using importlib.
Good luck, have fun, and remember that collections is worth a look; also, use yield, it’s awesome.
Fun facts and extras
What happens with importlib.reload when you delete the module from sys.modules?
It fails with an ImportError. In order to reload the module, it must be in sys.modules.
The same applies if you reassign sys.modules['bar'] = foo. You’ll get the exact same error.
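Both failure modes can be reproduced, as a sketch, with an arbitrary standard-library module (colorsys here) and a hypothetical impostor module object; the snippet restores sys.modules afterwards.

```python
import colorsys
import importlib
import sys
import types


def try_reload(module):
    """Return the message of the ImportError raised by reload, or None on success."""
    try:
        importlib.reload(module)
    except ImportError as exc:
        return str(exc)
    return None


# Case 1: the entry was deleted from sys.modules.
del sys.modules["colorsys"]
deleted_error = try_reload(colorsys)

# Case 2: the entry points at a different module object.
sys.modules["colorsys"] = types.ModuleType("impostor")
reassigned_error = try_reload(colorsys)

sys.modules["colorsys"] = colorsys  # put things back the way they were
print(deleted_error)     # module colorsys not in sys.modules
print(reassigned_error)  # module colorsys not in sys.modules
```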
My crude implementation of importlib.reload[8]
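Here is a hypothetical sketch of what such a crude reload could look like, assuming a plain top-level source module that has a __file__. In line with footnote 8, it ignores sys.modules entirely and simply re-executes the source in the module’s existing namespace, which accounts for both the in-place update and the stale names left behind.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def crude_reload(module: types.ModuleType) -> types.ModuleType:
    """Re-run a module's source file inside its existing namespace."""
    path = module.__file__
    source = Path(path).read_text(encoding="utf-8")
    code = compile(source, path, "exec")
    # Executing into the *same* __dict__ updates the object in place: everyone
    # holding the module sees the new definitions, and names that vanished
    # from the source are never deleted.
    exec(code, module.__dict__)
    return module


# Usage with a hypothetical throwaway module "demo_reload".
sys.dont_write_bytecode = True
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
source_file = tmp_dir / "demo_reload.py"
source_file.write_text("def hello():\n    return 'old'\n")
importlib.invalidate_caches()
import demo_reload

original = demo_reload
source_file.write_text("def hello():\n    return 'new version'\n")
crude_reload(demo_reload)
print(demo_reload.hello(), demo_reload is original)  # new version True
```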
[1] dict.__dict__.__getitem__ = 8 results in AttributeError: 'mappingproxy' object attribute '__getitem__' is read-only
[2] Otherwise any dict would have an instance dict dict.__dict__, which would have an instance dict dict.__dict__.__dict__ of type dict, which would have an instance dict, and so on.
[3] The reason why calling foo.hello refers to the new code instead of the old one is that import overwrites its value in our current globals() dict when we use it. As such, it reloads the module for whatever namespace we happen to be in.
[4] This does not work if you reassign contents of the imported module. If you do something like var = module.other_var, change the value of other_var and reload, var will still have the old value. That applies to functions and variables as well as to from module import symbol imports. From this I can only assume that importlib.reload changes the module object in place rather than replacing it.
[5] This, again, is not the actual implementation of the __import__ function but rather a very crude approximation for the purposes of this article. For instance, this function could not deal at all with importing submodules, such as foo.bar.
[6] Which you can actually do yourself. module objects are not immutable, and you can freely assign, remove or alter any part of them, e.g. module.foo = 0 or module.bar = lambda k: print(k).
[7] The same rules apply if you’ve altered the module yourself (module.attribute = "new value" or module.function = lambda a: print(a, "hello")): only modules that import the base module with import module will see the updated references (module.attribute ==> "new value"), not modules using from module import attribute or myattr = module.attribute (myattr still holds the old value).
[8] Again, this is not the official implementation, and it is strongly simplified. It also does not interact with sys.modules, which we know the real one does, and again it only works for top-level modules. It is only here to illustrate how some of the behavior of the function could be implemented in Python, not how it is actually done.