import bug [Python]


From: Steven D'Aprano on 31 Oct 2009 21:27

On Sat, 31 Oct 2009 16:27:20 +0000, kj wrote:

>>1) it's a bad idea to name your own modules after modules in the stdlib
>
> Obviously, since it leads to the headaches this thread illustrates. But
> there is nothing intrinsically wrong with it. The fact that it is
> problematic in Python is a design bug, plain and simple. There's no
> rational basis for it,

Incorrect. Simplicity of implementation and API is a virtue, in and of
itself. The existing module machinery is quite simple to understand, use
and maintain. Dealing with name clashes doesn't come for free. If you
think it does, I encourage you to write a patch implementing the
behaviour you would prefer.

In addition, there are use-cases where the current behaviour is the
correct behaviour. Here's one way to backport (say) functools to older
versions of Python (untested):

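# === functools.py ===

import sys

# Compare version tuples; comparing strings would rank '2.10' below '2.5'.
if sys.version_info >= (2, 5):
    # Use the standard library version if it is available.
    # Caveat: while this file runs, sys.modules['functools'] is this very
    # module, so the cached entry may shadow the stdlib copy here.
    old_path = sys.path[:]
    del sys.path[0]  # Delete the script's directory (the first entry).
    from functools import *
    sys.path[:] = old_path  # Restore the path.
else:
    # Backport code you want.
    pass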

> and represents an unreasonable demand on module
> writers, since contrary to the tight control on reserved Python
> keywords, there does not seem to be a similar control on the names of
> stdlib modules. What if, for example, in the future it was decided that
> my_favorite_module name would become part of the standard library? This
> alone would cause code to break.

Not necessarily. Obviously your module my_favorite_module.py isn't
calling the standard library version, because it didn't exist when you
wrote it. Nor are any of your callers. Mere name clashes alone aren't
necessarily an issue. One problem comes about when some module you import
is modified to start using the standard library module, which conflicts
with yours. Example:

You have a collections module, which imports the standard library stat
module. The Python standard library can safely grow a collections module,
but what it can't do is grow a collections module *and* modify stat to
use that.

But in general, yes, you are correct -- there is a risk that future
modules added to the standard library can clash with existing third party
modules. This is one of the reasons why Python is conservative about
adding to the std lib.

In other words, yes, module naming conflicts are the Python version of DLL
Hell. Python doesn't distinguish between "my modules" and "standard
modules" and "third party modules" -- they're all just modules, there
aren't three different implementations for importing a module and you
don't have to learn three different commands to import them.

But there is a downside too: if you write "import os" Python has no
possible way of knowing whether you mean the standard os.py module or
your own os.py module.
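
A minimal sketch of that ambiguity, assuming Python 3 and using the stdlib module colorsys only because it is unlikely to be loaded already. The helper name import_with_shadow is hypothetical. It writes an empty same-named file into a temporary directory, puts that directory first on sys.path, and reports whether the import picked up the stand-in:

```python
import importlib
import os
import sys
import tempfile

def import_with_shadow(name):
    """Import `name` while a same-named empty file sits first on sys.path."""
    saved = sys.modules.pop(name, None)        # forget any cached copy
    with tempfile.TemporaryDirectory() as tmp:
        shadow_file = os.path.join(tmp, name + ".py")
        with open(shadow_file, "w") as f:
            f.write("# empty stand-in, like ham/re.py in this thread\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            found_at = os.path.realpath(module.__file__)
            is_shadow = found_at == os.path.realpath(shadow_file)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)        # drop the stand-in
            if saved is not None:
                sys.modules[name] = saved      # restore what was there before
    return is_shadow

print(import_with_shadow("colorsys"))
```

The import statement itself carries no information about which file is wanted; only the order of sys.path decides.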

Of course, Python does expose the import machinery to you. If avoiding
standard library names is too much of a trial for you, or if you are
paranoid and want to future-proof your module against changes to the
standard library (a waste of time in my opinion), you can use Python's
import machinery to build your own system.

--
Steven

From: Gabriel Genellina on 1 Nov 2009 00:38

On Sat, 31 Oct 2009 12:12:21 -0300, kj <no.email(a)please.post> wrote:

> I'm running into an ugly bug, which, IMHO, is really a bug in the
> design of Python's module import scheme.

The basic problem is that the "import scheme" was not designed in advance.
It was a very simple thing at first. Then came packages. And then the
__import__ builtin. And later some import hooks. And later support for zip
files. And more import hooks and meta hooks. And namespace packages. And
relative imports, absolute imports, and mixed imports. And now it's a mess.

> Consider the following
> directory structure:
> [containing a re.py file in the same directory as the main script]
>
> If I now run the innocent-looking ham/spam.py, I get the following
> error:
>
> % python26 ham/spam.py
> Traceback (most recent call last):
> [...]
> File "/usr/local/python-2.6.1/lib/python2.6/string.py", line 116, in __init__
> 'delim' : _re.escape(cls.delimiter),
> AttributeError: 'module' object has no attribute 'escape'

> My sin appears to be having the (empty) file ham/re.py. So Python
> is confusing it with the re module of the standard library, and
> using it when the inspect module tries to import re.

Exactly; that's the root of your problem, and has been a problem ever
since import existed.

On Sat, 31 Oct 2009 13:27:20 -0300, kj <no.email(a)please.post> wrote:

>> 2) this has been fixed in Py3
>
> In my post I illustrated that the failure occurs both with Python
> 2.6 *and* Python 3.0. Did you have a particular version of Python
> 3 in mind?

If the `re` module had been previously loaded (the true one, from the
standard library) then this bug is not apparent. This may happen if re is
imported from site.py, sitecustomize.py, any .pth file, the PYTHONSTARTUP
script, perhaps other sources...
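
A small sketch of why an earlier load hides the problem, assuming Python 3. The helper name cached_copy_wins is hypothetical. Once the real module is in sys.modules, a later import of the same name is served from that cache, so a shadowing file added to sys.path afterwards is never looked at:

```python
import importlib
import os
import sys
import tempfile

def cached_copy_wins(name):
    """Return True if a module loaded earlier survives a later shadow file."""
    real = importlib.import_module(name)       # load the real module first
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write("# empty stand-in\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            again = importlib.import_module(name)   # served from sys.modules
        finally:
            sys.path.remove(tmp)
    return again is real

print(cached_copy_wins("colorsys"))
```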

The same error happens if ham\spam.py contains the single line: import
smtpd, and instead of re.py there is an empty asyncore.py file; that fails
on 3.1 too.

On Sat, 31 Oct 2009 22:27:09 -0300, Steven D'Aprano
<steve(a)remove-this-cybersource.com.au> wrote:
> On Sat, 31 Oct 2009 16:27:20 +0000, kj wrote:
>
>>> 1) it's a bad idea to name your own modules after modules in the stdlib
>>
>> Obviously, since it leads to the headaches this thread illustrates. But
>> there is nothing intrinsically wrong with it. The fact that it is
>> problematic in Python is a design bug, plain and simple. There's no
>> rational basis for it,
>
> Incorrect. Simplicity of implementation and API is a virtue, in and of
> itself. The existing module machinery is quite simple to understand, use
> and maintain.

Uhm... module objects might be quite simple to understand, but module
handling is anything but simple! (simplicity of implem...? quite simple
to WHAT? ROTFLOL!!! :) )

> Dealing with name clashes doesn't come for free. If you
> think it does, I encourage you to write a patch implementing the
> behaviour you would prefer.

I'd say it is really a bug, and has existed for a long time.
One way to avoid name clashes would be to put the entire standard library
under a package; a program that wants the standard re module would write
"import std.re" instead of "import re", or something similar.
Every time the std package is suggested, the main argument against it is
backwards compatibility.

> In addition, there are use-cases where the current behaviour is the
> correct behaviour. Here's one way to backport (say) functools to older
> versions of Python (untested):

You still would be able to backport or patch modules, even if the standard
ones live in the "std" package.

On Sat, 31 Oct 2009 12:12:21 -0300, kj <no.email(a)please.post> wrote:

> I've tried a lot of things to appease Python on this one, including
> a liberal sprinkling of "from __future__ import absolute_import"
> all over the place (except, of course, in inspect.py, which I don't
> control), but to no avail.

I think the only way is to make sure *your* modules always come *after*
the standard ones in sys.path; try using this code right at the top of
your main script:

import sys, os.path
if sys.argv[0]:
    script_path = os.path.dirname(os.path.abspath(sys.argv[0]))
else:
    script_path = ''
if script_path in sys.path:
    sys.path.remove(script_path)
sys.path.append(script_path)

(I'd want to put such code in sitecustomize.py, but sys.argv doesn't
exist yet at the time sitecustomize.py is executed)
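
A complementary check, sketched under assumptions: rather than reordering sys.path, list the files in a directory whose names collide with standard library modules. The function name find_shadowing_files is hypothetical. sys.stdlib_module_names exists only on Python 3.10 and later, long after this thread, so the set of names can also be passed in explicitly:

```python
import os
import sys

def find_shadowing_files(directory, stdlib_names=None):
    """List modules and packages in `directory` named like stdlib modules."""
    if stdlib_names is None:
        # Only available on Python 3.10+; fall back to an empty set otherwise.
        stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    clashes = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and os.path.isfile(full):
            name = stem                       # plain module: re.py -> re
        elif os.path.isfile(os.path.join(full, "__init__.py")):
            name = entry                      # package directory
        else:
            continue
        if name in stdlib_names:
            clashes.append(entry)
    return clashes
```

Calling find_shadowing_files("ham", {"re"}) on the layout described in this thread would report re.py.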

--
Gabriel Genellina

From: Steven D'Aprano on 1 Nov 2009 01:54

On Sun, 01 Nov 2009 01:38:16 -0300, Gabriel Genellina wrote:

>> Incorrect. Simplicity of implementation and API is a virtue, in and of
>> itself. The existing module machinery is quite simple to understand,
>> use and maintain.
>
> Uhm... module objects might be quite simple to understand, but module
> handling is anything but simple! (simplicity of implem...? quite
> simple to WHAT? ROTFLOL!!! )

I stand corrected :)

Nevertheless, the API is simple: the first time you "import name", Python
searches a single namespace (the path) for a module called name. There
are other variants of import, but the basics remain:

search the path for the module called name, and do something with the
first one you find.
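
That rule can be sketched in a few lines. This is a deliberate simplification and not the real import machinery: it ignores the sys.modules cache, builtin and extension modules, zip files and import hooks, and the name first_match is hypothetical. It only shows the "first hit on the path wins" behaviour:

```python
import os

def first_match(name, search_path):
    """Return the first file that could supply module `name`, or None."""
    for directory in search_path:
        candidates = (
            os.path.join(directory, name, "__init__.py"),  # package
            os.path.join(directory, name + ".py"),         # plain module
        )
        for candidate in candidates:
            if os.path.isfile(candidate):
                return candidate
    return None
```

With the script's directory first in search_path, an empty re.py there is returned before the standard library's re.py is ever examined.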

>> Dealing with name clashes doesn't come for free. If you think it does,
>> I encourage you to write a patch implementing the behaviour you would
>> prefer.
>
> I'd say it is really a bug, and has existed for a long time.

Since import is advertised to return the first module with the given name
it finds, I don't see it as a bug even if it doesn't do what the
programmer intended it to do. If I do this:

>>> len = 1
>>> def parrot(s):
...     print len(s)
...
>>> parrot("spam spam spam")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in parrot
TypeError: 'int' object is not callable

it isn't a bug in Python that I have misunderstood scopes and
inadvertently shadowed a builtin. Shadowing a standard library module is
no different.

> One way to
> avoid name clashes would be to put the entire standard library under a
> package; a program that wants the standard re module would write "import
> std.re" instead of "import re", or something similar. Every time the std
> package is suggested, the main argument against it is backwards
> compatibility.

You could do it in a backwards compatible way, by adding the std package
directory into the path.

--
Steven

From: Gabriel Genellina on 1 Nov 2009 15:34

On Sun, 01 Nov 2009 02:54:15 -0300, Steven D'Aprano
<steve(a)remove-this-cybersource.com.au> wrote:
> On Sun, 01 Nov 2009 01:38:16 -0300, Gabriel Genellina wrote:

>>> Incorrect. Simplicity of implementation and API is a virtue, in and of
>>> itself. The existing module machinery is quite simple to understand,
>>> use and maintain.
>>
>> Uhm... module objects might be quite simple to understand, but module
>> handling is anything but simple! (simplicity of implem...? quite
>> simple to WHAT? ROTFLOL!!! )
>
> I stand corrected :)
> Nevertheless, the API is simple: the first time you "import name", Python
> searches a single namespace (the path) for a module called name. There
> are other variants of import, but the basics remain:
>
> search the path for the module called name, and do something with the
> first one you find.

Sure, beautiful, a plain and simple search over a list of directories.
That's how it worked in Python 1.4, I think...
Now you have lots of "hooks" and even "meta-hooks": sys.meta_path,
sys.path_hooks, sys.path_importer_cache. And sys.path, of course, which
may contain other things apart from directory names (zip files, eggs, and
even instances of custom "loader" objects...). PEP 302 explains this but
I'm not sure the description is still current. PEP 369, if approved, would
add even more hooks.
Add packages to the picture, including relative imports and __path__[]
processing, and it becomes increasingly hard to explain.
Brett Cannon has rewritten the import system in pure Python (importlib) for
3.1; this should help to understand it, I hope.
The whole system works, yes, but looks to me more like a collection of
patches over patches than a coherent system. Perhaps this is due to the
way it evolved.
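
To make the hook layer a little more concrete, here is a minimal sketch of a meta path finder. It assumes the find_spec protocol of Python 3.4 and later, which postdates this thread; the names LoggingFinder and names_requested_while_importing are hypothetical. The finder records every name it is asked about and then declines, so the normal finders still do the actual work:

```python
import importlib
import sys

class LoggingFinder:
    """Meta path finder that records requested names and finds nothing."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None    # decline, so the next finder on sys.meta_path is asked

def names_requested_while_importing(name):
    """Import `name` and return the names that reached sys.meta_path."""
    finder = LoggingFinder()
    sys.meta_path.insert(0, finder)
    try:
        importlib.import_module(name)
    finally:
        sys.meta_path.remove(finder)
    return finder.seen
```

A module that is already in sys.modules never reaches sys.meta_path, so the returned list is empty for names that were imported earlier.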

>>> Dealing with name clashes doesn't come for free. If you think it does,
>>> I encourage you to write a patch implementing the behaviour you would
>>> prefer.
>>
>> I'd say it is really a bug, and has existed for a long time.
>
> Since import is advertised to return the first module with the given name
> it finds, I don't see it as a bug even if it doesn't do what the
> programmer intended it to do. [...] Shadowing a standard library module
> is no different.

But that's what namespaces are for; if the standard library had its own
namespace, such collisions would not occur. I can think of C++, Java, C#,
all of them have some way of qualifying names. Python too - packages. But
nobody has come up with a method to apply packages to the standard library in a
backwards compatible way. Perhaps those name collisions are not considered
serious. Perhaps every user module should live in packages and only the
standard library has the privilege of using the global module namespace.
Both C++ and XML got namespaces late in their life so in principle this
should be possible.

>> One way to
>> avoid name clashes would be to put the entire standard library under a
>> package; a program that wants the standard re module would write "import
>> std.re" instead of "import re", or something similar. Every time the std
>> package is suggested, the main argument against it is backwards
>> compatibility.
>
> You could do it in a backwards compatible way, by adding the std package
> directory into the path.

Unfortunately you can't, at least not without some special treatment of
the std package. One of the undocumented rules of the import system is
that you must not have more than one way to refer to the same module (in
this case, std.re and re). Suppose someone imports std.re; an entry in
sys.modules with that name is created. Later someone imports re; as there
is no entry in sys.modules with such name, the re module is imported
again, resulting in two module instances, darkness, weeping and the
gnashing of teeth :)
(I'm sure you know the problem: it's the same as when someone imports the
main script as a module, and gets a different module instance because the
"original" is called __main__ instead).

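The duplicate-instance problem can be reproduced on a small scale. This is only a sketch, assuming Python 3.5 or later: load_as is a hypothetical helper, counter.py is a throwaway file, and "std.counter" is an invented second name. It loads one source file under two names and shows that the resulting module objects do not share state:

```python
import importlib.util
import os
import tempfile

def load_as(name, filename):
    """Execute `filename` as a fresh module object called `name`."""
    spec = importlib.util.spec_from_file_location(name, filename)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "counter.py")
    with open(filename, "w") as f:
        f.write("hits = []\n")
    first = load_as("counter", filename)
    second = load_as("std.counter", filename)   # hypothetical second name

first.hits.append("seen by first")
print(first is second)   # False: two separate module objects
print(second.hits)       # []: the second copy never saw the append
```

Any module-level state (caches, registries, class identities) is duplicated in the same way, which is why two names for one module cause trouble.
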
--
Gabriel Genellina

From: MRAB on 1 Nov 2009 17:01

Gabriel Genellina wrote:
[snip]
>>> One way to avoid name clashes would be to put the entire standard
>>> library under a package; a program that wants the standard re
>>> module would write "import std.re" instead of "import re", or
>>> something similar. Every time the std package is suggested, the
>>> main argument against it is backwards compatibility.
>>
>> You could do it in a backwards compatible way, by adding the std
>> package directory into the path.
>
> Unfortunately you can't, at least not without some special treatment
> of the std package. One of the undocumented rules of the import
> system is that you must not have more than one way to refer to the
> same module (in this case, std.re and re). Suppose someone imports
> std.re; an entry in sys.modules with that name is created. Later
> someone imports re; as there is no entry in sys.modules with such
> name, the re module is imported again, resulting in two module
> instances, darkness, weeping and the gnashing of teeth :) (I'm sure
> you know the problem: it's the same as when someone imports the main
> script as a module, and gets a different module instance because the
> "original" is called __main__ instead).
>
Couldn't the entry in sys.modules be where the module was found, so that
if 're' was found in 'std' then the entry is 'std.re' even if the import
said just 're'?
