From: kj on


I just spent about 1-1/2 hours tracking down a bug.

An innocuous little script, let's call it buggy.py, only 10 lines
long, whose output should have been at most two lines, was quickly
dumping tens of megabytes of non-printable characters (aka
gobbledygook) to my screen, and in the process was messing up my
terminal *royally*. Here's buggy.py:



import sys
import psycopg2
connection_params = "dbname='%s' user='%s' password='%s'" % tuple(sys.argv[1:])
conn = psycopg2.connect(connection_params)
cur = conn.cursor()
cur.execute('SELECT * FROM version;')
print '\n'.join(x[-1] for x in cur.fetchall())


(Of course, buggy.py is pretty useless; I reduced the original,
more useful, script to this to help me debug it.)

Through a *lot* of trial and error I finally discovered that the
root cause of the problem was the fact that, in the same directory
as buggy.py, there is *another* innocuous little script, totally
unrelated, whose name happens to be numbers.py. (This second script
is one I wrote as part of a little Python tutorial I put together
months ago, and is not much more of a script than hello_world.py;
it's baby-steps for the absolute beginner. But apparently, it has
a killer name! I had completely forgotten about it.)

Both scripts live in a directory filled with *hundreds* of little
one-off scripts like the two of them. I'll call this directory
myscripts in what follows.

It turns out that buggy.py imports psycopg2, as you can see, and
apparently psycopg2 (or something imported by psycopg2) tries to
import some standard Python module called numbers; instead it ends
up importing the innocent myscripts/numbers.py, resulting in *absolute
mayhem*.

(This is no mere Python "wart"; this is a suppurating chancre, and
the fact that it remains unfixed is a neverending source of puzzlement
for me.)

How can the average Python programmer guard against this sort of
time-devouring bug in the future (while remaining a Python programmer)?
The only solution I can think of is to avoid like the plague the
basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files,
and *pray* that whatever name one chooses for one's script does
not suddenly pop up in the appropriate /usr/lib/pythonX.XX directory
of a future release.

What else can one do? Let's see, one should put every script in its
own directory, thereby containing the damage.

Anything else?

Any suggestion would be appreciated.

TIA!

~k
From: Chris Rebert on
On Mon, Feb 1, 2010 at 6:34 PM, kj <no.email(a)please.post> wrote:
> I just spent about 1-1/2 hours tracking down a bug.
<snip>
> Through a *lot* of trial an error I finally discovered that the
> root cause of the problem was the fact that, in the same directory
> as buggy.py, there is *another* innocuous little script, totally
> unrelated, whose name happens to be numbers.py.  (This second script
> is one I wrote as part of a little Python tutorial I put together
> months ago, and is not much more of a script than hello_world.py;
> it's baby-steps for the absolute beginner.  But apparently, it has
> a killer name!  I had completely forgotten about it.)
>
> Both scripts live in a directory filled with *hundreds* little
> one-off scripts like the two of them.  I'll call this directory
> myscripts in what follows.
>
> It turns out that buggy.py imports psycopg2, as you can see, and
> apparently psycopg2 (or something imported by psycopg2) tries to
> import some standard Python module called numbers; instead it ends
> up importing the innocent myscript/numbers.py, resulting in *absolute
> mayhem*.
>
> (This is no mere Python "wart"; this is a suppurating chancre, and
> the fact that it remains unfixed is a neverending source of puzzlement
> for me.)
>
> How can the average Python programmer guard against this sort of
> time-devouring bug in the future (while remaining a Python programmer)?
> The only solution I can think of is to avoid like the plague the
> basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files,
> and *pray* that whatever name one chooses for one's script does
> not suddenly pop up in the appropriate /usr/lib/pythonX.XX directory
> of a future release.
>
> What else can one do?  Let's see, one should put every script in its
> own directory, thereby containing the damage.
>
> Anything else?
>
> Any suggestion would be appreciated.

I think absolute imports avoid this problem:

from __future__ import absolute_import

For details, see PEP 328:
http://www.python.org/dev/peps/pep-0328/

Cheers,
Chris
--
http://blog.rebertia.com
From: Roy Smith on
In article <hk82uv$8kn$1(a)reader1.panix.com>, kj <no.email(a)please.post>
wrote:

> Through a *lot* of trial an error I finally discovered that the
> root cause of the problem was the fact that, in the same directory
> as buggy.py, there is *another* innocuous little script, totally
> unrelated, whose name happens to be numbers.py.
> [...]
> It turns out that buggy.py imports psycopg2, as you can see, and
> apparently psycopg2 (or something imported by psycopg2) tries to
> import some standard Python module called numbers; instead it ends
> up importing the innocent myscript/numbers.py, resulting in *absolute
> mayhem*.

I feel your pain, but this is not a Python problem per se. The general
pattern is:

1) You have something which refers to a resource by name.

2) There is a sequence of places which are searched for this name.

3) The search finds the wrong one because another resource by the same name
appears earlier in the search path.

I've gotten bitten like this by shells finding the wrong executable (in
$PATH). By dynamic loaders finding the wrong library (in
$LD_LIBRARY_PATH). By C compilers finding the wrong #include file. And so
on. This is just Python's import finding the wrong module in sys.path
(which $PYTHONPATH feeds into).

The solution is the same in all cases. You either have to refer to
resources by some absolute name, or you need to make sure you set up your
search paths correctly and know what's in them. In your case, one possible
solution would be to make sure "." (or "") isn't in sys.path (although
that might cause other issues).
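
A minimal sketch of that last suggestion, written for a current Python 3
interpreter rather than the 2.x versions discussed in this thread. The
helper name without_script_dir is hypothetical, not part of any library.
It only builds a filtered copy of a search path, so the caller decides
whether to assign the result back to sys.path:

```python
import os


def without_script_dir(search_path, script_dir):
    """Return a copy of search_path with the script's directory removed.

    Entries equal to "" or "." (both meaning "current directory") are
    dropped as well.
    """
    script_dir = os.path.abspath(script_dir)
    cleaned = []
    for entry in search_path:
        if entry in ("", "."):
            continue
        if os.path.abspath(entry) == script_dir:
            continue
        cleaned.append(entry)
    return cleaned


# Hypothetical usage at the very top of a script, before other imports:
#
#     import os, sys
#     here = os.path.dirname(os.path.abspath(sys.argv[0]))
#     sys.path[:] = without_script_dir(sys.path, here)
```

As noted above, this might cause other issues: once the script's own
directory is gone from sys.path, the script can no longer import helper
modules that sit next to it.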
From: Steven D'Aprano on
On Tue, 02 Feb 2010 02:34:07 +0000, kj wrote:

> I just spent about 1-1/2 hours tracking down a bug.
>
> An innocuous little script, let's call it buggy.py, only 10 lines long,
> and whose output should have been, at most two lines, was quickly
> dumping tens of megabytes of non-printable characters to my screen (aka
> gobbledygook), and in the process was messing up my terminal *royally*.
> Here's buggy.py:
[...]
> It turns out that buggy.py imports psycopg2, as you can see, and
> apparently psycopg2 (or something imported by psycopg2) tries to import
> some standard Python module called numbers; instead it ends up importing
> the innocent myscript/numbers.py, resulting in *absolute mayhem*.


There is no module numbers in the standard library, at least not in 2.5.

>>> import numbers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numbers

It must be specific to psycopg2.

I would think this is a problem with psycopg2 -- it sounds like it should
be written as a package, but instead is written as a bunch of loose
modules. I could be wrong of course, but if it is just a collection of
modules, I'd definitely call that a poor design decision, if not a bug.


> (This is no mere Python "wart"; this is a suppurating chancre, and the
> fact that it remains unfixed is a neverending source of puzzlement for
> me.)

No, it's a wart. There's no doubt it bites people occasionally, but I've
been programming in Python for about ten years and I've never been bitten
by this yet. I'm sure it will happen some day, but not yet.

In this case, the severity of the bug (megabytes of binary crud to the
screen) is not related to the cause of the bug (shadowing a module).

As for fixing it, unfortunately it's not quite so simple to fix without
breaking backwards-compatibility. The opportunity to do so for Python 3.0
was missed. Oh well, life goes on.


> How can the average Python programmer guard against this sort of
> time-devouring bug in the future (while remaining a Python programmer)?
> The only solution I can think of is to avoid like the plague the
> basenames of all the 200 or so /usr/lib/pythonX.XX/xyz.py{,c} files, and
> *pray* that whatever name one chooses for one's script does not suddenly
> pop up in the appropriate /usr/lib/pythonX.XX directory of a future
> release.

Unfortunately, Python makes no guarantee that there won't be some clash
between modules. You can minimize the risks by using packages, e.g. given
a package spam containing modules a, b, c, and d, if you refer to spam.a
etc. then you can't clash with modules a, b, c, d, but only spam. So
you've cut your risk profile from five potential clashes to only one.

Also, generally most module clashes are far more obvious. If you do this:

import module
x = module.y

and module is shadowed by something else, you're *much* more likely to
get an AttributeError than megabytes of crud to the screen.

I'm sorry that you got bitten so hard by this, but in practice it's
uncommon, and relatively mild when it happens.


> What else can one do? Let's see, one should put every script in its own
> directory, thereby containing the damage.

That's probably a bit extreme, but your situation:

"Both scripts live in a directory filled with *hundreds* little
one-off scripts like the two of them."

is far too chaotic for my liking. You don't need to go to the extreme of
a separate directory for each file, but you can certainly tidy things up
a bit. For example, anything that's obsolete should be moved out of the
way where it can't be accidentally executed or imported.
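
As part of such a tidy-up, a small check along these lines could flag
scripts whose names collide with standard library modules. This is only
a sketch under stated assumptions: find_shadowing_scripts is a
hypothetical helper, and the default name list relies on
sys.stdlib_module_names, which exists only on Python 3.10 and later
(older interpreters fall back to the much shorter
sys.builtin_module_names):

```python
import os
import sys


def find_shadowing_scripts(directory, stdlib_names=None):
    """List .py files in directory whose module name matches a stdlib name."""
    if stdlib_names is None:
        # sys.stdlib_module_names is available on Python 3.10+; otherwise
        # fall back to the names of modules compiled into the interpreter.
        stdlib_names = getattr(
            sys, "stdlib_module_names", sys.builtin_module_names
        )
    clashes = []
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext == ".py" and name in stdlib_names:
            clashes.append(filename)
    return clashes
```

Calling find_shadowing_scripts on a directory like myscripts would
return the file names worth renaming or moving. It cannot predict names
that a future release adds to the standard library.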




--
Steven
From: Tim Chase on
Stephen Hansen wrote:
> First, I don't shadow built in modules. Its really not very hard to avoid.

Given the comprehensive nature of the batteries included in
Python, it's not that hard to accidentally shadow a built-in
module that you've never heard of but that is imported by a
module you are using. The classic that's stung me enough times
(and many others on c.l.p and other forums, as a quick google
evidences) such that I *finally* remember:

bash$ touch email.py
bash$ python
...
>>> import smtplib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/smtplib.py", line 46, in <module>
    import email.Utils
ImportError: No module named Utils

"email.py" is an innocuous name for a script/module in which you
might want to do emailish things, and it's likely you'll use
smtplib in the same code...and kablooie, things blow up even if
your code doesn't reference or directly use the built-in email.py.
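
When a failure like this looks inexplicable, one quick diagnostic is to
ask the import system which file it would load for a given name. The
sketch below assumes a current Python 3 interpreter (importlib.util is
not available in the 2.5 shown above), and module_origin is a
hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return where a top-level module would be loaded from.

    The result is a file path for ordinary modules, a marker string such
    as "built-in" for modules compiled into the interpreter, or None if
    the name cannot be found at all.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin
```

If module_origin("email") reports a path inside your own scripts
directory instead of the standard library, a local file is shadowing
the real module.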

Yes, as Chris mentions, PEP-328 absolute vs. relative imports
should help ameliorate the problem, but it's not yet commonly
used (unless you're using Py3, it's only at the request of a
__future__ import in 2.5+).

-tkc