From: John Nagle on
I know there's a performance penalty for running Python on a
multicore CPU, but how bad is it? I've read the key paper
("www.dabeaz.com/python/GIL.pdf"), of course. It would be adequate
if the GIL just limited Python to running on one CPU at a time,
but it's worse than that; there's excessive overhead due to
a lame locking implementation. Running CPU-bound multithreaded
code on a dual-core CPU runs HALF AS FAST as on a single-core
CPU, according to Beazley.

My main server application, which runs "sitetruth.com"
has both multiple processes and multiple threads in each process.
The system rates web sites, which involves reading and parsing
up to 20 pages from each domain. Analysis of each domain is
performed in a separate process, but each process uses multiple
threads to read and process several web pages simultaneously.
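
Roughly, the structure is like this (a bare sketch; the names are invented
and the real code does much more):

    # One process per domain; each process uses its own threads to fetch
    # and parse that domain's pages.
    import multiprocessing
    import threading
    import urllib2

    def fetch_and_parse(url):
        html = urllib2.urlopen(url).read()   # I/O-bound: releases the GIL
        # ... CPU-bound parsing of html goes here (holds the GIL) ...

    def rate_domain(urls):
        threads = [threading.Thread(target=fetch_and_parse, args=(u,))
                   for u in urls]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == '__main__':
        domains = [['http://example.com/', 'http://example.com/about']]
        procs = [multiprocessing.Process(target=rate_domain, args=(urls,))
                 for urls in domains]
        for p in procs:
            p.start()
        for p in procs:
            p.join()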

Some of the threads go compute-bound for a second or two at a time as
they parse web pages. Sometimes two threads (but never more than three)
in the same process may be parsing web pages at the same time, so
they're contending for CPU time.

So this is nearly the worst case for the lame GIL lock logic.
Has anyone tried using "affinity" ("http://pypi.python.org/pypi/affinity")
to lock each Python process to a single CPU? Does that help?
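I'm imagining something like this at the top of each worker process (the
function name is taken from the package's description; I haven't verified
the exact signature):

    import os
    import affinity   # http://pypi.python.org/pypi/affinity

    # Pin this process to CPU 0 only; bit N of the mask selects CPU N.
    affinity.set_process_affinity_mask(os.getpid(), 1)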

John Nagle
From: exarkun on
On 11:02 pm, nagle(a)animats.com wrote:
> I know there's a performance penalty for running Python on a
>multicore CPU, but how bad is it? I've read the key paper
>("www.dabeaz.com/python/GIL.pdf"), of course. It would be adequate
>if the GIL just limited Python to running on one CPU at a time,
>but it's worse than that; there's excessive overhead due to
>a lame locking implementation. Running CPU-bound multithreaded
>code on a dual-core CPU runs HALF AS FAST as on a single-core
>CPU, according to Beazley.

It's not clear that Beazley's performance numbers apply to any platform
except OS X, which has a particularly poor implementation of the
threading primitives CPython uses to implement the GIL.

You should check to see if it actually applies to your deployment
environment.
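
A quick way to check is to time Beazley's countdown example yourself,
serially and with two threads (rough sketch; the numbers will vary by
machine and Python version):

    import time
    import threading

    def count(n):
        while n > 0:
            n -= 1

    N = 10000000

    start = time.time()
    count(N); count(N)
    print "sequential: %.2f s" % (time.time() - start)

    start = time.time()
    t1 = threading.Thread(target=count, args=(N,))
    t2 = threading.Thread(target=count, args=(N,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print "two threads: %.2f s" % (time.time() - start)

If the two-thread run is much slower than the sequential one, the old GIL
behavior is hurting you on that box.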

The GIL has been re-implemented recently. Python 3.2, I think, will
include the new implementation, which should bring OS X performance up
to the level of other platforms. It may also improve certain other
aspects of thread switching.

Jean-Paul
From: alex23 on
On Feb 3, 9:02 am, John Nagle <na...(a)animats.com> wrote:
>     I know there's a performance penalty for running Python on a
> multicore CPU, but how bad is it?  I've read the key paper
> ("www.dabeaz.com/python/GIL.pdf"), of course.

It's a shame that Python 3.x is dead to you, otherwise you'd be able
to enjoy the new GIL implementation in 3.2: http://www.dabeaz.com/python/NewGIL.pdf

Actually, it looks like you probably still can:
+ patch for 2.5.4: http://thread.gmane.org/gmane.comp.python.devel/109929
+ patch for 2.7? http://bugs.python.org/issue7753

(Can't comment on affinity, though, sorry)
From: Terry Reedy on
On 2/2/2010 9:02 PM, alex23 wrote:
> On Feb 3, 9:02 am, John Nagle<na...(a)animats.com> wrote:
>> I know there's a performance penalty for running Python on a
>> multicore CPU, but how bad is it? I've read the key paper
>> ("www.dabeaz.com/python/GIL.pdf"), of course.
>
> It's a shame that Python 3.x is dead to you, otherwise you'd be able
> to enjoy the new GIL implementation in 3.2: http://www.dabeaz.com/python/NewGIL.pdf
>
> Actually, it looks like you probably still can:
> + patch for 2.5.4: http://thread.gmane.org/gmane.comp.python.devel/109929
> + patch for 2.7? http://bugs.python.org/issue7753

The patch was rejected for 2.7 (and earlier) because it could break code,
as explained in the discussion. One would have to apply the patch and
compile one's own binary.

From: Paul Rubin on
John Nagle <nagle(a)animats.com> writes:
> Analysis of each domain is
> performed in a separate process, but each process uses multiple
> threads to read and process several web pages simultaneously.
>
> Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.

You're probably better off using separate processes for the different
pages. If I remember correctly, you were using BeautifulSoup, which, while
very cool, is pretty doggone slow on large volumes of pages. I don't
know if there's much that can be done about that without going off on a
fairly messy C or C++ coding adventure. Maybe someday someone will do
that.
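
For instance, something along these lines (just a sketch, assuming
BeautifulSoup 3 and that the page HTML has already been fetched):

    from multiprocessing import Pool
    from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3.x import

    def extract_title(html):
        soup = BeautifulSoup(html)            # the slow, CPU-bound part
        return soup.title.string if soup.title else None

    if __name__ == '__main__':
        pages = ["<html><head><title>One</title></head></html>",
                 "<html><head><title>Two</title></head></html>"]
        pool = Pool(processes=2)              # one parse per worker process
        print pool.map(extract_title, pages)

Each parse then runs in its own interpreter, so the GIL never enters into
it.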