From: John Nagle on
Paul Rubin wrote:
> John Nagle <nagle(a)> writes:
>> If locking is expensive on x86, it's implemented wrong.
>> It's done right in QNX, with inline code for the non-blocking case.
> Acquiring the lock still takes an expensive instruction, LOCK XCHG or
> whatever. I think QNX is usually run on embedded CPUs with less
> extensive caching than these multicore x86s, so the lock prefix may be
> less expensive on QNX systems.

That's not so bad. See

But there are dumb thread implementations that make
a system call for every lock.

John Nagle
From: Paul Rubin on
John Nagle <nagle(a)> writes:
> But there are dumb thread implementations that make
> a system call for every lock.

Yes, a system call on each lock access would really be horrendous. But I
think that on a modern CPU, LOCK XCHG costs as much as hundreds of
regular instructions. Doing that on every adjustment of a Python
reference count would be enough to slow the interpreter significantly.
It's not just mutating user data; every time you use an integer, or
call a function and make an arg tuple and bind the function's locals
dictionary, you're touching refcounts.
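Here is a minimal sketch of how often refcounts change. It assumes CPython, since PyPy and Jython don't use reference counting. Absolute counts differ between CPython versions, so the sketch compares only the differences. Binding another name or building a tuple (like an argument tuple) each adjusts the count, and these are exactly the adjustments that would each need a locked instruction without a GIL.

```python
import sys

# A fresh object: nothing else in the interpreter refers to it yet.
payload = object()

# sys.getrefcount includes temporary references made by the call itself,
# and the exact number varies by CPython version, so compare differences only.
baseline = sys.getrefcount(payload)

alias = payload              # binding a second name: +1
after_alias = sys.getrefcount(payload)

args = (payload, payload)    # a tuple holding it twice, like an arg tuple: +2
after_tuple = sys.getrefcount(payload)

del alias, args              # dropping those references: back to baseline
after_del = sys.getrefcount(payload)

print(after_alias - baseline, after_tuple - baseline, after_del - baseline)
# prints: 1 3 0
```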

The preferred locking scheme in Linux these days is called futex
(fast userspace mutex), which avoids system calls in the uncontended
case; see the docs.
From: Rhamphoryncus on
On Feb 14, 4:30 pm, "MRAB" <goo...(a)> wrote:
> Hmm. I wonder whether it would be possible to have a pair of Python
> cores, one for single-threaded code (no locks necessary) and the other
> for multi-threaded code. When the Python program went from single-
> threaded to multi-threaded, or from multi-threaded to single-threaded,
> there would be a switch from one core to the other.

I have explored this option (and some simpler variants). Essentially,
you end up rewriting a massive amount of CPython's codebase to change
the refcount API. Even the C extension types all assume the refcount
can be statically initialized (which may not be true if you're trying
to make it efficient on multiple CPUs).

Once you realize the barrier to entry is that high, you start
considering alternative implementations. Personally, I'm watching
PyPy to see if they get reasonable performance using a JIT. Then I can
start hacking on it.

Adam Olsen, aka Rhamphoryncus

From: Paul Boddie on
On 15 Feb, 00:14, "sjdevn...(a)" <sjdevn...(a)> wrote:
> Yeah, it's the Windows equivalent of fork. Does true copy-on-write, so
> you can do efficient multiprocess work.

Aside from some code floating around the net which possibly originates
from some book on Windows systems programming, is there any reference
material on ZwCreateProcess, is anyone actually using it as "fork on
Windows", and would it be in any way suitable for an implementation of
os.fork in the Python standard library? I only ask because there's a
lot of folklore about this particular function (everyone seems to
repeat more or less what you've just said), but aside from various
Cygwin mailing list threads where they reject its usage, there's
precious little information of substance.

Not that I care about Windows, but it would be useful to be able to
offer fork-based multiprocessing solutions to people using that
platform. Although the python-dev people currently seem more intent on
considering (and now hopefully rejecting) yet more syntax sugar [1],
it'd be nice to consider matters seemingly below the python-dev
threshold of consideration and offer some kind of roadmap for
convenient parallel processing.
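For comparison, here is fork-based work on a platform where os.fork exists. The child gets its own copy of the parent's memory, which Linux and most Unix systems share copy-on-write until one side writes. As a result, a change made in the child never shows up in the parent. This is a minimal sketch: child_sees_copy is a hypothetical helper, and it runs only where os.fork exists, which excludes Windows. Copy-on-write itself isn't visible from Python; the only observable effect is the separate copy.

```python
import os

def child_sees_copy():
    """Fork once; return (value the child computed, parent's value afterwards).

    Hypothetical helper for illustration only. Requires os.fork, which is
    available on Unix-like systems but not on Windows.
    """
    data = {"counter": 0}
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of `data` and reports the result.
        try:
            os.close(read_fd)
            data["counter"] += 1
            os.write(write_fd, str(data["counter"]).encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: the child's modification did not touch this process's memory.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        child_value = int(reader.read())  # reads until the child exits
    os.waitpid(pid, 0)
    return child_value, data["counter"]

if hasattr(os, "fork"):
    print(child_sees_copy())  # prints: (1, 0)
```

The modern multiprocessing module reflects the same split. On Unix, multiprocessing.get_all_start_methods() includes 'fork', while on Windows it offers only 'spawn'.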



From: skip on

Maric> On Wednesday 14 February 2007 16:24, garrickp(a) wrote:
>> "Some time back, a group did remove the GIL from the Python core, and
>> implemented locks on the core code to make it thread-safe. Well, the
>> problem was that while it worked, the locks it required made single-
>> threaded code take significantly longer to execute."

Maric> Very interesting point; this is exactly the sort of thing I'm
Maric> looking for. Are there any useful links on this?

Google for "python free threading stein" then click the first link.
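You can get a very rough feel for the slowdown described in the quote from Python itself. Guarding each update of a counter with a threading.Lock slows a single-threaded loop, even though no other thread ever contends for the lock. This is a minimal sketch using timeit. It measures Python-level lock overhead, not the C-level locking of the free-threading patch, and the ratio varies by machine and interpreter version. The names bump_plain, bump_locked and compare are hypothetical.

```python
import threading
import timeit

lock = threading.Lock()
counter = 0

def bump_plain():
    # Hypothetical helper: unsynchronized update.
    global counter
    counter += 1

def bump_locked():
    # Hypothetical helper: same update, but always takes the lock.
    # Only one thread runs this, so the lock is never contended.
    global counter
    with lock:
        counter += 1

def compare(n=200_000):
    """Return (plain_seconds, locked_seconds) for n single-threaded updates each."""
    global counter
    counter = 0
    plain = timeit.timeit(bump_plain, number=n)
    locked = timeit.timeit(bump_locked, number=n)
    return plain, locked

plain_s, locked_s = compare()
print(f"plain: {plain_s:.3f}s  locked: {locked_s:.3f}s  ratio: {locked_s / plain_s:.2f}")
```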