From: Ryan Kelly on
On Mon, 2010-06-28 at 16:46 -0700, Zac Burns wrote:
> In my experience it is far more expensive to allocate a lock in python
> then it is the types that use them. Here are some examples:
>
> >>> timeit.timeit('Lock()', 'from threading import Lock')
> 1.4449114807669048
>
> >>> timeit.timeit('dict()')
> 0.2821554294221187
>
> >>> timeit.timeit('list()')
> 0.17358153222312467
>
>
> Granted, I know these types do not allocate a lock - but a similarly
> defined user type would need to allocate a lock to guarantee
> thread-safety.

Sure, but I think you're timing the wrong thing here. You would only
allocate the lock relatively rarely - it's the overhead of *acquiring*
the lock that's the real problem.

rfk(a)durian:~$ python -m timeit -s "from threading import Lock; l = Lock()" "l.acquire(); l.release()"
1000000 loops, best of 3: 0.378 usec per loop

Compare to one of my favourite dict methods, setdefault:

rfk(a)durian:~$ python -m timeit -s "d = dict()" "d.setdefault('hello','world')"
10000000 loops, best of 3: 0.189 usec per loop


In the same ballpark really.

> I have not done a great deal of research into lockless algorithms
> thusfar, but would it be worthwhile to investigate implementing some
> of them in python to optimize performance critical sections of code?
>
> Many lockless algorithms that I have looked at thusfar require a
> "Compare and Swap" operation. Does python have an equivalent to this?
> Is python high level enough that it has something better than this or
> it simply doesn't need it?

I've managed to avoid locking in some cases by using dict.setdefault()
as a kind of atomic test-and-set operation. It's not a full
compare-and-swap but you could implement a simple locking scheme using
it like this:


class PretendLock:
def __init__(self):
self.d = dict()
def acquire(self):
myid = threading.currentThread().ident
while self.d.setdefault("owner",myid) != myid:
pass
def release(self):
del self.d["owner"]


This depends on some peculiarities of the CPython implementation that
aren't guaranteed by the language spec - namely that thread switches
can't happen while executing C code, that dict.setdefault is implemented
in C, and that is can calculate the hash of a string key without calling
back out to python code.

Usually, it's not worth it. The one time I use this regularly is for
lazy object initialisation, e.g. something like this:

def get_my_object(key):
try:
return objects[key]
except KeyError:
return objects.setdefault(key,create_my_object(key))

If two threads happen to try initialising the object at the same time,
only one will wind up in the dict and being used; the other will be
discarded and garbage-collected.



Cheers,

Ryan




--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details

From: Ryan Kelly on
On Mon, 2010-06-28 at 18:27 -0700, Zac Burns wrote:
>
>
>
> I've managed to avoid locking in some cases by using
> dict.setdefault() as a kind of atomic test-and-set operation.
> It's not a full compare-and-swap but you could implement a
> simple locking scheme using it like this:
>
>
> class PretendLock:
> def __init__(self):
> self.d = dict()
> def acquire(self):
> myid = threading.currentThread().ident
> while self.d.setdefault("owner",myid) != myid:
> pass
> def release(self):
> del self.d["owner"]
>
>
> This depends on some peculiarities of the CPython
> implementation that aren't guaranteed by the language spec -
> namely that thread switches can't happen while executing C
> code, that dict.setdefault is implemented in C, and that is
> can calculate the hash of a string key without calling back
> out to python code.
>
>
> Thanks Ryan,
>
> Agreed, I was timing the wrong thing - but largely because I wasn't
> sure what to time against. setdefault certainly seems to be a
> candidate for something to time. Even so, setdefault wins.
>
> Though, I wouldn't want to use PretendLock in this case... it doesn't
> seem that having the spin loop would justify reducing reducing the
> lock allocation overhead in a no-contention scenario.


Oh totally - nobody reading this should even consider using that
PretendLock class for anything. It's just an example of how to use
setdefault as a test-and-set kind of operator.


Ryan

--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details

From: sturlamolden on

> > Many lockless algorithms that I have looked at thusfar require a
> > "Compare and Swap" operation. Does python have an equivalent to this?
> > Is python high level enough that it has something better than this or
> > it simply doesn't need it?

Python does have a GIL, and contrary to the title, this has a lot to
do with the GIL. Specifically, any set of operations protected by the
GIL are atomic in the CPython interpreter.

We can therefore get global synchronization 'for free', allowing 'lock
free' data structures that avoid using threading.Lock, and replaces it
with, well, nothing. :)

How to do it (it takes tiny amounts of C or Cython):

Take at look at page 14:

http://www.dabeaz.com/python/UnderstandingGIL.pdf

The critical part is the variable _Py_Ticker in Python's source file
ceval.c, which is declared volatile int. As it is not declared static
it is a global variable, and we can actually manipulate its value from
a C extension:

extern volatile int _Py_Ticker; // evil!!!

Now imagine temporarily setting _Py_Ticker to some ridiculous high
value like 0x7fffffff. That locks out all other threads, practically
forever, until we restore _Py_Ticker back to it's original value.

What this gives us is global synchronization for free! The GIL is
there anyway, so it adds no additional overhead.

By messing with _Py_Ticker we can force blocks of Python code to
become atomic in the interpreter. With this hack we can therefore
create data structures that are lock free relative to the CPython
interpreter. It is evil, but very cheap compared to using
threading.Lock. In fact it has exactly zero overhead: the GIL is there
anyway, and it is already acquired!

The safest placement of a _Py_Ticker hack is in a context manager. We
have to make sure it gets restored to a sane value, even in presence
of an exception.

Enjoy :)



P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
do the trick, you are actually not that smart. It will lock our own
thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
have to attack _Py_Ticker from C.

P.P.S. And when we have taken control over the GIL, don't do anything
that might release it from C (like reading from a file). We cannot
afford to loose it :)





From: sturlamolden on

> > Many lockless algorithms that I have looked at thusfar require a
> > "Compare and Swap" operation. Does python have an equivalent to this?
> > Is python high level enough that it has something better than this or
> > it simply doesn't need it?

Python does have a GIL, and contrary to the title, this has a lot to
do with the GIL. Specifically, any set of operations protected by the
GIL are atomic in the CPython interpreter.

We can therefore get global synchronization 'for free', allowing 'lock
free' data structures that avoid using threading.Lock, and replaces it
with, well, nothing. :)

How to do it (it takes tiny amounts of C or Cython):

Take at look at page 14:

http://www.dabeaz.com/python/UnderstandingGIL.pdf

The critical part is the variable _Py_Ticker in Python's source file
ceval.c, which is declared volatile int. As it is not declared static
it is a global variable, and we can actually manipulate its value from
a C extension:

extern volatile int _Py_Ticker; // evil!!!

Now imagine temporarily setting _Py_Ticker to some ridiculous high
value like 0x7fffffff. That locks out all other threads, practically
forever, until we restore _Py_Ticker back to it's original value.

What this gives us is global synchronization for free! The GIL is
there anyway, so it adds no additional overhead.

By messing with _Py_Ticker we can force blocks of Python code to
become atomic in the interpreter. With this hack we can therefore
create data structures that are lock free relative to the CPython
interpreter. It is evil, but very cheap compared to using
threading.Lock. In fact it has exactly zero overhead: the GIL is there
anyway, and it is already acquired!

The safest placement of a _Py_Ticker hack is in a context manager. We
have to make sure it gets restored to a sane value, even in presence
of an exception.

Enjoy :)



P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
do the trick, you are actually not that smart. It will lock our own
thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
have to attack _Py_Ticker from C.

P.P.S. And when we have taken control over the GIL, don't do anything
that might release it from C (like reading from a file). We cannot
afford to loose it :)






From: Ryan Kelly on
On Mon, 2010-06-28 at 19:45 -0700, sturlamolden wrote:
> > > Many lockless algorithms that I have looked at thusfar require a
> > > "Compare and Swap" operation. Does python have an equivalent to this?
> > > Is python high level enough that it has something better than this or
> > > it simply doesn't need it?
>
> Python does have a GIL, and contrary to the title, this has a lot to
> do with the GIL. Specifically, any set of operations protected by the
> GIL are atomic in the CPython interpreter.
>
> We can therefore get global synchronization 'for free', allowing 'lock
> free' data structures that avoid using threading.Lock, and replaces it
> with, well, nothing. :)
>
> How to do it (it takes tiny amounts of C or Cython):
>
> Take at look at page 14:
>
> http://www.dabeaz.com/python/UnderstandingGIL.pdf
>
> The critical part is the variable _Py_Ticker in Python's source file
> ceval.c, which is declared volatile int. As it is not declared static
> it is a global variable, and we can actually manipulate its value from
> a C extension:
>
> extern volatile int _Py_Ticker; // evil!!!
>
> Now imagine temporarily setting _Py_Ticker to some ridiculous high
> value like 0x7fffffff. That locks out all other threads, practically
> forever, until we restore _Py_Ticker back to it's original value.
>
> What this gives us is global synchronization for free! The GIL is
> there anyway, so it adds no additional overhead.

Very interesting idea. Will it work if accessed through ctypes?

ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker")
ticker.value = 0x7fffffff

Or does ctypes muck with the GIL in a way that would break this idea?


Ryan


--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details