Prev: Football was: Python dynamic attribute creation
Next: [python] how to ensure item in list or dict bind with "an uuid meaning" integer type ID?
From: Ryan Kelly on 28 Jun 2010 20:30 On Mon, 2010-06-28 at 16:46 -0700, Zac Burns wrote: > In my experience it is far more expensive to allocate a lock in python > then it is the types that use them. Here are some examples: > > >>> timeit.timeit('Lock()', 'from threading import Lock') > 1.4449114807669048 > > >>> timeit.timeit('dict()') > 0.2821554294221187 > > >>> timeit.timeit('list()') > 0.17358153222312467 > > > Granted, I know these types do not allocate a lock - but a similarly > defined user type would need to allocate a lock to guarantee > thread-safety. Sure, but I think you're timing the wrong thing here. You would only allocate the lock relatively rarely - it's the overhead of *acquiring* the lock that's the real problem. rfk(a)durian:~$ python -m timeit -s "from threading import Lock; l = Lock()" "l.acquire(); l.release()" 1000000 loops, best of 3: 0.378 usec per loop Compare to one of my favourite dict methods, setdefault: rfk(a)durian:~$ python -m timeit -s "d = dict()" "d.setdefault('hello','world')" 10000000 loops, best of 3: 0.189 usec per loop In the same ballpark really. > I have not done a great deal of research into lockless algorithms > thusfar, but would it be worthwhile to investigate implementing some > of them in python to optimize performance critical sections of code? > > Many lockless algorithms that I have looked at thusfar require a > "Compare and Swap" operation. Does python have an equivalent to this? > Is python high level enough that it has something better than this or > it simply doesn't need it? I've managed to avoid locking in some cases by using dict.setdefault() as a kind of atomic test-and-set operation. It's not a full compare-and-swap but you could implement a simple locking scheme using it like this: class PretendLock: def __init__(self): self.d = dict() def acquire(self): myid = threading.currentThread().ident while self.d.setdefault("owner",myid) != myid: pass def release(self): del self.d["owner"] This depends on some peculiarities of the CPython implementation that aren't guaranteed by the language spec - namely that thread switches can't happen while executing C code, that dict.setdefault is implemented in C, and that is can calculate the hash of a string key without calling back out to python code. Usually, it's not worth it. The one time I use this regularly is for lazy object initialisation, e.g. something like this: def get_my_object(key): try: return objects[key] except KeyError: return objects.setdefault(key,create_my_object(key)) If two threads happen to try initialising the object at the same time, only one will wind up in the dict and being used; the other will be discarded and garbage-collected. Cheers, Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
From: Ryan Kelly on 28 Jun 2010 21:35 On Mon, 2010-06-28 at 18:27 -0700, Zac Burns wrote: > > > > I've managed to avoid locking in some cases by using > dict.setdefault() as a kind of atomic test-and-set operation. > It's not a full compare-and-swap but you could implement a > simple locking scheme using it like this: > > > class PretendLock: > def __init__(self): > self.d = dict() > def acquire(self): > myid = threading.currentThread().ident > while self.d.setdefault("owner",myid) != myid: > pass > def release(self): > del self.d["owner"] > > > This depends on some peculiarities of the CPython > implementation that aren't guaranteed by the language spec - > namely that thread switches can't happen while executing C > code, that dict.setdefault is implemented in C, and that is > can calculate the hash of a string key without calling back > out to python code. > > > Thanks Ryan, > > Agreed, I was timing the wrong thing - but largely because I wasn't > sure what to time against. setdefault certainly seems to be a > candidate for something to time. Even so, setdefault wins. > > Though, I wouldn't want to use PretendLock in this case... it doesn't > seem that having the spin loop would justify reducing reducing the > lock allocation overhead in a no-contention scenario. Oh totally - nobody reading this should even consider using that PretendLock class for anything. It's just an example of how to use setdefault as a test-and-set kind of operator. Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
From: sturlamolden on 28 Jun 2010 22:44 > > Many lockless algorithms that I have looked at thusfar require a > > "Compare and Swap" operation. Does python have an equivalent to this? > > Is python high level enough that it has something better than this or > > it simply doesn't need it? Python does have a GIL, and contrary to the title, this has a lot to do with the GIL. Specifically, any set of operations protected by the GIL are atomic in the CPython interpreter. We can therefore get global synchronization 'for free', allowing 'lock free' data structures that avoid using threading.Lock, and replaces it with, well, nothing. :) How to do it (it takes tiny amounts of C or Cython): Take at look at page 14: http://www.dabeaz.com/python/UnderstandingGIL.pdf The critical part is the variable _Py_Ticker in Python's source file ceval.c, which is declared volatile int. As it is not declared static it is a global variable, and we can actually manipulate its value from a C extension: extern volatile int _Py_Ticker; // evil!!! Now imagine temporarily setting _Py_Ticker to some ridiculous high value like 0x7fffffff. That locks out all other threads, practically forever, until we restore _Py_Ticker back to it's original value. What this gives us is global synchronization for free! The GIL is there anyway, so it adds no additional overhead. By messing with _Py_Ticker we can force blocks of Python code to become atomic in the interpreter. With this hack we can therefore create data structures that are lock free relative to the CPython interpreter. It is evil, but very cheap compared to using threading.Lock. In fact it has exactly zero overhead: the GIL is there anyway, and it is already acquired! The safest placement of a _Py_Ticker hack is in a context manager. We have to make sure it gets restored to a sane value, even in presence of an exception. Enjoy :) P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would do the trick, you are actually not that smart. It will lock our own thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we have to attack _Py_Ticker from C. P.P.S. And when we have taken control over the GIL, don't do anything that might release it from C (like reading from a file). We cannot afford to loose it :)
From: sturlamolden on 28 Jun 2010 22:45 > > Many lockless algorithms that I have looked at thusfar require a > > "Compare and Swap" operation. Does python have an equivalent to this? > > Is python high level enough that it has something better than this or > > it simply doesn't need it? Python does have a GIL, and contrary to the title, this has a lot to do with the GIL. Specifically, any set of operations protected by the GIL are atomic in the CPython interpreter. We can therefore get global synchronization 'for free', allowing 'lock free' data structures that avoid using threading.Lock, and replaces it with, well, nothing. :) How to do it (it takes tiny amounts of C or Cython): Take at look at page 14: http://www.dabeaz.com/python/UnderstandingGIL.pdf The critical part is the variable _Py_Ticker in Python's source file ceval.c, which is declared volatile int. As it is not declared static it is a global variable, and we can actually manipulate its value from a C extension: extern volatile int _Py_Ticker; // evil!!! Now imagine temporarily setting _Py_Ticker to some ridiculous high value like 0x7fffffff. That locks out all other threads, practically forever, until we restore _Py_Ticker back to it's original value. What this gives us is global synchronization for free! The GIL is there anyway, so it adds no additional overhead. By messing with _Py_Ticker we can force blocks of Python code to become atomic in the interpreter. With this hack we can therefore create data structures that are lock free relative to the CPython interpreter. It is evil, but very cheap compared to using threading.Lock. In fact it has exactly zero overhead: the GIL is there anyway, and it is already acquired! The safest placement of a _Py_Ticker hack is in a context manager. We have to make sure it gets restored to a sane value, even in presence of an exception. Enjoy :) P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would do the trick, you are actually not that smart. It will lock our own thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we have to attack _Py_Ticker from C. P.P.S. And when we have taken control over the GIL, don't do anything that might release it from C (like reading from a file). We cannot afford to loose it :)
From: Ryan Kelly on 28 Jun 2010 23:11
On Mon, 2010-06-28 at 19:45 -0700, sturlamolden wrote: > > > Many lockless algorithms that I have looked at thusfar require a > > > "Compare and Swap" operation. Does python have an equivalent to this? > > > Is python high level enough that it has something better than this or > > > it simply doesn't need it? > > Python does have a GIL, and contrary to the title, this has a lot to > do with the GIL. Specifically, any set of operations protected by the > GIL are atomic in the CPython interpreter. > > We can therefore get global synchronization 'for free', allowing 'lock > free' data structures that avoid using threading.Lock, and replaces it > with, well, nothing. :) > > How to do it (it takes tiny amounts of C or Cython): > > Take at look at page 14: > > http://www.dabeaz.com/python/UnderstandingGIL.pdf > > The critical part is the variable _Py_Ticker in Python's source file > ceval.c, which is declared volatile int. As it is not declared static > it is a global variable, and we can actually manipulate its value from > a C extension: > > extern volatile int _Py_Ticker; // evil!!! > > Now imagine temporarily setting _Py_Ticker to some ridiculous high > value like 0x7fffffff. That locks out all other threads, practically > forever, until we restore _Py_Ticker back to it's original value. > > What this gives us is global synchronization for free! The GIL is > there anyway, so it adds no additional overhead. Very interesting idea. Will it work if accessed through ctypes? ticker = ctypes.c_int.in_dll(ctypes.pythonapi,"_Py_Ticker") ticker.value = 0x7fffffff Or does ctypes muck with the GIL in a way that would break this idea? Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan(a)rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details |