From: Shailendra on
Hi All,
I have the following situation.
==================PSUDO CODE START==================
class holds_big_array:
    big_array = ...  # has a big array

    def get_some_element(self, cond):
        ...  # return some data from the big array
==================PSUDO CODE END====================
I wanted to use the multiprocessing module to parallelise calls to
"get_some_element". I used the following kind of code:

==================PSUDO CODE START==================
from multiprocessing import Pool

pool = Pool(processes=2)
holder = holds_big_array()  # class instantiation

def callback_f(result):
    ...  # do something with result

for args in all_args:  # loop many times; all_args stands in for the real inputs
    pool.apply_async(holder.get_some_element, args, callback=callback_f)
pool.close()
pool.join()
==================PSUDO CODE END====================
Note: I had to do something to enable the instance method to be pickled...

I tested this with a less-than-realistic size of big_array. My parallel
version runs much slower than the normal serial version (7-8 min
vs 10-20 sec). I was wondering what the reason could be. Does it have
something to do with it being an instance method, with some locking
making the other process wait for the locks? Any idea how to trace where
the program is spending its time?

Let me know if the information given is inadequate.

Thanks in advance.
Shailendra Vikas
From: Joshua Kordani on
The first thing people generally try when they want to measure how
long certain parts take is to record time snapshots in the code
themselves. Take the current time before an operation, take it again
after, subtract, and report the difference to yourself.
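
A minimal sketch of that kind of manual timing, in Python. The `timed`
helper, the small `big_array`, and the stand-in `get_some_element` are
illustrative assumptions, not the original poster's code:

```python
import time

def timed(label, func, *args):
    """Run func(*args), print how long it took, return (result, seconds)."""
    start = time.perf_counter()            # snapshot before the operation
    result = func(*args)
    elapsed = time.perf_counter() - start  # snapshot after, then subtract
    print(f"{label}: {elapsed:.6f} s")     # report it to yourself
    return result, elapsed

# Hypothetical stand-in data and lookup, only here to have something to time.
big_array = list(range(1_000_000))

def get_some_element(cond):
    # Return the first ten elements divisible by cond.
    return [x for x in big_array if x % cond == 0][:10]

result, seconds = timed("serial lookup", get_some_element, 1000)
```

Wrapping the serial call and the parallel submit-and-wait loop in the same
way should show which side the minutes are going to.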

Also, try working with an array that is actually big, so that you can
see meaningful differences in approaches.


From: John Nagle on
On 7/29/2010 11:08 AM, Shailendra wrote:
> I tested this with a less-than-realistic size of big_array. My parallel
> version runs much slower than the normal serial version (7-8 min
> vs 10-20 sec). I was wondering what the reason could be.

It's hard to tell from your "PSUDO CODE", but it looks like each
access to the "big array" involves calling another process.

Calling a function in another process is done by creating an
object to contain the request, running it through "pickle" to convert
it to a stream of bytes, sending the stream of bytes through a socket or
pipe to the other process, running the byte stream through "unpickle" to
create an object like the original one, but in a different process, and
calling a function on the newly created object in the receiving process.
This entire sequence has to be done again in reverse
to get a reply back.
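
The request half of that round trip can be sketched with "pickle" alone,
without starting any processes. This assumes a current Python 3, where a
bound method pickles together with the instance it belongs to; the
`HoldsBigArray` class and its sizes are hypothetical stand-ins, not the
original poster's code:

```python
import pickle
import time

class HoldsBigArray:
    """Hypothetical stand-in for the class in the question."""
    def __init__(self, n):
        self.big_array = list(range(n))

    def get_some_element(self, index):
        return self.big_array[index]

holder = HoldsBigArray(200_000)

# A bound method is pickled along with the object it is bound to,
# so every request carries the whole array with it.
request = pickle.dumps((holder.get_some_element, (42,)))
print(f"request size: {len(request)} bytes")

# What the receiving process has to do: unpickle, then call.
start = time.perf_counter()
func, args = pickle.loads(request)
reply = func(*args)
remote_style = time.perf_counter() - start

# The same call made locally, for comparison.
start = time.perf_counter()
local = holder.get_some_element(42)
local_style = time.perf_counter() - start

print(f"unpickle + call: {remote_style:.6f} s, local call: {local_style:.6f} s")
```

The sketch leaves out the pipe transfer and the pickled reply, so the real
cost per call would be higher than what it prints.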

This is hundreds of times slower than a call to a local function.

The "multiprocessing module" is not a replacement for thread-level
parallelism. It looks like it is, but it isn't. It's only useful for
big tasks which require large amounts of computation and little
interprocess communication. Appropriately-sized tasks to send out
to another process are things like "parse large web page" or
"compress video file", not "access element of array".
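
One possible way to get tasks of that size for the array case is sketched
below: each worker builds its own copy of the array once, and each task
scans a large slice and sends back a single number. The names
`init_worker` and `count_multiples`, the chunk size, and the "count
multiples of 7" workload are all assumptions made for illustration:

```python
from multiprocessing import Pool

_big_array = None  # per-process copy, filled in by init_worker


def init_worker(n):
    """Build (or load) the big array once in each worker process."""
    global _big_array
    _big_array = list(range(n))


def count_multiples(task):
    """One coarse task: scan a whole slice, return a single small result."""
    start, stop, divisor = task
    return sum(1 for x in _big_array[start:stop] if x % divisor == 0)


def main():
    n, chunk = 1_000_000, 250_000
    # Four large tasks instead of a million tiny ones.
    tasks = [(lo, min(lo + chunk, n), 7) for lo in range(0, n, chunk)]
    with Pool(processes=2, initializer=init_worker, initargs=(n,)) as pool:
        counts = pool.map(count_multiples, tasks)
    print("multiples of 7:", sum(counts))


if __name__ == "__main__":
    main()
```

Only the small task tuples and integer counts cross the process boundary
here, so the pickling overhead stays small next to the work done per task.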

John Nagle