Proposing a new module: Parallel::Loops [Perl]


From: Peter Valdemar Mørch on 22 Jun 2010 14:28

perldoc perlmodlib suggests posting here before posting on CPAN, so
here goes:

I have a new module that I'd like to upload: Parallel::Loops, and
following is the bulk of the synopsis. Is the Parallel::Loops name
appropriate and does anybody have any comments on it before I post it
on CPAN?
Its repository can be found here (code, complete perldoc, etc.)
http://github.com/pmorch/perl-Parallel-Loops

Synopsis:

use Parallel::Loops;

my $maxProcs = 5;
my $pl = new Parallel::Loops($maxProcs);

my @input = ( 0 .. 9 );

my %output;
$pl->tieOutput( \%output );

$pl->foreach(
    \@input,
    sub {
        # This sub is "magically" executed in parallel in forked child
        # processes

        # Let's just create a simple example, but this could be a
        # massive calculation that will be parallelized, so that
        # $maxProcs different processes are calculating sqrt
        # simultaneously for different values of $_ on different CPUs

        $output{$_} = sqrt($_);
    }
);

From: Chris Nehren on 24 Jun 2010 12:20

On 2010-06-22, Peter Valdemar Mørch scribbled these curious markings:
> perldoc perlmodlib suggests posting here before posting on CPAN, so
> here goes:

I find it quaint that some people still follow that guideline. Most
folks just upload.

> I have a new module that I'd like to upload: Parallel::Loops, and
> following is the bulk of the synopsis. Is the Parallel::Loops name
> appropriate and does anybody have any comments on it before I post it
> on CPAN?

How does this differ from e.g. Coro or other similar modules?

> my $pl = new Parallel::Loops($maxProcs);

Indirect object syntax considered harmful:
http://www.shadowcat.co.uk/blog/matt-s-trout/indirect-but-still-fatal/

>
> my @input = ( 0 .. 9 );
>
> my %output;
> $pl->tieOutput( \%output );

Why are you using tie here?

--
Thanks and best regards,
Chris Nehren
Unless noted, all content I post is CC-BY-SA.

From: Peter Valdemar Mørch on 24 Jun 2010 17:56

On Jun 24, 6:20 pm, Chris Nehren <apei...(a)isuckatdomains.net.invalid>
wrote:
> How does this differ from e.g. Coro or other similar modules?

It differs from Coro especially because there are several processes
involved in Parallel::Loops. Each iteration of the loop runs in
its own process, in parallel. Whereas Coro::Intro has:

> only one thread ever has the CPU, and if another thread wants
> the CPU, the running thread has to give it up

, the idea behind Parallel::Loops is exactly to make it easy to use
several CPUs in what resembles code for one CPU.

> > my %output;
> > $pl->tieOutput( \%output );
>
> Why are you using tie here?

Hmm... I thought the idea would be more obvious than it apparently
is...

Outside the $pl->foreach() loop, we're running in the parent process.
Inside the $pl->foreach() loop, we're running in a child process.
$pl->tieOutput is actually the raison d'etre of Parallel::Loops. When the
child process has a result, it stores it in %output (which is tied
with Tie::Hash behind the scenes in the child process).
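
As a rough illustration of the tie mechanism described above, here is a minimal sketch of a tied hash that behaves like a normal hash but remembers which keys were written. The class name RecordingHash and its written_keys method are invented for this example; this is not the actual Parallel::Loops code. It uses only the core Tie::Hash module.

```perl
use strict;
use warnings;

# RecordingHash is a hypothetical class for illustration only.
package RecordingHash;
require Tie::Hash;
our @ISA = ('Tie::ExtraHash');

sub TIEHASH {
    my ($class) = @_;
    # Element 0 holds the hash data (as Tie::ExtraHash expects),
    # element 1 holds the set of keys that have been stored.
    return bless [ {}, {} ], $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    $self->[1]{$key} = 1;         # remember that this key was written
    $self->[0]{$key} = $value;    # store the value as usual
}

sub written_keys {
    my ($self) = @_;
    return sort keys %{ $self->[1] };
}

package main;

my %output;
my $recorder = tie %output, 'RecordingHash';
$output{$_} = sqrt($_) for 4, 9;
print join(',', $recorder->written_keys), "\n";
```

A child process that knows which keys it wrote only has to send those entries back, rather than the whole hash.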

Behind the scenes, when the child process exits, it sends the results
(the keys written to %output) back to the parent process's version/
copy of %output, so that the user of Parallel::Loops doesn't have to
do any inter-process communication.
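
One way the "child sends its results back to the parent" step could be done by hand is with a pipe and the core Storable module. The sketch below is an assumption about the general technique, not the module's implementation; run_in_child is a hypothetical name.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Run $code in a forked child and return the hash ref it produces.
# run_in_child is a hypothetical helper used for this sketch only.
sub run_in_child {
    my ($code) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute, serialize the result, write it to the pipe, exit.
        close $reader;
        binmode $writer;
        print {$writer} freeze($code->());
        close $writer;
        exit 0;
    }

    # Parent: read everything the child wrote, then reap the child.
    close $writer;
    binmode $reader;
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return thaw($frozen);
}

my %output;
my $result = run_in_child(sub {
    my %partial;
    $partial{$_} = sqrt($_) for 0, 4, 9;
    return \%partial;
});

# Merge the child's entries into the parent's copy of %output.
$output{$_} = $result->{$_} for keys %$result;
print "$_ => $output{$_}\n" for sort { $a <=> $b } keys %output;
```

The point of hiding this behind a tied %output is that the caller never sees the pipe, the serialization, or the merge.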

Perhaps the Synopsis needs to be a bit more clear on these points.

> > my $pl = new Parallel::Loops($maxProcs);
>
> Indirect object syntax considered harmful:
> http://www.shadowcat.co.uk/blog/matt-s-trout/indirect-but-still-fatal/

OK, thanks, I'll fix that
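
For readers unfamiliar with the distinction, the change amounts to calling the constructor as a class method with the arrow operator. The sketch below uses a stand-in class, My::Loops, invented for this example so that it runs without Parallel::Loops installed.

```perl
use strict;
use warnings;

# My::Loops is a stand-in class for illustration only.
package My::Loops;

sub new {
    my ($class, $max_procs) = @_;
    return bless { max_procs => $max_procs }, $class;
}

package main;

# Indirect object syntax: the parser has to guess that "new" is a
# method name rather than a function call.
#   my $pl = new My::Loops(5);

# Direct method call: no guessing involved.
my $pl = My::Loops->new(5);
print ref($pl), " with $pl->{max_procs} processes\n";
```

Applied to the synopsis, the constructor line would presumably become Parallel::Loops->new($maxProcs).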

From: Ben Morrow on 24 Jun 2010 21:16

Quoth Peter Valdemar Mørch <4ux6as402(a)sneakemail.com>:
> On Jun 24, 6:20 pm, Chris Nehren <apei...(a)isuckatdomains.net.invalid>
> wrote:
> > How does this differ from e.g. Coro or other similar modules?
>
> It differs from Coro especially because there are several processes
> involved in Parallel::Loops. Each iteration of the loop runs in
> its own process, in parallel. Whereas Coro::Intro has:
>
> > only one thread ever has the CPU, and if another thread wants
> > the CPU, the running thread has to give it up
>
> , the idea behind Parallel::Loops is exactly to make it easy to use
> several CPUs in what resembles code for one CPU.

OK; how is this different from forks and forks::shared?

Ben

From: Peter Valdemar Mørch on 25 Jun 2010 04:14

On Jun 25, 3:16 am, Ben Morrow <b...(a)morrow.me.uk> wrote:
> OK; how is this different from forks and forks::shared?

It is _much_ more similar to forks and forks::shared than to Coro.

While the forks and forks::shared APIs emulate the APIs of threads and
threads::shared (perfectly?), Parallel::Loops tries to emulate the
standard foreach and while loops as closely as possible, as in:

$pl->foreach(\@input, sub {
    $output{$_} = do_some_hefty_calculation($_);
});

All the forking, waiting for subprocesses to finish etc. is done
behind the scenes. I find that so often, I have large calculations
that need to operate on all the elements of an array or hash, that
really could be parallelized, and with this close-to-foreach syntax,
it is so easy to write and understand/read later on.
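
To give an idea of the bookkeeping that "behind the scenes" covers, here is a minimal sketch of a loop that forks one child per item while keeping at most a fixed number of children alive. It uses only core fork and wait; parallel_foreach is a hypothetical helper, and the children's results are simply discarded here since getting them back to the parent is a separate problem.

```perl
use strict;
use warnings;

# parallel_foreach is a hypothetical helper, not part of any module.
# It runs $code for each item in its own child process, keeping at most
# $max_procs children alive at once, and returns how many children ran.
sub parallel_foreach {
    my ($max_procs, $items, $code) = @_;
    my ($running, $finished) = (0, 0);

    for my $item (@$items) {
        # At the limit: block until one child exits before forking again.
        if ($running >= $max_procs) {
            wait();
            $running--;
            $finished++;
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: expose the item as $_, like a foreach body, then exit.
            local $_ = $item;
            $code->();
            exit 0;
        }
        $running++;
    }

    # Reap the children that are still running.
    while ($running > 0) {
        wait();
        $running--;
        $finished++;
    }
    return $finished;
}

my $count = parallel_foreach(3, [ 0 .. 9 ], sub { my $root = sqrt($_) });
print "ran $count children\n";
```

Writing this once per script is exactly the boilerplate the close-to-foreach syntax is meant to remove.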

I guess Parallel::Loops could have been written with forks and
forks::shared, and only provided syntactic sugar. (In fact it uses
Parallel::ForkManager and Tie::Hash/Tie::Array instead.)

Perhaps $pl->share(\%output) is a better name than
$pl->tieOutput(\%output), though. I guess now is the time to change it! ;-)

I'm impressed that you guys take the time to read and comment. Thanks!
