From: Stefan Bellon on
Hi all!

Although I am quite familiar with Ada, tasking is quite new to me, so
please bear with me with my question. ;-)

I have a set of buckets where I do have to do some processing on them.
This processing can be done in parallel for each bucket. The results
of the processing however are accumulated into another data structure.

At present I have task types for the processing of the buckets, an
access type to the task type and basically just do:

declare
type Task_Access is access all Task_Type;
My_Task : Task_Access;
begin
for I in Buckets'Range loop
My_Task := new Task_Type'(Buckets (I));
end loop;
end;

The result data structure is a protected object with an entry Add that
adds the processing result to the container and which is called by the
task body.

All is well so far.

However the number of buckets may be quite large and I have the feeling
that the context switching which is needed for say 1000 or more tasks
eats up the gain of the parallelism. At least in my test cases I do not
gain anything at all on a Core Duo system.

Therefore I have the idea of just starting N tasks in parallel (where N
can be specified by the user e.g. according to the number of CPU cores
of the machine) instead of tasks for all buckets at once.

Starting N tasks, then waiting for them to get all finished and only
then starting the next N tasks is not difficult. But how would I do it,
so that there are always N tasks running (apart of course when
everything has been processed) and that a new tasks is starting on the
next bucket as soon as a task on a previous bucket has finished?

Any ideas are very welcome!

--
Stefan Bellon
From: Randy Brukardt on
"Stefan Bellon" <sbellon(a)sbellon.de> wrote in message
news:20070418201307.18a85fd9(a)cube.tz.axivion.com...
....
> Therefore I have the idea of just starting N tasks in parallel (where N
> can be specified by the user e.g. according to the number of CPU cores
> of the machine) instead of tasks for all buckets at once.
>
> Starting N tasks, then waiting for them to get all finished and only
> then starting the next N tasks is not difficult. But how would I do it,
> so that there are always N tasks running (apart of course when
> everything has been processed) and that a new tasks is starting on the
> next bucket as soon as a task on a previous bucket has finished?
>
> Any ideas are very welcome!

I'd suggest organizing the tasks as workers and the buckets as jobs. The
idea is that each task is a loop that just gets a job (a bucket in your
case), processes it, sends the result, and gets another job, etc. You just
need a protected data structure to serve the jobs, and an array of worker
tasks, and then you can easy vary N to any value you want to try.

Randy.


From: Alex R. Mosteo on
Stefan Bellon wrote:

> Hi all!
>
> Although I am quite familiar with Ada, tasking is quite new to me, so
> please bear with me with my question. ;-)
>
> I have a set of buckets where I do have to do some processing on them.
> This processing can be done in parallel for each bucket. The results
> of the processing however are accumulated into another data structure.
>
> At present I have task types for the processing of the buckets, an
> access type to the task type and basically just do:
>
> declare
> type Task_Access is access all Task_Type;
> My_Task : Task_Access;
> begin
> for I in Buckets'Range loop
> My_Task := new Task_Type'(Buckets (I));

Be aware that this causes heap consumption. That is, a finished task doesn't
just vanishes, it has some heap used for task context that is not
automatically released. To properly free a task access you must ensure that
the task is terminated (My_Task'Terminated) before doing an
Unchecked_Deallocation.

If you try to free an unfinished task, the call won't fail but the task
context memory won't be freed...

I have only GNAT experience though.

> end loop;
> end;
>
> The result data structure is a protected object with an entry Add that
> adds the processing result to the container and which is called by the
> task body.
>
> All is well so far.
>
> However the number of buckets may be quite large and I have the feeling
> that the context switching which is needed for say 1000 or more tasks
> eats up the gain of the parallelism. At least in my test cases I do not
> gain anything at all on a Core Duo system.
>
> Therefore I have the idea of just starting N tasks in parallel (where N
> can be specified by the user e.g. according to the number of CPU cores
> of the machine) instead of tasks for all buckets at once.
>
> Starting N tasks, then waiting for them to get all finished and only
> then starting the next N tasks is not difficult. But how would I do it,
> so that there are always N tasks running (apart of course when
> everything has been processed) and that a new tasks is starting on the
> next bucket as soon as a task on a previous bucket has finished?
>
> Any ideas are very welcome!
>

From: Jeffrey R. Carter on
Randy Brukardt wrote:
>
> I'd suggest organizing the tasks as workers and the buckets as jobs. The
> idea is that each task is a loop that just gets a job (a bucket in your
> case), processes it, sends the result, and gets another job, etc. You just
> need a protected data structure to serve the jobs, and an array of worker
> tasks, and then you can easy vary N to any value you want to try.

In this case, I'd say all that's needed is a protected object to supply
the index into the Buckets array of the next bucket needing processing.

--
Jeff Carter
"Blessed is just about anyone with a vested interest in the status quo."
Monty Python's Life of Brian
73
From: Leif Holmgren on

Randy Brukardt wrote:
> "Stefan Bellon" <sbellon(a)sbellon.de> wrote in message
>>
>>Starting N tasks, then waiting for them to get all finished and only
>>then starting the next N tasks is not difficult.
....

> and an array of worker
> tasks, and then you can easy vary N to any value you want to try.

If my memory does not fail me the advice to use an array here is double
good.

First of all you don't need to bother with the nitty gritty details of
dynamic allocation yourself.

Secondly and perhaps most important Ada will handle the synchronization
of task termination for you automatically. It will not allow the array
to go out of scope before all the tasks are completed.

Years ago I implemented such a system (using dynamically allocated
tasks) and it worked very well. By doing as suggested you will gain
maximum performance even if the buckets take different time to process.

/Leif