From: Alex Mizrahi on
(message (Hello 'Madhu)
(you :wrote :on '(Fri, 12 Jan 2007 19:35:20 +0530))
(

M> | and how is order important, and how will it help you to find
M> | circularities? i've checked whether i'm dumb by reading the
M> | wikipedia article. didn't find anything new about 'important
M> | order' -- they offer two approaches for graphs with cycles:
M> | remembering which nodes were traversed, and iterative deepening
M> | (if the graph is very large). both of these approaches will work
M> | for parallel DFS.

M> "Remembering nodes" effectively imposes an ordering - the order
M> induced by the DFS search.

i see no connection between remembering nodes and ordering.

one obvious way would be to use a hash-table-like construct to remember
nodes. it might have a questionable performance impact, but it can be
shared among different processors, so there's no need for any ordering,
and it should work.
another obvious way would be to set marks on the nodes themselves -- and
again, there's no order involved and no issues with multiple processes.
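for illustration, here's a rough sketch in Python of what i mean (the
graph and all names are made up, just an assumption of how it could
look): two workers share one visited set, and no ordering appears
anywhere -- the test-and-mark is atomic, so a cycle can't loop forever
and no node is processed twice:

```python
# sketch: parallel traversal of a cyclic graph with a shared visited set
import threading
from queue import Queue

GRAPH = {            # a small cyclic graph: node -> neighbours
    'a': ['b', 'c'],
    'b': ['c'],
    'c': ['a'],      # cycle back to 'a'
}

visited = set()
visited_lock = threading.Lock()   # plays the role of the shared hash table
work = Queue()

def worker(order):
    while True:
        node = work.get()
        if node is None:          # sentinel: shut down this worker
            work.task_done()
            return
        with visited_lock:        # atomic test-and-mark
            seen = node in visited
            if not seen:
                visited.add(node)
        if not seen:
            order.append(node)    # "do something" with the node
            for child in GRAPH[node]:
                work.put(child)
        work.task_done()

order = []
threads = [threading.Thread(target=worker, args=(order,)) for _ in range(2)]
for t in threads:
    t.start()
work.put('a')
work.join()                       # all enqueued nodes processed
for _ in threads:
    work.put(None)                # stop the workers
for t in threads:
    t.join()

print(sorted(order))              # -> ['a', 'b', 'c'], each exactly once
```

note that nothing in there cares which worker reaches a node first.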

M> | btw, you can find articles about parallel DFS and BFS in
M> | google. maybe you'd better first educate yourself before saying
M> | strange stuff?

M> I think the theoretical limits of what I'm saying have been well
M> established for decades now. I learnt some of this stuff in a graduate
M> course in parallel algorithms.

you're saying that ordering is somehow important in DFS, but i don't see
why. can you point to such an order-sensitive DFS?
maybe you just don't recall correctly what you learned?

btw, 'decades ago' parallelization might have meant calculations on some
cluster. but we are speaking about calculations on multiple cores of one
machine -- in this case 'communication' between nodes is orders of
magnitude faster, thus algorithms that are practically infeasible on a
cluster will work very well on a multicore system. for example, the
aforementioned marking method can consume lots of traffic, but on an SMP
system its cost is about zero, since multiple cores use the same memory
(they might need to synchronize caches, but i think that cost is very low).

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity")


From: Maciek Pasternacki on
On Boomtime, Chaos 12, 3173 YOLD, Madhu wrote:

> |>> | If you want to analyse chess positions you can never have too
> |>> | much speed and it has nothing to do with rendering. I'm sure
> |>> | it's the same situation with go and many other games.
> |>>
> |>> But having more than one core will not be a benefit if your
> |>> algorithms are graph based and have to search a tree. IIRC most
> |>> graph algorithms (dfs bfs) are inherently unparallelizable.
> |>
> |> And couldn't a parallel tree search distribute subtree search
> |> between cores at each branching point?
> [...]
> | single thread would work like:
> | (loop
> |   (if *node-queue*
> |       (let ((node (dequeue *node-queue*)))
> |         (do-something-with node)
> |         (dolist (subnode (children node))
> |           (enqueue subnode *node-queue*)))
> |       (return)))
> |
> | Search would start with enqueuing root node, and would end by any
> | thread setting *node-queue* to NIL. This would be parallelizable
> | over any number of cores (supposing one doesn't care about exact DFS
> | search order -- but if one cared about order, one wouldn't have
> | considered parallelizing).
>
> Your stopping criterion will have to be different. Also, if your
> input is not a tree, this algorithm will expand the same node multiple
> times. This [inefficiency] can be done in parallel, of course :)

Yup, I posted DFS for trees for simplicity, but it can be generalized
by marking visited nodes (with a single lock -- atomically mark and
enqueue a node, and don't enqueue marked nodes).
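For instance, a sketch in Python of such a mark-and-enqueue queue (the
names and the toy graph are mine, just to illustrate the idea): the mark
is set under the lock at enqueue time, so a node can never enter the
queue twice even if several threads discover it simultaneously, and the
traversal terminates on a cyclic graph:

```python
# sketch: a queue that accepts each node at most once (mark-and-enqueue)
import threading
from collections import deque

class MarkingQueue:
    """Enqueue-once queue: test, mark and enqueue under a single lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._marked = set()
        self._queue = deque()

    def enqueue(self, node):
        with self._lock:          # atomic: check mark, set mark, enqueue
            if node in self._marked:
                return False      # already seen somewhere, drop it
            self._marked.add(node)
            self._queue.append(node)
            return True

    def dequeue(self):
        with self._lock:
            return self._queue.popleft() if self._queue else None

# a graph with a cycle; traversal terminates because no node re-enters
graph = {1: [2, 3], 2: [3], 3: [1]}
q = MarkingQueue()
q.enqueue(1)
visited = []
while (node := q.dequeue()) is not None:
    visited.append(node)
    for child in graph[node]:
        q.enqueue(child)

print(visited)   # -> [1, 2, 3]: each node once despite the 3 -> 1 cycle
```

The single-threaded driver loop above is just for demonstration; any
number of workers could share the same MarkingQueue.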

--
__ Maciek Pasternacki <maciekp(a)japhy.fnord.org> [ http://japhy.fnord.org/ ]
`| _ |_\ / { Razors pain you; rivers are damp; acids stain you; and drugs
,|{-}|}| }\/ cause cramp. Guns aren't lawful; nooses give; gas smells awful;
\/ |____/ you might as well live. } ( Dorothy Parker ) -><-
From: mark.hoemmen on
Tim Bradshaw wrote:
> It doesn't matter very much who moves the data, it's still cache :-).

Actually, for performance, it can help a lot (in the HPC world) if you
have software control of data movement. I can dig up some citations if
you are interested.

I would argue that a good programming language for HPC applications
would let programmers know when they are about to do something
expensive. Communication is way more expensive than arithmetic, no?

mfh

From: mark.hoemmen on
Rob Warnock wrote:
> Yup. Far too much of the HPC market consists of simply rerunning 1960's
> "dusty deck" codes with different inputs and larger array dimensions.

Um, not in my world ;)

1. First of all, the 1960's codes probably won't even compile, since
that's pre-Fortran 77 ;P

2. The parallel codes tend to use MPI, which is a creation of the early
90's. There is a strong push to develop parallel HPC languages that
give a shared-memory model without hiding the costs of communication.
DARPA is paying big bucks to several different companies (Sun, IBM,
Cray) to work on this.

3. The _interfaces_ might be old (BLAS, LAPACK) but are constantly
under development -- people demand new routines all the time, and
_they_get_them_.

4. The algorithms are continually evolving -- for example, if a 1960's
code solved an elliptic PDE, it was probably using some relaxation
iteration to solve the linear system. Recent codes probably use
multigrid, which is _way_ more efficient.
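For a feel of what that relaxation looks like, here's a toy sketch in
Python (mine, not any actual 1960's code): Jacobi iteration for the
simplest elliptic problem, u'' = 0 on [0,1] with u(0) = 0, u(1) = 1,
whose exact solution is u(x) = x. Note how many sweeps plain relaxation
needs even on a tiny grid -- multigrid gets the same answer in a handful
of cycles:

```python
# sketch: Jacobi relaxation for the 1-D Laplace equation u'' = 0
n = 33                       # grid points, including the two boundaries
u = [0.0] * n
u[-1] = 1.0                  # boundary conditions u(0) = 0, u(1) = 1

for sweep in range(2000):    # plain Jacobi: average the two neighbours
    new = u[:]
    for i in range(1, n - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    u = new

# compare against the exact linear solution u(x) = x
h = 1.0 / (n - 1)
err = max(abs(u[i] - i * h) for i in range(n))
print(f"max error after 2000 Jacobi sweeps: {err:.2e}")
```

The error per sweep shrinks by roughly 1 - O(h^2), which is why finer
grids make relaxation painfully slow and multigrid such a big win.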

so yeah, don't diss the HPC world, we're the ones who'll figure out how
to use all those cores that will be on your desktop in the next few
years ;P

with friendliness
mfh

From: Tim Bradshaw on
mark.hoemmen(a)gmail.com wrote:


> Actually, for performance, it can help a lot (in the HPC world) if you
> have software control of data movement. I can dig up some citations if
> you are interested.

Yes, of course, but the HPC world is an odd one. A similar argument
(and this isn't sarcasm) says that for performance it can help a lot if
you don't use a dynamic/high-level language, avoid non-array datatypes
&c &c. It can, but for most application areas there are other
considerations.

>
> I would argue that a good programming language for HPC applications
> would let programmers know when they are about to do something
> expensive. Communication is way more expensive than arithmetic, no?

Yes, I agree with this. But HPC is, as I said, odd (though I'm
interested in it). For most applications you want to have some idea of
what the performance model of the machine is like (which is beyond the
vast majority of programmers for a start), to write code which (if
performance is an issue, which very often it is not) should sit well
with that model, and then to let the machine (in the `compiler, HW
and OS' sense) do most of the boring work of, for instance, making sure
memory is local to threads &c &c.

I can see that I've now got completely sidetracked. My original point
in this thread was that multicore systems will end up on desktops for
reasons which have rather little to do with performance, and now I'm
talking about HPC :-) I will go and devour some minions.

--tim