From: Juan R. on
John Thingstad wrote:

> On Thu, 11 Jan 2007 13:05:17 +0100, Juan R.
> <juanrgonzaleza(a)canonicalscience.com> wrote:
>
> Nop. The numbers are gotten from Amdahl's law.
> They are also the numbers Intel use! (180%)
> Yes, they are approximate. Obviously.

I estimated b = 35 from AMD Opteron data on dual-cores [1].

From that parameter and the topological-performance expression

N*100 - b*(N*(N-1)/2)

I got 190% for the AMD quad-core. That is close to the 180% you claim
for an Intel quad-core; with b = 36.67 for the Intel you would get the
180%.
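
In case the arithmetic is easier to follow as code, here is a minimal
Common Lisp sketch of that expression (the function name is mine; the
b values are just the ones quoted in this post):

;; Topological performance, in % of a single core, for N cores with a
;; per-pair overhead b:  N*100 - b*(N*(N-1)/2)
(defun topological-performance (n b)
  (- (* n 100)
     (* b (/ (* n (1- n)) 2))))

(topological-performance 4 35)     ; => 190   (the AMD quad-core figure)
(topological-performance 4 36.67)  ; => ~180  (the Intel figure you quote)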

If you compute the speedup from the above formula you obtain

N - (b*N/200)*(N-1)

which looks like Gustafson's law [2] with alpha = b*N/200.

The N-dependence of alpha is a function of the explicit expression b =
b(N), which depends on both the technology and the algorithm.
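
A matching sketch for the speedup form (again, the names are mine; with
b = 35 the exact rationals make the 1.9x explicit):

;; Speedup over one core: N - (b*N/200)*(N-1),
;; i.e. Gustafson's S = N - alpha*(N-1) with alpha = b*N/200.
(defun alpha (n b) (/ (* b n) 200))
(defun speedup (n b) (- n (* (alpha n b) (1- n))))

(alpha 4 35)    ; => 7/10
(speedup 4 35)  ; => 19/10, i.e. the 190% above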

For a small number of cores, both laws may provide 'equivalent'
answers, but I know of publications on further violations of Amdahl's
law for 1024 cores. I do not know of violations of Gustafson's law.

In any case, I think the main idea of the thread is that
parallelization is not good for general usage, because with four cores
one generally obtains less than double the power of a single core, and
a dual core adds less than 3/4 over a single one. This is the main
reason that 2- and 4-core Intel and AMD parts are more popular, and
economically viable, than hypothetical 8-, 16-, or 32-core units.
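
For what it is worth, the same sketch gives those dual- and quad-core
numbers with my b = 35 estimate:

(topological-performance 2 35)  ; => 165, a dual core adds only ~65%
(topological-performance 4 35)  ; => 190, a quad core stays below 200%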

About the question of LISP being used in scientific computing: I know
of no scientist or mathematician working on high-performance
algorithms (e.g. in computational chemistry) who uses LISP. I always
read Fortran code.

P.S.: What do you mean by "Nop"? A no-op?


[1]
http://www-03.ibm.com/servers/eserver/opteron/pdf/IBM_dualcore_whitepaper.pdf

[2] http://en.wikipedia.org/wiki/Gustafson%27s_law

From: Tim Bradshaw on
Pascal Bourguignon wrote:

> You've probably already seen this pyramid with the registers in the
> top corner, above layers of memories, L1, L2 and now L3, the RAM, the
> HD, the tapes, etc. We could also add layers for the Internet and the
> physical world.

Thanks. I'm reasonably familiar with computer architecture.

From: Pascal Bourguignon on
"Tim Bradshaw" <tfb+google(a)tfeb.org> writes:

> Pascal Bourguignon wrote:
>
>> You've probably already seen this pyramid with the registers in the
>> top corner, above layers of memories, L1, L2 and now L3, the RAM, the
>> HD, the tapes, etc. We could also add layers for the Internet and the
>> physical world.
>
> Thanks. I'm reasonably familiar with computer architecture.

Well, my point was that it doesn't matter much whether the memory near
the processor is cache memory or normal RAM; the only difference is
who fills it, the MMU or the OS.
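
If a concrete illustration helps, here is a rough, machine-dependent
sketch: the arithmetic in both calls is identical, only the access
pattern changes, so any timing difference comes from which layer of
the pyramid the fetches land in.

;; Sum a large vector once sequentially (cache-friendly) and once with
;; a large stride (cache-hostile).  Same work, different access order.
(defun walk (vector step)
  (let ((n (length vector))
        (sum 0))
    (dotimes (i step sum)
      (loop for j from i below n by step
            do (incf sum (aref vector j))))))

(let ((v (make-array (* 8 1024 1024) :element-type 'fixnum
                     :initial-element 1)))
  (time (walk v 1))       ; mostly served from the cache levels
  (time (walk v 4096)))   ; mostly misses out to RAM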

--
__Pascal Bourguignon__ http://www.informatimago.com/

"Indentation! -- I will show you how to indent when I indent your skull!"
From: Tim Bradshaw on
George Neuner wrote:

> Well, on Cells the private memories are not cache but staging memories
> ... the main processor has to move data into and out of them on behalf
> of the coprocessors. It's very similar to the multi-level memory
> system used on the old Crays where the CPU had to fetch and organize
> data to feed the array processors and store the results back to the
> shared main memory.

It doesn't matter very much who moves the data, it's still cache :-).
The issue that counts, really, is what the programming model is at the
user level. No one should need to care whether things are done
automagically by the hardware as most L1/L2 caches are today, or by
hardware with substantial SW support as, say, MMUs, or almost entirely
by SW with some small amount of HW support, as, say, disk paging.
(Actually, the second thing that counts is whether the HW can
efficiently support the programming model you choose.)

>
> AFAIK, no one has tried to offer a hardware solution to staging
> computations in a distributed memory system since the KSR1 (circa
> 1990, which failed due to the company's creative bookkeeping rather
> than the machine's technology). Everyone now relies on software
> approaches like MPI and PVM.

Well, I think they have actually, in all but name: that's essentially
what NUMA machines are. Such machines are quite common, of course
(well, for bigger systems anyway): all Sun's recent larger machines (4
& 5-digit sunfire boxes) are basically NUMA, and it may be that smaller
ones are too.

Of course, as I said above, this comes down to programming model and
how much HW support you need for it. I think the experience of the
last 10-20 years is that a shared memory model (perhaps "shared address
space"?), preferably with cache-coherency, is a substantially easier
thing to program for than a distributed memory model. Whether that will
persist, who knows (I suspect it will, for a surprisingly long time).
Of course the physical memory that underlies this model will become
increasingly distributed, as it already has to a great extent.
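
As a toy illustration of why that model is pleasant (only a sketch,
and it assumes the portable but non-standard bordeaux-threads library
is loaded): both workers just write into the same vector and let the
coherency hardware move the data underneath them.

;; Two threads fill halves of one shared vector; the programmer writes
;; no explicit data movement or message passing.
(defun parallel-fill (vector)
  (let* ((n (length vector))
         (half (floor n 2))
         (worker (lambda (start end)
                   (loop for i from start below end
                         do (setf (aref vector i) (* i i))))))
    (mapc #'bt:join-thread
          (list (bt:make-thread (lambda () (funcall worker 0 half)))
                (bt:make-thread (lambda () (funcall worker half n)))))
    vector))

(parallel-fill (make-array 1000000 :element-type 'fixnum))

In a distributed-memory setting each worker would instead get its own
copy of half the data, and the results would have to be gathered back
explicitly, which is exactly the extra bookkeeping that MPI-style code
ends up doing.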

--tim

From: Alex Mizrahi on
(message (Hello 'Madhu)
(you :wrote :on '(Fri, 12 Jan 2007 09:18:17 +0530))
(

M> Your stopping criterion will have to be different. Also, if your
M> input is not a tree, this algorithm will expand the same node multiple
M> times. This [inefficiency] can be done in parallel, of course :)

M> Which is why order tends to be important in DFS, and why it is
M> unsuitable for decomposition.

and how is order important, and how will it help you find circularities?

i've checked whether i'm just being dumb by reading the wikipedia article.
didn't find anything new about 'important order' -- they offer two
approaches for graphs with cycles: remembering which nodes were already
traversed, and iterative deepening (if the graph is very large).
both of these approaches will work for parallel DFS.
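
e.g. a minimal sketch of the first approach (the graph representation --
a hash table from node to neighbour list -- is just my assumption):

;; plain DFS over a graph that may contain cycles: the visited table
;; guarantees each node is expanded at most once, which is what stops
;; the circularities.
(defun dfs (graph start visit-fn
            &optional (visited (make-hash-table :test #'equal)))
  (unless (gethash start visited)
    (setf (gethash start visited) t)
    (funcall visit-fn start)
    (dolist (next (gethash start graph))
      (dfs graph next visit-fn visited)))
  visited)

(let ((g (make-hash-table :test #'equal)))
  (setf (gethash 'a g) '(b c)
        (gethash 'b g) '(a c)   ; cycles back to A
        (gethash 'c g) '(a))
  (dfs g 'a #'print))           ; prints A B C, then terminates

a parallel version hands disjoint branches (or a shared, locked visited
table) to separate workers.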

btw, you can find articles about parallel DFS and BFS in google. maybe you'd
better first educate yourself before saying strange stuff?

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity")