From: Eugene Miya on
>>>X Unix,
>>>Presentation Manager

In article <KDAfh.15787$HV6.7154(a)newsfe1-gui.ntli.net>,
ChrisQuayle <nospam(a)devnul.co.uk> wrote:
>I would have thought graphics work was an ideal match for parallel
>processing. Modifying or other manipulation of on screen images etc.

Some graphics, synchronous graphics, can in some cases be considered
embarrassingly parallel.

But how much do you know about graphics?
The problem for basic machine architectures is that it's not just 2-d.
It's not just your screen or windowing systems. Images are merely the
2-d representation of 3-d and even more complex worlds.

Business graphics is one thing. Games are another, scientific analysis
is another, and motion pictures and TV series are yet another.
The scale of the resolution may mean that an object being rendered
takes many cycles merely to resolve to 1 pixel, or to be determined
to be a back-facing polygon and not rendered at all.
Those cycles still have to be expended in some cases.
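
For instance, a minimal C sketch (invented names, not from any real
renderer) of the back-face test just mentioned; the test itself still
costs cycles even when the polygon is never drawn:

/* Cull a polygon whose normal points away from the viewer. */
typedef struct { float x, y, z; } vec3;

static float dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* view: direction from the eye to the polygon; n: polygon normal.
   A positive dot product means the face points away, so skip it. */
static int back_facing(vec3 view, vec3 n)
{
    return dot(view, n) > 0.0f;
}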


>X is a system killer though, not because of bad design, it is quite
>elegant, but because windowing systems of any kind are very compute
>intensive and need lots of memory for the higher resolutions. Having
>programmed X and written a simple windowing gui for embedded work, I
>speak from experience. The amount of code that needs to be executed just
>to get a window onscreen is quite substantial. Ok, only a few system
>calls at api level, but the devil is all in the internals...

Quite true.
Lots of detail huh?

I got out of graphics almost 2 decades ago. I ran a SIGGRAPH chapter.
Each community has its own interests (biases) and wants talented people
to work on graphics not parallel programming (sorry guys). My
succeeding chapter (they want me back) just had a talk on the death of
ray tracing. Yeah right. 8^)

But to some firms, that's not where the business is.

--
From: Nick Maclaren on

In article <KDAfh.15787$HV6.7154(a)newsfe1-gui.ntli.net>,
ChrisQuayle <nospam(a)devnul.co.uk> writes:
|>
|> X is a system killer though, not because of bad design, it is quite
|> elegant, ...

Heaven help us! X's design is utterly ghastly from start to finish,
though most of the aspects of that are irrelevant to this thread.
However, two of them are relevant.

The way that keyboard and mouse interactions have to be sent the
whole way back to the application before they take effect forces
c. 10 context switches, many more page touches / TLB misses, and
innumerable pipeline drains. NeWS was a lot better.
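
Consider even a minimal Xlib client (a sketch, with error handling
omitted): every key press must be delivered over the connection to the
client process before anything at all can happen on screen:

#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (dpy == NULL)
        return 1;
    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 200, 100, 1,
                                     BlackPixel(dpy, DefaultScreen(dpy)),
                                     WhitePixel(dpy, DefaultScreen(dpy)));
    XSelectInput(dpy, win, KeyPressMask | ButtonPressMask);
    XMapWindow(dpy, win);
    for (;;) {
        XEvent ev;
        /* blocks until the server forwards the event to this process */
        XNextEvent(dpy, &ev);
        /* only now, after the full round trip, can the client react */
        if (ev.type == KeyPress)
            printf("key press reached the client\n");
    }
}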

The required lack of orthogonality between windows forces unnecessary
serialisation, though it is primarily an ergonomic and security problem.
But, even in theory, X windows can't be handled independently. ISO VT
ones could.


Regards,
Nick Maclaren.
From: ChrisQuayle on
Eugene Miya wrote:
>
> But how much do you know about graphics?
> The problem for basic machine architectures is that it's not just 2-d.
> It's not just your screen or windowing systems. Images are merely the
> 2-d representation of 3-d and even more complex worlds.
>
> Business graphics is one thing. Games are another, scientific analysis
> is another, and motion pictures and TV series are yet another.
> The scale of the resolution may mean that an object being rendered
> takes many cycles merely to resolve to 1 pixel, or to be determined
> to be a back-facing polygon and not rendered at all.
> Those cycles still have to be expended in some cases.

I probably don't know that much about graphics, even after accumulating
half a dozen or so books on the subject and writing low level drivers,
primitives etc. I guess it's something that you have to specialise
in, otherwise you never master all the detail. What I was surprised by,
programming low level embedded graphics drivers (typically, a dumb frame
buffer in memory), is the amount of code required just to put a line of
pixels onscreen. The number of bit shifts and/or masks etc needed to take
account of the alignment overlap between memory and the input raster line
is quite substantial on its own. Add greyscale, colour, objects, 3d etc
and all the translate layers to convert various standards and the code
just mushrooms. One thing that I did notice was that much graphics code
seems to use floats, when scaled integers, table lookups etc could
possibly ease the compute overhead. That's an embedded head talking
though, and may not be relevant for a workstation that uses dedicated
hardware for the low level stuff and has far more resources to start
with. The sum of my experience with graphics is that some problems are
just plain complex and cannot be simplified beyond a certain level.
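
To make that concrete, here is a minimal sketch in C (the layout and
names are invented, not from any real driver) of filling a horizontal
run of pixels in a 1 bit-per-pixel frame buffer; the partial bytes at
each end are where the shifts and masks pile up:

#include <stdint.h>

#define FB_STRIDE 80                 /* bytes per scan line (640 pixels) */
static uint8_t fb[FB_STRIDE * 480];  /* dumb frame buffer in memory */

/* Set 'len' pixels starting at (x, y); pixels are MSB first. */
void hline(int x, int y, int len)
{
    uint8_t *p = fb + (y * FB_STRIDE) + (x / 8);
    int first  = x & 7;                       /* bit offset in first byte */
    int last   = (x + len) & 7;               /* bits used in final byte  */
    int nbytes = (x + len + 7) / 8 - (x / 8); /* bytes touched            */

    if (nbytes == 1) {   /* run starts and ends within a single byte */
        *p |= (uint8_t)((0xFFu >> first) &
                        (last ? 0xFFu << (8 - last) : 0xFFu));
        return;
    }
    *p++ |= (uint8_t)(0xFFu >> first);        /* partial leading byte */
    for (int i = 1; i < nbytes - 1; i++)
        *p++ = 0xFF;                          /* whole middle bytes   */
    *p |= (uint8_t)(last ? 0xFFu << (8 - last) : 0xFFu); /* trailing */
}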

>
>>X is a system killer though, not because of bad design, it is quite
>>elegant, but because windowing systems of any kind are very compute
>>intensive and need lots of memory for the higher resolutions. Having
>>programmed X and written a simple windowing gui for embedded work, I
>>speak from experience. The amount of code that needs to be executed just
>>to get a window onscreen is quite substantial. Ok, only a few system
>>calls at api level, but the devil is all in the internals...
>
>
> Quite true.
> Lots of detail huh?
>
> I got out of graphics almost 2 decades ago. I ran a SIGGRAPH chapter.
> Each community has its own interests (biases) and wants talented people
> to work on graphics not parallel programming (sorry guys). My
> succeeding chapter (they want me back) just had a talk on the death of
> ray tracing. Yeah right. 8^)
>
> But that's not business to some firms.
>

Agreed - elegant solutions don't always sell product :-(...

Chris

From: Eugene Miya on
In article <rSFfh.1243$v4.405(a)newsfe3-win.ntli.net>,
ChrisQuayle <nospam(a)devnul.co.uk> wrote:
>I probably don't know that much about graphics, even after accumulating
>half a dozen or so books on the subject and writing low level drivers,
Whose books? Foley and Van Dam, etc.?
Newman and Sproull?
>primitives etc. I guess it's something that you have to specialise
>in, otherwise you never master all the detail. What I was surprised by,
This is true.
>programming low level embedded graphics drivers (typically, a dumb frame
>buffer in memory), is the amount of code required just to put a line of
>pixels onscreen. The number of bit shifts and/or masks etc needed to take
Are these pixels aliased or anti-aliased?
>account of the alignment overlap between memory and the input raster line
>is quite substantial on its own. Add greyscale, colour, objects, 3d etc
>and all the translate layers to convert various standards and the code
>just mushrooms. One thing that I did notice was that much graphics code
>seems to use floats, when scaled integers, table lookups etc could
>possibly ease the compute overhead. That's an embedded head talking
>though, and may not be relevant for a workstation that uses dedicated
>hardware for the low level stuff and has far more resources to start
>with. The sum of my experience with graphics is that some problems are
>just plain complex and cannot be simplified beyond a certain level.

You are admirably honest and provide sufficient context.
A fair number of early graphics people did indeed rail against floats
when I started.
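
As a hypothetical illustration (the format and names are invented) of
the scaled-integer style Chris means: stepping a value across a span in
16.16 fixed point instead of floating point.

#include <stdint.h>

typedef int32_t fix16;                   /* 16.16 fixed point          */
#define TO_FIX(x)   ((fix16)((x) * 65536))
#define FROM_FIX(x) ((int)(((x) + 0x8000) >> 16))  /* round to nearest */

/* Interpolate y from y0 to y1 across x0..x1 with no floats at all. */
void span(int x0, int x1, int y0, int y1, void (*plot)(int x, int y))
{
    fix16 y  = TO_FIX(y0);
    fix16 dy = (x1 > x0) ? TO_FIX(y1 - y0) / (x1 - x0) : 0;
    for (int x = x0; x <= x1; x++, y += dy)
        plot(x, FROM_FIX(y));
}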

>>>X is a system killer
>> succeeding chapter (they want me back) just had a talk on the death of
>> ray tracing. Yeah right. 8^)
>
>Agreed - elegant solutions don't always sell product :-(...

I have friends who are looking to hire graphics talent, but they are
finding that the talent pool is drying up because the perception is that
graphics is a "solved" problem. They are not even finding it in India
and China. The normal computing channels have guys who think they don't
know graphics and art, and they have more than enough graphics arts people,
but fewer and fewer in the technical coding aspects.

--
From: Bill Todd on
Eugene Miya wrote:
> In article <yeadnXYnjLbPXuDYnZ2dnUVZ_tyinZ2d(a)metrocastcablevision.com>,
> Bill Todd <billtodd(a)metrocast.net> wrote:
>> Eugene Miya wrote:
>>> In article <_oOdneG2v-ejCODYnZ2dnUVZ_rOqnZ2d(a)metrocastcablevision.com>,
>>> Bill Todd <billtodd(a)metrocast.net> wrote:
>>>> Del Cecchi wrote:
>>>> And Threads? Aren't
>>>>> they just parallel sugar on a serial mechanism?
>>>> Not when each is closely associated with a separate hardware execution
>>>> context.
>>> Threads are just lightweight processes.
>> Irrelevant to what you were purportedly responding to.
>>
>>> Most people don't see the baggage which gets copied when an OS like Unix
>>> forks(2). And that fork(2) is light weight compared to the old style VMS
>>> spawn and the IBM equivalents.
>> Also irrelevant to what you were purportedly responding to.
>>
>> When (as I said, but you seem to have ignored) each thread is closely
>> associated with a *separate* hardware execution context, it's simply the
>> software vehicle for using that execution context in parallel with other
>> execution contexts.
>
> It's completely relevant.

That is not yet in evidence.

> What do you think hardware context is?

The hardware execution engine for a single stream of code (what is
conventionally referred to as a uniprocessor).

>
>>>> And when multiple threads are used on a single hardware
>>>> execution context to avoid explicitly asynchronous processing (e.g., to
>>>> let the processor keep doing useful work on something else while one
>>>> logical thread of execution is waiting for something to happen - without
>>>> disturbing that logical serial process flow), that seems more like
>>>> serial sugar on a parallel mechanism to me.
>>> Distributed memory or shared memory?
>> Are you completely out to lunch today? Try reading what I said again.
>
> I did.

Then you obviously need to try again.

> And you've never used parallel machines?

That would depend upon the definition of such. I've been writing
multi-threaded code for 30 years on uni- and multi-processors, but I have
little interest in the kinds of workloads that find vector processors
useful.

> What do you think context is chopped liver?

I think I've already adequately described what I think it is. Perhaps
you're having another bad day.

>
>>>> Until individual processors stop being serial in the nature of the way
>>>> they execute code, I'm not sure how feasible getting rid of ideas like
>>>> 'threads' will be (at least at some level, though I've never
>>>> particularly liked the often somewhat inefficient use of them to avoid
>>>> explicit asynchrony).
>>> What's their nature?
>> To execute a serial stream of instructions, modulo the explicit
>> disruptions of branches and subroutines and hardware interrupt
>> facilities (which themselves simply suspend one serial thread of
>> execution for another). At least if one is talking about 99.99+% of the
>> processors in use today (and is willing to call SMT cores multiple
>> processors in this regard, which given the context is hardly
>> unreasonable). The fact that they may take advantage of peephole
>> optimization to reorder some of the execution is essentially under the
>> covers: the paradigm which they present to the outside world is serial
>> in nature, and constructs like software threads follow fairly directly
>> from it.
>
> Do you know anything at all about program counters, data flow, and
> operating systems?

Yes. I also know something about the onset of senility, and have for a
while been wondering whether you were beginning to suffer from it. Half
the time lately you've been a boor, and the other half rather recklessly
random and/or cryptic. Then again, perhaps you're just under some kind
of unusual strain - it can produce similar behavior.

My original comments to Del stand, and you have yet to offer anything
resembling additional insight into that particular area (to which you've
purported to be responding). Threads are the software constructs one
uses to leverage the services of multiple independent hardware execution
engines in conventional (i.e., virtually all) processing environments
(whether each such individual engine also offers internal parallelism -
e.g., vector instructions - of its own or not). As such, they aren't
sugar of any kind, but rather fairly basic to that particular variety of
concurrent programming.
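
Concretely, something like this sketch (NCPUS and the work function are
invented for illustration) is what I mean: one thread per hardware
execution engine, each making progress in parallel with the others.

#include <pthread.h>
#include <stdio.h>

#define NCPUS 4  /* assume four independent hardware execution contexts */

static void *work(void *arg)
{
    long id = (long)arg;
    /* ... each thread computes on its own share of the data ... */
    printf("engine %ld done\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[NCPUS];
    for (long i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, work, (void *)i);
    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    return 0;
}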

Threads can be considered a form of sugar when used to share a
uniprocessor, but (as I already noted) more in the nature of serial
sugar in that they allow multiple comfortably-serial processes to share
the resources provided by the single processor and the external world it
interacts with rather than requiring the programmer to explicitly
interleave the multiple serial conceptual threads of execution within a
single serial process. Del's comment about 'parallel sugar on a serial
mechanism' misses the point that the mechanism is *not* serial: the
whole point of using multiple threads within a single process on a
uniprocessor (at least when attempting to perform multiple similar
operations in parallel, as should have been clear from my contrasting it
with the use of asynchrony, rather than just to simplify a program
conceptually) is to exploit the fact that while the processor itself is
serial the overall system is not: one thread can make forward progress
while another is waiting for some non-processor operation (such as a
disk access) to complete. I'd find
Del's comment much more applicable to, say, a time-sharing environment
where time-slicing is used to provide the *appearance* of parallelism to
multiple *independent* tasks (even if they're all completely compute-bound).
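
A minimal sketch of that uniprocessor case (the file name and workload
are invented): one thread blocks in read() while the main thread keeps
computing, so the single processor is never idle.

#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>

static void *reader(void *arg)
{
    char buf[4096];
    (void)arg;
    int fd = open("datafile", O_RDONLY); /* stands in for a slow device */
    if (fd >= 0) {
        read(fd, buf, sizeof buf);       /* the CPU is free while we wait */
        close(fd);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, reader, NULL);
    volatile long sum = 0;               /* forward progress meanwhile */
    for (long i = 0; i < 100000000L; i++)
        sum += i;
    pthread_join(t, NULL);
    return 0;
}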

In the context of the current discussion, I interpreted Stefan's use of
'threads' to refer to the use by a compiler of the multiple execution
engines of an SMP and/or SMT platform to parallelize a 'for' loop (in
his example) whose iterations had sufficient independence to allow this
(an approach which compiler developers seem to have been exploring for a
while, though I don't know how mature it is yet). As such, it certainly
would not constitute 'sugar'. The absolute 'weight' of such an approach
is not (as I already noted) relevant: only the weight relative to the
benefit obtained is (for example, if each element of the parallel 'for'
was quite heavy in its own right). Equating the absolute weight of such
an approach with that of a Unix fork is absurd: not only would such
threads typically already exist (if the compiler is smart enough to use
them at all, it's presumably smart enough to create them somewhat in
advance), but creating an additional thread within a process is (or
certainly can be, given a competent OS) far less expensive than creating
an additional independent process and address space. And your question
about whether multiple threads *sharing a single processor* were
executing in a distributed or shared memory environment was simply
stupid (I'd perhaps be a bit more charitable about this if I hadn't
already given you an opportunity to rectify it).
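
Using OpenMP as a stand-in for such compiler support (a sketch; this is
not necessarily what Stefan had in mind), independent iterations are
farmed out to a pool of worker threads that the runtime creates once
and reuses:

/* a[i] = 2 * b[i]: no iteration depends on any other, so the
   compiler/runtime may split the index range across the threads */
void scale(double *a, const double *b, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}

(Built with the compiler's OpenMP switch, e.g. -fopenmp.) Note that the
per-loop cost is handing out index ranges, not creating threads.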

- bill