From: Michael J. Mahon on
mdj wrote:
> Michael J. Mahon wrote:
>
>
>>Actually, the problem of allocating computation across a hierarchy of
>>memories and multiprocessors is quite old, and the topology has not
>>varied from the "general case" in several decades--only more levels
>>of hierarchy have been introduced. It's still exactly the same
>>problem, with exactly the same range of solutions.
>>
>>Put another way, there is nothing "special purpose" about the model
>>that I propose we optimize for--it should apply to everything from
>>atoms to stars (in fact, the universe already seems to have an
>>allocation algorithm that works pretty well ;-).
>>
>>Here's an outline of a possible (incomplete) conceptualization based,
>>interestingly enough, on the dynamic stability of (most) stars.
>>
>>Given a population of processes, all working concurrently and
>>1) consuming resources, and 2) communicating, let us allocate
>>processes to a single "processing node" until its resources are
>>mostly used, then begin allocating processes to nearby nodes,
>>until all processes have been "spread out" across nodes. We will
>>treat resource contention on a node as "pressure" which leads
>>to "expansion" of the set of processes across more nodes.
>>
>>Now, as the processes communicate, we treat non-local communication
>>as an attractive force between the communicating processes, which
>>biases the allocation algorithm toward moving them closer together
>>(in "communication cost" space).
>>
>>Now note that a dynamic equilibrium is possible in which the contractive
>>"force" of communication is balanced by the expansive "force" of
>>resource contention. This is analogous to the equilibrium that exists
>>in a stable star between gravitational attraction and internal pressure.
>>
>>Such models can form the basis, when melded with control system theory,
>>for stable, quasi-optimal allocation of multiple communicating processes
>>across a fabric of processor-memory nodes. Generalizations to a
>>hierarchy of connectivity latencies and bandwidths are possible and
>>would be required except in the simplest cases.
>
>
> Indeed this is the goal. Perhaps you should get involved in one of the
> open source projects working on such fields. There seems to be a lot
> more 'hardcore' computing going on in the open source community these
> days than inside corporate walls.

Corporate R&D budgets have been systematically slashed to make
the next quarter look good. ;-(
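
Going back to the star-equilibrium allocation model above, here is a toy
Java sketch of a single balancing pass, with communication acting as
attraction and node contention as pressure. All the numbers, names, and
cost functions are invented purely for illustration:

import java.util.Arrays;

public class BalancePass {
    public static void main(String[] args) {
        int nodes = 4;
        int[] placement = { 0, 0, 0, 0, 1, 2 };   // process -> node
        int[][] comm = new int[6][6];              // traffic between process pairs
        comm[0][4] = comm[4][0] = 50;              // processes 0 and 4 talk a lot
        double capacity = 2.5;                     // comfortable load per node

        for (int p = 0; p < placement.length; p++) {
            double best = Double.MAX_VALUE;
            int bestNode = placement[p];
            for (int n = 0; n < nodes; n++) {
                double pull = 0;                   // attraction: traffic to remote peers
                int load = 0;                      // current occupancy of candidate node n
                for (int q = 0; q < placement.length; q++) {
                    if (q == p) continue;
                    if (placement[q] != n) pull += comm[p][q];
                    else load++;
                }
                double push = Math.max(0, load + 1 - capacity) * 10;  // contention pressure
                if (pull + push < best) { best = pull + push; bestNode = n; }
            }
            placement[p] = bestNode;               // relax one step toward equilibrium
        }
        System.out.println(Arrays.toString(placement));
    }
}

Repeating such passes until nothing moves is the crude analog of a star
settling into hydrostatic equilibrium.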

>>Actually, most of what is being practiced in the realm of hardware
>>concurrency is quite well understood, and has been around for decades.
>>What has changed is that transistors have become cheap enough that the
>>techniques originally developed in limited form for supercomputers
>>can now be implemented in more general forms on microprocessor chips.
>>
>>Buffering, speculation, scoreboarding, data flow, and associative memory
>>are not new, but their affordability in desktop processors is.
>
>
> Certainly techniques that are not applied in the software realm, which
> is what I was pointing out.

Software folks are always pushing their own flavor of complexity, so
they don't take much time to learn from what's been going on in the
real world.

>>Bingo.
>>
>>BTW, it guarantees the loss of strict reproducibility of results if
>>floating point arithmetic is used, because the order of operations is
>>non-deterministic, and that is what determines roundoff error. We'll
>>just have to be content with "fuzzy" reproducibility!
>>
>>(Incidentally, the variance of results produced when a computation is
>>evaluated nondeterministically several times is a lower bound on the
>>roundoff error in the computation.)
>
>
> Yes. The only practical solution is to use much larger floating point
> representations such that the fuzziness is less of a concern, although
> modern hardware can perform 256 bit fixed point arithmetic quite
> quickly :-)

You and Kahan agree--double-precision is the amateur's best friend.
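
As a tiny illustration of the ordering effect mentioned above, the two
sums below differ purely because of the order in which the same values
are added (single precision chosen just to make the effect obvious):

public class FpOrder {
    public static void main(String[] args) {
        float big = 1.0e8f;
        float tiny = 1.0f;

        // Big value first: each tiny addend is rounded away.
        float a = big;
        for (int i = 0; i < 100; i++) a += tiny;

        // Tiny values first: they accumulate before meeting the big one.
        float b = 0.0f;
        for (int i = 0; i < 100; i++) b += tiny;
        b += big;

        System.out.println(a);   // prints 1.0E8: the tiny addends were lost
        System.out.println(b);   // prints a visibly larger value
    }
}

Run the same reduction across threads whose completion order varies and
you get exactly the "fuzzy" reproducibility described above.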

>>>I suspect that given the number of possible localities, etc. and the
>>>likelihood of these changing quickly over time, that the most
>>>successful way will be to design for the worst case interconnect, and
>>>allow optimisation to occur per-architecture by some form of runtime.
>>>Still, it doesn't solve the problem of appropriate partitioning of
>>>algorithms in the first place.
>>
>>Right. And one must be able to adaptively change allocations to
>>maintain efficiency as the computation evolves, since many computations
>>have more than one "locality" behavior, depending on what's going on.
>
>
> Hence my lack of faith in languages that bind us too tightly to
> (hardware) architectural borders.

But the solution to this kind of variability is run-time adaptation,
not anything linguistic (= static). Language has pretty much done its
thing by the time it has allowed local variables to be grouped.

>>>>There is no "ideal" way--there are only real ways, and none of them
>>>>have simple behavior if they're any good!
>>>
>>>
>>>:-) I meant "ideal" in terms of the ones that currently are 'better'
>>>versus the 'worse' ones, in terms of processing units on the same
>>>system board having closer locality for similar system cost.
>>
>>Ah. Well, that's just physics. Unless the designers mess it up, things
>>that are closer together can afford to be connected at a higher level of
>>the hierarchy.
>>
>>Early multi-core designs were just sub-optimal "pasteup" jobs.
>
>
> Many of them still on the market, being espoused as equivalent or
> superior to competing solutions :-(

I know, but Darwin will have his way with them... ;-)

The only way to be successful and persistently stupid is if everyone
else is also persistently stupid--fortunately, an unstable proposition.

>>>>>>Any practical system will consist of all levels of connectivity, with
>>>>>>sharing at virtually all the different levels of the memory hierarchy.
>>>>>>And I would add another set of levels, in which there is no "memory
>>>>>>consistency" model, but message passing is the sharing mechanism.
>>>>>>This extends the multiprocessing model across networks.
>>>>>
>>>>>
>>>>>A brand new PC these days will get you two cores. There are different
>>>>>scales of systems. The big ones, for sure, will exhibit all 4 levels,
>>>>>but there's plenty of work to be done just to handle the opportunities
>>>>>that exist in a new laptop.
>>>>
>>>>Agreed--and the tools for *optimizing* applications across even two
>>>>processors are almost nonexistent. A suboptimal data partitioning
>>>>can cause a large factor of performance degradation, and designers
>>>>have virtually no tools to understand and adjust this.
>>>
>>>
>>>Indeed. There are plenty of cases where even non-parallel application
>>>performance drops when moving to multiprocessor systems, simply because
>>>of operating system design. A classic example, which is largely solved
>>>now is the naive scheduler which shifts a running process to another
>>>CPU, throwing out the cache on the one it was on in the process, and
>>>worse, this happens while the process in question was waiting on I/O
>>>:-)
>>
>>And you correctly identify this as an immature OS scheduler design--yet
>>another example of how even OS designers are blissfully unaware of the
>>concurrency trends barreling toward them like trucks! (And they've
>>been quite visibly coming for a decade, so no one can justly claim that
>>they were surprised!)
>>
>>I've been amazed by how unaware OS designers are about the implications
>>of memory hierarchy and multiprocessing on their algorithms. And those
>>who figured out how to "cope" with a 4-way system can't understand
>>that they will need a completely different approach to deal efficiently
>>with a 32- or 64-way system. (Think about queueing theory, lock
>>contention, resource pools, and error recovery.)
>
>
> Absolutely, although there are designs out there that deal well with
> machines of that size (Solaris is one example). PC based operating
> systems have conventionally been optimised only for the hardware that
> appears at the time, which considering the rate of improvement in that
> field, has left a lot of stones unturned.

Of course. HP-UX was managing large numbers of processors with
excellent performance early in the game.

I was trying to get some very real stones turned before they became
prevalent.

>>>>>>This was always an inevitable result of higher levels of integration.
>>>>>>As soon as a significant amount of cache can be shared on the chip,
>>>>>>it becomes advantageous to adorn it with multiple processors.
>>>>>
>>>>>
>>>>>Just be thankful you aren't the author of the process scheduler of a
>>>>>current operating system :-)
>>>>
>>>>Actually, that's one of the most important areas to do *much* better
>>>>and more adaptively than is done in current OSs.
>>>
>>>
>>>Yes. The current approach adds effectively a 'timeout' constant to a
>>>process. The timeout represents the cost of shifting it to the next
>>>nearest locality, and processes don't move until this time has expired
>>>without the opportunity to shift them. Once the timeout has expired,
>>>you look for a gap in the scheduling table on processors of this
>>>locality, and if one exists, you slot it in. If not, you increase the
>>>timeout to the cost of moving one level further out, and so on. Each
>>>time a process gets scheduled to run, you reset the timeout to nearest
>>>locality.
>>>
>>>This approach works very well for avoiding unnecessary shifting of
>>>single-thread processes on a multitasking system, and things tend to
>>>settle into relative harmony as the system runs.
>>>
>>>Of course, handling scheduling and moving of massively parallel
>>>processes is another kettle of fish!
>>
>>Not if approached with the right abstractions. ;-)
>
>
> And ditching explicit pointer arithmetic in code is one of the
> abstractions dammit ;-)

Well, I'll grant that operating as if there is only one contiguous
space holding all data is a problem. Now, treating data as if it were
"segmented" (which everyone hates) may prove to be quite useful... ;-)

>>>>The "distributed process allocation" problem in the OS is as fundamental
>>>>as the "code selection" problem in a compiler, and will have much larger
>>>>performance impact as the level of *system* concurrency continues to
>>>>increase.
>>>>
>>>>I would even go so far as to say that the current *lack* of good
>>>>solutions to this problem is a major reason for the limited utility
>>>>of multicore systems. In a couple of years, we could all have 8-way
>>>>systems on our desktops, but if the fundamental enablers for parallel
>>>>apps don't get done, they won't perform much better than single-core
>>>>systems (as is true today for dual-core/single-core).
>>>
>>>
>>>There's a real need to crawl before we walk here. One big problem that
>>>needs be solved even on current systems is how to deal with many
>>>concurrent independent processes all in kernel space at the same time.
>>>Unless we move to microkernel systems, which I think eventually we will,
>>>we have to solve the problem of how to effectively schedule multiple
>>>concurrent threads of operating system code competing for resources as
>>>well, effectively meaning the problem has to be solved twice. Most
>>>current OS's have pretty ordinary granularity when it comes to
>>>intra-kernel concurrency.
>>
>>True--and significant changes in the low levels of OSs is inevitable.
>>But "crawl before walk" implies that we are facing this as a *new*
>>problem, when, in fact, it has been coming toward us for a generation,
>>but we have failed to acknowledge it in our designs.
>>
>>I hope using our nose for a wall detector won't cause us too much more
>>pain in this process (though note that lack of application support for
>>even low levels of parallelism is really slowing the PC market down).
>
>
> Agreed. The crawling should have been done a decade ago, not starting
> now while oodles of performance go begging. The process we have to
> go through remains the same regardless of the timeline unfortunately.

Such is the nature of learning. And such is the inefficacy of prophecy.
;-)

>>>>>This is similar to the compiler you mentioned a while back that used
>>>>>runtime instrumentation of it's p-code engine to determine critical
>>>>>sections then tune them. Although modern VM's apply this technique
>>>>>today, that's about as far as it's got.
>>>>
>>>>It doesn't need to be a non-existent machine--most machines have timer
>>>>interrupts that permit background profiling and dynamic code changes.
>>>>All of these tricks can be (and have been) done on actual machine code.
>>>
>>>
>>>Yes. I've often considered playing with this myself with Apple Pascal,
>>>and profiling the bytecode interpreter as it runs. Then, use the output
>>>of it to selectively native compile various procedures or functions and
>>>relink the application. Should be a hoot :-)
>>
>>Sounds like great fun!
>>
>>Using a source of timer interrupts makes dynamic profiling a breeze.
>>(Of course, this would also identify the parts of the *interpreter* that
>>were hot.)
>
>
> I'm awaiting a TimeMaster HO which has a rather nice programmable
> interrupt controller. Much more flexible than relying on the 60Hz VBL I
> use at the moment for experimentation.

Actually, 60Hz is plenty fast enough to find almost anything of
real interest on a slow machine. And it is infrequent enough that
you can actually execute some code without it being intrusive.

> It's good fun exploring these ideas on smaller environments that have
> nice predictable behaviors, but you already know this :-)

....and I *love* it! Of course, I love it even more when I fail to
predict a behavior. ;-)

>>>>>One of the reasons I still like Solaris is that it's the only common OS
>>>>>that supports the concept of 'microstate' accounting. This essentially
>>>>>means that you can log the transitions of a process through various
>>>>>known 'microstates'. So rather than know just that a process is blocked
>>>>>on a resource, you can find out what resources, why, how long, etc.
>>>>>This is integrated back into the development toolchain all the way
>>>>>through to the Java level profiler, so you can visualise exactly what
>>>>>different threads are doing, spending time on, and easily spot
>>>>>bottlenecks which in many cases can be tuned out at either the OS or
>>>>>the VM. Very fancy stuff.
>>>>
>>>>And not too difficult or intrusive if it is based on sampling
>>>>information at interrupt time (which automatically bases sample
>>>>density on performance impact).
>>>
>>>
>>>Unfortunately on other operating systems, you don't have this level of
>>>instrumentation, and the other systems are different enough in terms of
>>>architecture, operating system design, etc. as to make many of the
>>>performance tuning possibilities you uncover on Solaris not applicable
>>>:-(
>>
>>Users should pester OS designers to provide them with background
>>profiling capabilities. It's actually easy to do on any OS.
>>
>>Of course, if you can define interrupting timers, then you can do it
>>yourself--with considerably more overhead.
>
>
> One of the problems with the design of many operating systems is the
> inability to expose interrupts to a user level process. This has
> resulted in many awful design hacks in OS's over the years (The
> embedding of the UI subsystem of Windows NT into the kernel from
> version 4 onward being a good example.)

Yes, I always provided a way for processes to handle interrupts
directed to them. (Most language designers hated the idea.)

>>>>In many applications, there are relatively weak data dependencies
>>>>connecting large segments of data, so the data can be divided up,
>>>>processed, and then merged together--all very well suited to avoidance
>>>>of excess communication. (For example, video and audio transcoding
>>>>and most image processing are such cases.)
>>>>
>>>>Note that many cases of multidimensional data have the characteristic
>>>>that the amount of computation grows as a power of the data volume,
>>>>while the size of the interacting "boundaries" between data chunks
>>>>grows as a lesser power, so that by adjusting "chunk" size, the
>>>>ratio of processing to communication can be easily adjusted over a
>>>>wide range--and *should* be so adjusted adaptively, based on run-time
>>>>realities and available system resources.
>>>
>>>
>>>Indeed. Video encoding can very easily be parallelised through time
>>>division, across many more processors than we're likely to have in
>>>consumer machines any time soon. This can be achieved through fairly
>>>simple multithreading plus rendezvous techniques. Of course, if the
>>>number of parallel slices becomes excessive, you introduce too many full
>>>frames into your encoding, and require a multitiered approach, but it's
>>>still relatively feasible to do using simple tools.
>>
>>Now *there's* a degree of parallelism that I'd like to have access to!
>
>
> When I've needed to do such things (being a Linux man) I've simply
> split the source file into two parts and run two encoders at the same
> time, then recombined the results. The fine granularity of the tools
> offered by the UNIX-style interface opens up a lot of flexibility in
> how you approach problems.
>
> I agree that it should be far simpler than this, though.

That is the very simple approach which I believe the encoders themselves
should do after taking a "census" of the system they are running on.
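
A hedged sketch of that census-and-split idea in Java; encodeSlice()
and merge() are placeholders standing in for a real codec, not an
actual API:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SliceEncoder {
    public static void main(String[] args) throws Exception {
        int frames = 100000;
        int workers = Runtime.getRuntime().availableProcessors(); // the census
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        int chunk = (frames + workers - 1) / workers;
        List<Future<String>> slices = new ArrayList<>();
        for (int start = 0; start < frames; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, frames);
            slices.add(pool.submit(() -> encodeSlice(from, to)));
        }
        for (Future<String> f : slices) {
            merge(f.get());            // rendezvous: recombine in order
        }
        pool.shutdown();
    }

    static String encodeSlice(int from, int to) {
        return "slice " + from + "-" + to;   // stand-in for real encoding work
    }

    static void merge(String slice) {
        System.out.println("merged " + slice);
    }
}

The extra full frames at slice boundaries that were mentioned are the
price of the time-division split, exactly as noted above.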

>>>>There are well known techniques for ensuring stability--analogous
>>>>to servo system design. (Programmers generally are not aware of
>>>>the mass of work done on stability in control systems.)
>>>
>>>
>>>Indeed.
>>>
>>>
>>>
>>>>>>This is the kind of thinking that must go into the next generation
>>>>>>of systems, and it is very different from the thinking that has
>>>>>>inspired the systems and tools of today.
>>>>>
>>>>>
>>>>>Agreed - although don't overlook the fact that there are plenty of
>>>>>systems out there that have faced a lot of these issues already.
>>>>
>>>>A few have made a start--none have approached the problem as the
>>>>absolutely fundamental problem that it is.
>>>
>>>
>>>It's beyond the understanding of most to approach the problem this way.
>>>There's a degree of research going on in the quantum computing field,
>>>but this is for algorithm design for theoretical machines. I'd imagine
>>>though that a lot of the techniques apply to real parallel systems as
>>>well.
>>
>>I wouldn't say so. Quantum computing is so radically different from
>>state machine programming that there is almost no comparison. I'd say
>>it's at least as big a difference in paradigm as going from analog
>>computing to digital computing.
>
>
> I refer to the work done in selecting the correct computations from
> many parallel ones, which should be applicable in some sense.

Quantum computing evaluates all possible computations simultaneously,
so the "selection" of the answer tends to be done at the end. ;-)

NP-complete problems become solvable because a huge combinatoric
space of possible solutions is explored "in superposition" in the
time required to explore a single solution. Then you have to "read
out" the final state(s) to discover the solution(s).

I'm reminded of "the answer to the ultimate question". At some point,
you realize that you really want the ultimate question, too. ;-)

>>What needs to be done is for trained people with an *engineering*
>>mindset to sit down and squarely face the problem of multi-level
>>concurrency.
>>
>>I expect the solutions will roll out in layers of successive refinement,
>>but we haven't yet even chosen to directly address the problem.
>
>
> It won't be long now. We can't wait much longer for companies to
> engineer processors with massively faster sequential processing speeds
> before realising that they can't :-)

Yes, I think that has dawned on them as they whip their design teams
harder while watching their stock stagnate and then fall...

>>>>>>>Multithreading approaches are very important on these systems. In fact,
>>>>>>>multithreading is important even on systems with single execution
>>>>>>>units. The gap between I/O throughput and processing throughput means
>>>>>>>you get a certain degree of 'parallelism' even though you can only run
>>>>>>>one thread at a time. Free performance improvement if you employ
>>>>>>>parallel design techniques.
>>>>>>>
>>>>>>>Of course, there are certain heavily compute-bound applications where
>>>>>>>the degree of IPC is very low, and massive parallelism is possible
>>>>>>>regardless of the interconnect used, as IPC constitutes a relatively
>>>>>>>small part of the workload. For the rest of the cases though where lots
>>>>>>>of data is being consumed, systems that allow low-overhead IPC through
>>>>>>>multithreading are the way to go.
>>>>>>
>>>>>>And a tiny fraction of todays tools and designers are even out of
>>>>>>kindergarten on issues of partitioning and locality.
>>>>>
>>>>>
>>>>>Completely agree with this point. I'm continually surprised by the
>>>>>number of people I encounter that do not understand the dining
>>>>>philosophers problem, and that's perhaps the easiest one!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Object orientation is almost totally orthogonal, if not antithetical,
>>>>>>to the *real* problems of highly parallel computing, which is the
>>>>>>platform of the future. I expect we'll figure this out sometime
>>>>>>in the next decade. ;-(
>>>>>
>>>>>
>>>>>Almost, other than the additional abstraction and encapsulation being
>>>>>required by both concepts. Current languages don't really provide
>>>>>adequate abstractions for parallel computing primitives. Some of them
>>>>>can be modelled effectively using OO techniques, some of them can't.
>>>>
>>>>Right--which is why I'm trying to emphasize that the truly *important*
>>>>dimension of decomposition for concurrency has been sadly neglected,
>>>>when it should be the real focus of our language work.
>>>
>>>
>>>There is room, no, for multiple focus groups?
>>
>>Well, after spending 50 years on notation, how about starting in
>>on parallelism? ;-)
>>
>>I realize that there have been a few seminal approaches to supporting
>>parallelism in languages--including the radically concurrent applicative
>>languages--but none of the "rubber" was ever brought anywhere near the
>>"road" of real systems, let alone ones with hierarchical connectivity.
>>
>>There are already dozens of designers focusing on notation. I'm
>>actually making a *plea* for more focus areas--and for one of
>>them to be concurrency!
>
>
> You're absolutely right, but there's a degree of dependency between the
> two that needs to be addressed, and part of that is engineering out old
> notational forms which inhibit the progress of parallel system design.

Again, I would say that very few of the current linguistic goals have
more than incidental relevance to parallelism. It's a plain case of
"looking where there's light instead of where they lost it".

If an alien intelligence is watching, a little box on page 11,325 of
their weekly report must be devoted to a betting pool on how long it
will take us to figure out that we were working on the wrong problem.
(Just in *this* area--there are lots of boxes on other pages! ;-)

-michael

Parallel computing for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it is seriously underused."
From: Michael J. Mahon on
Per1000IPDay7(a)126.com wrote:
> Avoid typing the same text again and again
> Stop wasting your time on mouse movements
> Open favorite web pages with a single hotkey press
> Record keystrokes and play them back with a single hotkey press
> ------------------------------
> http://www30.webSamba.com/SmartStudio
> ------------------------------
> EnergyKey Save yourself from repetitive tasks

SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM
SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM
SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM SPAM

Wow--I just did all that with a single hotkey press!!

-michael

Parallel computing for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it is seriously underused."
From: mdj on

Michael J. Mahon wrote:

> > Not necessarily. A common design fault in a C program is a counted
> > iteration over a buffer where the loop end condition was faulty
> > (someone picked <= instead of <, etc)
> >
> > Often this results in a program appearing to run fine, until some other
> > loop relying on null termination fails to stop, overwriting data and
> > code in the process.
>
> Um, so people should not be able to get the termination
> condition for a loop wrong? It's going to be hard to change all
> those programmers' diapers, don't you think? ;-)

I don't believe you should be able to get the termination condition
wrong when it happens to be tied to the bounds of a data collection.
If only strings were the only area where this problem existed. But
then, it's been solved, the overhead is small, and I can't understand
the resistance to it.

Of course, the problem is much harder when we're talking computational
loops :-)

> BTW, I independently invented FOR i OVER <subrange type> DO ... OD
> prior to ADA, so I know there are ways to simplify things--I just don't
> think it's likely that anyone will be able to eliminate the possibility
> of errors!

These days Java also has the rather cute for(reference:collection) {}
statement, which is very nice, though it lacks the ability to handle
subtypes (grumble).
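
For anyone following along, the contrast looks like this; the indexed
form is where the <= vs < slip lives, and the enhanced form simply has
no termination condition to get wrong:

import java.util.Arrays;
import java.util.List;

public class LoopBounds {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");

        // Indexed loop: writing i <= names.size() here throws
        // IndexOutOfBoundsException in Java (and silently tramples
        // memory in C).
        for (int i = 0; i < names.size(); i++) {
            System.out.println(names.get(i));
        }

        // Enhanced for: the bound comes from the collection itself.
        for (String name : names) {
            System.out.println(name);
        }
    }
}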

> Overwriting data is a classic problem, and is usually pretty easily
> found if it's literally a bad conditional or an off-by-1 error, since
> the error is always misbehaving.
>
> (Overwriting code is another matter, but code pages should be protected
> from self-modification except in unusual cases.)

A big problem is how to avoid execution of code within data pages, when
it's been put there by a malicious perpetrator exploiting a flaw. Code
overwrite is easy to prevent: lock the code pages against writes. You
can't easily lock the data pages, though :-)

> I didn't say anything about "pointer arithmetic" (though it can be
> *very* useful to a storage allocator ;-). I just want unrestricted
> pointers so I can construct arbitrary data structures, and have the
> ability to re-thread the pointers as I wish.

This is my main gripe with pointers - you don't actually need them to
have their full 'power' in order to get the full use from them.
Reference-style pointers are all you need, and I'm happy to keep these;
they're easily defined in a portable way.

> >>>In short, my motivation for removing such features, is one of
> >>>pragmatism, not ideology. When engaging in projects for personal use, I
> >>>prefer 'unconstrained' tools as well.
> >>
> >>OK, so you just want to keep guns away from children... ;-)
> >>
> >>My preferred approach is not to have children in the "army" in the first
> >>place. ;-)
> >
> >
> > Unfortunately though, this approach doesn't work in practice. The
> > industry is full of children :-)
>
> Then we are accountable for the consequences.

We, being the people who pay time and time again for the mistakes made
by our predecessors, whether in a code maintenance role or as customers
of shoddy software...

Accountability actually landing where it should is something that
almost never happens in practice.

> I see where you're coming from, and I wish you luck in getting the
> guns back from the children, particularly those who've developed a
> fondness for shooting. ;-)
>
> You should take some solace from the fact that probably the most
> frequently used "higher level language" is Excel, and it is completely
> pointer safe. ;-) The only mistakes you can make in Excel are the
> ones that will bankrupt your company! (Get my point?)

No it isn't! Excel spreadsheets routinely refer to a cell that hasn't
been defined. This brings a whole new meaning to segmentation
violation. Don't get me started about the number of "databases" that
exist in spreadsheets :-)

There's no restriction whatsoever in Excel code over what part of the
"address space" you reference in an expression. You can even reference
a sheet that doesn't exist (a genuine "bus error" :) )

> >>It is well known that the only long-term effective way of removing old
> >>methods from the playing field is through generational succession. ;-)
> >>
> >>Actually, for it to be maximally effective, though, the process of
> >>cultural transmission of (mistaken) ideas must also be weakened.
> >
> >
> > This is what I'm doing in this thread ;-) Misinformation spread over
> > communication mediums like this one is a major source of said
> > transmission.
>
> Hmmm. I expect that a letter to the editor of any trade rag would
> reach a much larger audience...

I think they gave up publishing such things years ago, unfortunately.

> >>>>A great designer can do great things in any language, but it will
> >>>>be much harder to do all the great things he can do in just one
> >>>>language. Imagination will not be contained.
> >>>
> >>>
> >>>It doesn't need to be. I agree with you completely on this point. I
> >>>challenge anyone to come up with a piece of code that relies on this
> >>>freedom that cannot be expressed another way.
> >>
> >>Turing equivalence argues that there are no such examples--but there
> >>are many examples of people's imaginations being truncated by the lack
> >>of a concrete machine execution model. (I cited two.)
> >
> >
> > Then the issue is a lack of understanding of the execution models used
> > by the newer tools, as obviously they DO have concrete execution
> > models.
>
> Yes, that seems to be a problem with "high level languages"--they are
> falsely advertised as "freeing their users from concern over petty
> details--like how much memory is used, how many processor cycles,
> and how many cache misses." As a result, tools to determine these
> critical numbers are hard to find and harder to use.
>
> Now *there's* a common belief that needs to be stamped out!

Agreed. It's right up there with the belief that a tool that leaves
those details up to the programmer to work out for themselves is a
solution ;-)

> I've generally been a bottom-up thinker (which I guess is like being
> a geometer vs. an arithmetician ;-), so I've often approached questions
> of performance with a blunt instrument.

I'm fairly similar in this regard - many, it seems, approach it by
guesswork: look over the source code, spot the 'slow' bit, fix it, get
a 2% gain, repeat ;-)

> For example, after implementing a compiler and getting it mostly
> working, I became curious about whether it had any significant hot
> spots that I could squash to get significant performance improvements.
>
> I went to the console of the (mainframe) machine as it was compiling
> itself, and manually hit the stop key a couple of dozen times, each
> time writing down the program counter address.
>
> If a part of the program was using more than 25% of the time, I should
> have seen its PC value represented in my sample about 6 times, plus or
> minus the square root of 6, or between 4 and 8 times. If no range
> of addresses was sampled that often, then there was no procedure in
> the compiler taking at least 25% of the time.
>
> I found a spot that was taking about a third of the time, and another
> taking about a fifth, so I looked at them and sped them up by about
> a factor of 4 or 5 each.
>
> Less than an hour after taking my PC samples, the compiler was running
> almost twice as fast.
>
> Not rocket science, but thinking across several levels of abstraction--
> and all I needed was a piece of paper, a pencil, and a load map for the
> compiler!
>
> (BTW, this method can be applied to *many* performance issues, but it
> helped that wall clock time was identical to process time in this case.)

Most development environments today provide profiling tools that make
it simple to spot such hotspots. The tricky areas now are usually where
programs are I/O bound.
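
The pencil-and-paper method above automates to a few lines. The sample
values and the 256-byte "region" size below are invented for
illustration; the statistics are the same as in the story:

import java.util.Map;
import java.util.TreeMap;

public class PcSamples {
    public static void main(String[] args) {
        long[] samples = { 0x1F40, 0x1F52, 0x1F44, 0x2A10, 0x1F60, 0x2A18,
                           0x1F48, 0x3300, 0x1F5C, 0x2A20, 0x1F41, 0x1F55 };
        long regionSize = 0x100;          // lump PCs into 256-byte regions

        Map<Long, Integer> hits = new TreeMap<>();
        for (long pc : samples) {
            hits.merge(pc / regionSize, 1, Integer::sum);
        }

        // A region taking fraction p of the time expects about n*p hits,
        // +/- sqrt(n*p), so a couple dozen samples expose a 25% hot spot.
        for (Map.Entry<Long, Integer> e : hits.entrySet()) {
            double share = e.getValue() / (double) samples.length;
            System.out.printf("region 0x%04X: %d hits (~%.0f%%)%n",
                              e.getKey() * regionSize, e.getValue(), share * 100);
        }
    }
}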

> I can buy that--just as long as someone introduces the kiddies to the
> power tools at some point in their development. In my experience,
> they can get old and gray and never see a real power tool...so where
> are the great programmers of the future going to be raised?

In their own houses, more than likely. Most great programmers end up
learning most of what they know outside of structured learning
environments. I don't think we need worry about them not discovering
the power tools, they're already inventing their own :-)

> >>I'm also a big believer in "anihilating the problem" rather than solving
> >>it a piece at a time. But surely the best we can hope for from the
> >>tools you are espousing is a reduction in the number of problems--not
> >>the *elimination* of problems. The argument is only quantitative, not
> >>qualitative.
> >
> >
> > Absolutely. I'm not naive enough to believe it'll solve all problems.
> > Hopefully though, it'll mitigate enough of the old ones to allow us
> > more time to start exploring new ones. It's an evolutionary model of
> > course.
>
> I sure hope so. But, again, experience indicates that every time we
> take away 50% of someone's freedom to make mistakes, we also take away
> *at least* 50% of their freedom to learn something wonderful.

I'd agree, if I was convinced the ideas I proposed actually cause any
such loss of freedom.

> Remember, good judgement comes from experience, and experience comes
> from poor judgement.

And the tools we use, just like the tool users, should evolve to
include this experience.

> I'd rather see people *learn* to avoid mistakes by making them, rather
> than have them "protected" from mistakes and never learning about the
> law of gravity.
>
> BTW, I don't care what languages people use once they've learned what's
> what. They will choose beautiful languages when they are appropriate
> and dangerous languages when *they* are appropriate. And if they
> *haven't* learned this lesson, why should you trust them to write
> code at all?
>
> See, the world doesn't need millions of coders--it needs thousands
> of *great* programmers.
>
> There's a *reason* that we scatter broken glass around in there--but
> some clown is always coming around selling thick rubber shoes, so
> it's hard to keep the bozos out. ;-)

Come now, the broken glass was scattered by those who didn't know any
better, not to provide an entry barrier :-)

> Ever get the feeling that you may be solving a much-needed problem? ;-)

Ever get sick of tiptoeing through the broken glass, moving appreciably
slower and occasionally cutting your feet? Battle scars make for nice war
stories, I suppose, but really, do we need to have so many? :-)

> What would happen if we provided simple programs that would allow
> people to design bridges without an engineering degree? I know that
> I'd start avoiding bridges!

Hmm, hasn't this actually been happening for years? ;-)

> <rant off> ;-)
>
> >>>>>>It is certainly possible to write non-portable code in any sufficiently
> >>>>>>complex system, simply by exploiting some irregular behavior--and there
> >>>>>>will *always* be some irregular behavior to be exploited. ;-)
> >>>>>
> >>>>>
> >>>>>The hardest part of the problem, and one yet unsolved by language
> >>>>>design.
> >>>>
> >>>>Not only unsolved, but unsolvable in any finite system.
> >>>
> >>>
> >>>While you're completely right, this doesn't provide justification for
> >>>not minimising the problem when it's easy to do so. Failure to mitigate
> >>>this as complexity grows will send you flying into a glass ceiling that
> >>>could have been considerably higher.
> >>
> >>We agree about the need for abstraction. Perhaps our disagreement is
> >>just about whether the "floor" of our abstraction should be opaque or
> >>transparent.
> >>
> >>For most people, opaque floors are necessary if tall structures are to
> >>be built--it avoids vertigo when looking down. ;-) But for great
> >>designers, floors should be transparent, so that they can see the
> >>lower-level implications of everything done at a higher level of
> >>abstraction. (It's their discipline that allows them to avoid
> >>abstraction vertigo, and enables them to see the possibility and
> >>utility of new underlying structures.)
> >
> >
> > No no no! A great designer doesn't require transparent floors either!
> > He's read and comprehended the blueprints, and can tell you about the
> > structure with his eyes shut :-)
>
> But what he needs to see is *not* what his models tell him is happening
> below, but what is *actually* happening--that's where all the learning
> happens! That's how he grows from merely great to mega-great!
>
> If you think you understand the behavior of a complex system, then you
> are woefully misinformed. It is time to meditate on bubbles blowing
> in the wind, or the fractal nature of network traffic...

This is why I advocate tools and languages that allow you observe
complexity. Nobody can understand complex systems in their entirety.

> >>>>I only note that there are successful patterns which are *not* made
> >>>>of OO concepts, as well. Perhaps some of the most useful ones...
> >>>
> >>>
> >>>Indeed many of the patterns are also applicable to non-OO designs. OO
> >>>at the end of the day is just another level of abstraction above
> >>>structures and functions, allowing convenient pairing of related
> >>>structures and functions, resulting in more manageable code. Going back
> >>>to languages that lack it is very 'constraining' :-)
> >>
> >>I have always maintained this pairing, even in assembly language.
> >>
> >>It's nice when a language supports something you know is a good idea,
> >>but you can generally do it even without explicit support.
> >
> >
> > True. It's very hard though when you drop down 2 or 3 levels of
> > abstraction to recreate them well, as you're ultimately implementing
> > poor-man's tools as you do so.
>
> I have not often found it so.
>
> Most often, I find that I am exploiting a pattern of behavior that
> spans multiple levels of abstraction, and often couples code and data
> that are surprisingly distant in conceptual space.
>
> In most cases like this, no one looking at the high level code would
> ever have imagined that there was any such relationship to be exploited.
>
> The world of behavior is much stranger and more wonderful than our
> abstract models suggest--that's what makes them "abstract".
>
> There is enlightenment here, for those who would find it.

The interesting elements of behavior only manifest when it's possible
to observe them, and observations of complex system behaviors are
difficult with current toolsets.

> >>The only truly annoying thing is when a language obstructs you from
> >>doing something that should be done, while making inferior approaches
> >>easier. (Block structure is a nice local concept, but not enough for
> >>a robust global structure.)
> >>
> >>
> >>>I found the hype around OO that flew about over the last 15 years to be
> >>>quite ridiculous. Had it been sold for what it actually is, rate of
> >>>adoption would be far higher.
> >>
> >>A practical problem, though, is that any team that newly adopts an
> >>OO approach for a project *will* throw out their false start sometime
> >>between 6 and 12 months down the road, and restart the project.
> >>
> >>This is a good thing, since it is evidence of real learning, but it
> >>strikes fear into the hearts of schedule-driven management. ;-)
> >>(Note my .sig line. ;-)
> >
> >
> > Indeed. It's a growing pain that must be experienced at some point, if
> > progress is to be made. I'm a pretty big fan of the 'plan to throw one
> > away' school of thought. Often on a smaller scale single man
> > implementation (that I still do a lot of) I'll hack together very
> > quickly a prototype that solves the problem, and then once I truly
> > understand the issues, reimplement.
>
> *Excellent* plan! (Of course, the subtle part is making the prototype
> real enough that all the hard parts manifest themselves.)
>
> A related development philosophy (actually, quite bottom-up) is to
> code the hardest parts of a program first--and be honest about what
> they are!

Indeed, and if time doesn't permit a full prototype, this is the next
path I follow - prototype the difficult components.

> > JVM's have been doing this for a number of years, and with each passing
> > year they do it better. The .NET CLR also does it, at least if you use
> > a language that allows it.
>
> Glad to hear it. When I last looked seriously (2000), I was arguing
> that the CLR should try to do this *very* well.

There's a lot of incremental improvement going on in this field.
Building the optimisers is taking a long time, but the ones we have now
are already very impressive, and very hard to beat in any reasonable
time frame 'by hand'. I imagine it won't be too long before they're
impossible to beat for any relatively complex program.

> > Indeed. There seems to be some disagreement about what must be
> > sacrificed to achieve these levels of efficiency though.
>
> Actually, almost *nothing* needs to be sacrificed if the optimization
> is based on run-time truth.
>
> All you have to do is generate a predicate that ensures that the
> preconditions you have measured continue to exist prior to executing
> the optimized code. If that predicate fails, then it's time to fall
> back to the original code and do some more measurement to see if a
> new, perhaps more general, behavioral pattern has emerged.
>
> Compiler technology is based on theorem proving in a fairly weak
> axiom system. You want to strengthen the axioms to allow more
> theorems to be proved (optimizations to be done) statically. But
> it is not necessary to limit yourself to static knowledge, as you
> know. You can replace "proof" with "probable inference" on the
> basis of observed behavior, and then generate code to verify that
> this inference is still valid--or escape to the old "interpretive"
> approach if it is not. This is a much more robust system design
> than one based on static proof rules, and, in fact, it largely
> obsoletes the static approaches.
>
> Of course, separate compilation of applications as *hundreds* of
> separate modules already almost totally invalidates any hope of
> strong axioms holding across all modules. Better to just rely
> on dynamic inference anyway.

This is precisely why I argue so strongly against using C/C++, you lose
any ability to do this kind of optimisation when you have no runtime,
and no runtime typing.
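
The guard-then-fall-back pattern described above, written out by hand
so the shape is visible. This is only an illustration of what a JIT
does internally after profiling a call site; the Shape/Circle names
are hypothetical:

interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class GuardedCall {
    // "Optimized" path generated under the measured assumption that the
    // receiver is a Circle.
    static double areaAssumingCircle(Shape shape) {
        if (shape instanceof Circle) {               // the guard predicate
            double r = ((Circle) shape).r;
            return Math.PI * r * r;                  // specialized, inlined body
        }
        return deoptimize(shape);                    // assumption broken
    }

    static double deoptimize(Shape shape) {
        // Escape to the general "interpretive" path; a real system would
        // also go back to measuring here.
        return shape.area();
    }

    public static void main(String[] args) {
        System.out.println(areaAssumingCircle(new Circle(2)));  // fast path
        System.out.println(areaAssumingCircle(new Square(3)));  // falls back
    }
}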

> > It's probably a combination of obstinacy and caution. On todays
> > hardware, you can do some pretty amazing numerics on GPU's rather than
> > CPU's, which have orders of magnitude more computational power than
> > CPU's with regards to numerical calculations. Getting it *right* for
> > numerics is hard, and given the rate of change in the hardware arena
> > thus far, it may prove to be more sensible to have left it until the
> > dust settles a little.
>
> Whoops--cop out. There is essentially universal agreement on IEEE
> FP, with the only holdouts being...LANGUAGE DESIGNERS! It's at least
> a decade past time to get with the program!

A decade sounds about right :-) As I noted, these changes can be made,
and should be. Let's just hope it does eventually happen.

> And don't worry about the GPUs and DSPs, not only are they moving to
> IEEE FP as well, but their computations are practically *never* well
> described in a conventional high level language, despite lots of
> publicity to the contrary.

I think things will be much easier once there's universal adoption.

> Any 2-bit (!) DSP coder can run rings around your favorite high level
> language on any algorithm whose performance matters. This will
> continue to be true as long as the DSP/GPU actually has lots of
> various kinds of parallelism--and for that matter, it also applies
> to the various multimedia extensions (SIMD) in popular desktop
> processors, as well. The only known "solution" to this problem
> is providing libraries of carefully hand-tuned code for applications
> to call--and it changes radically with each generation.

There's actually a great absence of such libraries though.

> > It's important to note one obvious area of parallelism that many fail
> > to consider: garbage collection. In older environments where you're
> > forced to collect garbage yourself, you end up with your collection
> > being executed sequentially with your other code. While it's arguable
> > that this is more efficient in terms of number of cycles than collector
> > based approaches, you lose out on the obvious ability to collect
> > garbage concurrently, which on todays hardware is a big inefficiency.
>
> Actually, in older environments, there *is* no garbage collection!

I mean you must do it manually :-)

> Garbage collection arises as a necessity only when programmers are
> released from the obligation to return resources they are no longer
> using.
>
> Garbage collection generally means traceable data structures, and
> therefore disciplined data structures. However, don't overlook the
> "catch all" discipline for garbage--simply allocating a chunk of
> resource and then reclaiming everything that wasn't "registered
> as persistent" at end-of-job.
>
> In any case, concurrent garbage collection is possible in any
> disciplined environment, at some level of time/space granularity
> for which the discipline applies.
>
> > There's also the issue around algorithmic complexity going up with
> > inline collection (try implementing a fast B+ tree that doesn't leak)
> > so there's another big payoff to a more abstract approach.
>
> Actually, there are a lot of good reasons not to prefer a
> concurrent garbage collector, if it can be avoided.

Other than not having plenty of free concurrent processing power
available, what are they?

Any non-trivial program creates data then needs to dispose of it. What
purpose does it serve for that program to waste sequential running time
doing so?
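
The "reclaim everything not registered as persistent" discipline quoted
above is easy to picture as code. A minimal sketch, with invented names,
of an arena whose transient allocations all die together at end-of-job:

import java.util.ArrayList;
import java.util.List;

public class Arena {
    private final List<Object> transients = new ArrayList<>();
    private final List<Object> persistent = new ArrayList<>();

    <T> T allocate(T obj) { transients.add(obj); return obj; }  // dies at end-of-job
    <T> T keep(T obj)     { persistent.add(obj); return obj; }  // registered as persistent

    // End-of-job: one cheap, wholesale reclamation of everything transient.
    void endOfJob() { transients.clear(); }

    public static void main(String[] args) {
        Arena arena = new Arena();
        arena.allocate(new int[1000]);               // scratch data
        String result = arena.keep("the answer");    // the one thing that survives
        arena.endOfJob();
        System.out.println(arena.persistent.size() + " survivor(s): " + result);
    }
}

In a garbage-collected language the clear() just drops the references;
in a manual environment the same idea frees a whole region at once
instead of tracking each allocation.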

> >>>>When I first saw Java, and heard it proposed as a "universal"
> >>>>language, I noticed that it had vectors of vectors, rather than
> >>>>multidimensional arrays. I knew immediately that this would be
> >>>>a huge performance issue for numerical programming, since I had
> >>>>the pleasure of working with Burroughs machines that had exactly
> >>>>the same limitation--and the same fundamental inefficiencies.
> >>>
> >>>
> >>>Java supports multidimensional arrays as well. Always has.
> >>>Interestingly, people tend when comparing Java to C/C++ to pick the
> >>>worst way to implement something in Java and the best way in C++.
> >>
> >>Hmmm. This wasn't the case in 1994--or it wasn't the case in the
> >>extant implementations... And this comparison was done by someone
> >>trying to do the best matrix package possible in Java.
> >
> >
> > Perhaps the implementations at the time treated multidimensional arrays
> > as vectors. Of course had explicit pointer arithmetic been allowed we'd
> > have been stuck with that implementation.
>
> No, they treated multidimensional objects as vectors of vectors of...
> The workaround was to declare a one-dimensional vector of numbers, say,
> and take responsibility for the multi-dimensional mapping yourself--
> a much less "automatic" approach. And pointers were not necessary,
> since all the arithmetic could be index arithmetic.

The main point for me is that the language allows a portable definition
of multidimensional structures. Whether or not a particular
implementation compiles it one way or another is not really an issue.

Once we rely on techniques that bind us to architectures we lose. It's
easy to avoid them, and we should do so.
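
For concreteness, the two layouts under discussion look like this in
Java; the sizes are arbitrary:

public class MatrixLayouts {
    public static void main(String[] args) {
        int rows = 3, cols = 4;

        // "Vector of vectors": each row is a separate object, so every
        // element access pays an extra indirection, and rows may be
        // scattered in memory.
        double[][] nested = new double[rows][cols];
        nested[2][3] = 1.0;

        // Flat array with explicit row-major index arithmetic: one
        // object, one indirection, and the mapping is yours to maintain.
        double[] flat = new double[rows * cols];
        flat[2 * cols + 3] = 1.0;

        System.out.println(nested[2][3] + " " + flat[2 * cols + 3]);
    }
}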

> >>>This is essentially why I bother with these discussions. Obviously the
> >>>lack of primitive array types in Java would be a major impediment to
> >>>performance (At least until the VM's improve to the point of being able
> >>>to optimise Collection classes down to discrete implementations when
> >>>possible), but it simply isn't true.
> >>
> >>A major source of inefficiency without dense, multidimensional matrices,
> >>is the inabliity to rely on simple address arithmetic to navigate a
> >>matrix. Introducing levels of indirection is a performance disaster.
> >
> >
> > Of course, it's simply arithmetic on an index to an ultimately
> > sequential structure, and treating it as multidimensional is just an
> > abstraction that can be implemented just as efficiently without
> > explicit pointers.
>
> Of course, this explicit approach makes the element address
> calculations much harder for the compiler to optimize. Better to have
> real multidimensional arrays.

Which implementation model is more efficient for a given calculation is
something that must be determined, possibly even at runtime. This is
part of my argument for using portable language definitions.

> >>We're in substantial agreement about this. I've been enough on both
> >>sides of "advocacy" that I tend to resist much of it as overstated--
> >>both positively and negatively.
> >>
> >>The real problems remain, and there are no silver bullets.
> >
> >
> > Absolutely. It is important in the absence of silver bullets to make
> > incremental improvements instead - which is more or less the approach I
> > advocate. Sort of a Darwinian model for software engineering if you
> > like.
>
> Just be careful that the "incremental improvements" don't take the
> place of the fundamental restructuring of computation that is needed
> to get to the parallel future. That would be a case of "fiddling
> while Rome burns".

Absolutely

> >>>Unfortunately this is the case. I'm trying very hard to retain the
> >>>philosophy that this is a bad thing, rather than be thankful that it
> >>>increases my value as a professional as time goes by ;-)
> >>
> >>I hear that! ;-)
> >>
> >>"Some of us were talking at lunch, and we hear that you've actually
> >>seen what 'bits' are..."
> >
> >
> > LOL! Fortunately for the moment at least, the computer scientist has a
> > substantially smaller hierarchy to navigate in his mind than the
> > philosopher or scientist who's trying to fathom the fabric of the
> > universe ;-)
>
> Actually, not so.
>
> I invite you to make some lists. You will find that there are more
> "levels of digital reality" than there are (known) levels of "real"
> reality.
>
> I have often opined that this is one reason that so many physicists
> have found computers attractive--because their "level navigating"
> skills and their understanding of invariants and conservation laws
> that apply across levels is so useful.

Of course :-) My point is that the levels of digital reality are all
knowable, where the levels of real reality are still being mapped, and
it's not even known whether or not the hierarchy is finite, or indeed
that the models we have of the known levels are accurate.

You're correct that the digital realm has more known levels, but
whether or not the 'end' result is that the digital realm will have
more levels is a subject of great debate.

> >>Here we disagree. No programmer should ever start a project without a
> >>specific performance and resource target. To do so is like asking an
> >>engineer to build a bridge without specifying the span, the traffic, or
> >>the budget!
> >>
> >>Put another way, a programmer who doesn't understand the performance and
> >>resource utilization consequences of his design decisions is *not* a
> >>programmer, but a programmer's assistant.
> >
> >
> > These days, performance and resource consumption are very rarely issues
> > for developers to face, as the majority of projects utilise only a
> > small fraction of the available machine's capacity.
>
> For little problems, yes--but operating systems are not such
> problems. And neither are any programs on which people wait.
>
> And any "little class" that someone eventually wants to call
> a billion times before lunch has just become important.
>
> In these cases, too, there is a performance model--and for "little"
> or "infrequently used" code, the model suggests that performance is
> not an issue--but it still needs to be examined.
>
> I remember a case where a user wanted to write a very simple, 3-page
> program to find patterns in data of a certain kind. Since he was only
> going to run it once, he didn't concern himself with resources--and
> proceeded to write a quadruply-nested loop over a three dimensional
> array with dimensions of 1000x1000x1000!
>
> Needless to say, his program wouldn't even load. But if it had
> loaded, it would have run for weeks! (Maybe now it would load if
> the machine had 16GB, and then it would run in only days. ;-)

You'll get no argument from me that a fundamental understanding of
computational complexity is important :-)
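
(For anyone who wants the arithmetic behind that story: 1000 x 1000 x
1000 is 10^9 elements, roughly 8 GB before overhead if they were
double-precision values, and a fourth nested loop of comparable length
over that array is on the order of 10^12 iterations -- which squares
with "weeks" on that machine and "days" on a current one.)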

> > Of course, in specific cases I agree with your approach.
>
> I didn't think it was controversial. ;-)
>
> >>>>>>There is *no* substitute for a low-level understanding of what is
> >>>>>>actually going on in a system. Unfortunately, higher level tools tend
> >>>>>>to further obscure actual system behavior by making it more "esoteric".
> >>>>>
> >>>>>
> >>>>>I agree. I feel it's a problem we'll never really solve, only mitigate.
> >>>>>higher level tools certainly do eliminate the need for a lot of
> >>>>>low-level understanding, but as your example illustrates, it's when the
> >>>>>problems arise that low-level understanding is necessary.
> >>>>>
> >>>>>One has to ask though what an inexperienced developer was doing writing
> >>>>>process scheduler code!
> >>>>
> >>>>He wasn't inexperienced! He had been doing OS code for several years,
> >>>>after a brilliant career in the field. But he fell under the spell
> >>>>of "abstraction for the sake of abstraction", and began designing as
> >>>>if an OS were algebra instead of a powerful, finely tuned, throbbing
> >>>>engine that could be thrown out of balance by inattention to the
> >>>>realities of its execution.
> >>>
> >>>
> >>>This is certainly an area where better tools to perform analysis with
> >>>are needed.
> >>
> >>For sure. And those tools are worthless without the mindset and
> >>training to use them.
> >>
> >>I used to talk about the "4 T's": targets, tools, training, and
> >>tracking. Leave even one of them out and you can forget about getting
> >>a predictably performing system.
> >
> >
> > Agreed.
> >
> >
> >>Of course premature *anything* is a bad idea!
> >>
> >>But the design phase is *exactly* the right place to be concerned
> >>with system performance. If you design an inefficient system, then
> >>local optimizations aren't going to get it anywhere near where a
> >>performant design would have.
> >
> >
> > There's always the frequent issue that a problem domain is not fully
> > understood until you've implemented at least one solution, and realised
> > how little you actually understood the problem.
>
> How true--but we've discussed that, and we seem to be on the same page.
>
> BTW, many problems that have "formal solutions" will be revealed to
> be *very* difficult when performance modelling is done. This level
> of resource modelling reveals where the formal model relies upon
> "infinite" resources. ;-)
>
> >>Note that I'm talking about *high level*, structural optimization in
> >>the design phase, not niggly bit-fiddley stuff.
> >>
> >>Sometimes, the right approach is to make anything that could be
> >>critical to performance have a very clean interface, so that it can
> >>be replaced with a better one when: 1) we know how to do it, 2) there
> >>is time, or 3) it turns out to be a big problem.
> >
> >
> > I tend to believe it's a good idea to give everything a very clean
> > interface. Dynamic OO languages have a clear advantage here, where
> > every interdependency in a system becomes a functional one, and if you
> > keep your interfaces coarse, then you can achieve the approach you
> > outline iteratively, which is an efficiency win for implementation.
>
> Yes, I generally agree--but there is always the problem of deciding
> what level of granularity for the "modules" is appropriate. Things
> can go seriously wrong here if they are chosen either too big (structure
> deficient) or too small (unaffordable overhead). (Again, a resource
> tradeoff.)
>
> Engineering is *all about* tradeoffs. While they are sometimes painful,
> the *most* painful ones are the ones you find out later were tradeoffs
> that you didn't recognize!

I look forward to the day when we can easily manipulate structures. One
advantage of language techniques that promote fine granularity of
modules is that they leave a lot of room for coalescing modules at
runtime once their scope is resolvable.

I find it frustrating that many still advocate toolsets that implicitly
bind us to tradeoffs we needn't have made.

> And a compiler/code generator can do much of it with minimal support
> from the OS/hardware. Of course, it's much easier and more accurate
> with hardware/OS support.

Yes, I'd appreciate more featureful OSes from this perspective too.

> It's bloat. ;-)
>
> More seriously, the bloat is undeniable, and is, in large part, the
> reason that running much code is I/O bound--just loading the code!
>
> In the days before we could have a gigabyte of memory, the bloat caused
> lots of virtual memory I/O, which is the worst kind of I/O bound.
>
> I think that the time required to boot a machine is an interesting
> measure of performance. Somehow, we continue to lose ground here
> (and I do realize that much of that is I/O time). Still, disk I/O
> is many times faster than it was a decade ago...

Yet still the speed deficit between I/O and computation grows...

> > I think a certain degree of bloat is unavoidable, hopefully it evolves
> > out of systems over time.
>
> That would be a new evolutionary trend. ;-)

LOL, true. I mean that much of it can be pruned off when it proves
extraneous, much like calluses on a toe or foot. In fact, that seems a
most appropriate analogy when discussing most of the UI frameworks I've
seen :-)

> Organisms usually get simpler only when some extreme stressor hits their
> environment. Hand-cranked machines? Boot times exceeding attention
> spans? ;-)

Well, such things are on their way :-)

> >>The only schema that I can see approaching this level of structural
> >>optimization would be an aggressive genetic algorithm--and all the
> >>computers on the planet may not yet be quite enough to do the job
> >>in a reasonable amount of time. Later... ;-)
> >
> >
> > It may not ever be possible, given the rising complexity of software.
> > If the software we run can challenge the best machines, we'll never
> > have enough horsepower to evolve solutions...
>
> Here's a heuristic to apply...a solution should get more complex only
> if the problem it is solving gets more complex. By this rule, there
> is some other factor besides problem complexity that is driving up
> solution complexity. I think that it is because we have erroneously
> concluded that complexity has no intrinsic cost, so we can increase
> it with impunity.
>
> Perhaps this error is abetted by the hope that new tools will increase
> our ability to cope with any negative consequences of complexity. ;-)
>
> This can't be good for survival...

Indeed not, but using more abstract languages means that the
human-maintained component of a solution is reduced in complexity,
owing to its reduction in size. While its runtime complexity may be
higher than necessary, the actual solution is simpler.

The old rule that the number of bugs is proportional to the number of
lines of code comes to mind, and this is certainly an area of
complexity we can do without.

> > Indeed. I believe though that minimising these decisions through
> > keeping interfaces highly abstract is the best way to provide space
> > down the track for optimisation. Generally speaking, it isn't that hard
> > to get it right, but getting everyone to agree is oft problematic :-)
>
> Hence the need for objective models, and the expertise to appreciate
> their results.

If only there were time to analyse them. Can I say "more, better tools"
one more time? ;)

> >>Actually, they are long past the point of "improving" anything. All
> >>the people that understood those programs or the environments used to
> >>create them are long gone. These are truly "legacy" apps, in the sense
> >>that they are treated just like gears or bolts--taken completely for
> >>granted until something fails.
> >
> >
> > There are always fringe cases, that's for sure.
> >
> >
> >>I agree with your points, Matt. But consider how much of the resistance
> >>you describe may be generated by *overselling* the "new" thing as the
> >>"solution to all our problems". Successfully marketing new approaches
> >>requires subtlety (which I apparently lack ;-).
> >
> >
> > Indeed the overselling of the new is as much to blame as overselling
> > the old. The massive hype around OO is a testament to this.
>
> Well, "one hand washes the other", they say. So the overselling will
> increase to match the perceived resistance to change. Then the "echoes"
> of the overselling constitute the basis for the next wave of resistance
> to the *next* change. Sounds like a setup for "deja vu all over again".

Heh. There's no doubt this pattern will continue for quite some time,
and with fewer and fewer developers out there able to comprehend the
whole stack/hierarchy, the amount of nonsense that propagates will
likely climb, unless we find ways to combat it :-(

Matt

From: mdj on

Michael J. Mahon wrote:

> > Indeed this is the goal. Perhaps you should get involved in one of the
> > open source projects working on such fields. There seems to be a lot
> > more 'hardcore' computing going on in the open source community these
> > days than inside corporate walls.
>
> Corporate R&D budgets have been systematically slashed to make
> the next quarter look good. ;-(

There's an unfortunate reality that as corporations grow, more and more
of the resources get allocated to maintaining their current size.

> >>Actually, most of what is being practiced in the realm of hardware
> >>concurrency is quite well understood, and has been around for decades.
> >>What has changed is that transistors have become cheap enough that the
> >>techniques originally developed in limited form for supercomputers
> >>can now be implemented in more general forms on microprocessor chips.
> >>
> >>Buffering, speculation, scoreboarding, data flow, and associative memory
> >>are not new, but their affordability in desktop processors is.
> >
> >
> > Certainly techniques that are not applied in the software realm, which
> > is what I was pointing out.
>
> Software folks are always pushing their own flavor of complexity, so
> they don't take time to learn from what's been going on in the real
> world very much.

Ain't that the truth.

> >
> > Hence my lack of faith in languages that bind us too tightly to
> > (hardware) architectural borders.
>
> But the solution to this kind of variability is run-time adaptation,
> not anything linguistic (= static). Language has pretty much done its
> thing by the time it has allowed local variables to be grouped.

That's right, but it's important that we actually *use* languages that
allow this kind of runtime adaptation.

> > Many of them still on the market, being espoused as equivalent or
> > superior to competing solutions :-(
>
> I know, but Darwin will have his way with them... ;-)
>
> The only way to be successful and persistently stupid is if everyone
> else is also persistently stupid--fortunately, an unstable proposition.

It'd be better if Darwin would prune the tree from both ends though,
wouldn't it :-)

> >>>>>>Any practical system will consist of all levels of connectivity, with
> >>>>>>sharing at virtually all the different levels of the memory hierarchy.
> >>>>>>And I would add another set of levels, in which there is no "memory
> >>>>>>consistency" model, but message passing is the sharing mechanism.
> >>>>>>This extends the multiprocessing model across networks.
> >>>>>
> >>>>>
> >>>>>A brand new PC these days will get you two cores. There are different
> >>>>>scales of systems. The big ones, for sure, will exhibit all 4 levels,
> >>>>>but there's plenty of work to be done just to handle the opportunities
> >>>>>that exist in a new laptop.
> >>>>
> >>>>Agreed--and the tools for *optimizing* applications across even two
> >>>>processors are almost nonexistent. A suboptimal data partitioning
> >>>>can cause a large factor of performance degradation, and designers
> >>>>have virtually no tools to understand and adjust this.
> >>>
> >>>
> >>>Indeed. There are plenty of cases where even non-parallel application
> >>>performance drops when moving to multiprocessor systems, simply because
> >>>of operating system design. A classic example, which is largely solved
> >>>now is the naive scheduler which shifts a running process to another
> >>>CPU, throwing out the cache on the one it was on in the process, and
> >>>worse, this happens while the process in question was waiting on I/O
> >>>:-)
> >>
> >>And you correctly identify this as an immature OS scheduler design--yet
> >>another example of how even OS designers are blissfully unaware of the
> >>concurrency trends barreling toward them like trucks! (And they've
> >>been quite visibly coming for a decade, so no one can justly claim that
> >>they were surprised!)
> >>
> >>I've been amazed by how unaware OS designers are about the implications
> >>of memory hierarchy and multiprocessing on their algorithms. And those
> >>who figured out how to "cope" with a 4-way system can't understand
> >>that they will need a completely different approach to deal efficiently
> >>with a 32- or 64-way system. (Think about queueing theory, lock
> >>contention, resource pools, and error recovery.)
> >
> >
> > Absolutely, although there are designs out there that deal well with
> > machines of that size (Solaris is one example). PC based operating
> > systems have conventionally been optimised only for the hardware that
> > appears at the time, which considering the rate of improvement in that
> > field, has left a lot of stones unturned.
>
> Of course. HP-UX was managing large numbers of processors with
> excellent performance early in the game.
>
> I was trying to get some very real stones turned before they became
> prevalent.
>
> >>>>>>This was always an inevitable result of higher levels of integration.
> >>>>>>As soon as a significant amount of cache can be shared on the chip,
> >>>>>>it becomes advantageous to adorn it with multiple processors.
> >>>>>
> >>>>>
> >>>>>Just be thankful you aren't the author of the process scheduler of a
> >>>>>current operating system :-)
> >>>>
> >>>>Actually, that's one of the most important areas to do *much* better
> >>>>and more adaptively than is done in current OSs.
> >>>
> >>>
> >>>Yes. The current approach adds effectively a 'timeout' constant to a
> >>>process. The timeout represents the cost of shifting it to the next
> >>>nearest locality, and processes don't move until this time has expired
> >>>without the opportunity to shift them. Once the timeout has expired,
> >>>you look for a gap in the scheduling table on processors of this
> >>>locality, and if one exists, you slot it in. If not, you increase the
> >>>timeout to the cost of moving one level further out, and so on. Each
> >>>time a process gets scheduled to run, you reset the timeout to nearest
> >>>locality.
> >>>
> >>>This approach works very well for avoiding unnecessary shifting of
> >>>single-thread processes on a multitasking system, and things tend to
> >>>settle into relative harmony as the system runs.
> >>>
> >>>Of course, handling scheduling and moving of massively parallel
> >>>processes is another kettle of fish!
> >>
> >>Not if approached with the right abstractions. ;-)
> >
> >
> > And ditching explicit pointer arithmetic in code is one of the
> > abstractions dammit ;-)
>
> Well, I'll grant that operating as if there is only one contiguous
> space holding all data is a problem. Now, treating data as if it were
> "segmented" (which everyone hates) may prove to be quite useful... ;-)

The solution is to treat data such that you're unconcerned by its
physical organisation, thus making it possible (via instrumentation) to
observe the partitioning scheme that's most appropriate for that
particular application. I'm glad you see this as a problem holding us
back, and note that it affects my concerns (portability, security) as
much as it affects yours.
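
For concreteness, here's roughly what I mean by the timeout scheme I
described above, as a Java sketch. Everything in it (the cost table,
the names, the shape of the data) is made up for illustration; a real
scheduler lives in the kernel and knows its actual topology:

import java.util.List;
import java.util.OptionalInt;

public class LocalityScheduler {
    // Tolerated wait (ms) before we'll consider moving a task one more
    // level out: same core, shared cache, other socket, other node.
    // The numbers are invented purely for the sketch.
    static final long[] MIGRATION_COST_MS = { 0, 2, 10, 50 };

    static class Task {
        int homeCpu;          // cpu the task last ran on
        long waitingSinceMs;  // when it became runnable
        int allowedLevel = 0; // how far out it may currently migrate
    }

    // freeCpusByLevel.get(k) holds the idle cpus at distance-level k
    // from the task's home cpu. Returns the cpu to place it on, if any.
    static OptionalInt place(Task t, List<List<Integer>> freeCpusByLevel,
                             long nowMs) {
        long waited = nowMs - t.waitingSinceMs;

        // Widen the search only once we've waited longer than the cost
        // of moving that far out -- the "timeout" on each level.
        while (t.allowedLevel + 1 < MIGRATION_COST_MS.length
                && waited >= MIGRATION_COST_MS[t.allowedLevel + 1]) {
            t.allowedLevel++;
        }

        // Take the nearest free cpu within the allowed radius.
        for (int level = 0;
             level <= t.allowedLevel && level < freeCpusByLevel.size();
             level++) {
            for (int cpu : freeCpusByLevel.get(level)) {
                t.homeCpu = cpu;
                t.allowedLevel = 0; // reset to nearest locality once run
                return OptionalInt.of(cpu);
            }
        }
        return OptionalInt.empty(); // keep waiting; the timeout keeps growing
    }
}

The point isn't the code, it's that the policy fits on a page once the
locality levels are made explicit.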

> >>>>The "distributed process allocation" problem in the OS is as fundamental
> >>>>as the "code selection" problem in a compiler, and will have much larger
> >>>>performance impact as the level of *system* concurrency continues to
> >>>>increase.
> >>>>
> >>>>I would even go so far as to say that the current *lack* of good
> >>>>solutions to this problem is a major reason for the limited utility
> >>>>of multicore systems. In a couple of years, we could all have 8-way
> >>>>systems on our desktops, but if the fundamental enablers for parallel
> >>>>apps don't get done, they won't perform much better than single-core
> >>>>systems (as is true today for dual-core/single-core).
> >>>
> >>>
> >>>There's a real need to crawl before we walk here. One big problem that
> >>>needs to be solved even on current systems is how to deal with many
> >>>concurrent independent processes all in kernel space at the same time.
> >>>Unless we move to microkernel systems, which I think eventually we will,
> >>>we have to solve the problem of how to effectively schedule multiple
> >>>concurrent threads of operating system code competing for resources as
> >>>well, effectively meaning the problem has to be solved twice. Most
> >>>current OS's have pretty ordinary granularity when it comes to
> >>>intra-kernel concurrency.
> >>
> >>True--and significant changes in the low levels of OSs are inevitable.
> >>But "crawl before walk" implies that we are facing this as a *new*
> >>problem, when, in fact, it has been coming toward us for a generation,
> >>but we have failed to acknowledge it in our designs.
> >>
> >>I hope using our nose for a wall detector won't cause us too much more
> >>pain in this process (though note that lack of application support for
> >>even low levels of parallelism is really slowing the PC market down).
> >
> >
> > Agreed. The crawling should have been done a decade ago, not starting
> > now while oodles of performance go begging. The process we have to
> > go through remains the same regardless of the timeline unfortunately.
>
> Such is the nature of learning. And such is the inefficacy of prophecy.
> ;-)

Only when the congregation is worshiping at the feet of false prophets
;-)

> >>>>>This is similar to the compiler you mentioned a while back that used
> >>>>>runtime instrumentation of its p-code engine to determine critical
> >>>>>sections then tune them. Although modern VM's apply this technique
> >>>>>today, that's about as far as it's got.
> >>>>
> >>>>It doesn't need to be a non-existent machine--most machines have timer
> >>>>interrupts that permit background profiling and dynamic code changes.
> >>>>All of these tricks can be (and have been) done on actual machine code.
> >>>
> >>>
> >>>Yes. I've often considered playing with this myself with Apple Pascal,
> >>>and profiling the bytecode interpreter as it runs. Then, use the output
> >>>of it to selectively native compile various procedures or functions and
> >>>relink the application. Should be a hoot :-)
> >>
> >>Sounds like great fun!
> >>
> >>Using a source of timer interrupts makes dynamic profiling a breeze.
> >>(Of course, this would also identify the parts of the *interpreter* that
> >>were hot.)
> >
> >
> > I'm awaiting a TimeMaster HO which has a rather nice programmable
> > interrupt controller. Much more flexible than relying on the 60Hz VBL I
> > use at the moment for experimentation.
>
> Actually, 60Hz is plenty fast enough to find almost anything of
> real interest on a slow machine. And it is infrequent enough that
> you can actually execute some code without it being intrusive.

True. Actually I've found 60Hz to be a good slicing interval on the
Apple II, perhaps more so when the machine is accelerated than not. I'm
looking forward though to having a programmable timer source in the
machine - something I've been missing for a long time.
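
The trick carries over to pretty much any environment with a periodic
tick. Just to show the shape of it, here's a toy Java sketch of the
sample-the-pc-and-histogram idea -- the interpreter and its "pc" are
hypothetical stand-ins, and on the II the 60Hz handler plays the role
of the scheduled sampling task:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SamplingProfiler {
    // Hypothetical interpreter state: the bytecode address being executed.
    static final AtomicInteger currentPc = new AtomicInteger();
    static final int[] histogram = new int[65536];

    public static void main(String[] args) throws Exception {
        // The "timer interrupt": sample the pc 60 times a second.
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
            () -> histogram[currentPc.get() & 0xFFFF]++,
            0, 1000 / 60, TimeUnit.MILLISECONDS);

        interpret(); // run the (hypothetical) bytecode interpreter

        timer.shutdownNow();
        // The hottest addresses are the candidates for native compilation.
        int hottest = 0;
        for (int pc = 0; pc < histogram.length; pc++)
            if (histogram[pc] > histogram[hottest]) hottest = pc;
        System.out.printf("hot spot near pc=$%04X (%d samples)%n",
                          hottest, histogram[hottest]);
    }

    // Stand-in for the interpreter's fetch/dispatch loop.
    static void interpret() {
        for (long i = 0; i < 200_000_000L; i++)
            currentPc.set((int) (i % 1000)); // pretend to step through code
    }
}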

> > It's good fun exploring these ideas on smaller environments that have
> > nice predictable behaviors, but you already know this :-)
>
> ...and I *love* it! Of course, I love it even more when I fail to
> predict a behavior. ;-)

And it's very enlightening just how frequently this occurs, even on
machines that are very humble. There is so much to be learned.

> Yes, I always provided a way for processes to handle interrupts
> directed to them. (Most language designers hated the idea.)

It's hard enough getting OS designers to acknowledge the issue :-(

Most language designers are fundamentally opposed to concurrency
concerns entering the language. This is a folly, and a real impediment
to having modern systems provide realtime scheduling :-(
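
Java at least lets a thread be handed an "interrupt" and decide for
itself how to respond -- it's only a cooperative flag rather than a
true asynchronous interrupt, but the shape of it is what I'd like to
see taken much further. A trivial sketch:

public class InterruptibleWorker {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // ... do a slice of work ...
                    Thread.sleep(100); // blocking calls throw if interrupted
                }
            } catch (InterruptedException e) {
                // The interrupt arrived while blocked: clean up and exit.
            }
            System.out.println("worker: handled interrupt, shutting down");
        });

        worker.start();
        Thread.sleep(500);
        worker.interrupt(); // direct an interrupt at that thread
        worker.join();
    }
}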

> That is the very simple approach which I believe the encoders themselves
> should do after taking a "census" of the system they are running on.

Yeah - it's simple enough to get a poor man's version through a simple
wrapper script, but you're right, encoders need to do this. Bring on
parallelism support in languages, says I!
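
The census itself is a one-liner nowadays; the rest is just carving up
the work. A minimal Java sketch, where encodeSlice() is a hypothetical
stand-in for the real per-chunk encoding:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CensusEncoder {
    public static void main(String[] args) throws InterruptedException {
        // Take a census of the machine we happen to be running on.
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);

        // One slice of the input per processor.
        for (int i = 0; i < cpus; i++) {
            final int slice = i;
            pool.submit(() -> encodeSlice(slice, cpus));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Hypothetical: encode the given slice of the input.
    static void encodeSlice(int slice, int of) {
        System.out.println("encoding slice " + (slice + 1) + " of " + of);
    }
}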

> Quantum computing evaluates all possible computations simultaneously,
> so the "selection" of the answer tends to be done at the end. ;-)
>
> NP-complete problems become solvable because a huge combinatoric
> space of possible solutions is explored "in superposition" in the
> time required to explore a single solution. Then you have to "read
> out" the final state(s) to discover the solution(s).

Parallelism provides a reduction in the time required to solve many
combinatorial problems too.
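
Even classically it often comes nearly for free -- brute-forcing a 2^n
search space splits across cores trivially. A throwaway Java sketch,
using subset-sum purely as a stand-in for "huge combinatoric space":

import java.util.stream.LongStream;

public class ParallelSubsetSum {
    public static void main(String[] args) {
        int[] items = { 3, 34, 4, 12, 5, 2, 27, 13, 8, 21, 41, 7, 19, 11 };
        int target = 100;

        // Each value of 'mask' encodes one subset; test them all in parallel.
        long hit = LongStream.range(0, 1L << items.length)
                .parallel()
                .filter(mask -> subsetSum(items, mask) == target)
                .findAny()
                .orElse(-1);

        System.out.println(hit < 0 ? "no subset found"
                                   : "found subset, mask = " + hit);
    }

    // Sum of the items selected by the bits of 'mask'.
    static int subsetSum(int[] items, long mask) {
        int s = 0;
        for (int i = 0; i < items.length; i++)
            if ((mask & (1L << i)) != 0) s += items[i];
        return s;
    }
}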

> I'm reminded of "the answer to the ultimate question". At some point,
> you realize that you really want the ultimate question, too. ;-)

Indeed! The answer is arbitrary without the question :-)

> >>What needs to be done is for trained people with an *engineering*
> >>mindset to sit down and squarely face the problem of multi-level
> >>concurrency.
> >>
> >>I expect the solutions will roll out in layers of successive refinement,
> >>but we haven't yet even chosen to directly address the problem.
> >
> >
> > It won't be long now. We can't wait much longer for companies to
> > engineer processors with massively faster sequential processing speeds
> > before realising that they can't :-)
>
> Yes, I think that has dawned on them as they whip their design teams
> harder while watching their stock stagnate and then fall...

I think it has dawned on them that the stock price is falling, but
beyond that, the actual measures of such failure tend to evade being
addressed. :-(

> > You're absolutely right, but there's a degree of dependency between the
> > two that needs to be addressed, and part of that is engineering out old
> > notational forms which inhibit the progress of parallel system design.
>
> Again, I would say that very little of the current linguistic goals have
> more than incidental relevance to parallelism. It's a plain case of
> "looking where there's light instead of where they lost it".

Indeed the things I'd like to change are fairly little things too, yet
the degree of defiance one faces when suggesting them is staggering
:-)

> If an alien intelligence is watching, a little box on page 11,325 of
> their weekly report must be devoted to a betting pool on how long it
> will take us to figure out that we were working on the wrong problem.
> (Just in *this* area--there are lots of boxes on other pages! ;-)

And the one who wins the pool probably made his guess based on looking
at their own history ;-)

From: mdj on

Michael J. Mahon wrote:

> > It's probably a combination of obstinacy and caution. On today's
> > hardware, you can do some pretty amazing numerics on GPU's rather than
> > CPU's, which have orders of magnitude more computational power than
> > CPU's with regards to numerical calculations. Getting it *right* for
> > numerics is hard, and given the rate of change in the hardware arena
> > thus far, it may prove to be more sensible to have left it until the
> > dust settles a little.
>
> Whoops--cop out. There is essentially universal agreement on IEEE
> FP, with the only holdouts being...LANGUAGE DESIGNERS! It's at least
> a decade past time to get with the program!

Here are the changes that were actually made to the Java Language
Specification for version 1.2.

http://java.sun.com/docs/books/jls/strictfp-changes.pdf

In essence, these changes relax the rules on floating point arithmetic
for intermediate calculations, meaning you can take advantage of the
peculiarities of particular fp engine implementations (the Intel 80-bit
implementation being the primary example cited) at the cost of
non-reproducibility of results across Java implementations, which
nobody seems to care about.

There's a strictfp keyword that exists to enforce the original
reproducible semantics for cases where it's required, which, it would
seem, is almost never; I've not seen any code actually use it.
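
For reference, using it is trivial -- a contrived sketch, not something
pulled from real code:

public class FpModes {
    // Intermediate results here may use extended precision (e.g. the
    // 80-bit x87 registers), so the final rounding can vary by platform.
    static double defaultSemantics(double a, double b, double c) {
        return a * b + c;
    }

    // strictfp forces every intermediate to be rounded to IEEE 754
    // double, so the result is bit-for-bit reproducible everywhere.
    static strictfp double strictSemantics(double a, double b, double c) {
        return a * b + c;
    }
}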

This edition of the platform was released in 1999, so it seems the
major issue went away with those changes.

What remains to be done is an extension to handle richer numerics, like
complex numbers.

Matt
