From: "Andy "Krazy" Glew" on
On 4/23/2010 3:51 PM, MitchAlsup wrote:
> On Apr 23, 2:06 am, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> wrote:
>> While I have worked on, and advocated, handling reg-reg move instructions efficiently, this introduces a whole new level
>> of complexity.
>>
>> Specifically, MOVE elimination, changing
>>
>> lreg2 := MOVE lreg1
>> lreg3 := ADD lreg2 + 1
>
> Ireg2 := MOV Ireg1
> Ireg2 := OP Ireg2,<const or reg or mem>
>
> Was a hardware optimization in K9, easily detected during trace
> building.

That is/was a local optimization. K9 R.I.P.

Generic MOVE elimination works when there is an arbitrary separation between the MOVEr and the USEr.

>> lreg2 := MOVE lreg1
....lots of instructions, including branches and calls
>> lreg3 := ADD lreg2 + 1

It also doesn't require a trace cache or a similar structure to hold the optimized code. Although I agree that it
makes sense to hold the optimized code, so that you don't constantly have to re-optimize.
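
A minimal software sketch of the rename-stage view (the data structures
here are invented for illustration, not any particular core's
implementation): a MOVE is "executed" by copying the rename-table entry,
so every later USEr of lreg2 reads the same physical register as lreg1,
however far away it is. The new complexity is that a physical register
can now have several logical owners, so it may only be freed when the
last of them has been overwritten - hence the reference count.

#include <stdio.h>

#define NLOGICAL   8
#define NPHYSICAL 32

static int rat[NLOGICAL];        /* register alias table: lreg -> preg */
static int refcount[NPHYSICAL];  /* how many lregs share each preg     */
static int next_free;            /* trivial allocator for the sketch   */

static int alloc_preg(void)
{
    return next_free++ % NPHYSICAL;
}

static void rename_alu(int dst_lreg)
{
    /* Ordinary instruction: destination gets a fresh physical register. */
    rat[dst_lreg] = alloc_preg();
    refcount[rat[dst_lreg]]++;
}

static void rename_move(int dst_lreg, int src_lreg)
{
    /* MOVE eliminated: no ALU op, no new preg; just share the mapping. */
    refcount[rat[src_lreg]]++;
    rat[dst_lreg] = rat[src_lreg];
}

int main(void)
{
    rename_alu(1);        /* lreg1 := ...                           */
    rename_move(2, 1);    /* lreg2 := MOVE lreg1   (eliminated)     */
                          /* ... arbitrarily many instructions ...  */
    rename_alu(3);        /* lreg3 := ADD lreg2 + 1                 */
    printf("lreg1 -> p%d, lreg2 -> p%d (shared), lreg3 -> p%d\n",
           rat[1], rat[2], rat[3]);
    return 0;
}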
From: MitchAlsup on
I think there are a number of semi-fundamental issues to be resolved;
architects, microarchitects, and thinkers have already run into many
of the symptoms, but have not recognized the fundamentals. Much like
the period of time after the discovery of the photo-electric effect
and the realization of quantum mechanics. This kind of paradigm shift
will happen. I just don't know whether the shift will happen in the
OS, languages, libraries, communications, or hardware. Probably a
little of each.

The realization that "one can synchronize" a hundred thousand threads
running in a system the size of a basketball court
The realization that "there is exactly one notion of time" in a system
the size of a basketball court operating in the nano-second range
The realization that one cannot* specify "each and every step" in a
K*trillion step process and have the compiler recognize the inherent
parallelism
The realization that one cannot* specify "the large scale data-flow"
and simultaneously have each instruction able to take precise
interrupts

The first two correspond to the Heisenberg uncertainty principle in
physics
The second two correspond to the difference between effects in the
micro-world and effects in the macro-world
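
(A rough back-of-the-envelope check of the second point: a basketball
court is about 30 m across, and signals propagate at best about
0.3 m/ns in free space, more like 0.2 m/ns in copper or fibre. One
traversal of such a machine therefore costs on the order of 100-150 ns,
a few hundred clock cycles at GHz rates, so events at opposite ends
cannot share a nanosecond-resolution "now".)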

Perhaps along with the notion of the "Memory Wall" and the "Power
Wall" we have (or are about to) run into the "Multi-Processing" Wall.
That is, we think we understand the problem of getting applications
and their necessary data and disk structures parallel-enough and
distributed-enough. And we remain under the impression that we are
"expression limited" in applying our techniques to the machines that
have been built; but in reality we are limited by something entirely
more fundamental, one we do not yet grasp or cannot yet enumerate.

Mitch

{"Other than that Mrs. MultiProcessor, how did you like the Play"?}
From: Brett Davis on
In article
<b24c8bb2-fcc3-4f4a-aa0d-0d18601b02eb(a)11g2000yqr.googlegroups.com>,
MitchAlsup <MitchAlsup(a)aol.com> wrote:

> I think there are a number of semi-fundamental issues to be resolved;
>
> The realization that "one can synchronize" a hundred thousand threads
> running in a system the size of a basketball court
> The realization that "there is exactly one notion of time" in a system
> the size of a basketball court operating in the nano-second range
> The realization that one cannot* specify "each and every step" in a
> K*trillion step process and have the compiler recognize the inherent
> parallelism
> The realization that one cannot* specify "the large scale data-flow"
> and simultaneously have each instruction able to take precise
> interrupts
>
> The first two correspond to the Heisenberg uncertainty principle in
> physics
> The second two correspond to the difference between effects in the
> micro-world and effects in the macro-world
>
> Perhaps along with the notion of the "Memory Wall" and the "Power
> Wall" we have (or are about to) run into the "Multi-Processing" Wall.

ATI chips already have ~2000 processors; simple scaling says that the
monitor in your iMac a decade from now will have 100,000 CPUs. Which
means that a desktop server will have a million CPUs: one for every
10 pixels on your monitor.
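
(A rough sanity check of that scaling: doubling every two years for a
decade is a factor of 2^5 = 32, so ~2000 cores becomes ~64,000; at
18-month doublings it is about 2^6.7, roughly 100x, or ~200,000 cores.
100,000 is in the right ballpark, if the historical trend holds.)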

A server room with the right software could have a higher IQ than
you or I do. ;)

Brett

From: nmm1 on
In article <8PydnQJVJtSKXEnWnZ2dnUVZ8r-dnZ2d(a)giganews.com>,
<jgd(a)cix.compulink.co.uk> wrote:
>
>> >Oh, [the value of compatibility] can be challenged, all right.
>> >It's just that the required gains from doing so are steadily
>> >increasing as the sunk costs in the current methods grow.
>>
>> In my experience, that is almost always overstated, and very often
>> used as an excuse to avoid thinking out of the box. In particular,
>> once software runs on two hardware architectures, porting it to a
>> third is usually easy.
>
>Perfectly true, provided that the architectures are as alike as, say,
>x86, MIPS, SPARC and PowerPC are. Which is really quite a lot alike.
>
>Porting to something like Cell (using the SPEs), or MPI clustering, or
>something else based on different system-architecture principles is
>another matter.

Oh, yes, indeed. 100% agreement.

However, it is used as an argument to avoid considering (say) the
interrupt-free architecture that I have posted on this newsgroup.
That would be essentially transparent to 99% of applications, and
would need mainly a reorganisation of the kernel and device drivers
(not even a complete rewrite).

That is a classic example of an idea that was thought of 40 years
ago (probably 50+), but could not have been implemented then,
because the technology was inappropriate. But it will not get
reconsidered because it is heretical to the great god Compatibility.
The fact that it might well deliver a fairly painless factor of
two in performance, RAS and reduction of design costs is irrelevant.
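
To give a rough, generic flavour of what "interrupt-free" can mean
(this is an invented illustration, not the specific design from those
postings, and not any real driver interface): devices append completion
records to rings in ordinary memory, and dedicated kernel threads poll
and drain them, so no device ever forces an asynchronous control
transfer onto a core.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SLOTS 256

struct completion { uint64_t tag; int32_t status; };

struct ring {
    struct completion slot[RING_SLOTS];
    _Atomic uint32_t  head;   /* advanced by the device          */
    uint32_t          tail;   /* advanced by the polling thread  */
};

/* Device side: post one completion record (normally done by DMA). */
static void device_post(struct ring *r, uint64_t tag, int32_t status)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    r->slot[head % RING_SLOTS] = (struct completion){ tag, status };
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
}

/* Kernel side: called from a polling loop on a dedicated core;
   no interrupt is ever taken.                                    */
static void drain_completions(struct ring *r)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    while (r->tail != head) {
        struct completion *c = &r->slot[r->tail++ % RING_SLOTS];
        printf("request %llu done, status %d\n",
               (unsigned long long)c->tag, (int)c->status);
    }
}

int main(void)
{
    static struct ring r;
    device_post(&r, 1, 0);    /* pretend two I/Os completed    */
    device_post(&r, 2, 0);
    drain_completions(&r);    /* serviced without an interrupt */
    return 0;
}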

Similarly, reintroducing a capability design (rather than the
half-baked hacks that have been put into some Unices) would be far less
painful than is often made out, and could easily deliver a massive
improvement in RAS - if done properly, perhaps a factor of 10 in
the short term, 100 in the medium and thousands in the long.
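
For concreteness, a capability in the classic sense is an unforgeable
token that names an object together with the rights held on it, and
every access is checked against it. A minimal software sketch of the
checking step (field names and trap behaviour are invented here, not
taken from any real capability machine):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { CAP_READ = 1, CAP_WRITE = 2 };

struct capability {        /* unforgeable in real hardware        */
    uint8_t *base;         /* the object this capability names    */
    size_t   length;       /* its extent                          */
    unsigned perms;        /* the rights granted on it            */
};

static uint8_t load_via_cap(struct capability c, size_t offset)
{
    /* Every access is bounds- and rights-checked; a violating
       access traps instead of silently corrupting memory.        */
    if (offset >= c.length || !(c.perms & CAP_READ)) {
        fprintf(stderr, "capability violation\n");
        exit(1);
    }
    return c.base[offset];
}

int main(void)
{
    uint8_t buf[16] = { 42 };
    struct capability c = { buf, sizeof buf, CAP_READ };

    printf("%d\n", load_via_cap(c, 0));   /* fine                  */
    printf("%d\n", load_via_cap(c, 99));  /* traps: out of bounds  */
    return 0;
}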


Regards,
Nick Maclaren.
From: Terje Mathisen on
Quadibloc wrote:
> On Apr 25, 9:08 am, n...(a)cam.ac.uk wrote:
>> though I meet a lot who claim that great god
>> Compatibility rules, and must not be challenged.
>
> Upwards compatibility is my shepherd...
>
> Even though I walk through the valley of upgrades,
> I shall not have to buy all my software over again,
> for You are with me.

Closed-source vendors don't want this, and they have lots of ways to
force you to "upgrade", i.e. keep paying for new licenses.

One way is exemplified by the very small, specialized CAD program for
orienteering and other maps, OCAD:

They used to have a very limited but free version which was sufficient
to do course planning for smaller training events. Even though it was
free, you still had to register it online, and since they took away
the registration robot, I could no longer get it to work after
reinstalling the OS on my main PC. Instead, my club had to pay about $700
for the cheapest version of the full drawing program.

I.e. by making an internet connection compulsory, vendors can do
whatever the marketplace will let them get away with.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"