From: Anne & Lynn Wheeler on

this morning there was a presentation about OpenSolaris with top bullet
item that it recently has gone "tickless" ... related to the high
overhead when running in a virtual machine even when idle (potentially
with a large number of concurrent guests all "tick'ing").

in the mid-80s ... I noticed the (timer tick) code in unix and commented
that I had replaced nearly identical code in cp67 back in 1968 (some
conjecture that cp67 might trace back to ctss ... and unix might also
trace its design back to ctss ... potentially via multics).

i've periodically mentioned that this was a significant contribution to
being able to leave the system up 7x24 ... allowing things like offshift
access, access from home, etc.

the issue was that the mainframes were "rented" and had usage meters ...
with monthly charges based on the number of hours run up on the usage
meters. in the early days ... simple sporadic offshift use wasn't
enough to justify the additional rental logged by the usage meters.

the usage meters ran whenever cpu &/or i/o was active and tended to
log/increment a couple hundred milliseconds at a time ... even if only a
few hundred instructions were executed, "tick'ing" a few times per
second (effectively resulting in the meter running all the time). moving
to event-based operation and eliminating the "tick'ing" helped enable
the usage meter to actually stop during idle periods.
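
to make the difference concrete, here is a minimal C sketch (purely
illustrative ... not the cp67 or unix code) contrasting a periodic-tick
idle loop with an event-driven wait; the function names and the 10ms
tick period are assumptions for the example:

#include <poll.h>
#include <unistd.h>

/* periodic tick: wakes up many times per second even when there is
   nothing to do, so the cpu (and a usage meter) never really goes idle */
void tick_based_idle(void)
{
    for (;;) {
        usleep(10000);            /* 10ms "tick" (assumed period) */
        /* scan timer queue, do accounting, etc. -- usually nothing */
    }
}

/* event based: sleep until the next real event (i/o readiness or the
   nearest timer deadline), so a truly idle system stays idle */
void event_based_idle(struct pollfd *fds, int nfds, int next_timer_ms)
{
    /* blocks with no cpu activity until something actually happens */
    poll(fds, nfds, next_timer_ms);
    /* handle whichever event woke us up */
}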

the other factors (helping enable the transition to leaving systems up
7x24) were

1) "prepare" command for terminal i/o ... allowed (terminal) channel i/o
program to go appear idle (otherwise would have also resulted in useage
meter running) but able to immediately do something when there were
incoming characters

and

2) automatic reboot/restart after failure (contributed to lights-out
operation, leaving the system up 2nd & 3rd shift w/o a human operator
... eliminating those costs also).

on 370s, the usage meter required 400 milliseconds of idle before
coasting to a stop. we had some snide remarks about the favorite son
operating system, which had a "tick" process that ran exactly every 400
milliseconds (if the system was up at all, even if otherwise completely
idle, it was guaranteed that the usage meter would never stop).

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: ChrisQ on
Bernd Paysan wrote:

>
> Sorry, that's not true. Especially in the "small embedded intelligent
> device" area we are talking about. The scale of integration changed: You
> will produce a SoC for these applications. I.e. the parts are build from
> incredible small devices: GDS polygons, to be precise (people use more
> convenient building blocks, though). Most people who make SoCs embed some
> standard core like an ARM (e.g. Cortex M0) or an 8051 (shudder - takes the
> same area as a Cortex M0, but is horrible!), but that's because they chose
> so, not because it's not feasible to develop your own architecture.

I think you must work in a slightly more rarefied atmosphere :-). The
only place I've seen custom silicon with an embedded core is in mobile
telephony, though it's probably far more prevalent now, as it moves more
into the mainstream and where the high volume can justify it. Most
embedded devices still use off-the-shelf micros afaics, including a lot
of consumer electronics. I still use later Silicon Labs 8051 variants
(shock horror) for simple tasks and logic replacement. The compilers
produce reliable, if tortuous, code. Historically I used a lot of 68k,
but am moving over to ARM for the more complex stuff because everyone
makes it. It is a bit idiosyncratic and stuff like the interrupt
handling takes some getting used to. Portable libraries are important
here, so limiting architectures saves time and money. Most of the client
work is low to medium volume, where there is a bit more scope for
creativity and novel solutions. The current project is based around the
Renesas 80C87 series, which is a fairly neat 16-bit machine. I didn't
choose it, but it's a very conventional architecture and easy to
integrate and program.

>
> Architectures that usually don't surface to the user - e.g. when I embed a
> b16 in a device of ours, it's not user-programmable. It's not visible what
> kind of microprocessor there is or if there is any at all.
>

Just as it should be, but simple user interfaces often represent a large
proportion of the overall software effort, precisely to make them that
way...

Regards,

Chris
From: Robert Myers on
On Oct 21, 6:08 am, Bill Todd <billt...(a)metrocast.net> wrote:

>
> Once again that's irrelevant to the question under discussion here:
> whether Terje's statement that Merced "_would_ have been, by far, the
> fastest cpu on the planet" (i.e., in some general sense rather than for
> a small cherry-picked volume of manually-optimized code) stands up under
> any real scrutiny.

I think that Intel seriously expected that the entire universe of
software would be rewritten to suit its ISA.

As crazy as that sounds, it's the only way I can make sense of Intel's
idea that Itanium would replace x86 as a desktop chip.

To add spice to the mix of speculation, I suspect that Microsoft would
have been salivating at the prospect, as it would have been a one-time
opportunity for Microsoft, albeit with a huge expenditure of
resources, to seal the doom of open source.

None of it happened, of course, but I think your objection about hand-
picked subsets of software would not have impressed Intel management.

Robert.

From: Mayan Moudgill on
Andy "Krazy" Glew wrote:
> Brett Davis wrote:
>
>> Cool info though, TRIPS is the first modern data flow architecture I
>> have looked at. Probably the last as well. ;(
>
>
> No, no!
>
> All of the modern OOO machines are dynamic dataflow machines in their
> hearts. Albeit micro-dataflow: they take a sequential stream of
> instructions, convert it into dataflow by register renaming and what
> amounts to memory dependency prediction and verification (even if, in
> the oldest machine, the prediction was "always depends on earlier stores
> whose address is unknown"; now, of course, better predictors are
> available).
>
> I look forward to slowly, incrementally, increasing the scope of the
> dataflow in OOO machines.
> * Probably the next step is to make the window bigger, by multilevel
> techniques.
> * After that, get multiple sequencers from the same single threaded
> program feeding in.
> * After that, or at the same time, reduce the stupid recomputation
> of the dataflow graph that we are constantly redoing.
>
> My vision is of static dataflow nodes being instantiated several times
> as dynamic dataflow.
>
> I suppose that you could call TRIPS static dataflow, compiler-managed.
> But why?
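
To make the "micro-dataflow" point concrete, here is a minimal C sketch
(purely illustrative, not any particular machine's rename logic; the
structure, table sizes and free-list handling are assumptions) of how
register renaming exposes the dataflow hiding in a sequential
instruction stream:

#include <stdio.h>

#define NUM_ARCH_REGS 16

/* one decoded instruction: dst = src1 op src2 */
struct insn { int dst, src1, src2; };

int rename_map[NUM_ARCH_REGS];      /* arch reg -> physical reg */
int next_phys = NUM_ARCH_REGS;      /* trivial "free list": count up */

/* after renaming, each physical source names its exact producer, so an
   instruction can issue as soon as its inputs are ready -- the
   sequential stream has become a dataflow graph */
void rename(const struct insn *in, struct insn *out)
{
    out->src1 = rename_map[in->src1];  /* read current mappings */
    out->src2 = rename_map[in->src2];
    out->dst  = next_phys++;           /* fresh dest removes WAW/WAR hazards */
    rename_map[in->dst] = out->dst;
}

int main(void)
{
    for (int i = 0; i < NUM_ARCH_REGS; i++)
        rename_map[i] = i;

    /* r1 = r2+r3 ; r2 = r1+r4 ; r1 = r5+r6
       (the second write to r1 is independent of the first two insns) */
    struct insn prog[] = { {1, 2, 3}, {2, 1, 4}, {1, 5, 6} };
    for (int i = 0; i < 3; i++) {
        struct insn r;
        rename(&prog[i], &r);
        printf("p%d = p%d op p%d\n", r.dst, r.src1, r.src2);
    }
    return 0;
}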

I am sure you are familiar with Monsoon & Id, and all the work that went
into serializing the dataflow graph :).

As for recomputing the dataflow graph: several papers/theses called for
explicitly annotating instructions with the dependence distance(s) (a
sketch of one possible encoding follows below). I always wondered:
- do you annotate for flow (register write-read) dependences only?
- or do you annotate for any dependence (write-write and read-write as well)?
- how do you deal with multi-path (particularly dependence joins)?
- how do you deal with memory (must vs. may)?
- and how do you indicate this information so that the extra bits in the
I$ don't overwhelm any savings in the dynamic logic?

The first two points are important, because the information you need
differs between implementations with and without renaming.
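
For readers unfamiliar with the idea, here is a minimal C sketch of what
a dependence-distance annotation might look like (a hypothetical
encoding, not any specific proposal): each source operand carries the
number of instructions back to its producer, so the issue logic can wire
the dependence directly instead of rediscovering it through rename
lookups and associative wakeup.

/* hypothetical encoding: each source operand carries the distance (in
   instructions) back to its producer; 0 means "not produced in-flight,
   read it from the register file" */
struct annotated_insn {
    unsigned char dist_src1;
    unsigned char dist_src2;
    /* opcode, register numbers, etc. omitted */
};

/* at dispatch, the producer of instruction i's first operand is simply
   instruction (i - dist_src1); the dependence is wired directly rather
   than being rediscovered dynamically */
int producer_of_src1(int i, const struct annotated_insn *insn)
{
    return insn[i].dist_src1 ? i - insn[i].dist_src1 : -1; /* -1: regfile */
}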

About making windows bigger: my last work on this subject is a bit
dated but, at that time, for most workloads you pretty quickly hit a
point of rapidly diminishing returns. Path mispredicts & cache misses
were a couple of the gating factors, but so were niggling little details
such as store-queue sizes, retire resources & rename buffer sizes. There
is also the nasty issue of cache pollution on mispredicted paths.
From: Andrew Reilly on
On Wed, 21 Oct 2009 13:56:13 -0700, Robert Myers wrote:

> As crazy as that sounds, it's the only way I can make sense of Intel's
> idea that Itanium would replace x86 as a desktop chip.

I don't think that it's as crazy as it sounds (today). At the time
Microsoft had Windows NT running on MIPS and Alpha as well as x86: how
much effort would it be to run all of the other stuff through the
compiler too?

We're a little further along, now, and I think that the MS/NT experience
and Apple's processor-shifting [six different instruction sets in recent
history: 68k, PPC, PPC64, ia32, x86_64 and at least one from ARM, perhaps
three] (and a side order of Linux/BSD/Unix cross-platform support) show
that the biggest headaches for portability (across processors but within a
single OS) are word-size and endianness, rather than specific processor
instruction sets. We're still having issues with pointer size changes as
we move from ia32 to x86_64, but at least the latter has *good* support
for running the former at the same time (as long as the OS designers
provide some support and ship the necessary 32-bit libraries).

Of course, most of the pointer-size issues stem directly from the use of
C et al, where they expose all sorts of bad assumptions about
integer/pointer equivalence and structure size and packing. Most other
languages don't even have ways to express those sorts of concerns, and
so aren't as affected.
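
For instance (a contrived snippet, not from any particular code base),
the following compiles without warnings on ILP32 but silently changes
structure layout and truncates pointers on LP64/x86_64:

#include <stdio.h>
#include <stdint.h>

struct node {
    int   tag;    /* 4 bytes */
    void *next;   /* 4 bytes on ia32; 8 bytes, plus padding, on x86_64 */
};

int main(void)
{
    struct node n = { 1, &n };

    /* classic ILP32 assumption: a pointer fits in an unsigned long.
       True on ia32 and LP64 Unix, false on 64-bit Windows (LLP64). */
    unsigned long as_long = (unsigned long)(uintptr_t)&n;

    /* worse: stuffing a pointer into an int truncates it on x86_64 */
    int as_int = (int)(intptr_t)&n;

    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    printf("pointer in long: %lu, pointer in int: %d\n", as_long, as_int);
    return 0;
}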

> To add spice to the mix of speculation, I suspect that Microsoft would
> have been salivating at the prospect, as it would have been a one-time
> opportunity for Microsoft, albeit with a huge expenditure of
> resources, to seal the doom of open source.

How so? Open source runs fine on the Itanium, in general. (I think that
most of the large SGI Itanium boxes only run Linux, right?)

Cheers,

--
Andrew