From: John Mashey on

girish wrote:
> this is a basic doubt.
>
> on a processor that is
> .a superscalar
> .b with o-o-o execution
> .c multiple issue
> mechanism, what happens when the speculated path contains either (sure,
> even combination of) -
> .1 an instruction that can cause a software trap - like syscall in
> MIPS
> .2 a break instruction - again say something like break instruction in
> MIPS
> .3 branch likely instruction - hinting either ways
> .4 pre-fetch instruction stream
> .5 enforce in-order execution - eieio in PPC instruction set
> obviously the next in this list of logic challenge & of my particular
> interest
> .6 fork/yield instruction(s)
>
> my target design has one multi-threaded processor core. i am still
> studying the specs though. i understand, the documentation does not
> have to cover such aspects!

Different CPUs may do this in different ways, but:

A common scheme uses:

a) In-order fetch, along the predicted (speculative) branch path
b) Out-of-order execution
c) In-order retirement/graduation, with precise exceptions

items like .3 and .4 are simply part of a), .3 because it's just part
of speculated conditional branch mechanism, and .4 as instruction
prefetches can be done by the CPU whenever it makes sense.

Instructions that cause exceptions typically get handled like any other
instructions, but their result gets marked as "exception", and the
exception doesn't actually happen until the instruction is retired
(in-order), in which case in-flight instructions get cancelled. For
something like a MIPS SYSCALL, I suppose you could treat it as a
special-cased unconditional branch, and speculate into the OS, if
you're willing to add a bunch of complexity [handling of special
registers, i.e., in MIPS COP0] ... but the simplest thing to do is:

a SYSCALL or BREAK doesn't actually do anything until it graduates, at
which point the full exception-handling is performed.

In general, one simply cannot speculate into *anything* whose actual
execution has visible, unrecoverable, side-effects, or might have:
those get sequentialized into being done at graduation.
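
As a rough illustration of the scheme above (execute out of order, mark
exceptions, act on them only at in-order graduation), here is a minimal
Python sketch. It is a toy model and not any particular CPU's design; the
names Entry, ReorderBuffer, dispatch, complete and retire are made up for
the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    op: str                          # mnemonic, e.g. "ADD" or "SYSCALL"
    dest: Optional[str]              # destination register, if any
    value: Optional[int] = None
    done: bool = False               # finished executing, in any order
    exception: Optional[str] = None  # marked at execute, taken at retirement


class ReorderBuffer:
    def __init__(self):
        self.entries = []  # program order: index 0 is the oldest
        self.regs = {}     # architectural (committed) register state

    def dispatch(self, op, dest=None):
        # Instructions enter in fetch (program) order.
        entry = Entry(op, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=None):
        # Execution may finish in any order; nothing is committed here.
        entry.value = value
        entry.exception = exception
        entry.done = True

    def retire(self):
        """Retire finished entries from the head, in program order.

        Returns the name of the exception taken, or None.
        """
        while self.entries and self.entries[0].done:
            head = self.entries.pop(0)
            if head.exception is not None:
                # Precise exception: everything younger is cancelled.
                self.entries.clear()
                return head.exception
            if head.dest is not None:
                self.regs[head.dest] = head.value
        return None


def demo():
    rob = ReorderBuffer()
    add = rob.dispatch("ADD", "r1")
    sc = rob.dispatch("SYSCALL")
    mul = rob.dispatch("MUL", "r2")
    rob.complete(mul, value=42)        # youngest finishes first
    rob.complete(sc, exception="Sys")  # only marked, not taken yet
    first = rob.retire()               # ADD not done, so nothing retires
    rob.complete(add, value=7)
    second = rob.retire()              # ADD commits, then SYSCALL traps
    return first, second, rob.regs, len(rob.entries)
```

In demo(), the MUL result is computed speculatively but never reaches the
committed registers, because the older SYSCALL takes its exception first.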

From: Nick Maclaren on

In article <1160177551.097350.174430(a)m7g2000cwm.googlegroups.com>,
"John Mashey" <old_systems_guy(a)yahoo.com> writes:
|>
|> In general, one simply cannot speculate into *anything* whose actual
|> execution has visible, unrecoverable, side-effects, or might have:
|> those get sequentialized into being done at graduation.

Indeed, and that is just one example of a very general problem that
affects both speculation and aggressive out-of-order scheduling,
especially when software completion of hardware operations is
involved.

A common one occurs in interrupt handling on many (perhaps most)
systems, where it is critical to avoid doing anything in an interrupt
handler that might affect anything that is currently 'in flight' (and
conversely, of course). Many first-level handlers have had a stream
of patches to fix up code that relied too much on the hardware doing
all of the serialisation, and where the hardware did NOT serialise by
default on all interrupts.

Take one common example, TLB miss handling, and consider the nasty
case of interruptible instructions that operate on arbitrary length,
unaligned operands. It is really rather important that such an
instruction either can rely on the TLB not shifting under its feet
or is designed to handle that case. And remember to allow for machine
checks at inconvenient times, including ECCs on the page tables!

As John Mashey knows much better than me, that was one of the best
arguments in the RISC versus CISC debate - if you CAN avoid opening
that can of worms, then why not do so?


Regards,
Nick Maclaren.

From: girish on

Thanks.

On 10/7/06 6:55 PM, in article eg7thp$1o0$1(a)gemini.csx.cam.ac.uk, "Nick
Maclaren" <nmm1(a)cus.cam.ac.uk> wrote:

>
> In article <1160177551.097350.174430(a)m7g2000cwm.googlegroups.com>,
> "John Mashey" <old_systems_guy(a)yahoo.com> writes:
> |>
> |> In general, one simply cannot speculate into *anything* whose actual
> |> execution has visible, unrecoverable, side-effects, or might have:
> |> those get sequentialized into being done at graduation.
>
> Indeed, and that is just one example of a very general problem that
> affects both speculation and aggressive out-of-order scheduling,
> especially when software completion of hardware operations is
> involved.
>
> A common one occurs in interrupt handling on many (perhaps most)
> systems, where it is critical to avoid doing anything in an interrupt
> handler that might affect anything that is currently 'in flight' (and
> conversely, of course). Many first-level handlers have had a stream
> of patches to fix up code that relied too much on the hardware doing
> all of the serialisation, and where the hardware did NOT serialise by
> default on all interrupts.

With the 2nd release of its architecture specs, MIPS (the only architecture
I know) has removed a lot of software overhead by vectoring exceptions more
finely. I studied the same idea in my college days, while writing 8085
programs for the 8259 interrupt controller. It has a very simple but
convenient hardware-does-it-all implementation.

A slightly off-topic question: could such scenarios be avoided with a
completely out-of-the-box implementation that divides them physically? I
mean different processing elements assigned to different O/S operational
modes. Was some such approach considered while you guys were actually
building these things? Or is it a completely weird approach, laden with
synchronization problems?


> Take one common example, TLB miss handling, and consider the nasty
> case of interruptible instructions that operate on arbitrary length,
> unaligned operands. It is really rather important that such an
> instruction either can rely on the TLB not shifting under its feet
> or is designed to handle that case. And remember to allow for machine
> checks at inconvenient times, including ECCs on the page tables!

I really don't understand this. I am a RISC enthusiast.


> As John Mashey knows much better than me, that was one of the best
> arguments in the RISC versus CISC debate - if you CAN avoid opening
> that can of worms, then why not do so?
>
>
> Regards,
> Nick Maclaren.

--
First of all - I am an Engineer. I care less for Copyrights/Patents, at
least I have none of my own! I love software development & it pays me to run
my family. I try to dedicate some time thinking about Open Source movement &
sometime contributing to it actually. I often get paid by claiming knowledge
in software developed by Open Source community. Lots of things I know today
& still learning are due to Open Source community.




From: girish on


Thanks.

On 10/7/06 8:32 AM, in article
1160177551.097350.174430(a)m7g2000cwm.googlegroups.com, "John Mashey"
<old_systems_guy(a)yahoo.com> wrote:

>
> girish wrote:
>> this is a basic doubt.
>>
>> on a processor that is
>> .a superscalar
>> .b with o-o-o execution
>> .c multiple issue
>> mechanism, what happens when the speculated path contains either (sure,
>> even combination of) -
>> .1 an instruction that can cause a software trap - like syscall in
>> MIPS
>> .2 a break instruction - again say something like break instruction in
>> MIPS
>> .3 branch likely instruction - hinting either ways
>> .4 pre-fetch instruction stream
>> .5 enforce in-order execution - eieio in PPC instruction set
>> obviously the next in this list of logic challenge & of my particular
>> interest
>> .6 fork/yield instruction(s)
>>
>> my target design has one multi-threaded processor core. i am still
>> studying the specs though. i understand, the documentation does not
>> have to cover such aspects!
>
> Different CPUs may do this in different ways, but:
>
> A common scheme uses:
>
> a) In-order fetch, along speculated predicted branch order
> b) Out-of-order execution
> c) In-order retirement/graduation, with precise exceptions
>
> items like .3 and .4 are simply part of a), .3 because it's just part
> of speculated conditional branch mechanism, and .4 as instruction
> prefetches can be done by the CPU whenever it makes sense.

I take back my argument on .3. I now understand the idea behind the
pref/pref-indexed instructions: they are concerned with data, not with the
instruction stream. I should have started with where my confusion began:
what happens if the line containing a cache-refill instruction gets
evicted? That question triggered all of the scenarios above.


> Instructions that cause exceptions typically get handled like any other
> instructions, but their result gets marked as "exception", and the
> exception doesn't actually happen until the instruction is retired
> (in-order), in which case in-flight instructions get cancelled. For
> something like a MIPS SYSCALL, I suppose you could treat it as a
> special-cased unconditional branch, and speculate into the OS, if
> you're willing to add a bunch of complexity [handling of special
> registers, i.e., in MIPS COP0] ... but the simplest thing to do is:
>
> a SYSCALL or BREAK doesn't actually do anything until it graduates, at
> which point the full exception-handling is performed.
>
> In general, one simply cannot speculate into *anything* whose actual
> execution has visible, unrecoverable, side-effects, or might have:
> those get sequentialized into being done at graduation.

From a software perspective, a programmer in some cases actually knows when
an exception will occur. Take the typical case of a sys-call made from a
library: the software by design expects the system to go through the
exception mechanism. In that case, would it be advisable to hint the
hardware caching mechanisms to pre-fetch certain instruction/data streams,
say the exception handlers? Of course there is the risk of this thread
getting kicked out of context; nevertheless one might achieve a relatively
good hit rate. I am looking for an instruction that would allow (system)
software to give such a hint. Or would virtually indexed/tagged caches,
which expect MMU translations, actually restrict such adventures?

Oh, that reminds me. This is perhaps a direct question to Dr. Mashey, as I
would like to know the MIPS architecture's perspective: was it ever
considered, as a cute little idea, to provide a small chunk of TLB cache in
RAM at a fixed location? Then, whenever the processor needs a refill,
software could help by placing the next-best PTE candidates at that
location. In that case, one could add a hint bit to the TLB instructions to
make the hardware look at these vectors, and perhaps a config bit telling
the TLB mechanism whether such a thing exists.

(Somebody might be wondering what this guy has with MIPS. No, I am not
revitalizing the MIPS architecture or anything. I just love it, that's all,
and I want to know every possible thing about its design.)



From: girish on

I think my post is not going through comp-arch(a)gelato.unsw.edu.au.
Apologies if this post is repeated.

Just a quick question here: I believe that in reality it is not possible to
get multiple exceptions from multiple instructions sitting in one stage or
another of the pipeline. Is my understanding correct? Even if it is
possible to report more than one exception, one of the two must be due to
an asynchronous event such as an interrupt from an external device.

Now I jump to a more theory-oriented question. All of the following
observations are based on a 1993 IEEE Micro paper on exceptions in RISC
pipeline design:

If in-order (program-order) instruction retirement is used for precise
exception processing, then it might mean reserving all the pipeline stages
an instruction may demand before dispatching it to the pipeline. It would
also mean that this method does not make full use of the multiple
functional units.

I believe that, after years of research, actual implementations must be
considerably more advanced than what is described in this paper. The paper
suggested the following methods:
.1 in-order completion
.2 re-order buffer
.3 history file
.4 future file
Fundamentally revolving around these ideas, an actual implementation must
be a combination of some or all of these methods.
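
To make the history-file idea concrete, here is a small Python sketch of
how I picture it; it is my own toy model, not code from the paper. Results
update the register file as soon as they complete, the overwritten value is
logged, and an exception undoes the log back to the faulting instruction.
The names HistoryFile, write and rollback are made up, and the sketch
assumes that writes to the same register complete in program order and that
all instructions older than the faulting one have already completed.

```python
class HistoryFile:
    """Register file updated at completion, plus a log of overwritten values."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # entries: (seq, reg, old_value), in completion order

    def write(self, seq, reg, value):
        # seq is the instruction's position in program order.
        # Save the value about to be overwritten, then update the register.
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def rollback(self, faulting_seq):
        # Undo, newest first, every write made by the faulting instruction
        # or by anything younger, which restores the precise state.
        kept = []
        for seq, reg, old in reversed(self.log):
            if seq >= faulting_seq:
                self.regs[reg] = old
            else:
                kept.append((seq, reg, old))
        kept.reverse()
        self.log = kept


def demo():
    hf = HistoryFile({"r1": 0, "r2": 0, "r3": 0})
    hf.write(3, "r3", 30)  # a younger instruction completes first
    hf.write(1, "r1", 10)
    hf.write(4, "r1", 40)
    hf.rollback(2)         # instruction 2 faults: undo 3 and 4, keep 1
    return hf.regs, hf.log
```

After the rollback in demo(), only the write from instruction 1 survives,
which is the state a precise exception at instruction 2 should expose.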

Could you describe what considerations went into implementing the
exception model in a typical superscalar MIPS pipeline design?

Thanks.

