From: John Mashey on 6 Oct 2006 19:32

girish wrote:
> this is a basic doubt.
>
> on a processor that is
> .a superscalar
> .b with o-o-o execution
> .c multiple issue
> mechanism, what happens when the speculated path contains either (sure,
> even combination of) -
> .1 an instruction that can cause a software trap - like syscall in
> MIPS
> .2 a break instruction - again say something like break instruction in
> MIPS
> .3 branch likely instruction - hinting either ways
> .4 pre-fetch instruction stream
> .5 enforce in-order execution - eio(?) in PPC instruction set
> obviously the next in this list of logic challenge & of my particular
> interest
> .6 fork/yield instruction(s)
>
> my target design has one multi-threaded processor core. i am still
> studying the specs though. i understand, the documentation does not
> have to cover such aspects!

Different CPUs may do this in different ways, but:

A common scheme uses:

a) In-order fetch, along speculated predicted branch order
b) Out-of-order execution
c) In-order retirement/graduation, with precise exceptions

items like .3 and .4 are simply part of a), .3 because it's just part of speculated conditional branch mechanism, and .4 as instruction prefetches can be done by the CPU whenever it makes sense.

Instructions that cause exceptions typically get handled like any other instructions, but their result gets marked as "exception", and the exception doesn't actually happen until the instruction is retired (in-order), in which case in-flight instructions get cancelled. For something like a MIPS SYSCALL, I suppose you could treat it as a special-cased unconditional branch, and speculate into the OS, if you're willing to add a bunch of complexity [handling of special registers, i.e., in MIPS COP0] ... but the simplest thing to do is:

a SYSCALL or BREAK doesn't actually do anything until it graduates, at which point the full exception-handling is performed.

In general, one simply cannot speculate into *anything* whose actual execution has visible, unrecoverable, side-effects, or might have: those get sequentialized into being done at graduation.
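
The scheme described in the post above (in-order fetch, out-of-order execution, in-order graduation with the exception deferred until graduation) can be sketched in a few lines of Python. This is a toy model and not taken from the thread: the names `Entry`, `ReorderBuffer`, `fetch`, `execute`, and `retire` are hypothetical, and the cause strings are plain labels, not a claim about how any real MIPS core encodes them.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One in-flight instruction in a toy reorder buffer (hypothetical model)."""
    seq: int                         # program-order sequence number
    op: str                          # mnemonic, for display only
    done: bool = False               # has execution finished?
    exception: Optional[str] = None  # recorded at execute, acted on at retire


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # oldest instruction at the left
        self.retired = []       # ops whose effects became architectural
        self.taken = None       # (seq, cause) of the exception taken, if any
        self.next_seq = 0

    def fetch(self, op):
        """In-order fetch along the predicted path."""
        entry = Entry(seq=self.next_seq, op=op)
        self.next_seq += 1
        self.entries.append(entry)
        return entry

    def execute(self, entry, exception=None):
        """Out-of-order completion: only mark the entry, change nothing visible."""
        entry.done = True
        entry.exception = exception

    def retire(self):
        """In-order graduation: stop at the first unfinished or faulting entry."""
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.exception is not None:
                self.taken = (head.seq, head.exception)
                self.entries.clear()  # cancel all younger in-flight work
                return
            self.retired.append(head.op)


rob = ReorderBuffer()
add, syscall, load, brk = (rob.fetch(op) for op in ("ADD", "SYSCALL", "LW", "BREAK"))

rob.execute(brk, exception="Bp")       # the youngest instruction finishes first
rob.execute(load)
rob.execute(syscall, exception="Sys")
rob.retire()                           # ADD is not done yet, so nothing graduates
blocked = (list(rob.retired), rob.taken)

rob.execute(add)
rob.retire()                           # ADD graduates, then SYSCALL traps
print(rob.retired, rob.taken)          # ['ADD'] (1, 'Sys')
```

In this sketch both SYSCALL and BREAK are marked as faulting while in flight, but only the older one is ever taken; the BREAK and the load behind it are simply discarded when the SYSCALL graduates.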
From: Nick Maclaren on 7 Oct 2006 05:55

In article <1160177551.097350.174430(a)m7g2000cwm.googlegroups.com>,
"John Mashey" <old_systems_guy(a)yahoo.com> writes:
|>
|> In general, one simply cannot speculate into *anything* whose actual
|> execution has visible, unrecoverable, side-effects, or might have:
|> those get sequentialized into being done at graduation.

Indeed, and that is just one example of a very general problem that affects both speculation and aggressive out-of-order scheduling, especially when software completion of hardware operations is involved.

A common one occurs in interrupt handling on many (perhaps most) systems, where it is critical to avoid doing anything in an interrupt handler that might affect anything that is currently 'in flight' (and conversely, of course). Many first-level handlers have had a stream of patches to fix up code that relied too much on the hardware doing all of the serialisation, and where the hardware did NOT serialise by default on all interrupts.

Take one common example, TLB miss handling, and consider the nasty case of interruptible instructions that operate on arbitrary length, unaligned operands. It is really rather important that such an instruction either can rely on the TLB not shifting under its feet or is designed to handle that case. And remember to allow for machine checks at inconvenient times, including ECCs on the page tables!

As John Mashey knows much better than me, that was one of the best arguments in the RISC versus CISC debate - if you CAN avoid opening that can of worms, then why not do so?

Regards,
Nick Maclaren.
From: girish on 7 Oct 2006 19:31

Thanks.

On 10/7/06 6:55 PM, in article eg7thp$1o0$1(a)gemini.csx.cam.ac.uk, "Nick Maclaren" <nmm1(a)cus.cam.ac.uk> wrote:

> In article <1160177551.097350.174430(a)m7g2000cwm.googlegroups.com>,
> "John Mashey" <old_systems_guy(a)yahoo.com> writes:
> |>
> |> In general, one simply cannot speculate into *anything* whose actual
> |> execution has visible, unrecoverable, side-effects, or might have:
> |> those get sequentialized into being done at graduation.
>
> Indeed, and that is just one example of a very general problem that
> affects both speculation and aggressive out-of-order scheduling,
> especially when software completion of hardware operations is
> involved.
>
> A common one occurs in interrupt handling on many (perhaps most)
> systems, where it is critical to avoid doing anything in an interrupt
> handler that might affect anything that is currently 'in flight' (and
> conversely, of course). Many first-level handlers have had a stream
> of patches to fix up code that relied too much on the hardware doing
> all of the serialisation, and where the hardware did NOT serialise by
> default on all interrupts.

With the 2nd release of its architecture specs, MIPS (the only one I know) has removed a lot of software overhead by finely vectoring the exceptions. I studied that idea in my college days while writing 8085 programs for 8259 interrupt controllers. It has a very simple but convenient hardware-does-it-all implementation.

A slightly off-topic question: could such scenarios be avoided with a completely out-of-the-box implementation that divides them physically? I mean different processing elements assigned to different O/S operational modes. Was some such approach considered while you guys were actually doing these things? Or is it a completely weird approach, laden with synchronization problems?

> Take one common example, TLB miss handling, and consider the nasty
> case of interruptible instructions that operate on arbitrary length,
> unaligned operands. It is really rather important that such an
> instruction either can rely on the TLB not shifting under its feet
> or is designed to handle that case. And remember to allow for machine
> checks at inconvenient times, including ECCs on the page tables!

I really don't understand this. I am a RISC enthusiast.

> As John Mashey knows much better than me, that was one of the best
> arguments in the RISC versus CISC debate - if you CAN avoid opening
> that can of worms, then why not do so?
>
>
> Regards,
> Nick Maclaren.

--
First of all - I am an Engineer. I care less for Copyrights/Patents, at least I have none of my own! I love software development & it pays me to run my family. I try to dedicate some time thinking about Open Source movement & sometime contributing to it actually. I often get paid by claiming knowledge in software developed by Open Source community. Lots of things I know today & still learning are due to Open Source community.
From: girish on 7 Oct 2006 20:20

Thanks.

On 10/7/06 8:32 AM, in article 1160177551.097350.174430(a)m7g2000cwm.googlegroups.com, "John Mashey" <old_systems_guy(a)yahoo.com> wrote:

> girish wrote:
>> this is a basic doubt.
>>
>> on a processor that is
>> .a superscalar
>> .b with o-o-o execution
>> .c multiple issue
>> mechanism, what happens when the speculated path contains either (sure,
>> even combination of) -
>> .1 an instruction that can cause a software trap - like syscall in
>> MIPS
>> .2 a break instruction - again say something like break instruction in
>> MIPS
>> .3 branch likely instruction - hinting either ways
>> .4 pre-fetch instruction stream
>> .5 enforce in-order execution - eio(?) in PPC instruction set
>> obviously the next in this list of logic challenge & of my particular
>> interest
>> .6 fork/yield instruction(s)
>>
>> my target design has one multi-threaded processor core. i am still
>> studying the specs though. i understand, the documentation does not
>> have to cover such aspects!
>
> Different CPUs may do this in different ways, but:
>
> A common scheme uses:
>
> a) In-order fetch, along speculated predicted branch order
> b) Out-of-order execution
> c) In-order retirement/graduation, with precise exceptions
>
> items like .3 and .4 are simply part of a), .3 because it's just part
> of speculated conditional branch mechanism, and .4 as instruction
> prefetches can be done by the CPU whenever it makes sense.

I take back my argument on .3. I now understand the idea behind the pref/pref-indexed instructions: they are concerned with the data stream, not the instruction stream. I should in fact have started with the origin of my confusion: what happens if the line containing a cache-refill instruction gets evicted? That question triggered all of the scenarios above.

> Instructions that cause exceptions typically get handled like any other
> instructions, but their result gets marked as "exception", and the
> exception doesn't actually happen until the instruction is retired
> (in-order), in which case in-flight instructions get cancelled. For
> something like a MIPS SYSCALL, I suppose you could treat it as a
> special-cased unconditional branch, and speculate into the OS, if
> you're willing to add a bunch of complexity [handling of special
> registers, i.e., in MIPS COP0] ... but the simplest thing to do is:
>
> a SYSCALL or BREAK doesn't actually do anything until it graduates, at
> which point the full exception-handling is performed.
>
> In general, one simply cannot speculate into *anything* whose actual
> execution has visible, unrecoverable, side-effects, or might have:
> those get sequentialized into being done at graduation.

From a software perspective, a programmer in some cases actually knows when an exception might occur. Take the typical case of a sys-call from a library: the software by design expects the system to go through the exception mechanism. In that case, would it be advisable to hint the hardware caching mechanisms etc. to pre-fetch certain instruction/data streams, say the exception handlers? Of course there is the risk of this thread getting kicked out of context; nevertheless one might achieve a relatively good hit rate. I am looking at an instruction that would allow (system-)software to give such a hint. Would virtually indexed/tagged caches, which expect MMU translations, actually restrict such adventures?

Oh, that reminds me. This is again perhaps a direct question to Dr. Mashey, as I would like to know the MIPS architecture's perspective: was it ever thought a cute little idea to provide a small chunk of TLB cache in RAM at a fixed location? Then whenever the processor wants to take a refill, software might be of some help by providing the next best PTE candidates in that location. In that case, maybe just add a hint bit in the TLB instruction to have the hardware look at these vectors, and perhaps a config bit telling the TLB mechanism whether such a thing exists.

(Somebody might be wondering what this guy has with MIPS. No, I am not revitalizing the MIPS architecture or something. I just love it, that's all, and I want to know every possible thing about its design.)
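
To make the question in the post above concrete, here is a toy Python model of a TLB that consults a small software-filled hint area before falling back to the full page table. It is purely illustrative and does not describe any real MIPS mechanism: `ToyTLB`, the `hints` dictionary, the FIFO eviction, and the 4 KiB page size are all assumptions made for the sketch.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages


class ToyTLB:
    """Hypothetical TLB with a software-provided hint area checked on a miss."""

    def __init__(self, page_table, hints, capacity=4):
        self.page_table = page_table  # vpn -> pfn, the full (slow) table
        self.hints = hints            # vpn -> pfn, small area filled by software
        self.capacity = capacity
        self.entries = {}             # vpn -> pfn, kept in insertion order
        self.stats = {"hit": 0, "hint": 0, "walk": 0}

    def _install(self, vpn, pfn):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict the oldest entry
        self.entries[vpn] = pfn

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.stats["hit"] += 1
        elif vpn in self.hints:
            self.stats["hint"] += 1   # refill satisfied from the hint area
            self._install(vpn, self.hints[vpn])
        else:
            self.stats["walk"] += 1   # full refill; a KeyError models a page fault
            self._install(vpn, self.page_table[vpn])
        return (self.entries[vpn] << PAGE_SHIFT) | offset


page_table = {0: 7, 1: 3, 2: 9}
tlb = ToyTLB(page_table, hints={1: 3})

first = tlb.translate(0x0010)   # page 0: not cached, not hinted, full walk
second = tlb.translate(0x1004)  # page 1: refilled from the hint area
third = tlb.translate(0x1008)   # page 1 again: ordinary TLB hit
print(hex(first), hex(second), hex(third), tlb.stats)
```

Whether such a hint area would pay off depends on how well software can predict the next misses, which the sketch does not attempt to model.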
From: girish on 23 Oct 2006 20:33

I think my post is not going through comp-arch(a)gelato.unsw.edu.au. Apologies if this post is repeated.

Just a quick question here: I believe that in reality it is not possible to get multiple exceptions from multiple instructions that are present in different stages of the pipeline. Is my understanding correct? Even if it is possible to report more than one exception, one of the two must be due to an asynchronous event such as an interrupt from an external device.

Now I jump to a more theory-oriented question. All of the following observations are based on an IEEE Micro 1993 paper regarding exceptions in RISC pipeline design.

If in-order (program-ordered) instruction retirement is used for (precise-)exception processing, then it might mean reserving all the pipeline stages that an instruction may demand before dispatching it to the pipeline. It would also mean that this method does not make full use of the multiple functional units. I believe that, after years of research, actual implementations must be considerably more advanced than what is described in this paper. The paper suggested the following methods:

.1 in-order completion
.2 re-order buffer
.3 history file
.4 future file

Actual implementations presumably revolve around these ideas and combine some or all of these methods. Could you describe what considerations went into implementing the exception model in a typical superscalar MIPS pipeline design?

Thanks.

On 10/8/06 8:31 AM, in article C14E67F6.AF02%girishvg(a)gmail.com, "girish" <girishvg(a)gmail.com> wrote:

>
> Thanks.
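
Of the four methods listed in the 23 Oct post, the history file is perhaps the least self-explanatory, so here is a minimal Python sketch of the idea: results are written to the register file as soon as they are ready, the overwritten values are logged, and an exception is made precise by undoing the log. This is an illustration under simplifying assumptions, not the paper's design or any MIPS implementation: the names `HistoryFileMachine`, `complete`, `commit_through`, and `rollback` are hypothetical, and the sketch assumes no two in-flight instructions write the same register out of order.

```python
class HistoryFileMachine:
    """Toy register file with a history log for precise exceptions (hypothetical)."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs
        self.history = []  # (seq, reg, old_value), in completion order

    def complete(self, seq, reg, value):
        """Write a result as soon as it is ready, saving the overwritten value."""
        self.history.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def commit_through(self, seq):
        """Instructions up to seq are known exception-free: drop their log entries."""
        self.history = [h for h in self.history if h[0] > seq]

    def rollback(self, faulting_seq):
        """Undo the faulting instruction and everything younger, newest write first."""
        for seq, reg, old in reversed(self.history):
            if seq >= faulting_seq:
                self.regs[reg] = old
        self.history = [h for h in self.history if h[0] < faulting_seq]


m = HistoryFileMachine()
m.complete(seq=2, reg=3, value=30)  # the youngest instruction finishes first
m.complete(seq=0, reg=1, value=10)  # the oldest instruction finishes later
# Instruction seq=1 raises an exception and never writes its result.
m.rollback(faulting_seq=1)
print(m.regs, m.history)
```

After the rollback the register file looks as if only instruction 0 had executed, which is the precise state an exception handler for instruction 1 expects. A reorder buffer reaches the same state from the other direction, by holding results back until graduation instead of undoing them.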