From: Anton Ertl on 21 Jun 2010 08:54

"Ken Hagan" <K.Hagan(a)thermoteknix.com> writes:
>On Fri, 18 Jun 2010 20:28:37 +0100, R. Matthew Emerson <rme(a)clozure.com>
>wrote:
>
>> I gave a lightning talk about this at the recent International Lisp
>> Conference. The slides and one-pager from the proceedings are at
>> http://www.clozure.com/~rme/.
>
>Presumably the orthodox reply is that the micro-architecture is so
>divorced from the ISA that you'd get similar performance from emulating a
>RISCy load-store architecture. Use ESI to point to some "lisp registers",
>EDI to point to the "lisp stack" and the rest for fairly conventional
>purposes. The "registers" will be more or less resident in the L1 cache,
>along with recent parts of the "stack", and the L1 latency is so much
>lower than the memory wall that no-one notices.

That's a theory that is often presented. Unfortunately, it is falsified in practice. Keeping frequently-used data in memory (even in L1 cache) instead of in registers has a big cost on the IA-32 and AMD64 implementations I have measured. In particular, look at Figure 3 of
http://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl.pdf

The speedup (factor about 1.5) from 0.6.2 to 0.7.0ssc on the IA-32 machines (Xeon 5450, Opteron270, Pentium 4 Northwood, Athlon MP) is due to allocating some of the virtual-machine registers in real-machine registers instead of in memory.

- anton
--
M. Anton Ertl                     Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
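To make the two placements concrete, here is a minimal sketch in C, assuming a hypothetical three-instruction stack VM (the names `vm_state`, `run_in_memory` and `run_in_locals` are invented for illustration and are not taken from any of the systems discussed). The first interpreter loop reaches every virtual-machine register through a structure in memory; the second copies them into local variables, which a compiler is free to keep in real machine registers. Whether it actually does so, and how large the difference is, depends on the compiler and the CPU, so this shows only the shape of the change, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-instruction stack VM, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_CELLS 64

typedef struct {
    const intptr_t *ip;              /* VM instruction pointer */
    intptr_t *sp;                    /* VM stack pointer (next free cell) */
    intptr_t stack[STACK_CELLS];
} vm_state;

/* Variant A: the VM registers live in memory and are accessed via vm->. */
intptr_t run_in_memory(vm_state *vm, const intptr_t *code)
{
    vm->ip = code;
    vm->sp = vm->stack;
    for (;;) {
        switch (*vm->ip++) {
        case OP_PUSH:
            assert(vm->sp < vm->stack + STACK_CELLS);
            *vm->sp++ = *vm->ip++;   /* operand follows the opcode */
            break;
        case OP_ADD:
            assert(vm->sp - vm->stack >= 2);
            vm->sp--;
            vm->sp[-1] += vm->sp[0];
            break;
        default:                     /* OP_HALT */
            assert(vm->sp > vm->stack);
            return vm->sp[-1];
        }
    }
}

/* Variant B: the VM registers are locals, i.e. register candidates. */
intptr_t run_in_locals(vm_state *vm, const intptr_t *code)
{
    const intptr_t *ip = code;
    intptr_t *sp = vm->stack;
    for (;;) {
        switch (*ip++) {
        case OP_PUSH:
            assert(sp < vm->stack + STACK_CELLS);
            *sp++ = *ip++;
            break;
        case OP_ADD:
            assert(sp - vm->stack >= 2);
            sp--;
            sp[-1] += sp[0];
            break;
        default:                     /* OP_HALT */
            assert(sp > vm->stack);
            vm->ip = ip;             /* write the registers back once */
            vm->sp = sp;
            return sp[-1];
        }
    }
}
```

Both functions compute the same result for the same program; only the home of `ip` and `sp` during the loop differs.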
From: MitchAlsup on 21 Jun 2010 12:09

On Jun 21, 4:20 am, "Ken Hagan" <K.Ha...(a)thermoteknix.com> wrote:
> Mitch/nedbrek: Is this a fair summary of your position?

After giving this way too little thought:

My general inclination would be to put the LISP registers in static memory (maybe .bss) and address them with absolute memory reference instructions. Then have the "compiler" figure out which ones were in what registers and avoid reloading those already present. One of those LISP registers would be pointing to the LISP stack and since it is used basically everywhere, it would end up residing in the machine registers most of the time anyways. The data cache would keep the accesses from becoming expensive, and this way you basically get an infinite amount of useable registers for whatever purposes the LISP environment needs.

In the case this is a multithreaded LISP environment, some of these registers will remain in static store (the global ones) and others will be migrated to a heap data structure with the active thread carrying around a pointer to this heap 'register file' in a machine register.

But there are many subtleties involved, and some history from the people that created the original port that may completely outweigh my preconceived notions.

The one thing I did take away from the presentation mentioned above, is that there were way too many constants in registers in the original incarnation. This is one of those things that happens when lots of registers are available (essentially) for free. For machines like IA32, one should essentially migrate the registerized data into the data cache, and then decorate the 6 remaining registers with data actively being manipulated, preallocating very few for LISP environmental data.

Mitch
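The layout described above might look roughly like the following C sketch. Everything in it is an assumption made for illustration: the `lispobj` type, the field names and the helper functions are hypothetical and do not come from the original port. Global "Lisp registers" sit in static storage, while per-thread ones live in a heap-allocated register file that each thread reaches through a single pointer (the value that would be kept in a machine register).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t lispobj;           /* hypothetical tagged Lisp value */

/* Global "Lisp registers": static storage, addressed absolutely. */
lispobj lisp_nil_value;              /* zero-initialised, ends up in .bss */

/* Per-thread "Lisp registers": one heap block per thread. */
typedef struct lisp_thread {
    lispobj *stack_base;             /* bottom of this thread's Lisp stack */
    lispobj *stack_ptr;              /* hot: likely to stay in a register */
    size_t stack_cells;
    lispobj scratch[8];              /* further virtual registers */
} lisp_thread;

/* Allocate the register file and the Lisp stack for one thread. */
lisp_thread *lisp_thread_create(size_t stack_cells)
{
    lisp_thread *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->stack_base = malloc(stack_cells * sizeof *t->stack_base);
    if (t->stack_base == NULL) {
        free(t);
        return NULL;
    }
    t->stack_ptr = t->stack_base;
    t->stack_cells = stack_cells;
    for (size_t i = 0; i < 8; i++)
        t->scratch[i] = lisp_nil_value;
    return t;
}

void lisp_thread_destroy(lisp_thread *t)
{
    if (t != NULL) {
        free(t->stack_base);
        free(t);
    }
}

/* Every access goes through the one pointer the thread carries around. */
void lisp_push(lisp_thread *t, lispobj v)
{
    assert((size_t)(t->stack_ptr - t->stack_base) < t->stack_cells);
    *t->stack_ptr++ = v;
}

lispobj lisp_pop(lisp_thread *t)
{
    assert(t->stack_ptr > t->stack_base);
    return *--t->stack_ptr;
}
```

In generated code the pointer `t` would occupy one machine register, leaving the others for data that is actively being manipulated.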
From: Terje Mathisen on 22 Jun 2010 03:53

MitchAlsup wrote:
> The one thing I did take away from the presentation mentioned above,
> is that there were way too many constants in registers in the original
> incarnation. This is one of those things that happens when lots of
> registers are available (essentially) for free. For machines like
> IA32, one should essentially migrate the registerized data into the
> data cache, and then decorate the 6 remaining registers with data
> actively being manipulated, preallocating very few for LISP
> environmental data.

I agree, the key issues seemed to be the large number of immediate values (easily loaded from L1 when needed, particularly in the load-op instruction format) and all the various pointers to LISP-specific data areas.

In a multi-threaded implementation you need to communicate via memory variables anyway, in which case the most important consideration is to make sure all shared data structures that must be updated by multiple threads are located in separate cache lines.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
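One common way to keep independently updated shared variables in separate cache lines is to over-align each of them. The C11 sketch below assumes a 64-byte line, which is typical for x86 but is not stated anywhere in the thread; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64                /* assumption: typical x86 line size */

/* One counter per cache line: alignment pads the struct to a full line. */
typedef struct {
    alignas(CACHE_LINE) uint64_t value;
} padded_counter;

/* Two counters that different threads would update independently. */
typedef struct {
    padded_counter produced;         /* written by one thread */
    padded_counter consumed;         /* written by another thread */
} shared_counters;

static_assert(sizeof(padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* Returns nonzero if both addresses fall in the same cache line. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

Without the alignment, two adjacent 8-byte counters would usually share a line, and every update by one thread would invalidate the other thread's cached copy.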
From: Andrew Reilly on 22 Jun 2010 04:33

On Sun, 20 Jun 2010 10:39:15 +0100, nmm1 wrote:

> So does Fortran and most comparable languages. On
> a better implementation, you would get a run-time error, of course.

Are x86'en capable of faulting on integer overflow? I suppose that they must be, given that some languages mandate that behaviour. Since there's only one add instruction that handles signed and unsigned, I wonder how that works?

> Eh? That code's erroneous - wrong, buggy, defective.

Historical artifact. It's been in use for many years. Just broke with the new/changed compiler.

> You shouldn't
> write code that overflows in C or Fortran. End of story. Sorry, but
> that is the situation, both de jure and de facto.

Unfortunately, that's not an answer I can make any use of. Signal processing with fixed point arithmetic generally requires maximising SNR (left-aligning values to the greatest extent possible) in algorithms that often have unlikely extreme-range conditions. Generally, it is preferable to clip than wrap-around, which is why all processors designed for the purpose have saturating signed addition modes (or in-register headroom and saturating store modes). While that makes for neat per-processor optimised code, it's nice to have a vanilla-C fallback that does the right thing, as a reference. That is now considerably uglier than it needs to be/used to be.

Unfortunately-II: C is essentially the only language that one can count on being available, in some form. I don't fancy joining the increasingly-long line of folk who have created their own "fixed" C, but it seems that that's not entirely outside the realms of necessity, at some stage.

Cheers,

--
Andrew
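The "in-register headroom and saturating store" approach has a straightforward portable-C analogue for 16-bit samples: do the arithmetic in a wider type, where it cannot overflow, and clamp when narrowing. The sketch below is only an illustration under that assumption; `sat_store_i16` and `mix_q15` are hypothetical names, not code from the poster's project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate to the 16-bit range: the "saturating store". */
int16_t sat_store_i16(int32_t wide)
{
    if (wide > INT16_MAX)
        return INT16_MAX;
    if (wide < INT16_MIN)
        return INT16_MIN;
    return (int16_t)wide;
}

/* Add two 16-bit sample buffers. The 32-bit intermediate provides the
 * headroom, so the addition itself can never overflow. */
void mix_q15(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = sat_store_i16((int32_t)a[i] + (int32_t)b[i]);
}
```

This relies on no overflowing signed arithmetic at all, so it stays within defined behaviour; it only works when a wider type is available, which is not the case for the widest integer type in use.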
From: nmm1 on 22 Jun 2010 04:55
In article <88baqtFcceU1(a)mid.individual.net>,
Andrew Reilly <areilly---(a)bigpond.net.au> wrote:
>On Sun, 20 Jun 2010 10:39:15 +0100, nmm1 wrote:
>> So does Fortran and most comparable languages. On
>> a better implementation, you would get a run-time error, of course.
>
>Are x86'en capable of faulting on integer overflow? I suppose that they
>must be, given that some languages mandate that behaviour. Since there's
>only one add instruction that handles signed and unsigned, I wonder how
>that works?

Yes. It used to be easier, but has been made harder. If all else fails, the compiler can generate code that does the check explicitly. Been there - done that.

>> Eh? That code's erroneous - wrong, buggy, defective.
>
>Historical artifact. It's been in use for many years. Just broke with
>the new/changed compiler.

It's been broken for years, but the bug has only just exposed itself. A very common effect.

>> You shouldn't
>> write code that overflows in C or Fortran. End of story. Sorry, but
>> that is the situation, both de jure and de facto.
>
>Unfortunately, that's not an answer I can make any use of.

Actually, you could, but it's tedious and inefficient. Implementing saturating arithmetic using only correct C or Fortran isn't hard, just painful. And your compiler may not be up to optimising it well enough.

>Unfortunately-II: C is essentially the only language that one can count
>on being available, in some form. I don't fancy joining the
>increasingly-long line of folk who have created their own "fixed" C,
>but it seems that that's not entirely outside the realms of necessity,
>at some stage.

Right. That is the excuse for the WG14 wild-eyes who want to perpetrate TR 18037. While there is nothing wrong with adding saturating arithmetic, fixed point, or both - in theory - C99's arithmetic model is already ghastly almost beyond belief, and that complicates it by a MASSIVE factor. And that's ignoring the other strand to that TR.

Regards,
Nick Maclaren.
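As an illustration of what "only correct C" can look like here, the following sketch tests for overflow before performing the addition, so no overflowing signed expression is ever evaluated. It is one possible formulation, not the code under discussion, and the function names are hypothetical. The explicit test is also the kind of check a compiler could emit when asked to trap on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a + b in *sum and returns true if the result is representable;
 * returns false (leaving *sum untouched) if the addition would overflow.
 * The range test itself cannot overflow. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT32_MAX - b) ||
        (b < 0 && a < INT32_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Saturating addition built on the explicit check: clip, don't wrap. */
int32_t sat_add_i32(int32_t a, int32_t b)
{
    int32_t sum;
    if (checked_add_i32(a, b, &sum))
        return sum;
    /* Overflow is only possible when both operands have the same sign,
     * so the sign of b tells us which limit was exceeded. */
    return b > 0 ? INT32_MAX : INT32_MIN;
}
```

The comparisons and branches are the "tedious and inefficient" part; whether a given compiler reduces them to a single saturating or flag-testing instruction is not something this sketch can promise.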