From: Eliot Miranda on 18 Sep 2005 00:38

John Mashey wrote:
> Eliot Miranda wrote:
>> John Mashey wrote:
>> Code sequences for polymorphic add/sub or compare would then look like
>>
>>     fetch operand one
>>     fetch operand two
>>     tagged add/sub
>>     branch no overflow Ldone
>>     code for non-tagged & overflow cases (method lookup)
>>     ...
>> Ldone:
>
> Just out of curiosity, what's known about the dynamic frequencies of
> the two cases?

I can collect accurate figures in a week (off for a quick vacation :) :) ).
But my gut is that in ordinary code the ratio approaches 100% to 0%.
Tagged/small/immediate integers cover the vast majority of cases
(iteration, most arithmetic). Only special code (e.g. security) will
overflow into large-integer arithmetic. This is in part confirmation
bias because, as Terje points out, the relative cost of tagged vs.
boxed arithmetic is at least an order of magnitude, and so the system
is written to avoid that cost as much as possible. But it's mostly the
fact that in a symbolic language like Smalltalk the highest dynamic
frequency for integer math is likely to be for iteration.

--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd
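The fetch / tagged-add / branch-no-overflow sequence quoted above can be sketched in C. This is a minimal sketch, not any particular VM's code: it assumes a hypothetical 1-bit tag scheme (low bit set for SmallIntegers, pointers word-aligned and ending in 0), and uses the GCC/Clang extension `__builtin_add_overflow` to stand in for the hardware overflow flag:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1-bit tag scheme: a SmallInteger v is encoded as
 * (v << 1) | 1; boxed object pointers are word-aligned, ending in 0. */
#define TAG_BIT     1
#define ENCODE(v)   (((intptr_t)(v) << 1) | TAG_BIT)
#define DECODE(t)   ((intptr_t)(t) >> 1)
#define IS_SMALL(t) ((t) & TAG_BIT)

/* Tagged add: stripping one tag bit before adding leaves the result
 * correctly tagged ((a-1) + b == ((va+vb) << 1) | 1).  Returns 1 and
 * stores the result on the fast path; 0 means "fall through to the
 * method-lookup / LargeInteger path". */
static int tagged_add(intptr_t a, intptr_t b, intptr_t *out)
{
    intptr_t sum;
    if (!(IS_SMALL(a) && IS_SMALL(b)))
        return 0;                         /* not two SmallIntegers */
    if (__builtin_add_overflow(a - TAG_BIT, b, &sum))
        return 0;                         /* overflow: slow path */
    *out = sum;
    return 1;
}
```

The trick of subtracting one tag rather than untagging both operands is what keeps the fast path down to the add-plus-branch the quoted sequence shows.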
From: Terje Mathisen on 18 Sep 2005 02:15

Eliot Miranda wrote:
> John Mashey wrote:
>> Just out of curiosity, what's known about the dynamic frequencies of
>> the two cases?
>
> I can collect accurate figures in a week (off for a quick vacation :) :)

Please do, and good for you! (Collecting figures and vacation.)

> ). But my gut is that in ordinary code the ratio approaches 100% to
> 0%. Tagged/small/immediate integers cover the vast majority of cases
> (iteration, most arithmetic). Only special code (e.g. security) will
> overflow into large-integer arithmetic. This is in part confirmation
> bias because, as Terje points out, the relative cost of tagged vs.
> boxed arithmetic is at least an order of magnitude worse, and so the
> system is

Actually, it might be even worse: even if most bigint math happens to
stay within (say) four words of storage, and even if we disregard the
cost of the trap (because the alternative is to always call out-of-line
functions directly), you still have the call/return overhead, the cost
of comparing the lengths of the two operands, possibly the cost of
allocating the return value, and the actual cost of doing the
multi-word addition.

The small/immediate case will take just one or two cycles, three at the
very maximum, while it would be very easy to have two or even three
branch misses, each taking ~10 cycles, inside the bigint function.

> written to avoid the cost as much as possible. But it's mostly the
> fact that in a symbolic language like Smalltalk the highest dynamic
> frequency for integer math is likely to be for iteration.

Right, it is hard to imagine how it could be otherwise.

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
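For concreteness, here is a minimal sketch (not any VM's actual code) of the slow path being costed out above; each overhead listed -- the operand-length comparison, the result allocation, and the carry-propagating multi-word loop -- is visible, and each conditional inside is a potential ~10-cycle mispredict that the one-or-two-cycle tagged fast path never pays:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bignum layout: n limbs, least significant first. */
typedef struct { size_t n; uint64_t *limb; } Big;

static Big big_add(const Big *a, const Big *b)
{
    if (a->n < b->n) { const Big *t = a; a = b; b = t; } /* length compare */

    /* Allocate the result (one extra limb for a possible carry out);
     * error handling omitted for brevity. */
    Big r = { a->n + 1, malloc((a->n + 1) * sizeof(uint64_t)) };

    uint64_t carry = 0;
    for (size_t i = 0; i < a->n; i++) {       /* multi-word addition */
        uint64_t ai = a->limb[i];
        uint64_t bi = i < b->n ? b->limb[i] : 0;
        uint64_t s  = ai + bi + carry;
        carry = (s < ai) || (carry && s == ai);  /* carry out of 64 bits */
        r.limb[i] = s;
    }
    r.limb[a->n] = carry;
    if (!carry) r.n--;                        /* trim unused top limb */
    return r;
}
```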
From: glen herrmannsfeldt on 18 Sep 2005 08:57

Terje Mathisen wrote:

(snip)

> The x86 INTO style single-byte trap would be the best possible way to
> handle it as long as you also had the fast user-level trap handler you
> wished you had been able to fit into the MIPS architecture.

How much difference does it make at run time, though?

In the case of a conditional branch based on the overflow bit, an INTO
trap instruction, or an overflow trap without any added instructions,
nothing after that point can be committed until the status bit is
available.

I would assume that a branch on overflow would be predicted as not
taken.

Other than the cost of the actual instruction bits, which are pretty
variable across different architectures, would you expect any
difference at run time for the three cases?

-- glen
From: Terje Mathisen on 18 Sep 2005 09:30

glen herrmannsfeldt wrote:
> Terje Mathisen wrote:
>
> (snip)
>
>> The x86 INTO style single-byte trap would be the best possible way to
>> handle it as long as you also had the fast user-level trap handler you
>> wished you had been able to fit into the MIPS architecture.
>
> How much difference does it make at run time, though?
>
> In the case of a conditional branch based on the overflow bit, an INTO
> trap instruction, or an overflow trap without any added instructions,
> nothing after that point can be committed until the status bit is
> available.
>
> I would assume that a branch on overflow would be predicted as not
> taken.
>
> Other than the cost of the actual instruction bits, which are pretty
> variable across different architectures, would you expect any
> difference at run time for the three cases?

The only important difference is that INTO is effectively a
predicted-not-taken call, making it very easy to return to the correct
point after fixup.

To replace this with an inline branch instruction would require the
capability to save the next (i.e. return) address. If your architecture
can do this, then by all means use an inline branch-and-link instead of
the INTO-style check!

The only real difference in this case would be that the branch opcode
probably takes 4 bytes, vs. the single byte of INTO. However, as long
as you have fixed 16- or 32-bit instruction sizes anyway, it really
doesn't matter.

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
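At the source level, a compiler can at least be told to lay the fixup path out of line, approximating the predicted-not-taken behavior being discussed. A sketch assuming GCC/Clang builtins, where `slow_path_add` is a hypothetical fixup routine (stubbed here so the example links) standing in for the INTO handler:

```c
#include <assert.h>
#include <stdint.h>

/* unlikely(): hints the compiler to move the overflow path out of
 * line, so the fast path is a fall-through, predicted-not-taken
 * branch -- roughly what INTO gives you in one byte. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical out-of-line fixup (a real VM would build a
 * LargeInteger here); stubbed so this sketch is self-contained. */
static intptr_t slow_path_add(intptr_t a, intptr_t b)
{
    (void)a; (void)b;
    return 0;
}

intptr_t checked_add(intptr_t a, intptr_t b)
{
    intptr_t sum;
    /* Fast path: one add plus a branch on the overflow flag that the
     * predictor should essentially never see taken. */
    if (unlikely(__builtin_add_overflow(a, b, &sum)))
        return slow_path_add(a, b);   /* the "call" INTO would make */
    return sum;
}
```

This is only an approximation: unlike INTO or a branch-and-link, the compiled branch still has to materialize an ordinary call on the slow path rather than getting the return address for free.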
From: Terje Mathisen on 18 Sep 2005 09:41
John Mashey wrote:
> Terje Mathisen wrote:
>> It doesn't really matter:
>
> Well, actually it does (in general, perhaps not in this case), which is

Oh, absolutely.

> why I asked.
> Statistical frequencies *always* matter, at least in the sense that
> while it may not matter much if something happens 45 or 50% of the
> time, it may matter whether it happens 10% or 1% or .1% or .0000001%.

Right.

> Anyway, philosophically, I suspect that in this case you are right,
> BUT:
> a) Human intuition is notoriously bad about such things.

Which is why one of the very first pieces of asm code I ever wrote was
a routine to interrogate the (keyboard) timer chip on a PC, allowing
~microsecond-resolution timings.

After 25+ years of performance-optimized programming, I still get
regularly surprised when the actual performance of some piece of code
doesn't conform to what I initially guessed it would be. :-)

The most serious one was a couple of years ago, when I wrote asm code
for the inner loop of one of the EAS contenders and didn't get even
close to (i.e. within a factor of 1.5 of) the speedup I expected. OTOH,
after implementing all of it (together with a couple of other guys),
including the outer loop and other housekeeping, the final speedup was
3x over the best C code, which was better than my initial estimate.

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"