|
Prev: Parallel execution of carry dependent instructions ?
Next: Skybuck presents FastGetBit ( Value, BitPosition ) (latency 3)
From: Terje Mathisen on 6 May 2008 11:19 Skybuck Flying wrote: > Still though, your version Terja is not even partial... since even then it My name is Terje, not Terja > fails horribly hihihihi lol. I guess I could compile & verify the throw-away code I wrote for you, but I really don't care enough to do that. However, you are indeed right that my initial version had some problems, i.e. the mask generation/usage was buggy. void WriteLongwordBits( unsigned long Value, unsigned int BitCount, unsigned long int *DestAddress, unsigned int DestBitIndex ) { /* Normalize bit index */ DestAddress += DestBitIndex >> 5; DestBitIndex &= 31; /* Index of last inserted bit: */ unsigned int lastbit = DestBitIndex + BitCount - 1; /* Generate a mask of 0 bits below the first inserted bit */ unsigned int bottom_mask = ((unsigned int) -1) << DestBitIndex; /* There must be 0 to 31 unmodified bits after the last inserted: */ unsigned int top_mask = ((unsigned int) -1) >> (31-(lastbit&31)); /* One or two affected words? */ if (lastbit >= 32) { DestAddress[0] = (DestAddress[0] & ~bottom_mask) | (Value << DestBitIndex); /* Since we modify two words, DestBitIndex MUST be > 0! */ DestAddress[1] = (DestAddress[1] & ~top_mask) | (Value >> (32-DestBitIndex)); } else { /* It all fits in a single word! */ DestAddress[0] = (DestAddress[0] & ~(bottom_mask & top_mask)) | (Value << DestBitIndex); } } Do you like this version better? If will have a runtime of about 10-12 cycles when correctly predicted to not straddle a word boundary, while a predicted cross-over adds another 2-3 cycles. Mispredicts take a long time of course. :-( It is actually quite hard to generate branchless 32-bit code without using 64-bit temporary values, which then require function calls to handle the shift operations, and the SHRD/SHLD opcode is quite slow as well. Terje -- - <Terje.Mathisen(a)hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on 7 May 2008 03:35 Skybuck Flying wrote: > Your new version is better, but still has a serious shortcoming: > > the DestBitIndex should be able to have any range. > > Your method assumes parameter DestBitIndex has range 0 to 31, this is too > restricted. Huh? The very first operation I do is to mask/update the base pointer and the bit index, specifically to handle index values above 31 > Your method writes garbage bits from the value as well... > > Suppose the value is: > > LSB ------------------------------> MSB > | | > 1111 1111 1111 0000 0000 0000 0000 0000 > > Suppose the bit count is: 3 This is simply bogus, you should never try to write a value which would not fit inside the target field, or if you have to do so, then you should have a separate function which masks the input value before calling the base/fast function. BTW, you even translated my normalization code to Pascal, and still didn't read it. You're amazing! > // Normalize bit index > longword(DestAddress) := longword(DestAddress) + (DestBitIndex shr 5); > DestBitIndex := DestBitIndex and 31; Terje -- - <Terje.Mathisen(a)hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
From: Skybuck Flying on 7 May 2008 07:31 The specification for parameter DestAddress is an untyped pointer. Your code does not follow this specification. Change your code to: void WriteLongwordBits( unsigned long Value, unsigned int BitCount, void *DestAddress, unsigned int DestBitIndex ) Then try again. Bye, Skybuck.
From: Skybuck Flying on 7 May 2008 08:34
"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:DqKdnRwYdexXxrzVnZ2dnUVZ8qSnnZ2d(a)giganews.com... > Skybuck Flying wrote: >> Your new version is better, but still has a serious shortcoming: >> >> the DestBitIndex should be able to have any range. >> >> Your method assumes parameter DestBitIndex has range 0 to 31, this is too >> restricted. > > Huh? > > The very first operation I do is to mask/update the base pointer and the > bit index, specifically to handle index values above 31 1. Your code is using compiler magic. 2. Your code is dangerous, it is assuming a certain operator precedence. 3. Your code is not following specifications, your code is instead using a typed pointer. If you change your code to follow specifications, the problems and danger will become apperent to you. Why did you change the specification ? Is your weakness showing ? Is the C language weakness showing ? Are you unable to work with untyped pointers ? (void pointers) >> Your method writes garbage bits from the value as well... >> >> Suppose the value is: >> >> LSB ------------------------------> MSB >> | | >> 1111 1111 1111 0000 0000 0000 0000 0000 >> >> Suppose the bit count is: 3 > > > This is simply bogus, you should never try to write a value which would > not fit inside the target field, or if you have to do so, then you should > have a separate function which masks the input value before calling the > base/fast function. Your routine was supposed to do that. Read the specification ! What do you think the BitCount parameter is for ? The whole concept of this specification is to take away difficulties and make it easy to simply write/copy some bits from somewhere to somewhere at some bit destination. So far you have consistently failed to meet expectations of the specification :) I shall cut you some slack, and fix your bugged Delphi version by using Delphi's Compiler Magic as well: However I will not fix your horrible C code, it would be nice to see if you managed to fix it yourself for a change ;) :) Your method falls under classification A2B1. Given your track record I would not be surprised if there are still some bugs in it somewhere :) For now I will assume it is correct and it may enter the contest for now... procedure WriteLongwordBitsTerje2Fixed ( Value : longword; BitCount : longword; DestAddress : pointer; DestBitIndex : longword ); var vLastBit : longword; vBottomMask : longword; vTopMask : longword; begin // Normalize bit index Inc( Plongword(DestAddress), DestBitIndex shr 5 ); // compiler magic at work, I don't like it but ok. DestBitIndex := DestBitIndex and 31; // Index of last inserted bit: vLastBit := DestBitIndex + BitCount - 1; // Generate a mask of 0 bits below the first inserted bit */ vBottomMask := ( longword(-1) ) shl DestBitIndex; // There must be 0 to 31 unmodified bits after the last inserted: */ vTopMask := ( longword(-1) ) shr (31-(vLastBit and 31)); // One or two affected words? */ if (vLastBit >= 32) then begin Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vBottomMask) or (Value shl DestBitIndex); Longword(DestAddress) := Longword(DestAddress) + 4; // Since we modify two words, DestBitIndex MUST be > 0! */ Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vTopMask) or (Value shr (32-DestBitIndex)); end else begin // It all fits in a single word ! Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not (vBottomMask and vTopMask)) or (Value shl DestBitIndex); // topmask incorrect. end; end; Bye, Skybuck. |