|
Prev: Parallel execution of carry dependent instructions ?
Next: Skybuck presents FastGetBit ( Value, BitPosition ) (latency 3)
From: Skybuck Flying on 6 May 2008 08:41 Well, For my video compression purposes... writing 26 to 31 bits might be enough. So writing 32 bits maybe not necessary. So this could circumvent the shr 32 problem all together. Hehe. Nice. Though it's always handy to have a complete/full version available. Still though, your version Terja is not even partial... since even then it fails horribly hihihihi lol. Bye, Skybuck.
From: Terje Mathisen on 6 May 2008 11:19 Skybuck Flying wrote: > Still though, your version Terja is not even partial... since even then it My name is Terje, not Terja > fails horribly hihihihi lol. I guess I could compile & verify the throw-away code I wrote for you, but I really don't care enough to do that. However, you are indeed right that my initial version had some problems, i.e. the mask generation/usage was buggy. void WriteLongwordBits( unsigned long Value, unsigned int BitCount, unsigned long int *DestAddress, unsigned int DestBitIndex ) { /* Normalize bit index */ DestAddress += DestBitIndex >> 5; DestBitIndex &= 31; /* Index of last inserted bit: */ unsigned int lastbit = DestBitIndex + BitCount - 1; /* Generate a mask of 0 bits below the first inserted bit */ unsigned int bottom_mask = ((unsigned int) -1) << DestBitIndex; /* There must be 0 to 31 unmodified bits after the last inserted: */ unsigned int top_mask = ((unsigned int) -1) >> (31-(lastbit&31)); /* One or two affected words? */ if (lastbit >= 32) { DestAddress[0] = (DestAddress[0] & ~bottom_mask) | (Value << DestBitIndex); /* Since we modify two words, DestBitIndex MUST be > 0! */ DestAddress[1] = (DestAddress[1] & ~top_mask) | (Value >> (32-DestBitIndex)); } else { /* It all fits in a single word! */ DestAddress[0] = (DestAddress[0] & ~(bottom_mask & top_mask)) | (Value << DestBitIndex); } } Do you like this version better? If will have a runtime of about 10-12 cycles when correctly predicted to not straddle a word boundary, while a predicted cross-over adds another 2-3 cycles. Mispredicts take a long time of course. :-( It is actually quite hard to generate branchless 32-bit code without using 64-bit temporary values, which then require function calls to handle the shift operations, and the SHRD/SHLD opcode is quite slow as well. Terje -- - <Terje.Mathisen(a)hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on 7 May 2008 03:35 Skybuck Flying wrote: > Your new version is better, but still has a serious shortcoming: > > the DestBitIndex should be able to have any range. > > Your method assumes parameter DestBitIndex has range 0 to 31, this is too > restricted. Huh? The very first operation I do is to mask/update the base pointer and the bit index, specifically to handle index values above 31 > Your method writes garbage bits from the value as well... > > Suppose the value is: > > LSB ------------------------------> MSB > | | > 1111 1111 1111 0000 0000 0000 0000 0000 > > Suppose the bit count is: 3 This is simply bogus, you should never try to write a value which would not fit inside the target field, or if you have to do so, then you should have a separate function which masks the input value before calling the base/fast function. BTW, you even translated my normalization code to Pascal, and still didn't read it. You're amazing! > // Normalize bit index > longword(DestAddress) := longword(DestAddress) + (DestBitIndex shr 5); > DestBitIndex := DestBitIndex and 31; Terje -- - <Terje.Mathisen(a)hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
From: Skybuck Flying on 7 May 2008 07:31 The specification for parameter DestAddress is an untyped pointer. Your code does not follow this specification. Change your code to: void WriteLongwordBits( unsigned long Value, unsigned int BitCount, void *DestAddress, unsigned int DestBitIndex ) Then try again. Bye, Skybuck.
From: Skybuck Flying on 7 May 2008 08:34
"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:DqKdnRwYdexXxrzVnZ2dnUVZ8qSnnZ2d(a)giganews.com... > Skybuck Flying wrote: >> Your new version is better, but still has a serious shortcoming: >> >> the DestBitIndex should be able to have any range. >> >> Your method assumes parameter DestBitIndex has range 0 to 31, this is too >> restricted. > > Huh? > > The very first operation I do is to mask/update the base pointer and the > bit index, specifically to handle index values above 31 1. Your code is using compiler magic. 2. Your code is dangerous, it is assuming a certain operator precedence. 3. Your code is not following specifications, your code is instead using a typed pointer. If you change your code to follow specifications, the problems and danger will become apperent to you. Why did you change the specification ? Is your weakness showing ? Is the C language weakness showing ? Are you unable to work with untyped pointers ? (void pointers) >> Your method writes garbage bits from the value as well... >> >> Suppose the value is: >> >> LSB ------------------------------> MSB >> | | >> 1111 1111 1111 0000 0000 0000 0000 0000 >> >> Suppose the bit count is: 3 > > > This is simply bogus, you should never try to write a value which would > not fit inside the target field, or if you have to do so, then you should > have a separate function which masks the input value before calling the > base/fast function. Your routine was supposed to do that. Read the specification ! What do you think the BitCount parameter is for ? The whole concept of this specification is to take away difficulties and make it easy to simply write/copy some bits from somewhere to somewhere at some bit destination. So far you have consistently failed to meet expectations of the specification :) I shall cut you some slack, and fix your bugged Delphi version by using Delphi's Compiler Magic as well: However I will not fix your horrible C code, it would be nice to see if you managed to fix it yourself for a change ;) :) Your method falls under classification A2B1. Given your track record I would not be surprised if there are still some bugs in it somewhere :) For now I will assume it is correct and it may enter the contest for now... procedure WriteLongwordBitsTerje2Fixed ( Value : longword; BitCount : longword; DestAddress : pointer; DestBitIndex : longword ); var vLastBit : longword; vBottomMask : longword; vTopMask : longword; begin // Normalize bit index Inc( Plongword(DestAddress), DestBitIndex shr 5 ); // compiler magic at work, I don't like it but ok. DestBitIndex := DestBitIndex and 31; // Index of last inserted bit: vLastBit := DestBitIndex + BitCount - 1; // Generate a mask of 0 bits below the first inserted bit */ vBottomMask := ( longword(-1) ) shl DestBitIndex; // There must be 0 to 31 unmodified bits after the last inserted: */ vTopMask := ( longword(-1) ) shr (31-(vLastBit and 31)); // One or two affected words? */ if (vLastBit >= 32) then begin Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vBottomMask) or (Value shl DestBitIndex); Longword(DestAddress) := Longword(DestAddress) + 4; // Since we modify two words, DestBitIndex MUST be > 0! */ Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vTopMask) or (Value shr (32-DestBitIndex)); end else begin // It all fits in a single word ! Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not (vBottomMask and vTopMask)) or (Value shl DestBitIndex); // topmask incorrect. end; end; Bye, Skybuck. |