From: Andy Glew <"newsgroup at comp-arch.net"> on 6 Aug 2010 08:56
On 8/5/2010 7:20 AM, Skybuck Flying wrote:
> Actually you have to process only
> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
> THAT's a piece of cake for modern systems.
> I don't agree with that last sentence.
> Suppose RLE compression is used, tried it myself.
> That means a branch for each color.
No it doesn't.
Just think about it. On comp.arch back a few years ago, this would
have been a newbie question.
Think branchless code. Heck, I could code it up branchlessly in C.
Not sure if branchless is a win on a machine with branch prediction,
where correctly predicted branches are free. But if you say you have
branch mispredictions, try the branchless version.
From: Skybuck Flying on 7 Aug 2010 00:17
"Andy Glew" <"newsgroup at comp-arch.net"> wrote in message
> On 8/5/2010 7:20 AM, Skybuck Flying wrote:
>> Actually you have to process only
>> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
>> THAT's a piece of cake for modern systems.
>> I don't agree with that last sentence.
>> Suppose RLE compression is used, tried it myself.
>> That means a branch for each color.
> No it doesn't.
Each compressed color has a color value and a count value.
The count value has to be decremented.
When it reaches zero, a new compressed color value and count value have to
be loaded.
I can imagine all colors to be in a color array and all counts in a count
array.
Therefore the countdown could be done through a count pointer.
Therefore the copying of the color could be done from a color pointer.
Both pointers would have to be incremented when the count reaches zero.
I can imagine something like:
ColorPointer = ColorPointer + ColorIncrementation;
CountPointer = CountPointer + CountIncrementation;
A slight problem is that the color has to be advanced by, let's say, 1 byte, assuming
the rgb's have been split up into red, green, blue arrays, which is an extra
requirement and could slow things down further, but ok... I'll cut you some slack.
And the count incrementation is different: it needs to be 4 bytes.
However the counts need to be compressed as well; using 4 bytes for each
would be overkill.
For now I am willing to ignore the fact that the counts have to be
compressed as well...
let's focus on RLE for now ;) :)
ColorIncrementation could be calculated as follows:
ColorIncrementation := 1 * ?;
CountIncrementation := 4 * ?;
Now the question is: what does the question mark become ?
The question mark has to be 1 if the count is zero, and 0 otherwise.
Therefore a branchless piece of code is necessary to determine if the count
is zero.
This could indeed be done with shr's and or's, folding all bits of the count
into a single bit.
Lastly... the result for a zero count has to be inverted to a one.
If any of the bits in the count was 1, the folded bit is 1 and gets inverted to a zero.
If none of the bits in the count was 1, the folded bit is 0 and gets inverted to a one.
Which would trigger the multiply.
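
The shr/or idea above can be sketched in C roughly as follows. This is only an illustration: the function name is_zero and the 32-bit count width are assumptions, not something given in the post.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if count == 0, otherwise 0,
   using only shifts, ORs, an AND and an XOR (no comparison, no branch). */
static inline uint32_t is_zero(uint32_t count)
{
    uint32_t x = count;
    x |= x >> 16;           /* fold the upper 16 bits into the lower 16 */
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;            /* bit 0 is now 1 if any bit of count was set */
    return (x & 1u) ^ 1u;   /* invert: nonzero count -> 0, zero count -> 1 */
}
```

The result can then be used as the "?" in the two multiplications, for example 1 * is_zero(count) and 4 * is_zero(count).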
So indeed a branchless version of RLE is possible, but at what cost is the
question ?! ;)
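
For what it's worth, a minimal sketch of such a decoder in C could look like the code below. It is not code from either poster. The name rle_decode_plane, the one-byte color plane, the uncompressed 32-bit counts, and the requirements that every count is at least 1 and that out_len equals the sum of the counts are all assumptions made for the illustration. The per-pixel "advance to the next run" decision is done with arithmetic on a 0/1 value; the loop itself still has its (well predicted) loop branch, and whether the compiler really emits no branch for the comparison is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode one color plane where colors[i] is repeated
   counts[i] times. Assumes every count >= 1 and out_len == sum of counts. */
static void rle_decode_plane(const uint8_t *colors, const uint32_t *counts,
                             uint8_t *out, size_t out_len)
{
    if (out_len == 0)
        return;

    size_t run = 0;                    /* index of the current run */
    uint32_t remaining = counts[0];    /* pixels left in the current run */

    for (size_t i = 0; i < out_len; i++) {
        /* 1 when the current run is used up, 0 otherwise */
        uint32_t advance = (remaining == 0);

        /* move to the next run only when advance == 1 */
        run += advance;

        /* mask is all ones when advance == 1, all zeros otherwise;
           remaining is 0 whenever the mask is all ones, so this reloads it */
        remaining += counts[run] & ((uint32_t)0 - advance);

        out[i] = colors[run];
        remaining--;
    }
}
```

Indexing with run takes care of the different element sizes (1 byte per color, 4 bytes per count), which is what the 1 * ? and 4 * ? multiplications do for raw pointers.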
This is a first version... perhaps it can be further enhanced to use less
> Just think about it.
I just did, see above ;)
> On comp.arch back a few years ago, this would have been a newbie question.
Newbies ? lol.
> Think branchless code. Heck, I could code it up branchlessly in C.
I bet you can, but the question is: would it be faster than the branching
version ? ;)
> Not sure if branchless is a win on a machine with branch prediction,
Aha, so you're not sure ! ;) :)
> where correctly predicted branches are free. But if you say you have
> branch mispredictions, try the branchless version.
Maybe I will sometime ;)
But I have a better idea... let the gpu do it ! ;) :)
From: Skybuck Flying on 8 Aug 2010 12:09
I think a good solution could be to replace VCL's reliance on GDI with OpenGL
or maybe even DirectX.
Except for one little problem: OpenGL has issues with switching between
These issues might be overcome with newer extensions like framebuffers...
and maybe even OpenGL 4.1 multiple viewports and what not...
From: Skybuck Flying on 10 Aug 2010 02:33
That website works really badly; furthermore, I don't believe in the cpu doing
gui's... that's what a graphics card is for ! ;) :)
Offloading work to the gpu is much better ?! ;) :)