From: Andy Glew "newsgroup at on
On 8/5/2010 7:20 AM, Skybuck Flying wrote:
> "
> Actually you have to process only
>
> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
>
> THAT's a piece of cake for modern systems.
> "
>
> I don't agree with that last sentence.
>
> Suppose RLE compression is used, tried it myself.
>
> That means a branch for each color.

No it doesn't.

Just think about it. On comp.arch a few years ago, this would
have been a newbie question.

Think branchless code. Heck, I could code it up branchlessly in C.

Not sure if branchless is a win over a machine with branch prediction,
where correctly predicted branches are free. But if you say you have
branch mispredictions, try the branchless version.
From: Skybuck Flying on

"Andy Glew" <"newsgroup at comp-arch.net"> wrote in message
news:UpednTydFo8am8HRnZ2dnUVZ_q2dnZ2d(a)giganews.com...
> On 8/5/2010 7:20 AM, Skybuck Flying wrote:
>> "
>> Actually you have to process only
>>
>> 1920*1200 * 3Bytes * 60/s = 0.41472 GByte/s
>>
>> THAT's a piece of cake for modern systems.
>> "
>>
>> I don't agree with that last sentence.
>>
>> Suppose RLE compression is used, tried it myself.
>>
>> That means a branch for each color.
>
> No it doesn't.

Each compressed color has a color value and a count value.

The count value has to be decremented.

When it reaches zero a new compressed color value and a new count value have
to be read.

I can imagine all colors to be in a color array and all counts in a count
array.

Therefore the count down could be done on a countdown pointer.

Therefore the copying of the color could be done from a color pointer.

Both pointers would have to be incremented when the count reaches zero.

I can imagine something like:

ColorPointer = ColorPointer + ColorIncrementation;
CountPointer = CountPointer + CountIncrementation;

A slight problem is that the color pointer has to be advanced by, let's say,
1 byte, assuming the RGBs have been split up into red, green and blue arrays,
which is an additional requirement and could slow things down further, but
ok... I'll cut you some slack.

And the count incrementation is different: it needs to be 4 bytes maximum...

However, the counts need to be compressed as well; using 4 bytes for each
count would be overkill.

For now I am willing to ignore the fact that the counts have to be compressed
as well... let's focus on RLE for now ;) :)

ColorIncrementation could be calculated as follows:

ColorIncrementation := 1 * ?;
CountIncrementation := 4 * ?;

Now the question is what does the question mark become ?

The question mark has to be 1 if the count is zero.

Therefore a branchless piece of code is necessary to determine if the count
is zero.

This could indeed be done with shr's and or's.

Lastly... the zero count has to be inverted to a one.

If any of the bits in the count was 1, the result would be inverted to a zero.
If none of the bits in the count was 1, the result would be inverted to a one.

Which would trigger the multiply.
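
The zero test described above (OR all the bits together with shifts, then
invert the result) might be sketched in C as follows. This is only an
illustration, assuming a 32-bit count; the name is_zero_u32 is made up here
and does not come from the thread.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when count == 0 and 0 otherwise,
   using only shifts, ORs, an AND and an XOR (no compare, no branch). */
uint32_t is_zero_u32(uint32_t count)
{
    /* OR-fold every bit down into bit 0:
       bit 0 ends up set if any bit of count was set. */
    count |= count >> 16;
    count |= count >> 8;
    count |= count >> 4;
    count |= count >> 2;
    count |= count >> 1;

    /* Keep bit 0 and invert it: zero becomes 1, nonzero becomes 0. */
    return (count & 1u) ^ 1u;
}
```

The returned 0 or 1 is the value that takes the place of the question mark
in the two incrementation formulas.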

So indeed a branchless version of RLE is possible, but at what cost is the
question ?! ;)

This is a first version... perhaps it can be further enhanced to use fewer
instructions.
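
Putting the pieces together, a decoder along these lines could look like the
sketch below. It is an illustration under assumptions, not a tuned
implementation: one color channel per call, 1-byte colors, 4-byte counts,
every count at least 1, and the counts summing to at least n_out. The names
rle_decode_branchless and count_is_zero are made up here. A single array
index stands in for the two pointers, since C scales it by the element size
(1 byte for colors, 4 bytes for counts). The zero test uses a shorter trick
than the shift-and-OR chain: for a nonzero x, either x or its negation has
the top bit set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 when x == 0, else 0, without a branch.
   (x | -x) has bit 31 set exactly when x is nonzero. */
uint32_t count_is_zero(uint32_t x)
{
    return ((x | (0u - x)) >> 31) ^ 1u;
}

/* Decode one color channel: colors[i] is repeated counts[i] times.
   Assumes every count is >= 1 and the counts sum to at least n_out. */
void rle_decode_branchless(const uint8_t *colors, const uint32_t *counts,
                           uint8_t *out, size_t n_out)
{
    size_t run = 0;          /* index of the current run */
    uint32_t remaining = 0;  /* pixels left in the current run */
    uint32_t reload = 1;     /* 1 when a new count must be fetched */

    for (size_t i = 0; i < n_out; i++) {
        /* mask is all ones when reload == 1, all zeros otherwise */
        uint32_t mask = 0u - reload;

        /* Select the new count or keep the old one, without a branch. */
        remaining = (counts[run] & mask) | (remaining & ~mask);

        out[i] = colors[run];
        remaining--;

        /* Advance to the next run exactly when the count hits zero. */
        reload = count_is_zero(remaining);
        run += reload;
    }
}
```

The only branch left is the loop condition itself, which should predict
well. Whether this beats the plain "if count is zero" version still has to
be measured on the actual data.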

> Just think about it.

I just did, see above ;)

> On comp.arch a few years ago, this would have been a newbie
> question.

Newbies ? lol.

> Think branchless code. Heck, I could code it up branchlessly in C.

I bet you can, but the question is: would it be faster than the branch
version ? ;)

> Not sure if branchless is a win over a machine with branch prediction,

Aha, so you are not sure ! ;) :)

> where correctly predicted branches are free. But if you say you have
> branch mispredictions, try the branchless version.

Maybe I will sometime ;)

But I have a better idea... let the gpu do it ! ;) :)

Bye,
Skybuck =D


From: Skybuck Flying on
I think a good solution could be to replace VCL's reliance on GDI with OpenGL
or maybe even DirectX.

Except for one little problem: OpenGL has issues with switching between
rendering contexts...

These issues might be overcome with newer extensions like framebuffers...
and maybe even OpenGL 4.1 multiple viewports and what not...

Bye,
Skybuck.


From: Skybuck Flying on
That website works really badly; furthermore, I don't believe in the CPU doing
GUIs... that's what a graphics card is for ! ;) :)

Isn't offloading the work to the GPU much better ?! ;) :)

Bye,
Skybuck =D