From: //o//annabee on
On Fri, 11 Jan 2008 14:53:02 +0100, Wolfgang Kern <nowhere(a)never.at> wrote:

>
> Wannabee wrote:
> ...
>>> Windoze seem to occupy all the cache anytime, so we are lucky if we
>>> got one free line for our tests.
>
>> No. You get a whole timeslice, which can be very long if you run in
>> realtime mode. The reason why windows seems so slow, is the constant
>> paging of memoryhungry programs, plus (I guess) I/O sheduling.
>> Window is simply slow because hardrives, and I/O are slow.
>
> Haven't KESYS and L'unix to deal with the same hardware as windoze ?

(win2000 )
Yes, but I am thinking about the fact that many apps that run in Windows,
including some of the default Windows apps (like copy), abuse the OS. In
Windows, a program may become slower because it runs faster... E.g. if it
does not allow other processes to run often enough, it will drain them and
make other apps seem slow, plus it will also drain itself and make itself
slow (as seen from the user side).

(Thus creating the illusion that the OS itself is slow)

If you play with buffer sizes and always make sure to yield lots of time to
other processes, both the running app and the other apps will behave as
they should. Many apps fail completely in this regard. Run BitTorrent
with a few torrents going and you can virtually say goodbye to that
machine; it is useless while the client hogs all the resources and does
not allow other apps to be responsive. This is totally unnecessary in
Windows. The effective download rate will be the same even if the app
yields thousands of times a second. Another example is the Windows COPY:
it hogs most of the machine's resources during the copy, which is absurd
when copying lots of large files. This is wrong. The copy could be made to
run using nearly none of the CPU power that you need, letting you continue
to work on any other task, including more copies, and the copy would not
appear much slower at all, because there is plenty of idle time between
keypresses and so on. You would not notice that the copy is taking place,
and would have no hang times in the other work you do. No long wait times
to start new applications... Thus, everything would appear to run smoother.

(Thus creating the illusion that the OS itself is fast)

(Of course there are limits, but not as tight as it appears with many apps
today.)

The reason why Windows seems slow is applications that do not yield
(play ball) often enough.
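
A minimal sketch of the "yield often" idea, in Python rather than assembly
and using in-memory streams instead of real files. The function name
polite_copy, the 64 KB chunk size and the pause parameter are assumptions
made for illustration, not how Windows COPY or any real client works. The
point is only that the copy loop hands the CPU back after every chunk:

```python
import io
import time

def polite_copy(src, dst, chunk_size=64 * 1024, pause=0.0):
    """Copy src to dst in chunks, giving up the CPU after each chunk.

    Returns (bytes_copied, number_of_yields).
    """
    copied = 0
    yields = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # A zero-length sleep asks the scheduler to run any other thread
        # that is ready; a larger pause trades throughput for smoothness.
        time.sleep(pause)
        yields += 1
    return copied, yields

# Hypothetical payload: one million bytes held in memory.
src = io.BytesIO(b"x" * 1_000_000)
dst = io.BytesIO()
copied, yields = polite_copy(src, dst)
print(copied, "bytes copied,", yields, "yields")
```

How much the yield costs in throughput depends on the chunk size and on
the device, so the chunk size is the knob to play with.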


>> And because certain windows apps, does not pay any attention
>> to avoid this problem. (Bittourment for instance, windows itself, and
>> also
>> Opera is dog slow in this regard). Windows is often just I/O bound.
>> Given that you only talk to memory, windows is not at all significantly
>> slow.
>> The kernel is _not_ faster then user apps. I think we should confirm it.
>
> The problem with an 'event-driven' OS is that an IRQ may direct trigger
> a huge program part assigned to it and blocking other events and threads
> (well known as the hour-glass and the frozen mice) for some time.

This is the responsibility of the app coder. There is no "big" in Windows.
Everything is pages, and the only thing that matters is whether a page is
in or out. Plus the above. There are of course many other things to
criticize the OS for, but I do not think this is one of them. Plus, I am
not sure I understand exactly what you mean. A timeslice is plenty of time
to make good use of the cache (so I don't understand where you got this
notion from).

>> Write an app, in dos that floats a 256*256 bitmap across a 32 bit
>> formated, vesa canvas. Then I write one using only GDI. Then we can
>> compare the framerate. It would also be interessting to see for other
>> reasons.
>
> Ye olde DOS is awful slow on this, because it needs to detour with
> INT10h for every dot or has to use the little 32 KB frame at A0000
> and switch the pages of this all the time, ...
>
>> I never heard of speed comparisons between dos and windows. (Even
>> I guess they exist).
>
> ..but with any 32-bit DOS-extension like KEMM (still found in DEMOs)
> and flat framed VRAM, the screen memory can be written as fast as
> the Bus allow (beside AGP- and HW-specific accelleration yet).
>
> ...
> >>> nearly all my cycles are taken by drawing.
>>>> even writing a single char to a graphic screen cost more then
>>>> counting the entire string.
>
> A solid graphic character needs ie: 16*8 = 128 dots to draw,
> and foreground background colours are the obstacles here ;)
> So it is faster to 'fill' the background rectangle with lines, if
> required at all, and output the characters in transparent mode
> (at least faster on KESYS).

You get me thinking. I can rearrange my data to allow for fast burst
writes in the most cache-friendly direction. Or at least I could give it
a try. On the todo list.
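
A rough sketch of the two ideas from the quoted text, in Python for
readability: fill the background rectangle one scanline at a time (so the
writes go to consecutive addresses), then draw the character in
transparent mode, touching only the foreground dots. The tiny 64*32 8-bit
framebuffer, the function names and the box-shaped glyph are all made up
for illustration; this is not KESYS code.

```python
WIDTH, HEIGHT = 64, 32              # tiny 8-bit framebuffer, illustration only
fb = bytearray(WIDTH * HEIGHT)

def fill_rect(fb, x, y, w, h, colour):
    """Fill a rectangle scanline by scanline (consecutive addresses)."""
    line = bytes([colour]) * w
    for row in range(y, y + h):
        start = row * WIDTH + x
        fb[start:start + w] = line

def draw_glyph_transparent(fb, x, y, glyph_rows, colour):
    """Set only the foreground dots of an 8-wide glyph.

    glyph_rows holds one byte per scanline, most significant bit leftmost.
    Background dots are left untouched.
    """
    for dy, bits in enumerate(glyph_rows):
        base = (y + dy) * WIDTH + x
        for dx in range(8):
            if bits & (0x80 >> dx):
                fb[base + dx] = colour

# Hypothetical 8*16 glyph: a hollow box.
GLYPH = [0xFF] + [0x81] * 14 + [0xFF]

fill_rect(fb, 0, 0, 16, 16, colour=1)              # background first
draw_glyph_transparent(fb, 4, 0, GLYPH, colour=7)  # then the character
```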


>
>>> I need to compare my code with windoze one more time.
>>> My screen routines write direct unbuffered to the VRAM and the
>>> last upgrade on text display show an average of 33 cycles per dot,
>>> but it still works on single characters and I think to improve
>>> this and work on whole strings, so it may end up below 30.
>
>> Didnt understand anything after "but".
>> btw, did you like the youtube link I posted?
>
>> (yes in 3rd reading. Yes. Good idea. Write it a whole scanline at the
>> time
>> will remove a Bunch of cache misses. I guess you are just toying with me
>> now eh? ).
>
> Not yet :)

| :)

> I like to save on the call/ret pair for every character and loop on
> string size in the core instead, even this slow single characters then.

I will get back to this problem later. Right now I am hard at work
rewriting a larger portion of my code.

> ...
>>> :) boot an old DOS6.00 and run your code under test there ?
>>> the problem with timing in windoze is just a windoze-problem ...
>>> we measure cache penalties and page faults, and our code could perform
>>> that fast, that we don't even see any difference.
>
>> Well. I did manage to time your 40 cycles code to 48 cycles.
>> (and if we remove the overhead from that)?
>
> which one do you have in mind here, the 32 bit ASCIIh2bin ?
> yes, the BCD(BCH) packing could be improved here.

It was the bintoascii, or IntegerToString as I usually call those rarely
needed functions.
Those names are a bit confusing. What do you call a routine printing
10101010?
bintoascii2 ?

IntegerToString for writing > 232424 / -232333
HextoString > 0_FEED
BinaryToString > 00_1010_0000

so I am often a bit confused by those names.

Alternatively:

BinaryToDecimal
BinaryToHex
BinaryToBinary ?

BinaryToDecimalString
BinaryToHexString
BinaryToBinaryString ?

BinToDecAscii
BinToHexAscii
binToBinAscii

BinToDec
BinToHeck
BinToLol...
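
To make the three example outputs above concrete, here is a small Python
sketch. The function names follow the first naming scheme in the list; the
grouping rule (a "0_" prefix for hex, a "00_" prefix for binary,
underscore-separated groups of four digits) is only inferred from the
samples 0_FEED and 00_1010_0000, so treat it as an assumption.

```python
def integer_to_string(value):
    """Signed decimal, e.g. 232424 or -232333."""
    return str(value)

def _grouped(digits, group=4):
    """Left-pad with zeros to a multiple of `group`, then join with '_'."""
    pad = (-len(digits)) % group
    digits = "0" * pad + digits
    return "_".join(digits[i:i + group] for i in range(0, len(digits), group))

def hex_to_string(value):
    """Hex with a leading '0_' prefix, e.g. 0_FEED (assumed format)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return "0_" + _grouped(format(value, "X"))

def binary_to_string(value):
    """Binary with a leading '00_' prefix, e.g. 00_1010_0000 (assumed format)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return "00_" + _grouped(format(value, "b"))

print(integer_to_string(-232333))
print(hex_to_string(0xFEED))
print(binary_to_string(0xA0))
```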


>
>> If you want lets do the bitmap test I noted above,
>> and see what comes out?
>
> Fine, even I don't have Vendor specific accellerating DirectX-drivers

I will use GDI, not directx.

> for it, I'll time a bitmap of this size using the DEMO-KESYS under DOS,
> but I need to create a 256KB picture in 32-bit format first,
> because KESYS isn't a game console it's standard is 8-bit colour.
> I assume we both use 1024*768,32 (@100Hz, if this is relevant at all).

OK. I have to use what the OS offers. The main thing is the relative
speed in the 32-bit mode. If wanted, we could just generate the
bitmap as a single colour, as long as it is different from the background,
and just output the frame rate in a corner and keep it running. Just copy
it as fast as possible to the screen. No syncing with the refresh rate, as
that would not reveal the faster one. Just print it in the same spot, so it
will not need to clip. I don't see this as a competition; it would just be
interesting to see what is lost by using Windows. No need to hurry. I
am anxious to get back to finishing a long process I started a few days
ago with my code as well. I will post mine when I am done with it. Just got
out of bed.
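
The shape of the proposed test can be sketched as follows, in Python and
against a plain in-memory buffer standing in for the 1024*768, 32-bit
screen. It uses neither GDI nor real VRAM, so its numbers say nothing
about either; the names blit and measure_fps and the single-colour pixel
value are assumptions for illustration. It only shows the loop: copy the
256*256 bitmap to the same spot as fast as possible, with no sync to the
refresh rate, and count frames per second.

```python
import time

SCREEN_W, SCREEN_H, BPP = 1024, 768, 4   # 32-bit colour = 4 bytes per dot
BMP_W = BMP_H = 256

framebuffer = bytearray(SCREEN_W * SCREEN_H * BPP)
# Single-colour bitmap, 256 * 256 * 4 bytes = 256 KB.
bitmap = bytes([0x40, 0x80, 0xC0, 0x00]) * (BMP_W * BMP_H)

def blit(fb, bmp, x, y):
    """Copy the bitmap into the framebuffer one scanline at a time."""
    row_bytes = BMP_W * BPP
    for row in range(BMP_H):
        dst = ((y + row) * SCREEN_W + x) * BPP
        src = row * row_bytes
        fb[dst:dst + row_bytes] = bmp[src:src + row_bytes]

def measure_fps(duration=0.05):
    """Blit to the same spot for `duration` seconds; return frames per second."""
    frames = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        blit(framebuffer, bitmap, 0, 0)
        frames += 1
    elapsed = time.perf_counter() - start
    return frames / elapsed

print("frames per second: %.1f" % measure_fps())
```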


>
> __
> wolfgang
>
>
>

From: //o//annabee on
On Fri, 11 Jan 2008 16:01:55 +0100, Wolfgang Kern <nowhere(a)never.at> wrote:

> I could access this page,
> but not the item (perhaps filtered by my provider).

You need Flash Player to see it.
A must-see. When I found it, it had <40 hits;
now it has been seen 260000 times, in like 3 days!
Yesterday it was around 120000 or something.

> __
> wolfgang
>
>

From: Wolfgang Kern on

Wannabee wrote:

>>>> Windoze seem to occupy all the cache anytime, so we are lucky if we
>>>> got one free line for our tests.

>>> No. You get a whole timeslice, which can be very long if you run in
>>> realtime mode. The reason why windows seems so slow, is the constant
>>> paging of memoryhungry programs, plus (I guess) I/O sheduling.
>>> Window is simply slow because hardrives, and I/O are slow.

>> Haven't KESYS and L'unix to deal with the same hardware as windoze ?

> (win2000 )
> Yes, but I am thinking about the fact that many apps that run in windows,
> including some of the default windows apps, (like copy) abuses the OS. In
> windows, a program may become slower, beause it runs faster.... E.g if it
> does not allow other processes to run, often enough, it will drain and
> make other apps seem slow, plus it will also drain itself, and make itself
> slow. (as seem from the userside)
>
> (Thus creating the illusion that the OS itself is slow)
>
> Playing with buffersizes, and allways making sure to yield lots of time to
> other processes, and both the running app and other apps will behave as
> they should. Many apps fails completly in this regard. running bittourment
> with a few torrents runing, and you can virtually say goodbye to that
> machine, it is useless while it hogs all the resources, and does not allow
> other apps to be responsive. This is totally not needed in windows. The
> effective downloading rate, will be the same, even if the app yields 1000s
> of times a second. Another example is windows COPY, it hogs most of the
> machine resources during the copy, which is absurd when copying lots of
> large files. This is wrong, and the copy could be made to run at nearly no
> CPU power, (that you need) and allow you to continue to work with any
> other task, including many copies, and the copy would not appear much
> slower at all, because there will be plenty of times between keypresses
> and etc. You will not notice that the copy is taking place, and have no
> kind of hangtimes in the other work you do. No large waittimes to start
> new applications... Thus, everything will appear to run smoother.
>
> (Thus creating the illusion that the OS itself is fast)
>
> (of course there are limits, but not as much as it appear with many apps
> today)
>
> The reason why windows seems slow is applications that does not yield
> (play ball) often enough.

Right, even an HLL-created OS can be used to its best performance
by well coded applications.

>> The problem with an 'event-driven' OS is that an IRQ may direct trigger
>> a huge program part assigned to it and blocking other events and threads
>> (well known as the hour-glass and the frozen mice) for some time.

> This is the responsibility of the AppCoder. there is not "Big" in windows.
> Everything is pages, and the only thing that matters is if the page is in
> or out. Plus the above. There are of course many other things to critize
> the Os for, but I do not think this is it. Plus, I am not sure I
> understand exactly what you mean. A timeslice is plenty of time to make
> good use of the cache.. (so I dont understand where you got this notion
> from).

Perhaps one day I will dig deeper into windoze and figure out in detail
how to make things work as fast as possible; meanwhile I can only
guess from bad experience that everything in there is bloated and slow.

....
>> A solid graphic character needs ie: 16*8 = 128 dots to draw,
>> and foreground background colours are the obstacles here ;)
>> So it is faster to 'fill' the background rectangle with lines, if
>> required at all, and output the characters in transparent mode
>> (at least faster on KESYS).

> You get me thinking. I can rearrange my data, to allow for fast burst
> writes in
> the most cachefriendly direction. Or at least I could give it a try. On
> the todo list.


....
>>> Well. I did manage to time your 40 cycles code to 48 cycles.
>>> (and if we remove the overhead from that)?

>> which one do you have in mind here, the 32 bit ASCIIh2bin ?
>> yes, the BCD(BCH) packing could be improved here.

> It was the bintoascii. or IntegerToString as I use to call those, rare
> needed functions.
> Those names are a bit confusing. What do you call a routine printing
> 10101010?
> bintoascii2 ?

bin2ASCIIbin, or following the BASIC-notation "bin2bin$"

> IntegerToString for writing > 232424 / -232333
> HextoString > 0_FEED
> BinaryToString > 00_1010_0000
>
> so I am often a bit confused by those names.
>
> alternativly :
>
> BinaryToDecimal
> BinaryToHex
> BinaryToBinary ?
>
> BinaryToDecimalString
> BinaryToHexString
> BinaryToBinaryString ?
>
> BinToDecAscii
> BinToHexAscii
> binToBinAscii
>
> BinToDec
> BinToHeck
> BinToLol...

:) anyway better than 'ATOI'

>>> If you want lets do the bitmap test I noted above,
>>> and see what comes out?

>> Fine, even I don't have Vendor specific accellerating DirectX-drivers
>
> I will use GDI, not directx.
>
> > for it, I'll time a bitmap of this size using the DEMO-KESYS under DOS,
> > but I need to create a 256KB picture in 32-bit format first,
> > because KESYS isn't a game console it's standard is 8-bit colour.
> > I assume we both use 1024*768,32 (@100Hz, if this is relevant at all).
>
> ok. I have to use what the Os offers. The main thing is the relative
> speed, in the 32bit resolution. If wanted we could just generate the
> bitmap as a single color. just that it is diffrent from the background,
> and just output the framerate in a corner and keep it running. Just copy
> it as fast as possible to the screen. No syncing with the screenrate as
> that will not reveal the faster one. Just print it in the same spot, so it
> will not need to clip. I dont see this as a competition, just would be
> interessting to see what is lost from using windows. No need to hurry. I
> am anxious to get back to finishing a long process I started a few days
> ago with my code as well. I post mine, when I am done with it. Just got
> out of bed.

I just checked it out by using the existing KESYS function

ecx= 0100 Ysize
ebx= 0100 Xsize
edx= 0 X+Yposition (Y in hw)
eax= 0 colour mask
esi= source
jmp draw_bmp

first I tried it with my 8-bit pictures:

                              very first run  repeated runs  cycles/dot  ms on 500Hz K7
position aligned (0000,0100)          565142         540271        8..9              ~1
misaligned (0003,0101)                590432         573081        8..9              ~1

for the 32-bit picture (rarely used, so not fully optimised;
I just used my sin/cos table pattern for it):

32-bit picture                       2225502        2099743      32..34              ~4

I forgot to disable interrupts, so ~400 cycles per ms
may be in there from my timer IRQ.
And yes, the write was asynchronous to the frame rate.
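
For anyone wanting to check the two derived columns, the arithmetic can be
redone in a few lines of Python. The cycle counts are the ones posted
above; the clock is taken as 500 MHz, per the correction in the follow-up
post, and the bitmap as 256*256 dots. The function names are made up for
this sketch.

```python
DOTS = 256 * 256            # Xsize * Ysize = 65536 dots
CLOCK_HZ = 500_000_000      # 500 MHz K7, per the follow-up correction

# (very first run, repeated runs) in CPU cycles, as posted
runs = {
    "8-bit aligned":    (565142, 540271),
    "8-bit misaligned": (590432, 573081),
    "32-bit":           (2225502, 2099743),
}

def cycles_per_dot(cycles):
    return cycles / DOTS

def milliseconds(cycles):
    return cycles / CLOCK_HZ * 1000

for name, (first, repeated) in runs.items():
    print("%-17s %5.2f..%5.2f cycles/dot, %.2f..%.2f ms" % (
        name,
        cycles_per_dot(repeated), cycles_per_dot(first),
        milliseconds(repeated), milliseconds(first)))
```

The results fall in the posted 8..9 and 32..34 cycles/dot ranges, and the
times round to the posted ~1 and ~4 ms.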
__
wolfgang



From: Wolfgang Kern on
now this 500Hz should read as 500MHz of course.

__
wolfgang



From: //o//annabee on
On Fri, 11 Jan 2008 19:02:41 +0100, Wolfgang Kern <nowhere(a)never.at> wrote:

>
>
> I just checked it out by using the existing KESYS function
>
> ecx= 0100 Ysize
> ebx= 0100 Xsize
> edx= 0 X+Yposition (Y in hw)
> eax= 0 colour mask
> esi= source
> jmp draw_bmp
>
> first I tried it with my 8-bit pictures:
>
> ||very first run||repeated runs||cycles/dot||mS on 500Hz K7||
> (position aligned 0000,0100)
> 565142 540271 8..9 ~1
> (misaligned 0003,0101)
> 590432 573081 8..9 ~1
>
> for the 32-bit picture (rare used, so not fully optimised)
> (I just used my sin/cos table pattern for it):
> 2225502 2099743 32..34 ~4
>
> I forgot to disable interrupts, so ~400 cycles per mS
> may be in there from my timer-IRQ.
> And yes, the write was asynchron to the frame rate.

This is pretty impressive.
What kind of graphics card is this for?

Anyway, I have coded the win equivalent. So if you want to prove it....

:D (we can exchange real code)

> __
> wolfgang
>
>
>
