From: Wolfgang Kern on 12 Jan 2008 15:33

Wannabee wrote:
>> I just checked it out by using the existing KESYS function
>>   ecx= 0100  Ysize
>>   ebx= 0100  Xsize
>>   edx= 0     X+Yposition (Y in hw)
>>   eax= 0     colour mask
>>   esi= source
>>   jmp draw_bmp
>> first I tried it with my 8-bit pictures:
>>
>>                        very first run  repeated runs  cycles/dot  ms on 500 MHz K7
>> (position aligned 0000,0100)
>>                               565142         540271        8..9     ~1
>> (misaligned 0003,0101)
>>                               590432         573081        8..9     ~1
>> for the 32-bit picture (rarely used, so not fully optimised)
>> (I just used my sin/cos table pattern for it):
>>                              2225502        2099743      32..34     ~4
>> I forgot to disable interrupts, so ~400 cycles per ms
>> may be in there from my timer IRQ.
>> And yes, the write was asynchronous to the frame rate.

> This is pretty impressive.
> What kind of graphics card is this for?
> Anyway, I have coded the win equivalent. So if you want to prove it....

Let's see it.

I currently have a WinFast (Nvidia GeForce4000) under test, but I know from other cards I had in before that this won't affect the timing of direct VRAM access in flat VESA 2.0/3.0 modes. The limit here is the maximum bus frequency, and I'm almost at it.

> :D (we can exchange real code)

Hope you don't mean we exchange the KESYS core for RosAsm source :):):)
__
wolfgang
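As a quick consistency check on the benchmark table, the cycle counts can be converted into per-pixel cost, elapsed time and blit rate. This sketch assumes the 0100h x 0100h (65,536-pixel) bitmap and the 500 MHz K7 clock stated in the post; the helper names are illustrative only and not part of KESYS.

```c
#include <assert.h>

/* Assumptions taken from the post: a 0x100 x 0x100 bitmap, 500 MHz clock. */
#define CLOCK_HZ   500000000.0
#define BMP_PIXELS (0x100 * 0x100)   /* 65536 pixels */

/* Clock cycles spent per pixel for one complete blit. */
static double cycles_per_dot(double cycles)
{
    assert(cycles > 0.0);
    return cycles / BMP_PIXELS;
}

/* Wall-clock time of one blit in milliseconds. */
static double blit_ms(double cycles)
{
    assert(cycles > 0.0);
    return cycles / CLOCK_HZ * 1000.0;
}

/* How many such blits fit into one second. */
static double blits_per_second(double cycles)
{
    assert(cycles > 0.0);
    return CLOCK_HZ / cycles;
}
```

For the aligned 8-bit case, cycles_per_dot(565142) is about 8.6 and blit_ms(565142) about 1.13, matching the "8..9" and "~1" columns; the 32-bit run gives about 34 cycles per dot and 4.45 ms. The repeated-run figure of 540271 cycles corresponds to roughly 925 blits per second, and the timer IRQ's ~400 cycles per ms adds less than 0.1 % to any of these numbers.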
From: //o//annabee on 12 Jan 2008 13:56

On Sat, 12 Jan 2008 21:33:44 +0100, Wolfgang Kern <nowhere(a)never.at> wrote:

> Wannabee wrote:
>> Anyway, I have coded the win equivalent. So if you want to prove it....
>
> Let's see it.

It's easy to find. :D Where is yours?

>> :D (we can exchange real code)
>
> Hope you don't mean we exchange the KESYS core for RosAsm source :):):)

Why not? Do you need a core to run a file as simple as this? If you want your OS out of the picture, why don't you just write it as a DOS image instead?

btw, I still have the copy of your demo. Will it run on that?
From: Wolfgang Kern on 13 Jan 2008 10:09

Wannabee wrote:
>>> Anyway, I have coded the win equivalent. So if you want to prove it....
>> Let's see it.
> It's easy to find. :D

Can you repost your page, please? The last crashes wiped my address book.
What figures did you get?

> Where is yours?
>>> :D (we can exchange real code)
>> Hope you don't mean we exchange the KESYS core for RosAsm source :):):)
> Why not? Do you need a core to run a file as simple as this?

To access a flat video RAM directly, it needs a 32-bit OS that allows writing to this memory range (ideally without paging issues).

It would require me to type in the source code manually, but the trick is easy enough to explain: if the colour mask is disabled, it just uses REP MOVSD with pre- and post-aligning (for 8-bit modes only) on every horizontal line, and the step to the next line is a single ADD reg,imm32 (the immediate is patched by the Vmode set and holds 0400 for 1024*768,8 and 01000 for 1024*768,32). In addition, it checks the screen bounds before a line is drawn and either clips or discards the line if it is out of bounds.
OK, the 32-bit colour part looks like this:

usage:
  MOV eax,00001019h |INT 7F   ;set VESA mode to 1024*768,32
  ecx= 0100  Ysize
  ebx= 0100  Xsize
  edx= 0     X+Yposition (Y in hw)
  eax= 0     colour mask
  esi= source                 ;btw: KESYS bitmaps aren't stored upside down!
  AND [Vflag],0F0             ;clear all options
  CALL draw_bmp
  MOV eax,00001009h |INT 7F   ;set VESA mode to 1024*768,8 again
_________
draw_bmp:
  OR edi,ebx
  OR edi,ecx |JZ ret          ;just in case
  PUSH ebx
  PUSH edx                    ;[esp]=Xpos [esp+2]=Ypos
;clip_it:
  MOVZX eax,w[esp+2]          ;eax= Ypos
  MOV edx,0300                ;max lines (altered by Vmode)
  ADD eax,ebx
  CMP eax,edx |Jc L1>
  SUB eax,ebx |MOV ebx,edx |SUB ebx,eax |JS L9>
L1:
  MOVZX eax,w[esp]            ;Xpos
  MOV edx,01000               ;scan line size (altered by Vmode)
  ADD eax,ecx
  CMP eax,edx |Jc L2>
  SUB eax,ecx |MOV ecx,edx |SUB ecx,eax |JS L9>
L2:
  MOVZX eax,w[esp+2]
  IMUL eax,edx                ;y*line size
  LEA edi,[eax+screen_start]  ;from VESA info (altered by Vmode)
  MOVZX eax,w[esp]            ;+x for 8-bit, +4*x for 32-bit
  TEST [Vflag],40h            ;indicates 8/32-bit colours
  JZ draw8                    ;not shown yet
  TEST [Vmode],04h            ;indicates colour mask active
  JNZ draw_32_eax             ;not shown yet
  LEA edi,[edi+eax*4]
;draw it:
  PUSH ecx
L3:
  MOV eax,edi                 ;keep the line start
  REP MOVSD
  ADD eax,edx                 ;add scan line size
  DEC ebx |MOV ecx,[esp]
  MOV edi,eax |JNZ L3<
  POP ecx
L9:
  POP edx |POP ebx
ret:
  RET
___________

You see, it's not optimised at all. I could try to improve the loop with MOVD/MOVNTQ or SSE 128-bit moves, but even then any unaligned parts may destroy the gain.

> If you want your OS out of the picture, why don't you just write
> it as a DOS image instead?

It won't work in plain DOS, because it must use 32-bit code to access a flat VRAM (usually above 2 GB). EMM and XMS won't do well here, because IRQs stay disabled for too long and may then lock up some hardware.

> btw, I still have the copy of your demo. Will it run on that?

I think that demo was version .000 or .001, so it won't contain the bitmap draw or any 32-bit colour support.
__
wolfgang
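For readers who want the control flow without the KESYS register conventions, here is a rough C model of the approach Wolfgang describes: a bounds check, then one row copy per line with the destination advanced by the scan-line size. It is a sketch, not the KESYS code. The Surface struct, the function names and the test geometry are hypothetical, the screen limits are plain fields rather than immediates patched on a mode set, and since draw8 is not shown in the post, the 8-bit path is only a guess based on the description of pre- and post-aligning. It also deliberately differs from the listing in a few places. The listing's clip code treats ebx as the line count and ecx as the dwords per line (the opposite of the usage comments, harmless there because both are 0100). The single value 01000 appears to serve as both the X clip limit in pixels and the scan-line size in bytes in 32-bit mode, so the sketch keeps width and pitch separate. REP MOVSD advances esi only by the clipped count, which would shear a right-clipped bitmap, so the sketch steps the source by the full bitmap width. An empty bitmap is rejected explicitly rather than via OR edi,ebx / OR edi,ecx, which does not clear edi first. Left and top clipping is not handled, as in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical screen description. KESYS patches these limits into
   immediates on a mode set; here they are ordinary fields. */
typedef struct {
    uint8_t *base;  /* start of the linear framebuffer (screen_start) */
    int width;      /* visible pixels per line, e.g. 1024 */
    int height;     /* visible lines, e.g. 768 = 0300h */
    int pitch;      /* bytes per scan line, e.g. 0400h (8-bit) or 01000h (32-bit) */
} Surface;

/* 8-bit row copy: single bytes until dst is 4-byte aligned ("pre-aligning"),
   then 32-bit words (the REP MOVSD part), then the leftover bytes
   ("post-aligning"). */
static void copy_row8(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {  /* head */
        *dst++ = *src++;
        n--;
    }
    while (n >= 4) {                               /* aligned body */
        uint32_t word;
        memcpy(&word, src, 4);                     /* src may be unaligned */
        memcpy(dst, &word, 4);
        dst += 4;
        src += 4;
        n -= 4;
    }
    while (n > 0) {                                /* tail */
        *dst++ = *src++;
        n--;
    }
}

/* Copy a w x h bitmap with bpp bytes per pixel (1 or 4) to (x, y),
   clipping at the right and bottom edges. Returns the lines drawn. */
static int draw_bitmap(const Surface *s, int x, int y, int w, int h,
                       int bpp, const uint8_t *src)
{
    assert(bpp == 1 || bpp == 4);
    assert(s->pitch >= s->width * bpp);
    /* Empty bitmap or origin off screen: discard (negative positions are
       discarded, not clipped). */
    if (w <= 0 || h <= 0 || x < 0 || y < 0 || x >= s->width || y >= s->height)
        return 0;

    size_t src_stride = (size_t)w * (size_t)bpp;  /* full bitmap line */
    if (x + w > s->width)  w = s->width - x;      /* clip right edge */
    if (y + h > s->height) h = s->height - y;     /* clip bottom edge */

    uint8_t *line = s->base + (size_t)y * (size_t)s->pitch + (size_t)x * (size_t)bpp;
    for (int row = 0; row < h; row++) {
        if (bpp == 1)
            copy_row8(line, src, (size_t)w);
        else
            memcpy(line, src, (size_t)w * 4);     /* whole dwords per line */
        src  += src_stride;                        /* next bitmap line */
        line += s->pitch;                          /* the ADD reg,imm32 step */
    }
    return h;
}
```

For the mode in the listing, a Surface of width 1024, height 768 and pitch 0x1000 with bpp 4 corresponds to 1024*768,32; pitch 0x400 with bpp 1 corresponds to 1024*768,8. Passing the geometry in is the portable stand-in for the self-modifying immediates, at the cost of a memory operand instead of a constant inside the loop.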
From: //o//annabee on 13 Jan 2008 11:45

On Sun, 13 Jan 2008 16:09:28 +0100, Wolfgang Kern <nowhere(a)never.at> wrote:

> Wannabee wrote:
>>> Let's see it.
>> It's easy to find. :D
>
> Can you repost your page, please? The last crashes wiped my address book.
> What figures did you get?

Exactly. All my half-finished stuff, test apps, insane apps, etc. is available here:

Newbies should be warned that I am a newbie too.
<http://szmyggenpv.com/downloads/>

>> Where is yours?
>>>> :D (we can exchange real code)
>>> Hope you don't mean we exchange the KESYS core for RosAsm source :):):)
>> Why not? Do you need a core to run a file as simple as this?
>
> To access a flat video RAM directly, it needs a 32-bit OS that allows
> writing to this memory range (ideally without paging issues).

OK.
But I still don't understand why you can't just extract that code, insert it into the DOS file, switch to 32-bit flat mode, do the blits and write out the numbers. After 26 years of building an OS, I imagine you could do that inside 10 minutes. I am pretty sure I could do it in a couple of hours, or a day, if I had the info (even though I never did any DOS programming). Otherwise, how can I verify your findings?

With my app, the difference between an AMD64 and a 1500 MHz Athlon XP is 4900 copies per second versus just 460+ per second, using the OS BitBlt, which I have considered fast and to which I have few alternatives short of hardware acceleration. Yours runs on a much slower computer, yet achieves 1/5 of the AMD64 running at >2 GHz. I can hardly believe it. Your code is 6+ times faster than the Athlon XP.

> draw_bmp:
>   OR edi,ebx
>   OR edi,ecx

What the heck is this? (above)

>   |JZ ret                    ;just in case
>   PUSH ebx
>   PUSH edx                   ;[esp]=Xpos [esp+2]=Ypos
> ;clip_it:
>   MOVZX eax,w[esp+2]         ;eax= Ypos

Stack abuse? :D

>   MOV edx,01000              ;scan line size (altered by Vmode)

Does the Vmode change recode this one? SMC?

>   LEA edi,[eax+screen_start] ;from VESA info (altered by Vmode)

Nice.

> You see, it's not optimised at all.

? :D It looks very nice to me: short and excellent code, I gather.

>> If you want your OS out of the picture, why don't you just write
>> it as a DOS image instead?
>
> It won't work in plain DOS, because it must use 32-bit code to
> access a flat VRAM (usually above 2 GB).
> EMM and XMS won't do well here, because IRQs stay disabled for too
> long and may then lock up some hardware.

But it still shouldn't be all that hard? Run a .COM file and break the barrier with your own code? Or am I speaking from ignorance here? I can't imagine it would be much of a job for you.

>> btw, I still have the copy of your demo. Will it run on that?
>
> I think that demo was version .000 or .001, so it won't contain
> the bitmap draw or any 32-bit colour support.

OK. I like your code, but I would very much like to see it running with printed numbers (fps), as fast as it can run. Since we can't do that, I would just have to trust you ...
(I am not really hardwired for that) :)

So you're pushing 1/5 of the performance of an AMD64 with a 400 MHz FSB on a 500 MHz antique AMD? And 6 times that of an Athlon with a 266 MHz FSB? Hmmm..... (teeth-gnashing sounds) ..... Get out of here!

Rewrite it as a DOS image that sets up flat mode and VESA and runs the app. I know you can do that easily. And I promise you, if you do that, I'll read the code in hex. I will then also restart testing the demo, if you want me to (I now have enough hardware to dedicate a machine to testing).

You want to prove a point, and you have the means (easily), so what's stopping you?
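Wannabee's ratios can be reconciled with a per-clock normalisation. Assuming that his BitBlt test and Wolfgang's KESYS run move the same 256x256 image (the thread never states the size of Wannabee's test image), Wolfgang's repeated-run figure of 540271 cycles at 500 MHz gives about 925 blits per second. In absolute terms that is roughly 1/5 of the AMD64's 4900 and about twice the Athlon XP's 460; divided by clock speed it is about six times the Athlon XP, which appears to be where the "6+ times" comes from. The function names are illustrative.

```c
#include <assert.h>

/* Blits per second for a routine needing `cycles` clocks per blit
   on a CPU running at clock_mhz. */
static double blits_per_sec_at(double clock_mhz, double cycles)
{
    assert(clock_mhz > 0.0 && cycles > 0.0);
    return clock_mhz * 1e6 / cycles;
}

/* Throughput normalised by clock speed: blits per second per MHz. */
static double per_mhz(double blits_per_sec, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return blits_per_sec / clock_mhz;
}
```

With the thread's numbers, blits_per_sec_at(500.0, 540271.0) is about 925, and per_mhz(925.5, 500.0) / per_mhz(460.0, 1500.0) is about 6.0, while the plain ratio 925.5 / 460 is only about 2.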
From: Dirk Wolfgang Glomp on 13 Jan 2008 14:26
On Sun, 13 Jan 2008 16:09:28 +0100, Wolfgang Kern wrote:

> It won't work in plain DOS, because it must use 32-bit code to
> access a flat VRAM (usually above 2 GB).

32-bit code? I use unreal/big-real mode with 16-bit address mode to access the linear framebuffer.

> EMM and XMS won't do well here, because IRQs stay disabled for too
> long and may then lock up some hardware.

When I use EMM registers, the IRQs become disabled?

Dirk