From: Wolfgang Kern on 20 Jul 2008 13:11

"cr88192" wrote:
....
> sadly, I don't have much on-topic to say, not having done a whole lot
> ASM or compiler related recently.
> oh well, I have recently noted that both AMD and Intel have their own
> ideas for how to extend C and C++ to handle large array-like vectors.

AMD produces ATI GPUs as well, so the aspect may be closer to their needs.

> I am left with another idea:
> how about we make a "conservative" set of extensions for this kind of
> thing:
> _Vector float a[100], b[100], c[100];

I'm for sure the wrong one to discuss C/C+- ... ;)

My personal approach (from long before GPUs joined in) is an all-integer
solution for drawing 3D objects at various view angles, aspects and
perspective points.
Every point in the array has four 16-bit entries:
x/y/z coordinates, direction flags and neighbour link flags.

Packed SIMD could be of particular use for this.

For all graphics calculations I use a SIN/COS LUT with 2^n shift
biased 16-bit values ranging from 0.00 to 450.00 degrees.
Even though this ends up as a 90002-byte table, it saves me many
things, like determining the sign, and it doesn't need any divide.

This method was quite fast compared to other stuff around until
GPUs and 'accelerator drivers' (for M$ only) became available.

I got register information on AMD/ATI GPUs recently, but I will
have to 're-engineer' a couple of drivers one more time before
I know enough details on how to use this information.

It looks like these GPUs use SP-float packets for most things
and may calculate somewhat faster than an FPU ever will.

[snipped C...] sorry, I won't be of any help on this.
__
wolfgang
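[Editorial sketch] The table size quoted in the post is consistent with one 16-bit entry per 0.01 degree from 0.00 to 450.00 inclusive: 45001 entries * 2 bytes = 90002 bytes. Running the table past 360 up to 450 degrees would let a cosine be read from the same table as sin(a + 90) without any wrap-around. The C sketch below illustrates that reading only. The 2^14 scale factor, all names, and the Taylor series used to fill the table (chosen so the sketch needs no math library) are assumptions, not Wolfgang's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_STEPS_PER_DEG 100                        /* 0.01 degree resolution */
#define LUT_ENTRIES (450 * LUT_STEPS_PER_DEG + 1)    /* 0.00 .. 450.00 inclusive */
#define LUT_SHIFT 14                                 /* assumed scale: 1.0 == 1 << 14 */
#define LUT_ONE (1 << LUT_SHIFT)

static int16_t sin_lut[LUT_ENTRIES];                 /* 45001 * 2 = 90002 bytes */

/* Plain Taylor series for sin(x); accurate enough here for 0 <= x <= 7.86 rad. */
static double taylor_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 24; n++) {
        term *= -(x * x) / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Fill the table once at startup; floating point is used only here. */
static void lut_init(void)
{
    const double pi = 3.14159265358979323846;
    for (int i = 0; i < LUT_ENTRIES; i++) {
        double rad = (double)i / LUT_STEPS_PER_DEG * pi / 180.0;
        double v = taylor_sin(rad) * LUT_ONE;
        sin_lut[i] = (int16_t)(v < 0 ? v - 0.5 : v + 0.5);   /* round to nearest */
    }
}

/* angle is given in hundredths of a degree, 0 .. 36000 */
static int16_t isin(int angle)
{
    assert(angle >= 0 && angle <= 360 * LUT_STEPS_PER_DEG);
    return sin_lut[angle];
}

/* cos(a) == sin(a + 90 degrees); the extra 90 degrees of table avoid a wrap */
static int16_t icos(int angle)
{
    assert(angle >= 0 && angle <= 360 * LUT_STEPS_PER_DEG);
    return sin_lut[angle + 90 * LUT_STEPS_PER_DEG];
}

/* Scale a 16-bit coordinate by a table value: one multiply, one shift, no divide.
 * Right-shifting a negative value is implementation-defined in C; this assumes
 * the usual arithmetic shift. */
static int16_t scale(int16_t v, int16_t s)
{
    return (int16_t)(((int32_t)v * s) >> LUT_SHIFT);
}
```

After calling lut_init() once, a rotation needs only isin(), icos() and scale(); the sign comes straight out of the signed table entry, which may be what the post means by not having to determine the sign separately.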
From: mcjason on 20 Jul 2008 19:34

On Jul 20, 1:11 pm, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> "cr88192" wrote:
>
> ...
>
> > sadly, I don't have much on-topic to say, not having done a whole lot
> > ASM or compiler related recently.
> > oh well, I have recently noted that both AMD and Intel have their own
> > ideas for how to extend C and C++ to handle large array-like vectors.
>
> AMD produces ATI GPUs as well, so the aspect may be closer to their needs.
>
> > I am left with another idea:
> > how about we make a "conservative" set of extensions for this kind of
> > thing:
> > _Vector float a[100], b[100], c[100];
>
> I'm for sure the wrong one to discuss C/C+- ... ;)
>
> My personal approach (from long before GPUs joined in) is an all-integer
> solution for drawing 3D objects at various view angles, aspects and
> perspective points.
> Every point in the array has four 16-bit entries:
> x/y/z coordinates, direction flags and neighbour link flags.
>
> Packed SIMD could be of particular use for this.
>
> For all graphics calculations I use a SIN/COS LUT with 2^n shift
> biased 16-bit values ranging from 0.00 to 450.00 degrees.
> Even though this ends up as a 90002-byte table, it saves me many
> things, like determining the sign, and it doesn't need any divide.
>
> This method was quite fast compared to other stuff around until
> GPUs and 'accelerator drivers' (for M$ only) became available.
>
> I got register information on AMD/ATI GPUs recently, but I will
> have to 're-engineer' a couple of drivers one more time before
> I know enough details on how to use this information.
>
> It looks like these GPUs use SP-float packets for most things
> and may calculate somewhat faster than an FPU ever will.
>
> [snipped C...] sorry, I won't be of any help on this.
>
> __
> wolfgang

I'd like to think again about how a fluorescent light tube works...
If I use a few radio signals to make a light show in it, doesn't
the pattern of light depend on the radio signals?
But then maybe a few at a time? So the radio signals together make a
pattern that depends on both of the radio signals; doesn't it then work
out that there's a way to formulate the signals to make a pattern in
different ways? Then isn't this something that can do math computation?
If it's figured out which signals produce which response in the
fluorescent tube, then can't I find signals that express a math
operation and a light show that expresses the math answer?
From: cr88192 on 20 Jul 2008 20:03

"Wolfgang Kern" <nowhere(a)never.at> wrote in message
news:g5vrj9$ubn$1(a)newsreader2.utanet.at...
>
> "cr88192" wrote:
> ...
>
>> sadly, I don't have much on-topic to say, not having done a whole lot
>> ASM or compiler related recently.
>> oh well, I have recently noted that both AMD and Intel have their own
>> ideas for how to extend C and C++ to handle large array-like vectors.
>
> AMD produces ATI GPUs as well, so the aspect may be closer to their needs.
>

yeah, many of these extensions are intended for offloading calculations to
the GPU (and are also intended for allowing efficient utilization of SSE
operations).

lamely, both provide extensions that are, IMO, ugly (and that generally go
against what is allowed within the respective standards).

>> I am left with another idea:
>> how about we make a "conservative" set of extensions for this kind of
>> thing:
>
>> _Vector float a[100], b[100], c[100];
>
> I'm for sure the wrong one to discuss C/C+- ... ;)
>
> My personal approach (from long before GPUs joined in) is an all-integer
> solution for drawing 3D objects at various view angles, aspects and
> perspective points.
> Every point in the array has four 16-bit entries:
> x/y/z coordinates, direction flags and neighbour link flags.
>
> Packed SIMD could be of particular use for this.
>
> For all graphics calculations I use a SIN/COS LUT with 2^n shift
> biased 16-bit values ranging from 0.00 to 450.00 degrees.
> Even though this ends up as a 90002-byte table, it saves me many
> things, like determining the sign, and it doesn't need any divide.
>

possible, but yes, not as applicable to GPUs...

> This method was quite fast compared to other stuff around until
> GPUs and 'accelerator drivers' (for M$ only) became available.
>
> I got register information on AMD/ATI GPUs recently, but I will
> have to 're-engineer' a couple of drivers one more time before
> I know enough details on how to use this information.
>
> It looks like these GPUs use SP-float packets for most things
> and may calculate somewhat faster than an FPU ever will.
>

yeah.
the GPUs are capable of large-scale parallel operation, whereas the CPU is
capable of high-speed serial operation. the GPUs tend to have more parallel
speed than the CPU has serial speed.

sadly, I am not that familiar with low-level interfacing with the GPU; I am
more familiar with low-level stuff on the CPU, and have recently been doing
a few tasks by making use of the GPU and shaders (via OpenGL). a recent
example is figuring out how to approximately implement visibility
calculations/occlusion-culling in real-time via the GPU (a variant of this
approach could also be used for doing faster PVS calculations, albeit still
likely taking several seconds or more for a large and complex scene).

the main limitation, though, is that it currently only works for "currently
visible" solid geometry (a true PVS could answer more complex queries
involving non-visible areas as well). I am uncertain how to use it for
efficiently culling things like shadow volumes (it being more expensive
right now to accurately cull the shadow geometry than to just render it).

at present, I am using the lamely sad approach of culling the shadows along
with their respective geometry (the alternative approach of just drawing all
the shadows can hurt framerate), so when a shadowing object goes off-screen,
the shadow simply disappears...

so, effective shadow handling (among other things) is still likely to
require a "proper" occlusion-culling scheme (either precomputed, or
real-time computed, full PVS).

another possible route is keeping track of the bounds for the currently
visible area, which, along with the frustum, could allow culling shadows by
the currently visible area (such as adding a kind of shadowing far-clip
plane, ...).

this would be real-time and approximate, but should be good enough (possible
issue: this would tend to be a lot less effective, say, if looking down a
long hallway with lots of shadow-causing geometry nearby, but not actually
visible from within the hallway).

> [snipped C...] sorry, I won't be of any help on this.
>

yeah, mostly just semantics anyways.

> __
> wolfgang
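[Editorial sketch] The "shadow disappears when its caster goes off-screen" effect described in this post can be shown with a plain bounding-sphere test against a set of clip planes. The C sketch below is hypothetical, not cr88192's engine code: the plane convention (unit normals pointing into the visible region), the type names, and the shadow_reach parameter (for example, the radius of the light) are all assumptions. The same test would also accept an extra "shadowing far-clip plane" appended to the plane list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Plane n.p + d = 0, with unit normal n pointing into the visible region. */
typedef struct { vec3 n; float d; } plane;

typedef struct { vec3 center; float radius; } sphere;

/* Signed distance from a point to a plane (positive means inside). */
static float plane_distance(const plane *p, vec3 v)
{
    return p->n.x * v.x + p->n.y * v.y + p->n.z * v.z + p->d;
}

/* True if the sphere lies entirely behind at least one plane, so it is safe
 * to cull. Conservative: a sphere near a frustum corner may be kept even
 * though it is not actually visible. */
static bool sphere_culled(const plane *planes, size_t count, const sphere *s)
{
    assert(s->radius >= 0.0f);
    for (size_t i = 0; i < count; i++)
        if (plane_distance(&planes[i], s->center) < -s->radius)
            return true;
    return false;
}

/* Conservative bound for a caster plus its shadow: every point of a shadow
 * extruded by at most shadow_reach lies within radius + shadow_reach of the
 * caster's center. */
static sphere shadow_bound(sphere caster, float shadow_reach)
{
    assert(shadow_reach >= 0.0f);
    caster.radius += shadow_reach;
    return caster;
}
```

Culling the shadow with sphere_culled() on the caster's own sphere reproduces the pop-out; testing shadow_bound() instead keeps the shadow as long as it could still reach the view, at the cost of drawing more shadow volumes.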
From: Wolfgang Kern on 21 Jul 2008 05:29

"cr88192" wrote:
....
> the GPUs are capable of large-scale parallel operation, whereas the CPU
> is capable of high-speed serial operation. the GPUs tend to have more
> parallel speed than the CPU has serial speed.

Yes, I checked a few newer games and roughly calculated the required
speed for turning a fully animated 3D view (including shadows, emissive
objects and interactive entities) to any view angle without distortion.

It seems the CPU/FPU aren't involved in this at all, because it would need
a ~300 GHz machine to perform all the matrix calculations with SIMD within
one 1600*1200, 32-bit, 60 Hz frame.
It's astonishing enough how fast the whole screen is redrawn,
even when I check it on my old 500/33 MHz K7 and set the frame
rate to non-interlaced 100 Hz (>200 MHz dot clock).
With a VESA linear frame buffer, I'm limited to the given 33 MHz bus here.

....
while you mention OpenGL ...
I haven't looked into it; an HLL story anyway?
....

> the main limitation, though, is that it currently only works for
> "currently visible" solid geometry (a true PVS could answer more complex
> queries involving non-visible areas as well). I am uncertain how to use
> it for efficiently culling things like shadow volumes (it being more
> expensive right now to accurately cull the shadow geometry than to just
> render it).

> at present, I am using the lamely sad approach of culling the shadows along
> with their respective geometry (the alternative approach of just drawing all
> the shadows can hurt framerate), so when a shadowing object goes off-screen,
> the shadow simply disappears...

> so, effective shadow handling (among other things) is still likely to
> require a "proper" occlusion-culling scheme (either precomputed, or
> real-time computed, full PVS).

I think shadows don't need to be very exact, and some 'true-light' cards
may do it on their own.

> another possible route is keeping track of the bounds for the currently
> visible area, which, along with the frustum, could allow culling shadows by
> the currently visible area (such as adding a kind of shadowing far-clip
> plane, ...).

> this would be real-time and approximate, but should be good enough
> (possible issue: this would tend to be a lot less effective,
> say, if looking down a long hallway with lots of shadow-causing
> geometry nearby, but not actually visible from within the hallway).

You'll find these minor bugs (late shadow updates) in almost all games,
perhaps just to keep top speed for the front view.
__
wolfgang
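[Editorial sketch] Some of the figures in this post can be checked with simple arithmetic. The C sketch below computes visible pixel rates and frame-buffer traffic for the quoted modes. It assumes the "33 MHz bus" is a 32-bit-wide PCI-style bus moving 4 bytes per clock with no overhead, which the post does not state, and it does not attempt to reproduce the ~300 GHz estimate. All function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Visible pixels per second for a mode (blanking intervals not included). */
static uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t refresh_hz)
{
    return (uint64_t)width * height * refresh_hz;
}

/* Bytes per second needed to rewrite every pixel of every frame. */
static uint64_t frame_bytes_per_sec(uint32_t width, uint32_t height,
                                    uint32_t bits_per_pixel, uint32_t refresh_hz)
{
    assert(bits_per_pixel % 8 == 0);
    return pixel_rate(width, height, refresh_hz) * (bits_per_pixel / 8);
}

/* Theoretical peak of a bus moving bus_bytes per clock (assumed: no overhead). */
static uint64_t bus_peak_bytes_per_sec(uint32_t clock_hz, uint32_t bus_bytes)
{
    return (uint64_t)clock_hz * bus_bytes;
}
```

With these, 1600*1200 at 100 Hz gives 192,000,000 visible pixels per second, which fits a dot clock above 200 MHz once blanking is added. Rewriting a full 32-bit frame 60 times per second needs 460,800,000 bytes/s, while a 33 MHz, 4-byte-wide bus peaks at 132,000,000 bytes/s, so under these assumptions the CPU could not even push complete frames across that bus at the full frame rate.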
From: cr88192 on 22 Jul 2008 07:41

"Wolfgang Kern" <nowhere(a)never.at> wrote in message
news:g61ku5$pc6$3(a)newsreader2.utanet.at...
>
> "cr88192" wrote:
> ...
>
>> the GPUs are capable of large-scale parallel operation, whereas the CPU
>> is capable of high-speed serial operation. the GPUs tend to have more
>> parallel speed than the CPU has serial speed.
>
> Yes, I checked a few newer games and roughly calculated the required
> speed for turning a fully animated 3D view (including shadows, emissive
> objects and interactive entities) to any view angle without distortion.
>
> It seems the CPU/FPU aren't involved in this at all, because it would need
> a ~300 GHz machine to perform all the matrix calculations with SIMD within
> one 1600*1200, 32-bit, 60 Hz frame.
> It's astonishing enough how fast the whole screen is redrawn,
> even when I check it on my old 500/33 MHz K7 and set the frame
> rate to non-interlaced 100 Hz (>200 MHz dot clock).
> With a VESA linear frame buffer, I'm limited to the given 33 MHz bus here.
>

yes, these cards can do some things pretty quickly, it can be surprising.

for the limited range of tasks that can be pulled off on it, the GPU can be
fairly effective for bulk calculations.
for example, it is possible that the GPU could be used for decent-sized
neural net simulation or similar (future GPUs allowing for much bigger
nets).

many things that would otherwise be done using much slower but direct
algorithms can be done on the GPU using "visual reasoning" based algos (draw
a bunch of stuff and process the results).

this is actually how I was doing visibility calculations:
draw a bunch of stuff and what is visible, is visible...

but, there are still limits...

> ...
> while you mention OpenGL ...
> I haven't looked into it; an HLL story anyway?
> ...
>

I am not sure what you are asking exactly.

if about my compiler, well, it does C, but I got distracted before I got my
new compiler core fully written (thus, no x86-64 support yet). adding Java
and JavaScript support has similarly been on hold.

C still works though...

more recently, I have been doing a lot more 3D related stuff, and very
recently have started messing with physics simulation again. I am
considering adding several features to my physics engine:
soft-body physics (cloth, globs, ...);
particle simulation (for smoke and fire effects, ...);
ragdolls;
physics shaders;
....

>> the main limitation, though, is that it currently only works for
>> "currently visible" solid geometry (a true PVS could answer more complex
>> queries involving non-visible areas as well). I am uncertain how to use
>> it for efficiently culling things like shadow volumes (it being more
>> expensive right now to accurately cull the shadow geometry than to just
>> render it).
>
>> at present, I am using the lamely sad approach of culling the shadows along
>> with their respective geometry (the alternative approach of just drawing all
>> the shadows can hurt framerate), so when a shadowing object goes off-screen,
>> the shadow simply disappears...
>
>> so, effective shadow handling (among other things) is still likely to
>> require a "proper" occlusion-culling scheme (either precomputed, or
>> real-time computed, full PVS).
>
> I think shadows don't need to be very exact, and some 'true-light' cards
> may do it on their own.
>

yeah.

well, in general it is working fairly well, but some effects are lost...

actually, there is a certain degree of quality that IMO can currently only
really be done with ray-tracing and radiosity methods, however real-time
stuff is real time.

so, one can have a world where rendering is much faster, lighting is much
better, shadows are nice and smooth, ... but where the lights are fixed in
position and a long costly process is needed to rebuild the map;
or a world where lighting has funky pop-up issues, rendering is much slower,
and everything looks like dark grungy plastic (aka: the Doom3/Quake4 look),
but with the gain that everything can happen in real-time (add a light, and
the scene gets lighter, move something and the shadow follows, open a hole
in the wall and have the light flow right through, ...).

much better would be to combine these approaches, where some lighting is
static and other lighting is dynamic, but this does require using a prebuilt
BSP...

however, the GPU would at least allow for, among other things, greatly
speeding up radiosity calculations (albeit full scene radiosity is by no
means likely to be real-time either, nor am I sure how one could effectively
do radiosity purely from the POV of the camera).

>> another possible route is keeping track of the bounds for the currently
>> visible area, which, along with the frustum, could allow culling shadows by
>> the currently visible area (such as adding a kind of shadowing far-clip
>> plane, ...).
>
>> this would be real-time and approximate, but should be good enough
>> (possible issue: this would tend to be a lot less effective,
>> say, if looking down a long hallway with lots of shadow-causing
>> geometry nearby, but not actually visible from within the hallway).
>
> You'll find these minor bugs (late shadow updates) in almost all games,
> perhaps just to keep top speed for the front view.
>

yes, of course, my current approach does not allow delaying shadows (I am
using the depth-fail shadow-volumes approach, which sadly has to be redrawn
every frame, and creates a totally huge amount of invisible geometry and
overdraw related to drawing the shadow volumes and then drawing the lights
onto the surfaces...).

the result is that, for the Quake-1 maps, I get performance comparable to
what Quake-1 was getting on the original hardware it was targeting (good
old Pentium 90s and Pentium 133s).

of course, for a long time, I had been running it primarily on a 486DX-66
and similar style HW (that is what my main computer had been at the time),
so I had been fairly used to it lagging (or, for true terror, one could try
to use HW accel on the then new S3-Virge and later S3-Trio-64 cards, for
questionable graphics at far worse framerates...).

long ago, the Voodoo cards were, I think, the major point when HW-accel
actually fully outdid SW rendering. my projects eventually changed focus
to HW-accel around the time when the GeForce-2 was still fairly new (and
when I actually had one).

I think prior to this, I had tried doing some stuff using Mesa, but it was
unacceptably slow at the time...

> __
> wolfgang
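[Editorial sketch] The post does not say how the results of "draw a bunch of stuff and what is visible, is visible" are collected. One common way to do it is an item buffer: draw every object in a unique flat ID colour with depth testing on, read the frame back, and see which IDs survived. Whether cr88192 used this or something else (such as hardware occlusion queries) is not stated. The C sketch below shows only the CPU-side scan of such a read-back buffer. The ID convention (0 for background, object i drawn as ID i + 1) and all names are assumptions, and the GPU rendering and read-back steps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_BACKGROUND 0u   /* assumed: ID 0 means "no object drawn here" */

/* Scan a read-back ID buffer and flag every object that owns at least one
 * pixel. visible[] has object_count entries; object i is drawn with ID i + 1.
 * Returns the number of distinct visible objects. */
static size_t mark_visible(const uint32_t *ids, size_t pixel_count,
                           bool *visible, size_t object_count)
{
    size_t found = 0;

    for (size_t i = 0; i < object_count; i++)
        visible[i] = false;

    for (size_t p = 0; p < pixel_count; p++) {
        uint32_t id = ids[p];
        if (id == ID_BACKGROUND)
            continue;
        assert(id <= object_count);   /* IDs outside the range are a caller bug */
        if (!visible[id - 1]) {
            visible[id - 1] = true;
            found++;
        }
    }
    return found;
}
```

As the post notes, this kind of test only answers what is visible from the current viewpoint; an object that wins no pixels is simply absent from the result, which is why it cannot replace a true PVS for queries about non-visible areas.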