From: Wolfgang Kern on

"cr88192" wrote:
....

> sadly, I don't have much on-topic to say, not having done a whole lot
> ASM or compiler related recently.
> of well, I have recently noted that both AMD and Intel have their own
> ideas for how to extend C and C++ to handle large array-like vectors.

AMD produces ATI GPUs as well, so their approach may be closer to real needs.

> I am left with another idea:
> how about we make a "conservative" set of extensions for this kind of
> thing:

> _Vector float a[100], b[100], c[100];

I'm for sure the wrong one to discuss C/C+- ... ;)

My personal approach (from long before GPUs joined in) is an all-integer
solution for drawing 3D objects from various view angles, aspects and
perspective points.
Every point in the array has four 16-bit entries:
x/y/z coordinates, direction flags and neighbour link flags.

Packed SIMD could be of particular use for this.
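
A minimal C sketch of such a point record, under assumptions the post does not confirm: the direction and neighbour-link flags are taken to share the fourth 16-bit word, and the names `Point3D`, `add_sat16` and `point_translate` are made up for illustration. The scalar saturating add mirrors what a packed instruction such as PADDSW does per 16-bit lane, with the flags lane left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 16-bit entries = 8 bytes, i.e. one point per 64-bit register. */
typedef struct {
    int16_t  x, y, z;   /* coordinates */
    uint16_t flags;     /* assumed: direction and neighbour-link bits in one word */
} Point3D;

/* Signed 16-bit add that clamps instead of wrapping around. */
static int16_t add_sat16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Move a point by (dx, dy, dz); the flags word is not modified. */
void point_translate(Point3D *p, int16_t dx, int16_t dy, int16_t dz)
{
    assert(p != NULL);
    p->x = add_sat16(p->x, dx);
    p->y = add_sat16(p->y, dy);
    p->z = add_sat16(p->z, dz);
}
```

With a packed add, the same translation would be a single instruction per point, using an offset vector whose fourth lane is zero.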

For all graphics calculations I use a SIN/COS LUT of 2^n shift-biased
16-bit values covering 0.00...450.00 degrees.
Even though this ends up as a 90002-byte table, it saves me many
things, like determining the sign, and it doesn't need any divide.
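
A minimal C sketch of a table like this, under stated assumptions: one entry per 0.01 degree (45001 entries of 2 bytes gives the 90002 bytes mentioned), and "2^n shift biased" read as "scaled by a power of two", here 2^14. The scale, the names and the way the table is filled are all assumptions, not the original code. Running the table to 450 degrees lets a cosine be read as `sin(a + 90)` without any wrap-around, and signed entries carry the sign directly.

```c
#include <assert.h>
#include <stdint.h>

#define STEPS_PER_DEG 100                        /* 0.01 degree per entry */
#define LUT_ENTRIES   (450 * STEPS_PER_DEG + 1)  /* 0.00 .. 450.00 inclusive */
#define LUT_SHIFT     14                         /* assumed scale: 1.0 == 1 << 14 */

static int16_t sin_lut[LUT_ENTRIES];             /* 45001 * 2 = 90002 bytes */

/* Fill the table by rotating a unit vector one step at a time,
   so this sketch does not depend on a math library. */
void lut_init(void)
{
    const double d  = 3.14159265358979323846 / (180.0 * STEPS_PER_DEG);
    const double sd = d - d*d*d/6.0 + d*d*d*d*d/120.0;  /* sin(d), Taylor series */
    const double cd = 1.0 - d*d/2.0 + d*d*d*d/24.0;     /* cos(d), Taylor series */
    double s = 0.0, c = 1.0;
    for (int i = 0; i < LUT_ENTRIES; i++) {
        double v = s * (1 << LUT_SHIFT);
        sin_lut[i] = (int16_t)(v >= 0.0 ? v + 0.5 : v - 0.5); /* round to nearest */
        double ns = s * cd + c * sd;                     /* rotate (c, s) by d */
        c = c * cd - s * sd;
        s = ns;
    }
}

/* Angles are in hundredths of a degree, 0 .. 36000. */
int16_t lut_sin(int a) { assert(a >= 0 && a <= 36000); return sin_lut[a]; }
int16_t lut_cos(int a) { assert(a >= 0 && a <= 36000); return sin_lut[a + 9000]; }

/* Multiply a coordinate by a table value: one multiply and one shift, no divide.
   Right-shifting a negative value is implementation-defined in C; it is an
   arithmetic shift on the usual x86 compilers. */
int32_t lut_scale(int16_t coord, int16_t trig)
{
    return ((int32_t)coord * trig) >> LUT_SHIFT;
}
```

For example, after `lut_init()`, `lut_scale(1000, lut_cos(6000))` scales 1000 by cos(60 degrees).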

This method was quite fast compared to other stuff around until
GPUs and 'accelerator drivers' (for M$ only) became available.

I got register information on AMD/ATI GPUs recently, but I will
have to 're-engineer' a couple of drivers once more before
I know enough details on how to use this information.

Looks like these GPUs use SP-float packets for most things
and may calculate somewhat faster than an FPU ever will.

[snipped C...] sorry, I won't be of any help on this.

__
wolfgang



From: mcjason on
On Jul 20, 1:11 pm, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> "cr88192" wrote:
>
> ...
>
> > sadly, I don't have much on-topic to say, not having done a whole lot
> > ASM or compiler related recently.
> > of well, I have recently noted that both AMD and Intel have their own
> > ideas for how to extend C and C++ to handle large array-like vectors.
>
> AMD produces ATI-GPUs as well, so the aspect may be more close to needs.
>
> > I am left with another idea:
> > how about we make a "conservative" set of extensions for this kind of
> > thing:
> > _Vector float a[100], b[100], c[100];
>
> I'm for sure the wrong one to discuss C/C+- ... ;)
>
> My personal approach (long before GPUs joined in) is an all integer
> solution for drawing 3D-objects in various view-angles, aspects and
> perspective points.
> Every point in the array got four 16 bit entries:
> x/y/z-coordinantes, direction-flags and neighbour link-flags.
>
> Packed SIMD could be of particlar use for this.
>
> For all graphic calculations I use a SIN/COS LUT with 2^n shift
> biased 16-bit values ranged from 0.00...450.00 degrees.
> Even this ended up in a 90002 bytes table it saves me on many
> things like determining the sign and it doesn't need any divide.
>
> This method were quite fast compared to other stuff around until
> GPUs and 'accellerator drivers' (for M$ only) became available.
>
> I got register information on AMD/ATI GPUs recently, but it will
> ask me one more time to 're-engineer' a couple of drivers before
> I know enough details on how to use this information.
>
> Looks like this GPUs use SP-float packets for most things
> and may calculate somehow faster than an FPU ever will.
>
> [snipped C...] sorry for I wont be of any help on this.
>
> __
> wolfgang

I'd like to think again about how a fluorescent light tube works...

If I use a few radio signals to make a light show in it, doesn't the
pattern of light depend on the radio signals? But then maybe only a
few at a time?
So the radio signals together make a pattern that depends on both
signals; doesn't it follow that there's a way to formulate the signals
to make a pattern in different ways?

Then isn't this something that can do math computation? If it's worked
out which signals produce which response in the fluorescent tube, then
can't I find signals that encode a math operation and a light show that
encodes the answer?

From: cr88192 on

"Wolfgang Kern" <nowhere(a)never.at> wrote in message
news:g5vrj9$ubn$1(a)newsreader2.utanet.at...
>
> "cr88192" wrote:
> ...
>
>> sadly, I don't have much on-topic to say, not having done a whole lot
>> ASM or compiler related recently.
>> of well, I have recently noted that both AMD and Intel have their own
>> ideas for how to extend C and C++ to handle large array-like vectors.
>
> AMD produces ATI-GPUs as well, so the aspect may be more close to needs.
>

yeah, many of these extensions are intended for offloading calculations to
the GPU (and also for allowing efficient use of SSE operations).

lamely, both provide extensions that are, IMO, ugly (and that generally go
against what is allowed within the respective standards).


>> I am left with another idea:
>> how about we make a "conservative" set of extensions for this kind of
>> thing:
>
>> _Vector float a[100], b[100], c[100];
>
> I'm for sure the wrong one to discuss C/C+- ... ;)
>
> My personal approach (long before GPUs joined in) is an all integer
> solution for drawing 3D-objects in various view-angles, aspects and
> perspective points.
> Every point in the array got four 16 bit entries:
> x/y/z-coordinantes, direction-flags and neighbour link-flags.
>
> Packed SIMD could be of particlar use for this.
>
> For all graphic calculations I use a SIN/COS LUT with 2^n shift
> biased 16-bit values ranged from 0.00...450.00 degrees.
> Even this ended up in a 90002 bytes table it saves me on many
> things like determining the sign and it doesn't need any divide.
>

possible, but yes, not as applicable to GPUs...


> This method were quite fast compared to other stuff around until
> GPUs and 'accellerator drivers' (for M$ only) became available.
>
> I got register information on AMD/ATI GPUs recently, but it will
> ask me one more time to 're-engineer' a couple of drivers before
> I know enough details on how to use this information.
>
> Looks like this GPUs use SP-float packets for most things
> and may calculate somehow faster than an FPU ever will.
>

yeah.

the GPUs are capable of large-scale parallel operation, whereas the CPU is
capable of high-speed serial operation. the GPUs tend to have more parallel
speed than the CPU has serial speed.


sadly, I am not that familiar with low-level interfacing with the GPU; I am
more familiar with low-level stuff on the CPU. recently I have been doing a
few tasks using the GPU and shaders (via OpenGL). a recent example is
figuring out how to approximately implement visibility calculations /
occlusion culling in real time on the GPU (a variant of this approach could
also be used for faster PVS calculations, albeit still likely taking several
seconds or more for a large and complex scene).

the main limitation, though, is that it currently only works for
"currently visible" solid geometry (a true PVS could answer more complex
queries involving non-visible areas as well). I am uncertain how to use
it for efficiently culling things like shadow volumes (right now it is more
expensive to accurately cull the shadow geometry than to just render it).

at present, I am using the lame approach of culling the shadows along
with their respective geometry (the alternative of just drawing all
the shadows can hurt framerate), so when a shadowing object goes off-screen,
its shadow simply disappears...

so, effective shadow handling (among other things) is still likely to
require a "proper" occlusion-culling scheme (either precomputed, or
real-time computed, full PVS).


another possible route is keeping track of the bounds for the currently
visible area, which, along with the frustum, could allow culling shadows by
the currently visible area (such as adding a kind of shadowing far-clip
plane, ...).

this would be real-time and approximate, but should be good enough (possible
issue: this would tend to be a lot less effective when, say, looking down a
long hallway with lots of shadow-causing geometry nearby that is not actually
visible from within the hallway).
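
one way the bounds idea could look in C, as a conservative sketch only. the assumptions are mine, not part of the approach described above: the light is directional with a unit-length direction vector, each caster is approximated by a bounding sphere, the visible area is tracked as an axis-aligned box, and `reach` plays the role of the shadowing far-clip distance. the names `Aabb` and `shadow_may_reach` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned bounds of the currently visible area. */
typedef struct {
    float min[3];
    float max[3];
} Aabb;

/* Returns 1 if the shadow of a bounding sphere, swept along light_dir for
   'reach' units, might touch the visible bounds, and 0 if it cannot.
   The test boxes the swept sphere per axis, so it may keep a shadow that
   misses the area, but it never rejects one that reaches it. */
int shadow_may_reach(const Aabb *visible, const float center[3], float radius,
                     const float light_dir[3], float reach)
{
    assert(visible != NULL && radius >= 0.0f && reach >= 0.0f);
    for (int k = 0; k < 3; k++) {
        float a  = center[k];                         /* start of the sweep */
        float b  = center[k] + light_dir[k] * reach;  /* end of the sweep */
        float lo = (a < b ? a : b) - radius;
        float hi = (a < b ? b : a) + radius;
        if (hi < visible->min[k] || lo > visible->max[k])
            return 0;  /* separated on this axis: shadow cannot be seen */
    }
    return 1;
}
```

a point light would need a wider test, since its shadow volumes spread out with distance.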


> [snipped C...] sorry for I wont be of any help on this.
>

yeah, mostly just semantics anyways.


> __
> wolfgang
>
>
>


From: Wolfgang Kern on

"cr88192" wrote:
....

> the GPUs are capable of large-scale parallel operation, wheras the CPU
> is capable of high-speed serial operation. the GPUs tend to have more
> parallel speed than the CPU has serial speed.

Yes, I checked a few newer games and roughly calculated the speed
required for turning a fully animated 3D view (including shadows, emissive
objects and interactive entities) to any view angle without distortion.

Seems the CPU/FPU isn't involved in this at all, because it would need
a ~300 GHz machine to perform all the matrix calculations with SIMD within
one 1600*1200, 32-bit, 60 Hz frame.
It's astonishing enough how fast the whole screen is redrawn,
even when I check it on my old 500/33 MHz K7 and set the frame
rate to non-interlaced 100 Hz (>200 MHz dot clock).
With linear-frame VESA, I'm limited to the given 33 MHz bus here.
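
As a rough cross-check of these figures, the arithmetic can be sketched in C. This only relates the stated numbers to each other; how the ~300 GHz estimate was derived is not given, the function names are made up for illustration, and the bus figure assumes a 32-bit wide 33 MHz bus at its theoretical peak.

```c
#include <assert.h>
#include <stdint.h>

/* Visible pixels that must be produced per second. */
uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t refresh_hz)
{
    return (uint64_t)width * height * refresh_hz;
}

/* Clock cycles available per pixel at a given clock rate (rounded down). */
uint64_t cycles_per_pixel(uint64_t clock_hz, uint64_t pixel_rate)
{
    assert(pixel_rate > 0);
    return clock_hz / pixel_rate;
}

/* Whole frames per second that fit through a bus of the given byte rate. */
uint64_t frames_per_second_over_bus(uint64_t bus_bytes_per_s, uint32_t width,
                                    uint32_t height, uint32_t bytes_per_pixel)
{
    uint64_t frame_bytes = (uint64_t)width * height * bytes_per_pixel;
    assert(frame_bytes > 0);
    return bus_bytes_per_s / frame_bytes;
}
```

At 1600*1200 and 60 Hz this gives 115,200,000 pixels per second, so a 300 GHz clock corresponds to a budget of roughly 2600 cycles per pixel. At 100 Hz the visible pixels alone come to 192,000,000 per second, and blanking intervals push the dot clock higher still. One 32-bit frame is 7,680,000 bytes, so a bus moving 4 bytes at 33 MHz could carry about 17 full frames per second at best.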

....
while you mention OpenGL ...
I haven't looked into it; it's an HLL story anyway?
....

> the main limitation being though, that it only currently works for
> "currently visible" solid geometry (a true PVS could answer more complex
> queries involving non-visible areas as well), me being uncertain how to
> use it for efficiently culling things like shadow volumes (it being more
> expensive right now to accurately cull the shadow geometry than to just
> render it).

> at present, I am using the lamely sad approach of culling the shadows
> along with their respective geometry (the alternative approach of just
> drawing all the shadows can hurt framerate), so when a shadowing object
> goes off-screen, the shadow simply disappears...

> so, effective shadow handling (among other things) is still likely to
> require a "proper" occlusion-culling scheme (either precomputed, or
> real-time computed, full PVS).

I think shadows don't need to be very exact and some 'true-light' cards
may do it on their own.

> another possible route is keeping track of the bounds for the currently
> visible area, which, along with the frustum, could allow culling shadows
> by the currently visible area (such as adding a kind of shadowing far-clip
> plane, ...).

> this would be real-time and approximate, but should be good enough
> (possible issue: this would tend to be a lot less effective,
> say, if looking down a long hallways with lots of shadow-causing
> geometry nearby, but not actually visible from within the hallway).

You'll find these minor bugs (late shadow update) in almost all games,
perhaps just to keep top speed for the front view.

__
wolfgang



From: cr88192 on

"Wolfgang Kern" <nowhere(a)never.at> wrote in message
news:g61ku5$pc6$3(a)newsreader2.utanet.at...
>
> "cr88192" wrote:
> ...
>
>> the GPUs are capable of large-scale parallel operation, wheras the CPU
>> is capable of high-speed serial operation. the GPUs tend to have more
>> parallel speed than the CPU has serial speed.
>
> Yes, I checked on a few newer games and rough calculated the required
> speed for turning an fully animated 3D-view (including shadows, emmissive
> objects and interactive entities) into any view angle without distortion.
>
> Seems CPU/FPU aren't involved in this at all, because it would need
> a ~300 GHz machine to perform all matrix calculation with SIMD within
> one 1600*1200,32 60Hz frame.
> It's astonishing enough how fast the whole screen is redrawn,
> even when I check it on my old 500/33 MHZ K7 and set the frame
> rate to non interlaced 100Hz (>200 MHz dot clock).
> With linear frame VESA, I'm limited to the given 33 MHZ bus here.
>

yes, these cards can do some things pretty quickly, it can be surprising.
for the limited range of tasks that can be pulled off on it, the GPU can be
fairly effective for bulk calculations.

for example, it is possible that the GPU could be used for decent-sized
neural net simulation or similar (future GPUs allowing for much bigger
nets). many things that would be done using much slower but direct
algorithms can be done on the GPU using "visual reasoning" based algos (draw
a bunch of stuff and process the results).

this is actually how I was doing visibility calculations:
draw a bunch of stuff and what is visible, is visible...
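
a small C sketch of the "process the results" half, under assumptions that are mine rather than a description of the actual renderer: every object is drawn with depth testing in a flat colour that encodes its ID, the frame is read back (for example with glReadPixels) into one 32-bit ID per pixel, and ID 0 means background. `mark_visible` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan an ID buffer and flag every object that owns at least one pixel.
   ids:     one object ID per pixel, 0 = background
   visible: array of max_id + 1 flags, overwritten by this call
   Returns the number of distinct visible objects. */
size_t mark_visible(const uint32_t *ids, size_t n_pixels,
                    uint8_t *visible, uint32_t max_id)
{
    size_t count = 0;
    assert(ids != NULL && visible != NULL);
    memset(visible, 0, (size_t)max_id + 1);
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t id = ids[i];
        if (id == 0 || id > max_id)
            continue;               /* background or out-of-range value */
        if (!visible[id]) {
            visible[id] = 1;        /* survived the depth test somewhere */
            count++;
        }
    }
    return count;
}
```

anything that never shows up in the buffer was either occluded or off-screen, which is also why this only answers questions about what is currently in view.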


but, there are still limits...


> ...
> while you mention OpenGL ...
> I haven't checked on it, an HLL-story anyway ?
> ...
>

I am not sure what you are asking exactly.

if about my compiler, well, it does C, but I got distracted before I got my
new compiler core fully written (thus, no x86-64 support yet).

adding Java and JavaScript support has similarly been on hold.
C still works though...


more recently, I have been doing a lot more 3D-related stuff, and very
recently have started messing with physics simulation again; I am considering
adding several features to my physics engine:
soft-body physics (cloth, globs, ...);
particle simulation (for smoke and fire effects, ...);
ragdolls;
physics shaders;
....


>> the main limitation being though, that it only currently works for
>> "currently visible" solid geometry (a true PVS could answer more complex
>> queries involving non-visible areas as well), me being uncertain how to
>> use it for efficiently culling things like shadow volumes (it being more
>> expensive right now to accurately cull the shadow geometry than to just
>> render it).
>
>> at present, I am using the lamely sad approach of culling the shadows
>> along with their respective geometry (the alternative approach of just
>> drawing all the shadows can hurt framerate), so when a shadowing object
>> goes off-screen, the shadow simply disappears...
>
>> so, effective shadow handling (among other things) is still likely to
>> require a "proper" occlusion-culling scheme (either precomputed, or
>> real-time computed, full PVS).
>
> I think shadows don't need to be very exact and some 'true-light' cards
> may do it on their own.
>

yeah.
well, in general it is working fairly well, but some effects are lost...


actually, there is a certain degree of quality that IMO can currently only
really be done with ray-tracing and radiosity methods, however real-time
stuff is real time.

so, one can have a world where rendering is much faster, lighting is much
better, shadows are nice and smooth, ... but where the lights are fixed in
position and a long costly process is needed to rebuild the map;
or a world where lighting has funky pop-up issues, rendering is much slower,
and everything looks like dark grungy plastic (aka: the Doom3/Quake4 look),
but with the gain that everything can happen in real-time (add a light, and
the scene gets lighter, move something and the shadow follows, open a hole
in the wall and have the light flow right through, ...).

much better would be to combine these approaches, where some lighting is static
and other lighting is dynamic, but this does require using a prebuilt BSP...


however, the GPU would at least allow for, among other things, greatly
speeding up radiosity calculations (albeit full scene radiosity is by no
means likely to be real-time either, nor am I sure how one could effectively
do radiosity purely from the POV of the camera).


>> another possible route is keeping track of the bounds for the currently
>> visible area, which, along with the frustum, could allow culling shadows
>> by the currently visible area (such as adding a kind of shadowing far-clip
>> plane, ...).
>
>> this would be real-time and approximate, but should be good enough
>> (possible issue: this would tend to be a lot less effective,
>> say, if looking down a long hallways with lots of shadow-causing
>> geometry nearby, but not actually visible from within the hallway).
>
> You'll find this minor bugs (late shadow update) in almost all games,
> perhaps just to keep top speed for the front view.
>

yes, of course, my current approach does not allow delaying shadows (I am
using the depth-fail shadow-volumes approach, which sadly has to be redrawn
every frame, and creates a totally huge amount of invisible geometry and
overdraw related to drawing the shadow volumes and then drawing the lights
onto the surfaces...).
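
for reference, the depth-fail counting rule can be simulated for a single pixel in C. this is a sketch of the general technique and not the renderer's code: it assumes closed (capped) shadow volumes and a "less than" depth test, uses a plain int where real hardware has a small stencil buffer with wrapping increments, and the names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* One shadow-volume polygon fragment covering the pixel. */
typedef struct {
    float depth;         /* depth of the fragment, smaller = nearer */
    int   front_facing;  /* 1 = front face, 0 = back face */
} VolumeFragment;

/* Depth-fail counting: only fragments that FAIL the depth test, i.e. lie at
   or behind the visible surface, touch the counter. Back faces increment,
   front faces decrement. A non-zero result means the surface point is
   inside at least one shadow volume. */
int in_shadow_depth_fail(float scene_depth, const VolumeFragment *frags, size_t n)
{
    int stencil = 0;
    assert(frags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (frags[i].depth >= scene_depth)
            stencil += frags[i].front_facing ? -1 : 1;
    }
    return stencil != 0;
}
```

every one of these fragments has to be rasterised each frame even though none of them is ever displayed, which is where the overdraw comes from.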


the result is that, for the Quake-1 maps, I get performance comparable to
what Quake-1 was getting on the original hardware it was targeting (good
old Pentium 90s and Pentium 133s).

of course, for a long time, I had been running it primarily on a 486DX-66
and similar style HW (that is what my main computer had been at the time),
so I had been fairly used to it lagging (or, for true terror, one could try
to use HW accel on the then new S3-Virge and later S3-Trio-64 cards, for
questionable graphics at far worse framerates...).


long ago, the Voodoo cards were, I think, the major point when HW-accel
actually fully outdid SW rendering. my projects eventually changed focus
to HW-accel around the time when the GeForce-2 was still
fairly new (and when I actually had one). I think prior to this, I had tried
doing some stuff using Mesa, but it was unacceptably slow at the time...


> __
> wolfgang
>
>
>

