From: jacko on
On 14 July, 17:59, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jul 14, 5:37 am, "Piotr Wyderski"
>
> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
> > Not only that: Sparcs have much worse memory disambiguation mechanisms.
> > Even such a simple loop shows stunning differences:
>
> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
>
> > Investigation at the assembler level has shown that Sparc is good
> > (i.e. Xeon-like) at loads and stores, but when there is a long
> > store-load dependence, the performance loss is a total disaster, to
> > put it mildly. Xeons do not exhibit this kind of problem.
>
> x86 grew up in an environment where stores were reloaded in a rather
> short amount of time (arguments get pushed, then accessed a couple of
> cycles later in the called subroutine). Thus, there are mechanisms to
> forward store data to loads when the addresses match even when the
> store instruction has not retired. This normally goes under the
> moniker of Store-to-Load forwarding (STLF).
>
> Mitch

Stack cache. See http://nibz.googlecode.com

It's probably some throwback to shared-memory models. Cache coherence
is for fools who can't run a miniature network.

It's like a sick joke where the kids with cash pick the stupid ideas
and get them fabricated.
From: Morten Reistad on
In article <96ec88ed-b6a4-4e7e-a5d0-43b98ba34ae2(a)w12g2000yqj.googlegroups.com>,
jacko <jackokring(a)gmail.com> wrote:
>On 14 July, 17:59, MitchAlsup <MitchAl...(a)aol.com> wrote:
>> On Jul 14, 5:37 am, "Piotr Wyderski"
>>
>> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
>> > Not only that: Sparcs have much worse memory disambiguation mechanisms.
>> > Even such a simple loop shows stunning differences:
>>
>> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
>>
>> > Investigation at the assembler level has shown that Sparc is good
>> > (i.e. Xeon-like) at loads and stores, but when there is a long
>> > store-load dependence, the performance loss is a total disaster, to
>> > put it mildly. Xeons do not exhibit this kind of problem.
>>
>> x86 grew up in an environment where stores were reloaded in a rather
>> short amount of time (arguments get pushed, then accessed a couple of
>> cycles later in the called subroutine). Thus, there are mechanisms to
>> forward store data to loads when the addresses match even when the
>> store instruction has not retired. This normally goes under the
>> moniker of Store-to-Load forwarding (STLF).
>>
>> Mitch
>
>Stack cache. See http://nibz.googlecode.com
>
>It's probably some throwback to shared-memory models. Cache coherence
>is for fools who can't run a miniature network.
>
>It's like a sick joke where the kids with cash pick the stupid ideas
>and get them fabricated.

All of these tests handle billions and trillions of pretty lightweight
IP packets, either as web servers, rtp reflectors, rtp bridges, dns
servers or database servers. There is state kept between packets.
Typical processing is high hundreds to low thousands of instructions
per packet. This does not allow for many memory accesses to take place.

You cannot identify the session before you have done a bit of
parsing of the packet, and by then a thread on some processor in
the cluster has already read the packet and is acting on it. When
the session state needs to be inspected and updated, the handful of
bytes involved very likely reside in the cache of another processor
(on a 32-way machine the odds are close to 31/32).

Doing an effective cross-cache snoop of that cache then becomes
essential for performance.

Almost all the UDP traffic on the important internet servers behaves
like this; the new instrumentation in Linux 2.6.31 confirms it.
These servers are the effective performance limit for a
number of huge web sites.

-- mrr
From: jacko on
Is

http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4b9f67406c6852dd/844cab7cd4b9ab52#844cab7cd4b9ab52

important for solving the GC problem? Ref FIFOO
From: George Neuner on
On Thu, 15 Jul 2010 08:17:44 -0700 (PDT), jacko <jackokring(a)gmail.com>
wrote:

>Is
>
>http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4b9f67406c6852dd/844cab7cd4b9ab52#844cab7cd4b9ab52
>
>important for solving the GC problem? Ref FIFOO

There isn't enough detail in that conversation to figure out what
problem the structure is trying to solve. What exactly are you
asking?

George
From: jacko on
On Jul 14, 5:59 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jul 14, 5:37 am, "Piotr Wyderski"
>
> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
> > Not only that: Sparcs have much worse memory disambiguation mechanisms.
> > Even such a simple loop shows stunning differences:
>
> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
>
> > Investigation at the assembler level has shown that Sparc is good
> > (i.e. Xeon-like) at loads and stores, but when there is a long
> > store-load dependence, the performance loss is a total disaster, to
> > put it mildly. Xeons do not exhibit this kind of problem.
>
> x86 grew up in an environment where stores were reloaded in a rather
> short amount of time (arguments get pushed, then accessed a couple of
> cycles later in the called subroutine). Thus, there are mechanisms to
> forward store data to loads when the addresses match even when the
> store instruction has not retired. This normally goes under the
> moniker of Store-to-Load forwarding (STLF).
>
> Mitch

Yes, but what about a forced write-through signal? As for cache
invalidates: a write must invalidate the line in every other core's
cache, but if no invalidate happens, then there's no immediate
pressure to write back. If an invalidate does happen, then there may
be pressure to write back. If an invalidate arrives while a writeback
is already queued, then a write-through of the displaced line must
occur immediately. Am I missing something?

Cheers Jacko