From: Tim McCaffrey on 3 Apr 2010 09:46
In article <tpqar5pfc05ajvn3329an6v9tcodh58i65(a)4ax.com>, kal(a)dspia.com says...
>On Thu, 01 Apr 2010 22:48:47 -0500, Del Cecchi <delcecchi(a)gmail.com>
>>> On Apr 1, 5:40 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
>>>>The PCIe 2.0 links on the Clarkdale chips run at 5G.
>>> And how many dozen meters can these wires run?
>>Maybe 1 dozen meters, depending on thickness of wire. Wire thickness
>>depends on how many you want to be able to put in a cable, and if you
>>want to be able to bend those cables.
>>10GbaseT or whatever it is called gets 10 gbits/second over 4 twisted
>>pairs for 100 meters by what I classify as unnatural acts torturing the
>You should also add to it that this is full-duplex ie simultaneous
>transmission of 10G in both directions. One needs 4 equalizers, 4 echo
>cancellers, 12 NEXT and 12 FEXT cancellers in addition to a fully
>parallel LDPC decoder (don't even talk about the insane requirement on
>the clock recovery block). Over the last 5 years probably US$ 100M of
>VC money got spent to develop 10GBT PHYs with several startups
>disappearing with not much to show for it. Torturing the bits indeed (not
>to mention the torture of the engineers trying to make this thing work.)
There are several companies that provide PCIe bus extender technology.
(Some over optical links, IIRC.)
PLX Technology has some large PCIe switches and non-transparent bridges.
The tech is out there.
With the right software and hardware support (which I think the PLX NT bridges
have), PCIe can be much lower latency and overhead than 10G Ethernet.
From: Robert Myers on 3 Apr 2010 11:38
> The thing that pisses me off is having to explain to them that they
> ALSO need to take the design deficiencies of the less clueful
> mainframe architectures and operating systems into account,
> because that is the level at which modern systems map to them :-(
> Stephen's points are a prime example of this. We learnt that that
> was NOT how to handle virtual memory back in the 1960s, but the new
> kid on the block (IBM mainframe division) wouldn't be told anything,
> and things have gone downhill from there :-(
I'd be fascinated to know how you teach that.
Software at the programming language level has increasingly aimed at
making the machine more abstract. Even at the level of C or assembly
language, I don't know how you'd manage things, short of lots of trial
and error, because the salient features of the hardware aren't exposed
to the programmer.
One possible answer is to rely on machine-specific libraries or
implementations of domain-specific languages, but that still leaves
*someone* to understand the arcane details of the hardware and to find a
way to manipulate them--probably, in the process, side-stepping your
advice to avoid doing things at the OS level.
From: nmm1 on 3 Apr 2010 11:56
In article <WhJtn.57101$y13.10500(a)newsfe12.iad>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>> The thing that pisses me off is having to explain to them that they
>> ALSO need to take the design deficiencies of the less clueful
>> mainframe architectures and operating systems into account,
>> because that is the level at which modern systems map to them :-(
>> Stephen's points are a prime example of this. We learnt that that
>> was NOT how to handle virtual memory back in the 1960s, but the new
>> kid on the block (IBM mainframe division) wouldn't be told anything,
>> and things have gone downhill from there :-(
>I'd be fascinated to know how you teach that.
I don't even try to teach the details. I teach them that cache lines
are NOT the only thing to worry about, and that they should avoid
all of: frequent accesses to widely separated data; updating objects
that are close together when writing shared-memory parallel code; and
objects separated by multiples of powers of two. And that the distances involved are 32-256 bytes
for cache lines, but powers of two in excess of 4 KB for the other
issues. Also that they don't have much chance of tracking such
problems down themselves, and to call for help :-(
And that even those rules have exceptions, such as the POWER4 ....
That's more of an inoculation against over-simplistic books and
Web pages than anything else. It's not satisfactory, but it's all
I can do.
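As a rough illustration of the powers-of-two rule above, here is a minimal Python sketch of how addresses map onto cache sets. The line size (64 bytes) and set count (64) are assumptions picked for the example, not properties of any machine discussed in this thread, and the helper names are hypothetical. It shows only the simplest set-index mapping; real hardware (and the TLB and page-colouring effects at 4 KB and above) is more complicated.

```python
# Assumed cache geometry, for illustration only.
LINE_SIZE = 64   # bytes per cache line (the thread gives 32-256 as the range)
NUM_SETS = 64    # number of sets, e.g. a 32 KiB 8-way cache with 64-byte lines


def cache_set(addr: int) -> int:
    """Return the set index an address maps to under simple modulo indexing."""
    return (addr // LINE_SIZE) % NUM_SETS


def distinct_sets(stride: int, count: int = 16) -> int:
    """Count how many different sets are touched by `count` accesses
    spaced `stride` bytes apart, starting at address 0."""
    return len({cache_set(i * stride) for i in range(count)})


def same_line(a: int, b: int) -> bool:
    """True if two addresses fall in the same cache line (false-sharing risk
    when different threads update them)."""
    return a // LINE_SIZE == b // LINE_SIZE


if __name__ == "__main__":
    # A 4 KB stride piles every access onto one set under these assumptions.
    print("stride 4096:", distinct_sets(4096), "set(s)")
    # Padding the stride by one cache line spreads the accesses out.
    print("stride 4160:", distinct_sets(4096 + LINE_SIZE), "set(s)")
    # Two objects 8 bytes apart share a line; 64 bytes apart they do not.
    print("0 and 8 share a line:", same_line(0, 8))
    print("0 and 64 share a line:", same_line(0, 64))
```

Under these assumed parameters, padding an array's leading dimension so that rows are not a multiple of a large power of two apart is the usual workaround, though as noted above, such rules have exceptions.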
>One possible answer is to rely on machine-specific libraries or
>implementations of domain-specific languages, but that still leaves
>*someone* to understand the arcane details of the hardware and to find a
>way to manipulate them--probably, in the process, side-stepping your
>advice to avoid doing things at the OS level.
No, there is a better solution. The hardware and operating system
design could avoid such problems arising in the first place. Not all
problems are amenable to elimination, but many are. And it's not
doing that that I think is the defect in modern systems.
From: "Andy "Krazy" Glew" on 3 Apr 2010 13:09
On 4/2/2010 9:45 PM, Morten Reistad wrote:
> If the 6 mb (or 12, or 24) of ram are superfast, have a minimal mmu,
> at least capable of process isolation and address translation; and do
> the "paging" to main memory, then you could run one of these minimal,
> general purpose operating systems inside each gpu/cpu/whatever, and
> live with the page faults. It will be several orders of magnitude
> faster and lower latency than the swapping and paging we normally love
> to hate. We already have that "swapping", except we call it "memory
> The theory is old, stable and well validated. The code is done, and
> still in many operating systems. We "just need drivers".
I know of at least one startup whose founder called me up asking for somebody able to write such a driver and/or tune
the paging subsystem to create a hierarchy of ultra fast and conventional DRAM.
A few months later he was on to hardware state machines to do the paging. Linux paging was too high overhead.
Now, I suspect that a highly tuned paging system might be able to be 10X faster than Linux's. (In the same way that
Linux ages much better than Windows.) (I hearken back to the day when people actually cared about paging, e.g. at Gould.)
But the lesson is that modern software, modern OSes, don't seem to page worth a damn. And if you are thinking to brush
off, re-tune, and completely reinvent virtual memory algorithms from the good old days (paging and/or swapping), you must
remember that the guys doing the hardware (and, more importantly, microcode and firmware, which is just another form of
software) for such multilevel memory systems have the same ideas and read the same papers.
It's just the usual:
If you do it in software in the OS:
  - you have to do it in every OS
  - you have to ship the new OS, or at least the new drivers
  + you can take advantage of more knowledge of software, processes, tasks
  + you can prototype quicker, since you have Linux source code
If you do it in hardware/firmware/microcode:
  + it works for every OS, including legacy OSes without device drivers
  - you have less visibility to software constructs
  + if you are the hardware company, you can do it without OS (Microsoft OS) source code
Hmmm... HW/FW/UC can only really do paging. Can't really do swapping. Swapping requires SW knowledge.
My take: if the "slower" memory is within 1,000 cycles, you almost definitely need a hardware state machine: block the
process (hardware thread), switch to another thread.
If within 10,000 cycles, I think hardware/firmware will still win.
I think software OS level paging only is a clear win at >= 100,000 cycles.
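The rule of thumb above can be written down as a small lookup. This is only a sketch of the thresholds as stated in this post; the function name and the returned labels are hypothetical, and the post says nothing about the range between 10,000 and 100,000 cycles, so the sketch reports that range as undecided instead of guessing.

```python
def paging_mechanism(latency_cycles: int) -> str:
    """Map the latency of the slower memory level (in CPU cycles) to the
    mechanism suggested by the rule of thumb in the post above."""
    if latency_cycles < 0:
        raise ValueError("latency must be non-negative")
    if latency_cycles <= 1_000:
        # Close memory: block the hardware thread and switch to another.
        return "hardware state machine"
    if latency_cycles <= 10_000:
        return "hardware/firmware"
    if latency_cycles >= 100_000:
        return "software (OS-level) paging"
    # Between 10,000 and 100,000 cycles the post gives no verdict.
    return "undecided"


if __name__ == "__main__":
    for cycles in (500, 5_000, 50_000, 1_000_000):
        print(f"{cycles:>9} cycles -> {paging_mechanism(cycles)}")
```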
From: nmm1 on 3 Apr 2010 13:29
In article <4BB7763D.9020103(a)patten-glew.net>,
Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>Now, I suspect that a highly tuned paging system might be able to be 10X
>faster than Linux's. (In the same way that Linux ages much better than
>Windows.) (I hearken back to the day when people actually cared about
>paging, e.g. at Gould.)
Well, it ages better, too, but I suspect you mean pages :-)
>But the lesson is that modern software, modern OSes, don't seem to page
>worth a damn. And if you are thinking to brush off, re-tune, and
>completely reinvent virtual memory algorithms from the good old days
>(paging and/or swapping), you must remember that the guys doing the
>hardware (and, more importantly, microcode and firmware, which is just
>another form of software) for such multilevel memory systems have the
>same ideas and read the same papers.
Very true. The problem isn't the algorithms, it's the use pattern
and design methodology. And hardware will always have a much lower
overhead, given the latter.
What I don't understand is why everybody is so attached to demand
paging - it was near-essential in the 1970s, because memory was
very limited, but this is 35 years later! As far as I know, ALL
of the facilities that demand paging provides can be done better,
and more simply, in other ways (given current constraints).