Larrabee delayed: anyone know what's happening? [Computer Architecture]

Prev: PEEEEEEP
Next: Texture units as a general function

From: nmm1 on 24 Dec 2009 05:48

In article <4B32FE4C.4090308(a)patten-glew.net>,
Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>
>"This has been tried (and failed) before; what's different now?" is my
>mantra. But it is not dismissal.
>
>Constraints change.

As do requirements. The correct questions to ask include:

If it always failed before, why did it fail, and what has changed
to mean that can be handled this time?

If it worked before, what did it rely on, and are all of those
preconditions still valid?

Regards,
Nick Maclaren.

From: Terje Mathisen on 24 Dec 2009 08:23

Andy "Krazy" Glew wrote:
> Terje Mathisen wrote:
>> Isn't this _exactly_ the same as the current setup on some chips that
>> use 128-byte cache lines, split into two sectors of 64 bytes each.
>>
>> I.e. an effective cache line size that is smaller than the "real" line
>> size, taken to its logical end point.
>>
>> I would suggest that (as you note) register size words is the smallest
>> item you might need to care about and track, so 8 bits for a 64-bit
>> platform with 64-byte cache lines, but most likely you'll have to
>> support semi-atomic 32-bit operations, so 16 bits which is a 3% overhead.
>
> Well, it's not *exactly* like sectored cache lines. You typically
> need the sector size to be a multiple of the dram burst transfer size,
> what Jim Goodman called the 'transfer block size' in his paper that I
> thought defined the only really good terminology.

OK, that does make sense from a performance viewpoint.

When discussing sub-cache-line granularity for read/write consistency,
it still means that you need to track the M(O)ESI status of each partial
cache line.
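
The overhead figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `mask_overhead` is made up for illustration, and it assumes exactly one tracking bit per granule, which is what the 8-bit and 16-bit figures imply. Full per-granule M(O)ESI state would need more bits than this.

```python
def mask_overhead(line_bytes: int, granule_bytes: int) -> tuple[int, float]:
    """Return (mask bits per line, overhead as a fraction of the line's data bits).

    Assumption: one tracking bit per granule, nothing else.
    """
    if line_bytes % granule_bytes:
        raise ValueError("granule must divide the line size")
    mask_bits = line_bytes // granule_bytes
    return mask_bits, mask_bits / (line_bytes * 8)


if __name__ == "__main__":
    # 64-byte lines tracked at 64-bit, 32-bit and byte granularity.
    for granule in (8, 4, 1):
        bits, frac = mask_overhead(64, granule)
        print(f"{granule}-byte granules: {bits} bits per line, {frac:.1%} overhead")
```

With 32-bit granules this gives 16 bits on top of 512 data bits, which is where the roughly 3% figure comes from.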

> Byte granularity is motivated because it is the smallest granularity
> that you can usually write into some memories without having to do a
> read-modify-write. Almost nobody allows you to write at bit
> granularity. Sure, some systems do not allow you to write at byte
> granularity and they may even require you to write at word or cache line
> granularity. But byte granularity is very widespread.
>
> If you track this at word granularity but allow the user to write at byte
> granularity because that's what his instruction set has, then you run
> the risk of losing writes.

Yes, but I still think it is the right thing to do!

> Writes can be lost in this way whenever the bitmasks used to merge the
> evicted cache lines are of coarser granularity than the minimum write
> size in the instruction set.

Right.
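
A toy model of the lost-write case, for anyone who wants to see it happen. Everything here is hypothetical (the `CachedLine` class, the 8-byte line, the 4-byte word); it only assumes that two non-coherent caches each hold a private copy of the same line and that eviction merges just the granules marked dirty.

```python
class CachedLine:
    """One cache's private copy of a line plus a per-granule dirty mask."""

    def __init__(self, memory: bytearray, granule: int):
        self.data = bytearray(memory)  # snapshot taken at fill time
        self.granule = granule
        self.dirty = [False] * (len(memory) // granule)

    def write_byte(self, offset: int, value: int) -> None:
        self.data[offset] = value
        self.dirty[offset // self.granule] = True  # marks the whole granule

    def evict_into(self, memory: bytearray) -> None:
        # Merge only dirty granules, but each one is copied in full.
        for i, is_dirty in enumerate(self.dirty):
            if is_dirty:
                lo = i * self.granule
                memory[lo:lo + self.granule] = self.data[lo:lo + self.granule]


def run(granule: int) -> bytes:
    memory = bytearray(8)
    a = CachedLine(memory, granule)
    b = CachedLine(memory, granule)
    a.write_byte(0, 0xAA)  # CPU A writes byte 0
    b.write_byte(1, 0xBB)  # CPU B writes byte 1 of the same word
    a.evict_into(memory)
    b.evict_into(memory)   # B's stale byte 0 rides along with its dirty word
    return bytes(memory[:2])


if __name__ == "__main__":
    print("word mask:", run(4).hex())  # A's write is overwritten
    print("byte mask:", run(1).hex())  # both writes survive
```

With the word-sized mask, B's eviction writes back its stale copy of byte 0 and A's store disappears; with the byte-sized mask both stores land.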

> Since the whole point of this exercise is to try to reduce the overhead
> of cache coherency, but people have demonstrated they don't like the
> consequences semantically, I am trying a different combination: allow (A)
> multiple values; allow (B) weak ordering; but disallow (C) losing writes.
>
> I suspect that this may be more acceptable and cause fewer bugs.
>
> I.e. I am suspecting that full cache coherency is overkill, but that
> completely eliminating cache coherency is underkill.

I agree, and I think most programmers will be happy with word-size
tracking, i.e. we assume all char/byte operations happen on private
memory ranges.

> *This* post, by the way, is composed almost exclusively by speech
> recognition, using the pen for certain trivial edits. It's nice to find
> a way that I can actually compose stuff on a plane again.

Seems to work better than your wan-based tablet posts!

Terje

PS. Merry Christmas everyone, here in Norway the 24th is a much bigger
celebration than the day after. :-)

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen on 24 Dec 2009 08:33

Robert Myers wrote:
> On Dec 23, 12:21 pm, Terje Mathisen<"terje.mathisen at tmsw.no">
>> Why do I feel that this feels a lot like IBM mainframe channel programs?
>> :-)
>
> Could I persuade you to take time away from your first love
> (programming your own computers, of course) to elaborate/pontificate a
> bit? After forty years, I'm still waiting for someone to tell me
> something interesting about mainframes. Well, other than that IBM bet
> big and won big on them.
>
> And CHANNELS. Well. That's clearly like the number 42.

Del has already answered, but since I know far less than him about IBM
systems, I'll try anyway:

As Del said, an IBM mainframe has lots of dedicated slave processors,
think of them as very generalized DMA engines where you can do stuff like:

seek to and read block # 48, load the word at offset 56 in that block
and compare with NULL: If equal return the block, otherwise use the word
at offset 52 as the new block number and repeat the process.

I.e. you could implement most operations on most forms of disk-based tree
structures inside the channel cpu, with no need to interrupt the host
before everything was done.
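
In host-side terms, the chained lookup described above amounts to something like the following. This is a rough sketch, not real channel command words: the `disk` dictionary, the 4-byte big-endian word size, the 64-byte blocks and the `max_hops` guard are all assumptions made for illustration.

```python
NULL = 0
WORD = 4  # assumed word size in bytes


def load_word(block: bytes, offset: int) -> int:
    """Load one big-endian word from a block (byte order is an assumption)."""
    return int.from_bytes(block[offset:offset + WORD], "big")


def channel_walk(disk: dict[int, bytes], start_block: int, max_hops: int = 1000) -> bytes:
    """Follow the chain entirely 'inside the channel' and return the final block."""
    block_no = start_block
    for _ in range(max_hops):
        block = disk[block_no]            # seek to and read the block
        if load_word(block, 56) == NULL:  # compare the word at offset 56 with NULL
            return block                  # done: hand the block to the host
        block_no = load_word(block, 52)   # word at offset 52 is the next block number
    raise RuntimeError("chain too long or cyclic")


def make_block(flag: int, next_block: int, payload: bytes = b"") -> bytes:
    """Build a hypothetical 64-byte block for the demo below."""
    block = bytearray(64)
    block[:len(payload)] = payload
    block[52:56] = next_block.to_bytes(WORD, "big")
    block[56:60] = flag.to_bytes(WORD, "big")
    return bytes(block)


if __name__ == "__main__":
    disk = {
        48: make_block(flag=1, next_block=7),
        7: make_block(flag=1, next_block=3),
        3: make_block(flag=NULL, next_block=0, payload=b"leaf"),
    }
    print(channel_walk(disk, 48)[:4])
```

The point is that the host issues one request and sees one completion; every intermediate read and compare happens in the slave processor.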

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen on 24 Dec 2009 08:36

Bernd Paysan wrote:
> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>> PS. This is my very first post from my personal leafnode installation:
>> I have free news access via my home (fiber) ISP, but not here in
>> Rauland on Christmas/New Year vacation, so today I finally broke down
>> and installed leafnode on my home FreeBSD gps-based ntp server. :-)
>
> I use leafnode locally for a decade or so now; it does a good job on
> message prefetching, and it also can be used to hide details like where
> my actual news feed is coming from.

news.hda.hydro.com which was my local news server since ~1992 started
out as a full server, but ran leafnode (which I installed) during the
last 5-6 years of its life, when the number of local users dropped a lot.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen on 24 Dec 2009 08:40

Robert Myers wrote:
> On Dec 23, 7:57 pm, Anne & Lynn Wheeler <l...(a)garlic.com> wrote:
>> Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>>
>>> Why do I feel that this feels a lot like IBM mainframe channel programs?
>>> :-)
>>
>> downside was that mainframe channel programs were half-duplex end-to-end
>> serialization. there were all sorts of heat & churn in fiber-channel
>> standardization with the efforts to overlay mainframe channel program
>> (half-duplex, end-to-end serialization) paradigm on underlying
>> full-duplex asynchronous operation.
>>
>> from the days of scarce, very expensive electronic storage
>> ... especially disk channel programs ... used "self-modifying" operation
>> ... i.e. read operation would fetch the argument used by the following
>> channel command (both specifying the same real address). couple round
>> trips of this end-to-end serialization potentially happening over 400'
>> channel cable within small part of disk rotation.
>>
>> trying to get a HYPERChannel "remote device adapter" (simulated
>> mainframe channel) working at extended distances with disk controller &
>> drives ... took a lot of sleight of hand. a copy of the
>> completed mainframe channel program was created and downloaded into the
>> memory of the remote device adapter .... to minimize the
>> command-to-command latency. the problem was that some of the disk
>> command arguments had very tight latencies ... and so those arguments
>> had to be recognized and also downloaded into the remote device adapter
>> memory (and the related commands redone to fetch/store to the local
>> adapter memory rather than the remote mainframe memory). this process
>> was never extended to be able to handle the "self-modifying" sequences.
>>
>> on the other hand ... there was a serial-copper disk project that
>> effectively packetized SCSI commands ... sent them down outgoing
>> link ... and allowed asynchronous return on the incoming link
>> ... eliminating loads of the scsi latency. we tried to get this morphed
>> into interoperating with fiber-channel standard ... but it morphed into
>> SSA instead.
>>
>> --
>> 40+yrs virtualization experience (since Jan68), online at home since Mar1970
>
> What bothers me is the "it's already been thought of"
>
> You worked with a different (and harsh) set of constraints.
>
> The constraints are different now. Lots of resources free that once
> were expensive. Don't want just a walk down memory lane. The world
> is going to change, believe me. Anyone here interested in seeing
> how?
>
> What can we learn from the hard lessons you learned? That's a good
> question. What's different now. That's a good question, too.
> Everything is the same except the time scale. That answer requires a
> detailed defense, and I think it's wrong. Sorry, Terje.

Don't feel sorry for me!

I think active code/message passing/dataflow is the obvious direction of
all big systems, including everything that needs to run over the internet.

After all, downloading java applets to the client that knows how to
handle the accompanying server data is one working example.

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
