From: David Kanter on 31 Mar 2008 19:39

On Mar 29, 2:42 am, n...(a)cus.cam.ac.uk (Nick Maclaren) wrote:
> In article <d3e90bcf-a062-49d0-86b5-1e8445212...(a)s19g2000prg.googlegroups.com>, David Kanter <dkan...(a)gmail.com> writes:
> |>
> |> > Don't believe what you read - half of it is propaganda :-)
> |>
> |> In this case, it would be your post. SMT is a very well established
> |> technique, used by most of the high performance CPU vendors. Even the
> |> embedded guys use SOEMT quite heavily.
>
> I suggest that you read a thread before allowing your knees to jerk.
> I specifically said that SOEMT was viable.

And I don't disagree.

> Dammit, it was obviously
> a good way to proceed even before the Tera MTA showed that it was!
> Blindingly obvious is the expression that springs to mind.

As Niels pointed out, Tera isn't SOEMT.

> |> > I took the trouble of reading Eggers' main paper (and others), and
> |> > analysing her calculations. I started out impressed, and got less
> |> > so as I proceeded. There was one very significant omission: the
> |> > comparison between the throughput of a SMT system and a multi-core
> |> > system with the same number of transistors and same amount of
> |> > parallelism.
> |>
> |> Here's a hint Nick, if you think that ALU real estate was a 1970's
> |> problem, then transistor count is a 1960's problem.
> |>
> |> CPUs today use around 400-600M transistors easily, with high-end ones
> |> up to 2B. I don't know about you, but somehow I don't think
> |> transistor count is a relevant metric today.
>
> Would you like to explain what on earth you are wittering on about?
> That was precisely the point that I made! And, before you jerk your
> knees again, read my third point below.

No, I said transistor count is not a big deal, but power efficiency is. If you can increase performance faster than you increase power with SMT, it's a win. In many cases, that is what happens. Especially if you also chuck out the OOO part as well (as IBM has done).
> |> SMT is an extremely power efficient technique, and that's a hell of a
> |> lot more important than transistors. Every year, we get 2x as many
> |> transistors, yet the number of watts remains the same. Which one do
> |> you think is the bottleneck?
>
> Sigh. Firstly, that is wrong. The number of watts fluctuates a bit
> but, even over the past decade, the trend is up.

That's funny, because I'm pretty sure the TDP for Intel's chips went up to 150W and now is down to 120W and 80W for mainstream. It may rise in the future, as the northbridge is integrated, but it definitely peaked around 2005/6.

> Secondly, all of the many chip designers, experts and vendors I have
> spoken to have referred to process technologies as the key to keeping
> power under control, followed by changing clock rates and/or turning
> sections of chip off when unused. SMT was only mentioned before they
> had tried it :-)

If you are going to call on outside authority, why don't you try to make it clear who you are talking about, not just anonymous people.

SMT increases performance/watt, which is all I really care about. There are other techniques that increase performance more (adding cores), but they add much, much more power and complexity. For instance, if you have 4 cores and want more performance, you could add SMT, which doesn't mess up the system infrastructure, or add cores, which may require a ring instead of a crossbar interconnect, etc. etc. SMT and CMP have very different trade-offs.

> Thirdly, your claim that SMT is an extremely power efficient technique
> doesn't make it so. You clearly haven't read Eggers' main paper with
> any care, or haven't understood it. I used the transistor count as a
> constraint only because that paper did.

I have read it several times.
> Fourthly, you failed to understand the point I made above - replace the
> constraint on number of transistors by the number of watts, and there
> was STILL no comparison in those papers with an equivalent multi-core
> system.

SMT and multicore are not mutually exclusive.

> Fifthly, both Sun's and Intel's 'experimental' low-power systems use
> larger numbers of simpler, non-SMT cores

Wrong. Sun uses a scalar pipeline with SOEMT, which is the same as SMT. See page 2 of this paper:

http://www.cse.ucsd.edu/~rakumar/dasCMP05/paper01.pdf

Seeing as how Dr. Laudon was one of the architects for Niagara, I think he knows what he's talking about. I don't disagree that SOEMT is a good technique, I just think SMT is better.

> - which is PRECISELY the
> technique I was saying that should also have been considered. And
> THAT is the key to why you should update your beliefs!
>
> |> > Also, the scalability was dire, and that was for the
> |> > SMT-friendly MIPS chip - as history shows, Intel's attempt to use
> |> > it on the x86 was not a great success.
> |>
> |> Here's a hint, implementations vary. Intel's first implementation
> |> sucked. Their second one probably won't.
>
> Here's a technique. Look at what Intel are experimenting with. It's
> easy enough to find on the Web - if you can be bothered - I knew about
> it earlier, of course, under NDA.

Intel has both SMT and SOEMT designs. And I don't think there is enough information in public on Intel's plans for me to really be comfortable discussing them.

> |> Every processor IBM has designed since the POWER5 has used SMT, and
> |> generally, IBM has a tendency to make reasonable choices.
>
> IBM's objective with those designs is not mainstream computing - more
> like mainframe computing!

There are four examples: the POWER5, POWER6, CELL and the Xbox 360 chip. Two of them are not even in classic computers, but in game consoles.

> |> You should leave armchair architecture to those that actually
> |> understand it...
>
> So, are you claiming that you do? :-)

I didn't say that, I said you shouldn't.

DK
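Kanter's criterion ("increase performance faster than you increase power") and Maclaren's demand for a comparison against multi-core at the same budget can both be made concrete with a little arithmetic. The sketch below is an editorial illustration, not something either poster gave; every percentage, wattage and throughput figure in it is a hypothetical placeholder, not a measurement of any real chip, and the equal-budget helper optimistically assumes throughput scales linearly with core count.

```python
"""Back-of-envelope performance/watt comparison (hypothetical numbers only)."""


def perf_per_watt_ratio(perf_gain: float, power_gain: float) -> float:
    """Relative perf/watt after a change, versus the baseline core.

    perf_gain and power_gain are fractional increases, e.g. 0.25 for +25%.
    A result above 1.0 means the change improved performance/watt.
    """
    return (1.0 + perf_gain) / (1.0 + power_gain)


def throughput_at_power_budget(core_perf: float, core_power: float,
                               budget: float) -> float:
    """Aggregate throughput from as many identical cores as fit in `budget`.

    Assumes throughput scales linearly with core count, which is the
    optimistic case for the multi-core option (no shared-resource limits).
    """
    cores = int(budget // core_power)
    return cores * core_perf


if __name__ == "__main__":
    # Hypothetical: SMT adds 25% throughput for 10% more power.
    print(f"SMT perf/watt ratio: {perf_per_watt_ratio(0.25, 0.10):.3f}")

    # Hypothetical equal-power comparison with an 80 W budget:
    # a 20 W SMT core delivering 1.25 units vs a 10 W simple core delivering 0.6.
    smt = throughput_at_power_budget(1.25, 20.0, 80.0)
    simple = throughput_at_power_budget(0.6, 10.0, 80.0)
    print(f"SMT cores: {smt:.2f} units, simple cores: {simple:.2f} units")
```

With these made-up numbers the SMT option edges ahead; nudge any of them slightly and the conclusion flips, which is why a single perf/watt ratio settles little without the equal-budget comparison Maclaren asks for.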
From: Nick Maclaren on 1 Apr 2008 10:02

In article <36549a55-a1cc-4850-97ee-40867cb900b0(a)s8g2000prg.googlegroups.com>,
David Kanter <dkanter(a)gmail.com> writes:
|>
|> > Sigh. Firstly, that is wrong. The number of watts fluctuates a bit
|> > but, even over the past decade, the trend is up.
|>
|> That's funny, because I'm pretty sure the TDP for Intel's chips went
|> up to 150W and now is down to 120W and 80W for mainstream. It may
|> rise in the future, as the northbridge is integrated, but it
|> definitely peaked around 2005/6.

The higher-wattage ones were SMT and the lower-power ones weren't, which rather argues against your point!

|> If you are going to call on outside authority, why don't you try to
|> make it clear who you are talking about, not just anonymous people.

I don't expose other people to the mistreatment I get on this and similar newsgroups. I refer to people by name only when they have published.

|> SMT increases performance/watt, which is all I really care about.

As I said, I have never seen any evidence for that claim, and I have seen evidence that it is the converse of the truth (Intel Netburst versus Core-2 being one example). I don't think that it makes any difference. Rather than just repeating your claim, why don't you provide evidence?

|> Wrong. Sun uses a scalar pipeline with SOEMT, which is the same as
|> SMT. See page 2 of this paper: http://www.cse.ucsd.edu/~rakumar/dasCMP05/paper01.pdf

I got flamed in 2001 for saying that SMT was an ill-defined term, and asking for a precise definition. The claim was that EVERYBODY knew what it meant except me! Well, as I posted recently, its meaning seems to be expanding, and:

I don't know what the CPU design of the year 2010 [ sorry, that was the date I meant ] will look like, but it will be called SMT :-)

SOEMT is also what mainframes used to do with paging; when there was a page miss, they switched thread.
You MIGHT be able to say that modern systems don't use SOEMT, because they do it in software, but many of the older ones did it in hardware. So, is THAT a subset of SMT?

|> Intel has both SMT and SOEMT designs. And I don't think there is
|> enough information in public on Intel's plans for me to really be
|> comfortable discussing them.

Actually, there is, but you may not have found it. I haven't kept all of the references.

I will attempt to clarify what I mean, without using protean terms like SMT and SOEMT. You can design a CPU so that each thread has a fixed set of resources available when it is running (call that Mode A) or it shares a global pool with others (call that Mode B).

Distributed memory clusters use Mode A for everything except the networking and network I/O. On shared-memory systems, at a very coarse grain (e.g. time-slicing, I/O etc.), everyone uses Mode B. Well, almost everyone - some specialist HPC systems don't. Most systems use Mode B for the grain of real memory accesses, but a lot of HPC tuning involves trying to force the system into using Mode A for even that. Most CPUs use Mode A for the actual instruction pipeline, though they often use Mode B for the floating-point units (e.g. Niagara or vector systems).

Eggers-style SMT involves using Mode B even for the core pipeline, and that is what I say is a step too far, because the problem ISN'T the shortage of real estate any longer.

Whether inactive units can be shut down or not is an ORTHOGONAL question - yes, it is related, and the two interact. I don't like it, because I know some of the software consequences, but it is very popular with designers :-( In a design with a lot of simple cores, you get all of the power gains that Mode B provides, and more, by simply shutting down whole cores - which doesn't bring in the same harmful consequences.
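The Mode A / Mode B distinction, applied to the instruction pipeline, can be illustrated with a toy model of issue slots. This is an editorial sketch, not from the thread: it assumes a hypothetical 4-wide core shared by two threads, with per-cycle demand numbers invented for illustration. Mode A gives each thread a fixed share of the slots; Mode B (the Eggers-style case) lets either thread use any free slot. It shows only the utilization a shared pool can recover, not its cost in power, complexity, or the software consequences Maclaren objects to.

```python
"""Toy model: fixed per-thread issue slots (Mode A) vs a shared pool (Mode B)."""

from typing import Sequence, Tuple

# One entry per cycle: how many ready instructions each thread has.
Demand = Sequence[Tuple[int, ...]]


def issued_mode_a(demand: Demand, width: int) -> int:
    """Each thread may use only its fixed share of the issue width."""
    total = 0
    for cycle in demand:
        share = width // len(cycle)  # fixed partition per thread
        total += sum(min(d, share) for d in cycle)
    return total


def issued_mode_b(demand: Demand, width: int) -> int:
    """All threads draw from one shared pool of issue slots."""
    return sum(min(sum(cycle), width) for cycle in demand)


if __name__ == "__main__":
    # Hypothetical demand for two threads over three cycles on a 4-wide core.
    demand = [(4, 0), (1, 3), (2, 2)]
    print("Mode A issued:", issued_mode_a(demand, 4))  # 2 + 3 + 4 = 9
    print("Mode B issued:", issued_mode_b(demand, 4))  # 4 + 4 + 4 = 12
```

The gap appears only in cycles where one thread's demand is lopsided; when demand is balanced the two modes issue the same number of instructions.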
What I am saying makes sense is to use Mode A for everything below the main memory access, Mode B for the memory access, and to switch threads on a cache miss. After all, as some of us have been saying for decades, the 1960s disk/memory relationship is comparable to the 2000s memory/cache one.

Regards,
Nick Maclaren.
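Switching threads on a cache miss is the same idea as switching processes on a page fault, one level down the hierarchy. A deliberately simple analytical model (an editorial assumption, not something given in the thread) shows why a handful of threads can hide a long miss latency: utilization grows linearly with thread count until the core is busy every cycle, after which only the switch cost limits it. The run length, miss latency and switch cost below are hypothetical parameters, and the model ignores cache interference between threads, which real designs suffer.

```python
"""Simple utilization model for switch-on-cache-miss multithreading."""


def utilization(threads: int, run: float, latency: float, switch: float) -> float:
    """Fraction of cycles doing useful work, under a deliberately simple model.

    Each thread runs `run` cycles, then misses; the miss takes `latency`
    cycles to resolve. Switching to another ready thread costs `switch`
    cycles of processor time, assumed to overlap the outstanding miss.
    - Latency-bound regime: each thread contributes `run` useful cycles
      per `run + latency` cycles, so utilization grows linearly.
    - Saturated regime: the processor is always running or switching,
      so utilization is capped at run / (run + switch).
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    linear = threads * run / (run + latency)
    saturated = run / (run + switch)
    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical: 100 cycles between misses, 300-cycle miss, 10-cycle switch.
    for n in range(1, 6):
        print(n, f"{utilization(n, 100, 300, 10):.3f}")
```

Under these assumptions utilization climbs from 0.25 with one thread to the 100/110 ceiling at four threads, and adding further threads buys nothing.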
From: Bill Todd on 1 Apr 2008 21:05

John Dallman wrote:
> In article
> <d3e90bcf-a062-49d0-86b5-1e8445212dfb(a)s19g2000prg.googlegroups.com>,
> dkanter(a)gmail.com (David Kanter) wrote:
>
>> Every processor IBM has designed since the POWER5 has used SMT, and
>> generally, IBM has a tendency to make reasonable choices.
>
> And they say, up-front, that if you're doing something CPU-limited, you
> should turn it off.

Well, duh. Of course, that's *not at all* the same as saying that if you're doing something that's *memory-bound* you should turn it off - which is far more likely in commercial environments (the kind that mostly pay IBM's freight). Their pride in SMT's effectiveness in the TPC-C benchmark is one example that I just noted elsewhere (there are several others I've run across).

- bill
From: Nick Maclaren on 2 Apr 2008 04:22

This is getting ridiculous, and it will probably be my last attempt to correct misrepresentations. I make no apologies for using capitals.

In article <P4mdneDzj49MUG_anZ2dnUVZ_o3inZ2d(a)metrocastcablevision.com>,
Bill Todd <billtodd(a)metrocast.net> writes:
|> Nick Maclaren wrote:
|> > In article <36549a55-a1cc-4850-97ee-40867cb900b0(a)s8g2000prg.googlegroups.com>,
|> > David Kanter <dkanter(a)gmail.com> writes:
|> > |>
|> > |> > Sigh. Firstly, that is wrong. The number of watts fluctuates a bit
|> > |> > but, even over the past decade, the trend is up.
|> > |>
|> > |> That's funny, because I'm pretty sure the TDP for Intel's chips went
|> > |> up to 150W and now is down to 120W and 80W for mainstream. It may
|> > |> rise in the future, as the northbridge is integrated, but it
|> > |> definitely peaked around 2005/6.
|> >
|> > The higher-wattage ones were SMT and the lower-power ones weren't,
|> > which rather argues against your point!
|>
|> Not at all, Nick: it simply reflects the fact that the newer,
|> lower-power processors used significantly more efficient designs (in
|> part due to their return to a better balance between clock-rate and ILP
|> goals - though as things turned out clock rates didn't suffer too much
|> after all). An apples-to-apples comparison of the kind that you're
|> suggesting would have to compare SMT with non-SMT products using much
|> more similar designs.

Oh, for heaven's sake! Yes, of course I know that. That is PRECISELY why I made no claim, in that posting or elsewhere, that that difference showed that SMT is more power-hungry than non-SMT. I don't believe it makes a damn of difference. I was SIMPLY pointing out that David Kanter's OWN example showed a power-hungry SMT being replaced by a less power-hungry non-SMT, which argues against his point.

|> What David's comment *does* address is your debatable generalization
|> about power trends, despite your attempt to change the subject. In
|> particular, Netburst was a power-hungry anomaly reflecting marketing
|> considerations: if you leave Netburst out of the picture your
|> generalization might have at least some merit in the PC space (Core
|> processors dissipating more power than Pentium III IIRC), but when
|> Netburst is included there's a definite peak (as David observed) - plus
|> some indication that even within just the Core/Core2 lines peak power
|> levels may be slowly declining now.

Again, for heaven's sake! I was pointing out that it addresses HIS debatable generalisation that SMT is a power-saving technique.

Firstly, READ what I said. I was talking about medium-term trends, and specifically mentioned that there were contrary fluctuations.

Secondly, don't get confused by the nominal, 'average' power rating of CPUs, because that has very largely been kept under control by 'power saving' techniques (i.e. stopping units and clock twiddling). Look at the power supply and cooling requirements.

HOWEVER, to repeat what I have said earlier, I am NOT saying that SMT is a power-hungry technique, and I am NOT saying that it provides no improvements in performance/watt for any measure of performance. What I am doing is stating the following three (true) things:

1) I have never seen any decent comparisons for throughput versus a well-designed, highly multi-core CPU, and believe that it would do a lot less well than its fanatics claim.
2) I have never seen any good evidence that it provides ANY real benefit in performance/watt - what evidence there is, is mixed, and can equally well be used to argue that it is WORSE.
3) What is being called SMT is changing, and its fanatics are including all sorts of traditional technologies under the name to justify their claims.
Just as the RISC fanatics did :-(

|> Then you haven't been paying attention - as in fact you admit elsewhere
|> with regard to recent POWER implementations, since the best evidence
|> that immediately springs to mind is that of POWER5, where SMT
|> purportedly provides up to 40% increased TPC-C throughput per core. The
|> EV8 simulations predicted even higher increases for their
|> implementation, but of course we'll never know. And even poor old
|> Itanic (Montecito) appears to gain something like a 20% benefit from SMT
|> in TPC-C (though if you mean to exclude SoeMT from your definition of
|> 'SMT' that may not be relevant to you).

On the contrary, you are ignoring what was well known about the POWER series. Yes, OF COURSE, the POWER5 did massively better than the POWER4 - the latter had a seriously cocked-up memory system that made it often slower than the POWER3 on memory-bound HPC workloads, and severely impacted its throughput on many workloads. That was fixed in the POWER5.

So why are you saying that the Netburst/NGMA difference shouldn't be used to counter the 'SMT is cool' claim, because we know that the cause wasn't that, but the POWER4/POWER5 one should, despite the fact that we know the same? That is inconsistency, at best.

I could carry on, but I give up :-(

Regards,
Nick Maclaren.
From: Nick Maclaren on 2 Apr 2008 05:46
In article <orqdnQPhUMmHz27anZ2dneKdnZydnZ2d(a)giganews.com>,
Terje Mathisen <terje.mathisen(a)hda.hydro.com> writes:
|> Jan Vorbrüggen wrote:
|>
|> There are two main reasons for Windows to stop responding, cpu
|> overcommit is one of them.
|>
|> The other is IDE/SATA disk queuing:
|>
|> Even with a lot of memory, i.e. 4GB on my current laptop (of which XP-32
|> can only use 3.4GB), Windows still insists on scheduling large write
|> jobs to flush memory to swap disk, and if I do something that requires
|> some substantial disk IO, like ISO burning, or starting/stopping a
|> VMware image, all other windows can stop responding.
|>
|> > I do realize, of course, that Microsoft, Mozilla et al. should be fixing
|> > their software first - but that is a forlorn hope, IMNSHO.
|>
|> Indeed.

Well, the reliable availability of a large number of cores is a key (not THE key) to fixing this! Replacing the priority and interrupt approach (with the necessity for effectively uninterruptible actions) by critical system activities having a dedicated core can help. But Microsoft aren't going to make that change until 95% (?) of their customers have such systems - and may not even then :-(

Regards,
Nick Maclaren.
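The idea of giving critical system activities a dedicated core, instead of relying on priorities and interrupts, can be roughly approximated at user level on systems that expose CPU affinity. The sketch below is an illustration under editorial assumptions, not how Windows or any other OS implements it: it uses Python's `os.sched_getaffinity` (available on Linux, not on Windows) and a hypothetical `partition_cpus` helper that reserves the highest-numbered CPU for critical work. It only prints the plan rather than changing any affinity.

```python
"""Sketch: reserve one CPU for critical system work, the rest for everything else."""

import os
from typing import Dict, Set


def partition_cpus(cpus: Set[int], reserved_count: int = 1) -> Dict[str, Set[int]]:
    """Split `cpus` into a reserved set and a general-purpose set.

    Hypothetical policy: reserve the highest-numbered CPUs. If there are
    too few CPUs to leave at least one for general work, reserve nothing.
    """
    if reserved_count < 0:
        raise ValueError("reserved_count must be non-negative")
    ordered = sorted(cpus)
    if len(ordered) <= reserved_count:
        return {"critical": set(), "general": set(ordered)}
    cut = len(ordered) - reserved_count
    return {"critical": set(ordered[cut:]), "general": set(ordered[:cut])}


if __name__ == "__main__":
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        available = os.sched_getaffinity(0)
    else:
        available = set(range(os.cpu_count() or 1))
    plan = partition_cpus(available)
    print("critical:", sorted(plan["critical"]), "general:", sorted(plan["general"]))
    # On Linux, ordinary work could then be confined with
    # os.sched_setaffinity(pid, plan["general"]); this sketch only prints the plan.
```

Affinity alone does not remove the need for uninterruptible sections inside the kernel; it only keeps ordinary work off the reserved core.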