From: Bill Todd on 2 Apr 2008 16:51

Nick Maclaren wrote:
> In article <P5SdnX8t4_MmKG7anZ2dnUVZ_rGhnZ2d(a)metrocastcablevision.com>,
> Bill Todd <billtodd(a)metrocast.net> writes:
> |> Jan Vorbrüggen wrote:
> |>
> |> For *any* given power drain, for most tasks in which I'm interested I'd
> |> rather have one fast, flexible SMT core than several dumb cores (even
> |> including highly parallel server-style processing, unless those dumb
> |> cores were SoEMT-enabled) - up to the point of seriously diminishing
> |> returns for effectively adding single-thread ILP to that single SMT
> |> core, at which point I'd prefer to start adding cores each of which was
> |> somewhere near the complexity level of the existing core (again, in
> |> preference to more but dumber cores).
> |>
> |> We have many decades of software experience in effectively multiplexing
> |> tasks onto a single core, and if Microsoft has still somehow managed to
> |> miss the boat sufficiently seriously in this area to be a real problem
> |> there are multiple credible alternatives readily available.
>
> Doubtless you would want that, doubtless most people do want that, you
> are doubtless right that it is the simplest and generally best strategy
> and you are doubtless correct that it served the IT community well for
> many decades. Now, why are you still completely wrong?

I'm not wrong, Nick: you're just an idiot. Please try harder to understand the discussion before responding incompetently yet again.

>
> You don't seem to have realised that "not_Moore's Law" which said that
> serial speeds increase at 50% per annum stopped working some time back.
Of course I realize it: what you apparently don't realize is that this is utterly irrelevant to the topic at hand - which is not whether multiple cores may be the easiest (though as you yourself admit above not the preferred) route toward continuing performance progress, but whether (in the face of that single-thread performance wall which I *already* described) it makes any sense to *abandon* the single-thread performance that we already *can* achieve and instead embrace more, slower cores (something which we obviously could have been doing for many years already if there had been any actual demand for it).

....

> Let us assume that Silverthorne delivers what it is claimed to do:
> 15-25% more throughput for the watt. Fine. It is a 2-way design,
> and SMT does not scale.

No one has been arguing that SMT is a complete substitute for multiple cores: the point has always been that SMT allows us to retain single-thread performance without seriously compromising throughput or efficiency - and in fact if one does not power down unused execution units (an idea which you claim you don't like anyway) then the fact that SMT can make fuller use of them than a dumb core can may *improve* operating efficiency. Only for further performance (primarily throughput) gains beyond the point that a single SMT core can attain does it make sense to start adding more cores - and even then it makes sense to add more (relatively) fast SMT cores rather than revert to simplistic ones.

In other words (though they've already been said before) SMT is an effective way to benefit parallel workloads by using the additional execution resources that are *already required* to attain the good single-thread ILP that you seem to agree above is preferable, thus attaining close to the best of both worlds rather than optimizing one (parallel throughput) at major expense to the other (single-thread performance).
If the world were in fact heavily dominated by parallel workloads no one would ever have given SMT a second thought and commodity multi-core processors would have taken over the industry early in this decade (as soon as it became feasible to fit them on a die). The fact that most vendors put off the move to multiple cores as long as possible and instead pursued heroic efforts to improve single-thread performance absolutely as long as they could strongly suggests that the world values single-thread performance in its processors, and that use of SMT to make more flexible use of the complex cores required to attain that single-thread performance when there's less ILP to exploit within that single thread is an eminently sensible optimization.

- bill
From: Nick Maclaren on 2 Apr 2008 16:27

In article <iNCdnU7DtO9Ofm7anZ2dnUVZ_hSdnZ2d(a)metrocastcablevision.com>,
Bill Todd <billtodd(a)metrocast.net> writes:
|>
|> In other words (though they've already been said before) SMT is an
|> effective way to benefit parallel workloads by using the additional
|> execution resources that are *already required* to attain the good
|> single-thread ILP that you seem to agree above is preferable, thus
|> attaining close to the best of both worlds rather than optimizing one
|> (parallel throughput) at major expense to the other (single-thread
|> performance).

Let us ignore the fact that you have not justified your assertion that the single-thread ILP is actually required - I agree that it is desirable, but that is not the same thing.

What you are ignoring is that it comes at a serious cost. The Tech Report article that David Kanter mentioned said that the extra cost was c. 18% - well, that's 18% you can't use for multiple cores. While that article didn't say what alternative it was comparing against, the usual one used for throughput comparisons by the SMT brigade is a single, fast core. No, I am not saying that multiple, simple cores would provide 18% more throughput - it might be less and it might be more.

There are also the serious consequences on system reliability (sic) and tuning. You may not be aware that NTP and some other fine-grain communication mechanisms use phase-locked loops. Do you have any idea what SMT does to those? Or are you proposing that the NTP daemon stops all other threads while it is running?

Lastly, I suggest that you stop descending to personal abuse as a form of argument.

Regards,
Nick Maclaren.
From: Paul A. Clayton on 2 Apr 2008 17:03

On Mar 29, 6:24 pm, j...(a)cix.co.uk (John Dallman) wrote:
> In article
> <52acdbf3-c3ff-4de0-ae56-ed0db2381...(a)59g2000hsb.googlegroups.com>,
> paaronclay...(a)earthlink.net (Paul A. Clayton) wrote:
[snip]
> > (BTW, properly rescheduled FP code could benefit from the effectively
> > shorter latencies of operations by reducing the number of active
> > registers needed [and avoid register spill/fill].)
>
> That's for the compilers to handle. This stuff is hard enough to write
> and maintain that we avoid trying to recode algorithms in
> processor-specific ways. It has to run at good speed on quite a few
> platforms and the requirements for consistency between platforms are
> extremely strict. Wrong answers are no use at all, no matter how fast
> you can get them.

Yes! Hand scheduling code would be excessively time consuming and prone to (both correctness and performance) errors. However, the compiler needs to be aware of the architecture. (I am not certain if it would only take a redefinition of the operational latencies or if some compiler deep internals would have to be changed. I suspect that no compiler currently optimizes for SMT use; the benefit was supposed to be transparent.)

Paul A. Clayton
merely a technophile
reachable as 'paaronclayton' at "embarqmail.com"
From: Paul A. Clayton on 2 Apr 2008 17:48

On Apr 2, 6:56 am, David Kanter <dkan...(a)gmail.com> wrote:
[snip]
> I agree with you that SoEMT is a fairly good idea. However, it's
> clear to me that SMT is a better one. As Jim Laudon pointed out in
> his paper, SMT and SoEMT are the same for single issue processors.

I assume you mean FineGrainedMT not SoEMT (a cycle is not really an event). I would disagree, however. In my view a scalar pipeline cannot have more than one thread executing Simultaneously, and so cannot be an SMT processor. (Admittedly, pipelining can be viewed as an ILP technique . . .)

> SMT is also very cheap if you have an OOO processor that you are
> starting with. It's only in the case of in-order superscalar cores
> that SoEMT could conceivably make any sense...yet IBM clearly has gone
> with SMT for ALL of their in-order superscalar cores.

Actually, it would seem that the benefit of SMT would be greater for an in-order core (i.e., an in-order core is more likely to have execution opportunities that cannot be exploited by a single thread; OoO exploits some of those execution opportunities). It is not clear that in-order SMT would be more difficult than OoO SMT.
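The disagreement over whether a single-issue pipeline can be "SMT" can be made concrete with a toy issue-slot model. This is a hypothetical illustration, not a model of any real processor: the function `issue_rate`, the repeating per-cycle ILP patterns and the rotating-priority thread selection are all assumptions. The "fgmt" policy lets only one ready thread issue in a given cycle (stalled threads are skipped), while "smt" may fill the issue slots from several threads in the same cycle.

```python
def issue_rate(thread_ilp, width, cycles, policy):
    """Average instructions issued per cycle in a toy multithreaded core.

    thread_ilp[t][c % len(...)] is how many independent instructions
    thread t could issue in cycle c (a hypothetical repeating pattern).
    policy "fgmt": each cycle, pick ONE ready thread (rotating priority)
                   and issue up to `width` instructions from it alone.
    policy "smt":  each cycle, fill up to `width` slots from any threads.
    """
    n = len(thread_ilp)
    total = 0
    for c in range(cycles):
        avail = [ilp[c % len(ilp)] for ilp in thread_ilp]
        order = [(c + k) % n for k in range(n)]  # rotate priority for fairness
        if policy == "fgmt":
            ready = [t for t in order if avail[t] > 0]
            if ready:  # only the first ready thread may issue this cycle
                total += min(width, avail[ready[0]])
        elif policy == "smt":
            slots = width
            for t in order:  # leftover slots go to the next thread
                take = min(slots, avail[t])
                total += take
                slots -= take
        else:
            raise ValueError(f"unknown policy: {policy}")
    return total / cycles


if __name__ == "__main__":
    pattern = [[1, 2, 0, 1], [2, 1, 1, 0]]  # hypothetical two-thread ILP
    for width in (1, 2):
        print(width,
              issue_rate(pattern, width, 4, "fgmt"),
              issue_rate(pattern, width, 4, "smt"))
    # prints:
    # 1 1.0 1.0
    # 2 1.0 1.5
```

With one issue slot the two policies issue exactly the same instructions, which is the sense in which they coincide for single-issue processors; a difference appears only once a cycle has more slots than one thread can fill, which is also why idle slots in a wide in-order core are a natural target for SMT.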
From: Nick Maclaren on 3 Apr 2008 03:21
In article <91142bbc-6d1d-4b9e-9d0e-e98a06ad7faa(a)i36g2000prf.googlegroups.com>,
David Kanter <dkanter(a)gmail.com> writes:
|>
|> Nick, do you even know what a PLL is? It's not architecturally
|> visible...

http://www.faqs.org/rfcs/rfc1305.html

I don't care whether you think that RFC is abusing the term "phase-locked loop" or not. The fact is that NTP is used for almost all serious time synchronisation on modern computers, it uses that technique, it's done in software, and SMT plays Old Hob with it. Indeed, as a moderate expert on the topic, I doubt that anyone could get it to work properly without temporarily disabling that feature by one means or another.

And may I suggest that you take your own advice? In particular, before posting further nonsense on this topic, find out my NTP-related record.

Regards,
Nick Maclaren.
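To make "it's done in software" concrete, here is a heavily simplified sketch of a software clock-discipline loop with a phase (proportional) term and a frequency (integral) term, in the spirit of the loop RFC 1305 describes. It is not the RFC 1305 algorithm. The gains `kp` and `ki`, the 16-second poll interval, the 50 ppm oscillator error and the Gaussian jitter model are all illustrative assumptions. The jitter term stands in for the kind of timing perturbation Nick attributes to SMT, such as a timestamp being delayed because a sibling thread is competing for the core.

```python
import random


def rms(xs):
    """Root-mean-square of a sequence of offsets."""
    return (sum(x * x for x in xs) / len(xs)) ** 0.5


def simulate_pll(drift_ppm, jitter_s, steps=200, interval_s=16.0,
                 kp=0.25, ki=0.01, seed=0):
    """Discipline a drifting software clock toward a perfect reference.

    A proportional term slews out part of each measured offset (phase),
    and an integral term learns the oscillator's frequency error.
    Returns the true clock offset (seconds) after each update.
    """
    rng = random.Random(seed)
    drift = drift_ppm * 1e-6   # uncorrected oscillator error (s per s)
    offset = 0.0               # true local-minus-reference offset (s)
    freq_corr = 0.0            # frequency correction learned so far (s per s)
    history = []
    for _ in range(steps):
        # Free-run for one poll interval with the residual frequency error.
        offset += (drift - freq_corr) * interval_s
        # The measurement sees the true offset plus timing jitter
        # (hypothetical model: zero-mean Gaussian noise).
        measured = offset + rng.gauss(0.0, jitter_s)
        # Phase correction: remove a fraction of the measured offset.
        offset -= kp * measured
        # Frequency correction: integrate the measured offset.
        freq_corr += ki * measured / interval_s
        history.append(offset)
    return history


if __name__ == "__main__":
    for jitter in (0.0, 1e-5, 1e-3):
        settled = simulate_pll(50.0, jitter)[150:]
        print(f"jitter {jitter:g} s -> settled RMS offset {rms(settled):.2e} s")
```

Because this loop is linear, its settled error grows in proportion to the injected measurement jitter; the sketch says nothing about how large SMT-induced jitter is on any real system.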