Processors stall on OLTP workloads about half the time--almost no matter what you do [Computer Architecture]

From: Robert Myers on 25 Apr 2010 14:51

On Apr 25, 1:55 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:
..
>
> Urgh!!!! I am having a conversation with amateurs who aren't even aware of the basics of power management. You are
> expressing a point of view that was discredited for laptop computers more than 10 years ago.
>
> What you say is true ONLY IF THERE IS NO ENERGY COST TO NOT SPECULATING.
>
> But leakage remains big. So, if by speculating you can get to the point where you can turn the whole processor off more
> quickly, and go into a no-leakage or lower leakage mode, and if the energy spent in the speculative work, both correct
> and incorrectly speculated, is less than the leakage that would have been spent waiting for the thing that would
> eliminate the need for the speculation - then SPECULATING CAN SAVE POWER.
>
> And this has been proven time and again. It is why low power processors often have branch predictors. It's why OOO is
> on ARM's and Samsung's roadmap.
>
> Sometimes we call this "hurry up and wait" versus "slow and steady".
>
> It's not black and white. You need to know the statistics, the probabilities, and the tradeoff moves back and forth.
>
> If you have a really low leakage mode that you can get into and out of quickly, it is better to stop speculating and
> wait for results.
>
> If you can switch to another thread, it may be better not to speculate and do so, so long as the hardware resources for
> the other thread don't cause too much leakage.
>
> If you get an improved predictor, you may want to speculate more.
>
> If you have confidence predictors...

Most of what you say seems to confirm what I claimed: it's hard to
know when speculating is worth the effort.

So far as I know, what was true about leakage ten years ago is no
longer true.

Robert.
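
[Editorial sketch] The condition Andy states above (speculating saves energy when the energy wasted on speculative work is less than the leakage avoided by shutting down sooner) can be written as a toy model. This is only an illustration, not anyone's actual power model: the function names and every number are hypothetical, and it assumes a fixed time window, a constant leakage power while the core is on, and a constant lower power in the sleep state.

```python
def total_energy(useful_j, wasted_j, on_time_s, window_s, leak_w, sleep_w):
    """Energy (joules) used over one fixed window.

    useful_j  : dynamic energy of the work that had to be done anyway
    wasted_j  : extra dynamic energy spent on speculation (0 if none)
    on_time_s : time the core stays powered before it can sleep
    window_s  : length of the whole window (assumed fixed)
    leak_w    : leakage power while the core is on
    sleep_w   : residual power in the low-leakage sleep state
    """
    off_time_s = window_s - on_time_s
    return useful_j + wasted_j + leak_w * on_time_s + sleep_w * off_time_s


def speculation_saves_energy(wasted_j, time_saved_s, leak_w, sleep_w):
    """True when the wasted speculative energy is smaller than the leakage avoided.

    The leakage avoided is the power difference between "on" and "asleep"
    multiplied by how much sooner speculation lets the core go to sleep.
    """
    return wasted_j < (leak_w - sleep_w) * time_saved_s


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the tradeoff moving.
    slow = total_energy(1.0, 0.0, on_time_s=0.8, window_s=1.0, leak_w=0.5, sleep_w=0.05)
    fast = total_energy(1.0, 0.1, on_time_s=0.5, window_s=1.0, leak_w=0.5, sleep_w=0.05)
    print("slow and steady:", slow, "J")
    print("hurry up and wait:", fast, "J")
    print("high leakage, speculate?", speculation_saves_energy(0.1, 0.3, 0.5, 0.05))
    print("low leakage, speculate?", speculation_saves_energy(0.1, 0.3, 0.1, 0.05))
```

With these made-up inputs, speculation wins when leakage is 0.5 W and loses when leakage drops to 0.1 W. That is the sense in which both posters can be right: the answer depends on how large leakage actually is and how cheap the sleep state is to enter.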

From: Robert Myers on 25 Apr 2010 15:05

On Apr 25, 1:55 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:

> Urgh!!!! I am having a conversation with amateurs who aren't even aware of the basics of power management.

It isn't like you to make it personal.

Robert.

From: Robert Myers on 25 Apr 2010 22:39

On Apr 25, 9:59 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:
> On 4/25/2010 12:05 PM, Robert Myers wrote:
>
> > On Apr 25, 1:55 pm, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> > wrote:
>
> >> Urgh!!!! I am having a conversation with amateurs who aren't even aware of the basics of power management.
>
> > It isn't like you to make it personal.
>
> My apologies. You can probably tell that I have painful memories of similar conversations from my time at AMD.

Accepted. I can see that you have been frustrated.

Robert.

From: Piotr Wyderski on 26 Apr 2010 02:46

Morten Reistad wrote:

> Yep, Linux is picking up old tricks. But it is getting pretty
> good at it; and has adapted a pretty long list of old tricks by
> now.

It depends: asynchronous IO support on Linux is still
a disaster, compared to Windows (the IOCP infrastructure),
Solaris (EC API) or even BSD (kqueue).

Best regards
Piotr Wyderski

From: George Neuner on 27 Apr 2010 12:14

On Mon, 26 Apr 2010 19:38:01 -0700, "Andy \"Krazy\" Glew"
<ag-news(a)patten-glew.net> wrote:

>On 4/25/2010 11:32 AM, nmm1(a)cam.ac.uk wrote:
>> In article<4BD47F60.9000709(a)patten-glew.net>,
>> Andy \"Krazy\" Glew<ag-news(a)patten-glew.net> wrote:
>>> On 4/24/2010 2:10 AM, nmm1(a)cam.ac.uk wrote:
>>>
>>>> The killer is the amount of code that involves a load or branch
>>>> based on the contents of something that needs loading. You can
>>>> do nothing except either wait for the first load, or follow all
>>>> possible paths. The latter is O(log N) in a good case or
>>>> O(log log N) in a bad one!
>>>
>>> Tjaden and Flynn showed that it is sqrt(N), way back (1968?).
>>>
>>> I.e. that the speedup you could get by following all possible paths N levels deep was sqrt(N).
>>>
>>> This was empirical. For whatever workloads they had way back then.
>>
>> I didn't know that, but that needs A^N logic. I was using the fixed
>> size logic formula.
>>
>> One 'scout' thread is a plausible design, two, perhaps. But it just
>> isn't going to scale.
>
>Actually, I have had good speedups in simulations with 16 threads. And I have limit studies that have many, many more
>speculative threads. The biggest problem is trying to choose what subset of the possible threads is worth running on
>hardware with limited threads.

Andy,

Are you talking about hardware or software speculation? And do you
have any statistics on what types of codes (OS, science, business,
etc.) benefited most from using speculation?

Are there any papers on your (or similar) work?

I'm particularly interested in any previous work on compiler driven
speculation. AFAICT the research along that line died with Algol.

>I think that the big problem with past work such as Haitham's DMT was that it was often constrained to use too few
>threads or processors. E.g. Haitham's Itanium work with two processors, as compared to his earlier DMT work with 4 threads.
>
>I'll do a separate followup post, with the (vain) attempt to change the topic. I don't think I have ever described my
>SpMT ideas to this newsgroup. Nothing proprietary, just the by now really old stuff that I worked on at Wisconsin.

Looking forward to it.

George
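
[Editorial sketch] The exchange quoted above contrasts a sqrt(N) speedup from following all paths N levels deep with the "A^N logic" needed to do so. The small table generator below puts the two growth rates side by side. It assumes the sqrt(N) figure exactly as quoted in the thread (an empirical result for old workloads) and a hypothetical branch fanout A = 2. The function names are invented for illustration, and none of this describes the multithreaded speculation schemes mentioned in the thread.

```python
import math


def eager_paths(levels, fanout=2):
    """Paths live at depth `levels` if every branch outcome is followed (A^N)."""
    return fanout ** levels


def eager_speedup(levels):
    """Speedup at depth `levels` under the sqrt(N) rule as quoted in the thread."""
    return math.sqrt(levels)


def growth_table(depths, fanout=2):
    """Rows of (levels, paths needed, speedup under the sqrt(N) assumption)."""
    return [(n, eager_paths(n, fanout), eager_speedup(n)) for n in depths]


if __name__ == "__main__":
    for levels, paths, speedup in growth_table([1, 2, 4, 8, 16]):
        print(f"levels={levels:2d}  paths={paths:6d}  speedup={speedup:.2f}")
```

Under these assumptions, going from 4 to 16 levels doubles the speedup while the number of paths grows from 16 to 65536, which is one way to read the "it just isn't going to scale" remark about following every path.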
