perf_events: improve Intel event scheduling [Kernel]

From: Stephane Eranian on 7 Jan 2010 05:00

Hi,

Ok, so I made some progress yesterday on all of this.

The key elements are:
- pmu->enable() is always called from generic with PMU disabled
- pmu->disable() is called with PMU possibly enabled
- hw_perf_group_sched_in() is always called with PMU disabled

I got the n_added logic working now on X86.

I noticed the difference in pmu->enabled() between Power and X86.
On PPC, you disable the whole PMU. On X86, that's not the case.

Now, I do the scheduling in hw_perf_enable(). Just like on PPC, I also
move events around if their register assignment has changed. It is not
quite working yet. I must have something wrong with the read and rewrite
code.

I will experiment with pmu->enable(). Given the key elements above, I think
Paul is right, all scheduling can be deferred until hw_perf_enable().
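
To make the deferred approach concrete, here is a rough user-space model of it, not the actual patch. The struct layout, the counter-mask representation, the backtracking assignment and the n_moved field are all assumptions made for illustration; only the function names and n_added come from this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_COUNTERS 4
#define MAX_EVENTS   8

/* Hypothetical per-CPU state; the layout is illustrative, not the kernel's. */
struct cpu_pmu {
    bool     enabled;            /* is the whole PMU currently counting? */
    int      n_events;           /* events collected so far */
    int      n_added;            /* events added since the last hw_perf_enable() */
    int      n_moved;            /* events whose counter changed at the last enable */
    unsigned mask[MAX_EVENTS];   /* bitmask of counters each event may use */
    int      assign[MAX_EVENTS]; /* counter assigned to each event, -1 if none */
};

/* Called with the PMU disabled: only record the event, do not schedule it. */
static int pmu_enable(struct cpu_pmu *c, unsigned counter_mask)
{
    assert(!c->enabled);
    if (c->n_events == MAX_EVENTS)
        return -1;
    c->mask[c->n_events] = counter_mask;
    c->assign[c->n_events] = -1;
    c->n_events++;
    c->n_added++;
    return 0;
}

/* Simple backtracking search for a counter assignment (assumed algorithm). */
static bool assign_from(const struct cpu_pmu *c, int i, unsigned used, int *out)
{
    if (i == c->n_events)
        return true;
    for (int r = 0; r < NUM_COUNTERS; r++) {
        unsigned bit = 1u << r;
        if ((c->mask[i] & bit) && !(used & bit)) {
            out[i] = r;
            if (assign_from(c, i + 1, used | bit, out))
                return true;
        }
    }
    return false;
}

static void hw_perf_disable(struct cpu_pmu *c)
{
    c->enabled = false;
}

/*
 * All scheduling happens here. The function is void, as in the thread,
 * so a failed assignment simply leaves the old assignment in place.
 */
static void hw_perf_enable(struct cpu_pmu *c)
{
    c->n_moved = 0;
    if (c->n_added) {
        int next[MAX_EVENTS];
        if (assign_from(c, 0, 0, next)) {
            for (int i = 0; i < c->n_events; i++) {
                /* An already-placed event that changes counter must be moved. */
                if (c->assign[i] != -1 && c->assign[i] != next[i])
                    c->n_moved++;
                c->assign[i] = next[i];
            }
        }
        c->n_added = 0;
    }
    c->enabled = true;
}
```

In this model an event that may use counters 0 or 1 first lands on counter 0, then gets moved to counter 1 when a second event restricted to counter 0 is added.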

But there is a catch. I noticed that hw_perf_enable() is void. In other words,
it means that if scheduling fails, you won't notice. This is not a problem on PPC
but will be on AMD64. That's because the scheduling depends on what goes on
on the other cores on the socket. In other words, things can change between
pmu->enable()/hw_perf_group_sched_in() and hw_perf_enable(). Unless we lock
something down in between.

On Thu, Jan 7, 2010 at 10:00 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Thu, 2010-01-07 at 15:13 +1100, Paul Mackerras wrote:
>>
>> > All the enable and disable calls can be called from NMI interrupt context
>> > and thus must be very careful with locks.
>>
>> I didn't think the pmu->enable() and pmu->disable() functions could be
>> called from NMI context.
>
> I don't think they're called from NMI context either, most certainly not
> from the generic code.
>
> The x86 calls the raw disable from nmi to throttle the counter, but all
> that (should) do is disable that counter, which is limited to a single
> msr write. After that it schedules a full disable by sending a self-ipi.
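
The two-step throttle described in the quoted text could be modeled roughly as below. This is a user-space sketch under assumptions: the struct, the wrmsr_model() helper, the pending flags and the handler name are hypothetical stand-ins, and the self-IPI is reduced to a flag. Only the enable bit position (bit 22 of an x86 event-select register) is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS    4
#define EVENTSEL_ENABLE (1ull << 22)  /* enable bit of an x86 event-select MSR */

/* Hypothetical per-CPU state standing in for the real hardware registers. */
struct cpu_pmu {
    uint64_t evtsel[NUM_COUNTERS];          /* modeled event-select MSRs */
    int      msr_writes;                    /* how many modeled MSR writes happened */
    bool     pending_disable[NUM_COUNTERS]; /* counters awaiting a full disable */
    bool     ipi_pending;                   /* stand-in for a queued self-IPI */
};

/* Modeled MSR write; the real thing would be a single wrmsr instruction. */
static void wrmsr_model(struct cpu_pmu *c, int idx, uint64_t val)
{
    c->evtsel[idx] = val;
    c->msr_writes++;
}

/*
 * NMI path: take no locks, clear the enable bit of the one counter with a
 * single MSR write, and defer everything else.
 */
static void nmi_throttle(struct cpu_pmu *c, int idx)
{
    wrmsr_model(c, idx, c->evtsel[idx] & ~EVENTSEL_ENABLE);
    c->pending_disable[idx] = true;
    c->ipi_pending = true;   /* stands in for sending a self-IPI */
}

/*
 * Self-IPI handler: runs outside NMI context, so the full disable
 * bookkeeping can be done here. Returns the number of counters handled.
 */
static int self_ipi_handler(struct cpu_pmu *c)
{
    int done = 0;

    if (!c->ipi_pending)
        return 0;
    c->ipi_pending = false;
    for (int idx = 0; idx < NUM_COUNTERS; idx++) {
        if (c->pending_disable[idx]) {
            c->pending_disable[idx] = false;
            done++;
        }
    }
    return done;
}
```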

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 7 Jan 2010 05:10

On Thu, 2010-01-07 at 10:54 +0100, Stephane Eranian wrote:
>
> Ok, so I made some progress yesterday on all of this.
>
> The key elements are:
> - pmu->enable() is always called from generic with PMU disabled
> - pmu->disable() is called with PMU possibly enabled
> - hw_perf_group_sched_in() is always called with PMU disabled
>
> I got the n_added logic working now on X86.
>
> I noticed the difference in pmu->enabled() between Power and X86.
> On PPC, you disable the whole PMU. On X86, that's not the case.
>
> Now, I do the scheduling in hw_perf_enable(). Just like on PPC, I also
> move events around if their register assignment has changed. It is not
> quite working yet. I must have something wrong with the read and rewrite
> code.
>
> I will experiment with pmu->enable(). Given the key elements above, I think
> Paul is right, all scheduling can be deferred until hw_perf_enable().
>
> But there is a catch. I noticed that hw_perf_enable() is void. In other words,
> it means that if scheduling fails, you won't notice. This is not a problem on PPC
> but will be on AMD64. That's because the scheduling depends on what goes on
> on the other cores on the socket. In other words, things can change between
> pmu->enable()/hw_perf_group_sched_in() and hw_perf_enable(). Unless we lock
> something down in between.

You have to lock stuff, you can't fail hw_perf_enable() because at that
point we've lost all track of what failed.
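
One possible reading of "lock stuff", sketched in user space: claim the socket-shared resource at pmu->enable() time, where a failure can still be reported against a specific event, so that hw_perf_enable() has nothing left that can fail. Everything here (the socket_res table, the slot and core numbering, reserve_shared() and release_shared()) is hypothetical and only meant to show the shape of the idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SHARED 2
#define NO_OWNER   (-1)

/* Hypothetical table of resources shared by all cores on one socket. */
struct socket_res {
    atomic_int owner[NUM_SHARED];   /* owning core, or NO_OWNER */
};

/* Hypothetical per-core state: which shared slot this core wants to use. */
struct core_pmu {
    int  core;
    int  slot;
    bool counting;
};

static void socket_init(struct socket_res *s)
{
    for (int i = 0; i < NUM_SHARED; i++)
        atomic_init(&s->owner[i], NO_OWNER);
}

/*
 * pmu->enable() time: try to claim the slot. A failure is returned here,
 * while the caller still knows which event it was trying to add.
 */
static int reserve_shared(struct socket_res *s, int slot, int core)
{
    int expected = NO_OWNER;

    if (atomic_compare_exchange_strong(&s->owner[slot], &expected, core))
        return 0;
    /* On failure, expected holds the current owner. */
    return expected == core ? 0 : -1;
}

/* Drop the claim, but only if this core actually holds it. */
static void release_shared(struct socket_res *s, int slot, int core)
{
    int expected = core;

    atomic_compare_exchange_strong(&s->owner[slot], &expected, NO_OWNER);
}

/*
 * hw_perf_enable() time: void and unable to fail, because the reservation
 * taken earlier cannot be stolen by another core in between.
 */
static void hw_perf_enable(struct socket_res *s, struct core_pmu *c)
{
    assert(atomic_load(&s->owner[c->slot]) == c->core);
    c->counting = true;
}
```

The cost of this design is that a slot stays reserved from pmu->enable() until it is released, even while the PMU is disabled.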

