From: Corey Ashford on
Hi Robert,

On 5/19/2010 2:20 PM, Robert Richter wrote:
> This patch introduces a method to specify the type of a raw sample.
> This can be used to setup hardware events other than generic
> performance counters by passing special config data to the pmu. The
> config data can be interpreted different from generic events and thus
> can be used for other purposes.
>
> The raw_type attribute is an extension of the ABI. It reuses the
> unused bp_type space for this. Generic performance counters can be
> setup by setting the raw_type attribute to null. Thus special raw
> events must have a type other than null.
>
> Raw types can be defined as needed for cpu models or architectures.
> To keep backward compatibility all architectures must return an error
> for an event with a raw_type other than null that is not supported.
>
> E.g., raw_type can be used to setup IBS on an AMD cpu. IBS is not
> common to pmu features from other vendors or architectures. The pmu
> must be setup with a special config value. Sample data is returned in
> a certain format back to the userland. An IBS event is created by
> setting a raw event and encoding the IBS type in raw_type. The pmu
> handles this raw event then and passes raw sample data back.
>
> Raw type could be architecure specific, e.g. for x86:
>
> enum perf_raw_type {
> PERF_RAW_PERFCTR = 0,
> PERF_RAW_IBS_FETCH = 1,
> PERF_RAW_IBS_OP = 2,
>
> PERF_RAW_MAX,
> };
>
> Null is the architecture's default, meaning for x86 a perfctr.
>
> Maybe the raw type definition could also be part of the ABI with one
> definition for all architectures.
>
> To use raw events with perf, the raw event syntax could be suffixed by
> the type (as for breakpoints):
>
> -e rNNN[:TYPE]
>
> Example:
>
> perf record -e r186A:1 # ... meaning IBS fetch, cycle count 100000
> perf record -e r0:1 -c 100000 # ... the same
>
> Or with named types:
>
> perf record -e r186A:IBS_FETCH ...
> perf record -e r0:IBS_FETCH -c 100000 ...

Should this raw value have been 186A0 instead of 186A?

Where is the named type translation coming from? Is this something that needs to be hard-coded into perf?

Have you looked at Lin Ming's patch series? I think it offers another way to support IBS and other arch-specific and off-chip PMUs in a more general way, though it's not quite fully-baked yet.

- Corey

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Corey Ashford <cjashfor(a)linux.vnet.ibm.com> wrote:

> [...]
>
> Have you looked at Lin Ming's patch series? I think it
> offers another way to support IBS and other
> arch-specific and off-chip PMUs in a more general way,
> though it's not quite fully-baked yet.

Robert, check out this thread on lkml:

[RFC][PATCH v2 06/11] perf: core, export pmus via sysfs

Here is Peter's post that describes the high level design:

http://groups.google.com/group/linux.kernel/msg/ab9aa075016c639e

And please help out Lin Ming with this work - we dont want
to add new, special-purpose ABI extensions for enumeration
purposes, it should be introduced using a new event_source
node via the sysfs enumeration.

Whether IBS fits that category is an open question -
ideally it should have similar high level usage as the
PEBS/LBR bits on Intel CPUs. (at least as far as tooling
goes)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 20.05.10 05:23:33, Peter Zijlstra wrote:
> On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote:
> > I still don't understand why you need all of this to encode IBS.
> > I still believe that with attr.config there is plenty of bits to choose
> > from. I do understand the need for PERF_SAMPLE_RAW. I think
> > there is no other way.
> >
> > You simply need to pick an encoding to mark the config as IBS. You
> > need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
> > Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
> > So you could use bits 62-63 for instance. You don't need to encode
> > the sampling period in attr.config for either IBS. You can use
> > attr.sample_period, so you free up 16 bits.
> >
> > I understand that IBS may evolve and thus may use more bits. But
> > you still have at least 16 bits of margin.

We have some bits in the msrs that are reserved now, for perfctr and
also for ibs. We could start reusing it to mark a special sample type
and so on. So far, ok. Somewhen in the future there is a hw extensions
and these bits are not reserved anymore, what now?

In case the register layout changes, you start workaround this
shifting and masking bits back and forth. This will end in a mess. We
don't know what will change and when, but *reserved* means it could be
subject to change. So, actually you have to assume arbitrary 64 bit
config values for perfctrs and ibs. Thus, you cannot use reserved bits
for encoding, doing this would be a hack.

The approach is also not extendable for new pmu features. The question
will rise again then how to encode it.

So encoding this information in some other attribute like raw_type or
any other field is a much cleaner and a more stable solution, easier
to program and to maintain. And there is enough space in the attribute
structure.

> > Users and tools would rely on an library to provide the event encoding.
> > No need to come up with some raw hex number on the cmdline.

We can pack everything in tools and libraries and hide direct access
to the user. Now, one reads the IBS spec and wants to start using
it. He will have to learn (by reading the source or documentation)
now, how to pass parameters using the tool or library to set certain
bits in the registers. This adds an additional i/f layer what needs to
be defined, documented, implemented and maintained. Why not keep
things simple and let the user pass a register value to the pmu? Only
the config has to be validated in the kernel and that's it. The hw
spec is then the reference for using the interface. We have already
raw config values for counters and there was a reason for it.

> For Instruction-Fetch:
>
> 0:15 sample-period (r/w)
> 57 randomized (r/w)

> For Instruction-Execution:
>
> 0:15 sample-period (r/w)

> 0x87 Instruction Fetch Stall -- Ins-Fetch
> 0xC0 Retired Instructions -- Ins-Exec

I agree, where possible we should reuse attributes or existing events
for configuration. But since the pmu behavior differs, there is the
risk of mixing different things together. "precise" would be different
on Intel and AMD. This will be hidden to the user leading to wrong
assumptions and results. And, what about new options that don't have
equivalent attribute flags?

But it's not the question how to pass the config, the question is how
to mark the event configuration as a certain IBS event.

> Furthermore, these counters will have to deal with sample-period > 2^16
> by 'ignoring' interrupts until we get ->period_left down to 0.

No question here, ibs counters should be extended to 64 bit counters,
but this is not addressed with this patch set, I have kept this for
later work.

> The extra data could possibly be exposed through attaching non-sampling
> group events and using SAMPLE_READ, like L1-misses, although
> reconstructing the count from just one bit seems 'interesting'.
>
> The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
> replacing pt_regs->ip I guess.
>
> IbsDcLinAd goes into PERF_SAMPLE_ADDR

It's the same for sampling data as with config data. We can spread it
all over its data structures, but a user will then recollect all of it
again needing to know where the information can be found. And, it will
never be the same kind of information if the pmus are different.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter(a)amd.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 20.05.10 06:37:39, Peter Zijlstra wrote:

> So yeah, you might as well expose it as a whole separate PMU using Lin's
> stuff.

This seems to be the most promising alternative to raw_type. But it's
also much more bloated requiring sysfs and additional file
descriptors.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter(a)amd.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 20.05.10 08:13:43, Stephane Eranian wrote:
> What's wrong with creating pseudo-events for IBS? We'd have to pick
> two unused event codes. That would have the advantage of making it
> explicit you're using IBS. I think we could still use the precise_ip field
> if people are only interested in the IP. They would use PERF_SAMPLE_RAW
> if they need more.

For some key events this is fine and useful. But, if this has to be
used to programm all IBS features, it will introduce too much
implementation and maintenance effort. What you get then will probably
of less interest since users/tools want to have the raw sample data
for further processing. But maybe we agree already in this point, if
there is a demand we should provide pseudo events for important or
offen used events.

> >> > The Ins-Exec will have to re-construct the actual event->count by adding
> >> > sample-period on each interrupt, as it seems we lack an actual counter
> >> > in hardware.
> >> >
> >> For what? counting mode?
> >
> > Yeah, events are supposed to count.
> >
> IBS is a sampling only feature. I suspect it would be okay to return 0 here
> or do as you said, count the number of IBS interrupts and multiply by the
> sampling period.

True, ibs can be used as an additional counter using the interrupt
only and dropping sampling data. I was already thinking of adding it
somehow to the list of available counters. But this is also later work
as its 64 bit extension.

> > Dunno, reconstruct sensible counters? Surely the software that uses IBS
> > does something useful with that data? What does libpfm do with the IBS
> > data?

Here are some use cases described:

http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf
http://developer.amd.com/cpu/CodeAnalyst/assets/ISPASS2010_IBS_CA_abstract.pdf

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter(a)amd.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/