From: Lin Ming on
On Sat, 2010-07-17 at 08:20 +0800, Corey Ashford wrote:
> On 07/02/2010 01:06 AM, Lin Ming wrote:
> > On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
> >> * Lin Ming<ming.m.lin(a)intel.com> wrote:
> >>
> >>>> Also, we can (optionally) consider 'generic', subsystem level events to
> >>>> also show up under:
> >>>>
> >>>> /sys/bus/pci/drivers/i915/events/
> >>>>
> >>>> This would give a model to non-device-specific events to be listed one
> >>>> level higher in the sysfs hierarchy.
> >>>>
> >>>> This too would be done in the driver, not by generic code. It's generally
> >>>> the driver which knows how the events should be categorized.
> >>>
> >>> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
> >>
> >> We can certainly start with the simpler variant - it's also the more common
> >> case.
> >>
> >>> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
> >>> not touched?
> >>
> >> Well, it's per driver code which creates the 'events' directory anyway, so
> >> that code decides where to link things. It can link it to the per driver kobj
> >> - or to the per subsys kobj.
> >>
> >>>> I'd imagine something similar for wireless drivers as well - most
> >>>> currently defined events would show up on a per device basis there.
> >>>>
> >>>> Can you see practical problems with this scheme?
> >>>
> >>> Not now. I may find some problems when write more detail code.
> >>
> >> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
> >> so that we can see how things are progressing.
> >>
> >> I suspect the best approach would be to try to figure out the right sysfs
> >> placement for one or two existing driver tracepoints, so that we can see it
> >> all in practice. (Obviously any changes to drivers will have to go via the
> >> relevant driver maintainer tree(s).)
> >
> > Well, take i915 tracepoints as an example, the sys structures as below
> >
> > /sys/class/drm/card0/events/
> > |-- i915_gem_object_bind
> > | |-- enable
> > | |-- filter
> > | |-- format
> > | `-- id
> ...
>
> Hi Lin,
>
> Sorry for my late reply on this thread. I had missed these posts
> earlier because I had an email filter that was set to look for messages
> with "perf" in the subject, and so I missed this entire thread.

Sorry for my late reply too.
I have been busy with some other stuff. I hope to send more functional
patches this week.

>
> With your example here, let's say I want to open this event with the
> perf_events ABI... how would I go about doing that? Have you figured
> out whether the caller would read the id and pass that into the
> interface, or perhaps pass in the fd of the id file (or perhaps the fd
> of the specific event directory).

Please ignore my example above. I now have some new, still incomplete
patches that export hardware/software/tracepoint events via sysfs, as
shown below.

The event path is passed in with perf's "-e" option, for example
perf record -e /sys/kernel/events/page-faults -- <some commands>

The caller reads config and type and passes them into perf_event_attr.

1. Hardware events
/sys/devices/system/cpu/cpu0...cpuN/events
|-- L1-dcache-load-misses ===> event name
| |-- config ===> config value for the event
| `-- type ===> event type
|-- cycles
| |-- config
| `-- type
......

2. Software events
/sys/kernel/events
|-- page-faults
| |-- config
| `-- type
|-- context-switches
| |-- config
| `-- type
.....

3. Tracepoint events
/sys/devices/pci0000:00/0000:00:02.0/events
|-- i915_gem_object_create
| |-- config
| `-- type
|-- i915_gem_object_bind
| |-- config
| `-- type
.....
.....
/sys/devices/system/kvm/kvm0/events
|-- kvm_entry
| |-- config
| `-- type
|-- kvm_hypercall
| |-- config
| `-- type
.....
.....
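
As a rough sketch of the caller side described above (read type and config
from an event directory, then hand them to perf_event_attr), something like
the following could work. The struct sysfs_event and the helpers
parse_u64_attr, read_u64_attr and load_sysfs_event are names made up for
illustration, and the assumption that each file holds a single C-style
integer (decimal or 0x-prefixed hex) is not something the patches specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative container, not a kernel type. */
struct sysfs_event {
    uint32_t type;
    uint64_t config;
};

/* Parse one unsigned integer. Base 0 accepts decimal and 0x-prefixed hex;
 * the file format is an assumption, the thread does not define it. */
int parse_u64_attr(const char *text, uint64_t *out)
{
    char *end;
    unsigned long long v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtoull(text, &end, 0);
    if (end == text || errno != 0)
        return -1;
    *out = (uint64_t)v;
    return 0;
}

/* Read <dir>/<name> and parse its first line as an integer. */
int read_u64_attr(const char *dir, const char *name, uint64_t *out)
{
    char path[4096], buf[64];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_u64_attr(buf, out);
}

/* Fill 'ev' from the "type" and "config" files of one event directory. */
int load_sysfs_event(const char *event_dir, struct sysfs_event *ev)
{
    uint64_t type, config;

    if (read_u64_attr(event_dir, "type", &type) != 0 ||
        read_u64_attr(event_dir, "config", &config) != 0)
        return -1;
    ev->type = (uint32_t)type;
    ev->config = config;
    return 0;
}
```

A caller would then copy ev.type and ev.config into perf_event_attr.type
and perf_event_attr.config before opening the event.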

>
> Also, I see the filter and format fields here. Would the caller write
> to these fields to set them up? What's the format of the data that's
> written to them? Would it be totally device dependent? It seems like
> there should be a way for a user space tool to discover what can be
> programmed into the filter and format fields.

For now, only the read-only event attributes (config and type) are
exported. I want to get a minimal functional patch set working first and
then implement the more complex writable attributes.

Lin Ming


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 20.07.10 01:48:28, Lin Ming wrote:
> The caller reads config and type and passes them into perf_event_attr.
>
> 1. Hardware events
> /sys/devices/system/cpu/cpu0...cpuN/events
> |-- L1-dcache-load-misses ===> event name
> | |-- config ===> config value for the event
> | `-- type ===> event type

Wouldn't it be much easier to have a unique sysfs id (could be an
u64):

> |-- L1-dcache-load-misses ===> event name
> | `-- id ===> event id

... and then extend the syscall to enable an event by its sysfs id:

memset(&attr, 0, sizeof(attr));
attr.type = PERF_TYPE_SYSFS;
attr.sysfs_id = sysfs_id;
attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
attr.config = config;
...

The kernel then knows which event is meant, and we don't have to
provide event-specific parameters such as type/config that require an
event-specific setup. The advantage would be that we can open an event
file descriptor for every kind of event in a standardized way.
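
To make the idea concrete, here is a minimal user-space model of what the
kernel side might do with such an id: look it up in a registry and recover
the internal type/config pair that the caller no longer has to supply.
Everything in it is hypothetical. PERF_TYPE_SYSFS and a sysfs_id field do
not exist in the perf_event_attr ABI, the struct and function names are
invented, and the id values 1 and 2 are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry entry: one per event directory in sysfs. */
struct sysfs_event_entry {
    uint64_t sysfs_id;  /* value that would be exported in .../<event>/id */
    const char *name;   /* event directory name */
    uint32_t type;      /* internal event type, hidden from the caller */
    uint64_t config;    /* internal base config, hidden from the caller */
};

/* Arbitrary ids; type/config mirror PERF_TYPE_HARDWARE (0) with
 * PERF_COUNT_HW_CPU_CYCLES (0) and PERF_TYPE_SOFTWARE (1) with
 * PERF_COUNT_SW_PAGE_FAULTS (2). */
static const struct sysfs_event_entry registry[] = {
    { 1, "cycles",      0, 0 },
    { 2, "page-faults", 1, 2 },
};

/* Resolve an id to its registry entry; returns 0 on success, -1 if the
 * id is unknown. */
int resolve_sysfs_event(uint64_t sysfs_id,
                        const struct sysfs_event_entry **out)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
        if (registry[i].sysfs_id == sysfs_id) {
            *out = &registry[i];
            return 0;
        }
    }
    return -1;
}
```

The point of the sketch is only that a single id is enough to select the
event; any per-event customization would still have to travel separately.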

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

From: Corey Ashford on
On 07/19/2010 10:48 PM, Lin Ming wrote:
> On Sat, 2010-07-17 at 08:20 +0800, Corey Ashford wrote:
>> On 07/02/2010 01:06 AM, Lin Ming wrote:
>>> On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
>>>> * Lin Ming<ming.m.lin(a)intel.com> wrote:
>>>>
>>>>>> Also, we can (optionally) consider 'generic', subsystem level events to
>>>>>> also show up under:
>>>>>>
>>>>>> /sys/bus/pci/drivers/i915/events/
>>>>>>
>>>>>> This would give a model to non-device-specific events to be listed one
>>>>>> level higher in the sysfs hierarchy.
>>>>>>
>>>>>> This too would be done in the driver, not by generic code. It's generally
>>>>>> the driver which knows how the events should be categorized.
>>>>>
>>>>> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
>>>>
>>>> We can certainly start with the simpler variant - it's also the more common
>>>> case.
>>>>
>>>>> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
>>>>> not touched?
>>>>
>>>> Well, it's per driver code which creates the 'events' directory anyway, so
>>>> that code decides where to link things. It can link it to the per driver kobj
>>>> - or to the per subsys kobj.
>>>>
>>>>>> I'd imagine something similar for wireless drivers as well - most
>>>>>> currently defined events would show up on a per device basis there.
>>>>>>
>>>>>> Can you see practical problems with this scheme?
>>>>>
>>>>> Not now. I may find some problems when write more detail code.
>>>>
>>>> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
>>>> so that we can see how things are progressing.
>>>>
>>>> I suspect the best approach would be to try to figure out the right sysfs
>>>> placement for one or two existing driver tracepoints, so that we can see it
>>>> all in practice. (Obviously any changes to drivers will have to go via the
>>>> relevant driver maintainer tree(s).)
>>>
>>> Well, take i915 tracepoints as an example, the sys structures as below
>>>
>>> /sys/class/drm/card0/events/
>>> |-- i915_gem_object_bind
>>> | |-- enable
>>> | |-- filter
>>> | |-- format
>>> | `-- id
>> ...
>>
>> Hi Lin,
>>
>> Sorry for my late reply on this thread. I had missed these posts
>> earlier because I had an email filter that was set to look for messages
>> with "perf" in the subject, and so I missed this entire thread.
>
> Sorry for my late reply too.
> I have been busy with some other stuff. Hope I can send a more
> functional patches this week.
>
>>
>> With your example here, let's say I want to open this event with the
>> perf_events ABI... how would I go about doing that? Have you figured
>> out whether the caller would read the id and pass that into the
>> interface, or perhaps pass in the fd of the id file (or perhaps the fd
>> of the specific event directory).
>
> Please just ignore my above example. Now I have some uncompleted new
> patches to export hardware/software/tracepoint events via sysfs, like
> below.
>
> The event path is passed in with perf's "-e" option, for example
> perf record -e /sys/kernel/events/page-faults -- <some commands>
>
> The caller reads config and type and passes them into perf_event_attr.
>
> 1. Hardware events
> /sys/devices/system/cpu/cpu0...cpuN/events
> |-- L1-dcache-load-misses ===> event name
> | |-- config ===> config value for the event
> | `-- type ===> event type
> |-- cycles
> | |-- config
> | `-- type
> .....
>
> 2. Software events
> /sys/kernel/events
> |-- page-faults
> | |-- config
> | `-- type
> |-- context-switches
> | |-- config
> | `-- type
> ....
>
> 3. Tracepoint events
> /sys/devices/pci0000:00/0000:00:02.0/events
> |-- i915_gem_object_create
> | |-- config
> | `-- type
> |-- i915_gem_object_bind
> | |-- config
> | `-- type
> ....
> ....
> /sys/devices/system/kvm/kvm0/events
> |-- kvm_entry
> | |-- config
> | `-- type
> |-- kvm_hypercall
> | |-- config
> | `-- type
> ....
> ....
>
>>
>> Also, I see the filter and format fields here. Would the caller write
>> to these fields to set them up? What's the format of the data that's
>> written to them? Would it be totally device dependent? It seems like
>> there should be a way for a user space tool to discover what can be
>> programmed into the filter and format fields.
>
> Now only read-only event attributes(config and type) are exported.
> I want to first make some minimal functional patches. Then to implement
> the complex writable attributes.

I'm not seeing the value of writable attributes in sysfs at this point.
Wouldn't that disconnect the event opening between the syscall and the
writing of attributes in user space, with no real way to tie them
together? For example, what if two users wrote to the same attribute
with different values... which one would take precedence when you go to
do the open syscall? I think all of the attribute data should be in the
open call, and sysfs should be read-only.

Earlier, I briefly presented an idea that would allow a caller to read
attribute formatting information, such as a shift and mask value, which
would allow the caller to build up a more complex .config value,
possibly extending into a new attr field - .config_extra[n] as dictated
by the shift value; shift values greater than 63 would place the
attribute into .config_extra[shift amount / 64] shifted by shift amount
% 64. It's not the prettiest interface, but I think it could work and
would be extensible.
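
A minimal sketch of that packing rule, under stated assumptions: the
struct attr_format and the function pack_attr are invented names, the
"shift" is taken to be an absolute bit offset and the "mask" an unshifted
field mask, and words[0] stands in for .config while words[n] stands in
for .config_extra[n], following the "shift amount / 64" indexing literally
(so index 0 of config_extra would go unused). Fields that would straddle a
64-bit boundary are rejected, which is a choice made here, not something
the idea above settles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placement of one attribute, as it might be read from sysfs. */
struct attr_format {
    unsigned int shift; /* absolute bit offset of the field */
    uint64_t mask;      /* unshifted mask of valid value bits */
};

/* Store 'value' into the word selected by shift / 64, at bit shift % 64.
 * Returns 0 on success, -1 if the value or placement is invalid. */
int pack_attr(uint64_t *words, size_t nwords,
              struct attr_format fmt, uint64_t value)
{
    size_t idx = fmt.shift / 64;
    unsigned int bit = fmt.shift % 64;

    assert(words != NULL);
    if (idx >= nwords)
        return -1;                  /* no such config word */
    if ((value & ~fmt.mask) != 0)
        return -1;                  /* value does not fit the field */
    if (bit != 0 && (fmt.mask >> (64 - bit)) != 0)
        return -1;                  /* field would cross a word boundary */

    words[idx] &= ~(fmt.mask << bit);   /* clear the old field contents */
    words[idx] |= value << bit;         /* insert the new value */
    return 0;
}
```

With two words, an 8-bit field at shift 8 lands in words[0], and a 4-bit
field at shift 68 lands in words[1] at bit 4.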

- Corey
From: Corey Ashford on
On 07/20/2010 08:19 AM, Robert Richter wrote:
> On 20.07.10 01:48:28, Lin Ming wrote:
>> The caller reads config and type and passes them into perf_event_attr.
>>
>> 1. Hardware events
>> /sys/devices/system/cpu/cpu0...cpuN/events
>> |-- L1-dcache-load-misses ===> event name
>> | |-- config ===> config value for the event
>> | `-- type ===> event type
>
> Wouldn't it be much easier to have a unique sysfs id (could be an
> u64):
>
>> |-- L1-dcache-load-misses ===> event name
>> | `-- id ===> event id
>
> ... and then extend the syscall to enable an event by its sysfs id:
>
> memset(&attr, 0, sizeof(attr));
> attr.type = PERF_TYPE_SYSFS;
> attr.sysfs_id = sysfs_id;
> attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
> attr.config = config;
> ...
>
> The kernel then knows which event is meant, and we don't have to
> provide event specific parameters such as type/config that require an
> event specific setup. The advantage would be that we can open an event
> file descriptor of every kind of event in a standardized way.

Your example above still shows the .config member being set. Was that
intentional?

Maybe another way to accomplish this would be to reuse the .config field
for the sysfs_id.

We still need a way to deal with event attributes though, so something
more than a single sysfs_id would be needed to specify the event completely.

- Corey
From: Robert Richter on
On 20.07.10 13:50:01, Corey Ashford wrote:

> > ... and then extend the syscall to enable an event by its sysfs id:
> >
> > memset(&attr, 0, sizeof(attr));
> > attr.type = PERF_TYPE_SYSFS;
> > attr.sysfs_id = sysfs_id;
> > attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
> > attr.config = config;
> > ...

> Your example above still shows the .config member being set. Was that
> intentional?
>
> Maybe another way to accomplish this would be to reuse the .config field
> for the sysfs_id.

That was intentional: config can be used to configure the event;
otherwise there is no way to set up the event with certain
parameters. The config value is then event-specific, and we can be
sure the parameter belongs to _this_ kind of event.

> We still need a way to deal with event attributes though, so something
> more than a single sysfs_id would be needed to specify the event completely.

It is true that you still need to know what the event is measuring
and how it is set up or configured. The configuration could perhaps be
left blank if the event can be set up without it. But with this
approach you can get file descriptors for every event a user may be
interested in simply by looking into sysfs.

For example, I was thinking of perfctr events vs. IBS events. The CPU
could set up something like:

/sys/devices/system/cpu/cpu0...cpuN/events/perfctr/id
/sys/devices/system/cpu/cpu0...cpuN/events/ibs_op/id

Both events are set up with one 64-bit config value that is basically
the event's configuration MSR (x86 perfctr or AMD IBS). These are
defined in the hardware specifications, and their formats differ. You
could then open the event file descriptor using the sysfs id and use
the config value to customize the event. You don't need a complicated
setup or implementation to detect which kind of event you want to use,
as the id indicates the type of event.
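
For the perfctr case, a small sketch of what "config mirrors the MSR"
could look like. It covers only the low 16 bits of the x86 event-select
register (event select in bits 7:0, unit mask in bits 15:8); the remaining
control bits and the IBS layout are left out, and the function name
perfctr_config is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build the low 16 bits of an x86 perfctr event-select value:
 * bits 7:0 hold the event select, bits 15:8 hold the unit mask. */
uint64_t perfctr_config(unsigned int event_select, unsigned int unit_mask)
{
    assert(event_select <= 0xff && unit_mask <= 0xff);
    return (uint64_t)event_select | ((uint64_t)unit_mask << 8);
}
```

For instance, event select 0x2e with unit mask 0x41 yields 0x412e, which
the caller would place in the config value when opening the perfctr event.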

Actually, we could also set up e.g. trace events with this mechanism.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
