From: Venkatesh Pallipadi
Currently, softirq and hardirq time is reported only at the CPU level.
There are use cases where reporting this time against tasks, task
groups, or cgroups would be useful to users/administrators for resource
planning and utilization charging. Also, as the accounting is already
done at the CPU level, reporting the same at the task level adds no
significant computational overhead other than the task-level storage
(patch 1).
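
To make patch 1 concrete, here is a minimal sketch of the idea, not the
actual patch: the field names (si_time, hi_time) are made up for
illustration, and the real tick path additionally excludes the timer
interrupt itself via a hardirq offset, omitted here for brevity.

#include <linux/hardirq.h>
#include <linux/sched.h>

/*
 * When a tick is accounted as system time, also charge it to the
 * current task's own softirq/hardirq counters, mirroring the existing
 * per-CPU cpustat irq/softirq buckets.
 */
static void account_task_irq_tick(struct task_struct *p, cputime_t cputime)
{
	if (hardirq_count())
		p->hi_time += cputime;	/* tick landed in hardirq context */
	else if (softirq_count())
		p->si_time += cputime;	/* tick landed in softirq context */
}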

The softirq/hardirq statistics are commonly based on tick sampling,
though some archs have fine-granularity accounting based on
CONFIG_VIRT_CPU_ACCOUNTING. Having a similar mechanism for
fine-granularity accounting on x86 would be a major challenge, given
the state of TSC reliability on various platforms and also the overhead
it may add to common paths like syscall entry/exit.

An alternative is to have a generic (sched_clock based) and configurable
fine-granularity accounting of softirq (si) and hardirq (hi) time, which
can be reported over the /proc/<pid>/stat API (patch 2). A sketch of the
idea follows.
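
Roughly, the sched_clock based accounting could look like the sketch
below; names are illustrative, not the actual patch, and the hook is
assumed to be called on every entry to and exit from hardirq/softirq
context:

#include <linux/hardirq.h>
#include <linux/percpu.h>
#include <linux/sched.h>

static DEFINE_PER_CPU(u64, irq_start_time);
static DEFINE_PER_CPU(u64, cpu_hardirq_time);
static DEFINE_PER_CPU(u64, cpu_softirq_time);

/*
 * Charge the interval since the last context transition to whichever
 * context (hardirq/softirq/neither) the CPU was in while it ran.  The
 * same deltas can be accumulated into the current task to back the
 * /proc/<pid>/stat reporting.
 */
void account_irq_time(struct task_struct *curr)
{
	unsigned long flags;
	u64 now, delta;

	local_irq_save(flags);

	now = sched_clock();
	delta = now - __this_cpu_read(irq_start_time);
	__this_cpu_write(irq_start_time, now);

	if (hardirq_count())
		__this_cpu_add(cpu_hardirq_time, delta);
	else if (softirq_count())
		__this_cpu_add(cpu_softirq_time, delta);

	local_irq_restore(flags);
}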

Patches 3 and 4 export this information at the cgroup level.

Does exposing this additional info to the user make sense? Any feedback
on the way it is done in this patchset?

This precise irq time based on sched_clock() opens up opportunities to
handle softirq time charging in a fairer way, specifically in cases
where an unrelated task is penalized for irq load on its CPU:
* With network Receive Flow Steering, for example, we can potentially
avoid charging receive softirq time to the process that happens to be
running and charge it instead to the actual consumer of the receive
(in recvmsg, for example).
* We can reduce the power of the CPU to account for its softirq/hardirq
load, in order to increase scheduler fairness for tasks running on that
CPU; see the sketch after this list.
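
The second idea could build directly on the per-CPU irq time above.  A
minimal sketch, assuming a short sampling window so the product below
fits in 64 bits; function and variable names are illustrative only:

#include <linux/math64.h>
#include <linux/sched.h>

/*
 * Scale a CPU's nominal scheduling power by the fraction of the recent
 * sampling window that was actually available to tasks, i.e. not spent
 * in hardirq/softirq.  E.g. a nominal power of 1024 with 10% irq load
 * over the window yields ~921.
 */
static unsigned long scale_irq_power(unsigned long power,
				     u64 irq_time, u64 total_time)
{
	u64 avail, scaled;

	if (!total_time || irq_time >= total_time)
		return 1;	/* degenerate: CPU monopolized by irqs */

	avail = total_time - irq_time;
	scaled = div64_u64((u64)power * avail, total_time);

	return scaled ? (unsigned long)scaled : 1;
}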

Comments?

Thanks,
Venki
