From: Andi Kleen on
> From what I can see, most people would want RCU_FAST_NO_HZ=n. Only

Most people do not recompile their kernel. And even those
that do most likely will not have enough information to make
an informed choice at build time.

> people with extreme power-consumption concerns would likely care enough
> to select this.

What would a distributor shipping binary kernels use?

> > But I think in this case scalability is not the key thing to check
> > for, but expected idle latency. Even on a large system, if nearly all
> > CPUs are idle, spending some time to keep them idle even longer is a
> > good thing. But only if the CPUs actually benefit from long idle.
>
> The larger the number of CPUs, the lower the probability of all of them
> going idle, so the less difference this patch makes. Perhaps some

My shiny new 8-CPU-thread desktop is no less likely to go idle when I do
nothing on it than an older dual-core, 2-thread desktop.

Especially not given all the recent optimizations (no idle tick)
in this area etc.

And core/thread counts are growing. In terms of CPU numbers today's
large machine is tomorrow's small machine.

> I do need to query from interrupt context, but could potentially have a
> notifier set up state for me. Still, the real question is "how important
> is a small reduction in power consumption?"

I think any (measurable) power saving is important. Also on modern Intel
CPUs power saving often directly translates into performance:
if more cores are idle the others can clock faster.

> I took a quick look at the pm_qos_latency, and, as you note, it doesn't
> really seem to be designed to handle this situation.

It could be extended for it. It's just software after all,
we can change it.
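
[For illustration only: a rough sketch of what "extending it" could build
on. The pm_qos notifier interface that already exists in kernels of this
era lets a subsystem cache the current latency requirement in a variable
that is cheap to read from interrupt context, which is what the quoted
question above is asking for. The names rcu_qos_nb, rcu_qos_notify,
rcu_latency_allows_flush, and RCU_FLUSH_LATENCY_USEC are invented for
this sketch and are not part of any posted patch.]

/*
 * Sketch only: cache the aggregate PM_QOS_CPU_DMA_LATENCY requirement
 * whenever it changes, so rcu_needs_cpu() could consult it from
 * interrupt context without taking locks or sleeping.
 */
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/pm_qos_params.h>

#define RCU_FLUSH_LATENCY_USEC 100	/* arbitrary threshold for the sketch */

/* Assume "no constraint" until the pm_qos core tells us otherwise. */
static atomic_t rcu_cached_qos_latency = ATOMIC_INIT(INT_MAX);

/* Called by the pm_qos core with the new aggregate latency value. */
static int rcu_qos_notify(struct notifier_block *nb,
                          unsigned long new_latency, void *unused)
{
        atomic_set(&rcu_cached_qos_latency, (int)new_latency);
        return NOTIFY_OK;
}

static struct notifier_block rcu_qos_nb = {
        .notifier_call = rcu_qos_notify,
};

static int __init rcu_qos_init(void)
{
        return pm_qos_add_notifier(PM_QOS_CPU_DMA_LATENCY, &rcu_qos_nb);
}
core_initcall(rcu_qos_init);

/* Cheap enough for interrupt context: a single atomic read. */
static inline int rcu_latency_allows_flush(void)
{
        return atomic_read(&rcu_cached_qos_latency) >= RCU_FLUSH_LATENCY_USEC;
}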

>
> And we really should not be gold-plating this thing. I have one requester
> (off list) who needs it badly, and who is willing to deal with a kernel
> configuration parameter. I have no other requesters, and therefore
> cannot reasonably anticipate their needs. As a result, we cannot justify
> building any kind of infrastructure beyond what is reasonable for the
> single requester.

If this has a measurable power advantage I think it's better to
do the extra steps to make it usable everywhere, with automatic heuristics
and no Kconfig hacks.

If it's not then it's probably not worth merging.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
From: Paul E. McKenney on
On Wed, Jan 27, 2010 at 01:11:50PM +0100, Andi Kleen wrote:
> > From what I can see, most people would want RCU_FAST_NO_HZ=n. Only
>
> Most people do not recompile their kernel. And even those
> that do most likely will not have enough information to make
> an informed choice at build time.

I believe that only a few embedded people will be using RCU_FAST_NO_HZ=y.

> > people with extreme power-consumption concerns would likely care enough
> > to select this.
>
> What would a distributor shipping binary kernels use?

RCU_FAST_NO_HZ=n.

> > > But I think in this case scalability is not the key thing to check
> > > for, but expected idle latency. Even on a large system, if nearly all
> > > CPUs are idle, spending some time to keep them idle even longer is a
> > > good thing. But only if the CPUs actually benefit from long idle.
> >
> > The larger the number of CPUs, the lower the probability of all of them
> > going idle, so the less difference this patch makes. Perhaps some
>
> My shiny new 8-CPU-thread desktop is no less likely to go idle when I do
> nothing on it than an older dual-core, 2-thread desktop.
>
> Especially not given all the recent optimizations (no idle tick)
> in this area etc.
>
> And core/thread counts are growing. In terms of CPU numbers today's
> large machine is tomorrow's small machine.

But your shiny new 8-CPU-thread desktop runs off of AC power, right?
If so, I don't think you will care about a 4-5-tick delay for the last
CPU going into dyntick-idle mode.

And I bet you won't be able to measure the difference on your
battery-powered laptop.

> > I do need to query from interrupt context, but could potentially have a
> > notifier set up state for me. Still, the real question is "how important
> > is a small reduction in power consumption?"
>
> I think any (measurable) power saving is important. Also on modern Intel
> CPUs power saving often directly translates into performance:
> if more cores are idle the others can clock faster.

OK, I am testing a corrected patch with the kernel configuration
parameter. If you can show a measurable difference on typical
desktop/server systems, then we can look into doing something more
generally useful.

> > I took a quick look at the pm_qos_latency, and, as you note, it doesn't
> > really seem to be designed to handle this situation.
>
> It could be extended for it. It's just software after all,
> we can change it.

Of course we can change it. But should we?

> > And we really should not be gold-plating this thing. I have one requester
> > (off list) who needs it badly, and who is willing to deal with a kernel
> > configuration parameter. I have no other requesters, and therefore
> > cannot reasonably anticipate their needs. As a result, we cannot justify
> > building any kind of infrastructure beyond what is reasonable for the
> > single requester.
>
> If this has a measurable power advantage I think it's better to
> do the extra steps to make it usable everywhere, with automatic heuristics
> and no Kconfig hacks.

I would agree with the following:

If this has a measurable power advantage -on- -a- -large-
-fraction- -of- -systems-, then it -might- be better to do
extra steps to make it usable everywhere, which -might- involve
heuristics instead of a kernel configuration parameter.

> If it's not then it's probably not worth merging.

This is not necessarily the case. It can make a lot of sense to try
something for a special case, and then use the experience gained in
that special case to produce a good solution. On the other hand, it
does not necessarily make sense to do a lot of possibly useless work
based on vague guesses as to what is needed.

If we merge the special case, then others have the opportunity to try it
out, thus getting us the experience required to see (1) if something
more general-purpose is needed in the first place and (2) if so, what
that more general-purpose thing might look like.

Thanx, Paul
From: Paul E. McKenney on
On Mon, Jan 25, 2010 at 10:12:03AM -0500, Steven Rostedt wrote:
> On Sun, 2010-01-24 at 19:48 -0800, Paul E. McKenney wrote:
>
> > +/*
> > + * Check to see if any future RCU-related work will need to be done
> > + * by the current CPU, even if none need be done immediately, returning
> > + * 1 if so. This function is part of the RCU implementation; it is -not-
> > + * an exported member of the RCU API.
> > + *
> > + * Because we are not supporting preemptible RCU, attempt to accelerate
> > + * any current grace periods so that RCU no longer needs this CPU, but
> > + * only if all other CPUs are already in dynticks-idle mode. This will
> > + * allow the CPU cores to be powered down immediately, as opposed to after
> > + * waiting many milliseconds for grace periods to elapse.
> > + */
> > +int rcu_needs_cpu(int cpu)
> > +{
> > +        int c = 1;
> > +        int i;
> > +        int thatcpu;
> > +
> > +        /* Don't bother unless we are the last non-dyntick-idle CPU. */
> > +        for_each_cpu(thatcpu, nohz_cpu_mask)
> > +                if (thatcpu != cpu)
> > +                        return rcu_needs_cpu_quick_check(cpu);
> > +
> > +        /* Try to push remaining RCU-sched and RCU-bh callbacks through. */
> > +        for (i = 0; i < RCU_NEEDS_CPU_FLUSHES && c; i++) {
> > +                c = 0;
> > +                if (per_cpu(rcu_sched_data, cpu).nxtlist) {
> > +                        c = 1;
> > +                        rcu_sched_qs(cpu);
> > +                        force_quiescent_state(&rcu_sched_state, 0);
> > +                        __rcu_process_callbacks(&rcu_sched_state,
> > +                                                &per_cpu(rcu_sched_data, cpu));
>
> > +                }
> > +                if (per_cpu(rcu_bh_data, cpu).nxtlist) {
> > +                        c = 1;
> > +                        rcu_bh_qs(cpu);
> > +                        force_quiescent_state(&rcu_bh_state, 0);
> > +                        __rcu_process_callbacks(&rcu_bh_state,
> > +                                                &per_cpu(rcu_bh_data, cpu));
> > +                }
> > +        }
> > +
> > +        /* If RCU callbacks are still pending, RCU still needs this CPU. */
> > +        return c;
>
> What happens if the last loop pushes out all callbacks? Then we would be
> returning 1 when we could really be returning 0. Wouldn't a better
> answer be:
>
>         return per_cpu(rcu_sched_data, cpu).nxtlist ||
>                per_cpu(rcu_bh_data, cpu).nxtlist;

Good point!!!

Or I can move the assignment to "c" to the end of each branch of the
"if" statement, and do something like the following:

c = !!per_cpu(rcu_sched_data, cpu).nxtlist;

But either way, you are right, it does not make sense to go to all the
trouble of forcing a grace period and then failing to take advantage
of it.
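
[For concreteness, one way the flush loop could then look; this is a
sketch of the idea being discussed, not necessarily the exact patch that
was merged. The "||" in the rcu_bh branch is there so that an emptied
rcu_bh list cannot hide rcu_sched callbacks that are still pending:]

        /* Try to push remaining RCU-sched and RCU-bh callbacks through. */
        for (i = 0; i < RCU_NEEDS_CPU_FLUSHES && c; i++) {
                c = 0;
                if (per_cpu(rcu_sched_data, cpu).nxtlist) {
                        rcu_sched_qs(cpu);
                        force_quiescent_state(&rcu_sched_state, 0);
                        __rcu_process_callbacks(&rcu_sched_state,
                                                &per_cpu(rcu_sched_data, cpu));
                        /* Recheck after the flush attempt. */
                        c = !!per_cpu(rcu_sched_data, cpu).nxtlist;
                }
                if (per_cpu(rcu_bh_data, cpu).nxtlist) {
                        rcu_bh_qs(cpu);
                        force_quiescent_state(&rcu_bh_state, 0);
                        __rcu_process_callbacks(&rcu_bh_state,
                                                &per_cpu(rcu_bh_data, cpu));
                        c = c || per_cpu(rcu_bh_data, cpu).nxtlist;
                }
        }

        /* "c" now reflects whether callbacks remain after the flushes. */
        return c;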

Thanx, Paul