Remove stop_machine from change_clocksource [Kernel]

Prev: generic file status update
Next: [PATCH 2/2] drivers:staging:ti-st: update ABI and TODO

From: Martin Schwidefsky on 28 Jul 2010 03:20

On Tue, 27 Jul 2010 19:06:41 -0700
John Stultz <johnstul(a)us.ibm.com> wrote:

> To me, there isn't a clear reason why we're using stop_machine
> when changing clocksources instead of just taking the xtime_lock.
>
> Additionally, using stop_machine limits us from being able to
> register clocksources from timers (as needed for a following
> patch).
>
> This patch simply removes the stop_machine usage and instead
> directly calls change_clocksource, which now takes the xtime_lock.
>
> I could be totally missing something here that necessitates
> stop_machine, but in my testing it seems to function fine.
>
> Any clarifications or corrections would be appreciated!

Installing a new clocksource updates quite a lot of internal
variables, we need to make sure that no code ever uses these
variables without holding the xtime_lock as writer.

And then there is ktime_get which uses a read_seqbegin/
read_seqretry loop on the xtime_lock to get the time from the
clocksource. Consider the case where a ktime_get call already
did read_seqbegin but did not yet call the read function of
the clocksource. Another cpu registers a better clocksource
which will cause the timekeeper.clock variable to get updated
while the ktime_get call is using it. If I look at
timekeeping_get_ns I don't see anything that prevents the
compiler from generating code that reads timekeeper.clock
multiple times. Which would mix the read function from one
clocksource with the cycle_last / mask values from the new
clock. Now if we add code that prevents the compiler from
reading from timekeeper.clock multiple times we might get
away with it.

The reasoning for stop_machine is that the change of a
clocksource is a major change which has subtle side effects
so we want to make sure that nothing breaks. It is a very rare
event, we can afford to spent a little bit of time there.
Ergo stop_machine.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: john stultz on 28 Jul 2010 12:20

On Wed, 2010-07-28 at 09:17 +0200, Martin Schwidefsky wrote:
> On Tue, 27 Jul 2010 19:06:41 -0700
> John Stultz <johnstul(a)us.ibm.com> wrote:
>
> > To me, there isn't a clear reason why we're using stop_machine
> > when changing clocksources instead of just taking the xtime_lock.
> >
> > Additionally, using stop_machine limits us from being able to
> > register clocksources from timers (as needed for a following
> > patch).
> >
> > This patch simply removes the stop_machine usage and instead
> > directly calls change_clocksource, which now takes the xtime_lock.
> >
> > I could be totally missing something here that necessitates
> > stop_machine, but in my testing it seems to function fine.
> >
> > Any clarifications or corrections would be appreciated!
>
> Installing a new clocksource updates quite a lot of internal
> variables, we need to make sure that no code ever uses these
> variables without holding the xtime_lock as writer.

Agreed.

> And then there is ktime_get which uses a read_seqbegin/
> read_seqretry loop on the xtime_lock to get the time from the
> clocksource. Consider the case where a ktime_get call already
> did read_seqbegin but did not yet call the read function of
> the clocksource. Another cpu registers a better clocksource
> which will cause the timekeeper.clock variable to get updated
> while the ktime_get call is using it.

Although ktime_get will be forced to loop and try again, as any writes
require holding a write on the xtime_lock. While the xtime_lock
writelock is held, the function could possibly mix the
read/cycle_last/mask/cyc2ns values, but the results from those invalid
calculations will not be returned.

> If I look at
> timekeeping_get_ns I don't see anything that prevents the
> compiler from generating code that reads timekeeper.clock
> multiple times. Which would mix the read function from one
> clocksource with the cycle_last / mask values from the new
> clock. Now if we add code that prevents the compiler from
> reading from timekeeper.clock multiple times we might get
> away with it.

Right, but this should be ok. timekeeping_get_ns is a helper that
requires the xtime_lock to be held (such a comment is probably needed,
but there is no usage of it when the xtime_lock isn't held). While the
function may actually mix values from two clocksources in a calculation,
the results of those calculations will be thrown out and re-done via the
xtime_lock seqlock.

> The reasoning for stop_machine is that the change of a
> clocksource is a major change which has subtle side effects
> so we want to make sure that nothing breaks. It is a very rare
> event, we can afford to spent a little bit of time there.
> Ergo stop_machine.

I do agree that there can be subtle side effects when dealing with
clocksources (part of why I'm being so cautious introducing this
change), and when the stop_machine code was added it seemed reasonable.
But given the limitations of stop_machine, the more I look at the
clocksource_change code, the more I suspect stop_machine is overkill and
we can safely just take the write lock on xtime_lock.

If I'm still missing something, do let me know.

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Martin Schwidefsky on 29 Jul 2010 03:20

On Wed, 28 Jul 2010 09:12:49 -0700
john stultz <johnstul(a)us.ibm.com> wrote:

> On Wed, 2010-07-28 at 09:17 +0200, Martin Schwidefsky wrote:
> > On Tue, 27 Jul 2010 19:06:41 -0700
> > John Stultz <johnstul(a)us.ibm.com> wrote:
> >
> > If I look at
> > timekeeping_get_ns I don't see anything that prevents the
> > compiler from generating code that reads timekeeper.clock
> > multiple times. Which would mix the read function from one
> > clocksource with the cycle_last / mask values from the new
> > clock. Now if we add code that prevents the compiler from
> > reading from timekeeper.clock multiple times we might get
> > away with it.
>
> Right, but this should be ok. timekeeping_get_ns is a helper that
> requires the xtime_lock to be held (such a comment is probably needed,
> but there is no usage of it when the xtime_lock isn't held). While the
> function may actually mix values from two clocksources in a calculation,
> the results of those calculations will be thrown out and re-done via the
> xtime_lock seqlock.

Ok, all callers to timekeeping_get_ns use an xtime_lock loop to
make sure that no inconsistent result gets returned.

> > The reasoning for stop_machine is that the change of a
> > clocksource is a major change which has subtle side effects
> > so we want to make sure that nothing breaks. It is a very rare
> > event, we can afford to spent a little bit of time there.
> > Ergo stop_machine.
>
> I do agree that there can be subtle side effects when dealing with
> clocksources (part of why I'm being so cautious introducing this
> change), and when the stop_machine code was added it seemed reasonable.
> But given the limitations of stop_machine, the more I look at the
> clocksource_change code, the more I suspect stop_machine is overkill and
> we can safely just take the write lock on xtime_lock.
>
> If I'm still missing something, do let me know.

What about a clocksource_unregister while a cpu is in the middle of a
read_seqbegin/timekeeping_get_ns/read_seqretry? The clocksource structure
is "free" after the successful call to the unregister. At least in theory
this could be a use after free. The race window is tiny but on virtual
systems there can be an arbitrary delay in the ktime_get sequence.

I agree that stop_machine is the big gun and restricts the code in the way
how the clocksource functions may be call. But it is safe, no?

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: john stultz on 29 Jul 2010 16:50

On Thu, 2010-07-29 at 09:11 +0200, Martin Schwidefsky wrote:
> On Wed, 28 Jul 2010 09:12:49 -0700
> john stultz <johnstul(a)us.ibm.com> wrote:
> >
> > I do agree that there can be subtle side effects when dealing with
> > clocksources (part of why I'm being so cautious introducing this
> > change), and when the stop_machine code was added it seemed reasonable.
> > But given the limitations of stop_machine, the more I look at the
> > clocksource_change code, the more I suspect stop_machine is overkill and
> > we can safely just take the write lock on xtime_lock.
> >
> > If I'm still missing something, do let me know.
>
> What about a clocksource_unregister while a cpu is in the middle of a
> read_seqbegin/timekeeping_get_ns/read_seqretry? The clocksource structure
> is "free" after the successful call to the unregister. At least in theory
> this could be a use after free. The race window is tiny but on virtual
> systems there can be an arbitrary delay in the ktime_get sequence.

Huh. At first I thought "but we don't yet implement
clocksource_unregister!" but of course do now (I guess that TODO in the
clocksource.c header can be yanked :).

So yes, unregister has been contentious in the past for this very
reason. Once registered, its really hard to find a safe point when it
can be un-registered. Stop machine mostly solves this (although one
should note: vsyscall enabled clocksources really can't be freed, as
their vread() page needs to be statically mapped into userspace).

So while stop_machine is a solution here, it would make more sense to me
to use stop_machine (or maybe even a different method, as it sort of
screams RCU to me) to make sure all the cpus are out of the xtime_lock
critical section prior to returning from unregister_clocksource, rather
then stopping everything for the clocksource change.

> I agree that stop_machine is the big gun and restricts the code in the way
> how the clocksource functions may be call. But it is safe, no?

Actually, my reading of stop_machine makes me hesitate a bit, as I'm not
sure if with kernel preemption, we're sure to avoid stopping a thread
mid-syscall to gettimeofday. Anyone have a clue if that's avoided?

Regardless, we need some other method then stop_machine to register
clocksources, as stop_machine is just too limiting.

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: john stultz on 29 Jul 2010 19:30

On Thu, 2010-07-29 at 16:08 -0700, john stultz wrote:
> On Thu, 2010-07-29 at 13:49 -0700, john stultz wrote:
> > On Thu, 2010-07-29 at 09:11 +0200, Martin Schwidefsky wrote:
> > > What about a clocksource_unregister while a cpu is in the middle of a
> > > read_seqbegin/timekeeping_get_ns/read_seqretry? The clocksource structure
> > > is "free" after the successful call to the unregister. At least in theory
> > > this could be a use after free. The race window is tiny but on virtual
> > > systems there can be an arbitrary delay in the ktime_get sequence.
> >
> > So yes, unregister has been contentious in the past for this very
> > reason. Once registered, its really hard to find a safe point when it
> > can be un-registered. Stop machine mostly solves this (although one
> > should note: vsyscall enabled clocksources really can't be freed, as
> > their vread() page needs to be statically mapped into userspace).
> >
> > So while stop_machine is a solution here, it would make more sense to me
> > to use stop_machine (or maybe even a different method, as it sort of
> > screams RCU to me) to make sure all the cpus are out of the xtime_lock
> > critical section prior to returning from unregister_clocksource, rather
> > then stopping everything for the clocksource change.
>
>
> Below is a rough patch to use stop_machine to get the same level of race
> protection for clocksource_unregister as we have currently in Linus's
> tree (which may possibly have holes in it?).
>
> Comments or suggestions for other ideas would be appreciated.
>
> I'm thinking RCU might be really close to what we actually want here,
> but I'd like to be able to avoid any extra work on the read-side (ie:
> even the preempt_disable()), and would even be more prone to disallowing
> clocksource unregistration then impacting the xtime_lock read side.
>
>
> Any other thoughts?

Actually, the more I think about it.. The more I really just think we
should kill clocksource_unregister and simply not allow it.

Part of the reason is that we have other issues lurking under here, such
as: "what do we do if someone unregisters the only HRT capable
clocksource? As there's currently no way to fall back from HRT mode to
non HRT mode."

It just adds a ton of complexity and issues for really zero gain. The
only reasonable use-case I can come up with is having a clocksource
loaded via a module, and then wanting to unload it.

So while loading clocksources as a module is a nice feature that could
save folks in a pinch (think old distro kernels needing a clock fix on
new hardware), unregister and removal really doesn't have much
functional use. Its just only nice an symmetrical.

So unless anyone else objects, I'm prone to kill off unregister (and
change the single user's error-handling path to delay registration until
the hardware is known to be good).

Any counter points?

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

|
Pages: 1
Prev: generic file status update
Next: [PATCH 2/2] drivers:staging:ti-st: update ABI and TODO