x86: Export tsc related information in sysfs [Kernel]


From: Arjan van de Ven on 16 May 2010 16:30

On Sun, 16 May 2010 09:42:40 -0700 (PDT)
Dan Magenheimer <dan.magenheimer(a)oracle.com> wrote:

> > From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> > Nah, there are systems which will have it set to 1:
> > Dig out your good old Pentium-I box and enjoy.
>
> Hot stove syndrome again? Are you truly saying that there
> are NO single-socket multi-core systems that don't have
> stupid firmware (SMI and/or BIOS)?

there are no systems *where we can know* this.
Some of the stupid SMI only triggers on higher temperature situations
etc. Impossible to know upfront.

> If things are this bad, why on earth would the kernel itself
> EVER use TSC even as its own internal clocksource?

Why do you think we do extensive and continuous validation of the tsc
(and soon, continuous recalibration)?

> But that doesn't mean the vast majority of latest generation
> single-socket systems can't set "tsc_reliable" to 1. Or that
> the kernel is responsible for detecting and/or correcting
> every system with buggy firmware.

sadly this also shows up on single socket systems... much more than we
like.

This is why I really really hate having apps run tsc directly.
A VDSO call at least gives the kernel the option to ensure
correctness... even if it starts out fast and goes slow suddenly after
3 weeks when the AC in the datacenter got maintenance for an hour.
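
A minimal sketch of what going through the kernel's interface looks like from an app, assuming a Linux system where clock_gettime(CLOCK_MONOTONIC) is normally served by the vDSO without entering the kernel while the clocksource allows it. The helper name now_ns is hypothetical; it is not an interface proposed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds, or 0 if the clock cannot be read.
 * The kernel picks the underlying clocksource (tsc, hpet, ...) and may
 * change it at runtime; the caller never sees the difference, only a
 * change in how long the call takes. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

The trade-off is the one described above: the call stays correct if the kernel stops trusting the tsc, but it may become slower without notice.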

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Dan Magenheimer on 16 May 2010 21:40

> > > From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> > > What we can talk about is a vget_tsc_raw() interface along with a
>
> What I have in mind and what I'm working on for quite a while is going
> to work on both bare metal and VMs w/o hidden slowness.

Well, if this can be done today/soon and is fast enough
(say <2x the cycles of a rdtsc), I am very interested and
"won't let my friends use rdtsc" :-) Anything I can do
to help?

> From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> We try to use it for performance sake, but the kernel does at least
> its very best to find out when it goes bad. We then switch back to a
> hpet or pm-timer which is horrible performance wise but does not screw
> up timekeeping and everything which relies on it completely.
> :
> As I said, we try our very best to determine when things go awry, but
> there are small errors which occur either sporadically or after longer
> uptime which we cannot yet detect reliably. Multi-socket falls into
> that category, but we are working on that.

> From: Arjan van de Ven [mailto:arjan(a)infradead.org]
> Why do you think we do extensive and continuous validation of the tsc
> (and soon, continuous recalibration)?

So the kernel has the ability to detect that the TSC
is "OK for now", but must use some kind of polling
(periodic warp test) to recognize that TSC has
gone "bad". As long as TSC is good AND a sophisticated
enterprise app understands that TSC might go bad at
some point in the future AND if the kernel exposes
"goodness" information AND the app (like the kernel) is
resilient** to the possibility that there might be some
period of time that obtained timestamps might be
"bad" before the app polls the kernel to find out that
the kernel says they are indeed "bad"... why should it
be forbidden for an app to use TSC?

(** e.g. increments its own tsc_last to ensure time never goes
backwards)
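
A minimal sketch of the footnote above, assuming a single-threaded caller. The names mono_clamp and mono_clamp_next are hypothetical, and a multi-threaded app would need an atomic compare-and-swap on the stored value instead of a plain store.

```c
#include <stdint.h>

/* Remembers the last timestamp handed out (the "tsc_last" above). */
struct mono_clamp {
    uint64_t last;
};

/* Return a timestamp that is strictly greater than the previous one.
 * If the raw reading stalls or steps backwards, hand out last + 1
 * instead, so callers never observe time going backwards. */
static uint64_t mono_clamp_next(struct mono_clamp *c, uint64_t raw)
{
    if (raw <= c->last)
        raw = c->last + 1;   /* hide the stall or backwards step */
    c->last = raw;
    return raw;
}
```

This only hides backwards steps; it does not make the intervals between "bad" readings meaningful.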

It seems like the only advantages the kernel has here over
a reasonably intelligent app is that: 1) the kernel can run
a warp test and the app can't, and 2) the kernel can
estimate the frequency of the TSC and the app can't.
AND, in the case of a virtual machine, the kernel has
neither of these advantages anyway.

So though I now understand and agree that neither the kernel
nor an app can guarantee that TSC won't unexpectedly go from
"good" to "bad", I still don't understand why "TSC goodness"
information shouldn't be exposed to userland, where an
intelligent enterprise app can choose to use TSC when it is good
(for the same reason the kernel does: "for performance sake")
and choose to stop using it when it goes bad (for the same
reason the kernel does: to "not screw up timekeeping").
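
For comparison, a sketch of the closest thing an app can poll today, assuming a Linux system that exposes the kernel's current clocksource in sysfs. This is not the tsc_reliable flag under discussion and promises nothing about future TSC behavior; the helper name clocksource_is_tsc is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Returns 1 if the file at `path` names "tsc", 0 if it names another
 * clocksource, and -1 if the file cannot be read. */
static int clocksource_is_tsc(const char *path)
{
    char name[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(name, sizeof name, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    name[strcspn(name, "\n")] = '\0';   /* strip the trailing newline */
    return strcmp(name, "tsc") == 0;
}
```

An app would call clocksource_is_tsc(CLOCKSOURCE_PATH) periodically and fall back to a slower clock when the result is not 1. Timestamps taken between two polls can still be "bad", as described above.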

It sounds as if you are saying that "the kernel is allowed
to use a rope because if it accidentally gets the rope
around its neck, it has a knife to ensure it doesn't hang
itself" BUT "the app isn't allowed to use a rope because
it might hang itself and we'll be damned if we loan
our knife to an app because, well... because it doesn't
need a knife because we said it shouldn't use the rope".

I think you can understand why this isn't a very satisfying
explanation.

P.S. Thanks for taking the time to discuss this!


From: Arjan van de Ven on 17 May 2010 01:10

On Sun, 16 May 2010 18:31:30 -0700 (PDT)
Dan Magenheimer <dan.magenheimer(a)oracle.com> wrote:

>
> It seems like the only advantages the kernel has here over
> a reasonably intelligent app is that: 1) the kernel can run
> a warp test and the app can't, and 2) the kernel can
> estimate the frequency of the TSC and the app can't.

and 3) the kernel gets thermal interrupts and the app does not
and 4) the kernel decides which power management to use when
and 5) the kernel can find out if SMIs happened, and the app cannot.
and 6) the kernel can access tsc and a per cpu offset/frequency
data atomically, without being scheduled to another CPU. The app cannot
[well it can ask the kernel to be pinned, and that's a 99.99% thing,
but still]

[snipped a bunch of twists of my argument that are not correct]

look we're not disabling ring 3 tsc. We could, but we don't.
we're just telling you that WE as kernel cannot tell you, in
an architectural and long term (multiple kernel versions and
hardware generations) stable way, when the tsc is "usable".
Because WE know it is barely so, if at all. We continuously add
workarounds, calibrations and tweaks for this, and stop using it
at runtime when something smells funny and defeats our logic.

If you want to find out yourself if the tsc is good enough for you
that is one thing.... but if you want the kernel to have an official
interface for it.... the kernel has to live by that commitment.
We cannot put in that interface "oh and you need to implement the same
workarounds, scaling and offsets as the kernel does", because that's
in a huge flux, and will change from kernel version to kernel version.
The only shot you could get is some vsyscall/vdso function that gives
you a unit (but that is not easy given per cpu offset/frequency/etc.,
but at least the kernel can try).

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

From: Andi Kleen on 17 May 2010 06:30

> Yes, understood. But the kernel doesn't expose a "gettimeofday
> performance sucks" flag either. If it did (or in the case of
> the patch, if tsc_reliable is zero) the application could at least
> choose to turn off the 10000-100000 timestamps/second and log
> a message saying "you are running on old hardware so you get
> fewer features".

I don't think anyone would object to exporting such a flag if
it's cleanly designed.

Getting the semantics right for that might be somewhat tricky
though. How is "slow" defined?

> A CPU-hotplugable system is a good example of a case where
> the kernel should expose that tsc_reliable is 0. (I've heard

That would mean that a large class of systems which
are always hotplug capable (even if it's not used)
would never get fast TSC time.

Wasn't the goal here to be faster?

> anecdotally that CPU hotplug into a QPI or Hypertransport system
> will have some other interesting challenges, so may require some
> special kernel parameters anyway.) Even if tsc_reliable were
> only enabled if a "no-cpu_hotplug" kernel parameter is set,
> that is still useful. And with cores-per-socket (and even
> nodes-per-socket) going up seemingly every day, multi-socket
> systems will likely be an ever smaller percentage of new
> systems.

Still the people running them will expect as good performance
as possible.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.

From: Thomas Gleixner on 17 May 2010 06:30

On Sun, 16 May 2010, Dan Magenheimer wrote:

> > > > From: Thomas Gleixner [mailto:tglx(a)linutronix.de]
> > > > What we can talk about is a vget_tsc_raw() interface along with a
> >
> > What I have in mind and what I'm working on for quite a while is going
> > to work on both bare metal and VMs w/o hidden slowness.
>
> Well, if this can be done today/soon and is fast enough
> (say <2x the cycles of a rdtsc), I am very interested and

Are you going to measure that with rdtsc() ? :)

> "won't let my friends use rdtsc" :-) Anything I can do
> to help?

Yes, stop trying to convince me that rdtsc in apps is a good idea. :)

> It sounds as if you are saying that "the kernel is allowed
> to use a rope because if it accidentally gets the rope
> around its neck, it has a knife to ensure it doesn't hang
> itself" BUT "the app isn't allowed to use a rope because
> it might hang itself and we'll be damned if we loan
> our knife to an app because, well... because it doesn't
> need a knife because we said it shouldn't use the rope".
>
> I think you can understand why this isn't a very satisfying
> explanation.

What I understand is that you want us to give out the rope only and
when things go wrong let us kernel developers deal with the bug reports
about the missing knife.

Please understand that once we expose that tsc_reliable information we
are responsible for its correctness. People will use it whether the
enterprise entity who wants this feature has qualified that particular
piece of hardware or not. And while the support of that entity refuses
to help on non qualified hardware (your own words), we'll end up with
the mess which was created to help that very entity.

I think you understand that I have no intention to put a ticking time
bomb into the code I'm responsible for. I really have better things to
do than shooting myself in the foot.

Thanks,

tglx

