From: KOSAKI Motohiro on
Hi

> Due to vtime calling vgettimeofday(), it's possible that an application
> could call time(); create("stuff", O_RDWR); only to see the file's
> creation timestamp be earlier than the value returned by time().

Just a dumb question.

Most applications use gettimeofday() instead of time(). That means
your fix doesn't help most applications.

So why can't we fix the vgettimeofday() vs. create() inconsistency?
This is just a question; I don't intend to disagree with you.


>
> A similar way to reproduce the issue is to compare the vsyscall time()
> with the syscall time(), and observe ordering issues.
>
> The modified test case from Oleg Nesterov below can illustrate this:
>
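> #include <stdio.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/syscall.h>
>
> int main(void)
> {
> 	time_t sec1, sec2;
>
> 	/* Loop until the vsyscall time() runs ahead of the syscall time(). */
> 	do {
> 		sec1 = time(&sec2);
> 		sec2 = syscall(__NR_time, NULL);
> 	} while (sec1 <= sec2);
>
> 	printf("vtime: %ld.000000\n", (long)sec1);
> 	printf("time: %ld.000000\n", (long)sec2);
> 	return 0;
> }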
>
> The proper fix is to make vtime use the same time value as
> current_kernel_time() (which is exported via update_vsyscall) instead of
> vgettime().
>
> Thanks to Jiri Olsa for bringing up the issue and catching bugs in
> earlier versions of this fix.
>
> Signed-off-by: John Stultz <johnstul(a)us.ibm.com>
>
> CC: Jiri Olsa <jolsa(a)redhat.com>
> CC: Thomas Gleixner <tglx(a)linutronix.de>
> CC: Oleg Nesterov <oleg(a)redhat.com>
> ---
> arch/x86/kernel/vsyscall_64.c | 11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index 1c0c6ab..dce0c3c 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
>   * unlikely */
>  time_t __vsyscall(1) vtime(time_t *t)
>  {
> -	struct timeval tv;
> +	unsigned seq;
>  	time_t result;
>  	if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
>  		return time_syscall(t);
>
> -	vgettimeofday(&tv, NULL);
> -	result = tv.tv_sec;
> +	do {
> +		seq = read_seqbegin(&__vsyscall_gtod_data.lock);
> +
> +		result = __vsyscall_gtod_data.wall_time_sec;
> +
> +	} while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
> +
>  	if (t)
>  		*t = result;
>  	return result;
> --
> 1.6.0.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/



From: john stultz on
On Wed, 2010-07-14 at 11:40 +0900, KOSAKI Motohiro wrote:
> Hi
>
> > Due to vtime calling vgettimeofday(), it's possible that an application
> > could call time(); create("stuff", O_RDWR); only to see the file's
> > creation timestamp be earlier than the value returned by time().
>
> Just a dumb question.
>
> Most applications use gettimeofday() instead of time(). That means
> your fix doesn't help most applications.

Correct, filesystem timestamps and gettimeofday can still seem
inconsistently ordered. But that is expected.

Because of granularity differences (one interface is only tick
resolution, the other is clocksource resolution), we can't interleave
the two interfaces (time and gettimeofday, respectively) and expect to
get ordered results.
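
To make the granularity point concrete, here is a minimal userspace sketch (not part of the patch). It assumes Linux with CLOCK_REALTIME_COARSE available; the helper names ts_to_ns() and fine_minus_coarse_ns() are made up for illustration. Reading the tick-granular clock first and the clocksource-granular clock second should not give a negative difference, barring the clock being stepped in between. Reading them in the opposite order can, which is the interleaving problem described above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for CLOCK_REALTIME_COARSE (Linux-specific) */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: flatten a timespec into nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: read the tick-granular clock, then the
 * clocksource-granular clock, and return (fine - coarse) in ns.
 * The coarse value is the time of the last tick, so the fine value
 * read afterwards is expected to be equal or later.
 */
int64_t fine_minus_coarse_ns(void)
{
	struct timespec coarse, fine;
	int rc;

	rc = clock_gettime(CLOCK_REALTIME_COARSE, &coarse);
	assert(rc == 0);
	rc = clock_gettime(CLOCK_REALTIME, &fine);
	assert(rc == 0);
	(void)rc;

	return ts_to_ns(&fine) - ts_to_ns(&coarse);
}
```

Swapping the two clock_gettime() calls in fine_minus_coarse_ns() is the quickest way to see the out-of-order case: the result can then be positive by up to roughly one tick.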

This is why the fix I'm proposing is important: Filesystem timestamps
have always been tick granular, so when vtime() was made clocksource
granular (by using vgettime internally) we broke the historic
expectation that the time() interface could be interleaved with
filesystem operations.

Side note: For full nanosecond resolution of the tick-granular
timestamps, check out the clock_gettime(CLOCK_REALTIME_COARSE, ...)
interface.


> So why can't we fix the vgettimeofday() vs. create() inconsistency?
> This is just a question; I don't intend to disagree with you.

The only way to make gettimeofday() and create() consistent is to use
gettimeofday()-style clocksource-resolution timestamps for files. This,
however, could cause a large performance hit, since every file
timestamp would require a possibly expensive read of the clocksource.

thanks
-john


From: KOSAKI Motohiro on
> On Wed, 2010-07-14 at 11:40 +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Due to vtime calling vgettimeofday(), it's possible that an application
> > > could call time(); create("stuff", O_RDWR); only to see the file's
> > > creation timestamp be earlier than the value returned by time().
> >
> > Just a dumb question.
> >
> > Most applications use gettimeofday() instead of time(). That means
> > your fix doesn't help most applications.
>
> Correct, filesystem timestamps and gettimeofday can still seem
> inconsistently ordered. But that is expected.
>
> Because of granularity differences (one interface is only tick
> resolution, the other is clocksource resolution), we can't interleave
> the two interfaces (time and gettimeofday, respectively) and expect to
> get ordered results.

Hmm...
Yes, interleaving time() with gettimeofday() makes no sense; nobody wants
that. But I don't understand why we can ignore gettimeofday() vs. file
timestamps.


> This is why the fix I'm proposing is important: Filesystem timestamps
> have always been tick granular, so when vtime() was made clocksource
> granular (by using vgettime internally) we broke the historic
> expectation that the time() interface could be interleaved with
> filesystem operations.
>
> Side note: For full nanosecond resolution of the tick-granular
> timestamps, check out the clock_gettime(CLOCK_REALTIME_COARSE, ...)
> interface.
>
>
> > So why can't we fix the vgettimeofday() vs. create() inconsistency?
> > This is just a question; I don't intend to disagree with you.
>
> The only way to make gettimeofday() and create() consistent is to use
> gettimeofday()-style clocksource-resolution timestamps for files. This,
> however, could cause a large performance hit, since every file
> timestamp would require a possibly expensive read of the clocksource.

Why is reading the clocksource so slow? Here is the current
implementation of the TSC clocksource's ->read method:


static cycle_t read_tsc(struct clocksource *cs)
{
	cycle_t ret = (cycle_t)get_cycles();

	return ret >= clocksource_tsc.cycle_last ?
		ret : clocksource_tsc.cycle_last;
}

That means the difference is almost nothing but a single rdtsc.
And now that we have RELATIME, the problem of crazily frequent atime
updates has been solved.

Can you please elaborate on your worry? I don't think I understand
which case you are worried about.

Thanks.


From: john stultz on
On Thu, 2010-07-15 at 10:51 +0900, KOSAKI Motohiro wrote:
> > On Wed, 2010-07-14 at 11:40 +0900, KOSAKI Motohiro wrote:
> > > Hi
> > >
> > > > Due to vtime calling vgettimeofday(), it's possible that an application
> > > > could call time(); create("stuff", O_RDWR); only to see the file's
> > > > creation timestamp be earlier than the value returned by time().
> > >
> > > Just a dumb question.
> > >
> > > Most applications use gettimeofday() instead of time(). That means
> > > your fix doesn't help most applications.
> >
> > Correct, filesystem timestamps and gettimeofday can still seem
> > inconsistently ordered. But that is expected.
> >
> > Because of granularity differences (one interface is only tick
> > resolution, the other is clocksource resolution), we can't interleave
> > the two interfaces (time and gettimeofday, respectively) and expect to
> > get ordered results.
>
> hmmm...
> Yes, interleaving time() with gettimeofday() makes no sense; nobody wants
> that. But I don't understand why we can ignore gettimeofday() vs. file
> timestamps.


So, just to be clear, this discussion is really around the question of
"Why don't filesystems use clocksource-granular (i.e. getnstimeofday())
timestamps instead of tick-granular (i.e. current_kernel_time())
timestamps?"

However, this is *not* what the patch that started this thread was
about. In the patch I'm simply fixing an inconsistency in the vtime
interface, where it does not align with what the syscall-time interface
provides.

The issue was noticed via inconsistencies with filesystem timestamps,
but the patch does not change anything to do with filesystem timestamp
behavior.
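
For readers who have not seen the pattern, the retry loop the patch adds to vtime() is a seqlock read. Below is a minimal userspace sketch of the idea in C11. It is not the kernel implementation: struct gtod_sketch, sketch_write(), and sketch_read_sec() are names invented here, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's vsyscall gtod data. */
struct gtod_sketch {
	atomic_uint seq;		/* odd while an update is in progress */
	atomic_long wall_time_sec;	/* tick-granular seconds value */
};

/* Writer: make seq odd, publish the new value, make seq even again. */
void sketch_write(struct gtod_sketch *g, long sec)
{
	unsigned s = atomic_load_explicit(&g->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* single writer assumed */
	atomic_store_explicit(&g->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&g->wall_time_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&g->seq, s + 2, memory_order_release);
}

/* Reader: retry until the value was read with no update in between. */
long sketch_read_sec(struct gtod_sketch *g)
{
	unsigned start;
	long result;

	do {
		/* Wait out any in-progress update (odd sequence). */
		do {
			start = atomic_load_explicit(&g->seq,
						     memory_order_acquire);
		} while (start & 1);

		result = atomic_load_explicit(&g->wall_time_sec,
					      memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&g->seq, memory_order_relaxed) != start);

	return result;
}
```

The reader never blocks the writer; it only pays for a retry when an update happens to land in the middle of its read, which is why this is cheap enough for the vtime() fast path.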


> > This is why the fix I'm proposing is important: Filesystem timestamps
> > have always been tick granular, so when vtime() was made clocksource
> > granular (by using vgettime internally) we broke the historic
> > expectation that the time() interface could be interleaved with
> > filesystem operations.
> >
> > Side note: For full nanosecond resolution of the tick-granular
> > timestamps, check out the clock_gettime(CLOCK_REALTIME_COARSE, ...)
> > interface.
> >
> >
> > > So why can't we fix the vgettimeofday() vs. create() inconsistency?
> > > This is just a question; I don't intend to disagree with you.
> >
> > The only way to make gettimeofday() and create() consistent is to use
> > gettimeofday()-style clocksource-resolution timestamps for files. This,
> > however, could cause a large performance hit, since every file
> > timestamp would require a possibly expensive read of the clocksource.
>
> Why is reading the clocksource so slow? Here is the current
> implementation of the TSC clocksource's ->read method:
>
>
> static cycle_t read_tsc(struct clocksource *cs)
> {
> 	cycle_t ret = (cycle_t)get_cycles();
>
> 	return ret >= clocksource_tsc.cycle_last ?
> 		ret : clocksource_tsc.cycle_last;
> }
>
> That means the difference is almost nothing but a single rdtsc.

Sure, for hardware that can use the TSC clocksource, it is fairly cheap.
However, there are numerous systems that cannot use the TSC (or
architectures that don't have a fast TSC-like counter), and in those
cases a read can take more than a microsecond.

Even with the TSC, the multiplication required to convert to nanoseconds
adds extra overhead that isn't seen when using the pre-calculated
tick-granular current_kernel_time() value.
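
As a rough illustration of that conversion: a cycle count is scaled to nanoseconds with a multiply and a shift, (cycles * mult) >> shift. The sketch below mirrors that arithmetic in userspace. sketch_calc_mult(), sketch_cyc2ns(), and the shift value used in the usage note are assumptions for the example, not the kernel's actual calibration code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: pick mult so that (cycles * mult) >> shift
 * yields nanoseconds for a counter running at freq_hz.
 */
uint64_t sketch_calc_mult(uint64_t freq_hz, unsigned shift)
{
	assert(freq_hz != 0 && shift < 32);
	return (NSEC_PER_SEC << shift) / freq_hz;
}

/*
 * One multiply and one shift per conversion. Sketch only: the
 * product overflows 64 bits for large cycle deltas.
 */
uint64_t sketch_cyc2ns(uint64_t cycles, uint64_t mult, unsigned shift)
{
	return (cycles * mult) >> shift;
}
```

For a 2 GHz counter and a shift of 22, sketch_calc_mult() gives 2^21, so 2000 cycles convert to 1000 ns. The tick-granular path skips this work entirely because it returns a value that was already computed at the last tick.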

It may not seem like much, but with filesystems each small delay adds
up.

I'm not a filesystems guy, and maybe there are some filesystems that
really want very fine-grained timestamps. If so, they can consider
switching from using current_kernel_time() to getnstimeofday(). But due
to the likely performance impact, it's not something I'd suggest doing.

thanks
-john


From: john stultz on
On Thu, 2010-07-15 at 10:51 +0900, KOSAKI Motohiro wrote:
> > On Wed, 2010-07-14 at 11:40 +0900, KOSAKI Motohiro wrote:
> > > Hi
> > >
> > > > Due to vtime calling vgettimeofday(), it's possible that an application
> > > > could call time(); create("stuff", O_RDWR); only to see the file's
> > > > creation timestamp be earlier than the value returned by time().
> > >
> > > Just a dumb question.
> > >
> > > Most applications use gettimeofday() instead of time(). That means
> > > your fix doesn't help most applications.
> >
> > Correct, filesystem timestamps and gettimeofday can still seem
> > inconsistently ordered. But that is expected.
> >
> > Because of granularity differences (one interface is only tick
> > resolution, the other is clocksource resolution), we can't interleave
> > the two interfaces (time and gettimeofday, respectively) and expect to
> > get ordered results.
>
> hmmm...
> Yes, interleaving time() with gettimeofday() makes no sense; nobody wants
> that. But I don't understand why we can ignore gettimeofday() vs. file
> timestamps.

Oh.. and another bit worth mentioning again:
clock_gettime(CLOCK_REALTIME_COARSE, ...) provides tick-granular output
that should be able to be correctly interleaved with filesystem
timestamps.

So if there's an application that is using gettimeofday() for logging
and having problems trying to map the log timestamps to filesystem
timestamps, it can use clock_gettime(CLOCK_REALTIME_COARSE, ...) to do
so correctly.
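
A rough sketch of what that could look like from userspace, assuming Linux and a local filesystem under /tmp. The function name sketch_stamp_then_create() and the temp-file template are invented for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for CLOCK_REALTIME_COARSE and mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a tick-granular timestamp, create a
 * file, and report both that timestamp and the file's mtime.
 * Returns 0 on success, -1 on failure.
 */
int sketch_stamp_then_create(struct timespec *before, struct timespec *mtime)
{
	char path[] = "/tmp/coarse-sketch-XXXXXX";
	struct stat st;
	int fd, rc;

	assert(before && mtime);

	if (clock_gettime(CLOCK_REALTIME_COARSE, before) != 0)
		return -1;

	fd = mkstemp(path);	/* creates the file, stamping it */
	if (fd < 0)
		return -1;

	rc = fstat(fd, &st);
	close(fd);
	unlink(path);		/* clean up the temporary file */
	if (rc != 0)
		return -1;

	*mtime = st.st_mtim;
	return 0;
}
```

With this ordering the file's mtime should not be earlier than the logged timestamp. Filesystems that truncate timestamps to coarser units (whole seconds, say) can still make the file look slightly older within that unit, so compare at the filesystem's own granularity.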

thanks
-john

