From: Thomas Gleixner on
On Fri, 25 Jun 2010, Ulrich Drepper wrote:

Can you please use a mail client with proper line breaks ?

> ----- "Darren Hart" <dvhltc(a)us.ibm.com> wrote:
> > Unless there is some good reason to object to breaking the API that I
> > am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> > seems more intuitive to me).
>
> It's only not intuitive because Oleg misrepresented or at least
> didn't describe the issue.

> The kernel already catches invalid timespec values. Unfortunately
> the code used comes from the time when all timeouts where specified
> with relative values. In such situations negative tv_sec values
> were in fact invalid and rejected with EINVAL.

> But for absolute timeouts tv_sec = -1 means a time before Epoch.
> This is not an invalid value, it just is one of many points in time
> which have passed and therefore the kernel has to respond with
> ETIMEDOUT.

That's simply wrong.

... or the TIMER_ABSTIME flag was specified in flags and the rqtp
argument is outside the range for the clock specified by clock_id;

And I consider anything before the EPOCH or before the computer booted
outside of the range. Simply because that's outside the range which we
can read back from the clock, out of the range to which we can set the
clock.

And it's completely illogical to treat relative and abolute timeouts
different. If we'd accept that before the EPOCH or before the computer
started is valid for ABSTIME, then there is no freaking reason to
treat relative timeouts any different.

> This is no semantic change or anything like that. It pure and
> simply a bug fix. When Thomas worked on that come we simply missed
> updating the test for invalid timespec values.

No, that's how we treat every damned timespec in the
syscalls. clock_nanosleep(ABSTIME) has this behaviour forever and we
have this behaviour in sys_futex since we merged PI futex support way
before we added the BITSET stuff.

So just because you messed up your glibc implementation you want us to
fix glibc in the kernel based on some backwards arguments ?

> The kernel code should be fixed to always check tv_nsec for < 0 and
> > 1000000000. But the tv_sec test for < 0 should be skipped if the
> timeout value is interpreted as an absolute time value.

Definitely NOT!

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Gleixner on
On Fri, 25 Jun 2010, Ulrich Drepper wrote:

> ----- "Thomas Gleixner" <tglx(a)linutronix.de> wrote:
> > tv->sec < 0 is definitely an invalid value for both CLOCK_REALTIME
> > and CLOCK_MONOTONIC.

> CLOCK_MONOTONIC is different but it's wrong for CLOCK_REALTIME.

Why is CLOCK_MONOTONIC any different ? According to you argumentation
it's just _BEFORE_ we started the system.

> Why would it be invalid? Because times before Epoch will not be
> used? By that logic you would have to declare all values before
> Linus' first running kernel as invalid. None of this makes sense.

That's utter bullshit and you know that. Every system which does not
have an RTC starts with the EPOCH and the range check is correct in
the terms of the specification:

Outside the valid range of the clock_id.

The kernel treats the range valid which it can theoretically hand back
to user space:

EPOCH <= CLOCK_REALTIME <= Y2038_or_far_away

0 <= CLOCK_MONOTONIC <= far_away

> The tv_sec in timespec is of type time_t and for absolute time
> values the same semantics as for naked time_t values applies. The
> absolute time is
> epoch + tv_sec + tv_nsec / 1000000000
>
> If tv_sec is negative these are values before epoch.
>
> If there are other interfaces with absolute timeouts they certainly
> should be changed as well.

If we'd change that, then we'd need to allow negative relative
timeouts as well and not restrict it to CLOCK_REALTIME.

If we'd change the notion of "valid" then we do it in a consistent
way for everything not just for a corner case of some glibc wreckage.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 06/25, Ulrich Drepper wrote:
>
> ----- "Darren Hart" <dvhltc(a)us.ibm.com> wrote:
> > Unless there is some good reason to object to breaking the API that I
> > am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> > seems more intuitive to me).
>
> It's only not intuitive because Oleg misrepresented or at least didn't
> describe the issue.
> ...
> It pure and simply
> a bug fix.

Because personally I disagree that sys_futex()->timespec_valid() is buggy.

I repeated this many times during the previous discussion. I didn't even
try to judge if it is really right or not, because my opinion doesn't matter
at all here.

But it is unfair (imho) to state this code is buggy. The code is correct
even if it does not match your expectations, it works as expected/designed.
And, sys_futex() does this since 2006 iirc.


Honestly, it looks a bit strange to me that you blame the correct code,
and at the same time you ignore the test-case which hangs because the
kernel returns -EFAULT saying that this is the caller's problem.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jakub Jelinek on
On Mon, Jun 28, 2010 at 03:58:25PM +0200, Oleg Nesterov wrote:
> Honestly, it looks a bit strange to me that you blame the correct code,

Whether that is correct or not is what is being disputed.

> and at the same time you ignore the test-case which hangs because the
> kernel returns -EFAULT saying that this is the caller's problem.

The userspace code reads the abstime->tv_nsec value, so if it wouldn't
be valid address, the code would already segfault. And that's fine, POSIX
certainly allows that, reporting EFAULT isn't required. Well, it doesn't
read abstime->tv_sec in the assembly version, so if you try hard, you can
avoid the segfault, yet get EFAULT from futex syscall by putting abstime
8 bytes before start of some page with previous page not mmapped.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 06/28, Jakub Jelinek wrote:
>
> On Mon, Jun 28, 2010 at 03:58:25PM +0200, Oleg Nesterov wrote:
> > Honestly, it looks a bit strange to me that you blame the correct code,
>
> Whether that is correct or not is what is being disputed.

OK. I only argued with the "buggy" term. Once again, the code works
as expected.

> > and at the same time you ignore the test-case which hangs because the
> > kernel returns -EFAULT saying that this is the caller's problem.
>
> The userspace code reads the abstime->tv_nsec value, so if it wouldn't
> be valid address, the code would already segfault. And that's fine, POSIX
> certainly allows that, reporting EFAULT isn't required. Well, it doesn't
> read abstime->tv_sec in the assembly version, so if you try hard, you can
> avoid the segfault, yet get EFAULT from futex syscall by putting abstime
> 8 bytes before start of some page with previous page not mmapped.

And this is exactly what I did to prove that (in my opinion) libc needs
fixes anyway, even if we change the kernel to treat tv_sec < 0 specially.

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>

pthread_rwlock_t rwlock;

static struct timespec *make_efault_ts(void)
{
int page_size = sysconf(_SC_PAGESIZE);
void *ptr = mmap(0, 2 * page_size, PROT_READ|PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
munmap(ptr, page_size);
return ptr + page_size - sizeof(long);
}

static void *thread_func(void *arg)
{
int ret = pthread_rwlock_timedwrlock(&rwlock, make_efault_ts());
printf("lock: ret=%d %m\n", ret);
return NULL;
}

int main(int argv, char *argc[])
{
pthread_t tid;

pthread_rwlock_init(&rwlock, NULL);
pthread_rwlock_wrlock(&rwlock);

pthread_create(&tid, NULL, thread_func, NULL);
pthread_join(tid, NULL);

return 0;
}

It may hang or segfault on your machine, this depends on libc version.
It hangs on the testing machine which also suffers from the reported
timespec_valid() issue. I did this test-case looking at
"objdump -d /lib64/libpthread.so".

To me, this looks like a bug in libc, but I won't insist.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/