From: Oleg Nesterov on
Hello.

Another stupid question about the trivial problem I am going to ask,
just to report the authoritative answer back to bugzilla. The problem
is, personally I am not sure we should/can add the user-visible change
required by glibc maintainers, and I am in no position to suggest them
to fix the user-space code instead.


In short, glibc developers believe that sys_futex(ts) is buggy and
needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
ts->tv_sec < 0 and the timeout is absolute.

Ignoring the possible cleanups/microoptimizations, something like this:

--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
cmd == FUTEX_WAIT_REQUEUE_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
+
+ // absolute timeout
+ if (cmd != FUTEX_WAIT) {
+ if (ts->tv_nsec >= NSEC_PER_SEC)
+ return -EINVAL;
+ if (ts->tv_sec < 0)
+ return -ETIMEDOUT;
+ }
+
+
if (!timespec_valid(&ts))
return -EINVAL;

------------------------------------------------------------------------

Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
forever if ts->tv_sec < 0.

To clarify: this depends on libc version and arch.

This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
roughly does:

for (;;) {
if (fast_path_succeeds(rwlock))
return 0;

if (ts->tv_nsec >= NSEC_PER_SEC)
return EINVAL;

errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
if (errcode == ETIMEDOUT)
return ETIMEDOUT;
}

and since the kernel return EINVAL due to !timespec_valid(ts), the
code above loops forever.

(btw, we have same problem with EFAULT, and this is considered as
a caller's problem).

IOW, pthread_rwlock_timedwrlock() assumes that in this case
sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.



So, the question: do you think we can change sys_futex() to make
glibc happy?

Or, do you think it is user-space who should check tv_sec < 0 if
it wants ETIMEDOUT with the negative timeout ?

Thanks,

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ulrich Drepper on
----- "Darren Hart" <dvhltc(a)us.ibm.com> wrote:
> Unless there is some good reason to object to breaking the API that I
> am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> seems more intuitive to me).

It's only not intuitive because Oleg misrepresented or at least didn't describe the issue.

The kernel already catches invalid timespec values. Unfortunately the code used comes from the time when all timeouts where specified with relative values. In such situations negative tv_sec values were in fact invalid and rejected with EINVAL.

But for absolute timeouts tv_sec = -1 means a time before Epoch. This is not an invalid value, it just is one of many points in time which have passed and therefore the kernel has to respond with ETIMEDOUT.

This is no semantic change or anything like that. It pure and simply a bug fix. When Thomas worked on that come we simply missed updating the test for invalid timespec values.

The kernel code should be fixed to always check tv_nsec for < 0 and > 1000000000. But the tv_sec test for < 0 should be skipped if the timeout value is interpreted as an absolute time value.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Darren Hart on
On 06/25/2010 12:20 PM, Oleg Nesterov wrote:
> Hello.
>

Hi Oleg,

> Another stupid question about the trivial problem I am going to ask,
> just to report the authoritative answer back to bugzilla. The problem
> is, personally I am not sure we should/can add the user-visible change
> required by glibc maintainers, and I am in no position to suggest them
> to fix the user-space code instead.
>
> In short, glibc developers believe that sys_futex(ts) is buggy and
> needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
> ts->tv_sec< 0 and the timeout is absolute.
>

Just a question of semantics I guess. Seems reasonable to me to call a
negative timeout invalid. However, I certainly don't feel strongly
enough about it to fight for it. Glibc is the principle user of
sys_futex(). While there are certainly other users out there (Mathieu
Desnoyers' Userspace RCU comes to mind), I doubt any of them depend on
-EINVAL for negative timeouts to function properly.

Unless there is some good reason to object to breaking the API that I am
missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL seems
more intuitive to me).

--
Darren "Little Fish" Hart

> Ignoring the possible cleanups/microoptimizations, something like this:
>
> --- x/kernel/futex.c
> +++ x/kernel/futex.c
> @@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
> cmd == FUTEX_WAIT_REQUEUE_PI)) {
> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
> return -EFAULT;
> +
> + // absolute timeout
> + if (cmd != FUTEX_WAIT) {
> + if (ts->tv_nsec>= NSEC_PER_SEC)
> + return -EINVAL;
> + if (ts->tv_sec< 0)
> + return -ETIMEDOUT;
> + }
> +
> +
> if (!timespec_valid(&ts))
> return -EINVAL;
>
> ------------------------------------------------------------------------
>
> Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
> forever if ts->tv_sec< 0.
>
> To clarify: this depends on libc version and arch.
>
> This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
> roughly does:
>
> for (;;) {
> if (fast_path_succeeds(rwlock))
> return 0;
>
> if (ts->tv_nsec>= NSEC_PER_SEC)
> return EINVAL;
>
> errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
> if (errcode == ETIMEDOUT)
> return ETIMEDOUT;
> }
>
> and since the kernel return EINVAL due to !timespec_valid(ts), the
> code above loops forever.
>
> (btw, we have same problem with EFAULT, and this is considered as
> a caller's problem).
>
> IOW, pthread_rwlock_timedwrlock() assumes that in this case
> sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
> I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.
>
>
>
> So, the question: do you think we can change sys_futex() to make
> glibc happy?
>
> Or, do you think it is user-space who should check tv_sec< 0 if
> it wants ETIMEDOUT with the negative timeout ?
>
> Thanks,
>
> Oleg.
>


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mathieu Desnoyers on
* Darren Hart (dvhltc(a)us.ibm.com) wrote:
> On 06/25/2010 12:20 PM, Oleg Nesterov wrote:
>> Hello.
>>
>
> Hi Oleg,
>
>> Another stupid question about the trivial problem I am going to ask,
>> just to report the authoritative answer back to bugzilla. The problem
>> is, personally I am not sure we should/can add the user-visible change
>> required by glibc maintainers, and I am in no position to suggest them
>> to fix the user-space code instead.
>>
>> In short, glibc developers believe that sys_futex(ts) is buggy and
>> needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
>> ts->tv_sec< 0 and the timeout is absolute.
>>
>
> Just a question of semantics I guess. Seems reasonable to me to call a
> negative timeout invalid. However, I certainly don't feel strongly
> enough about it to fight for it. Glibc is the principle user of
> sys_futex(). While there are certainly other users out there (Mathieu
> Desnoyers' Userspace RCU comes to mind), I doubt any of them depend on
> -EINVAL for negative timeouts to function properly.

Userspace RCU does not use futex timeouts (the parameter is always NULL). So
this change/fix won't have any effect as far as urcu is concerned.

Thanks,

Mathieu

>
> Unless there is some good reason to object to breaking the API that I am
> missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL seems
> more intuitive to me).
>
> --
> Darren "Little Fish" Hart
>
>> Ignoring the possible cleanups/microoptimizations, something like this:
>>
>> --- x/kernel/futex.c
>> +++ x/kernel/futex.c
>> @@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
>> cmd == FUTEX_WAIT_REQUEUE_PI)) {
>> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
>> return -EFAULT;
>> +
>> + // absolute timeout
>> + if (cmd != FUTEX_WAIT) {
>> + if (ts->tv_nsec>= NSEC_PER_SEC)
>> + return -EINVAL;
>> + if (ts->tv_sec< 0)
>> + return -ETIMEDOUT;
>> + }
>> +
>> +
>> if (!timespec_valid(&ts))
>> return -EINVAL;
>>
>> ------------------------------------------------------------------------
>>
>> Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
>> forever if ts->tv_sec< 0.
>>
>> To clarify: this depends on libc version and arch.
>>
>> This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
>> roughly does:
>>
>> for (;;) {
>> if (fast_path_succeeds(rwlock))
>> return 0;
>>
>> if (ts->tv_nsec>= NSEC_PER_SEC)
>> return EINVAL;
>>
>> errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
>> if (errcode == ETIMEDOUT)
>> return ETIMEDOUT;
>> }
>>
>> and since the kernel return EINVAL due to !timespec_valid(ts), the
>> code above loops forever.
>>
>> (btw, we have same problem with EFAULT, and this is considered as
>> a caller's problem).
>>
>> IOW, pthread_rwlock_timedwrlock() assumes that in this case
>> sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
>> I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.
>>
>>
>>
>> So, the question: do you think we can change sys_futex() to make
>> glibc happy?
>>
>> Or, do you think it is user-space who should check tv_sec< 0 if
>> it wants ETIMEDOUT with the negative timeout ?
>>
>> Thanks,
>>
>> Oleg.
>>
>
>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ulrich Drepper on
----- "Thomas Gleixner" <tglx(a)linutronix.de> wrote:
> tv->sec < 0 is definitely an invalid value for both CLOCK_REALTIME
> and CLOCK_MONOTONIC.

CLOCK_MONOTONIC is different but it's wrong for CLOCK_REALTIME. Why would it be invalid? Because times before Epoch will not be used? By that logic you would have to declare all values before Linus' first running kernel as invalid. None of this makes sense.

The tv_sec in timespec is of type time_t and for absolute time values the same semantics as for naked time_t values applies. The absolute time is

epoch + tv_sec + tv_nsec / 1000000000

If tv_sec is negative these are values before epoch.

If there are other interfaces with absolute timeouts they certainly should be changed as well.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/