From: Wenji Wu on
> That yield() will need to be removed - yield()'s behaviour is truly
> awful if the system is otherwise busy. What is it there for?

Please read the uploaded paper, which has a detailed description.

thanks,

wenji

----- Original Message -----
From: Andrew Morton <akpm(a)osdl.org>
Date: Wednesday, November 29, 2006 7:08 pm
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linux TCP

> On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> David Miller <davem(a)davemloft.net> wrote:
>
> >
> > Please, it is very difficult to review your work the way you have
> > submitted this patch as a set of 4 patches. These patches have not
> > been split up "logically", but rather they have been split up "per
> > file" with the same exact changelog message in each patch posting.
> > This is very clumsy, and impossible to review, and wastes a lot of
> > mailing list bandwidth.
> >
> > We have an excellent file, called
> Documentation/SubmittingPatches, in
> > the kernel source tree, which explains exactly how to do this
> > correctly.
> >
> > By splitting your patch into 4 patches, one for each file touched,
> > it is impossible to review your patch as a logical whole.
> >
> > Please also provide your patch inline so people can just hit reply
> > in their mail reader client to quote your patch and comment on it.
> > This is impossible with the attachments you've used.
> >
>
> Here you go - joined up, cleaned up, ported to mainline and
> test-compiled.
> That yield() will need to be removed - yield()'s behaviour is truly
> awful if the system is otherwise busy. What is it there for?
>
>
>
> From: Wenji Wu <wenji(a)fnal.gov>
>
> In Linux TCP, when a network application calls tcp_recvmsg() to move
> data from the socket's receive buffer to user space, the socket is
> locked. During this period, all incoming packets for that TCP socket
> go to the backlog queue without being TCP processed.
>
> Since Linux 2.6 tasks can be preempted mid-execution, if the network
> application's timeslice expires and the task is moved to the expired
> array with the socket still locked, the packets in the backlog queue
> will not be TCP processed until the network application resumes
> execution. If the system is heavily loaded, TCP can easily RTO on the
> sender side.
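
For reference, the locking/backlog interplay the changelog describes is
roughly this (simplified from 2.6-era net/ipv4/tcp_ipv4.c; a sketch for
illustration, not part of the patch):

	/* Softirq receive path, simplified: while a process owns the
	 * socket lock, incoming segments are not TCP-processed but
	 * parked on sk->backlog.  release_sock() later drains that
	 * backlog in process context. */
	bh_lock_sock(sk);
	if (!sock_owned_by_user(sk))
		ret = tcp_v4_do_rcv(sk, skb);	/* normal TCP processing */
	else
		sk_add_backlog(sk, skb);	/* wait for release_sock() */
	bh_unlock_sock(sk);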
>
>
>
> include/linux/sched.h |    2 ++
> kernel/fork.c         |    3 +++
> kernel/sched.c        |   24 ++++++++++++++++++------
> net/ipv4/tcp.c        |    9 +++++++++
> 4 files changed, 32 insertions(+), 6 deletions(-)
>
> diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c~tcp-speedup
> +++ a/net/ipv4/tcp.c
> @@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru
> struct task_struct *user_recv = NULL;
> int copied_early = 0;
>
> + current->backlog_flag = 1;
> +
> lock_sock(sk);
>
> TCP_CHECK_TIMER(sk);
> @@ -1468,6 +1470,13 @@ skip_copy:
>
> TCP_CHECK_TIMER(sk);
> release_sock(sk);
> +
> + current->backlog_flag = 0;
> + if (current->extrarun_flag == 1){
> + current->extrarun_flag = 0;
> + yield();
> + }
> +
> return copied;
>
> out:
> diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h
> --- a/include/linux/sched.h~tcp-speedup
> +++ a/include/linux/sched.h
> @@ -1023,6 +1023,8 @@ struct task_struct {
> #ifdef CONFIG_TASK_DELAY_ACCT
> struct task_delay_info *delays;
> #endif
> + int backlog_flag; /* packets wait in tcp backlog queue flag */
> + int extrarun_flag; /* extra run flag for TCP performance */
> };
>
> static inline pid_t process_group(struct task_struct *tsk)
> diff -puN kernel/sched.c~tcp-speedup kernel/sched.c
> --- a/kernel/sched.c~tcp-speedup
> +++ a/kernel/sched.c
> @@ -3099,12 +3099,24 @@ void scheduler_tick(void)
>
> if (!rq->expired_timestamp)
> rq->expired_timestamp = jiffies;
> - if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> - enqueue_task(p, rq->expired);
> - if (p->static_prio < rq->best_expired_prio)
> - rq->best_expired_prio = p->static_prio;
> - } else
> - enqueue_task(p, rq->active);
> + if (p->backlog_flag == 0) {
> + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> + enqueue_task(p, rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->static_prio;
> + } else
> + enqueue_task(p, rq->active);
> + } else {
> + if (expired_starving(rq)) {
> + enqueue_task(p,rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->static_prio;
> + } else {
> + if (!TASK_INTERACTIVE(p))
> + p->extrarun_flag = 1;
> + enqueue_task(p,rq->active);
> + }
> + }
> } else {
> /*
> * Prevent a too long timeslice allowing a task to monopolize
> diff -puN kernel/fork.c~tcp-speedup kernel/fork.c
> --- a/kernel/fork.c~tcp-speedup
> +++ a/kernel/fork.c
> @@ -1032,6 +1032,9 @@ static struct task_struct *copy_process(
> clear_tsk_thread_flag(p, TIF_SIGPENDING);
> init_sigpending(&p->pending);
>
> + p->backlog_flag = 0;
> + p->extrarun_flag = 0;
> +
> p->utime = cputime_zero;
> p->stime = cputime_zero;
> p->sched_time = 0;
> _
>
>

From: David Miller on
From: Wenji Wu <wenji(a)fnal.gov>
Date: Wed, 29 Nov 2006 19:56:58 -0600

> >We could also pepper tcp_recvmsg() with some very carefully placed
> >preemption disable/enable calls to deal with this even with
> >CONFIG_PREEMPT enabled.
>
> I also thought about this approach. But since the "problem" happens in
> the 2.6 Desktop and Low-latency Desktop kernels (not the Server
> kernel), where system responsiveness is a key feature, simply placing
> preemption disable/enable calls might not work. If you want to place
> preemption disable/enable calls within tcp_recvmsg(), you have to put
> them at the very beginning and end of the call. Disabling preemption
> would degrade system responsiveness.

We can make explicit preemption checks in the main loop of
tcp_recvmsg(), and release the socket and run the backlog if
need_resched() is TRUE.

This is the simplest and most elegant solution to this problem.

The ones suggested in your patch and paper are way overkill; there is
no reason to solve a TCP-specific problem inside of the generic
scheduler.
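
A minimal sketch of what such a check might look like inside
tcp_recvmsg()'s main receive loop (placement and details are
illustrative, not an actual patch; release_sock() is what drains the
backlog):

	/* Sketch: drop the socket lock when a reschedule is pending.
	 * release_sock() processes sk->backlog in process context, so
	 * queued segments get TCP processing before this task can be
	 * preempted while holding the lock. */
	if (need_resched()) {
		release_sock(sk);	/* drains the backlog queue */
		cond_resched();		/* give other tasks the CPU */
		lock_sock(sk);		/* re-acquire and continue */
	}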
From: Mike Galbraith on
On Wed, 2006-11-29 at 17:08 -0800, Andrew Morton wrote:
> + if (p->backlog_flag == 0) {
> + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> + enqueue_task(p, rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->static_prio;
> + } else
> + enqueue_task(p, rq->active);
> + } else {
> + if (expired_starving(rq)) {
> + enqueue_task(p,rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->static_prio;
> + } else {
> + if (!TASK_INTERACTIVE(p))
> + p->extrarun_flag = 1;
> + enqueue_task(p,rq->active);
> + }
> + }

(oh my, doing that to the scheduler upsets my tummy, but that aside...)

I don't see how that can really solve anything. "Interactive" tasks
starting to use cpu heftily can still preempt and keep the special cased
cpu hog off the cpu for ages. It also only takes one task in the
expired array to trigger the forced array switch with a fully loaded
cpu, and once any task hits the expired array, a stream of wakeups can
prevent the switch from completing for as long as you can keep wakeups
happening.
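
The forced array switch in question looks roughly like this (simplified
from the 2.6-era schedule() in kernel/sched.c, shown for illustration):

	/* When the active array runs empty, active and expired swap.
	 * A steady stream of wakeups keeps nr_active nonzero, so a
	 * task parked in the expired array can wait a long time for
	 * this switch to happen. */
	array = rq->active;
	if (unlikely(!array->nr_active)) {
		/* Switch the active and expired arrays. */
		rq->active = rq->expired;
		rq->expired = array;
		array = rq->active;
		rq->expired_timestamp = 0;
		rq->best_expired_prio = MAX_PRIO;
	}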

-Mike

From: Ingo Molnar on

* David Miller <davem(a)davemloft.net> wrote:

> We can make explicitl preemption checks in the main loop of
> tcp_recvmsg(), and release the socket and run the backlog if
> need_resched() is TRUE.
>
> This is the simplest and most elegant solution to this problem.

yeah, i like this one. If the problem is "too long locked section", then
the most natural solution is to "break up the lock", not to "boost the
priority of the lock-holding task" (which is what the proposed patch
does).

[ Also note that "sprinkle the code with preempt_disable()" kind of
solutions, besides hurting interactivity, are also a pain to resolve
in something like PREEMPT_RT. (unlike say a spinlock,
preempt_disable() is quite opaque in what data structure it protects,
etc., making it hard to convert it to a preemptible primitive) ]

> The one suggested in your patch and paper are way overkill, there is
> no reason to solve a TCP specific problem inside of the generic
> scheduler.

agreed.

What we could also add is a /reverse/ mechanism to the scheduler: a task
could query whether it has just a small amount of time left in its
timeslice, and could in that case voluntarily drop its current lock and
yield, and thus give up its current timeslice and wait for a new, full
timeslice, instead of being forcibly preempted due to lack of timeslices
with a possibly critical lock still held.
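
Such an interface might look something like the sketch below; the
task_timeslice_remaining() helper and the threshold are invented here
for illustration, and no such scheduler API exists:

	/* Hypothetical only -- this API does not exist.  A task in a
	 * long locked section could check how much of its timeslice
	 * remains and, if nearly none, drop the lock and yield
	 * voluntarily, waiting for a new, full timeslice instead of
	 * being preempted with the lock still held. */
	if (task_timeslice_remaining(current) < MIN_SAFE_TICKS) {
		release_sock(sk);	/* drop the critical lock */
		yield();		/* give up the leftover slice */
		lock_sock(sk);		/* fresh slice; re-acquire */
	}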

But the suggested solution here, to "prolong the running of this task
just a little bit longer", only starts a perpetual arms race between
users of such a facility and other kernel subsystems. (besides not being
adequate anyway, there can always be /so/ long lock-hold times that the
scheduler would have no other option but to preempt the task)

Ingo
From: Ingo Molnar on

* Wenji Wu <wenji(a)fnal.gov> wrote:

> > That yield() will need to be removed - yield()'s behaviour is truly
> > awful if the system is otherwise busy. What is it there for?
>
> Please read the uploaded paper, which has a detailed description.

do you have any URL for that?

Ingo