- Potential performance bottleneck for Linxu TCP [Kernel]

Prev: Add IDE mode support for SB600 SATA
Next: 2.6 driver for Silan SC92031 (second try)

From: Martin Bligh on 29 Nov 2006 18:40

Wenji Wu wrote:
> From: Wenji Wu <wenji(a)fnal.gov>
>
> Greetings,
>
> For Linux TCP, when the network applcaiton make system call to move data
> from
> socket's receive buffer to user space by calling tcp_recvmsg(). The socket
> will
> be locked. During the period, all the incoming packet for the TCP socket
> will go
> to the backlog queue without being TCP processed. Since Linux 2.6 can be
> inerrupted mid-task, if the network application expires, and moved to the
> expired array with the socket locked, all the packets within the backlog
> queue
> will not be TCP processed till the network applicaton resume its execution.
> If
> the system is heavily loaded, TCP can easily RTO in the Sender Side.

So how much difference did this patch actually make, and to what
benchmark?

> The patch is for Linux kernel 2.6.14 Deskop and Low-latency Desktop

The patch oesn't seem to be attached? Also, would be better to make
it for the latest kernel version (2.6.19) ... 2.6.14 is rather old ;-)

M
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Miller on 29 Nov 2006 20:00

Please, it is very difficult to review your work the way you have
submitted this patch as a set of 4 patches. These patches have not
been split up "logically", but rather they have been split up "per
file" with the same exact changelog message in each patch posting.
This is very clumsy, and impossible to review, and wastes a lot of
mailing list bandwith.

We have an excellent file, called Documentation/SubmittingPatches, in
the kernel source tree, which explains exactly how to do this
correctly.

By splitting your patch into 4 patches, one for each file touched,
it is impossible to review your patch as a logical whole.

Please also provide your patch inline so people can just hit reply
in their mail reader client to quote your patch and comment on it.
This is impossible with the attachments you've used.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andrew Morton on 29 Nov 2006 20:10

On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
David Miller <davem(a)davemloft.net> wrote:

>
> Please, it is very difficult to review your work the way you have
> submitted this patch as a set of 4 patches. These patches have not
> been split up "logically", but rather they have been split up "per
> file" with the same exact changelog message in each patch posting.
> This is very clumsy, and impossible to review, and wastes a lot of
> mailing list bandwith.
>
> We have an excellent file, called Documentation/SubmittingPatches, in
> the kernel source tree, which explains exactly how to do this
> correctly.
>
> By splitting your patch into 4 patches, one for each file touched,
> it is impossible to review your patch as a logical whole.
>
> Please also provide your patch inline so people can just hit reply
> in their mail reader client to quote your patch and comment on it.
> This is impossible with the attachments you've used.
>

Here you go - joined up, cleaned up, ported to mainline and test-compiled.

That yield() will need to be removed - yield()'s behaviour is truly awful
if the system is otherwise busy. What is it there for?

From: Wenji Wu <wenji(a)fnal.gov>

For Linux TCP, when the network applcaiton make system call to move data from
socket's receive buffer to user space by calling tcp_recvmsg(). The socket
will be locked. During this period, all the incoming packet for the TCP
socket will go to the backlog queue without being TCP processed

Since Linux 2.6 can be inerrupted mid-task, if the network application
expires, and moved to the expired array with the socket locked, all the
packets within the backlog queue will not be TCP processed till the network
applicaton resume its execution. If the system is heavily loaded, TCP can
easily RTO in the Sender Side.

include/linux/sched.h | 2 ++
kernel/fork.c | 3 +++
kernel/sched.c | 24 ++++++++++++++++++------
net/ipv4/tcp.c | 9 +++++++++
4 files changed, 32 insertions(+), 6 deletions(-)

diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c
--- a/net/ipv4/tcp.c~tcp-speedup
+++ a/net/ipv4/tcp.c
@@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru
struct task_struct *user_recv = NULL;
int copied_early = 0;

+ current->backlog_flag = 1;
+
lock_sock(sk);

TCP_CHECK_TIMER(sk);
@@ -1468,6 +1470,13 @@ skip_copy:

TCP_CHECK_TIMER(sk);
release_sock(sk);
+
+ current->backlog_flag = 0;
+ if (current->extrarun_flag == 1){
+ current->extrarun_flag = 0;
+ yield();
+ }
+
return copied;

out:
diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h
--- a/include/linux/sched.h~tcp-speedup
+++ a/include/linux/sched.h
@@ -1023,6 +1023,8 @@ struct task_struct {
#ifdef CONFIG_TASK_DELAY_ACCT
struct task_delay_info *delays;
#endif
+ int backlog_flag; /* packets wait in tcp backlog queue flag */
+ int extrarun_flag; /* extra run flag for TCP performance */
};

static inline pid_t process_group(struct task_struct *tsk)
diff -puN kernel/sched.c~tcp-speedup kernel/sched.c
--- a/kernel/sched.c~tcp-speedup
+++ a/kernel/sched.c
@@ -3099,12 +3099,24 @@ void scheduler_tick(void)

if (!rq->expired_timestamp)
rq->expired_timestamp = jiffies;
- if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
- enqueue_task(p, rq->expired);
- if (p->static_prio < rq->best_expired_prio)
- rq->best_expired_prio = p->static_prio;
- } else
- enqueue_task(p, rq->active);
+ if (p->backlog_flag == 0) {
+ if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
+ enqueue_task(p, rq->expired);
+ if (p->static_prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->static_prio;
+ } else
+ enqueue_task(p, rq->active);
+ } else {
+ if (expired_starving(rq)) {
+ enqueue_task(p,rq->expired);
+ if (p->static_prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->static_prio;
+ } else {
+ if (!TASK_INTERACTIVE(p))
+ p->extrarun_flag = 1;
+ enqueue_task(p,rq->active);
+ }
+ }
} else {
/*
* Prevent a too long timeslice allowing a task to monopolize
diff -puN kernel/fork.c~tcp-speedup kernel/fork.c
--- a/kernel/fork.c~tcp-speedup
+++ a/kernel/fork.c
@@ -1032,6 +1032,9 @@ static struct task_struct *copy_process(
clear_tsk_thread_flag(p, TIF_SIGPENDING);
init_sigpending(&p->pending);

+ p->backlog_flag = 0;
+ p->extrarun_flag = 0;
+
p->utime = cputime_zero;
p->stime = cputime_zero;
p->sched_time = 0;
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Miller on 29 Nov 2006 20:20

From: Andrew Morton <akpm(a)osdl.org>
Date: Wed, 29 Nov 2006 17:08:35 -0800

> On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> David Miller <davem(a)davemloft.net> wrote:
>
> >
> > Please, it is very difficult to review your work the way you have
> > submitted this patch as a set of 4 patches. These patches have not
> > been split up "logically", but rather they have been split up "per
> > file" with the same exact changelog message in each patch posting.
> > This is very clumsy, and impossible to review, and wastes a lot of
> > mailing list bandwith.
> >
> > We have an excellent file, called Documentation/SubmittingPatches, in
> > the kernel source tree, which explains exactly how to do this
> > correctly.
> >
> > By splitting your patch into 4 patches, one for each file touched,
> > it is impossible to review your patch as a logical whole.
> >
> > Please also provide your patch inline so people can just hit reply
> > in their mail reader client to quote your patch and comment on it.
> > This is impossible with the attachments you've used.
> >
>
> Here you go - joined up, cleaned up, ported to mainline and test-compiled.
>
> That yield() will need to be removed - yield()'s behaviour is truly awful
> if the system is otherwise busy. What is it there for?

What about simply turning off CONFIG_PREEMPT to fix this "problem"?

We always properly run the backlog (by doing a release_sock()) before
going to sleep otherwise except for the specific case of taking a page
fault during the copy to userspace. It is only CONFIG_PREEMPT that
can cause this situation to occur in other circumstances as far as I
can see.

We could also pepper tcp_recvmsg() with some very carefully placed
preemption disable/enable calls to deal with this even with
CONFIG_PREEMPT enabled.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Wenji Wu on 29 Nov 2006 21:00

Yes, when CONFIG_PREEMPT is disabled, the "problem" won't happen. That is why I put "for 2.6 desktop, low-latency desktop" in the uploaded paper. This "problem" happens in the 2.6 Desktop and Low-latency Desktop.

>We could also pepper tcp_recvmsg() with some very carefully placed preemption disable/enable calls to deal with this even with CONFIG_PREEMPT enabled.

I also think about this approach. But since the "problem" happens in the 2.6 Desktop and Low-latency Desktop (not server), system responsiveness is a key feature, simply placing preemption disabled/enable call might not work. If you want to place preemption disable/enable calls within tcp_recvmsg, you have to put them in the very beginning and end of the call. Disabling preemption would degrade system responsiveness.

wenji

----- Original Message -----
From: David Miller <davem(a)davemloft.net>
Date: Wednesday, November 29, 2006 7:13 pm
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

> From: Andrew Morton <akpm(a)osdl.org>
> Date: Wed, 29 Nov 2006 17:08:35 -0800
>
> > On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> > David Miller <davem(a)davemloft.net> wrote:
> >
> > >
> > > Please, it is very difficult to review your work the way you have
> > > submitted this patch as a set of 4 patches. These patches have
> not> > been split up "logically", but rather they have been split
> up "per
> > > file" with the same exact changelog message in each patch posting.
> > > This is very clumsy, and impossible to review, and wastes a lot of
> > > mailing list bandwith.
> > >
> > > We have an excellent file, called
> Documentation/SubmittingPatches, in
> > > the kernel source tree, which explains exactly how to do this
> > > correctly.
> > >
> > > By splitting your patch into 4 patches, one for each file touched,
> > > it is impossible to review your patch as a logical whole.
> > >
> > > Please also provide your patch inline so people can just hit reply
> > > in their mail reader client to quote your patch and comment on it.
> > > This is impossible with the attachments you've used.
> > >
> >
> > Here you go - joined up, cleaned up, ported to mainline and test-
> compiled.>
> > That yield() will need to be removed - yield()'s behaviour is
> truly awful
> > if the system is otherwise busy. What is it there for?
>
> What about simply turning off CONFIG_PREEMPT to fix this "problem"?
>
> We always properly run the backlog (by doing a release_sock()) before
> going to sleep otherwise except for the specific case of taking a page
> fault during the copy to userspace. It is only CONFIG_PREEMPT that
> can cause this situation to occur in other circumstances as far as I
> can see.
>
> We could also pepper tcp_recvmsg() with some very carefully placed
> preemption disable/enable calls to deal with this even with
> CONFIG_PREEMPT enabled.
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2 3 4 5 6 7
Prev: Add IDE mode support for SB600 SATA
Next: 2.6 driver for Silan SC92031 (second try)