From: Ed W

>> Although section 3 of RFC 5681 is a great text, it does not say at all
>> that increasing the initial CWND would lead to fairness issues.
>>
> Because that is only one side of the coin; conservatively probing the
> available link capacity in the presence of n other simultaneously probing
> TCP/SCTP/DCCP instances is the other.
>

So let's define the problem more succinctly:
- New TCP connections are assumed to have no knowledge of current
network conditions (bah)
- We want the connection to consume as much bandwidth as possible,
while staying just fractionally under the available link capacity

> Currently I know of no working approach that, without active network
> feedback, conservatively probes the available link capacity with a
> high CWND. I am curious about any future trends.
>

Sounds like smarter people than I have played this game, but just to
chuck out one idea: how about attacking the assumption that we have no
knowledge of network conditions? After all, we do have a fair amount of
information:

1) very good information about the size of the link to the first hop
(e.g. the rate reported by the modem/network card)
2) often a reasonably good idea of the bandwidth to the first
"restrictive" router along our default path (i.e. usually there is a
pool of high-speed network locally, then more limited connectivity
between our network and other networks; we can look at the maximum
flows through our network device to destinations outside our subnet
and infer an approximate link speed from that)
3) often moderate-quality information about the size of the link between
us and a specific destination IP

So here goes: the heuristic could be to examine current flows through
our interface, use those to offer the remote end a hint during the SYN
handshake as to a recommended starting size, and additionally the client
side can examine the RTT implied by the SYN/ACK to further fine-tune the
initial cwnd?
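
To make that slightly more concrete, here is a purely illustrative
userspace sketch of how such a hint might be computed from an estimated
bottleneck bandwidth and the SYN/ACK RTT. None of this corresponds to
any existing kernel interface; the divisor and the clamp values are
made-up assumptions, with RFC 3390's 3 segments just used as the floor:

#include <stdio.h>

/*
 * Illustrative only: derive an initial cwnd (in segments) from a rough
 * bandwidth-delay product estimate, then back well away from it.
 */
static unsigned int guess_initcwnd(double est_bw_bps,   /* estimated bottleneck bandwidth, bit/s */
                                   double synack_rtt_s, /* RTT implied by the SYN/ACK, in seconds */
                                   unsigned int mss)    /* segment size in bytes */
{
    double bdp_bytes = est_bw_bps / 8.0 * synack_rtt_s;
    unsigned int segs = (unsigned int)(bdp_bytes / mss);

    segs /= 4;                  /* stay well under the estimated BDP */
    if (segs < 3)
        segs = 3;               /* RFC 3390's 3 segments as the floor */
    if (segs > 16)
        segs = 16;              /* arbitrary safety cap */
    return segs;
}

int main(void)
{
    /* e.g. a 2 Mbit/s link with a 600 ms SYN/ACK round trip */
    printf("suggested initial cwnd: %u segments\n",
           guess_initcwnd(2e6, 0.600, 1460));
    return 0;
}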

In practice this could be implemented in other ways, such as examining
recent TCP congestion windows and using some heuristic to start "near"
those, or remembering the congestion windows recently used for popular
destinations. We can also be kind to the receiver of our data: if we see
some app open up 16 HTTP connections to some poor server, then some of
those connections will NOT be given a large initial cwnd.
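
(As an aside, and someone please correct me if I am mis-remembering: the
kernel already does something a little like the "remember the
destination" variant - tcp_update_metrics() stashes values such as
ssthresh and cwnd in the destination/route cache when a connection ends,
and there is a net.ipv4.tcp_no_metrics_save sysctl to turn that off - so
a per-destination starting hint would not be completely alien to the
stack.)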

Essentially perhaps we can refine our initial cwnd heuristic somewhat if
we assume better than zero knowledge about the network link?


Out of curiosity, why has it taken so long for active feedback to
appear? If every router simply added a hint to the packet as to the max
bandwidth it can offer then we would appear to be able to make massively
better decisions on window sizes. Furthermore routers have the ability
to put backpressure on classes of traffic as appropriate. I guess the
speed at which ECN has been adopted answers the question of why nothing
more exotic has appeared?

>> But for all we know this side discussion about initial CWND settings
>> could have nothing to do with the issue being reported at the start of
>> this thread. :-)
>>

Actually the original question was mine, and it was literally: can I
adjust the initial cwnd for users of my very specific satellite network,
which has a high RTT? I believe Stephen Hemminger has recently been kind
enough to add the facility to experiment with this to the ip utility,
so I am now in a position to go and do some testing - thanks, Stephen.
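
(For anyone else wanting to poke at this: the knob is the per-route
"initcwnd" parameter in iproute2, so - from memory, please check
ip-route(8) for the exact syntax your version supports - something along
the lines of:

  ip route change default via <gateway> dev eth0 initcwnd 10

where the gateway and device names are obviously placeholders for your
own setup.)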


Cheers

Ed W
From: Ed W

> Are you citing "An Argument for Increasing TCP's Initial Congestion Window"?
> People at Google stated that a CWND of 10 seems to be fair in their
> measurements. 10 because the test setup was equipped with a reasonably large
> link capacity? Do they analyse their modification in environments with a small
> BDP (e.g. a multihop MANET setup, ...)? I am curious, but we will see what
> happens if TCPM adopts this.
>

Well, I personally would shoot for starting from the position of
assuming better than zero knowledge about our link and incorporating
that into the initial cwnd estimate...

We know something about the RTT from the SYN/ACK times and the speed of
the local link, we will quickly learn about median window sizes to other
destinations, and additionally the kernel has some knowledge of other
connections currently in progress. With all that information perhaps we
can make a more informed choice than just a hard-coded magic number? (Oh,
and let's make the option pluggable so that we can soon have 10 different
kernel options...)

There seems to be evidence that networks are starting to cluster into
groups that would benefit from quite different cwnd choices (higher/lower)
- perhaps there is a reasonable heuristic that could identify these
clusters and pick a better starting value?
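
To put some entirely made-up, back-of-envelope numbers on that: on a
2 Mbit/s satellite path with a 600 ms RTT the BDP is roughly
2e6/8 * 0.6 ~= 150 kB, i.e. about 100 segments of 1460 bytes, so an
initial cwnd of 3 costs five or six RTTs (roughly 3 seconds) of slow
start just to open the window up to the pipe size; on a 100 Mbit/s,
1 ms LAN path the BDP is under 10 segments and a bigger initial cwnd
buys essentially nothing. Those two cases really do want different
starting values.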

Cheers

Ed W


From: Ed W
On 15/07/2010 00:01, Hagen Paul Pfeifer wrote:
> It is quite late here so I will quickly write two sentences about ECN: a
> month ago Lars Eggert posted a link on the tcpm mailing list where Google (not
> really sure if it was Google) analysed the deployment of ECN - the usage was
> really low. Search for the PDF, it is quite an interesting one.
>

I would speculate that this is because there is a big warning on ECN
saying that it may cause you to lose customers who can't connect to
you... Businesses are driven by the need to support the most common case,
not the most optimal (witness the pain of HTML development and needing
to consider IE6...)

What would be more useful is for Google to survey how many devices are
unable to interoperate with ECN; if that number turned out to be
extremely low, and the fact were advertised, then I suspect we might
see a mass increase in its deployment? I know I have it turned off on
all my servers because I worry more about losing one customer than about
improving the experience for all customers...

Cheers

Ed W
From: Hagen Paul Pfeifer
* Ed W | 2010-07-14 23:52:02 [+0100]:

>Out of curiosity, why has it taken so long for active feedback to
>appear? If every router simply added a hint to the packet as to the
>max bandwidth it can offer then we would appear to be able to make
>massively better decisions on window sizes. Furthermore routers have
>the ability to put backpressure on classes of traffic as appropriate.
>I guess the speed at which ECN has been adopted answers the question
>of why nothing more exotic has appeared?

It is quite late here so I will quickly write two sentences about ECN: a
month ago Lars Eggert posted a link on the tcpm mailing list where Google (not
really sure if it was Google) analysed the deployment of ECN - the usage was
really low. Search for the PDF, it is quite an interesting one.

Hagen


From: Bill Fink
On Wed, 14 Jul 2010, David Miller wrote:

> From: Bill Davidsen <davidsen(a)tmr.com>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
> > You may have to go into /proc/sys/net/core and crank up the
> > rmem_* settings, depending on your distribution.
>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup. If you do, it's a
> bug, report it so we can fix it.
>
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
>
> For one thing, TCP dynamically adjusts the socket buffer sizes based
> upon the behavior of traffic on the connection.
>
> And the TCP memory limit sysctls (not the core socket ones) are sized
> based upon available memory. They are there to protect you from
> situations such as having so much memory dedicated to socket buffers
> that there is none left to do other things effectively. It's a
> protective limit, rather than a setting meant to increase or improve
> performance. So like the others, leave these alone too.
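
(For concreteness: the autotuning knobs being defended here are the
net.ipv4.tcp_rmem/tcp_wmem min/default/max triplets plus the
net.ipv4.tcp_mem protective limit; the /proc/sys/net/core
rmem_max/wmem_max values only cap what an application may ask for
explicitly via setsockopt(SO_RCVBUF/SO_SNDBUF), which is what nuttcp's
-w option ends up using below.)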

What's normal? :-)

netem1% cat /proc/version
Linux version 2.6.30.10-105.2.23.fc11.x86_64 (mockbuild(a)x86-01.phx2.fedoraproject.org) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP Thu Feb 11 07:06:34 UTC 2010

Linux TCP autotuning across an 80 ms RTT cross country network path:

netem1% nuttcp -T10 -i1 192.168.1.18
14.1875 MB / 1.00 sec = 119.0115 Mbps 0 retrans
558.0000 MB / 1.00 sec = 4680.7169 Mbps 0 retrans
872.8750 MB / 1.00 sec = 7322.3527 Mbps 0 retrans
869.6875 MB / 1.00 sec = 7295.5478 Mbps 0 retrans
858.4375 MB / 1.00 sec = 7201.0165 Mbps 0 retrans
857.3750 MB / 1.00 sec = 7192.2116 Mbps 0 retrans
865.5625 MB / 1.00 sec = 7260.7193 Mbps 0 retrans
872.3750 MB / 1.00 sec = 7318.2095 Mbps 0 retrans
862.7500 MB / 1.00 sec = 7237.2571 Mbps 0 retrans
857.6250 MB / 1.00 sec = 7194.1864 Mbps 0 retrans

7504.2771 MB / 10.09 sec = 6236.5068 Mbps 11 %TX 25 %RX 0 retrans 80.59 msRTT

Manually specified 100 MB TCP socket buffer on the same path:

netem1% nuttcp -T10 -i1 -w100m 192.168.1.18
106.8125 MB / 1.00 sec = 895.9598 Mbps 0 retrans
1092.0625 MB / 1.00 sec = 9160.3254 Mbps 0 retrans
1111.2500 MB / 1.00 sec = 9322.6424 Mbps 0 retrans
1115.4375 MB / 1.00 sec = 9356.2569 Mbps 0 retrans
1116.4375 MB / 1.00 sec = 9365.6937 Mbps 0 retrans
1115.3125 MB / 1.00 sec = 9356.2749 Mbps 0 retrans
1121.2500 MB / 1.00 sec = 9405.6233 Mbps 0 retrans
1125.5625 MB / 1.00 sec = 9441.6949 Mbps 0 retrans
1130.0000 MB / 1.00 sec = 9478.7479 Mbps 0 retrans
1139.0625 MB / 1.00 sec = 9555.8559 Mbps 0 retrans

10258.5120 MB / 10.20 sec = 8440.3558 Mbps 15 %TX 40 %RX 0 retrans 80.59 msRTT

The manually selected TCP socket buffer size both ramps up more quickly
and achieves a much higher steady-state rate.
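
(For what it's worth, 100 MB is not a magic number: the bandwidth-delay
product here is roughly 9.4 Gbit/s * 80.59 ms ~= 95 MB, so a ~100 MB
socket buffer is about the minimum that lets a single stream keep that
path full - and handing a buffer that size to setsockopt() does of
course require the net.core rmem_max/wmem_max caps to have been raised
accordingly.)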

-Bill