From: Ed W on

> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)
>

My guess is that this result is specific to Google and their servers?

I guess we can probably stereotype the world into two pools of devices:

1) Devices in a pool of fast networking, but connected to the rest of
the world through a relatively slow router
2) Devices connected via a high-speed network, where the bottleneck device
is usually many hops down the line and well away from us

I'm thinking here of 1) client users behind broadband routers, wireless,
3G, dialup, etc., and 2) public servers that have obviously been
deliberately placed in locations with high levels of interconnectivity.

I think history information could be more useful for clients in category
1) because there is a much higher probability that their most restrictive
device is one hop away, and hence affects all connections, and only
occasionally is the bottleneck multiple hops away. For devices in category
2) it's much harder, because the restriction will usually be many hops away
and effectively you are trying to figure out and cache the speed of every
ADSL router out there... For sure you can probably figure out how to
cluster this stuff and say that pool there is 56K dialup, that pool there
is "broadband", that pool is cell phone, etc., but it's probably hard to do
better than that?

So my guess is that this is why Google has had poor results investigating
cwnd caching?

However, I would suggest that whilst it's of little value for the server
side, it still remains a very interesting idea for the client side, where
the cache hit ratio would seem to be dramatically higher?


I haven't studied the code, but given that there is a userspace ability to
change the initial cwnd through the ip utility, it would seem likely that
relatively little coding would now be required to implement some kind of
limited cwnd caching and experiment with whether it is a valuable
addition? I would have thought that if you are only fiddling with devices
behind a broadband router then there is little chance of you "crashing
the internet" with these kinds of experiments?
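Something along these lines is roughly what I have in mind for the client
side. Reading the learned cwnd back via TCP_INFO is real, but the cache
file path and the record_cwnd() helper below are just placeholders I made
up, and feeding the value back in (e.g. via "ip route ... initcwnd") is
left as an exercise:

/*
 * Sketch only: after a transfer completes, record the cwnd the
 * connection converged on, so a later run can seed the route with
 * "ip route change ... initcwnd <n>".
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>        /* struct tcp_info, TCP_INFO */
#include <arpa/inet.h>

/* Placeholder: append "<peer address> <cwnd>" to a local cache file. */
static void record_cwnd(const struct sockaddr_in *peer, unsigned int cwnd)
{
        FILE *f = fopen("/var/cache/cwnd-cache", "a");

        if (!f)
                return;
        fprintf(f, "%s %u\n", inet_ntoa(peer->sin_addr), cwnd);
        fclose(f);
}

/* Call on a connected TCP socket just before closing it. */
void cache_learned_cwnd(int fd)
{
        struct tcp_info ti;
        socklen_t ti_len = sizeof(ti);
        struct sockaddr_in peer;
        socklen_t peer_len = sizeof(peer);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &ti_len) < 0)
                return;
        if (getpeername(fd, (struct sockaddr *)&peer, &peer_len) < 0)
                return;

        /* tcpi_snd_cwnd is in segments, the same unit initcwnd uses. */
        record_cwnd(&peer, ti.tcpi_snd_cwnd);
}

The experiment would then simply compare transfer times with and without
the cached value applied to the relevant route.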

Good luck

Ed W
From: H.K. Jerry Chu on
On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus(a)ducksong.com> wrote:
> On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>> except there are indeed bugs in the code today in that the
>> code in various places assumes initcwnd as per RFC3390. So when
>> initcwnd is raised, that actual value may be limited unnecessarily by
>> the initial wmem/sk_sndbuf.
>
> Thanks for the discussion!
>
> can you tell us more about the impl concerns of initcwnd stored on the
> route?

We have found two issues when altering initcwnd through the ip route cmd:
1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which defaults
to a small value of 16KB); see the rough arithmetic sketch below. This
problem has been obscured by the TSO code, which fudges the flow control
limit (and could be a bug by itself).

2. The congestion backoff code is supposed to use the amount of data in
flight, rather than cwnd, but initcwnd presents a special case. I don't yet
understand the code well enough to propose a fix.
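
Rough arithmetic for issue 1 (the 2x per-segment charge below is only my
stand-in for the skb truesize overhead, so treat the exact numbers
loosely):

#include <stdio.h>

int main(void)
{
        unsigned int tcp_wmem_default = 16 * 1024; /* tcp_wmem[1] */
        unsigned int mss = 1460;
        unsigned int per_seg_charge = 2 * mss;     /* assumed skb overhead */
        unsigned int route_initcwnd = 16;          /* e.g. set via ip route */
        unsigned int max_queued = tcp_wmem_default / per_seg_charge;

        /* With these assumptions only ~5 segments fit in the default
         * sndbuf, so a route initcwnd of 16 cannot take full effect. */
        printf("sndbuf admits ~%u segments, initcwnd asked for %u\n",
               max_queued, route_initcwnd);
        return 0;
}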

>
> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)

This is partly due to our load balancer policy resulting in a poor cache
hit rate, and partly due to the sheer volume of remote clients. Some of my
colleagues tried to change the host cache to a /24 subnet cache, but the
result wasn't that good either (sorry, I don't remember all the details).

>
> article and slides much appreciated and very interesting. I've long been
> of the opinion that the downsides of being too aggressive once in a
> while aren't all that serious anymore.. as someone else said in a
> non-reservation world you are always trying to predict the future anyhow
> and therefore overflowing a queue is always possible no matter how
> conservative.

Please voice your support to TCPM then :)

Jerry

From: H.K. Jerry Chu on
On Fri, Jul 16, 2010 at 10:41 AM, Ed W <lists(a)wildgooses.com> wrote:
>
>> and while I'm asking for info, can you expand on the conclusion
>> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
>> only read the slides.. maybe the paper has more info?)
>>
>
> My guess is that this result is specific to google and their servers?
>
> I guess we can probably stereotype the world into two pools of devices:
>
> 1) Devices in a pool of fast networking, but connected to the rest of the
> world through a relatively slow router
> 2) Devices connected via a high speed network and largely the bottleneck
> device is many hops down the line and well away from us
>
> I'm thinking here 1) client users behind broadband routers, wireless, 3G,
> dialup, etc and 2) public servers that have obviously been deliberately
> placed in locations with high levels of interconnectivity.
>
> I think history information could be more useful for clients in category 1)
> because there is a much higher probability that their most restrictive
> device is one hop away and hence affects all connections and relatively
> occasionally the bottleneck is multiple hops away. For devices in category
> 2) it's much harder because the restriction will usually be lots of hops
> away and effectively you are trying to figure out and cache the speed of
> every ADSL router out there... For sure you can probably figure out how to
> cluster this stuff and say that pool there is 56K dialup, that pool there is
> "broadband", that pool is cell phone, etc, but probably it's hard to do
> better than that?
>
> So my guess is this is why google have had poor results investigating cwnd
> caching?

Actually we have investigated two types of caches: a short-history,
limited-size internal cache that is subject to an LRU replacement policy,
which greatly limits the cache hit rate, and a long-history external cache,
which provides much more accurate per-subnet cwnd history but comes with
high complexity and a deployment headache.
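
To give a feel for the internal cache, here is a purely illustrative
sketch (not our actual code): a tiny set-associative table keyed by /24
prefix, with LRU replacement inside each set, which is exactly where the
hit rate suffers once the client population gets large.

#include <stdint.h>

#define CWND_CACHE_SETS 512
#define CWND_CACHE_WAYS 2

struct cwnd_entry {
        uint32_t subnet;        /* IPv4 address masked to /24, host order */
        uint32_t cwnd;          /* last learned cwnd, in segments */
        uint64_t last_used;     /* LRU stamp */
};

static struct cwnd_entry cache[CWND_CACHE_SETS][CWND_CACHE_WAYS];
static uint64_t lru_clock;

static struct cwnd_entry *find_set(uint32_t addr)
{
        return cache[(addr >> 8) % CWND_CACHE_SETS];
}

/* Return the cached cwnd for addr's /24 subnet, or 0 on a miss. */
uint32_t cwnd_cache_get(uint32_t addr)
{
        struct cwnd_entry *set = find_set(addr);
        uint32_t subnet = addr & 0xffffff00u;
        int i;

        for (i = 0; i < CWND_CACHE_WAYS; i++) {
                if (set[i].cwnd && set[i].subnet == subnet) {
                        set[i].last_used = ++lru_clock;
                        return set[i].cwnd;
                }
        }
        return 0;       /* miss: fall back to the default initcwnd */
}

/* Insert or update, evicting the least recently used entry in the set. */
void cwnd_cache_put(uint32_t addr, uint32_t cwnd)
{
        struct cwnd_entry *set = find_set(addr);
        struct cwnd_entry *victim = &set[0];
        uint32_t subnet = addr & 0xffffff00u;
        int i;

        for (i = 0; i < CWND_CACHE_WAYS; i++) {
                if (set[i].cwnd && set[i].subnet == subnet) {
                        victim = &set[i];       /* update in place */
                        break;
                }
                if (set[i].last_used < victim->last_used)
                        victim = &set[i];
        }
        victim->subnet = subnet;
        victim->cwnd = cwnd;
        victim->last_used = ++lru_clock;
}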

Also, we have set ourselves a much more ambitious goal: not just to speed
up our own services, but also to provide a solution that could benefit the
whole web (see http://code.google.com/speed/index.html). The latter pretty
much precludes the complex external cache scheme mentioned above.

Jerry

From: Rick Jones on
H.K. Jerry Chu wrote:
> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus(a)ducksong.com> wrote:
>>can you tell us more about the impl concerns of initcwnd stored on the
>>route?
>
>
> We have found two issues when altering initcwnd through the ip route cmd:
> 1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which is
> defaulted to a small value of 16KB). This problem has been obscured
> by the TSO code, which fudges the flow control limit (and could be a bug by
> itself).

I'll ask my Emily Litella question of the day and inquire as to why that would
be unique to altering initcwnd via the route?

The slightly less Emily Litella-esque question is why an application that
knows it wants to send more than 16K at one time wouldn't have either asked
via its install docs to have the minimum tweaked (certainly if one is
already tweaking routes...), or "gone all the way" and made an explicit
setsockopt(SO_SNDBUF) call? We are in a realm of applications for which
there was a proposal to allow them to pick their own initcwnd, right?
Having them pick an SO_SNDBUF size would seem to be no more to ask.
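
i.e. something as simple as this, where 256K is just an arbitrary example
value, not a recommendation:

#include <sys/socket.h>

int request_bigger_sndbuf(int fd)
{
        int sndbuf = 256 * 1024;

        /* The kernel doubles this internally for bookkeeping overhead and
         * caps it at net.core.wmem_max (SO_SNDBUFFORCE can exceed that
         * for privileged apps). */
        return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
}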

rick jones

sendbuf_init = max(tcp_mem,initcwnd)?
From: H.K. Jerry Chu on
On Mon, Jul 19, 2010 at 10:08 AM, Rick Jones <rick.jones2(a)hp.com> wrote:
> H.K. Jerry Chu wrote:
>>
>> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus(a)ducksong.com>
>> wrote:
>>>
>>> can you tell us more about the impl concerns of initcwnd stored on the
>>> route?
>>
>>
>> We have found two issues when altering initcwnd through the ip route cmd:
>> 1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which is
>> defaulted to a small value of 16KB). This problem has been obscured
>> by the TSO code, which fudges the flow control limit (and could be a bug
>> by
>> itself).
>
> I'll ask my Emily Litella question of the day and inquire as to why that
> would be unique to altering initcwnd via the route?
>
> The slightly less Emily Litella-esque question is why an application with a
> desire to know it could send more than 16K at one time wouldn't have either
> asked via its install docs to have the minimum tweaked (certainly if one is
> already tweaking routes...), or "gone all the way" and made an explicit
> setsockopt(SO_SNDBUF) call? We are in a realm of applications for which
> there was a proposal to allow them to pick their own initcwnd right? Having

Per-app setting of initcwnd is just one case. Another is per-route setting
of initcwnd through the ip route cmd. For the latter, the initcwnd change
is more or less supposed to be transparent to apps.

This wasn't a big issue and can probably be fixed easily by initializing
sk_sndbuf to max(tcp_wmem[1], initcwnd), as you alluded to below. It is
just that our experiments were hindered by this little bug, and we weren't
aware of it sooner because the TSO code fudges sndbuf.
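
Not an actual patch, but the shape of the fix I mean, with the route's
initcwnd converted to bytes and a guessed 2x allowance for skb overhead
(the real factor would need to be worked out):

static inline unsigned int initial_sndbuf(unsigned int wmem_default,
                                          unsigned int initcwnd,
                                          unsigned int mss)
{
        /* 2x is an assumed allowance for skb overhead, not a kernel
         * constant. */
        unsigned int cwnd_bytes = initcwnd * mss * 2;

        return cwnd_bytes > wmem_default ? cwnd_bytes : wmem_default;
}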

Jerry

> them pick an SO_SNDBUF size would seem to be no more to ask.
>
> rick jones
>
> sendbuf_init = max(tcp_mem,initcwnd)?
>