From: Peer-Joachim Koch on
Hi,

I'm currently trying to reduce our backup time. We have to transfer
~250 GB/day (ranging from 50 GB to 2000 GB).
We tried connecting one of the file servers through bonding with 2 unused
network interfaces to improve performance, but we see only
a small improvement (5-10%; the average is currently 60 MB/s).
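For scale, the daily volumes above translate into roughly the following transfer windows at the current average rate. This is a back-of-the-envelope sketch that assumes decimal units (1 GB = 1000 MB) and a constant rate. The function name is only illustrative.

```python
# Rough backup-window estimate from the figures in the post.
# Assumptions: decimal units (1 GB = 1000 MB) and a constant transfer rate.
def hours_needed(volume_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move volume_gb gigabytes at rate_mb_s megabytes per second."""
    return volume_gb * 1000 / rate_mb_s / 3600

for volume_gb in (50, 250, 2000):  # low end, typical value and high end from the post
    print(f"{volume_gb:5d} GB at 60 MB/s: {hours_needed(volume_gb, 60):4.1f} h")
```

At the 2000 GB end, the transfer alone takes over nine hours. That is why the throughput of a single stream matters so much here.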

So I started experimenting a little on our cluster, where a couple
of nodes have bonded interfaces. Using netcat I can transfer a
670 MB file in 5.9 s, but that is only about the theoretical maximum
for a single 1 Gbit link.
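To put the netcat result in perspective, the arithmetic below converts it to Mbit/s, assuming decimal megabytes. At roughly 908 Mbit/s, the single transfer is already close to the practical limit of one Gigabit link once Ethernet, IP and TCP header overhead is counted.

```python
# Throughput of the netcat test: 670 MB in 5.9 s (figures from the post).
# Assumption: "MB" means 10**6 bytes; with MiB the result shifts by about 5%.
size_bytes = 670 * 10**6
seconds = 5.9

mb_per_s = size_bytes / seconds / 10**6        # megabytes per second
mbit_per_s = size_bytes * 8 / seconds / 10**6  # megabits per second
link_mbit = 1000                               # one Gigabit Ethernet link

print(f"{mb_per_s:.1f} MB/s = {mbit_per_s:.0f} Mbit/s "
      f"({mbit_per_s / link_mbit:.0%} of one 1 Gbit/s link)")
```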

Using scp I see only ~45 MB/s (it's an 8-core node, so the CPU
should be fast enough ...).

Is there anything that would improve these values substantially?
Is it in principle impossible to get beyond the speed of a single link?

Any hints are welcome!


Bye, Peer
-------
OS: Novell SLES9 SP3, AMD64
Network: E1000 (driver from SLES)
Bonding (round-robin) <-> Alcatel OS6800 using static link aggregation

Cluster nodes have 16 GB RAM / 8 cores.
Basic tuning using sysctl has been done:
net.ipv4.tcp_westwood = 0
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 10485760 10485760 10485760
net.ipv4.tcp_wmem = 10485760 10485760 10485760
net.ipv4.tcp_mem = 10485760 10485760 10485760
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
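Some context on the buffer settings above:

- `tcp_rmem` and `tcp_wmem` are minimum/default/maximum socket buffer sizes in bytes.
- `tcp_mem` is counted in pages. With the 4 KiB pages used on AMD64, 10485760 pages is 40 GiB, which is far more than the 16 GB of RAM in these nodes.

A bandwidth-delay product (BDP) estimate shows how much buffer one 1 Gbit/s connection needs to keep the link full. The round-trip times below are assumed example values, not measurements from this network.

```python
# Bandwidth-delay product: bytes that must be in flight to keep a link busy.
# The RTT values are assumed examples for a LAN and a WAN path.
def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return link_bits_per_s * rtt_s / 8

LINK = 1e9                    # one Gigabit Ethernet link, bits/s
CONFIGURED_BUFFER = 10485760  # tcp_rmem/tcp_wmem value from the post (10 MiB)

for label, rtt in [("LAN, 0.2 ms", 0.0002), ("WAN, 20 ms", 0.020)]:
    need = bdp_bytes(LINK, rtt)
    print(f"{label}: BDP = {need / 1024:.0f} KiB, "
          f"configured buffer is {CONFIGURED_BUFFER / need:.0f}x that")
```

On a LAN, the configured buffers are already far larger than one link can use. Larger buffers are therefore unlikely to lift a single connection past the single-link limit.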
From: David Schwartz on
On Apr 15, 11:25 pm, Peer-Joachim Koch <pk...(a)bgc-jena.mpg.de> wrote:

> I'm currently trying to reduce our backup time. We have to transfer
> ~250 GB/day (ranging from 50 GB to 2000 GB).
> We tried connecting one of the file servers through bonding with 2 unused
> network interfaces to improve performance, but we see only
> a small improvement (5-10%; the average is currently 60 MB/s).

Most bonding implementations assign a connection to a link, so a
single connection will not run any faster.
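In hash-based bonding modes, the outgoing link is chosen from a hash of packet header fields. The sketch below shows why this pins a connection to one link: every packet of a TCP connection carries the same addresses and ports, so it always hashes to the same link. The formula is a simplified stand-in inspired by the Linux bonding driver's `xmit_hash_policy=layer3+4`, not its actual code, and the addresses and ports are hypothetical.

```python
import ipaddress

def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """Choose an outgoing link from a connection's addresses and ports.

    Simplified layer3+4-style hash: the inputs are identical for every packet
    of one TCP connection, so the result never changes for that connection.
    """
    ip_bits = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_bits & 0xFFFF)) % n_links

# One connection: the same tuple for every one of its packets.
one_flow = {pick_link("10.0.0.1", "10.0.0.2", 40000, 1500, 2) for _ in range(1000)}
print("links used by one connection:", one_flow)

# Eight connections with different source ports can spread over both links.
many_flows = {pick_link("10.0.0.1", "10.0.0.2", port, 1500, 2)
              for port in range(40000, 40008)}
print("links used by eight connections:", many_flows)
```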

> So I started experimenting a little on our cluster, where a couple
> of nodes have bonded interfaces. Using netcat I can transfer a
> 670 MB file in 5.9 s, but that is only about the theoretical maximum
> for a single 1 Gbit link.

Yep, that's what I would expect.

> Using scp I see only ~45 MB/s (it's an 8-core node, so the CPU
> should be fast enough ...).

Since SCP is not multi-threaded, the transfer speed is limited by how
fast one core can do encryption. Last I checked, there was a huge
speed difference between various ciphers, so you might try a few to
see which is fastest.
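One rough way to follow this advice is to time the same copy with each cipher, using scp's `-c` option. Several assumptions apply:

- The test file at `/tmp/testfile` and the host `backuphost` are hypothetical.
- The cipher names are examples from recent OpenSSH releases. Older releases, such as the one shipped with SLES9, lack some of them.
- Newer OpenSSH versions list the supported ciphers with `ssh -Q cipher`.

```python
import os
import subprocess
import time

# Hypothetical values: adjust the test file, the target and the cipher list.
SRC = "/tmp/testfile"
DEST = "backuphost:/tmp/"
CIPHERS = ["aes128-ctr", "aes256-ctr", "aes128-gcm@openssh.com",
           "chacha20-poly1305@openssh.com"]

def scp_command(cipher: str, src: str = SRC, dest: str = DEST) -> list[str]:
    """Build an scp invocation that forces a specific cipher with -c (quiet mode -q)."""
    return ["scp", "-q", "-c", cipher, src, dest]

def time_cipher(cipher: str) -> float:
    """Run one copy with the given cipher and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(scp_command(cipher), check=True)
    return time.perf_counter() - start

def main() -> None:
    if not os.path.exists(SRC):
        print(f"create a test file first, e.g. {SRC}")
        return
    for cipher in CIPHERS:
        try:
            print(f"{cipher:32s} {time_cipher(cipher):6.1f} s")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print(f"{cipher:32s} failed (cipher unsupported or scp missing?)")

if __name__ == "__main__":
    main()
```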

DS
From: Peer-Joachim Koch on
David Schwartz wrote:
> On Apr 15, 11:25 pm, Peer-Joachim Koch <pk...(a)bgc-jena.mpg.de> wrote:
>
>> I'm currently trying to reduce our backup time. We have to transfer
>> ~250 GB/day (ranging from 50 GB to 2000 GB).
>> We tried connecting one of the file servers through bonding with 2 unused
>> network interfaces to improve performance, but we see only
>> a small improvement (5-10%; the average is currently 60 MB/s).
>
> Most bonding implementations assign a connection to a link, so a
> single connection will not run any faster.
So there is no way to benefit from a trunk for a single application?
Only multi-threaded applications or many individual applications can use the higher bandwidth.

The main problem is our TSM backup, so I'll have to see how I can
set it up to use more individual threads, or use LAN-free backup ...


Thanks, Peer
From: David Schwartz on
On Apr 16, 1:07 am, Peer-Joachim Koch <pk...(a)bgc-jena.mpg.de> wrote:

> > Most bonding implementations assign a connection to a link, so a
> > single connection will not run any faster.

> So there is no way to benefit from a trunk for a single application?

No, just no way for a single TCP connection to benefit. The exact
details, and possible workarounds, depend on the exact trunking
implementation.

> Only multi-threaded applications or many individual applications can use the higher bandwidth.

In some implementations, it must be many different destination MAC
addresses. In some it must be multiple destination IP addresses. In
some implementations, it depends whether the interface is bridging or
routing.
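The choice of hash inputs matters most for routed traffic. When a server sends to several hosts behind one router, every packet carries the router's MAC as its destination MAC. A MAC-based hash then puts all of those flows on one link, whereas an IP-based hash can spread them.

Both functions below are simplified stand-ins inspired by the Linux bonding driver's `xmit_hash_policy` options, not its actual code. All addresses are hypothetical.

```python
import ipaddress

def _last_byte(mac: str) -> int:
    """Last octet of a MAC address written as aa:bb:cc:dd:ee:ff."""
    return int(mac.split(":")[-1], 16)

def mac_hash_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified MAC-based policy: XOR of the last MAC octets, modulo link count."""
    return (_last_byte(src_mac) ^ _last_byte(dst_mac)) % n_links

def ip_hash_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Simplified IP-based policy: XOR of the IPv4 addresses, modulo link count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_links

# Hypothetical setup: a server sending to four hosts on another subnet.
SERVER_MAC, ROUTER_MAC = "00:11:22:33:44:10", "00:aa:bb:cc:dd:01"
SERVER_IP = "10.0.0.1"
targets = ["192.168.5.10", "192.168.5.11", "192.168.5.12", "192.168.5.13"]

# Routed traffic: the router's MAC is the destination MAC for every target.
mac_links = {mac_hash_link(SERVER_MAC, ROUTER_MAC, 2) for _ in targets}
ip_links = {ip_hash_link(SERVER_IP, ip, 2) for ip in targets}
print("MAC-based hash uses links:", mac_links)
print("IP-based hash uses links:", ip_links)
```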

There may be a way to configure it to alternate packets out the two
links. This may help with your particular problem, but it will cause
tremendously reduced performance in some other cases (due to large
numbers of out-of-order packets being received).
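The original post lists round-robin (RR) bonding, which is this alternating mode. A toy timing model shows how alternating packets of one connection over two paths with slightly different delays makes later packets overtake earlier ones. The delays are assumed values for illustration, not measurements on the Alcatel switch. Linux TCP tolerates a small amount of reordering (the `tcp_reordering` setting in the post), but heavier reordering can be mistaken for loss and trigger needless retransmissions.

```python
def round_robin_arrivals(n_packets: int, link_delay_us: list[float], gap_us: float) -> list[int]:
    """Send packets alternately over the links; return sequence numbers in arrival order.

    Packet i leaves at i * gap_us on link i % len(link_delay_us) and arrives
    after that link's delay, so unequal delays let packets overtake each other.
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delay_us)
        arrivals.append((seq * gap_us + link_delay_us[link], seq))
    return [seq for _, seq in sorted(arrivals)]

def count_reordered(order: list[int]) -> int:
    """Count packets that arrive after a higher sequence number has already arrived."""
    highest, late = -1, 0
    for seq in order:
        if seq < highest:
            late += 1
        highest = max(highest, seq)
    return late

# Assumed timings: 12 us between packets (about one 1500-byte frame at 1 Gbit/s),
# with the second path 30 us slower than the first.
order = round_robin_arrivals(10, [0.0, 30.0], 12.0)
print("arrival order:", order)
print("out-of-order packets:", count_reordered(order))
```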

> The main problem is our TSM backup, so I'll have to see how I can
> set it up to use more individual threads, or use LAN-free backup ...

See if you can set it up to use more than one TCP connection. If
possible, have it use two different destination addresses (both
assigned to the same machine). I'm not sure what bonding
implementation you are using, but you should definitely check its
documentation.
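The "more than one TCP connection" idea can be sketched by splitting one payload across several connections and reassembling it on the receiver. Each connection carries a single chunk tagged with its offset. The wire format (an offset and length header) and all function names are hypothetical, and this is not how TSM itself transfers data. Passing two different addresses of the receiving host to `send_parallel` gives an IP-hashing bond distinct flows to spread. The demo uses 127.0.0.1 for every connection so it runs on one machine.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!QI")  # hypothetical header: chunk offset (uint64), length (uint32)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("connection closed early")
        buf += part
    return bytes(buf)

def serve(listener: socket.socket, n_conns: int, total_size: int, result: list) -> None:
    """Accept n_conns connections and write each received chunk at its offset."""
    out = bytearray(total_size)

    def handle(conn: socket.socket) -> None:
        with conn:
            offset, length = HEADER.unpack(recv_exact(conn, HEADER.size))
            out[offset:offset + length] = recv_exact(conn, length)

    workers = []
    for _ in range(n_conns):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    result.append(bytes(out))

def send_parallel(data: bytes, addresses: list[str], port: int) -> None:
    """Split data into len(addresses) chunks and send each over its own TCP connection."""
    step = -(-len(data) // len(addresses))  # ceiling division

    def send_chunk(i: int) -> None:
        chunk = data[i * step:(i + 1) * step]
        with socket.create_connection((addresses[i], port)) as s:
            s.sendall(HEADER.pack(i * step, len(chunk)) + chunk)

    threads = [threading.Thread(target=send_chunk, args=(i,)) for i in range(len(addresses))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def demo(n_conns: int = 2) -> bool:
    """Send 1 MiB over n_conns local connections and check it arrives intact."""
    payload = bytes(range(256)) * 4096
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        received: list = []
        server = threading.Thread(target=serve, args=(listener, n_conns, len(payload), received))
        server.start()
        send_parallel(payload, ["127.0.0.1"] * n_conns, port)
        server.join()
    return received[0] == payload

if __name__ == "__main__":
    print("intact:", demo())
```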

DS
From: Jurgen Haan on
Peer-Joachim Koch wrote:
> Hi,
>
> I'm currently trying to reduce our backup time. We have to transfer
> ~250 GB/day (ranging from 50 GB to 2000 GB).
> We tried connecting one of the file servers through bonding with 2 unused
> network interfaces to improve performance, but we see only
> a small improvement (5-10%; the average is currently 60 MB/s).

Maybe not the answer you're looking for, but we had a similar problem
(apart from the fact that we could not increase the line speed between
the two locations) and solved it by using two NetApp filers. NetApp
filers can sync volumes (SnapMirror) at very short, regular intervals
based on block-level changes. This way, we keep two storage arrays
(one containing our production database and one backup in a different
location) in sync without having to transfer a large burst every day.
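The general idea of shipping only changed blocks can be sketched as follows. This is a generic illustration, not SnapMirror's actual mechanism, which works from snapshots. The 4 KiB block size, the SHA-256 block digests and the fixed-size volume are assumptions. A real sender would keep digests of the last synced state rather than comparing against the replica directly.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration

def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block of a volume image."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks that differ between two same-sized volume images."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes a fixed-size volume")
    pairs = zip(block_digests(old, block_size), block_digests(new, block_size))
    return [i for i, (a, b) in enumerate(pairs) if a != b]

def apply_changes(replica: bytearray, new: bytes, blocks: list[int],
                  block_size: int = BLOCK_SIZE) -> None:
    """Copy only the listed blocks from the new image into the replica."""
    for i in blocks:
        start = i * block_size
        replica[start:start + block_size] = new[start:start + block_size]

volume = bytearray(64 * BLOCK_SIZE)  # 256 KiB "volume", all zeros
replica = bytearray(volume)          # in sync after the initial full transfer
volume[5000:5010] = b"x" * 10        # a small write inside block 1
volume[40 * BLOCK_SIZE] = 0xFF       # one byte changed in block 40

todo = changed_blocks(bytes(replica), bytes(volume))
apply_changes(replica, bytes(volume), todo)
print("blocks sent:", todo, "of", len(volume) // BLOCK_SIZE)
print("replica in sync:", replica == volume)
```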