From: T on
Greetings:

We have two Red Hat Linux boxes: one is a 5.2 system with GFS1 as its
file system, the other is a 5.4 system with ext3. I've set up the 5.2
system as an rsync server (that's where the source data is). It holds
between 700 and 800 GB of data in hundreds of thousands of files. I'm
using the following command line from the client:

time rsync --inplace --progress --stats -aPh rsync://root@src_server/accurev_d1 /mnt/accurev_d1

I should also note I'm using rsync 3.0.7:

# rsync --version
rsync version 3.0.7 protocol version 30

It takes more than 8 hours to sync this file system, but only 7G of
data is transferred. Here are the stats:

Number of files: 3349997
Number of files transferred: 32815
Total file size: 760.00G bytes
Total transferred file size: 61.58G bytes
Literal data: 6.96G bytes
Matched data: 54.74G bytes
File list size: 56.84M
File list generation time: 0.233 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 7.43M
Total bytes received: 7.02G

sent 7.43M bytes received 7.02G bytes 222.83K bytes/sec
total size is 760.00G speedup is 108.16
rsync warning: some files vanished before they could be transferred
(code 24) at main.c(1508) [generator=3.0.7]

real 525m33.328s
user 8m25.662s
sys 1m38.190s

The question: Is there any way to speed this up? Am I doing something
wrong? Have I somehow misconfigured the server? Is this what should
be expected?

Thanks in advance for any help.

Tom
From: Kevin Collins on
On 2010-02-12, T <g4173c(a)motorola.com> wrote:
> The question: Is there any way to speed this up? Am I doing something
> wrong? Have I somehow misconfigured the server? Is this what should
> be expected?

At a minimum, rsync has to stat() every file, and on GFS each stat()
requires a lock. You have 3.3 million files!

In my (limited) experience, and from the research I have done, this can be
really slow. GFS takes out a lock on each file, even for a stat(). How many
nodes are in the GFS cluster? Communication and lock checking are done with
EACH node before the lock is granted.
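
For a rough sense of scale, here is a back-of-the-envelope sketch in Python.
The two constants are copied from the --stats and `time` output quoted in the
original post; the function names are made up for illustration. The average it
prints (roughly 9.4 ms per file) covers the whole run, data transfer included,
so treat it as an upper bound on the per-file metadata cost rather than a
measurement of stat() itself. The second function is one way you might time
lstat() directly on the GFS mount to see how much of that is locking.

```python
import os
import sys
import time

# Figures copied from the --stats output and `time` result in the post.
NUM_FILES = 3_349_997
WALL_SECONDS = 525 * 60 + 33.328


def average_seconds_per_file(wall_seconds, num_files):
    """Wall-clock time spread evenly across every file in the file list."""
    return wall_seconds / num_files


def time_lstat_walk(root):
    """Walk `root`, lstat() every entry, return (entries seen, elapsed seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Files can vanish on a live tree (rsync warned about this too).
                continue
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    per_file = average_seconds_per_file(WALL_SECONDS, NUM_FILES)
    print(f"whole-run average: {per_file * 1000:.2f} ms per file")
    if len(sys.argv) > 1:
        # Optional: pass a directory (e.g. a subtree of the GFS mount) to time it.
        entries, elapsed = time_lstat_walk(sys.argv[1])
        print(f"lstat() on {entries} entries took {elapsed:.2f} s")
```

Running the walk on a small subtree of the GFS mount and on a comparable ext3
tree should show whether the metadata pass alone accounts for the hours.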

Additionally, given the way the rsync protocol works (a kind of block-based
checksumming), I would expect it to be quite slow to rsync large files from
GFS...
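
To illustrate what I mean by block-based checksumming, here is a simplified
Python sketch of the idea. It is not rsync's actual code: the function names
and the tiny block size are my own, and MD5 is only a stand-in for the strong
checksum. The point is that the sending side has to read every byte of a
changed file to find the matching blocks, even though only the non-matching
(literal) bytes are sent. That fits your stats, where about 62G of file data
was examined to send about 7G.

```python
import hashlib

MOD = 1 << 16  # both halves of the weak checksum are kept to 16 bits


def weak_checksum(block):
    """Adler-32-style weak checksum, returned as the pair (a, b)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right without rescanning the block."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def block_signatures(old, block_len):
    """Map weak checksum -> strong digests for each whole block of `old`."""
    sigs = {}
    for offset in range(0, len(old) - block_len + 1, block_len):
        block = old[offset:offset + block_len]
        sigs.setdefault(weak_checksum(block), set()).add(hashlib.md5(block).digest())
    return sigs


def count_matched_and_literal(old, new, block_len):
    """Return (matched bytes, literal bytes) needed to rebuild `new` from `old`."""
    sigs = block_signatures(old, block_len)
    matched = literal = 0
    pos = 0
    weak = None
    while pos + block_len <= len(new):
        window = new[pos:pos + block_len]
        if weak is None:
            weak = weak_checksum(window)
        strong_set = sigs.get(weak)
        # Only compute the expensive strong hash when the cheap one hits.
        if strong_set and hashlib.md5(window).digest() in strong_set:
            matched += block_len
            pos += block_len
            weak = None
            continue
        # No match here: this byte is literal data, slide the window by one.
        literal += 1
        if pos + block_len < len(new):
            weak = roll(weak[0], weak[1], new[pos], new[pos + block_len], block_len)
        pos += 1
    literal += len(new) - pos  # tail shorter than one block
    return matched, literal


if __name__ == "__main__":
    old = bytes(range(256)) * 4
    new = old[:512] + b"XYZ" + old[512:]
    print(count_matched_and_literal(old, new, 64))
```

In the example at the bottom, three bytes are inserted into the middle of a
1024-byte file; every original block is still found, so only the three inserted
bytes count as literal data, but the whole file had to be read to find that out.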

Hope this helps.

Kevin