From: Bryan on
Maaartin wrote:
> I re-ran my simple benchmark concentrating on the most-promising
> sizes:
>
> blocksize    seconds
>      4096        284
>      8192        186
>     16384        140
>     32768        135
>     65536        135
>    131072        135
>    262144        135
>    524288        135
>   1048576        243
>
> This clearly shows that any buffer size between 16 and 512 kB is fine,
> and that using larger buffers is clearly nonsense.

In light of your data, I'll amend my previous statement and say the
optimal I/O size is *usually* not worth figuring. I think my range of
4KB to 1MB came out reasonably well, even if your system running Java
was happiest at a factor four from the low end up to a factor of two
from the high end.

> I was wrong; it goes through FileInputStream.read(byte[] buffer, int
> off, int length), getting up to length bytes at once. I think we can
> ignore the overhead of Java and state that using too-small buffer
> sizes is bad because the OS(*) cannot handle them as efficiently as
> it should.
>
> The performance for large buffer sizes diminishes because of L2
> cache misses (its size is 1 MB on my computer, which corresponds to
> the huge speed loss when switching from 0.5 MB to 1 MB buffers).
> There may be other reasons, too.

I'm not convinced that we can ignore the Java overhead, nor that L2
cache is the issue. On most systems, the initial transfer from disk to
O.S. buffers shouldn't go through the processor's cache at all.
Copying to user-space buffers does, and the Java library may use a
cache of its own.
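
One way to take the Java-side copy mostly out of the picture is NIO
with a direct buffer - a rough sketch, with the path and buffer size
as placeholders, not anyone's actual test code:

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DirectRead {
    public static void main(String[] args) throws IOException {
        // a direct buffer lives outside the Java heap, so the OS can
        // fill it without an extra copy through a byte[]
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        FileChannel ch = new FileInputStream("testfile.bin").getChannel();
        try {
            while (ch.read(buf) != -1) {
                buf.clear(); // discard the data; we only care about I/O
            }
        } finally {
            ch.close();
        }
    }
}

If the numbers change much against the stream version, that
difference is the Java overhead.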

> It would be nice if somebody could try it using another language/
> computer/OS.

It's a tricky kind of test to do right, because the O.S. will try to
optimize it for you by caching buffers. If we read the same file,
previous runs can affect the current run.
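
To be concrete, here is my guess at the kind of loop being timed -
plain Java, not Maaartin's actual code, with the file name and
default block size as placeholders:

import java.io.FileInputStream;
import java.io.IOException;

public class ReadBench {
    public static void main(String[] args) throws IOException {
        // block size under test; placeholder default of 64 KB
        int blocksize = args.length > 0 ? Integer.parseInt(args[0]) : 65536;
        byte[] buffer = new byte[blocksize];
        long total = 0;
        long start = System.currentTimeMillis();
        FileInputStream in = new FileInputStream("testfile.bin");
        try {
            int n;
            // read(byte[], int, int) requests a full buffer at a time
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                total += n;
            }
        } finally {
            in.close();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(blocksize + " bytes/read: " + total
                + " bytes in " + (elapsed / 1000.0) + " s");
    }
}

Run it once per block size against a file much larger than RAM, or
the caching above will skew the numbers.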


--
--Bryan
From: Maaartin on
On Jun 14, 8:52 pm, Bryan <bryanjugglercryptograp...(a)yahoo.com> wrote:
> Maaartin wrote:
> > I re-ran my simple benchmark concentrating on the most-promising
> > sizes:
>
> > blocksize    seconds
> >      4096        284
> >      8192        186
> >     16384        140
> >     32768        135
> >     65536        135
> >    131072        135
> >    262144        135
> >    524288        135
> >   1048576        243
>
> > This clearly shows that any buffer size between 16 and 512 kB is fine,
> > and that using larger buffers is clearly nonsense.
>
> In light of your data, I'll amend my previous statement and say the
> optimal I/O size is *usually* not worth figuring. I think my range of
> 4KB to 1MB came out reasonably well, even if your system running Java
> was happiest at a factor four from the low end up to a factor of two
> from the high end.

Right, your range is quite fine. My previous values were fairly
consistent; in the meantime I re-ran the test six times with my fake
RAID 10 (instead of a single disk) and got some strange results:

blocksize    seconds (six runs)
     4096    299   75   77   77  256  297
     8192    189  190   77  187  187  185
    16384    141   74   77   76  120   75
    32768    106  102   76  115   77  106
    65536     83   85   82   92   85   80
   131072     85   69   75   69   83   71
   262144     72   70   69   70   71   71
   524288     68   69   69   70   70   71
  1048576     79   69   70   70   72   71
  2097152    156  159  159  153  160  152

I see no reason for the great speed differences; it's my system disk,
but the computer was doing hardly anything else during the tests.

> > The performance for large buffer sizes diminishes because of L2
> > cache misses (its size is 1 MB on my computer, which corresponds to
> > the huge speed loss when switching from 0.5 MB to 1 MB buffers).
> > There may be other reasons, too.
>
> I'm not convinced that we can ignore the Java overhead, nor that L2
> cache is the issue.

Currently, neither am I. The overhead due to cache misses should be
much lower than one minute for 16 GB. Those 135 seconds for 16 GB
correspond to 121 MB/s (16 * 1024 MB / 135 s), which is actually
slightly more than my disk speed. I'm confused; it seems like some
data got cached, although all useful cache entries should have been
evicted. I think running a main-memory eater between the runs is
needed.

> On most systems, the initial transfer from disk to
> O.S. buffers shouldn't go through the processor's cache at all.

Right, I'm using SATA, so it surely uses DMA.

> Copying to user-space buffers does, and the Java library may use a
> cache of its own.
>
> > It would be nice if somebody could try it using another language/
> > computer/OS.
>
> It's a tricky kind of test to do right, because the O.S. will try to
> optimize it for you by caching buffers. If we read the same file,
> previous runs can affect the current run.

That's why I used such a large file; my memory is only 8 GB, with
half of it occupied. Because of Windoze doing everything wrong, I was
sure all requests had to go to the disk, but I'm not sure anymore.
Writing a program that eats all main memory is easy, but Windoze
chooses to swap out everything else *forever*. All programs hit by
the swap-out stay cursed until I restart them; it's unbelievable but
it's true - just say WOW (or was it LOL?).
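
For completeness, the eater itself would be something like this
throwaway sketch - the chunk size is arbitrary, and you have to run
it with -Xmx set near physical RAM:

import java.util.ArrayList;
import java.util.List;

public class MemoryEater {
    public static void main(String[] args) {
        List<byte[]> hog = new ArrayList<byte[]>();
        try {
            while (true) {
                byte[] chunk = new byte[64 * 1024 * 1024]; // grab 64 MB
                // touch every page so the OS really backs it with RAM
                // and evicts its file cache
                for (int i = 0; i < chunk.length; i += 4096) {
                    chunk[i] = 1;
                }
                hog.add(chunk);
            }
        } catch (OutOfMemoryError e) {
            int mb = hog.size() * 64;
            hog.clear(); // free memory first, or the println may die too
            System.out.println("ate about " + mb + " MB");
        }
    }
}

The hard part is not the eating but what Windoze does with everything
else afterwards.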