From: Rod Speed on
Cronos wrote
> Rod Speed wrote
>> Cronos wrote:
>>> Ed Light wrote:

>>>> But a very fragmented large file is like a bunch of little files spread all over the place, isn't it?

>>> Yes.

>> Nope, it's completely different. A lot of small files need access to the directory information with every file; a
>> large fragmented file does not.

> Yea, so?

So there is a lot more to do than when moving between fragments of a single large file.

> What he meant is that the read head of the HDD still has to move across the platter(s) to read one file if it is
> fragmented.

But the point is that there are EXTRA head movements with
a lot of small files to get the directory information, cretin.
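
For what it's worth, a rough Python sketch of that difference (the paths and file sizes are made up, the test files are assumed to exist already, and OS caching will hide most of the real head movement unless caches are dropped between runs, so treat any numbers as indicative only):

import os
import time

SMALL_DIR = "testdata/small"   # assumed: directory holding many small files
BIG_FILE = "testdata/big.bin"  # assumed: one large (possibly fragmented) file
CHUNK = 10 * 1024              # read 10KB per access in both cases

def read_many_small_files():
    # Every iteration pays for a directory lookup, inode read, open and close.
    start = time.perf_counter()
    for name in os.listdir(SMALL_DIR):
        with open(os.path.join(SMALL_DIR, name), "rb") as f:
            f.read(CHUNK)
    return time.perf_counter() - start

def read_one_file_in_chunks():
    # Only the first access pays the open cost; the rest are seeks and reads.
    start = time.perf_counter()
    size = os.path.getsize(BIG_FILE)
    with open(BIG_FILE, "rb") as f:
        for offset in range(0, size, CHUNK):
            f.seek(offset)
            f.read(CHUNK)
    return time.perf_counter() - start

print("many small files:", read_many_small_files())
print("one large file  :", read_one_file_in_chunks())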

> God, you fucks are anal.

You never ever could bullshit and lie your way out of a wet paper bag.


From: David Brown on
Arno wrote:
> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>> Cronos wrote:
>>> David Brown wrote:
>>>> (Note that I am discrediting your arguments
>>>> here, not you personally.)
>>> You are not even doing that because you are now arguing with industry
>>> experts and not me.
>
>> I am not arguing with any "industry experts". I am merely pointing out
>> errors in quotations you have copied from some unknown source.
>
>> Even if the source of these comments is in fact someone with experience
>> and a job setting up or maintaining large databases, this does not
>> qualify him as an "industry expert".
>
> It does, but database file accesses are different. They try to

Having a job - even if it is in a mission-critical position of a large
company - does not make you an expert.

And even if it turns out that the guy quoted is actually right in that
he has a database server that fails because of file fragmentation, it is
of no significance whatsoever in the context of "ordinary" computing.

> keep everything in RAM and actually get down to a very small
> number of non-data accesses. Filesystem accesses are more
> like 1...2 extra seeks per file open in the best case.
>

Correct - you aim to keep anything important in ram (either in the
database server's application memory, or in an OS cache). For serious
databases (or any other serious server application), disk is for the
bulk data that can be accessed slowly.
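
As a toy illustration of that idea (nothing database-specific; the page size and capacity are arbitrary assumptions), a minimal LRU page cache in Python that only touches the disk on a miss:

import os
from collections import OrderedDict

PAGE_SIZE = 8192   # assumed page size

class PageCache:
    def __init__(self, path, capacity_pages=1024):
        self.fd = os.open(path, os.O_RDONLY)
        self.capacity = capacity_pages
        self.pages = OrderedDict()   # page number -> bytes, kept in LRU order

    def read_page(self, page_no):
        if page_no in self.pages:                # hit: served from RAM
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # miss: one slow disk access, then keep the page in memory
        data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:      # evict the least recently used page
            self.pages.popitem(last=False)
        return data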

>> And even if he /is/ an industry
>> expert, that does not mean he is right in every point connected to his
>> job. Even experts get things wrong. And for any given opinion, you can
>> easily find a dozen "experts" with different views.
>
> Indeed. See above.
>
>> In summary, to make a database server fail due to fragmentation you
>> would have to try exceedingly hard to overload and misconfigure the
>> system, even if you use a badly designed database server on a badly
>> designed operating system, and badly fragment the disk. Maybe it takes
>> an "industry expert" to be able to achieve this.
>
> Actually, if you want high performance databases, you take out the
> filesystem and let the database system handle the storage directly.
> That way you do not get any filesystem fragmentation, because there
> actually is no filesystem anymore in the strict sense.
>

Yes, I've said as much in a few other posts.

> Anyways, examples from database systems are unsuitable to discuss
> filesystem fragmentation.
>

I agree entirely - it's a very different sort of usage than typical
desktop usage.

> And, yes, putting a large database on an ordinary filesystem
> can slow it down to a crawl.
>

It will certainly make a big difference unless you know what you are
doing, and have an appropriate database server, OS, file system, file
system options, and disk setup. Systems like Oracle prefer direct
access to dedicated partitions (or disks, or disk arrays) with no
filesystem. But you can eliminate most of the file system overhead by
using a dedicated partition with careful setup (such as noatime).
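
A minimal sketch of what "no filesystem in the strict sense" looks like from the application side (the device path is a placeholder, it needs root, and writing to a real partition would destroy whatever is on it): the program reads and writes byte offsets on the block device itself, so there is nothing left for a filesystem to fragment.

import os

DEVICE = "/dev/sdb1"   # placeholder for a dedicated raw partition
BLOCK = 8192           # the application's own block size

fd = os.open(DEVICE, os.O_RDWR)

def write_block(block_no, payload):
    # The application decides the on-disk layout itself.
    assert len(payload) == BLOCK
    os.pwrite(fd, payload, block_no * BLOCK)

def read_block(block_no):
    return os.pread(fd, BLOCK, block_no * BLOCK)

# Block 0 could, for example, hold the application's own allocation map,
# which is roughly how a database engine manages a raw tablespace.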

> Arno
>
From: David Brown on
Arno wrote:
> Cronos <cronos(a)sphere.invalid> wrote:
>> David Brown wrote:
>
>>> A very fragmented large file is like a single large file; it's just that
>>> its contents are on different parts of the disk.
>
>> Well, duh, that is what he meant and what I meant too. Of course it is
>> not a file split into many smaller files but it may as well be because
>> the result is the same. Now you are arguing *semantics* in a poor
>> attempt to discredit me so take a hike.
>
> Not quite: With many smaller files there is likely at least one additional
> disk access for the metadata when a new one is opened, while with
> one large fragmented file, there should be less than one access per
> fragment for metadata, at least with a sane OS.
>

There will typically be /many/ additional disk accesses, not just one.
You may have to look through each part of a file's directory path to
find the file - and each read of a directory is effectively a read of
the directory "file". The directory entry for the file will contain the
inode number (or equivalent, depending on the file system). The inode
then needs to be read from the inode table on the disk. Depending on
the size of the file, you may need to read further indirect blocks and
lists of blocks allocated to the file (modern file systems like xfs and
ext4 use "extents" giving ranges of blocks rather than lists, which
helps here). You may also have to read security information or access
control information from other parts of the disk. While much of this
information may already be in file system caches, there is far more
work involved in opening a file than in simply moving to the next
fragment of an already-open file.
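
To make the path-lookup part of that concrete, here is a small sketch (the path is just an example; use any file that exists) that stats each component of a path - every level is a directory with its own inode that has to be resolved before the lookup can move on:

import os

def walk_components(path):
    parts = os.path.abspath(path).split(os.sep)
    current = os.sep
    for part in parts:
        if not part:
            continue
        current = os.path.join(current, part)
        st = os.stat(current)   # each level resolves to its own inode
        print(current, "-> inode", st.st_ino)

walk_components("/var/log/syslog")   # arbitrary example path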

> The observable difference in access times may depend on the
> filesystem in question.
>

True. In my very brief testing on NTFS, copying a thousand 10KB files
was about 20 times slower than copying a single 10MB file. On Linux
with an ext4 file system on a software RAID1 setup within a virtual
machine on the same Windows host, the difference was a factor of about
8. Obviously my sample size is too small to draw any conclusions other
than to agree that the observable difference will vary, and that
accessing many small files is slower than accessing a big file (that is
probably fragmented).
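
For anyone who wants to repeat a rough version of that test, a Python sketch along these lines should do (the directory names and exact sizes are just assumptions, dirs_exist_ok needs Python 3.8+, and OS caching will flatter repeated runs):

import os
import shutil
import time

BASE = "fragtest"   # assumed scratch directory

def make_test_data():
    os.makedirs(f"{BASE}/small", exist_ok=True)
    for i in range(1000):
        with open(f"{BASE}/small/{i:04d}.bin", "wb") as f:
            f.write(os.urandom(10 * 1024))        # 1000 x 10KB
    with open(f"{BASE}/big.bin", "wb") as f:
        f.write(os.urandom(10 * 1024 * 1024))     # 1 x 10MB

def time_copies():
    t0 = time.perf_counter()
    shutil.copytree(f"{BASE}/small", f"{BASE}/small_copy", dirs_exist_ok=True)
    t_small = time.perf_counter() - t0

    t0 = time.perf_counter()
    shutil.copy(f"{BASE}/big.bin", f"{BASE}/big_copy.bin")
    t_big = time.perf_counter() - t0
    return t_small, t_big

make_test_data()
small, big = time_copies()
print(f"1000 x 10KB: {small:.3f}s   1 x 10MB: {big:.3f}s")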


> Arno
From: Arno on
David Brown <david.brown(a)hesbynett.removethisbit.no> wrote:
> Arno wrote:
>> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>>> Cronos wrote:
>>>> David Brown wrote:
>>>>> (Note that I am discrediting your arguments
>>>>> here, not you personally.)
>>>> You are not even doing that because you are now arguing with industry
>>>> experts and not me.
>>
>>> I am not arguing with any "industry experts". I am merely pointing out
>>> errors in quotations you have copied from some unknown source.
>>
>>> Even if the source of these comments is in fact someone with experience
>>> and a job setting up or maintaining large databases, this does not
>>> qualify him as an "industry expert".
>>
>> It does, but database file accesses are different. They try to

> Having a job - even if it is in a mission-critical position of a large
> company - does not make you an expert.

Indeed.

> And even if it turns out that the guy quoted is actually right in that
> he has a database server that fails because of file fragmentation, it is
> of no significance whatsoever in the context of "ordinary" computing.

In fact I would say that anybody getting database performance
problems because of file fragmentation does not know much
about databases.

>> keep everything in RAM and actually get down to a very small
>> number of non-data accesses. Filesystem accesses are more
>> like 1...2 extra seeks per file open in the best case.
>>

> Correct - you aim to keep anything important in ram (either in the
> database server's application memory, or in an OS cache). For serious
> databases (or any other serious server application), disk is for the
> bulk data that can be accessed slowly.

>>> And even if he /is/ an industry
>>> expert, that does not mean he is right in every point connected to his
>>> job. Even experts get things wrong. And for any given opinion, you can
>>> easily find a dozen "experts" with different views.
>>
>> Indeed. See above.
>>
>>> In summary, to make a database server fail due to fragmentation you
>>> would have to try exceedingly hard to overload and misconfigure the
>>> system, even if you use a badly designed database server on a badly
>>> designed operating system, and badly fragment the disk. Maybe it takes
>>> an "industry expert" to be able to achieve this.
>>
>> Actually, if you want high performance databases, you take out the
>> filesystem and let the database system handle the storage directly.
>> That way you do not get any filesystem fragmentation, because there
>> actually is no filesystem anymore in the strict sense.
>>

> Yes, I've said as much in a few other posts.

>> Anyways, examples from database systems are unsuitable to discuss
>> filesystem fragmentation.
>>

> I agree entirely - it's a very different sort of usage than typical
> desktop usage.

>> And, yes, putting a large database on an ordinary filesystem
>> can slow it down to a crawl.
>>

> It will certainly make a big difference unless you know what you are
> doing, and have an appropriate database server, OS, file system, file
> system options, and disk setup. Systems like Oracle prefer direct
> access to dedicated partitions (or disks, or disk arrays) with no
> filesystem. But you can eliminate most of the file system overhead by
> using a dedicated partition with careful setup (such as noatime).

And other things. Agreed.

Arno

--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans
From: Cronos on
David Brown wrote:

> I understand what you are trying to do here, but I am not sure it is the
> best way to do it - and it will leave you stuck in the middle feeling
> confused by conflicting information, and flamed in the cross-fire.

Yes, and I am going to have to test it out myself.