From: Bill Todd on
Terje Mathisen wrote:

> So what are the features that a good file system should have, besides
> never silently dropping updates,

Unless, of course, the user chooses this possibility for performance
reasons (e.g., by allowing it to employ write-back caching).

> and never allowing an inconsistent state?

To be precise, never allowing a *visibly* inconsistent state:
journal-protected file systems enter inconsistent states all the time,
and may even be caught in one by a crash - they just repair them before
anyone can notice, as fsck & friends could do if they could run fast enough.
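
To make that concrete, here's a toy sketch (Python, all names invented)
of journal replay: nothing committed is lost, and nothing half-applied
ever becomes visible.

class Journal:
    def __init__(self):
        self.records = []           # stands in for the on-disk log area

    def begin(self, txn_id):
        self.records.append(('begin', txn_id, None))

    def log_update(self, txn_id, update):
        self.records.append(('update', txn_id, update))

    def commit(self, txn_id):
        self.records.append(('commit', txn_id, None))

def recover(journal, metadata):
    # Redo only transactions whose commit record reached the log; a
    # half-written tail from a crash mid-transaction is simply ignored.
    committed = {t for kind, t, _ in journal.records if kind == 'commit'}
    for kind, txn_id, update in journal.records:
        if kind == 'update' and txn_id in committed:
            key, value = update
            metadata[key] = value
    return metadata

j = Journal()
j.begin(1); j.log_update(1, ('a/size', 100)); j.commit(1)
j.begin(2); j.log_update(2, ('b/size', 200))    # crash before commit
assert recover(j, {}) == {'a/size': 100}        # txn 2 never shows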

>
> a) Extent-based block addressing, ideally with a background task which
> cleans up fragmented files.

Anton is right: that's an implementation detail, not a feature per se.

>
> b) inode-like separation between filenames and storage allocation.

That is also an implementation detail, even if hard links must be
supported. If the underlying storage is sufficiently robust its impact
on corruption-survivability lessens a lot, and (as NTFS found out)
keeping at least *some* per-file metadata directory-resident can be a
performance win.

>
> c) Room in the directory structure to skip the inode completely for
> single-linked files with a small (1-3?) number of extents.

Another implementation detail which can become dangerous if the
underlying storage is *not* sufficiently robust to protect directory
access paths.
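
For readers following along, the mechanism in question looks roughly
like this toy sketch (Python, names invented): the directory entry
carries a short inline extent list and falls back to a real inode only
when needed - which is exactly why a corrupted directory block then
takes the file's entire access path with it.

from dataclasses import dataclass, field

@dataclass
class DirEntry:
    name: str
    inline_extents: list = field(default_factory=list)  # up to ~3 (start, length) runs
    inode_ref: int | None = None    # used only for multi-link/fragmented files

def extents_of(entry, inode_table):
    if entry.inode_ref is None:
        return entry.inline_extents         # fast path: no separate inode fetch
    return inode_table[entry.inode_ref]     # fallback: indirect lookup

inode_table = {7: [(100, 4), (900, 12), (2000, 2), (5000, 1)]}
small = DirEntry('notes.txt', inline_extents=[(4096, 8)])
big = DirEntry('video.mkv', inode_ref=7)
assert extents_of(small, inode_table) == [(4096, 8)]
assert extents_of(big, inode_table)[0] == (100, 4)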

>
> Any others?

A good file system should be reliable (uncompromising in its data
integrity: what comes out should always be precisely what went in),
available (robust in the face of hardware failure, even in single-disk
environments when possible), securely sharable (across both processes
and network nodes), fast, efficient in its use of resources (across the
full range of file sizes from zero bytes on up), incrementally scalable
(or shrinkable) in size (and performance) from MB to EB, inexpensive to
purchase and use, common (and interoperable) across all operating
systems of interest, and simple to use and manage (this last including
management of whatever trade-offs may be necessary among these features).

Some might suggest including additional features that may be difficult
to incorporate at higher (application) levels with comparable efficiency
and/or standardization, such as audit trails (from snapshots to
'continuous data protection'), transactional semantics across multiple
application operations, and record-oriented extensions - though not to
the point where the file system starts looking like a full-fledged
database (since that tends to compromise speed, efficiency, and
simplicity of use).

That's a start, anyway: I'd be interested to hear what others think
I've missed.

- bill
From: Anton Ertl on
Terje Mathisen <terje.mathisen(a)hda.hydro.com> writes:
>So what are the features that a good file system should have

You might be interested in the 2006 Linux File Systems Workshop.
There's a summary of the workshop at
<http://lwn.net/Articles/190222/>.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Tarjei T. Jensen on

Terje Mathisen wrote:
> So what are the features that a good file system should have, besides never
> silently dropping updates, and never allowing an inconsistent state?
>
> a) Extent-based block addressing, ideally with a background task which
> cleans up fragmented files.
>
> b) inode-like separation between filenames and storage allocation.
>
> c) Room in the directory structure to skip the inode completely for
> single-linked files with a small (1-3?) number of extents.
>
> Any others?

I have some wishes:

Arbitrary length file names and files. File names should not be case
sensitive.

Support for access control lists, etc. Space for metadata.

An ability to tell the underlying hardware the name of the file system,
if one exists. That eases administration of the system.

It should have a management API which allows software to monitor it.

Ideally the file system should be able to defragment itself.

The file system should be able to migrate from one device to another. Or
that may be the job of the volume manager. I don't care; I want the feature.

The file system should allow for transparent compression of individual
files. There should be an API for reading compressed files without
uncompressing. E.g. for backup. The file system should support copying of
files and compressing the content when in transit. This is particularly
useful for copying files over a network. Personally I don't care about
encrypted files.
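
Something along these lines, say (a toy sketch in Python; the names are
invented, and zlib merely stands in for whatever codec the file system
picks):

import zlib

class CompressedFile:
    def __init__(self, data, codec='deflate'):
        self.codec = codec
        self.stored = zlib.compress(data)    # what actually sits on disk

    def read(self):
        return zlib.decompress(self.stored)  # normal, transparent path

    def read_raw(self):
        return self.codec, self.stored       # backup path: no codec work at all

f = CompressedFile(b'log line\n' * 1000)
codec, raw = f.read_raw()                    # ship 'raw' to tape or across the
assert zlib.decompress(raw) == f.read()      # wire; it stays compressed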

The file system or volume manager should support mirroring. This should be
based on blocks. Technology should be able to work across an IP network.

greetings,
From: Bill Todd on
Tarjei T. Jensen wrote:

....

> Arbitrary length file names

A length of much more than, say, 64 KB could start to become an
implementation challenge in any reasonable directory approach that I can
think of: would such a limit satisfy you, and if not, why not?

> and files. File names should not be case
> sensitive.

Out of curiosity, does anyone know of a good reason why file names
should *ever* be case-sensitive (aside from the fact that Unix users and
applications have become used to this)?
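
For what it's worth, case-preserving but case-insensitive lookup is
cheap to sketch (Python, names invented): store the name exactly as the
user typed it, but index it under a folded key.

class Directory:
    def __init__(self):
        self._entries = {}             # folded key -> (display name, inode)

    def create(self, name, inode):
        key = name.casefold()          # Unicode-aware case folding
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, inode)

    def lookup(self, name):
        return self._entries[name.casefold()][1]

d = Directory()
d.create('ReadMe.txt', 42)
assert d.lookup('README.TXT') == 42    # hit, though 'ReadMe.txt' is what gets listed

The one real wrinkle I know of is that the folding rules must stay
frozen for the life of the volume, since case mappings (for Unicode in
particular) change over time.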

>
> Support for access control lists, etc. Space for metadata.

Perhaps I should have included the latter feature in my list: flexible
standardized and/or ad hoc annotation capability is something that I
consider important, but I was trying to keep the list items at a fairly
high level.

....

> The file system should be able to migrate from one device to another. Or
> that may be the job of the volume manger. I don't care; I want the feature.

What user need are you attempting to satisfy with that feature? It
sounds like a work-around for some assumed underlying deficiency in the
storage system.

>
> The file system should allow for transparent compression of individual
> files. There should be an API for reading compressed files without
> uncompressing. E.g. for backup.

And without decrypting as well: good point.

> The file system should support copying of
> files

I suspect you mean support an explicit 'copy file' operation (along the
lines of NT's) which will handle any ancillary information that may not
be present in the single main data stream. This is desirable as a user
aid and performance enhancement even for simple files, and especially
for 'decorated' files (whether such decoration is ad hoc or
standardized) and for files with an internal organization that does not
allow efficient application-level copying at all (e.g., B+ trees where
the interior nodes are managed by the system rather than allocated from
the virtual space of a simple byte-stream file). It also facilitates
copy-on-write sharing by multiple file 'copies' of a single instance, if
the system supports that (something I likely should have included in my
list of possible extensions).
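
A toy sketch of the idea (Python, all names invented) - the point being
that the operation carries every fork and annotation, not just the
default byte stream, and gives the file system a natural hook for
copy-on-write sharing:

class File:
    def __init__(self):
        self.streams = {'': b''}    # '' is the default data stream
        self.attrs = {}             # ad hoc or standardized annotations

def copy_file(src):
    dst = File()
    dst.streams = dict(src.streams)   # every fork, not just the main one
    dst.attrs = dict(src.attrs)       # (a real system might share blocks
    return dst                        # copy-on-write instead of duplicating)

f = File()
f.streams[''] = b'main data'
f.streams['thumbnail'] = b'...'   # an alternate stream that a naive
f.attrs['author'] = 'terje'       # byte-stream copy would silently drop
g = copy_file(f)
assert g.streams['thumbnail'] == f.streams['thumbnail']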

> and compressing the content when in transit. This is particularly
> useful for copying files over a network.

If you are copying within the same file system, any presence of the
network should be transparent. If you are copying the files somewhere
else, you could use the 'read compressed' API that you described above
if the file was already compressed and the remote end understood that
form of compression (e.g., was another instance of the same kind of file
system); otherwise, I'd suggest that compressing for network
transmission is not the job of the local file system but rather should
be a feature of the underlying network mechanisms that the CopyFile()
file system operation uses.

But it is indeed a gray area as soon as one introduces the idea of a
CopyFile() operation (which clearly needs to include network copying to
be of general use). The recent introduction of 'bundles' may be
applicable here: these are 'files' that are actually more like
directories, containing a hierarchical multitude of parts (considerably
richer, IIRC, than IBM's old 'partitioned data sets'), and they let
multi-'fork' and/or attribute-enriched files be stored by simple file
systems - though applications then need to understand that form of
storage to handle it effectively.

> Personally I don't care about
> encrypted files.
>
> The file system or volume manager should support mirroring. This should be
> based on blocks.

Again, that last sounds like an implementation detail aimed at
satisfying some perceived user-level need: what is that need?

> Technology should be able to work across an IP network.

What technology? Mirroring? The latter is internal to the file system,
so perhaps you are stating that distributed aspects of the file system
in general should function over an IP network - possibly implying
something about the degree of such distribution (e.g., WAN vs. LAN)?

Conventional mirroring is inherently synchronous, which causes
significant update-performance impacts as line latencies increase with
distance. Asynchronous replication (e.g., at disaster-tolerant
separations sufficient to satisfy even the truly paranoid - I probably
should have included at least limited disaster-tolerance in my list of
extensions) requires ordering guarantees for the material applied to the
remote site that can become complex when the main site comprises
multiple file system nodes executing concurrently.
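
A toy sketch of the single-stream case (Python, names invented), where
one sequence number per update suffices - with multiple concurrent
file-system nodes at the main site, manufacturing that single total
order is itself the hard part:

class RemoteSite:
    def __init__(self):
        self.state = {}
        self.next_seq = 0
        self.pending = {}        # seq -> update, held back until the gap fills

    def receive(self, seq, update):
        self.pending[seq] = update
        while self.next_seq in self.pending:   # apply the ready prefix, in order
            key, value = self.pending.pop(self.next_seq)
            self.state[key] = value
            self.next_seq += 1

r = RemoteSite()
r.receive(1, ('b', 2))    # arrives early over the WAN: buffered, not applied
r.receive(0, ('a', 1))    # fills the gap; both now apply, in commit order
assert r.next_seq == 2 and r.state == {'a': 1, 'b': 2}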

- bill
From: Terje Mathisen on
Anton Ertl wrote:
> Terje Mathisen <terje.mathisen(a)hda.hydro.com> writes:
>> So what are the features that a good file system should have, besides
>> never silently dropping updates, and never allowing an inconsistent state?
>
> Well, there are different kinds of consistency.
>
> Many file systems people only care for meta-data consistency; as long
> as the fsck passes, everything is fine. Who needs data, anyway?

Ouch!
>
> On the other extreme there is fully synchronous operation of the file
> system (so you don't even lose a second of work in case of a crash),
> but this usually results in too-slow implementations.
>
> I like the one that I call in-order semantics
> <http://www.complang.tuwien.ac.at/papers/czezatke&ertl00/#sect-in-order>:
>
> |The state of the file system after recovery represents all write()s
> |(or other changes) that occurred before a specific point in time, and
> |no write() (or other change) that occurred afterwards. I.e., at most
> |you lose a minute or so of work.
>
> Unfortunately, AFAIK all widely-used file systems provide this
> guarantee only in fully-synchronous mode, if at all.


Isn't this why you use a log? A sequential log file can be updated
quickly, storing enough info to provide exactly the guarantee you
describe above?
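
Something like this toy sketch (Python, names invented): a crash can
only truncate the tail of an append-only log, so replaying the
surviving prefix always yields the state as of some single earlier
instant - which is your in-order guarantee.

import random

def apply_log(prefix):
    state = {}
    for key, value in prefix:
        state[key] = value
    return state

log = [('a', 1), ('b', 2), ('a', 3), ('c', 4)]   # writes, in commit order
cut = random.randint(0, len(log))                # crash: everything past 'cut' is lost
recovered = apply_log(log[:cut])
# Whatever 'cut' turns out to be, 'recovered' matches the live state at
# some earlier moment: it can never hold c=4 yet miss the earlier a=3.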

>> a) Extent-based block addressing, ideally with a background task which
>> cleans up fragmented files.
>>
>> b) inode-like separation between filenames and storage allocation.
>>
>> c) Room in the directory structure to skip the inode completely for
>> single-linked files with a small (1-3?) number of extents.
>
> Features a and c seem to be low-level implementation details to me,
> rather than what I would call features. Feature b is an architectural

(a) is based on the presumption that you want the filesystem to be fast;
i.e., any mechanism which allows equally fast access to any given part of
the file is OK with me. :-)
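
As a toy illustration (Python, names invented): the map from logical
block to disk block stays tiny as long as the file consists of a few
contiguous runs, and a real implementation would binary-search or keep
a B+ tree over it rather than scan linearly.

def lookup(extents, logical_block):
    # extents: (logical_start, physical_start, length), sorted by logical_start
    for lstart, pstart, length in extents:
        if lstart <= logical_block < lstart + length:
            return pstart + (logical_block - lstart)
    raise ValueError('hole or past EOF')

# A 1 GiB file of 4 KiB blocks held in just two contiguous runs:
extents = [(0, 1_000_000, 131072), (131072, 5_000_000, 131072)]
assert lookup(extents, 0) == 1_000_000
assert lookup(extents, 131072 + 7) == 5_000_007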

> implementation choice, but apart from the availability of hard links
> (which is not so useful in my experience) it is not a user-visible
> feature, either.

OK, soft links are fine by me; you just need to make sure that they can
be totally transparent to OS users.

I.e. Win* 'link' files really don't count. :-(

> Fup-To: comp.arch

Oops, I forgot to check this myself. Sorry!

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"