From: Bill Todd on
Eric P. wrote:

....

> I have never built a file system, but it seems to me that the problem
> with file compression is that a write in the middle of the file
> will be recompressed and can cause changes to the file's physical
> block mappings and metadata structures. This in turn updates file
> system block allocation tables and metadata transaction logs.

But since most large files aren't updated in the middle (they're
typically only appended to if modified at all after creation), that's
not necessarily a significant problem regardless of the implementation
(in fact, one could simply incrementally uncompress any such awkward
files as they were updated).

>
> With normal non-compressed files this only happens when the file is
> extended. With compressed files every write operation can do this
> and could bog down the whole system by hammering these critical
> common data structures.

That's part of what caching and journaling are for: if any such
structures are truly being hammered, they'll simply remain in cache
while multiple compact (logical) updates accumulate in the log, which
will eventually be written back to disk in a single bulk structure
update. True, that still requires a synchronous log write if the update
itself is synchronous (though if the relevant data had to be written to
the log anyway, any metadata changes just piggyback on the same log
write), but if the system even starts to get bogged down by this, then
additional operations accumulate while waiting for the previous log
write to complete and get batched together in the next log write. I.e.,
the eventual limit is the sequential bandwidth of the log disks, which
at, say, 40 MB/sec works out to tens of thousands of updates per second
before any serious 'bogging down' occurs (and if that's not enough, you
can stripe the log across multiple disks to increase the bandwidth even
more).
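
(As a sanity check on that figure, assume roughly 2 KB per logged
update, which is my guess at a record size rather than a measured
number: 40 MB/sec divided by 2 KB is about 20,000 updates per second.
The batching itself can be sketched in a few lines; what follows is
illustrative C, not any particular file system's code:)

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define LOG_BUF (1 << 20)        /* 1 MB staging buffer, an arbitrary choice */

static char   pending[LOG_BUF];
static size_t pending_len;

/* Called for every metadata update: just append a compact record to the
 * in-memory staging buffer. */
static int log_append(const void *rec, size_t len)
{
    if (pending_len + len > LOG_BUF)
        return -1;               /* caller must flush first */
    memcpy(pending + pending_len, rec, len);
    pending_len += len;
    return 0;
}

/* Called when the previous log I/O completes (or on a synchronous
 * request): everything that accumulated goes out in one sequential write. */
static int log_flush(int log_fd)
{
    if (pending_len && write(log_fd, pending, pending_len) < 0)
        return -1;
    pending_len = 0;
    return 0;
}

int main(void)
{
    int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* A burst of small updates accumulates while the "previous" log
     * write is notionally in flight... */
    for (int i = 0; i < 1000; i++) {
        char rec[64];
        int n = snprintf(rec, sizeof rec, "update %d: remap extent\n", i);
        log_append(rec, (size_t)n);
    }

    /* ...and is then committed with a single sequential write. */
    log_flush(fd);
    close(fd);
    return 0;
}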

>
> It also serializes all file operations while the meta data is being
> diddled.

Not at all.

> However until you read the current data, decompress, update
> new data and recompress, you cannot tell whether the compressed
> buffer will expand or contract, and what mapping changes are needed.
> If the file meta structure is affected it forces all operations
> to serialize unless you want to go for a concurrent b+tree update
> mechanism which is probably an order of magnitude more complicated.

Hey, if you're going to support really large files you'll almost
*always* need *something* like a b+ tree to map them, and you'll need to
allow concurrent operations on it. So there's nothing to be saved here:
just bite the bullet and go for it.
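
(For concreteness, the read-decompress-modify-recompress cycle Eric
describes looks roughly like the sketch below, with zlib standing in
for whatever codec a real file system would use and an assumed 64 KB
extent; the point is simply that the new physical size is not known
until the recompression at the end:)

/* build: cc sketch.c -lz */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define EXTENT 65536u            /* assumed logical extent size */

int main(void)
{
    static Bytef plain[EXTENT];
    static Bytef comp_old[EXTENT + 1024], comp_new[EXTENT + 1024];
    uLongf old_len = sizeof comp_old, new_len = sizeof comp_new;
    uLongf plain_len = EXTENT;

    /* A highly compressible extent as it sits "on disk". */
    memset(plain, 'A', EXTENT);
    compress2(comp_old, &old_len, plain, EXTENT, Z_DEFAULT_COMPRESSION);

    /* An in-place user write: read, decompress, modify, recompress. */
    uncompress(plain, &plain_len, comp_old, old_len);
    for (unsigned i = 0; i < 4096; i++)          /* overwrite 4 KB with    */
        plain[32768 + i] = (Bytef)(i & 0xff);    /* less compressible data */
    compress2(comp_new, &new_len, plain, plain_len, Z_DEFAULT_COMPRESSION);

    /* Only now does the file system learn whether the extent grew or
     * shrank, i.e. whether its block mapping has to change. */
    printf("compressed size before: %lu, after: %lu bytes\n",
           (unsigned long)old_len, (unsigned long)new_len);
    return 0;
}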

- bill
From: Andrew Reilly on
On Mon, 25 Sep 2006 18:14:39 +0200, Terje Mathisen wrote:

> Andrew Reilly wrote:
>> On Mon, 25 Sep 2006 12:52:48 +0200, Terje Mathisen wrote:
>>
>>> For every N MB of contiguous disk space, use an extra MB to store ECC
>>> info for the current block. The block size needs to be large enough that
>>> a local soft spot which straddles two sectors cannot overwhelm the ECC
>>> coding.
>>
>> Isn't that just the same as having the drive manufacturer use longer
>> Reed-Solomon (forward error correcting) codes? Errors at that level are
>
> No, because you need _huge_ lengths to avoid the problem where an area
> of the disk is going bad. This really interferes with random access
> benchmark numbers. :-(

Are you sure? Seems to work OK for CDs. Of course the sizes are vastly
different, and I admit to not having done the analysis to say how well it
scales.

ECC seems to be in the same redundancy-space as RS codes to me.
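
(The degenerate, single-erasure version of what Terje describes is just
RAID-style XOR parity over a block group. A toy sketch follows, with
made-up group and block sizes; a real implementation would use a proper
Reed-Solomon-style code and survive more than one bad sector per group:)

#include <stdio.h>
#include <string.h>

#define BLOCK 4096               /* assumed block size */
#define GROUP 8                  /* assumed data blocks per parity block */

/* Parity block = XOR of all data blocks in the group. */
static void make_parity(unsigned char data[GROUP][BLOCK],
                        unsigned char parity[BLOCK])
{
    memset(parity, 0, BLOCK);
    for (int b = 0; b < GROUP; b++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= data[b][i];
}

/* Rebuild the one block in the group the drive reported as unreadable. */
static void rebuild(unsigned char data[GROUP][BLOCK],
                    const unsigned char parity[BLOCK], int bad)
{
    memcpy(data[bad], parity, BLOCK);
    for (int b = 0; b < GROUP; b++)
        if (b != bad)
            for (int i = 0; i < BLOCK; i++)
                data[bad][i] ^= data[b][i];
}

int main(void)
{
    static unsigned char data[GROUP][BLOCK], parity[BLOCK], saved[BLOCK];

    for (int b = 0; b < GROUP; b++)
        memset(data[b], 'a' + b, BLOCK);
    make_parity(data, parity);

    memcpy(saved, data[3], BLOCK);       /* pretend block 3 went bad */
    memset(data[3], 0, BLOCK);
    rebuild(data, parity, 3);

    printf("block 3 %s\n",
           memcmp(saved, data[3], BLOCK) == 0 ? "recovered" : "NOT recovered");
    return 0;
}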

>> something that can be dialed-in or out by the manufacturer. If it's
>> too high for comfort, they'll start to lose sales, won't they?
>>
>> Alternative approach to ECC sectors: store files in a fountain code
>> pattern?
>
> Pointers?
>
> OK, I found some papers, but they really didn't tell me how/why they
> would be suited to disk sector recovery. :-(

Sorry, I don't know any papers, it's just a concept that I heard about
around the water cooler. It's used (or at least been suggested for use) in
communication systems to solve the same sort of forward error correction
problem that Reed-Solomon codes address, but at lower space cost, and
specifically for the situation of packetized signals. Given the duality
between comms systems and storage systems, it ought to help with the
latter, too, but it certainly would get in the way of random access, which
I'd forgotten about. Big sequential-only files (or at least ones where
you would reasonably expect to want to read the whole thing) might be able
to benefit, though.
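
(For what it's worth, the encoding side is easy to sketch: each output
block is the XOR of a randomly chosen subset of the K source blocks,
and a decoder that collects somewhat more than K such blocks can
usually peel the originals back out. The uniform degree choice below is
a simplification of the real LT-code soliton distribution, and all the
sizes are made up:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define K     16                 /* assumed source blocks per segment */
#define BLOCK 4096               /* assumed block size */

/* Produce one encoded block: the XOR of a random subset of the sources.
 * 'used' records which sources went in (a real code transmits this, or
 * a seed from which the receiver can regenerate it). */
static void encode_one(unsigned char src[K][BLOCK],
                       unsigned char out[BLOCK], int used[K])
{
    int degree = 1 + rand() % K;             /* crude degree choice */
    memset(out, 0, BLOCK);
    memset(used, 0, K * sizeof used[0]);
    for (int d = 0; d < degree; d++) {
        int s = rand() % K;
        if (used[s])
            continue;                        /* skip duplicate picks */
        used[s] = 1;
        for (int i = 0; i < BLOCK; i++)
            out[i] ^= src[s][i];
    }
}

int main(void)
{
    static unsigned char src[K][BLOCK], enc[BLOCK];
    int used[K];

    for (int b = 0; b < K; b++)
        memset(src[b], b, BLOCK);

    encode_one(src, enc, used);
    printf("encoded block combines sources:");
    for (int s = 0; s < K; s++)
        if (used[s])
            printf(" %d", s);
    printf("\n");
    return 0;
}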

--
Andrew

From: prep on
Bill Todd <billtodd(a)metrocast.net> writes:

> Out of curiosity, does anyone know of a good reason why file names
> should *ever* be case-sensitive (aside from the fact that Unix users
> and applications have become used to this)?

None of any worth IMO. But case smashing to provide a case blind name
space takes code, and would not fit into a PDP7/11 address space.

--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
From: Dennis Ritchie on

<prep(a)prep.synonet.com> wrote in message news:87mz8ncwlj.fsf(a)k9.prep.synonet.com...


> None of any worth IMO. But case smashing to provide a case blind name
> space takes code, and would not fit into a PDP7/11 address space.

Nonsense. Keeping the case the user specified was a choice.
Case-squashing would be a very few instructions.
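
(Something like the sketch below for ASCII names; illustrative C, not
the actual early Unix code:)

#include <ctype.h>
#include <stdio.h>

/* Fold a file name to lower case: the "very few instructions". */
static void fold_name(char *p)
{
    for (; *p; p++)
        *p = (char)tolower((unsigned char)*p);
}

int main(void)
{
    char name[] = "ReadMe.TXT";
    fold_name(name);
    printf("%s\n", name);        /* prints "readme.txt" */
    return 0;
}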

Dennis


From: Stephen Fuld on

"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message
news:sngiu3-iee.ln1(a)osl016lin.hda.hydro.com...
> Bill Todd wrote:
>> Something else that you can't do with a linked-list allocation map like
>> FAT's (unless you build another virtual address map on top of it).
>> Compression (which you mentioned later) is similarly difficult with a
>> file of any real size.
>
>> So what are the features that a good file system should have, besides never
> silently dropping updates, and never allowing an inconsistent state?

I'll chime in here. Several people have taken a passing shot at metadata,
but I would like to discuss this further. I think a file system needs a
consistent, easily accessible, extensible mechanism for setting, retrieving,
and modifying metadata/attributes.

Currently file systems use at least four different methods, frequently
within the same file system!

1. Overloading part of the file name (the extension) to indicate which
program is the default for processing the file, and perhaps implicitly
something about the file format.

2. Various bits of the directory entry (loosely defined) for such things
as read-only status, ownership, time of last update, etc.

3. Extra streams.

4. An entry in another file altogether, in who knows what format. This
is used by, for example, some backup systems to record where the backup
copy is, etc.

Each of these is accessed by a program through a different mechanism,
and each has different characteristics in terms of ease of
getting/changing the data, etc.

There should be a single mechanism for creating and reading all such data.
There must be a way for users to define their own new attributes
that are accessed in the same way as the other ones. The metadata should be
backed up with the data so it can be restored in the event of an error.
The mechanisms should be easy enough to use that no one will want to use any
other one. There should be utilities for listing the attributes/metadata
for a file as well as changing it (with appropriate permission).
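
An existing example of roughly this kind of uniform, user-extensible
mechanism is Linux extended attributes, where the same few system calls
set, get and enumerate any named attribute on a file. A minimal sketch
(Linux calls; the attribute name, value and file name are made up for
illustration):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "example.dat";            /* hypothetical file  */
    const char *val  = "backup-set-42";          /* hypothetical value */

    /* Make sure the file exists so the example is self-contained. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    close(fd);

    /* Set a user-defined attribute by name. */
    if (setxattr(path, "user.backup_location", val, strlen(val), 0) != 0) {
        perror("setxattr");
        return 1;
    }

    /* Read it back the same way any other attribute would be read. */
    char buf[256];
    ssize_t n = getxattr(path, "user.backup_location", buf, sizeof buf - 1);
    if (n >= 0) {
        buf[n] = '\0';
        printf("user.backup_location = %s\n", buf);
    }

    /* Enumerate every attribute name on the file: the hook a generic
     * "list the metadata" utility would build on. */
    char names[1024];
    ssize_t len = listxattr(path, names, sizeof names);
    for (ssize_t off = 0; off < len; off += (ssize_t)strlen(names + off) + 1)
        printf("attribute: %s\n", names + off);
    return 0;
}

That covers the user-defined part of the wish list; whether the rest
(the directory-entry bits, the default application, and so on) could be
folded into the same namespace is exactly the sort of discussion
proposed here.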

Once you have the mechanism, we can have a profitable discussion of what
those attributes should be. Note that many of the "wish list" items
mentioned already are perfect things to be stored in this manner.

--
- Stephen Fuld
e-mail address disguised to prevent spam