From: Eric P. on
Bill Todd wrote:
>
> That's a start, anyway: I'd be interested to hear what others think

- DeleteOnClose file attribute. Also gets deleted during crash recovery.

- Anonymous temporary files: Temporary files do not require a
file name. Anonymous files are automatically marked DeleteOnClose.
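
Unix-land has since grown much the same thing: `O_TMPFILE` on Linux, and portably Python's `tempfile.TemporaryFile`, which unlinks the name immediately so the data disappears when the handle closes. A minimal sketch (the string written is just illustrative):

```python
import tempfile

# tempfile.TemporaryFile gives an anonymous temporary file: on POSIX the
# name is unlinked as soon as the file is created, so no name is ever
# visible in the directory tree and the data vanishes when the handle
# closes -- the in-kernel analogue of a DeleteOnClose attribute.
with tempfile.TemporaryFile() as scratch:
    scratch.write(b"intermediate results")
    scratch.seek(0)
    data = scratch.read()

print(data)  # contents were fully usable while the handle was open
```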

- EraseOnDelete to overwrite file contents. This is best done in the
file system, particularly wrt compressed files whose actual physical
footprint is data dependent and therefore hard to overwrite [*].
A simple overwrite won't keep out a national security service,
but could help keep average prying eyes out of financial,
corporate and personal data.

- [*] For disk file systems at least, I think the time for compressed
files has passed and they are a waste of development time these days.

- Multi-disk volumes are also a feature whose time I think has passed
and would waste development time, but I mention them anyway.

- I haven't thought of any real use for sparse files yet, since
databases do their own internal management anyway, so I might
consider classifying this too as a waste of development time.

- Automatic extend/contract of MFT/inode space, without limits and
without the need to preallocate space or otherwise manually manage it.

- Built-in safe defrag ability, including for directory and MFT files.

- SoftLinks (vms logical names) and HardLinks.

- Async operations for Open, Close and directory lookups as well as
Read and Write. Open and Close can require many internal IOs to
accomplish and can be very lengthy operations, especially over
networks, and that stalls servers and GUIs.

- Separate the FileSize from EndOfFile attribute.
I always liked VMS's InitialSize, ExtendSize, MaximumSize
file attributes for cutting down fragmentation.
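
The nearest modern analogue to InitialSize is explicit preallocation. A sketch using `os.posix_fallocate` (the call is real but Linux/POSIX-specific; the 1 MiB size is just an example):

```python
import os
import tempfile

# Preallocating the expected final size up front lets the allocator pick
# one contiguous extent instead of growing the file piecemeal -- the same
# fragmentation-avoidance goal as VMS's InitialSize/ExtendSize attributes.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
try:
    os.posix_fallocate(fd, 0, 1 << 20)   # reserve 1 MiB of real blocks
    size = os.fstat(fd).st_size          # logical size is now 1 MiB too
finally:
    os.close(fd)

print(size)
```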

- File placement attributes (outer edge, inner edge, center, etc)

- I have been pondering the idea of FIFO files that have a
FrontOfFile marker as well as an EndOfFile marker.
Efficient for store-and-forward interprocess messaging, but
I'm not sure if it would be useful enough to warrant support.
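
The idea can be mocked up in user space: keep a FrontOfFile offset alongside the data, append at EndOfFile, and advance the front as messages are consumed. The `FifoFile` class below is purely hypothetical; real support would live in the file system so the consumed front blocks could actually be deallocated:

```python
import io
import struct

class FifoFile:
    """Toy FIFO 'file': a FrontOfFile offset plus an EndOfFile
    (the underlying stream's current length)."""

    def __init__(self, stream):
        self.stream = stream   # any seekable byte stream
        self.front = 0         # FrontOfFile marker

    def push(self, msg: bytes):
        # Append a length-prefixed message at EndOfFile.
        self.stream.seek(0, io.SEEK_END)
        self.stream.write(struct.pack(">I", len(msg)) + msg)

    def pop(self):
        # Consume the message at FrontOfFile, if any.
        self.stream.seek(0, io.SEEK_END)
        if self.front >= self.stream.tell():
            return None                            # queue is empty
        self.stream.seek(self.front)
        (n,) = struct.unpack(">I", self.stream.read(4))
        msg = self.stream.read(n)
        self.front += 4 + n                        # advance FrontOfFile
        return msg

q = FifoFile(io.BytesIO())
q.push(b"first")
q.push(b"second")
print(q.pop())  # b'first'
```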

- To copy files between two remote systems, send a message from
A to B telling it to send to C, or to C telling it to pull from B.
Don't read the blocks from B to A and write them out to C.

- KISS: it would be nice to use the same file system for lots of
devices, from hard drives to DVD to memory sticks to holocrystals.
So please don't put a full blown friggen RDB inside the file system
because I'll buy the one I want anyway so this is just a waste.

That's all for now.
Eric

From: Jonathan Thornburg -- remove -animal to reply on
Bill Todd wrote
BT> Out of curiosity, does anyone know of a good reason why file names
BT> should *ever* be case-sensitive (aside from the fact that Unix
BT> users and applications have become used to this)?

Someone (who I couldn't identify in the nested quoting) replied:
> Which language do you want to be case-insensitive in? What if two
> users of the same file system disagree on the choice?

Someone else (who I also couldn't identify in the nested quoting) replied:
> That is not a matter of language. Or is there a character encoding that
> says for language A, "X" and "x" are a pair while for language B, "X" and
> "y" are a pair?

In article <pdblu3-9hh.ln1(a)osl016lin.hda.hydro.com> Terje Mathisen then
commented:
> The German 'double-s' is two letters in uppercase and a single letter in
> lowercase.
>
>> Case-blind case-preserving is the only variant which is acceptable from the
>> point of view of ergonomics, IMNSHO.
>
> There I agree. This obeys the principle of least surprise, but as noted
> above, it does still have drawbacks.

In mathematics and physics quantities are *always* case-sensitive.
That is, 'g' and 'G' are *always* distinct. So... suppose my black hole
simulation writes a file /some/where/g.h5 containing HDF5 data for some
quantity described by 'g' in our equations, and then it writes a file
/some/where/G.h5 containing HDF5 data for the completely different
quantity described by 'G' in our equations.

*My* idea of the principle of least surprise (POLS) is that because
those two pathnames are in the same directory, and have filenames which
strcmp(3) deems to be distinct, then the result should be two distinct
files on disk.

Are you saying that your idea of the POLS is that because the two
pathnames are in the same directory, and have filenames which (let us
say, in the current locale) compare as equal according to strcoll(3),
then the 2nd file should overwrite the 1st? Ick.

Things are going to get even ickier if different users (having different
locales in effect) find different sets of files in the directory. E.g.,
what happens if a backup from a system which allows creation of the
distinct files /some/where/g.h5 and /some/where/G.h5 gets restored on
a system which thinks those are two distinct names for the same file?

The fundamental problem is that different {users,applications} may
have different ideas of how case should be handled... yet need to
use the same {OS, file system code, mounted file system}.

IMHO the only sane solution is for the OS to provide the basic mechanism,
and not try to impose a one-size-fits-all policy on applications. That
is (precisely the current Unix semantics):
* filenames are uninterpreted byte strings apart from '/' and '\0',
so 'g' and 'G' are (of course) distinct
* applications are free to provide case-insensitive semantics if
they deem it suitable

Note also that given a case-sensitive filesystem of this type, it's
easy to provide (any of the different flavors of) case-insensitive
semantics on top of this. In contrast, given any of the usual flavors
of case-insensitive filesystems it's somewhere between "tricky" and
"impossible" to provide case-sensitive semantics on top (making
/some/where/g.h5 and /some/where/G.h5 be distinct files, and stay
distinct across a network filesystem, backup/restore, etc etc).
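
Terje's sharp-s example is easy to demonstrate, and it shows that "case-blind" isn't even a single well-defined mapping, which is exactly why leaving the policy to applications is attractive. A quick illustration in Python (whose `str.casefold` implements Unicode case folding):

```python
# Case-insensitivity is not one well-defined mapping.
# German sharp s: uppercasing is not length-preserving...
assert "straße".upper() == "STRASSE"

# ...so naive lower()-comparison and full Unicode case folding disagree:
assert "STRASSE".lower() != "straße"                 # lower() keeps 'ss'
assert "STRASSE".casefold() == "straße".casefold()   # both fold to 'strasse'

# Whereas byte-string comparison (the Unix rule) is unambiguous:
assert b"g.h5" != b"G.h5"   # two distinct names, two distinct files
print("ok")
```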

ciao,

--
-- "Jonathan Thornburg -- remove -animal to reply" <jthorn(a)aei.mpg-zebra.de>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany, "Old Europe" http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam
From: Tarjei T. Jensen on
Bill Todd wrote:
> Tarjei T. Jensen wrote:
>> Arbitrary length file names
>
> A length of much more than, say, 64 KB could start to become an
> implementation challenge in any reasonable directory approach that I can
> think of: would such a limit satisfy you, and if not, why not?

As far as I'm concerned, 255 bytes is an arbitrary enough length for a
file name. A path length limit of 64KB sounds reasonable.

>> The file system should be able to migrate from one device to another. Or
>> that may be the job of the volume manger. I don't care; I want the
>> feature.
>
> What user need are you attempting to satisfy with that feature? It sounds
> like a work-around for some assumed underlying deficiency in the storage
> system.

I have used this on Digital Unix. Very cool when you have a failing disk
drive and you can migrate data to another drive without any problems. Love
the feature.

> I suspect you mean support an explicit 'copy file' operation (along the
> lines of NT's) which will handle any ancillary information that may not be
> present in the single main data stream: this is desirable as a user aid
> and performance enhancement even for simple files, and especially for
> 'decorated' files (whether such decoration is ad hoc or standardized) and
> for files with an internal organization that does not allow efficient
> application-level copying at all (e.g., B+ trees where the interior nodes
> are managed by the system rather than allocated from the virtual space of
> a simple byte-stream file) - plus facilitates copy-on-write sharing by
> multiple file 'copies' of a single instance if the system supports that
> (which I likely should have included in my list of possible extensions).

Sounds even better than what I wanted.

> If you are copying within the same file system, any presence of the
> network should be transparent. If you are copying the files somewhere
> else, you could use the 'read compressed' API that you described above if
> the file was already compressed and the remote end understood that form of
> compression (e.g., was another instance of the same kind of file system);
> otherwise, I'd suggest that compressing for network transmission is not
> the job of the local file system but rather should be a feature of the
> underlying network mechanisms that the CopyFile() file system operation
> uses.

There are several ways of looking at this. One way is to view the network as
just transport and let the file systems communicate with each other. This
means that the copy command sets up communication and gives commands to the
file system about what to do. Then the file systems will sort out the rest.
They know how to talk to each other.

> But it is indeed a gray area as soon as one introduces the idea of a
> CopyFile() operation (that clearly needs to include network copying to be
> of general use). The recent introduction of 'bundles' ('files' that are
> actually more like directories in terms of containing a hierarchical
> multitude of parts - considerably richer IIRC than IBM's old 'partitioned
> data sets') as a means of handling multi-'fork' and/or attribute-enriched
> files in a manner that simple file systems can at least store (though
> applications then need to understand that form of storage to handle it
> effectively) may be applicable here.

Sounds great.

>> The file system or volume manager should support mirroring. This should
>> be based on blocks.
>
> Again, that last sounds like an implementation detail aimed at satisfying
> some perceived user-level need: what is that need?

The need is safekeeping of data AND backups. It is wonderful if you have a
catastrophic failure and you can continue after a short pause at another
site.

>> Technology should be able to work across an IP network.
>
> What technology? Mirroring? The latter is internal to the file system,
> so perhaps you are stating that distributed aspects of the file system in
> general should function over an IP network - possibly implying something
> about the degree of such distribution (e.g., WAN vs. LAN)?

If somebody provides transport for the file system, it should be able to use
that transport to communicate with another file system of the same type and
replicate changes.

> Conventional mirroring is inherently synchronous, which causes significant
> update-performance impacts as line latencies increase with distance.
> Asynchronous replication (e.g., at disaster-tolerant separations
> sufficient to satisfy even the truly paranoid - I probably should have
> included at least limited disaster-tolerance in my list of extensions)
> requires ordering guarantees for the material applied to the remote site
> that can become complex when the main site comprises multiple file system
> nodes executing concurrently.


I am a BIG fan of asking "How should the product be perceived to work?"
before worrying about implementation. I suppose that is pretty close to the
doctrine from "The Inmates Are Running The Asylum".


greetings,


From: Tarjei T. Jensen on
Bill Todd wrote:
> Conventional mirroring is inherently synchronous, which causes significant
> update-performance impacts as line latencies increase with distance.
> Asynchronous replication (e.g., at disaster-tolerant separations
> sufficient to satisfy even the truly paranoid - I probably should have
> included at least limited disaster-tolerance in my list of extensions)
> requires ordering guarantees for the material applied to the remote site
> that can become complex when the main site comprises multiple file system
> nodes executing concurrently.

BTW A file system should be able to give an early warning of impending doom
resulting from failing media.

greetings,

From: Tarjei T. Jensen on

"Eric P." wrote:
> - [*] For disk file system at least, I think the time for compressed
> files has passed and are a waste of development time these days.

Sorry, content is ballooning. We need everything in order to keep cost down.

> - Multi-disk volumes are also a feature I think whose time has passed
> and would waste development time, but mention them anyway.

Good idea combined with the ability to migrate to new volumes. Sometimes
you need multiple volumes in order to spread I/O load.

Ideally the file system should spread the load automagically if more than
one channel is available.

> - I haven't thought of any real use for sparse files yet, since
> databases do their own internal management anyway, so I might
> consider also classifying this as a waste development time.

Agreed.

> - SoftLinks (vms logical names) and HardLinks.

Soft links are more important than hard links for me.

> - I have been pondering the idea of FIFO files that have a
> FrontOfFile marker as well as an EndOfFile marker.
> Efficient for store and forward inter process messaging but
> I'm not sure if it would be useful enough to warrant support.

Being able to use FIFOs across systems would be great.

VMS-style logical names would also be great. They are truly useful.

Now we will have to invent a distributed name manager to go with the file
system :-)

greetings,