Trying to design low level hard disk manipulation program [Computer Architecture]

Prev: "Livermore Loops" on x86 Linux
Next: How Many Processor Cores Are Enough?

From: Terje Mathisen on 26 Sep 2006 03:45

Eric P. wrote:
> Terje Mathisen wrote:
>> Tarjei T. Jensen wrote:
>>> Niels Jørgen Kruse wrote:
>>>> File formats are usually compressed already, and you need to know the
>>>> kind of content to get the best compression.
>>> Sorry, they are not yet compressed. It does not mean that we should not
>>> prepare the file system to handle that.
>>>
>>> We'll have to see whether future word processing and spreadsheet formats
>>> are compressed enough natively.
>> I'd be quite happy if just one single app would stop storing all its
>> data pessimally:
>>
>> Microsoft Powerpoint.
>>
>> Any jpeg images you include in a PPT presentation will be decompressed
>> into a 32-bit RGB bitmap, and stored that way in the file.
>>
>> This holds even if you resize the source image to a small thumbnail in
>> your presentation.
>>
>> This one app is responsible for 10-20% of _all_ file space on most of
>> our file server volumes. :-(
>
> Ok, well maybe compression still has a place. Of course you realize
> that some people's ability to do dumbass things far exceeds others'
> ability to compensate by adding compression. Would the compression
> algorithm the file system uses work well on 32 bit RGB bitmaps?

No, not at all:

You get at best something like 2:1 compression using zlib/zip or a
similar lossless approach, while the jpeg->BMP decompression gave a 10:1
expansion.

I.e. this particular problem can _only_ be fixed inside the application.
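
To put rough numbers on that, here is a minimal sketch of the arithmetic. The 1 MB JPEG size is purely hypothetical; the 10:1 and 2:1 ratios are the approximate figures quoted above, not measurements.

```python
# Hypothetical input size, chosen only to make the ratios concrete.
jpeg_size = 1_000_000      # bytes of the original JPEG (assumed)
expansion = 10             # approximate JPEG -> bitmap expansion (10:1)
fs_ratio = 2               # approximate best-case lossless ratio (2:1)

def net_growth(expansion, fs_ratio):
    """Factor by which the stored file still exceeds the original JPEG."""
    return expansion / fs_ratio

bitmap_size = jpeg_size * expansion      # what the application writes
stored_size = bitmap_size // fs_ratio    # what a compressing file system keeps

print(f"bitmap written by the app: {bitmap_size} bytes")
print(f"after file system compression: {stored_size} bytes")
print(f"still {net_growth(expansion, fs_ratio):.0f}x the original JPEG")
```

Under these assumptions the compressed bitmap is still several times larger than the JPEG it came from, so file system compression only recovers part of the loss.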

I'm guessing that at some point in time, lost in obscurity by now, the
PowerPoint team decided "let's add the capability to import BMP
images!", then a little bit later someone else said: "Now that we have
BMP import, why don't we write/borrow/steal a set of file format
conversion routines, so that we can also import other non-vector image
formats?"

By doing all conversions at the import stage, it didn't matter if a
specific image format required a relatively costly decompression stage,
it would still be displayed just as quickly as a regular BMP file!

A few years later we got to the stage where a JPEG actually loads a
_lot_ faster from disk than a BMP, simply because it is much faster to
decompress the jpeg than to read a 10X larger file. :-(

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen on 26 Sep 2006 03:55

Andrew Reilly wrote:
> Quite a lot of meta-data is stored within files, in application-specific
> formats, now. ID3 title/artist tags or sample rates in MP3 files, "meta"
> attributes in HTML files, author information in office documents.
> Alternate language soundtracks in DVD movies, perhaps (not meta-data, but
> "extra stream" information).
>
> How could this reasonably be subsumed by a file system, when the
> information must travel with the file, by the definition of the file
> format? Perhaps it is reasonable for a "file system" to expose abstract
> meta-data methods that operate on different file types through
> type-specific plug-ins that access (and modify?) the information in
> format-specific ways. Is that really a win? Is it what you are thinking
> about, or would such meta-information be duplicated from the file into
> file-system meta-data forks? How much effort would you go to to ensure
> consistency in that case?

The examples you're using here are all more or less of the 'file system
within a single file' order.

Until the least common denominator of file systems includes all this
stuff, we'll still see the need for file formats that effectively work
as a limited/application-specific file system:

tar, zip, doc and probably a bunch of others.

Java jar files are afaik just zip files with a modified extension and
one or two added conventions for naming/content.
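
A small sketch of that point, using only Python's standard zipfile module: it builds an in-memory archive laid out like a jar and reads it back as an ordinary zip. The entry `example/Hello.class` and its contents are placeholders, not a real class file; `META-INF/MANIFEST.MF` is the conventional manifest location.

```python
import io
import zipfile

# Build a jar-like archive in memory: a plain zip plus a manifest entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")
    # Placeholder bytes only; this is not a loadable class file.
    jar.writestr("example/Hello.class", b"\xca\xfe\xba\xbe")

# Any zip reader can now walk it like a tiny read-only file system.
buf.seek(0)
is_zip = zipfile.is_zipfile(buf)
buf.seek(0)
with zipfile.ZipFile(buf) as archive:
    names = archive.namelist()
    manifest = archive.read("META-INF/MANIFEST.MF").decode("ascii")

print(is_zip, names)
print(manifest.strip())
```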

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

From: Jean-Marc Bourguet on 26 Sep 2006 04:02

Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:

>>>> Which language do you want to be case-insensitive in? What if two
>>>> users of the same file system disagree on the choice?
>>> That is not a matter of language. Or is there a character encoding that
>>> says for language A, "X" and "x" are a pair while for language B, "X" and
>>> "y" are a pair?
>> Yes, afaik:
>> The German 'double-s' is two letters in uppercase and a single letter in
>> lowercase.
>
> No, that's not what I meant. I asked whether there are languages that use the
> same letters, but for which the mapping between upper- and lower-case is
> incompatible.

Turkish has two I, one with a dot and one without. If in a Turkish locale
you ask for the lowercase of I, you get the dotless i.
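
A minimal sketch of the difference. Python's `str.lower()` is not locale-aware, so the Turkish rule is hand-written here; `lower_turkish` is a hypothetical helper for illustration, not a library function, and it only handles the two I pairs.

```python
DOTLESS_I = "\u0131"        # ı, LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130" # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE

def lower_turkish(text):
    """Lowercase with the Turkish pairing: I -> ı and İ -> i (assumed rule set)."""
    text = text.replace("I", DOTLESS_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()

name = "FILE.TXT"
default_lower = name.lower()          # locale-independent mapping
turkish_lower = lower_turkish(name)   # Turkish mapping

print(default_lower)
print(turkish_lower)
print(default_lower == turkish_lower)
```

The two results differ in the second letter, so a case-insensitive match that holds under one rule fails under the other.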

Yours,

--
Jean-Marc

From: Benny Amorsen on 26 Sep 2006 04:49

>>>>> "JV" == Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:

BT> Out of curiosity, does anyone know of a good reason why file names
BT> should *ever* be case-sensitive (aside from the fact that Unix
BT> users and applications have become used to this)?
>> Which language do you want to be case-insensitive in? What if two
>> users of the same file system disagree on the choice?

JV> That is not a matter of language. Or is there a character encoding
JV> that says for language A, "X" and "x" are a pair while for
JV> language B, "X" and "y" are a pair?

There are certainly languages which say that two letters are
considered the same apart from case, where another language considers
them different letters. So you risk having names which conflict in one
language, but do not conflict in another.

One special case is Å which can be alternatively spelled Aa in Danish.
A case-insensitive file system really ought to forbid having both
Aalborg and Ålborg as file names in the same directory. As far as I
know, no system has gone that far. (I wonder if any system even gets
the sorting right: a is before b is before aa is the same as å).
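
For what it's worth, here is a rough sketch of a sort key that follows the rule as stated above: "aa" counts as "å", and "å" sorts last. It is a simplification under that assumption: it folds every "aa", which real Danish collation does not always do, and `danish_key` is a made-up name.

```python
import unicodedata

# Danish alphabet order: a..z, then æ, ø, å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"

def danish_key(name):
    """Sort key treating 'aa' as 'å' (simplified; folds every 'aa')."""
    folded = unicodedata.normalize("NFC", name).lower().replace("aa", "å")
    # Characters outside the alphabet sort after everything else.
    return [DANISH_ALPHABET.index(ch) if ch in DANISH_ALPHABET
            else len(DANISH_ALPHABET) for ch in folded]

names = ["Aalborg", "Bornholm", "Aarhus", "Ålborg", "Odense", "Arden"]
ordered = sorted(names, key=danish_key)
conflict = danish_key("Aalborg") == danish_key("Ålborg")

print(ordered)
print("Aalborg and Ålborg collide:", conflict)
```

With this key the two spellings of Aalborg compare equal, which is exactly the pair a strict case-insensitive file system would have to reject.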

I have the collation problem with http://generals.dk, a site I run for
a friend of mine. There is no good universal collation, so collations
in several languages are wrong on that site. I suppose I could fix it
so that the list for each country is correct at least, but there is no
good way to sort a list of names from different countries.

JV> Case-blind case-preserving is the only variant which is acceptable
JV> from the point of view of ergonomics, IMNSHO.

Put it in user space, not the file system.

/Benny

From: Nick Maclaren on 26 Sep 2006 05:07

In article <4ns2u4FbqceeU2(a)individual.net>,
Jan Vorbrüggen <jvorbrueggen(a)not-mediasec.de> writes:
|>
|> >>None of any worth IMO. But case smashing to provide a case blind name
|> >>space takes code, and would not fit into a PDP7/11 address space.
|>
|> > Nonsense. Keeping the case the user specified was a choice.
|> > Case-squashing would be a very few instructions.
|>
|> I'm all for keeping the user's choice of case, but making it irrelevant
|> on compare. Would that still be "a very few instructions", in your opinion?

I have used systems that did just that. It is a negligible number of
instructions, but is rather confusing - consider putting a list of
names into sort or uniq - should the default be case sensitive or
insensitive?
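
A toy sketch of the scheme, to show both halves of that: the compare is cheap, and the sort/uniq question is real. `CaseBlindDirectory` is a hypothetical in-memory stand-in, not any real file system's API, and it uses Python's `str.casefold()` as the assumed folding rule.

```python
class CaseBlindDirectory:
    """Keeps names as written, but compares them without regard to case."""

    def __init__(self):
        self._entries = {}  # folded name -> name as first created

    def create(self, name):
        key = name.casefold()
        if key in self._entries:
            raise FileExistsError(f"{name!r} clashes with {self._entries[key]!r}")
        self._entries[key] = name

    def lookup(self, name):
        # Returns the stored spelling, whatever case the caller used.
        return self._entries[name.casefold()]

d = CaseBlindDirectory()
d.create("ReadMe.txt")
found = d.lookup("README.TXT")

# The sort/uniq question: the same list gives different answers.
names = ["readme", "README", "Makefile"]
sensitive_unique = sorted(set(names))
blind_unique = sorted({n.casefold() for n in names})

print(found)
print(sensitive_unique)
print(blind_unique)
```

The case-sensitive pass keeps three names and the case-blind pass keeps two, so tools layered on top of such a file system have to pick a default one way or the other.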

Regards,
Nick Maclaren.
