S-expression I/O in Ada [ADA]


From: Dmitry A. Kazakov on 2 Aug 2010 03:58

On Mon, 02 Aug 2010 09:17:15 +0200, Georg Bauhaus wrote:

> On 8/1/10 11:13 PM, Dmitry A. Kazakov wrote:
>
>>> I use S-expression as a sort of universal text-based container, just
>>> like most people use XML these days,
>>
>> These puzzle me too. I see no need for text-based containers for binary
>> data. Binary data aren't supposed to be read by people, they read texts.
>> And conversely the machine shall not read texts, it has no idea of good
>> literature...
>
> The whole idea of machine readable Ada source text is silly, right?

Right.

1. Ada sources are read by the compiler.

2. Sources need not be text-file based. Compare this with a word
processor, an application whose whole purpose is to produce texts
that aren't text-based... (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Pascal Obry on 2 Aug 2010 13:08

Simon,

> [Team, I seem to remember an '05 feature that would support this? or was
> that my imagination?]

Yes, I think you are referring to Ada.Tags.Generic_Dispatching_Constructor

For a usage example, see:

http://www.adacore.com/2007/11/26/ada-gem-19/

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net - http://v2p.fr.eu.org
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver keys.gnupg.net --recv-key F949BD3B

From: Simon Wright on 2 Aug 2010 15:08

Pascal Obry <pascal(a)obry.net> writes:

> Simon,
>
>> [Team, I seem to remember an '05 feature that would support this? or was
>> that my imagination?]
>
> Yes, I think you are referring to Ada.Tags.Generic_Dispatching_Constructor
>
> For a usage example, see:
>
> http://www.adacore.com/2007/11/26/ada-gem-19/
>
> Pascal.

That was it, thanks.

Am I right that this really comes into its own with classwide types?
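
The idea behind a dispatching constructor can be sketched outside Ada. The following is a rough Python analogue, not the Ada.Tags API: a tag read from the input selects which type constructs itself from the rest of the stream. The registry, the one-byte tags and the Circle/Rect classes are all hypothetical names invented for this illustration.

```python
import io

# Hypothetical registry mapping a one-byte tag to the class it denotes.
REGISTRY = {}


def register(tag):
    """Class decorator that records a class under the given tag."""
    def wrap(cls):
        REGISTRY[tag] = cls
        return cls
    return wrap


class Shape:
    """Root of the (illustrative) class hierarchy."""

    @classmethod
    def from_stream(cls, stream):
        raise NotImplementedError


@register(b"C")
class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    @classmethod
    def from_stream(cls, stream):
        # One octet holds the radius in this toy format.
        return cls(radius=stream.read(1)[0])


@register(b"R")
class Rect(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @classmethod
    def from_stream(cls, stream):
        # Two octets: width then height.
        width, height = stream.read(2)
        return cls(width, height)


def construct(stream):
    """Read a tag, then let the matching class build itself."""
    tag = stream.read(1)
    try:
        cls = REGISTRY[tag]
    except KeyError:
        raise ValueError(f"unknown tag {tag!r}") from None
    return cls.from_stream(stream)
```

The caller of construct only ever sees a Shape of some kind, which is roughly where class-wide types come in on the Ada side.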

From: Natacha Kerensikova on 7 Aug 2010 03:23

On Aug 1, 11:13 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> On Sun, 1 Aug 2010 13:06:10 -0700 (PDT), Natacha Kerensikova wrote:
> > On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> > wrote:
> >> Hmm, why do you consider brackets as separate elements?
> > Because that's the definition of S-expressions :-)
> OK, why S-expressions, then? (:-))

First, because this is the existing format in most of my programs, so
for interoperability I can choose only between using this format or
converting from and to it (or rewrite everything). In both cases there
is a strong code-reuse argument in favor of writing an S-expression
library instead of writing a bunch of ad-hoc similar code chunks.

Second, because I like this format and I find it good (see below).

> > I'm referring to this almost-RFC:http://people.csail.mit.edu/rivest/Sexp.txt
> I see, yet another poor data format.

Could you please explain why it is so poor?

I consider it good because it is flexible, expressive and simple. I
already mentioned quite a few times why it looks simple: my parser was
written from scratch in ~1000 lines. It looks expressive because most
data structures I can think of, and all data structures I have
actually used, can be easily represented: an array can be written down
as a list, a hash table as a list of two-element lists (key then
value), and so on. And I see flexibility coming from the fact that any
sequence of bytes can be encoded in an atom.
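
As a minimal sketch of the mappings just described, assuming the canonical form from the Rivest draft (each atom written as a decimal length, a colon, then the raw bytes; lists in parentheses). The function names are hypothetical and Python is used only for illustration; this is not the ~1000-line parser mentioned above.

```python
def encode(node):
    """Encode nested lists of bytes atoms in canonical S-expression form."""
    if isinstance(node, bytes):
        # Atom: decimal length, a colon, then the bytes verbatim.
        return str(len(node)).encode("ascii") + b":" + node
    if isinstance(node, (list, tuple)):
        # List: the encodings of the children between parentheses.
        return b"(" + b"".join(encode(child) for child in node) + b")"
    raise TypeError(f"unsupported node type: {type(node).__name__}")


def from_table(table):
    """Represent a hash table as a list of two-element lists (key, value)."""
    return [[key, value] for key, value in table.items()]


# An array is simply a list of atoms.
array_text = encode([b"abc", b"de"])

# A hash table becomes a list of (key value) pairs.
table_text = encode(from_table({b"host": b"example", b"port": b"8080"}))

# Any byte sequence fits in an atom, including non-printable bytes.
binary_text = encode([b"\x00\xff"])
```

Because every atom carries its length, arbitrary bytes need no escaping in this form.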

What am I missing?

> > I use S-expression as a sort of universal text-based container, just
> > like most people use XML these days,
>
> These puzzle me too. I see no need for text-based containers for binary
> data. Binary data aren't supposed to be read by people, they read texts.
> And conversely the machine shall not read texts, it has no idea of good
> literature...

Actually this is an interesting remark; it made me realize I'm mixing two
very different uses of a data format (though I still think
S-expressions are adequate for both of them):

I think a text-based format is very useful when the file has to be dealt
with by both humans and programs. The typical example would be
configuration files: read and written by humans, and used by the
program. And that's where I believe XML is really poor, because it's
too heavy for human use. I occasionally feel the need to embed
binary data in some configuration files, e.g. cryptographic keys or
binary initialization sequences to send as-is over whatever
communication channel. On these occasions I do use the text-based
binary encodings allowed by S-expressions, base-64 or hexadecimal, so
that the configuration file is still a text file. The huge advantage
of text files here is that there are already a lot of tools to deal
with them, while using a binary format would require writing specific
tools for humans to deal with it, which is IMO a waste of time compared
to the text-based approach.
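
A small sketch of those two text encodings, assuming the delimiters of the Rivest draft (hexadecimal between # characters, base-64 between | characters; worth checking against the draft itself). The helper names and the "key" entry are hypothetical, chosen only for illustration.

```python
import base64


def hex_atom(data: bytes) -> str:
    """Write a byte string as a hexadecimal atom, e.g. #616263#."""
    return "#" + data.hex() + "#"


def base64_atom(data: bytes) -> str:
    """Write a byte string as a base-64 atom, e.g. |YWJj|."""
    return "|" + base64.b64encode(data).decode("ascii") + "|"


def config_entry(name: str, data: bytes) -> str:
    """Build a (name value) pair holding binary data in a text config file."""
    return "(" + name + " " + base64_atom(data) + ")"


# A made-up 4-byte key, embedded while the file stays plain text.
line = config_entry("key", b"\x01\x02\xfe\xff")
```

Either form keeps the configuration file editable with ordinary text tools while still carrying arbitrary bytes.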

The other application is actual serialization, i.e. converting
internal types into a byte sequence in order to be stored on disk or
transmitted over a network or whatever. In this situation, humans
don't need to interact with the data (except for debugging purposes,
but it's an argument so light it's only justified when everything else
is otherwise equal).

In my previous posts I have talked a lot about serialization, while my
actual use of S-expressions is more often the first one. And
historically I first used S-expressions for configuration files,
because of their expressiveness over all other text formats I know,
while still being extremely simple. I then used this format for
serialization mostly for the sake of code reuse: I already had code for
S-expressions, so it was very cheap to use it for serialization
purposes. Compared to using another serialization format, it leads to
less code being used more, hence fewer opportunities to write bugs and
more opportunities to find and fix bugs. So it seems like a very
rational choice.

> > The library is not supposed to care about what those some_stuff_ are.
> > Actually, the library is supposed to get binary data from the
> > application along with the tree structure described above, store it
> > into a sequence of bytes (on a disk or over a network or whatever),
> > and to retrieve from the byte sequence the original tree structure
> > along with the binary data provided.
>
> Serialize can yield a binary chunk, but if you have to write it anyway, why
> not write text? You insisted on having text, so why mess with binary
> stuff?

I hope the above already answered this: I mostly use S-expressions for
text purposes, yet occasionally I feel the need to embed binary
data. Of course there are a lot of ways to textify binary data, like
hexadecimal or base-64, but considering S-expressions already handle
textification, I don't see the point of having the application deal
with it too.

> > But now that I think about it, I'm wondering whether I'm stuck in my C
> > way of thinking and trying to apply it to Ada. Am I missing an Ada way
> > of storing structured data in a text-based way?
>
> I think yes. Though it is not Ada-specific, rather commonly used OOP design
> patterns.

I heard people claiming that the first language shapes the mind of
coders (and they continue saying a whole generation of programmers has
been mind-crippled by BASIC). My first language happened to be 386
assembly, that might explain things. Anyway, I genuinely tried OOP
with C++ (which I dropped because it's way too complex for me (and I'm
tempted to say way too complex for the average coder; it should be
reserved for the few geniuses actually able to fully master it)), but I
never felt the need for anything beyond what can be done with a C
struct containing function pointers.

Now back to the topic, thanks to your post and some others in this
thread (for which I'm also thankful), I came to realize my mistake is
maybe wanting to parse S-expressions and atom contents separately. The
problem is, I just can't manage to imagine how to go in a single step
from the byte sequence containing an S-expression describing multiple
objects to the internal memory representation and vice-versa.

Thanks for your help and your patience,
Natacha

From: Dmitry A. Kazakov on 7 Aug 2010 04:39

On Sat, 7 Aug 2010 00:23:01 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 1, 11:13 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> On Sun, 1 Aug 2010 13:06:10 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
>>> wrote:
>>>> Hmm, why do you consider brackets as separate elements?
>>> Because that's the definition of S-expressions :-)
>> OK, why S-expressions, then? (:-))
>
> First, because this is the existing format in most of my programs, so
> for interoperability I can choose only between using this format or
> converting from and to it (or rewrite everything). In both cases there
> is a strong code-reuse argument in favor of writing an S-expression
> library instead of writing a bunch of ad-hoc similar code chunks.

Legacy stuff also. That is a valid argument.

>>> I'm referring to this almost-RFC:http://people.csail.mit.edu/rivest/Sexp.txt
>> I see, yet another poor data format.
>
> Could you please explain why it is so poor?
>
> I consider it good because it is flexible, expressive and simple. I
> already mentioned quite a few times why it looks simple: my parser was
> written from scratch in ~1000 lines. It looks expressive because most
> data structures I can think of, and all data structures I have
> actually used, can be easily represented: an array can be written down
> as a list, a hash table as a list of two-element lists (key then
> value), and so on. And I see flexibility coming from the fact that any
> sequence of bytes can be encoded in an atom.
>
> What am I missing?

The requirements.

One cannot judge a format without knowing what its purpose is. Most
formats like S-expressions are purposeless, in the sense that there is
no *rational* purpose behind them. As you wrote above, it is either legacy
(we have to overcome some limitations of some other poorly designed
components of the system) or personal preference (some people like angle
brackets, others like curly ones).

>>> I use S-expression as a sort of universal text-based container, just
>>> like most people use XML these days,
>>
>> These puzzle me too. I see no need for text-based containers for binary
>> data. Binary data aren't supposed to be read by people, they read texts.
>> And conversely the machine shall not read texts, it has no idea of good
>> literature...
>
> Actually this is an interesting remark; it made me realize I'm mixing two
> very different uses of a data format (though I still think
> S-expressions are adequate for both of them):
>
> I think a text-based format is very useful when the file has to be dealt
> with by both humans and programs. The typical example would be
> configuration files: read and written by humans, and used by the
> program.

There should be no configuration files at all. The idea that a
configuration can be edited using a text editor is corrupt.

> And that's where I believe XML is really poor, because it's
> too heavy for human use. I occasionally feel the need to embed
> binary data in some configuration files, e.g. cryptographic keys or
> binary initialization sequences to send as-is over whatever
> communication channel. On these occasions I do use the text-based
> binary encodings allowed by S-expressions, base-64 or hexadecimal, so
> that the configuration file is still a text file. The huge advantage
> of text files here is that there are already a lot of tools to deal
> with them, while using a binary format would require writing specific
> tools for humans to deal with it, which is IMO a waste of time compared
> to the text-based approach.

All these tools are here exclusively to handle poor formats of these files.
They add absolutely nothing to the actual purpose of configuration, namely
to handle the *semantics* of the given configuration parameter. None
answers simple questions like: How do I make the 3rd button on the left
4 cm wide? Less than none verify the parameter values.

The king is naked.

> The other application is actual serialization,

That should not be text.

>>> But now that I think about it, I'm wondering whether I'm stuck in my C
>>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
>>> of storing structured data in a text-based way?
>>
>> I think yes. Though it is not Ada-specific, rather commonly used OOP design
>> patterns.
>
> I heard people claiming that the first language shapes the mind of
> coders (and they continue saying a whole generation of programmers has
> been mind-crippled by BASIC). My first language happened to be 386
> assembly, that might explain things.

I see where mixing abstraction layers comes from...

> Anyway, I genuinely tried OOP
> with C++ (which I dropped because it's way too complex for me (and I'm
> tempted to say way too complex for the average coder; it should be
> reserved for the few geniuses actually able to fully master it)), but I
> never felt the need for anything beyond what can be done with a C
> struct containing function pointers.

Everything is Turing-complete you know... (:-))

> The
> problem is, I just can't manage to imagine how to go in a single step
> from the byte sequence containing an S-expression describing multiple
> objects to the internal memory representation and vice-versa.

You need not, that is the power of the OOP you dislike so much. Consider that
each object knows how to construct itself from a stream of octets. It is
trivial for simple objects like numbers. E.g. you read while the octets are in
'0'..'9' and generate the result, interpreting them as a decimal
representation. Or you take four octets and treat them as a big-endian binary
representation, etc. For a container type, you call the constructors for each
container member in order. If the container is unbounded, i.e. has variable
length, you read its bounds first or you use some terminator in the stream to
mark the container end. For containers of dynamically typed elements you must
learn the component type before you construct it.

In theory this is called a recursive descent parser, the simplest
thing ever.
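
The scheme described above can be sketched as follows, assuming canonical S-expressions (length-prefixed atoms, parenthesized lists) as the input. The function names are hypothetical and Python is used only for illustration. Atoms read their bounds first, and lists rely on the closing parenthesis as a terminator.

```python
import io
import struct


def read_decimal(stream):
    """Consume octets while they are in '0'..'9'; return the integer."""
    digits = b""
    while True:
        octet = stream.read(1)
        if not octet or not octet.isdigit():
            break
        digits += octet
    if octet:
        stream.seek(-1, io.SEEK_CUR)  # put back the non-digit lookahead
    if not digits:
        raise ValueError("expected a decimal number")
    return int(digits)


def read_u32_be(stream):
    """Take four octets and treat them as a big-endian unsigned integer."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("truncated 32-bit value")
    return struct.unpack(">I", raw)[0]


def read_atom(stream):
    """Variable-length item: read its length first, then the payload."""
    length = read_decimal(stream)
    if stream.read(1) != b":":
        raise ValueError("expected ':' after atom length")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("truncated atom")
    return data


def read_expr(stream):
    """Construct one atom or one list, recursing for list members."""
    octet = stream.read(1)
    if octet == b"(":
        items = []
        while True:
            peek = stream.read(1)
            if peek == b")":  # terminator marks the end of the container
                return items
            if not peek:
                raise ValueError("unterminated list")
            stream.seek(-1, io.SEEK_CUR)
            items.append(read_expr(stream))
    if octet.isdigit():
        stream.seek(-1, io.SEEK_CUR)
        return read_atom(stream)
    raise ValueError(f"unexpected octet {octet!r}")


tree = read_expr(io.BytesIO(b"(3:abc(2:de1:f))"))
```

Each reader consumes exactly the octets that belong to its own object, so the whole tree is built in a single pass over the stream.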

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
