From: Dmitry A. Kazakov on
On Tue, 10 Aug 2010 01:56:22 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 9, 12:56 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> On Mon, 9 Aug 2010 02:55:03 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 8, 5:15 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
>>> wrote:
>>> S-expressions are not a format on top or below that, it's a format
>>> *besides* that, at the same level. Objects are serialized into byte
>>> sequences forming S-expression atoms, and relations between objects/
>>> atoms are serialized by the S-expression format. This is how one gets
>>> the canonical representation of an S-expression.
>>
>> I thought you wanted to represent *objects* ... as S-sequences?
>
> It depends what you call an object. Here again, my vocabulary might have
> been tainted by the C standard. Take for example a record: I would call
> each component an object, as well as the whole record itself.

That's OK. Using this definition, an S-sequence in memory is an object.
Which was the question: what was wrong with the first object, so that you
wanted another one instead?

My first take was that the S-sequence was used as an object representation
outside memory, because you used the word "format". Now it looks like a
solution before the problem. You seem to be going to convert objects to
S-sequences *in* memory and then dump the result into files. Is that so?
What was the problem, then? Because it cannot work without a conversion
between the S-sequence in memory (the object) and the S-sequence in the
file (the representation). Why do you need the S-sequence in memory, when
dumping objects directly into files as S-sequences (if you insist on
having them) is simpler, cleaner, thinner, and faster?
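A minimal sketch of what "dumping directly" might look like, assuming the
canonical S-expression form where an atom is written as
"<decimal length>:<bytes>". All names (Dump_Directly, Put_Atom, the file
name) are invented for illustration, not from the thread:

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Dump_Directly is
   package TIO renames Ada.Text_IO;

   --  Write one canonical atom: decimal length, colon, then the bytes.
   procedure Put_Atom (File : TIO.File_Type; Data : String) is
      Len : constant String := Ada.Strings.Fixed.Trim
        (Integer'Image (Data'Length), Ada.Strings.Left);
   begin
      TIO.Put (File, Len & ":" & Data);
   end Put_Atom;

   File : TIO.File_Type;
   Port : constant Integer := 80;
begin
   TIO.Create (File, TIO.Out_File, "config.sexp");
   TIO.Put (File, "(");
   Put_Atom (File, "port");
   Put_Atom (File, Ada.Strings.Fixed.Trim
     (Integer'Image (Port), Ada.Strings.Left));
   TIO.Put (File, ")");
   TIO.Close (File);  --  file now contains "(4:port2:80)"
end Dump_Directly;
```

No in-memory S-expression tree is built here; the objects go straight to
the file stream as they are visited.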

>>>> The difference is that Character represents code points and Octet
>>>> represents atomic arrays of 8 bits.
>>
>>> Considering Ada's Character also spans over 8 bits (256-element
>>> enumeration), both are equivalent, right?
>>
>> Equivalent defined as? In terms of types they are not, because the types
>> are different. In terms of the relation "=" they are not either, because
>> "=" is not defined on the tuple Character x Unsigned_8 (or whatever).
>
> Sorry, "equivalent" in the mathematical that there is a bijection
> between the set of Characters and the set of Octets, which allows to
> use any of them to represent the other. Agreed, this a very week
> equivalence, it just means there are exactly as many octet values as
> Character values.

No, it is OK. Character can be used to represent octets. Ada provides
means for that, e.g.:

   type Octet is private;          -- Interface
private
   type Octet is new Character;    -- Implementation
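The bijection itself can also be spelled out with the standard attributes,
with no Unchecked_Conversion involved. A sketch (using a modular Octet
type for the demonstration, rather than the private one above; the
procedure name is invented):

```ada
with Ada.Text_IO;

procedure Octet_Demo is
   type Octet is mod 2 ** 8;  --  exactly 256 values, like Character

   function To_Octet (C : Character) return Octet is
   begin
      return Octet (Character'Pos (C));
   end To_Octet;

   function To_Character (O : Octet) return Character is
   begin
      return Character'Val (O);
   end To_Character;
begin
   --  The round trip demonstrates the bijection.
   Ada.Text_IO.Put_Line
     (Character'Image (To_Character (To_Octet ('A'))));
end Octet_Demo;
```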

> On the other hand, Storage_Element and Character are not bijection-
> equivalent because there is no guarantee they will always have the
> same number of values, even though they often do.

Yes.

>>> Actually I've started to wonder whether Stream_Element might be even
>>> more appropriate: considering that an S-expression atom is the
>>> serialization of an object, and I guess objects which know how to
>>> serialize themselves do so using the stream subsystem, maybe I could
>>> more easily leverage existing serialization code if I use
>>> Stream_Element_Array for atoms.
>>
>> Note that Stream_Element is machine-dependent as well.
>
> I'm sadly aware of that. I need an octet sequence to follow the
> S-expression standard, and there is an implementation trade-off here:
> assuming objects already know how to serialize themselves into a
> Stream_Element_Array, I can either code a converter from
> Stream_Element_Array to octet sequence, or reinvent the wheel and code
> a converter for each type directly into an octet sequence. For some
> strange reason I prefer by far the first possibility.

That depends on your goal. Streams are machine-dependent. Streams of
octets are not. If you want to exchange objects in the form of S-sequences
across the network, you have to drop the standard stream implementations
of the objects and replace them with your own, based on streams of octets.
In this case you will not use Stream_Element_Array directly. You will read
and write octets, via Octet'Read and Octet'Write. Provided that octet
streams work, which is about 99.9% of cases, I guess. When they are not
capable of handling octets properly, you will have to implement I/O
manually. If you wrap Octet'Read into a function, you will be able to
exchange the I/O layer without affecting the upper ones. If we look at all
these mechanics, we will see the good old OSI model.
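A hypothetical sketch of that layering (the package and subprogram names
are invented): the upper layers call Get/Put for octets, and only these
two bodies would have to change if the transport layer is swapped out.

```ada
with Ada.Streams;

package Octet_IO is
   type Octet is mod 2 ** 8;
   for Octet'Size use 8;

   --  Thin wrappers around Octet'Read / Octet'Write.
   function Get
     (Stream : access Ada.Streams.Root_Stream_Type'Class) return Octet;
   procedure Put
     (Stream : access Ada.Streams.Root_Stream_Type'Class; Item : Octet);
end Octet_IO;

package body Octet_IO is
   function Get
     (Stream : access Ada.Streams.Root_Stream_Type'Class) return Octet
   is
      Result : Octet;
   begin
      Octet'Read (Stream, Result);
      return Result;
   end Get;

   procedure Put
     (Stream : access Ada.Streams.Root_Stream_Type'Class; Item : Octet) is
   begin
      Octet'Write (Stream, Item);
   end Put;
end Octet_IO;
```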

> If it helps, you can think of S-expressions as a standardized way of
> serializing some relations between objects. However the objects still
> have to be serialized, and that's outside of the scope of S-
> expressions. From what I understood, the existing serializations of
> objects use Stream_Element_Array as a low-level type. So the natural
> choice for serializing the relations seems to be taking the
> Stream_Element_Array from each object, and hand over to the lower-
> level I/O a unified Stream_Element_Array.
>
> Does it make sense or am I completely missing something?

In another post Jeffrey Carter described this as low-level. Why not tell
the object: store yourself and all the relations you need, I just don't
care which and how?

In fact I did something like that: persistent objects with dependencies
between them and collection of unreferenced objects. But I did it as
Jeffrey suggested. There is Put (Store, Object); the rest is hidden.

BUT, S-expressions do not support references, they are strictly by-value,
so you don't need that stuff anyway.

>> The point is that you never meet 80 before knowing that this is a "port",
>> you never meet "port" before knowing it is of "tcp-connect". You always
>> know all types in advance. It is strictly top-down.
>
> Right, in that simple example it is the case. It is even quite often the
> case, hence my thinking about a Sexp_Stream in another post, which
> would allow S-expression I/O without having more than a single node in
> memory at the same time.
>
> But there are still situations where S-expressions have to be stored in
> memory.

There are no such cases!

> For example the templates, where S-expressions represent a
> kind of limited programming language that is re-interpreted for each
> template extension.

I am not sure what you mean here, but in general the template is not the
object; its instance is. You do not need S-expressions here either. You
can store/restore templates as S-sequences. A template in memory would be
an object with some operations like Instantiate_With_Parameters etc. The
result of instantiation will again be an object and not an S-sequence.

BTW, for an interpreter I would certainly prefer Reverse Polish Notation
to a tree. (I think this too boils down to a solution before the
problem.)

> the latter being
> roughly the low-level (as in "close to the hardware", at least close
> enough not to rule out programming for embedded platforms and system
> programming) and the performance.

(BTW, Ada is closer to the hardware than C is. You can even describe
interrupt handlers in Ada. Try it in C.)
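A minimal sketch of what the interrupt-handler remark refers to: Ada's
standard mechanism for attaching a protected procedure to an interrupt
(Systems Programming Annex). The available interrupt names are
implementation-defined; SIGINT is what GNAT provides on POSIX systems,
and the package name here is invented.

```ada
with Ada.Interrupts.Names;

package Interrupt_Demo is
   protected Handler is
      procedure Handle;
      --  Bind Handle to the interrupt at elaboration time.
      pragma Attach_Handler (Handle, Ada.Interrupts.Names.SIGINT);
   end Handler;
end Interrupt_Demo;

package body Interrupt_Demo is
   protected body Handler is
      procedure Handle is
      begin
         null;  --  react to the interrupt here
      end Handle;
   end Handler;
end Interrupt_Demo;
```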

> Now I can't explain why your posts often make me feel Ada is
> completely out of my tastes in programming languages,

In the way of programming, you mean? I wanted to convey that a
"C programmer" will have to change some habits when switching to Ada. Ada
enforces a certain attitude to programming; some people would say, to
software engineering. It is not obvious, because you can do in Ada
anything you can in C. But a stubborn hardcore "C programmer" might
become very frustrated very soon. A competent C developer will only enjoy
Ada.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: _FrnchFrgg_ on
On 10/08/2010 09:16, Dmitry A. Kazakov wrote:
> On Mon, 09 Aug 2010 23:54:00 +0200, _FrnchFrgg_ wrote:
>> I think that you want pattern matching
>> (http://en.wikipedia.org/wiki/Standard_ML#Algebraic_datatypes_and_pattern_matching)
>
> No, I don't believe in type inference, in fact I strongly disbelieve in it.

Unification and pattern matching are independent of type inference. Sure,
most of the time you find both in the same languages, but IIRC during my
master's I encountered languages with one but not the other.
From: Dmitry A. Kazakov on
On Tue, 10 Aug 2010 13:06:58 +0200, _FrnchFrgg_ wrote:

> On 10/08/2010 09:16, Dmitry A. Kazakov wrote:
>> On Mon, 09 Aug 2010 23:54:00 +0200, _FrnchFrgg_ wrote:
>>> I think that you want pattern matching
>>> (http://en.wikipedia.org/wiki/Standard_ML#Algebraic_datatypes_and_pattern_matching)
>>
>> No, I don't believe in type inference, in fact I strongly disbelieve in it.
>
> Unification and pattern matching are independent of type inference.

Did you mean the standard meaning of pattern matching instead of Standard
ML's Volapük?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Natacha Kerensikova on
On Aug 10, 12:36 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> On Tue, 10 Aug 2010 01:56:22 -0700 (PDT), Natacha Kerensikova wrote:
> > On Aug 9, 12:56 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> >> I thought you wanted to represent *objects* ... as S-sequences?
>
> > It depends what you call an object. Here again, my vocabulary might have
> > been tainted by the C standard. Take for example a record: I would call
> > each component an object, as well as the whole record itself.
>
> That's OK. Using this definition, an S-sequence in memory is an object.

Just to be sure, what is it exactly that you call an S-sequence? For the
rest of this post I will assume it is a synonym for "S-expression atom";
I hope my answers won't be too misguided.

> Which was the question: what was wrong with the first object, so that you
> wanted another one instead?

The first object is the internal memory representation designed for
actual efficient use. For example, an integer will probably be
represented by its binary value with machine-defined endianness and
machine-defined size.

The other object is a "serialized" representation, in the sense that it's
designed for communication and storage. For example the same integer, in
a context where it will be sent over a network, can be represented as an
ASCII-encoded decimal number, or in binary but with a predefined size and
endianness. These are really the same considerations as when storing or
sending an object directly, except that it has to reside in memory for a
short time. There are no more conversions or representations than when
S-expression-lessly storing or sending objects; the only difference is
the memory buffering that allows S-expression-specific information to be
inserted around the stream.
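The contrast between the two objects can be sketched as follows. The
serialized form here is 4 octets, big-endian, a fixed choice invented for
the example; the point is that it is the same regardless of what the
machine does internally.

```ada
procedure Show_Representations is
   type Octet is mod 2 ** 8;
   type Octet_Array is array (Positive range <>) of Octet;

   --  Fixed-size, fixed-endianness serialization, independent of the
   --  machine's own integer layout.
   function To_Big_Endian (Value : Natural) return Octet_Array is
      Result : Octet_Array (1 .. 4);
      V      : Natural := Value;
   begin
      for I in reverse Result'Range loop
         Result (I) := Octet (V mod 256);
         V := V / 256;
      end loop;
      return Result;
   end To_Big_Endian;

   Bytes : constant Octet_Array := To_Big_Endian (80);
begin
   --  80 always serializes to the same four octets, on any machine.
   pragma Assert (Bytes = (0, 0, 0, 80));
   null;
end Show_Representations;
```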

> My first take was that the S-sequence was used as an object representation
> outside memory, because you used the word "format". Now it looks like a
> solution before the problem. You seem to be going to convert objects to
> S-sequences *in* memory and then dump the result into files. Is that so?

Yes, it is.

> What was the problem then?

The problem is to organize different objects inside a single file.
S-expressions standardize the organization of and relations between
objects, while something else has to be done beyond S-expressions to turn
objects into representations suitable to live in a file.

Or the same thing with, instead of a file, an IPC socket or a network
socket or whatever; I just don't know what to call it generically without
resorting to a word derived from "serialize", but I hope you get the
point anyway.

> Because it cannot work without a conversion between the S-sequence in
> memory (the object) and the S-sequence in the file (the representation).

The S-expression standard describes what conversions are allowed and what
they consist of. I cannot follow the standard without such a conversion
anyway, so either I do it or I drop the idea of S-expressions; but then I
don't have anything to portably store or transmit objects, so back to
square one.

> Why do you need the S-sequence in memory, when dumping objects directly
> into files as S-sequences (if you insist on having them) is simpler,
> cleaner, thinner, and faster?

Because I need to examine the S-sequence before writing it to disk, in
order to have enough information to write the S-expression metadata. At
the very least, I need to know the total size of the atom before allowing
its first byte to be sent into the file.

> >> Note that Stream_Element is machine-dependent as well.
>
> > I'm sadly aware of that. I need an octet sequence to follow the
> > S-expression standard, and there is an implementation trade-off here:
> > assuming objects already know how to serialize themselves into a
> > Stream_Element_Array, I can either code a converter from
> > Stream_Element_Array to octet sequence, or reinvent the wheel and code
> > a converter for each type directly into an octet sequence. For some
> > strange reason I prefer by far the first possibility.
>
> That depends on your goal. Streams are machine-dependent. Streams of
> octets are not. If you want to exchange objects in the form of
> S-sequences across the network, you have to drop the standard stream
> implementations of the objects and replace them with your own, based on
> streams of octets.

This looks like a very strong point in favor of using an array-of-
octets to represent S-expression atoms.

> In this case
> you will not use Stream_Element_Array directly. You will read and write
> octets, via Octet'Read and Octet'Write. Provided that octet streams work,
> which is about 99.9% of cases, I guess. When they are not capable of
> handling octets properly, you will have to implement I/O manually. If you
> wrap Octet'Read into a function, you will be able to exchange the I/O
> layer without affecting the upper ones. If we look at all these
> mechanics, we will see the good old OSI model.

That sounds like a very nice way of doing it. So in the most common case
there will still be a stream, provided by the platform-specific socket
facilities, which will accept an array of octets, and said array would
have to be created from objects by custom code, right? (Just want to be
sure I understood correctly.)

> In another post Jeffrey Carter described this as low-level. Why not tell
> the object: store yourself and all the relations you need, I just don't
> care which and how?

That's indeed a higher-level question. That's how it will happen at some
point in my code; however at some other point I will still have to
actually implement said object storage, and that's when I will really
care about which and how. I'm aware from the very beginning that an
S-expression library is low-level and is only used by mid-level objects
before reaching the application.

I've discussed this with a friend of mine who knows better than I do what
a parser and a lexer and things like that actually are. It turns out that
what my S-expression code is intended to be is actually a partial parser,
requiring some more specific stuff on top of it to actually be called a
parser. Just as an S-expression is a partial format, in that it describes
how to serialize relations between atoms without describing how objects
are serialized into atoms.

At some point in my projects I will have to write various configuration
file parsers, template parsers, and maybe a lot of other parsers. They
could all be treated as independent parsers, and implemented with
independent code, maybe derived from yacc or something. I didn't choose
that path; I chose rather to use what I called an S-expression library,
which is a sort of common core for all these parsers, so all I have left
to write is the specific part of each situation, which is matching the
keywords and typing/deserializing the objects.

> > But there are still situations where S-expressions have to be stored in
> > memory.
>
> There are no such cases!

Right, I guess that, just like uses of goto, they can always be avoided;
but I'm still not convinced avoiding them is always best.

> > For example the templates, where S-expressions represent a
> > kind of limited programming language that is re-interpreted for each
> > template extension.
>
> I am not sure what you mean here, but in general the template is not the
> object; its instance is.

Just to be sure we use the same words, I'm talking here about HTML
templates. The basic definition of an S-expression is a list of nodes,
each of which can be a list or an atom. For a template, I copy atom
contents directly into the output, while lists are interpreted as
functions, with an "instance" (or whatever the thing containing the data
to populate the template is called) handling the actual "execution" of
the function. Now, the interest I find in the recursive definition of
S-expressions is that it's easy to pass template fragments as arguments
to these functions. For this it's much easier to let the S-expression
describing the template reside in memory. S-expressions are of course not
required here, but they provide a nice and simple unified format.
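The scheme described above could be sketched roughly as follows. The node
representation and all names (Sexp_Templates, Render, Emit, Execute) are
invented for illustration; atoms are copied to the output verbatim, and
each list is handed to the instance's dispatcher.

```ada
with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;

package Sexp_Templates is
   --  An in-memory S-expression node: an atom or a list of children.
   type Node;
   type Node_Access is access Node;
   type Node_Kind is (Atom, List);
   type Node (Kind : Node_Kind := Atom) is record
      Next : Node_Access;  --  next sibling in the enclosing list
      case Kind is
         when Atom => Text  : Unbounded_String;
         when List => Child : Node_Access;
      end case;
   end record;

   --  Walk the template: atoms go to Emit, lists go to Execute
   --  (whose first child would name the template function).
   generic
      with procedure Emit (Text : String);
      with procedure Execute (Call : Node_Access);
   procedure Render (Template : Node_Access);
end Sexp_Templates;

package body Sexp_Templates is
   procedure Render (Template : Node_Access) is
      N : Node_Access := Template;
   begin
      while N /= null loop
         case N.Kind is
            when Atom => Emit (To_String (N.Text));
            when List => Execute (N.Child);
         end case;
         N := N.Next;
      end loop;
   end Render;
end Sexp_Templates;
```

Passing a template fragment as a function argument then amounts to
handing Execute a sub-list, which it can Render recursively.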

> You do not need S-expressions here either. You can
> store/restore templates as S-sequences. A template in memory would be
> an object with some operations like Instantiate_With_Parameters etc. The
> result of instantiation will again be an object and not an S-sequence.

Well, how would you solve the problem described above without
S-expressions? (That's a real question; if something simpler and/or more
efficient than my way of doing it exists, I'm genuinely interested.)

> > the latter being
> > roughly the low-level (as in "close to the hardware", at least close
> > enough not to rule out programming for embedded platforms and system
> > programming) and the performance.
>
> (BTW, Ada is closer to the hardware than C is. You can even describe
> interrupt handlers in Ada. Try it in C.)

Fantastic \o/

> > Now I can't explain why your posts often make me feel Ada is
> > completely out of my tastes in programming languages,
>
> In the way of programming you mean? I wanted to convey that a
> "C-programmer" will have to change some habits when switching to Ada.

I'm unsure what you mean here by "C programmer". As I said in other
posts, I don't really code the same way as other C coders I've met. I
still don't know whether that's a good thing or not. I'm willing to
change some of my habits to switch to Ada, while there are others I won't
give up even if it means giving up Ada. The main criterion is that coding
must remain fun, because it makes no sense to continue a leisure activity
that isn't fun anymore.

> But a stubborn hardcore "C-programmer" might become very frustrated
> very soon. A competent C developer will only enjoy Ada.

I somehow hope I'm more of the latter than the former.

BTW, that post of yours is one that encourages me rather than restrains
me.


Thanks for your interesting replies,
Natacha
From: Robert A Duff on
"Randy Brukardt" <randy(a)rrsoftware.com> writes:

> Not sure if it still exists in the real world, but the compiler we did for
> the Unisys mainframes used Stream_Element'Size = 9.

Interesting. Can these machines communicate with non-Unisys machines
over a regular TCP/IP network? E.g. send an e-mail using standard
protocols, that can be read on a x86?

I assume Storage_Element'Size = 9, too. Correct?

Next question: Is (was) there any Ada implementation where
Stream_Element'Size /= Storage_Element'Size?

>...(The Unisys mainframes
> are 36-bit machines.) Stream_Element'Size = 8 would have made the code for
> handling arrays awful.
>
> Similarly, Character'Size = 9 on that machine.

That sounds like a non-conformance, at least if the SP Annex
is supported. Maybe you mean X'Size = 9, where X is of
type Character? You'd certainly want 'Component_Size = 9
for array-of-Character.

> That would have made a compiler for the Unisys machines impossible; it would
> have made streaming impossible. There is no sane way to put 36-bit values
> into Octets - the only way that would have worked would have been to use
> 16-bits for every 9-bit byte.
>
> Whether this is a significant consideration today (in 2010) is debatable,
> but it surely was a real consideration back in 1993-4. So Ada 95 could not
> have made this choice.

I think it would not be a good idea to make Ada unimplementable
on "odd-ball" machines.

- Bob