S-expression I/O in Ada [ADA]

Prev: GPRbuild compatibility
Next: Irony?

From: Simon Wright on 8 Aug 2010 06:26

I'd disagree with Jeffrey here.

Nothing wrong with starting at the bottom, especially when you already
know that the component you're looking at is likely to be useful and to
fit into *your* way of thinking about things. Your plan already has
higher-level abstractions, so that if you get to the next layer up and
want to change your lowest layer (if only for experimentation's sake)
you will be able to do so.

Lots of people round here are responsible for component libraries at
various levels of abstraction which they've developed for their own
purposes and then pushed out to the community in the hope they'll help
someone else.

The only caution I'd add is that, at some point, when you're reading an
S-expression from an external file and you expect the next 4 bytes to
contain an integer in binary little-endian format, you're going to have
to trust _something_ to have got it right; if you wrote "*(struct foo
**)((char *)whatever + bar.offset)" in C you will have to write the
equivalent in Ada. Unless you were going to invent a sort of checked
S-expression? (I don't get that impression!)

--S

From: Dmitry A. Kazakov on 8 Aug 2010 07:44

On Sun, 08 Aug 2010 11:26:11 +0100, Simon Wright wrote:

> The only caution I'd add is that, at some point, when you're reading an
> S-expression from an external file and you expect the next 4 bytes to
> contain an integer in binary little-endian format,

S-expressions are bad, but not that bad. They have binary data encoded as
hexadecimal strings.
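
As a rough illustration of the hexadecimal form (a sketch, not code from this thread): it assumes an atom held in a String, delimited by '#' as in the function further down, and carrying exactly four octets in little-endian order. The names Hex_Atom_Demo, Digit, Decode and Syntax_Error are made up for the example.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Hex_Atom_Demo is

   --  Hypothetical exception; the thread does not declare one.
   Syntax_Error : exception;

   --  Value of one hexadecimal digit; raises Syntax_Error otherwise.
   function Digit (C : Character) return Unsigned_32 is
   begin
      case C is
         when '0' .. '9' =>
            return Unsigned_32 (Character'Pos (C) - Character'Pos ('0'));
         when 'a' .. 'f' =>
            return Unsigned_32 (Character'Pos (C) - Character'Pos ('a') + 10);
         when 'A' .. 'F' =>
            return Unsigned_32 (Character'Pos (C) - Character'Pos ('A') + 10);
         when others =>
            raise Syntax_Error with "Hexadecimal digit is expected";
      end case;
   end Digit;

   --  Decodes "#" + 8 hex digits + "#" as a little-endian 32-bit value.
   function Decode (Atom : String) return Unsigned_32 is
      Result : Unsigned_32 := 0;
      Pos    : Positive;
   begin
      if Atom'Length /= 10
        or else Atom (Atom'First) /= '#'
        or else Atom (Atom'Last) /= '#'
      then
         raise Syntax_Error with "'#', 8 hex digits and '#' are expected";
      end if;
      for Octet in 0 .. 3 loop
         --  Each octet is written as two hexadecimal digits.
         Pos := Atom'First + 1 + 2 * Octet;
         Result := Result
           + (Digit (Atom (Pos)) * 16 + Digit (Atom (Pos + 1)))
             * 2 ** (8 * Octet);
      end loop;
      return Result;
   end Decode;

   Value : constant Unsigned_32 := Decode ("#01020000#");

begin
   --  Octets 01 02 00 00, least significant first, give 16#0201#.
   if Value /= 16#0201# then
      raise Program_Error;
   end if;
   Ada.Text_IO.Put_Line ("Value =" & Unsigned_32'Image (Value));
end Hex_Atom_Demo;
```

Every malformed input ends in Syntax_Error rather than in a silently wrong value, which is the whole point of checking the format.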

> you're going to have
> to trust _something_ to have got it right; if you wrote "*(struct foo
> **)((char *)whatever + bar.offset)" in C you will have to write the
> equivalent in Ada.

Luckily, Ada makes it difficult to write C equivalents. In Ada a
"non-equivalent" could be:

with Ada.Streams;     use Ada.Streams;
with Interfaces;      use Interfaces;
with Ada.Exceptions;  use Ada.Exceptions;

function Get (Stream : access Root_Stream_Type'Class) return Unsigned_16 is
   Result : Unsigned_16 := 0;
begin
   if '#' /= Character'Input (Stream) then
      Raise_Exception (Syntax_Error'Identity, "Opening '#' is expected");
   end if;
   for Octet in 0..3 loop
      Result :=
         Result + Character'Pos (Character'Input (Stream)) * 2**Octet;
   end loop;
   if '#' /= Character'Input (Stream) then
      Raise_Exception (Syntax_Error'Identity, "Closing '#' is expected");
   end if;
   return Result;
end Get;

> Unless you were going to invent a sort of checked
> S-expression? (I don't get that impression!)

I don't think that would be an invention. A program in C or Ada that does not
check the format of the input is broken to me.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Dmitry A. Kazakov on 8 Aug 2010 07:48

On Sun, 8 Aug 2010 13:44:48 +0200, Dmitry A. Kazakov wrote:

> On Sun, 08 Aug 2010 11:26:11 +0100, Simon Wright wrote:
>
>> The only caution I'd add is that, at some point, when you're reading an
>> S-expression from an external file and you expect the next 4 bytes to
>> contain an integer in binary little-endian format,
>
> S-expressions are bad, but not that bad. They have binary data encoded as
> hexadecimal strings.
>
>> you're going to have
>> to trust _something_ to have got it right; if you wrote "*(struct foo
>> **)((char *)whatever + bar.offset)" in C you will have to write the
>> equivalent in Ada.
>
> Luckily, Ada makes it difficult to write C equivalents. In Ada a
> "non-equivalent" could be:
>
> with Ada.Streams; use Ada.Streams;
> with Interfaces; use Interfaces;
> with Ada.Exceptions; use Ada.Exceptions;
>
> function Get (Stream : access Root_Stream_Type'Class) return Unsigned_16 is

Unsigned_32

> Result : Unsigned_16 := 0;

Unsigned_32

> begin
> if '#' /= Character'Input (Stream) then
> Raise_Exception (Syntax_Error'Identity, "Opening '#' is expected");
> end if;
> for Octet in 0..3 loop
> Result :=
> Result + Character'Pos (Character'Input (Stream)) * 2**Octet;

2**(Octet*8) of course
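
For reference, a sketch of the function with these three corrections folded in (Unsigned_32 for the result and the 2**(Octet*8) weight). Some things here are assumptions rather than part of the posts: Syntax_Error is declared locally because the original never declares it, "raise ... with" stands in for Raise_Exception, and the wrapper Get_Demo with its scratch file "get_demo.tmp" exists only to give the function a stream to read.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;
with Interfaces;            use Interfaces;

procedure Get_Demo is

   package SIO renames Ada.Streams.Stream_IO;

   --  Assumption: the post never declares Syntax_Error, so it is done here.
   Syntax_Error : exception;

   --  Reads '#', four raw octets (least significant first), then '#'.
   function Get (Stream : access Root_Stream_Type'Class) return Unsigned_32 is
      Result : Unsigned_32 := 0;
   begin
      if '#' /= Character'Input (Stream) then
         raise Syntax_Error with "Opening '#' is expected";
      end if;
      for Octet in 0 .. 3 loop
         Result := Result
           + Unsigned_32 (Character'Pos (Character'Input (Stream)))
             * 2 ** (Octet * 8);
      end loop;
      if '#' /= Character'Input (Stream) then
         raise Syntax_Error with "Closing '#' is expected";
      end if;
      return Result;
   end Get;

   --  Hypothetical scratch file, used only for this demonstration.
   Name  : constant String := "get_demo.tmp";
   File  : SIO.File_Type;
   Value : Unsigned_32;

begin
   --  Write '#', the octets 1 2 0 0, and '#'.
   SIO.Create (File, SIO.Out_File, Name);
   String'Write
     (SIO.Stream (File),
      "#" & Character'Val (1) & Character'Val (2)
          & Character'Val (0) & Character'Val (0) & "#");
   SIO.Close (File);

   --  Read the value back through the stream interface.
   SIO.Open (File, SIO.In_File, Name);
   Value := Get (SIO.Stream (File));
   SIO.Delete (File);

   --  1 + 2 * 2**8 = 16#0201#
   if Value /= 16#0201# then
      raise Program_Error;
   end if;
   Ada.Text_IO.Put_Line ("Value =" & Unsigned_32'Image (Value));
end Get_Demo;
```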

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Natacha Kerensikova on 8 Aug 2010 08:23

On Aug 7, 4:23 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> On Sat, 7 Aug 2010 05:56:50 -0700 (PDT), Natacha Kerensikova wrote:
> > Can we at least agree on the fact that a sequence of bytes is a
> > general-purpose format, widely used for storing and transmitting data?
> > (this question is just a matter of vocabulary)
>
> I don't think so. Namely, I don't think that "general" is a synonym for
> "completeness." It is rather about the abstraction level under the
> condition of completeness.

Well, then I'm afraid I can't discuss this any further, because I fail to
understand your definition here.

I was using "general-purpose" as the opposite of "specific-purpose".
If we make the parallel with compression schemes, FLAC sure is as
complete as bzip2, yet the first one has a specific purpose
(compressing sounds) while the other is general-purpose. So back to
data format, I made the distinction in the amount of preliminary
assumptions about data to be contained. In that sense the raw byte-
sequence is the most general format in that there is no assumption
about the contained data (except that its number of bits is a multiple
of the number of bits per byte).

> > So let's add as few semantics as possible, to keep as much generality
> > as possible. We end up with a bunch of byte sequences, whose semantics
> > are still left to the application, linked together by some kind of
> > semantic link. When the chosen links are "brother of" and "sublist of"
> > you get exactly S-expressions.
>
> Yes, the language of S-expressions is about hierarchical structures of
> elements lacking any semantics.
>
> I see no purpose in such descriptions.

Indeed, I don't see any either, and that's the point: there is room to
add your application-specific purpose on top of this format.

> > However from a purely practical point of view, and using the fact that
> > in my background languages (C and 386 asm) bytes sequences and strings
> > are so similar, these crude semantics are all I need (or at least, all
> > I've ever needed so far).
>
> The lower you descend down the abstraction levels, the fewer differences you see.
> Everything is a bunch of transistors...

In the Get function from your last post, you don't seem to make that
much difference between a binary byte and a Character. It would seem
Ada Strings are also very similar to byte sequences/arrays.

> > Now if we agree that simplicity is a
> > desirable quality (because it leads to less bugs, more productivity,
> > etc), I still fail to see the issues of such a format.
>
> Programs in 386 Assembler are considerably more complex than programs in
> Ada. Simplicity of nature by no means implies simplicity of use.

Guess why I haven't written a single line in assembly during the last
8 years ;-)

> > Now regarding personal preferences about braces, I have to admit I'm
> > always shocked at the way so many people dismiss S-expressions on
> > first sight because of the LISP-looking parentheses.
>
> Do you mean LISP does not deserve its fame? (:-))

I honestly don't know enough about both LISP and its fame to mean
anything like that. I just meant that judging a format only from its
relatively heavy use of parentheses is about as silly as judging
skills of a person only from the amount of melanin in their skin.

> > My point is, most of my (currently non-OOP) code can be expressed as
> > well in an OOP style. When I define a C structure along with a bunch
> > of functions that perform operations on it, I'm conceptually defining
> > a class and its methods, only expressed in a non-OOP language. I
> > sometimes put function pointers in the structure, to have a form of
> > dynamic dispatch or virtual methods. I occasionally even leave the
> > structure declared but undefined publicly, to hide internals (do you call
> > that encapsulation?), along with functions that could well be called
> > accessors and mutators. In my opinion that doesn't count as OOP
> > because it doesn't use OOP-specific features like inheritance.
>
> I disagree because in my view this is all that OO is about. OO is not about
> the tools (OOPL), it is about the way of programming.

Then I guess you could say I'm twisting C into OO programming, though
I readily violate OOP principles when it significantly improves code
readability or simplicity (which I guess happens much more often in C
than in Ada).

> > And the reason why I started this thread is only to
> > know how to buffer into memory the arrays of octets, because I need
> > (in my usual use of S-expressions) to resolve the links between atoms
> > before I can know the type of atoms. So I need a way to delay the
> > typing, and in the meantime handle data as a generic byte sequence
> > whose only known information is its size and its place in the S-
> > expression tree. What exactly is so bad with that approach?
>
> Nothing wrong when at the implementation level. However I don't see why
> links need to be resolved first. In comparable cases - I do much messy
> protocol/communication stuff - I usually first restore objects and then
> resolve links.

That's because some atom types are only known after having examined
other atoms. If you remember my example (tcp-connect (host foo.example)
(port 80)), here is how it would be interpreted: from the context or
initial state, we expect a list beginning with an atom which is a
string describing what to do with whatever comes after. "tcp-connect" is
therefore interpreted as a string; from the string value we know the
following is a list of settings, each of them being a list whose first
element is an atom which is a string describing the particular setting.
"host" is therefore a string, and its value tells us the following
atoms are also strings, describing host names to connect to, in
decreasing priority order. There "foo.example" is a string to be
resolved into a network address. "port" is also a string, and from its
value we know it's followed by an atom being the decimal representation
of a port number, which in Ada would probably be a type of its own
(probably Integer mod 2**16 or something like that).
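
To make that last step concrete, here is a minimal sketch of turning the atom after "port" into its own type. It assumes the atom has already been extracted as a String; Port_Demo, Port_Number, To_Port and Syntax_Error are names invented for the example.

```ada
with Ada.Text_IO;

procedure Port_Demo is

   --  A port number as a type of its own, as suggested above.
   type Port_Number is mod 2 ** 16;

   --  Hypothetical exception for malformed atoms.
   Syntax_Error : exception;

   --  Converts the decimal representation held in Atom to a Port_Number,
   --  rejecting empty atoms, non-digits and values above 65535.
   function To_Port (Atom : String) return Port_Number is
      Value : Natural := 0;
   begin
      if Atom'Length = 0 then
         raise Syntax_Error with "Port number is expected";
      end if;
      for I in Atom'Range loop
         if Atom (I) not in '0' .. '9' then
            raise Syntax_Error with "Decimal digit is expected";
         end if;
         Value := Value * 10
           + (Character'Pos (Atom (I)) - Character'Pos ('0'));
         --  Checked on every digit, so Value never gets near Natural'Last.
         if Value > Natural (Port_Number'Last) then
            raise Syntax_Error with "Port number is out of range";
         end if;
      end loop;
      return Port_Number (Value);
   end To_Port;

   Port : constant Port_Number := To_Port ("80");

begin
   if Port /= 80 then
      raise Program_Error;
   end if;
   Ada.Text_IO.Put_Line ("port =" & Port_Number'Image (Port));
end Port_Demo;
```

The conversion can only be chosen once "port" has been read and recognised, which is the dependency between atoms described above.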

Of course, all those "we know" are actually knowledge derived from the
configuration file specification.

In this particular example, atoms are treated in the order in which
they appear in the byte stream, so there is already enough context to
know the type of an atom before reading it. This is not always the
case, for example it might be necessary to build an associative array
from a list of lists before being able to know the type of non-head
atoms, or the S-expression might have to be kept uninterpreted (and
thus untyped) before some other run-time actions are performed (this
is quite common in the template system, where the template and the
data can change independently, and both changes induce an S-expression
re-interpretation).
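
One possible way to hold atoms untyped until they can be interpreted, sketched with the standard containers. This is only an assumption about how it could be done: a flat vector stands in for the S-expression tree, and Atom_Buffer_Demo, Atom_Vectors, To_Atom and To_String are names invented for the example.

```ada
with Ada.Containers.Indefinite_Vectors;
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Atom_Buffer_Demo is

   --  Each atom is kept as a raw octet array; only its length is known.
   package Atom_Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type   => Positive,
      Element_Type => Stream_Element_Array);

   --  Stores the characters of S as octets, one per character.
   function To_Atom (S : String) return Stream_Element_Array is
      Result : Stream_Element_Array (1 .. S'Length);
   begin
      for I in S'Range loop
         Result (Stream_Element_Offset (I - S'First + 1)) :=
           Stream_Element (Character'Pos (S (I)));
      end loop;
      return Result;
   end To_Atom;

   --  Interprets an atom as a String, once we know that is its type.
   function To_String (A : Stream_Element_Array) return String is
      Result : String (1 .. A'Length);
   begin
      for I in A'Range loop
         Result (Integer (I - A'First) + 1) := Character'Val (A (I));
      end loop;
      return Result;
   end To_String;

   Atoms : Atom_Vectors.Vector;

begin
   --  Buffering step: nothing is assumed about the contents yet.
   Atoms.Append (To_Atom ("port"));
   Atoms.Append (To_Atom ("80"));

   --  Interpretation step, possibly much later: the first atom tells us
   --  how the second one has to be read.
   if To_String (Atoms.Element (1)) = "port" then
      Ada.Text_IO.Put_Line
        ("port =" & Integer'Image
           (Integer'Value (To_String (Atoms.Element (2)))));
   else
      raise Program_Error;
   end if;
end Atom_Buffer_Demo;
```

The conversions between Stream_Element and Character are written out explicitly, since the two are distinct types in Ada even when they happen to have the same size.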

Is it clearer now?

Natacha

From: Dmitry A. Kazakov on 8 Aug 2010 09:01

On Sun, 8 Aug 2010 05:23:37 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 7, 4:23 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> On Sat, 7 Aug 2010 05:56:50 -0700 (PDT), Natacha Kerensikova wrote:
>>> Can we at least agree on the fact that a sequence of bytes is a
>>> general-purpose format, widely used for storing and transmitting data?
>>> (this question is just a matter of vocabulary)
>>
>> I don't think so. Namely, I don't think that "general" is a synonym for
>> "completeness." It is rather about the abstraction level under the
>> condition of completeness.
>
> Well, then I'm afraid I can't discuss this any further, because I fail to
> understand your definition here.
>
> I was using "general-purpose" as the opposite of "specific-purpose".
> If we make the parallel with compression schemes, FLAC sure is as
> complete as bzip2, yet the first one has a specific purpose
> (compressing sounds) while the other is general-purpose. So back to
> data format, I made the distinction in the amount of preliminary
> assumptions about data to be contained. In that sense the raw byte-
> sequence is the most general format in that there is no assumption
> about the contained data (except that its number of bits is a multiple
> of the number of bits per byte).

And how are you going to make any assumptions at the level of raw bytes?
For a sequence of bytes to become sound you need to move many abstraction
layers - and OSI layers - up.

>>> However from a purely practical point of view, and using the fact that
>>> in my background languages (C and 386 asm) bytes sequences and strings
>>> are so similar, these crude semantics are all I need (or at least, all
>>> I've ever needed so far).
>>
>> The lower you descend down the abstraction levels, the fewer differences you see.
>> Everything is a bunch of transistors...
>
> In the Get function from your last post, you don't seem to make that
> much difference between a binary byte and a Character.

No, I do. But you have defined it as a text file. A streamed text file is a
sequence of Character items.

> It would seem
> Ada Strings are also very similar to byte sequences/arrays.

I remember a machine where char was 32-bit long.

Byte, octet, character are three different things (and code point is a
fourth).

> I just meant that judging a format only from its
> relatively heavy use of parentheses is about as silly as judging
> skills of a person only from the amount of melanin in their skin.

The amount of melanin is unrelated to the virtues we count in human beings.
An excessive need for indistinguishable brackets would definitely reduce
readability.

>>> And the reason why I started this thread is only to
>>> know how to buffer into memory the arrays of octets, because I need
>>> (in my usual use of S-expressions) to resolve the links between atoms
>>> before I can know the type of atoms. So I need a way to delay the
>>> typing, and in the meantime handle data as a generic byte sequence
>>> whose only known information is its size and its place in the S-
>>> expression tree. What exactly is so bad with that approach?
>>
>> Nothing wrong when at the implementation level. However I don't see why
>> links need to be resolved first. In comparable cases - I do much messy
>> protocol/communication stuff - I usually first restore objects and then
>> resolve links.
>
> That's because some atom types are only known after having examined
> other atoms. If you remember my example (tcp-connect (host foo.example)
> (port 80)), here is how it would be interpreted: from the context or
> initial state, we expect a list beginning with an atom which is a
> string describing what to do with whatever comes after. "tcp-connect" is
> therefore interpreted as a string, from the string value we know the
> following is a list of settings,

Once you matched "tcp-connect", you know all the types of the following
components.

> This is not always the
> case, for example it might be necessary to build an associative array
> from a list of list before being able to know the type of non-head
> atoms,

What for? Even if such cases might be invented, I see no reason to do that.
It is difficult to parse, it is difficult to read. So why mess with it?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
