S-expression I/O in Ada [ADA]


From: Natacha Kerensikova on 1 Aug 2010 08:17

Hi,

I'm trying to learn Ada, coming from a C background. The first thing I
planned to code is an S-expression parser, because it's quite easy (at
least in C, less than 1000 lines including comments and memory
handling (dynamic arrays reinvented)) and very useful considering
almost all my existing programs use S-expressions as a serialization
format.

To describe briefly S-expressions, I consider them to be the simplest
existing data organization beyond raw sequences of bits. They are
basically lists of elements, each element being either a list or an
atom, and atoms being raw sequences of bits.

While I'm still not deep enough into Ada to know how to represent the
lists, I guess there won't be major issues, I think I can handle it
myself (though pointers and hints would still be welcome).

My question here is about how to represent the atoms in Ada. In C it
was merely a void pointer and a size, but it seems more difficult in
Ada because of the strong typing. Because it's up to the application
to make sense (i.e. type) out of the raw sequences of bits in atoms,
the S-expression library has to handle them as a sort of untyped
memory chunk. Do you know of a way to handle that?

Please correct me if I'm wrong, but my guess would be that the S-
expression library would provide a Sexp_Atom type, which refers to the
untyped memory chunks, and the application would have procedures to
convert back and forth between Sexp_Atom and whatever types it
internally uses (i.e. serialization and deserialization procedures).
However the library would provide these procedures for the most common
types (e.g. strings and numeric types).

Though it looks like a fine Ada API (at least to my eyes), I have
absolutely no idea about how to implement the library. How to define
the application-opaque Sexp_Atom type? How to read Sexp_Atom objects
from a file? How to build them from Ada types? How to write them back
to disk?

Thanks for your help,
Natacha

From: Dmitry A. Kazakov on 1 Aug 2010 08:53

On Sun, 1 Aug 2010 05:17:45 -0700 (PDT), Natacha Kerensikova wrote:

> To describe briefly S-expressions, I consider them to be the simplest
> existing data organization beyond raw sequences of bits. They are
> basically lists of elements, each element being either a list or an
> atom, and atoms being raw sequences of bits.

So, it is a tree?

> While I'm still not deep enough into Ada to know how to represent the
> lists, I guess there won't be major issues, I think I can handle it
> myself (though pointers and hints would still be welcome).
>
> My question here is about how to represent the atoms in Ada. In C it
> was merely a void pointer and a size, but it seems more difficult in
> Ada because of the strong typing. Because it's up to the application
> to make sense (i.e. type) out of the raw sequences of bits in atoms,

How can it make sense if the type is unknown? If the type is known, why not
state it?

> the S-expression library has to handle them as a sort of untyped
> memory chunk. Do you know of a way to handle that?

There are many ways to do it.

1. Static polymorphism, generics in Ada. The type of the leaves is the
formal parameter of the package.

2. Dynamic polymorphism.

2.a. The type of a leaf is class wide, each leaf is derived from some
abstract base type. This requires referential approach, i.e. pointers.

2.b. The type of a leaf is a variant type. This is more limiting, but can
be by-value.
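
A rough sketch of how options 1 and 2.b might be combined, for illustration only: the leaf type is a generic formal parameter and each node is a variant record. The names Trees, Node, Node_Kind and Leaf_Count are made up for this example and do not come from any existing library.

```ada
with Ada.Text_IO;

procedure Leaf_Type_Demo is

   --  Option 1: the leaf type is a formal parameter of the package.
   generic
      type Leaf_Type is private;
   package Trees is
      type Node_Kind is (Leaf, Inner);
      type Node;
      type Node_Access is access Node;

      --  Option 2.b: a variant record, either a leaf or an inner node.
      type Node (Kind : Node_Kind := Leaf) is record
         Next : Node_Access;  --  next sibling in the enclosing list
         case Kind is
            when Leaf =>
               Value : Leaf_Type;
            when Inner =>
               First_Child : Node_Access;
         end case;
      end record;

      function Leaf_Count (Item : Node_Access) return Natural;
   end Trees;

   package body Trees is
      function Leaf_Count (Item : Node_Access) return Natural is
      begin
         if Item = null then
            return 0;
         elsif Item.Kind = Leaf then
            return 1 + Leaf_Count (Item.Next);
         else
            return Leaf_Count (Item.First_Child) + Leaf_Count (Item.Next);
         end if;
      end Leaf_Count;
   end Trees;

   package Integer_Trees is new Trees (Leaf_Type => Integer);
   use Integer_Trees;

   --  A list holding the two leaves 1 and 2.
   Second : constant Node_Access :=
     new Node'(Kind => Leaf, Next => null, Value => 2);
   First  : constant Node_Access :=
     new Node'(Kind => Leaf, Next => Second, Value => 1);
   Tree   : constant Node_Access :=
     new Node'(Kind => Inner, Next => null, First_Child => First);
begin
   Ada.Text_IO.Put_Line ("Leaves:" & Natural'Image (Leaf_Count (Tree)));
end Leaf_Type_Demo;
```

The drawback of this shape is that every leaf in one tree has the same Leaf_Type, which is why option 2.a (a class-wide leaf) exists as an alternative.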

> Please correct me if I'm wrong, but my guess would be that the S-
> expression library would provide a Sexp_Atom type, which refers to the
> untyped memory chunks, and the application would have procedures to
> convert back and forth between Sexp_Atom and whatever types it
> internally uses (i.e. serialization and deserialization procedures).

I would not do that.

> However the library would provide these procedures for the most common
> types (e.g. strings and numeric types).
>
> Though it looks like a fine Ada API (at least to my eyes), I have
> absolutely no idea about how to implement the library. How to define
> the application-opaque Sexp_Atom type?

It is not clear why you need such a type. What are its properties, and
what is it for, given that the application needs to convert it anyway?

As for raw memory addresses, it is possible to reinterpret them as any
desired type in Ada, as you would in C, provided you know what you are
doing. For this you can use the so-called address-to-access conversion (see
RM 13.7.2) or the placement attribute 'Address (see 13.3(11)).
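
A minimal sketch of the 'Address placement approach, assuming a GNAT-style compiler: an array of storage elements is overlaid on an existing object so that the same memory can be inspected as raw bytes. The names Overlay_Demo, Value and Raw are illustrative only, and the bytes printed depend on the endianness of the machine.

```ada
with Ada.Text_IO;
with Interfaces;
with System.Storage_Elements;

procedure Overlay_Demo is
   use System.Storage_Elements;

   Value : aliased Interfaces.Unsigned_32 := 16#0102_0304#;

   --  View the memory occupied by Value as raw storage elements.
   Raw : Storage_Array (1 .. Value'Size / System.Storage_Unit);
   for Raw'Address use Value'Address;
   pragma Import (Ada, Raw);  --  no initialization of the overlay
begin
   --  The order of the elements depends on the machine's endianness.
   for I in Raw'Range loop
      Ada.Text_IO.Put (Storage_Element'Image (Raw (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Overlay_Demo;
```

As in C, nothing checks that the overlaid view makes sense, so this is best kept inside a small, well-reviewed part of the library.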

> How to read Sexp_Atom objects from a file?

See stream I/O attributes of types (RM 13.13.2). Otherwise, standard design
patterns apply too. In any case you have to know the object's type in order
to write/read it.
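
A small sketch of the stream attributes applied to a blob, under the assumption that the atom is held in a Storage_Array. The file name "atom.bin" is hypothetical. Note that 'Output also writes the array bounds, so the file layout is Ada-specific and is not an S-expression encoding.

```ada
with Ada.Streams.Stream_IO;
with Ada.Text_IO;
with System.Storage_Elements;

procedure Stream_Demo is
   use Ada.Streams.Stream_IO;
   use System.Storage_Elements;

   File_Name : constant String := "atom.bin";  --  hypothetical file name
   Atom      : constant Storage_Array :=
     (1 => 16#66#, 2 => 16#6F#, 3 => 16#6F#);
   File      : File_Type;
begin
   --  Write the bounds and the elements of the array.
   Create (File, Out_File, File_Name);
   Storage_Array'Output (Stream (File), Atom);
   Close (File);

   --  Read them back; 'Input recovers the bounds from the stream.
   Open (File, In_File, File_Name);
   declare
      Copy : constant Storage_Array := Storage_Array'Input (Stream (File));
   begin
      Ada.Text_IO.Put_Line (Boolean'Image (Copy = Atom));
   end;
   Delete (File);  --  remove the temporary file
end Stream_Demo;
```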

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Ludovic Brenta on 1 Aug 2010 12:01

Natacha Kerensikova <lithiumcat(a)gmail.com> writes:
> Hi,
>
> I'm trying to learn Ada, coming from a C background. The first thing I
> planned to code is an S-expression parser, because it's quite easy (at
> least in C, less than 1000 lines including comments and memory
> handling (dynamic arrays reinvented)) and very useful considering
> almost all my existing programs use S-expressions as a serialization
> format.
>
> To describe briefly S-expressions, I consider them to be the simplest
> existing data organization beyond raw sequences of bits. They are
> basically lists of elements, each element being either a list or an
> atom, and atoms being raw sequences of bits.
>
> While I'm still not deep enough into Ada to know how to represent the
> lists, I guess there won't be major issues, I think I can handle it
> myself (though pointers and hints would still be welcome).
>
> My question here is about how to represent the atoms in Ada. In C it
> was merely a void pointer and a size, but it seems more difficult in
> Ada because of the strong typing. Because it's up to the application
> to make sense (i.e. type) out of the raw sequences of bits in atoms,
> the S-expression library has to handle them as a sort of untyped
> memory chunk. Do you know of a way to handle that?
>
> Please correct me if I'm wrong, but my guess would be that the S-
> expression library would provide a Sexp_Atom type, which refers to the
> untyped memory chunks, and the application would have procedures to
> convert back and forth between Sexp_Atom and whatever types it
> internally uses (i.e. serialization and deserialization procedures).
> However the library would provide these procedures for the most common
> types (e.g. strings and numeric types).
>
> Though it looks like a fine Ada API (at least to my eyes), I have
> absolutely no idea about how to implement the library. How to define
> the application-opaque Sexp_Atom type? How to read Sexp_Atom objects
> from a file? How to build them from Ada types? How to write them back
> to disk?

In Ada, you normally model blobs with
System.Storage_Elements.Storage_Array; since arrays are first-class
citizens (as opposed to C's void pointers), you do not need to carry the
length of such an array separately. Thus, a naive approach might be:

   type Sexp_Atom is access System.Storage_Elements.Storage_Array;
   type Sexp;
   type Sexp_Access is access Sexp;
   type Sexp is record
      Car : Sexp_Atom;
      Cdr : Sexp_Access;
   end record;

However, the purpose of S-Expressions being to be read and written as
text, a blob may not be the most appropriate; you might be better off
with simply:

   type Sexp;
   type Sexp_Access is access Sexp;
   type Sexp is record
      Car     : Ada.Strings.Unbounded.Unbounded_String;
      Cdr     : Sexp_Access;
      Is_List : Boolean;
   end record;

To write a sexp to disk and read back, you would leverage the Ada
streams as Dmitry pointed out.

You could then provide a generic package that serializes an arbitrary
type T back and forth to the unbounded_string.
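
One possible shape for such a generic, as a sketch only: the conversion functions are passed as generic formal parameters, so the package itself knows nothing about T. The names Atom_Conversions, To_Atom and From_Atom are invented for this example.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Serialize_Demo is
   use Ada.Strings.Unbounded;

   generic
      type T is private;
      with function Image (Item : T) return String;
      with function Value (Text : String) return T;
   package Atom_Conversions is
      function To_Atom (Item : T) return Unbounded_String;
      function From_Atom (Atom : Unbounded_String) return T;
   end Atom_Conversions;

   package body Atom_Conversions is
      function To_Atom (Item : T) return Unbounded_String is
      begin
         return To_Unbounded_String (Image (Item));
      end To_Atom;

      function From_Atom (Atom : Unbounded_String) return T is
      begin
         return Value (To_String (Atom));
      end From_Atom;
   end Atom_Conversions;

   --  The predefined attributes serve as the conversion functions.
   package Integer_Atoms is new Atom_Conversions
     (T => Integer, Image => Integer'Image, Value => Integer'Value);

   Port : constant Unbounded_String := To_Unbounded_String ("80");
begin
   --  Text atom to Integer, then Integer back to a text atom.
   Ada.Text_IO.Put_Line (Integer'Image (Integer_Atoms.From_Atom (Port) + 1));
   Ada.Text_IO.Put_Line (To_String (Integer_Atoms.To_Atom (80)));
end Serialize_Demo;
```

Integer'Image puts a leading space before non-negative values, so a real library would probably trim it before storing the atom.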

--
Ludovic Brenta.

From: Natacha Kerensikova on 1 Aug 2010 13:35

On Aug 1, 2:53 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> On Sun, 1 Aug 2010 05:17:45 -0700 (PDT), Natacha Kerensikova wrote:
> > To describe briefly S-expressions, I consider them to be the simplest
> > existing data organization beyond raw sequences of bits. They are
> > basically lists of elements, each element being either a list or an
> > atom, and atoms being raw sequences of bits.
>
> So, it is a tree?

Yes, it can be thought of as a binary tree whose leaves are labeled (the
atom being the label) and with different semantics for the left and right
children. I'm unsure this correctly represents empty lists, but otherwise
that's it.

> > My question here is about how to represent the atoms in Ada. In C it
> > was merely a void pointer and a size, but it seems more difficult in
> > Ada because of the strong typing. Because it's up to the application
> > to make sense (i.e. type) out of the raw sequences of bits in atoms,
>
> How can it make sense if the type is unknown? If the type is known, why not
> state it?

Actually the type is deduced from the context, e.g.
    (tcp-connect (host foo.example) (port 80))
We've got here five atoms and three lists, and the application reading it
as a configuration file would know the atom after "host" is a string to be
resolved and the atom after "port" is a number serialized in a decimal
representation. However this serialization of a number is an application
choice (one I usually make so that S-expressions are text files) but it
could have been serialized as a network-ordered 2-byte integer.

This means that as far as the S-expression library is concerned, the byte
sequence read from the file is not typed. Its typing actually has to be
delayed until the application has enough context to interpret it. Hence
the need for a typeless chunk of data.
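
To make the idea concrete, here is a sketch in which the atoms stay untyped Storage_Arrays and only the application decides, from context, that one is a host name and the other a decimal port number. It assumes 8-bit storage elements; To_Atom and To_String are hypothetical helpers, not part of any existing library.

```ada
with Ada.Text_IO;
with System.Storage_Elements;

procedure Atom_Context_Demo is
   use System.Storage_Elements;

   --  Build an untyped atom from text (assumes 8-bit storage elements).
   function To_Atom (Text : String) return Storage_Array is
      Result : Storage_Array (1 .. Text'Length);
      Index  : Storage_Offset := 1;
   begin
      for I in Text'Range loop
         Result (Index) := Character'Pos (Text (I));
         Index := Index + 1;
      end loop;
      return Result;
   end To_Atom;

   --  Interpret an atom as text; only the application knows this is valid.
   function To_String (Atom : Storage_Array) return String is
      Result : String (1 .. Atom'Length);
      Index  : Positive := 1;
   begin
      for I in Atom'Range loop
         Result (Index) := Character'Val (Atom (I));
         Index := Index + 1;
      end loop;
      return Result;
   end To_String;

   --  What the library would hand over for (host foo.example) (port 80).
   Host : constant Storage_Array := To_Atom ("foo.example");
   Port : constant Storage_Array := To_Atom ("80");

   --  The context ("port") tells the application to read a decimal number.
   Port_Number : constant Natural := Natural'Value (To_String (Port));
begin
   Ada.Text_IO.Put_Line ("host = " & To_String (Host));
   Ada.Text_IO.Put_Line ("port =" & Natural'Image (Port_Number));
end Atom_Context_Demo;
```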

> > Though it looks like a fine Ada API (at least to my eyes), I have
> > absolutely no idea about how to implement the library. How to define
> > the application-opaque Sexp_Atom type?
>
> It is not clear why you need such a type. What are its properties, and
> what is it for, given that the application needs to convert it anyway?

Well, I imagined making such a new type to distinguish 'raw data coming
from an S-expression to be interpreted' from other raw data types.
I thought this was the kind of strong type safety that prevents
(de)serialization procedures from trying to interpret just any chunk of
memory.

> As for raw memory addresses, it is possible to reinterpret them as any
> desired type in Ada, as you would in C, provided you know what you are
> doing. For this you can use the so-called address-to-access conversion (see
> RM 13.7.2) or the placement attribute 'Address (see 13.3(11)).

Ok, thanks for these references, I guess I'll find there how many
guarantees there are on such raw memory inspection/interpretation. I've
already encountered a few situations in C where the code looks very
dangerous but where I know perfectly well what I'm doing (I also sort-of
expected it would be easier to convince other people I really know what's
going on and what I'm doing in such cases when the code is in Ada rather
than in C).

From: Jeffrey Carter on 1 Aug 2010 14:25

On 08/01/2010 05:17 AM, Natacha Kerensikova wrote:
>
> To describe briefly S-expressions, I consider them to be the simplest
> existing data organization beyond raw sequences of bits. They are
> basically lists of elements, each element being either a list or an
> atom, and atoms being raw sequences of bits.

You might very well be able to use something like:

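   --  Needs: with Ada.Containers.Vectors;
   --         with Ada.Containers.Doubly_Linked_Lists;
   --         with System.Storage_Elements;

   --  Makes "=" visible for the instantiation below.
   use type System.Storage_Elements.Storage_Element;

   package Byte_Lists is new Ada.Containers.Vectors
     (Index_Type   => Positive,
      Element_Type => System.Storage_Elements.Storage_Element);

   type Content_ID is (Atom, List);

   type S_Expression;
   type S_Expression_Ptr is access all S_Expression;

   type S_Expression_Element (Content : Content_ID := Atom) is record
      case Content is
         when Atom =>
            Byte : Byte_Lists.Vector;
         when List =>
            Ptr : S_Expression_Ptr;
      end case;
   end record;

   package S_Expression_Lists is new Ada.Containers.Doubly_Linked_Lists
     (Element_Type => S_Expression_Element);

   --  List is tagged, so the derivation needs a record extension.
   type S_Expression is new S_Expression_Lists.List with null record;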

If you can use unbounded strings as Brenta suggested, instead of an unbounded
array of bytes (Storage_Element), then this would be even simpler.

--
Jeff Carter
"People called Romanes, they go the house?"
Monty Python's Life of Brian
79
