S-expression I/O in Ada [ADA]

From: Shark8 on 13 Aug 2010 17:48

Sorry about the empty reply earlier... I think I double-tapped the
reply/send button.

Earlier you wrote this:
Consider these two objects:

L--A--tcp-connect
|
L--A--host
| |
| A--foo.example
|
L--A--port
|
A--80

and:

L--A--tcp-connect
|
L--L--A--host
| |
| A--foo.example
|
L--A--port
|
A--80

"I'm sure we can agree on the fact that these are two different and
non-equivalent objects: not the same topology and not even the same number
of nodes. So I'd say a proper S-expression library has to be able to
deal with both of them without mixing them up."

These aren't valid representations of the LISP-like structure --
remember that it [a list] is defined as a single item followed by a
list OR a single item alone. Therefore, *all* internal nodes in the
previous drawings should be A-L type and the leaf nodes should be the
[terminal] A-type node.

> What I find surprising is that you seem to include type information in
> the Node type (which actually represent a S-expression atom, while S-
> expression nodes usually include lists too). Is it a typical Ada way
> of doing it?

I'm tempted to say 'yes' but I'm a newcomer to the language (from a
Pascal/Delphi background); so take that with a grain of salt. In my
opinion types are your friends {they give you information about what
your datum/variable *is*} and you shouldn't strip type-information
without good reason. That was my main reasoning for making the node-
type carry information about its contents; you *could* use arrays-of-
bytes/bits but, unless you need to, why?

> I find it surprising because the underlying S-expression format (as
> standardized by Rivest, I know almost nothing about cons pairs or
> lisp) cannot encode that information. I would expect a S-expression
> object in memory to reflect only what can be encoded to and decoded
> from S-expression files.

My data type can exactly represent any encoding found in an
S-expression text file [except a null list, which by the LISP
definition doesn't exist]. An empty list could be simulated with a
single node containing the null string "array (1..0) of Character." As
it stands you can think of my SExpression_type, when the List_Size
discriminant is /= 0, as being "everything within that [balanced]
parentheses-pair."

>
> In fact, one can consider S-expressions as a heterogeneous container,
> in that each atom can represent an object of any type (while losing
> the type information, which has to be retrieved from somewhere else),
> in contrast to thing like vectors, whose items are all of the same
> type. Does anybody know an example of heterogeneous container in Ada?
> I'm not sure how Ada generics can be leveraged in such a case.

You can simulate them, as I have done, by isolating "all the types it
could be" and making a variant-record type that ANY Ada container
could then hold. The "big disadvantage" is that the elements cannot
share a name; that is to say, you cannot have a record with a
component named 'Data' whose type changes depending on which
discriminant was passed.
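
A minimal sketch of the variant-record approach described above. Every name in it (Kind, Node, Text_Data and so on) is made up for illustration and is not the code posted earlier in the thread. It shows the naming restriction and how an ordinary array can then hold a mix of variants.

```ada
with Ada.Text_IO;

procedure Variant_Demo is
   --  All names here are hypothetical; this is not the code from the thread.
   type Kind is (Text_Kind, Number_Kind, Flag_Kind);

   --  Every variant needs its own component name: a component called
   --  Data declared in more than one variant would be rejected.
   type Node (Of_Kind : Kind := Number_Kind) is record
      case Of_Kind is
         when Text_Kind   => Text_Data   : String (1 .. 4);
         when Number_Kind => Number_Data : Integer;
         when Flag_Kind   => Flag_Data   : Boolean;
      end case;
   end record;

   --  The default discriminant makes Node a definite type, so an ordinary
   --  array (or a standard container) can hold a mix of variants.
   type Node_List is array (Positive range <>) of Node;

   Items : constant Node_List :=
     (1 => (Of_Kind => Text_Kind,   Text_Data   => "port"),
      2 => (Of_Kind => Number_Kind, Number_Data => 80),
      3 => (Of_Kind => Flag_Kind,   Flag_Data   => True));
begin
   for I in Items'Range loop
      --  The discriminant tells us which component may be read.
      case Items (I).Of_Kind is
         when Text_Kind =>
            Ada.Text_IO.Put_Line ("text:   " & Items (I).Text_Data);
         when Number_Kind =>
            Ada.Text_IO.Put_Line
              ("number:" & Integer'Image (Items (I).Number_Data));
         when Flag_Kind =>
            Ada.Text_IO.Put_Line
              ("flag:   " & Boolean'Image (Items (I).Flag_Data));
      end case;
   end loop;
end Variant_Demo;
```

The fixed-length Text_Data component is only there to keep the sketch short; a real node type would need some way to hold strings of arbitrary length.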

> Another interesting point is that you chose arrays to represent S-
> expression lists. Your code seems currently unable to represent empty
> lists (which are legal at least in Rivest's S-expressions) but I guess
> it can't be too difficult to correct. But it raises the question of
> array vs linked list for a S-expression object implementation.

Well, I showed you how it could be simulated.

> I'm not sure S-expressions would ever be used in a time-critical
> application, so the usual argument of cache-friendliness of arrays is
> pretty weak here. S-expressions are meant to be used for I/O where the
> bottleneck is most likely to be.

*nod* - I didn't choose arrays for I/O cache at all; I chose them
because the number of elements is known [we've either parsed it into
memory or are building it in-place], and that all elements of that
list would have the same type -- basically a non-null pointer to some
other SExpression_type --and that SExpression could be either a list
itself or a terminal node. Add to that that Ada has nice array
manipulation facilities and I think that's enough justification to use
them.
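
For readers without the earlier code at hand, here is a rough sketch of what a List_Size-discriminated type holding an array of non-null references could look like. It is a reconstruction under assumptions, not the posted package: the names SExpression_Ref, SExpression_Array, Atom and Put, and the use of Unbounded_String for the terminal text, are all invented for the example.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure List_Size_Demo is
   --  Hypothetical reconstruction; only List_Size is a name from the post.
   type SExpression_Type (List_Size : Natural);
   type SExpression_Ref is not null access SExpression_Type;
   type SExpression_Array is array (Positive range <>) of SExpression_Ref;

   --  List_Size = 0 marks a terminal node; any other value is the exact
   --  number of elements between one balanced pair of parentheses.
   type SExpression_Type (List_Size : Natural) is record
      case List_Size is
         when 0 =>
            Text : Unbounded_String;
         when others =>
            Items : SExpression_Array (1 .. List_Size);
      end case;
   end record;

   function Atom (Text : String) return SExpression_Ref is
   begin
      return new SExpression_Type'
        (List_Size => 0, Text => To_Unbounded_String (Text));
   end Atom;

   --  Writes an expression in parenthesised form.
   procedure Put (Item : SExpression_Type) is
   begin
      if Item.List_Size = 0 then
         Ada.Text_IO.Put (To_String (Item.Text));
      else
         Ada.Text_IO.Put ("(");
         for I in Item.Items'Range loop
            if I > Item.Items'First then
               Ada.Text_IO.Put (" ");
            end if;
            Put (Item.Items (I).all);
         end loop;
         Ada.Text_IO.Put (")");
      end if;
   end Put;

   Host : constant SExpression_Ref := new SExpression_Type'
     (List_Size => 2, Items => (Atom ("host"), Atom ("foo.example")));
begin
   Put (Host.all);
   Ada.Text_IO.New_Line;
end List_Size_Demo;
```

Because the array bounds come from the discriminant, the list length is fixed once the node exists, which is the property the array-versus-linked-list discussion turns on.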

> Ada arrays are statically sized, which can be an issue for parsing,
> because S-expression format doesn't encode list length,

That can be handled by the parser; after all, it has to figure out list
lengths in order to produce the list object, right?
Using the streams provided is a way to cut out the parser altogether,
if that's an issue: you can just load the SExpressions directly.
Think of it as being able to save/load the parse structure that the
GCC back-end takes; you would then have a way to "compile" a program
without having to parse its source code again. {Not a very useful
feature when the program text changes often and has to be re-parsed,
but for a configuration file that might remain unchanged over the life
of the application it belongs to, the story's a bit different.}

> so a list has
> to be built by pushing nodes one after the other until encountering
> the end-of-list marker.

You're confusing parsing the list with the finished/internal list-
structure itself, I think.
Parsing is just a way of saying "I'm reading this [in order to produce
this structure]."

> But I think that should be easily worked
> around by using vectors for unfinished S-expression lists, and once
> the list is completely parsed build the array from the vector, and
> reset the vector for further list parsing.

Sure you could do that too. The Append & Prepend procedures do just
that -- without involving a vector as an intermediate.

> However for S-expression objects dynamically created into memory, the
> static size of arrays might make some operations much less efficient
> than with linked lists.

I doubt that; like you just mentioned you could wait until the entire
list has been read into-memory before converting it to a static array.

> Any idea I'm missing about this implementation choice? Would it be
> worth to try and make such a package generic over the container type
> (array vs linked list vs something else?)?

I don't know; like I said, I'm a newcomer to Ada and while I really
like it and the underlying ideology it would be foolish of me to claim
that I have the deep-understanding of a lot of the nuances.

> Thanks for your code example,
> Natacha

You're welcome.

From: Jeffrey R. Carter on 13 Aug 2010 18:53

On 08/13/2010 02:32 AM, Natacha Kerensikova wrote:
>
> When I wrote this I couldn't think of any way to write clearer or
> simpler code with any of the two proposed packages (stream or memory
> based), because of the basic argument that 8 nodes have to be somehow
> created anyway and couldn't think of any way of doing that except by
> creating them one by one (be it with a sequence of procedure calls or
> a bunch of nested function calls). So when complexity and readability
> are equal, I go for the least amount of dependencies.
>
> Of course finding a way to make S-expression building much clearer
> with a given interface would be a huge argument in favor of the
> interface, no matter its level.
>
> Unless I missed something important, it looks like it only moves the
> problem around. While a To_S_Expression function does make a
> TCP_Info'Write simple and one-operation and all, complexity is only
> transferred to To_S_Expression which will still have to do the dirty
> job of creating 8 nodes.

Sure, but it will be hidden and reused, rather than appearing frequently.

Let's see if I can make what I'm talking about clear. We want to build an
application that does something. It will need to manipulate widgets to do so, so
we package the widget manipulation:

with Ada.Text_IO;

package Widgets is
   type Widget is ...;

   procedure Put (File : in Ada.Text_IO.File_Type; Item : in Widget);
   function Get (File : in Ada.Text_IO.File_Type) return Widget;
end Widgets;

Now we can go off and work on the rest of the application. Eventually we'll need
to implement the body of Widgets. We decide that we want to use S-expressions for
the external storage syntax, so we'll need (or already have) a low-level
S-expression pkg:

with Ada.Text_IO;

package Sexps is
   type Atom is private;

   -- Operations to convert things to/from Atoms.

   type Sexp is private; -- An Atom or a list of Sexp.

   function To_Sexp (From : in Atom) return Sexp;

   function Empty_List return Sexp;

   Not_A_List : exception;

   function Append (To : in Sexp; Item : in Sexp) return Sexp;
   -- Appends Item to the list To. Raises Not_A_List if To is not a list.

   procedure Put (File : in Ada.Text_IO.File_Type; Item : in Sexp);
   function Get (File : in Ada.Text_IO.File_Type) return Sexp;

   -- Operations to extract information from a Sexp.
   -- Other declarations as needed.
end Sexps;

I've only listed the operations to construct a Sexp and for I/O. This is
low-level and somewhat messy to use, but would never be accessed directly from
the code that implements the functionality of the application.

Now we can implement the body of Widgets:

with Sexps;

package body Widgets is
   function To_Sexp (From : in Widget) return Sexps.Sexp is separate;
   function To_Widget (From : in Sexps.Sexp) return Widget is separate;
   -- These are left as an exercise for the reader :)

   procedure Put (File : in Ada.Text_IO.File_Type; Item : in Widget) is
      -- null;
   begin -- Put
      Sexps.Put (File => File, Item => To_Sexp (Item) );
   end Put;

   function Get (File : in Ada.Text_IO.File_Type) return Widget is
      -- null;
   begin -- Get
      return To_Widget (Sexps.Get (File) );
   end Get;
end Widgets;

The body of Sexps will be messy, but it will be written once and mostly ignored
from then on. The bodies of To_Sexp and To_Widget will be sort of messy, but
will be written once (for each type of interest) and mostly ignored from then
on. The code that implements the functionality of the application will be clear.

> Yes, I have not managed yet to get that habit. I try to use a
> Underscore_And_Capitalization style (not sure whether it's the usual
> Ada idiom or not), but sometimes in the heat of action I forget to do
> it (funny thing that identifiers that slip end up camel-cased,
> while my C habit is underscore and lower case).

Initial_Caps is the Ada convention. Your C convention would be better than
CamelCase.

>
>>>> Your TCP_Info-handling pkg would convert the record into an S-expression, and
>>>> call a single operation from your S-expression pkg to output the S-expression.
>
> What I found tricky is the "single operation" part. Building 8 nodes
> in a single operation does look very difficult, and while Ludovic's
> trick of building them from an encoded string is nice, it makes me
> wonder (again) about the point of building a S-expression object
> before writing it while it's simpler and clearer to write strings
> containing hand-encoded S-expressions.

"Single operation" referred to output of an S-expression. If you can write this
as a sequence of low-level steps, then you can put those steps in a higher-level
procedure which becomes a single operation to output an S-expression.

Converting something into an S-expression is similar. Leaving aside
considerations of limitedness, anything you can build up in a series of steps in
Ada can be wrapped in a function that returns the thing after it has been built;
this function is then a single operation.

Using the Sexps pkg I sketched earlier, one can build an arbitrary S-expression
in a single statement with lots of calls to Empty_List, To_Atom, and Append. But
one could also build it up in steps using intermediaries. Either way would serve
as an implementation of Widgets.To_Sexp; To_Sexp is a single operation to
convert a Widget to an S-expression.
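
To make the "single statement" claim concrete, here is a self-contained sketch built on a cut-down variant of the Sexps package. It makes several assumptions that are not in the sketch above: atoms are plain Strings, an Image function stands in for Put so the result can be inspected, and the body (an access type pointing at a discriminated record) is just one possible implementation.

```ada
with Ada.Text_IO;

procedure Build_Demo is

   --  Cut-down, hypothetical variant of the Sexps sketch: atoms are plain
   --  Strings and Image stands in for Put.
   package Sexps is
      type Sexp is private;
      Not_A_List : exception;

      function To_Sexp (From : String) return Sexp;
      function Empty_List return Sexp;
      function Append (To : Sexp; Item : Sexp) return Sexp;
      function Image (Item : Sexp) return String;
   private
      type Node;
      type Sexp is access Node;
   end Sexps;

   package body Sexps is
      type Sexp_Array is array (Positive range <>) of Sexp;

      type Node (Is_List : Boolean; Length : Natural) is record
         case Is_List is
            when False => Text  : String (1 .. Length);
            when True  => Items : Sexp_Array (1 .. Length);
         end case;
      end record;

      function To_Sexp (From : String) return Sexp is
      begin
         return new Node'
           (Is_List => False, Length => From'Length, Text => From);
      end To_Sexp;

      function Empty_List return Sexp is
      begin
         return new Node'
           (Is_List => True, Length => 0, Items => (1 .. 0 => null));
      end Empty_List;

      --  Returns a new list node; the list passed in is left untouched.
      function Append (To : Sexp; Item : Sexp) return Sexp is
      begin
         if To = null or else not To.Is_List then
            raise Not_A_List;
         end if;
         return new Node'
           (Is_List => True,
            Length  => To.Length + 1,
            Items   => To.Items & Item);
      end Append;

      function Image (Item : Sexp) return String is
         --  Joins the images of Items (From .. Length) with single spaces.
         function Join (From : Positive) return String is
         begin
            if From > Item.Length then
               return "";
            elsif From = Item.Length then
               return Image (Item.Items (From));
            else
               return Image (Item.Items (From)) & " " & Join (From + 1);
            end if;
         end Join;
      begin
         if Item.Is_List then
            return "(" & Join (1) & ")";
         else
            return Item.Text;
         end if;
      end Image;
   end Sexps;

   use Sexps;

   --  The whole tree is built by one expression of nested calls.
   Request : constant Sexp :=
     Append
       (Append
          (Append (Empty_List, To_Sexp ("tcp-connect")),
           Append
             (Append (Empty_List, To_Sexp ("host")),
              To_Sexp ("foo.example"))),
        Append (Append (Empty_List, To_Sexp ("port")), To_Sexp ("80")));
begin
   Ada.Text_IO.Put_Line (Image (Request));
end Build_Demo;
```

The nested form is dense, which is the reason for hiding it inside something like Widgets.To_Sexp; the same tree could equally be built with named intermediate constants for the host and port sublists.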

HTH.

--
Jeff Carter
"Have you gone berserk? Can't you see that that man is a ni?"
Blazing Saddles
38

From: Randy Brukardt on 13 Aug 2010 20:52

"Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> wrote in message
news:ayxycml2mo60$.1g3uh1fg6m3r0.dlg(a)40tude.net...
> On Thu, 12 Aug 2010 15:56:41 -0500, Randy Brukardt wrote:
....
>> Octets don't have a sum, "sum of octets" is meaningless. It's like talking
>> about the "sum of sand". They're just buckets.
>
> The content of the bucket does have sum. Compare it with the elements of an
> integer array. According to the logic since array elements are buckets you
> cannot sum integer elements.

By "bucket" here I mean a semantic-less collection of bits. Integer elements
can be summed because they are integers. Buckets of bits don't make sense to
be summed.

>> (By "octet" here I mean the
>> same concept that Ada calls Stream_Elements, other than with a fixed
>> size.)
>
> 1. This is not the way octets are used in communication applications.

No idea. I use Stream_Elements the way I describe here in *my* applications,
but YMMV.

> 2. If the idea of opaque container is pushed to its end, octet must have no
> "and", "or", bit extraction operations either. They also presume some way
> octets can be manipulated and interact with other types (like Boolean).

Correct, an octet should have no operations whatsoever. It's just a bucket
of bits for transport; it needs to be converted to something real to use it,
explicitly so it is obvious what is going on.

Besides, I don't believe in bit operations on groups of bits in the first
place; they should be restricted to Boolean logic. (Ada 83 got this right,
IMHO.)

There is of course a certain expediency in copying bad ideas from other
languages like C; that's what's going on with Ada modular types. But the
whole thing is a bad idea from the start to the end.

>> And if you do that, you now
>> have a lot more chances for error (such as adding octets that are used to
>> hold character values and not integers).
>
> Yes, when implementing a communication protocol, there is no such thing as
> characters or integers, only octets. This is independent of the octet
> operations. Compare it with address arithmetic. You can sum the address of a
> task with the address of an employee record. Does it mean that there has to
> be no address arithmetic?

I'm a radical on this, too. There should be no address arithmetic in
programming languages; leave that to us compiler-writers, we can do a better
(and much safer) job. Exposing that low-level stuff is a recipe for
disaster.

(Again, I realize that there are times, mostly in interfaces to low-level
languages and hardware, where you need such things. But this is totally
outside of the realm of anything that can be described with sane typing; it
makes no sense to even try.)

>> Octets are by their nature an
>> untyped bucket with no semantics; there has to be a conversion operation
>> (like "#" above or S'Read in Ada) to some type with semantics for it to be
>> meaningful.
>
> No, that is impossible, because 1) there is no such conversion in almost
> any case. Other objects are represented by collections of octets
> interlinked in a very complex way. A simple conversion operation is
> absolutely unsuitable abstraction for this. 2) You are talking about an
> interface, (where however there should be no octets visible at all, but
> streams, files etc), I am talking about implementation of such an
> interface. There should be no conversions at any abstraction level.

I don't have any idea of what you are talking about; it makes no sense to me
at all. As you say, you should never have octets visible in the first place
above the transport layer. And in that layer, the only thing that makes
sense is a conversion to a meaningful type (S'Read is just a fancy name for
Unchecked_Conversion, after all).
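
As an illustration of that view (and only of that view: it does not claim that any compiler implements S'Read this way), the following sketch reinterprets four stream elements as a 32-bit value and back with Ada.Unchecked_Conversion. It assumes Stream_Element is 8 bits wide, as on common implementations such as GNAT. Four_Octets, To_Word and To_Octets are names invented for the example.

```ada
with Ada.Streams;
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;

procedure Octet_Demo is
   use Ada.Streams;

   --  Assumption: Stream_Element'Size = 8, so four elements fill 32 bits.
   subtype Four_Octets is Stream_Element_Array (1 .. 4);

   function To_Word is
     new Ada.Unchecked_Conversion (Four_Octets, Interfaces.Unsigned_32);
   function To_Octets is
     new Ada.Unchecked_Conversion (Interfaces.Unsigned_32, Four_Octets);

   Raw  : constant Four_Octets := (16#DE#, 16#AD#, 16#BE#, 16#EF#);
   Word : constant Interfaces.Unsigned_32 := To_Word (Raw);
begin
   --  The numeric value of Word depends on the machine's byte order,
   --  so only the round trip is checked here.
   if To_Octets (Word) = Raw then
      Ada.Text_IO.Put_Line ("round trip preserved all four octets");
   else
      Ada.Text_IO.Put_Line ("round trip changed the octets");
   end if;
end Octet_Demo;
```

The octets carry no meaning until the explicit conversion names a target type. The byte-order dependence is a reminder that a real protocol layer has to decide how the bits map to values.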

>>>> That's the point: in Ada, type conversions are *not* functions,
>>>> they're a
>>>> built-in gizmo [including some attributed]; by naming them "#" we would
>>>> allow unifying them with functions.
>>>
>>> I would prefer to eliminate them altogether. Conversions are always bad.
>>
>> I strongly disagree. Some things are best modeled with little or no explicit
>> semantics (such as a raw stream), and you must have conversions to get to
>> real semantics. Indeed, *not* using conversions in that case is
>> misleading; you're applying improper semantics to the operation.
>
> type Text_Stream is new Raw_Stream with private;
>
> What is wrong with that?

It's applying misleading semantics to an entity; there can be no stream
other than a raw stream; even if the stream carries typing information, that
information has to be converted and interpreted before it can be used.

I don't think this discussion is going anywhere anyway.

Randy.

From: Randy Brukardt on 13 Aug 2010 20:57

"Jeffrey Carter" <spam.jrcarter.not(a)spam.not.acm.org> wrote in message
news:i4463s$hus$1(a)adenine.netfront.net...
....
> Both seem to be standards. {} is used in the ARM, Annex P, with its
> standard meaning of "zero or more", which is why I used it, and am
> surprised Brenta didn't understand it.

The grammar notation of Ada is described in section 1.1.4
(http://www.adaic.com/standards/05rm/html/RM-1-1-4.html). One rather
presumes that Ada programmers are at least somewhat familiar with it, so it
makes a good choice to use for grammars on this newsgroup.

Randy.

From: Yannick Duchêne (Hibou57) on 13 Aug 2010 21:02

On Sat, 07 Aug 2010 09:23:01 +0200, Natacha Kerensikova
<lithiumcat(a)gmail.com> wrote:
> I think text-based format is very useful when the file has to be dealt
> with by both humans and programs. The typical example would be
> configuration files: read and written by humans, and used by the
> program. And that's where I believe XML is really poor, because it's
> too heavy for human use.
I use XML nearly every day. I write it in a raw text editor named
PSPad, without any trouble. A lot of people write HTML (which, like XML, is
a kind of SGML), and for years some people have written in the DocBook
document model, which again comes from SGML. XML can be as simple as
S-expressions are, provided you avoid using attributes (if you really feel
you do not need such a thing). If you feel XML is too heavy because it
requires closing tags, then just think of them as Ada's "end if", "end
case", "end loop" and so on. These closing tags improve human readability
and help reliability, in the sense that one always knows what is starting
and ending, and a closing-tag mismatch helps detect that an error was made.
All of this is missing from S-expressions (do you know what LISP was named
after, by the way? ;) )

If you still feel XML is too heavy, then just read this:
http://quoderat.megginson.com/2007/01/03/all-markup-ends-up-looking-like-xml/

This demonstrates that, as things go, none of XML, LISP expressions or
JSON is simpler than the others; LISP expressions and JSON
serializations just look simple on very simple cases. When things get
going, it is another story.

I did not want to make you change your mind (that idea is far from my
thoughts); rather, I wanted to make you see there is no way to assert that
XML is too heavy or that LISP expressions are more human-readable (I have
never seen a readable LISP program, by the way... I feel it is made to be
read by machines, not humans).

--
There is even better than a pragma Assert: a SPARK --# check.
--# check C and WhoKnowWhat and YouKnowWho;
--# assert Ada;
-- i.e. forget about previous premises which leads to conclusion
-- and start with new conclusion as premise.
