S-expression I/O in Ada [ADA]

From: Natacha Kerensikova on 8 Aug 2010 09:11

On Aug 8, 8:52 am, Jeffrey Carter <spam.jrcarter....(a)spam.not.acm.org>
wrote:
> On 08/07/2010 10:01 AM, Natacha Kerensikova wrote:
> > Here is how I intended to do it, admittedly exactly like I would do it
> > in C, could you please tell me how far I am from the Ada approach?
>
> It's hard to comment meaningfully, since you mostly describe your intended
> implementation, not your requirements.

Well actually the requirements were presented in the beginning: I've
got an S-expression configuration file, a directory of static files, a
directory of S-expression templates and a directory of S-expression
data to populate the templates. And I want to respond to HTTP requests
with either a static file or an expanded template. Now I might be
misunderstanding the word "requirements", but what I actually consider
as requirements is the above without any "S-expression" occurrence,
the rest being implementation choices. "S-expression" might be part of
the requirements to use existing data, but then again another format
can be chosen provided a converter from S-expression is also coded.

I then proceeded to propose an implementation fitting these
requirements, which is how I would do it, and asked how far it is
from a typical Ada-style implementation.

> I'd use Ada Web Server (AWS), and perhaps you should try that, too. Using a
> significant, existing, high-level Ada application framework like that might help
> introduce you to how some experienced Ada people thought this kind of thing
> should be approached.

Thanks for the pointer, however I'm already afraid learning only Ada
is too much for me (I've tried with C++ and it's a language much too
complex to fit in my head (especially the STL), and I hope Ada is
simple enough), learning both Ada and AWS at the same time will
probably be way too much. However I haven't decided yet whether I will
actually begin with this project, I might try to code something
simpler before that (which will also need S-expressions, hence my
beginning with that library). In that case, I might be confident
enough in my Ada before starting the web project, and then go for AWS.

> A "web server" can be a variety of things, from a simple page server that serves
> static files to a highly-dynamic system generating everything on the fly. It
> appears that you intend something that serves static files and expanded page
> templates.
>
> Initially, I'd observe that the system talks to the network and to the permanent
> storage that stores the configuration information, static pages, and so on. So
> my initial decomposition would identify interface modules for communicating with
> these. (This is an "edges-in" approach.)
>
> At a higher level, there is something that responds to incoming requests to
> serve the appropriate responses. There's something this uses to obtain the
> configuration from the permanent storage. This could make use of something that
> can serve a static page and something that can serve an expanded page template.
> There's also clearly a place for something that expands a page template.
>
> I'm doing this off the top of my head, so I won't be surprised if I've missed
> something or otherwise screwed up.

Actually, that's very similar to what I thought too. It's just that
I've already thought so much about this project that I'm further down
the road, and I tried to fit everything in the one-dimension of a
text.

That's actually a pretty strange thing I've never encountered yet in
other coders I know: I first spend a lot of time thinking before
writing the first line of code (or in that particular case, before even
managing to install the compiler). I also spend much less time
debugging, though I can't tell whether they are related or whether one
of them is more efficient than the other.

> This identifies the major high-level modules in the system. I could now define
> the package specifications for them and have the compiler check that they are
> syntactically correct and semantically consistent. Then I could pick one and
> design its implementation.

I have done this a few times, however C compilation being weaker than
Ada's (from what I understood), knowing that something in C compiles
isn't that useful. Hence my stress on tests, which require an
implementation of every dependency. Hence my tendency to start with
lower-level code.

Moreover, I have quite a few times (though less often recently, it
might be a beginner thing) realized during implementation that the
specification is poorly chosen. The fewer higher-level components are
already written, the cheaper interface redesign is.

> It's likely at some point tasking would be involved,
> allowing the processing of multiple requests at once, so this would all have to
> be done keeping concurrency in mind.

I'm unsure about this. In C I usually use a socket multiplexer call
(poll() or select()) along with memory buffers, which allows serving
multiple requests simultaneously in a single thread. It scales
differently from a thread-based approach, but I'm nowhere near the
amount of traffic where it matters, so going for the simplest might be
the best.

Moreover, as the multiplexing might end up being one package (or being
integrated to the networking package, I don't know enough yet), there
is only one place to change should I want to switch between pure-
threaded (like apache), pure-multiplexed (like lighttpd), or a mixed
implementation (like nginx). Resources are largely independent and
read-only, so making everything thread-safe shouldn't be that
difficult anyway.
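
As a rough illustration of the single-threaded multiplexing described
above, here is a minimal C sketch built on POSIX poll(). The function
name serve_one and its parameters are made up for this example, and a
real server would loop, keep per-connection buffers and handle writes
as well.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for input on any of the
 * descriptors in fds[0..count-1], then read from the first ready one
 * into buf. Returns the index of the descriptor that was served and
 * stores the byte count in *got, or returns -1 on timeout or error. */
int serve_one(struct pollfd *fds, nfds_t count, int timeout_ms,
              char *buf, size_t cap, ssize_t *got)
{
    assert(fds != NULL && buf != NULL && got != NULL);

    /* Ask for readability on every descriptor. */
    for (nfds_t i = 0; i < count; i++) {
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }

    /* One blocking call covers all descriptors at once. */
    if (poll(fds, count, timeout_ms) <= 0)
        return -1;

    for (nfds_t i = 0; i < count; i++) {
        if (fds[i].revents & (POLLIN | POLLHUP)) {
            *got = read(fds[i].fd, buf, cap);
            return *got < 0 ? -1 : (int)i;
        }
    }
    return -1;
}
```

Calling this in a loop from one thread serves whichever connection has
data next, which is the property that makes a threaded and a
multiplexed back end interchangeable behind a single package.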

> At some point I'd get to a low enough level to start thinking about
> representations, which seems to be where you begin your thinking about the problem.

Actually I have already thought a lot before, I just didn't feel the
need of Ada-specific advice before thinking about the actual low-level
implementation.

> > As I'm more comfortable using components already coded and tested, I
> > would code them from the lowest to the highest level:
>
> In Ada, one can create package specifications, then create other units that make
> use of those specifications before they are implemented. This is an important
> concept in Ada called the separation of specification and body. Sometimes it is
> useful to create stub bodies for such packages, which can then be used to test
> the units that make use of these packages. Thus it is often possible to
> implement and test higher-level modules before lower-level modules that they use
> have been implemented. This may not be especially useful on a single-person
> project, but can be quite valuable in projects with more than one developer.
> This often seems to be a foreign concept to those used to C.

As I said, I have the feeling this is very close to what I'm already
doing in C, except that you don't get very far with stubs in C, because
the C compiler doesn't prevent many errors beyond typos.

> While your approach seems quite different to mine, many aspects of the final
> result seem to be similar. This probably bodes well for you being able to use
> Ada effectively.

Unless I'm badly misunderstanding and/or very naive, I've got the
feeling our approaches are not that different, and differ mostly in
the order of implementation, which indeed doesn't change that much in
the final result.

I'm glad my future in Ada looks good. I'm still afraid of its
complexity, and of how intrusive the standard library is (e.g. DS is
very limited in memory, as much useless (and maybe not-so-useless)
stuff as possible should be trimmed away).

Thanks for your insights,
Natacha

From: Natacha Kerensikova on 8 Aug 2010 09:49

On Aug 8, 3:01 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> And how are you going to make any assumptions at the level of raw bytes?

I'm not, hence the byte-sequence being general-purpose using my
definition. (However one could split hairs by saying that at the level
of raw bytes you're making assumptions about the number and endianness
of bits in each byte.)

S-expressions on the other hand are slightly less general-purpose in
that they contain the assumption that data is organized on the leaves
of a binary tree.

The more specialized the format, the more assumptions on the contained
data. Right?

> > In the Get procedure from your last post, you don't seem to make that
> > much difference between a binary byte and a Character.
>
> No I do. But you have defined it as a text file. A streamed text file is a
> sequence of Character items.

Actually, I didn't. I only defined it as a bunch of byte sequences
organized in a certain way.

The fact I usually choose a text representation of S-expressions is
purely a personal choice (motivated by the power of existing text
utilities like grep and sed), but I've never written an S-expression
library assuming it will deal with texts. The canonical form of an
S-expression, where atoms are embedded verbatim, is as binary as one can
get.
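
To make the "atoms embedded verbatim" point concrete: in the canonical
form each atom is written as a decimal length, a colon, then exactly
that many raw bytes. The following is only a sketch of reading one such
atom in C; the function name and its error convention (returning 0) are
assumptions for illustration, not taken from any existing library.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one canonical atom "<decimal length>:<bytes>" starting at buf[0].
 * On success, store the payload position and length and return the total
 * number of bytes consumed. Return 0 if the input is malformed or
 * truncated. The payload is never inspected, so it may hold any octets. */
size_t parse_canonical_atom(const unsigned char *buf, size_t size,
                            const unsigned char **data, size_t *length)
{
    size_t pos = 0, len = 0;

    assert(buf != NULL && data != NULL && length != NULL);

    /* Accumulate the decimal length prefix. */
    while (pos < size && buf[pos] >= '0' && buf[pos] <= '9') {
        size_t digit = (size_t)(buf[pos] - '0');
        if (len > ((size_t)-1 - digit) / 10)
            return 0;               /* length would overflow size_t */
        len = len * 10 + digit;
        pos++;
    }
    if (pos == 0 || pos >= size || buf[pos] != ':')
        return 0;                   /* no digits, or missing ':' */
    pos++;
    if (len > size - pos)
        return 0;                   /* payload truncated */

    *data = buf + pos;
    *length = len;
    return pos + len;
}
```

Because the length is known up front, the payload can contain
parentheses, NUL bytes or anything else without any escaping.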

> > It would seem
> > Ada Strings are also very similar to byte sequences/arrays.
>
> I remember a machine where char was 32-bit long.

I've often wanted to get access to one of those PDP with 9-bit bytes,
just to further check my C programs.

> Byte, octet, character are three different things (and code point is a
> fourth).

I know very well these differences, except octet vs character,
especially considering Ada's definition of a Character. Or is it only
that Character is an enumeration while octet is a modular integer?

This leads to a question I have had in mind since quite early in the
thread: should I really use an array of Storage_Element, while the
S-expression standard considers only sequences of octets?

> > I just meant that judging format only from its
> > relatively heavy use of parentheses is about as silly as judging
> > skills of a person only from the amount of melanin in their skin.
>
> The amount of melanin is unrelated to the virtues we count in human beings.
> An excessive need in indistinguishable brackets would definitely reduce
> readability.

Of course, this is the same issue as curly brackets in C. My opinion
being that those brackets are not meant to be read by humans, only by
the compiler. Indentation is supposed to provide the same information
to humans while being ignored by the compiler. I apply the same rule
to S-expressions. Don't you think one should at least have a serious
look at a file before freaking out and calling it unreadable?

> > That's because some atom types are only known after having examined
> > other atoms. If you remember my example (tcp-connect (host foo.example)
> > (port 80)), here is how it would be interpreted: from the context or
> > initial state, we expect a list beginning with an atom which is a
> > string describing what to do with whatever is after. "tcp-connect" is
> > therefore interpreted as a string, from the string value we know the
> > following is a list of settings,
>
> Once you matched "tcp-connect", you know all the types of the following
> components.

Unfortunately, you know "80" is a 16-bit integer only after having
matched "port".

> > This is not always the
> > case, for example it might be necessary to build an associative array
> > from a list of lists before being able to know the type of non-head
> > atoms,
>
> What for? Even if such cases might be invented, I see no reason to do that.
> It is difficult to parse, it is difficult to read. So why mess with it?

For example, you might have a sub-S-expression describing a seldom
used object that is expensive to build, wouldn't you want to be sure
you actually need it before building it?

Thanks for the discussion,
Natacha

From: Natacha Kerensikova on 8 Aug 2010 10:05

On Aug 8, 12:26 pm, Simon Wright <si...(a)pushface.org> wrote:
> I'd disagree with Jeffrey here.
>
> Nothing wrong with starting at the bottom! especially when you already
> know that the component you're looking at is likely to be useful and to
> fit into *your* way of thinking about things. Your plan already has
> higher-level abstractions, so that if you get to the next layer up and
> want to change your lowest layer (if only for experimentation's sake)
> you will be able to do so.

Thanks a lot for the support \o/

> Lots of people round here are responsible for component libraries at
> various levels of abstraction which they've developed for their own
> purposes and then pushed out to the community in the hope they'll help
> someone else.

I indeed planned to share such a library (assuming I actually write
and finish it), should a generous soul accept to review it. However I
have long lost the hope of seeing my S-expression stuff used, I guess
I can't win against lisp-trauma.

> The only caution I'd add is that, at some point, when you're reading an
> S-expression from an external file and you expect the next 4 bytes to
> contain an integer in binary little-endian format, you're going to have
> to trust _something_ to have got it right; if you wrote "*(struct foo
> **)((char *)whatever + bar.offset)" in C you will have to write the
> equivalent in Ada. Unless you were going to invent a sort of checked
> S-expression? (I don't get that impression!)

Actually that ugly C expression is not a part of my S-expression code.
It's part of a generic self-balancing binary tree interface, supposed
to allow any algorithm as a back end. Because algorithms store
different data, I can't make assumptions about the position of children
or balancing data inside the node structure. Therefore I allow the
back-end to provide the offset from the node structure where the
generic tree code can find stuff it needs. Here "whatever" is a void
pointer, pointing to the beginning of the node structure; "bar" is a
structure provided by the back-end; char* cast is needed to perform
byte-based pointer arithmetic, and then the struct foo** cast back to
the real type of the required element.
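
For readers unfamiliar with the idiom, here is a small C sketch of the
offset-based access described above. The names tree_layout, link_at,
avl_node and avl_layout are invented for this illustration and are not
the actual library; the sketch also uses memcpy instead of the double
cast, which is one way to stay clear of aliasing concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Provided by a back end: where the generic code can find the child
 * links inside that back end's own node structure. */
struct tree_layout {
    size_t left_offset;
    size_t right_offset;
};

/* Generic code: fetch the pointer stored `offset` bytes into the node.
 * The char * cast gives byte-based pointer arithmetic. This assumes
 * struct pointers and void * share a representation, which holds on
 * mainstream platforms but is not guaranteed by ISO C. */
void *link_at(const void *node, size_t offset)
{
    void *link;

    assert(node != NULL);
    memcpy(&link, (const char *)node + offset, sizeof link);
    return link;
}

/* Hypothetical back end: an AVL-style node with its own extra fields. */
struct avl_node {
    int key;
    int balance;
    struct avl_node *left;
    struct avl_node *right;
};

/* The back end describes its layout once, using offsetof. */
const struct tree_layout avl_layout = {
    offsetof(struct avl_node, left),
    offsetof(struct avl_node, right),
};
```

The generic tree code only ever sees a void pointer plus a
tree_layout, so a red-black or splay back end can describe a different
node structure without the generic code changing.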

While I know perfectly what I'm doing with this, I guess it's not
obvious for the majority of C coders. My hope with Ada is that I
wouldn't ever need to write such dubious expressions. In that
particular case, I'm working around C's lack of generics, so it
shouldn't be a problem to express this in Ada. And should I ever need
to write dubious expressions, I hope the Ada context would give me the
benefit of the doubt long enough to have people read the documents or the
comments and understand there was no other choice; while in C people
wouldn't go further and just label my code and me as ugly and
dangerous.

Regarding the integer encoded as 4 little-endian bytes, I believe it's
pretty safe because S-expression atoms are of known length, so if the
length is different from 4 bytes I know there is a problem, and
otherwise I need other constraints on the integer to know whether it's
valid or not. In any case, it doesn't disrupt the reading or
interpretation of the S-expression beyond that particular integer
value.
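
A minimal C sketch of that check, assuming the atom has already been
extracted as a byte pointer and a length. The function names are
hypothetical, and the port-range test is just one example of the
"other constraints" mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an atom expected to hold a 32-bit little-endian integer.
 * Returns 1 and stores the value on success, 0 if the atom is not
 * exactly 4 bytes long. Works regardless of host endianness. */
int atom_to_u32_le(const unsigned char *atom, size_t length, uint32_t *out)
{
    assert(atom != NULL && out != NULL);
    if (length != 4)
        return 0;
    *out = (uint32_t)atom[0]
         | (uint32_t)atom[1] << 8
         | (uint32_t)atom[2] << 16
         | (uint32_t)atom[3] << 24;
    return 1;
}

/* Example of a further constraint (an assumption for illustration):
 * the decoded value must also be a usable TCP port number. */
int atom_to_port(const unsigned char *atom, size_t length, uint16_t *port)
{
    uint32_t value;

    assert(port != NULL);
    if (!atom_to_u32_le(atom, length, &value) || value == 0 || value > 65535)
        return 0;
    *port = (uint16_t)value;
    return 1;
}
```

A failed decode only invalidates this one atom; the surrounding
S-expression has already been delimited by the atom lengths and can
still be walked normally.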

Thanks again for your support,
Natacha

From: Duke Normandin on 8 Aug 2010 10:08

On 2010-08-08, Natacha Kerensikova <lithiumcat(a)gmail.com> wrote:

>
> Is it clearer now?

GO! Natacha! GO! -- Natacha ROCKS! -- GO! Natacha! GO!

;D

I've _now_ decided to go out and learn _all_ about S-Expressions. ;)
--
Duke

From: Dmitry A. Kazakov on 8 Aug 2010 11:15

On Sun, 8 Aug 2010 06:49:09 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 8, 3:01 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:

> The more specialized the format, the more assumptions on the contained
> data. Right?

Yes if the specialization addresses the encoded entities. No if it
addresses the medium.

>>> In the Get procedure from your last post, you don't seem to make that
>>> much difference between a binary byte and a Character.
>>
>> No I do. But you have defined it as a text file. A streamed text file is a
>> sequence of Character items.
>
> Actually, I didn't. I only defined it as a bunch of byte sequences
> organized in a certain way.

I see. This confuses things even more. Why should I represent anything as a
byte sequence? It already is, and in 90% of cases I just don't care how the
compiler does that. Why convert byte sequences into other sequences and
then into a text file? It just does not make sense to me. Any conversion
must be accompanied by moving the thing from one medium to another.
Otherwise it is wasteful.

>> Byte, octet, character are three different things (and code point is a
>> fourth).
>
> I know very well these differences, except octet vs character,
> especially considering Ada's definition of a Character. Or is it only
> that Character is an enumeration while octet is a modular integer?

The difference is that Character represents code points, while an octet
represents an atomic array of 8 bits.

> This leads to a question I have had in mind since quite early in the
> thread: should I really use an array of Storage_Element, while the
> S-expression standard considers only sequences of octets?

That depends on what you are going to do. Storage_Element is a
machine-dependent addressable memory unit. Octet is a machine independent
presentation layer unit, a thing of 256 independent states. Yes
incidentally Character has 256 code points.

> Don't you think one should at least have a serious
> look at a file before freaking out and calling it unreadable?

There are well-known things which do not require reconsidering. Curly or
round brackets aren't bad because of their form. They are bad because of
excessive overloading: the closing brackets of a loop, aggregate, block etc.
are indistinguishable in C. Further, in C you have brackets where none are
needed and don't have them where they should be. This does apply to LISP and
S-expressions.

>>> That's because some atom types are only known after having examined
>>> other atoms. If you remember my example (tcp-connect (host foo.example)
>>> (port 80)), here is how it would be interpreted: from the context or
>>> initial state, we expect a list beginning with an atom which is a
>>> string describing what to do with whatever is after. "tcp-connect" is
>>> therefore interpreted as a string, from the string value we know the
>>> following is a list of settings,
>>
>> Once you matched "tcp-connect", you know all the types of the following
>> components.
>
> Unfortunately, you know "80" is a 16-bit integer only after having
> matched "port".

Nope, we certainly know that each TCP connection needs a port. There is
nothing to resolve since the notation is not reverse. Parse it top down, it
is simple, it is safe, it allows excellent diagnostics, it works.

>>> This is not always the
>>> case, for example it might be necessary to build an associative array
>>> from a list of lists before being able to know the type of non-head
>>> atoms,
>>
>> What for? Even if such cases might be invented, I see no reason to do that.
>> It is difficult to parse, it is difficult to read. So why mess with it?
>
> For example, you might have a sub-S-expression describing a seldom
> used object that is expensive to build, wouldn't you want to be sure
> you actually need it before building it?

See above, if you parse top down, you know if you need that object before
you begin. Then, having a bracketed structure, it is trivial to skip the
object's description without construction. Just count brackets.
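
The bracket-counting skip can be sketched in a few lines of C. The
function name skip_list is made up for this example, and the sketch
assumes a textual encoding in which atoms never contain parentheses;
with verbatim (length-prefixed) atoms, the atom lengths would have to be
honoured while skipping.

```c
#include <assert.h>
#include <stddef.h>

/* Given text[start] == '(', return the index just past the matching ')'.
 * Returns 0 if start does not point at '(' or the list is never closed.
 * Nothing is built or interpreted along the way; only depth is tracked.
 * Assumption: no atom contains a parenthesis. */
size_t skip_list(const char *text, size_t size, size_t start)
{
    size_t depth = 0;

    assert(text != NULL);
    if (start >= size || text[start] != '(')
        return 0;

    for (size_t pos = start; pos < size; pos++) {
        if (text[pos] == '(') {
            depth++;
        } else if (text[pos] == ')') {
            depth--;
            if (depth == 0)
                return pos + 1;     /* matching bracket found */
        }
    }
    return 0;                       /* ran out of input while nested */
}
```

On the example from this thread, starting at the opening bracket of
(host foo.example) lands right after its closing bracket, so the parser
can continue with (port 80) without ever constructing the host value.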

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
