S-expression I/O in Ada [ADA]


From: Natacha Kerensikova on 7 Aug 2010 08:56

On Aug 7, 10:39 am, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> One cannot judge a format without knowing what its purpose is. Most of
> the formats like S-expressions are purposeless, in the sense that there is
> no *rational* purpose behind them. As you wrote above, it is either legacy
> (we have to overcome some limitations of some other poorly designed
> components of the system) or personal preferences (some people like angle
> brackets others do curly ones).

Why can't there be a general-purpose format, just like there are
general-purpose programming languages?

Can we at least agree on the fact that a sequence of bytes is a
general-purpose format, widely used for storing and transmitting data?
(this question is just a matter of vocabulary)

Now byte sequences are a very crude format, because they don't have
any semantics besides what the application specifically puts into them.

So let's add as few semantics as possible, to keep as much generality
as possible. We end up with a bunch of byte sequences, whose semantics
are still left to the application, linked together by some kind of
semantic link. When the chosen links are "brother of" and "sublist of"
you get exactly S-expressions. The almost-RFC I linked is only this
definition along with a standardization of how to serialize the links
and the byte sequences.
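
As a rough illustration of the definition above, here is a minimal C sketch of atoms joined by the two kinds of link, plus a serializer. All names (sx_node, sx_encode and so on) are hypothetical, and the length-prefixed output ("3:foo" for an atom, parentheses around a list) is an assumption made for the example; the draft mentioned above may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; nothing here is taken from the draft. */
enum sx_kind { SX_ATOM, SX_LIST };

struct sx_node {
    enum sx_kind kind;
    struct sx_node *next;       /* "brother of": next element of the same list */
    struct sx_node *child;      /* "sublist of": first element, lists only */
    const unsigned char *data;  /* atom bytes; meaning is left to the application */
    size_t size;                /* number of atom bytes */
};

/* Copy as much of src as fits in out, but always advance *pos by len. */
static void sx_put(char *out, size_t cap, size_t *pos, const void *src, size_t len)
{
    if (len > 0 && *pos < cap) {
        size_t room = cap - *pos;
        memcpy(out + *pos, src, len < room ? len : room);
    }
    *pos += len;
}

/* Encode n and all of its following siblings, recursing into sublists. */
static void sx_encode_at(const struct sx_node *n, char *out, size_t cap, size_t *pos)
{
    for (; n != NULL; n = n->next) {
        if (n->kind == SX_LIST) {
            sx_put(out, cap, pos, "(", 1);
            sx_encode_at(n->child, out, cap, pos);
            sx_put(out, cap, pos, ")", 1);
        } else {
            char prefix[32];
            int len = snprintf(prefix, sizeof prefix, "%zu:", n->size);
            assert(len > 0 && (size_t)len < sizeof prefix);
            sx_put(out, cap, pos, prefix, (size_t)len);
            sx_put(out, cap, pos, n->data, n->size);
        }
    }
}

/* Returns the encoded length; the output is complete only if it is <= cap. */
size_t sx_encode(const struct sx_node *n, char *out, size_t cap)
{
    size_t pos = 0;
    sx_encode_at(n, out, cap, &pos);
    return pos;
}
```

Because every atom carries its own length, the bytes never need escaping, which is what keeps the format as close as possible to a plain byte sequence.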

This is undoubtedly still a crude format. You might argue it's useless
to add so little semantics on top of byte sequences, and that
serialization should be engineered only when you know what you are
about to serialize, i.e. make a much larger leap between byte sequences
and meaningful objects. I might even agree from a philosophical point
of view.

However, from a purely practical point of view, and using the fact that
in my background languages (C and 386 asm) byte sequences and strings
are so similar, these crude semantics are all I need (or at least, all
I've ever needed so far). Now if we agree that simplicity is a
desirable quality (because it leads to fewer bugs, more productivity,
etc), I still fail to see the issues with such a format.

When I mentioned earlier flexibility as a strong point for S-
expressions, I meant that just like byte sequences, they can
accommodate whatever purpose you might want to put on top of them.

Now regarding personal preferences about braces, I have to admit I'm
always shocked at the way so many people dismiss S-expressions on
first sight because of the LISP-looking parentheses. I'm very glad I
can have a higher-level conversation about them.

> > The other application is actual serialization,
>
> That should not be a text.

I tend to agree with that as a generality, however I believe some
particular cases might benefit from text-based serialization, in
order to harness the power of existing text-based tools.

> >>> But now that I think about it, I'm wondering whether I'm stuck in my C
> >>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
> >>> of storing structured data in a text-based way?
>
> >> I think yes. Though it is not Ada-specific, rather commonly used OOP design
> >> patterns.
>
> > I heard people claiming that the first language shapes the mind of
> > coders (and they continue saying a whole generation of programmers has
> > been mind-crippled by BASIC). My first language happened to be 386
> > assembly, that might explain things.
>
> I see where mixing abstraction layers comes from...

Could you please point me where I am mixing what? I genuinely want to
learn, but I just don't understand what you're referring to.

> > Anyway, I genuinely tried OOP
> > with C++ (which I dropped because it's way too complex for me (and I'm
> > tempted to say way too complex for the average coder, it should be
> > reserved to the few geniuses actually able to fully master it)), but I
> > never felt the need of anything beyond what can be done with a C
> > struct containing function pointers.
>
> Everything is Turing-complete you know... (:-))

I should know, I was accessing from assembly some (DirectX) C++
objects' vtable array before I knew anything about OOP.

My point is, most of my (currently non-OOP) code can be expressed as
well in an OOP style. When I define a C structure along with a bunch
of functions that perform operations on it, I'm conceptually defining
a class and its methods, only expressed in a non-OOP language. I
sometimes put function pointers in the structure, to have a form of
dynamic dispatch or virtual methods. I occasionally even leave the
structure declared but undefined publicly, to hide internals (do you call
that encapsulation?), along with functions that could well be called
accessors and mutators. In my opinion that doesn't count as OOP
because it doesn't use OOP-specific features like inheritance.
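
The pattern described above can be sketched in a few lines of C. This is only an illustration: the shape type, its fields and the function names are hypothetical, chosen to show a function pointer stored in a structure acting as a form of dynamic dispatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type; "shape" and its fields are illustrative only. */
struct shape {
    double (*area)(const struct shape *self);  /* plays the role of a virtual method */
    double width;
    double height;
};

static double rect_area(const struct shape *self)
{
    return self->width * self->height;
}

static double triangle_area(const struct shape *self)
{
    return 0.5 * self->width * self->height;
}

/* "Constructors": each one selects the implementation stored in the struct. */
struct shape shape_rect(double width, double height)
{
    struct shape s = { rect_area, width, height };
    return s;
}

struct shape shape_triangle(double base, double height)
{
    struct shape s = { triangle_area, base, height };
    return s;
}

/* Callers go through this function and never need to know the concrete kind. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->area != NULL);
    return s->area(s);
}
```

Hiding the structure definition behind a header that only declares "struct shape;" would add the encapsulation mentioned above; it is left out here to keep the sketch in one piece.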

> > The
> > problem is, I just can't manage to imagine how to go in a single step
> > from the byte sequence containing a S-expression describing multiple
> > objects to the internal memory representation and vice-versa.
>
> You need not, that is the power of OOP you dislike so much.

I don't dislike it at all. I just seldom think that way. I've met people
who saw objects everywhere, while I tend to see a bunch of bits
everywhere. As part of the demoscene said when I learned programming,
"100% asm, a way of life".

> Consider that each
> object knows how to construct itself from a stream of octets. It is trivial
> for simple objects like numbers. E.g. you read while the octets are in '0'..'9'
> and generate the result interpreting it as a decimal representation. Or you
> take four octets and treat them as big-endian binary representation etc.
> For a container type, you call the constructors for each container member
> in order. If the container is unbounded, e.g. has variable length, you read
> its bounds first or you use some terminator in the stream to mark the
> container end. For containers of dynamically typed elements you must learn
> the component type before you construct it.

That's exactly how I use S-expressions, except that instead of
starting from a stream of octets, I start from an array of octets
(whose length is known), but if I understood correctly that doesn't
change your point. And the reason why I started this thread is only to
know how to buffer into memory the arrays of octets, because I need
(in my usual use of S-expressions) to resolve the links between atoms
before I can know the type of atoms. So I need a way to delay the
typing, and in the meantime handle data as a generic byte sequence
whose only known information is its size and its place in the S-
expression tree. What exactly is so bad with that approach?
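
To make the "delayed typing" idea concrete, here is a minimal C sketch, under the assumption that an atom is kept as a pointer plus a size until its position in the tree tells the application what it is. Only then is it interpreted, using the two conversions mentioned in the quoted text (decimal digits, or four big-endian octets). The atom type and all function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* An untyped atom: only its bytes and its size are known. */
struct atom {
    const unsigned char *data;
    size_t size;
};

/* Compare an atom with a C string, e.g. to recognise a key such as "port". */
int atom_equals(const struct atom *a, const char *text)
{
    size_t n = strlen(text);
    return a->size == n && memcmp(a->data, text, n) == 0;
}

/* Interpret the atom as decimal digits. Returns 1 on success, 0 if the atom
 * is empty, contains a non-digit, or does not fit in an unsigned long. */
int atom_to_decimal(const struct atom *a, unsigned long *value)
{
    unsigned long v = 0;
    size_t i;

    assert(a != NULL && value != NULL);
    if (a->size == 0)
        return 0;
    for (i = 0; i < a->size; i++) {
        unsigned long digit;
        if (a->data[i] < '0' || a->data[i] > '9')
            return 0;
        digit = (unsigned long)(a->data[i] - '0');
        if (v > (ULONG_MAX - digit) / 10)
            return 0;  /* v * 10 + digit would overflow */
        v = v * 10 + digit;
    }
    *value = v;
    return 1;
}

/* Interpret the atom as exactly four octets in big-endian order. */
int atom_to_be32(const struct atom *a, unsigned long *value)
{
    assert(a != NULL && value != NULL);
    if (a->size != 4)
        return 0;
    *value = ((unsigned long)a->data[0] << 24)
           | ((unsigned long)a->data[1] << 16)
           | ((unsigned long)a->data[2] << 8)
           |  (unsigned long)a->data[3];
    return 1;
}
```

The same four bytes "8080" are a valid decimal number and a valid 32-bit value; nothing in the atom itself says which one is meant, which is why the choice is deferred until the surrounding structure is known.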

I hope I don't bother you too much with my noobity and my will to
understand,
Natacha

From: Dmitry A. Kazakov on 7 Aug 2010 10:23

On Sat, 7 Aug 2010 05:56:50 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 7, 10:39 am, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> One cannot judge a format without knowing what its purpose is. Most of
>> the formats like S-expressions are purposeless, in the sense that there is
>> no *rational* purpose behind them. As you wrote above, it is either legacy
>> (we have to overcome some limitations of some other poorly designed
>> components of the system) or personal preferences (some people like angle
>> brackets others do curly ones).
>
> Why can't there be a general-purpose format, just like there are
> general-purpose programming languages?

An interesting question. I would say no, there cannot be such formats. Any
presentation format is of course a language. The only difference from
true programming languages is in complexity, and maybe in a tendency to
be declarative rather than imperative. There are border cases like
PostScript, which IMO illustrate the point: the more general-purpose it has
to be, the less of a "format" it becomes.

> Can we at least agree on the fact that a sequence of bytes is a
> general-purpose format, widely used for storing and transmitting data?
> (this question is just a matter of vocabulary)

I don't think so. Namely, I don't think that "general" is a synonym for
"complete." It is rather about the abstraction level under the
condition of completeness.

> So let's add as few semantics as possible, to keep as much generality
> as possible. We end up with a bunch of byte sequences, whose semantics
> are still left to the application, linked together by some kind of
> semantic link. When the chosen links are "brother of" and "sublist of"
> you get exactly S-expressions.

Yes, the language of S-expressions is about hierarchical structures of
elements lacking any semantics.

I see no purpose in such descriptions. But this is a very old and bearded
issue. The same question arises when it is discussed why RDBMS are so
boring. For the same reason: a naked structure, be it relational,
hierarchical, whichever, is useless without the semantics. The semantics,
when dealt with, is capable of catching such simple relationships as "sibling"
with no effort. DB people believe that one could bridge the gap and
somehow come to the semantics from the structure's side. Translated into
your S-expressions: by putting a proper pattern of opening and
closing brackets one could describe everything...

> However from a purely practical point of view, and using the fact that
> in my background languages (C and 386 asm) bytes sequences and strings
> are so similar, these crude semantics are all I need (or at least, all
> I've ever needed so far).

The lower you descend through the abstraction levels, the fewer differences
you see. Everything is a bunch of transistors...

> Now if we agree that simplicity is a
> desirable quality (because it leads to less bugs, more productivity,
> etc), I still fail to see the issues of such a format.

Programs in 386 Assembler are significantly more complex than programs in
Ada. Simplicity of nature by no means implies simplicity of use.

> Now regarding personal preferences about braces, I have to admit I'm
> always shocked at the way so many people dismiss S-expressions on
> first sight because of the LISP-looking parentheses.

Do you mean LISP does not deserve its fame? (:-))

>>>>> But now that I think about it, I'm wondering whether I'm stuck in my C
>>>>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
>>>>> of storing structured data in a text-based way?
>>
>>>> I think yes. Though it is not Ada-specific, rather commonly used OOP design
>>>> patterns.
>>
>>> I heard people claiming that the first language shapes the mind of
>>> coders (and they continue saying a whole generation of programmers has
>>> been mind-crippled by BASIC). My first language happened to be 386
>>> assembly, that might explain things.
>>
>> I see where mixing abstraction layers comes from...
>
> Could you please point me where I am mixing what?

Encoding, representation, states, behavior, values, objects, everything is
a sequence of bytes, so?

> My point is, most of my (currently non-OOP) code can be expressed as
> well in an OOP style. When I define a C structure along with a bunch
> of functions that perform operations on it, I'm conceptually defining
> a class and its methods, only expressed in a non-OOP language. I
> sometimes put function pointers in the structure, to have a form of
> dynamic dispatch or virtual methods. I occasionally even leave the
> structure declared but undefined publicly, to hide internals (do you call
> that encapsulation?), along with functions that could well be called
> accessors and mutators. In my opinion that doesn't count as OOP
> because it doesn't use OOP-specific features like inheritance.

I disagree, because in my view this is all that OO is about. OO is not about
the tools (OOPL), it is about the way of programming.

>>> The
>>> problem is, I just can't manage to imagine how to go in a single step
>>> from the byte sequence containing a S-expression describing multiple
>>> objects to the internal memory representation and vice-versa.
>>
>> You need not, that is the power of OOP you dislike so much.
>
> I don't dislike it at all. I just seldom think that way. I've met people
> who saw objects everywhere, while I tend to see a bunch of bits
> everywhere.

Yes, I know them too. I don't believe that everything is an object. But I do
believe in abstract type systems, that every object in a well-designed
program must have a type and that type shall describe the behavior as
precisely as possible.

> And the reason why I started this thread is only to
> know how to buffer into memory the arrays of octets, because I need
> (in my usual use of S-expressions) to resolve the links between atoms
> before I can know the type of atoms. So I need a way to delay the
> typing, and in the meantime handle data as a generic byte sequence
> whose only known information is its size and its place in the S-
> expression tree. What exactly is so bad with that approach?

Nothing is wrong with that at the implementation level. However, I don't see
why links need to be resolved first. In comparable cases (I do a lot of messy
protocol/communication stuff) I usually first restore objects and then
resolve links.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Jeffrey Carter on 7 Aug 2010 11:38

On 08/07/2010 12:23 AM, Natacha Kerensikova wrote:
>
> I heard people claiming that the first language shapes the mind of
> coders (and they continue saying a whole generation of programmers has
> been mind-crippled by BASIC). My first language happened to be 386
> assembly, that might explain things. Anyway, I genuinely tried OOP
> with C++ (which I dropped because it's way too complex for me (and I'm
> tempted to say way too complex for the average coder, it should be
> reserved to the few geniuses actually able to fully master it)), but I
> never felt the need of anything beyond what can be done with a C
> struct containing function pointers.

Capers Jones, the function-point person, collected statistics on function points
in various languages. One statistic was the average number of LOC to implement a
function point. Function points may or may not be a good metric, but the values
fell into 3 groups, which he labeled low-level languages (assembler and C),
mid-level (FORTRAN, Pascal), and high-level (Ada).

Thus, there is a big difference in abstraction between C and Ada. C is about
translating the problem into the capabilities of the solution language; Ada is
(or should be) about modeling the problem in the software. C is about coding:
mapping everything onto a small set of predefined representations. Ada is about
SW engineering: creating useful abstractions that represent important aspects of
the problem.

Effectively using Ada requires a different way of thinking from using C or
assembler. You'll more quickly gain that mindset by asking how to approach
specific problems in Ada, rather than how to do something you did in C.

I think there may be an analogy between languages and data representation
formats: everything can be implemented in machine code, but that doesn't mean
it's a good language for development. Similarly, everything may be represented
as a sequence of bytes, but that doesn't mean it's an appropriate representation
for an application.

An S-expression library might be a useful thing at a low level, but few
applications should call it directly. Instead, there should probably be at least
one layer of abstraction on top of the low-level storage representation, so that
the application only deals with appropriate application-level representations of
the data.

Here, your problem seems to be how to have a human-readable storage format for
your applications. While there is merit to discussing the pros and cons of the
many existing formats to use, the Ada approach is to use an application-specific
abstraction, hiding the implementation detail of which such format is eventually
chosen.

--
Jeff Carter
"I blow my nose on you."
Monty Python & the Holy Grail

From: Natacha Kerensikova on 7 Aug 2010 13:01

On Aug 7, 5:38 pm, Jeffrey Carter <spam.jrcarter....(a)spam.not.acm.org>
wrote:
> On 08/07/2010 12:23 AM, Natacha Kerensikova wrote:
> Thus, there is a big difference in abstraction between C and Ada. C is about
> translating the problem into the capabilities of the solution language; Ada is
> (or should be) about modeling the problem in the software. C is about coding:
> mapping everything onto a small set of predefined representations. Ada is about
> SW engineering: creating useful abstractions that represent important aspects of
> the problem.

Funnily, you're the first one shaking my resolve to learn Ada.

Let's get everything straight: I'm an amateur. I'm coding for fun. Well,
I'll soon be coding for a living too, but then I won't have a say in the
language chosen. I like coding in C, and I don't care how efficient it
is. There is no waste of time in a leisure activity. I'd rather have
fun with C than do the same thing 10x faster without fun.
The only thing bothering me in C is that I often end up using
dangerous constructs. For example, *(struct foo **)((char *)whatever +
bar.offset). I'm perfectly fine with that, because I'm confident
in what I'm doing, but I can understand it looks sloppy from the
outside. My main motivation to learn Ada is to publicize the concerns for
robustness and correctness that might not be obvious from my C code. I
was hoping to do in Ada whatever I used to be doing in C: network
programming, DS homebrew, etc.
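
For readers unfamiliar with the construct quoted above, here is a minimal C sketch of what it does: it reads a pointer field located at a byte offset that is only known at run time. The struct layouts and the names holder, field_ref and get_foo_field are hypothetical, added only to give the expression a context.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int value;
};

/* A hypothetical structure that happens to contain a "struct foo *" field. */
struct holder {
    int id;
    struct foo *item;
};

/* Run-time description of where such a field lives inside some structure. */
struct field_ref {
    size_t offset;
};

/* The expression from the text: treat the bytes at (whatever + offset) as a
 * "struct foo *". The compiler cannot check that the offset really designates
 * a field of that type, which is what makes the construct dangerous. */
struct foo *get_foo_field(void *whatever, struct field_ref bar)
{
    assert(whatever != NULL);
    return *(struct foo **)((char *)whatever + bar.offset);
}
```

The result is well defined only when bar.offset was obtained from the right structure, for instance with offsetof(struct holder, item); any other value silently reads unrelated memory.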

Am I misguided? Should I stop now?

> Effectively using Ada requires a different way of thinking from using C or
> assembler. You'll more quickly gain that mindset by asking how to approach
> specific problems in Ada, rather than how to do something you did in C.

Ok, so let's have a look at the grand picture. My main objective right
now is to code a webserver in Ada. Yes, that's reinventing the wheel,
but it does wonders for learning.

Here is how I intended to do it, admittedly exactly like I would do it
in C, could you please tell me how far I am from the Ada approach?

I start by dividing the work into "modules" or "components", each
containing a structure or a few related structures, along with
whatever functions or procedures to deal with them. I thought this
would map perfectly into Ada packages.

Configuration files are S-expressions, in the form of (key value)
pairs, gathered in sections like (section-name (key value) (key value)).
Webpage templates are also S-expressions, in the form
"raw-html" (function arg1 arg2) "raw-html" (function) "raw-html". The
interesting thing is that arg1, arg2, etc. are S-expressions and thus
can be subtemplates too.
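
The configuration layout described above can be sketched in C as a small tokenizer plus a lookup. This is an illustration under several assumptions: the input is a NUL-terminated text form with bare tokens and "quoted" strings without escapes, and the names sx_next and config_lookup are hypothetical, as are the section and key names used with them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum sx_token { SX_END, SX_OPEN, SX_CLOSE, SX_ATOM };

/* Read one token and advance *cursor. For SX_ATOM, *text and *size describe
 * the atom's bytes inside the input (nothing is copied). */
static enum sx_token sx_next(const char **cursor, const char **text, size_t *size)
{
    const char *p = *cursor;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cursor = p;
        return SX_END;
    }
    if (*p == '(' || *p == ')') {
        *cursor = p + 1;
        return *p == '(' ? SX_OPEN : SX_CLOSE;
    }
    if (*p == '"') {                      /* quoted string, no escapes */
        const char *start = ++p;
        while (*p != '\0' && *p != '"')
            p++;
        *text = start;
        *size = (size_t)(p - start);
        *cursor = (*p == '"') ? p + 1 : p;
        return SX_ATOM;
    }
    *text = p;                            /* bare token */
    while (*p != '\0' && !isspace((unsigned char)*p)
           && *p != '(' && *p != ')' && *p != '"')
        p++;
    *size = (size_t)(p - *text);
    *cursor = p;
    return SX_ATOM;
}

static int sx_is(const char *text, size_t size, const char *name)
{
    return strlen(name) == size && memcmp(text, name, size) == 0;
}

/* Find the value of (key value) inside (section ...). Returns 1 and sets
 * *value and *value_size on success, 0 if not found or malformed. */
int config_lookup(const char *config, const char *section, const char *key,
                  const char **value, size_t *value_size)
{
    const char *text = NULL;
    size_t size = 0;
    int depth = 0, index = 0, in_section = 0, in_key = 0;
    enum sx_token tok;

    assert(config != NULL && section != NULL && key != NULL);
    while ((tok = sx_next(&config, &text, &size)) != SX_END) {
        if (tok == SX_OPEN) {
            depth++;
            index = 0;                    /* next atom is the list's first */
        } else if (tok == SX_CLOSE) {
            if (depth == 0)
                return 0;                 /* unbalanced input */
            if (depth == 2) in_key = 0;
            if (depth == 1) in_section = 0;
            depth--;
            index = 2;                    /* a nested list is never a name or value */
        } else if (depth == 1 && index == 0) {
            in_section = sx_is(text, size, section);
            index++;
        } else if (depth == 2 && index == 0) {
            in_key = in_section && sx_is(text, size, key);
            index++;
        } else if (depth == 2 && index == 1 && in_key) {
            *value = text;
            *value_size = size;
            return 1;
        } else {
            index++;
        }
    }
    return 0;
}
```

With a configuration such as "(server (port 8080) (root /var/www))", looking up section "server" and key "port" yields the four bytes 8080, still untyped; converting them to a number is left to the caller.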

As I'm more comfortable using components already coded and tested, I
would code them from the lowest to the highest level:
- first a component dealing with S-expression I/O, hence this topic.
- then a component for configuration, which uses the S-expression
library and is used by other components either for program-wide
configuration variables or for instance-specific configuration
- then a network component, gluing the rest of my program with
whatever socket library I will use (AdaSockets or GNAT.stuff or C
interfacing or whatever, don't know yet)
- then an HTTP parsing component, taking data from the network
component and configuration
- then a general page component, dispatching requests to the relevant
page objects
- then a raw file component, a specific page responding to HTTP
requests with data taken directly from a file
- then a template component, interpreting the function calls from
S-expression templates
- then a templated page component, another specialization of a page
object, dealing with HTTP responses and containing instance-specific
data used by the template component.

And that should be about it, I might encounter the need for other
components, maybe for network I/O multiplexing or for logging or for
caching templates etc.

So, how bad is it?

> An S-expression library might be a useful thing at a low-level, but few
> applications should call it directly. Instead, there should probably be at least
> one layer of abstraction on top of the low-level storage representation, so that
> the application only deals with appropriate application-level representations of
> the data.

I have to admit that in the above I don't really know what belongs to a
library and what belongs to the application. But indeed, an S-expression
package is a low-level thing, I'm well aware of that, I
just can't begin with high-level stuff if I don't have strong and tested
low-level stuff to build upon.

However the point of coding so many separate components is to be able
to change the internals of one without having to touch everything
else. Should I someday find a format so much better than
S-expressions, I would only have one component to change. Should I want
different formats for configuration and templates, that's a component
to add, and maybe little modifications to configuration and/or
template modules. And so on.

> Here, your problem seems to be how to have a human-readable storage format for
> your applications. While there is merit to discussing the pros and cons of the
> many existing formats to use, the Ada approach is to use an application-specific
> abstraction, hiding the implementation detail of which such format is eventually
> chosen.

Is that so different than what I explained above?

Thanks for your help,
Natacha

From: Jeffrey Carter on 8 Aug 2010 02:52

On 08/07/2010 10:01 AM, Natacha Kerensikova wrote:
>
> Let's get everything straight: I'm an amateur. I'm coding for fun. Well,
> I'll soon be coding for a living too, but then I won't have a say in the
> language chosen. I like coding in C, and I don't care how efficient it
> is. There is no waste of time in a leisure activity. I'd rather have
> fun with C than do the same thing 10x faster without fun.
> The only thing bothering me in C is that I often end up using
> dangerous constructs. For example, *(struct foo **)((char *)whatever +
> bar.offset). I'm perfectly fine with that, because I'm confident
> in what I'm doing, but I can understand it looks sloppy from the
> outside. My main motivation to learn Ada is to publicize the concerns for
> robustness and correctness that might not be obvious from my C code. I
> was hoping to do in Ada whatever I used to be doing in C: network
> programming, DS homebrew, etc.
>
> Am I misguided? Should I stop now?

One can do anything you can do in C in Ada. Better, since creating buffer
overflow and signed-integer overflow vulnerabilities takes effort in Ada, while
they're the default in C. (Virtually every "important security update" I see for
Linux is a buffer overflow or signed-integer overflow vulnerability. I doubt if
people are creating these on purpose. My conclusion is that it is impossible in
practice to use C without creating these errors.)

> Ok, so let's have a look at the grand picture. My main objective right
> now is to code a webserver in Ada. Yes, that's reinventing the wheel,
> but it does wonders for learning.
>
> Here is how I intended to do it, admittedly exactly like I would do it
> in C, could you please tell me how far I am from the Ada approach?

It's hard to comment meaningfully, since you mostly describe your intended
implementation, not your requirements.

I'd use Ada Web Server (AWS), and perhaps you should try that, too. Using a
significant, existing, high-level Ada application framework like that might help
introduce you to how some experienced Ada people thought this kind of thing
should be approached.

A "web server" can be a variety of things, from a simple page server that serves
static files to a highly-dynamic system generating everything on the fly. It
appears that you intend something that serves static files and expanded page
templates.

Initially, I'd observe that the system talks to the network and to the permanent
storage that stores the configuration information, static pages, and so on. So
my initial decomposition would identify interface modules for communicating with
these. (This is an "edges-in" approach.)

At a higher level, there is something that responds to incoming requests to
serve the appropriate responses. There's something this uses to obtain the
configuration from the permanent storage. This could make use of something that
can serve a static page and something that can serve an expanded page template.
There's also clearly a place for something that expands a page template.

I'm doing this off the top of my head, so I won't be surprised if I've missed
something or otherwise screwed up.

This identifies the major high-level modules in the system. I could now define
the package specifications for them and have the compiler check that they are
syntactically correct and semantically consistent. Then I could pick one and
design its implementation. It's likely at some point tasking would be involved,
allowing the processing of multiple requests at once, so this would all have to
be done keeping concurrency in mind.

At some point I'd get to a low enough level to start thinking about
representations, which seems to be where you begin your thinking about the problem.

> As I'm more comfortable using components already coded and tested, I
> would code them from the lowest to the highest level:

In Ada, one can create package specifications, then create other units that make
use of those specifications before they are implemented. This is an important
concept in Ada called the separation of specification and body. Sometimes it is
useful to create stub bodies for such packages, which can then be used to test
the units that make use of these packages. Thus it is often possible to
implement and test higher-level modules before lower-level modules that they use
have been implemented. This may not be especially useful on a single-person
project, but can be quite valuable in projects with more than one developer.
This often seems to be a foreign concept to those used to C.

While your approach seems quite different to mine, many aspects of the final
result seem to be similar. This probably bodes well for you being able to use
Ada effectively.

--
Jeff Carter
"I blow my nose on you."
Monty Python & the Holy Grail
