From: Natacha Kerensikova on
On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
wrote:
> On Sun, 1 Aug 2010 10:35:17 -0700 (PDT), Natacha Kerensikova wrote:
> > On Aug 1, 2:53 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> > wrote:
> >> How can it make sense if the type is unknown? If the type is known, why not
> >> to state it?
>
> > Actually the type is deduced from the context, e.g.
> > (tcp-connect (host foo.example) (port 80))
>
> Hmm, why do you consider brackets as separate elements?

Because that's the definition of S-expressions :-)
I'm referring to this almost-RFC: http://people.csail.mit.edu/rivest/Sexp.txt

> I mean, more natural would be "procedural":
>
>    tcp-connect (host (foo.example), port (80))
>
> or
>
>    tcp-connect (foo.example, 80)

Those are also perfectly valid ways of serializing the same
information, but they are simply not S-expressions.

I use S-expressions as a sort of universal text-based container, just
like most people use XML these days, except S-expressions can easily
embed binary data and they are much simpler (as I said, ~1000 lines
of commented 80-column C code with a few reinvented wheels embedded).

> > This means that as far as the S-expression library is concerned, the byte
> > sequence read from the file is not typed. Its typing actually has to be
> > delayed until the application has enough context to interpret it. Hence
> > the need of a typeless chunk of data.
>
> The above is mere syntax, it is not clear why internal/external
> representation must be as you described. (Actually, the structures as above
> are widely used in compiler construction e.g. syntax tree, Reverse Polish
> notation etc.)
>
> There is no such thing as untyped data. The information about the type must
> be somewhere. In your example it could be:
>
>    type Connect is new Abstract_Node with record
>        Host : IP_Address;
>        Port : Port_Type;
>    end record;

It is indeed somewhere, but beyond the reach of an S-expression
library: it belongs to the application using the library. The record
you show here can't be part of a generic S-expression handling
library.

In my example, as far as the S-expression library is concerned, it
looks like:
"(<some_stuff_1> (<some_stuff_2> <some_stuff_3>) (<some_stuff_4>
<some_stuff_5>))"

The library is not supposed to care about what those some_stuff_ are.
Actually, the library is supposed to get binary data from the
application along with the tree structure described above, store it
into a sequence of bytes (on a disk or over a network or whatever),
and to retrieve from the byte sequence the original tree structure
along with the binary data provided.
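
A minimal sketch of that idea in Ada, under stated assumptions: atoms are held in an Unbounded_String purely as a byte container (a real library might prefer Ada.Streams.Stream_Element_Array), and the names Node, Atom, List and Encode are hypothetical, not taken from any existing library. The output uses the length-prefixed canonical form from Rivest's draft, where the library never needs to know what the atoms mean. Only the encoding direction is shown.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Sexp_Encode_Demo is

   type Node_Kind is (Atom_Node, List_Node);
   type Node;
   type Node_Access is access Node;

   --  A node is either an opaque atom or a list of child nodes.
   --  Siblings are chained through Next.
   type Node (Kind : Node_Kind) is record
      Next : Node_Access;
      case Kind is
         when Atom_Node => Data  : Unbounded_String;  --  uninterpreted bytes
         when List_Node => First : Node_Access;       --  first child
      end case;
   end record;

   function Atom (S : String; Next : Node_Access := null)
     return Node_Access is
   begin
      return new Node'(Kind => Atom_Node, Next => Next,
                       Data => To_Unbounded_String (S));
   end Atom;

   function List (First : Node_Access; Next : Node_Access := null)
     return Node_Access is
   begin
      return new Node'(Kind => List_Node, Next => Next, First => First);
   end List;

   --  Encode a node and all its following siblings:
   --  atoms become <length>:<bytes>, lists are wrapped in parentheses.
   function Encode (N : Node_Access) return String is
   begin
      if N = null then
         return "";
      end if;
      case N.Kind is
         when Atom_Node =>
            declare
               S   : constant String := To_String (N.Data);
               Len : constant String :=
                 Ada.Strings.Fixed.Trim
                   (Integer'Image (S'Length), Ada.Strings.Left);
            begin
               return Len & ':' & S & Encode (N.Next);
            end;
         when List_Node =>
            return '(' & Encode (N.First) & ')' & Encode (N.Next);
      end case;
   end Encode;

   --  (tcp-connect (host foo.example) (port 80))
   Port_List : constant Node_Access :=
     List (Atom ("port", Atom ("80")));
   Host_List : constant Node_Access :=
     List (Atom ("host", Atom ("foo.example")), Next => Port_List);
   Tree      : constant Node_Access :=
     List (Atom ("tcp-connect", Next => Host_List));

begin
   --  Prints (11:tcp-connect(4:host11:foo.example)(4:port2:80))
   Ada.Text_IO.Put_Line (Encode (Tree));
end Sexp_Encode_Demo;
```

Whether "80" is a port number or a string is decided only by the application that walks the tree afterwards.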

But now that I think about it, I'm wondering whether I'm stuck in my C
way of thinking and trying to apply it to Ada. Am I missing an Ada way
of storing structured data in a text-based way?

> > I thought this was the strong type safety to prevent (de)serialization
> > procedure from trying to interpret just any chunk of memory.
>
> No, actually it eases serialization because you can define
> Serialize/Unserialize operations on the type.

I honestly don't understand what you mean here.

Am I misusing the word "serialization" to describe the process of
converting an object from an internal representation (e.g. an integer)
to a byte sequence (e.g. the ASCII string of its decimal
representation, or the byte sequence in memory, both of them being two
acceptable but different serializations, corresponding to different
trade-offs (portability and readability vs space and time
efficiency))?
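
To make the two alternatives concrete, here is a small sketch, assuming Integer'Size is a multiple of Stream_Element'Size (true on common platforms). The names Raw_Integer, To_Raw, From_Raw and To_Text are made up for the illustration. The text form is portable and readable; the raw form is just the in-memory bytes, so its content depends on the machine's byte order.

```ada
with Ada.Text_IO;
with Ada.Streams; use Ada.Streams;
with Ada.Strings.Fixed;
with Ada.Unchecked_Conversion;

procedure Two_Serializations is

   --  As many stream elements (bytes) as one Integer occupies.
   subtype Raw_Integer is Stream_Element_Array
     (1 .. Integer'Size / Stream_Element'Size);

   --  Serialization 2: the byte sequence as it sits in memory.
   function To_Raw is
     new Ada.Unchecked_Conversion (Integer, Raw_Integer);
   function From_Raw is
     new Ada.Unchecked_Conversion (Raw_Integer, Integer);

   --  Serialization 1: the decimal representation as text.
   function To_Text (Value : Integer) return String is
   begin
      return Ada.Strings.Fixed.Trim
        (Integer'Image (Value), Ada.Strings.Left);
   end To_Text;

   Value : constant Integer     := 80;
   Raw   : constant Raw_Integer := To_Raw (Value);

begin
   Ada.Text_IO.Put_Line ("text form: " & To_Text (Value));
   Ada.Text_IO.Put_Line
     ("raw form :" & Natural'Image (Raw'Length) & " bytes");

   --  Both forms can be turned back into the original value.
   if Integer'Value (To_Text (Value)) = Value
     and then From_Raw (Raw) = Value
   then
      Ada.Text_IO.Put_Line ("both round trips OK");
   end if;
end Two_Serializations;
```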


Thanks for making me think,
Natacha
From: anon on
In <547afa6b-731e-475f-a7f2-eaefefb25861(a)k8g2000prh.googlegroups.com>, Natacha Kerensikova <lithiumcat(a)gmail.com> writes:
>Hi,
>
>I'm trying to learn Ada, coming from a C background. The first thing I
>planned to code is a S-expression parser, because it's quite easy (at
>least in C, less than 1000 lines including comments and memory
>handling (dynamic arrays reinvented)) and very useful considering
>almost all my existing programs use S-expressions as a serialization
>format.
>
>To describe briefly S-expressions, I consider them to be the simplest
>existing data organization beyond raw sequences of bits. They are
>basically lists of elements, each element being either a list or an
>atom, and atoms being raw sequences of bits.
>
>While I'm still not deep enough into Ada to know how to represent the
>lists, I guess there won't be major issues, I think I can handle it
>myself (though pointers and hints would still be welcome).
>
>My question here is about how to represent the atoms in Ada. In C it
>was merely a void pointer and a size, but it seems more difficult in
>Ada because of the strong typing. Because it's up to the application
>to make sense (i.e. type) out of the raw sequences of bits in atoms,
>the S-expression library has to handle them as a sort of untyped
>memory chunk. Do you know of a way to handle that?
>
>Please correct me if I'm wrong, but my guess would be that the S-
>expression library would provide a Sexp_Atom type, which refers to the
>untyped memory chunks, and the application would have procedures to
>convert back and forth between Sexp_Atom and whatever types it
>internally uses (i.e. serialization and deserialization procedures).
>However the library would provide these procedures for the most common
>types (e.g. strings and numeric types).
>
>Though it looks like a fine Ada API (at least to my eyes), I have
>absolutely no idea about how to implement the library. How to define
>the application-opaque Sexp_Atom type? How to read Sexp_Atom objects
>from a file? How to build them from Ada types? How to write them back
>to disk?
>
>Thanks for your help,
>Natacha


In creating the LISP interpreter, most use:

For outside environment to internal:

Word or Numeric Text (Text_IO) --> Tokenizer --> Internal Text tree
--> Internal List tree

If the I/O is to become a VS to your program, use the Direct_IO package
to create an indexed binary file, where the words and string phrases would
be stored in one file while the internal access pointers would be stored
in a second.

Internal Text tree <--> Direct_IO ( Text_Sexp )
Internal List tree <--> Direct_IO ( List_Sexp )

In this design, for short-term (online) memory storage the access pointers
are still valid. For long-term storage you would need to remap the
List tree onto the Text tree using indexes.

So, for long-term storage it is better to use Text_IO and rebuild the
outside environment version of the structures.

Internal Word tree --> unTokenizer --> Word or Numeric Text (Text_IO)
Internal List tree -->


From: Dmitry A. Kazakov on
On Sun, 1 Aug 2010 13:06:10 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> On Sun, 1 Aug 2010 10:35:17 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 1, 2:53 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
>>> wrote:
>>>> How can it make sense if the type is unknown? If the type is known, why not
>>>> to state it?
>>
>>> Actually the type is deduced from the context, e.g.
>>> (tcp-connect (host foo.example) (port 80))
>>
>> Hmm, why do you consider brackets as separate elements?
>
> Because that's the definition of S-expressions :-)

OK, why S-expressions, then? (:-))

> I'm referring to this almost-RFC: http://people.csail.mit.edu/rivest/Sexp.txt

I see, yet another poor data format.

> I use S-expression as a sort of universal text-based container, just
> like most people use XML these days,

These wonder me too. I see no need in text-based containers for binary
data. Binary data aren't supposed to be read by people, they read texts.
And conversely the machine shall not read texts, it has no idea of good
literature...

>>> This means that as far as the S-expression library is concerned, the byte
>>> sequence read from the file is not typed. Its typing actually has to be
>>> delayed until the application has enough context to interpret it. Hence
>>> the need of a typeless chunk of data.
>>
>> The above is mere syntax, it is not clear why internal/external
>> representation must be as you described. (Actually, the structures as above
>> are widely used in compiler construction e.g. syntax tree, Reverse Polish
>> notation etc.)
>>
>> There is no such thing as untyped data. The information about the type must
>> be somewhere. In your example it could be:
>>
>>    type Connect is new Abstract_Node with record
>>        Host : IP_Address;
>>        Port : Port_Type;
>>    end record;
>
> It is indeed somewhere, but beyond the reach of an S-expression
> library: it belongs to the application using the library.

Yes

> The record
> you show here can't be part of a generic S-expression handling
> library.

It need not be. Abstract_Node should declare Serialize and Unserialize
operations, which Connect does implement. The library makes dispatching
calls. Of course you can declare something like:

   type List_Of_Anything is new Abstract_Node with
      -- Container of class-wide objects

and implement its Serialize and Unserialize by walking the items of the
container and calling Serialize and Unserialize on each. BTW, this is how
attributes 'Input
and 'Output work with arrays.
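
A rough sketch of this design, with several assumptions: only Serialize is shown (Unserialize is left out), Host and Port use plain Unbounded_String and Natural instead of the IP_Address and Port_Type above, and the package names Nodes and Node_Vectors are invented for the example. It needs an Ada 2005 or later compiler.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Containers.Indefinite_Vectors;

procedure Dispatch_Demo is

   package Nodes is
      type Abstract_Node is abstract tagged null record;
      function Serialize (N : Abstract_Node) return String is abstract;

      --  Application-level node: it knows its own layout.
      type Connect is new Abstract_Node with record
         Host : Unbounded_String;
         Port : Natural;
      end record;
      overriding function Serialize (N : Connect) return String;

      package Node_Vectors is new Ada.Containers.Indefinite_Vectors
        (Index_Type => Positive, Element_Type => Abstract_Node'Class);

      --  Container of class-wide objects.
      type List_Of_Anything is new Abstract_Node with record
         Items : Node_Vectors.Vector;
      end record;
      overriding function Serialize (N : List_Of_Anything) return String;
   end Nodes;

   package body Nodes is
      function Serialize (N : Connect) return String is
         Port_Image : constant String :=
           Ada.Strings.Fixed.Trim
             (Natural'Image (N.Port), Ada.Strings.Left);
      begin
         return "(tcp-connect (host " & To_String (N.Host)
           & ") (port " & Port_Image & "))";
      end Serialize;

      function Serialize (N : List_Of_Anything) return String is
         Result : Unbounded_String := To_Unbounded_String ("(");
      begin
         for I in 1 .. Natural (N.Items.Length) loop
            if I > 1 then
               Append (Result, " ");
            end if;
            --  Dispatching call: the list does not know the item types.
            Append (Result, Serialize (N.Items.Element (I)));
         end loop;
         Append (Result, ")");
         return To_String (Result);
      end Serialize;
   end Nodes;

   use Nodes;

   L : List_Of_Anything;

begin
   L.Items.Append
     (Connect'(Host => To_Unbounded_String ("foo.example"), Port => 80));
   L.Items.Append
     (Connect'(Host => To_Unbounded_String ("bar.example"), Port => 8080));
   Ada.Text_IO.Put_Line (Serialize (L));
end Dispatch_Demo;
```

The list only makes dispatching calls; each node type decides how it appears in the output.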

> In my example, as far as the S-expression library is concerned, it
> looks like:
> "(<some_stuff_1> (<some_stuff_2> <some_stuff_3>) (<some_stuff_4>
> <some_stuff_5>))"

That is no matter. Serialize/Unserialize will use this format if they have
to.

> The library is not supposed to care about what those some_stuff_ are.
> Actually, the library is supposed to get binary data from the
> application along with the tree structure described above, store it
> into a sequence of bytes (on a disk or over a network or whatever),
> and to retrieve from the byte sequence the original tree structure
> along with the binary data provided.

Serialize can yield a binary chunk, but if you have to write it anyway, why
not write text? You insisted on having text, so why mess with binary
stuff?

> But now that I think about it, I'm wondering whether I'm stuck in my C
> way of thinking and trying to apply it to Ada. Am I missing an Ada way
> of storing structured data in a text-based way?

I think yes. Though it is not Ada-specific, rather commonly used OOP design
patterns.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Simon Wright on
Natacha Kerensikova <lithiumcat(a)gmail.com> writes:

> This means that as far as the S-expression library is concerned, the
> byte sequence read from the file is not typed. Its typing actually has
> to be delayed until the application has enough context to interpret
> it. Hence the need of a typeless chunk of data.
[...]
> Well, I imagined making such a new type to distinguish 'raw data
> coming from a S-expression to be interpreted' from other raw data
> types. I thought this was the strong type safety to prevent
> (de)serialization procedure from trying to interpret just any chunk of
> memory.

This sounds really like an alternative implementation of Ada Stream
attributes - ARM 13.13,
http://www.adaic.com/standards/05rm/html/RM-13-13.html .

The idea of a stream is that the data is written (perhaps to a file,
perhaps over a network, perhaps to memory) in such a way that it can be
retrieved later by the same or another program.

   type R is record
      I : Integer;
      F : Float;
   end record;
   procedure Read
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : out R);
   procedure Write
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : R);
   for R'Read use Read;
   for R'Write use Write;

then implement Read and Write as you wish.
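
For illustration, one possible way to fill in the bodies and round-trip a value through a file with Ada.Streams.Stream_IO. The file name "r_demo.bin" and the choice to write the two components one after the other are assumptions made for this sketch; any layout would do as long as Read mirrors Write.

```ada
with Ada.Text_IO;
with Ada.Streams.Stream_IO;

procedure Stream_Round_Trip is

   package SIO renames Ada.Streams.Stream_IO;

   type R is record
      I : Integer;
      F : Float;
   end record;

   procedure Read
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : out R);
   procedure Write
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : R);
   for R'Read use Read;
   for R'Write use Write;

   --  Read the components back in the order Write emitted them.
   procedure Read
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : out R) is
   begin
      Integer'Read (Stream, Item.I);
      Float'Read (Stream, Item.F);
   end Read;

   procedure Write
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : R) is
   begin
      Integer'Write (Stream, Item.I);
      Float'Write (Stream, Item.F);
   end Write;

   File     : SIO.File_Type;
   Sent     : constant R := (I => 42, F => 1.5);
   Received : R;

begin
   --  Write one record to a scratch file ...
   SIO.Create (File, SIO.Out_File, "r_demo.bin");
   R'Write (SIO.Stream (File), Sent);
   SIO.Close (File);

   --  ... then read it back and remove the file.
   SIO.Open (File, SIO.In_File, "r_demo.bin");
   R'Read (SIO.Stream (File), Received);
   SIO.Delete (File);

   if Received = Sent then
      Ada.Text_IO.Put_Line ("round trip OK");
   else
      Ada.Text_IO.Put_Line ("round trip FAILED");
   end if;
end Stream_Round_Trip;
```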

But as you will see the medium - the Stream above - is just a bunch of
bytes in a file or a network packet. If you send a Foo and I read it
hoping it'll be a Bar, we're going to have a problem.

I guess, given the above, we could start the Sexp with "(R ...", then
the recipient would raise Constraint_Error if the type expected didn't
match the data being read.
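
A sketch of that check, assuming the type name is simply written ahead of the data with String'Output and compared on the way back in. This is not the S-expression syntax itself, only the same idea expressed with stream attributes. The marker "R", the foreign marker "Foo" and the file name "tagged_demo.bin" are placeholders.

```ada
with Ada.Text_IO;
with Ada.Exceptions;
with Ada.Streams.Stream_IO;

procedure Checked_Read_Demo is

   package SIO renames Ada.Streams.Stream_IO;

   Type_Name : constant String := "R";

   type R is record
      I : Integer;
      F : Float;
   end record;

   procedure Read
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : out R);
   procedure Write
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : R);
   for R'Read use Read;
   for R'Write use Write;

   procedure Read
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : out R)
   is
      --  The sender's type name precedes the data.
      Name : constant String := String'Input (Stream);
   begin
      if Name /= Type_Name then
         raise Constraint_Error
           with "expected " & Type_Name & ", found " & Name;
      end if;
      Integer'Read (Stream, Item.I);
      Float'Read (Stream, Item.F);
   end Read;

   procedure Write
     (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Item   : R) is
   begin
      String'Output (Stream, Type_Name);
      Integer'Write (Stream, Item.I);
      Float'Write (Stream, Item.F);
   end Write;

   File     : SIO.File_Type;
   Item     : R;
   Rejected : Boolean := False;

begin
   SIO.Create (File, SIO.Out_File, "tagged_demo.bin");
   R'Write (SIO.Stream (File), R'(I => 42, F => 1.5));
   --  Something that is not an R: a foreign marker followed by data.
   String'Output (SIO.Stream (File), "Foo");
   Integer'Write (SIO.Stream (File), 7);
   SIO.Close (File);

   SIO.Open (File, SIO.In_File, "tagged_demo.bin");
   R'Read (SIO.Stream (File), Item);
   Ada.Text_IO.Put_Line ("first read OK, I =" & Integer'Image (Item.I));
   begin
      R'Read (SIO.Stream (File), Item);
   exception
      when E : Constraint_Error =>
         Rejected := True;
         Ada.Text_IO.Put_Line
           ("second read rejected: "
            & Ada.Exceptions.Exception_Message (E));
   end;
   SIO.Delete (File);
end Checked_Read_Demo;
```

The check only helps when both sides agree on the marker convention; arbitrary bytes could still be misread as a marker string.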

This is like the mechanism standard Streams (in GNAT, at any rate) use
for transferring tagged and classwide types; the introductory segment
names the provided type and compiler magic creates the proper result.

[Team, I seem to remember an '05 feature that would support this? or was
that my imagination?]
From: Georg Bauhaus on
On 8/1/10 11:13 PM, Dmitry A. Kazakov wrote:

>> I use S-expression as a sort of universal text-based container, just
>> like most people use XML these days,
>
> These wonder me too. I see no need in text-based containers for binary
> data. Binary data aren't supposed to be read by people, they read texts.
> And conversely the machine shall not read texts, it has no idea of good
> literature...

The whole idea of machine readable Ada source text is silly, right?
