S-expression I/O in Ada [ADA]


From: Dmitry A. Kazakov on 9 Aug 2010 06:56

On Mon, 9 Aug 2010 02:55:03 -0700 (PDT), Natacha Kerensikova wrote:

> On Aug 8, 5:15 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
> wrote:
>> On Sun, 8 Aug 2010 06:49:09 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 8, 3:01 pm, "Dmitry A. Kazakov" <mail...(a)dmitry-kazakov.de>
>>>> No I do. But you have defined it as a text file. A streamed text file is a
>>>> sequence of Character items.
>>
>>> Actually, I didn't. I only defined it as a bunch of byte sequences
>>> organized in a certain way.
>>
>> I see. This confuses things even more. Why should I represent anything as a
>> byte sequence? It already is, and in 90% of cases I just don't care how the
>> compiler does that. Why convert byte sequences into other sequences and
>> then into a text file? It just does not make sense to me. Any conversion
>> must be accompanied by moving the thing from one medium to another.
>> Otherwise it is a waste.
>
> Representing something as a byte sequence is serialization (at least,
> according to my (perhaps wrong) definition of serialization).

"Byte" here is a RAM unit or a disk file item? I meant the former. All
objects are already sequences of bytes in the RAM.

> S-expressions are not a format on top or below that, it's a format
> *besides* that, at the same level. Objects are serialized into byte
> sequences forming S-expression atoms, and relations between objects/
> atoms are serialized by the S-expression format. This is how one gets
> the canonical representation of an S-expression.

I thought you wanted to represent *objects* ... as S-sequences?

Where are these representations supposed to live? In RAM? Why are you
then talking about text files, configurations and humans reading something?
I cannot remember the last time I read a memory dump...

> Now depending on the situation one might want additional constraints on
> the representation, for example human-readability or being text-based,
> and the S-expression standard defines non-canonical representations
> for such situations.
>
>>> I know very well these differences, except octet vs character,
>>> especially considering Ada's definition of a Character. Or is it only
>>> that Character is an enumeration while octet is a modular integer?
>>
>> The difference is that Character represents code points and octet does
>> atomic arrays of 8 bits.
>
> Considering Ada's Character also spans over 8 bits (256-element
> enumeration), both are equivalent, right?

Equivalent, defined how? In terms of types they are not, because the types
are different. In terms of the relation "=" they are not either, because
"=" is not defined on the tuple Character x Unsigned_8 (or whatever).

> The only difference is the
> intent and the meaning of values, right?

Huh, there is *nothing* beyond the meaning (semantics).
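
A minimal sketch of the point about types, assuming Interfaces.Unsigned_8
stands in for "octet" (the thread does not fix a type name). The two types
only meet through an explicit conversion:

```ada
with Ada.Text_IO;
with Interfaces;

procedure Octet_Demo is
   use type Interfaces.Unsigned_8;

   C : constant Character := 'A';

   --  Character -> octet: take the position number of the code point.
   O : constant Interfaces.Unsigned_8 := Character'Pos (C);

   --  Octet -> Character: the reverse mapping, again explicit.
   Back : constant Character := Character'Val (O);
begin
   --  "if C = O then" would be rejected at compile time: no "=" exists
   --  between Character and Unsigned_8.
   if O = 65 and then Back = C then
      Ada.Text_IO.Put_Line ("round trip OK");
   end if;
end Octet_Demo;
```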

>>> This leads to a question I had in mind since quite early in the
>>> thread, should I really use an array of Storage_Element, while S-
>>> expression standard considers only sequences of octets?
>>
>> That depends on what you are going to do. Storage_Element is a
>> machine-dependent addressable memory unit. Octet is a machine independent
>> presentation layer unit, a thing of 256 independent states. Yes
>> incidentally Character has 256 code points.
>
> Actually I've started to wonder whether Stream_Element might be even more
> appropriate: considering an S-expression atom is the serialization of
> an object, and I guess objects which know how to serialize themselves
> do so using the Stream subsystem, so maybe I could more easily
> leverage existing serialization codes if I use Stream_Element_Array
> for atoms.

Note that Stream_Element is machine-dependent as well.

> But then I don't know whether it's possible to have objects
> hand over a Stream_Element_Array representing themselves,

This does not make sense to me, it is mixing abstractions:
Stream_Element_Array is a representation of an object in a stream.
Storage_Array might be a representation of it in memory. These are two
different objects. You cannot consider them the same, even if they physically
shared the same memory (but they do not). The whole purpose of
serialization to a raw stream is the conversion of a Storage_Array to a
Stream_Element_Array. Deserialization is the backward conversion.
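
The conversion described here can be sketched with the stream attributes.
This is only an illustration, assuming an Ada 2005 (or later) compiler:
Memory_Streams and Memory_Stream are hypothetical names for a small
fixed-size in-memory stream, not anything proposed in the thread. How many
stream elements an Integer occupies, and in which order, is
implementation-defined, so no particular output is claimed.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Stream_Demo is

   --  Hypothetical fixed-size in-memory stream, for illustration only.
   package Memory_Streams is
      type Memory_Stream is new Root_Stream_Type with record
         Data : Stream_Element_Array (1 .. 64);
         Used : Stream_Element_Offset := 0;  --  elements written so far
         Next : Stream_Element_Offset := 1;  --  next element to read
      end record;

      overriding procedure Read
        (Stream : in out Memory_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out Memory_Stream;
         Item   : Stream_Element_Array);
   end Memory_Streams;

   package body Memory_Streams is

      overriding procedure Read
        (Stream : in out Memory_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset)
      is
         --  Deliver at most what is still unread in the buffer.
         Count : constant Stream_Element_Offset :=
           Stream_Element_Offset'Min
             (Item'Length, Stream.Used - Stream.Next + 1);
      begin
         Item (Item'First .. Item'First + Count - 1) :=
           Stream.Data (Stream.Next .. Stream.Next + Count - 1);
         Stream.Next := Stream.Next + Count;
         Last := Item'First + Count - 1;
      end Read;

      overriding procedure Write
        (Stream : in out Memory_Stream;
         Item   : Stream_Element_Array) is
      begin
         --  Raises Constraint_Error if the 64-element buffer overflows.
         Stream.Data (Stream.Used + 1 .. Stream.Used + Item'Length) := Item;
         Stream.Used := Stream.Used + Item'Length;
      end Write;

   end Memory_Streams;

   S    : aliased Memory_Streams.Memory_Stream;
   Port : Integer := 0;
begin
   Integer'Write (S'Access, 80);   --  in-memory object -> stream elements
   Ada.Text_IO.Put_Line
     ("stream elements used:" & Stream_Element_Offset'Image (S.Used));
   Integer'Read (S'Access, Port);  --  stream elements -> in-memory object
   Ada.Text_IO.Put_Line ("read back:" & Integer'Image (Port));
end Stream_Demo;
```

The program never touches the in-memory layout of Port; it only sees what
'Write put into the stream and what 'Read takes back out.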

> and I don't
> know either how to deal with cases where Stream_Element is not an
> octet.

By not using Stream_Element_Array, obviously. You should use the encoding
you want to. That is all.

If the encoding is for a text file, you have to read Characters; you don't
care how they land in a Stream_Element_Array. It is not your
business, it is an implementation detail of the text stream. If the
encoding is about octets, you have to read them. You have to choose.

>>>> Once you matched "tcp-connect", you know all the types of the following
>>>> components.
>>
>>> Unfortunately, you know "80" is a 16-bit integer only after having
>>> matched "port".
>>
>> Nope, we certainly know that each TCP connection needs a port. There is
>> nothing to resolve since the notation is not reverse. Parse it top down, it
>> is simple, it is safe, it allows excellent diagnostics, it works.
>
> Consider:
> (tcp-connect (host foo.example) (port 80))
> and:
> (tcp-connect (port 80) (host foo.example))
>
> Both of these are semantically equivalent, but to know which of the tail
> atoms is a 16-bit integer and which is the string, you have to first
> match the "port" and "host" head atoms.

Sure

> Or am I misunderstanding your point?

The point is that you never meet 80 before knowing that this is a "port",
you never meet "port" before knowing it is of "tcp-connect". You always
know all types in advance. It is strictly top-down.
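
A rough sketch of that top-down order, assuming the expression has already
been split into (head tail) pairs. Child, Child_List, Port_Number and
TCP_Connect are hypothetical names for illustration, not an existing
S-expression library:

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Top_Down_Demo is

   --  Hypothetical, already-tokenized form of one (head tail) child.
   type Child is record
      Head, Tail : Unbounded_String;
   end record;
   type Child_List is array (Positive range <>) of Child;

   type Port_Number is range 0 .. 65_535;

   function "+" (S : String) return Unbounded_String
     renames To_Unbounded_String;

   --  Called only after the head atom "tcp-connect" has been matched, so
   --  the set of legal children and their types is known in advance.
   procedure TCP_Connect (Children : Child_List) is
      Host : Unbounded_String;
      Port : Port_Number := 0;
   begin
      for I in Children'Range loop
         if Children (I).Head = "host" then
            Host := Children (I).Tail;                  --  a string
         elsif Children (I).Head = "port" then
            --  Only now is the tail interpreted as a 16-bit number.
            Port := Port_Number'Value (To_String (Children (I).Tail));
         else
            raise Constraint_Error
              with "unexpected child of tcp-connect: "
                   & To_String (Children (I).Head);
         end if;
      end loop;
      Ada.Text_IO.Put_Line
        (To_String (Host) & Port_Number'Image (Port));
   end TCP_Connect;

begin
   --  Both orders from the thread give the same result.
   TCP_Connect (((+"host", +"foo.example"), (+"port", +"80")));
   TCP_Connect (((+"port", +"80"), (+"host", +"foo.example")));
end Top_Down_Demo;
```

The order of the children does not matter, yet "80" is never looked at
before "port" has been matched, which is the top-down property claimed
above.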

> Thanks for your patience with me,

You are welcome. I think from the responses of the people here you see that
the difference between Ada and C is much deeper than begin/end instead of
curly brackets. Ada does not let you through without a clear concept of
what you are going to do. Surely with some proficiency one could write
classical C programs in Ada, messing everything up. You could even create
a buffer overflow in Ada. But it is difficult for a beginner...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Robert A Duff on 9 Aug 2010 09:48

"Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:

> The implementation or the idea? Would you agree that objects with some
> properties of modular integers have place in Ada programs which do not
> interface C?

No. I do not like implicit "mod". But that's the essence of
modular types.

If I ran the circus, then this:

M : constant := 2**64;
type My_Unsigned is range 0..M-1;
pragma Assert(My_Unsigned'Size = 64);

X : My_Unsigned := ...;
Y : My_Unsigned := (X + 1) mod M;

would be legal, and would not raise any exceptions, even when
X = M-1. (And I'd want "(X + 1) mod M" to be implemented
as a single "add" instruction on a typical 64-bit machine.)
The above is illegal in every Ada compiler, because My_Unsigned
is a _signed_ integer type, and nobody supports that range.

Part of the reason modular types were invented is because
signed integers don't work in cases like the above.
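
For contrast, a small sketch of what a modular type does today, assuming a
compiler where System.Max_Binary_Modulus is at least 2**64 (Wrapping is a
hypothetical name). The "mod" is implicit in "+":

```ada
with Ada.Text_IO;

procedure Modular_Demo is
   --  Assumes System.Max_Binary_Modulus >= 2**64.
   type Wrapping is mod 2**64;

   X : Wrapping := Wrapping'Last;   --  2**64 - 1
begin
   X := X + 1;                      --  implicit "mod 2**64": no exception
   Ada.Text_IO.Put_Line (Wrapping'Image (X));   --  X is now 0
end Modular_Demo;
```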

>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>> since I think in an ideal world they should be banished.
>
> I think they could be fixed.

How? And having fixed them, when would you use them?
That is, when would you prefer them over signed integers?

>> By the way, one defense of modular types I've heard is that
>> they are used in mathematics. True.
>
>> But mathematicians do
>> not use _implicit_ mod. They say things like "X = Y (mod N)",
>> which is pronounced "X is congruent to Y (modulo N)".
>> Congruent, not equal.
>
> The mathematical notation (mod N) is untyped. It applies to any natural
> numbers and what is worse you have to add it at each point of the program
> you use the type.

Writing "mod" whenever I want an expression that takes the modulus is
a GOOD thing. "+" should always do an add, and nothing else.
If I want to negate the result of "+", I should write "-".
If I want to take the modulus of the result of "+", I should
write "mod".

Look at how unsigned types are used in C. size_t is a good
example. It's used to count up the sizes of things.
If I have 1000 objects of size 10_000_000, the total
size is 1000*10_000_000 = 10_000_000_000. If that
calculation wraps around on a 32-bit machine, the
answer is just plain wrong. I'd rather get Constraint_Error.

If I interface to C's size_t, and do similar calculations on
the Ada side, wraparound is equally wrong.
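
The arithmetic above can be tried out as follows. This is only a sketch:
Checked_Size is a hypothetical type, and its declaration assumes a compiler
whose largest signed integer type has at least 64 bits.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Size_Demo is
   --  Hypothetical range-checked counterpart of a 32-bit size_t.
   --  Assumes the compiler's largest signed integer type has at least
   --  64 bits, so that this range is accepted.
   type Checked_Size is range 0 .. 2**32 - 1;

   Count_M : Unsigned_32 := 1000;
   Each_M  : Unsigned_32 := 10_000_000;

   Count_C : Checked_Size := 1000;
   Each_C  : Checked_Size := 10_000_000;
   Total_C : Checked_Size;
begin
   --  Modular arithmetic silently wraps:
   --  10_000_000_000 mod 2**32 = 1_410_065_408.
   Put_Line ("modular total:" & Unsigned_32'Image (Count_M * Each_M));

   Total_C := Count_C * Each_C;   --  out of range for Checked_Size
   Put_Line ("checked total:" & Checked_Size'Image (Total_C));
exception
   when Constraint_Error =>
      Put_Line ("checked total: Constraint_Error");
end Size_Demo;
```

With range checks enabled, the second computation ends in the exception
handler instead of printing a wrong total.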

- Bob

From: Dmitry A. Kazakov on 9 Aug 2010 10:38

On Mon, 09 Aug 2010 09:48:02 -0400, Robert A Duff wrote:

> "Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:
>
>> The implementation or the idea? Would you agree that objects with some
>> properties of modular integers have place in Ada programs which do not
>> interface C?
>
> No. I do not like implicit "mod". But that's the essence of
> modular types.
>
> If I ran the circus, then this:
>
> M : constant := 2**64;
> type My_Unsigned is range 0..M-1;
> pragma Assert(My_Unsigned'Size = 64);
>
> X : My_Unsigned := ...;
> Y : My_Unsigned := (X + 1) mod M;
>
> would be legal, and would not raise any exceptions, even when
> X = M-1. (And I'd want "(X + 1) mod M" to be implemented
> as a single "add" instruction on a typical 64-bit machine.)

What does "+" above return? Is "mod M" a required part of the notation or
not?

>>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>>> since I think in an ideal world they should be banished.
>>
>> I think they could be fixed.
>
> How? And having fixed them, when would you use them?
> That is, when would you prefer them over signed integers?

Ring buffer indexing, flags (by-value sets actually), cryptography,
interfacing, communication.

I think that all these usages require different types of modular types with
different sets of operations. So I would prefer a language extension which
would allow me construction of such types rather than built-in ones, which
satisfy no one. Just one example from a huge list: why is "in" not an
operation?

if 0 /= (Mode and Alarm) then -- Isn't it awful?

why am I not allowed to have:

if Alarm in Mode then
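
Since the membership test "in" is not an operator and cannot be overloaded,
the closest one gets today is a named function. A minimal sketch, with
Mode_Flags, Mute and Is_In as hypothetical names:

```ada
with Ada.Text_IO;

procedure Flags_Demo is
   type Mode_Flags is mod 2**8;

   --  Hypothetical flag values.
   Alarm : constant Mode_Flags := 2#0000_0001#;
   Mute  : constant Mode_Flags := 2#0000_0010#;

   --  Stand-in for the wished-for "Flag in Set".
   function Is_In (Flag, Set : Mode_Flags) return Boolean is
   begin
      return (Set and Flag) /= 0;
   end Is_In;

   Mode : constant Mode_Flags := Alarm or Mute;
begin
   if Is_In (Alarm, Mode) then
      Ada.Text_IO.Put_Line ("alarm is set");
   end if;
end Flags_Demo;
```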

>>> By the way, one defense of modular types I've heard is that
>>> they are used in mathematics. True.
>>
>>> But mathematicians do
>>> not use _implicit_ mod. They say things like "X = Y (mod N)",
>>> which is pronounced "X is congruent to Y (modulo N)".
>>> Congruent, not equal.
>>
>> The mathematical notation (mod N) is untyped. It applies to any natural
>> numbers and what is worse you have to add it at each point of the program
>> you use the type.
>
> Writing "mod" whenever I want an expression that takes the modulus is
> a GOOD thing. "+" should always do an add, and nothing else.

Huh, modular "+" *does* add in the ring. Your "+" does not!

> If I want to negate the result of "+", I should write "-".

The additive inverse in a ring is not the one of the integers. They are
different types. It is good and helpful that Ada promotes this difference.

> If I want to take the modulus of the result of "+", I should
> write "mod".

That would be a type conversion. Bad thing. But I see no problem with that.
Give me universal integers or equivalent "2 x width" type and you will get
what you want in return:

function "+" (Left, Right : Modular) return Universal_Integer;
-- overloads standard +, if that exists

-- function "mod" (Left, Right : Universal_Integer) return Modular;
-- just to remember, it already exists

That's it.

> Look at how unsigned types are used in C. size_t is a good
> example. It's used to count up the sizes of things.
> If I have 1000 objects of size 10_000_000, the total
> size is 1000*10_000_000 = 10_000_000_000. If that
> calculation wraps around on a 32-bit machine, the
> answer is just plain wrong. I'd rather get Constraint_Error.
>
> If I interface to C's size_t, and do similar calculations on
> the Ada side, wraparound is equally wrong.

I agree. But that rather means that C is wrong to define size_t as modular. On
the Ada side it must be Natural_32. Why does Ada not support that? Why is it
impossible to declare a 32-bit range 0..2**32-1 without wrapping (= with
overflow check)?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Georg Bauhaus on 9 Aug 2010 11:14

On 09.08.10 16:38, Dmitry A. Kazakov wrote:

> I think that all these usages require different types of modular types with
> different sets of operations. So I would prefer a language extension which
> would allow me construction of such types rather than built-in ones, which
> satisfy no one. Just one example from a huge list: why is "in" not an
> operation?
>
> if 0 /= (Mode and Alarm) then -- Isn't it awful?
>
> why am I not allowed to have:
>
> if Alarm in Mode then

Is anything wrong with packed arrays of Booleans for status thingies?
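
A minimal sketch of that alternative, with Status and Status_Set as
hypothetical names. Membership becomes plain indexing, and set union uses
the predefined "or" on Boolean arrays:

```ada
with Ada.Text_IO;

procedure Status_Demo is
   --  Hypothetical status names.
   type Status is (Alarm, Mute, Armed);

   type Status_Set is array (Status) of Boolean;
   pragma Pack (Status_Set);

   Mode : Status_Set := (Alarm => True, others => False);
begin
   --  Set union via the predefined "or" on Boolean arrays.
   Mode := Mode or Status_Set'(Armed => True, others => False);

   if Mode (Alarm) then             --  membership test by indexing
      Ada.Text_IO.Put_Line ("alarm is set");
   end if;
end Status_Demo;
```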

>> Writing "mod" whenever I want an expression that takes the modulus is
>> a GOOD thing. "+" should always do an add, and nothing else.
>
> Huh, modular "+" *does* add in the ring. Your "+" does not!

How many programmers, among those without a degree in math or
in physics, know what a ring is?

Hence, no "+" can do the "right thing", since "+" is overloaded.
Not just in Ada, but also in popular understanding.
A definition of the "+" operator, whose meaning(s) is (are)
stipulated to be well known, seems impossible:
everyone considers its meaning(s) as clearly established as
the proper ingredients of lentil soup. Because the frames of
reference used in the definition are an obvious pick, aren't they!?

I don't remember when I have last read a final report on either
the statistical meaning of "+" or on lentil soup ingredients research
(other than our bi-weekly CVE :-)
But I do remember finding collections of good recipes.
Each recipe there is named with a unique identifier, and it lists
operands (ingredients) and operations (preparation, cooking).
The essential bit: use words or phrases that say what you
mean. (By implication, express what you want others to understand.)
Obviously, "+" does not express a recipe clearly, or not without
lengthy exegesis...

Imagine a programming language that is just like Ada with
SPARK restrictions on naming. Then, throw out overloading of
predefined names, too, of "+", for example.

No more misunderstandings, then.

But will this language be usable with beginners?

From: Simon Wright on 9 Aug 2010 11:40

Natacha Kerensikova <lithiumcat(a)gmail.com> writes:

> and I don't know either how to deal with cases where Stream_Element is
> not an octet.

By ignoring them, I think!
