From: Robert A Duff on
"Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:

> No. It is the concept, which is broken. And that wasn't Ada, who broke it,
> but crippled operating systems like Windows and Unix. In a proper OS the
> line terminator is not a character.

Why do you say so? The concept of "sequence of characters", which
includes blanks and end-of-line chars, seems pretty good to me.
(Other control chars, such as tabs, should be banished to the far
side of the moon.)

I think the Unix idea -- "line terminator (or separator?) = a particular
character" -- is a pretty convenient model. It's certainly convenient
for parsing input text: read one char at a time, and deal with it,
treating end-of-line as one possible case. E.g., an Ada compiler
typically works that way.
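
A minimal C++ sketch of that character-at-a-time model (C++ only because
this thread later compares Ada's Get_Line with C++'s getline). The names
scan_text and TextStats are hypothetical and not taken from any compiler;
the scanner merely counts lines and blank-separated words, and end-of-line
is just one more case in the switch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct TextStats {
    std::size_t lines = 0;
    std::size_t words = 0;
};

// Reads one character at a time; '\n' is handled as one case among others.
TextStats scan_text(const std::string& text) {
    TextStats stats;
    bool in_word = false;    // currently inside a run of non-blank characters
    bool line_open = false;  // characters seen since the last '\n'
    for (char c : text) {
        switch (c) {
        case '\n':           // end of line: close the current line and word
            ++stats.lines;
            in_word = false;
            line_open = false;
            break;
        case ' ':            // blank: ends a word, stays on the same line
            in_word = false;
            line_open = true;
            break;
        default:             // any other character belongs to a word
            if (!in_word) {
                ++stats.words;
                in_word = true;
            }
            line_open = true;
            break;
        }
    }
    if (line_open) {
        ++stats.lines;       // last line had no terminating '\n'
    }
    return stats;
}
```

Calling scan_text on a whole source file held in a string yields its line
and word counts; tabs are treated as ordinary word characters here, which
is a simplification of this sketch.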

How much human intellectual effort has been wasted by having to deal
with "text mode" versus "binary mode" ftp?! The unix model makes them
identical, and if all operating systems had magically agreed on that
from the dawn of time, we'd all be better off.

To represent end-of-line as TWO characters is just plain stupid.
Even a manual typewriter has a single lever that does both (returns the
carriage, and feeds the line).

Note: I don't know of any Ada compiler that uses Text_IO to read the Ada
source code to be compiled.

- Bob
From: Randy Brukardt on
"Robert A Duff" <bobduff(a)shell01.TheWorld.com> wrote in message
news:wccfybrtj5h.fsf(a)shell01.TheWorld.com...
> Note: I don't know of any Ada compiler that uses Text_IO to read the Ada
> source code to be compiled.

Janus/Ada did originally, back in the early days when we had a partial
implementation of everything. Once we finished the complete Text_IO, though,
the whole thing became too slow. Indeed, these days all of the compiler's IO
is done directly through the lowest of our I/O layers (we called it
"Basic_IO", it's vaguely like Stream_IO).

My two cents on this silly discussion:

(1) The definition of End_of_File requires reading ahead as many as 4
characters. End_of_Line similarly requires read-ahead in some cases. This
requirement has a significant impact on the entire Text_IO: once you've read
those characters ahead, you have to save them somewhere for future use, but
regular buffering would make Standard_Input from the keyboard unusable.
The requirement for lookahead means that it should *never* be called on
anything that can't be buffered, like the keyboard. So using End_of_File on
Standard_Input is always a mistake.

Besides, it doesn't make sense for a keyboard to even have an EOF. Systems
that allow it - like UNIX - are more likely to cause problems because of an
accidental EOF than any possible use. Way back, we had a CP/M machine which
treated <Ctrl>-Z from the keyboard as closing the keyboard - a reboot was
required to fix it. The machine actually had a <Ctrl>-Z key!! That often
caused loss of work when the keyboard input to the editor suddenly became
closed... End-of-file from Standard_Input is *the* classic example of an
exceptional condition that shouldn't clutter the "normal" code.

(2) The implementation of Text_IO is *very* complicated, made especially so
by things that are hardly ever used, like page terminators and line and page
counts. Some routines are especially bad; End_of_File is one of these.
Because of this substantial overhead, it's usually far more efficient to
read a file with an infinite loop terminated by an exception.

(3) Text_IO.Get_Line has to read a character at a time. This can be as much
as ten times slower than other methods of reading input. So, if performance
is critical, it's probably best to read and interpret the file another way.

(4) All of this behavior is required by the RM and is enforced by the ACATS.
An implementation that doesn't return End_of_File = True for a file
containing just a blank line will fail the ACATS.

(5) Does this mean that the definition of Text_IO is screwy and
over-complex? Absolutely. But there is absolutely no chance that there will
be any change in the definition of Ada.Text_IO -- it would break a large
percentage of existing Ada programs. So, unless you're designing a
replacement language for Ada, there's no point whining about it. The
definition of the language is not going to change; compilers are not going
to change. Live with it. And that means that for 99% of programs, calling
End_of_File is just wrong; handle End_Error instead. Sorry if that hurts
your sensibilities.

Randy.




From: Larry Kilgallen on
In article <wccfybrtj5h.fsf(a)shell01.TheWorld.com>, Robert A Duff <bobduff(a)shell01.TheWorld.com> writes:

> I think the Unix idea -- "line terminator (or separator?) = a particular
> character" -- is a pretty convenient model. It's certainly convenient
> for parsing input text: read one char at a time, and deal with it,
> treating end-of-line as one possible case. E.g., an Ada compiler
> typically works that way.

It constrains the ASCII values that can be in a record. That may not
be important for compilers, but I write programs where record boundaries
different from "new line" are quite useful in the output files.

> To represent end-of-line as TWO characters is just plain stupid.

I will raise (lower) your TWO characters and say that inline record
boundaries are foolish.
From: Dmitry A. Kazakov on
On Thu, 07 Dec 2006 17:50:50 -0500, Robert A Duff wrote:

> "Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:
>
>> No. It is the concept, which is broken. And that wasn't Ada, who broke it,
>> but crippled operating systems like Windows and Unix. In a proper OS the
>> line terminator is not a character.
>
> Why do you say so? The concept of "sequence of characters", which
> includes blanks and end-of-line chars, seems pretty good to me.
> (Other control chars, such as tabs, should be banished to the far
> side of the moon.)

The concept of a sequence of characters is OK, but it is not text I/O. It
is just String. What I mean is that text I/O cannot be defined in such
terms. If we did that, we would implicitly specify a certain encoding
format, which is OS/presentation specific. It would be OK if there were
only one OS and only one presentation format. But, to give an extreme
example, a text in HTML format should be readable using Text_IO without
seeing any <BR> tags.

> I think the Unix idea -- "line terminator (or separator?) = a particular
> character" -- is a pretty convenient model. It's certainly convenient
> for parsing input text: read one char at a time, and deal with it,
> treating end-of-line as one possible case. E.g., an Ada compiler
> typically works that way.

I don't think so. From the compiler construction perspective this
presentation format is very unfortunate, because you don't know in advance
how long a source line is. (Ada program validity depends on line ends.)
Further areas where this idea works quite poorly are networking (there is
no native way to block packets) and keyboard input. It is where all these
buffer overrun issues are rooted. And in general it leads to nowhere. What
about EOF character? What about "abs", "declare", "loop" etc characters?
(:-))

> How much human intellectual effort has been wasted by having to deal
> with "text mode" versus "binary mode" ftp?! The unix model makes them
> identical, and if all operating systems had magically agreed on that
> from the dawn of time, we'd all be better off.

Absolutely, that is exactly my point. It is the flawed Unix model which
considers texts, executables, databases and mouse buttons as sequences of
characters. It is untyped. They should be different ADTs! (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Maciej Sobczak on
Dmitry A. Kazakov wrote:

>> Why do you assign the special meaning to the line terminator?
>
> Because a line can contain any character.

That's fine, but it doesn't change much from the Get_Line point of view.

>>> My answer is no. Exception is not an error. It indicates an exceptional
>>> state. Note that an exceptional state is a *valid* state. While an error
>>> (bug) has no corresponding program state at all.
>> It's not about bugs. I have presented an example of truncated XML file -
>> there's no bug in a program that happened to be given a broken file to
>> digest. It's an error in a sense that the program cannot read the data
>> that it genuinely expects. Still, the program should handle this case
>> reasonably, so we have valid state.
>
> It is an error in a file; it is not an error in the program. Consider a
> defective HDD. Would an exception be appropriate here?

Yes. As in disconnected NFS, and so on.

>> Sorry, I'm not convinced that exception might be a correct design choice
>> for breaking the loop that reads data from well formatted file.
>
> Not only that. I am using exceptions for parsing sources. It fits very
> nicely for recursive descent parsing, makes things a lot cleaner and
> easier.

That's a different kettle of fish. A recursive descent parser does not
really iterate over things - it *accepts* tokens. The accepting is what
makes the difference between parsing and iterating. There is failure
logic built into the parser that is not present in iteration.

It's interesting that you mention recursive descent parser, because this
was actually the background for my original problem.
Just a few days ago I wanted to practice with some Ada "homework" and
decided to write a simple line-oriented calculator. There is a parser,
of course, and it has a simple grammar with just four productions. The
parser accepts tokens that it expects according to its grammar and
raises an exception when the expected token is not there. The exception
is then handled at the top level, where the user is notified that an
ill-formed expression was provided. I have absolutely no problems with
exceptions here - as noted above, there is failure logic in the parser.
But the top level (main subprogram) uses a regular loop for reading
lines of text and End_Of_File predicate to decide whether it's OK to
finish. There is no place for exceptions, it's just pure linear
iteration with single end-of-sequence condition.

Actually, I found this Get_Line problem while having fun with my
calculator "homework".

>> So how do you write iteration routines?
>
> If you mean the case when the number of iterations is statically
> indeterminable, then yes, using exceptions.

What do you mean "statically indeterminable"? What about iterating over
a container?

"Programming in Ada 2005", John Barnes, chapter 19.5 "Iterators".
I don't see any exceptions in there.

> Especially when iteration is
> mixed with recursion.

It's not really mixed, because if you decouple the parser from the
tokenizer, then iteration and recursion work at separate levels of the
program structure. :-)

> Protocol_Error : exception;
>
> begin
>    loop
>       Line := Get_Line (Source);
>       -- do something. This may raise an exception as well
>    end loop;
> exception
>    when End_Error =>
>       -- done due to file end
>    when Data_Error =>
>       -- due to I/O error
>    when Protocol_Error =>
>       -- due to protocol error
>    ...
> end;
>
>> Looks like goto in disguise.
>
> Any execution flow control is. So exceptions are as well.

But you still didn't convince me why exceptions should be preferred in
this case. :-)
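
To make the two styles under discussion concrete, here is a rough C++
analogy (not Ada, and not a claim about how Text_IO behaves): the same
line-counting loop written once with the end-of-sequence test as the loop
condition and once as an unconditional loop ended by an exception. The
function names are hypothetical, and the exception version assumes that
enabling failbit exceptions on the stream is an acceptable stand-in for
End_Error.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Style 1: the loop condition is the end-of-sequence test.
std::size_t count_lines_predicate(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}

// Style 2: an unconditional loop that ends when the stream throws.
std::size_t count_lines_exception(const std::string& text) {
    std::istringstream in(text);
    // Ask the stream to throw instead of silently setting failbit/badbit.
    in.exceptions(std::ios::failbit | std::ios::badbit);
    std::string line;
    std::size_t count = 0;
    try {
        for (;;) {
            std::getline(in, line);  // throws once nothing is left to read
            ++count;
        }
    } catch (const std::ios_base::failure&) {
        if (!in.eof()) {
            throw;  // not end of input: a genuine I/O failure
        }
    }
    return count;
}
```

Both functions return the same count for the same text; the difference is
only where the termination logic lives, which is the point being argued.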

>>> End_Of_File in your program serves the
>>> purpose of return code.
>> Nope. It's the end-of-sequence condition. Just like with iterators.
>
> But Get_Line already has a result, which is a string. String is not a
> condition.

Same with iterators. The end-of-sequence condition is the iterator's
state, not the value it returns.

> That's a different idiom. Iterators assume an indexed container.

What about linked lists? They are not indexed.
I've also been using iterators without any containers - that's a nice
solution for function generators, for example (for Python aficionados -
think about range vs. xrange).
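
A small C++ sketch of such a container-less iterator, under the assumption
that a half-open integer range is a fair stand-in for xrange. IntRange and
its member names are made up for illustration. Note that the
end-of-sequence condition is part of the iterator's state (done), separate
from the values it produces (next).

```cpp
// Hypothetical generator-style iterator: produces first, first+1, ...,
// last-1 on demand, without storing the sequence anywhere.
class IntRange {
public:
    IntRange(int first, int last) : next_(first), last_(last) {}

    // End-of-sequence is a query on the iterator's state.
    bool done() const { return next_ >= last_; }

    // Returns the current value and advances; call only when !done().
    int next() { return next_++; }

private:
    int next_;
    int last_;
};
```

A caller loops with "while (!r.done()) { use(r.next()); }", which has a
single exit condition and no exception involved.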

> You could
> use iterators for dealing with a container of strings.

That's what I want to see on input when reading consecutive lines.

> But a stream isn't
> one.

I'm not reading a stream. I'm reading lines - the structure is already
there, and I don't want to care about what is below.

> It is again about mixing abstraction levels. You can convert a
> character stream into a sequence of strings, but the stream itself is a
> container of characters, not lines. While a text file is a third thing.

How does it relate to my problem with Get_Line?

>> If the specs says "read until end", then this means single exit
>> condition to me.
>
> No. This is mixing problem and solution spaces. What if I had a concurrent
> program, which would map the file into virtual memory. Then I could split
> that memory into 10 pieces and let 10 tasks "read it until end."

Then you will have 10 tasks reading their own sequences, very likely
using loops with single exit conditions. It doesn't change the nature of
the problem at all.

>>> Your code didn't manage that either!
>> Why?
>
> Because it contained a hidden goto: "exit when!" (:-))

That is one exit point. Just what the spec says.

>>> Neither does it manage inputs longer than 99 characters.
>> Good point. How should I solve this?
>
> By making the main loop dealing with lines instead of reads.

Isn't Get_Line dealing with lines?

>> string line;
>> while (getline(cin, line))
>> {
>>     // play with line here
>> }
>
> That's OK to me. However, it is not that clean. line outlives the loop.

It is the price we sometimes pay for more compact representations.
Exactly the same considerations apply to Ada - most loops in Ada I have
seen were written this way.

> But
> it is not equivalent to your Ada code, because you chose fixed-length
> strings. An Ada equivalent of your C++ example would use Unbounded_String.

Of course, but that doesn't change the problem. It's the Get_Line in Ada
vs. getline in C++ that shows the difference.

> Then what happens upon read error, reading the system paging file?

I'd expect std::bad_alloc. That's STORAGE_ERROR in Ada.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/