From: Robert A Duff on 7 Dec 2006 17:50

"Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:

> No. It is the concept that is broken. And it wasn't Ada that broke it, but crippled operating systems like Windows and Unix. In a proper OS the line terminator is not a character.

Why do you say so? The concept of "sequence of characters", which includes blanks and end-of-line chars, seems pretty good to me. (Other control chars, such as tabs, should be banished to the far side of the moon.)

I think the Unix idea -- "line terminator (or separator?) = a particular character" -- is a pretty convenient model. It's certainly convenient for parsing input text: read one char at a time, and deal with it, treating end-of-line as one possible case. E.g., an Ada compiler typically works that way.

How much human intellectual effort has been wasted by having to deal with "text mode" versus "binary mode" ftp?! The Unix model makes them identical, and if all operating systems had magically agreed on that from the dawn of time, we'd all be better off.

To represent end-of-line as TWO characters is just plain stupid. Even a manual typewriter has a single lever that does both (returns the carriage and feeds the line).

Note: I don't know of any Ada compiler that uses Text_IO to read the Ada source code to be compiled.

- Bob
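A minimal C++ sketch of the "read one char at a time, treat end-of-line as one possible case" model, assuming a Unix-style single '\n' terminator and treating only blanks and newlines as separators. The names `scan` and `ScanCounts` are illustrative, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// What the scanner counted.
struct ScanCounts {
    std::size_t lines = 0;  // number of '\n' terminators seen
    std::size_t words = 0;  // runs of characters other than ' ' and '\n'
};

// Reads the text one character at a time; end-of-line is just one more
// case in the switch, alongside the blank.
ScanCounts scan(const std::string& text) {
    ScanCounts counts;
    bool in_word = false;
    for (char c : text) {
        switch (c) {
        case '\n':  // a single character terminates a line
            ++counts.lines;
            in_word = false;
            break;
        case ' ':
            in_word = false;
            break;
        default:
            if (!in_word) {
                ++counts.words;
                in_word = true;
            }
            break;
        }
    }
    return counts;
}
```

With CR LF input, each '\r' would simply end up inside the last word of its line, which is the kind of special case a two-character terminator forces on a scanner like this.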
From: Randy Brukardt on 7 Dec 2006 19:13

"Robert A Duff" <bobduff(a)shell01.TheWorld.com> wrote in message news:wccfybrtj5h.fsf(a)shell01.TheWorld.com...

> Note: I don't know of any Ada compiler that uses Text_IO to read the Ada source code to be compiled.

Janus/Ada did originally, back in the early days when we had a partial implementation of everything. Once we finished the complete Text_IO, though, the whole thing became too slow. Indeed, these days all of the compiler's I/O is done directly through the lowest of our I/O layers (we called it "Basic_IO"; it's vaguely like Stream_IO).

My two cents on this silly discussion:

(1) The definition of End_of_File requires reading ahead as many as 4 characters. End_of_Line similarly requires read-ahead in some cases. This requirement has a significant impact on the entire Text_IO: once you've read those characters ahead, you have to save them somewhere for future use, but regular buffering would make Standard_Input from the keyboard unusable.

The requirement for lookahead means that End_of_File should *never* be called on anything that can't be buffered, like the keyboard. So using End_of_File on Standard_Input is always a mistake. Besides, it doesn't make sense for a keyboard to even have an EOF. Systems that allow it, like Unix, are more likely to cause problems through an accidental EOF than to serve any useful purpose. Way back, we had a CP/M machine which treated <Ctrl>-Z from the keyboard as closing the keyboard; a reboot was required to fix it. The machine actually had a <Ctrl>-Z key!! That often caused loss of work when the keyboard input to the editor suddenly became closed...

End-of-file from Standard_Input is *the* classic example of an exceptional condition that shouldn't clutter the "normal" code.

(2) The implementation of Text_IO is *very* complicated, especially because of things that are hardly ever used, like page terminators and line and page counts. Some routines are especially bad; End_of_File is one of these. Because of this substantial overhead, it's usually far more efficient to read a file with an infinite loop terminated by an exception.

(3) Text_IO.Get_Line has to read a character at a time. This can be as much as ten times slower than other methods of reading input. So, if performance is critical, it's probably best to read and interpret the file another way.

(4) All of this behavior is required by the RM and is enforced by the ACATS. An implementation that doesn't return End_of_File = True for a file containing just a blank line will fail the ACATS.

(5) Does this mean that the definition of Text_IO is screwy and over-complex? Absolutely. But there is absolutely no chance that the definition of Ada.Text_IO will change -- that would break a large percentage of existing Ada programs. So, unless you're designing a replacement language for Ada, there's no point whining about it. The definition of the language is not going to change; compilers are not going to change. Live with it.

And that means that for 99% of programs, calling End_of_File is just wrong; handle End_Error instead. Sorry if that hurts your sensibilities.

Randy.
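Randy's recommended shape is an unconditional loop whose only exit is the End_Error handler. Below is a rough C++ analog, not Ada: `EndError`, `get_line` and `read_all` are hypothetical names, and `std::getline` failing at end of input stands in for Text_IO raising End_Error. The reader never asks "is there more?" before trying to read, so no lookahead predicate is involved.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Ada's End_Error.
struct EndError : std::runtime_error {
    EndError() : std::runtime_error("end of input") {}
};

// Tries to read one line; throws EndError if no line is left.
// Nothing is asked beforehand, so no read-ahead is needed.
std::string get_line(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) {
        throw EndError();
    }
    return line;
}

// Unconditional loop whose only exit is the EndError handler, mirroring
//   loop ... end loop; exception when End_Error => ...
std::vector<std::string> read_all(std::istream& in) {
    std::vector<std::string> lines;
    try {
        for (;;) {
            lines.push_back(get_line(in));
        }
    } catch (const EndError&) {
        // Normal completion: the input is exhausted.
    }
    return lines;
}
```

In idiomatic C++ one would normally write `while (std::getline(in, line))` instead; the throw/catch form only mirrors the Ada idiom. Note that with this sketch an input consisting of a single "\n" yields one empty line; compare that with the End_of_File behaviour described in point (4).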
From: Larry Kilgallen on 7 Dec 2006 23:04

In article <wccfybrtj5h.fsf(a)shell01.TheWorld.com>, Robert A Duff <bobduff(a)shell01.TheWorld.com> writes:

> I think the Unix idea -- "line terminator (or separator?) = a particular character" -- is a pretty convenient model. It's certainly convenient for parsing input text: read one char at a time, and deal with it, treating end-of-line as one possible case. E.g., an Ada compiler typically works that way.

It constrains the ASCII values that can be in a record. That may not matter for compilers, but I write programs where record boundaries distinct from "new line" are quite useful in the output files.

> To represent end-of-line as TWO characters is just plain stupid.

I will raise (lower) your TWO characters and say that inline record boundaries are foolish.
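Larry's objection is that a reserved terminator character limits what a record may contain. One common way around that is to mark boundaries with a length count instead of a character value. The C++ sketch below assumes a simple 4-byte big-endian length prefix; the format and the names `encode_records` and `decode_records` are made up for illustration and do not describe any particular OS's record layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Writes each record as a 4-byte big-endian length followed by its bytes,
// so a record may contain any byte value, including '\n'.
std::string encode_records(const std::vector<std::string>& records) {
    std::string out;
    for (const std::string& r : records) {
        std::uint32_t n = static_cast<std::uint32_t>(r.size());
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(static_cast<char>((n >> shift) & 0xFF));
        }
        out += r;
    }
    return out;
}

// Reads the format above back; throws if the data is truncated.
std::vector<std::string> decode_records(const std::string& data) {
    std::vector<std::string> records;
    std::size_t pos = 0;
    while (pos < data.size()) {
        if (data.size() - pos < 4) throw std::runtime_error("truncated length");
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i) {
            n = (n << 8) | static_cast<unsigned char>(data[pos + i]);
        }
        pos += 4;
        if (data.size() - pos < n) throw std::runtime_error("truncated record");
        records.push_back(data.substr(pos, n));
        pos += n;
    }
    return records;
}
```

A record holding "two\nlines" round-trips intact, whereas splitting the same bytes on '\n' would produce two records.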
From: Dmitry A. Kazakov on 8 Dec 2006 04:11

On Thu, 07 Dec 2006 17:50:50 -0500, Robert A Duff wrote:

> "Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> writes:
>
>> No. It is the concept that is broken. And it wasn't Ada that broke it, but crippled operating systems like Windows and Unix. In a proper OS the line terminator is not a character.
>
> Why do you say so? The concept of "sequence of characters", which includes blanks and end-of-line chars, seems pretty good to me. (Other control chars, such as tabs, should be banished to the far side of the moon.)

The concept of a sequence of characters is OK, but it is not text I/O. It is just String. What I mean is that text I/O cannot be defined in such terms. If we did that, we would implicitly specify a certain encoding format, which is OS- and presentation-specific. It would be OK if there were only one OS and only one presentation format. But, to give an extreme example, a text in HTML format should be readable using Text_IO without seeing any <BR> tags.

> I think the Unix idea -- "line terminator (or separator?) = a particular character" -- is a pretty convenient model. It's certainly convenient for parsing input text: read one char at a time, and deal with it, treating end-of-line as one possible case. E.g., an Ada compiler typically works that way.

I don't think so. From the compiler construction perspective this presentation format is very unfortunate, because you don't know in advance how long a source line is. (Ada program validity depends on line ends.) Other areas where this idea works quite poorly are networking (there is no native way to block packets) and keyboard input. That is where all these buffer overrun issues are rooted. And in general it leads nowhere. What about an EOF character? What about "abs", "declare", "loop", etc. characters? (:-))

> How much human intellectual effort has been wasted by having to deal with "text mode" versus "binary mode" ftp?! The Unix model makes them identical, and if all operating systems had magically agreed on that from the dawn of time, we'd all be better off.

Absolutely, that is exactly my point. It is the flawed Unix model which considers texts, executables, databases and mouse buttons as sequences of characters. It is untyped. They should be different ADTs! (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
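Dmitry's position is that "text" should be an abstract type whose clients see lines, with the terminator encoding hidden inside the implementation. A minimal C++ sketch of that idea, using hypothetical names (`LineSource`, `DelimitedSource`, `collect`): the client code is identical whether the underlying data uses LF, CR LF, or some other separator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Abstract text: clients ask for lines and never see terminators.
class LineSource {
public:
    virtual ~LineSource() = default;
    // The next line, or std::nullopt when no lines remain.
    virtual std::optional<std::string> next_line() = 0;
};

// One possible encoding: lines separated by a fixed, non-empty terminator
// string ("\n" for Unix, "\r\n" for Windows, and so on).
class DelimitedSource : public LineSource {
public:
    DelimitedSource(std::string data, std::string terminator)
        : data_(std::move(data)), term_(std::move(terminator)) {
        assert(!term_.empty());
    }

    std::optional<std::string> next_line() override {
        if (pos_ >= data_.size()) return std::nullopt;
        std::size_t end = data_.find(term_, pos_);
        if (end == std::string::npos) end = data_.size();  // unterminated last line
        std::string line = data_.substr(pos_, end - pos_);
        pos_ = end + term_.size();
        return line;
    }

private:
    std::string data_;
    std::string term_;
    std::size_t pos_ = 0;
};

// Client code: depends only on the abstract interface.
std::vector<std::string> collect(LineSource& source) {
    std::vector<std::string> lines;
    while (std::optional<std::string> line = source.next_line()) {
        lines.push_back(*line);
    }
    return lines;
}
```

Passing "<BR>" as the terminator even echoes his HTML example, although real HTML would need a proper parser rather than a plain string search.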
From: Maciej Sobczak on 8 Dec 2006 03:22

Dmitry A. Kazakov wrote:

>> Why do you assign the special meaning to the line terminator?
>
> Because a line can contain any character.

That's fine, but it doesn't change much from the Get_Line point of view.

>>> My answer is no. Exception is not an error. It indicates an exceptional state. Note that an exceptional state is a *valid* state. While an error (bug) has no corresponding program state at all.
>>
>> It's not about bugs. I have presented an example of a truncated XML file - there's no bug in a program that happened to be given a broken file to digest. It's an error in the sense that the program cannot read the data that it genuinely expects. Still, the program should handle this case reasonably, so we have a valid state.
>
> It is an error in a file, it is not an error in the program. Consider a defective HDD. Would an exception be appropriate here?

Yes. As with a disconnected NFS mount, and so on.

>> Sorry, I'm not convinced that an exception might be a correct design choice for breaking the loop that reads data from a well-formatted file.
>
> Not only that. I am using exceptions for parsing sources. It fits very nicely for recursive descent parsing, makes things a lot cleaner and easier.

That's a different kettle of fish. A recursive descent parser does not really iterate over things - it *accepts* tokens. The accepting is what makes the difference between parsing and iterating. There is failure logic built into the parser that is not present in iteration.

It's interesting that you mention recursive descent parsers, because this was actually the background for my original problem. Just a few days ago I wanted to practice with some Ada "homework" and decided to write a simple line-oriented calculator. There is a parser, of course, and it has a simple grammar with just four productions. The parser accepts tokens that it expects according to its grammar and raises an exception when the expected token is not there. The exception is then handled at the top level, where the user is notified that an ill-formed expression was provided. I have absolutely no problems with exceptions here - as noted above, there is failure logic in the parser. But the top level (main subprogram) uses a regular loop for reading lines of text and the End_Of_File predicate to decide whether it's OK to finish. There is no place for exceptions there; it's just pure linear iteration with a single end-of-sequence condition.

Actually, I found this Get_Line problem while having fun with my calculator "homework".

>> So how do you write iteration routines?
>
> If you mean the case when the number of iterations is statically indeterminable, then yes, using exceptions.

What do you mean by "statically indeterminable"? What about iterating over a container? "Programming in Ada 2005", John Barnes, chapter 19.5 "Iterators". I don't see any exceptions in there.

> Especially when iteration is mixed with recursion.

It's not really mixed, because if you decouple the parser from the tokenizer, then iteration and recursion work at separate levels of program structure. :-)

> Protocol_Error : exception;
>
> begin
>    loop
>       Line := Get_Line (Source);
>       -- do something. This may raise an exception as well
>    end loop;
> exception
>    when End_Error =>
>       -- done due to file end
>    when Data_Error =>
>       -- due to I/O error
>    when Protocol_Error =>
>       -- due to protocol error
>    ...
> end;
>
>> Looks like goto in disguise.
>
> Any execution flow control is.

So are exceptions. But you still didn't convince me why exceptions should be preferred in this case. :-)

>>> End_Of_File in your program serves the purpose of a return code.
>>
>> Nope. It's the end-of-sequence condition. Just like with iterators.
>
> But Get_Line already has a result, which is a string. String is not a condition.

Same with iterators. The end-of-sequence condition is the iterator's state, not the value it returns.

> That's a different idiom. Iterators assume an indexed container.

What about linked lists? They are not indexed. I've also been using iterators without any containers - that's a nice solution for function generators, for example (for Python aficionados - think about range vs. xrange).

> You could use iterators for dealing with a container of strings.

That's what I want to see on input when reading consecutive lines.

> But a stream isn't one.

I'm not reading a stream. I'm reading lines - the structure is already there, and I don't want to care about what is below.

> It is again about mixing abstraction levels. You can convert a character stream into a sequence of strings, but the stream itself is a container of characters, not lines. While a text file is a third thing.

How does that relate to my problem with Get_Line?

>> If the spec says "read until end", then this means a single exit condition to me.
>
> No. This is mixing problem and solution spaces. What if I had a concurrent program, which would map the file into virtual memory? Then I could split that memory into 10 pieces and let 10 tasks "read it until end."

Then you will have 10 tasks reading their own sequences, very likely using loops with single exit conditions. It doesn't change the nature of the problem at all.

>>> Your code didn't manage that either!
>>
>> Why?
>
> Because it contained a hidden goto: "exit when!" (:-))

That is one exit point. Just what the spec says.

>>> Nor does it manage inputs longer than 99 characters.
>>
>> Good point. How should I solve this?
>
> By making the main loop deal with lines instead of reads.

Isn't Get_Line dealing with lines?

>> string line;
>> while (getline(cin, line))
>> {
>>     // play with line here
>> }
>
> That's OK to me. However, it is not that clean: the variable line outlives the loop.

It is the price we sometimes pay for more compact representations. Exactly the same considerations apply to Ada - most loops in Ada I have seen were written this way.
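The structure Maciej describes (a plain line-reading loop with a single end-of-input condition at the top level, and a recursive descent parser that throws when an expected token is missing) can be sketched in C++ as follows. The three-production grammar, the names `BadExpression`, `Parser` and `run`, and the error texts are illustrative assumptions; his actual four-production grammar is not given here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Raised by the parser when the token it expects is not there.
struct BadExpression : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Recursive descent over an illustrative grammar:
//   expr   ::= term { ('+' | '-') term }
//   term   ::= factor { '*' factor }
//   factor ::= number | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : s_(std::move(text)) {}

    long parse() {
        long value = expr();
        skip_blanks();
        if (i_ != s_.size()) throw BadExpression("unexpected trailing input");
        return value;
    }

private:
    std::string s_;
    std::size_t i_ = 0;

    void skip_blanks() {
        while (i_ < s_.size() && s_[i_] == ' ') ++i_;
    }
    // Consumes c if it is the next non-blank character.
    bool accept(char c) {
        skip_blanks();
        if (i_ < s_.size() && s_[i_] == c) { ++i_; return true; }
        return false;
    }
    bool at_digit() const {
        return i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_]));
    }
    long expr() {
        long v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    long term() {
        long v = factor();
        while (accept('*')) v *= factor();
        return v;
    }
    long factor() {
        if (accept('(')) {
            long v = expr();
            if (!accept(')')) throw BadExpression("expected ')'");
            return v;
        }
        skip_blanks();
        if (!at_digit()) throw BadExpression("expected a number");
        long v = 0;
        while (at_digit()) v = v * 10 + (s_[i_++] - '0');
        return v;
    }
};

// Top level: one loop, one end-of-input condition. Parse failures are
// exceptions, handled per line without affecting when the loop stops.
std::vector<std::string> run(std::istream& in) {
    std::vector<std::string> results;
    std::string line;
    while (std::getline(in, line)) {
        try {
            results.push_back(std::to_string(Parser(line).parse()));
        } catch (const BadExpression& e) {
            results.push_back(std::string("error: ") + e.what());
        }
    }
    return results;
}
```

Each ill-formed line is reported and the loop simply moves on; the exception never decides when reading stops.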
> But it is not equivalent to your Ada code, because you chose fixed-length strings. An Ada equivalent of your C++ example would use Unbounded_String.

Of course, but that doesn't change the problem. It's Get_Line in Ada vs. getline in C++ that shows the difference.

> Then what happens upon read error, reading the system paging file?

I'd expect std::bad_alloc. That's STORAGE_ERROR in Ada.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
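The 99-character limit raised earlier in the thread comes from reading into a fixed-size buffer. A C++ sketch of how a fixed buffer can still deliver lines of any length, by reading again whenever the buffer comes back full (loosely analogous to calling Ada's procedure form of Get_Line repeatedly while it keeps filling the buffer). The function name `read_long_line` and the 100-byte buffer size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads one line of any length through a fixed 100-byte buffer
// (room for 99 characters plus the terminating '\0'). Returns false
// when no line is left. Assumes the text has no embedded '\0'.
bool read_long_line(std::istream& in, std::string& line) {
    constexpr std::streamsize kBufSize = 100;
    char buf[kBufSize];
    line.clear();
    bool got_any = false;
    for (;;) {
        buf[0] = '\0';
        in.getline(buf, kBufSize);
        if (in.gcount() > 0) got_any = true;
        line += buf;  // the stored characters, without the '\n'
        if (in.fail() && !in.eof() && !in.bad() && in.gcount() == kBufSize - 1) {
            // Buffer filled before the terminator: not an error here,
            // so clear failbit and fetch the rest of the line.
            in.clear();
            continue;
        }
        return got_any;
    }
}
```

In practice `std::getline(in, line)` with a `std::string`, like Unbounded_String in Ada, grows as needed and makes such a loop unnecessary.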