Prev: problem analysis chart
Next: Please help!
From: Richard Harter on 12 Nov 2007 11:05 On Mon, 12 Nov 2007 02:17:13 GMT, David Thompson <dave.thompson2(a)verizon.net> wrote: [snip] >You can do > while( (fg = fgetline (...)) . status == 0 ){ ... } > /* or != 0, or > 0, or whatever your semantics is */ I thought of this, but I have never used it and didn't know whether it would work. Is there a reason you put spaces around the dot? > >This is not usual practice (in C), and I'd want to get some experience >with it before deciding whether I PREFER it, but it definitely works. > >FWIW it could be argued that it's analogous to the much more idiomatic >use in LISP of a multiple-valued return that silently collapses to the >first value if the caller doesn't ask for the rest. Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Richard Harter on 12 Nov 2007 11:21 On Sun, 11 Nov 2007 19:57:55 -0500, CBFalconer <cbfalconer(a)yahoo.com> wrote: >Richard Harter wrote: >> >... snip ... >> >> where I assume that the return is 0 if a line was read properly >> and a code value if it fails in some way. This adds error >> returns for a modest price in complexity. It has the same built >> in automatic inefficiencies. It also steps on one of C's little >> awkwardnesses, namely the convention that 0 is false and non-zero >> is true. IMNSHO this is an ancient mistake; generally speaking >> there usually is only one way to succeed and many ways to fail. >> It would have better to do it the other way around. However it >> is a convention cast is three billion year old granite so there >> is nothing to be done about it. > >No, the thing is that there is only one signal needed for 'OK', but >it is quite possible to return various error forms. The 0 == OK >matches this. It is a general practice in the standard library, >and avoids returning a separate error value. I opine you're missing the point. Of course one can do all sorts of things but a common C idiom is while (somefunc(/* args */)) {/*body */} If 0 were true and non zero false you could have all sorts of termination codes. The convention would match the idiom. What you did is of the form while (somefunc(...) == SUCCESS_CODE)) {...} which, while correct C, is clumsier and somewhat of a hack. Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Paul Hsieh on 12 Nov 2007 14:11 On Oct 11, 3:05 pm, c...(a)tiac.net (Richard Harter) wrote: > The following is an exercise in thinking out a design. > Comments and thoughts are welcome. > > [...] > What sort of return values should be available in the status > field? These are the successful ones that occur to me: > > end_of file The last read (if any) found an EOL > marker. The current read finds an > immediate EOF. If the line contains a '\n' just before a '\0' doesn't that tell you the same thing? > no_increase Normal read - either buffer was null or > else it was not increased. If you have some sort of out-parameter that gives the target memory length, then the user can deduce this themselves by comparing the starting size with the ending size. This seems to be of incredibly marginal use (I can't even think of a use) to justify explicitly pointing this out redundant to other ways of determining it. > increase Normal read - the buffer size has been > increased. This is not redundant with the above flag? > abn_no_increase Abnormal read - an EOF was found without > an EOL. Buffer was not increased. So if instead a line was read that ended without a '\n' before the '\0' don't we already know this? > abn_increase Abnormal read - an EOF was found without > an EOL. Buffer was increased. It just looks like you are conflating what the flags are trying to indicate. Similarly, file reading failures can be determined by looking at errno() and ferror (). I don't know why you want to wrap all these error easily deducible error conditions into a flag set. > In addition there are numerous kinds of errors that can occur. > Calling sequence arguments include: > > no_file The file pointer is null. > bad_buffer One and only one of buffere and size s > zero. > bad_size Size is 0 or is greater than maxsize > bad_maxsize Maxsize is 0 These are clearly redundant with information available at the call site. > Then there are the memory allocation failures: > > bad_allocate Malloc or realloc failure > big_line Line length is greater than maxsize. Well ok, now *this* is a new error condition that is not obviously deducible by other means from the call site. Your emphasis of capturing every single kind of error seems misplaced to me. You can already determine pretty much all these errors except for a failure to allocate memory. Give that, a prototype that looks more like: long readline (FILE * fp, char ** pline, size_t * curralloc, size_t maxlen); where the length of the line read is returned on success and -1 or some other negative number is returned on failure, seem more appropriate. If *pline is NULL, then *curralloc must be 0, and a new buffer will be allocated, otherwise realloc will be used to resize the buffer and curralloc will be updated to the final parameter passed to realloc. A maxlen of 0 indicates that there is no maximum length. If pline itself is NULL then curralloc must be NULL and the input is not stored anywhere. So: char * line = NULL; size_t lalloc = 0, oldlalloc = 0; size_t llen; /* Naturally reuse a line buffer */ while (0 < (llen = readline (fp, &line, &lalloc, 0))) { if (oldalloc < lalloc) { /* Allocation was increased */ } /* Check line[llen-1] for '\n' to see if an EOL was read */ /* Process the string line which was read */ oldlalloc = lalloc; } free (line); if (llen < 0) { if (ferror (fp)) { /* IO error */ } else { /* malloc or realloc failure or else the programmer fed bad parameters that readline() could detect. */ } } /* else llen == 0 means EOF was reached with no other issues. */ So you can see that all the relevant error conditions are captured, with only 4 simple parameters. While bad parameters and memory allocation failures are aliased together, one requires just development time debugging, while other may require deployment time resolution. So the two cases should be naturally separated as the ordinary process of development. But this misses another point as well. By getting mired in this sort of minutia you are missing out on other issues. What about inputting passwords or reading from a socket? By only servicing FILE *'s you are limiting your capabilities. Also what about inputing using delimiters other than EOL? For example, a simplified CSV-like file could have rows separated by '\n's , but then columns separated by ','s. It makes sense to be able to feed the output (a string) of a line read into the input of another field read with different parameters. These issues are dealt with by taking a more primitive approach and writing wrappers that are specialized to the programmer's requirement here: http://www.azillionmonkeys.com/qed/userInput.html -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
From: Richard Harter on 13 Nov 2007 11:45 On Sun, 11 Nov 2007 16:32:44 -0500, Eric Sosman <esosman(a)ieee-dot-org.invalid> wrote: >Richard Harter wrote: [snip] >> Here we may have to disagree a bit. The trouble with a KISS >> approach in this case is that error and efficiency go by the >> wayside. The simple way is >> >> char * getalife(FILE *); > > Oddly enough, that's the signature of my line-getter. Oddly enough I have one stashed away that I have used for decades. I dare say you did a better job of coding it than I did. It's the obvious thing to do. > >> Now this means that you are automatically committed to separately >> allocating space for each line. > > Well, no. Mine reuses (and perhaps expands) the buffer at >each call. You only get to keep one line at at time "internal" >to the line-getter, which doesn't seem to be a problem: My usual >pattern is not to save the lines verbatim, but to parse them and >extract data. If you *do* want to save the lines verbatim, you >can't re-use the same buffer in any case (although in this case >mine would incur a string copy yours might be able to avoid). > >> The compact prototype also throws away the knowledge of the >> length of the line. If users want that information they have to >> call strlen, i.e., redo the work that had already be done. > > True. (Shrug.) I have seldom found the line length an >interesting datum. I accept that you have seldom found .... > >> The compact prototype leaves no room for a bound on the line >> size. > > True. (Shrug, perhaps with a touch less assuredness.) If >you know an upper limit on line length, fgets() is available. > >> Another thing that is wrong with the KISS prototype is that there >> is no place for an error return. I/O errors can be detected by >> the user, but that again is extra code on the user side. However >> there is no good way to pass back the knowledge that malloc has >> failed. > > "No place for an error return?" Mine returns NULL for end-of- >input, I/O error, or malloc() failure, which can be disambiguated >(if desired) by calling feof() and ferror(). Usually the program >only cares about normal-vs.-other, and the code looks like > > while ((line = getline(stream)) != NULL) { > ... > } > if (! feof(stream)) { > die_horribly(); > } > > Note that *some* kind of test is necessary in any event, >since the underlying getc() lumps end-of-input and error into >a single EOF return value. I stand by "no place for an error return". Now it is quite true for getline that errors disambiguate them with some code in the user space. Although this is a somewhat a matter of style, my view is that leaving error checking to the user is a bad idea. There are two objections: The first is that error checking (and disambiguation) is replicated throughout programs. Every user must (should) do that which the utility routine could have done once. The second is that all too often the error checking simply isn't done, or is incomplete. Incidentally, in your routine, what do you do about a premature EOF (no final \n). Do you accept that prematurely terminated final line as a valid line or do you treat it as an error? If the latter, how is the user supposed to diambiguate it? One way to do that is to include that final \n if it is present, but then what if the user doesn't want a final \n? > >> The real problem with the KISS dictum in this case is that there >> is no way to implement it and still reuse buffer space. > > Not true of mine. My apologies, I should have worded that better. According to the article on Heathfield's page, you return what I call "a dirty copy" of the line. I've implemented that method myself. At some recent point I was going to reuse some existing code and repackage it and I looked at it and said to myself, self this puppy isn't thread safe. It isn't even single thread safe. Very simply, the certified integrity of the line vanishes the moment the user calls any other user defined function. One way to make it safer is to for getline to maintain a table of streams and let each stream have its own buffer. I gather you chose not to do so. I didn't either, but in retrospect I think that not doing so was a mistake. > > I acknowledge that my "simplicity trumps all" approach is not >The Answer for all situations, nor (obviously) for all programmers. >My misgiving about the design you are working on is that it seems >to be trying to be The Answer, and I doubt that's a reasonable goal. Well, I like to think that I'm trying to think of everything that one should think of first, and then whittle it down to size. Maybe I need a better knife. Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Richard Harter on 13 Nov 2007 12:00
On 12 Nov 2007 08:03:08 GMT, Chris Torek <nospam(a)torek.net> wrote: >In article <1194651042.178648(a)news1nwk>, >Eric Sosman <Eric.Sosman(a)Sun.COM> wrote: >[massive snippage] >> It reawakens memories of the days when interfaces >>used "control blocks," ... > >> As you may gather, I am a big fan of small, simple >>interfaces. Some people feel my adoration of simplicity >>is unhealthily intense; ... > >There is a middle path here. Consider: > > char *get_a_line(FILE *file, struct whatever *control_block); > >where the "control_block" parameter is optional, and may be >given as NULL. The "simple interface user" then writes: > > while ((line = get_a_line(fp, NULL)) != NULL) > ... work with the line ... > >and ignores all distinctions between "complete" and "incomplete" >lines, "ordinary EOF" and "read error", and so on. The "complex >interface with control block" user creates and populates the control >block, and does: > > while (get_a_line(fp, &cb), cb.status == OK) > ... etc ... > >or whatever is appropriate. (Note: the above assumes that the >return value used in the "simple interface" case is duplicated >somewhere in the control block, in this case.) That's an excellent idea; thanks for the suggestion. One thing that strikes me as a good idea is to hide a lot of stuff with an opaque pointer. That is, the "control block" looks like: struct control_block ( size_t maxlen; /* Maximum line length permitted */ size_t length; /* Length of returned lines */ long flags; /* Might use bit fields here */ struct private_data * pvt; }; That last puppy holds things like buffer pointers and lengths that the implementation needs as state information that persists from call to call. I opine that it should be set NULL when the control block is initially populated. As a further point it might be best to also return the line pointer and not have it otherwise visible to the user, i.e., usage would look like: while(line = get_a_line(fp, &cb)) Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die |