Designing fgetline - a perspective [General Programming]


From: Richard Harter on 12 Nov 2007 11:05

On Mon, 12 Nov 2007 02:17:13 GMT, David Thompson
<dave.thompson2(a)verizon.net> wrote:

[snip]

>You can do
> while( (fg = fgetline (...)) . status == 0 ){ ... }
> /* or != 0, or > 0, or whatever your semantics is */

I thought of this, but I have never used it and didn't know
whether it would work. Is there a reason you put spaces around
the dot?

>
>This is not usual practice (in C), and I'd want to get some experience
>with it before deciding whether I PREFER it, but it definitely works.
>
>FWIW it could be argued that it's analogous to the much more idiomatic
>use in LISP of a multiple-valued return that silently collapses to the
>first value if the caller doesn't ask for the rest.
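
The member-of-assignment form quoted above can be tried in isolation. The following is only a sketch: the struct fg_result type, its fields, and the fgets-based fgetline_demo are assumptions made for illustration, not the fgetline under design. The spaces around the dot are just whitespace; the expression means the same without them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: the thread never fixes the exact fields. */
struct fg_result {
    int    status;   /* 0 = a line was read, nonzero = stop (EOF or error) */
    size_t length;   /* number of characters stored in buf */
};

/* Stand-in for fgetline built on fgets; the caller supplies buf and size. */
struct fg_result fgetline_demo(FILE *fp, char *buf, size_t size)
{
    struct fg_result r = { 1, 0 };

    assert(fp != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, fp) != NULL) {
        r.status = 0;
        r.length = strlen(buf);
    }
    return r;
}

/* Counts the lines of fp using the form from the post. Lines longer than
   the local buffer are counted in pieces; good enough for a demonstration. */
int count_lines(FILE *fp)
{
    char buf[256];
    struct fg_result fg;
    int n = 0;

    /* The assignment yields the struct value; .status selects a member. */
    while ((fg = fgetline_demo(fp, buf, sizeof buf)).status == 0)
        n++;
    return n;
}
```

The value of the assignment is not an lvalue, so its members can be read but not assigned to; the named variable fg remains available inside the loop body for the other fields.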

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die

From: Richard Harter on 12 Nov 2007 11:21

On Sun, 11 Nov 2007 19:57:55 -0500, CBFalconer
<cbfalconer(a)yahoo.com> wrote:

>Richard Harter wrote:
>>
>... snip ...
>>
>> where I assume that the return is 0 if a line was read properly
>> and a code value if it fails in some way. This adds error
>> returns for a modest price in complexity. It has the same built
>> in automatic inefficiencies. It also steps on one of C's little
>> awkwardnesses, namely the convention that 0 is false and non-zero
>> is true. IMNSHO this is an ancient mistake; generally speaking
>> there usually is only one way to succeed and many ways to fail.
>> It would have been better to do it the other way around. However it
>> is a convention cast in three billion year old granite so there
>> is nothing to be done about it.
>
>No, the thing is that there is only one signal needed for 'OK', but
>it is quite possible to return various error forms. The 0 == OK
>matches this. It is a general practice in the standard library,
>and avoids returning a separate error value.

I opine you're missing the point. Of course one can do all sorts
of things but a common C idiom is

while (somefunc(/* args */)) {/*body */}

If 0 were true and non zero false you could have all sorts of
termination codes. The convention would match the idiom. What
you did is of the form

while (somefunc(...) == SUCCESS_CODE) {...}

which, while correct C, is clumsier and somewhat of a hack.
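
To make the two positions concrete, here is a small sketch of the "0 means OK, anything else says why not" convention inside a loop. The enum read_status codes, struct source, next_item, and sum_all are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes: one success value, several failure values. */
enum read_status { RS_OK = 0, RS_EOF, RS_IO_ERROR, RS_NO_MEMORY };

/* Toy source: hands out the values of an array, then reports RS_EOF. */
struct source { const int *items; size_t count, next; };

enum read_status next_item(struct source *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->next >= s->count)
        return RS_EOF;
    *out = s->items[s->next++];
    return RS_OK;
}

/* Sums every item; the reason the loop ended is stored in *why. */
int sum_all(struct source *s, enum read_status *why)
{
    int value, total = 0;

    /* The explicit comparison is the form the post calls clumsier. */
    while ((*why = next_item(s, &value)) == RS_OK)
        total += value;
    return total;
}
```

Because RS_OK is 0, the condition could also be written with a leading `!`, which is shorter but reads as "while not next_item", which is arguably the awkwardness being complained about.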

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die

From: Paul Hsieh on 12 Nov 2007 14:11

On Oct 11, 3:05 pm, c...(a)tiac.net (Richard Harter) wrote:
> The following is an exercise in thinking out a design.
> Comments and thoughts are welcome.
>
> [...]
> What sort of return values should be available in the status
> field? These are the successful ones that occur to me:
>
> end_of_file The last read (if any) found an EOL
> marker. The current read finds an
> immediate EOF.

If the line contains a '\n' just before a '\0' doesn't that tell you
the same thing?

> no_increase Normal read - either buffer was null or
> else it was not increased.

If you have some sort of out-parameter that gives the target memory
length, then the user can deduce this themselves by comparing the
starting size with the ending size. This seems to be of incredibly
marginal use (I can't even think of a use) to justify explicitly
pointing this out redundant to other ways of determining it.

> increase Normal read - the buffer size has been
> increased.

This is not redundant with the above flag?

> abn_no_increase Abnormal read - an EOF was found without
> an EOL. Buffer was not increased.

So if instead a line was read that ended without a '\n' before the
'\0' don't we already know this?

> abn_increase Abnormal read - an EOF was found without
> an EOL. Buffer was increased.

It just looks like you are conflating what the flags are trying to
indicate.

Similarly, file reading failures can be determined by looking at
errno and ferror(). I don't know why you want to wrap all these
easily deducible error conditions into a flag set.

> In addition there are numerous kinds of errors that can occur.
> Calling sequence arguments include:
>
> no_file The file pointer is null.
> bad_buffer One and only one of buffer and size is
> zero.
> bad_size Size is 0 or is greater than maxsize
> bad_maxsize Maxsize is 0

These are clearly redundant with information available at the call
site.

> Then there are the memory allocation failures:
>
> bad_allocate Malloc or realloc failure
> big_line Line length is greater than maxsize.

Well ok, now *this* is a new error condition that is not obviously
deducible by other means from the call site.

Your emphasis of capturing every single kind of error seems misplaced
to me. You can already determine pretty much all these errors except
for a failure to allocate memory. Given that, a prototype that looks
more like:

    long readline (FILE * fp,
                   char ** pline,
                   size_t * curralloc,
                   size_t maxlen);

where the length of the line read is returned on success and -1 or
some other negative number is returned on failure, seems more
appropriate. If *pline is NULL, then *curralloc must be 0, and a new
buffer will be allocated, otherwise realloc will be used to resize the
buffer and curralloc will be updated to the final parameter passed to
realloc. A maxlen of 0 indicates that there is no maximum length. If
pline itself is NULL then curralloc must be NULL and the input is not
stored anywhere.

So:

    char * line = NULL;
    size_t lalloc = 0, oldlalloc = 0;
    long llen;   /* signed, so that the llen < 0 test below can succeed */

    /* Naturally reuse a line buffer */
    while (0 < (llen = readline (fp, &line, &lalloc, 0))) {
        if (oldlalloc < lalloc) {
            /* Allocation was increased */
        }
        /* Check line[llen-1] for '\n' to see if an EOL was read */
        /* Process the string line which was read */
        oldlalloc = lalloc;
    }
    free (line);
    if (llen < 0) {
        if (ferror (fp)) {
            /* IO error */
        } else {
            /* malloc or realloc failure or else the programmer
               fed bad parameters that readline() could detect. */
        }
    }
    /* else llen == 0 means EOF was reached with no other issues. */

So you can see that all the relevant error conditions are captured,
with only 4 simple parameters. While bad parameters and memory
allocation failures are aliased together, one requires just
development time debugging, while the other may require deployment time
resolution. So the two cases should be naturally separated in the
ordinary process of development.
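
For readers who want something to compile against, here is one possible reading of that prototype. It is a sketch under assumptions, not the poster's implementation: the initial size of 64, the doubling growth policy, the choice to count the '\n' against maxlen, and the separate -2 code for an over-long line are all invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the line length (including any '\n'), 0 at end of file,
   -1 on bad parameters, I/O error or allocation failure, and -2 (an
   assumed code) when the line would exceed a nonzero maxlen. */
long readline(FILE *fp, char **pline, size_t *curralloc, size_t maxlen)
{
    size_t len = 0;
    int c;

    /* Parameter errors: pline and curralloc must both be NULL or both
       be set, and a NULL buffer must come with a size of 0. */
    if (fp == NULL || (pline == NULL) != (curralloc == NULL))
        return -1;
    if (pline != NULL && (*pline == NULL) != (*curralloc == 0))
        return -1;

    while ((c = getc(fp)) != EOF) {
        if (maxlen != 0 && len >= maxlen) {
            ungetc(c, fp);                    /* leave the rest unread */
            return -2;
        }
        if (pline != NULL) {
            if (len + 2 > *curralloc) {       /* room for c and '\0' */
                size_t newsize = *curralloc ? *curralloc * 2 : 64;
                char *tmp = realloc(*pline, newsize);
                if (tmp == NULL)
                    return -1;
                *pline = tmp;
                *curralloc = newsize;
            }
            (*pline)[len] = (char)c;
        }
        len++;
        if (c == '\n')
            break;
    }
    if (ferror(fp))
        return -1;
    if (pline != NULL && *pline != NULL) {
        assert(len < *curralloc);             /* terminator always fits */
        (*pline)[len] = '\0';
    }
    return (long)len;                         /* overflow of long is ignored */
}
```

Since the '\n' is kept, an empty line has length 1, so a return of 0 can only mean end of file. That is what lets the calling loop above test `0 < llen`.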

But this misses another point as well. By getting mired in this sort
of minutiae you are missing out on other issues. What about inputting
passwords or reading from a socket? By only servicing FILE *'s you
are limiting your capabilities. Also what about inputting using
delimiters other than EOL? For example, a simplified CSV-like file
could have rows separated by '\n's , but then columns separated by
','s. It makes sense to be able to feed the output (a string) of a
line read into the input of another field read with different
parameters.
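
One way to picture "feeding the output of a line read into another field read" is to read through a character-source callback instead of a FILE *. The sketch below is an assumption-laden illustration and is not taken from the page linked below; getc_fn, file_getc, string_getc, and read_field are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical character source: returns the next character or EOF. */
typedef int (*getc_fn)(void *ctx);

/* Source adapter for a FILE *. */
int file_getc(void *ctx)
{
    return getc((FILE *)ctx);
}

/* Source adapter for an in-memory string; ctx points at a cursor. */
int string_getc(void *ctx)
{
    const char **cursor = ctx;

    if (**cursor == '\0')
        return EOF;
    return (unsigned char)*(*cursor)++;
}

/* Copies characters up to, but not including, delim into buf, which is
   always terminated. Returns the field length, or -1 if the source was
   already exhausted. Characters that do not fit in buf are dropped. */
long read_field(getc_fn next, void *ctx, int delim, char *buf, size_t size)
{
    size_t len = 0;
    int c, got_any = 0;

    assert(next != NULL && buf != NULL && size > 0);
    while ((c = next(ctx)) != EOF) {
        got_any = 1;
        if (c == delim)
            break;
        if (len + 1 < size)
            buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return got_any ? (long)len : -1;
}
```

A row would be read with file_getc and '\n' as the delimiter, then split by pointing a cursor at the row and calling read_field with string_getc and ','. One limitation of this sketch is that an empty field after a trailing delimiter is not reported.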

These issues are dealt with by taking a more primitive approach and
writing wrappers that are specialized to the programmer's requirement
here:

http://www.azillionmonkeys.com/qed/userInput.html

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

From: Richard Harter on 13 Nov 2007 11:45

On Sun, 11 Nov 2007 16:32:44 -0500, Eric Sosman
<esosman(a)ieee-dot-org.invalid> wrote:

>Richard Harter wrote:

[snip]

>> Here we may have to disagree a bit. The trouble with a KISS
>> approach in this case is that error and efficiency go by the
>> wayside. The simple way is
>>
>> char * getalife(FILE *);
>
> Oddly enough, that's the signature of my line-getter.

Oddly enough I have one stashed away that I have used for
decades. I dare say you did a better job of coding it than I
did. It's the obvious thing to do.
>
>> Now this means that you are automatically committed to separately
>> allocating space for each line.
>
> Well, no. Mine reuses (and perhaps expands) the buffer at
>each call. You only get to keep one line at a time "internal"
>to the line-getter, which doesn't seem to be a problem: My usual
>pattern is not to save the lines verbatim, but to parse them and
>extract data. If you *do* want to save the lines verbatim, you
>can't re-use the same buffer in any case (although in this case
>mine would incur a string copy yours might be able to avoid).
>
>> The compact prototype also throws away the knowledge of the
>> length of the line. If users want that information they have to
>> call strlen, i.e., redo the work that had already been done.
>
> True. (Shrug.) I have seldom found the line length an
>interesting datum.

I accept that you have seldom found ....
>
>> The compact prototype leaves no room for a bound on the line
>> size.
>
> True. (Shrug, perhaps with a touch less assuredness.) If
>you know an upper limit on line length, fgets() is available.
>
>> Another thing that is wrong with the KISS prototype is that there
>> is no place for an error return. I/O errors can be detected by
>> the user, but that again is extra code on the user side. However
>> there is no good way to pass back the knowledge that malloc has
>> failed.
>
> "No place for an error return?" Mine returns NULL for end-of-
>input, I/O error, or malloc() failure, which can be disambiguated
>(if desired) by calling feof() and ferror(). Usually the program
>only cares about normal-vs.-other, and the code looks like
>
> while ((line = getline(stream)) != NULL) {
> ...
> }
> if (! feof(stream)) {
> die_horribly();
> }
>
> Note that *some* kind of test is necessary in any event,
>since the underlying getc() lumps end-of-input and error into
>a single EOF return value.

I stand by "no place for an error return". Now it is quite true
for getline that the user can disambiguate errors with some code
in user space. Although this is somewhat a matter of style, my
view is that leaving error checking to the user is a bad idea.
There are two objections: The first is that error checking (and
disambiguation) is replicated throughout programs. Every user
must (should) do that which the utility routine could have done
once. The second is that all too often the error checking simply
isn't done, or is incomplete.

Incidentally, in your routine, what do you do about a premature
EOF (no final \n)? Do you accept that prematurely terminated
final line as a valid line or do you treat it as an error? If
the latter, how is the user supposed to disambiguate it? One way
to do that is to include that final \n if it is present, but then
what if the user doesn't want a final \n?

>
>> The real problem with the KISS dictum in this case is that there
>> is no way to implement it and still reuse buffer space.
>
> Not true of mine.

My apologies, I should have worded that better. According to the
article on Heathfield's page, you return what I call "a dirty
copy" of the line. I've implemented that method myself. At some
recent point I was going to reuse some existing code and
repackage it and I looked at it and said to myself, self this
puppy isn't thread safe. It isn't even single thread safe. Very
simply, the certified integrity of the line vanishes the moment
the user calls any other user defined function.

One way to make it safer is for getline to maintain a table of
streams and let each stream have its own buffer. I gather you
chose not to do so. I didn't either, but in retrospect I think
that not doing so was a mistake.
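
A minimal sketch of the table-of-streams idea follows, assuming a small fixed-size table; MAX_STREAMS, struct stream_slot, find_slot, and stream_line are names made up here, and slots are never released in this version. It keeps lines from different streams from overwriting each other, but it does nothing for thread safety.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_STREAMS 16              /* arbitrary limit for this sketch */

struct stream_slot { FILE *fp; char *buf; size_t size; };
static struct stream_slot slots[MAX_STREAMS];

/* Returns the slot owned by fp, claiming a free one on first use. */
static struct stream_slot *find_slot(FILE *fp)
{
    struct stream_slot *unused = NULL;
    size_t i;

    for (i = 0; i < MAX_STREAMS; i++) {
        if (slots[i].fp == fp)
            return &slots[i];
        if (slots[i].fp == NULL && unused == NULL)
            unused = &slots[i];
    }
    if (unused != NULL)
        unused->fp = fp;
    return unused;                  /* NULL when the table is full */
}

/* Reads one line (without the '\n') into the buffer owned by fp.
   Returns NULL on end of input, I/O error, a full table, or an
   allocation failure; a partial line is lost in the last case. */
char *stream_line(FILE *fp)
{
    struct stream_slot *s;
    size_t len = 0;
    int c;

    assert(fp != NULL);
    if ((s = find_slot(fp)) == NULL)
        return NULL;
    if (s->buf == NULL) {
        if ((s->buf = malloc(64)) == NULL)
            return NULL;
        s->size = 64;
    }
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == s->size) {   /* keep one byte for '\0' */
            char *tmp = realloc(s->buf, s->size * 2);
            if (tmp == NULL)
                return NULL;
            s->buf = tmp;
            s->size *= 2;
        }
        s->buf[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return NULL;
    s->buf[len] = '\0';
    return s->buf;
}
```

A line returned for one stream stays intact while other streams are read, but it is still overwritten by the next read on the same stream, so the "dirty copy" caveat is narrowed rather than removed.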

>
> I acknowledge that my "simplicity trumps all" approach is not
>The Answer for all situations, nor (obviously) for all programmers.
>My misgiving about the design you are working on is that it seems
>to be trying to be The Answer, and I doubt that's a reasonable goal.

Well, I like to think that I'm trying to think of everything that
one should think of first, and then whittle it down to size.
Maybe I need a better knife.

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die

From: Richard Harter on 13 Nov 2007 12:00

On 12 Nov 2007 08:03:08 GMT, Chris Torek <nospam(a)torek.net>
wrote:

>In article <1194651042.178648(a)news1nwk>,
>Eric Sosman <Eric.Sosman(a)Sun.COM> wrote:
>[massive snippage]
>> It reawakens memories of the days when interfaces
>>used "control blocks," ...
>
>> As you may gather, I am a big fan of small, simple
>>interfaces. Some people feel my adoration of simplicity
>>is unhealthily intense; ...
>
>There is a middle path here. Consider:
>
> char *get_a_line(FILE *file, struct whatever *control_block);
>
>where the "control_block" parameter is optional, and may be
>given as NULL. The "simple interface user" then writes:
>
> while ((line = get_a_line(fp, NULL)) != NULL)
> ... work with the line ...
>
>and ignores all distinctions between "complete" and "incomplete"
>lines, "ordinary EOF" and "read error", and so on. The "complex
>interface with control block" user creates and populates the control
>block, and does:
>
> while (get_a_line(fp, &cb), cb.status == OK)
> ... etc ...
>
>or whatever is appropriate. (Note: the above assumes that the
>return value used in the "simple interface" case is duplicated
>somewhere in the control block, in this case.)

That's an excellent idea; thanks for the suggestion. One thing
that strikes me as a good idea is to hide a lot of stuff with an
opaque pointer. That is, the "control block" looks like:

    struct control_block {
        size_t maxlen; /* Maximum line length permitted */
        size_t length; /* Length of returned lines */
        long flags; /* Might use bit fields here */
        struct private_data * pvt;
    };

That last puppy holds things like buffer pointers and lengths
that the implementation needs as state information that persists
from call to call. I opine that it should be set NULL when the
control block is initially populated.

As a further point it might be best to also return the line
pointer and not have it otherwise visible to the user, i.e.,
usage would look like:

while(line = get_a_line(fp, &cb))
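
To see how the pieces might fit together, here is a sketch of get_a_line with the buffer hidden behind pvt. Several things are assumptions rather than anything settled in the thread: the GL_* flag values, treating maxlen == 0 as "no limit", stripping the '\n', requiring a non-NULL control block, and the release_control_block helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* In a real header only the struct tag would be visible to callers. */
struct private_data { char *buf; size_t size; };

struct control_block {
    size_t maxlen;               /* maximum line length, 0 = no limit (assumed) */
    size_t length;               /* length of the returned line */
    long   flags;                /* status bits, see below */
    struct private_data *pvt;    /* set to NULL before the first call */
};

/* Hypothetical flag values; the thread does not define them. */
enum { GL_EOF = 1, GL_NO_EOL = 2, GL_IO_ERROR = 4,
       GL_NO_MEMORY = 8, GL_TOO_LONG = 16 };

char *get_a_line(FILE *fp, struct control_block *cb)
{
    struct private_data *p;
    size_t len = 0;
    int c;

    assert(fp != NULL && cb != NULL);
    cb->length = 0;
    cb->flags = 0;
    if (cb->pvt == NULL) {                   /* first call: create hidden state */
        p = malloc(sizeof *p);
        if (p != NULL && (p->buf = malloc(64)) == NULL) {
            free(p);
            p = NULL;
        }
        if (p == NULL) {
            cb->flags = GL_NO_MEMORY;
            return NULL;
        }
        p->size = 64;
        cb->pvt = p;
    }
    p = cb->pvt;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (cb->maxlen != 0 && len == cb->maxlen) {
            ungetc(c, fp);                   /* rest of the line stays unread */
            cb->flags = GL_TOO_LONG;
            return NULL;
        }
        if (len + 1 == p->size) {            /* keep one byte for '\0' */
            char *tmp = realloc(p->buf, p->size * 2);
            if (tmp == NULL) {
                cb->flags = GL_NO_MEMORY;
                return NULL;
            }
            p->buf = tmp;
            p->size *= 2;
        }
        p->buf[len++] = (char)c;
    }
    if (c == EOF) {
        if (ferror(fp)) { cb->flags = GL_IO_ERROR; return NULL; }
        if (len == 0)   { cb->flags = GL_EOF;      return NULL; }
        cb->flags = GL_NO_EOL;               /* last line lacked '\n' */
    }
    p->buf[len] = '\0';
    cb->length = len;
    return p->buf;
}

/* Frees the hidden state so the control block can be reused or dropped. */
void release_control_block(struct control_block *cb)
{
    if (cb != NULL && cb->pvt != NULL) {
        free(cb->pvt->buf);
        free(cb->pvt);
        cb->pvt = NULL;
    }
}
```

Since each control block owns its buffer, two callers with separate blocks do not disturb each other's lines, and the simple `while (line = get_a_line(fp, &cb))` loop still works, with cb.flags available afterwards to say why it stopped.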

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die
