Designing fgetline - a perspective [General Programming]


From: Carl on 10 Nov 2007 16:27

On 9 Nov 2007 at 23:30, Eric Sosman wrote:
> Still, I can't shake the feeling that you're trying
> to get one function to do too many tasks. Instead of
> using a bazillion flags to govern different modes of
> operation, might you consider a suite of related functions
> to handle the important variations? In effect, you'd be
> moving a few flags out of the struct and into the function
> name.

I think it's better to have one function. Consider that typically an ALU
will be a single circuit and the operation it performs (e.g. add,
multiply, etc.) is determined by control flags. So keeping down the
number of functions is following in good footsteps.

From: Eric Sosman on 10 Nov 2007 16:56

Carl wrote:
> On 9 Nov 2007 at 23:30, Eric Sosman wrote:
>> Still, I can't shake the feeling that you're trying
>> to get one function to do too many tasks. Instead of
>> using a bazillion flags to govern different modes of
>> operation, might you consider a suite of related functions
>> to handle the important variations? In effect, you'd be
>> moving a few flags out of the struct and into the function
>> name.
>
> I think it's better to have one function. Consider that typically an ALU
> will be a single circuit and the operation it performs (e.g. add,
> multiply, etc.) is determined by control flags. So keeping down the
> number of functions is following in good footsteps.

Something about the argument strikes me as unconvincing.
Do you really advocate eliminating sqrt() and cos() and
exp() and log() and all the rest, and replacing them with

double math(struct mathctl*, ...);

Or take my earlier tongue-in-cheek suggestion about
combining all the *printf() functions into one, using a
control block to manage the variations. If you think the
"Swiss Army knife" approach is good, let's see (1) your
design for the combined interface, and (2) helloworld.c
written as your interface would require.
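To make Eric's challenge concrete, here is a minimal sketch of what a combined *printf() driven by a control block might look like. The names `struct printctl`, `print_any`, `PD_FILE` and `PD_STRING` are hypothetical. Only two of the seven variants are covered (stream output and bounded string output, dispatched to the standard vfprintf() and vsnprintf()).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical destinations for an all-in-one printf. */
enum print_dest { PD_FILE, PD_STRING };

/* Hypothetical control block selecting the operating mode. */
struct printctl {
    enum print_dest dest;
    FILE *fp;        /* used when dest == PD_FILE */
    char *buf;       /* used when dest == PD_STRING */
    size_t bufsize;  /* capacity of buf, including the '\0' */
};

/* Dispatches to the standard v*printf() function the mode calls for and
   returns its result (characters written, or that would have been written
   for PD_STRING; negative on error). */
int print_any(const struct printctl *ctl, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    if (ctl->dest == PD_FILE)
        n = vfprintf(ctl->fp, fmt, ap);
    else
        n = vsnprintf(ctl->buf, ctl->bufsize, fmt, ap);
    va_end(ap);
    return n;
}
```

Under this sketch, helloworld's single printf() call turns into a declaration such as `struct printctl ctl = { PD_FILE, stdout, NULL, 0 };` followed by `print_any(&ctl, "Hello, world!\n");`, which is the clumsiness Eric is pointing at.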

--
Eric Sosman
esosman(a)ieee-dot-org.invalid

From: Malcolm McLean on 10 Nov 2007 17:43

"Eric Sosman" <Eric.Sosman(a)sun.com> wrote in message
> Isn't there some way you can find an excuse to add
> a couple more arguments? Six is too many for most people
> to keep straight, but you may as well try to confuse the
> geniuses, too. ;-)
>
The rule of four. Four arguments is as many as can be scanned. Seven as many
as can be read.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

From: Richard Harter on 11 Nov 2007 13:50

On Fri, 09 Nov 2007 18:30:39 -0500, Eric Sosman
<Eric.Sosman(a)sun.com> wrote:

>Richard Harter wrote On 11/09/07 17:00,:
>> On Fri, 09 Nov 2007 11:36:18 -0500, Eric Sosman
[snip]

>>> Isn't there some way you can find an excuse to add
>>>a couple more arguments? Six is too many for most people
>>>to keep straight, but you may as well try to confuse the
>>>geniuses, too. ;-)
>>
>>
>> I take it you didn't check out the list of flags. I'm
>> after the geniuses too. :-)
>
> Well, I noticed you needed a `long' to hold them,
>not a mere `int' ... Close to two dozen flags, aren't
>there?

22 at the last count. I expect I could trim a few if I had to.

>
>>> Kidding aside, an argument list of that length suggests
>>>to me that the function may be trying to be too many things
>>>to too many people at the same time. It might be wise to
>>>give up some functionality to improve ease of use; the net
>>>change in usefulness could in fact be positive.
>>
>> Point well taken, though there really is nothing that I would
>> want to give up. I wouldn't want to give up error information,
>> nor a bound on maximum line size, nor a reusable buffer (for
>> which both address and size are needed), and not even the line
>> length. One thing that could be done is to package some of the
>> arguments in a struct. Frex:
>>
>> struct getfline_args {
>>     size_t bufsize;
>>     size_t length;
>>     size_t maxlen;
>>     long   flags;
>> };
>>
>> Then we can have a prototype that looks like this:
>>
>> int * getfline(FILE *fptr,
>> char **line_ptr,
>> struct getfline_args *args);
>
> I'm not sure why the return value changed from an int
>to a pointer. Typo?

Yep. Read the disclaimer. :-)

>
>> If we adopt the convention that a maxlen of zero means no upper
>> bound check (or, alternately, require another input flag to
>> activate bounds checking), then instead of a single long calling
>> sequence we can do:
>>
>> struct getfline_args gfl_args = {0,0,0,0};
>> char *line = 0;
>> FILE *fptr = 0;
>> ...
>> while (getfline(fptr,&line,&gfl_args)) {
>>     /* Do stuff */
>> }
>>
>> This doesn't really change anything but it makes it easier to use
>> the defaults and it moves some of the definitions into the
>> include file. What do you think of this?
>
> It reawakens memories of the days when interfaces
>used "control blocks," often populated by assembly macros.
>The C library's struct tm is such a beast, FILE can be
>thought of in that light, and POSIX has no shortage of
>structured arguments. If you take this route, you'll at
>least be on a well-marked trail.

>
> If you like the control block approach, though, why
>not go whole hog and put the line_ptr in the struct, too?
>You *could* even park the fptr there, but it could ease
>things a little if a struct initialized to all zeroes
>meant something sensible.

If we go with "all zeroes means something sensible," then you don't
want line_ptr in there. If you do put it there, the user code would
need something like

line = *(funky.line_ptr);

You could put line in there and let the user refer to funky.line.
That's a workable alternative but I didn't think it would be
popular. Perhaps the answer is that there are no alternatives
that would be popular.
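A minimal sketch of the control-block interface under discussion, assuming the `int` return Harter intended. Several details are assumptions not found in the thread: the return convention (1 when a line is read, 0 at end of input, a negative hypothetical `GFL_ERR_*` code on failure), a 64-byte starting buffer grown by doubling, the trailing newline being stripped, and `flags` being ignored. A NULL `*line_ptr` with `bufsize` 0 lets the function allocate the buffer itself, so the all-zeroes initializer is usable.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure codes; the thread does not specify them. */
enum { GFL_ERR_IO = -1, GFL_ERR_NOMEM = -2, GFL_ERR_TOOLONG = -3 };

struct getfline_args {
    size_t bufsize;  /* capacity of *line_ptr; 0 if nothing allocated yet */
    size_t length;   /* length of the last line read, without '\n' */
    size_t maxlen;   /* 0 means no upper bound on line length */
    long   flags;    /* ignored in this sketch */
};

/* Grows *buf to hold at least need bytes. Returns 0 on success. */
static int gfl_reserve(char **buf, size_t *size, size_t need)
{
    size_t newsize;
    char *p;

    if (need <= *size)
        return 0;
    newsize = *size ? *size : 64;
    while (newsize < need)
        newsize *= 2;
    p = realloc(*buf, newsize);   /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    *buf = p;
    *size = newsize;
    return 0;
}

/* Reads one line into *line_ptr, reusing the buffer between calls.
   Returns 1 if a line was read, 0 at end of input, or a GFL_ERR_* code.
   After GFL_ERR_TOOLONG the rest of the long line is left unread. */
int getfline(FILE *fptr, char **line_ptr, struct getfline_args *args)
{
    size_t len = 0;
    int c;

    for (;;) {
        c = getc(fptr);
        if (c == EOF || c == '\n')
            break;
        if (args->maxlen != 0 && len >= args->maxlen)
            return GFL_ERR_TOOLONG;
        /* Room for this character plus the terminating '\0'. */
        if (gfl_reserve(line_ptr, &args->bufsize, len + 2) != 0)
            return GFL_ERR_NOMEM;
        (*line_ptr)[len++] = (char)c;
    }
    if (c == EOF) {
        if (ferror(fptr))
            return GFL_ERR_IO;
        if (len == 0)
            return 0;            /* clean end of input */
    }
    /* An empty first line may not have allocated anything yet. */
    if (gfl_reserve(line_ptr, &args->bufsize, len + 1) != 0)
        return GFL_ERR_NOMEM;
    (*line_ptr)[len] = '\0';
    args->length = len;
    return 1;
}
```

Because failures are negative under this convention, a caller's loop would test `getfline(fptr, &line, &gfl_args) > 0` rather than plain truthiness, and call `free(line)` once finished.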

>With a purpose-built struct
>type handy by, that overwhelming mass of flags might
>become a bunch of bit-fields. Bit-fields are, IMHO, a
>mixed blessing, but in a case like this they'd avoid the
>need to pollute the name space with all those GFL_xxx
>macros.

That's probably a good idea. I used the GFL_XXX hack to avoid
infringing on other namespaces. Still, "probably not used" is
not the same thing as "not used".
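As a rough illustration of the bit-field suggestion (not Harter's actual 22-flag list, which is not shown here), two hypothetical options, `keep_newline` and `strip_cr`, become one-bit struct members. A struct initialized with `{0}` then means "everything off", and no GFL_xxx macros enter the global name space. `gfl_apply_opts` is likewise a hypothetical helper showing how the options would be read.

```c
#include <stddef.h>

/* Hypothetical options, one bit each. A struct initialized with {0}
   means "all options off", and no GFL_xxx macros are needed. */
struct getfline_opts {
    unsigned keep_newline : 1;  /* leave the trailing '\n' in the line */
    unsigned strip_cr     : 1;  /* when removing '\n', also drop a preceding '\r' */
};

/* Hypothetical helper: applies the options in place to a line of length
   len that may end with '\n', and returns the new length. */
size_t gfl_apply_opts(char *line, size_t len, const struct getfline_opts *opts)
{
    if (!opts->keep_newline && len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        if (opts->strip_cr && len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';
    }
    return len;
}
```

Part of the "mixed blessing" is that bit-field layout is implementation-defined, which matters only if such a struct is written to a file or shared across compilers.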

>
> Still, I can't shake the feeling that you're trying
>to get one function to do too many tasks. Instead of
>using a bazillion flags to govern different modes of
>operation, might you consider a suite of related functions
>to handle the important variations? In effect, you'd be
>moving a few flags out of the struct and into the function
>name. Observe the exec*() family of POSIX interfaces, for
>example: they all do "the same thing" and they probably
>all devolve on the same internal implementation, but using
>different names for different (although related) operations
>is a useful simplification. Try to imagine what it would
>be like if all of printf(), fprintf(), sprintf(), vprintf(),
>vfprintf(), vsprintf(), and vsnprintf() were packaged
>into one function name with a control block to choose
>different operation modes ... Ugh!

It could scarcely be less ugly than the *print* collection.
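The exec*()-style alternative Eric describes can be sketched as a small family of public functions that all delegate to one private engine, so each mode is chosen by the function name rather than by a flag. The names `read_line`, `read_line_nl`, `read_line_max` and `line_engine` are hypothetical. Each call returns a freshly malloc'd string (the per-line allocation that Harter objects to), or NULL at end of input, on error, on allocation failure, or when the line exceeds maxlen.

```c
#include <stdio.h>
#include <stdlib.h>

/* Private engine shared by the public family. maxlen == 0 means no bound
   (it counts stored characters, including a kept '\n'); keep_nl says
   whether the trailing '\n' stays in the result. */
static char *line_engine(FILE *fp, size_t maxlen, int keep_nl)
{
    size_t len = 0, size = 64;
    char *buf = malloc(size);
    int c = EOF;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n' && !keep_nl)
            break;
        if (maxlen != 0 && len >= maxlen) {
            free(buf);
            return NULL;
        }
        if (len + 1 >= size) {          /* keep room for '\0' */
            char *p = realloc(buf, size * 2);
            if (p == NULL) {
                free(buf);
                return NULL;
            }
            buf = p;
            size *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (c == EOF && (len == 0 || ferror(fp))) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Public family: the mode is in the name, not in a flag word. */
char *read_line(FILE *fp)                    { return line_engine(fp, 0, 0); }
char *read_line_nl(FILE *fp)                 { return line_engine(fp, 0, 1); }
char *read_line_max(FILE *fp, size_t maxlen) { return line_engine(fp, maxlen, 0); }
```

Callers free() each returned line when done with it.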

>
> As you may gather, I am a big fan of small, simple
>interfaces. Some people feel my adoration of simplicity
>is unhealthily intense; my own line-getter has been
>criticized for violating the "... but not simpler" part
>of Einstein's dictum. (The critic was someone I think
>highly of, so I gave his criticism careful thought before
>deciding to ignore it.) I mention all this to make my
>biases clear: the KISS principle is very important to me,
>but may not hold quite so much sway over others.

Here we may have to disagree a bit. The trouble with a KISS
approach in this case is that error and efficiency go by the
wayside. The simple way is

char * getalife(FILE *);

Now this means that you are automatically committed to separately
allocating space for each line. This is a tradeoff; what is
being traded is calling sequence simplicity for execution time.
A KISS advocate is saying in effect that the execution time is
not worth worrying about. I take the view that one should have a
choice. There is a second small issue with the KISS dictum - one
often ends up having to do extra work anyway - in this case
freeing the allocated space for each line.

The compact prototype also throws away the knowledge of the
length of the line. If users want that information they have to
call strlen, i.e., redo the work that had already been done.

The compact prototype leaves no room for a bound on the line
size. I believe an unnamed party has made that very point. I
happen to agree. I tend to take the view that malloc failures
mean that there is something fundamentally wrong with a program.

Another thing that is wrong with the KISS prototype is that there
is no place for an error return. I/O errors can be detected by
the user, but that again is extra code on the user side. However
there is no good way to pass back the knowledge that malloc has
failed.

Falconer's version isn't much better. IIRC it looks like:

int nolifehere(FILE *, char **);

where I assume that the return is 0 if a line was read properly
and a code value if it fails in some way. This adds error
returns for a modest price in complexity. It has the same built-in,
automatic inefficiencies. It also steps on one of C's little
awkwardnesses, namely the convention that 0 is false and non-zero
is true, so a successful call tests as false. IMNSHO this is an
ancient mistake; generally speaking there is only one way to
succeed and many ways to fail. It would have been better to do it
the other way around. However, it is a convention cast in
three-billion-year-old granite, so there is nothing to be done
about it.
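The 0-means-success awkwardness shows up in a sketch of a Falconer-style `int f(FILE *, char **)` reader. This is not Falconer's actual code: the status names and `read_line_status` are hypothetical, and each line is returned in a newly malloc'd string that the caller frees.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes: one way to succeed, several ways to fail. */
enum { LINE_OK = 0, LINE_EOF, LINE_IOERR, LINE_NOMEM };

/* Reads one line (without '\n') into a newly malloc'd string stored in
   *out. Returns LINE_OK on success; otherwise a nonzero code, and *out
   is left NULL. */
int read_line_status(FILE *fp, char **out)
{
    size_t len = 0, size = 64;
    char *buf = malloc(size);
    int c = 0;

    *out = NULL;
    if (buf == NULL)
        return LINE_NOMEM;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= size) {          /* keep room for '\0' */
            char *p = realloc(buf, size * 2);
            if (p == NULL) {
                free(buf);
                return LINE_NOMEM;
            }
            buf = p;
            size *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(fp))) {
        free(buf);
        return ferror(fp) ? LINE_IOERR : LINE_EOF;
    }
    buf[len] = '\0';
    *out = buf;
    return LINE_OK;
}
```

The caller has to write `while (read_line_status(fp, &line) == LINE_OK)`; the more natural-looking `while (read_line_status(fp, &line))` would stop on success and loop on failure.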

The real problem with the KISS dictum in this case is that there
is no way to implement it and still reuse buffer space. The
problem is the itsagoodlife function needs state that is
persistent from call to call, and that the space for that state
must be supplied by the user. What it comes down to is that you
can have a getalife function with a simple interface but not an
itsagoodlife function with a simple interface.

Of course one could have a getalife wrapper around an itsagoodlife
implementation; I have no problem with that.

I expect I will do yet another spec and post it. Ugh. Still
more rewriting.

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die

From: Eric Sosman on 11 Nov 2007 16:32

Richard Harter wrote:
> On Fri, 09 Nov 2007 18:30:39 -0500, Eric Sosman
> <Eric.Sosman(a)sun.com> wrote:
>>
>> If you like the control block approach, though, why
>> not go whole hog and put the line_ptr in the struct, too?
>> You *could* even park the fptr there, but it could ease
>> things a little if a struct initialized to all zeroes
>> meant something sensible.
>
> If we do the "all zeroes means something sensible" then you don't
> want line_ptr in there. If you do the user code would need
> something like
>
> line = *(funky.line_ptr);
>
> You could put line in there and let the user refer to funky.line.
> That's a workable alternative but I didn't think it would be
> popular. Perhaps the answer is that there are no alternatives
> that would be popular.

I was thinking along the lines of "If the line_ptr element
is NULL, fgetline() allocates and thereafter manages a suitable
buffer." Cf. setvbuf().

>> As you may gather, I am a big fan of small, simple
>> interfaces. [...] I mention all this to make my
>> biases clear: the KISS principle is very important to me,
>> but may not hold quite so much sway over others.
>
> Here we may have to disagree a bit. The trouble with a KISS
> approach in this case is that error and efficiency go by the
> wayside. The simple way is
>
> char * getalife(FILE *);

Oddly enough, that's the signature of my line-getter.

> Now this means that you are automatically committed to separately
> allocating space for each line.

Well, no. Mine reuses (and perhaps expands) the buffer at
each call. You only get to keep one line at a time "internal"
to the line-getter, which doesn't seem to be a problem: my usual
pattern is not to save the lines verbatim, but to parse them and
extract data. If you *do* want to save the lines verbatim, you
can't reuse the same buffer in any case (although in that case
mine would incur a string copy that yours might be able to avoid).

> The compact prototype also throws away the knowledge of the
> length of the line. If users want that information they have to
> call strlen, i.e., redo the work that had already been done.

True. (Shrug.) I have seldom found the line length an
interesting datum.

> The compact prototype leaves no room for a bound on the line
> size.

True. (Shrug, perhaps with a touch less assuredness.) If
you know an upper limit on line length, fgets() is available.
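The fgets() route for bounded lines can be sketched as follows. The helper name `read_bounded` and its return convention (1 for a complete line, 0 at end of input or on a read error, -1 for an over-long line whose remainder is discarded) are assumptions for illustration. The extra getc() handles the case where a line exactly fills the buffer and only its newline is left unread.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (bufsize must be at least 2 and fit in an int),
   stripping the '\n'. Returns 1 for a complete line, 0 at end of input or
   on a read error, and -1 if the line did not fit; in that case the rest
   of the line is consumed and buf holds its first bufsize - 1 characters. */
int read_bounded(FILE *fp, char *buf, size_t bufsize)
{
    size_t len;
    int c;

    if (fgets(buf, (int)bufsize, fp) == NULL)
        return 0;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No '\n' stored: either the input ended, or the buffer filled up. */
    c = getc(fp);
    if (c == '\n')
        return 1;                  /* the line fit exactly */
    if (c == EOF)
        return ferror(fp) ? 0 : 1; /* last line, without a newline */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;                          /* discard the rest of the long line */
    return -1;
}
```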

> Another thing that is wrong with the KISS prototype is that there
> is no place for an error return. I/O errors can be detected by
> the user, but that again is extra code on the user side. However
> there is no good way to pass back the knowledge that malloc has
> failed.

"No place for an error return?" Mine returns NULL for end-of-
input, I/O error, or malloc() failure, which can be disambiguated
(if desired) by calling feof() and ferror(). Usually the program
only cares about normal-vs.-other, and the code looks like

while ((line = getline(stream)) != NULL) {
...
}
if (! feof(stream)) {
die_horribly();
}

Note that *some* kind of test is necessary in any event,
since the underlying getc() lumps end-of-input and error into
a single EOF return value.

> The real problem with the KISS dictum in this case is that there
> is no way to implement it and still reuse buffer space.

Not true of mine.
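Eric's implementation is not shown in the thread, so the following is only a minimal sketch of how a `char *f(FILE *)` line-getter can still reuse its buffer: the buffer lives in static storage and is grown with realloc() as needed. The name `get_line` is hypothetical; POSIX.1-2008 later standardized a different `getline()`, with the signature `ssize_t getline(char **, size_t *, FILE *)`. The price of the one-argument signature is that the returned line is valid only until the next call, and the function is not thread-safe.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the next line (without '\n') in a buffer owned by this function
   and reused by every call; the pointer is valid only until the next call.
   Returns NULL at end of input, on an I/O error, or when memory runs out;
   in the last case neither feof() nor ferror() is normally set.
   Not thread-safe. */
char *get_line(FILE *stream)
{
    static char *buf = NULL;
    static size_t size = 0;
    size_t len = 0;
    int c;

    if (buf == NULL) {
        buf = malloc(128);
        if (buf == NULL)
            return NULL;
        size = 128;
    }
    while ((c = getc(stream)) != EOF && c != '\n') {
        if (len + 1 >= size) {          /* keep room for '\0' */
            char *p = realloc(buf, size * 2);
            if (p == NULL)
                return NULL;
            buf = p;
            size *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(stream)))
        return NULL;
    buf[len] = '\0';
    return buf;
}
```

A NULL return is disambiguated as in Eric's loop: feof() for end of input, ferror() for an I/O error, and neither for an allocation failure.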

I acknowledge that my "simplicity trumps all" approach is not
The Answer for all situations, nor (obviously) for all programmers.
My misgiving about the design you are working on is that it seems
to be trying to be The Answer, and I doubt that's a reasonable goal.

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
