From: Richard Harter on
On Thu, 11 Oct 2007 23:05:29 GMT, cri(a)tiac.net (Richard Harter)
wrote:

I have taken the original posting and have commented on it at
length. Some people might enjoy the critique. In a separate
post I will give a (hopefully) complete specification. This
discussion and the spec will also go into web pages.

I appreciate that such an extended discussion drifts into the
area of goldplated bullshit; so be it.

>What we want is a routine that keeps reading and returning lines
>from a file. A simple prototype for this function is
>
> char * fgetline(FILE *);

The names, getline and fgetline, are already in common use. In
the end I decided to use getfline instead.
>
>There are some gotchas with this prototype - well, not really
>gotchas, rather little bits of awkwardness and inefficiency. One
>
>is that we are throwing away information that (usually) has to be
>
>recomputed, namely the length of the line and, as we shall see,
>status information. If we want to add the information to our
>prototype there are several that to do it - it can be
>passed back through the calling sequence or it can be returned by
>
>the function. (We could also pass it to a global - ugh, don't do
>
>it.) My vote is to have it all returned by the function which
>means we create a struct (record) that looks like:
>
> struct fgetline_info {
> char *line;
> size_t len;
> int status;
> };
>
>There are various choices that could be made in the types -
>season to taste. (A pointer to the struct could be passed in as
>an argument if need be.)
>
>
>A second issue is that there is no limit to how large the line
>can be. A plausible way to provide a limit that is to pass one
>in as an argument. If we do, our prototype looks like this:
>
> struct fgetline_info fgetline(FILE * fptr, size_t maxsize);
>
Here I went wrong. Having a struct (record) to hold information
is plausible, but it may be overkill. Be that as it may, in a C
implementation this prototype doesn't work too well. First of all
the user has to go through the struct to access the line and the
status field. More importantly the normal usage in C would be a
while loop, e.g.,

struct fgetline_info fg;
...
while (fg = fgetline(FILE *fptr, size_t maxsize)) {...}

This doesn't work - the idiom requires a test on something that
can be "zero". If we convert the prototype and fg to a pointer
the code still doesn'twork. Now the problem is that when the read
is successful we don't need the status field; when the read fails
we don't have the status field.

>This version has a little problem - where did the storage for the
>
>line come from? One way to deal with that is for fgetline to
>allocate storage for the line and for the calling routine to free
>
>it when it is done with the line.
>
>Okay, so how do we do this in fgetline? Well, there is a very
>simple way to do it, one that has been reinvented time and time
>again. We start out allocating a standard amount of space
>(numbers like 32, 64, and 80 bytes are typical) for a line.
>We read from the file until we either hit an EOL (end of line
>marker), a failure to read, or we have read as many characters as
>
>were allocated. If we haven't hit the EOL we reallocate the
>array, commonly by doubling the size, and doing another read.
>
>This works, but it is inefficient - we have to call malloc and
>free for every line.

Whether the cost of calls to malloc and free matters obviously
depends on the application. However (a) the costs are commonly
underestimated, and (b) a general policy of ignoring apparently
minor costs can cumulate and produce unexpected inefficiencies.



>One way to get around this is to use a
>technique I call highwater buffering. In the file containing the
>
>code for fgetline we have two file-scope variables:
>
> static char * buffer = 0;
> static size_t size = 0;
>
>In fgetline we have some initialization code that allocates
>buffer space when the buffer size is zero. Thereafter we use our
>
>little doubling trick. The advantage of this scheme is that we
>have at most a small number of calls to malloc (the number of
>doublings needed to get to the largest line) and no calls to
>free.
>
>The very real disadvantage of this scheme is that it produces a
>dirty copy of the line. By a dirty copy, I mean one that can be
>scribbled on elsewhere before we are done with it. This will
>happen whenever fgetline is called anywhere else with any FILE
>argument before our next read. This is double plus ungood.

As a note, producing dirty copies "works" provided that there is
only one file being read at a time. Don't count on it.
>
>Is there a way to get the efficiency of the highwater scheme
>without being dirty? Yes; the trick is that the calling program
>provides the initial buffer. If we add the buffer information to
>
>the prototype we get:
>
> fgetline_info fgetline(FILE * fptr,
> char * buffer,
> size_t size,
> size_t maxsize);
>
This is a worst of all worlds prototype - it combines an awkward
return type with a cumbersome calling sequence.

>There are three kinds of copies of the line that we might get
>from fgetline - clean, transient, and dirty. A clean copy is one
>
>that has no encumbrances - it lasts until it is specifically
>deleted. A transient copy is one that lasts until the next
>call to fgetline to get another line from the current file.
>(There may be another file being read elsewhere.)

This distinction between kinds of copies that we might get is a
critical one; without it either a default is chosen without
weighing its merits or, worse yet, concepts are mixed and subtle
bugs can slip in.
>
>What kind of copy should fgetline provide, clean or transient?
>There are arguments for each choice. Clean copies are more
>expensive but safer. Transient copies are (usually) cheaper. In
>
>questions like this there is much to be said for the design
>principle that:
>
> When there is a significant choice as to the kind of
> output being delivered the library routine should let
> the user make the choice rather than dictating the
> choice unless offering the choice unduly complicates
> usage.

This is a good maxim but it is not the whole story. If one choice
commits the user and the other doesn't, provide the noncommital
result. In this case providing clean copies is the commital
choice; if the routine provides a clean copy the user cannot opt
to get a transient copy. The choice commits the user to a malloc
cost and an eventual free cost. If the routine returns a
transient copy the user can duplicate it to get a clean copy.

>
>So how do we provide a choice? That is simple; if the buffer
>pointer in the calling sequence is NULL, fgetline must return a
>clean copy; otherwise fgetline MAY return a transient copy. I
>say "may" because it might have to increase the buffer size.
>If it does the calling routine will have to check whether there
>was a size increase and, if so, will have to free the returned
>buffer. (The fgetline implementation can't realloc the passed
>in buffer; it will have to do a malloc for the first resizing.)

This is an example of a serious error in design. Let us
accept for the moment that we want to offer a choice; naturally
we ask how to do it. What I did was to envisage an implementation
and treat it as the only alternative. Instead of thinking in
terms of a potential implementation I should have thought in
terms of API alternatives. An obvious alternative is to set a
flag instead.

Beating this piece of ground where a horse once died some
more, using different data values to select modes is a suspect
practice, one that one should be wary about using. In this case
the value of the buffer pointer (null or not) was used as a flag.

A more serious issue is that there are alternatives for
supplying the space. They include:

1. The user declares an array on the stack.
2. The user declares a structure that contains an array.
3. The user allocates space using malloc.
4. The read routine allocates the space.

In alternatives 1 and 2 there need not be calls to malloc at all.
However care must be taken to avoid "freeing" stack storage.In
alternatives 3 and 4 there is no need for a separate buffer.


>
>It's probably best to have a value in the status field that
>signifies the size has been increased; call that value
>fg_increased and a normal read fg_normal. Then the usage for
>transient copies might look like this (first cut):
>
> ...
> struct fgetline_info fg;
> char buffer[80];
> int done = 0;
> ....
> for(;!done;) {
> ...
> fg = fgetline(fptr,buffer,80,FG_MAXSIZE);
> if (fg.status == fg_normal || fg.status == fg_increased) {
> /* do stuff */
> if (fg.status == fg_increased) free(fg.line);
> } else done = 1;
> }
>
>If we want clean copies the corresponding loop body would be
>
> fg = fgetline(fptr,0,0,FG_MAXSIZE);
> if (fg.status == fg_normal) {
> /* do stuff */
> free(fg.line);
> } else done = 1;
>

This schematic code illustrates why it was a bad idea to return a
struct and to put the line pointer in the struct. Also it should
be be easy enough for the implementation to free that pesky
increased line; having the user do it is an unnecessary
complication.

>What sort of return values should be available in the status
>field? These are the successful ones that occur to me:
>
> end_of file The last read (if any) found an EOL
> marker. The current read finds an
> immediate EOF.
>
> no_increase Normal read - either buffer was null or
> else it was not increased.
>
> increase Normal read - the buffer size has been
> increased.
>
> abn_no_increase Abnormal read - an EOF was found without
> an EOL. Buffer was not increased.
>
> abn_increase Abnormal read - an EOF was found without
> an EOL. Buffer was increased.

The increase/no_increase distinction is unnecessary. That
leaves three cases, normal EOF, normal line read, and a premature
EOF in the final line.

One issue not covered was what to do about overly long lines.
There are several possible actions, e.g.,

* We could treat it as an error condition.
* We could treat it as an end of data, reserving the option to
pick up later where we left off.
* We could cut the long line into pieces maxlen characters
long.
* We could chop out all characters in the long line after the
first maxlen characters.
* We could skip the long line and proceed to the next line.

Clearly the routine should not decide on its own what action
to take; the user must specify what action to take by passing a
flag. If the user does not specify an action the default should
be to treat a long line as an error condition.

>
>In addition there are numerous kinds of errors that can occur.
>Calling sequence arguments include:
>
> no_file The file pointer is null.
> bad_buffer One and only one of buffere and size s
> zero.
> bad_size Size is 0 or is greater than maxsize
> bad_maxsize Maxsize is 0
>
>Then there are the memory allocation failures:
>
> bad_allocate Malloc or realloc failure
> big_line Line length is greater than maxsize.
>
>When there is a memory allocation failure the line and len fields
>hold what has been successfuly read.

This way of proceeding - listing error tags and short
explanations - has its faults; it comes from focusing too heavily
on potential code. For example, the calling sequence argument
errors only apply to a particular choice of calling sequence. A
better approach might be to try to analyze the kinds of errors
that can occur. Briefly, these include:

* A format fault in the file: This can either be a premature
EOF or an overly large line. This is not necessarily an error
since the user can opt to accept the faulty lines.
* Calling sequence error: The calling sequence is faulty in
some way. The key here is that the constraints on the calling
sequence should be spelled out at some point.
* Memory allocation error: Malloc/realloc fails. Usually, but
not necessarily, this is a fatal error.
* I/O error: Something goes wrong while trying to read the
file.

What action should be taken is quite another matter. Users
may or may not have an error management strategy. If there is
one, the routine shouldn't pre-empt it. At a minimum the routine
should return an error code. The objection to returning an error
code is that the conscientious user is then stuck with adding
error handling code. A useful technique is to pass to the routine
flags telling what to do about errors, e.g., write to a log file,
call exit, etc.

>
>There are various conventions one could use for assigning code
>values for the possible return values. My view is that there
>should be a bit that is set only if there was an error, a bit
>that is set only if there was a memory allocation error, a bit
>that is set only if there was a buffer increase, and a bit that
>is set only if an EOL occurred.
>
>The point of doing this is that we can have simple tests to check
>
>whether an error occurred, whether some space has to be freed,
>and whether the file read is complete.

This is a useful technique. However to avoid magic numbers
disease the positions of the various flag bits should be defined
separately in the include file that spells out the file read
package prototypes.
>
>As a final note, there is one minor decision left unsettled,
>namely should the returned line include an EOL marker before the
>string terminating 0. My take is that this matter is too trivial
>
>to warrant adding a flag to the calling sequence, and that it
>will be slightly less confusing if there is one present even if
>it has to be manufactured. Perhaps someone has a convincing
>argument one way or the other.

The issue here is whether the delivered line contains \n (the EOL
character) or not. In other do lines look like abc\n\0 or abc\0?
Since we are already committed to using flags we may as well have
a flag to select whether to include the EOF character; the
appropriate default is to not include it.

At this point we have arrived at the end of the original
discussion. It went off track, but it was thought provoking.

Finally, what should the interface look like? I now believe that
the prototype should look like this:

int getfline(FILE * fptr,
char ** line_ptr,
long * status,
size_t * length_ptr,
size_t maxlen);

Normal usage would be to declare line, status, and len as char *,
long, and size_t respectively. Getfline returns 1 if the read was
normal and 0 if it was not. It would normally be used in a while
loop, e.g.,

while (getfline(fptr,&line,&status,&len,maxlen) {
/* Code to process line */
}
/* Error testing (if any) goes here
Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die
From: Charlie Gordon on
"Richard Harter" <cri(a)tiac.net> a �crit dans le message de news:
4729fd77.329292093(a)news.sbtc.net...
>
<snipped long interesting discussion>
>
> Finally, what should the interface look like? I now believe that
> the prototype should look like this:
>
> int getfline(FILE * fptr,
> char ** line_ptr,
> long * status,
> size_t * length_ptr,
> size_t maxlen);
>
> Normal usage would be to declare line, status, and len as char *,
> long, and size_t respectively. Getfline returns 1 if the read was
> normal and 0 if it was not. It would normally be used in a while
> loop, e.g.,
>
> while (getfline(fptr,&line,&status,&len,maxlen) {
> /* Code to process line */
> }
> /* Error testing (if any) goes here

You did not document what the parameters mean. Let me infer from the
discussion that:

- FILE *fptr is the stream from which to read characters.
- char **line_ptr is the address of a pointer to the buffer for reading the
line. This buffer will be reallocated with realloc or malloc if needed,
under control of the flags. Its initial value can be NULL.
- long *status is an input/output parameter: it points to a long with
input/output flags:
o whether the buffer was allocated.
o whether is should be extended
o whether the newline character should be stored in the buffer.
o whether there was a read error, an EOF without a newline.
o whether the line was truncated
o whether to skip the truncated part or split it into separate lines.
- size_t *length_ptr points to a size_t containing the current size of the
allocated buffer. It will be updated if reallocation occurs.
- size_t maxlen is the maximum size for buffer reallocation.

the int return value is already documented as a boolean value indicating
normal or abnormal conditions.

This API still need work:

- the maxlen and length_ptr names are misleading, they should be maxsize
and bufsize_ptr. They are sizes in bytes, not the length of the line that
can be read into the buffer. This kind of misnomer tends to produce off by
one errors in implementations and calling code.
- the bufsize_ptr should be passed right after the line_ptr as they are
stronly related.
- there is no need to make the flags a long, int should suffice.
- the return value could be more informative: we could make it an ssize_t to
return the length of the line read or EOF upon end of file. It would also
be more intuitive as a replacement for fgets() if it returned a pointer to
the char buffer.

Input/output parameters are a bit confusing, it might be more effective to
declare a structure for all these fields and pass a pointer to it.

struct getfline_args {
char *buffer;
size_t buffer_size;
char *allocated_buffer;
size_t max_alloc_size;
int flags; // or even better: define flags as bitfields
size_t read_length;
};

char *getfline(FILE *fp, struct getfline_args *gp);

/* example of use */

char buffer[BUFSIZ];
struct getfline_args args = { buffer, sizeof(buffer), NULL, SIZE_MAX };

while (getfline(fp, &args)) {
/* do something with args.buffer contents */
}
free(args.allocated_buffer);

--
Chqrlie.


From: Richard Harter on
Below is a "man page" for a file read utility. An HTML version
can be found at http://home.tiac.net/~cri/2007/gflspec.html. It
is not guaranteed to be typo or brain fart clean.

----

SYNOPSIS:

#include "getfline.h"

int getfline(FILE * fptr,
char ** line_ptr,
size_t * length_ptr,
size_t * bufsize_ptr,
size_t maxlen,
long * flags);

Note: getfline.h is guaranteed to include stdio.h.

DESCRIPTION

Function getfline reads lines from a stream. Each time it is
called it fetches the next line until there are none left.
Getfline has fairly elaborate error detection and reporting
facilities. It also permits minimizing storage allocation and
deallocation.

CALLING SEQUENCE ARGUMENTS:

Argument fptr is the stream from which to read characters. It
must have previously been opened with fopen. fptr==NULL is an
argument error; other invalid values for fptr produce an I/O
error.

Argument line_ptr is the address of a pointer to the buffer for
reading the line. In "clean copy" mode the previous value of
line_ptr is overwritten with a pointer to a newly allocated
buffer; the user is responsible for freeing all buffers. In
"transient mode" the value of line_ptr is controlled by getfline;
it should initially be NULL.

Argument length_ptr is a pointer to a size_t variable that holds
the length of the line when there is a successful read.

Argument bufsize_ptr is a pointer to a size_t variable that holds
the allocated size of the buffer holding the line.

Argument maxlen is the maximum line length that getfline will
return.

Finally argument flags is a pointer to a status word. The status
word is divided into two sections, input flags and output flags.
When getfline is called it uses the input flags to determine the
course of action. When it returns the status word will hold
output flags that indicate what happened during the read. It
should be initialized by oring together the desired input flags.

Function return

Getfline returns 1 if a line was returned and 0 if no line was
returned. A return of 0 can either indicate the EOF was reached
or that the read was terminated by an error.

USAGE MODES

Getfline can produce either "clean copies" or "transient copies".
A clean copy is one that has no encumbrances - it lasts until it
is specifically deleted. A transient copy is one that lasts until
the next call to getfline to get another line from the current
file. Transient copy mode will be used if the GFL_FLAG_TRANS flag
is set; clean copy mode is the default.

In clean copy mode getfline allocates a new buffer for each line
that is read. The user is responsible for freeing the storage for
the lines. Clean copy mode is appropriate when we want to keep
part or all of the file in memory. In clean copy mode the values
of items pointed to by line_ptr, length_ptr, and bufsize_ptr are
overwritten; the previous values are ignored.

In transient copy mode getfline reuses the line buffer; the
previous contents, if any, will be overwritten. Getfline handles
buffer storage management; the user does not have to allocate or
free space for lines. Transient copy mode is appropriate when the
contents of a line are immediately processed. Transient copy mode
may be more efficient than clean copy mode because it minimizes
the number of calls to malloc and free.

In transient copy mode the line buffer pointer must initially
either be NULL or point to storage allocated by malloc, and
bufsize must contain the allocated size of the line buffer. The
user should not alter these variables after initializing them.
When a read fails the line buffer storage is freed by getfline
and the line pointer is cleared.

HANDLING ANOMALOUS LINES

There are two kinds of anomalous lines that can occur, a
prematurely terminated last line (i.e., one that lacks an EOL
(\n) marker), and lines that are longer than maxlen. A
prematurely terminated last line will be treated as error unless
the GFL_FLAG_ADDEOL flag is set. Whether or not it is treated as
an error, the GFL_RD_BADEOF flag will be set.

Getfline offers four different ways to handle "long" lines; they
can be treated as errors, they can be omitted, they can be
truncated, or they can be cut into pieces that are maxlen bytes
long. Regardless of the method chosen, if the line is long the
GFL_RD_LONG flag will be set in the status word.

The default is to treat "long" lines as errors. One of the other
three choices can be setting one of the following three flags:
GFL_FLAG_OMIT, GFL_FLAG_TRUNC, or GFL_FLAG_CUT. Only one of these
flags can be set; setting more than one is an arguments error.

There are two flags, GFL_RD_LONG and GFL_RD_BADEOF, that are set
if the current line being read is anomalous. These flags are set
even if the line is acceptable. The GFL_RD_BADEOF flag is set if
the last line was prematurely terminated. The GFL_RD_LONG if the
current line is long. If the GFL_FLAG_CUT is set the
GFL_RD_BADEOF flag will not be set when the final piece of a long
line is read.

RESPONSES TO ERRORS BY GETFLINE

If there is an error getfline will set three flag bits in the
status word. One is the general error flag, GFL_ERR; the second
is a error category flag; and the third is the specific error.
There are two error categories, argument errors, and execution
errors; the corresponding flags are GFL_ARG and GFL_EX. There is
a separate flag for an error (usually the argument being a NULL
pointer) in each of the arguments.

There are three different execution errors, I/O errors, storage
allocation errors, and unacceptable line errors. An I/O error is
reported if there is a read error. A storage allocation error if
malloc or realloc fails. An unacceptable line is either a long
line or a prematurely terminated line for which there is no input
flag.

There are two input flags, GFL_FLAG_LOG and GFL_FLAG_EXIT, that
specify actions that getfline should take if there is an error.
If the GFL_FLAG_LOG flag is set getfline will write an error
message to stderr when there is an error. If the GFL_FLAG_EXIT is
set, getfline will call exit when there is an error.

SAMPLE USAGE

Here are a couple of usage examples. For each we assume the
following includes and declarations:

#include "getfline.h"
....
FILE *fptr;
char *line = 0;
size_t len, bufsize = 0;
long status;

Example one illustrates clean copy mode. In example one we are
reading a file called somefile.txt. Each line is passed to
processing function that takes ownership of the lines and the
responsibility for freeing their storage.

fptr = fopen("somefile.txt","r");
while(getfline(fptr,&line,&bufsize,1024,&status)) {
process_line(line);
}
if (status & GFL_ERR) fprintf(stderr,"Oops\n");
if (fptr) fclose(fptr);

Example two illustrates transient copy mode. In example two we
are reading input from stdin and writing it to stdout. However in
lines longer than 80 characters we insert new line characters
every 80 characters.

status = GFL_FLAG_CUT | GFL_FLAG_LOG;
while(getfline(stdin,&line,&bufsize,80,&status)) {
fprintf(stdout,"%s\n",line);
}

TABLE OF FLAGS

Input flags:
GFL_FLAG_TRANS Provide transient copies. The default is
clean copies.
GFL_FLAG_EOFOK Says it's okay if the last line is
terminated by a premature EOF. The
default is to treat a premature EOF
as an error.
GFL_FLAG_CUT Says to use all of a long line but
break into pieces of length maxlen.
GFL_FLAG_TRUNC Says to use the first maxlen bytes
of a long line and discard the rest.
GFL_FLAG_OMIT Ignore long lines.
GFL_FLAG_EXIT Exit if there is an execution error.
GFL_FLAG_LOG Write an error message to stderr if
there is an execution error.

General error flags:
GFL_ERR This is set if there was any error.
GFL_ARG This is set if there are any
argument errors.
GFL_EX This is set if there are any
execution errors.

Input argument error flags:
GFL_ARG_FILE An invalid file pointer (presumably
a null pointer) was passed.
GFL_ARG_LINE The pointer to the line argument was
a null pointer.
GFL_ARG_LENGTH The pointer to the length_ptr argument
was a null pointer.
GFL_ARG_BUFSIZE The pointer to the bufsize_ptr argument
was a null pointer.
GFL_ARG_STATUS The pointer to the status argument
was a null pointer.
GFL_ARG_FLAGS Multiple long line action flags are set.

Execution error flags:
GFL_EX_IO There was an I/O error in trying to
read the file.
GFL_EX_STORAGE There was an error in the storage
allocation routines.
GFL_EX_READ The file had a format error (long
line or premature EOF) and no flag
was set for handling the error.

Anomalous line flags:
GFL_RD_LONG This is set if the line is long.
GFL_RD_BADEOF This is set if there was a premature
end of file.


Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die
From: Richard Harter on
On Wed, 7 Nov 2007 11:08:30 +0100, "Charlie Gordon"
<news(a)chqrlie.org> wrote:

>"Richard Harter" <cri(a)tiac.net> a �crit dans le message de news:
>4729fd77.329292093(a)news.sbtc.net...
>>
><snipped long interesting discussion>
>>
>> Finally, what should the interface look like? I now believe that
>> the prototype should look like this:
>>
>> int getfline(FILE * fptr,
>> char ** line_ptr,
>> long * status,
>> size_t * length_ptr,
>> size_t maxlen);
>>
>> Normal usage would be to declare line, status, and len as char *,
>> long, and size_t respectively. Getfline returns 1 if the read was
>> normal and 0 if it was not. It would normally be used in a while
>> loop, e.g.,
>>
>> while (getfline(fptr,&line,&status,&len,maxlen) {
>> /* Code to process line */
>> }
>> /* Error testing (if any) goes here
>
>You did not document what the parameters mean. Let me infer from the
>discussion that:

I got tired of writing; I should have persevered.
>
>- FILE *fptr is the stream from which to read characters.
>- char **line_ptr is the address of a pointer to the buffer for reading the
>line. This buffer will be reallocated with realloc or malloc if needed,
>under control of the flags. Its initial value can be NULL.
>- long *status is an input/output parameter: it points to a long with
>input/output flags:
> o whether the buffer was allocated.
> o whether is should be extended
> o whether the newline character should be stored in the buffer.
> o whether there was a read error, an EOF without a newline.
> o whether the line was truncated
> o whether to skip the truncated part or split it into separate lines.
>- size_t *length_ptr points to a size_t containing the current size of the
>allocated buffer. It will be updated if reallocation occurs.
>- size_t maxlen is the maximum size for buffer reallocation.

These are all correct except for length_ptr, which returns the
length of the line that strlen would return. The prototype above
has a bug - there must also be an argument that contains the
buffer size. In a separate, rather long winded post, I have
spelled out an entire description.
>
>the int return value is already documented as a boolean value indicating
>normal or abnormal conditions.
>
>This API still need work:
>
>- the maxlen and length_ptr names are misleading, they should be maxsize
>and bufsize_ptr. They are sizes in bytes, not the length of the line that
>can be read into the buffer. This kind of misnomer tends to produce off by
>one errors in implementations and calling code.

As noted above there should be a separate bufsize_ptr argument.
My take on this is that the user shouldn't be messing with the
buffer size or the maximum buffer size at all. What the user
gets is a buffer containing the line as a \0 terminated string
(that may not be in the spec) and its length. Ergo, the limit
should be the maximum line length, not the maximum buffer size.
My view is that if the user hits an "off by one" error the user
probably isn't looking for null terminated strings. That said,
I can think of cases where maxbuf would be the "right" thing to
do.

>- the bufsize_ptr should be passed right after the line_ptr as they are
>stronly related.

I disagree. The related things are the pointer to the line and
the length of the line; they should be together. The bufsize_ptr
is part of the passing state info hack.

>- there is no need to make the flags a long, int should suffice.

See the "man page" post - I ended up with 22 flags. A long is
needed for portability.

>- the return value could be more informative: we could make it an ssize_t to
>return the length of the line read or EOF upon end of file. It would also
>be more intuitive as a replacement for fgets() if it returned a pointer to
>the char buffer.

I forget whether ssize_t is posix or C99 but AFAIK it isn't C89.
I could be convinced that it should return a pointer to the char
buffer. The thing that I worried about is whether it might be
confusing because the buffer pointer is also going through the
calling sequence.
>
>Input/output parameters are a bit confusing, it might be more effective to
>declare a structure for all these fields and pass a pointer to it.

I waffled on this one; perhaps I waffled the wrong way. What I
concluded was that it came to much the same thing in that the
calling sequence and a compact structure initialization look much
the same, and adding a struct just adds more machinery.

>
>struct getfline_args {
> char *buffer;
> size_t buffer_size;
> char *allocated_buffer;
> size_t max_alloc_size;
> int flags; // or even better: define flags as bitfields
> size_t read_length;
>};

Buffer and allocated buffer? Yes, bitfields would be much
better.
>
>char *getfline(FILE *fp, struct getfline_args *gp);
>
>/* example of use */
>
>char buffer[BUFSIZ];
>struct getfline_args args = { buffer, sizeof(buffer), NULL, SIZE_MAX };

I get the impression that you have a different kind of
implementation in mind than I do. My thinking is that the user
does not declare buffer as an array. Things get messy if we
start mixing up automatic and allocated storage. So: no
char buffer[BUFSIZ];
Also the initialization looks suspect.
>
>while (getfline(fp, &args)) {
> /* do something with args.buffer contents */
>}
>free(args.allocated_buffer);

The final free isn't needed. Getfline can free the allocated
buffer when it hits a terminating condition.

Thanks muchly for the comments. I think I will follow most of
your suggestions. AFAICT bit fields would be the right way
to go. Using a struct to hold most of the stuff might be
cleaner, though I'm still waffling on that one. Returning a
pointer to the line is probably right. Ugh, I hate rewriting.



Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die
From: Eric Sosman on
Richard Harter wrote On 11/09/07 10:34,:
> Below is a "man page" for a file read utility. An HTML version
> can be found at http://home.tiac.net/~cri/2007/gflspec.html. It
> is not guaranteed to be typo or brain fart clean.
>
> ----
>
> SYNOPSIS:
>
> #include "getfline.h"
>
> int getfline(FILE * fptr,
> char ** line_ptr,
> size_t * length_ptr,
> size_t * bufsize_ptr,
> size_t maxlen,
> long * flags);
> [...]

Isn't there some way you can find an excuse to add
a couple more arguments? Six is too many for most people
to keep straight, but you may as well try to confuse the
geniuses, too. ;-)

Kidding aside, an argument list of that length suggests
to me that the function may be trying to be too many things
to too many people at the same time. It might be wise to
give up some functionality to improve ease of use; the net
change in usefulness could in fact be positive.

--
Eric.Sosman(a)sun.com
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: problem analysis chart
Next: Please help!