Prev: problem analysis chart
Next: Please help!
From: Richard Harter on 1 Nov 2007 12:46 On Thu, 11 Oct 2007 23:05:29 GMT, cri(a)tiac.net (Richard Harter) wrote: I have taken the original posting and have commented on it at length. Some people might enjoy the critique. In a separate post I will give a (hopefully) complete specification. This discussion and the spec will also go into web pages. I appreciate that such an extended discussion drifts into the area of goldplated bullshit; so be it. >What we want is a routine that keeps reading and returning lines >from a file. A simple prototype for this function is > > char * fgetline(FILE *); The names, getline and fgetline, are already in common use. In the end I decided to use getfline instead. > >There are some gotchas with this prototype - well, not really >gotchas, rather little bits of awkwardness and inefficiency. One > >is that we are throwing away information that (usually) has to be > >recomputed, namely the length of the line and, as we shall see, >status information. If we want to add the information to our >prototype there are several that to do it - it can be >passed back through the calling sequence or it can be returned by > >the function. (We could also pass it to a global - ugh, don't do > >it.) My vote is to have it all returned by the function which >means we create a struct (record) that looks like: > > struct fgetline_info { > char *line; > size_t len; > int status; > }; > >There are various choices that could be made in the types - >season to taste. (A pointer to the struct could be passed in as >an argument if need be.) > > >A second issue is that there is no limit to how large the line >can be. A plausible way to provide a limit that is to pass one >in as an argument. If we do, our prototype looks like this: > > struct fgetline_info fgetline(FILE * fptr, size_t maxsize); > Here I went wrong. Having a struct (record) to hold information is plausible, but it may be overkill. Be that as it may, in a C implementation this prototype doesn't work too well. First of all the user has to go through the struct to access the line and the status field. More importantly the normal usage in C would be a while loop, e.g., struct fgetline_info fg; ... while (fg = fgetline(FILE *fptr, size_t maxsize)) {...} This doesn't work - the idiom requires a test on something that can be "zero". If we convert the prototype and fg to a pointer the code still doesn'twork. Now the problem is that when the read is successful we don't need the status field; when the read fails we don't have the status field. >This version has a little problem - where did the storage for the > >line come from? One way to deal with that is for fgetline to >allocate storage for the line and for the calling routine to free > >it when it is done with the line. > >Okay, so how do we do this in fgetline? Well, there is a very >simple way to do it, one that has been reinvented time and time >again. We start out allocating a standard amount of space >(numbers like 32, 64, and 80 bytes are typical) for a line. >We read from the file until we either hit an EOL (end of line >marker), a failure to read, or we have read as many characters as > >were allocated. If we haven't hit the EOL we reallocate the >array, commonly by doubling the size, and doing another read. > >This works, but it is inefficient - we have to call malloc and >free for every line. Whether the cost of calls to malloc and free matters obviously depends on the application. However (a) the costs are commonly underestimated, and (b) a general policy of ignoring apparently minor costs can cumulate and produce unexpected inefficiencies. >One way to get around this is to use a >technique I call highwater buffering. In the file containing the > >code for fgetline we have two file-scope variables: > > static char * buffer = 0; > static size_t size = 0; > >In fgetline we have some initialization code that allocates >buffer space when the buffer size is zero. Thereafter we use our > >little doubling trick. The advantage of this scheme is that we >have at most a small number of calls to malloc (the number of >doublings needed to get to the largest line) and no calls to >free. > >The very real disadvantage of this scheme is that it produces a >dirty copy of the line. By a dirty copy, I mean one that can be >scribbled on elsewhere before we are done with it. This will >happen whenever fgetline is called anywhere else with any FILE >argument before our next read. This is double plus ungood. As a note, producing dirty copies "works" provided that there is only one file being read at a time. Don't count on it. > >Is there a way to get the efficiency of the highwater scheme >without being dirty? Yes; the trick is that the calling program >provides the initial buffer. If we add the buffer information to > >the prototype we get: > > fgetline_info fgetline(FILE * fptr, > char * buffer, > size_t size, > size_t maxsize); > This is a worst of all worlds prototype - it combines an awkward return type with a cumbersome calling sequence. >There are three kinds of copies of the line that we might get >from fgetline - clean, transient, and dirty. A clean copy is one > >that has no encumbrances - it lasts until it is specifically >deleted. A transient copy is one that lasts until the next >call to fgetline to get another line from the current file. >(There may be another file being read elsewhere.) This distinction between kinds of copies that we might get is a critical one; without it either a default is chosen without weighing its merits or, worse yet, concepts are mixed and subtle bugs can slip in. > >What kind of copy should fgetline provide, clean or transient? >There are arguments for each choice. Clean copies are more >expensive but safer. Transient copies are (usually) cheaper. In > >questions like this there is much to be said for the design >principle that: > > When there is a significant choice as to the kind of > output being delivered the library routine should let > the user make the choice rather than dictating the > choice unless offering the choice unduly complicates > usage. This is a good maxim but it is not the whole story. If one choice commits the user and the other doesn't, provide the noncommital result. In this case providing clean copies is the commital choice; if the routine provides a clean copy the user cannot opt to get a transient copy. The choice commits the user to a malloc cost and an eventual free cost. If the routine returns a transient copy the user can duplicate it to get a clean copy. > >So how do we provide a choice? That is simple; if the buffer >pointer in the calling sequence is NULL, fgetline must return a >clean copy; otherwise fgetline MAY return a transient copy. I >say "may" because it might have to increase the buffer size. >If it does the calling routine will have to check whether there >was a size increase and, if so, will have to free the returned >buffer. (The fgetline implementation can't realloc the passed >in buffer; it will have to do a malloc for the first resizing.) This is an example of a serious error in design. Let us accept for the moment that we want to offer a choice; naturally we ask how to do it. What I did was to envisage an implementation and treat it as the only alternative. Instead of thinking in terms of a potential implementation I should have thought in terms of API alternatives. An obvious alternative is to set a flag instead. Beating this piece of ground where a horse once died some more, using different data values to select modes is a suspect practice, one that one should be wary about using. In this case the value of the buffer pointer (null or not) was used as a flag. A more serious issue is that there are alternatives for supplying the space. They include: 1. The user declares an array on the stack. 2. The user declares a structure that contains an array. 3. The user allocates space using malloc. 4. The read routine allocates the space. In alternatives 1 and 2 there need not be calls to malloc at all. However care must be taken to avoid "freeing" stack storage.In alternatives 3 and 4 there is no need for a separate buffer. > >It's probably best to have a value in the status field that >signifies the size has been increased; call that value >fg_increased and a normal read fg_normal. Then the usage for >transient copies might look like this (first cut): > > ... > struct fgetline_info fg; > char buffer[80]; > int done = 0; > .... > for(;!done;) { > ... > fg = fgetline(fptr,buffer,80,FG_MAXSIZE); > if (fg.status == fg_normal || fg.status == fg_increased) { > /* do stuff */ > if (fg.status == fg_increased) free(fg.line); > } else done = 1; > } > >If we want clean copies the corresponding loop body would be > > fg = fgetline(fptr,0,0,FG_MAXSIZE); > if (fg.status == fg_normal) { > /* do stuff */ > free(fg.line); > } else done = 1; > This schematic code illustrates why it was a bad idea to return a struct and to put the line pointer in the struct. Also it should be be easy enough for the implementation to free that pesky increased line; having the user do it is an unnecessary complication. >What sort of return values should be available in the status >field? These are the successful ones that occur to me: > > end_of file The last read (if any) found an EOL > marker. The current read finds an > immediate EOF. > > no_increase Normal read - either buffer was null or > else it was not increased. > > increase Normal read - the buffer size has been > increased. > > abn_no_increase Abnormal read - an EOF was found without > an EOL. Buffer was not increased. > > abn_increase Abnormal read - an EOF was found without > an EOL. Buffer was increased. The increase/no_increase distinction is unnecessary. That leaves three cases, normal EOF, normal line read, and a premature EOF in the final line. One issue not covered was what to do about overly long lines. There are several possible actions, e.g., * We could treat it as an error condition. * We could treat it as an end of data, reserving the option to pick up later where we left off. * We could cut the long line into pieces maxlen characters long. * We could chop out all characters in the long line after the first maxlen characters. * We could skip the long line and proceed to the next line. Clearly the routine should not decide on its own what action to take; the user must specify what action to take by passing a flag. If the user does not specify an action the default should be to treat a long line as an error condition. > >In addition there are numerous kinds of errors that can occur. >Calling sequence arguments include: > > no_file The file pointer is null. > bad_buffer One and only one of buffere and size s > zero. > bad_size Size is 0 or is greater than maxsize > bad_maxsize Maxsize is 0 > >Then there are the memory allocation failures: > > bad_allocate Malloc or realloc failure > big_line Line length is greater than maxsize. > >When there is a memory allocation failure the line and len fields >hold what has been successfuly read. This way of proceeding - listing error tags and short explanations - has its faults; it comes from focusing too heavily on potential code. For example, the calling sequence argument errors only apply to a particular choice of calling sequence. A better approach might be to try to analyze the kinds of errors that can occur. Briefly, these include: * A format fault in the file: This can either be a premature EOF or an overly large line. This is not necessarily an error since the user can opt to accept the faulty lines. * Calling sequence error: The calling sequence is faulty in some way. The key here is that the constraints on the calling sequence should be spelled out at some point. * Memory allocation error: Malloc/realloc fails. Usually, but not necessarily, this is a fatal error. * I/O error: Something goes wrong while trying to read the file. What action should be taken is quite another matter. Users may or may not have an error management strategy. If there is one, the routine shouldn't pre-empt it. At a minimum the routine should return an error code. The objection to returning an error code is that the conscientious user is then stuck with adding error handling code. A useful technique is to pass to the routine flags telling what to do about errors, e.g., write to a log file, call exit, etc. > >There are various conventions one could use for assigning code >values for the possible return values. My view is that there >should be a bit that is set only if there was an error, a bit >that is set only if there was a memory allocation error, a bit >that is set only if there was a buffer increase, and a bit that >is set only if an EOL occurred. > >The point of doing this is that we can have simple tests to check > >whether an error occurred, whether some space has to be freed, >and whether the file read is complete. This is a useful technique. However to avoid magic numbers disease the positions of the various flag bits should be defined separately in the include file that spells out the file read package prototypes. > >As a final note, there is one minor decision left unsettled, >namely should the returned line include an EOL marker before the >string terminating 0. My take is that this matter is too trivial > >to warrant adding a flag to the calling sequence, and that it >will be slightly less confusing if there is one present even if >it has to be manufactured. Perhaps someone has a convincing >argument one way or the other. The issue here is whether the delivered line contains \n (the EOL character) or not. In other do lines look like abc\n\0 or abc\0? Since we are already committed to using flags we may as well have a flag to select whether to include the EOF character; the appropriate default is to not include it. At this point we have arrived at the end of the original discussion. It went off track, but it was thought provoking. Finally, what should the interface look like? I now believe that the prototype should look like this: int getfline(FILE * fptr, char ** line_ptr, long * status, size_t * length_ptr, size_t maxlen); Normal usage would be to declare line, status, and len as char *, long, and size_t respectively. Getfline returns 1 if the read was normal and 0 if it was not. It would normally be used in a while loop, e.g., while (getfline(fptr,&line,&status,&len,maxlen) { /* Code to process line */ } /* Error testing (if any) goes here Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Charlie Gordon on 7 Nov 2007 05:08 "Richard Harter" <cri(a)tiac.net> a �crit dans le message de news: 4729fd77.329292093(a)news.sbtc.net... > <snipped long interesting discussion> > > Finally, what should the interface look like? I now believe that > the prototype should look like this: > > int getfline(FILE * fptr, > char ** line_ptr, > long * status, > size_t * length_ptr, > size_t maxlen); > > Normal usage would be to declare line, status, and len as char *, > long, and size_t respectively. Getfline returns 1 if the read was > normal and 0 if it was not. It would normally be used in a while > loop, e.g., > > while (getfline(fptr,&line,&status,&len,maxlen) { > /* Code to process line */ > } > /* Error testing (if any) goes here You did not document what the parameters mean. Let me infer from the discussion that: - FILE *fptr is the stream from which to read characters. - char **line_ptr is the address of a pointer to the buffer for reading the line. This buffer will be reallocated with realloc or malloc if needed, under control of the flags. Its initial value can be NULL. - long *status is an input/output parameter: it points to a long with input/output flags: o whether the buffer was allocated. o whether is should be extended o whether the newline character should be stored in the buffer. o whether there was a read error, an EOF without a newline. o whether the line was truncated o whether to skip the truncated part or split it into separate lines. - size_t *length_ptr points to a size_t containing the current size of the allocated buffer. It will be updated if reallocation occurs. - size_t maxlen is the maximum size for buffer reallocation. the int return value is already documented as a boolean value indicating normal or abnormal conditions. This API still need work: - the maxlen and length_ptr names are misleading, they should be maxsize and bufsize_ptr. They are sizes in bytes, not the length of the line that can be read into the buffer. This kind of misnomer tends to produce off by one errors in implementations and calling code. - the bufsize_ptr should be passed right after the line_ptr as they are stronly related. - there is no need to make the flags a long, int should suffice. - the return value could be more informative: we could make it an ssize_t to return the length of the line read or EOF upon end of file. It would also be more intuitive as a replacement for fgets() if it returned a pointer to the char buffer. Input/output parameters are a bit confusing, it might be more effective to declare a structure for all these fields and pass a pointer to it. struct getfline_args { char *buffer; size_t buffer_size; char *allocated_buffer; size_t max_alloc_size; int flags; // or even better: define flags as bitfields size_t read_length; }; char *getfline(FILE *fp, struct getfline_args *gp); /* example of use */ char buffer[BUFSIZ]; struct getfline_args args = { buffer, sizeof(buffer), NULL, SIZE_MAX }; while (getfline(fp, &args)) { /* do something with args.buffer contents */ } free(args.allocated_buffer); -- Chqrlie.
From: Richard Harter on 9 Nov 2007 10:34 Below is a "man page" for a file read utility. An HTML version can be found at http://home.tiac.net/~cri/2007/gflspec.html. It is not guaranteed to be typo or brain fart clean. ---- SYNOPSIS: #include "getfline.h" int getfline(FILE * fptr, char ** line_ptr, size_t * length_ptr, size_t * bufsize_ptr, size_t maxlen, long * flags); Note: getfline.h is guaranteed to include stdio.h. DESCRIPTION Function getfline reads lines from a stream. Each time it is called it fetches the next line until there are none left. Getfline has fairly elaborate error detection and reporting facilities. It also permits minimizing storage allocation and deallocation. CALLING SEQUENCE ARGUMENTS: Argument fptr is the stream from which to read characters. It must have previously been opened with fopen. fptr==NULL is an argument error; other invalid values for fptr produce an I/O error. Argument line_ptr is the address of a pointer to the buffer for reading the line. In "clean copy" mode the previous value of line_ptr is overwritten with a pointer to a newly allocated buffer; the user is responsible for freeing all buffers. In "transient mode" the value of line_ptr is controlled by getfline; it should initially be NULL. Argument length_ptr is a pointer to a size_t variable that holds the length of the line when there is a successful read. Argument bufsize_ptr is a pointer to a size_t variable that holds the allocated size of the buffer holding the line. Argument maxlen is the maximum line length that getfline will return. Finally argument flags is a pointer to a status word. The status word is divided into two sections, input flags and output flags. When getfline is called it uses the input flags to determine the course of action. When it returns the status word will hold output flags that indicate what happened during the read. It should be initialized by oring together the desired input flags. Function return Getfline returns 1 if a line was returned and 0 if no line was returned. A return of 0 can either indicate the EOF was reached or that the read was terminated by an error. USAGE MODES Getfline can produce either "clean copies" or "transient copies". A clean copy is one that has no encumbrances - it lasts until it is specifically deleted. A transient copy is one that lasts until the next call to getfline to get another line from the current file. Transient copy mode will be used if the GFL_FLAG_TRANS flag is set; clean copy mode is the default. In clean copy mode getfline allocates a new buffer for each line that is read. The user is responsible for freeing the storage for the lines. Clean copy mode is appropriate when we want to keep part or all of the file in memory. In clean copy mode the values of items pointed to by line_ptr, length_ptr, and bufsize_ptr are overwritten; the previous values are ignored. In transient copy mode getfline reuses the line buffer; the previous contents, if any, will be overwritten. Getfline handles buffer storage management; the user does not have to allocate or free space for lines. Transient copy mode is appropriate when the contents of a line are immediately processed. Transient copy mode may be more efficient than clean copy mode because it minimizes the number of calls to malloc and free. In transient copy mode the line buffer pointer must initially either be NULL or point to storage allocated by malloc, and bufsize must contain the allocated size of the line buffer. The user should not alter these variables after initializing them. When a read fails the line buffer storage is freed by getfline and the line pointer is cleared. HANDLING ANOMALOUS LINES There are two kinds of anomalous lines that can occur, a prematurely terminated last line (i.e., one that lacks an EOL (\n) marker), and lines that are longer than maxlen. A prematurely terminated last line will be treated as error unless the GFL_FLAG_ADDEOL flag is set. Whether or not it is treated as an error, the GFL_RD_BADEOF flag will be set. Getfline offers four different ways to handle "long" lines; they can be treated as errors, they can be omitted, they can be truncated, or they can be cut into pieces that are maxlen bytes long. Regardless of the method chosen, if the line is long the GFL_RD_LONG flag will be set in the status word. The default is to treat "long" lines as errors. One of the other three choices can be setting one of the following three flags: GFL_FLAG_OMIT, GFL_FLAG_TRUNC, or GFL_FLAG_CUT. Only one of these flags can be set; setting more than one is an arguments error. There are two flags, GFL_RD_LONG and GFL_RD_BADEOF, that are set if the current line being read is anomalous. These flags are set even if the line is acceptable. The GFL_RD_BADEOF flag is set if the last line was prematurely terminated. The GFL_RD_LONG if the current line is long. If the GFL_FLAG_CUT is set the GFL_RD_BADEOF flag will not be set when the final piece of a long line is read. RESPONSES TO ERRORS BY GETFLINE If there is an error getfline will set three flag bits in the status word. One is the general error flag, GFL_ERR; the second is a error category flag; and the third is the specific error. There are two error categories, argument errors, and execution errors; the corresponding flags are GFL_ARG and GFL_EX. There is a separate flag for an error (usually the argument being a NULL pointer) in each of the arguments. There are three different execution errors, I/O errors, storage allocation errors, and unacceptable line errors. An I/O error is reported if there is a read error. A storage allocation error if malloc or realloc fails. An unacceptable line is either a long line or a prematurely terminated line for which there is no input flag. There are two input flags, GFL_FLAG_LOG and GFL_FLAG_EXIT, that specify actions that getfline should take if there is an error. If the GFL_FLAG_LOG flag is set getfline will write an error message to stderr when there is an error. If the GFL_FLAG_EXIT is set, getfline will call exit when there is an error. SAMPLE USAGE Here are a couple of usage examples. For each we assume the following includes and declarations: #include "getfline.h" .... FILE *fptr; char *line = 0; size_t len, bufsize = 0; long status; Example one illustrates clean copy mode. In example one we are reading a file called somefile.txt. Each line is passed to processing function that takes ownership of the lines and the responsibility for freeing their storage. fptr = fopen("somefile.txt","r"); while(getfline(fptr,&line,&bufsize,1024,&status)) { process_line(line); } if (status & GFL_ERR) fprintf(stderr,"Oops\n"); if (fptr) fclose(fptr); Example two illustrates transient copy mode. In example two we are reading input from stdin and writing it to stdout. However in lines longer than 80 characters we insert new line characters every 80 characters. status = GFL_FLAG_CUT | GFL_FLAG_LOG; while(getfline(stdin,&line,&bufsize,80,&status)) { fprintf(stdout,"%s\n",line); } TABLE OF FLAGS Input flags: GFL_FLAG_TRANS Provide transient copies. The default is clean copies. GFL_FLAG_EOFOK Says it's okay if the last line is terminated by a premature EOF. The default is to treat a premature EOF as an error. GFL_FLAG_CUT Says to use all of a long line but break into pieces of length maxlen. GFL_FLAG_TRUNC Says to use the first maxlen bytes of a long line and discard the rest. GFL_FLAG_OMIT Ignore long lines. GFL_FLAG_EXIT Exit if there is an execution error. GFL_FLAG_LOG Write an error message to stderr if there is an execution error. General error flags: GFL_ERR This is set if there was any error. GFL_ARG This is set if there are any argument errors. GFL_EX This is set if there are any execution errors. Input argument error flags: GFL_ARG_FILE An invalid file pointer (presumably a null pointer) was passed. GFL_ARG_LINE The pointer to the line argument was a null pointer. GFL_ARG_LENGTH The pointer to the length_ptr argument was a null pointer. GFL_ARG_BUFSIZE The pointer to the bufsize_ptr argument was a null pointer. GFL_ARG_STATUS The pointer to the status argument was a null pointer. GFL_ARG_FLAGS Multiple long line action flags are set. Execution error flags: GFL_EX_IO There was an I/O error in trying to read the file. GFL_EX_STORAGE There was an error in the storage allocation routines. GFL_EX_READ The file had a format error (long line or premature EOF) and no flag was set for handling the error. Anomalous line flags: GFL_RD_LONG This is set if the line is long. GFL_RD_BADEOF This is set if there was a premature end of file. Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Richard Harter on 9 Nov 2007 11:33 On Wed, 7 Nov 2007 11:08:30 +0100, "Charlie Gordon" <news(a)chqrlie.org> wrote: >"Richard Harter" <cri(a)tiac.net> a �crit dans le message de news: >4729fd77.329292093(a)news.sbtc.net... >> ><snipped long interesting discussion> >> >> Finally, what should the interface look like? I now believe that >> the prototype should look like this: >> >> int getfline(FILE * fptr, >> char ** line_ptr, >> long * status, >> size_t * length_ptr, >> size_t maxlen); >> >> Normal usage would be to declare line, status, and len as char *, >> long, and size_t respectively. Getfline returns 1 if the read was >> normal and 0 if it was not. It would normally be used in a while >> loop, e.g., >> >> while (getfline(fptr,&line,&status,&len,maxlen) { >> /* Code to process line */ >> } >> /* Error testing (if any) goes here > >You did not document what the parameters mean. Let me infer from the >discussion that: I got tired of writing; I should have persevered. > >- FILE *fptr is the stream from which to read characters. >- char **line_ptr is the address of a pointer to the buffer for reading the >line. This buffer will be reallocated with realloc or malloc if needed, >under control of the flags. Its initial value can be NULL. >- long *status is an input/output parameter: it points to a long with >input/output flags: > o whether the buffer was allocated. > o whether is should be extended > o whether the newline character should be stored in the buffer. > o whether there was a read error, an EOF without a newline. > o whether the line was truncated > o whether to skip the truncated part or split it into separate lines. >- size_t *length_ptr points to a size_t containing the current size of the >allocated buffer. It will be updated if reallocation occurs. >- size_t maxlen is the maximum size for buffer reallocation. These are all correct except for length_ptr, which returns the length of the line that strlen would return. The prototype above has a bug - there must also be an argument that contains the buffer size. In a separate, rather long winded post, I have spelled out an entire description. > >the int return value is already documented as a boolean value indicating >normal or abnormal conditions. > >This API still need work: > >- the maxlen and length_ptr names are misleading, they should be maxsize >and bufsize_ptr. They are sizes in bytes, not the length of the line that >can be read into the buffer. This kind of misnomer tends to produce off by >one errors in implementations and calling code. As noted above there should be a separate bufsize_ptr argument. My take on this is that the user shouldn't be messing with the buffer size or the maximum buffer size at all. What the user gets is a buffer containing the line as a \0 terminated string (that may not be in the spec) and its length. Ergo, the limit should be the maximum line length, not the maximum buffer size. My view is that if the user hits an "off by one" error the user probably isn't looking for null terminated strings. That said, I can think of cases where maxbuf would be the "right" thing to do. >- the bufsize_ptr should be passed right after the line_ptr as they are >stronly related. I disagree. The related things are the pointer to the line and the length of the line; they should be together. The bufsize_ptr is part of the passing state info hack. >- there is no need to make the flags a long, int should suffice. See the "man page" post - I ended up with 22 flags. A long is needed for portability. >- the return value could be more informative: we could make it an ssize_t to >return the length of the line read or EOF upon end of file. It would also >be more intuitive as a replacement for fgets() if it returned a pointer to >the char buffer. I forget whether ssize_t is posix or C99 but AFAIK it isn't C89. I could be convinced that it should return a pointer to the char buffer. The thing that I worried about is whether it might be confusing because the buffer pointer is also going through the calling sequence. > >Input/output parameters are a bit confusing, it might be more effective to >declare a structure for all these fields and pass a pointer to it. I waffled on this one; perhaps I waffled the wrong way. What I concluded was that it came to much the same thing in that the calling sequence and a compact structure initialization look much the same, and adding a struct just adds more machinery. > >struct getfline_args { > char *buffer; > size_t buffer_size; > char *allocated_buffer; > size_t max_alloc_size; > int flags; // or even better: define flags as bitfields > size_t read_length; >}; Buffer and allocated buffer? Yes, bitfields would be much better. > >char *getfline(FILE *fp, struct getfline_args *gp); > >/* example of use */ > >char buffer[BUFSIZ]; >struct getfline_args args = { buffer, sizeof(buffer), NULL, SIZE_MAX }; I get the impression that you have a different kind of implementation in mind than I do. My thinking is that the user does not declare buffer as an array. Things get messy if we start mixing up automatic and allocated storage. So: no char buffer[BUFSIZ]; Also the initialization looks suspect. > >while (getfline(fp, &args)) { > /* do something with args.buffer contents */ >} >free(args.allocated_buffer); The final free isn't needed. Getfline can free the allocated buffer when it hits a terminating condition. Thanks muchly for the comments. I think I will follow most of your suggestions. AFAICT bit fields would be the right way to go. Using a struct to hold most of the stuff might be cleaner, though I'm still waffling on that one. Returning a pointer to the line is probably right. Ugh, I hate rewriting. Richard Harter, cri(a)tiac.net http://home.tiac.net/~cri, http://www.varinoma.com In the fields of Hell where the grass grows high Are the graves of dreams allowed to die
From: Eric Sosman on 9 Nov 2007 11:36
Richard Harter wrote On 11/09/07 10:34,: > Below is a "man page" for a file read utility. An HTML version > can be found at http://home.tiac.net/~cri/2007/gflspec.html. It > is not guaranteed to be typo or brain fart clean. > > ---- > > SYNOPSIS: > > #include "getfline.h" > > int getfline(FILE * fptr, > char ** line_ptr, > size_t * length_ptr, > size_t * bufsize_ptr, > size_t maxlen, > long * flags); > [...] Isn't there some way you can find an excuse to add a couple more arguments? Six is too many for most people to keep straight, but you may as well try to confuse the geniuses, too. ;-) Kidding aside, an argument list of that length suggests to me that the function may be trying to be too many things to too many people at the same time. It might be wise to give up some functionality to improve ease of use; the net change in usefulness could in fact be positive. -- Eric.Sosman(a)sun.com |