Job Interview, Did I Mess Up? [C++]

Prev: YAMI4 - Messaging Solution for Distributed Systems
Next: std::map floating point key

From: Dave Harris on 5 Mar 2010 20:57

daniel_t(a)earthlink.net (Daniel T.) wrote (abridged):
> However you make an important point. If it were the case that
> "tolower(a)" represented something other than simply "lower case
> a", if tolower() had side effects for example, then we would
> be calling the function, in part, to cause those side effects.
> If that were the case, then tolower(a) and lo_a would *not*
> represent the same thing and you wouldn't be violating DRY,
> that isn't the case though.

It sounds as though which version you think is clearer depends on what
tolower() does. That makes your argument a bit suspect, in my opinion.

> To put it another way, there is no design rule that says
> "prefer multiple exits from blocks of code," yet that is what
> you seem to be advocating. (If I am wrong here, then we are
> probably not that far apart in our positions.)

In my experience we are frequently faced with a choice between three
evils:
1. Adding duplicate code.
2. Adding spurious mutable state.
3. Adding extra multiple exits.

For example,

bool contains( const int *first, const int *last, int value {
while (first != last && *first != value)
++first;
return first != last;
}

this duplicates the test. We can avoid that by adding a variable to
remember the result of the test:

bool found = false;
while (first != last && !found)
found = (*first++ == value);
return found;

Or we can use multiple exits:

while (first != last)
if (*first++ == value)
return true;
return false;

So for me the dispute is over which of these is the lesser evil. I
dislike the first because it is duplicating code and because it is doing
unnecessary work. I dislike the second because it adds mutable state, and
also adds an extra test in the loop. The third sample I prefer. It seems
simplest to me: as soon as you know the answer, stop. It's simple and
efficient. It's only fault is that it offends the single-exit rule, which
is only an issue if you are dogmatic about it.

I imagine you'd prefer the first version. In this case the difference is
trivial because the test we are duplicating is trivial, but that doesn't
make it better; it just means it's worse by a trivial amount. In real
life the test to be duplicated may be more complex and expensive, which
leads us to the version with extra flags.

> _Postconditions and Where to Check For Them_
>
> Again we come back to not repeating ourselves. If the post
> condition of a function can be checked by code, then it should be.
> I think we can agree on that, but where should we do the
> checking? Your suggestion was that this checking should be done
> "at a caller site (as normally done in unit tests)."
>
> Most functions are complex enough that they require multiple tests,
> so they are generally called from multiple sites. If we want
> to put the postcondition checking code in only one place (so we
> don't repeat ourselves,) then the obvious place to put it is
> just before the return within the function. If there is more than
> one return, then we still must repeat ourselves; which is the
> one thing that we have both agreed we should avoid.

In general, if there is something to be done after the loop regardless of
how it terminates, that is not part of the loop or dependent on the loop,
then it's doubtful that it belongs in the same function at all. This can
apply to post-conditions. For example:

bool contains( const int *first, const int *last, int value ) {
assert( first <= last );
bool result = contains_inner( first, last, value );
assert( result == (std::find( first, last, value ) != last) );
return result;
}

In a language like C, without exceptions, this pattern works well for
freeing resources:

bool process( const char *filename ) {
FILE *fp = fopen( filename, "r" );
if (!fp)
return false;
bool result = process_inner( fp );
fclose( fp );
return result;
}

It's an example of separation of concerns, putting the open/close in one
place and the real work in another. A benefit is that we can reuse
process_inner(), for example passing stdin.

In C++, the inner function may be virtual and the public one non-virtual.
That imposes the post-condition on derived classes, which is often what
we want from design by contract.

-- Dave Harris, Nottingham, UK.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Daniel T. on 7 Mar 2010 04:51

brangdon(a)cix.co.uk (Dave Harris) wrote:
> daniel_t(a)earthlink.net (Daniel T.) wrote (abridged):
> >
> > However you make an important point. If it were the case that
> > "tolower(a)" represented something other than simply "lower case a",
> > if tolower() had side effects for example, then we would be calling
> > the function, in part, to cause those side effects. If that were the
> > case, then tolower(a) and lo_a would *not* represent the same thing
> > and you wouldn't be violating DRY, that isn't the case though.
>
> It sounds as though which version you think is clearer depends on what
> tolower() does. That makes your argument a bit suspect, in my opinion.

You absolutely should treat functions that have side effects differently
than functions that don't, and you need to know which are which. If you
find treating them differently "suspect," we are so far different in our
ideas on how to program that discussing function use is pointless.

I find it interesting that on the one hand, I am being lambasted for not
adding extra variables in a function even though what they represent is
already expressed, while on the other (below,) I'm being lambasted for
putting in a variable that actually represents something that isn't
expressed any other way. On top of that, some insist that having
multiple representations of the same piece of information somehow
doesn't break DRY, while having a single representation of a piece of
information does. I'm boggled.

> > To put it another way, there is no design rule that says "prefer
> > multiple exits from blocks of code," yet that is what you seem to be
> > advocating. (If I am wrong here, then we are probably not that far
> > apart in our positions.)
>
> In my experience we are frequently faced with a choice between three
> evils:
> 1. Adding duplicate code.
> 2. Adding spurious mutable state.
> 3. Adding extra multiple exits.
>
> For example,
>
> bool contains( const int *first, const int *last, int value {
> while (first != last && *first != value)
> ++first;
> return first != last;
> }
>
> this duplicates the test. We can avoid that by adding a variable to
> remember the result of the test:
>
> bool found = false;
> while (first != last && !found)
> found = (*first++ == value);
> return found;
>
> Or we can use multiple exits:
>
> while (first != last)
> if (*first++ == value)
> return true;
> return false;
>
> So for me the dispute is over which of these is the lesser evil. I
> dislike the first because it is duplicating code and because it is
> doing unnecessary work. I dislike the second because it adds mutable
> state, and also adds an extra test in the loop. The third sample I
> prefer. It seems simplest to me: as soon as you know the answer, stop.
> It's simple and efficient. It's only fault is that it offends the
> single-exit rule, which is only an issue if you are dogmatic about it.
>
> I imagine you'd prefer the first version. In this case the difference
> is trivial because the test we are duplicating is trivial, but that
> doesn't make it better; it just means it's worse by a trivial amount.
> In real life the test to be duplicated may be more complex and
> expensive, which leads us to the version with extra flags.

I like that you presented these three different ways to code an
algorithm and explained how you feel about each one, so I will return
the favor.

If I were required to write a function like your example (I actually did
write one recently,) my instinct would be to write it like example 2. In
fact, when I'm writing a (non-member) function that returns a result, my
first move it to write:

-function signature-
{
-ReturnType- result = -default-;
return result;
}

Then I fill in the guts... I can almost hear the groans from the people
reading this! :-D

I reject the first example because for most debuggers I use, putting a
breakpoint at the return in the function would be useless in trying to
figure out what the function is going to return, and I can't change the
result during a debug run. Generally, this example is harder to grock
than the other two.

I consider example 3 fine, but with it, I have to put two breakpoints in
the function to track what it will return, and I can't change the result
during a debug run. I think of example 3 as an optimization of example
2, I might switch to it if I'm confident that the function works and
there is some slowdown because of the assignment to the result, but
generally I don't bother.

As you say, for this trivial example my approach is kind of silly, I
accept that. A function like this would be almost impossible to write
incorrectly so testing isn't much of an issue. However, imagine
happening across a function that has 2 continues, 3 breaks (with no
switches,) 4 loops, 5 returns and no documentation. Do that enough times
(and it happened to me a lot in my early career,) and you would probably
come to loathe multiple block exits as much as I do.

> > _Postconditions and Where to Check For Them_
> >
> > Again we come back to not repeating ourselves. If the post
> > condition of a function can be checked by code, then it should be.
> > I think we can agree on that, but where should we do the
> > checking? Your suggestion was that this checking should be done
> > "at a caller site (as normally done in unit tests)."
> >
> > Most functions are complex enough that they require multiple tests,
> > so they are generally called from multiple sites. If we want
> > to put the postcondition checking code in only one place (so we
> > don't repeat ourselves,) then the obvious place to put it is
> > just before the return within the function. If there is more than
> > one return, then we still must repeat ourselves; which is the
> > one thing that we have both agreed we should avoid.
>
> In general, if there is something to be done after the loop regardless of
> how it terminates, that is not part of the loop or dependent on the loop,
> then it's doubtful that it belongs in the same function at all. This can
> apply to post-conditions. For example:
>
> bool contains( const int *first, const int *last, int value ) {
> assert( first <= last );
> bool result = contains_inner( first, last, value );
> assert( result == (std::find( first, last, value ) != last) );
> return result;
> }

Here you have written two functions (contains and contains_inner) that
have the exact same preconditions and postconditions. Two functions that
do exactly the same thing, with the same algorithm, in the same program
doesn't sound like a good idea to me.

Such an idiom makes some sense when dealing with virtual
member-functions (the private virtual idiom,) but not for global
functions.

> In a language like C, without exceptions, this pattern works well for
> freeing resources:
>
> bool process( const char *filename ) {
> FILE *fp = fopen( filename, "r" );
> if (!fp)
> return false;
> bool result = process_inner( fp );
> fclose( fp );
> return result;
> }
>
> It's an example of separation of concerns, putting the open/close in one
> place and the real work in another. A benefit is that we can reuse
> process_inner(), for example passing stdin.

However, note that in this case "process" doesn't do the same thing as
"process_inner" unlike your previous example.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Daniel T. on 8 Mar 2010 02:17

Mathias Gaunard <loufoque(a)gmail.com> wrote:
> On Mar 5, 2:35 pm, "Daniel T." <danie...(a)earthlink.net> wrote:
> > "Nevin :-] Liber" <ne...(a)eviloverlord.com> wrote:
> > > "Daniel T." <danie...(a)earthlink.net> wrote:
> >
> > > > In my experience (and it might be just in my domain,) if an
> > > > exception is thrown, then it is because a programmer made an
> > > > unwarranted assumption while writing some particular piece of
> > > > code and the code should be fixed so the exception is no longer
> > > > thrown.
> >
> > > (In my problem domain, you are describing ASSERT, not exceptions.)
> >
> > Several people have made comments like this one. I can imagine a
> > domain where just asserting and aborting the program, or allowing a
> > bug that wasn't found during test time to put the program in an
> > undefined state, would be a very bad thing; such systems would use
> > exceptions to do some sort of reset and problem logging. However,
> > that isn't my domain.
> >
> > I style my code after the standard when it comes to exceptions. A
> > thrown exception means that the code that detected the programmer
> > error, is not the code that needs to be fixed. (Precondition
> > violations.)
>
> Trying to allocate a resource and it failing is a programmer error?
> Interesting point of view, but it doesn't work.
>
> Exceptions are not for programming errors, there are for errors that
> shouldn't happen ideally but may well happen anyway, and still need to
> be correctly dealt with (at least, if you want to write code of decent
> quality).
>
> For programming errors, you use asserts, which abort the program if a
> precondition has been violated, because the program is in a state it
> definitely should never be in, and there is no way to recover from
> that.

See my response in the thread "I keep running into long term c++
programmers who refuse to use exceptions"

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: hanukas on 9 Mar 2010 22:43

On Mar 5, 1:20 am, "Balog Pal" <p...(a)lib.hu> wrote:
> Looking at Dinkum's basic_string::find implementation, it has 3 returns,
> having a pretty natural natural layout, like
>
> if( needle is empty && startpos is inside)
> return startpos;
> iteration( ... )
> {
> ... return pos where match found}
>
> return npos; // no match
>
> Can it be forced to SESE? Sure. Would it be better? i keep my reservation
> until shown a really elegant alternative.

The above pattern can be improved in some cases (I won't argue that
for find that would be true):

insert match at end of the container
iteration (...)
bool found = (match == iterator)

The iteration loop is simpler and typically most time is spent there,
so that approach has a good chance of being an improvement. There is
no need to test if the iterator is end() as the match will always
terminate the iteration.

Less branching is good with contemporary architectures, in practise
the memory latency/bandwidth probably will dominate the runtime,
especially with non-linear containers.

I don't really care if it's SESE or not, but for this pattern it's a
covenient (?) co-incidence.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Balog Pal on 10 Mar 2010 02:13

"hanukas" <jukka(a)liimatta.org>
>> if( needle is empty && startpos is inside)
>> return startpos;
>> iteration( ... )
>> {
>> ... return pos where match found}
>>
>> return npos; // no match
>>
>> Can it be forced to SESE? Sure. Would it be better? i keep my reservation
>> until shown a really elegant alternative.
>
> The above pattern can be improved in some cases (I won't argue that
> for find that would be true):
>
> insert match at end of the container
> iteration (...)
> bool found = (match == iterator)
>
> The iteration loop is simpler and typically most time is spent there,
> so that approach has a good chance of being an improvement. There is
> no need to test if the iterator is end() as the match will always
> terminate the iteration.

Few years back Alexandrescu had an article in CUJ experts about different
searching approaches and performance -- that included this "send yourself a
postcard" tech. The gain was not convincing. (And the subject collection
was likely list or slist -- the most benign thing for insertion, and quite
ar from a general case...)

> Less branching is good with contemporary architectures, in practise
> the memory latency/bandwidth probably will dominate the runtime,
> especially with non-linear containers.

Well, after joking done back to serious -- this idea is limited to a few
special cases, and hardly pops up beyond AA-like experimenting with
what-ifs. For search purposes the collection is a constant thing, mutating
it is not allowed, full stop.

And with addition of the add/remove code the balance goes to negative even
if you could remove some test elsewhere.

> I don't really care if it's SESE or not, but for this pattern it's a
> covenient (?) co-incidence.

Did you ever seen it actually used? If not, let's not call it 'pattern'.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: YAMI4 - Messaging Solution for Distributed Systems
Next: std::map floating point key