From: Dave Harris on
daniel_t(a)earthlink.net (Daniel T.) wrote (abridged):
> You absolutely should treat functions that have side effects
> differently than functions that don't, and you need to know which
> are which.

You /can/ treat functions differently if they have no side effects, but I
don't agree you necessarily should. And it's not just about side effects.
Storing the result in a local also makes sense if the function can return
different results for the same call, or because it is slow to compute.
For example, if char happens to be a Unicode type, then tolower() may
well be quite expensive. Storing the result is the safest option, the one
that requires the least special knowledge.
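The caching point can be sketched like this (a minimal illustration;
is_lower_vowel is a hypothetical helper, and plain tolower() stands in for
the expensive Unicode-aware version being imagined):

```cpp
#include <cctype>

// Call the potentially expensive function once, store the result in a
// local, and reuse it for every comparison.
bool is_lower_vowel(char c) {
    const int lc = std::tolower(static_cast<unsigned char>(c)); // computed once
    return lc == 'a' || lc == 'e' || lc == 'i' || lc == 'o' || lc == 'u';
}
```

The cast to unsigned char avoids undefined behaviour when plain char is
signed and holds a negative value.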

I don't have strong feelings about whether or not the local should be
used. I do have fairly strong feelings about whether someone should be
criticised for using one.


> I find it interesting that on the one hand, I am being lambasted
> for not adding extra variables in a function even though what they
> represent is already expressed, while on the other (below,) I'm
> being lambasted for putting in a variable that actually
> represents something that isn't expressed any other way.

The difference is that the former case isn't adding mutable state. It's
at most another name for the function result. (This would be clearer if
we added a "const", but I wouldn't bother.)

And a "result" variable is expressing what was already expressed by the
code position.


> > [2]
> > bool found = false;
> > while (first != last && !found)
> > found = (*first++ == value);
> > return found;
> [...]
> If I were required to write a function like your example (I
> actually did write one recently,) my instinct would be to
> write it like example 2.

I didn't expect that, because earlier you said that you wouldn't
introduce a framework of flags, and variables like "found" are what I
think Balog Pal meant by that.


> > [3]
> > while (first != last)
> > if (*first++ == value)
> > return true;
> > return false;
> [...]
> I consider example 3 fine, but with it, I have to put two
> breakpoints in the function to track what it will return, and I
> can't change the result during a debug run.

Although I'm happy to use debuggability as an argument, it does have the
problem that different debuggers have different capabilities. I use
MSVC++. With that, having the result expressed in the code position makes
it easier to change the return value. The current code position is
identified with an icon in the left-hand column of the code, and you can
change the code position by dragging the icon. So you can drag it from
one return statement to another. It's easier to do that than to edit the
value of a variable.

This debugger also makes it easy to set breakpoints on a line, by
clicking in that column. Hence another benefit of having different return
values expressed as different lines is that it makes it easy to break
only when the result is true (or only when the result is false).
Similarly, a profiler which tells you how often each line is executed
will tell you how many true results and how many false results.

In general, I find multiple returns easier to work with, given my tools.


> However, imagine happening across a function that has 2 continues,
> 3 breaks (with no switches,) 4 loops, 5 returns and no documentation.

A function with 4 loops is almost certainly doing too much. Although it
sounds like a mess, adding more mutable state would not make it simpler.
Instead of having to cast a beady eye over it looking for return
statements, you have to inspect it for assignment statements instead.

Indeed, it's actually confusing to see a line like:
bool result = false;

because it is lying to the computer. The result isn't false. We don't yet
know what the result is. Similarly if we later come across:
result = true;

we still can't be sure what the result is, because it might be changed
later. We have to inspect all the code between the assignment and the end
of the function. Whereas with "return true;" we know immediately, and
with confidence, what the result is. Once the answer is known, it's
simplest to stop.


> Here you have written two functions (contains and contains_inner)
> that have the exact same preconditions and postconditions.
> Two functions that do exactly the same thing, with the
> same algorithm, in the same program doesn't sound like a good
> idea to me.

They have different purposes; they have no code in common. It's really
not that uncommon to have one layer that checks arguments and another
layer that does the work. It's a clean separation of concerns.
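The layering being defended might look roughly like this (a sketch
reconstructed from the discussion; the exact bodies of the original
contains/contains_inner are not shown in the thread):

```cpp
#include <cassert>

// Inner layer: does the work, assuming the arguments are already valid.
bool contains_inner(const int* first, const int* last, int value) {
    while (first != last)
        if (*first++ == value)
            return true;
    return false;
}

// Outer layer: checks the arguments, then delegates.
bool contains(const int* first, const int* last, int value) {
    assert(first <= last); // argument-checking layer
    return contains_inner(first, last, value);
}
```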

-- Dave Harris, Nottingham, UK.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Daniel T. on
brangdon(a)cix.co.uk (Dave Harris) wrote:
> daniel_t(a)earthlink.net (Daniel T.) wrote (abridged):
>
> > You absolutely should treat functions that have side effects
> > differently than functions that don't, and you need to know which
> > are which.
>
> You /can/ treat functions differently if they have no side effects,
> but I don't agree you necessarily should.

I'm not sure how to respond here. To my mind, procedures are absolutely
different from functions, even if the language doesn't formally recognize
the distinction.

> And it's not just about side effects. Storing the result in a local
> also makes sense if the function can return different results for the
> same call, or because it is slow to compute. For example, if char
> happens to be a Unicode type, then tolower() would well be quite
> expensive. Storing the result is the safest option which requires
> least special knowledge.

I said earlier that creating the temp works well as a performance
optimization (I even used one in sample code to replace repeated calls
to strlen.) Of course we need "special knowledge" in order to know where
to insert performance optimizations.

> I don't have strong feelings about whether or not the local should be
> used. I do have fairly strong feelings about whether someone should be
> criticised for using one.

My main point here was that, contrary to accusations made by others,
eschewing the use of the additional temp does *not* break DRY; it is not
a design mistake. If anything, it's the other way around (adding the
temp breaks DRY, but is sometimes necessary for performance.) In other
words, I hold the same view here that Dijkstra expressed in his paper
"Notes on Structured Programming":

In my understanding of programs I want such additional variables that
store redundant information, to be clearly recognized as such... I am
strongly inclined to view such programs as, say, optimizing refinements
of a more abstract program... (pg 55)

> > I find it interesting that on the one hand, I am being lambasted for
> > not adding extra variables in a function even though what they
> > represent is already expressed, while on the other (below,) I'm
> > being lambasted for putting in a variable that actually represents
> > something that isn't expressed any other way.
>
> The difference is that the former case isn't adding mutable state.
> It's at most another name for the function result. (This would be
> clearer if we added a "const", but I wouldn't bother.)
>
> And a "result" variable is expressing what was already expressed by
> the code position.
>
> > > [2]
> > > bool found = false;
> > > while (first != last && !found)
> > > found = (*first++ == value);
> > > return found;
> > [...]
> > If I were required to write a function like your example (I
> > actually did write one recently,) my instinct would be to
> > write it like example 2.
>
> I didn't expect that, because earlier you said that you wouldn't
> introduce a framework of flags, and variables like "found" are what I
> think Balog Pal meant by that.

Balog may indeed consider that a flag, but I don't consider "found" a
flag; it represents the "result of the function so far."

Your comment above about the "results so far" being expressed by the
code position struck me as interesting. I'll have to mull it over for a
bit; maybe I will change my tune. :-) However, this still leaves
open the issue of where to check the post-condition. Having multiple
returns means we have to check the post-condition multiple times (either
directly in the code via an assertion or mentally while scanning the
program text.) In cases where the post-condition is trivial, that isn't
much of an issue, but what about when the post-condition is *not*
trivial?

Your answer to this question so far is to create a second function whose
job is to check the result of the first one and then return that
result...
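The trade-off being described can be sketched as follows (illustrative
code, not taken from the thread; the post-condition here is that the
returned pointer is either last or points at the value):

```cpp
#include <cassert>

// Single exit: the post-condition is asserted in exactly one place.
const int* find_single_exit(const int* first, const int* last, int value) {
    const int* result = last;
    for (const int* p = first; p != last; ++p)
        if (*p == value) { result = p; break; }
    assert(result == last || *result == value); // one check covers every path
    return result;
}

// Multiple returns: the same check must be made (in code or in the
// reader's head) at each return statement.
const int* find_multi_return(const int* first, const int* last, int value) {
    for (const int* p = first; p != last; ++p)
        if (*p == value) {
            assert(*p == value); // check at this exit
            return p;
        }
    return last; // post-condition holds trivially at this exit
}
```

With a trivial post-condition the duplication is negligible; with an
expensive one, the single-exit form concentrates the check.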

> > Here you have written two functions (contains and contains_inner)
> > that have the exact same preconditions and postconditions.
> > Two functions that do exactly the same thing, with the
> > same algorithm, in the same program doesn't sound like a good
> > idea to me.
>
> They have different purposes; they have no code in common. It's really
> not that uncommon to have one layer that checks arguments and another
> layer that does the work. It's a clean separation of concerns.

If the NDEBUG flag is defined, your "contains" function is literally no
different from your "contains_inner" function.

What if someone uses the primary function without the checking function?
If the primary function is hidden such that only the checking function
can call it, then what have we gained by adding a function that isn't
useful otherwise?


From: hanukas on
On Mar 10, 9:13 pm, "Balog Pal" <p...(a)lib.hu> wrote:
> > Less branching is good with contemporary architectures, in practise
> > the memory latency/bandwidth probably will dominate the runtime,
> > especially with non-linear containers.
>
> Well, after joking done back to serious -- this idea is limited to a few
> special cases, and hardly pops up beyond AA-like experimenting with
> what-ifs. For search purposes the collection is a constant thing, mutating
> it is not allowed, full stop.

The container implementor knows that better than we do; the end()
iterator can easily be implemented as a sentinel. All that is needed is
modifying the sentinel; no insertion is required. This works fine for
set, list, deque... if we have a vector, the implementation just needs to
keep one extra object.
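For a plain vector the sentinel idea can be sketched like this (an
illustration under the stated assumption that the container may be
temporarily mutated; sentinel_contains is a hypothetical name, not a
standard or proposed interface):

```cpp
#include <vector>

// Plant the sought value just past the real data so the inner loop needs
// no end-of-range test, then check whether the hit was the sentinel.
bool sentinel_contains(std::vector<int>& data, int value) {
    data.push_back(value);            // the sentinel: a guaranteed hit
    std::size_t i = 0;
    while (data[i] != value)          // single comparison per element
        ++i;
    const bool found = (i + 1 != data.size()); // hit before the sentinel?
    data.pop_back();                  // restore the container
    return found;
}
```

Note the push_back may reallocate, which is part of why the post suggests
a real implementation would reserve the extra slot up front.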

In either case, I don't really see why this couldn't be done with any
container if the implementor so chooses. The key issue here is that
the containers have already been invented and done. But that is beside
the point; I did, or at least I think so, demonstrate that a SESE-like
pattern can be just as efficient as, if not more efficient than, the
early-bail-out control flow that is a very common thing to write. I
write that kind of code all the time. It's one of the LEGO blocks in my
programming arsenal too; stuff like that just grows on you, right?

A question was posed. I did answer it. If the technique doesn't show a
significant improvement, the answer is simple: it's the JOKING that you
brushed aside as irrelevant. I feel strongly that memory latency is a
serious issue with modern CPUs, so I don't find it surprising AT ALL
that such an optimization doesn't have any real-world impact.

First, sure, there is less compare-branch logic in the inner loop. But
that doesn't matter, since the branches are highly predictable. If the
compare-branch costs 1 clock cycle or so, and memory latency is 30-200
clock cycles, it's easy to see why the optimization doesn't have the
desired effect.

That ISN'T irrelevant at all. It's very relevant information to consider
when crafting code from a performance point of view these days.

I did present my case and also reasons why such optimization is
irrelevant. I disputed my own example before you did, so, thanks
anyway.


> And with addition of the add/remove code the balance goes to negative even
> if you could remove some test elsewhere.

The difference is that the test which I eliminated is a repeated one.
The worst case is that the assignment is more expensive than the
compare-branch, but that requires a very small container and/or a lot of
luck. Statistically I wouldn't worry about it.


> > I don't really care if it's SESE or not, but for this pattern it's a
> > covenient (?) co-incidence.
>
> Did you ever seen it actually used? If not, let's not call it 'pattern'.

You have observed the same scenario, if your words are to be believed,
and I can assure you that I am familiar with it as well. So far we have
100% of contributors with prior knowledge of the technique. On top of
that, I have in fact seen it actually used in a vertex cache optimizer.
The unoptimized version of the code ran for 10 minutes; the optimized
version took about 1-2 seconds.

The most relevant optimization was to use std::set instead of
std::vector; this dropped the runtime from 600 seconds to 4 seconds or
so. The drop from 5 to 1-2 seconds came from micro-optimizations,
including the lower ALU workload. With inner-loop re-factoring it's hard
to get more than a 50% to 100% performance increase. Usually that's not
worth the trouble, but in this case it made the artist's workflow
smoother, so it was worth it.

The std::set implementation used contiguous storage for the tree nodes,
so it was very cache friendly and allowed the ALU optimization to have a
visible effect on performance.

Little things like that. Different people with different experiences
tend to disagree on smallest things. Go figure.



From: Balog Pal on
"hanukas" <jukka(a)liimatta.org>
>> > Less branching is good with contemporary architectures, in practise
>> > the memory latency/bandwidth probably will dominate the runtime,
>> > especially with non-linear containers.
>>
>> Well, after joking done back to serious -- this idea is limited to a few
>> special cases, and hardly pops up beyond AA-like experimenting with
>> what-ifs. For search purposes the collection is a constant thing,
>> mutating
>> it is not allowed, full stop.
>
> The container implementor knows that better than we do; the end()
> iterator can easily be implemented as sentinel. All that is needed is
> modifying the sentinel, no insertion is required. This works fine for
> set, list, deque.. if we have vector, the implementation just needs to
> keep one extra object.

Well, I thought that you were talking about an "external" implementation
of a find that works with collections already made, like std::list or
vector.

Certainly the collection itself can provide whatever implementation fits
with the rest of its internals, and handle const issues with mutable
members, extra space and so on. Also exceptions from a copy-ctor, if the
library is a template over T. Whether it is feasible is a different
question.

However, that case fits badly with the original discussion, which is
connected to code readability -- I do not normally read the code in the
std:: implementation (or that of Boost and other libs). Beyond having
better uses for my time, for practical reasons too: the code there is
light-years from readable. You may look for yourself at the infestation
of macros and other special stuff.

> In either case, I don't really see why this couldn't be done with any
> container if the implementor so chooses. The key issue here is that
> the containers have already been invented and done. But that is beyond
> the point that I did, or atleast I think so, demonstrate that SESE-
> like pattern can be just as, if not more efficient than the early-bail-
> out control flow that is very common thing to write.

Err, you did not demonstrate that it would be as or more efficient; you
just claimed it, without showing any evidence whatsoever, be it source,
output assembly with counts, or seconds. I at least referred to an
actual article on the very subject that shows actual code and
measurement data for a bunch of cases.

> I write that kind
> of code all the time. It's one of the LEGO blocks in my programming
> arsenal too, stuff like that just grows into you, right?

Guess everyone around here writes code, or did so in quantity. That
tells nothing about its correctness, readability, performance or
anything else.

> Question was posed. I did answer it. If the technique doesn't show
> significant improvement the answer is simple: it's the JOKING that you
> brushed aside as irrelevant, but I feel strongly that memory latency
> is a serious issue with modern cpu. I don't find it surprising AT ALL
> that such optimization doesn't have any real-world impact.

So then what is the point?

And we were talking about different contexts. In the one we originally
discussed, the collection is a black box, and if you start casting away
const to insert your elements you will too likely introduce UB or break
promises.

> First, sure, there is less compare-branching logic in the innerloop.
> But that doesn't matter since the branches are highly predictable. If
> the compare-branch is 1 clock cycle or so, and memory latency is
> 30-200 clock cycles it's easy to see why the optimization doesn't have
> the desired effect.
>
> That ISN'T irrelevant at all. It's very relevant information for
> consideration when crafting code from performance point of view these
> days.

Come again, please?

Actually, on this forum you can meet writers of commonly used libraries
(including Dinkumware, Boost and others); they can tell you whether they
did consider performance and experiment with different approaches.

Though the referred measurements show that the performance of the
alternatives may differ widely depending on the subject type and the
element count. On a single machine. And if you move to a different
machine with a slightly different processor stepping, core multiplier,
cache sizes, etc., the picture changes again. So a generic library can
hardly be tuned for all combinations at once.

Such tuning is thus better left to the user. But the user then gets
limited as I said before, unless he starts coding the collection by
hand.

>> And with addition of the add/remove code the balance goes to negative
>> even
>> if you could remove some test elsewhere.
>
> The difference is that the test which I did eliminate is repeated. The
> worst case is that the assignment is more expensive than compare-
> branch, but that requires either very small container and/or a lots of
> luck. Statistically I wouldn't worry about it.

Here is another context shift. We were talking about the *readability*
of the source text, and I was comparing that, not performance. In the
source text the number of executed repetitions does not count.

>> > I don't really care if it's SESE or not, but for this pattern it's a
>> > covenient (?) co-incidence.
>>
>> Did you ever seen it actually used? If not, let's not call it 'pattern'.
>
> You have observed the same scenario if your words are to be believed
> and I can assure that I am familiar with it as well. So far we have
> 100% of contributors with prior knowledge of the technique.

Sure, I also know how to create HCN (cyanide) gas, as most of us learnt
at school; that does not make it a household practice.

I wouldn't call even Duff's device a "pattern", though it is actually
used in real code occasionally.

> On top of
> that, I have in fact seen this it actually used in a vertex cache
> optimizer.

Thanks, that at least answers my question.

> The unoptimized version of the code did run 10 minutes. The
> optimized version took about 1-2 seconds.

Great, so we at least know one proven use. Could you provide the
alternative code variants in the comparison?

> The most relevant optimization was to use std::set instead of
> std::vector, this did drop the runtime from 600 seconds to 4 seconds
> or so. The drop from 5 to 1-2 seconds was from micro-optimizations,
> including the lower ALU workload. With innerloop re-factoring it's
> hard to get more than 50% to 100% performance increase. Usually not
> worth the trouble, but in this case it made the artist's workflow more
> smooth so it was worth it.

Oh, so it does not compare just the effect of the postcard trick, but
far different things, up to changing the collection type. So we get
pretty far from the "find in supplied collection" function, don't we?

> The std::set implementation used continuous storage for the tree nodes
> so it was very cache friendly and allowed the ALU optimization to take
> visible effect on the performance.
>
> Little things like that. Different people with different experiences
> tend to disagree on smallest things. Go figure.

Before agreeing to disagree we'd need at least some common piece of
context. Dropping in statements about different things will not lead to
a sensible comparison.

