From: Andre Kaufmann on
Peter Olcott wrote:
> On 6/15/2010 3:59 AM, Ulrich Eckhardt wrote:
> [...]
> Although it may be possible for the compiler to inline a function
> without inline being requested I am not sure that it typically does this.


It does.

Andre

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Ulrich Eckhardt on
Peter Olcott wrote:
> On 6/15/2010 4:17 AM, Asher Langton wrote:
>> while (...)
>> {
>>     ...
>>     if (condition)
>>         doSomething();
>> }
>>
>> If doSomething() is inlined, then the body of the loop might no longer
>> fit in the instruction cache, which -- in the case where 'condition'
>
> If the body of the loop without the function call overhead does not fit
> in cache, then the body of the loop with the function call overhead will
> also not fit in cache because it requires even more memory. Adding
> function call overhead can not reduce memory requirements.

If it is inlined, it will probably be loaded into the cache even when it is
not called, because the cache always loads contiguous chunks of memory. The
only exception is if the compiler aligns the code so that skipping the call
skips one or more whole cache lines. This is not impossible, but the
compiler would probably need to know that "condition" is unlikely;
otherwise it would have to pad every conditionally executed block to a
multiple of a cache line, which increases memory consumption even further.

If it is not inlined, only the (hopefully small) function call code
occupies the cache when the function is not called. When it is called, the
memory overhead is of course larger. Paying a big price rarely can be
better than paying a small price regularly.
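
To make the trade-off above concrete, here is a minimal sketch of the
out-of-line variant. The names handleRareCase, sumWithRareCase and the
SKETCH_* macros are made up for illustration, and the attributes are
GCC/Clang extensions (elsewhere the macros expand to nothing). Whether this
actually beats letting the compiler choose is something only measurement
can tell.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical hint macros for this sketch. GCC and Clang accept these
// attributes/builtins; on other compilers the optimizer decides on its own.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_COLD_NOINLINE __attribute__((noinline, cold))
#  define SKETCH_UNLIKELY(x)   __builtin_expect(!!(x), 0)
#else
#  define SKETCH_COLD_NOINLINE
#  define SKETCH_UNLIKELY(x)   (x)
#endif

// Stand-in for a large, rarely needed piece of work. Kept out of line so
// that the hot loop below only contains the call instruction.
SKETCH_COLD_NOINLINE std::uint64_t handleRareCase(std::uint32_t value)
{
    std::uint64_t acc = value;
    for (int i = 0; i < 8; ++i)
        acc = acc * 31u + 7u;
    return acc;
}

// Hot loop: the common path is a single addition; the rare path pays the
// function call overhead only when the condition is true.
std::uint64_t sumWithRareCase(const std::vector<std::uint32_t>& data)
{
    std::uint64_t total = 0;
    for (std::uint32_t v : data) {
        total += v;
        if (SKETCH_UNLIKELY(v == 0xFFFFFFFFu))
            total += handleRareCase(v);
    }
    return total;
}
```

Removing the attribute and comparing the generated code or a profile of
both builds is the only reliable way to see which layout the target machine
prefers.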


In any case, my summary of this remains "I don't know" and that I would
leave picking the best way to the compiler.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



From: jwatts on
On Jun 15, 3:01 pm, Peter Olcott <NoS...(a)OCR4Screen.com> wrote:
> On 6/15/2010 4:17 AM, Asher Langton wrote:
>
>
>
> > On Jun 13, 1:43 pm, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
> >> (1) The ONLY reason not to always inline everything (that can be
> >> in-lined) all the time is code bloat.
> >> (2) If a function is only called once then there can not possibly be any
> >> code bloat at all.
> >> (3) Therefore all functions that are only called once should always be
> >> in-lined.
>
> > Consider this snippet:
>
> > while (...)
> > {
> >     ...
> >     if (condition)
> >         doSomething();
> > }
>
> > If doSomething() is inlined, then the body of the loop might no longer
> > fit in the instruction cache, which -- in the case where 'condition'
>
> If the body of the loop without the function call overhead does not fit
> in cache, then the body of the loop with the function call overhead will
> also not fit in cache because it requires even more memory. Adding
> function call overhead can not reduce memory requirements.
>

Once you have entered the loop, the cost in time of the function call
and any memory requirements will have been paid. Since your entire
program will not fit in the cache, you will encounter cache misses and
evictions in the vicinity of the inlined code anyway. The question comes
down to "is the cost of the function call significant?" I would tend to
say no, since normally the cost of the loop will dwarf that of the
function call.

As a general rule, I recommend that my developers avoid the temptation
of 'premature optimization'. First, just make it work. Then, if
performance is not acceptable, use a profiler to find the problem
areas. First try simple changes in algorithms, and if that's not
sufficient, perhaps structural or architectural changes are necessary.




From: jwatts on
On Jun 8, 4:54 am, Peter Olcott <NoS...(a)OCR4Screen.com> wrote:
> I am aiming to produce the best balance of speed and space efficiency
> against reliability and maintainability placing a higher weight on the
> latter criteria.
> http://www.ocr4screen.com/UTF8.h
>
> If there are any improvements that can be made within the criteria
> provided, or other improvements that do not detract from the above
> criteria input would be welcomed.
>
> This support file is required for compiling. Please do not critique the
> support file it is not in final form.
> http://www.ocr4screen.com/Array2D.h
>

1. I would recommend pre-generating the 'states' array so that it
becomes a runtime constant;

2. In the constructor UnicodeEncodingConversion(), can you replace the
loops used to initialize states[][] with calls to memset()? That will
almost certainly be both faster and smaller;

3. In method toUTF32(), is it possible to produce a reasonable
estimate of the amount of space required for the result vector UTF32?
Each time a call to push_back() results in reallocation of a vector,
you will pay a runtime cost, especially once the vectors reach
significant size;

4. Be careful when assigning values of small types (e.g. uint8_t) to
variables of a larger type (e.g. uint32_t), as in "CodePoint = Byte;".
A uint8_t is unsigned and is always zero-extended, but if the byte is
ever held in a plain char, whose signedness is implementation-defined,
moving the code to a different compiler may change the sign-extension
behavior, causing unexpected, and likely undesired, results;

5. Instead of using a for() loop to traverse the input vector, prefer
using iterators;

6. I would have preferred to process each extended character at once
rather than process each byte individually. That is, having
determined that a byte is the first of three, go ahead and retrieve
the next two bytes and process them immediately.
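
Points 3, 5 and 6 could be combined roughly as follows. This is only a
sketch: decodeUtf8 is a hypothetical free function, not the toUTF32()
method from UTF8.h, and it assumes the input arrives as a
std::vector<std::uint8_t>. It reserves in.size() elements up front (every
code point needs at least one input byte, so that is a safe upper bound),
walks the input with iterators, and consumes each multi-byte sequence in
one go.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration. Returns false at the first
// malformed sequence; `out` then holds the code points decoded so far.
bool decodeUtf8(const std::vector<std::uint8_t>& in,
                std::vector<std::uint32_t>& out)
{
    out.clear();
    out.reserve(in.size());  // upper bound: one code point per input byte

    auto it = in.begin();
    const auto end = in.end();
    while (it != end) {
        const std::uint8_t lead = *it++;
        std::uint32_t codePoint = 0;
        int trailing = 0;           // continuation bytes still expected
        std::uint32_t minimum = 0;  // smallest value legal for this length

        if (lead < 0x80)                { codePoint = lead; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; trailing = 1; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; trailing = 2; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; trailing = 3; minimum = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        // Process the whole sequence immediately instead of byte by byte.
        if (end - it < trailing)
            return false;  // input ends in the middle of a sequence
        for (int i = 0; i < trailing; ++i) {
            const std::uint8_t next = *it++;
            if ((next & 0xC0) != 0x80)
                return false;  // not a continuation byte
            codePoint = (codePoint << 6) | (next & 0x3Fu);
        }

        if (codePoint < minimum)   return false;  // overlong encoding
        if (codePoint > 0x10FFFF)  return false;  // beyond the Unicode range
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            return false;  // surrogate code points are not valid in UTF-8
        out.push_back(codePoint);
    }
    return true;
}
```

Because the bytes stay std::uint8_t throughout, widening them to
std::uint32_t is value-preserving. The reserve() call may over-allocate by
up to a factor of four for text made of four-byte sequences; whether that
is acceptable depends on the input sizes involved.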




From: Joshua Maurice on
On Jun 16, 10:46 am, jwatts <jwatt...(a)gmail.com> wrote:
> 3. In method toUTF32(), is it possible to produce a reasonable
> estimate of the amount of space required for the result vector UTF32?
> Each time a call to push_back() results in reallocation of a vector,
> you will pay a runtime cost, especially once the vectors reach
> significant size;

No. As I'm sure many other people will reply, ::std::vector::push_back
is specifically guaranteed to have amortized O(1) runtime. It
accomplishes this by growing the capacity by a constant factor, rather
than by just 1, each time it runs out of capacity. This is generally
simplified in discussions to "doubling the capacity each time push_back
runs out of capacity". However, I have seen reports from good people that
doubling isn't ideal in real-world measurements. 2 isn't special; any
constant multiplier greater than 1 will give amortized O(1) runtime.
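
A quick way to see this on a given implementation is to count how often the
capacity changes during a run of push_back calls. The sketch below does
that; countReallocations and its reserveFirst switch are names invented
here, and the exact count without reserve() depends on the growth factor
the standard library happens to use.

```cpp
#include <cstddef>
#include <vector>

// Appends n ints and counts how many times the vector's capacity changed,
// which is a proxy for the number of reallocations.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::vector<int> v;
    if (reserveFirst)
        v.reserve(n);  // one allocation up front, none during the loop

    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth the count for countReallocations(n, false) grows only
logarithmically with n, which is why the total cost stays amortized O(1)
per element. Calling reserve() first still removes those few reallocations
and the element copies they imply, so both posters have a point.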

