From: Vaclav Haisman on
Peter Olcott wrote, On 8.6.2010 11:54:
> I am aiming to produce the best balance of speed and space efficiency
> against reliability and maintainability placing a higher weight on the
> latter criteria.
> http://www.ocr4screen.com/UTF8.h
>
> If there are any improvements that can be made within the criteria
> provided, or other improvements that do not detract from the above
> criteria input would be welcomed.
>
> This support file is required for compiling. Please do not critique the
> support file it is not in final form.
> http://www.ocr4screen.com/Array2D.h
>

1) Wrap everyting into a namespace.

2) #include <ctime> if you must but the code does not need it. Similarly
<cstdio>.

3) ActionCodes, use enum instead.

4) You do not need UnicodeEncodingConversion class. Pregenerate the States
table using one off script and make it const. Make the member functions free
functions.

5) Since the States array can really be just a const table, you can avoid
using the Array2D class.

6) Make the free functions generic, use function templates that accept
iterators for input and output instead of just std::vector<>.

7) Split the functions into own implementation .cpp file. The functions are
IMHO unlikely to get inlined.


--
VH

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Nick Hounsome on
On 8 June, 22:03, Peter Olcott <NoS...(a)OCR4Screen.com> wrote:
> On 6/8/2010 12:44 PM, Nick Hounsome wrote:
>
> > On 8 June, 10:54, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
> >> I am aiming to produce the best balance of speed and space efficiency
> >> against reliability and maintainability placing a higher weight on the
> >> latter criteria.
> >> http://www.ocr4screen.com/UTF8.h
>
> >> If there are any improvements that can be made within the criteria
> >> provided, or other improvements that do not detract from the above
> >> criteria input would be welcomed.
>
> > 1) uint8_t and uint32_t are already standard types so why are you
> > typedefing your own?
>
> Because they are not defined on my platform. MS VS 2008.

I guessed. MS are pretty useless here - It's not like it's difficult
to provide an these so you have to wonder why they don't.
I would just create my own std header and include that.

>
> > 2) Precalculate the state table - It is logically a static const
> > State[][].
>
> I did it this way so that it would be small enough to fit on the stack,
> and also so that it could be included as a header file.

a) A precalculated table wouldn't be any bigger.
b) Why on earth would you want it on the stack rather than static?
c) Why do you want it in a header?

> > 3) Eliminate all the global stuff - It's not comp.lang.c here.
>
> I don't know what you mean, everything is a member of a class.

all the const uint8_t have global scope

> > 4) Your class is logically stateless hence methods (as well as the
> > state array) should be probably be static
>
> Maybe, what is the advantage?

a) you wouldn't keep creating your state table for every instance.
b) It just seems more logical - When I see a class I start wondering
what state it is holding and why. If you don't agree with this that's
fine but then consider why we would have static members in C++ at all?

>
> > 5) It is highly unlikely that these methods would be inlined by the
> > compiler (inline is a hint not an order) and if they were it could be
> > a space wasting disaster. Make them non-inline - The call overhead
> > will be dwarfed by the processing in almost all cases.
>
> The compiler requires inlining if the methods are included in the
> header. I hate creating the typically required pair of files it seems so
> redundant.

The compiler is never REQUIRED to inline anything in fact the keyword
"inline" will almost certainly go the way of "register" and be
deprecated some time in the future.

Consider the code bloat that you'd have if the compiler DID inline
these and you had several calls!!

Separate header/source files are a huge aid to C++ allowing you to
separate interface from implementation. This makes compilation faster
and allows binary updates of dynamic libraries without recompilation.

In practice what most compilers will do with this example is that they
will not inline it but they will create a copy of the used method in
every compilation unit that uses it. At best you make a lot of extra
work for the linker.

Your comments here make me think that you're probably a C# or Java
programmer. Files have a very different role in C++ and until you
understand the term "compilation unit" you are not really getting it.

>
> > 6) Use vector::reserve so that push_back wont do multiple
> > reallocations/copies
>
> Yes. I had initially allocated the worst case memory, then I thought
> that this wasted memory. When I benchmarked it on 100 instances of the
> entire Unicode set, it only took about 30% more time, but, 50% less
> memory. It might be a good idea to at least reserve() the best case
> memory requirement.
>
> > 7) Template the methods and use iterators rather than collections -
> > it's more flexible (you can keep the current signatures as overloads)
> > 8) Why are you using uint8_t as a State instead of an enum???????
> > NextState = 0 !!!???? Same goes for ActionCode.
>
> Yes this is the best suggestion.



--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Leigh Johnston on
"Nick Hounsome" <nick.hounsome(a)gmail.com> wrote in message
news:b3b23981-a551-4643-b1e0-b5c22f72cd1d(a)i28g2000yqa.googlegroups.com...
> On 8 June, 22:03, Peter Olcott <NoS...(a)OCR4Screen.com> wrote:
>> On 6/8/2010 12:44 PM, Nick Hounsome wrote:
>>
>> > On 8 June, 10:54, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>> >> I am aiming to produce the best balance of speed and space efficiency
>> >> against reliability and maintainability placing a higher weight on the
>> >> latter criteria.
>> >> http://www.ocr4screen.com/UTF8.h
>>
>> >> If there are any improvements that can be made within the criteria
>> >> provided, or other improvements that do not detract from the above
>> >> criteria input would be welcomed.
>>
>> > 1) uint8_t and uint32_t are already standard types so why are you
>> > typedefing your own?
>>
>> Because they are not defined on my platform. MS VS 2008.
>
> I guessed. MS are pretty useless here - It's not like it's difficult
> to provide an these so you have to wonder why they don't.
> I would just create my own std header and include that.
>
>>
>> > 2) Precalculate the state table - It is logically a static const
>> > State[][].
>>
>> I did it this way so that it would be small enough to fit on the stack,
>> and also so that it could be included as a header file.
>
> a) A precalculated table wouldn't be any bigger.
> b) Why on earth would you want it on the stack rather than static?
> c) Why do you want it in a header?
>

You can define the table in a header if you make it a static member of a
template class.

/Leigh



--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Leigh Johnston on
"Nick Hounsome" <nick.hounsome(a)gmail.com> wrote in message
news:b3b23981-a551-4643-b1e0-b5c22f72cd1d(a)i28g2000yqa.googlegroups.com...
> On 8 June, 22:03, Peter Olcott <NoS...(a)OCR4Screen.com> wrote:
>> On 6/8/2010 12:44 PM, Nick Hounsome wrote:
>>
>> > On 8 June, 10:54, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>> >> I am aiming to produce the best balance of speed and space efficiency
>> >> against reliability and maintainability placing a higher weight on the
>> >> latter criteria.
>> >> http://www.ocr4screen.com/UTF8.h
>>
>> >> If there are any improvements that can be made within the criteria
>> >> provided, or other improvements that do not detract from the above
>> >> criteria input would be welcomed.
>>
>> > 1) uint8_t and uint32_t are already standard types so why are you
>> > typedefing your own?
>>
>> Because they are not defined on my platform. MS VS 2008.
>
> I guessed. MS are pretty useless here - It's not like it's difficult
> to provide an these so you have to wonder why they don't.
> I would just create my own std header and include that.
>
>>
>> > 2) Precalculate the state table - It is logically a static const
>> > State[][].
>>
>> I did it this way so that it would be small enough to fit on the stack,
>> and also so that it could be included as a header file.
>
> a) A precalculated table wouldn't be any bigger.
> b) Why on earth would you want it on the stack rather than static?
> c) Why do you want it in a header?
>

If you make the table const you should be able to define it in a header file
as const objects at namespace scope have internal linkage unless they are
"extern"ed.

/Leigh


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Peter Olcott on
On 6/9/2010 8:25 AM, Nick Hounsome wrote:
> On 8 June, 22:03, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>> On 6/8/2010 12:44 PM, Nick Hounsome wrote:
>>
>>> On 8 June, 10:54, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>>>> I am aiming to produce the best balance of speed and space efficiency
>>>> against reliability and maintainability placing a higher weight on the
>>>> latter criteria.
>>>> http://www.ocr4screen.com/UTF8.h
>>
>>>> If there are any improvements that can be made within the criteria
>>>> provided, or other improvements that do not detract from the above
>>>> criteria input would be welcomed.
>>
>>> 1) uint8_t and uint32_t are already standard types so why are you
>>> typedefing your own?
>>
>> Because they are not defined on my platform. MS VS 2008.
>
> I guessed. MS are pretty useless here - It's not like it's difficult
> to provide an these so you have to wonder why they don't.
> I would just create my own std header and include that.
>
>>
>>> 2) Precalculate the state table - It is logically a static const
>>> State[][].
>>
>> I did it this way so that it would be small enough to fit on the stack,
>> and also so that it could be included as a header file.
>
> a) A precalculated table wouldn't be any bigger.

The table itself takes much more stack space than a std::vector.

> b) Why on earth would you want it on the stack rather than static?
Data hiding within the function that needs it. I create all of my
classes small enough to fit on the stack, this way I never ever have to
do dynamic memory allocation myself. By never doing dynamic memory
allocation myself all of the possible errors associated with this can
not possibly occur.

> c) Why do you want it in a header?
Tow files is redundant and messy. All of my code is in headers.

>
>>> 3) Eliminate all the global stuff - It's not comp.lang.c here.
>>
>> I don't know what you mean, everything is a member of a class.
>
> all the const uint8_t have global scope
OK what is the best way to do this?
>
>>> 4) Your class is logically stateless hence methods (as well as the
>>> state array) should be probably be static
>>
>> Maybe, what is the advantage?
>
> a) you wouldn't keep creating your state table for every instance.
> b) It just seems more logical - When I see a class I start wondering
> what state it is holding and why. If you don't agree with this that's
> fine but then consider why we would have static members in C++ at all?
Where is static member data stored?

>
>>
>>> 5) It is highly unlikely that these methods would be inlined by the
>>> compiler (inline is a hint not an order) and if they were it could be
>>> a space wasting disaster. Make them non-inline - The call overhead
>>> will be dwarfed by the processing in almost all cases.
>>
>> The compiler requires inlining if the methods are included in the
>> header. I hate creating the typically required pair of files it seems so
>> redundant.
>
> The compiler is never REQUIRED to inline anything in fact the keyword
> "inline" will almost certainly go the way of "register" and be
> deprecated some time in the future.
MS VS always inlines when you "suggest" it to.

>
> Consider the code bloat that you'd have if the compiler DID inline
> these and you had several calls!!
In my case it is probably a singleton, thus no possible code bloat.

>
> Separate header/source files are a huge aid to C++ allowing you to
> separate interface from implementation. This makes compilation faster
> and allows binary updates of dynamic libraries without recompilation.
I just can't get over how messy this redundancy is, especially for tiny
little classes.

>
> In practice what most compilers will do with this example is that they
> will not inline it but they will create a copy of the used method in
> every compilation unit that uses it. At best you make a lot of extra
> work for the linker.

It also allows me to keep all of my code in headers.

>
> Your comments here make me think that you're probably a C# or Java
> programmer. Files have a very different role in C++ and until you
> understand the term "compilation unit" you are not really getting it.

I have been a "C" programmer since K&R was the de facto standard. I took
up C++ more than a decade ago.

>
>>
>>> 6) Use vector::reserve so that push_back wont do multiple
>>> reallocations/copies
>>
>> Yes. I had initially allocated the worst case memory, then I thought
>> that this wasted memory. When I benchmarked it on 100 instances of the
>> entire Unicode set, it only took about 30% more time, but, 50% less
>> memory. It might be a good idea to at least reserve() the best case
>> memory requirement.
>>
>>> 7) Template the methods and use iterators rather than collections -
>>> it's more flexible (you can keep the current signatures as overloads)
>>> 8) Why are you using uint8_t as a State instead of an enum???????
>>> NextState = 0 !!!???? Same goes for ActionCode.
>>
>> Yes this is the best suggestion.

The problem might be that the enum is an int whereas what it must
initialize is a uint8_t.
>
>
>


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]