From: Peter Olcott on
On 6/2/2010 6:33 PM, Joseph M. Newcomer wrote:
> See below....
> On Wed, 02 Jun 2010 16:58:49 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>
>> On 6/2/2010 4:40 PM, Joseph M. Newcomer wrote:
>>> See below...
>>> On Wed, 02 Jun 2010 10:52:28 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>>
>>>> On 6/1/2010 1:04 PM, Joseph M. Newcomer wrote:
>>>>> See below...
>>>>> On Tue, 01 Jun 2010 10:34:40 -0500, Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>>>>
>>>>>> If you measure my code against the incorrect standard that it was
>>>>>> specifically written to be the fastest possible code, even then it
>>>>>> is not abysmal. All of the performance improvements that you suggested
>>>>>> don't result in as much as a doubling in speed.
>>>>>> http://www.ocr4screen.com/UTF8.cpp
>>>>>>
>>>>>> From benchmarking my code against the code that Hector posted a link to
>>>>>> http://bjoern.hoehrmann.de:80/utf-8/decoder/dfa/
>>>>>> This other code was only 37% faster.
>>>>> *****
>>>>> "Only" 37% faster? Actually 37% is a pretty big number in terms of performance! Most
>>>>> attempts to "improve" performance are lucky if they get single-digit percentage
>>>>> improvement. As someone who spent a nontrivial amount of his life worrying about these
>>>>> issues, I can say that 37% is a SUBSTANTIAL performance improvement!
>>>>>
>>>>> And if it were 1% faster, it would still prove your code was not the fastest possible. But
>>>>> 37%? You aren't even in the running in this contest!
>>>>> joe
>>>>
>>>> I did a better job of benchmarking and I made changes that slowed my
>>>> code down, removing the gotos and using a std::vector<uint8_t> instead
>>>> of the NUL-terminated uint8_t*. His code is now 267% faster than mine.
>>> ****
>>> Code that is faster by a factor of 3 means your code is not even worth looking at. Nobody
>>> will care whether it is correct or not, if it is nearly 3 times slower than an
>>> alternative.
>>>
>>> We would sweat blood to get a 10% improvement. So if you are not competitive with some
>>> other code by a factor of 3, don't even waste your time. Use the other code.
>>> joe
>>
>> Your experience is apparently not at all typical. Much of the code that
>> I have reviewed at places where I have worked doesn't even aim for
>> anything better than about 500% slower than the fastest code. Most often
>> this code is about 20-fold or more slower than the fastest code. I once
>> rewrote a method that my supervisor had written, for a 120-fold improvement
>> in speed: from 2.5 minutes to 1.25 seconds of wall-clock time.
>>
>> The problem is that many programmers don't even think about making their
>> code fast, they only think about getting it working. They figure that
>> once it's working, they can make it faster, but this never happens
>> because as soon as it works, they get their next assignment.
> ****
> Maybe it was because we wrote code we thought was as fast as possible, then had to make it
> even faster. We used to refer to it as going into "blood from turnips" mode.

I believe that this was your experience, but your experience was biased by
the extraordinary degree of knowledge and intelligence that you have. You
were not a COBOL programmer trying to change the formatting of the monthly
sales report, as most programmers were at the time. When you are in the
top 0.1% elite group, you do not get a view of what is truly typical in
the industry.

> ****
>>
>>> ****
>>>>
>>>> The benchmark timed how long it took to decode 100 instances of the
>>>> entire Unicode set encoded as UTF-8.
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
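The benchmark described above (decoding 100 copies of the entire Unicode set encoded as UTF-8) can be sketched roughly as follows. This is a minimal C++11 sketch, not the harness either poster used: `EncodeUtf8`, `BuildCorpus`, `DecodeUtf8`, `TimeDecoding` and `BenchResult` are hypothetical names, "the entire Unicode set" is assumed to mean every scalar value U+0000..U+10FFFF except the surrogates U+D800..U+DFFF, and `DecodeUtf8` is a plain bit-arithmetic decoder standing in for whichever decoder is being measured. Because the corpus is held in a `std::vector<uint8_t>` with an explicit length, U+0000 can be included; a NUL-terminated buffer would stop at the first 0x00 byte.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the UTF-8 encoding of scalar value cp to out.
void EncodeUtf8(uint32_t cp, std::vector<uint8_t>& out) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out.push_back(static_cast<uint8_t>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    }
}

// Test corpus (assumption): every scalar value, surrogates excluded.
std::vector<uint8_t> BuildCorpus() {
    std::vector<uint8_t> corpus;
    for (uint32_t cp = 0; cp <= 0x10FFFF; ++cp)
        if (cp < 0xD800 || cp > 0xDFFF) EncodeUtf8(cp, corpus);
    return corpus;
}

// Hypothetical stand-in decoder: decodes the whole buffer into out and
// returns false on the first ill-formed sequence.
bool DecodeUtf8(const std::vector<uint8_t>& in, std::vector<uint32_t>& out) {
    out.clear();
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        const uint8_t b = in[i];
        if (b < 0x80) { out.push_back(b); ++i; continue; }
        uint32_t cp, min;
        std::size_t len;
        if ((b & 0xE0) == 0xC0)      { cp = b & 0x1F; len = 2; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; min = 0x10000; }
        else return false;                        // continuation byte or F8..FF
        if (n - i < len) return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((in[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        // Reject overlong forms, values above U+10FFFF, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        out.push_back(cp);
        i += len;
    }
    return true;
}

struct BenchResult { bool ok; long long ms; std::size_t codePoints; };

// Decodes the corpus `passes` times and reports elapsed wall-clock time.
BenchResult TimeDecoding(const std::vector<uint8_t>& corpus, int passes) {
    std::vector<uint32_t> decoded;
    decoded.reserve(corpus.size());  // at most one code point per byte: no reallocation while timing
    bool ok = true;
    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p) ok = DecodeUtf8(corpus, decoded) && ok;
    const auto stop = std::chrono::steady_clock::now();
    const long long ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return {ok, ms, decoded.size()};
}
```

Calling `TimeDecoding(BuildCorpus(), 100)` from a `main` gives the shape of the benchmark described in the thread. To compare two decoders fairly, each would be adapted to the same `DecodeUtf8` signature and measured in an optimized build.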

From: Peter Olcott on
On 6/2/2010 6:26 PM, Liviu wrote:
> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>> On 6/2/2010 4:58 PM, Peter Olcott wrote:
>>> On 6/2/2010 4:40 PM, Joseph M. Newcomer wrote:
>>>> ****

> By the way, did you fix the validation bug still present in the latest
> code you submitted in the other thread?
>
This is roughly what my final production code looks like:
http://www.ocr4screen.com/UTF8.h

It is only about half as fast as the tightly written "C" code that
Hector posted a link to:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

I would estimate that it is about ten-fold easier to fully understand my
design than to completely understand the alternative
(your mileage may vary).

I have found that maximum readability tends to lead to maximum reliability.

From: Liviu on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote...
> On 6/2/2010 6:26 PM, Liviu wrote:
>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>
> No, what probably makes Joe's case atypical is the degree of knowledge
> and expertise that he has. Very few developers have PhDs in computer
> science.

I have no degree or postgraduate degree in CS per se, yet Joe's point makes
perfect sense to me. You put too much emphasis on paper affidavits.

>> By the way, did you fix the validation bug still present in the
>> latest code you submitted in the other thread?
>
> There was no validation bug that I am aware of, and in fact the code
> produces results identical to those of the excellent link that Hector
> posted. I did, however, add the enhancement of rejecting the surrogate
> range 0xD800 to 0xDFFF.

Go back and reread my last post there. If that is not a validation
bug, then by all means clarify how you define "bug" or "aware of".

>> As a blanket statement the above is not only false and unproved
>
> Turbo Pascal 3.0 against the MS Pascal compiler at the time.

OK, this one is a more specific statement, which I have no comments on,
since I haven't used either. But did you really expect readers on an MFC
forum to magically guess that you were referring to Pascal compilers?

Liviu



From: Peter Olcott on
On 6/2/2010 11:10 PM, Liviu wrote:
> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>> On 6/2/2010 6:26 PM, Liviu wrote:
>>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>>
>> No, what probably makes Joe's case atypical is the degree of knowledge
>> and expertise that he has. Very few developers have PhDs in computer
>> science.
>
> I have no degree or postgraduate degree in CS per se, yet Joe's point makes
> perfect sense to me. You put too much emphasis on paper affidavits.

There are other valid criteria; it is not just paper documents, it
is also his apparent IQ.

>
>>> By the way, did you fix the validation bug still present in the
>>> latest code you submitted in the other thread?
>>
>> There was no validation bug that I am aware of, and in fact the code
>> produces results identical to those of the excellent link that Hector
>> posted. I did, however, add the enhancement of rejecting the surrogate
>> range 0xD800 to 0xDFFF.
>
> Go back and reread my last post there. If that is not a validation
> bug, then by all means clarify how you define "bug" or "aware of".

The current code (that I just posted) is to the best of my knowledge
entirely correct in every way. By "entirely correct in every way" I mean
that, to the best of my knowledge, no possible input ever results in
incorrect output, and every possible invalid input is rejected as
invalid. One more thing: the program always halts correctly.
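Claims like these (valid input always decodes correctly, every invalid input is rejected, the program always halts) can be partly checked by brute force rather than by inspection. The following is a minimal C++11 sketch under stated assumptions: `IsWellFormedUtf8`, `PackedUtf8` and `CountDisagreements` are hypothetical names, the validator is written from the Unicode Standard's table of well-formed UTF-8 byte sequences (Table 3-7) rather than taken from UTF8.h, and the oracle treats an input as valid exactly when it splits into encodings of Unicode scalar values (so the surrogates U+D800..U+DFFF are excluded). Only inputs of one to three bytes are enumerated; covering every four-byte input the same way would mean about 4.3 billion cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical validator following Unicode Table 3-7.
// Returns true if p[0..n) is entirely well-formed UTF-8.
bool IsWellFormedUtf8(const uint8_t* p, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {                     // each iteration advances i or returns
        const uint8_t b = p[i];
        if (b <= 0x7F) { ++i; continue; }
        std::size_t len;                // total length of this sequence
        uint8_t lo = 0x80, hi = 0xBF;   // allowed range of the second byte
        if (b >= 0xC2 && b <= 0xDF)                                  len = 2;
        else if (b == 0xE0)                                          { len = 3; lo = 0xA0; }  // no overlongs
        else if ((b >= 0xE1 && b <= 0xEC) || b == 0xEE || b == 0xEF) len = 3;
        else if (b == 0xED)                                          { len = 3; hi = 0x9F; }  // no surrogates
        else if (b == 0xF0)                                          { len = 4; lo = 0x90; }  // no overlongs
        else if (b >= 0xF1 && b <= 0xF3)                             len = 4;
        else if (b == 0xF4)                                          { len = 4; hi = 0x8F; }  // max U+10FFFF
        else return false;              // 80..C1 and F5..FF never start a sequence
        if (n - i < len) return false;  // truncated sequence
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k < len; ++k)
            if (p[i + k] < 0x80 || p[i + k] > 0xBF) return false;
        i += len;
    }
    return true;
}

// Packs the UTF-8 encoding of a BMP scalar value into an integer (first byte
// most significant) and stores the byte count in len.
uint32_t PackedUtf8(uint32_t cp, int& len) {
    assert(cp < 0x10000 && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) { len = 1; return cp; }
    if (cp < 0x800) { len = 2; return ((0xC0 | (cp >> 6)) << 8) | (0x80 | (cp & 0x3F)); }
    len = 3;
    return ((0xE0 | (cp >> 12)) << 16) | ((0x80 | ((cp >> 6) & 0x3F)) << 8) |
           (0x80 | (cp & 0x3F));
}

// Compares the validator with the oracle on every input of 1 to 3 bytes and
// returns the number of disagreements (0 means full agreement).
long long CountDisagreements() {
    std::vector<bool> one(1u << 8), two(1u << 16), three(1u << 24);
    for (uint32_t cp = 0; cp < 0x10000; ++cp) {
        if (cp >= 0xD800 && cp <= 0xDFFF) continue;  // surrogates are not scalar values
        int len = 0;
        const uint32_t packed = PackedUtf8(cp, len);
        if (len == 1) one[packed] = true;
        else if (len == 2) two[packed] = true;
        else three[packed] = true;
    }
    long long bad = 0;
    uint8_t buf[3];
    for (uint32_t v = 0; v < (1u << 8); ++v) {
        buf[0] = static_cast<uint8_t>(v);
        const bool expect = one[v];
        if (IsWellFormedUtf8(buf, 1) != expect) ++bad;
    }
    for (uint32_t v = 0; v < (1u << 16); ++v) {
        buf[0] = static_cast<uint8_t>(v >> 8);
        buf[1] = static_cast<uint8_t>(v);
        const bool expect = two[v] || (one[buf[0]] && one[buf[1]]);
        if (IsWellFormedUtf8(buf, 2) != expect) ++bad;
    }
    for (uint32_t v = 0; v < (1u << 24); ++v) {
        buf[0] = static_cast<uint8_t>(v >> 16);
        buf[1] = static_cast<uint8_t>(v >> 8);
        buf[2] = static_cast<uint8_t>(v);
        // Possible splits: [3], [1,2], [2,1], [1,1,1].
        const bool expect = three[v] || (one[buf[0]] && two[v & 0xFFFF]) ||
                            (two[v >> 8] && one[buf[2]]) ||
                            (one[buf[0]] && one[buf[1]] && one[buf[2]]);
        if (IsWellFormedUtf8(buf, 3) != expect) ++bad;
    }
    return bad;
}
```

Because every pass through the validator's loop either advances `i` by at least one byte or returns, it terminates on every input, which is the "always halts" property in its simplest form. Saying anything about the code in UTF8.h itself would require adapting that code to the same signature and rerunning the comparison against it.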

>
>>> As a blanket statement the above is not only false and unproved
>>
>> Turbo Pascal 3.0 against the MS Pascal compiler at the time.
>
> OK, this one is a more specific statement, which I have no comments on,
> since I haven't used either. But did you really expect readers on an MFC
> forum to magically guess that you were referring to Pascal compilers?

I did not expect them to assume that I was lying. Assuming that I am
lying is certainly far too rude.


From: Liviu on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote...
> On 6/2/2010 11:10 PM, Liviu wrote:
>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>>> On 6/2/2010 6:26 PM, Liviu wrote:
>>>
>>>> By the way, did you fix the validation bug still present in the
>>>> latest code you submitted in the other thread?
>>>
>>> There was no validation bug that I am aware of
>>
>> Go back and read again my last post there.
>
> The current code (that I just posted) is to the best of my knowledge
> entirely correct in every way.

Don't know and don't really care about your "current code (that I
just posted)". My point, as clearly stated, was about your previous
code from 3+ days ago where the bug definitely existed. After I
brought that up, you still repeated in other posts that "it worked
correctly" (with the expected "typo" excuses, of course).

I don't see how your new claim now, that the just modified code
"is to the best of my knowledge entirely correct in every way",
has any bearing on the particular point that you are either obtuse
or disingenuous about that previous bug, and your awareness of it.

Liviu