Conversion from UTF32 to UTF8 for review [MFC]

From: Peter Olcott on 8 Jun 2010 13:43

On 6/8/2010 11:25 AM, Joseph M. Newcomer wrote:
> See below...
> On Mon, 7 Jun 2010 15:26:23 -0500, "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote:
>

>> What about the fact that being easier to read makes it more
>> reliable, such that it does not crash like the faster
>> alternative version?
> ***
> A faster version that crashes is not useful. But you did not describe why it failed, so
> there is no way to tell what went wrong.

Ultimately it crashed because it was too difficult to understand. For
example, it may have been too difficult for the original developer to
understand well enough to write correctly. It would have taken more
effort for me to find the problem than to complete my own crashless
version.

My own version was deliberately made robust enough that it would not
crash even if it was incorrect. The kinds of errors that result in
crashes were eliminated at design time. For example, I could not
possibly allocate the wrong amount of memory, because I let std::vector
handle allocation for me. If I had allocated less memory than needed
and had taken your suggestion to convert push_back() to direct indexed
access, the result could have been a crash. For this reason I erred on
the side of reliability over speed.
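
The trade-off described above can be illustrated with a minimal sketch. This is not the code from UTF8.h; the names Byte, AppendUtf8 and Utf32ToUtf8 are hypothetical, and the sketch assumes the input is a std::vector of 32-bit code points. Because every byte is appended with push_back(), the output buffer can never be undersized, whatever the input contains.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: narrows a value already masked to 8 bits.
static std::uint8_t Byte(std::uint32_t v)
{
    return static_cast<std::uint8_t>(v);
}

// Hypothetical helper (not from UTF8.h): appends the UTF-8 encoding of one
// code point to `out`. Returns false and appends nothing if `cp` is a
// surrogate (U+D800..U+DFFF) or lies above U+10FFFF.
bool AppendUtf8(std::uint32_t cp, std::vector<std::uint8_t>& out)
{
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out.push_back(Byte(cp));
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(Byte(0xC0 | (cp >> 6)));
        out.push_back(Byte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;                  // surrogates are not encodable
        out.push_back(Byte(0xE0 | (cp >> 12)));
        out.push_back(Byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(Byte(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(Byte(0xF0 | (cp >> 18)));
        out.push_back(Byte(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(Byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(Byte(0x80 | (cp & 0x3F)));
    } else {
        return false;                      // beyond the Unicode range
    }
    return true;
}

// Converts a whole UTF-32 sequence. On failure, `out` holds the bytes
// produced before the first invalid code point.
bool Utf32ToUtf8(const std::vector<std::uint32_t>& in,
                 std::vector<std::uint8_t>& out)
{
    out.clear();
    for (std::uint32_t cp : in) {
        if (!AppendUtf8(cp, out))
            return false;
    }
    return true;
}
```

A faster variant would size the buffer once and write through an index; that removes the reallocation cost but reintroduces the possibility of writing past the end if the size estimate is wrong, which is the risk being avoided here.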

> ****
>> Twice the speed with crashing is much lower quality than never
>> crashing, everything else being the same.
> ***
> I agree. I used to call this "The Unix quality standard", which translates as "it doesn't
> matter if it is correct, as long as it is smaller and faster". But we don't know why it
> crashed; if you retyped it, there could be a typo in the code. So I'd be inclined to
> figure out why the faster one failed and fix that.
> ****
>>
>> Also, I would not be so sure that no objective measure of
>> readability exists. I have been speaking on the
>> software engineering forum, and quantified and even
>> automated measures of complexity already exist.
> ****
> Back in the 1980s, we were seeing people start to worry about these issues. By the
> mid-1990s, the metrics were highly debated. I have not looked at this problem since the
> late 1990s, so if you have some references, it would be useful to give us some pointers so
> we can see what has been happening in this area.
> joe

Even though there may be no metric for readability, an accurate
quantifiable measure can be provided (see other reply). From this
accurate quantifiable measure, heuristics can be derived.

> ****
>>
>>>
>>> On Wed, 02 Jun 2010 22:44:52 -0500, Peter Olcott
>>> <NoSpam(a)OCR4Screen.com> wrote:
>>>
>>>> On 6/2/2010 6:26 PM, Liviu wrote:
>>>>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>>>>>> On 6/2/2010 4:58 PM, Peter Olcott wrote:
>>>>>>> On 6/2/2010 4:40 PM, Joseph M. Newcomer wrote:
>>>>>>>> ****
>>>>
>>>>> By the way, did you fix the validation bug still present in the
>>>>> latest code you submitted in the other thread?
>>>>>
>>>> This is the ballpark of my final production code:
>>>> http://www.ocr4screen.com/UTF8.h
>>>>
>>>> It is only about half as fast as the tightly written "C" code that
>>>> Hector posted a link to:
>>>> http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
>>>>
>>>> I would estimate that it is about ten-fold easier to fully understand
>>>> my design than it is to completely understand the alternative
>>>> (your mileage may vary).
>>>>
>>>> I have found that maximum readability tends to lead to
>>>> maximum reliability.
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>

From: Joseph M. Newcomer on 8 Jun 2010 19:28

Do we have a link to the "bad code" example?
joe


From: Peter Olcott on 8 Jun 2010 20:12

On 6/8/2010 6:28 PM, Joseph M. Newcomer wrote:
> Do we have a link to the "bad code" example?
> joe

http://bjoern.hoehrmann.de/utf-8/decoder/dfa/


From: Joseph M. Newcomer on 9 Jun 2010 01:33

OK, I'm perplexed. Your code converts UTF-32 to UTF-8. The code you cited converts UTF-8
to UTF-32. How are these comparable?
joe
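
To make the difference in direction concrete, here is a minimal sketch of the decoding direction (UTF-8 to UTF-32). It is a plain branch-based decoder written for illustration, not the table-driven DFA from the cited page and not the code in UTF8.h; the function name Utf8ToUtf32 and its signature are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Returns false at the first ill-formed sequence; `out` then holds the
// code points decoded before the error.
bool Utf8ToUtf32(const std::vector<std::uint8_t>& in,
                 std::vector<std::uint32_t>& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t lead = in[i];
        std::uint32_t cp = 0;      // code point being assembled
        std::size_t extra = 0;     // number of continuation bytes expected
        std::uint32_t min = 0;     // smallest value this length may encode

        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else return false;         // stray continuation byte or invalid lead

        // i < in.size() holds here, so the subtraction cannot wrap.
        if (in.size() - i - 1 < extra)
            return false;          // sequence truncated at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t b = in[i + k];
            if ((b & 0xC0) != 0x80)
                return false;      // not a continuation byte (10xxxxxx)
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range

        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}
```

Decoding has to validate much more than encoding does (continuation bytes, overlong forms, truncation), so timings for one direction say little about the other.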


From: Peter Olcott on 9 Jun 2010 09:28

On 6/9/2010 12:33 AM, Joseph M. Newcomer wrote:
> OK, I'm perplexed. Your code converts UTF-32 to UTF-8. The code you cited converts UTF-8
> to UTF-32. How are these comparable?
> joe

Take a look at my code again; you missed something.
http://www.ocr4screen.com/UTF8.h
