From: Peter Olcott on
On 5/31/2010 2:50 PM, Daniel T. wrote:
> "Leigh Johnston"<leigh(a)i42.co.uk> wrote:
>> "Daniel T."<daniel_t(a)earthlink.net> wrote:
>>
>>> CodePoint and UTF32[N] are two representations that both refer to
>>> the same piece of knowledge. Why the unnecessary duplication?
>>
>> It is not unnecessary *if* there is a noticeable performance
>> improvement. I agree however that premature optimization should be
>> avoided (obviously) which is why profiling should be performed
>
> I'm glad we agree that the code in question is probably an unnecessary
> optimization.
>
>> but it is also a matter of writing clear code which is easy to parse
>> (understand). There is no real disadvantage to storing the result of
>> "*it" or "UTF32[N]" in a temporary.
>
> The above really is just a matter of opinion. Obviously, my opinion
> differs from yours.

const correctness requires the "extra" CodePoint variable.

From: Leigh Johnston on


"Daniel T." <daniel_t(a)earthlink.net> wrote in message
news:daniel_t-C210FF.15502731052010(a)70-3-168-216.pools.spcsdns.net...
> "Leigh Johnston" <leigh(a)i42.co.uk> wrote:
>> "Daniel T." <daniel_t(a)earthlink.net> wrote:
>>
>> > CodePoint and UTF32[N] are two representations that both refer to
>> > the same piece of knowledge. Why the unnecessary duplication?
>>
>> It is not unnecessary *if* there is a noticeable performance
>> improvement. I agree however that premature optimization should be
>> avoided (obviously) which is why profiling should be performed
>
> I'm glad we agree that the code in question is probably an unnecessary
> optimization.

I didn't say that; it is unclear whether the optimization is necessary.
Whether it is can be determined through profiling and/or examining
the compiler's assembler output.

/Leigh

From: Peter Olcott on
On 5/31/2010 2:41 PM, Daniel T. wrote:
> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>> On 5/31/2010 1:24 PM, Daniel T. wrote:
>>> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>>> On 5/31/2010 11:35 AM, Daniel T. wrote:
>>>
>>>>> The codes 10FFFE and 10FFFF are guaranteed not to be unicode
>>>>> characters...
>>>>
>>>> So then Wikipedia is wrong?
>>>> http://en.wikipedia.org/wiki/Unicode
>>>> 16 100000–10FFFF Supplementary Private Use Area-B
>>>
>>> According to unicode.org, apparently yes. You'd know that if you
>>> hadn't been lazy and only consulted a secondary source.
>>
>> I simply don't have the time to read all of the Unicode stuff to find
>> the two or three paragraphs that I really need to know. I already know
>> about High and Low surrogates. Why is the range that you specified not
>> valid codepoints?
>
> It took me less than two minutes searching unicode.org to answer the
> question you are asking me. I suggest you make the attempt at least.

http://unicode.org/charts/PDF/U100000.pdf

The Private Use Area does not contain any character assignments,
consequently no character code charts or namelists are provided for this
area. However, the two code locations at the end of each plane are
designated non-characters.
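The rule quoted above can be written as a one-line check. This is only a sketch: the helper name is_plane_end_noncharacter is hypothetical and does not come from the thread, and it covers only the two plane-final values per plane (the separate noncharacter block U+FDD0..U+FDEF is left out). It also shows how the chart and the Wikipedia table can both be read consistently: plane 16 spans 100000..10FFFF, but only 100000..10FFFD is private use.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): true for the last two code
// points of any of the 17 planes, e.g. U+FFFE, U+FFFF, U+10FFFE, U+10FFFF.
constexpr bool is_plane_end_noncharacter(std::uint32_t cp) {
    // Low 16 bits are FFFE or FFFF, and the value is inside the code space.
    return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}

// Plane 16: U+100000..U+10FFFD is private use, the last two values are not.
static_assert(!is_plane_end_noncharacter(0x100000), "private use");
static_assert(!is_plane_end_noncharacter(0x10FFFD), "private use");
static_assert(is_plane_end_noncharacter(0x10FFFE), "noncharacter");
static_assert(is_plane_end_noncharacter(0x10FFFF), "noncharacter");
```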

It is almost as if the biblical story of the Tower of Babel was
literally true, and human language (including Unicode) was deliberately
made much more complex than necessary.

>
>>>> So it looks otherwise correct?
>>>
>>> Does it pass all your tests? You do have tests don't you?
>>
>> I am using the results of this function to mutually exhaustively test
>> the results of another function that does the conversion in the other
>> direction. These tests pass.
>
> Then why are you asking us if the algorithm is correct?
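For readers who want to see what such a mutual round-trip test might look like, here is a minimal sketch. The functions encode_utf8, decode_utf8 and count_round_trip_failures are hypothetical stand-ins, not the code under discussion, which is not shown in this thread. The sketch assumes the two functions convert between UTF-32 and UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical encoder: appends the UTF-8 form of cp to out.
// Returns false for surrogates and for values above U+10FFFF.
bool encode_utf8(std::uint32_t cp, std::string& out) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical decoder for a string holding exactly one encoded code point.
// Returns false on malformed, overlong, surrogate, or out-of-range input.
bool decode_utf8(const std::string& in, std::uint32_t& cp) {
    if (in.empty()) return false;
    const unsigned char lead = static_cast<unsigned char>(in[0]);
    std::size_t len = 0;
    std::uint32_t smallest = 0;  // smallest value allowed for this length
    if (lead < 0x80)                { len = 1; smallest = 0;       cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; smallest = 0x80;    cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; smallest = 0x800;   cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; smallest = 0x10000; cp = lead & 0x07; }
    else return false;
    if (in.size() != len) return false;
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < smallest || cp > 0x10FFFF) return false;
    return !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Round-trips every code point except the surrogates and returns the
// number of values that did not come back unchanged.
std::uint32_t count_round_trip_failures() {
    std::uint32_t failures = 0;
    for (std::uint32_t cp = 0; cp <= 0x10FFFF; ++cp) {
        if (cp >= 0xD800 && cp <= 0xDFFF) continue;  // not encodable
        std::string bytes;
        std::uint32_t back = 0;
        const bool ok = encode_utf8(cp, bytes) && decode_utf8(bytes, back);
        if (!ok || back != cp) ++failures;
    }
    return failures;
}
```

A passing round trip shows only that the two directions agree with each other, so it is probably worth pairing it with a few independently known encodings (U+20AC is E2 82 AC, for instance).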

From: Daniel T. on
Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
> On 5/31/2010 2:50 PM, Daniel T. wrote:
> > "Leigh Johnston"<leigh(a)i42.co.uk> wrote:
> >> "Daniel T."<daniel_t(a)earthlink.net> wrote:
> >>
> >>> CodePoint and UTF32[N] are two representations that both refer to
> >>> the same piece of knowledge. Why the unnecessary duplication?
> >>
> >> It is not unnecessary *if* there is a noticeable performance
> >> improvement. I agree however that premature optimization should be
> >> avoided (obviously) which is why profiling should be performed
> >
> > I'm glad we agree that the code in question is probably an unnecessary
> > optimization.
> >
> >> but it is also a matter of writing clear code which is easy to parse
> >> (understand). There is no real disadvantage to storing the result of
> >> "*it" or "UTF32[N]" in a temporary.
> >
> > The above really is just a matter of opinion. Obviously, my opinion
> > differs from yours.
>
> const correctness requires the "extra" CodePoint variable.

false.
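A sketch of why the extra variable is not needed for const correctness, assuming the data arrives as a reference to const (the function names and the std::vector parameter are assumptions for illustration; only the names UTF32, N and CodePoint come from the thread). Indexing a const container already yields a const reference, so both variants below give the same guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function. Because UTF32 is a reference to const, UTF32[N]
// is a const reference and cannot be used to modify the element.
std::size_t count_supplementary_direct(const std::vector<std::uint32_t>& UTF32) {
    std::size_t count = 0;
    for (std::size_t N = 0; N < UTF32.size(); ++N) {
        if (UTF32[N] > 0xFFFF) ++count;  // "UTF32[N] = 0;" would not compile
    }
    return count;
}

// The same loop with the named const temporary from the discussion.
std::size_t count_supplementary_temp(const std::vector<std::uint32_t>& UTF32) {
    std::size_t count = 0;
    for (std::size_t N = 0; N < UTF32.size(); ++N) {
        const std::uint32_t CodePoint = UTF32[N];
        if (CodePoint > 0xFFFF) ++count;
    }
    return count;
}
```

Whether the named temporary reads better is the matter of opinion argued above; the const guarantee is the same either way.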
From: Daniel T. on
In article <j9OdnTjci7caiJnRnZ2dnUVZ8k-dnZ2d(a)giganews.com>,
"Leigh Johnston" <leigh(a)i42.co.uk> wrote:

> "Daniel T." <daniel_t(a)earthlink.net> wrote in message
> news:daniel_t-C210FF.15502731052010(a)70-3-168-216.pools.spcsdns.net...
> > "Leigh Johnston" <leigh(a)i42.co.uk> wrote:
> >> "Daniel T." <daniel_t(a)earthlink.net> wrote:
> >>
> >> > CodePoint and UTF32[N] are two representations that both refer to
> >> > the same piece of knowledge. Why the unnecessary duplication?
> >>
> >> It is not unnecessary *if* there is a noticeable performance
> >> improvement. I agree however that premature optimization should be
> >> avoided (obviously) which is why profiling should be performed
> >
> > I'm glad we agree that the code in question is probably an unnecessary
> > optimization.
>
> I didn't say that; it is unclear whether the optimization is necessary.
> Whether it is can be determined through profiling and/or examining
> the compiler's assembler output.

Fine, but you do agree that it is an optimization; the only doubt you
hold here is whether or not it is necessary. Since no tests have been
presented showing that code without the extra variable needs
optimizing, this is, by definition, a premature optimization.
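One way to settle the question with a measurement rather than an argument is a small timing harness. The sketch below is an assumption-laden illustration: elapsed_ms, Timing and compare_variants are hypothetical names, the loop body stands in for the real conversion code, and a proper profiler or the compiler's assembler output would be more reliable than wall-clock timing of a loop this small.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: calls f() `repeats` times and returns the elapsed
// milliseconds. Results are added to `sink` so the work is observable.
template <typename F>
double elapsed_ms(F f, int repeats, std::size_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) sink += f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timing {
    double direct_ms;      // loop that indexes UTF32[N] directly
    double temp_ms;        // loop that copies into a const CodePoint first
    std::size_t checksum;  // sum of all results, to confirm both did the work
};

Timing compare_variants(const std::vector<std::uint32_t>& UTF32, int repeats) {
    auto direct = [&UTF32]() -> std::size_t {
        std::size_t n = 0;
        for (std::size_t N = 0; N < UTF32.size(); ++N)
            if (UTF32[N] > 0xFFFF) ++n;
        return n;
    };
    auto with_temp = [&UTF32]() -> std::size_t {
        std::size_t n = 0;
        for (std::size_t N = 0; N < UTF32.size(); ++N) {
            const std::uint32_t CodePoint = UTF32[N];
            if (CodePoint > 0xFFFF) ++n;
        }
        return n;
    };
    Timing t;
    std::size_t sink = 0;
    t.direct_ms = elapsed_ms(direct, repeats, sink);
    t.temp_ms = elapsed_ms(with_temp, repeats, sink);
    t.checksum = sink;
    return t;
}
```

With optimization enabled a compiler may well emit identical code for both loops, in which case any difference in the two timings is noise.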