From: Peter Olcott on
On 5/31/2010 2:17 PM, Sam wrote:
> Peter Olcott writes:
>
>> On 5/31/2010 1:41 PM, Daniel T. wrote:
>>> CodePoint and UTF32[N] are two representations that both refer to the
>>> same piece of knowledge. Why the unnecessary duplication?
>>
>> Here is the best reason:
>>
>> bool UnicodeEncodingConversion::toUTF8
>> (const std::vector<uint32_t>& UTF32,
>> std::vector<uint8_t>& UTF8) {
>>
>> (see the added const ?)
>
> And even better:
>
> template<typename input_iter_t, typename output_iter_t>
> bool toUTF8(input_iter_t beg_iter, input_iter_t end_iter,
> output_iter_t output_iter)
>
> So that your masterpiece could be used with not just vectors, but any
> container, or any suitable stream.
>
> But, I'm sure you have no time to learn all this complicated stuff.
>
>

Yes, you are right. I have not yet spent much time creating my own
templates or using streams. I have focused most of my
attention on perfecting my object oriented design skills, and writing
the fastest code that is very readable and maintainable.

The generalization that you made greatly adds value to this function.
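
For readers following along, here is a minimal sketch of what the iterator-based signature Sam proposed might look like once filled in. Only the signature comes from the thread; the body is an assumption written for illustration, not the code under discussion. It rejects surrogates and values above 10FFFF and writes through any output iterator.

```cpp
#include <cstdint>
#include <iterator>
#include <vector>

// Sketch only: the signature follows Sam's suggestion, the body is assumed.
// Reads code points from [beg_iter, end_iter) and writes UTF-8 bytes through
// output_iter. Returns false at the first surrogate or out-of-range value.
template<typename input_iter_t, typename output_iter_t>
bool toUTF8(input_iter_t beg_iter, input_iter_t end_iter,
            output_iter_t output_iter)
{
    for (; beg_iter != end_iter; ++beg_iter) {
        const std::uint32_t cp = *beg_iter;
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            *output_iter++ = static_cast<std::uint8_t>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            *output_iter++ = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            if (cp >= 0xD800 && cp <= 0xDFFF)
                return false;                  // surrogates are not encodable
            *output_iter++ = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {           // 4 bytes
            *output_iter++ = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
            *output_iter++ = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        } else {
            return false;                      // beyond the Unicode code space
        }
    }
    return true;
}
```

With a vector as the destination, a call could look like toUTF8(in.begin(), in.end(), std::back_inserter(out)); the same template would also accept a std::ostream_iterator or a std::deque.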
From: Daniel T. on
Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
> On 5/31/2010 1:24 PM, Daniel T. wrote:
> > Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
> > > On 5/31/2010 11:35 AM, Daniel T. wrote:
> >
> > > > The codes 10FFFE and 10FFFF are guaranteed not to be unicode
> > > > characters...
> > >
> > > So then Wikipedia is wrong?
> > > http://en.wikipedia.org/wiki/Unicode
> > > 16 100000–10FFFF Supplementary Private Use Area-B
> >
> > According to unicode.org, apparently yes. You'd know that if you
> > hadn't been lazy and only consulted a secondary source.
>
> I simply don't have the time to read all of the Unicode stuff to find
> the two or three paragraphs that I really need to know. I already know
> about High and Low surrogates. Why is the range that you specified not
> valid codepoints?

It took me less than two minutes searching unicode.org to answer the
question you are asking me. I suggest you make the attempt at least.
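
For reference, the Unicode Standard designates the last two code points of every plane (xxFFFE and xxFFFF), plus the range FDD0..FDEF, as noncharacters. That is why 10FFFE and 10FFFF sit inside the block range 100000..10FFFF without being assignable characters. A minimal sketch of such a check follows; the function names are hypothetical and not taken from the code being discussed.

```cpp
#include <cstdint>

// Hypothetical helper: true for the 66 noncharacters defined by Unicode.
inline bool isNoncharacter(std::uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;                      // outside the code space entirely
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;                       // the contiguous block of 32
    return (cp & 0xFFFE) == 0xFFFE;        // last two code points of each plane
}

// Hypothetical helper: true for UTF-16 surrogate code points.
inline bool isSurrogate(std::uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

Whether a converter should reject noncharacters or pass them through is a design choice; surrogates, by contrast, are never valid in UTF-8 or UTF-32.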

> > > So it looks otherwise correct?
> >
> > Does it pass all your tests? You do have tests don't you?
>
> I am using the results of this function to mutually exhaustively test
> the results of another function that does the conversion in the other
> direction. These tests pass.

Then why are you asking us if the algorithm is correct?
From: Daniel T. on
"Leigh Johnston" <leigh(a)i42.co.uk> wrote:
> "Daniel T." <daniel_t(a)earthlink.net> wrote:
>
> > CodePoint and UTF32[N] are two representations that both refer to
> > the same piece of knowledge. Why the unnecessary duplication?
>
> It is not unnecessary *if* there is a noticeable performance
> improvement. I agree however that premature optimization should be
> avoided (obviously) which is why profiling should be performed

I'm glad we agree that the code in question is probably an unnecessary
optimization.

> but it is also a matter of writing clear code which is easy to parse
> (understand). There is no real disadvantage to storing the result of
> "*it" or "UTF32[N]" in a temporary.

The above really is just a matter of opinion. Obviously, my opinion
differs from yours.
From: Peter Olcott on
On 5/31/2010 2:35 PM, Giovanni Dicanio wrote:
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>
>> He needed to find some excuse to denigrate my code. He has had a
>> personal grudge against me for several months. I don't really know
>> what I said to offend him, but, it must have occurred sometime after
>> he sang very high praises about my patent a few months ago.
>
> I don't think so.
>
> Joe helps lots of people here (and is a nice guy in person!).
>
> You must have misunderstood.

No, I really don't think so. He helps lots of people, and other than his
disdain for me may be a really nice guy. He is certainly not speaking
accurately about the quality of my developmental code. The degree of
this inaccuracy indicates a strong negative bias against me.

Other people here have picked apart several aspects of his negative
assessment and thus sided against this negative assessment.

When viewed within the specific context that the sole purpose of this
code is to validate the correctness of the algorithm, the code is
objectively, at the very least, of very good quality.

>
>
>>> std::vector<uint8_t> toUTF8(const std::vector<uint32_t> & utf32);
>>
>> For most compilers this requires making an extra copy.
>
> Before move semantics, I think several C++ compilers implemented the RVO.
>
> Giovanni
>

From: Peter Olcott on
On 5/31/2010 2:41 PM, Daniel T. wrote:
> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>> On 5/31/2010 1:24 PM, Daniel T. wrote:
>>> Peter Olcott<NoSpam(a)OCR4Screen.com> wrote:
>>>> On 5/31/2010 11:35 AM, Daniel T. wrote:
>>>
>>>>> The codes 10FFFE and 10FFFF are guaranteed not to be unicode
>>>>> characters...
>>>>
>>>> So then Wikipedia is wrong?
>>>> http://en.wikipedia.org/wiki/Unicode
>>>> 16 100000–10FFFF Supplementary Private Use Area-B
>>>
>>> According to unicode.org, apparently yes. You'd know that if you
>>> hadn't been lazy and only consulted a secondary source.
>>
>> I simply don't have the time to read all of the Unicode stuff to find
>> the two or three paragraphs that I really need to know. I already know
>> about High and Low surrogates. Why is the range that you specified not
>> valid codepoints?
>
> It took me less than two minutes searching unicode.org to answer the
> question you are asking me. I suggest you make the attempt at least.

OK.

>
>>>> So it looks otherwise correct?
>>>
>>> Does it pass all your tests? You do have tests don't you?
>>
>> I am using the results of this function to mutually exhaustively test
>> the results of another function that does the conversion in the other
>> direction. These tests pass.
>
> Then why are you asking us if the algorithm is correct?

Because they could both be wrong in the same way. If at least one of
them is entirely correct then the mutual test would prove that both are
correct.
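
The "both wrong in the same way" case can be made concrete with a toy pair of functions. The sketch below is an illustration only, with hypothetical names and a deliberately planted bug: the encoder emits the two bytes of a 2-byte sequence in the wrong order and the decoder reads them in the same wrong order. The exhaustive round trip passes, while a single comparison against an independently known byte sequence catches the error.

```cpp
#include <array>
#include <cstdint>

using TwoBytes = std::array<std::uint8_t, 2>;

// Deliberately buggy: writes the continuation byte BEFORE the lead byte.
// Covers only the 2-byte range 0x80..0x7FF to keep the example short.
inline TwoBytes buggyEncode2(std::uint32_t cp)
{
    return { static_cast<std::uint8_t>(0x80 | (cp & 0x3F)),
             static_cast<std::uint8_t>(0xC0 | (cp >> 6)) };
}

// Buggy in the same way: expects the continuation byte first.
inline std::uint32_t buggyDecode2(const TwoBytes& b)
{
    return (static_cast<std::uint32_t>(b[1] & 0x1F) << 6) | (b[0] & 0x3F);
}

// Exhaustive mutual test over the whole 2-byte range.
inline bool roundTripPasses()
{
    for (std::uint32_t cp = 0x80; cp <= 0x7FF; ++cp)
        if (buggyDecode2(buggyEncode2(cp)) != cp)
            return false;
    return true;
}

// Known-answer test: U+00E9 must encode as C3 A9 in UTF-8.
inline bool knownAnswerPasses()
{
    const TwoBytes expected = {0xC3, 0xA9};
    return buggyEncode2(0xE9) == expected;
}
```

Here roundTripPasses() returns true and knownAnswerPasses() returns false, which is why a handful of reference vectors from an independent source is worth adding alongside the mutual test.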