From: Arne Vajhøj on
On 19-03-2010 14:25, Jeff Johnson wrote:
> "Peter Duniho"<no.peted.spam(a)no.nwlink.spam.com> wrote in message
> news:uIZGlx3xKHA.5364(a)TK2MSFTNGP05.phx.gbl...
>>>> I would not consider Unicode an encoding.
>>>
>>> Uh, why? An encoding is simply a means of associating a set of bytes with
>>> the characters they represent. That's what Unicode does.
>>
>> I believe Arne's point is that "Unicode" by itself does not describe a way
>> to encode characters as bytes. There are specific encodings within
>> Unicode (as part of the standard): UTF-8, UTF-16, and UTF-32. But Unicode
>> by itself describes a collection of valid characters, not how they are
>> encoded as bytes.
>
> Ah. I just go with the convention that "Unicode" by itself, at least in the
> .NET world, means UTF-16LE.

It is relatively common in the traditional Win32 C++ context.

I hope that it is not so common in the .NET context. The docs for
String and Char very specifically say that it is Unicode in the
UTF-16 encoding.

But it may very well be the original poster's interpretation
as well, because he listed Unicode but not UTF-16.
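
For illustration, a minimal sketch of what those docs mean (the
class name is made up for the example): a .NET Char is a UTF-16
code unit, not a whole Unicode character, so a code point outside
the Basic Multilingual Plane takes two Chars.

using System;

class CharVsCodePoint
{
    static void Main()
    {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is a single Unicode code
        // point, but in UTF-16 it needs a surrogate pair, i.e. two
        // Char values.
        string clef = "\U0001D11E";
        Console.WriteLine(clef.Length);                            // 2
        Console.WriteLine(char.IsSurrogatePair(clef[0], clef[1])); // True
    }
}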

Arne

From: Tony Johansson on
"Arne Vajh�j" <arne(a)vajhoej.dk> skrev i meddelandet
news:4ba42194$0$276$14726298(a)news.sunsite.dk...
> On 19-03-2010 14:25, Jeff Johnson wrote:
>> "Peter Duniho"<no.peted.spam(a)no.nwlink.spam.com> wrote in message
>> news:uIZGlx3xKHA.5364(a)TK2MSFTNGP05.phx.gbl...
>>>>> I would not consider Unicode an encoding.
>>>>
>>>> Uh, why? An encoding is simply a means of associating a set of bytes
>>>> with
>>>> the characters they represent. That's what Unicode does.
>>>
>>> I believe Arne's point is that "Unicode" by itself does not describe a
>>> way
>>> to encode characters as bytes. There are specific encodings within
>>> Unicode (as part of the standard): UTF-8, UTF-16, and UTF-32. But
>>> Unicode
>>> by itself describes a collection of valid characters, not how they are
>>> encoded as bytes.
>>
>> Ah. I just go with the convention that "Unicode" by itself, at least in
>> the
>> .NET world, means UTF-16LE.
>
> It is relatively common in the traditional Win32 C++ context.
>
> I hope that it is not so common in the .NET context. The docs for
> String and Char very specifically say that it is Unicode in the
> UTF-16 encoding.
>
> But it may very well be the original poster's interpretation
> as well, because he listed Unicode but not UTF-16.
>
> Arne

I used the different encodings that the Encoding class provides.
There, Unicode was UTF-16.
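
For example, a minimal sketch (class name invented for the example)
comparing the bytes that a few of those Encoding members produce
for the same string:

using System;
using System.Text;

class EncodingBytes
{
    static void Main()
    {
        string s = "A"; // U+0041
        // Encoding.Unicode is UTF-16 little-endian in .NET.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s))); // 41-00
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));    // 41
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(s)));   // 41-00-00-00
    }
}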

//Tony


From: Jeff Johnson on

"Arne Vajh�j" <arne(a)vajhoej.dk> wrote in message
news:4ba42087$0$276$14726298(a)news.sunsite.dk...

>>> I would not consider Unicode an encoding.
>>
>> Uh, why? An encoding is simply a means of associating a set of bytes with
>> the characters they represent. That's what Unicode does.
>
> No.
>
> Unicode is a mapping between the various symbols and a number.
>
> Encoding is the mapping between the number and 1-many bytes.

Right, but consider this little gem:

===========
Encoding.Unicode Property

Gets an encoding for the UTF-16 format using the little-endian byte order.
===========

I think people can be forgiven for equating the two, especially in the
context of .NET code, since Microsoft plainly made it look that way.
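
A small sketch of that documented behavior, contrasting it with the
companion Encoding.BigEndianUnicode property (class name made up
for the example):

using System;
using System.Text;

class ByteOrder
{
    static void Main()
    {
        string s = "\u20AC"; // U+20AC EURO SIGN
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s)));          // AC-20 (little-endian)
        Console.WriteLine(BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(s))); // 20-AC (big-endian)
    }
}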