From: Tony Johansson on
Hi!

Here I encode the Spanish character "ñ" to UTF-8, which gives two bytes
with the values 195 and 177, which is understandable.
As we know, a char is a Unicode (UTF-16) code unit, i.e. an unsigned 16-bit integer.
Now to my question: when I run this program, use the debugger and hover
over this ch variable, which is of type char,
it shows 241.
I mean, since a char is Unicode (UTF-16) and this value takes two bytes
when UTF-8 is used, how can the debugger show 241 when I hover over this ch
variable?

using System;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        UTF8Encoding utf8 = new UTF8Encoding();
        string chars = "ñ";
        char ch = 'ñ';      // hovering over ch in the debugger shows 241
        byte[] byteArray = new byte[utf8.GetByteCount(chars)];  // this buffer is immediately replaced,
        byteArray = utf8.GetBytes(chars);                       // because GetBytes returns its own array
        Console.WriteLine(utf8.GetString(byteArray));
    }
}

//Tony


From: Mihai N. on
> Here I encode the Spanish character "ñ" to UTF-8, which gives two bytes
> with the values 195 and 177, which is understandable.
> As we know, a char is a Unicode (UTF-16) code unit, i.e. an unsigned 16-bit integer.
> Now to my question: when I run this program, use the debugger and hover
> over this ch variable, which is of type char,
> it shows 241.
> I mean, since a char is Unicode (UTF-16) and this value takes two bytes
> when UTF-8 is used, how can the debugger show 241 when I hover over this ch
> variable?


The code point of ñ is U+00F1.
This is 0xF1 (or 241 decimal) in UTF-16 or UTF-32, and C3 B1
(195 177 decimal) as UTF-8.
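
If you want to double-check those numbers yourself, here is a minimal
sketch using only the Encoding classes (the class name CodePointDemo is
just for the example):

using System;
using System.Text;

class CodePointDemo
{
    static void Main()
    {
        char ch = 'ñ';                                    // U+00F1
        Console.WriteLine((int)ch);                       // 241 - the value the debugger shows

        byte[] utf8 = Encoding.UTF8.GetBytes(ch.ToString());
        foreach (byte b in utf8)
            Console.Write(b + " ");                       // 195 177
        Console.WriteLine();

        byte[] utf16 = Encoding.Unicode.GetBytes(ch.ToString());
        foreach (byte b in utf16)
            Console.Write(b + " ");                       // 241 0 (little-endian UTF-16)
        Console.WriteLine();
    }
}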

You can have some fun starting with the table here:
http://en.wikipedia.org/wiki/UTF-8#Description

195 177 decimal = C3 B1 hex = 11000011 10110001 binary
Now you take the binary and compare it to the UTF-8 pattern:
11000011 10110001
110yyyxx 10xxxxxx (second line in the table)
So you extract the useful bits (the yyyxx xxxxxx above) and get
00011 110001
Together that is 00011110001, or split into groups of four,
000.1111.0001. That is exactly 0xF1 (241).
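
And if you want to see the same extraction in code, here is a minimal
sketch that rebuilds the code point by hand from the two UTF-8 bytes,
using just the masks and shift implied by the pattern above:

using System;

class Utf8DecodeDemo
{
    static void Main()
    {
        byte b1 = 0xC3;                       // 195 = 11000011 (lead byte, 110yyyxx)
        byte b2 = 0xB1;                       // 177 = 10110001 (continuation byte, 10xxxxxx)

        int codePoint = ((b1 & 0x1F) << 6)    // keep the low 5 bits of the lead byte
                      | (b2 & 0x3F);          // keep the low 6 bits of the continuation byte

        Console.WriteLine("U+{0:X4} = {1}", codePoint, codePoint);   // U+00F1 = 241
    }
}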





--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email