From: Tony Johansson on
Hi!

This Unicode UTF-8 can use up to 24 bits for encoding a character. UTF-8
supports almost all languages, so what is the reason
to use another Unicode encoding than UTF-8?

//Tony


From: Tony Johansson on
"Tony Johansson" <johansson.andersson(a)telia.com> wrote in message
news:OU1ZNyCzKHA.5940(a)TK2MSFTNGP02.phx.gbl...
> Hi!
>
> This Unicode UTF-8 can use up to 24 bit for encoding. UTF-8 support almost
> all languages so what is the reason
> to use another Unicode then this UTF-8.
>
> //Tony

I must correct myself: UTF-8 was originally defined for sequences of up to 6
bytes (48 bits), although the current standard (RFC 3629) limits it to 4 bytes.

//Tony


From: Maate on
On 25 Mar., 16:09, "Tony Johansson" <johansson.anders...(a)telia.com>
wrote:
> "Tony Johansson" <johansson.anders...(a)telia.com> wrote in message
> news:OU1ZNyCzKHA.5940(a)TK2MSFTNGP02.phx.gbl...
>
> > Hi!
>
> > This Unicode UTF-8 can use up to 24 bit for encoding. UTF-8 support almost
> > all languages so what is the reason
> > to use another Unicode then this UTF-8.
>
> > //Tony
>
> I must correct myself UTF-8 can use up to 48-bit.
>
> //Tony

Hey, I'm not sure, but I would guess that UTF-8 is slightly more
expensive to parse than other Unicode encodings. For example, when
reading UTF-16 encoded text the parser would know that it has to read
exactly two bytes per character. With UTF-8, on the other hand, the
number of bytes to read per character depends on the information
stored in the individual bits. Consider just a simple example: this
code in C#, "my test string".Substring(5, 1), is easy to evaluate in
UTF-16, but with UTF-8 the parser would have to walk the individual
characters from the beginning in order to determine which bytes
actually represent character number 5 - perhaps making it at least 5
times as expensive. This probably also explains why, for example, the
.NET CLR stores text as UTF-16 internally - it presumably makes it
easier (more performant) to manipulate and search text.
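The variable byte counts are easy to check; here is a quick sketch in Python (chosen just for brevity - the same reasoning applies to the C# snippet above; the sample characters are arbitrary picks from different Unicode ranges):

```python
# Illustration: per-character byte counts under UTF-8 vs. UTF-16.
# UTF-8 spends 1-4 bytes per character depending on the code point,
# so indexing by character requires scanning from the start.
for ch in ["a", "é", "€", "𝄞"]:
    print("U+%04X" % ord(ch),
          len(ch.encode("utf-8")), "bytes in UTF-8,",
          len(ch.encode("utf-16-le")), "bytes in UTF-16")

# Output:
# U+0061 1 bytes in UTF-8, 2 bytes in UTF-16
# U+00E9 2 bytes in UTF-8, 2 bytes in UTF-16
# U+20AC 3 bytes in UTF-8, 2 bytes in UTF-16
# U+1D11E 4 bytes in UTF-8, 4 bytes in UTF-16
```

Note that the last line already hints that UTF-16 is not always 2 bytes per character either, as Konrad points out further down the thread.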

Anyway, just some thoughts :-)

Br. Morten
From: Chris Dunaway on
On Mar 25, 10:05 am, "Tony Johansson" <johansson.anders...(a)telia.com>
wrote:
> Hi!
>
> This Unicode UTF-8 can use up to 24 bit for encoding. UTF-8 support almost
> all languages so what is the reason
> to use another Unicode then this UTF-8.
>
> //Tony

http://www.joelonsoftware.com/articles/Unicode.html
From: Konrad Neitzel on
Hi all!

"Maate" <maate(a)retkomma.dk> wrote in the newsgroup message
news:cb98f95e-6f15-45c7-bc05-44e0b96f922d(a)e7g2000yqf.googlegroups.com...
> Hey, I'm not sure, but I would guess that UTF-8 is slightly more
> expensive to parse than other unicode encodings.
Why is that? UTF-16 is also not fixed at 2 bytes per character. It can use
more bytes per character if required (one reason why there is also a UTF-32).

> For example, when
> reading UTF-16 encoded text the parser would know that it has to read
> exactly two bytes per character. On the other hand, if UTF-8 encoded,
> the number of bytes to read per character will depend on the
> information stored in individual bits.

And yes, that can be the important point. Whenever you want random
access to characters without parsing all the characters up to the one you
want to read, you must be careful that you really know how many bytes
each character takes.

UTF-16 is not fixed at 2 bytes! That is a common mistake you find often. If
you want a fixed 2-byte encoding, UCS-2 could be chosen, but then you do not
support all the characters that are supported by UTF-16!
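This is easy to verify; a small sketch in Python (the character choices are just illustrative):

```python
# UTF-16 is variable-width too: code points above U+FFFF need a
# surrogate pair (two 16-bit code units = 4 bytes).
bmp = "A"       # U+0041, inside the Basic Multilingual Plane
astral = "😀"   # U+1F600, outside the BMP

print(len(bmp.encode("utf-16-le")))     # 2 -> one 16-bit code unit
print(len(astral.encode("utf-16-le")))  # 4 -> surrogate pair

# UTF-32, by contrast, really is fixed-width: 4 bytes per code point.
print(len(bmp.encode("utf-32-le")))     # 4
print(len(astral.encode("utf-32-le")))  # 4
```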

More details can be found on
http://en.wikipedia.org/wiki/UTF
http://en.wikipedia.org/wiki/UTF-16
http://en.wikipedia.org/wiki/UTF-32

Konrad