Why use an encoding other than UTF-8 when it supports almost every language? [CSharp]

From: Arne Vajhøj on 25 Mar 2010 19:55

On 25-03-2010 12:58, Chris Dunaway wrote:
> On Mar 25, 10:05 am, "Tony Johansson"<johansson.anders...(a)telia.com>
> wrote:
>> Unicode UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost
>> all languages, so what is the reason
>> to use a Unicode encoding other than UTF-8?
>
> http://www.joelonsoftware.com/articles/Unicode.html

<quote>
Back in the semi-olden days, when Unix was being invented and K&R were
writing The C Programming Language, everything was very simple. EBCDIC
was on its way out.
</quote>

I was not tempted to read any further.

Arne

From: Mihai N. on 26 Mar 2010 00:08

> Most applications don't need to support almost all languages.
>
> If you're creating an application for use in Japan by Japanese people,
> for example, then you might prefer to use Shift-JIS, which uses two
> bytes per character, instead of UTF-8, which uses three bytes per
> Japanese character.
>
> If you're building an app for use in the United States by English
> speakers, and much of the input is likely to come from ASCII sources, or
> much of your output is intended for use with software that understands
> only ISO-8859-1, then it may have no use for UTF-8.

Sorry, but this is pretty bad advice.
It was probably OK 10 years ago, but not now.

First, the extra bytes here and there don't amount to much.
To one team that brought up this argument, I showed that the .jpg they used for
the splash screen was bigger than all of their strings together.
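
To put a number on the "two bytes versus three bytes" argument quoted above, here is a minimal sketch in Python (chosen only because it is easy to run; the thread itself has no code). The sample string and the helper name `encoded_sizes` are made up for illustration.

```python
# Hypothetical sample text: 8 Japanese characters (kanji, hiragana, katakana).
sample = "日本語のテキスト"

def encoded_sizes(text):
    """Return the byte length of text in Shift-JIS and in UTF-8."""
    return {name: len(text.encode(name)) for name in ("shift_jis", "utf-8")}

sizes = encoded_sizes(sample)
print(sizes)  # 2 bytes per character in Shift-JIS, 3 in UTF-8
```

The difference is one byte per character for this kind of text, which is the overhead being weighed against the size of everything else the application ships.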

Second, all system APIs are Unicode (UTF-16).
So if you use Shift-JIS or ASCII, you will waste time on conversions
back and forth (happening in the belly of the OS).
Same for keystrokes: the system gets Unicode input and has to convert it
to a code page for a legacy application.
This is also true for Mac OS X and Qt.
Of course, if your application is running on Windows 95/98, then
you are better off without Unicode.

Third, this is a C# newsgroup, so I would assume the question
refers to that. So all strings are Unicode (UTF-16). Use any other
code page, and you will have to convert.

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: JeffWhitledge on 26 Mar 2010 10:32

On Mar 25, 10:05 am, "Tony Johansson" <johansson.anders...(a)telia.com>
wrote:
> Hi!
>
> Unicode UTF-8 can use up to 24 bits for encoding. UTF-8 supports almost
> all languages, so what is the reason
> to use a Unicode encoding other than UTF-8?
>
> //Tony

UTF-8 supports the complete Unicode character set so it is a fine
choice for many applications. It can be used for nearly all of the
world's written languages, and it is a compact representation for
Latin-based texts (like English), which are very common.

Except for interfacing with legacy applications, there is no good
reason to use a non-Unicode character set.

However, there are good reasons for using a Unicode character encoding
other than UTF-8.

Many platforms use UTF-16 internally (Windows NT, XP, Vista, 7; the .NET
Framework, C#), so by sticking with that you can avoid conversions.
Many languages (especially Asian languages) have a more compact
representation in UTF-16 than in UTF-8. UTF-16 will be simpler to
process for many texts, since the characters in the Basic Multilingual
Plane (plane 0, which encodes the vast majority of the characters used
by living languages) are always represented by exactly 2 bytes in
UTF-16. (Characters in the higher planes are represented in 4 bytes in
UTF-16, but these characters are far less common.)
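
The size trade-off described above can be checked with a short sketch, again in Python for convenience. The sample strings and the helper `size_in` are illustrative assumptions; `utf-16-le` is used so that no byte order mark is counted.

```python
def size_in(text, encoding):
    """Byte length of text when encoded with the given codec."""
    return len(text.encode(encoding))

# Hypothetical samples: ASCII text, BMP text, and one character outside the BMP.
samples = {
    "english": "hello",
    "japanese": "日本語",
    "emoji": "\U0001F600",
}

# Map each sample name to (UTF-8 bytes, UTF-16 bytes).
table = {
    name: (size_in(text, "utf-8"), size_in(text, "utf-16-le"))
    for name, text in samples.items()
}

for name, (utf8_bytes, utf16_bytes) in table.items():
    print(f"{name}: UTF-8={utf8_bytes} bytes, UTF-16={utf16_bytes} bytes")
```

ASCII text is half the size in UTF-8, the Japanese sample is smaller in UTF-16, and the character outside the BMP takes 4 bytes in both.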

For these reasons, UTF-16 can also be an excellent choice of encoding
scheme.

There are few applications where UTF-32 is the best choice, and
probably all of them are for internal processing only. I can't imagine
a scenario in which UTF-32 would be the best choice for storing or
transmitting text.
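
As a rough illustration of why UTF-32 suits internal processing but not storage, the sketch below (Python, with made-up sample strings) checks that every code point takes exactly 4 bytes, so indexing is trivial but ASCII text quadruples in size.

```python
# Hypothetical samples mixing ASCII, BMP, and non-BMP characters.
texts = ["hello", "日本語", "a\U0001F600b"]

# (code points, UTF-32 bytes) per sample; utf-32-le avoids counting a BOM.
utf32_sizes = [(len(t), len(t.encode("utf-32-le"))) for t in texts]

for code_points, byte_count in utf32_sizes:
    print(code_points, "code points ->", byte_count, "bytes")
```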

From: Chris Dunaway on 26 Mar 2010 17:51

On Mar 25, 6:55 pm, Arne Vajhøj <a...(a)vajhoej.dk> wrote:
>
> I was not tempted to read any further.
>
> Arne

Ummm... why?

From: Arne Vajhøj on 26 Mar 2010 19:47

On 26-03-2010 17:51, Chris Dunaway wrote:
> On Mar 25, 6:55 pm, Arne Vajhøj<a...(a)vajhoej.dk> wrote:
>> I was not tempted to read any further.
>
> Ummm... why?

Because that paragraph revealed a lack of knowledge so
big that I considered it a sure waste of time to read any
further.

Arne
