From: Maate on

> "Maate" <ma...(a)retkomma.dk> wrote in message news:cb98f95e-6f15-45c7-bc05-44e0b96f922d(a)e7g2000yqf.googlegroups.com...
> > Hey, I'm not sure, but I would guess that UTF-8 is slightly more
> > expensive to parse than other unicode encodings.
>
> Why is that? UTF-16 is not fixed at 2 bytes per character either. It can use
> more bytes per character if required (one reason why there is also a UTF-32).

Thanks for pointing this out, I really was not aware of that!

However, I still think my point on performance is quite valid, so
allow me to be more specific: you can write better-performing
algorithms on text encoded in UTF-16, unless you write in Klingon or
Egyptian Hieroglyphs (and a few other languages using characters
outside the U+0000 to U+FFFF range) ;-)
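
To make the point about characters outside U+0000 to U+FFFF concrete, here is a minimal sketch in Python. The helper names `utf16_code_units` and `has_supplementary` are made up for illustration. It only counts 16-bit code units; it says nothing about actual parsing speed.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for `text`."""
    # utf-16-le writes no byte order mark, so every code unit is 2 bytes.
    return len(text.encode("utf-16-le")) // 2


def has_supplementary(text: str) -> bool:
    """True if any character lies above U+FFFF (needs a surrogate pair)."""
    return any(ord(ch) > 0xFFFF for ch in text)


bmp_text = "Hej verden"      # every character is in U+0000..U+FFFF
hieroglyph = "\U00013000"    # an Egyptian hieroglyph, above U+FFFF

print(utf16_code_units(bmp_text), has_supplementary(bmp_text))
print(utf16_code_units(hieroglyph), has_supplementary(hieroglyph))
```

As long as `has_supplementary` is false, one code unit is one character and indexing stays simple; a single character above U+FFFF takes two code units.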

Br. Morten
From: Mihai N. on
> You think it doesn't cost anything to buy 50% more drives to store and
> back up your data if you have a system that stores terabytes of text?
> Not to mention backups taking 50% longer.

1. Not all your data is text.
In fact, I bet very little of it is text.
2. The rule-of-thumb recommendation is:
- legacy code pages to "talk" with ancient software
- utf-8 for storage/serialization/communication
- utf-16 for processing
- convert at the edge
There are exceptions; nothing is carved in stone, but you should know
why you decide differently.
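
A rough sketch of "convert at the edge", in Python for brevity. The function names and the choice of cp1252 as the legacy code page are assumptions made for this example. Python's `str` is not UTF-16 internally; in C# the in-memory type would be the UTF-16 `System.String`. The idea is the same either way: decode once on input, work on the native string type, and encode once on output.

```python
def read_legacy(raw: bytes, code_page: str = "cp1252") -> str:
    """Input edge: decode bytes from a legacy code page (assumed cp1252)."""
    return raw.decode(code_page)


def process(text: str) -> str:
    """Stand-in for real processing; works on the native string type only."""
    return text.upper()


def write_storage(text: str) -> bytes:
    """Output edge: encode as UTF-8 for storage or serialization."""
    return text.encode("utf-8")


def encoded_sizes(text: str) -> dict:
    """Byte counts of the same text in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


incoming = "café".encode("cp1252")            # what an old program sends
stored = write_storage(process(read_legacy(incoming)))

print(stored.decode("utf-8"))
print(encoded_sizes("café"), encoded_sizes("日本語"))
```

The size comparison cuts both ways: mostly-ASCII text is smaller in UTF-8, while Japanese text is smaller in UTF-16, which is one reason the rule is a rule of thumb and not a law.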


> "it is always not worth using something other than UTF-8".

Processing on a systemt that uis


>> Second, all system APIs are Unicode UTF-16.
>> So if you use Shift-JIS or ASCII, you will waste time for conversions
>> back and forth (happening in the belly of the OS).
>
> Whereas conversion to UTF-8 happens by magic?

Did I recommend utf-8? Read again.
And even if utf-8 makes sense sometimes, you convert at the edge.
Your answer recommended Shift-JIS and Latin 1: the same conversion overhead,
with none of the benefit (international text support).


> So you should spite yourself by writing apps that won't communicate with
> the legacy apps you're stuck supporting?

Do you communicate with other applications through keyboard messages?
Anyway, this is trying to twist my answer to mean
"always, absolutely always use utf-16, this is a religious tenet and
you have to obey it blindly without using your brain"
That's not the case.


--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Mihai N. on

> Are you saying that all *A Win32 calls convert to Unicode?

If you run on NT/2000/XP/Vista/Win7, then yes.
All *A APIs convert to UTF-16, call the *W version of the API,
then convert the result back to the ANSI code page.
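
The round trip described above can be pictured with this toy model in Python. It is not the actual Windows implementation: `to_upper_w` and `to_upper_a` are hypothetical stand-ins for a *W/*A pair, and cp1252 is an assumed stand-in for the system ANSI code page.

```python
ANSI_CODE_PAGE = "cp1252"  # assumption: stand-in for the system ANSI code page


def to_upper_w(wide: bytes) -> bytes:
    """Hypothetical 'W' function: takes and returns UTF-16LE bytes."""
    return wide.decode("utf-16-le").upper().encode("utf-16-le")


def to_upper_a(ansi: bytes) -> bytes:
    """Hypothetical 'A' wrapper around the 'W' function."""
    # 1. Convert the ANSI input to UTF-16.
    wide_in = ansi.decode(ANSI_CODE_PAGE).encode("utf-16-le")
    # 2. Call the W version, which does the real work.
    wide_out = to_upper_w(wide_in)
    # 3. Convert the result back to the ANSI code page.
    #    Characters the code page cannot represent become '?'.
    return wide_out.decode("utf-16-le").encode(ANSI_CODE_PAGE, errors="replace")


result = to_upper_a("crème".encode(ANSI_CODE_PAGE))
print(result.decode(ANSI_CODE_PAGE))
```

Every call through the A-style wrapper pays for two conversions that a direct call to the W-style function avoids.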
http://www.mihai-nita.net/article.php?artID=20050306b


>> Of course, if your application is running on Windows 95/98, then
>> you are better without Unicode.
>> This is also true for Mac OS X and Qt.
>
> Mac OS X supports Unicode.
>
> Unicode is the native character set in Qt.

Sorry, this is my bad phrasing.
I meant to say exactly what you did, that the native API for Mac and Qt is
UTF-16. But the sentence before changed the meaning.


>> Third, this is a C# newsgroup, so I would assume the question
>> refers to that. So all strings are Unicode (UTF-16). Use any other
>> code page, and you will have to convert.
>
> It also converts for UTF-16.
>
> It may be faster than UTF-8, but I would not expect a big difference.


I am not sure what you mean by "It also converts for UTF-16".
.NET uses UTF-16 natively, so there is no conversion
(except to communicate with some external component that does not
understand UTF-16).




From: Mihai N. on
> An answer expressed in terms of a bet concerning the needs of
> MY data isn't pertinent.

A bet usually means "I am highly confident that if you take the bet,
I will take your money", that is, I expect to be statistically right.
It does not mean "I know for sure" :-)


> I don't suppose you consider it to be outside
> the realm of possibility that there's such a thing as a database whose
> purpose is to store billions of documents.

There are very few black and white answers in programming.
So yes, there are valid situations where one would have to think and
make the best decision for the given case.

If you are looking for holes in any answers, then there are always
counter-examples.




From: Arne Vajhøj on
On 28-03-2010 05:33, Mihai N. wrote:
>> Because that paragraph revealed a lack of knowledge so
>> big that I considered it a sure waste of time to read any
>> further.
>
> Actually, even if the article is full of inaccuracies and misunderstandings,
> it has a big merit: it is entertaining enough to be popular. And the message
> at the end of the story is clear: programmers today must know about Unicode
> and use it, the time for legacy code pages is gone.
>
> And that message is what most programmers (and not only) are left with after
> reading it.
>
> In the beginning I was also kind of angry about all that noise for an article
> that is "wrong." But you know what? Until someone writes an article that is
> as popular as that one, sending the right message, and without the mistakes,
> Joel's article is all we got.

If I want to see something that "everybody" is talking about then I
will turn on the TV and watch "American Idol".

If I want to read something about programming, then I find something
that is correct.

Arne