From: sybrandb on
On Wed, 29 Aug 2007 15:21:07 -0700, joel garry <joel-garry(a)home.com>
wrote:

>Ah jeez, Sybrand, Laurenz is one of the good guys.

Is he? His advice to set the database character set to the character
set of the client is utter nonsense.
When pointed to that fact, he doesn't admit that, but he starts to
spout flames and calling me 'clueless'.
Which, apart from being dishonest and a coward, as he doesn't admit
his own mistake, Laurenz Albe clearly is.

--
Sybrand Bakker
Senior Oracle DBA
From: Martin T. on
sybrandb(a)hccnet.nl wrote:
> On Tue, 28 Aug 2007 10:54:57 -0700, Ben <balvey(a)comcast.net> wrote:
>
>> After reading this it sounds like Oracle is saying that conversion is
>> a bad thing and that the database character set should be set the same
>> as the client.
>
> That is EXACTLY what Oracle is NOT (I REPEAT: NOT) saying.
>
> Oracle SAYS:
> THE DATABASE CHARACTERSET SHOULD BE THE SAME AS THE CHARACTERSET OF
> THE O/S.
>
> WHAT IS SO DIFFICULT TO UNDERSTAND ABOUT THAT?
>

Actually, I have to say that this advice confuses me.
Which O/S's character set ?

If I run the Oracle database on Windows, should it be Unicode or
Windows-1252?
An installation on Linux ... should it use UTF8, even when run for a
e.g. Japanese site where this will result in overhead vs. 16bit Unicode?
If I have client applications from 3 different O/S's ... which character
set?
What IS the "character set of the OS" anyway?

br,
Martin
From: Martin T. on
sybrandb(a)hccnet.nl wrote:
> On Fri, 24 Aug 2007 14:05:46 +0200, "Martin T." <0xCDCDCDCD(a)gmx.at>
> wrote:
>
>> Laurenz Albe wrote:
>>> sybrandb(a)hccnet.nl wrote:
>>>>> (...)
>>> You got me wrong.
>>>
>>> Of course it is not Oracle's bug if I set my NLS_LANG wrong.
>>>
>>> But it is Oracle's bug (in my opinion) if I have set the client
>>> character set to US7ASCII, insert a byte > 127 in a text field, and
>>> neither get an error nor (as Oracle seems to prefer) have the byte
>>> clandestinely converted to a question mark.
>>>
>>> I claim that the missing check for incorrect characters is a bug.
>>>
>> Amen. But still Oracle will probably tell you that it's a Feature, not a
>> Bug.
>
> It is not a bug. Laurenz Albe is, as usual, having everything wrong,
> clueless as he is on characterset issues.
> In this situation you will never notice anything when your database
> characterset is US7ASCII and your client characterset is US7ASCII.
> If the client O/S displays the character correctly everything will
> work.
> I am speaking from experience, been there, done that. Laurenz Albe
> just doesn't know what he is talking about (refer to his recent advise
> to set the database characterset to the characterset of the client,
> which is utterly stupid).
>

Would you be so kind as to quote Laurenz's exact wording where he
actually does recommend that? Because I sure cannot remember having read
anything here that would state such a thing.

br,
Martin
From: Laurenz Albe on
Martin T. <0xCDCDCDCD(a)gmx.at> wrote:
> If I run the Oracle database on Windows, should it be Unicode or
> Windows-1252?

Oracle recommends UNICODE:
http://www.oracle.com/technology/pub/columns/trute_unicode.html
and Metalink note 333489.1 say so.

I agree with Oracle.
It is always safe to choose AL32UTF8 as database character set.

But this is just a rule of thumb, your particular requirements may
recommend a different choice.

Particularly if it is not a new installation, it may be wise not to
try and change the character set to UNICODE, but to leave it as it is.

> An installation on Linux ... should it use UTF8, even when run for a
> e.g. Japanese site where this will result in overhead vs. 16bit Unicode?

For Kanji text, UTF-8 is certainly not the perfect choice, if storage
size is important for you.

UTF-16 would be better - but you cannot have UTF-16 as database character
set in Oracle. You'd have to use it as "national character set" and
define all the text columns in your database with NVARCHAR2 instead of
VARCHAR2 and NCHAR instead of CHAR.

There are probably some other "if"s one can think of.

I would probably use AL32UTF8 as database character set and AL16UTF16 as
national character set and try to remember to define all Japanese text
columns as NVARCHAR.

That way I can have the benefits of UTF-16 (less storage), but I am
also safe if somebody tries to insert Japanese text into a VARCHAR
column.

> If I have client applications from 3 different O/S's ... which character
> set?

The safe answer is UNICODE. It is always the safe answer, that is why
Oracle recommends it.

If - say - you only have clients that generate and consume WINDOWS-1252,
ISO8859-1 and ISO8859-15 and you can be certain of that, you could of
course also use WE8MSWIN1252 as database character set (see
Metalink 264294.1).

There is no big benefit in that, however - the ony ones I can think of
are that
a) some special characters will only need 1 byte instead of 2 or 3
b) no character set conversion overhead for WINDOWS-1252 clients.

> What IS the "character set of the OS" anyway?

Good question, I don't think there is an easy answer.

As far as I know, the character set of the Oracle software owner
operating system user has no influence on which database characters set
you should choose (I may be wrong on this though).

Yours,
Laurenz Albe
From: Martin T. on
Laurenz Albe wrote:
> Martin T. <0xCDCDCDCD(a)gmx.at> wrote:
>> If I run the Oracle database on Windows, should it be Unicode or
>> Windows-1252?
>
> Oracle recommends UNICODE:
> http://www.oracle.com/technology/pub/columns/trute_unicode.html
> and Metalink note 333489.1 say so.
>
> I agree with Oracle.
> It is always safe to choose AL32UTF8 as database character set.
>
> (...)

Ah yes. Makes sense. UTF8 the way to go.

cheers,
Martin