what characterset to use? [Oracle]

Prev: OID/LDAP problem - Any help with a fix would be appreciated
Next: Oracle11 & statistics calculations

From: sybrandb on 22 Aug 2007 14:57

On Wed, 22 Aug 2007 08:39:02 -0700, Ben <balvey(a)comcast.net> wrote:

>One other thought. How could it be possible to guarantee that you
>would never corrupt your data in a WE8MSWIN1252 database character set
>or any other multibyte character set?
>

WE8MSWIN1252 is a *single* byte characterset. Please read your
manuals.
>You don't really have control over what character set all the clients
>connect with, do you? If you have a client that uses US7ASCII and they
>select then update based on results, you could potentially corrupt all
>your data. no?

Incorrect again. If your database has been set up correctly (which is
database characterset matches the OS characterset or is supported by
the OS AND NLS_LANG has been set correctly) the US7ASCII data will get
converted upon arrival.
If your client NLS_LANG is US7ASCII AND your database NLS_LANG is
US7ASCII, no conversion will take place. THIS will 'corrupt' your
data.

--

Sybrand Bakker
Senior Oracle DBA

From: Laurenz Albe on 23 Aug 2007 06:21

Ben <balvey(a)comcast.net> wrote:
> 10.2.0.2 Ent Ed, AIX5L
>
> We are planning a new database and I'm having a little confusion on
> what character set to use.
> I don't want to use multibyte data unless
> it is necessary but I do want to have the option available.
>
> 1. Am I correct in thinking that if we use NLS_CHARACTERSET =
> WE8MSWIN1252 and NLS_NCHAR_CHARACTERSET = AL16UTF16 we shouldn't have
> any issues with losing multibyte characters?

Most of your questions have already been answered correctly,
but I would recommend that you create the database with AL32UTF8,
just like Oracle recommends.

This way you don't have to think which columns should be NCHAR and
which should be CHAR, because you can store any character in any text
column.

You also lose nothing if you use AL32UTF8, because ASCII characters
(Latin letters, numbers, the common special characters) will be stored
as single bytes (in AL16UTF16 they would use up two bytes).

> 2. On my windows client I go to a command prompt and issue the
> following c:\> echo %NLS_LANG%,
> I get %NLS_LANG% returned back to me. I'm assuming that this means I
> don't have the variable set. How does my client then treat the data
> that I am getting from the database?

As has been mentioned, NLS_LANG is probably set to WE8MSWIN1252 in the
Windows registry. You can override this setting with the environment
variable. If neither is set, US7ASCII is assumed.

The data will be converted to the character set specified in NLS_LANG,
so that value must always be set to the character set that your
application or operating system expect.

There are two things that can happen when you have a bad NLS_LANG setting:

1) Some or all characters are displayed as either garbage or question marks.
This is rather harmless because you can find and correct it easily.
Unfortunately Oracle has made the design choice to return garbage
instead of throwing an error message when characters cannot be converted.

2) If you (as misguided admins too often do) mistakenly set NLS_LANG
to the database character set, no character conversion AND NO
INTEGRITY CHECKS are performed and you can store all sort of garbage
in your database without even noticing. It will cause problems later on,
though. This is an Oracle bug in my opinion, although Oracle will
probably disagree with me on this.

Again, when you use AL32UTF8, the danger of running into problem 2) is
smaller because on no platform this is the "default" client character set.

Yours,
Laurenz Albe

From: Ben on 23 Aug 2007 08:22

On Aug 22, 2:57 pm, sybra...(a)hccnet.nl wrote:
> On Wed, 22 Aug 2007 08:39:02 -0700, Ben <bal...(a)comcast.net> wrote:
> >One other thought. How could it be possible to guarantee that you
> >would never corrupt your data in a WE8MSWIN1252 database character set
> >or any other multibyte character set?
>
> WE8MSWIN1252 is a *single* byte characterset. Please read your
> manuals.

I wasn't trying to imply that WE*8*MSWIN1252 was a multibyte
characterset. I can see how it read that way though. I have read the
Globalization Guide, thanks for the pointer though.

> >You don't really have control over what character set all the clients
> >connect with, do you? If you have a client that uses US7ASCII and they
> >select then update based on results, you could potentially corrupt all
> >your data. no?
>
> Incorrect again. If your database has been set up correctly (which is
> database characterset matches the OS characterset or is supported by
> the OS AND NLS_LANG has been set correctly) the US7ASCII data will get
> converted upon arrival.

The way I'm understanding Tom Kyte's book, Expert Oracle Database
Architeture, it is contradicting what you're saying here. He has an
example on page 493-494 where a database using characterset
we8iso8859p1 is loaded with three 8 bit values chr(224) chr(225) &
chr(226). The he sets a client's character set to us7ascii and reads
the data. It displays three 7 bit converted characters. So yes it does
convert 8 bit data to a 7 bit replacement character. But if that same
client stores that data in a variable and then updates that data, it
will then be stored as the 7 bit value that was read by the client.
That is what I meant by data corruption.

> If your client NLS_LANG is US7ASCII AND your database NLS_LANG is
> US7ASCII, no conversion will take place. THIS will 'corrupt' your
> data.

If you database is setup as us7ascii then no 8 bit data can be stored
in the first place. correct? If the client was setup as us7ascii also,
then I don't see the problem. Client can't read or write 8 bit data
and database can't store 8 bit data, there wouldn't be any need for a
conversion. What am I missing here?

From: Ben on 23 Aug 2007 08:42

On Aug 22, 2:52 pm, sybra...(a)hccnet.nl wrote:
> On Wed, 22 Aug 2007 05:55:54 -0700, Ben <bal...(a)comcast.net> wrote:
> >10.2.0.2 Ent Ed, AIX5L
>
> >We are planning a new database and I'm having a little confusion on
> >what character set to use. We are not sure if in the future we may
> >need some double byte characters. For the most part we can suffice
> >with AMERICAN_AMERICA.WE8MSWIN1252 but it is possible that in the
> >future we need multibyte characters. I have been reading the
> >Globalization guide and the section in Tom Kyte's Expert Oracle
> >Database Architecture on this but I could be totally mis interrepting
> >something here. I haven't read chapter 6 on unicode in the
> >Globalization Guide (yet). I don't want to use multibyte data unless
> >it is necessary but I do want to have the option available.
>
> >1. Am I correct in thinking that if we use NLS_CHARACTERSET =
> >WE8MSWIN1252 and NLS_NCHAR_CHARACTERSET = AL16UTF16 we shouldn't have
> >any issues with losing multibyte characters?
>
> The basic guideline is you use a characterset which is native to the
> database server. As your database is on AIX5L, this is WE8ISO8859P15,
> NOT mswin1252. Oracle however is moving towards UTF, and I believe I
> did read a note ALL 11g databases are going to be UTF by default.
>
> >2. On my windows client I go to a command prompt and issue the
> >following c:\> echo %NLS_LANG%,
> >I get %NLS_LANG% returned back to me. I'm assuming that this means I
> >don't have the variable set. How does my client then treat the data
> >that I am getting from the database?
>
> The Oracle default for all O/S es is still US7ASCII. However, if you
> did install your client properly, you will have a *registry* string
> NLS_LANG set to MSWIN1252. This will work.

If I don't get a value returned for > echo %NLS_LANG% then I guess it
isn't set, right? Exactly what part of the client install sets that
value? I don't recall being asked or prompted for a character set
setting? I checked my registry under
LOCAL_MACHINE-SOFTWARE-ORACLE and the NLS_LANG there is N/A. This is
on a WinXP Pro client machine.

> >3. When I query v$nls_parameters does this show me what character set
> >my client session is using?
>
> It should but better query v$nls_session_parameters.
> V$ views have documented descriptions BTW: in the database (use DIC)
> in the documentation, and on Metalink.
>

Do you mean nls_session_parameters?

SQL> SELECT * FROM v$nls_session_parameters;
SELECT * FROM v$nls_session_parameters
*
ERROR at line 1:
ORA-00942: table or view does not exist

When I query nls_session_parameters there isn't a row returned for
characterset.

> >4. When I query nls_database_parameters, does this show me what the
> >database has been setup as on the server side?
>
> see answer 3
> >5. Within sqlplus what does 'show parameter nls' tell me? I'm assuming
> >the init.ora nls settings.
>
> Exactly, you 'assume'.
> Your assumption is incorrect.
> It will show you the result from v$parameter. If you started your
> database using a spfile (which you should do anyway), the parameters
> from the spfile will show.
>

I guess I should have said spfile instead of init.ora, as we do use an
spfile. I tend to refer to them as one in the same.

From: Laurenz Albe on 23 Aug 2007 09:53

Ben <balvey(a)comcast.net> wrote:
> If you database is setup as us7ascii then no 8 bit data can be stored
> in the first place. correct? If the client was setup as us7ascii also,
> then I don't see the problem. Client can't read or write 8 bit data
> and database can't store 8 bit data, there wouldn't be any need for a
> conversion. What am I missing here?

Integrity.

The problem will be that the database has no idea about the encoding you
use for text data.

The problem will manifest as soon as you try to retrieve these data in
some other context - with a client on some other platform, with
a client that uses a different character set (say, JDBC).

Yours,
Laurenz Albe

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: OID/LDAP problem - Any help with a fix would be appreciated
Next: Oracle11 & statistics calculations