Oracle 11 Server and Unicode UTF-8 [Oracle]


From: Walt on 27 May 2010 14:49

Hi,

We're running Oracle 10g on Windows using ISO 8859 as the character set.
We're exploring the idea of converting to Unicode (UTF8) along with the
upcoming upgrade to v11.

I haven't found a good reference for how to best accomplish the
conversion and what pitfalls to watch out for. Any suggestions?

One thing I expect to be a problem is that we have about 200 columns
that are defined as Varchar2(4000). My understanding is that this limit
is 4000 *bytes* so some of our data may not "fit" if it contains enough
non-ASCII characters (code values above 127).

Anyone been through this before who'd like to offer sage advice?

Thanks.

//Walt

From: John Hurley on 27 May 2010 16:40

Walt:

> We're running Oracle 10g on Windows using ISO 8859 as the character set.
> We're exploring the idea of converting to unicode (UTF8) along with the
> upcoming upgrade to v11.
>
> I haven't found a good reference for how to best accomplish the
> conversion and what pitfalls to watch out for. Any suggestions?
>
> One thing I expect to be a problem is that we have about 200 columns
> that are defined as Varchar2(4000). My understanding is that this limit
> is 4000 *bytes* so some of our data may not "fit" if it contains enough
> characters with ascii values above 128.
>
> Anyone been through this before who'd like to offer sage advice?

Hard to believe you have not turned up relevant advice. This kind of
question has been asked and answered many times on the
internet. Yes, even on cdos, a bunch of times.

I have not tried searching cdos with the google groups interface for a
while ... it used to work pretty well.

The Oracle documentation is pretty good here. Have you looked at it?

Other places to check are the Oracle forums for the database area or
asktom ...

From: Arne Ortlinghaus on 28 May 2010 04:57

Hi Walt,

yes, any field holding too many characters (more than 4000 bytes once
encoded as UTF-8) cannot be converted without data loss. It could be safer
to add new Unicode columns if you already have a database with data in it,
and then write conversion programs.

Arne Ortlinghaus
ACS Data Systems


From: Lothar Armbrüster on 28 May 2010 10:38

"Arne Ortlinghaus" <Arne.Ortlinghaus(a)acs.it> writes:

> Hi Walt,
>
> yes, any field holding too many characters (more than 4000 bytes once
> encoded as UTF-8) cannot be converted without data loss. It could be safer
> to add new Unicode columns if you already have a database with data in it,
> and then write conversion programs.
>
> Arne Ortlinghaus
> ACS Data Systems
>

It's not only the 4000-byte limit that gets in the way. I recently tried to
import an export file from a database using ISO 8859 into a database using
UTF-8.
I had some columns of type VARCHAR2(3), which meant VARCHAR2(3 BYTE).
The columns in the new database were also VARCHAR2(3 BYTE), and some data
contained German umlauts, which are stored in two bytes in UTF-8. So
these rows were not imported.
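
To make the byte growth concrete, here is a minimal Python sketch (not Oracle code). It assumes the source data is ISO 8859-1; the sample strings and the helper names utf8_len and fits_byte_limit are made up for illustration.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_byte_limit(text: str, limit: int) -> bool:
    """True if the UTF-8 encoding of text fits into a column of `limit` bytes."""
    return utf8_len(text) <= limit


# Hypothetical values that all fit a 3-byte column in ISO 8859-1.
samples = ["abc", "Tür", "äöü"]

for s in samples:
    latin1_bytes = len(s.encode("iso-8859-1"))  # one byte per character
    utf8_bytes = utf8_len(s)                    # umlauts take two bytes each
    print(f"{s!r}: {latin1_bytes} bytes in ISO 8859-1, "
          f"{utf8_bytes} bytes in UTF-8, "
          f"fits in 3 bytes: {fits_byte_limit(s, 3)}")
```

Every sample is three characters long, but only the pure-ASCII one still fits a 3-byte column after conversion.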

IIRC there is a syntax like VARCHAR2(3 CHAR) when defining
columns. That should allow 3 UTF-8 characters to be stored in the
column. I think it's a good idea to check this and possibly convert
columns to allow a maximum number of characters rather than bytes before
doing an export/import involving multibyte character sets.
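
The difference between the two length semantics can be modelled in a few lines of Python. This is only a sketch of the rule, not of Oracle itself: the function name fits and the constant name are hypothetical, characters are counted as Python code points, and it assumes the 10g/11g behaviour that a VARCHAR2 column is capped at 4000 bytes even when declared with CHAR semantics.

```python
# Assumption: VARCHAR2 columns in 10g/11g hold at most 4000 bytes,
# regardless of whether the length was declared in BYTE or CHAR.
MAX_VARCHAR2_BYTES = 4000


def fits(text: str, size: int, semantics: str) -> bool:
    """Model whether text fits VARCHAR2(size BYTE) or VARCHAR2(size CHAR)
    in a UTF-8 database. Hypothetical helper for illustration only."""
    n_bytes = len(text.encode("utf-8"))
    if n_bytes > MAX_VARCHAR2_BYTES:
        return False  # the byte ceiling applies under both semantics
    if semantics == "BYTE":
        return n_bytes <= size
    if semantics == "CHAR":
        return len(text) <= size
    raise ValueError(f"unknown length semantics: {semantics!r}")


print(fits("Tür", 3, "BYTE"))          # 4 bytes against a 3-byte limit
print(fits("Tür", 3, "CHAR"))          # 3 characters against a 3-character limit
print(fits("ü" * 3000, 4000, "CHAR"))  # 3000 characters, but 6000 bytes
```

The last call is the case to watch for with the VARCHAR2(4000) columns: switching to CHAR semantics helps short columns, but it does not lift the 4000-byte ceiling.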


HTH,
Lothar

--
Lothar Armbrüster | lothar.armbruester(a)t-online.de
Hauptstr. 26 |
65346 Eltville |

From: Serge Rielau on 29 May 2010 09:17

VARCHAR2(4000) just means that whoever defined the DB schema was too
lazy to investigate the domain.
If you run a few MAX(LENGTH(..)) queries, that will give you a pretty good
idea as to what the domain really is.
In all my engagements I have yet to come across a situation where there
actually was a requirement for 4000 bytes.
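
The same kind of check can be sketched outside the database. The Python below profiles a list of column values that is assumed to have already been fetched; the function name profile_column and the sample data are hypothetical. Note that in Oracle, LENGTH counts characters while LENGTHB counts bytes, so the sketch reports both.

```python
from typing import Dict, Iterable, Optional


def profile_column(values: Iterable[Optional[str]]) -> Dict[str, int]:
    """Report the longest value in characters and in UTF-8 bytes, plus how
    many values would exceed 4000 bytes after conversion to UTF-8."""
    non_null = [v for v in values if v is not None]
    byte_lengths = [len(v.encode("utf-8")) for v in non_null]
    return {
        "max_chars": max((len(v) for v in non_null), default=0),
        "max_utf8_bytes": max(byte_lengths, default=0),
        "over_4000_bytes": sum(1 for n in byte_lengths if n > 4000),
    }


# Hypothetical contents of one VARCHAR2(4000) column.
column = ["short note", None, "Grüße aus Köln", "ü" * 2500]
print(profile_column(column))
```

Columns whose maximum byte length stays far below 4000 are the monsters under the bed; only the ones with a non-zero over_4000_bytes count need real attention.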

Chances are the vast majority of these are monsters under the bed.
So grab a flashlight. :-)

Cheers
Serge

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab
