Home Home    Forum    Blog    Feed your aggregator (RSS 2.0)

The Johnnynine Weblog - October, 2008
A weblog by Johnny Hughes
 
 Tuesday, October 21, 2008

To determine an Oracle database's character set:

select value from nls_database_parameters where parameter='NLS_CHARACTERSET'

 

UTF8 was the UTF-8 encoded character set in Oracle8 and 8i.

AL32UTF8 is the UTF-8 encoded character set introduced in Oracle9i.

 

VB6 reports UTF8 characters as 3 bytes, and AL32UTF8 characters as 4 bytes.

 

How "character-based" sizing in Oracle works

It requires setting an instance or session parameter (e.g., alter session set NLS_LENGTH_SEMANTICS = 'CHAR';).  This is available starting in 9i.  Then, subsequent column declarations such as VARCHAR2(10) really mean 10 characters and not 10 bytes.  You can also be explicit on the declaration by saying VARCHAR2(10 CHAR), but that is not as flexible obviously (because it will always mean CHAR even if the database is currently set to default to BYTE).

When doing a DESC command, the keyword "BYTE" or "CHAR" only shows up in the parentheses (as in your dump below) if the table is declared in the mode opposite what your current SQL*Plus session is set to.

 

Writing East Asian characters languages (such as Japanese)

In order for a VB6 or .Net application to write Japanese characters to an Oracle UTF8 database, the East Asian language files must be installed on the machine running the application that writes to the database.  This is done in XP and Windows 2003 from Start->Control Panel->Regional and Language Options->Language tab.  If they are not installed, no error will occur but the Japanese characters will be written as question marks.  It is unclear why this is the case however it has been confirmed with a UTF8 Oracle database, VB6 and .Net applications, and Japanese characters.  It is assumed that this applies to all East Asian languages as well as AL32UTF8 databases.

 

Snippet from an Oracle document describing these character sets' support and origin:

~~~~~~~~~~~~~~~ begin snip ~~~~~~~~~~~~~~~

UTF8

UTF8 was the UTF-8 encoded character set in Oracle8 and 8i. It followed the Unicode 2.1 standard between Oracle 8.0 and 8.1.6, and was upgraded to Unicode version 3.0 for versions 8.1.7, 9i and 10g. To maintain compatibility with existing installations this character set will remain at Unicode 3.0 in future Oracle releases. Although specific supplementary characters were not assigned to Unicode until version 3.1, the allocation for these characters were already defined in 3.0. So if supplementary characters are inserted in a UTF8 database, it will not corrupt the actual data inside the database. They will be treated as 2 separate undefined characters, occupying 6 bytes in storage. We recommend that customers switch to AL32UTF8 for full supplementary character support.

AL32UTF8

This is the UTF-8 encoded character set introduced in Oracle9i. AL32UTF8 is the database character set that supports the latest version (4.01 in Oracle 10.2) of the Unicode standard. It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes. AL32UTF8 was introduced because when UTF8 was designed (in the times of Oracle8) there was no concept of supplementary characters, therefore UTF8 has a maximum of 3 bytes per character. Changing the design of UTF8 would break backward compatibility, so a new character set was introduced. The introduction of surrogate pairs should mean that no significant architecture changes are needed in future versions of the Unicode standard, so currently the plan is to keep enhancing AL32UTF8 as necessary to support future version of the Unicode standard. For example in Oracle 10.1 this character set implemented the Unicode 3.2 standard, in Oracle 10.2 that has been updated to support the Unicode 4.01 standard.

~~~~~~~~~~~~~~~ end snip ~~~~~~~~~~~~~~~

Tuesday, October 21, 2008 11:14:47 AM (US Mountain Standard Time, UTC-07:00)  #    Comments [0]   .NET | Database | Oracle | Technical  | 
Copyright © 2010 Johnny Hughes. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.