Database Character Set¶
When developing database objects (e.g. tables and PL/SQL packages) the character set(NLS_CHARACTERSET) and NLS_LENGTH_SEMANTICS of the database must be considered.
It is strongly recommended to have a database which uses the Unicode character set(AL32UTF8) together with NLS_LENGTH_SEMANTICS set to CHAR. One reason for this is that, in the future, there could be a requirement to use languages other than the currently installed ones. This will be easier to accomplish if the Unicode character set has been used in the environment and the CHAR length semantic was also used.
All IFS internal support and development environments have Unicode character set.
This document explains some considerations in facilitating international characters in development.
VARCHAR2 Size Consideration¶
When working in an environment that has a Unicode character set it is important to check if the initialization parameter NLS_LENGTH_SEMANTICS is set to CHAR. This setting makes lengths of strings represented by characters as the default, rather than bytes, which are the default in Oracle.
If the NLS_LENGTH_SEMANTICS is set to BYTE, a declaration of a string VARCHAR2(10) means that the string can hold ten bytes. In a Unicode environment one character can be up to 1-4 bytes long. Therefore in a scenario where all characters take up 4 bytes, only two such characters can be put into that variable, not 10 which might be what was expected. If you try to assign a string which is larger than that (in bytes, not necessarily in characters) Oracle will raise the ORA-06502: PL/SQL: numeric or value error: character string buffer too small exception.
If the NLS_LENGTH_SEMANTICS is set to CHAR the same variable declaration will be allocated to hold ten characters. The actual size of the variable will be 20-40 bytes (4 bytes allocated for each character).
However the maximum size allowed for a VARCHAR2 stored in the database is limited to 4000 bytes. This means that the maximum size that can be stored may not be 4000 characters (even though NLS_LENGTH_SEMANTICS is set to CHAR). Likewise the maximum size of VARCHAR2 in PL/SQL is 32767 bytes irrespective of the length of the semantics.
Note: Storing VARCHAR2 of size 32767 in the database by setting the MAX_STRING_SIZE to Extended is not supported.
Substrings and Length Checks¶
It is sometimes required to trim down a PL/SQL VARCHAR2 string (max size 32767) to a VARCHAR2(4000) to be stored in a database table. In order to safely reduce the string without corrupting characters the following must be considered.
Avoid using byte functions such as SUBSTRB(), LENGHTB()¶
Most of the time it is required to count lengths in characters. These functions counts lengths in bytes, which will not give the result required in multi-byte environments where international characters are used.
Columns of size VARCHAR2(4000)¶
In an NLS_LENGTH_SEMANTIC=CHAR environment SUBSTR(string_,1,4000)
returns a string which is 4000 characters long. If international characters/multi-byte characters are in the string then the total size could well be larger than 4000 bytes. When trying to insert this into to a column with a size of VARCHAR2(4000), the exception ORA-01461: can bind a LONG value only for insert into a LONG column will be thrown (since Oracle does not allow data greater than 4000 bytes for varchar2 type).
To avoid this issue a substring operation similar to the following code can be used.
substr(string_, 1, length(substrb(string_, 1, 4000))-1)
Since substrb()
counts the length in bytes and not characters, it can sometimes split a character when using it. The length to which the string is sub-stringed must be taken after the substrb()
result reduced by one character. This will give the correct length in characters to substring for the outer most substr
function.