When data is silently altered..... beware the characterset



I was hit by a vaguely interesting issue again when moving data from one database to another recently and thought it was worth a write up. I recreated the problem using a simple case to illustrate the problem more clearly.

The destination db was created using the following command

create database;

The source db was creating using this command

create database character set al32utf8;

And before you say that wasn't the command you used - it actually was - see this pointless post i did a few years ago:

http://dbaharrison.blogspot.co.uk/2013/12/it-cant-get-much-simpler-than-this.html

I then create a basic table in the source (utf8) system schema called demo

SQL> create table system.demo (col1 varchar2(10));

Table created.

And then inserted a few rows with some 'foreign characters' (making sure NLS_LANG and the emulator are set to utf8 compatible of course)

insert into system.demo values('A');
insert into system.demo values('À' );
insert into system.demo values('Á');
insert into system.demo values('Â');
insert into system.demo values('Ã');
insert into system.demo values('Ä');
insert into system.demo values('Å');

I can then see my nice data in the table

SQL> select * from system.demo;

COL1
------------------------------
A
À
Á
Â
Ã
Ä
Å

7 rows selected.

All good - i now add a PK on that column

SQL> alter table system.demo add primary key (col1);

Table altered.

Still all good - just what we expect.

Now lets move that data to another db where we used the default encoding (US7ASCII).

So we export the table out with data pump (no need for NLS_LANG or anything else to be set here - unlike old exp)

oracle@localhost:~$ expdp system/manager tables=demo reuse_dumpfiles=y

Export: Release 11.2.0.4.0 - Production on Mon Apr 11 20:10:41 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
Starting "SYSTEM"."SYS_EXPORT_TABLE_01":  system/******** tables=demo reuse_dumpfiles=y
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 16 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/PRE_TABLE_ACTION
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
. . exported "SYSTEM"."DEMO"                             5.046 KB       7 rows
Master table "SYSTEM"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_TABLE_01 is:
  /u01/app/oracle/product/11.2.0/rdbms/log/expdat.dmp
Job "SYSTEM"."SYS_EXPORT_TABLE_01" successfully completed at Mon Apr 11 20:10:50 2016 elapsed 0 00:00:08

And to comes out all nicely - however when we load into the other db......

impdp system/manager tables=demo

Import: Release 11.2.0.4.0 - Production on Mon Apr 11 20:12:35 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
Master table "SYSTEM"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_TABLE_01":  system/******** tables=demo
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SYSTEM"."DEMO"                             5.046 KB       7 rows
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
ORA-39083: Object type CONSTRAINT failed to create with error:
ORA-02437: cannot validate (SYSTEM.SYS_C003821) - primary key violated
Failing sql is:
ALTER TABLE "SYSTEM"."DEMO" ADD PRIMARY KEY ("COL1") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255  STORAGE(INITIAL 16384 NEXT 16384 MINEXTENTS 1 MAXEXTENTS 505 PCTINCREASE 50 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "SYSTEM"  ENABLE
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
ORA-39083: Object type INDEX_STATISTICS failed to create with error:
ORA-01403: no data found
ORA-01403: no data found
Failing sql is:
DECLARE I_N VARCHAR2(60);   I_O VARCHAR2(60);   NV VARCHAR2(1);   c DBMS_METADATA.T_VAR_COLL;   df varchar2(21) := 'YYYY-MM-DD:HH24:MI:SS';  stmt varchar2(300) := ' INSERT INTO "SYS"."IMPDP_STATS" (type,version,flags,c1,c2,c3,c5,n1,n2,n3,n4,n5,n6,n7,n8,n9,n10,n11,n12,d1,cl1) VALUES (''I'',6,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,NULL,:14,:15,NULL,:16,:17)';BEGIN  DELETE
Job "SYSTEM"."SYS_IMPORT_TABLE_01" completed with 2 error(s) at Mon Apr 11 20:12:43 2016 elapsed 0 00:00:03


So the data loads in but the PK won't enable and then the index stats fails as the index fails.

Now without the slightly contrived example and me telling you the obvious thing that was different it may have taken longer to find out what was going on here.

If we look directly at the data we can see the issue

SQL> select * from system.demo;

COL1
------------------------------
A
A
A
A
?
A
?


What oracle has done during load is try and convert the characters to ones that exist in US7ASCII so all the ones that are similar enough to A just become A - but for more unusual varieties of A it just gives up and we end up with ?

The worrying thing here is if there was no PK you wouldn't know about the issue - you have silently changed data loaded into your DB.

Character sets are important! - you need to check your source and destination are compatible and this sort of business is not going to happen.

I suspect in a lot of companies you are dealing with single country data and you may never hit this - as soon as you have multi language going on you hit these sort of issues - especially if the company has grown by acquiring systems from other countries......

Personally we now make everything AL32UTF8 - the docs kind of hint at this but I've heard from various sources that this should be a more forceful message - you have to have a really good reason not to use this character set.

1 comments: