Removing blob data with expdp



Datapump has vastly improved the number of options available when exporting/importing data from the database. The Query option allows (in a much fuller way than in exp) filtering of which rows you want to unload/load from the database.

If you want to extract certain columns though  you are restricted in what you can do (at least until 12c with VIEWS_AS_TABLES) - all columns have to be extracted and loaded - however you don't have to actually extract the value stored in the column - you can change it to anything you want. This is occasionally useful if you want to mask/change data in certain columns or in the specifc case below to null out the blob columns to reduce the size of the extract. This is all acheived using the REMAP_DATA option.

In the example below i have a table 'ATTACHMENTS' which contains only a few hundred rows but is ~ 300MB in size - all due to the blobs it contains. In my case i don't need those blobs in the other environments i will load the dumpfile into - just the values in the other columns.

So how do i do that?

First up lets describe the table in question

SYS@DB>desc ASCM.ATTACHMENTS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL VARCHAR2(50)
 REFERENCE_ID                                       VARCHAR2(50)
 SOURCE_ID                                 NOT NULL NUMBER
 FILE_NAME                                 NOT NULL VARCHAR2(50)
 FILE_SIZE                                 NOT NULL NUMBER
 FILE_DATA                                 NOT NULL BLOB
 UPLOADED_BY                               NOT NULL VARCHAR2(50)
 AUDITDATETIME                             NOT NULL DATE


So the column using up all the space is the 'FILE_DATA' column (which in this case contains a load of big pdf's)

If we export this normally we get a 300MB dumpfile

expdp / tables=ASCM.ATTACHMENTS

Export: Release 11.2.0.2.0 - Production on Tue Aug 12 10:21:54 2014

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning option
Starting "OPS$ORACLE"."SYS_EXPORT_TABLE_01":  /******** tables=ASCM.ATTACHMENTS
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 288.1 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "ASCM"."ATTACHMENTS"                        274.3 MB     626 rows
Master table "OPS$ORACLE"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for OPS$ORACLE.SYS_EXPORT_TABLE_01 is:
  /oracle/export/DB/expdat.dmp
Job "OPS$ORACLE"."SYS_EXPORT_TABLE_01" successfully completed at 10:22:16


 ls -l /oracle/export/DB/expdat.dmp
-rw-r----- 1 oracle oinstall 287809536 Aug 12 10:22 /oracle/export/DB/expdat.dmp


So how do we 'null out' those blobs? First up we need to create a small package and function (it has to be a function inside a package for some reason - a direct function is not allowed). This function takes in a column value and then returns a transformed column value - here is the simple code to do that

create or replace package clear_blob_pack as
 function clear_blob (old_blob in blob) return blob;
 end;
/

create or replace package body  clear_blob_pack as
 function clear_blob (old_blob in blob) return blob
 is
  b1   blob := empty_blob ();
begin
    return  b1;
 end;
end;
/


So all that happens with it is a blob value is passed in and an empty blob value is returned back.

Now we just need to use the REMAP_DATA option to tell datapump to use this function when extracting the FILE_DATA column from the ATTACHMENTS table.

So lets join the two things together

expdp / tables=ASCM.ATTACHMENTS remap_data=ASCM.ATTACHMENTS.FILE_DATA:clear_blob_pack.clear_blob



The only new part is the remap_data command - here we say when you extract the file_data column from the attachments table in the ASCM schema apply a transfer function called clear_blob in the clear_blob_pack package

 The output of this is then this:

Export: Release 11.2.0.2.0 - Production on Tue Aug 12 10:40:56 2014

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning option
Starting "OPS$ORACLE"."SYS_EXPORT_TABLE_01":  /******** tables=ASCM.ATTACHMENTS remap_data=ASCM.ATTACHMENTS.FILE_DATA:clear_blob_pack.clear_blob
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 288.1 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "ASCM"."ATTACHMENTS"                        63.71 KB     626 rows
Master table "OPS$ORACLE"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for OPS$ORACLE.SYS_EXPORT_TABLE_01 is:
  /oracle/export/DB/expdat.dmp
Job "OPS$ORACLE"."SYS_EXPORT_TABLE_01" successfully completed at 10:41:01


Note the estimated size is of course 'full' size (as there is no way datapump can work out what the function will do to the data) - the actual size of the dumpfile is then much smaller - as can be seen on disk

ls -l /oracle/export/DB/expdat.dmp
-rw-r----- 1 oracle oinstall 192512 Aug 12 10:41 /oracle/export/DB/expdat.dmp



The actual file size is bigger than 63K as there is more than just teh table data in it (there is a load of metadata)


We can then load this as a normal file

impdp /

Import: Release 11.2.0.3.0 - Production on Tue Aug 12 10:46:03 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning option
Master table "OPS$ORACLE"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "OPS$ORACLE"."SYS_IMPORT_FULL_01":  /********
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "ASCM"."ATTACHMENTS"                        63.71 KB     626 rows
Processing object type TABLE_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "OPS$ORACLE"."SYS_IMPORT_FULL_01" completed with 1 error(s) at 10:46:05


And there we have it  - the same table minus the values from the blob columns - which gives a much smaller and faster export/import process.



3 comments:

  1. Hello!
    Can't make it work directly with impdp and network link option enabled (which means no dump, no expdp required).
    Any clue?

    ReplyDelete
  2. Hi,
    Do you get any kind of error at all - which side did you create the package in - source or destination?

    Have you confirmed it works ok for you with expdp->impdp its just direct network_link that it doesn't like?

    There could be some issue with lob locators over db links but if that was the case i think you would get some obvious error message.

    Cheers,
    Rich

    ReplyDelete
  3. From my understanding it cannot work with impdp with network link. I made several testing and I am ending with either an ORA-22992 or an ORA-00932.

    13-JAN-16 11:44:40.875: ORA-31693: Table data object "CI"."TABLE_X_PR_COMMANDE" failed to load/unload and is being skipped due to error:
    ORA-22992: cannot use LOB locators selected from remote tables

    13-JAN-16 16:09:29.368: ORA-31693: Table data object "CI"."TABLE_X_PR_COMMANDE" failed to load/unload and is being skipped due to error:
    ORA-00932: inconsistent datatypes: expected BLOB got NUMBER

    The difference between the two is that I am using a package function returning an empty lob or a number (via LENGTH() for example).

    Oracle support stated that:
    "Functions on remote LOBs returning scalars is not supported.
    Even if SQL and PL/SQL functions having a LOB parameter and returning a scalar data type are supported, other SQL functions and DBMS_LOB APIs are not supported for use with remote LOB columns."

    But the documentation states that:
    "2. Functions on remote LOBs returning scalars. SQL and PL/SQL functions having a LOB parameter and returning a scalar data type are supported."
    http://docs.oracle.com/cd/E11882_01/appdev.112/e18294/adlob_working.htm#ADLOB2010

    Probably some wording issue I guess which ends with no clue to perform anything else than expdp dumpfile + impdp dumpfile...

    ++
    Guillaume

    ReplyDelete