Zipping datapump in 12c



12c has introduced some new compression options for datapump and I wanted to do a quick (and fairly unscientific) comparison of the various options to see how they compared.

I did some tests with a single table from the HPQC application, whose structure is this:

SQL> desc QUALITY_R000000006_DB.AUDIT_LOG
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 AU_ACTION_ID                              NOT NULL NUMBER(10)
 AU_FATHER_ID                              NOT NULL NUMBER(10)
 AU_USER                                   NOT NULL VARCHAR2(2000)
 AU_SESSION_ID                             NOT NULL NUMBER(10)
 AU_TIME                                   NOT NULL DATE
 AU_ACTION                                 NOT NULL VARCHAR2(200)
 AU_ENTITY_TYPE                            NOT NULL VARCHAR2(200)
 AU_ENTITY_ID                                       VARCHAR2(200)
 AU_DESCRIPTION                                     CLOB

The table contains 1.2 million rows and is about 400MB (so pretty small).

I then ran a series of tests with the various compression options to check the speed and the size of the file generated. Then, just out of interest, I repeated this with exp and manual compression (both zipping the dumpfile after the export finished and compressing inline through a pipe).
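For reference, the tests were along these lines (a sketch only - the directory object, dumpfile names and credentials are placeholders, not the exact commands used):

```shell
# Data Pump with the 12c compression algorithms (anything beyond
# COMPRESSION=NONE/METADATA_ONLY needs the Advanced Compression option):
expdp system/password DIRECTORY=dp_dir DUMPFILE=audit_none.dmp \
  TABLES=QUALITY_R000000006_DB.AUDIT_LOG COMPRESSION=NONE
expdp system/password DIRECTORY=dp_dir DUMPFILE=audit_basic.dmp \
  TABLES=QUALITY_R000000006_DB.AUDIT_LOG \
  COMPRESSION=ALL COMPRESSION_ALGORITHM=BASIC
# ...and similarly with COMPRESSION_ALGORITHM=LOW | MEDIUM | HIGH

# Old-style exp, zipping the finished dumpfile afterwards:
exp system/password TABLES=QUALITY_R000000006_DB.AUDIT_LOG FILE=audit.dmp
gzip audit.dmp

# Old-style exp compressing inline through a named pipe:
mknod /tmp/exp_pipe p
gzip -c < /tmp/exp_pipe > audit.dmp.gz &
exp system/password TABLES=QUALITY_R000000006_DB.AUDIT_LOG FILE=/tmp/exp_pipe
rm /tmp/exp_pipe
```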

The results are summarized here


 Command                                       Elapsed               File Size
 -------------------------------------------   -------------------   ---------
 datapump no compression                       40s                   353M
 datapump basic compression                    37s                   32.23M
 datapump low compression                      75s                   46.24M
 datapump medium compression                   34s                   25.63M
 datapump high compression                     42s                   18.93M
 gzip dumpfile from datapump no compression    20s                   21.6M
 standard exp                                  120s                  798M
 exp with gzip                                 120s + 10s for zip    36M
 exp with gzip (using inline pipe)             180s                  39M

So what can even this most basic of tests tell us?

Well, with the small data set some of the results may be a little unrepresentative, but we can make the following sweeping statements based on this simple test:

1) expdp is much faster and creates smaller files than exp (the speed is no real surprise, but the dumpfile being half the size is interesting)
2) datapump compression did not seem to make much difference to the overall speed
3) 'LOW' compression seems really bad (slow) for some reason, even when the test was re-run
4) 'HIGH' creates the smallest file (even better than the best gzip compression)

The amount and speed of compression varies considerably depending on the datatypes and data being worked with - so the results above will likely not translate directly to 'your' environment, but they at least give a general feel for how the options compare.

And remember datapump compression is not free - you need an advanced compression licence......
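If you want to check whether the feature has already been exercised on a database (and is therefore something the licence auditors may ask about), the feature usage view is one place to look - a sketch only, and the exact feature names vary by version:

```sql
SELECT name, currently_used, last_usage_date
  FROM dba_feature_usage_statistics
 WHERE name LIKE '%Compression%';
```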


2 comments:

  1. The result for standard exp doesn't look quite right :-)

    1. Thanks - the perils of cut and paste - now corrected to 798M!
