Once again I've been looking into compression to save us disk space across out huge number of test environments. I'd been looking at one table in particular (and it's 4 indexes) which take up about 35% of each of out test environments.
The table is pretty large ~600 million rows, it contains 'normal' datatypes (so no lobs or user defined objects)
The objects and there sizes are shown below
RESET (the table) 48.2119141
EON_RESET_IDX 16.0449829
RESET_INDEX 14.1956787
RESET_INDEX2 14.7709351
RESET_INDEX3 12.7185669
So in total you can see that adds up to 104GB
Advanced compression done on the table brought that down to 18GB
then i moved on to the indexes and this is where it got very confusing....
I did a rebuild of all the indexes just adding the keyword 'compress' to each one
i.e.
alter index index_name rebuild compress;
This took a while, once it had finished a then checked the index sizes
EON_RESET_IDX 11.1334229
RESET_INDEX2 18.0892334
RESET_INDEX 8.06481934
RESET_INDEX3 6.37780762
At first glance that looks ok, but hold on whats going on with RESET_INDEX2 - that's got bigger!
That makes no sense (at least it didn't to start with). perhaps the data is not that repetitive so doesn't compress well. Lets have a look at the actual data and see whats going on.
The data looked normal enough, 2 of the columns are just 1's or 0's so i could see that that would really compress well.
Then i noticed that one of the other indexes (that did get smaller) had practically the same definition as the one that got bigger - they only differed by one column. Looking at the data however and the 'different' column contain practically the same data in both cases. This now makes even less sense.....
Then i noticed something else, the 'smaller' index was defined as unique but the bigger one was not...
Hmm what's going on here - is there some fundamental difference in how compression works for unique vs non unique index data. Does a unique index somehow allow compression of the rowid somehow?
At this point i posted a question on the technet and Jonathan pointed me to a very useful link on his site
http://allthingsoracle.com/compression-in-oracle-part-4-basic-index-compression/
Which also linked me to Richards site where i found this link
http://richardfoote.wordpress.com/2008/02/22/index-compression-part-iii-225/
Here i found the key statement that turned the light bulb on
"If the Unique index has multiple columns, the default prefix length value (number of compressed columns) is the number of indexed columns minus one, not all columns as it is for a Non-Unique index. See, Oracle is doing its best here to prevent a useless attempt at index compression."
So the difference is for my 4 column indexes when i say this
create index index_name on (col1,col2,col3,col4) compress;
it tries to compress all the first 4 columns
when i say
create unique index index_name on (col1,col2,col3,col4) compress;
oracle then realises there is no point trying to compress all 4 columns - they are unique - key compression won;t work, what I'm going to do instead is compress all but the last column - so the statement becomes
create unique index index_name on (col1,col2,col3,col4) compress 3;
Indeed if i now try
create index index_name on (col1,col2,col3,col4) compress 3;
the non-unique index ends up being the same 'small' size.
This now all makes perfect sense - but it's important to note the difference in the way compressions works differently between indexes and tables. Jonathan and Richard explain it in much more detail than i could and it's a useful read. The key thing to take away from this though is dont just blindly use compress for indexes - know your data and choose the correct number of key columns so that compression is actually useful
Very nice explanation on index compression
ReplyDelete