Tipping point

Today I've been looking at why some SQL is slower in one database than another. There can of course be myriad of reasons for this. Everything from you didn't gather stats to you did gather stats and everything conceivable in between.

I personally find it harder and harder to work out what the optimizer is doing now, more and more intelligence gets put into it with every release which makes it all the more difficult to 'get inside its head' (for want of a better term).

The way it's going it will soon become self aware......

Anyway I'm not going to completely explain all the maths that's going on here, i don't know the calculations it does and if i tried I'm sure i'd just get it wrong anyway. I'll talk in more general terms about the high level operations it does (i'm talking very high level here - like value x is bigger than y ).

OK enough preamble already - here is the SQL in question

SELECT  bc.bucket_container_name AS container , bb.bucket_name AS bucket , bb.created_timestamp AS created , bb.last_edit_timestamp AS last_edit , bc.workbench_operation_name AS WB_Operation 
FROM curve_data_store.binary_bucket bb , curve_data_store.bucket_container bc 
WHERE bb.bucket_container_id = bc.bucket_container_id AND LOWER(bc.bucket_container_name ) LIKE LOWER('%mtp%')

So a simple join of two tables and a single predicate to filter the data. We can ignore all the lower() business here in this case it has no influence on what is happening.

Between the two databases everything is the same (version, parameters etc). The only differences are data volume and distribution and therefore statistics.

The basic issue is that in the fast system it does a nested loop to join the two tables and in the slow system it does a hash join.

So this is the 'good' plan

| Id  | Operation                    | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |                              |  3487K|   522M|   394K  (1)| 01:18:57 |
|   1 |  NESTED LOOPS                |                              |  3487K|   522M|   394K  (1)| 01:18:57 |
|   2 |   NESTED LOOPS               |                              |  3608K|   522M|   394K  (1)| 01:18:57 |
|*  3 |    TABLE ACCESS FULL         | BUCKET_CONTAINER             | 25234 |  2143K|  1755   (1)| 00:00:22 |
|*  4 |    INDEX RANGE SCAN          | BINARY_BUCKET_BC_CONT_ID_IDX |   143 |       |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| BINARY_BUCKET                |   138 |  9660 |    16   (0)| 00:00:01 |

And this is the 'bad' one

| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |                  |  2672K|   377M| 54180   (1)| 00:10:51 |
|*  1 |  HASH JOIN         |                  |  2672K|   377M| 54180   (1)| 00:10:51 |
|*  2 |   TABLE ACCESS FULL| BUCKET_CONTAINER |  4505 |   395K|   239   (1)| 00:00:03 |
|   3 |   TABLE ACCESS FULL| BINARY_BUCKET    |    22M|  1232M| 53836   (1)| 00:10:47 |

The reason for the difference is the data profile and how that is reflected in stats, i was interested to see what was the reason for it in this case and where the tipping point was.

Now in my very basic optimizer maths (note this is my 'interpretation - it is not meant to be correct before someone jumps down on me from a great height) a nested loop is driven by the number of cost of getting stuff from table A plus the cost of getting stuff from table B multiplied by the number of rows from A.

If we think we are getting loads of rows from A then the cost massively goes up and quite quickly the cost of a nested loop becomes much more than a simple hash join.

What i was interested to see in this case was where was the tipping point - how many rows did oracle need from table A which would cause the switch between the two versions of the plans. Had we just gone over this or was there something else in play here?

So how to find this out - i could start digging through 10053 trace files but i never have the time (or the inclination) most of the time to start doing that - so i can use a little trick with a hint.

I can do this

SELECT /*+ CARDINALITY(bc 32000) */ bc.bucket_container_name AS container , bb.bucket_name AS bucket , bb.created_timestamp AS created , bb.last_edit_timestamp AS last_edit , bc.workbench_operation_name AS WB_Operation FROM curve_data_store.binary_bucket bb , curve_data_store.bucket_container bc WHERE bb.bucket_container_id = bc.bucket_container_id AND LOWER(bc.bucket_container_name ) LIKE LOWER('%mtp%')

This tells oracle to basically ignore how many rows it things it will get from the table and use the value i provide instead. With some trial and error i can get the plan to flip - the tipping point here seems to be about 32000 rows - anything less than that is nested loop - anything more is hash join.

In the example below we see the output from the plan with the cardinality hint at 32000

| Id  | Operation          | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |                  |  4422K|   662M|       |   486K  (1)| 01:37:18 |
|*  1 |  HASH JOIN         |                  |  4422K|   662M|  3096K|   486K  (1)| 01:37:18 |
|*  2 |   TABLE ACCESS FULL| BUCKET_CONTAINER | 32000 |  2718K|       |  1755   (1)| 00:00:22 |
|   3 |   TABLE ACCESS FULL| BINARY_BUCKET    |    67M|  4500M|       |   222K  (1)| 00:44:31 |

(note the rows explicitly mentioned as 32000 here)

The actual value from the stats is 25234 rows - which is 5% of the row count of the table - the best the optimizer can do with such a crap predicate of %val%

So as data volumes change this value is shuffling around and we could get the case we flip between plans - we have identified here the tipping point.

There is obviously more to it than the simple way i expressed it here and actually when i dug deeper there is a massive difference in the clustering factor of the main index on the table between the two databases which makes the cost of access of the second table much more attractive via an index so also favours the nested loop. We can therefore have a much higher % of the rows being returned in 'fast' db from table A and still have nested loops as the cost to do stuff from table B is far less

I think a 'reload' of the data in the slow database so that the data is more ordered to improve the clustering factor would fix this slowness and is one solution.

There could be many others - hints, stats fixing, parameter changes - the list goes on.

Interesting to see though this tipping point in action - you could see how a single row change is enough to send it one way or the other. Sometimes its useful to know how close you are to that.

I don;t really know what point i'm making here by the way - just though it was quite interesting......


  1. If you run dbms_xplan.display_cursor you can see what the optimizer expects at each line and what it gets - if these 2 are in reasonable agreement then the optimizer has a good idea of what is going on - if they don't then just about anything can happen - the trick is working out why they are out of sync and correcting it

  2. Indeed. That's where the serious thinking comes in.... :-)