This is a short post to illustrate how understanding how an application works can make all the difference to how you tune it. This is a real-world case which is actually pretty simple, but I think it illustrates the point quite well.
The performance symptoms were as follows:
1) Live environment works fine
2) Copy of that environment is very slow performing a specific function - the data between the two systems is essentially the same - just sanitized for test (volumes and data distribution the same)
When we look into the specifics of the slowness (in this case a trader confirming a trade) we can see that the SQL that is running is actually pretty simple, but it performs completely differently between the two environments - the SQL is shown below:
with b as
(
select distinct BOOK
from J_BOOK_CACHE_V2
where session_id = 'c0287841-cec3-455b-b0a4-bac81418a055'
)
select TRADE_TNUM
FROM tpow t,
b
where t.AUDIT_ALF <= TO_DATE('03102017','DDMMYYYY' )
AND t.AUDIT_AOHM > TO_DATE('03102017','DDMMYYYY' )
AND t.AUDIT_ZKEY = '5243123'
and t.trade_book = b.book;
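As an aside, an easy way to see how each environment is actually running this is to execute the statement and then pull the plan of the last executed cursor in the same session - a sketch, diffing the output between the two systems:
-- run the problem statement first in this session, then:
select * from table(dbms_xplan.display_cursor(null, null));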
Now looking at it in more detail, the execution plan is totally different between the two systems - but why is that, when the systems should be copies of each other?
The copy was actually done as an RMAN duplicate (then sanitized) and the spfile contents were copied exactly - so other than the sanitizing changes everything should be the same (including the stats).
So what's going on here? Well, a closer look at the stats shows that the table in the with clause at the top has a few hundred rows in production but none in the test environment - why is that?
Well, after investigating how the application works it transpires that this J_BOOK_CACHE_V2 table is 'temporary' (well, its contents are temporary, but it's still stored in a normal heap table). There is only data in there when users are logged in and doing things; on logout the data is cleared down again.
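For reference, the row count the optimizer believes in is easy to see from the data dictionary; a quick check along these lines (run in both environments) shows the discrepancy:
select num_rows, last_analyzed
from dba_tab_statistics
where owner = 'ZAINET'
and table_name = 'J_BOOK_CACHE_V2';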
So with this behaviour now in mind we have a simple hypothesis:
1) During the day people are logged in and data exists in the ‘temporary book cache’ table J_BOOK_CACHE_V2.
2) At night everyone is logged off and the table is cleared down automatically by the logging out process.
3) The stats job runs when the table is empty - so Oracle thinks any query against this table will return no rows and adjusts any queries involving this table to expect that (the timing of the stats job can be checked from the dictionary - see the query after this list)
4) When the screen is run, data then exists in the table (where Oracle expects none), and because there are rows the way Oracle decided to run the query is wrong - hence the bad performance.
5) In live there is generally always someone logged in when the stats job runs, so the stats never say 0 rows and the plans are good.
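This part of the theory is easy enough to check, as the dictionary keeps a history of stats operations; a sketch along these lines shows when the gather jobs actually ran:
select operation, target, start_time
from dba_optstat_operations
where start_time > systimestamp - interval '7' day
order by start_time desc;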
Anyway that was the theory.
So to fix it I ran the following series of steps (see below) – basically I insert some dummy data, gather stats while it’s there, lock those stats so they can’t be changed then delete the temporary data.
-- insert some dummy rows under three made-up session ids (suffixed RICH so they are easy to find again)
insert into J_BOOK_CACHE_V2 select distinct 'cac6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;
insert into J_BOOK_CACHE_V2 select distinct 'fhj6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;
insert into J_BOOK_CACHE_V2 select distinct 'jop6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;
commit;
-- gather stats while the dummy data is present (column stats on indexed columns only)
exec dbms_stats.gather_table_stats('ZAINET','J_BOOK_CACHE_V2',method_opt=>'for all indexed columns');
-- lock the stats so the nightly job can't overwrite them with 'empty table' stats
exec dbms_stats.lock_table_stats('ZAINET','J_BOOK_CACHE_V2');
-- tidy up the dummy rows again
delete from J_BOOK_CACHE_V2 where session_id like '%RICH';
commit;
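As an aside, you could probably achieve much the same thing without the dummy data by setting representative stats directly and then locking them - a sketch, where the row count of 300 is just an assumption based on what production typically holds:
-- set representative stats by hand (numrows of 300 is an assumed figure)
exec dbms_stats.set_table_stats('ZAINET','J_BOOK_CACHE_V2',numrows=>300);
exec dbms_stats.lock_table_stats('ZAINET','J_BOOK_CACHE_V2');
Either way, you can confirm the lock took effect by looking at the stattype_locked column in dba_tab_statistics.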
Now when I re-run my test case from SQL, performance is the same as live (i.e. very quick).
Strangely enough this hadn't been a problem in test environments before, and I can only assume that some data had not been cleaned up properly by the logout process, and this quirk was causing the stats to have some value and things to work OK.
Now this was a very easy fix and a simple example, but the principle is very useful - to best tune your database you need to understand how the application works. Tuning based purely on database/SQL tuning skills can get you a long way, but to get the best performing database, application/data knowledge is king.