I don't really like posting blog entries about Oracle statistics, as so many other people do it better than I ever could (Jonathan, Randolf etc.) - well, that and I'm likely to get it wrong...
However, I've been looking into something today and I think it's worth sharing the results, as it's quite interesting.
We have a large data warehouse that's having stats issues (par for the course, pretty much). I won't go into why the stats gathering policy is the way it is at the moment, but suffice to say the stats jobs are manually called as part of the ETL processing and they run with the 'GATHER AUTO' and 'GATHER STALE' options. This in itself isn't an issue in most cases, but we have some tables where a small percentage of data change is significant in terms of data distribution (i.e. we are essentially adding new 'out of range' data to some of the tables). This gives the optimizer a hard time and it resorts to some guesses - and of course that usually means it's wrong at least some of the time.
Changing the process is possible, of course, but the way it's built into the core of the system makes this a non-trivial task - so what else can we do here without having to touch the code?
In the case of at least one table we pretty much want to gather stats every day as part of the ETL process - the current hardcoded job will only do this if the table is considered 'stale', so what can we do about that?
The default 'staleness' threshold is 10%, so I wondered: can we make this much smaller? One of my colleagues suggested 1% (and indeed some posts I read seemed to state this as a minimum), but I wanted to go smaller still, so I just tried the PL/SQL to reduce it to less than 1:
begin
dbms_stats.set_table_prefs(USER, 'DEMO','STALE_PERCENT', '0.0001');
end;
/
And it quite happily accepted it - but that doesn't mean it will work, of course - so I went about a test.
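(As a quick sanity check - my addition, and assuming the table in question already exists - DBMS_STATS.GET_PREFS is the documented way to read the preference back, both the global default and the table-level override:)
SQL> select dbms_stats.get_prefs('STALE_PERCENT') as global_value,
  2         dbms_stats.get_prefs('STALE_PERCENT', user, 'DEMO') as table_value
  3  from   dual;
If the set_table_prefs call took effect, the table-level value should come back as 0.0001 while the global default stays at 10.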
Let's first create a demo user, log on as that, and create a table with a million rows (to make the maths easy). I tried using the CONNECT BY row generator trick but it didn't seem to like large values, so I reverted to PL/SQL.
SQL> create user demo identified by demo;
User created.
SQL> grant dba to demo;
Grant succeeded.
SQL> conn demo/demo
Connected.
SQL> create table demo (col1 number);
Table created.
SQL> insert into demo
  2  SELECT LEVEL
  3  FROM dual
  4  CONNECT BY LEVEL <= 10000001
  5  /
insert into demo
*
ERROR at line 1:
ORA-30009: Not enough memory for CONNECT BY operation
Elapsed: 00:00:07.57
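(As an aside - my addition, not something I tried at the time - a common workaround is to cross-join two smaller row generators so that no single CONNECT BY has to recurse very deep; 1000 x 1000 rows gives the same million:)
insert into demo
select rownum
from   (select level from dual connect by level <= 1000),
       (select level from dual connect by level <= 1000);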
So revert to PL/SQL here to generate the rows:
SQL> begin
  2    for i in 1..1000000 loop
  3      insert into demo values(i);
  4    end loop;
  5  end;
  6  /
PL/SQL procedure successfully completed.
Elapsed: 00:00:28.68
SQL> select count(*) from demo;
COUNT(*)
----------
1000000
Elapsed: 00:00:00.02
SQL> commit;
So now we have a single-column table called DEMO with a million rows in it.
Let's gather stats on it and check the figures:
SQL> exec dbms_stats.gather_table_stats(USER,'DEMO');
PL/SQL procedure successfully completed.
SQL> select table_name,num_rows,blocks from user_tables;
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
DEMO 1000000 7048
All as we would expect - let's now show the current staleness state:
SQL> select TABLE_NAME,STALE_STATS from user_tab_statistics;
TABLE_NAME STALE_STATS
------------------------------ ------------
DEMO NO
And everything is 'fresh'
Now let's update 10% of the rows:
SQL> update demo set col1=col1 where col1 < 100001
2 /
100000 rows updated.
SQL> commit;
Commit complete.
Is it now 'stale'?
SQL> select TABLE_NAME,STALE_STATS from user_tab_statistics;
TABLE_NAME STALE_STATS
------------------------------ ------------
DEMO NO
Hmm, no - and the reason is that the DML monitoring info is held in a memory buffer (in the SGA) and only flushed to disk every so often (I'm not sure of the exact schedule).
We can force the flush though - so let's do that:
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
PL/SQL procedure successfully completed.
SQL> select TABLE_NAME,STALE_STATS from user_tab_statistics;
TABLE_NAME STALE_STATS
------------------------------ ------------
DEMO YES
OK - so that works nicely, but at this point I started to think there was a fatal flaw in the plan - how am I going to make the flush run when I want it to? Anyway, I ignored that for now and carried on with the test.
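(Incidentally - an extra check of my own rather than part of the original test - once the monitoring info has been flushed, the raw change tracking that feeds the staleness calculation is visible in USER_TAB_MODIFICATIONS:)
SQL> select table_name, inserts, updates, deletes
  2  from   user_tab_modifications
  3  where  table_name = 'DEMO';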
Now I set STALE_PERCENT to one ten-thousandth of a percent - 0.0001% of 1,000,000 rows works out to just 1 row:
begin
dbms_stats.set_table_prefs(USER, 'DEMO','STALE_PERCENT', '0.0001');
end;
/
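(A quick way to confirm the preference stuck - again my addition - is to query the stats preferences dictionary view:)
SQL> select preference_name, preference_value
  2  from   user_tab_stat_prefs
  3  where  table_name = 'DEMO';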
So now let's try a test (I update a few rows rather than just one, to allow for any rounding that might be going on in the staleness calculation):
SQL> update demo set col1=col1 where col1 <5;
4 rows updated.
I now flush the monitoring info and check:
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
PL/SQL procedure successfully completed.
SQL> select TABLE_NAME,STALE_STATS from user_tab_statistics;
TABLE_NAME STALE_STATS
------------------------------ ------------
DEMO YES
Brilliant - so the new setting is working: changing just 4 rows made the whole million-row table 'stale'.
Great, apart from one thing - I had to manually flush the monitoring info - how was I going to deal with that?
Well, some more reading revealed that dbms_stats supposedly calls this flush automatically when run with the 'GATHER AUTO' or 'GATHER STALE' options - which is just what I want - so let's try that.
I quickly change the date format so I can easily see the timestamp of when stats were gathered:
SQL> alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
Session altered.
I gather the stats now to remove any staleness
SQL> exec dbms_stats.gather_table_stats(USER,'DEMO');
PL/SQL procedure successfully completed.
Then I change a few rows:
SQL> update demo set col1=col1 where col1 <5;
4 rows updated.
SQL> commit;
Commit complete.
Confirm the current timestamp of when the stats were gathered
SQL> select table_name,last_analyzed from user_tables;
TABLE_NAME LAST_ANALYZED
------------------------------ -----------------------------
DEMO 25-jun-2015 22:31:27
Now we run the stats job that our ETL batch would run - in this case calling gather_schema_stats with a filter for the single table and using the 'GATHER AUTO' option:
SQL> DECLARE
  2    filter_lst DBMS_STATS.OBJECTTAB := DBMS_STATS.OBJECTTAB();
  3  BEGIN
  4    filter_lst.extend(1);
  5    filter_lst(1).ownname := 'DEMO';
  6    filter_lst(1).objname := 'DEMO';
  7    DBMS_STATS.GATHER_SCHEMA_STATS(NULL, obj_filter_list => filter_lst, options => 'GATHER AUTO');
  8  END;
  9  /
PL/SQL procedure successfully completed.
That ran OK - so let's check... (high tension at this point...)
SQL> select table_name,last_analyzed from user_tables;
TABLE_NAME LAST_ANALYZED
------------------------------ -----------------------------
DEMO 25-jun-2015 22:32:10
And it's worked! So we have a possible short-term solution - but longer term we need to be more explicit in the ETL about exactly what to gather, and not rely on a percentage of change as the trigger for gathering new stats.
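(One last aside of my own: if you want to preview which tables a 'GATHER STALE' run would pick up without actually gathering anything, dbms_stats also accepts a 'LIST STALE' option and returns the matches in an out parameter:)
SQL> set serveroutput on
SQL> DECLARE
  2    stale_lst DBMS_STATS.OBJECTTAB := DBMS_STATS.OBJECTTAB();
  3  BEGIN
  4    DBMS_STATS.GATHER_SCHEMA_STATS('DEMO', options => 'LIST STALE', objlist => stale_lst);
  5    FOR i IN 1 .. stale_lst.COUNT LOOP
  6      DBMS_OUTPUT.PUT_LINE(stale_lst(i).objname);
  7    END LOOP;
  8  END;
  9  /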
Apologies for the picture, by the way - it was the closest thing to 'stale bread' I could find...