Friday night ORA-1555 fear fest



Now this is the kind of call you don't want to get on a Friday night.

"The app isn't displaying any updated data on screen for the past x hours"

Time to put down the beer, take the slippers off and power up the laptop.

After I eventually logged on and we dug a little further we discovered that the application screen was basically querying an mview directly to display data for the past day, this mview is an aggregate on top of the actual raw data table which is refresh using the 'on commit' method - so it always reflects the latest view of the world - though in this case wasn't.

So the problem was clear - the mview wasn't refreshing - but why?

Lets try a manual refresh and see what happens - maybe something has gone awry with the on commit process.

BEGIN dbms_mview.refresh('SCHEMA.MVIEW','?'); END;

*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-01555: snapshot too old: rollback segment number 10 with name
"_SYSSMU10_1071639334$" too small
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2566
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2779
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2748
ORA-06512: at line 1

Hmm - wasn't really expecting that , lets repeat with the 'full' refresh option (though the ? should try that anyway i think - but you never know)

BEGIN dbms_mview.refresh('SCHEMA.MVIEW','C'); END;

*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-01555: snapshot too old: rollback segment number 10 with name
"_SYSSMU10_1071639334$" too small
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2566
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2779
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2748
ORA-06512: at line 1

Right - so what happening here - we shouldn't be getting a ORA-1555 error here - there is not really anything else happening ;selects from v$transaction and v_fast_start_transaction show nothing.

Uh oh - this is starting to feel like some horrible kind of undo corruption......

If i dig out the underlying SQL from the mview (which is very simple) and run that i get the same snapshot too old error - so at least we ruled out a lot of the mview parts - it narrows down the problem but doesn't help other than that.

select  max(col) col2,col3 from schema.table x
group by col2,col3 ;

ORA-01555: snapshot too old: rollback segment number 10 with name
"_SYSSMU10_1071639334$" too small

After a bit of googling and trying to remember of we had anything similar before i decided to see which particular rows were causing the issue - the first step i did for that was just to do a CTAS of the table ( i wasn't really sure how i was going to narrow down the rows by doing this looking back now :-) ).

In fact when i ran the CTAS it completed fine - No ORA-1555 - great i though must have just fixed itself - lets try the mview refresh again.

And it failed again!

This is getting weird now......

Then i though about it logically - the CTAS must be reading the data differently in some way, thats always going to be doing a FTS of the table - the mview refresh must be doing something involving indexes. Lets try running the underlying SQL with a FULL hint - like this

select /*+ FULL(x) */ max(col) col2,col3 from schema.table x
group by col2,col3 ;


Guess what - no error!

Aha - so the problem is with an index - whats the explain plan for this select? Using sqldev to make it easy for myself and save remembering syntax i find the plan and discover it's doing 2 index fast full scans and hashing those together to make the result. One or both of those must be corrupt - so lets do a rebuild of them

alter index schema.ind1 rebuild online;
alter index schema.ind2 rebuild online;

After that completes i try the refresh again and.........

It works!

Hooray.

This is probably the strangest cause of an ORA-1555 I've had and it was not at all obvious - it was almost by luck i discovered the real issue. I can only think that the corrupted index entry was pointing at an expired undo segment and therefore throwing the error - perhaps a row was deleted from the table but left in the index?

Anyway this was 11.2.0.2 and we are hopefully soon moving to 12.1.0.2 where any bug should have been long since fixed.


3 comments:

  1. We had such corruption after failing over standby and making it primary .
    ora-1555 when accessed plan used index access .
    G

    ReplyDelete
  2. Hi,
    Actually we had done a switchover earlier in the week - so its likely the same issue as you mention. Was there a bug fix/workaround to prevent it happening again?

    Cheers,
    Rich

    ReplyDelete
  3. Nothing much to add sorry , there are few bugs in MOS when You search for ora-1555 index .
    Rebuild was the solution in our case .
    GG

    ReplyDelete