When you drop a schema by mistake......



Now everyone makes mistakes, some are easier to fix than others of course.

This week i was overcome with an urge to tidy things up in our cloud control repository as we've had a few accounts where people have moved on or the functionality is not used any more.

So i merrily clicked through the screens dropping users, a couple of minutes after i did that i suddenly got that feeling (the one where your brain suddenly wakes up and you realize what you did - otherwise known as the 'oh shit' moment).

So in my case i had a schema that had been dropped from the repository database - i didn't need the account any more in the cloud control tool but i did need the content of what was in it for other things.

So how do i fix this....

I do not want to restore the database back to 15 minutes ago as it will mess up cloud control. I could do some clever stuff with flashback if i take an outage on cloud control for a few minutes - however flashback isn't on anyway so i can't suspend cloud control, flashback, extract the user and then flash forward again (that's something to note for later - always switch on flashback!)
I have no separate 'logical' backups of the schema...

So what options do i have?

I could maybe do something with TSPITR - however this schema was in amongst other stuff (again something to note for later...) - i could possibly interrupt the process at a specific point and complete the last part manually - but it's too risky an approach

So what i decided to do was restore the database (as of 15 mins ago) to a separate server, once it is restored i can extract the schema from there with datapump and load it back into the original database.

So here is how i did that.

Now the first issue i have is that all of my backups are on tape (we have tsm configured to write directly to a VTL) so i need to enable the 'donor' server to be able to pretend to be the actual server from a tape access point of view.

Fortunately this is very simple - i just need to copy the /tsm/ORACLE_SID directory completely from the 'live' server to the 'donor' one - this may be different for your environment - it depends how you set up tsm (or whatever other software you are using).

Once that is set up i need to create an instance of oracle (i.e. just the memory/processes up and running - no control files or any other components). For this i copy the init file from live onto the donor server - make any small amendments needed in config and then start up the database in nomount mode (which just creates the instance).

(I just realized i glossed over the fact you of course need the same oracle software installed on the donor server - i already did - but it's easy enough to use the clone.pl utility to do that for you.)

So assuming now you have an instance up and running we now use the rman duplicate feature to do the copy for us

so we start up rman like this

rman rcvcat=rmanschema/rmanschemapassword@rmancatalog auxiliary=/

where auxiliary is our instance in nomount mode

Once connected we run

run{
     set dbid 12345678;
    set until time "to_date('19-DEC-2014 14:45:00','dd-mon-yyyy hh24:mi:ss')";
     allocate auxiliary channel ch1 type'SBT_TAPE' parms'ENV=(TDPO_OPTFILE=/tsm/DBNAME/conf/tdpo.opt)';
     duplicate database 'DBNAME' to DBNAME;

Couple of points on this - dbid is essential - it't the only way we are able to tell which database we want to duplicate from as we have no info from the controlfile. the DBNAME at the end - the first one has to match 'live' - the second one does not - but the second name has to match the instance name of the auxiliary we just created - in my case i kept them the same name.

So i run it and get this

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 12/19/2014 15:08:15
RMAN-05501: aborting duplication of target database
RMAN-06457: UNTIL SCN (9060086030062) is ahead of last SCN in archived logs (906

which is a new one on me - however the fix is quite clear - the archive logs i need are not backed up yet (as the time i chose for duplicate is only 15 minutes ago)

So i switch logfile and do a backup and then try again - and get this

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 12/19/2014 15:12:22
RMAN-05501: aborting duplication of target database
RMAN-06617: UNTIL TIME (19-DEC-14) is ahead of last NEXT TIME in archived logs (19-DEC-14)

different error but pretty much the same problem - so this time i archive log the current logfile, do some more switches just for good luck - then do another backup on live and try again......

This time it's ore successful and it spews out loads of log info (too much to paste here) - until i hit this:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 12/19/2014 16:40:07
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
ORA-19870: error while restoring backup piece redolog_DB_866518690_20141217_167032_2.rman
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 356509184 bytes disk space from 4294967296 limit

So I've run out of space in the recovery area - so i make it massively bigger and try again....

And this time it runs through - ending with

Executing: alter database force logging

contents of Memory Script:
{
   Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 20-DEC-14
released channel: ch1

So all good - however it took 12 hours - most of which was jumping around through tapes to bring the archives back (3 days worth) - this is something to be very aware of with large environments - your archive log backups are likely spread over lots of tapes and a lot of time is wasted jumping around and waiting for busy tapes to become free. A definite case for more regular incrementals or some better way of keeping archivelogs co-located.

Anyway i now have my copy db up and running and i can simple do my datapump

expdp user/pass directory=tmp schemas=oneideleted

that runs through fine - i then copy the dumpfile over to the 'live' server and load it in

impdp user/pass directory=tmp

And the issue is fixed - though it took a little longer than i expected.

A good reminder to triple check before you tidy up, but some useful things discovered that need to be addressed in the way the environment is set up - every cloud eh....?



1 comments:

  1. I had the exact same incident...luckily I had a data guard standby with flashback switched on to use and then rebuilt once recovered the schema into the primary DB.

    ReplyDelete