Graphing database growth just from export logfiles?

This week we've been trying to collect some information to let us show the increase in data growth (and we're just talking row counts here) in one of out main OLTP databases. We're interested over a period of about a year how those tables have grown.

The problem is we having set up anything in particular to capture this information and the data sources that are available to us (optimizer stats history, cloud control repository) don't tell us what we want to know. The stats don't go back far enough as the AWR is being purged (i was recently at a user group meeting where it was discussed that keeping an AWR history of 13 months is probably a very sensible thing to do for all sorts of reasons - the space required for this relative to the overall db size and the information it could provide seems to be well worth it) and cloud control doesn't have the granularity of information we need (and also that is purged after a certain time interval i think - some months from what i remember with grid control - though I'm sure that was configurable via some plsql routine).

So what to do, where can we magic up this data from?

Well lucky for us we take full exports every week to be used to populate test environments - and we keep the logfiles from this - so we have the data we need but the format of it is crap.....

So how to get that logfile data into a more usable format?

here's what i did......

First up i needed the data inside the db rather than in flat files, to do this i built an external table routine with a preprocessor to parse the logfile before i loaded in and only extract the information about the rows being extracted from the log

The first step in building the preprocessor is to build a shell script that takes the content of the datapump logfile as an input and strips out the bits i want.

The final script i ended up with is this

/tmp/ (must have execute permission set on the file)

which just contains a single line

/bin/grep ' GB\| KB\| MB'  $1 | /bin/grep -v estimation | /bin/sed 's/. . exported //g'

This is just looking for the string GB or KB or MB, then removing any lines with estimation in and then doing a sed command to replace the string . . exported with nothing (so removing the prefix at the start of the line)

I can then say

/tmp/ logfile_name

and the output to screen will be the 'file' that the external table loader looks at.

Now we have the script that formats the file in the way we want it - we now need to wrap an external table creation script around that - this is what i ended up with (this is all dependant on everything being in /tmp and having an oracle directory object called tmp looking at it)

                    ( tabname varchar2(256),
                      space_user number,
                      size_flag varchar2(256),
                      num_rows number
       ( records delimited by newline PREPROCESSOR TMP:''          
badfile      TMP:'testcsv.bad'
           logfile     TMP:'testcsv.log'
           discardfile TMP:'testcsv.dsc'
     fields  terminated by whitespace
         missing field values are null
                      tabname char(256),
                      space_user ,
                      size_flag char(256),
     LOCATION (TMP:'datapump.log')

Now when i select * from log_reader i see the content of the datapump log (the bits the shell script throws out) as a table.

I now need to store the output from this table somewhere else along with the date this export log was created - to do this i create a new table and then insert some data into it.

SQL> create table growth_report (log_date date,TABNAME VARCHAR2(256),SPACE_USER NUMBER,SIZE_FLAG VARCHAR2(256),NUM_ROWS NUMBER);

Table created.

insert into growth_report
select to_date('24-JAN-2014','DD-MON-YYYY'),TABNAME,SPACE_USER,SIZE_FLAG,NUM_ROWS from log_reader;

Ok so far so good - now i need to do the same thing for all the other logs i have (i used 10 in total for this initial run). I could have done this in a more clever way but i just ended up manually doing the 10 files - i.e. just switch the file datapump.log with the next logfile and then do an insert statement changing the date manually)

So now i have all 10 logfiles worth in the database and i just need to write some SQL to display it.

Hold on i need to pivot the data though which can be a little tricky sometimes - lets look up the new pivot syntax and see if that helps us....

Well the short answer is it did help and the actual code to produce the output i need is very short, the syntax still seems a little odd - i guess it similar to analytic queries in that you just have to get used to it - it's maybe not the syntax I'd have come up with if I'd developed it but it gets the job done.

 So anyway here's what i came up with - this creates one row per table with the rows per month shown in a separate column

FROM   (SELECT log_date, tabname,num_rows
        FROM   growth_report)
PIVOT  (max(num_rows) AS totrows FOR (log_date) IN ('23-JUN-13','30-JUN-13','02-JUL-13','23-AUG-2013','15-SEP-13','18-OCT-13','01-NOV-13','15-NOV-13','13-DEC-13','24-JAN-14'))
ORDER BY tabname;

 The output is shown above - this can then easily be exported straight to excel (at least in can from plsql developer my tool of choice).

Once in Excel we can plot nice graphs to see the growth over time.

Ideally we'd have scripted some regular task to save all this info to our own repository or even better kept an AWR history for longer - but this has at least given us something to use and made use of some useful tricks along the way.

I've also today found how to add a spell checker in to firefox so hopefully my appalling spelling over the past few months will now imporve....


Post a Comment