More datapump master table analysis



We're currently looking at migrating an old 9i system to 12c (via 10g) and the datapump part is taking longer than we'd like. I'm doing a bit of analysis to see where the time is spent and have built a few queries I thought I'd share, as they may be useful for others. In this case I kept the master table with KEEP_MASTER=Y, and the job was called SYS_IMPORT_FULL_03.
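For reference, keeping the master table after the job finishes is just a parameter on the import command line. A hedged sketch of the kind of invocation involved (the credentials, directory, dump file and log file names here are placeholders; SYS_IMPORT_FULL_03 is the kind of name the system generates for a full import):

```
impdp system full=y directory=DATA_PUMP_DIR dumpfile=full%U.dmp logfile=full_imp.log parallel=4 keep_master=y
```

With KEEP_MASTER=Y the master table is left behind in the invoking user's schema under the job name, which is what all the queries below select from.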

The first one just shows an overall elapsed time for each type of object being loaded:

select object_type,
       (completion_time - started) * (24 * 60 * 60) as elapsed_seconds
  from (with datapump_data as (select completion_time, object_type
                                 from sys.sys_import_full_03
                                where process_order = -6
                                  and object_type <> 'TABLE_DATA'
                               union
                               select max(completion_time), 'TABLE_DATA'
                                 from sys.sys_import_full_03 x
                                where object_type = 'TABLE_DATA'
                                  and process_order > 0)
         select nvl((lag(completion_time, 1) over(order by completion_time)),
                    completion_time) as started,
                completion_time,
                object_type
           from datapump_data x
          order by started)
          order by elapsed_seconds desc

Which shows something like this:

[screenshot of query output]

As you might expect, TABLE_DATA takes the longest, followed by INDEX. You might say that's not very useful, as it doesn't break things down any further,
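As an aside, the LAG trick in that query is worth spelling out: the master table only records completion times, so each phase's start time is taken to be the previous phase's completion time, with NVL falling back to the row's own completion time for the very first row (so it shows as zero elapsed). A minimal Python sketch of the same calculation, using made-up timestamps:

```python
from datetime import datetime

# Made-up completion times, mimicking one row per object type
# from the datapump master table.
rows = [
    ("TABLESPACE", datetime(2015, 6, 1, 10, 0, 0)),
    ("TABLE",      datetime(2015, 6, 1, 10, 5, 30)),
    ("TABLE_DATA", datetime(2015, 6, 1, 11, 45, 0)),
    ("INDEX",      datetime(2015, 6, 1, 12, 15, 0)),
]

# Same idea as NVL(LAG(completion_time) OVER (ORDER BY completion_time),
# completion_time): a phase starts when the previous one completed;
# the first phase falls back to its own completion time (elapsed 0).
rows.sort(key=lambda r: r[1])
elapsed = {}
prev = rows[0][1]
for object_type, completed in rows:
    elapsed[object_type] = (completed - prev).total_seconds()
    prev = completed

for object_type, secs in sorted(elapsed.items(), key=lambda kv: -kv[1]):
    print(object_type, secs)
```

This is only an approximation, of course: it assumes the phases ran strictly one after another, which is roughly true for a serialized import but blurs with parallel slaves.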

so let's break it down further:

select object_schema,
       object_name,
       (completion_time - start_time) * (24 * 60 * 60) as elapsed_time,
       object_type,
       process_order,
       base_object_name,
       base_object_schema
  from sys.sys_import_full_03 x
 where object_type = 'TABLE_DATA'
   and process_order > 0
 order by elapsed_time desc

Which shows us this:

[screenshot of query output]

More useful: we can now see how long each table took. In this case one table took far longer than all the others, and in fact its elapsed time matches the total run time of the TABLE_DATA section. This is because I used PARALLEL, and a single slave was left working on this table while all the other slaves finished every other table in less time than it took to load this one.

A definite candidate then for closer analysis.

How about indexes though?

select object_schema,
       object_name,
       (completion_time - started) * (24 * 60 * 60) as elapsed
  from (select object_schema,
               object_name,
               nvl((lag(completion_time, 1) over(order by completion_time)),
                   completion_time) as started,
               completion_time
          from sys.sys_import_full_03 x
         where object_type = 'INDEX'
           and process_order > 0)
 order by elapsed desc


So we can see that the indexes on that same big table took the most time, as you'd expect.

How about stats then?

select base_object_schema,
       base_object_name,
       (completion_time - started) * (24 * 60 * 60) as elapsed
  from (select base_object_name,
               base_object_schema,
               nvl((lag(completion_time, 1) over(order by completion_time)),
                   completion_time) as started,
               completion_time
          from sys.sys_import_full_03 x
         where object_type = 'TABLE_STATISTICS'
           and process_order > 0)
 order by elapsed desc


Again the usual suspects. I suspect histograms are the reason there is so much more statistics data for the big table, as a table being big doesn't in itself mean a lot of stats information.
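If histograms are the suspect, one rough way to gauge how much histogram data a table carries is to count endpoints per column in the data dictionary. A sketch (the owner here is a placeholder for the schema in question):

```
select table_name,
       column_name,
       count(*) as endpoint_count
  from dba_tab_histograms
 where owner = 'BIG_SCHEMA'
 group by table_name, column_name
 order by endpoint_count desc;
```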

So some useful stuff there.

The next steps for us are to try partitioning the big table, which will enable multiple slaves to work on it, and to truncate the second biggest table, as its data is not required. Then we'll rerun the import job and see how much time we've saved.
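Roughly what that looks like in DDL terms (purely a hypothetical sketch; the schema, table, column names and partition ranges are all placeholders, not the real objects):

```
-- Recreate the big table range-partitioned, so several Data Pump
-- workers can load its partitions concurrently instead of leaving
-- a single slave grinding through it on its own.
create table big_schema.big_table (
  id       number,
  created  date,
  payload  varchar2(4000)
)
partition by range (created) (
  partition p2013 values less than (date '2014-01-01'),
  partition p2014 values less than (date '2015-01-01'),
  partition pmax  values less than (maxvalue)
);

-- And empty the second biggest table on the source, since its
-- data isn't needed on the new system.
truncate table big_schema.second_biggest_table;
```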
