Why does anyone do xml manipulation outside of the database?



If you've ever looked into the XML features available with oracle it's a little mindblowing, there is so much stuff available for working with XML it's very difficult to know where to start sometimes. This week one of my colleagues (Susan) has been doing a small proof of concept for turning some large XML files (60MB each) into 'rows' to be used for data analysis  - this has turned out to be amazingly easy to do - but finding out how easy it was just took a while.

I think it's worth sharing here as i think i knew this functionality was there i'd just forgotten - but it's a very useful technique.

We're going to load the xml file from disk using sqlldr and then select direct from that xml as if it's a normal table - so here goes -the XML file looks something like this (just a sample of a few fields)

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <OfferteOperatori>
    <PURPOSE_CD>BID</PURPOSE_CD>
    <TYPE_CD>REG</TYPE_CD>
    <STATUS_CD>ACC</STATUS_CD>
    <MARKET_CD>MGP</MARKET_CD>
    <UNIT_REFERENCE_NO>UC_DP0012_CNOR</UNIT_REFERENCE_NO>
    <INTERVAL_NO>2</INTERVAL_NO>
    <BID_OFFER_DATE_DT>20150819</BID_OFFER_DATE_DT>
    <TRANSACTION_REFERENCE_NO>994891806986993</TRANSACTION_REFERENCE_NO>
    <QUANTITY_NO>2.731</QUANTITY_NO>
    <AWARDED_QUANTITY_NO>2.731</AWARDED_QUANTITY_NO>
    <ENERGY_PRICE_NO>0.00</ENERGY_PRICE_NO>
    <MERIT_ORDER_NO>532</MERIT_ORDER_NO>
    <PARTIAL_QTY_ACCEPTED_IN>N</PARTIAL_QTY_ACCEPTED_IN>
    <ADJ_QUANTITY_NO>2.731</ADJ_QUANTITY_NO>
    <GRID_SUPPLY_POINT_NO>PSR_CNOR</GRID_SUPPLY_POINT_NO>
    <ZONE_CD>CNOR</ZONE_CD>
    <AWARDED_PRICE_NO>41.65</AWARDED_PRICE_NO>
    <OPERATORE>Bilateralista</OPERATORE>
    <SUBMITTED_DT>20150818113211953</SUBMITTED_DT>
    <BILATERAL_IN>true</BILATERAL_IN>
  </OfferteOperatori>
etc
etc
etc

First we create a simple table to hold the xml that sqlldr will load along with a filename identifier:


CREATE TABLE MGPOffertePubbliche
(
   filename varchar2(120), 
   xmldata XMLTYPE

 XMLTYPE xmldata STORE AS CLOB;

Now we load that in using this controlfile

load data
infile *
replace
into table MGPOffertePubbliche
(
filename  char(100),
XMLDATA  lobfile(CONSTANT "/tmp/OffertePubbliche.xml") terminated by EOF
)
begindata
20150819MGPOffertePubbliche.xml


Here's the output from that

sqlldr user/xxxx control=OffertePubbliche.ctl log=OffertePubbliche.log

SQL*Loader: Release 11.2.0.4.0 - Production on Thu Oct 15 17:14:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 1

So that's loaded in (which is quite neat in itself i think)

Now comes the really clever bit - look at this SQL statement - the special function xmltable and the syntax goes with it allows us to directly pull out xml values!

select t.filename, x.* 
from MGPOFFERTEPUBBLICHE t,
XMLTABLE('/NewDataSet/OfferteOperatori' PASSING t.XMLDATA
COLUMNS 
PURPOSE_CD Varchar2(10) PATH 'PURPOSE_CD',
TYPE_CD Varchar2(10) PATH 'TYPE_CD',
STATUS_CD Varchar2(10) PATH 'STATUS_CD',
MARKET_CD Varchar2(10) PATH 'MARKET_CD',
UNIT_REFERENCE_NO Varchar2(15) PATH 'UNIT_REFERENCE',
BID_OFFER_DATE Varchar2(15) PATH 'BID_OFFER_DATE') x
where t.filename='OffertePubbliche.xml';

Which returns this

FILENAME                                 PURPOSE_CD                     TYPE_CD                        STATUS_CD                      MARKET_CD
---------------------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
UNIT_REFERENCE_NO                             BID_OFFER_DATE
--------------------------------------------- ---------------------------------------------
20150819MGPOffertePubbliche.xml          OFF                            STND                           REJ                            MGP


20150819MGPOffertePubbliche.xml          OFF                            STND                           REJ                            MGP


20150819MGPOffertePubbliche.xml          OFF                            STND                           REJ                            MGP

Seriously neat - that is almost no code - why do people mess about with middle tier code doing xml work when the database makes it so easy?

All it needs is a little knowledge and you can do complicated things very easily!


2 comments:

  1. More importantly 12c supports JSON which is even more practical for modern web applications.

    ReplyDelete
  2. Indeed Balazs - some good stuff has been added in 12, and I assume it will be further enhanced in 12.2.

    JSON is horrible to read though (for humans at least) - it's like XML without any kind of tags to make it understandable.....

    I'm sure people will learn to hate squiggly brackets very soon.......

    ReplyDelete