If you've ever looked into the XML features available with oracle it's a little mindblowing, there is so much stuff available for working with XML it's very difficult to know where to start sometimes. This week one of my colleagues (Susan) has been doing a small proof of concept for turning some large XML files (60MB each) into 'rows' to be used for data analysis - this has turned out to be amazingly easy to do - but finding out how easy it was just took a while.
I think it's worth sharing here as i think i knew this functionality was there i'd just forgotten - but it's a very useful technique.
We're going to load the xml file from disk using sqlldr and then select direct from that xml as if it's a normal table - so here goes -the XML file looks something like this (just a sample of a few fields)
<?xml version="1.0" standalone="yes"?>
<NewDataSet>
<OfferteOperatori>
<PURPOSE_CD>BID</PURPOSE_CD>
<TYPE_CD>REG</TYPE_CD>
<STATUS_CD>ACC</STATUS_CD>
<MARKET_CD>MGP</MARKET_CD>
<UNIT_REFERENCE_NO>UC_DP0012_CNOR</UNIT_REFERENCE_NO>
<INTERVAL_NO>2</INTERVAL_NO>
<BID_OFFER_DATE_DT>20150819</BID_OFFER_DATE_DT>
<TRANSACTION_REFERENCE_NO>994891806986993</TRANSACTION_REFERENCE_NO>
<QUANTITY_NO>2.731</QUANTITY_NO>
<AWARDED_QUANTITY_NO>2.731</AWARDED_QUANTITY_NO>
<ENERGY_PRICE_NO>0.00</ENERGY_PRICE_NO>
<MERIT_ORDER_NO>532</MERIT_ORDER_NO>
<PARTIAL_QTY_ACCEPTED_IN>N</PARTIAL_QTY_ACCEPTED_IN>
<ADJ_QUANTITY_NO>2.731</ADJ_QUANTITY_NO>
<GRID_SUPPLY_POINT_NO>PSR_CNOR</GRID_SUPPLY_POINT_NO>
<ZONE_CD>CNOR</ZONE_CD>
<AWARDED_PRICE_NO>41.65</AWARDED_PRICE_NO>
<OPERATORE>Bilateralista</OPERATORE>
<SUBMITTED_DT>20150818113211953</SUBMITTED_DT>
<BILATERAL_IN>true</BILATERAL_IN>
</OfferteOperatori>
etc
etc
etc
First we create a simple table to hold the xml that sqlldr will load along with a filename identifier:
CREATE TABLE MGPOffertePubbliche
(
filename varchar2(120),
xmldata XMLTYPE
)
XMLTYPE xmldata STORE AS CLOB;
Now we load that in using this controlfile
load data
infile *
replace
into table MGPOffertePubbliche
(
filename char(100),
XMLDATA lobfile(CONSTANT "/tmp/OffertePubbliche.xml") terminated by EOF
)
begindata
20150819MGPOffertePubbliche.xml
Here's the output from that
sqlldr user/xxxx control=OffertePubbliche.ctl log=OffertePubbliche.log
SQL*Loader: Release 11.2.0.4.0 - Production on Thu Oct 15 17:14:34 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Commit point reached - logical record count 1
So that's loaded in (which is quite neat in itself i think)
Now comes the really clever bit - look at this SQL statement - the special function xmltable and the syntax goes with it allows us to directly pull out xml values!
select t.filename, x.*
from MGPOFFERTEPUBBLICHE t,
XMLTABLE('/NewDataSet/OfferteOperatori' PASSING t.XMLDATA
COLUMNS
PURPOSE_CD Varchar2(10) PATH 'PURPOSE_CD',
TYPE_CD Varchar2(10) PATH 'TYPE_CD',
STATUS_CD Varchar2(10) PATH 'STATUS_CD',
MARKET_CD Varchar2(10) PATH 'MARKET_CD',
UNIT_REFERENCE_NO Varchar2(15) PATH 'UNIT_REFERENCE',
BID_OFFER_DATE Varchar2(15) PATH 'BID_OFFER_DATE') x
where t.filename='OffertePubbliche.xml';
Which returns this
FILENAME PURPOSE_CD TYPE_CD STATUS_CD MARKET_CD
---------------------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
UNIT_REFERENCE_NO BID_OFFER_DATE
--------------------------------------------- ---------------------------------------------
20150819MGPOffertePubbliche.xml OFF STND REJ MGP
20150819MGPOffertePubbliche.xml OFF STND REJ MGP
20150819MGPOffertePubbliche.xml OFF STND REJ MGP
Seriously neat - that is almost no code - why do people mess about with middle tier code doing xml work when the database makes it so easy?
All it needs is a little knowledge and you can do complicated things very easily!
More importantly 12c supports JSON which is even more practical for modern web applications.
ReplyDeleteIndeed Balazs - some good stuff has been added in 12, and I assume it will be further enhanced in 12.2.
ReplyDeleteJSON is horrible to read though (for humans at least) - it's like XML without any kind of tags to make it understandable.....
I'm sure people will learn to hate squiggly brackets very soon.......