Showing posts with label Oracle XML DB. Show all posts
Showing posts with label Oracle XML DB. Show all posts

Tuesday, November 22, 2011

Storing and Retrieving XML data in Oracle 11g

One of the biggest change that Oracle has made in 11g is the introduction of the new “binary” XMLType. The binary XMLType is the third method of storing data in Oracle database; the first two being “structured” and “unstructured”. Oracle 11g’s XML preprocessors now includes a binary XML encoder, decoder and token manager. Binary XMLType is optimized for storage and retrieval of XML data by reducing memory and CPU demand.

Let’s jump in and see how this works. For the purpose of this post, I am using Oracle Express 11g on Windows 7 (x64).

Create a table to store XML contents.

SQL> CREATE TABLE XML_FILES
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT 
XMLTYPE
  5  )
  6  /

Table created.

Note, I did not specify binary type. Under the hood, Oracle created XMLType as binary.

11g:

CREATE TABLE HR.XML_FILES
(
  FILENAME     VARCHAR2(1000 BYTE),
  FILECONTENT  SYS.XMLTYPE
)
XMLTYPE FILECONTENT STORE AS BINARY XML (
  TABLESPACE USERS
  ENABLE       STORAGE IN ROW
  CHUNK       8192
   :

Let’s create same table in Oracle Express 10g:

10g: defaults to CLOB

SQL> CREATE TABLE XML_FILES
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT  XMLTYPE
  5  )
  6  /

Table created.

CREATE TABLE URAO.XML_FILES
(
  FILENAME     VARCHAR2(1000 BYTE),
  FILECONTENT  SYS.XMLTYPE
)
XMLTYPE FILECONTENT STORE AS CLOB (
  TABLESPACE USERS
  ENABLE       STORAGE IN ROW
  CHUNK       8192   

   :

Note, Binary is the default in Oracle 11g. However, you can choose to create an XMLType of CLOB type in 11g.

11g: override BINARY and create as CLOB

SQL> CREATE TABLE XML_FILES_CLOB
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT  XMLTYPE
  5  )
  6  XMLTYPE FILECONTENT STORE AS CLOB
  7  /

Table created.

‘XMLTYPE FILECONTENT STORE AS CLOB’ will store the XML as CLOB.

Now that we’ve created the table, let’s load the XML files into it. I will use SQLLoader to load some XML files.

SQL Loader Control File: xml.ctl

LOAD DATA
INFILE infile.txt
INTO TABLE xml_files APPEND
fields terminated by ','
(
filename CHAR(100),
filecontent LOBFILE(filename) terminated by eof
)

Contents of infile.txt:

ABANDA-2007040105174348PDT.xml
ABANDA-2007040205181412PDT.xml
ABANDA-2007041106302127PDT.xml  
ABANDA-2007041405283313PDT.xml
ABANDA-2007040105264865PDT.xml
ABANDA-2007040400350593PDT.xml
ABANDA-2007041402364776PDT.xml
ABANDA-2007041704324786PDT.xml

Create a bat file: load_xml.bat

sqlldr hr/hr@xe xml.ctl

The batch file above invokes the sqlldr.

image

Let’s check on what is loaded:

SELECT FILENAME, FILECONTENT FROM XML_FILES ;

image

So far so good. We’ve stored the XML in XMLType datatype in Oracle. You could repeat the above to load XML files into XML_FILES_CLOB table that stored XML as CLOB. I won’t repeat that here, but its straight forward, just change the references of table in the control file and you may want to create a separate batch file for it.

Now its time to query it (refer the XML under References for the structure of the XML document).

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

image

Use the xmltable operator to extract the values of the document and by passing the node ('PurchaseOrder' passing xf.filecontent) which contains the values. The columns of the xmltable (PurchaseOrder) can be created as:

columns DateCreated varchar2(100) path '@DateCreated',
Reference varchar2(100) path 'Reference',
Requestor varchar2(100) path 'Requestor',
"User" varchar2(100) path 'User',
CostCenter varchar2(100) path 'CostCenter',
SpecialInstructions varchar2(100) path 'SpecialInstructions'

DateCreated is an attribute of the node PurchaseOrder and hence use ‘@’ to access its values.

Let’s extract the Shipping Information from the Purchase Orders:

-- example 2 shipping instructions
SELECT filename,
       name,
       shipping_address,
       telephone
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns  name varchar2(100) path 'ShippingInstructions/name',
                           shipping_address varchar2(100) path 'ShippingInstructions/address',
                           telephone varchar2(100) path 'ShippingInstructions/telephone'
              ) ShippingInstructions;

image

and the LineItems:

-- example 3 LineItems
SELECT filename,
       ItemNumber,
       Part_Description,
       UnitPrice,
       Quantity,
       UnitPrice * Quantity TotalAmount
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder/LineItems/LineItem' passing xf.filecontent
                  columns  ItemNumber varchar2(100) path '@ItemNumber',
                           Part_Description varchar2(100) path 'Part/@Description',
                           UnitPrice number path 'Part/@UnitPrice',
                           Quantity number path 'Quantity'
              ) ShippingInstructions;

image

Note, I am calculating TotalAmount as  UnitPrice * Quantity in the query.

So far so good, we loaded XML data and are able to query the data from the XML documents.

Let’s examine the query execution plans and compare the plans between the native BINARY XMLType and CLOB XMLType:

query:

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

Query Execution Plan:
image 

Change the table name above to refer to xml_files_clob

image

Let’s run the two queries side by side (BINARY XMLType VS CLOB XMLType):

Binary XMLType: 0.016 sec

image

CLOB XMLType: 0.265 sec

image

Binary XMLType returns data in 0.016 s VS CLOB XMLType returning in 0.265 s which is ~16.5 times faster.

So far so good, now let’s index the two tables!

XMLIndex:

You can create indexes on your XML data, to focus on particular parts of it that you query often, and thus improve performance. XMLIndex provides a general, XML-specific index that indexes the internal structure of XML data. One of its main purposes is to overcome the indexing limitation presented by unstructured and hybrid storage of XML data, that is, CLOB storage. It does this by indexing the XML tags of your document and identifying document fragments based on XPath expressions that target them. It can also index scalar node values, to provide quick lookup based on individual values or ranges of values. It also records document hierarchy information for each node it indexes: relations parent–child, ancestor–descendant, and sibling.

CTXXPath Indexes: deprecated in 11g

Another type of index that is available for indexing XML data, CTXXPath, is deprecated, starting with Oracle Database 11g Release 1 (11.1). It has been superseded by XMLIndex, and it is made available only for use with older database releases. It cannot help in extracting an XML fragment, and it acts only as a preliminary filter for equality predicates; after such filtering, XPath expressions are evaluated functionally (that is, without the benefit of XPath rewrite)

Let’s create the index on BINARY XMLType:

SQL> -- CREATE XMLINDEX
SQL> CREATE INDEX IDX_XMLINDEX_PO ON XML_FILES(FILECONTENT) INDEXTYPE IS XDB.XML
INDEX
  2  /

Index created.

Note the query execution plan. There are no predicates in the query, but the index is already being range scanned.

image

Let’s run the query and see how long it takes with the new index:

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

image

SQLDeveloper is showing 0 seconds, but obviously it is rounding.

Let’s create a similar index on CLOB XMLType and do the same test.

SQL> CREATE INDEX IDX_XMLINDEX_PO_CLOB ON XML_FILES_CLOB(FILECONTENT) INDEXTYPE
IS XDB.XMLINDEX
  2  /

Index created.

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files_clob xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

The query is using the index as well:

image

Let’s try and execute the query:

image

Note it has taken 0.015 seconds to run which is again VS 0 seconds taken by a query against BINARY XMLType.

Summary:

Binary XML Storage:
Binary XML is a new storage model for abstract data type XMLType, joining the existing storage models of structured (object-relational) and unstructured (CLOB) storage. Binary XML is XML-Schema aware, but it can also be used with XML data that is not based on an XML schema. My observation from the limited testing I’ve done is that queries returning data based on Binary XMLType seem more optimized than its predecessors.

XMLIndex:
A new index type is provided for XMLType: XMLIndex. This can greatly improve the performance of XPath-based predicates and fragment extraction for XMLType data, whether based on an XML schema or not. The new index type is a (logical) domain index that consists of underlying physical table(s) and secondary indexes. The CTXSYS.CTXXPath index is deprecated in Oracle Database 11g Release 1 (11.1). The functionality that was provided by CTXXPath is now provided by XMLIndex. Oracle recommends that you replace CTXXPath indexes with XMLIndex indexes. The intention is that CTXXPath will no longer be supported in a future release of the database.

References:

XML: Sample Purchase Order XML:

<PurchaseOrder DateCreated="2007-04-01T12:17:43.48000Z">
   <Reference>ABANDA-2007040105174348PDT</Reference>
   <Actions>
      <Action>
         <User>ABANDA</User>
      </Action>
   </Actions>
   <Rejection>
      <Date xsi:nil="true"/>
      <User xsi:nil="true"/>
   </Rejection>
   <Requestor>Amit Banda</Requestor>
   <User>ABANDA</User>
   <CostCenter>A80</CostCenter>
   <ShippingInstructions>
      <name>Amit Banda</name>
      <address>Magdalen Centre, The Oxford Science Park,
Oxford,
Oxford OX9 9ZB
United Kingdom</address>
      <telephone>377-420-5467</telephone>
   </ShippingInstructions>
   <SpecialInstructions>COD</SpecialInstructions>
   <LineItems>
      <LineItem ItemNumber="1">
         <Part Description="Lexx: Season 2 V -3" UnitPrice="19.95">54961473892</Part>
         <Quantity>8.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="2">
         <Part Description="Starship Troopers" UnitPrice="27.95">43396717190</Part>
         <Quantity>4.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="3">
         <Part Description="Herbert Von Karajan- His Legacy for Home Video: Beethoven Symphonies Nos. 2 &amp; 3- Eroica" UnitPrice="19.95">74644636598</Part>
         <Quantity>4.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="4">
         <Part Description="The Rock" UnitPrice="19.95">717951000040</Part>
         <Quantity>3.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="5">
         <Part Description="The Secret of NIMH II: Timmy to the Rescue" UnitPrice="19.95">27616859181</Part>
         <Quantity>7.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="6">
         <Part Description="The Clan of the Cave Bear" UnitPrice="19.95">85391375326</Part>
         <Quantity>8.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="7">
         <Part Description="Straight to Hell" UnitPrice="19.95">13131131499</Part>
         <Quantity>2.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="8">
         <Part Description="Hero" UnitPrice="19.95">43396515697</Part>
         <Quantity>7.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="9">
         <Part Description="Michael Crawford in Concert" UnitPrice="19.95">85393642020</Part>
         <Quantity>3.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="10">
         <Part Description="Arabian Nights &amp; Gulliver's Travels" UnitPrice="32.95">12236123040</Part>
         <Quantity>1.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="11">
         <Part Description="The In Crowd" UnitPrice="19.95">85391862925</Part>
         <Quantity>1.0</Quantity>
      </LineItem>
   </LineItems>
</PurchaseOrder>