Showing posts with label Oracle XML DB. Show all posts
Showing posts with label Oracle XML DB. Show all posts

Tuesday, November 22, 2011

Storing and Retrieving XML data in Oracle 11g

One of the biggest change that Oracle has made in 11g is the introduction of the new “binary” XMLType. The binary XMLType is the third method of storing data in Oracle database; the first two being “structured” and “unstructured”. Oracle 11g’s XML preprocessors now includes a binary XML encoder, decoder and token manager. Binary XMLType is optimized for storage and retrieval of XML data by reducing memory and CPU demand.

Let’s jump in and see how this works. For the purpose of this post, I am using Oracle Express 11g on Windows 7 (x64).

Create a table to store XML contents.

SQL> CREATE TABLE XML_FILES
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT 
XMLTYPE
  5  )
  6  /

Table created.

Note, I did not specify binary type. Under the hood, Oracle created XMLType as binary.

11g:

CREATE TABLE HR.XML_FILES
(
  FILENAME     VARCHAR2(1000 BYTE),
  FILECONTENT  SYS.XMLTYPE
)
XMLTYPE FILECONTENT STORE AS BINARY XML (
  TABLESPACE USERS
  ENABLE       STORAGE IN ROW
  CHUNK       8192
   :

Let’s create same table in Oracle Express 10g:

10g: defaults to CLOB

SQL> CREATE TABLE XML_FILES
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT  XMLTYPE
  5  )
  6  /

Table created.

CREATE TABLE URAO.XML_FILES
(
  FILENAME     VARCHAR2(1000 BYTE),
  FILECONTENT  SYS.XMLTYPE
)
XMLTYPE FILECONTENT STORE AS CLOB (
  TABLESPACE USERS
  ENABLE       STORAGE IN ROW
  CHUNK       8192   

   :

Note, Binary is the default in Oracle 11g. However, you can choose to create an XMLType of CLOB type in 11g.

11g: override BINARY and create as CLOB

SQL> CREATE TABLE XML_FILES_CLOB
  2  (
  3    FILENAME     VARCHAR2(1000),
  4    FILECONTENT  XMLTYPE
  5  )
  6  XMLTYPE FILECONTENT STORE AS CLOB
  7  /

Table created.

‘XMLTYPE FILECONTENT STORE AS CLOB’ will store the XML as CLOB.

Now that we’ve created the table, let’s load the XML files into it. I will use SQLLoader to load some XML files.

SQL Loader Control File: xml.ctl

LOAD DATA
INFILE infile.txt
INTO TABLE xml_files APPEND
fields terminated by ','
(
filename CHAR(100),
filecontent LOBFILE(filename) terminated by eof
)

Contents of infile.txt:

ABANDA-2007040105174348PDT.xml
ABANDA-2007040205181412PDT.xml
ABANDA-2007041106302127PDT.xml  
ABANDA-2007041405283313PDT.xml
ABANDA-2007040105264865PDT.xml
ABANDA-2007040400350593PDT.xml
ABANDA-2007041402364776PDT.xml
ABANDA-2007041704324786PDT.xml

Create a bat file: load_xml.bat

sqlldr hr/hr@xe xml.ctl

The batch file above invokes the sqlldr.

image

Let’s check on what is loaded:

SELECT FILENAME, FILECONTENT FROM XML_FILES ;

image

So far so good. We’ve stored the XML in XMLType datatype in Oracle. You could repeat the above to load XML files into XML_FILES_CLOB table that stored XML as CLOB. I won’t repeat that here, but its straight forward, just change the references of table in the control file and you may want to create a separate batch file for it.

Now its time to query it (refer the XML under References for the structure of the XML document).

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

image

Use the xmltable operator to extract the values of the document and by passing the node ('PurchaseOrder' passing xf.filecontent) which contains the values. The columns of the xmltable (PurchaseOrder) can be created as:

columns DateCreated varchar2(100) path '@DateCreated',
Reference varchar2(100) path 'Reference',
Requestor varchar2(100) path 'Requestor',
"User" varchar2(100) path 'User',
CostCenter varchar2(100) path 'CostCenter',
SpecialInstructions varchar2(100) path 'SpecialInstructions'

DateCreated is an attribute of the node PurchaseOrder and hence use ‘@’ to access its values.

Let’s extract the Shipping Information from the Purchase Orders:

-- example 2 shipping instructions
SELECT filename,
       name,
       shipping_address,
       telephone
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns  name varchar2(100) path 'ShippingInstructions/name',
                           shipping_address varchar2(100) path 'ShippingInstructions/address',
                           telephone varchar2(100) path 'ShippingInstructions/telephone'
              ) ShippingInstructions;

image

and the LineItems:

-- example 3 LineItems
SELECT filename,
       ItemNumber,
       Part_Description,
       UnitPrice,
       Quantity,
       UnitPrice * Quantity TotalAmount
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder/LineItems/LineItem' passing xf.filecontent
                  columns  ItemNumber varchar2(100) path '@ItemNumber',
                           Part_Description varchar2(100) path 'Part/@Description',
                           UnitPrice number path 'Part/@UnitPrice',
                           Quantity number path 'Quantity'
              ) ShippingInstructions;

image

Note, I am calculating TotalAmount as  UnitPrice * Quantity in the query.

So far so good, we loaded XML data and are able to query the data from the XML documents.

Let’s examine the query execution plans and compare the plans between the native BINARY XMLType and CLOB XMLType:

query:

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

Query Execution Plan:
image 

Change the table name above to refer to xml_files_clob

image

Let’s run the two queries side by side (BINARY XMLType VS CLOB XMLType):

Binary XMLType: 0.016 sec

image

CLOB XMLType: 0.265 sec

image

Binary XMLType returns data in 0.016 s VS CLOB XMLType returning in 0.265 s which is ~16.5 times faster.

So far so good, now let’s index the two tables!

XMLIndex:

You can create indexes on your XML data, to focus on particular parts of it that you query often, and thus improve performance. XMLIndex provides a general, XML-specific index that indexes the internal structure of XML data. One of its main purposes is to overcome the indexing limitation presented by unstructured and hybrid storage of XML data, that is, CLOB storage. It does this by indexing the XML tags of your document and identifying document fragments based on XPath expressions that target them. It can also index scalar node values, to provide quick lookup based on individual values or ranges of values. It also records document hierarchy information for each node it indexes: relations parent–child, ancestor–descendant, and sibling.

CTXXPath Indexes: deprecated in 11g

Another type of index that is available for indexing XML data, CTXXPath, is deprecated, starting with Oracle Database 11g Release 1 (11.1). It has been superseded by XMLIndex, and it is made available only for use with older database releases. It cannot help in extracting an XML fragment, and it acts only as a preliminary filter for equality predicates; after such filtering, XPath expressions are evaluated functionally (that is, without the benefit of XPath rewrite)

Let’s create the index on BINARY XMLType:

SQL> -- CREATE XMLINDEX
SQL> CREATE INDEX IDX_XMLINDEX_PO ON XML_FILES(FILECONTENT) INDEXTYPE IS XDB.XML
INDEX
  2  /

Index created.

Note the query execution plan. There are no predicates in the query, but the index is already being range scanned.

image

Let’s run the query and see how long it takes with the new index:

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

image

SQLDeveloper is showing 0 seconds, but obviously it is rounding.

Let’s create a similar index on CLOB XMLType and do the same test.

SQL> CREATE INDEX IDX_XMLINDEX_PO_CLOB ON XML_FILES_CLOB(FILECONTENT) INDEXTYPE
IS XDB.XMLINDEX
  2  /

Index created.

-- example 1 purchase order
SELECT filename,
       DateCreated,
       Reference,
       Requestor,
       "User",
       CostCenter,
       SpecialInstructions
FROM
xml_files_clob xf,
     xmltable(xmlnamespaces('
http://www.w3.org/2001/XMLSchema-instance' as "xsi"),
                            'PurchaseOrder' passing xf.filecontent
                  columns DateCreated varchar2(100) path '@DateCreated',
                          Reference varchar2(100) path 'Reference',
                          Requestor varchar2(100) path 'Requestor',
                          "User" varchar2(100) path 'User',
                          CostCenter varchar2(100) path 'CostCenter',
                          SpecialInstructions varchar2(100) path 'SpecialInstructions'
              ) PurchaseOrder;

The query is using the index as well:

image

Let’s try and execute the query:

image

Note it has taken 0.015 seconds to run which is again VS 0 seconds taken by a query against BINARY XMLType.

Summary:

Binary XML Storage:
Binary XML is a new storage model for abstract data type XMLType, joining the existing storage models of structured (object-relational) and unstructured (CLOB) storage. Binary XML is XML-Schema aware, but it can also be used with XML data that is not based on an XML schema. My observation from the limited testing I’ve done is that queries returning data based on Binary XMLType seem more optimized than its predecessors.

XMLIndex:
A new index type is provided for XMLType: XMLIndex. This can greatly improve the performance of XPath-based predicates and fragment extraction for XMLType data, whether based on an XML schema or not. The new index type is a (logical) domain index that consists of underlying physical table(s) and secondary indexes. The CTXSYS.CTXXPath index is deprecated in Oracle Database 11g Release 1 (11.1). The functionality that was provided by CTXXPath is now provided by XMLIndex. Oracle recommends that you replace CTXXPath indexes with XMLIndex indexes. The intention is that CTXXPath will no longer be supported in a future release of the database.

References:

XML: Sample Purchase Order XML:

<PurchaseOrder DateCreated="2007-04-01T12:17:43.48000Z">
   <Reference>ABANDA-2007040105174348PDT</Reference>
   <Actions>
      <Action>
         <User>ABANDA</User>
      </Action>
   </Actions>
   <Rejection>
      <Date xsi:nil="true"/>
      <User xsi:nil="true"/>
   </Rejection>
   <Requestor>Amit Banda</Requestor>
   <User>ABANDA</User>
   <CostCenter>A80</CostCenter>
   <ShippingInstructions>
      <name>Amit Banda</name>
      <address>Magdalen Centre, The Oxford Science Park,
Oxford,
Oxford OX9 9ZB
United Kingdom</address>
      <telephone>377-420-5467</telephone>
   </ShippingInstructions>
   <SpecialInstructions>COD</SpecialInstructions>
   <LineItems>
      <LineItem ItemNumber="1">
         <Part Description="Lexx: Season 2 V -3" UnitPrice="19.95">54961473892</Part>
         <Quantity>8.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="2">
         <Part Description="Starship Troopers" UnitPrice="27.95">43396717190</Part>
         <Quantity>4.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="3">
         <Part Description="Herbert Von Karajan- His Legacy for Home Video: Beethoven Symphonies Nos. 2 &amp; 3- Eroica" UnitPrice="19.95">74644636598</Part>
         <Quantity>4.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="4">
         <Part Description="The Rock" UnitPrice="19.95">717951000040</Part>
         <Quantity>3.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="5">
         <Part Description="The Secret of NIMH II: Timmy to the Rescue" UnitPrice="19.95">27616859181</Part>
         <Quantity>7.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="6">
         <Part Description="The Clan of the Cave Bear" UnitPrice="19.95">85391375326</Part>
         <Quantity>8.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="7">
         <Part Description="Straight to Hell" UnitPrice="19.95">13131131499</Part>
         <Quantity>2.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="8">
         <Part Description="Hero" UnitPrice="19.95">43396515697</Part>
         <Quantity>7.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="9">
         <Part Description="Michael Crawford in Concert" UnitPrice="19.95">85393642020</Part>
         <Quantity>3.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="10">
         <Part Description="Arabian Nights &amp; Gulliver's Travels" UnitPrice="32.95">12236123040</Part>
         <Quantity>1.0</Quantity>
      </LineItem>
      <LineItem ItemNumber="11">
         <Part Description="The In Crowd" UnitPrice="19.95">85391862925</Part>
         <Quantity>1.0</Quantity>
      </LineItem>
   </LineItems>
</PurchaseOrder>

ShareThis

Meebo

Labels

2012 (1) Adobe Reader (1) Analysis Services (4) Analytic Functions (1) APEX Web Service (1) Apple (1) Aptana (1) Architecture (1) ASP.NET MVC3 (2) Azure (3) Azure Data Market (1) Azure Data Marketplace (4) Best Practices Analyzer (1) Best States in America (US) for Business–CNBC–Excel 2013 Web App Interactive (1) BI for .NET (1) Big Data (2) Bluefish (1) Boise (1) Butternut (1) C# (3) C# Introduction (1) Chow (1) Chrome (1) Chrome Remote Desktop (1) Chromium (1) Cliplets (1) Cloud 9 IDE (1) Cloudon MS Office iPad app (1) Code (1) CodeAcademy (1) CoffeeCup (1) Command (1) communicator (1) Community Edition (1) compiz (1) Contracts (1) Country Region Reference Data (1) Creating presentations using deck.js (1) Crescent (1) Data Explorer (2) Data Mining (2) Data Modeling (1) Data Quality (1) Data Quality Service (1) DataTables (1) Date (1) DAX (1) deck.js (1) Decrypt Files (1) Demo (1) Denali (6) Detect Categories (1) Dropbox (1) Eclipse IDE for Java EE Developers (1) Editor (1) Enter password for keyring 'default' to unlock on Ubuntu (1) Entity Framework 4.0 (1) error (1) ERwin (3) excel (1) Excel 2010 (2) Excel 2013 (1) Excel Services (1) Filezila (1) Format (1) Free (5) free ebooks (2) Free training (2) Fusion Tables (1) Getaround.com (1) Google (2) Google Cloud Connect for Microsoft Office (1) Google Currents (1) Google Docs (1) Google Graph Calculator (1) Hadoop (2) Hadoop for Microsoft Windows (1) Highlight (1) Hive (2) How to embed a Google Book Preview in Blogger? (1) How to embed word (1) HP Envy Spectre (1) HP Z1 Workstation (1) HTML (1) HTML-Kit (1) HTML5 (1) HTML5 animation using Mugeda tool (1) HTML5 Video (1) iBooks (1) Improvement (1) Inspiration (1) Install (1) Install Oracle SH schema (1) Installation (1) Installing (1) Installing Android Apps on Windows using bluestacks (1) Installing EntityFrameWork (1) Integration Services (1) iOS5 (1) iPad (2) iPhone (1) Issue (1) JavaScript (7) jQuery (5) Kinect (1) Knockout.js (1) Komodo Edit (1) Komodo IDE (1) Kompozer (1) Language Change (1) Learn SQL Server 2012 (1) Learn Windows Azure (1) Lego Building Blocks (1) Lifebrowser (1) LightSwitch (2) LINQ Oracle (1) Managing Project (1) mds (1) Menus using CSS3 (1) Microsoft (2) Microsoft BI (6) Microsoft Research (1) Microsoft's Future Vision on Productivity (1) Model first (1) mongodb (1) Motivation (2) Movie (1) MS Office (1) MSDN (1) NASA (1) nautilus-open-terminal (1) Netbeans (1) Netflix (2) Notepad++ (1) NuGet Package Manager (1) OBIEE (4) OData (2) ODP.NET (1) Office (2) Office 2013 (1) Office 2013 Quick Start Guides (1) Office Diagnostics (1) Office Tips and Tricks (1) Office Touch Guide (1) Office365 (2) Omnibox (1) Open ssh (1) ORA Errors (1) ORA-12514 (1) Oracle (1) Oracle 11g Express Edition Options Not Included (1) Oracle 11g Invisible Indexes (1) Oracle 11g New Features (1) Oracle 11g Read Only tables (1) Oracle APEX (2) Oracle Cloud (1) Oracle Data Pump (1) Oracle Express (1) Oracle Express 11g (2) oracle java 7 (1) Oracle Learning Library (1) Oracle OData (1) Oracle Open World 2011 (1) Oracle Pivot and Unpivot (1) Oracle Prebuilt VM Appliances (1) Oracle SQL Developer (1) Oracle Virtual Columns (1) Oracle XML DB (1) Outlook 2010 (3) Performance (1) PerformancePoint Services (2) Personal (1) Phonegap (1) Picture (1) PluralSight (1) Power View (2) PowerPivot (14) powerpoint in a site or a blog (1) PowerShell (1) Prettyprint (1) prezi (1) Project Juneau (1) Python (1) Rainbow (1) Recipe (1) Recorder (1) Rent (1) Reporting Services (2) Resources (1) RIA Services (1) Ruby (1) Samsung Galaxy S III (ATT) with Android 4.1 Jelly Bean (2) Screen Capture (1) Search (2) Security (1) Self Service BI (4) Server Error (1) Seth Godin (1) Setup (1) SharePoint (1) SharePoint 2010 (9) SharePoint 2010 Training (1) SharePoint BI Performance Point (3) Shell Scripting (1) Shutter (1) Silverlight (1) Skydrive (1) SOA (2) Social Network (1) Soup (1) SP2-0750 (1) Spotify (1) SQL Azure (1) SQL Server 2008 R2 (1) SQL Server 2008 R2 Reporting Services (1) SQL Server 2012 (5) sql server denali data quality services (1) SQL Server Virtual Labs (2) SQL*Plus error (1) Squash (1) SSAS Tabular (1) Storing and Retrieving XML data in Oracle 11g (1) Tabular (1) TechEd 2012 (1) TED (1) Terminator (1) Texmaker (1) Thunderstorm (1) TimeTo365.com (1) Tip (1) Tips (1) Tips and Tricks (1) Training (1) Ubuntu (1) Ubuntu 11.04 (1) Ubuntu Oracle (1) unrar (1) Vertica (2) Vertica VM appliance (1) video.js (1) VirtualBox (1) Visio 2010 crashes (1) Visio Services (2) Visual Studio 11 Developer Preview Training Kit (1) Visual Studio 2010 (1) Visual Studio 2010 and .NET Framework 4 Training Kit (1) Visual Studio Express 2012 (1) VLC (1) VM (1) WCF (1) Web Stack (1) WebMatrix (1) Webucator (1) What is new in SQL Server 2012 Analysis Services (1) Whole Foods Boise ID (1) Why won’t Microsoft Lync 2010 start (1) Windows 2008R2 SP1 (1) Windows 7 (1) Windows 8 Charms (1) Windows 8 Consumer Preview Demo (1) Windows 8 Contracts (1) Windows 8 Metro Apps (3) Windows 8 Touch Language (1) Windows 8 Video Driver issue/problem (1) Windows App Store (1) windows azure (3) Windows Phone 7 (1) Windows Phone Development (1) Wine (1) Word 2010 (1) Word 2013 (1) Word PDF Reflow (1) Work (1) zip (1) ZoomIt v4.2 (1)

Disclaimer

This blog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. My blog comes with no guarantees, and the content might contain errors. Expect to find a repeat of information that you can find in other blogs and sites. This is mainly for my future reference, It is my way of documenting things. I give due credit to contents, images, information sourced from product demos and other external sources wherever possible.