XML¶
There is quite a bit of software written for the financial industry where the data moved around is stored in XML format. To open up the powerful SQL engine to those environments, XML support has been added to Lenses. Instructing the engine to use the XML decoder is as simple as writing the following code:
SELECT *
FROM swap_trades
Supporting XML has been added through the stream processing side of Lenses SQL. Here is an example of joining two topics containing XML data:
-- LEFT JOIN two topics with XML data
INSERT INTO toTopic
SELECT STREAM
od.orderNumber
, od.productCode
, od.quantityOrdered
, od.priceEach
, od.orderLineNumber
, concat(od.productCode,'-',p.productName) as productName
FROM OrdersDetailsTopic AS od
LEFT JOIN `ProductTopic` AS p
ON p.productCode = od.productCode
GROUP BY TUMBLE(4,s)
XML rules¶
With XML, there is no type inferred for a field. Therefore all fields are STRING; using the CAST function, the field can be marshaled to a different data type i.e. INT/LONG/DOUBLE/DECIMAL. The next SQL code computes the total value for a given order entry:
SELECT CAST(priceEach as decimal) * CAST (quantityOrdered as INT) as total
FROM `order_item`
XML contains XML elements. An XML element is everything from (including) the element’s start tag to (including) the element’s end tag.
<price>29.99</price>
Let us see how the elements can be addressed via SQL. Imagine a topic containing book records like the one below:
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
To select all fields, the following code is required:
SELECT book.title,
book.author,
cast(book.year as int) as year,
cast(book.price as decimal) as price
FROM `books`
But XML contains XML attributes. Attributes are designed to contain data related to a specific element.
<note date="2008-01-10">some text</note>
Let us add a category attribute to the book example that was presented earlier:
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
For retrieving the value of the category attribute, the following code should be used:
SELECT book.category,
book.title,
book.author,
cast(book.year as int) as year,
cast(book.price as decimal) as price
FROM books
XML can contain attributes and elements with the same name.
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
<category>novel</category>
</book>
In this scenario, to address the category field, an array index needs to be used.
SELECT book.category[0] as firstCategory,
book.category[1] as secondCategory,
book.title,
book.title,
book.author,
cast(book.year as int) as year,
cast(book.price as decimal) as price
FROM books
Having nested elements with the same name is quite common. Consider the following XML code to represent a list of books:
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Using the fields from the second book entry is as simple as doing this:
SELECT bookstore.book[0].category as category,
cast(bookstore.book[1].price as decimal) as price
FROM books
Earlier an example was given for a note entry. It contains a date attribute and the XML element retains the text associated with the node. Here is the code for addressing the attribute as well as the content of the XML element:
SELECT note.date,
node._content
FROM notes
SQL streaming with XML¶
XML is just another payload type supported by the Lenses engine. Therefore all the rules discussed earlier on AVRO, JSON apply to XML as well. One thing to keep in mind about XML is that each field is a string. Always use the CAST function to convert the value of an XML field to the desired data type.
XML-SQL examples¶
<entry va="2"><va>30</va></entry>
SELECT entry.va[0] as firstVa, --returns 2
entry.va[1] as secondVa --returns 30
FROM topic