Quick start

Welcome to this Lenses SQL engine Snapshot mode tutorial!

This document takes the reader through typical use cases for querying existing data. Each exercise first builds the topics and insert the data before proceeding to the actual SELECT statement.

First query

Let’s create an Apache Kafka topic which stores String values for both Key and Value and then inserts one entry. The CREATE TABLE is used in the SQL engine to create a Kafka topic.

CREATE TABLE greetings(
    _key STRING
    , _value STRING
) 
FORMAT (string, STRING);

INSERT INTO greetings(_key, _value) VALUES ("Hello", "World");

Go ahead and run your first SELECT query now:

SELECT * 
FROM greetings

To remove the topic we just created use the DROP statement below:

DROP TABLE greetings

Record metadata

Each Kafka message contains information related to partition, offset, timestamp, and topic. Additionally the engine adds the key and value raw byte size.

Create a topic and insert a few entries.

CREATE TABLE tutorial(
    _key STRING
  , name STRING
  , difficulty INT
) 
FORMAT (Avro, Avro);

INSERT INTO tutorial(_key, name, difficulty)
VALUES
("1", "Learn Lenses SQL", 3),
("2", "Learn Quantum Physics", 10),
("3", "Learn French", 5);

Now we can query for specific metadata related to the records.

To query for metadata such as the underlying Kafka topic offset, partition and timestamp prefix your desired fields with _meta.

Run the following query to see each tutorial name along with its metadata information:

SELECT name
    , _meta.offset
    , _meta.timestamp
    , _meta.partition
    , _meta.__keysize
    , _meta.__valsize
FROM tutorial

/* The output is (timestamp will be different)
Learn Lenses SQL         0    1540575169198    0    7   23
Learn Quantum Physics    1    1540575169226    0    7   28
Learn French             2    1540575169247    0    7   19
*/

Performing simple arithmetic

SQL expressions allow you to perform basic arithmetic.

The common arithmetic operations +, -, *, / and % are supported.

Create a grocery table and insert data into it:

CREATE TABLE groceries (
    _key INT
    , id INT
    , name STRING
    , quantity INT
    , price DOUBLE) 
FORMAT(INT, Avro);

INSERT INTO groceries(
    _key
    , id
    , name
    , quantity 
    , price)
VALUES
(1, 1, "Fairtrade Bananas", 1, 1.90),
(2, 2, "Meridian Crunchy Peanut Butter", 1, 2.50),
(3, 3, "Green & Black's organic 85% dark chocolate bar", 2, 1.99),
(4, 4, "Activia fat free cherry yogurts", 3, 2.00),
(5, 5, "Green & Blacks Organic Chocolate Ice Cream", 2, 4.20);

Now calculate the price of each purchase:

SELECT 
    name
    , quantity * price AS total 
FROM groceries

    /*The output:
    Fairtrade Bananas                               1.9
    Meridian Crunchy Peanut Butter                  2.5
    Green & Black's organic 85% dark chocolate bar  3.98
    Activia fat free cherry yogurts                 6
    Green & Blacks Organic Chocolate Ice Cream      8.4
    */

Using functions

Functions can be used directly. See the list of SQL functions .

For example, the ROUND function allows you to round numeric functions:


SELECT
    name 
    , ROUND(quantity * price) AS rounded_total
FROM groceries

/*The output:
Fairtrade Bananas                                   2
Meridian Crunchy Peanut Butter                      3
Green & Black's organic 85% dark chocolate bar      4
Activia fat free cherry yogurts                     6
Green & Blacks Organic Chocolate Ice Cream          8
*/

Manipulating text columns

String functions allow you to perform different operations on text. For example, the CONCAT function allows combining multiple values. Let’s first create a customer table:


CREATE TABLE customers (
    _key STRING
    , first_name STRING
    , last_name STRING
) 
FORMAT(string, Avro);

INSERT INTO customers(_key, first_name, last_name)
VALUES
("mikejones", "Mike", "Jones"),
("anasmith", "Ana", "Smith"),
("shannonelliott", "Shannon","Elliott");

Next query combines first and last names into full names:


SELECT 
    CONCAT(first_name, " ", last_name) AS name
FROM customers

/* Output
Mike        Jones
Ana         Smith
Shannon     Elliott
*/

Filtering with WHERE

WHERE clause allows you to define a set of logical predicates the data needs to match in order to be returned. Standard comparison operators are supported (>, >=, <, <=, =, and !=) as well as calling functions.

We are going to use the groceries table created earlier. Select all items purchased where the prices are greater or equal to 2.00:

SELECT
     name
     , price
FROM groceries
WHERE price >= 2.0

/* Output
Meridian Crunchy Peanut Butter              2.5
Activia fat free cherry yogurts             2
Green & Blacks Organic Chocolate Ice Cream  4.2
*/

Select all customers whose last name length equals to 5:

SELECT *
FROM customers
WHERE LEN(last_name) = 5

/* Output
key         value.first_name    value.last_name
mikejones       Mike                Jones
anasmith        Ana                 Smith
*/

Search all customers containing Ana in their first name:

SELECT *
FROM customers
WHERE first_name LIKE '%Ana%'

Keep in mind that text search is case sensitive. To use case insensitive text search, you can write:

SELECT *
FROM customers
WHERE LOWERCASE(first_name) LIKE '%ana%';

-- And here is the negated version
SELECT *
FROM customers
WHERE LOWERCASE(first_name) NOT LIKE '%ana%';

Missing values

Sometimes data can contain explicit NULL values, or it can omit fields entirely. Using IS [ NOT ] NULL, or EXISTS functions allows you to check for these situations.

Exists is a keyword in Lenses SQL grammar so it needs to be escaped, the escape character is `````.

Lenses supports JSON. JSON does not enforce a schema allowing you to insert null values.

Create the following table named customers_json:

CREATE TABLE customers_json (
    _key STRING
    , first_name STRING
    , last_name STRING
    , middle_name STRING
) FORMAT(string, json);


INSERT INTO customers_json(_key, first_name, last_name, middle_name) VALUES("mikejones", "Mike", "Jones", "Albert");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("anasmith", "Ana", "Smith");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("shannonelliott", "Shannon","Elliott");

Query this table for all its entries:

SELECT * 
FROM customers_json

/* The output
key             value.first_name   value.middle_name   value.last_name
mikejones           Mike                Albert          Jones
anasmith            Ana                                 Smith
shannonelliott      Shannon                             Elliott
*/

The middle_name is only present on the mikejones record.

Write a query which filters out records where middle_name is not present:


SELECT *
FROM customers_json
WHERE `EXISTS`(middle_name)

/* The output
 key             value.first_name   value.middle_name   value.last_name
mikejones            Mike            Albert                Jones
*/

This can also be written as:

SELECT *
FROM customers_json
WHERE middle_name IS NOT NULL

When a field is actually ``NULL`` or is missing, checking like in the below query has the same outcome.

SELECT *
FROM customers_json
WHERE middle_name IS NULL

Multiple WHERE conditions

You can use AND/OR to specify complex conditions for filtering your data.

To filter the purchased items where more than one item has been bought for a given product, and the unit price is greater than 2:

SELECT *
FROM groceries
WHERE quantity > 1 
    AND price > 2

Now try changing the AND logical operand to OR and see the differences in output.

Limit the output

To limit the output of the query you can use two approaches:

use the LIMIT clause
set the max size of the data to be returned

-- limit to 1 record
SELECT *
FROM groceries
LIMIT 1

-- set the data size returned to be 1 megabyte.
SET max.size="1m";

-- on the small dataset we have here 1 MB will accommodate all records added and more
SELECT *
FROM groceries

Sampling data

To sample data and discard the first rows:

SELECT *
FROM groceries
LIMIT 1,2

This statement instructs Lenses to skip the first record matched and then sample the next two.

Read a table partition only

To select data from a specific partition access the meta data of the topic.

In the following example, a table is created with three partitions and the message key is hashed and then the remainder from HashValue % partitions will be the table partition the record is sent to.

-- Run
CREATE TABLE customers_partitioned (
    _key STRING
    , first_name STRING
    , last_name STRING
) 
FORMAT(string, Avro)
properties(partitions = 3);

INSERT INTO customers_partitioned(
    _key
    , first_name
    , last_name)
VALUES
("mikejones", "Mike", "Jones"),
("anasmith", "Ana", "Smith"),
("shannonelliott", "Shannon","Elliott"),
("tomwood", "Tom","Wood"),
("adelewatson", "Adele","Watson"),
("mariasanchez", "Maria", "Sanchez");

Next, run the following query:

SELECT *
FROM customers_partitioned

/* The output
offset  partition   timestamp       key         value.first_name    value.last_name
0       0           1540830780401   mikejones       Mike                Jones
1       0           1540830780441   anasmith        Ana                 Smith
2       0           1540830780464   shannonelliott  Shannon             Elliott
0       2           1540831270170   mariasanchez    Maria               Sanchez
0       1           1540830984698   tomwood         Tom                 Wood
1       1           1540831183308   adelewatson     Adele               Watson
*/

As you can see from the results (your timestamps will be different) the records span over the three partitions. Now query specific partitions:

-- selects only records from partition = 0
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0;

-- selects only records from partition  0 and 2
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0
   OR _meta.partition = 2;

Count the records

Using the COUNT aggregate function you can count the records in a table. Run the following SQL to see how many records we have on the customers_partitioned:

SELECT 
    COUNT(*) AS total
FROM customers_partitioned

Use SUM to aggregate your amounts

Using the SUM function you can sum records in a table.

SELECT 
    SUM(quantity * price) AS amount
FROM groceries

Group data with GROUP BY

To group data use the GROUP BY clause:

CREATE TABLE customer (
    firstName STRING
    , lastName STRING
    , city STRING
    , country STRING
    , phone STRING) 
FORMAT(string, avro) 
properties(compacted=true);

INSERT INTO customer (
    _key
    ,firstName
    , lastName
    , city
    , country
    , phone)
VALUES
('1','Craig', 'Smith', 'New York', 'USA', '1-01-993 2800'),
('2','William', 'Maugham','Toronto','Canada','+1-613-555-0110'),
('3','William', 'Anthony','Leeds','UK', '+44 1632 960427'),
('4','S.B.','Swaminathan','Bangalore','India','+91 7513200000'),
('5','Thomas','Morgan','Arnsberg','Germany','+49-89-636-48018'),
('6','Thomas', 'Merton','New York','USA', '+1-617-555-0147'),
('7','Piers','Gibson','London','UK', '+44 1632 960269'),
('8','Nikolai','Dewey','Atlanta','USA','+1-404-555-0178'),
('9','Marquis', 'de Ellis','Rio De Janeiro','Brazil','+55 21 5555 5555'),
('10','Joseph', 'Milton','Houston','USA','+1-202-555-0153'),
('11','John','Betjeman Hunter','Sydney','Australia','+61 1900 654 321'),
('12','Evan', 'Hayek','Vancouver','Canada','+1-613-555-0130'),
('13','E.','Howard','Adelaide','Australia','+61 491 570 157'),
('14','C. J.', 'Wilde','London','UK','+44 1632 960111'),
('15','Butler', 'Andre','Florida','USA','+1-202-555-0107');

Let’s see how many customers there are from each country. Here is the code which computes that:

SELECT 
    COUNT(*) AS count
    , country
FROM customer
GROUP BY country

/* Output
2    Canada
3    UK
2    Australia
1    India
1    Brazil
5    USA
1    Germany
*/

Nested queries

The tabled query allows you to nest queries. Let us take the query in the previous section and say we are only interested in those entries where there exist more than 1 customer per country.

SELECT *
FROM (
    SELECT 
        COUNT(*) AS count
        , country
    FROM customer
    GROUP BY country
    )
WHERE count > 1

Run the query, and you will only see those entries for which there is more than one person registered per country.

Filtering grouped data with HAVING

To filter the entries returned from a grouping query. As with the WHERE statement, you can use HAVING syntax to achieve the same result when it comes to grouped queries.

SELECT
    COUNT(*) AS count
    , country
FROM customer
GROUP BY country
HAVING count > 1

Joins

Lenses allows you to combine records from two tables. A query can contain zero, one or multiple JOIN operations.

Create an orders table and insert some data into it:

CREATE TABLE orders(
    _key INT
    , orderDate STRING
    , customerId STRING
    , amount DOUBLE
) 
FORMAT(int, avro);

INSERT INTO orders (
    _key
    , orderDate
    , customerId
    , amount
)
VALUES
(1, '2018-10-01', '1', 200.50),
(2, '2018-10-11', '1', 813.00),
(3, '2018-10-11', '3', 625.20),
(4, '2018-10-11', '14', 730.00),
(5, '2018-10-11', '10', 440.00),
(6, '2018-10-11', '9', 444.80);

With these tables in place, join them to get more information about an order by combining it with the customer information found in the customer table:

SELECT 
    o._key AS orderNumber
    , o.amount AS totalAmount
    , c.firstName
    , c.lastName
    , c.city
    , c.country
FROM orders o INNER JOIN customer c
    ON o.customerId = c._key

/*
city        orderNumber     country     totalAmount     lastName    firstName
New York        1               USA         200.5       Smith       Craig
New York        2               USA         813         Smith       Craig
Leeds           3               UK          625.2       Anthony     William
Rio De Janeiro  6               Brazil      444.8       de Ellis    Marquis
Houston         5               USA         440         Milton      Joseph
London          4               UK          730         Wilde       C. J.
*/

Lateral Joins

With lateral joins, Lenses allows you to combine records from a table with the elements of an array expression.

We are going to see in more details what lateral joins are with an example.

Create a batched_readings table and insert some data into it:

CREATE TABLE batched_readings(
  meter_id int
  , readings int[]
)
FORMAT(int, AVRO);

INSERT INTO batched_readings(
    meter_id
    , readings
) VALUES
(1, [100, 80, 95, 91]),
(2, [87, 93, 100]),
(1, [88, 89, 92, 94]),
(2, [81])

You now can use a LATERAL join to inspect, extract and filter the single elements of the readings array, as if they were a normal field:

SELECT
    meter_id
    , reading
 FROM
    batched_readings
    LATERAL readings AS reading
WHERE 
    reading > 90

Running that query we will get the values:

meter_id	reading
1	100
1	95
1	91
2	93
1	92
1	94

You can use multiple LATERAL joins, one inside the other, if you want to extract elements from a nested array:

CREATE TABLE batched_readings_nested(
  sensor_id int
  , nested_readings int[][]
)
FORMAT(int, AVRO);

INSERT INTO batched_readings_nested(
    sensor_id
    , nested_readings
) VALUES
(1, [[100, 101], [103]]),
(2, [[80, 81], [82, 83, 82]]),
(1, [[100], [103, 102], [104]])

Running the following query we will obtain the same records of the previous example:

SELECT
    meter_id
    , reading
 FROM
    batched_readings
    LATERAL nested_readings AS readings
    LATERAL readings as reading
WHERE 
    reading > 90