5.0
Quick start
Welcome to this Lenses SQL engine Snapshot mode tutorial!
This document takes the reader through typical use cases for querying existing data.
Each exercise first builds the topics and insert the data before proceeding to the actual SELECT
statement.
First query
Let’s create an Apache Kafka topic which stores String values for both Key and Value and then inserts one entry.
The CREATE TABLE
is used in the SQL engine to create a Kafka topic.
CREATE TABLE greetings(
_key STRING
, _value STRING
)
FORMAT (string, STRING);
INSERT INTO greetings(_key, _value) VALUES ("Hello", "World");
Go ahead and run your first SELECT
query now:
SELECT *
FROM greetings
To remove the topic we just created use the DROP
statement below:
DROP TABLE greetings
Record metadata
Each Kafka message contains information related to partition, offset, timestamp, and topic. Additionally the engine adds the key and value raw byte size.
Create a topic and insert a few entries.
CREATE TABLE tutorial(
_key STRING
, name STRING
, difficulty INT
)
FORMAT (Avro, Avro);
INSERT INTO tutorial(_key, name, difficulty)
VALUES
("1", "Learn Lenses SQL", 3),
("2", "Learn Quantum Physics", 10),
("3", "Learn French", 5);
Now we can query for specific metadata related to the records.
To query for metadata such as the underlying Kafka topic offset, partition and timestamp prefix your desired fields with _meta
.
Run the following query to see each tutorial name along with its metadata information:
SELECT name
, _meta.offset
, _meta.timestamp
, _meta.partition
, _meta.__keysize
, _meta.__valsize
FROM tutorial
/* The output is (timestamp will be different)
Learn Lenses SQL 0 1540575169198 0 7 23
Learn Quantum Physics 1 1540575169226 0 7 28
Learn French 2 1540575169247 0 7 19
*/
Performing simple arithmetic
SQL expressions allow you to perform basic arithmetic.
The common arithmetic operations +
, -
, *
, /
and %
are supported.
Create a grocery table and insert data into it:
CREATE TABLE groceries (
_key INT
, id INT
, name STRING
, quantity INT
, price DOUBLE)
FORMAT(INT, Avro);
INSERT INTO groceries(
_key
, id
, name
, quantity
, price)
VALUES
(1, 1, "Fairtrade Bananas", 1, 1.90),
(2, 2, "Meridian Crunchy Peanut Butter", 1, 2.50),
(3, 3, "Green & Black's organic 85% dark chocolate bar", 2, 1.99),
(4, 4, "Activia fat free cherry yogurts", 3, 2.00),
(5, 5, "Green & Blacks Organic Chocolate Ice Cream", 2, 4.20);
Now calculate the price of each purchase:
SELECT
name
, quantity * price AS total
FROM groceries
/*The output:
Fairtrade Bananas 1.9
Meridian Crunchy Peanut Butter 2.5
Green & Black's organic 85% dark chocolate bar 3.98
Activia fat free cherry yogurts 6
Green & Blacks Organic Chocolate Ice Cream 8.4
*/
Using functions
Functions can be used directly. See the list of SQL functions .
For example, the ROUND
function allows you to round numeric functions:
SELECT
name
, ROUND(quantity * price) AS rounded_total
FROM groceries
/*The output:
Fairtrade Bananas 2
Meridian Crunchy Peanut Butter 3
Green & Black's organic 85% dark chocolate bar 4
Activia fat free cherry yogurts 6
Green & Blacks Organic Chocolate Ice Cream 8
*/
Manipulating text columns
String functions allow you to perform different operations on text. For example, the CONCAT
function allows combining multiple values.
Let’s first create a customer table:
CREATE TABLE customers (
_key STRING
, first_name STRING
, last_name STRING
)
FORMAT(string, Avro);
INSERT INTO customers(_key, first_name, last_name)
VALUES
("mikejones", "Mike", "Jones"),
("anasmith", "Ana", "Smith"),
("shannonelliott", "Shannon","Elliott");
Next query combines first and last names into full names:
SELECT
CONCAT(first_name, " ", last_name) AS name
FROM customers
/* Output
Mike Jones
Ana Smith
Shannon Elliott
*/
Filtering with WHERE
WHERE
clause allows you to define a set of logical predicates the data needs to match in order to be returned.
Standard comparison operators are supported (>
, >=
, <
, <=
, =
, and !=
) as well as calling functions.
We are going to use the groceries
table created earlier. Select all items purchased where the prices are greater or equal to 2.00:
SELECT
name
, price
FROM groceries
WHERE price >= 2.0
/* Output
Meridian Crunchy Peanut Butter 2.5
Activia fat free cherry yogurts 2
Green & Blacks Organic Chocolate Ice Cream 4.2
*/
Select all customers whose last name length equals to 5:
SELECT *
FROM customers
WHERE LEN(last_name) = 5
/* Output
key value.first_name value.last_name
mikejones Mike Jones
anasmith Ana Smith
*/
Search all customers containing Ana
in their first name:
SELECT *
FROM customers
WHERE first_name LIKE '%Ana%'
Keep in mind that text search is case sensitive. To use case insensitive text search, you can write:
SELECT *
FROM customers
WHERE LOWERCASE(first_name) LIKE '%ana%';
-- And here is the negated version
SELECT *
FROM customers
WHERE LOWERCASE(first_name) NOT LIKE '%ana%';
Missing values
Sometimes data can contain explicit NULL values, or it can omit fields entirely. Using IS [ NOT ] NULL, or EXISTS functions allows you to check for these situations.
Exists
is a keyword in Lenses SQL grammar so it needs to be escaped, the escape character is `````.
Lenses supports JSON. JSON does not enforce a schema allowing you to insert null values.
Create the following table named customers_json
:
CREATE TABLE customers_json (
_key STRING
, first_name STRING
, last_name STRING
, middle_name STRING
) FORMAT(string, json);
INSERT INTO customers_json(_key, first_name, last_name, middle_name) VALUES("mikejones", "Mike", "Jones", "Albert");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("anasmith", "Ana", "Smith");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("shannonelliott", "Shannon","Elliott");
Query this table for all its entries:
SELECT *
FROM customers_json
/* The output
key value.first_name value.middle_name value.last_name
mikejones Mike Albert Jones
anasmith Ana Smith
shannonelliott Shannon Elliott
*/
The middle_name
is only present on the mikejones
record.
Write a query which filters out records where middle_name
is not present:
SELECT *
FROM customers_json
WHERE `EXISTS`(middle_name)
/* The output
key value.first_name value.middle_name value.last_name
mikejones Mike Albert Jones
*/
This can also be written as:
SELECT *
FROM customers_json
WHERE middle_name IS NOT NULL
When a field is actually ``NULL`` or is missing, checking like in the below query has the same outcome.
SELECT *
FROM customers_json
WHERE middle_name IS NULL
Multiple WHERE conditions
You can use AND/OR to specify complex conditions for filtering your data.
To filter the purchased items where more than one item has been bought for a given product, and the unit price is greater than 2:
SELECT *
FROM groceries
WHERE quantity > 1
AND price > 2
Now try changing the AND logical operand to OR and see the differences in output.
Limit the output
To limit the output of the query you can use two approaches:
- use the
LIMIT
clause - set the max size of the data to be returned
-- limit to 1 record
SELECT *
FROM groceries
LIMIT 1
-- set the data size returned to be 1 megabyte.
SET max.size="1m";
-- on the small dataset we have here 1 MB will accommodate all records added and more
SELECT *
FROM groceries
Sampling data
To sample data and discard the first rows:
SELECT *
FROM groceries
LIMIT 1,2
This statement instructs Lenses to skip the first record matched and then sample the next two.
Read a table partition only
To select data from a specific partition access the meta data of the topic.
In the following example, a table is created with three partitions and the message key is hashed and then the remainder from HashValue % partitions
will be the table partition the record is sent to.
-- Run
CREATE TABLE customers_partitioned (
_key STRING
, first_name STRING
, last_name STRING
)
FORMAT(string, Avro)
properties(partitions = 3);
INSERT INTO customers_partitioned(
_key
, first_name
, last_name)
VALUES
("mikejones", "Mike", "Jones"),
("anasmith", "Ana", "Smith"),
("shannonelliott", "Shannon","Elliott"),
("tomwood", "Tom","Wood"),
("adelewatson", "Adele","Watson"),
("mariasanchez", "Maria", "Sanchez");
Next, run the following query:
SELECT *
FROM customers_partitioned
/* The output
offset partition timestamp key value.first_name value.last_name
0 0 1540830780401 mikejones Mike Jones
1 0 1540830780441 anasmith Ana Smith
2 0 1540830780464 shannonelliott Shannon Elliott
0 2 1540831270170 mariasanchez Maria Sanchez
0 1 1540830984698 tomwood Tom Wood
1 1 1540831183308 adelewatson Adele Watson
*/
As you can see from the results (your timestamps will be different) the records span over the three partitions. Now query specific partitions:
-- selects only records from partition = 0
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0;
-- selects only records from partition 0 and 2
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0
OR _meta.partition = 2;
Count the records
Using the COUNT
aggregate function you can count the records in a table. Run the following SQL to see how many records we have on the customers_partitioned
:
SELECT
COUNT(*) AS total
FROM customers_partitioned
Use SUM to aggregate your amounts
Using the SUM
function you can sum records in a table.
SELECT
SUM(quantity * price) AS amount
FROM groceries
Group data with GROUP BY
To group data use the GROUP BY
clause:
CREATE TABLE customer (
firstName STRING
, lastName STRING
, city STRING
, country STRING
, phone STRING)
FORMAT(string, avro)
properties(compacted=true);
INSERT INTO customer (
_key
,firstName
, lastName
, city
, country
, phone)
VALUES
('1','Craig', 'Smith', 'New York', 'USA', '1-01-993 2800'),
('2','William', 'Maugham','Toronto','Canada','+1-613-555-0110'),
('3','William', 'Anthony','Leeds','UK', '+44 1632 960427'),
('4','S.B.','Swaminathan','Bangalore','India','+91 7513200000'),
('5','Thomas','Morgan','Arnsberg','Germany','+49-89-636-48018'),
('6','Thomas', 'Merton','New York','USA', '+1-617-555-0147'),
('7','Piers','Gibson','London','UK', '+44 1632 960269'),
('8','Nikolai','Dewey','Atlanta','USA','+1-404-555-0178'),
('9','Marquis', 'de Ellis','Rio De Janeiro','Brazil','+55 21 5555 5555'),
('10','Joseph', 'Milton','Houston','USA','+1-202-555-0153'),
('11','John','Betjeman Hunter','Sydney','Australia','+61 1900 654 321'),
('12','Evan', 'Hayek','Vancouver','Canada','+1-613-555-0130'),
('13','E.','Howard','Adelaide','Australia','+61 491 570 157'),
('14','C. J.', 'Wilde','London','UK','+44 1632 960111'),
('15','Butler', 'Andre','Florida','USA','+1-202-555-0107');
Let’s see how many customers there are from each country. Here is the code which computes that:
SELECT
COUNT(*) AS count
, country
FROM customer
GROUP BY country
/* Output
2 Canada
3 UK
2 Australia
1 India
1 Brazil
5 USA
1 Germany
*/
Nested queries
The tabled query allows you to nest queries. Let us take the query in the previous section and say we are only interested in those entries where there exist more than 1 customer per country.
SELECT *
FROM (
SELECT
COUNT(*) AS count
, country
FROM customer
GROUP BY country
)
WHERE count > 1
Run the query, and you will only see those entries for which there is more than one person registered per country.
Filtering grouped data with HAVING
To filter the entries returned from a grouping query. As with the WHERE
statement, you can use HAVING
syntax to achieve the same result when it comes to grouped queries.
SELECT
COUNT(*) AS count
, country
FROM customer
GROUP BY country
HAVING count > 1
Joins
Lenses allows you to combine records from two tables. A query can contain zero, one or multiple JOIN operations.
Create an orders
table and insert some data into it:
CREATE TABLE orders(
_key INT
, orderDate STRING
, customerId STRING
, amount DOUBLE
)
FORMAT(int, avro);
INSERT INTO orders (
_key
, orderDate
, customerId
, amount
)
VALUES
(1, '2018-10-01', '1', 200.50),
(2, '2018-10-11', '1', 813.00),
(3, '2018-10-11', '3', 625.20),
(4, '2018-10-11', '14', 730.00),
(5, '2018-10-11', '10', 440.00),
(6, '2018-10-11', '9', 444.80);
With these tables in place, join them to get more information about an order by combining it with the customer information found in the customer
table:
SELECT
o._key AS orderNumber
, o.amount AS totalAmount
, c.firstName
, c.lastName
, c.city
, c.country
FROM orders o INNER JOIN customer c
ON o.customerId = c._key
/*
city orderNumber country totalAmount lastName firstName
New York 1 USA 200.5 Smith Craig
New York 2 USA 813 Smith Craig
Leeds 3 UK 625.2 Anthony William
Rio De Janeiro 6 Brazil 444.8 de Ellis Marquis
Houston 5 USA 440 Milton Joseph
London 4 UK 730 Wilde C. J.
*/
Lateral Joins
With lateral joins, Lenses allows you to combine records from a table with the elements of an array expression.
We are going to see in more details what lateral joins are with an example.
Create a batched_readings
table and insert some data into it:
CREATE TABLE batched_readings(
meter_id int
, readings int[]
)
FORMAT(int, AVRO);
INSERT INTO batched_readings(
meter_id
, readings
) VALUES
(1, [100, 80, 95, 91]),
(2, [87, 93, 100]),
(1, [88, 89, 92, 94]),
(2, [81])
You now can use a LATERAL join to inspect, extract and filter the single elements of the readings
array, as if they were a normal field:
SELECT
meter_id
, reading
FROM
batched_readings
LATERAL readings AS reading
WHERE
reading > 90
Running that query we will get the values:
meter_id | reading |
---|---|
1 | 100 |
1 | 95 |
1 | 91 |
2 | 93 |
1 | 92 |
1 | 94 |
You can use multiple LATERAL
joins, one inside the other, if you want to extract elements from a nested array:
CREATE TABLE batched_readings_nested(
sensor_id int
, nested_readings int[][]
)
FORMAT(int, AVRO);
INSERT INTO batched_readings_nested(
sensor_id
, nested_readings
) VALUES
(1, [[100, 101], [103]]),
(2, [[80, 81], [82, 83, 82]]),
(1, [[100], [103, 102], [104]])
Running the following query we will obtain the same records of the previous example:
SELECT
meter_id
, reading
FROM
batched_readings
LATERAL nested_readings AS readings
LATERAL readings as reading
WHERE
reading > 90