Text to JSON¶
In this page you will learn how to convert plain text log entries into JSON records using regular expressions.
Applying regex transformations to plain text data is a very common scenario especially when working with log files in order to filter out important information. At the same time, this will convert the format of the log file data into JSON records in order to have a more structured schema to work with.
The following SQL query will process the topic generated by the File Source
we created in the previous page - the name of that topic is var_log_broker
.
The SQL will be the following:
SET autocreate=true;
INSERT INTO log_files_as_json
WITH log_regex AS
(
SELECT STREAM REGEXP(content, '.*\[(.*)\] (.*)') AS regx
FROM var_log_broker
WHERE _vtype='STRING_AS_JSON'
)
SELECT STREAM
regx[1] AS time_date,
regx[2] AS the_text
FROM log_regex
WHERE regx IS NOT NULL
Notice that this query runs using an SQL Processor – SQL Processors are explained in the SQL Processors page. Its a long running process.
The regular expression used, which is .*\[(.*)\] (.*)
, breaks each log entry
into two parts. The first part is the time field of each log entry and the second
part contains the text of the log entry. So, we are going to convert each existing record into
a JSON record with two fields - the SELECT STREAM
query gets the desired
fields from the output of the regular expression and puts them into the
log_files_as_json
Kafka topic. Notice that the indexing in the array returned by the
regular expression begins with 1
, not 0
.
You can find more about Regular Expressions in Lenses SQL here.