Elastic


A set of Kafka Connect sink connectors for writing records from Kafka to Elastic.

Requires:

  • Elastic 6 or
  • Elastic 7+

KCQL support 

The following KCQL is supported:

INSERT | UPSERT
INTO <elastic_index>
SELECT FIELD, ...
FROM kafka_topic
[PK FIELD,...]
[WITHDOCTYPE=<your_document_type>]
[WITHINDEXSUFFIX=<your_suffix>]
[PROPERTIES (...)]

Examples:

-- Insert mode, select all fields from topicA and write to indexA
INSERT INTO indexA SELECT * FROM topicA

-- Insert mode, select 3 fields and rename from topicB
-- and write to indexB
INSERT INTO indexB SELECT x AS a, y, zc FROM topicB PK y

-- UPSERT
UPSERT INTO indexC SELECT id, string_field FROM topicC PK id

Kafka Tombstone Handling 

It is possible to configure how the Connector handles a null value payload (a Kafka tombstone). Set the behavior.on.null.values property in the KCQL PROPERTIES clause to one of the following values:

  • IGNORE (ignores tombstones entirely)
  • FAIL (throws an exception when a tombstone is received)
  • DELETE (deletes the document with the corresponding id)

Example:

INSERT INTO indexA SELECT * FROM topicA PROPERTIES ('behavior.on.null.values'='IGNORE')

Concepts 

Primary Keys 

The PK keyword can be used to specify the fields which will be used for the key value. The field values will be concatenated and separated by a -. If no fields are set, the topic name, partition and message offset are used.
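For example, a sketch of an upsert that builds the document id from two fields (index, topic and field names are illustrative); with id=1 and region=EU the resulting key would be 1-EU:

UPSERT INTO indexD SELECT id, region, amount FROM topicD PK id, region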

Document Type 

WITHDOCTYPE allows you to associate a document type with the inserted document.
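A minimal sketch (the document type _doc and the index/topic names are illustrative):

INSERT INTO indexA SELECT * FROM topicA WITHDOCTYPE=_doc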

Index Suffix 

WITHINDEXSUFFIX allows you to specify a suffix for your index name; date formats are supported.

Example:

WITHINDEXSUFFIX=_suffix_{YYYY-MM-dd}
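In a complete statement this looks as follows (index and topic names are illustrative); with a target index of indexA, documents written on, say, 2016-05-06 would go to indexA_suffix_2016-05-06:

INSERT INTO indexA SELECT * FROM topicA WITHINDEXSUFFIX=_suffix_{YYYY-MM-dd}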

Index Names 

1. Static Index Names 

To use a static index name, define the target index in the KCQL statement without any prefixes:

INSERT INTO index_name SELECT * FROM topicA

This will consistently create an index named index_name for any messages consumed from topicA.

2. Extracting Index Names from Headers, Keys, and Values 

2.1. Headers 

To extract an index name from a message header, use the _header prefix followed by the header name:

INSERT INTO _header.gate SELECT * FROM topicA

This statement extracts the value from the gate header field and uses it as the index name.

For headers with names that include dots, enclose the entire target in backticks (`) and each field-name segment in single quotes ('):

INSERT INTO `_header.'prefix.abc.suffix'` SELECT * FROM topicA

In this case, the value of the header named prefix.abc.suffix is used to form the index name.

2.2. Keys 

To use the full value of the message key as the index name, use the _key prefix:

INSERT INTO _key SELECT * FROM topicA

For example, if the message key is "freddie", the resulting index name will be freddie.

2.3. Values 

To extract an index name from a field within the message value, use the _value prefix followed by the field name:

INSERT INTO _value.name SELECT * FROM topicA

This example uses the value of the name field from the message’s value. If the field contains "jason", the index name will be jason.

Nested Fields in Values 

To access nested fields within a value, specify the full path using dot notation:

INSERT INTO _value.name.firstName SELECT * FROM topicA

If the firstName field is nested within the name structure, its value (e.g., "hans") will be used as the index name.
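For example, a value payload shaped like the following (illustrative) would yield the index name hans:

{
  "name": {
    "firstName": "hans"
  }
}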

Fields with Dots in Their Names 

For field names that include dots, enclose the entire target in backticks (`) and each field-name segment in single quotes ('):

INSERT INTO `_value.'customer.name'.'first.name'` SELECT * FROM topicA

If the value structure contains:

{
  "customer.name": {
    "first.name": "hans"
  }
}

The extracted index name will be hans.

Error Policies 

The connector supports error policies, configured via the connect.elastic.error.policy option (see the Options table below).
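As a sketch, the RETRY policy is selected through the connector properties below (the values shown are the defaults from the Options table; adjust as needed):

connect.elastic.error.policy=RETRY
connect.elastic.max.retries=20
connect.elastic.retry.interval=60000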

Auto Index Creation 

The Sink will automatically create missing indexes at startup.

Please note that this feature is not compatible with index names extracted from message headers/keys/values.

Quickstart 

Launch the stack 


  1. Copy the docker-compose file.
  2. Bring up the stack.

For the Elasticsearch 6 Connector: 

export CONNECTOR=elastic6
docker-compose up -d elastic

For the Elasticsearch 7+ Connector: 

export CONNECTOR=elastic7
docker-compose up -d elastic

Start the connector 

If you are using Lenses, log in to Lenses, navigate to the connectors page, select Elastic as the sink and paste the following:

For the Elasticsearch 6 Connector: 

name=elastic
connector.class=io.lenses.streamreactor.connect.elastic6.ElasticSinkConnector
tasks.max=1
topics=orders
connect.elastic.protocol=http
connect.elastic.hosts=elastic
connect.elastic.port=9200
connect.elastic.cluster.name=elasticsearch
connect.elastic.kcql=INSERT INTO orders SELECT * FROM orders
connect.progress.enabled=true

For the Elasticsearch 7+ Connector: 

name=elastic
connector.class=io.lenses.streamreactor.connect.elastic7.ElasticSinkConnector
tasks.max=1
topics=orders
connect.elastic.protocol=http
connect.elastic.hosts=elastic
connect.elastic.port=9200
connect.elastic.cluster.name=elasticsearch
connect.elastic.kcql=INSERT INTO orders SELECT * FROM orders
connect.progress.enabled=true

To start the connector using the command line, log into the lenses-box container:


docker exec -ti lenses-box /bin/bash

and create a connector.properties file containing the properties above.

Create the connector, with the connect-cli:

connect-cli create elastic < connector.properties

Wait for the connector to start and check that it’s running:

connect-cli status elastic

Inserting test data 

In the lenses-box container, start the Kafka Avro console producer:


kafka-avro-console-producer \
 --broker-list localhost:9092 --topic orders \
 --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"id","type":"int"},{"name":"created","type":"string"},{"name":"product","type":"string"},{"name":"price","type":"double"}, {"name":"qty", "type":"int"}]}'

The console is now waiting for your input; enter the following:


{
  "id": 1,
  "created": "2016-05-06 13:53:00",
  "product": "OP-DAX-P-20150201-95.7",
  "price": 94.2,
  "qty": 100
}

Check for data in Elastic 

Login to the elastic container:

 docker exec -ti elastic bin/elasticsearch-sql-cli

Run the following:

 SELECT * FROM orders;
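Alternatively, assuming the stack publishes Elasticsearch's HTTP port 9200 to the host, the index can be queried over the REST API:

curl "http://localhost:9200/orders/_search?pretty"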

Clean up 

Bring down the stack:

docker-compose down

Options 

Name | Description | Type | Default Value
connect.elastic.protocol | URL protocol (http, https). | string | http
connect.elastic.hosts | List of hostnames for the Elasticsearch cluster nodes, not including protocol or port. | string | localhost
connect.elastic.port | Port on which the Elasticsearch node listens. | string | 9300
connect.elastic.tableprefix | Table prefix (optional). | string |
connect.elastic.cluster.name | Name of the Elasticsearch cluster, used in local mode for setting the connection. | string | elasticsearch
connect.elastic.write.timeout | The time to wait for a write, in milliseconds (default is 5 minutes). | int | 300000
connect.elastic.batch.size | How many records to process at one time. As records are pulled from Kafka there can be 100k+ at once, which is not feasible to send to Elasticsearch in a single request. | int | 4000
connect.elastic.use.http.username | Username if HTTP Basic Auth is required; default is null. | string |
connect.elastic.use.http.password | Password if HTTP Basic Auth is required; default is null. | string |
connect.elastic.error.policy | Specifies the action to take if an error occurs while inserting the data. There are three available options: NOOP (the error is swallowed), THROW (the error is allowed to propagate) and RETRY (the exception causes the Connect framework to retry the message; the number of retries is set by connect.elastic.max.retries). The error will be logged automatically. | string | THROW
connect.elastic.max.retries | The maximum number of times to retry the write. | int | 20
connect.elastic.retry.interval | The time in milliseconds between retries. | int | 60000
connect.elastic.kcql | KCQL expression describing field selection and routes. | string |
connect.elastic.pk.separator | Separator used when more than one field is set in the PK. | string | -
connect.progress.enabled | Enables logging of how many records have been processed. | boolean | false
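As an example, a hedged sketch of a sink configured for HTTPS with HTTP Basic Auth, combining several of the options above (host names and credentials are illustrative):

connect.elastic.protocol=https
connect.elastic.hosts=es-node-1,es-node-2
connect.elastic.port=9200
connect.elastic.use.http.username=elastic_user
connect.elastic.use.http.password=elastic_password
connect.elastic.batch.size=4000
connect.elastic.kcql=INSERT INTO orders SELECT * FROM orders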

SSL Configuration Properties 

Property Name | Description
ssl.truststore.location | Path to the truststore file containing the trusted CA certificates for verifying server certificates.
ssl.truststore.password | Password for the truststore file to protect its integrity.
ssl.truststore.type | Type of the truststore (e.g., JKS, PKCS12). Default is JKS.
ssl.keystore.location | Path to the keystore file containing the client’s private key and certificate chain for client authentication.
ssl.keystore.password | Password for the keystore to protect the private key.
ssl.keystore.type | Type of the keystore (e.g., JKS, PKCS12). Default is JKS.
ssl.protocol | The SSL protocol used for secure connections (e.g., TLSv1.2, TLSv1.3). Default is TLS.
ssl.keymanager.algorithm | Algorithm used by the KeyManager to manage certificates. Default value is the key manager factory algorithm configured for the Java Virtual Machine.
ssl.trustmanager.algorithm | Algorithm used by the TrustManager to manage certificates. Default value is the trust manager factory algorithm configured for the Java Virtual Machine.

KCQL Properties 

Name | Description | Type | Default Value
behavior.on.null.values | Specifies behavior on Kafka tombstones: IGNORE, DELETE or FAIL. | string | IGNORE

SSL Configuration 

Enabling SSL connections between Kafka Connect and Elasticsearch ensures that the communication between these services is secure, protecting sensitive data from being intercepted or tampered with. SSL (or TLS) encrypts data in transit, verifying the identity of both parties and ensuring data integrity.

While newer versions of Elasticsearch have SSL enabled by default for internal communication, it’s still necessary to configure SSL for client connections, such as those from Kafka Connect. Even if Elasticsearch has SSL enabled by default, Kafka Connect still needs these configurations to establish a secure connection. By setting up SSL in Kafka Connect, you ensure:

  • Data encryption: Prevents unauthorized access to data being transferred.
  • Authentication: Confirms that Kafka Connect and Elasticsearch are communicating with trusted entities.
  • Compliance: Meets security standards for regulatory requirements (such as GDPR or HIPAA).

Configuration Example 

ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=your_truststore_password
ssl.truststore.type=JKS  # Can also be PKCS12 if applicable

ssl.keystore.location=/path/to/keystore.jks
ssl.keystore.password=your_keystore_password
ssl.keystore.type=JKS  # Can also be PKCS12 if applicable

ssl.protocol=TLSv1.2  # Or TLSv1.3 for stronger security

ssl.trustmanager.algorithm=PKIX  # Default algorithm for managing certificates

Terminology: 

  • Truststore: Holds certificates to check if the node’s certificate is valid.
  • Keystore: Contains your client’s private key and certificate to prove your identity to the node.
  • SSL Protocol: Use TLSv1.2 or TLSv1.3 for up-to-date security.
  • Password Security: Protect passwords by encrypting them or using secure methods like environment variables or secret managers (see the sketch below).
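One way to keep those passwords out of the connector configuration is Kafka Connect’s config provider mechanism; a minimal sketch using the built-in FileConfigProvider (the file path and key names are illustrative assumptions):

# Kafka Connect worker properties
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# Connector properties - values are resolved from /etc/kafka/secrets.properties at runtime
ssl.truststore.password=${file:/etc/kafka/secrets.properties:truststore.password}
ssl.keystore.password=${file:/etc/kafka/secrets.properties:keystore.password}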