Breaking Configuration Change Configuration for the HTTP Sink changes from Json to standard Kafka Connect properties. Please study the documentation if upgrading to 8.0.0.
behavior.on.null.values
defaultUploadSyncPeriod
Dependency version upgrades
INSERT INTO foo SELECT * FROM bar NOPARTITION
This release introduces a new configuration option for three Kafka Connect Sink Connectors—S3, Data Lake, and GCP Storage—allowing users to disable exactly-once semantics. By default, exactly once is enabled, but with this update, users can choose to disable it, opting instead for Kafka Connect’s native at-least-once offset management.
S3 Sink Connector: connect.s3.exactly.once.enable Data Lake Sink Connector: connect.datalake.exactly.once.enable GCP Storage Sink Connector: connect.gcpstorage.exactly.once.enable
connect.s3.exactly.once.enable
connect.datalake.exactly.once.enable
connect.gcpstorage.exactly.once.enable
Default Value: true
true
Indexing is enabled by default to maintain exactly-once semantics. This involves creating an .indexes directory at the root of your storage bucket, with subdirectories dedicated to tracking offsets, ensuring that records are not duplicated.
Users can now disable indexing by setting the relevant property to false. When disabled, the connector will utilise Kafka Connect’s built-in offset management, which provides at-least-once semantics instead of exactly-once.
false
Introduced new properties to allow users to filter the files to be processed based on their extensions.
For AWS S3 Source Connector:
connect.s3.source.extension.excludes
connect.s3.source.extension.includes
For GCP Storage Source Connector:
connect.gcpstorage.source.extension.excludes
connect.gcpstorage.source.extension.includes
These properties provide more control over the files that the AWS S3 Source Connector and GCP Storage Source Connector process, improving efficiency and flexibility. For more information, refer to the updated documentation for AWS S3 or GCP Storage.
max.poll.timeout
Adding retry delay multiplier as a configurable parameter (with default value) to Google Cloud Storage Connector. Main changes revolve around RetryConfig class and its translation to gax HTTP client config.
Making indexes directory configurable for both source and sink.
Use the below properties to customise the indexes root directory:
connect.datalake.indexes.name
connect.gcpstorage.indexes.name
connect.s3.indexes.name
See connector documentation for more information.
connect.datalake.source.partition.search.excludes
connect.gcpstorage.source.partition.search.excludes
connect.s3.source.partition.search.excludes
Full Changelog: https://github.com/lensesio/stream-reactor/compare/7.3.2...7.4.0
INSERT INTO bucket SELECT * FROM `*` ...
When * is used the envelope setting is ignored.
*
This change allows for the * to be taken into account as a default if the given message topic is not found.
Bug fix to ensure that, if specified as part of the template, the Content-Type header is correctly populated.
Upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.
Full Changelog: https://github.com/lensesio/stream-reactor/compare/7.1.0...7.2.0
We’ve rolled out enhancements to tackle a common challenge faced by users of the S3 source functionality. Previously, when an external producer abruptly terminated a file without marking the end message, data loss occurred.
To address this, we’ve introduced a new feature: a property entry for KCQL to signal the handling of unterminated messages. Meet the latest addition, read.text.last.end.line.missing. When set to true, this property ensures that in-flight data is still recognized as a message even when EOF is reached but the end line marker is missing.
This release brings substantial enhancements to the data-lakes sink connectors, elevating their functionality and flexibility. The focal point of these changes is the adoption of the new KCQL syntax, designed to improve usability and resolve limitations inherent in the previous syntax.
\
/
Several keywords have been replaced with entries in the PROPERTIES section for improved clarity and consistency:
PROPERTIES ('partition.include.keys'=true/false)
WITHPARTITIONER KeysAndValue
PROPERTIES ('flush.size'=$VALUE)
PROPERTIES ('flush.count'=$VALUE)
PROPERTIES ('flush.interval'=$VALUE)
The adoption of the new KCQL syntax enhances the flexibility of the data-lakes sink connectors, empowering users to tailor configurations more precisely to their requirements. By transitioning keywords to entries in the PROPERTIES section, potential misconfigurations stemming from keyword order discrepancies are mitigated, ensuring configurations are applied as intended.
Please note that the upgrades to the data-lakes sink connectors are not backward compatible with existing configurations. Users are required to update their configurations to align with the new KCQL syntax and PROPERTIES entries. This upgrade is necessary for any instances of the sink connector (S3, Azure, GCP) set up before version 7.0.0.
To upgrade to the new version, users must follow these steps:
This update specifically affects datalake sinks employing the JSON storage format. It serves as a remedy for users who have resorted to a less-than-ideal workaround: employing a Single Message Transform (SMT) to return a Plain Old Java Object (POJO) to the sink. In such cases, instead of utilizing the Connect JsonConverter to seamlessly translate the payload to JSON, reliance is placed solely on Jackson.
However, it’s crucial to note that this adjustment is not indicative of a broader direction for future expansions. This is because relying on such SMT practices does not ensure an agnostic solution for storage formats (such as Avro, Parquet, or JSON).
Please note that the Elasticsearch 6 connector will be deprecated in the next major release.
connect.s3.partition.search.interval
connect.s3.source.partition.search.interval
connect.s3.partition.search.continuous
connect.s3.source.partition.search.continuous
connect.s3.partition.search.recurse.levels
connect.s3.source.partition.search.recurse.levels
cloud-common
The source and sink has been the focus of this release.
PROPERTIES('store.envelope'=true)
STOREAS BYTES
PROPERTIES
INSERT INTO ... SELECT ... FROM ... PROPERTIES(property=key, ...)
PARTITIONBY a, `field1.field2`
For installations that have been using the preview version of the S3 connector and are upgrading to the release, there are a few important considerations:
Previously, default padding was enabled for both “offset” and “partition” values starting in June.
However, in version 5.0, the decision to apply default padding to the “offset” value only, leaving the " partition" value without padding. This change was made to enhance compatibility with querying in Athena.
If you have been using a build from the master branch since June, your connectors might have been configured with a different default padding setting.
To maintain consistency and ensure your existing connector configuration remains valid, you will need to use KCQL configuration properties to customize the padding fields accordingly.
INSERT INTO $bucket[:$prefix] SELECT * FROM $topic ... PROPERTIES( 'padding.length.offset'=12, 'padding.length.partition'=12 )
Starting with version 5.0.0, the following configuration keys have been replaced.
In version 4.1, padding options were available but were not enabled by default. At that time, the default padding length, if not specified, was set to 8 characters.
However, starting from version 5.0, padding is now enabled by default, and the default padding length has been increased to 12 characters.
Enabling padding has a notable advantage: it ensures that the files written are fully compatible with the Lenses Stream Reactor S3 Source, enhancing interoperability and data integration.
Sinks created with 4.2.0 and 4.2.1 should retain the padding behaviour, and, therefore should disable padding:
INSERT INTO $bucket[:$prefix] SELECT * FROM $topic ... PROPERTIES ( 'padding.type'=NoOp )
If padding was enabled in 4.1, then the padding length should be specified in the KCQL statement:
INSERT INTO $bucket[:$prefix] SELECT * FROM $topic ... PROPERTIES ( 'padding.length.offset'=12, 'padding.length.partition'=12 )
STOREAS Bytes_***
The Bytes_*** storage format has been removed. If you are using this storage format, you will need to install the 5.0.0-deprecated connector and upgrade the connector instances by changing the class name:
Source Before:
class.name=io.lenses.streamreactor.connect.aws.s3.source.S3SourceConnector ...
Source After:
class.name=io.lenses.streamreactor.connect.aws.s3.source.S3SourceConnectorDeprecated ...
Sink Before:
class.name=io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnector ...
Sink After:
class.name=io.lenses.streamreactor.connect.aws.s3.sink.S3SinkConnectorDeprecated connect.s3.padding.strategy=NoOp ...
The deprecated connector won’t be developed any further and will be removed in a future release. If you want to talk to us about a migration plan, please get in touch with us at sales@lenses.io.
To migrate to the new configuration, please follow the following steps:
All
AWS S3 Sink Connector
AWS S3 Source Connector
MQTT Source Connector
MQTT Sink Connector
Elastic6 & Elastic7 Sink Connectors
FTP Source Connector
Hazelcast Sink Connector
Hive Sink Connector
JMS Connector
Pulsar
Cassandra Sink Connector
HBase Sink Connector
Cassandra Sink & Source Connectors
FTP Connector
Influx DB Sink
MongoDB Sink Connector
Redis Sink Connector
Hive Source
connect.hive.hive.metastore
connect.hive.metastore
connect.hive.hive.metastore.uris
connect.hive.metastore.uris
Fix Elastic start up NPE
Fix to correct batch size extraction from KCQL on Pulsar
Secret provider
Secret provider release notes
Package Name Changes
Package name changes
On this page