Spark Structured Metrics
If you have an application written using Spark Structured Streaming that consumes messages from Kafka, the topology client can publish consumer metrics through a custom format.
You will need the following Maven dependency:
<dependency>
    <groupId>com.landoop</groupId>
    <artifactId>lenses-topology-client-kafka-spark</artifactId>
    <version>1.0.0</version>
</dependency>
Then you will need to make two small changes to the code that creates the streaming Dataset. The first is to use the lenses-kafka format rather than the default kafka format. This is a slightly modified version of the Kafka source that retains a handle to the KafkaConsumer and publishes the metrics exposed by that consumer.
The second change is to add an option that tells the metrics publisher the name of the topology and the topics we are interested in.
Dataset<Row> words = spark
    .readStream()
    // the next line switches to the modified kafka format
    .format("lenses-kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    // the next line passes the topology name and topics to the metrics publisher
    .option("kafka.lenses.topology.description", topology.getDescription())
    .option("subscribe", inputTopic)
    .load();
That is all that is required. The Dataset can now be used as you would in any other application.
Note: When you create the topology instance, set the app type to AppType.SparkStreaming so that the UI renders the correct visualization.
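If several streams in one application read from Kafka, it may help to collect the three reader options in one place so that the topology option is not forgotten on any of them. The following is a minimal sketch of that idea, not part of the topology client: the class name LensesKafkaOptions and its build method are hypothetical, and it assumes topology.getDescription() returns a String, as its use with option() in the snippet above suggests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the topology client): gathers the
// reader options used in this section into a single map.
final class LensesKafkaOptions {

    private LensesKafkaOptions() {
    }

    // bootstrapServers:    Kafka brokers, e.g. "localhost:9092"
    // topologyDescription: assumed to be the String returned by topology.getDescription()
    // topics:              value for the standard "subscribe" option
    static Map<String, String> build(String bootstrapServers,
                                     String topologyDescription,
                                     String topics) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("kafka.bootstrap.servers", bootstrapServers);
        // option key taken from the example in this section
        options.put("kafka.lenses.topology.description", topologyDescription);
        options.put("subscribe", topics);
        return options;
    }
}
```

The resulting map can then be handed to the reader in one call, for example spark.readStream().format("lenses-kafka").options(LensesKafkaOptions.build("localhost:9092", topology.getDescription(), inputTopic)).load(), using the options(Map) overload of Spark's DataStreamReader.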