Data observability

This guide describes how to explore and query data in Kafka and other connected data sources.

Lenses builds an exploration view of your Kafka data by collecting metrics, data types, schemas, metadata, policies and configurations into a catalog where you can search across topics and navigate to the data itself.

As Kafka becomes increasingly integrated with other data systems, the catalog also supports two that commonly accompany it: Elasticsearch and PostgreSQL. This extends your streaming workflow and makes it easier to validate data before it reaches Kafka, after it leaves, or while it is in transit.

Required permission

Permission	Type	Description
(Per data source)	Namespace	Permissions are granted per connection and per namespace, e.g. View Topics on `transactions*`

Data sources integrated with the catalog are governed by granular, namespace-scoped permissions. This lets you shape the data catalog around your teams or access constraints.
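The namespace-scoped visibility described above can be sketched as follows. This is an illustration, not Lenses' implementation: the `Grant` type, the `visible_datasets` helper and the assumption that namespaces such as `transactions*` behave like shell-style glob patterns are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Grant:
    connection: str   # hypothetical connection name, e.g. "kafka"
    permission: str   # e.g. "View Topics"
    namespace: str    # assumed glob pattern, e.g. "transactions*"


def visible_datasets(grants, connection, datasets, permission="View Topics"):
    """Return the datasets on `connection` matched by at least one grant."""
    patterns = [g.namespace for g in grants
                if g.connection == connection and g.permission == permission]
    # fnmatchcase: shell-style wildcards, case-sensitive on every platform
    return [d for d in datasets if any(fnmatchcase(d, p) for p in patterns)]
```

With a single grant of View Topics on `transactions*` for the Kafka connection, a user would see `transactions` and `transactions-eu` but not `payments`, and nothing on other connections.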

See Access Management & permissions.

Explore datasets

To discover data, navigate to the Explore page, which lists the available datasets. Your Kafka connection is pre-selected by default, so the topics of your cluster are listed.

Filters

You can filter by data source and by tag:

Data source filter: cumulative (logical OR); a dataset matches if it belongs to any of the selected data sources.
Tags filter: logical AND; a dataset matches only if it carries all of the selected tags.

Selected filters are applied immediately and appear on the page; removing a filter also takes effect immediately.
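The two filter semantics map naturally onto set operations. In this minimal sketch the `Dataset` and `apply_filters` names are hypothetical, and combining the data source filter with the tags filter using AND is an assumption not stated in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    source: str                      # e.g. "kafka", "elasticsearch", "postgres"
    tags: frozenset = frozenset()


def apply_filters(datasets, sources=frozenset(), tags=frozenset()):
    """Data source filter: logical OR. Tags filter: logical AND.
    An empty filter matches everything. Combining the two filters with
    AND is an assumption of this sketch."""
    selected = []
    for d in datasets:
        source_ok = not sources or d.source in sources  # any selected source
        tags_ok = set(tags) <= d.tags                   # all selected tags
        if source_ok and tags_ok:
            selected.append(d)
    return selected
```

Selecting more data sources widens the result, while selecting more tags narrows it.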

Search

The supported search terms are:

Dataset name, e.g. the topic name or table name.
Schema fields: search for specific fields in dataset schemas, e.g. credit-card. Fields that are subject to a data policy, and are therefore masked, are annotated.
Schema field descriptions, which are particularly common in Avro schemas.

Include/Exclude from search

System datasets are datasets used for configuration or state; for example, the Kafka Connect config topics are flagged as system topics. By default, system topics are hidden and excluded from search; this is configurable in the Lenses configuration.
Schema fields and descriptions can also be excluded from the default search.

Keyword matching

Keyword matching is case-insensitive. For example, searching for info matches both info and INFO in any of the searchable terms.
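The search behavior described in the last three subsections (searchable terms, include/exclude options and case-insensitive matching) can be sketched as one function. This is a hypothetical model: `SchemaField`, `CatalogEntry`, `search` and its flags are illustrative names, not a Lenses API, and substring matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaField:
    name: str
    description: str = ""
    masked: bool = False  # True when a data policy masks this field


@dataclass
class CatalogEntry:
    name: str
    fields: list = field(default_factory=list)
    system: bool = False  # e.g. Kafka Connect config topics


def search(entries, keyword, include_system=False,
           include_fields=True, include_descriptions=True):
    """Case-insensitive search over dataset names, schema field names and
    field descriptions. Returns (dataset name, matching fields) pairs."""
    kw = keyword.casefold()  # casefold() gives case-insensitive comparison
    results = []
    for entry in entries:
        if entry.system and not include_system:
            continue  # system datasets are hidden by default
        hits = []
        for f in entry.fields:
            name_hit = include_fields and kw in f.name.casefold()
            desc_hit = include_descriptions and kw in f.description.casefold()
            if name_hit or desc_hit:
                # annotate fields that a data policy will mask
                hits.append(f.name + (" (masked)" if f.masked else ""))
        if kw in entry.name.casefold() or hits:
            results.append((entry.name, hits))
    return results
```

Searching for CREDIT would then return a `payments` dataset with its `credit-card (masked)` field, and a system topic would only appear when `include_system=True`.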

Display settings

The catalog's visible columns default to those relevant to the selected data sources. You can show or hide columns using the cog icon on the table.

Catalog refreshes

The catalog updates its cache every 30 seconds. The default interval can be changed via the configuration.
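A time-based cache of this kind can be sketched as below. The sketch assumes a lazy refresh, where the snapshot is reloaded on the first read after the interval expires; Lenses may instead refresh in the background on a schedule. `CatalogCache` and its `loader` callback are hypothetical.

```python
import time


class CatalogCache:
    """Serve a cached catalog snapshot, reloading it once it is older than
    `refresh_interval` seconds (30 s here, mirroring the documented default)."""

    def __init__(self, loader, refresh_interval=30.0, clock=time.monotonic):
        self._loader = loader            # callable that queries the data sources
        self._interval = refresh_interval
        self._clock = clock              # injectable for testing
        self._snapshot = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._interval:
            self._snapshot = self._loader()  # expired (or never loaded): reload
            self._loaded_at = now
        return self._snapshot
```

Between reloads every reader sees the same snapshot, which is why catalog values can lag the underlying systems by up to one interval.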

Schemas, fields & types

For every dataset, Lenses maintains the schema of the data, both to enable field-level capabilities (e.g. field search, field-level masking policies, autocompletion) and to read and visualize the data. For Kafka topics, Lenses can also load the schema from a Schema Registry. Lenses tries to identify the schema of each dataset automatically, and you can edit the schema to correct any auto-detection inconsistencies.
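The detection algorithm is not described here; the sketch below only illustrates the general idea of inferring field types from sample JSON records and flagging inconsistencies for a user to correct. The `infer_field_types` and `json_type` functions and the "mixed" label are hypothetical.

```python
import json


def json_type(value):
    """Map a decoded JSON value to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_field_types(samples):
    """Infer top-level field types from JSON-encoded sample records.
    Nulls do not override a known type; conflicting types are marked
    'mixed' so a user can review and fix them."""
    types = {}
    for raw in samples:
        for key, value in json.loads(raw).items():
            t = json_type(value)
            seen = types.get(key)
            if seen is None or seen == "null":
                types[key] = t
            elif t != "null" and t != seen:
                types[key] = "mixed"
    return types
```

A field that is a number in some records and a string in others is exactly the kind of auto-detection inconsistency a user would then fix by editing the schema.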

View data

You can view a dataset's data either by previewing it with quick filters or by running SQL queries in SQL Studio.

FAQ

Which data sources are supported?

Kafka (the default), Elasticsearch and PostgreSQL.

How often is the data catalog updated?

The catalog updates its cache every 30 seconds. You can override the default interval in the configuration.

I’ve added a connection, but I can’t see it in the catalog

If you are not using the default admin user, you need to grant your user groups access to the connection to make it visible.

Can Lenses pick up schemas from a Kafka Schema Registry?

Yes. Lenses supports Schema Registry for Avro topics and integrates the latest version of each subject into the data catalog. See data types and schemas.
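For reference, this is how the latest version of a subject can be read from a Confluent-compatible Schema Registry REST API (`GET /subjects/{subject}/versions/latest`). It is not a description of how Lenses integrates internally; the helper names are hypothetical, and the subject name assumes the default TopicNameStrategy (`<topic>-value` / `<topic>-key`).

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def latest_schema_url(registry_url, topic, part="value"):
    """URL of the latest schema version for a topic, assuming the default
    TopicNameStrategy subject naming (<topic>-key / <topic>-value)."""
    subject = quote(f"{topic}-{part}", safe="")
    return f"{registry_url.rstrip('/')}/subjects/{subject}/versions/latest"


def parse_latest_schema(body):
    """Extract version, id and the parsed Avro schema from the response.
    The registry returns the schema itself as a JSON-encoded string."""
    payload = json.loads(body)
    return payload["version"], payload["id"], json.loads(payload["schema"])


def fetch_latest_schema(registry_url, topic, part="value"):
    # Network call: requires a reachable Schema Registry.
    with urlopen(latest_schema_url(registry_url, topic, part)) as resp:
        return parse_latest_schema(resp.read())
```

For example, `fetch_latest_schema("http://localhost:8081", "transactions")` would query the `transactions-value` subject on a registry assumed to run at that address.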

Can I save the filters?

Not in this version.

Are the metrics values live?

We collect different metrics for different data sources; for Kafka, for example, we show messages per second. Currently, the values in the catalog are not live: they are updated at the configured intervals for refreshing the cache or polling the underlying system. They still give a high-level view of how a topic behaves; open the dataset details for the latest values.
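Because catalog values come from periodic polling, a rate such as messages per second is effectively an average between two polls. The sketch below assumes the rate is derived from each partition's end offset at two polling times; it is an illustration, not necessarily how Lenses computes the metric, and `messages_per_second` is a hypothetical helper.

```python
def messages_per_second(prev, curr):
    """Estimate a topic's produce rate from two polled snapshots.
    Each snapshot is (timestamp_seconds, {partition: end_offset})."""
    (t0, offsets0), (t1, offsets1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    # A partition missing from the earlier snapshot is assumed to start at offset 0.
    produced = sum(offsets1[p] - offsets0.get(p, 0) for p in offsets1)
    return produced / elapsed
```

Offsets also advance for transaction markers, so a rate computed this way is an approximation.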

Compacted topics don’t show the number of records

Currently, the number of records in a compacted topic is not calculated for the catalog; it appears as N/A.
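A likely reason, stated here as an assumption: for a regular topic, the record count can be estimated from offsets as the sum over partitions of the latest offset minus the earliest offset. Log compaction removes older records with the same key while the remaining records keep their original offsets, so a compacted topic has gaps in its offsets and this difference no longer equals the number of stored records. A minimal sketch with a hypothetical `record_count` helper:

```python
def record_count(partitions, compacted):
    """Estimate the record count from {partition: (earliest, latest)} offsets.
    For compacted topics the offsets have gaps, so the estimate would
    overstate the count; return None instead (displayed as N/A)."""
    if compacted:
        return None
    return sum(latest - earliest for earliest, latest in partitions.values())
```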

Is the keyword matcher case-sensitive?

No. Keyword matching is case-insensitive: if you search for info, it matches both info and INFO in all the corresponding properties.