When working with Apache Kafka, you might want to write data from a Kafka topic to a local text file. This is easy to do with Kafka Connect, a framework that provides scalable and reliable streaming of data to and from Apache Kafka. With Kafka Connect, writing a topic's content to a local text file requires only a few simple steps.


Starting Kafka and Zookeeper

The first step is to start the Kafka and Zookeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.
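If you have a local Kafka distribution, both servers can typically be started with the scripts bundled in $KAFKA_HOME (this sketch assumes the default configuration files shipped with Kafka):

```shell
# Start Zookeeper first, using the default config shipped with Kafka
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &

# Then start the Kafka broker
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
```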


Creating a Topic to Write to

Creating a topic from the command line is straightforward. In this example, we create the my-connect-test topic.

$KAFKA_HOME/bin/kafka-topics.sh \
  --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic my-connect-test
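To confirm the topic was created, you can describe it against the same Zookeeper address used above:

```shell
$KAFKA_HOME/bin/kafka-topics.sh \
  --describe \
  --zookeeper localhost:2181 \
  --topic my-connect-test
```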


Creating a Sink Config File

Since we are reading from a Kafka topic and writing to a local text file, this file is considered our “sink”. Therefore we will use the FileStreamSink connector. We must create a configuration file to use with this connector. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-sink.properties. Below is an example of our my-file-sink.properties file.

#my-file-sink.properties config file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/my-file-sink.txt
topics=my-connect-test

This file indicates that we will use the FileStreamSink connector class, read data from the my-connect-test Kafka topic, and write records to /tmp/my-file-sink.txt. We also use only one task to read this data from Kafka.


Creating a Worker Config File

Processes that execute Kafka Connect connectors and tasks are called workers. In this example we can use the simpler of the two worker types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.

# my-standalone.properties worker config file

#bootstrap kafka servers
bootstrap.servers=localhost:9092

# specify input data format
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# Internal converters used for offsets; most users will want the built-in defaults
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# local file storing offsets and config data
offset.storage.file.filename=/tmp/connect.offsets

The main change in this example, compared to the default file, is the key.converter and value.converter settings. Since our data is simple text, we use the StringConverter types.


Writing Data to a Kafka Topic

We now need to write some sample data to our Kafka topic. This can easily be done with the kafka-console-producer which takes data from STDIN and writes to Kafka.

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-connect-test
writing line 1
writing line 2
writing line 3
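Before wiring up Connect, you can sanity-check that the messages landed by reading them back with the console consumer (assuming the broker is at localhost:9092, as above; press Ctrl-C to stop):

```shell
$KAFKA_HOME/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic my-connect-test \
  --from-beginning
```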


Running Kafka Connect

Now it is time to run Kafka Connect with our worker and sink configuration files. As mentioned before, we will run Kafka Connect in standalone mode. Here is an example of doing this with our custom configuration files:

$KAFKA_HOME/bin/connect-standalone.sh my-standalone.properties my-file-sink.properties

At this point, all the data available in the Kafka topic should be written to our local text file. We can confirm this by reading the file contents.

#print contents of local sink file
cat /tmp/my-file-sink.txt
writing line 1
writing line 2
writing line 3
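Because the connector keeps running and appending records as they arrive, you can also leave a tail running on the sink file while producing more messages in another terminal:

```shell
# Follow the sink file; new records appear as the connector writes them
tail -f /tmp/my-file-sink.txt
```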