When working with Apache Kafka, you might want to write data from a Kafka topic to a local text file. Kafka Connect, a framework for scalable and reliable streaming of data to and from Apache Kafka, makes this straightforward: writing a topic’s content to a local text file requires only a few steps.
Starting Kafka and Zookeeper
The first step is to start the Kafka and Zookeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.
Creating a Topic to Write to
Creating a topic from the command line takes a single command. In this example we create the my-connect-test topic:
$KAFKA_HOME/bin/kafka-topics.sh \
  --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic my-connect-test
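To confirm that the topic exists before moving on, the same script can describe it. The sketch below assumes the ZooKeeper address used in the create command, and it skips the check when $KAFKA_HOME does not point at a Kafka installation.

```shell
# Topic name and ZooKeeper address taken from the create command above.
TOPIC="my-connect-test"
ZOOKEEPER="localhost:2181"

if [ -x "${KAFKA_HOME:-}/bin/kafka-topics.sh" ]; then
  # --describe prints the partition count, replication factor, and leader.
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe --zookeeper "$ZOOKEEPER" --topic "$TOPIC"
else
  echo "kafka-topics.sh not found; set KAFKA_HOME first" >&2
fi
```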
Creating a Sink Config File
Since we are reading from a Kafka topic and writing to a local text file, this file is considered our “sink”. Therefore we will use the FileStreamSink connector. We must create a configuration file to use with this connector. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-sink.properties. Below is an example of our my-file-sink.properties file.
# my-file-sink.properties config file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/my-file-sink.txt
topics=my-connect-test
This file indicates that we will use the FileStreamSink connector class, read data from the my-connect-test Kafka topic, and write records to /tmp/my-file-sink.txt. We are also only using 1 task to read this data from Kafka.
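One way to create this file from the shell is a here-document, followed by a quick check that every key used in this tutorial is present. This is a minimal sketch: it writes my-file-sink.properties into the current directory, and the key check is our own convenience, not something Kafka Connect requires.

```shell
# Write the sink connector config into the current directory.
cat > my-file-sink.properties <<'EOF'
# my-file-sink.properties config file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/my-file-sink.txt
topics=my-connect-test
EOF

# Check that each key used in this tutorial appears at the start of a line.
missing=0
for key in name connector.class tasks.max file topics; do
  if ! grep -q "^${key}=" my-file-sink.properties; then
    echo "missing key: $key" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "sink config looks complete"
```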
Creating a Worker Config File
Processes that execute Kafka Connect connectors and tasks are called workers. In this example we can use the simpler of the two worker types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.
# my-standalone.properties worker config file

# bootstrap kafka servers
bootstrap.servers=localhost:9092

# specify input data format
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# The internal converter used for offsets, most will always want to use the built-in default
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# local file storing offsets and config data
offset.storage.file.filename=/tmp/connect.offsets
The main change in this example in comparison to the default is the key.converter and value.converter settings. Since our data is simple text, we use the StringConverter type instead of the JsonConverter used in the sample file.
Writing Data to a Kafka Topic
We now need to write some sample data to our Kafka topic. This can easily be done with the kafka-console-producer, which takes data from STDIN and writes to Kafka.
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-connect-test
writing line 1
writing line 2
writing line 3
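Because the console producer reads from STDIN, the same records can also be sent without typing them interactively. The sketch below assumes a hypothetical input file named sample-lines.txt and the broker address used above. It skips the producer step when $KAFKA_HOME does not point at a Kafka installation.

```shell
# Hypothetical input file holding the three sample records, one per line.
printf '%s\n' "writing line 1" "writing line 2" "writing line 3" > sample-lines.txt

if [ -x "${KAFKA_HOME:-}/bin/kafka-console-producer.sh" ]; then
  # Each line of STDIN becomes one record on the topic.
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list localhost:9092 \
    --topic my-connect-test < sample-lines.txt
else
  echo "kafka-console-producer.sh not found; set KAFKA_HOME first" >&2
fi
```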
Running Kafka Connect
Now it is time to run Kafka Connect with our worker and sink configuration files. As mentioned before, we will be running Kafka Connect in standalone mode. Here is an example of doing this with our custom configuration files:
$KAFKA_HOME/bin/connect-standalone.sh my-standalone.properties my-file-sink.properties
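The standalone worker runs in the foreground, so a second terminal is useful for checking on it. Kafka Connect workers expose a REST API, and the sketch below queries the status of our connector. It assumes the worker listens on the default port 8083 on localhost, which our worker config file does not override.

```shell
# Assumed defaults: the worker's REST API on localhost:8083 and the
# connector name from my-file-sink.properties.
CONNECT_URL="http://localhost:8083"
CONNECTOR_NAME="local-file-sink"
STATUS_URL="$CONNECT_URL/connectors/$CONNECTOR_NAME/status"

# Print the connector and task state as JSON, or a hint if the worker is down.
curl --silent --max-time 5 "$STATUS_URL" \
  || echo "Connect worker not reachable at $CONNECT_URL" >&2
```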
At this point all the data available in the Kafka topic should be written to our local text file. We can confirm this by reading the file contents.
# print contents of local sink file
cat /tmp/my-file-sink.txt
writing line 1
writing line 2
writing line 3
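Instead of comparing the output by eye, the sink file can be checked against the lines that were produced. The helper below is a hypothetical convenience function, not part of Kafka. It assumes the topic held only the records you expect, since the connector writes every record it reads from the topic.

```shell
# Compare a file of expected lines with the sink file.
# Arguments: $1 = file with the expected lines, $2 = sink file to check.
check_sink_file() {
  if diff "$1" "$2" > /dev/null; then
    echo "sink file matches"
  else
    echo "sink file differs" >&2
    return 1
  fi
}
```

For example, after saving the three sample lines to a file named expected.txt, running check_sink_file expected.txt /tmp/my-file-sink.txt reports whether the two files are identical.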