I am new to Kafka, so this may be very simple for you experts. I hope this information is helpful to someone like me. The title of this post sounds very high level, so let me first describe the problem, which took me about 3 weeks to figure out.

Solo Kafka Server Example

Most Kafka tutorials, such as “learn Kafka in 5 mins”, set up a single Kafka server on a local network, often just in a virtual machine or on a local computer. These tutorials are useful in 99% of cases. But setting up a Kafka service that is reachable from the internet is a little different from the intranet case. First of all, let’s look at a very simple single-node Kafka and ZooKeeper configuration.

#kafka
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
zookeeper.connect=127.0.0.1:2181

The example above is the simplest Kafka configuration for a local computer. Below is the corresponding ZooKeeper configuration.

clientPort=2181

All the important settings are shown above. Everything else is left at its default. This is enough to set up a single-node Kafka service on a local computer.

Setup Internet Based Kafka Service with One Node

Now, let’s make it a little more complicated. I will set up the Kafka service on a computer that is reachable from the internet. In this case, the Kafka configuration differs from the simple case above.

#kafka
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://101.101.101.101:9092
zookeeper.connect=127.0.0.1:2181

We keep the whole configuration and add one new statement, advertised.listeners=PLAINTEXT://101.101.101.101:9092. The IP 101.101.101.101 is the public IP through which message producers can reach the broker from the internet. The ZooKeeper configuration is exactly the same as in the example above.

clientPort=2181

All other ZooKeeper settings use their default values.
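
To make the difference between the two settings concrete, here is a minimal Python sketch. Python is only my choice for illustration, and the helper names parse_properties and parse_listener are made up for this post; they are not part of Kafka. The sketch reads the configuration above and separates the address the broker binds to from the address it hands out to clients. Kafka returns the advertised address in its metadata response, so clients use it for every connection after the first one.

```python
# Illustrative helpers only; these are not part of any Kafka library.

SERVER_PROPERTIES = """\
#kafka
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://101.101.101.101:9092
zookeeper.connect=127.0.0.1:2181
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_listener(value):
    """Split 'PLAINTEXT://host:port' into (protocol, host, port)."""
    protocol, _, address = value.partition("://")
    host, _, port = address.rpartition(":")
    return protocol, host, int(port)


props = parse_properties(SERVER_PROPERTIES)
bind = parse_listener(props["listeners"])
advertised = parse_listener(props["advertised.listeners"])

print("broker listens on:", bind[1], bind[2])
print("clients are told to connect to:", advertised[1], advertised[2])
```

The broker accepts connections on every local interface (0.0.0.0), but what a producer on the internet needs is the second address, which is why it has to be the public IP.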

Setup Intranet Based Kafka Cluster with 3 Nodes

Before moving on to an internet-based Kafka cluster with multiple nodes, let’s talk about a simpler case: setting up a Kafka cluster in a local network. See the topology chart below:
[Figure: local Kafka cluster with 3 nodes]

To make this easier to explain, set the hostnames and local IPs in /etc/hosts as follows:

192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3

The configuration is different for each Kafka node: broker.id and advertised.listeners change, everything else stays the same.

#kafka 1
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k1:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3

#kafka 2
broker.id=1
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k2:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3

#kafka 3
broker.id=2
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k3:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
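
Since the three files differ only in broker.id and advertised.listeners, they can be generated from one template. Below is a small sketch in Python, assuming exactly the values shown above; render_server_properties is a hypothetical helper name of mine, not a Kafka tool.

```python
# Sketch: generate the three server.properties bodies from one template.
# render_server_properties is a made-up helper, not part of Kafka.

TEMPLATE = """\
broker.id={broker_id}
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://{hostname}:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
"""


def render_server_properties(broker_id, hostname):
    """Fill in the two values that change from node to node."""
    return TEMPLATE.format(broker_id=broker_id, hostname=hostname)


NODES = ["k1", "k2", "k3"]

# broker.id is simply the position of the node in the list: 0, 1, 2.
configs = {
    name: render_server_properties(broker_id, name)
    for broker_id, name in enumerate(NODES)
}

for name, body in configs.items():
    print(f"# --- server.properties for {name} ---")
    print(body)
```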

Please pay attention to advertised.listeners and zookeeper.connect. Both differ from the solo example. The difficult part of this example is the ZooKeeper configuration. Please look at the configuration first:

dataDir=/root/kafka/zookeeper/data
dataLogDir=/root/kafka/zookeeper/log
clientPort=2181

maxClientCnxns=100
tickTime=2000
initLimit=10
syncLimit=2

server.0=k1:2888:3888
server.1=k2:2888:3888
server.2=k3:2888:3888

The ZooKeeper configuration is the same on all 3 nodes. But please note that on each node we need to put a myid file in dataDir. The content of myid is that node’s server number, “0”, “1”, or “2”, matching its server.N line. With the example above, the file (named myid) goes at this path:

/root/kafka/zookeeper/data/myid
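
Creating the file is a one-liner, but here is a short Python sketch of the step so it is clear what ends up on disk. The function name write_myid is my own, and the demo writes into a temporary directory that stands in for /root/kafka/zookeeper/data, so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir, server_id):
    """Write <data_dir>/myid containing only the server number."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    myid_path = data_dir / "myid"
    myid_path.write_text(f"{server_id}\n")
    return myid_path


# Demo: a temporary directory stands in for /root/kafka/zookeeper/data.
with tempfile.TemporaryDirectory() as tmp:
    path = write_myid(Path(tmp) / "zookeeper" / "data", 0)
    content = path.read_text()
    print(path.name, "contains:", content.strip())
```

On the real nodes, k1 would call write_myid("/root/kafka/zookeeper/data", 0), k2 would pass 1, and k3 would pass 2, so that each number matches the node’s server.N line.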

Setup Internet Based Kafka Cluster

Now, let’s talk about the most complicated case: setting up an internet-based Kafka cluster with multiple nodes. In my example, I will use 3 Kafka nodes. If your servers have only one IP address and it is a public IP, you can use the example above and just change the IPs in /etc/hosts to your public IPs. I am using cloud servers, each of which is assigned a private IP and a public IP. When you search Google for something like “Kafka over internet”, the usual answer is to set advertised.listeners to your public IP. In most cases, that works.
[Figure: internet Kafka cluster with 3 nodes]

My problem is that the cloud servers cannot communicate with each other through their public IPs, only through their private IPs. So if I set advertised.listeners to the public IPs, the cluster does not work, because the brokers cannot talk to each other. But I cannot set advertised.listeners to the private IPs either, because the producers and consumers are on the internet and can never reach the brokers through their private IPs.

The solution is to use hostnames in advertised.listeners, for example k1, k2, k3. On the Kafka servers and on all other servers in the same local network, the hosts file looks like this:

192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3

The producers and consumers on the public network set their hosts file to:

101.101.101.10 k1
101.101.101.11 k2
101.101.101.12 k3
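
The trick is that the same advertised name resolves to a different IP depending on where you are. The following Python sketch shows this with the two hosts files above. It only simulates name resolution with dictionaries, and parse_hosts is a helper I made up for the illustration.

```python
def parse_hosts(text):
    """Parse hosts-file lines ('IP name [alias ...]') into {name: ip}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table


# Hosts file on the Kafka servers and their local network.
PRIVATE_HOSTS = parse_hosts("""
192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3
""")

# Hosts file on producers and consumers out on the internet.
PUBLIC_HOSTS = parse_hosts("""
101.101.101.10 k1
101.101.101.11 k2
101.101.101.12 k3
""")

# What the brokers advertise: hostnames, not IPs.
ADVERTISED = ["k1:9092", "k2:9092", "k3:9092"]

for entry in ADVERTISED:
    host, port = entry.split(":")
    print(f"{entry}: brokers use {PRIVATE_HOSTS[host]}:{port}, "
          f"internet clients use {PUBLIC_HOSTS[host]}:{port}")
```

Each side gets an address it can actually reach, while the broker configuration stays identical to the intranet cluster.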

Every server that wants to connect to the Kafka cluster can then set the Kafka hosts (the bootstrap servers) as below:

k1:9092,k2:9092,k3:9092
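
Before starting a producer or consumer, it is worth checking that every name in this list resolves on the client machine. Here is a sketch of such a check in Python. The function names are mine, and the demo uses a fake resolver built from the client hosts file above, so that it runs without touching the network.

```python
import socket


def parse_bootstrap_servers(value):
    """Turn 'k1:9092,k2:9092' into [('k1', 9092), ('k2', 9092)]."""
    servers = []
    for entry in value.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def resolve_all(servers, resolver=socket.gethostbyname):
    """Map each hostname to its IP, or to None if it does not resolve."""
    resolved = {}
    for host, _port in servers:
        try:
            resolved[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[host] = None
    return resolved


# Stand-in for the client's hosts file, used only by this demo.
CLIENT_HOSTS = {
    "k1": "101.101.101.10",
    "k2": "101.101.101.11",
    "k3": "101.101.101.12",
}


def fake_resolver(host):
    """Resolve from CLIENT_HOSTS and fail the way a real lookup would."""
    if host not in CLIENT_HOSTS:
        raise socket.gaierror(f"unknown host {host}")
    return CLIENT_HOSTS[host]


servers = parse_bootstrap_servers("k1:9092,k2:9092,k3:9092")
print(resolve_all(servers, resolver=fake_resolver))
```

On a real client, call resolve_all(servers) without the resolver argument to use the system resolver. A None in the result means the hosts file entry for that broker is missing.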

Now the whole network topology looks like this:

[Figure: Kafka topology]

This looks very simple, but it really cost me a lot of time. I hope it is helpful. Finally, let’s start the ZooKeeper service and then the Kafka service with the following commands:

nohup ./bin/zookeeper-server-start.sh ./config/zookeeper.properties >>zoo.log 2>&1 &
nohup ./bin/kafka-server-start.sh ./config/server.properties >>kafka.log 2>&1 &
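
Once both services are running, a quick way to confirm that they are reachable is to try a TCP connection to ports 2181 and 9092. This is a minimal sketch in Python using only the standard library. The function names and the 3 second timeout are my own choices, and an open port only shows that something is listening, not that the cluster is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or hostname did not resolve
        return False


def check_services(hosts, ports=(2181, 9092), timeout=3.0):
    """Return {(host, port): reachable} for every host/port pair."""
    return {
        (host, port): is_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, check_services(["k1", "k2", "k3"]) run from a machine on the private network should report True for all six pairs. From the internet you would normally check only port 9092, for example check_services(["k1", "k2", "k3"], ports=(9092,)).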