
I am new to Kafka, so this may look very simple to experts, but I hope it helps someone like me. The title of this post sounds very high level, so let me first describe the problem, which took me about 3 weeks to figure out.
Solo Kafka Server Example
Most Kafka tutorials, such as “learn Kafka in 5 mins”, set up a single Kafka server on a local network, often just in a virtual machine or on a local computer. These tutorials are useful for about 99% of cases. But setting up a Kafka service that is reachable from the internet is a little different from the intranet case. First, let’s look at a very simple single-node Kafka and ZooKeeper configuration.
#kafka
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
zookeeper.connect=127.0.0.1:2181
The above example is the simplest Kafka configuration on a local computer. The one below is the corresponding ZooKeeper configuration.
clientPort=2181
All the important settings are shown above; everything else is left at its default. This is enough to set up a one-node Kafka service on a local computer.
Setup Internet Based Kafka Service with One Node
Now, let’s make it a little more complicated. I will set up the Kafka service on a computer that is reachable from the internet. In this case, the Kafka configuration differs from the simple case above.
#kafka
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://101.101.101.101:9092
zookeeper.connect=127.0.0.1:2181
We keep all of the configuration and add one new statement, advertised.listeners=PLAINTEXT://101.101.101.101:9092. The IP 101.101.101.101 is the public IP through which producers and consumers on the internet reach the broker. Kafka hands this advertised address back to clients in its metadata, so it must be an address the clients can actually connect to. The ZooKeeper configuration is exactly the same as in the example above.
clientPort=2181
All other ZooKeeper settings use their default values.
Setup Intranet Based Kafka Cluster with 3 Nodes
Before moving on to an internet-based Kafka cluster with multiple nodes, let’s look at a simpler case: setting up a Kafka cluster on a local network. See the topology chart below:
To make it easy to explain, set the host names and local IPs in /etc/hosts as follows:
192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3
Each Kafka node has its own configuration; only broker.id and advertised.listeners differ between nodes.
#kafka 1
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k1:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3

#kafka 2
broker.id=1
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k2:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3

#kafka 3
broker.id=2
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://k3:9092
zookeeper.connect=k1:2181,k2:2181,k3:2181
log.dirs=/root/kafka/kafka-logs
num.partitions=12
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
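Since the three broker files differ only in broker.id and the hostname in advertised.listeners, they can be generated from one template instead of being edited by hand. The following is a minimal sketch, not part of my original setup: the function name render_server_properties and the NODES list are hypothetical, and the values are simply copied from the configuration above.

```python
# Hypothetical helper: renders the per-node server.properties from this post.
# NODES and the function name are illustrative assumptions, not Kafka APIs.
NODES = ["k1", "k2", "k3"]  # hostnames defined in /etc/hosts


def render_server_properties(broker_id, nodes=NODES):
    """Return the server.properties text for the broker with this id."""
    host = nodes[broker_id]  # broker 0 runs on k1, broker 1 on k2, ...
    zookeeper = ",".join(f"{name}:2181" for name in nodes)
    lines = [
        f"broker.id={broker_id}",
        "listeners=PLAINTEXT://0.0.0.0:9092",
        f"advertised.listeners=PLAINTEXT://{host}:9092",
        f"zookeeper.connect={zookeeper}",
        "log.dirs=/root/kafka/kafka-logs",
        "num.partitions=12",
        "default.replication.factor=3",
        "offsets.topic.replication.factor=3",
        "transaction.state.log.replication.factor=3",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print all three files so they can be reviewed before copying to each node.
    for broker_id in range(len(NODES)):
        print(f"# kafka {broker_id + 1}")
        print(render_server_properties(broker_id))
```

Generating the files this way keeps the shared settings identical across nodes, which is where hand-copied configurations tend to drift.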
Pay attention to advertised.listeners and zookeeper.connect. Both differ from the solo example. The difficult part of this example is the ZooKeeper configuration. Here is the configuration first:
dataDir=/root/kafka/zookeeper/data
dataLogDir=/root/kafka/zookeeper/log
clientPort=2181
maxClientCnxns=100
tickTime=2000
initLimit=10
syncLimit=2
server.0=k1:2888:3888
server.1=k2:2888:3888
server.2=k3:2888:3888
The ZooKeeper configuration is the same on all 3 nodes. But note that on each node we need to put a myid file in dataDir. The content of myid is that node's server number from the server.N lines: “0”, “1”, or “2”. With the example above, the file (named myid) goes at the path:
/root/kafka/zookeeper/data/myid
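Creating the myid file is easy to forget or to get wrong on one node, so here is a small sketch of how it could be scripted. The helper name write_myid is hypothetical; the only facts it relies on are the ones above, namely that the file is called myid, lives in dataDir, and contains just the server number.

```python
from pathlib import Path


def write_myid(data_dir, server_id):
    """Write the ZooKeeper myid file into data_dir and return its path.

    write_myid is an illustrative helper, not part of ZooKeeper itself.
    server_id must match the N of this node's server.N line.
    """
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)  # create dataDir if missing
    path.write_text(f"{server_id}\n")  # the file holds only the number
    return path
```

On k1 this would be called as write_myid("/root/kafka/zookeeper/data", 0), on k2 with 1, and on k3 with 2.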
Setup Internet Based Kafka Cluster
Now, let’s talk about the most complicated case: setting up an internet-based Kafka cluster with multiple nodes. In my example, I use 3 Kafka nodes. If your servers have only one IP address and it is a public IP, you can use the example above and change the IPs in /etc/hosts to your public IPs. I am using cloud servers, each of which is assigned a private IP and a public IP. When you search Google for something like “Kafka over internet”, the answer is to set advertised.listeners to your public IP. In most cases, that works.
My problem is that the cloud servers cannot communicate with each other through their public IPs, only through their private IPs. So if I set advertised.listeners to the public IPs, none of the brokers work, because they cannot talk to each other. But I cannot set it to the private IPs either, because the producers and consumers are on the internet and can never reach the brokers through their private IPs.
The solution is to use hostnames in advertised.listeners, for example k1, k2, k3. On the Kafka servers and all other local servers in the same network, the hosts file looks like this:
192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3
The producers and consumers on the public network set their hosts file to:
101.101.101.10 k1
101.101.101.11 k2
101.101.101.12 k3
Every server that wants to connect to the Kafka cluster can then set the Kafka hosts as below:
k1:9092,k2:9092,k3:9092
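To make the trick concrete, the sketch below shows how the same broker list ends up at different IPs depending on which hosts file the machine uses. It only simulates the lookup with the two hosts tables from this post; the function names parse_hosts and resolve_bootstrap are hypothetical, and a real client would rely on the operating system's resolver rather than on code like this.

```python
# The two hosts files from this post, as text.
INTERNAL_HOSTS = """
192.168.0.100 k1
192.168.0.101 k2
192.168.0.102 k3
"""

PUBLIC_HOSTS = """
101.101.101.10 k1
101.101.101.11 k2
101.101.101.12 k3
"""


def parse_hosts(text):
    """Parse hosts-file text into a {hostname: ip} dictionary."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table


def resolve_bootstrap(bootstrap, hosts_text):
    """Map a 'host:port,host:port' list to (ip, port) pairs via a hosts table."""
    table = parse_hosts(hosts_text)
    resolved = []
    for entry in bootstrap.split(","):
        host, port = entry.strip().rsplit(":", 1)
        resolved.append((table[host], int(port)))
    return resolved


if __name__ == "__main__":
    brokers = "k1:9092,k2:9092,k3:9092"
    print("inside the cloud network:", resolve_bootstrap(brokers, INTERNAL_HOSTS))
    print("on the internet:         ", resolve_bootstrap(brokers, PUBLIC_HOSTS))
```

The brokers advertise only the names k1, k2, k3, so each side connects to whichever address its own hosts file gives for those names: private IPs between the brokers, public IPs from the internet.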

Now, the whole network topology looks like this:
This looks very simple, but it really cost me a lot of time. I hope this is helpful. Finally, start the ZooKeeper service and the Kafka service with the following commands:
nohup ./bin/zookeeper-server-start.sh ./config/zookeeper.properties >>zoo.log 2>&1 &
nohup ./bin/kafka-server-start.sh ./config/server.properties >>kafka.log 2>&1 &
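Once the services are up, a quick way to see whether a machine can reach the brokers under the names k1, k2, k3 is a plain TCP connection test. This is only a sketch under the assumptions of this post (port 9092, hostnames from the hosts file); is_reachable and check_brokers are hypothetical helper names, and an open port shows only that the network path works, not that Kafka itself is healthy.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_brokers(hosts=("k1", "k2", "k3"), port=9092):
    """Return {hostname: reachable} for each broker name."""
    return {host: is_reachable(host, port) for host in hosts}
```

Running check_brokers() on a Kafka node and again on a producer machine on the internet should give True for all three names in both places if the two hosts files are set up as described.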