Zookeeper is an Apache project designed to help developers build distributed applications. It provides a common set of services that most distributed applications require, such as naming, configuration management, synchronization and group services. Zookeeper implements them for you, so that you can focus on your application rather than on distributed infrastructure components.
Note: This is the first story in our encounter with Zookeeper. The following story can be found here.
What is this post about?
This post is a brief introduction to Zookeeper. We will show you how to
- install
- set up
- start
- interact
with a Zookeeper server.
Then we will also show you how to interact with the Zookeeper server using Ruby.
Let’s start.
Download and Install Zookeeper
Note: For those who work on OS X and like to use brew, installing Zookeeper might be as simple as brew install zookeeper.
You can download Zookeeper from any site that is listed on the download page. Pick a stable release by downloading the file inside the stable folder.
I have downloaded the file zookeeper-3.4.10.tar.gz.
Then I extracted it into the folder ~/Documents/zookeeper-3.4.10.
Setup
Before we can start the Zookeeper server, we have to provide a minimal configuration file. Let’s create the file conf/zoo.cfg inside the folder where you have your Zookeeper installation. This file can be created as a copy of the existing sample file.
In the Zookeeper folder:
$ cp conf/zoo_sample.cfg conf/zoo.cfg
The minimum configuration that will allow you to start 1 Zookeeper server is the following:
tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data
clientPort=2181
As you can see, I have specified the dataDir to be a directory in my Documents folder. This folder, /Users/pmatsino/Documents/zookeeper-data, needs to be present, so go ahead and create it before you continue. This directory is going to keep the memory snapshots and the transaction log for the updates of the database. Note that Zookeeper keeps its state in memory in order to be efficient, but it flushes it into snapshots inside the dataDir. Also, the updates are atomic, and the transaction log is kept inside the dataDir too.
The tickTime is given in milliseconds and is the basic time unit that Zookeeper uses for its timing settings. It is used to implement heartbeats, and it also determines the minimum session timeout, which is twice the tickTime.
The clientPort is the port on which the Zookeeper server listens for client connections.
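To make the relationship between tickTime and the session timeout concrete, here is a small Ruby sketch. It parses the three-line configuration shown above and derives the default session timeout bounds. The helper names (parse_zoo_cfg and session_timeout_bounds) are made up for this illustration. The lower bound of 2 x tickTime is the one described above; the upper bound of 20 x tickTime is, to the best of my knowledge, Zookeeper's default for the maximum session timeout.

```ruby
# Illustrative helpers only; these names are not part of Zookeeper or any gem.

# Parse "key=value" lines (as found in conf/zoo.cfg) into a Hash.
def parse_zoo_cfg(text)
  text.each_line.with_object({}) do |line, config|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    config[key.strip] = value.to_s.strip
  end
end

# Default session timeout bounds in milliseconds:
# minimum = 2 * tickTime, maximum = 20 * tickTime (assumed default).
def session_timeout_bounds(config)
  tick = Integer(config.fetch('tickTime'))
  { min: 2 * tick, max: 20 * tick }
end

cfg_text = <<~CFG_TEXT
  tickTime=2000
  dataDir=/Users/pmatsino/Documents/zookeeper-data
  clientPort=2181
CFG_TEXT

bounds = session_timeout_bounds(parse_zoo_cfg(cfg_text))
puts "session timeout must fall between #{bounds[:min]} and #{bounds[:max]} ms"
```

With tickTime=2000 the bounds come out as 4000 and 40000 ms. The zkCli.sh session shown later in this post reports negotiated timeout = 30000, which sits inside that range.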
Start server
With the above settings in place, it is very easy to start a Zookeeper server:
In the Zookeeper folder:
$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$
Zookeeper data model
Zookeeper’s data model is based on nodes, which are called znodes. Each node has a unique name which is composed of the path parts to reach that node, just like folders and files in a file system. The root node is /. Then you can create as many child nodes as you like, and then children of children, and so on. Here is another node: /foo/bar. The node bar is a child of node /foo, which in turn is a child of the root node (/).
Besides that, each node may have data attached to it. In fact, a node either has children nodes or data or both. But it cannot be without either of them. However, the data might be an empty string.
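The path rules above can be sketched in a few lines of Ruby. This is not how Zookeeper stores znodes. It is a toy in-memory tree (the ToyZnodeTree class and its methods are invented for this example) that only shows how a full path identifies a node, and it assumes, as the real server does, that a child can be created only when its parent already exists.

```ruby
# Toy model of the znode namespace. Illustrative only: the class and method
# names are invented for this post and are not part of Zookeeper.
class ToyZnodeTree
  def initialize
    @data = { '/' => '' } # the root node always exists
  end

  # Parent path of a znode: "/foo/bar" -> "/foo", "/foo" -> "/".
  def self.parent_of(path)
    parent = path[0...path.rindex('/')]
    parent.empty? ? '/' : parent
  end

  # Create a node with some data; the parent must already exist.
  def create(path, data = '')
    raise ArgumentError, "node exists: #{path}" if @data.key?(path)

    parent = self.class.parent_of(path)
    raise ArgumentError, "no parent node: #{parent}" unless @data.key?(parent)

    @data[path] = data
    path
  end

  # Data attached to a node.
  def get(path)
    @data.fetch(path)
  end

  # Names of the direct children of a node, similar to `ls` in the client.
  def children(path)
    @data.keys
         .reject { |p| p == '/' }
         .select { |p| self.class.parent_of(p) == path }
         .map { |p| p.split('/').last }
  end
end

tree = ToyZnodeTree.new
tree.create('/foo', 'some data')
tree.create('/foo/bar') # the data may be an empty string
p tree.children('/')    # children of the root node
p tree.children('/foo') # children of /foo
```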
Connect to server
Now we can use the client tool that is provided with the Zookeeper installation in order to connect to the Zookeeper server. Here is how:
In the Zookeeper folder:
$ bin/zkCli.sh -server localhost:2181
...
2017-08-29 13:58:40,109 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15e2de052c30000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
After lots of lines are printed on your terminal, you reach the [zk: localhost:2181(CONNECTED) 0] prompt, which you can use to send commands from the client to the server.
The help command will list the commands that you can use:
[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
stat path [watch]
set path data [version]
ls path [watch]
delquota [-n|-b] path
ls2 path [watch]
setAcl path acl
setquota -n|-b val path
history
redo cmdno
printwatches on|off
delete path [version]
sync path
listquota path
rmr path
get path [watch]
create [-s] [-e] path data acl
addauth scheme auth
quit
getAcl path
close
connect host:port
[zk: localhost:2181(CONNECTED) 1]
List nodes
Now that we are using the command line interface, let’s list the current nodes:
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2]
As you can see, the root node / currently has a single child, the zookeeper node.
Create a Zookeeper node
Let’s create a new node, and then list the nodes again. Note that when we create a new node we give the data to attach to the node. In the following example, "bar" is a piece of string data to attach to node /foo.
[zk: localhost:2181(CONNECTED) 4] create /foo bar
Created /foo
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper, foo]
Get details of node
And we can get the details of a node:
[zk: localhost:2181(CONNECTED) 6] get /foo
bar
... ( more output here ) ...
dataLength = 3
numChildren = 0
[zk: localhost:2181(CONNECTED) 7]
Do you see the first line, bar? It is the data associated with the node /foo.
Update details of node
We can update the details of a node with the set command:
[zk: localhost:2181(CONNECTED) 7] set /foo mary
... ( more output here ) ...
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 8] get /foo
mary
... ( more output here ) ...
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 9]
With set /foo mary we update the data for the node /foo to be the string mary. We then confirm with get /foo.
Delete a node
Let’s now delete the /foo node:
[zk: localhost:2181(CONNECTED) 9] delete /foo
[zk: localhost:2181(CONNECTED) 10] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 11]
The delete /foo command deletes the /foo node. We then confirm with ls /.
Replicated Zookeeper
Things get more interesting, of course, when you have multiple Zookeeper servers replicating your data. Working with 1 server is fine during development. Your production system, on the other hand, will need several Zookeeper servers working in a replicated configuration.
Let’s see how we can start 3 Zookeeper servers locally.
Stop server
First, let’s stop the server that is running at the moment:
In the Zookeeper folder:
$ bin/zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
$
Create different data directories
Create 3 data directories, one for each of the Zookeeper servers.
In the ~/Documents folder:
$ mkdir zookeeper-data-1
$ mkdir zookeeper-data-2
$ mkdir zookeeper-data-3
Create the myid files
Inside each data directory, you need to create a myid file with the id of the corresponding server. Let’s keep it very simple for our demo:
In the ~/Documents folder:
$ echo '1' > zookeeper-data-1/myid
$ echo '2' > zookeeper-data-2/myid
$ echo '3' > zookeeper-data-3/myid
The ids of our servers will be 1, 2 and 3 respectively.
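If you prefer to script this step, the following Ruby sketch creates the data directories and their myid files in one go. The create_data_dirs helper and its parameters are assumptions made for this example. The demo at the bottom writes into a temporary folder so that it is safe to run as is.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: create zookeeper-data-<id>/myid under base_dir
# for every server id, and return the paths of the myid files.
def create_data_dirs(base_dir, ids = [1, 2, 3])
  ids.map do |id|
    data_dir = File.join(base_dir, "zookeeper-data-#{id}")
    FileUtils.mkdir_p(data_dir)
    myid = File.join(data_dir, 'myid')
    File.write(myid, "#{id}\n") # the file holds only the server id
    myid
  end
end

# Demo in a throwaway directory, removed when the block ends.
Dir.mktmpdir do |dir|
  create_data_dirs(dir).each do |path|
    puts "#{path}: #{File.read(path).strip}"
  end
end
```

To create the real directories of this post, call create_data_dirs(File.expand_path('~/Documents')) instead of using the temporary folder.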
Create server configurations
Now, let’s go to our conf folder and create three different configurations, one for each of the servers. We will use these files to start each server accordingly:
conf/1.cfg
The configuration file for the first server, with id 1 (create it inside ~/Documents/zookeeper-3.4.10/conf):
tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-1
clientPort=2181
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
Pay attention to the following, since we are starting all servers on the same machine:
- The dataDir is different per server. Here we specify the dataDir for the server with id 1.
- The clientPort is different per server.
- The quorum and leader election ports are different for each server. Also, the configuration of a server needs to know the quorum and leader election ports of the other servers too. Here we specify 2888 and 3888 for the first server with id 1, then 2889 and 3889 for the server with id 2, and finally 2890 and 3890 for the server with id 3.
Having said the above, let’s create the configuration files for the other 2 servers:
conf/2.cfg
The configuration file for the server with id 2 (create it inside ~/Documents/zookeeper-3.4.10/conf):
tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-2
clientPort=2182
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
conf/3.cfg
The configuration file for the server with id 3 (create it inside ~/Documents/zookeeper-3.4.10/conf):
tickTime=2000
dataDir=/Users/pmatsino/Documents/zookeeper-data-3
clientPort=2183
initLimit=10
syncLimit=5
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
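The three files above differ only in their dataDir and clientPort, so they can also be generated. The Ruby sketch below renders the configuration text for a given server id, following the port scheme used in this post (client ports 2181 to 2183, quorum ports 2888 to 2890, leader election ports 3888 to 3890). The server_config helper and its keyword arguments are hypothetical and exist only for this illustration.

```ruby
# Hypothetical generator for the text of conf/1.cfg, conf/2.cfg and conf/3.cfg.
def server_config(id, ids: [1, 2, 3], data_root: '/Users/pmatsino/Documents')
  lines = [
    'tickTime=2000',
    "dataDir=#{data_root}/zookeeper-data-#{id}", # unique per server
    "clientPort=#{2180 + id}",                   # unique per server
    'initLimit=10',
    'syncLimit=5'
  ]
  # Every server lists all members of the ensemble:
  # server.<id>=<host>:<quorum port>:<leader election port>
  ids.each do |peer|
    lines << "server.#{peer}=localhost:#{2887 + peer}:#{3887 + peer}"
  end
  lines.join("\n") + "\n"
end

puts server_config(2)
```

Writing the result to disk is then a matter of File.write("conf/#{id}.cfg", server_config(id)) for each id, run from the Zookeeper folder.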
Create start all script
Everything is ready. However, let’s make our life a little bit easier by creating the start_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:
#!/usr/bin/env bash
bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/1.cfg
bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/2.cfg
bin/zkServer.sh start ~/Documents/zookeeper-3.4.10/conf/3.cfg
As you can see, we start each Zookeeper server by giving, as an argument, the configuration file it needs to use.
Make sure that the script is executable:
In the Zookeeper folder:
$ chmod +x bin/start_all.sh
Create stop all script
Similarly, let’s create the stop_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:
#!/usr/bin/env bash
bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/1.cfg
bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/2.cfg
bin/zkServer.sh stop ~/Documents/zookeeper-3.4.10/conf/3.cfg
Don’t forget to make it executable:
In the Zookeeper folder:
$ chmod +x bin/stop_all.sh
Kick off servers
Let’s now kick off our replicated Zookeeper servers:
In the Zookeeper folder:
$ bin/start_all.sh
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/1.cfg
Starting zookeeper ... STARTED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/2.cfg
Starting zookeeper ... STARTED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/3.cfg
Starting zookeeper ... STARTED
$
Connect to server 1 – Create data
Now, let’s connect to server 1 and create some data and then quit:
In the Zookeeper folder:
$ bin/zkCli.sh -server 127.0.0.1:2181
...
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] create /replicated_demo three_servers
Created /replicated_demo
[zk: 127.0.0.1:2181(CONNECTED) 1] quit
$
Connect to server 3 – Confirm data
Now, let’s connect to server 3 and confirm that we have access to the same data:
In the Zookeeper folder:
$ bin/zkCli.sh -server 127.0.0.1:2183
...
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2183(CONNECTED) 0] ls /
[zookeeper, test, replicated_demo]
[zk: 127.0.0.1:2183(CONNECTED) 1] get /replicated_demo
three_servers
... ( more output here ) ...
dataLength = 13
numChildren = 0
[zk: 127.0.0.1:2183(CONNECTED) 2] quit
$
Bingo! The same data is available via all the servers. Try the second one too. And this is the idea behind the replicated Zookeeper.
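Why three servers rather than two? A replicated Zookeeper ensemble keeps serving requests only while a majority of its servers are up. That is a general property of Zookeeper rather than something this demo exercises, and the helper names below are invented for illustration. The arithmetic looks like this:

```ruby
# Smallest number of servers that forms a majority of the ensemble.
def quorum_size(ensemble_size)
  ensemble_size / 2 + 1
end

# Number of servers that can fail while a majority is still running.
def tolerated_failures(ensemble_size)
  ensemble_size - quorum_size(ensemble_size)
end

[1, 2, 3, 5].each do |n|
  puts "#{n} server(s): quorum #{quorum_size(n)}, " \
       "tolerates #{tolerated_failures(n)} failure(s)"
end
```

With 3 servers the quorum is 2, so one server can stop and the other two keep working. With 2 servers the quorum is also 2, which means a second server adds no fault tolerance over a single one.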
Stop all servers
Let’s now stop all servers:
In the Zookeeper folder:
$ bin/stop_all.sh
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/1.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/2.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/conf/3.cfg
Stopping zookeeper ... STOPPED
$
Ruby client bindings
Zookeeper provides client bindings for Java and C, but you can also use the zookeeper gem, which allows you to access a Zookeeper server using Ruby. Let’s see an example:
Start Zookeeper server
Start the single-instance Zookeeper server again:
In the Zookeeper folder:
$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /Users/pmatsino/Documents/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$
Install zookeeper gem
In the Zookeeper folder:
$ gem install zookeeper --no-ri --no-rdoc
Building native extensions. This could take a while...
Successfully installed zookeeper-1.4.11
1 gem installed
$
Interact with Zookeeper server using Ruby
Start irb and issue commands to the Zookeeper server using Ruby:
Create a client object
In the Zookeeper folder:
$ irb
irb(main):001:0> require 'zookeeper'
=> true
irb(main):002:0> zookeeper = Zookeeper.new('127.0.0.1:2181')
=> #<Zookeeper::Client:0x007fc2b9cc3c78 @host="127.0.0.1:2181", @chroot_path="", @req_registry=#<Zookeeper::RequestRegistry...>>>
Get children of root path
irb(main):003:0> zookeeper.get_children(path: '/')
=> {:req_id=>0, :rc=>0, :children=>["zookeeper"], :stat=>#<Zookeeper::Stat:0x007fc2b9ca9788 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=3, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=1, @pzxid=22>}
irb(main):004:0>
This is done with the method #get_children, which takes the path key as input. Do you see the result containing :children=>["zookeeper"]?
Create a node
irb(main):004:0> zookeeper.create(path: '/foo', data: 'bar')
=> {:req_id=>1, :rc=>0, :path=>"/foo"}
irb(main):005:0> zookeeper.get_children(path: '/')
=> {:req_id=>8, :rc=>0, :children=>["zookeeper", "foo"], :stat=>#<Zookeeper::Stat:0x007fc2b9bf98b0 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=6, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=2, @pzxid=28>}
irb(main):012:0>
Do you see that children now has the value ["zookeeper", "foo"]?
Get node data
irb(main):005:0> zookeeper.get(path: '/foo')
=> {:req_id=>2, :rc=>0, :data=>"bar", :stat=>#<Zookeeper::Stat:0x007fc2b9c82e08 @exists=true, @czxid=25, @mzxid=25, @ctime=1504016747266, @mtime=1504016747266, @version=0, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=3, @numChildren=0, @pzxid=25>}
irb(main):006:0>
Update node data
irb(main):006:0> zookeeper.set(path: '/foo', data: 'mary')
=> {:req_id=>3, :rc=>0, :stat=>#<Zookeeper::Stat:0x007fc2b9c6bca8 @exists=true, @czxid=25, @mzxid=26, @ctime=1504016747266, @mtime=1504016802433, @version=1, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=4, @numChildren=0, @pzxid=25>}
irb(main):007:0> zookeeper.get(path: '/foo')
=> {:req_id=>4, :rc=>0, :data=>"mary", :stat=>#<Zookeeper::Stat:0x007fc2b9c58158 @exists=true, @czxid=25, @mzxid=26, @ctime=1504016747266, @mtime=1504016802433, @version=1, @cversion=0, @aversion=0, @ephemeralOwner=0, @dataLength=4, @numChildren=0, @pzxid=25>}
irb(main):008:0>
Delete node
irb(main):008:0> zookeeper.delete(path: '/foo')
=> {:req_id=>5, :rc=>0}
irb(main):009:0> zookeeper.get_children(path: '/')
=> {:req_id=>6, :rc=>0, :children=>["zookeeper"], :stat=>#<Zookeeper::Stat:0x007fc2b9c325e8 @exists=true, @czxid=0, @mzxid=0, @ctime=0, @mtime=0, @version=0, @cversion=5, @aversion=0, @ephemeralOwner=0, @dataLength=0, @numChildren=1, @pzxid=27>}
irb(main):010:0>
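Every call above returns a Hash whose :rc key holds a return code, and :rc=>0 means that the call succeeded. A script would normally check that value instead of reading it off the screen. The sketch below is plain Ruby and does not need a running server. The rc_name and ensure_ok! helpers are not part of the zookeeper gem; they are invented here, and they assume that the gem reports failures through :rc in the same way it reports the successes shown above. The numeric codes are a few of the return codes defined by the Zookeeper C client.

```ruby
# Illustrative helpers, not part of the zookeeper gem.
def rc_name(rc)
  case rc
  when 0    then 'ZOK'         # success
  when -101 then 'ZNONODE'     # the node does not exist
  when -103 then 'ZBADVERSION' # version mismatch
  when -110 then 'ZNODEEXISTS' # the node already exists
  when -111 then 'ZNOTEMPTY'   # the node still has children
  else "unknown (#{rc})"
  end
end

# Return the result Hash unchanged when :rc is 0, raise otherwise.
def ensure_ok!(result)
  return result if result[:rc].zero?

  raise "Zookeeper call failed: #{rc_name(result[:rc])}"
end

# Hashes shaped like the irb output above.
p ensure_ok!({ req_id: 1, rc: 0, path: '/foo' })
begin
  ensure_ok!({ req_id: 2, rc: -110 })
rescue RuntimeError => e
  puts e.message
end
```

In an irb session like the one above, you would wrap a call as ensure_ok!(zookeeper.create(path: '/foo', data: 'bar')).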
Closing Note
Zookeeper takes a lot of the burden off your shoulders when you design and develop a distributed application. This was a first introduction to Zookeeper, covering only the very basics.
Thank you for reading this blog post. And don’t forget that your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. I would like to have your feedback because I learn from you as much as you learn from me.
About the Author
Panayotis Matsinopoulos works as Development Lead at Simply Business and, in his free time, enjoys giving and taking classes about Web development at Tech Career Booster.