Introduction to Zookeeper

Zookeeper is an Apache project designed to help developers build distributed applications. Most distributed applications need a common set of services, such as naming, configuration management, synchronization and group services. Zookeeper implements these services for you, so you don’t have to. This lets you focus on your application rather than on distributed infrastructure components.

Note: This is the first story in our encounter with Zookeeper. The next story can be found here.

What is this post about?

This post is a brief introduction to Zookeeper. For a Zookeeper server, we will show you how to:

  1. install it
  2. set it up
  3. start it
  4. interact with it

We will also show you how to run three replicated Zookeeper servers locally, and how to interact with a Zookeeper server using Ruby.

Let’s start.

Download and Install Zookeeper

Note: If you work on OS X and use Homebrew, installing Zookeeper may be as simple as brew install zookeeper.

You can download Zookeeper from any of the sites listed on the Zookeeper download page. Pick a stable release by downloading the file inside the stable folder.

I downloaded the file zookeeper-3.4.10.tar.gz.

Then, I extracted (untarred) the file into the folder ~/Documents/zookeeper-3.4.10.

Setup

Before we can start the Zookeeper server, we need a minimal configuration file. Let’s create the file conf/zoo.cfg inside the folder of your Zookeeper installation. You can create it as a copy of the sample file, conf/zoo_sample.cfg, that ships with the distribution.

In the Zookeeper folder:

The minimum configuration that will allow you to start 1 Zookeeper server is the following:

As you can see, I have set dataDir to a directory in my Documents folder, /Users/pmatsino/Documents/zookeeper-data. This folder needs to exist, so go ahead and create it before you continue. Zookeeper keeps its state in memory in order to be efficient, and it uses dataDir for persistence. It periodically writes snapshots of the in-memory database there. Updates are atomic, and each one is recorded in a transaction log, also kept inside dataDir, before it is applied in memory.
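Creating the folder is a one-liner. The sketch uses $HOME/Documents/zookeeper-data, which is the author's path for any user.

```shell
#!/usr/bin/env bash
# Create the dataDir that conf/zoo.cfg points to (snapshots and transaction log live here).
set -euo pipefail

mkdir -p "$HOME/Documents/zookeeper-data"
ls -ld "$HOME/Documents/zookeeper-data"
```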

The tickTime is given in milliseconds, and it is the basic time unit for every Zookeeper configuration key that specifies time. It is used to implement heartbeats, and it also sets the minimum session timeout, which is twice the tickTime.
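The relationship between tickTime and session timeouts is simple arithmetic. The sketch below reads tickTime from conf/zoo.cfg and prints the default bounds. If the file cannot be read, it falls back to 2000 ms, which is an assumed value. The upper bound of 20 times tickTime is the documented default for maxSessionTimeout, which the text above does not mention. Both bounds can be overridden in the configuration.

```shell
#!/usr/bin/env bash
# Derive the default session-timeout bounds from tickTime:
#   minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime (unless overridden).
cfg="${ZK_HOME:-$HOME/Documents/zookeeper-3.4.10}/conf/zoo.cfg"

# Read tickTime from the config; fall back to an assumed 2000 ms if the file or key is missing.
tick_ms=$(grep -E '^tickTime=' "$cfg" 2>/dev/null | cut -d= -f2)
tick_ms=${tick_ms:-2000}

min_timeout_ms=$((2 * tick_ms))
max_timeout_ms=$((20 * tick_ms))
echo "tickTime=${tick_ms}ms minSessionTimeout=${min_timeout_ms}ms maxSessionTimeout=${max_timeout_ms}ms"
```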

The clientPort is the port on which the Zookeeper server listens for client connections.

Start server

With the above settings in place, it is very easy to start a Zookeeper server:

In the Zookeeper folder:

Zookeeper data model

The Zookeeper data model is a tree of nodes, called znodes. Each znode has a unique name made of the path parts you follow to reach it, just like folders and files in a file system. The root node is /. Under it, you can create as many child nodes as you like, then children of those children, and so on. For example, in /foo/bar the node bar is a child of the node /foo, which in turn is a child of the root node (/).

Besides that, each node may have data attached to it. Unlike a file system, where a folder holds children and a file holds data, a znode can have both children and data. The data may also be empty.

Connect to server

Now we can use the command-line client that comes with the Zookeeper installation to connect to the Zookeeper server. Here is how:

In the Zookeeper folder:

After a lot of lines are printed on your terminal, you reach the prompt [zk: localhost:2181(CONNECTED) 0], which you can use to send commands from the client to the server.

The help command will list the commands that you can use:

List nodes

Now that we are using the command line interface, let’s list the current nodes:

As you can see, the root node / has one child, zookeeper, which Zookeeper creates for its own use.

Create a Zookeeper node

Let’s create a new node, and then list the nodes again. Note that when we create a new node we give the data to attach to the node. In the following example, "bar" is a piece of string data to attach to node /foo.

Get details of node

And we can get the details of a node:

Do you see the first line, bar? It is the data associated with the node /foo.

Update details of node

We can update the data of a node with the set command:

With set /foo mary we update the data for the node /foo to be the string mary. We then confirm with get /foo.

Delete a node

Let’s now delete the /foo node:

The command delete /foo deletes the /foo node. We then confirm with ls /.

Replicated Zookeeper

Things get more interesting, of course, when you have multiple Zookeeper servers replicating your data. A single server is fine while developing, but your production system will need several Zookeeper servers working in a replicated configuration.

Let’s see how we can start 3 Zookeeper servers locally.

Stop server

First, let’s stop the server that is running at the moment:

In the Zookeeper folder:

Create different data directories

Create a separate data directory for each of the 3 Zookeeper servers.

In ~/Documents folder:

Create the myid files

Inside each data directory, you need to create a file named myid that contains the id of the corresponding server. Let’s keep it very simple for our demo:

In ~/Documents folder:

The ids of our servers will be 1, 2 and 3 respectively.


Create server configurations

Now, let’s go to our conf folder and create three different configuration files, one for each server. We will use these files to start each server accordingly:

conf/1.cfg

The configuration file for the first server, with id 1 (create it inside ~/Documents/zookeeper-3.4.10/conf):

Since we are starting all the servers on the same machine, pay attention to the following:

  1. The dataDir is different per server. Here we specify the dataDir for the server with id 1.
  2. The clientPort is different per server.
  3. The quorum and leader election ports are different for each server. Also, each server’s configuration needs to know the quorum and leader election ports of the other servers. Here we use ports 2888 and 3888 for server 1, 2889 and 3889 for server 2, and 2890 and 3890 for server 3.

With that in mind, let’s create the configuration files for the other 2 servers:

conf/2.cfg

The configuration file for the server with id 2 (create it inside ~/Documents/zookeeper-3.4.10/conf):

conf/3.cfg

The configuration file for the server with id 3 (create it inside ~/Documents/zookeeper-3.4.10/conf):

Create start all script

Everything is ready. However, let’s make our life a little easier by creating the start_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:

As you can see, we start each Zookeeper server by passing it, as an argument, the configuration file it needs to use.

Make sure that the script is executable:

In the Zookeeper folder:

Create stop all script

Similarly, let’s create the stop_all.sh script inside the folder ~/Documents/zookeeper-3.4.10/bin:
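A possible body for bin/stop_all.sh, mirroring the start script, follows. It is a sketch, not the original. zkServer.sh stop finds each server's PID file through the dataDir of the configuration file passed as its second argument. ZK_HOME is a hypothetical helper variable.

```shell
#!/usr/bin/env bash
# bin/stop_all.sh: stop the three local Zookeeper servers.
ZK_HOME="${ZK_HOME:-$HOME/Documents/zookeeper-3.4.10}"

for id in 1 2 3; do
  "$ZK_HOME/bin/zkServer.sh" stop "$id.cfg"
done
```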

Don’t forget to make it executable:

In the Zookeeper folder:

Kick off servers

Let’s now kick off our replicated Zookeeper servers:

In the Zookeeper folder:

Connect to server 1 – Create data

Now, let’s connect to server 1, create some data, and then quit:

In the Zookeeper folder:

Connect to server 3 – Confirm data

Now, let’s connect to server 3 and confirm that we have access to the same data:

In the Zookeeper folder:

Bingo! The same data is available via all the servers. Try the second one too. This is the idea behind replicated Zookeeper.

Stop all servers

Let’s now stop all servers:

In the Zookeeper folder:

Ruby client bindings

Zookeeper provides client bindings for Java and C, but you can also use the zookeeper gem, which allows you to access a Zookeeper server using Ruby. Let’s see an example:

Start Zookeeper server

Start the single-instance Zookeeper server again:

In the Zookeeper folder:

Install zookeeper gem

In the Zookeeper folder:

Interact with Zookeeper server using Ruby

Start irb and issue commands to the Zookeeper server using Ruby:

Create a client object

In the Zookeeper folder:

Get children of root path

It is done with the method #get_children, which takes the node path under the :path key. Do you see the result containing :children=>["zookeeper"]?

Create a node

Do you see that :children now has the value ["zookeeper", "foo"]?

Get node data

Update node data

Delete node

Closing Note

Zookeeper takes a lot of the burden off your back when you design and develop a distributed application. This was a first introduction to Zookeeper, covering the very basics.

Thank you for reading this blog post. And don’t forget that your comments below are more than welcome. I am willing to answer any questions that you may have and give you feedback on any comments that you may post. I would like to have your feedback because I learn from you as much as you learn from me.

About the Author

Panayotis Matsinopoulos works as Development Lead at Simply Business and, in his free time, enjoys giving and taking classes about Web development at Tech Career Booster.


Panos G. Matsinopoulos
