Monday, April 02, 2018

Docker - How to Containerize Zookeeper with Exhibitor

This blog shows you a way of containerizing Zookeeper + Exhibitor. It walks you step by step through spinning up Zookeeper instance(s) supervised by Exhibitor. A working "docker-compose.yml" is also provided.

Requirements:

  • Zookeeper: 3.4.11
  • Exhibitor: latest
  • Maven: 3.5.3
  • Docker: 17.12.0-ce
  • Docker Compose: file format version 3

Repositories:

  • Docker image: tonylixu/ex-zookeeper (Docker Hub)

Dockerfile:
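
The Dockerfile is embedded in the original post; as a rough orientation, a minimal sketch of such an image might look like the following (the Maven base image, paths, and the wrapper script are assumptions, not the author's exact file):

FROM maven:3.5-jdk-8

# Install Zookeeper 3.4.11 (the version from the requirements above).
RUN curl -fsSL https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz \
    | tar -xz -C /opt \
 && mv /opt/zookeeper-3.4.11 /opt/zookeeper

# Build the Exhibitor standalone jar with Maven from the published pom.
RUN curl -fsSL --create-dirs -o /opt/exhibitor/pom.xml \
    https://raw.githubusercontent.com/soabase/exhibitor/master/exhibitor-standalone/src/main/resources/buildscripts/standalone/maven/pom.xml \
 && mvn -q -f /opt/exhibitor/pom.xml package

# Exhibitor UI, Zookeeper client, peer, and leader-election ports.
EXPOSE 8181 2181 2888 3888

# A small wrapper script (shipped next to the Dockerfile) would render
# Exhibitor's configuration from the environment variables documented
# below, then exec the standalone jar.
COPY start.sh /opt/exhibitor/start.sh
ENTRYPOINT ["/opt/exhibitor/start.sh"]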

Environment variables:

The container expects the following environment variables to be passed in:
  • HOSTNAME - addressable hostname for this node (Exhibitor will forward users of the UI to this address)
  • S3_BUCKET - (optional) bucket used by Exhibitor for backups and coordination
  • S3_PREFIX - (optional) key prefix within S3_BUCKET to use for this cluster
  • AWS_ACCESS_KEY_ID - (optional) AWS access key ID with read/write permissions on S3_BUCKET
  • AWS_SECRET_ACCESS_KEY - (optional) secret key for AWS_ACCESS_KEY_ID
  • AWS_REGION - (optional) the AWS region of the S3 bucket (defaults to us-west-2)
  • ZK_PASSWORD - (optional) the HTTP Basic Auth password for the "zk" user
  • ZK_DATA_DIR - (optional) Zookeeper data directory
  • ZK_LOG_DIR - (optional) Zookeeper log directory
  • AUTO_MANAGE_SETTLING_PERIOD - (optional) the amount of time, in milliseconds, that Exhibitor waits before adding/removing nodes
  • HTTP_PROXY_HOST - (optional) HTTP Proxy hostname
  • HTTP_PROXY_PORT - (optional) HTTP Proxy port
  • HTTP_PROXY_USERNAME - (optional) HTTP Proxy username
  • HTTP_PROXY_PASSWORD - (optional) HTTP Proxy password

Run Container:

    1. With an AWS S3 bucket:
$ docker run -p 8181:8181 -p 2181:2181 -p 2888:2888 -p 3888:3888 \
    -e S3_BUCKET=<bucket> \
    -e S3_PREFIX=<key_prefix> \
    -e AWS_ACCESS_KEY_ID=<access_key> \
    -e AWS_SECRET_ACCESS_KEY=<secret_key> \
    -e HOSTNAME=<host> \
    tonylixu/ex-zookeeper:1.0

    2. With the local file system:
$ docker run -p 8181:8181 -p 2181:2181 -p 2888:2888 -p 3888:3888 \
    -e HOSTNAME=<host> \
    tonylixu/ex-zookeeper:1.0
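
    3. With custom data/log directories behind an HTTP proxy (a sketch using the optional variables above; hostnames, ports, and paths are illustrative):
$ docker run -p 8181:8181 -p 2181:2181 -p 2888:2888 -p 3888:3888 \
    -e HOSTNAME=<host> \
    -e ZK_DATA_DIR=/opt/zookeeper/data \
    -e ZK_LOG_DIR=/opt/zookeeper/log \
    -e HTTP_PROXY_HOST=proxy.example.com \
    -e HTTP_PROXY_PORT=3128 \
    tonylixu/ex-zookeeper:1.0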


Run with Docker Compose:

Download the "docker-compose.yml" file. If you don't want to build your own image, comment out the "build: ." line and use the prebuilt image instead.
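
The actual "docker-compose.yml" is linked from the post and not reproduced here; for orientation, a minimal sketch of what it might contain (the service name and port handling are assumptions, not the author's exact file):

version: "3"
services:
  zookeeper:
    # Comment out "build: ." to use the prebuilt image instead of building locally.
    build: .
    image: tonylixu/ex-zookeeper:1.0
    environment:
      - HOSTNAME=localhost
    ports:
      # Host ports are left unpinned so that "--scale" can start several
      # containers without port conflicts; pin them (e.g. "8181:8181")
      # when running a single instance.
      - "8181"
      - "2181"
      - "2888"
      - "3888"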
To start a single instance:
$ docker-compose up -d

To run multiple containers, you can use the --scale option:
$ docker-compose up --scale zookeeper=3 -d

This will create three Zookeeper containers, each with its own Exhibitor. Give it a minute or two for the nodes to balance out and recognize one another. This is only recommended for dev/test environments; for production deployments, I strongly suggest creating separate services in the "docker-compose.yml" file.
Note: local file system backups do not work in "scale" mode, so please create separate services in the "docker-compose.yml" file for production environments, as sketched below.
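
A sketch of that separate-services layout (service names, volume names, and the data path are illustrative assumptions, not the author's production file):

version: "3"
services:
  zookeeper1:
    image: tonylixu/ex-zookeeper:1.0
    hostname: zookeeper1
    environment:
      - HOSTNAME=zookeeper1
    ports:
      - "8181:8181"
    volumes:
      # A named volume per node keeps local file system backups intact.
      - zk1-data:/opt/zookeeper/data
  # zookeeper2 and zookeeper3 follow the same pattern, each with its own
  # hostname, published ports, and volume.
volumes:
  zk1-data: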

What is Zookeeper?

Apache ZooKeeper is a software project of the Apache Software Foundation, providing an open source distributed configuration service, synchronization service, and naming registry for large distributed systems. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this with its simple architecture and API, allowing developers to focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at “Yahoo!” for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard for organized service used by Hadoop, HBase, and other distributed frameworks. Now Zookeeper is a top-level Apache project.

What services does Zookeeper provide?

Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate between themselves and maintain shared data with robust synchronization techniques.

The common services provided by ZooKeeper are as follows:
  • Naming service − Identifying the nodes in a cluster by name. It is similar to DNS, but for nodes.
  • Configuration management − Latest and up-to-date configuration information of the system for a joining node.
  • Cluster management − Joining/leaving of a node in a cluster and node status in real time.
  • Leader election − Electing a node as leader for coordination purpose.
  • Locking and synchronization service − Locking the data while modifying it.
  • Highly reliable data registry − Availability of data even when one or a few nodes are down.
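
As a quick illustration of these services, you can create and read a znode against a running container (a sketch assuming the zkCli.sh shipped with Zookeeper 3.4.11 and the client port published at localhost:2181):
$ zkCli.sh -server localhost:2181 create /demo "hello"
$ zkCli.sh -server localhost:2181 get /demo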

Benefits of Zookeeper:

  • Simple distributed coordination process
  • Synchronization − Mutual exclusion and cooperation between server processes. This helps, for example, with configuration management in Apache HBase.
  • Ordered Messages
  • Serialization − Encodes data according to specific rules, ensuring your application runs consistently. MapReduce, for example, can use this approach to coordinate its queues and running tasks.
  • Reliability
  • Atomicity − Data transfers either succeed or fail completely; no transaction is ever partial.

What is Exhibitor?

Exhibitor is a supervisor for Zookeeper instances. It performs periodic backups, checks node status, and automatically restarts Zookeeper on node failures.

Exhibitor features:

Zookeeper instance monitoring:
  • Each Exhibitor instance monitors the ZooKeeper server running on the same server. If ZooKeeper is not running, Exhibitor will write the zoo.cfg file (see Cluster-wide Configuration below) and start it. If ZooKeeper crashes for some reason, Exhibitor will restart it.
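
For orientation, the kind of zoo.cfg Exhibitor writes looks roughly like this (all values are illustrative, not Exhibitor's literal output):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
# One line per ensemble member: 2888 is the peer port, 3888 the leader-election port.
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888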

Backup/Restore:
  • Backups in a ZooKeeper ensemble are more complicated than for a traditional data store (e.g. an RDBMS). Generally, most of the data in ZooKeeper is ephemeral, so it would be harmful to blindly restore an entire ZooKeeper data set. What is needed is selective restoration to prevent accidental damage to a subset of the data set. Exhibitor enables this.
  • Exhibitor will periodically backup the ZooKeeper transaction files. Once backed up, you can index any of these transaction files. Once indexed, you can search for individual transactions and “replay” them to restore a given ZNode to ZooKeeper.
Log Cleanup:
  • ZooKeeper servers continually write transaction logs and snapshots that must be periodically purged. Exhibitor does this maintenance automatically.

If you have any questions, please send me an email at tony@lixu.ca.
