The easiest way to deploy BookKeeper is using schedulers like DC/OS, but you can also deploy BookKeeper clusters manually. A BookKeeper cluster consists of two main components:

  • A ZooKeeper cluster that is used for configuration- and coordination-related tasks
  • An ensemble of bookies

ZooKeeper setup

We won’t provide a full guide to setting up a ZooKeeper cluster here. For that, we recommend that you consult the official ZooKeeper documentation.

Starting up bookies

Once your ZooKeeper cluster is up and running, you can start up as many bookies as you’d like to form a cluster. Before starting up each bookie, you need to modify the bookie’s configuration to make sure that it points to the right ZooKeeper cluster.

On each bookie host, you need to download the BookKeeper package as a tarball and then configure the bookie by setting values in the bookkeeper-server/conf/bk_server.conf config file. The one parameter that you must change is zkServers, which needs to be set to the connection string for your ZooKeeper cluster. Here’s an example:

zkServers=100.0.0.1:2181,100.0.0.2:2181,100.0.0.3:2181
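If you configure many bookie hosts, you may prefer to script this change. The following is a minimal sketch, not an official tool: set_zk_servers is a hypothetical helper name, and it assumes the config file uses plain key=value lines like the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of BookKeeper): set or replace the
# zkServers line in a bookie config file.
# Usage: set_zk_servers CONF_FILE ZK_CONNECTION_STRING
set_zk_servers() {
    conf_file=$1
    zk_servers=$2
    if grep -q '^zkServers=' "$conf_file"; then
        # Rewrite the existing line via a temp file, then copy the result
        # back so the original file keeps its permissions.
        tmp_file=$(mktemp) || return 1
        sed "s|^zkServers=.*|zkServers=${zk_servers}|" "$conf_file" > "$tmp_file" \
            && cat "$tmp_file" > "$conf_file"
        status=$?
        rm -f "$tmp_file"
        return "$status"
    else
        # No zkServers line yet: append one.
        printf 'zkServers=%s\n' "$zk_servers" >> "$conf_file"
    fi
}

# Example call (not executed here):
#   set_zk_servers bookkeeper-server/conf/bk_server.conf \
#       100.0.0.1:2181,100.0.0.2:2181,100.0.0.3:2181
```

The helper leaves every other line in the file untouched, so it can be run again whenever the ZooKeeper connection string changes.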

A full listing of configurable parameters available in bookkeeper-server/conf/bk_server.conf can be found in the Configuration reference manual.

Once the bookie’s configuration is set, you can start it up using the bookie command of the bookkeeper CLI tool:

$ bookkeeper-server/bin/bookkeeper bookie

You can also build BookKeeper by cloning it from source or using Maven.

System requirements

The number of bookies you should run in a BookKeeper cluster depends on the quorum mode that you’ve chosen, the desired throughput, and the number of clients using the cluster simultaneously.

Quorum type              Number of bookies
Self-verifying quorum    3
Generic                  4

Increasing the number of bookies will enable higher throughput, and there is no upper limit on the number of bookies.
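As a rough illustration, the bookie counts listed above can be encoded in a small pre-deployment check. This sketch reads those counts as minimums, which is an interpretation rather than something stated outright, and the helper names min_bookies and check_cluster_size are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of BookKeeper) that encode the bookie
# counts listed above, treated here as minimums (an assumption).

# min_bookies QUORUM_TYPE -> prints the listed number of bookies
min_bookies() {
    case $1 in
        self-verifying) echo 3 ;;
        generic)        echo 4 ;;
        *) echo "unknown quorum type: $1" >&2; return 1 ;;
    esac
}

# check_cluster_size QUORUM_TYPE PLANNED_BOOKIES
# Succeeds if the planned number of bookies meets the listed count.
check_cluster_size() {
    required=$(min_bookies "$1") || return 1
    if [ "$2" -ge "$required" ]; then
        echo "ok: $2 bookies (listed count for $1 quorum is $required)"
    else
        echo "too few: $2 bookies (listed count for $1 quorum is $required)"
        return 1
    fi
}
```

For example, check_cluster_size generic 3 fails, while check_cluster_size self-verifying 5 succeeds.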

Cluster metadata setup

Once you’ve started up a cluster of bookies, you need to set up the cluster’s metadata by running the following command from any bookie in the cluster:

$ bookkeeper-server/bin/bookkeeper shell metaformat

The metaformat command performs all of the necessary ZooKeeper cluster metadata tasks, so it needs to be run only once, from any one bookie in the BookKeeper cluster.
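If you drive deployment from a script, you may want a guard so that repeated runs don't format the metadata again. The sketch below is one possible approach under stated assumptions: the marker file, the BOOKKEEPER_BIN and MARKER_FILE variables, and the format_metadata_once function are all hypothetical, and BookKeeper itself neither creates nor reads the marker. The guard only works if the script always runs from the same host and working directory.

```shell
#!/bin/sh
# Hypothetical guard (not part of BookKeeper): run metaformat at most
# once from an automation script, tracked with a local marker file.
BOOKKEEPER_BIN=${BOOKKEEPER_BIN:-bookkeeper-server/bin/bookkeeper}
MARKER_FILE=${MARKER_FILE:-./.bk-metaformat-done}

format_metadata_once() {
    if [ -e "$MARKER_FILE" ]; then
        echo "cluster metadata already formatted; skipping"
        return 0
    fi
    # Create the marker only if the metaformat command succeeds.
    "$BOOKKEEPER_BIN" shell metaformat && touch "$MARKER_FILE"
}

# Example call (not executed here):
#   format_metadata_once
```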

Once cluster metadata formatting has been completed, your BookKeeper cluster is ready to go!

An entry is a sequence of bytes (plus some metadata) written to a BookKeeper ledger. Entries are also known as records.

A ledger is a sequence of entries written to BookKeeper. Entries are written sequentially to ledgers and at most once, giving ledgers append-only semantics.

A bookie is an individual BookKeeper storage server.

Bookies store the content of ledgers and act as a distributed ensemble.

Autorecovery is a subsystem that runs in the background on bookies to ensure that ledgers are fully replicated even if one bookie from the ensemble is down.

Striping is the process of distributing BookKeeper ledgers to sub-groups of bookies rather than to all bookies in a BookKeeper ensemble.

Striping is essential to ensuring fast performance.
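To make the idea concrete, here is a simplified sketch of round-robin striping. It assumes that each entry is written to a fixed-size sub-group of bookies (called write_quorum here) chosen by rotating through the ensemble. The function name, the parameter names, and the exact placement rule are illustrative assumptions, not a description of BookKeeper's internals.

```shell
#!/bin/sh
# Illustrative only: pick the sub-group of bookies for an entry by
# rotating through the ensemble (simple round-robin assumption).
# stripe_targets ENTRY_ID ENSEMBLE_SIZE WRITE_QUORUM
stripe_targets() {
    entry_id=$1
    ensemble_size=$2
    write_quorum=$3
    targets=""
    i=0
    while [ "$i" -lt "$write_quorum" ]; do
        # Bookie index wraps around at the end of the ensemble.
        targets="$targets $(( (entry_id + i) % ensemble_size ))"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    echo "${targets# }"
}

# Five bookies, three copies per entry: each entry lands on a different
# sub-group, so no single bookie handles every write.
for entry in 0 1 2 3 4; do
    echo "entry $entry -> bookies $(stripe_targets "$entry" 5 3)"
done
```

With these inputs, entry 0 goes to bookies 0 1 2 and entry 3 wraps around to bookies 3 4 0, which spreads the write load across the ensemble.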

A journal file stores BookKeeper transaction logs.

Fencing occurs when a reader forces a ledger to close, preventing any further entries from being written to the ledger.

A record is a sequence of bytes (plus some metadata) written to a BookKeeper ledger. Records are also known as entries.