This is the fifth release of BookKeeper as an Apache Top Level Project!

The 4.5.0 release incorporates hundreds of new fixes, improvements, and features since the previous major release, 4.4.0, which was released over a year ago. It is a big milestone for the Apache BookKeeper community, converging three main branches (Salesforce, Twitter and Yahoo).

Apache BookKeeper users are encouraged to upgrade to 4.5.0. The technical details of this release are summarized below.

Highlights

The main features in 4.5.0 cover the following areas:

  • Dependencies Upgrade
  • Security
  • Public API
  • Performance
  • Operations

Dependencies Upgrade

Here is a list of dependencies upgraded in 4.5.0:

  • Moved development from Java 7 to Java 8.
  • Upgraded Protobuf to 2.6.
  • Upgraded ZooKeeper from 3.4 to 3.5.
  • Upgraded Netty to 4.1.
  • Upgraded Guava to 20.0.
  • Upgraded SLF4J to 1.7.25.
  • Upgraded Codahale to 3.1.0.

Security

Prior to this release, Apache BookKeeper only supported simple DIGEST-MD5 authentication.

With this release of Apache BookKeeper, a number of features are introduced that can be used, together or separately, to secure a BookKeeper cluster.

The following security features are currently supported.

  • Authentication of connections to bookies from clients, using either TLS or SASL (Kerberos).
  • Authentication of connections from clients, bookies, and autorecovery daemons to ZooKeeper, when using ZooKeeper-based ledger managers.
  • Encryption of data transferred between bookies and clients, and between bookies and autorecovery daemons, using TLS.

Note that these security features are optional: non-secured clusters are supported, as well as a mix of authenticated, unauthenticated, encrypted and non-encrypted clients.

For more details, have a look at BookKeeper Security.

Public API

There are multiple new client features introduced in 4.5.0.

LedgerHandleAdv

The Ledger API is the low-level API provided by BookKeeper for interacting with ledgers in a BookKeeper cluster. It is simple but inflexible with respect to ledger ID and entry ID generation. Apache BookKeeper 4.5.0 introduces LedgerHandleAdv as an extension of the existing LedgerHandle for advanced usage. The new LedgerHandleAdv allows applications to provide their own ledger IDs and to assign entry IDs when adding entries.

See Ledger Advanced API for more details.

Long Poll

Long Poll is a main feature that DistributedLog uses to achieve low-latency tailing. This big feature has been merged back in 4.5.0 and is available to BookKeeper users.

This feature includes two main changes: LastAddConfirmed piggybacking and a new long poll read API.

The first change piggybacks the latest LastAddConfirmed on each read response, so your LastAddConfirmed advances automatically as long as read traffic continues. It significantly reduces the traffic spent explicitly polling LastAddConfirmed and hence reduces end-to-end latency.

The second change provides a new long poll read API, allowing tailing reads without polling LastAddConfirmed every time readers exhaust the known entries. Although the long poll API brings great latency improvements for tailing reads, it is still a very low-level primitive. Using a high-level API (e.g. the DistributedLog API) is still recommended for tailing and streaming use cases.

See Streaming Reads for more details.

Explicit LAC

Prior to 4.5.0, the LAC (LastAddConfirmed) is only advanced when subsequent entries are added. If no subsequent entries are added, the last entry written is not visible to readers until the ledger is closed. High-level clients (e.g. DistributedLog) or applications have to work around this by writing some sort of control record to advance the LAC.

In 4.5.0, a new explicit LAC feature is introduced to periodically advance the LAC when no subsequent entries are added. This feature can be enabled by setting explicitLacInterval to a positive value.
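
The visibility gap described above can be illustrated with a toy in-memory model. This is only a sketch: the class ToyLedger and its methods are hypothetical and are not part of the BookKeeper API. It assumes that each added entry carries the LAC as it stood before that entry, so readers trail the writer by one entry until something else advances the LAC.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of LAC visibility; hypothetical, not the BookKeeper API. */
final class ToyLedger {
    private final List<String> entries = new ArrayList<>();
    // LAC known to readers; -1 means no entry is readable yet.
    private long readerVisibleLac = -1;

    /** Adds an entry; readers only learn that the entries before it are confirmed. */
    long addEntry(String data) {
        long entryId = entries.size();
        entries.add(data);
        readerVisibleLac = entryId - 1;
        return entryId;
    }

    /** Models one periodic explicit LAC update: publish the id of the last entry. */
    void explicitLacTick() {
        readerVisibleLac = entries.size() - 1;
    }

    long readerVisibleLac() {
        return readerVisibleLac;
    }

    public static void main(String[] args) {
        ToyLedger ledger = new ToyLedger();
        ledger.addEntry("a");
        ledger.addEntry("b");
        System.out.println(ledger.readerVisibleLac()); // 0: entry 1 is not visible yet
        ledger.explicitLacTick();
        System.out.println(ledger.readerVisibleLac()); // 1: entry 1 is now visible
    }
}
```

In this model, explicitLacTick plays the role of the periodic update that a positive explicitLacInterval enables; without it, the last entry stays hidden until another entry is added or the ledger is closed.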

Performance

There are a lot of performance-related bug fixes and improvements in 4.5.0. These changes include:

  • Upgraded Netty from 3.x to 4.x to leverage buffer pooling and reduce memory copies.
  • Moved development from Java 7 to Java 8 to take advantage of Java 8 features.
  • A lot of improvements around scheduling and threading on bookies.
  • Delay ensemble change to improve tail latency.
  • Parallel ledger recovery to improve the recovery speed.

Four of these changes are outlined below. For a complete list of performance improvements, please check out the full list of changes at the end.

Netty 4 Upgrade

The major performance improvement introduced in 4.5.0 is the upgrade of Netty from 3.x to 4.x.

For more details, please read the upgrade guide for Netty-related tips on upgrading BookKeeper from 4.4.0 to 4.5.0.

Delay Ensemble Change

Ensemble Change is a feature that Apache BookKeeper uses to achieve high availability. However, it is an expensive metadata operation. Especially when Apache BookKeeper is deployed across multiple data centers, losing a data center causes a churn of metadata operations due to ensemble changes. Delay Ensemble Change is introduced in 4.5.0 to overcome this problem. With this feature enabled, an Ensemble Change only occurs when clients can’t receive enough valid responses to satisfy the ack-quorum constraint. This feature improves the tail latency.

To enable this feature, please set delayEnsembleChange to true on your clients.
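
The rule above can be written as a small predicate. This is a sketch of the decision only, with hypothetical names; it is not BookKeeper's implementation. It assumes that each entry is sent to a write quorum of bookies, that an add completes once an ack quorum of them respond, and that without the feature any failed bookie triggers an ensemble change.

```java
/** Sketch of the decision rule only; names are hypothetical, not BookKeeper classes. */
final class EnsembleChangePolicy {
    /**
     * @param delayEnsembleChange whether the delay feature is enabled
     * @param writeQuorum         number of bookies each entry is sent to
     * @param ackQuorum           number of responses needed to complete an add
     * @param failed              number of bookies in the write quorum that failed
     */
    static boolean needsEnsembleChange(boolean delayEnsembleChange,
                                       int writeQuorum, int ackQuorum, int failed) {
        if (failed <= 0) {
            return false; // nothing failed, nothing to replace
        }
        if (!delayEnsembleChange) {
            return true; // assumption: any failure triggers a change immediately
        }
        // Delayed: change only if the remaining bookies cannot satisfy the ack quorum.
        return writeQuorum - failed < ackQuorum;
    }

    public static void main(String[] args) {
        System.out.println(needsEnsembleChange(true, 3, 2, 1));  // false
        System.out.println(needsEnsembleChange(true, 3, 2, 2));  // true
        System.out.println(needsEnsembleChange(false, 3, 2, 1)); // true
    }
}
```

With a write quorum of 3 and an ack quorum of 2, a single slow or lost bookie no longer forces a metadata update, which is where the saving in a multi-data-center deployment comes from.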

Parallel Ledger Recovery

BookKeeper clients recover entries one by one during ledger recovery. If a ledger has a very large volume of traffic, it will have a large number of entries to recover when client failures occur. BookKeeper introduces parallel ledger recovery in 4.5.0 to allow batch recovery and improve ledger recovery speed.

To enable this feature, please set enableParallelRecoveryRead to true on your clients. You can also set recoveryReadBatchSize to control the batch size of recovery reads.

Multiple Journals

Prior to 4.5.0, bookies could only be configured with one journal device. If you wanted high write bandwidth, you could RAID multiple disks into one device and mount that device as the journal directory. However, because there is only one journal thread, this approach doesn’t actually improve the write bandwidth.

BookKeeper introduces support for multiple journal directories in 4.5.0. Users can configure multiple devices for journal directories.

To enable this feature, please use journalDirectories rather than journalDirectory.

Operations

LongHierarchicalLedgerManager

Apache BookKeeper supports pluggable metadata stores. By default, it uses Apache ZooKeeper as its metadata store. Among the ZooKeeper-based ledger manager implementations, HierarchicalLedgerManager is the most popular and widely adopted. However, it has a major limitation: it assumes that the ledger ID is a 32-bit integer, which limits the number of ledgers to 2^32.

LongHierarchicalLedgerManager is introduced to overcome this limitation.

See Ledger Manager for more details.
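
As a quick check of the arithmetic behind the 2^32 limit, the sketch below tests whether a ledger ID fits in 32 bits. The class and method names are hypothetical, and it assumes ledger IDs are held in a Java long, as the name LongHierarchicalLedgerManager suggests.

```java
/** Arithmetic sketch only; hypothetical names, not a BookKeeper class. */
final class LedgerIdRange {
    /** Number of distinct IDs a 32-bit ledger ID can represent: 2^32. */
    static final long IDS_IN_32_BITS = 1L << 32;

    /** Returns true if the ID can be represented in 32 bits. */
    static boolean fitsIn32Bits(long ledgerId) {
        return ledgerId >= 0 && ledgerId < IDS_IN_32_BITS;
    }

    public static void main(String[] args) {
        System.out.println(IDS_IN_32_BITS);               // 4294967296
        System.out.println(fitsIn32Bits(4_294_967_295L)); // true
        System.out.println(fitsIn32Bits(4_294_967_296L)); // false
    }
}
```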

Weight-based placement policy

Rack-Aware and Region-Aware placement policies are the two placement policies available in the BookKeeper client. They place ensembles based on the users’ configured network topology. However, they both assume that all nodes are equal. Weight-based placement is introduced in 4.5.0 to improve the existing placement policies. It is not a separate policy; it is built into the existing placement policies. If you are using Rack-Aware or Region-Aware, you can enable weight-based placement by setting diskWeightBasedPlacementEnabled to true.
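
The general idea of weighting nodes instead of treating them as equal can be sketched as a weighted random pick. This is an illustration of the technique, not BookKeeper's placement code: the class name, the bookie names, and the choice of free disk space as the weight (suggested by the setting name) are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Weighted random selection sketch; hypothetical, not BookKeeper's placement code. */
final class WeightedPick {
    /**
     * Picks a key with probability proportional to its weight.
     *
     * @param weights non-negative weights, e.g. free disk space per bookie (assumed)
     * @param roll    a uniform random number in [0, 1)
     */
    static String pick(Map<String, Long> weights, double roll) {
        long total = 0;
        for (long w : weights.values()) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no positive weight");
        }
        double target = roll * total;
        double cumulative = 0;
        String last = null;
        for (Map.Entry<String, Long> e : weights.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the upper edge
    }

    public static void main(String[] args) {
        Map<String, Long> freeSpace = new LinkedHashMap<>();
        freeSpace.put("bookie-a", 100L); // hypothetical bookie with little free space
        freeSpace.put("bookie-b", 300L); // hypothetical bookie with more free space
        System.out.println(pick(freeSpace, Math.random()));
    }
}
```

Over many picks, bookie-b would be chosen about three times as often as bookie-a, so emptier nodes absorb more new ledgers.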

Customized Ledger Metadata

A Map<String, byte[]> is introduced in ledger metadata in 4.5.0. Clients are now allowed to pass in a key/value map when creating ledgers. This customized ledger metadata can later be used by user-defined placement policies. This extends the flexibility of the BookKeeper API.
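
A minimal sketch of building and reading such a map is shown below. Only the Map<String, byte[]> type comes from the text above; the class name, the keys "application" and "region", and the UTF-8 encoding are illustrative assumptions, and the call that passes the map to BookKeeper is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Sketch of preparing custom ledger metadata; keys and encoding are assumptions. */
final class CustomMetadataExample {
    /** Builds a key/value map with string values encoded as UTF-8 bytes. */
    static Map<String, byte[]> build(String application, String region) {
        Map<String, byte[]> metadata = new HashMap<>();
        metadata.put("application", application.getBytes(StandardCharsets.UTF_8));
        metadata.put("region", region.getBytes(StandardCharsets.UTF_8));
        return metadata;
    }

    /** Decodes one value, e.g. inside a user-defined placement policy. */
    static String read(Map<String, byte[]> metadata, String key) {
        byte[] value = metadata.get(key);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> metadata = build("billing", "us-west");
        System.out.println(read(metadata, "region")); // us-west
    }
}
```

Because the values are raw bytes, the writer and any policy that reads them need to agree on an encoding up front.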

Add Prometheus stats provider

A new Prometheus stats provider is introduced in 4.5.0. It simplifies metric collection when running BookKeeper on Kubernetes.

Add more tools in BookieShell

BookieShell is the tool provided by Apache BookKeeper to operate clusters. Multiple important tools are introduced in 4.5.0, for example, decommissionbookie, expandstorage, lostbookierecoverydelay, and triggeraudit.

For the complete list of commands in BookieShell, please read BookKeeper CLI tool reference.

Full list of changes

JIRA

Sub-task

Bug

  • [BOOKKEEPER-852] - Release LedgerDescriptor and master-key objects when not used anymore
  • [BOOKKEEPER-903] - MetaFormat BookieShell Command is not deleting UnderReplicatedLedgers list from the ZooKeeper
  • [BOOKKEEPER-907] - for ReadLedgerEntriesCmd, EntryFormatter should be configurable and HexDumpEntryFormatter should be one of them
  • [BOOKKEEPER-908] - Case to handle BKLedgerExistException
  • [BOOKKEEPER-924] - addEntry() is susceptible to spurious wakeups
  • [BOOKKEEPER-927] - Extend BOOKKEEPER-886 to LedgerHandleAdv too (BOOKKEEPER-886: Allow to disable ledgers operation throttling)
  • [BOOKKEEPER-933] - ClientConfiguration always inherits System properties
  • [BOOKKEEPER-938] - LedgerOpenOp should use digestType from metadata
  • [BOOKKEEPER-939] - Fix typo in bk-merge-pr.py
  • [BOOKKEEPER-940] - Fix findbugs warnings after bumping to java 8
  • [BOOKKEEPER-952] - Fix RegionAwarePlacementPolicy
  • [BOOKKEEPER-955] - in BookKeeperAdmin listLedgers method currentRange variable is not getting updated to next iterator when it has run out of elements
  • [BOOKKEEPER-956] - HierarchicalLedgerManager doesn't work for ledgerid of length 9 and 10 because of order issue in HierarchicalLedgerRangeIterator
  • [BOOKKEEPER-958] - ZeroBuffer readOnlyBuffer returns ByteBuffer with 0 remaining bytes for length > 64k
  • [BOOKKEEPER-959] - ClientAuthProvider and BookieAuthProvider Public API used Protobuf Shaded classes
  • [BOOKKEEPER-976] - Fix license headers with "Copyright 2016 The Apache Software Foundation"
  • [BOOKKEEPER-980] - BookKeeper Tools doesn't process the argument correctly
  • [BOOKKEEPER-981] - NullPointerException in RackawareEnsemblePlacementPolicy while running in Docker Container
  • [BOOKKEEPER-984] - BookieClientTest.testWriteGaps tested
  • [BOOKKEEPER-986] - Handle Memtable flush failure
  • [BOOKKEEPER-987] - BookKeeper build is broken due to the shade plugin for commit ecbb053e6e
  • [BOOKKEEPER-988] - Missing license headers
  • [BOOKKEEPER-989] - Enable travis CI for bookkeeper git
  • [BOOKKEEPER-999] - BookKeeper client can leak threads
  • [BOOKKEEPER-1013] - Fix findbugs errors on latest master
  • [BOOKKEEPER-1018] - Allow client to select older V2 protocol (no protobuf)
  • [BOOKKEEPER-1020] - Fix Explicit LAC tests on master
  • [BOOKKEEPER-1021] - Improve the merge script to handle github reviews api
  • [BOOKKEEPER-1031] - ReplicationWorker.rereplicate fails to call close() on ReadOnlyLedgerHandle
  • [BOOKKEEPER-1044] - Entrylogger is not readding rolled logs back to the logChannelsToFlush list when exception happens while trying to flush rolled logs
  • [BOOKKEEPER-1047] - Add missing error code in ZK setData return path
  • [BOOKKEEPER-1058] - Ignore already deleted ledger on replication audit
  • [BOOKKEEPER-1061] - BookieWatcher should not do ZK blocking operations from ZK async callback thread
  • [BOOKKEEPER-1065] - OrderedSafeExecutor should only have 1 thread per bucket
  • [BOOKKEEPER-1071] - BookieRecoveryTest is failing due to a Netty4 IllegalReferenceCountException
  • [BOOKKEEPER-1072] - CompactionTest is flaky when disks are almost full
  • [BOOKKEEPER-1073] - Several stats provider related changes.
  • [BOOKKEEPER-1074] - Remove JMX Bean
  • [BOOKKEEPER-1075] - BK LedgerMetadata: more memory-efficient parsing of configs
  • [BOOKKEEPER-1076] - BookieShell should be able to read the 'FENCE' entry in the log
  • [BOOKKEEPER-1077] - BookKeeper: Local Bookie Journal and ledger paths
  • [BOOKKEEPER-1079] - shell lastMark throws NPE
  • [BOOKKEEPER-1098] - ZkUnderreplicationManager can build up an unbounded number of watchers
  • [BOOKKEEPER-1101] - BookKeeper website menus not working under https
  • [BOOKKEEPER-1102] - org.apache.bookkeeper.client.BookKeeperDiskSpaceWeightedLedgerPlacementTest.testDiskSpaceWeightedBookieSelectionWithBookiesBeingAdded is unreliable
  • [BOOKKEEPER-1103] - LedgerMetadataCreateTest bug in ledger id generation causes intermittent hang
  • [BOOKKEEPER-1104] - BookieInitializationTest.testWithDiskFullAndAbilityToCreateNewIndexFile testcase is unreliable

Improvement

New Feature

Story

Task

Test

Wish

  • [BOOKKEEPER-943] - Reduce log level of AbstractZkLedgerManager for register/unregister ReadOnlyLedgerHandle

Github
