This is the fifth release of BookKeeper as an Apache Top Level Project!
The 4.5.0 release incorporates hundreds of new fixes, improvements, and features since the previous major release, 4.4.0, which was released over a year ago. It is a big milestone for the Apache BookKeeper community, converging three main branches (Salesforce, Twitter, and Yahoo).
Apache BookKeeper users are encouraged to upgrade to 4.5.0. The technical details of this release are summarized below.
Highlights
The main features in 4.5.0 cover the following areas:
- Dependencies Upgrade
- Security
- Public API
- Performance
- Operations
Dependencies Upgrade
Here is a list of dependencies upgraded in 4.5.0:
- Moved the development from Java 7 to Java 8.
- Upgraded Protobuf to 2.6.
- Upgraded ZooKeeper from 3.4 to 3.5.
- Upgraded Netty to 4.1.
- Upgraded Guava to 20.0.
- Upgraded SLF4J to 1.7.25.
- Upgraded Codahale to 3.1.0.
Security
Prior to this release, Apache BookKeeper only supported simple DIGEST-MD5 type authentication.
With this release of Apache BookKeeper, a number of features are introduced that can be used, together or separately, to secure a BookKeeper cluster.
The following security features are currently supported.
- Authentication of connections to bookies from clients, using either TLS or SASL (Kerberos).
- Authentication of connections from clients, bookies, and autorecovery daemons to ZooKeeper, when using ZooKeeper-based ledger managers.
- Encryption of data transferred between bookies and clients, and between bookies and autorecovery daemons, using TLS.
It’s worth noting that these security features are optional: non-secured clusters are supported, as well as a mix of authenticated, unauthenticated, encrypted, and non-encrypted clients.
For more details, have a look at BookKeeper Security.
Public API
There are multiple new client features introduced in 4.5.0.
LedgerHandleAdv
The Ledger API is the low-level API provided by BookKeeper for interacting with ledgers in a BookKeeper cluster. It is simple, but it is not flexible about ledger id or entry id generation. Apache BookKeeper introduces LedgerHandleAdv as an extension of the existing LedgerHandle for advanced usage. LedgerHandleAdv allows applications to provide their own ledger id and to assign entry ids when adding entries.
See Ledger Advanced API for more details.
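Because LedgerHandleAdv lets the application choose entry ids, adds can be acknowledged out of order. The LAC is the last entry id up to which every entry is confirmed, so it can only advance over a gap-free prefix. The JDK-only sketch below models just that bookkeeping; ContiguousLac is a hypothetical name, not part of the BookKeeper API, and it performs no writes.

```java
import java.util.TreeSet;

/**
 * Hypothetical model, not the BookKeeper API: with application-assigned entry ids,
 * adds may be acknowledged out of order, but the LastAddConfirmed (LAC) may only
 * cover a gap-free prefix of entry ids starting at 0.
 */
final class ContiguousLac {
    private final TreeSet<Long> ackedAheadOfLac = new TreeSet<>();
    private long lac = -1L; // -1 means no entry is confirmed yet

    /** Records that entryId was acknowledged and advances the LAC over any closed gap. */
    synchronized void onAcked(long entryId) {
        if (entryId <= lac) {
            return; // duplicate ack, or already covered by the LAC
        }
        ackedAheadOfLac.add(entryId);
        // Move the LAC forward while the next expected id has been acknowledged.
        while (!ackedAheadOfLac.isEmpty() && ackedAheadOfLac.first() == lac + 1) {
            lac = ackedAheadOfLac.pollFirst();
        }
    }

    synchronized long lastAddConfirmed() {
        return lac;
    }
}
```

For example, acknowledging entries 1, 0, 3, 2 in that order moves the LAC from -1 to 1 (once 0 arrives) and then to 3 (once 2 fills the gap).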
Long Poll
Long Poll is a main feature that DistributedLog uses to achieve low-latency tailing.
This big feature has been merged back in 4.5.0 and is now available to BookKeeper users.
This feature includes two main changes: one is LastAddConfirmed piggybacking, and the other is a new long poll read API.
The first change piggybacks the latest LastAddConfirmed on the read response, so your LastAddConfirmed is automatically advanced as long as your read traffic continues. This significantly reduces the traffic spent explicitly polling LastAddConfirmed and hence reduces end-to-end latency.
The second change provides a new long poll read API, allowing tailing reads without polling LastAddConfirmed every time after readers exhaust known entries.
Although the long poll API brings great latency improvements to tailing reads, it is still a very low-level primitive. It is still recommended to use a high-level API (e.g. the DistributedLog API) for tailing and streaming use cases.
See Streaming Reads for more details.
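Both changes can be modeled on the reader side: piggybacked LAC values only ever move the cached LAC forward, and a long-poll read parks until the LAC passes the last value the reader saw, instead of polling on a timer. This is a JDK-only sketch; LacTracker and its methods are hypothetical names, not BookKeeper's client API, and it models only the waiting logic, not the wire protocol between clients and bookies.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical reader-side model of LAC piggybacking and long polling (not the BookKeeper API). */
final class LacTracker {
    private long lac = -1L; // guarded by this

    /** Applies an LAC carried on a read response; stale values never move the LAC backwards. */
    synchronized void onPiggybackedLac(long piggybackedLac) {
        if (piggybackedLac > lac) {
            lac = piggybackedLac;
            notifyAll(); // wake any long-poll waiters
        }
    }

    synchronized long lastAddConfirmed() {
        return lac;
    }

    /**
     * Long poll: waits until the LAC moves past previousLac or timeoutMs elapses,
     * then returns the current LAC. An interrupt ends the wait early.
     */
    synchronized long waitForLacBeyond(long previousLac, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (lac <= previousLac) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // timed out: caller sees an unchanged LAC
            }
            try {
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return lac;
    }
}
```

The loop around timedWait guards against spurious wakeups, and comparing against the caller's previousLac (rather than a fixed entry id) is what lets a tailing reader resume exactly where it left off.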
Explicit LAC
Prior to 4.5.0, the LAC was only advanced when subsequent entries were added. If no subsequent entries were added, the last entry written would not be visible to readers until the ledger was closed. High-level clients (e.g. DistributedLog) or applications had to work around this by writing some sort of control record to advance the LAC.
In 4.5.0, a new explicit LAC feature is introduced to periodically advance the LAC if no subsequent entries are added. This feature can be enabled by setting explicitLacInterval to a positive value. The interval is specified in milliseconds (BOOKKEEPER-1007).
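A sketch of the idea, assuming the writer tracks which LAC value has already reached the bookies, either piggybacked on an add or sent explicitly. Each tick, run every explicitLacInterval milliseconds, sends the LAC only if the bookies are behind. ExplicitLacFlusher and the sendExplicitLac callback are hypothetical names, and BookKeeper's own flush logic may differ in detail.

```java
import java.util.function.LongConsumer;

/** Hypothetical writer-side model of explicit LAC flushing (not the BookKeeper API). */
final class ExplicitLacFlusher {
    private final LongConsumer sendExplicitLac; // stand-in for the request sent to bookies
    private long lastAddConfirmed = -1L;
    private long lacKnownToBookies = -1L;

    ExplicitLacFlusher(LongConsumer sendExplicitLac) {
        this.sendExplicitLac = sendExplicitLac;
    }

    /** Each add carries the LAC confirmed so far, so bookies learn it for free. */
    synchronized void onAddSent() {
        lacKnownToBookies = Math.max(lacKnownToBookies, lastAddConfirmed);
    }

    /** An add reached its ack quorum; assumes entries are confirmed in order. */
    synchronized void onAddConfirmed(long entryId) {
        lastAddConfirmed = Math.max(lastAddConfirmed, entryId);
    }

    /** Runs every explicitLacInterval ms; only sends when no add has carried the latest LAC. */
    synchronized void tick() {
        if (lastAddConfirmed > lacKnownToBookies) {
            sendExplicitLac.accept(lastAddConfirmed);
            lacKnownToBookies = lastAddConfirmed;
        }
    }
}
```

In a client, tick() could be driven by ScheduledExecutorService.scheduleAtFixedRate(flusher::tick, interval, interval, TimeUnit.MILLISECONDS). Skipping the send when an add has already carried the LAC keeps the extra traffic to idle periods only.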
Performance
There are a lot of performance-related bug fixes and improvements in 4.5.0. These changes include:
- Upgraded Netty from 3.x to 4.x to leverage buffer pooling and reduce memory copies.
- Moved development from Java 7 to Java 8 to take advantage of Java 8 features.
- A lot of improvements around scheduling and threading on bookies.
- Delay ensemble change to improve tail latency.
- Parallel ledger recovery to improve the recovery speed.
- …
The following four changes are outlined below. For a complete list of performance improvements, please check out the full list of changes at the end.
Netty 4 Upgrade
The major performance improvement introduced in 4.5.0 is the upgrade of Netty from 3.x to 4.x.
For more details, please read the upgrade guide for Netty-related tips when upgrading BookKeeper from 4.4.0 to 4.5.0.
Delay Ensemble Change
Ensemble Change is a feature that Apache BookKeeper uses to achieve high availability. However, it is an expensive metadata operation. This is especially costly when Apache BookKeeper is deployed in a multi-data-center environment, where losing a data center causes a churn of metadata operations due to ensemble changes. Delay Ensemble Change is introduced in 4.5.0 to overcome this problem. Enabling this feature means an Ensemble Change will only occur when clients can’t receive enough valid responses to satisfy the ack-quorum constraint. This feature improves tail latency.
To enable this feature, please set delayEnsembleChange to true on your clients.
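The rule described above can be written as a small predicate. This is a simplified model, assuming the decision depends only on the write quorum, the ack quorum, and how many bookies in an entry's write set have failed; EnsembleChangeDecision is a hypothetical name, and the real client logic is more involved.

```java
/** Hypothetical model of the client's decision when bookies in an entry's write set fail. */
final class EnsembleChangeDecision {
    static boolean needsEnsembleChange(int writeQuorum, int ackQuorum, int failedInWriteSet,
                                       boolean delayEnsembleChange) {
        if (ackQuorum < 1 || ackQuorum > writeQuorum) {
            throw new IllegalArgumentException("require 1 <= ackQuorum <= writeQuorum");
        }
        if (failedInWriteSet < 0 || failedInWriteSet > writeQuorum) {
            throw new IllegalArgumentException("failed bookies must be within the write set");
        }
        if (failedInWriteSet == 0) {
            return false; // nothing to replace
        }
        if (!delayEnsembleChange) {
            return true; // without the feature, any failure triggers an ensemble change
        }
        // With the feature, change only when the healthy bookies can no longer form an ack quorum.
        return writeQuorum - failedInWriteSet < ackQuorum;
    }
}
```

For example, with a write quorum of 3 and an ack quorum of 2, one failed bookie still leaves 2 healthy replicas, so with delayEnsembleChange enabled the ensemble is kept; a second failure drops below the ack quorum and forces the change.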
Parallel Ledger Recovery
BookKeeper clients recover entries one by one during ledger recovery. If a ledger has a very large volume of traffic, it will have a large number of entries to recover when client failures occur. BookKeeper introduces parallel ledger recovery in 4.5.0, which allows batch recovery to improve ledger recovery speed.
To enable this feature, please set enableParallelRecoveryRead to true on your clients. You can also set recoveryReadBatchSize to control the batch size of recovery reads.
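The batching idea can be sketched as follows, assuming recovery reads start right after the last known LAC and continue until an entry is missing. BatchedRecovery is a hypothetical helper, entryExists stands in for a recovery read, and batchSize plays the role of recoveryReadBatchSize. It models only the batching of entries, not how a single read is sent to the bookies in its write set, and it leaves out other recovery steps such as fencing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.LongPredicate;

/** Hypothetical model of batched recovery reads (not the BookKeeper API). */
final class BatchedRecovery {
    /**
     * Reads forward from lac + 1 in batches of batchSize, issuing each batch concurrently,
     * and returns the id of the last entry before the first missing one.
     * entryExists stands in for a recovery read; it must eventually return false.
     */
    static long recover(long lac, int batchSize, LongPredicate entryExists) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        long next = lac + 1;
        while (true) {
            List<CompletableFuture<Boolean>> batch = new ArrayList<>(batchSize);
            for (int i = 0; i < batchSize; i++) {
                final long entryId = next + i;
                batch.add(CompletableFuture.supplyAsync(() -> entryExists.test(entryId)));
            }
            // Check results in entry-id order: recovery ends at the first gap.
            for (int i = 0; i < batchSize; i++) {
                if (!batch.get(i).join()) {
                    return next + i - 1;
                }
            }
            next += batchSize;
        }
    }
}
```

With batchSize 1 this degenerates to the one-by-one behavior; larger batches trade a few wasted reads past the end of the ledger for fewer round trips.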
Multiple Journals
Prior to 4.5.0, bookies could only be configured with one journal device. To get higher write bandwidth, you could RAID multiple disks into one device and mount that device as the journal directory. However, because there was only one journal thread, this approach didn’t actually improve the write bandwidth.
BookKeeper introduces support for multiple journal directories in 4.5.0. Users can configure multiple devices as journal directories.
To enable this feature, please use journalDirectories rather than journalDirectory.
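The benefit of multiple journals comes from each journal directory getting its own writer thread. The sketch below assumes journalDirectories is a comma-separated list and that a journal is picked per ledger (here, ledger id modulo the number of journals) so a ledger's entries stay ordered within one journal. MultiJournalRouter is a hypothetical class, not the bookie's journal implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model: one single-threaded writer per journal directory. */
final class MultiJournalRouter implements AutoCloseable {
    private final List<String> journalDirs;
    private final List<ExecutorService> journalThreads = new ArrayList<>();

    /** Parses a comma-separated journalDirectories value, e.g. "/mnt/j0,/mnt/j1". */
    MultiJournalRouter(String journalDirectories) {
        List<String> dirs = new ArrayList<>();
        for (String dir : journalDirectories.split(",")) {
            if (!dir.trim().isEmpty()) {
                dirs.add(dir.trim());
            }
        }
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("at least one journal directory is required");
        }
        journalDirs = Collections.unmodifiableList(dirs);
        for (int i = 0; i < dirs.size(); i++) {
            journalThreads.add(Executors.newSingleThreadExecutor()); // one writer per journal
        }
    }

    int journalCount() {
        return journalDirs.size();
    }

    /** All entries of a ledger map to the same journal; floorMod keeps the index non-negative. */
    int journalIndex(long ledgerId) {
        return (int) Math.floorMod(ledgerId, (long) journalDirs.size());
    }

    String journalDirFor(long ledgerId) {
        return journalDirs.get(journalIndex(ledgerId));
    }

    /** Queues a write on that journal's own thread, so different journals write in parallel. */
    Future<?> append(long ledgerId, Runnable write) {
        return journalThreads.get(journalIndex(ledgerId)).submit(write);
    }

    @Override
    public void close() {
        journalThreads.forEach(ExecutorService::shutdown);
    }
}
```

Unlike a RAID device behind a single journal thread, each directory here is drained by its own thread, so writes for ledgers mapped to different journals no longer wait on each other.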
Operations
LongHierarchicalLedgerManager
Apache BookKeeper supports pluggable metadata stores. By default, it uses Apache ZooKeeper as its metadata store. Among the ZooKeeper-based ledger manager implementations, HierarchicalLedgerManager is the most popular and widely adopted. However, it has a major limitation: it assumes the ledger id is a 32-bit integer, which limits the number of ledgers to 2^32.
LongHierarchicalLedgerManager is introduced to overcome this limitation.
See Ledger Manager for more details.
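To see why the ledger id width matters, consider how a hierarchical ledger manager turns a ledger id into a ZooKeeper path: the id is zero-padded to a fixed number of digits and split into path levels. A 10-digit layout covers every 32-bit id (the largest, 4294967295, has 10 digits), but a 64-bit id can need 19 digits. The level splits below (2/4/4 and 3/4/4/4/4) and the L prefix on the leaf node reflect our understanding of the two layouts and should be treated as assumptions; LedgerPaths is a hypothetical helper.

```java
import java.util.Locale;

/** Hypothetical helper: maps a ledger id to a hierarchical path under the ledgers root znode. */
final class LedgerPaths {
    static String hierarchicalPath(long ledgerId, int width, int... levelLengths) {
        if (ledgerId < 0) {
            throw new IllegalArgumentException("ledger id must be non-negative");
        }
        int total = 0;
        for (int len : levelLengths) {
            total += len;
        }
        if (total != width) {
            throw new IllegalArgumentException("level lengths must add up to width");
        }
        // Locale.ROOT guarantees ASCII digits regardless of the default locale.
        String digits = String.format(Locale.ROOT, "%0" + width + "d", ledgerId);
        if (digits.length() > width) {
            throw new IllegalArgumentException(ledgerId + " does not fit in " + width + " digits");
        }
        StringBuilder path = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < levelLengths.length; i++) {
            path.append('/');
            if (i == levelLengths.length - 1) {
                path.append('L'); // leaf znode holding the ledger metadata
            }
            path.append(digits, pos, pos + levelLengths[i]);
            pos += levelLengths[i];
        }
        return path.toString();
    }

    /** 10 digits split 2/4/4: enough for 32-bit ledger ids. */
    static String shortPath(long ledgerId) {
        return hierarchicalPath(ledgerId, 10, 2, 4, 4);
    }

    /** 19 digits split 3/4/4/4/4: enough for any non-negative 64-bit ledger id. */
    static String longPath(long ledgerId) {
        return hierarchicalPath(ledgerId, 19, 3, 4, 4, 4, 4);
    }
}
```

For instance, longPath(Long.MAX_VALUE) yields /922/3372/0368/5477/L5807, while shortPath rejects any id that needs more than 10 digits.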
Weight-based placement policy
Rack-Aware and Region-Aware placement policies are the two placement policies available in the BookKeeper client. They place ensembles based on the network topology configured by users. However, they both assume that all bookies are equal, for example that they have the same storage capacity. Weight-based placement is introduced in 4.5.0 to improve the existing placement policies. Weight-based placement is not built as a separate policy; it is built into the existing placement policies.
If you are using Rack-Aware or Region-Aware placement, you can simply enable weight-based placement by setting diskWeightBasedPlacementEnabled to true.
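Weight-based placement can be illustrated with weighted random sampling, where each bookie's chance of being picked is proportional to its weight. The sketch assumes the weight is a bookie's free disk space (suggested by the diskWeightBasedPlacementEnabled name and BOOKKEEPER-950). WeightedBookieSelection is hypothetical, and since BookKeeper applies weighting inside its rack- and region-aware policies, the topology constraints this sketch ignores still apply there.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of weighted bookie selection (not BookKeeper's implementation). */
final class WeightedBookieSelection {
    /**
     * Picks k distinct bookies. Each draw chooses among the remaining bookies with
     * probability proportional to weight, e.g. free disk space in bytes.
     */
    static List<String> select(Map<String, Long> weights, int k, Random random) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : weights.entrySet()) {
            if (e.getValue() > 0) {
                candidates.add(e.getKey()); // zero-weight bookies are never chosen
            }
        }
        if (k < 0 || candidates.size() < k) {
            throw new IllegalArgumentException("not enough bookies with positive weight");
        }
        List<String> chosen = new ArrayList<>(k);
        while (chosen.size() < k) {
            long total = 0;
            for (String bookie : candidates) {
                total += weights.get(bookie);
            }
            // Uniform point in [0, total); the clamp guards against floating-point rounding.
            long point = Math.min((long) (random.nextDouble() * total), total - 1);
            long cumulative = 0;
            for (Iterator<String> it = candidates.iterator(); it.hasNext(); ) {
                String bookie = it.next();
                cumulative += weights.get(bookie);
                if (point < cumulative) {
                    chosen.add(bookie);
                    it.remove(); // draw without replacement
                    break;
                }
            }
        }
        return chosen;
    }
}
```

With weights of 500 and 1500, the second bookie wins the first draw about three times as often, which steers new ledgers toward bookies with more free space.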
Customized Ledger Metadata
In 4.5.0, a Map<String, byte[]> is added to the ledger metadata. Clients can now pass in a key/value map when creating ledgers.
This custom metadata can later be used by a user-defined placement policy, which extends the flexibility of the BookKeeper API.
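As an illustration of how a user-defined placement policy might consume custom metadata, the sketch below stores a tenant name under a hypothetical tenant key and filters candidate bookies by it (the multitenancy use case named in BOOKKEEPER-912). The key, the bookie-to-tenant tags, and TenantAwareFilter are all assumptions; a real policy would implement BookKeeper's EnsemblePlacementPolicy interface, which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example of placement logic driven by custom ledger metadata. */
final class TenantAwareFilter {
    static final String TENANT_KEY = "tenant"; // hypothetical key chosen by the application

    /** Builds the Map<String, byte[]> a client could attach when creating a ledger. */
    static Map<String, byte[]> metadataForTenant(String tenant) {
        Map<String, byte[]> metadata = new HashMap<>();
        metadata.put(TENANT_KEY, tenant.getBytes(StandardCharsets.UTF_8));
        return metadata;
    }

    /** Keeps bookies tagged with the ledger's tenant; ledgers without a tenant may use any bookie. */
    static List<String> eligibleBookies(Map<String, byte[]> customMetadata,
                                        Map<String, String> bookieTenants) {
        byte[] raw = customMetadata.get(TENANT_KEY);
        if (raw == null) {
            return new ArrayList<>(bookieTenants.keySet());
        }
        String tenant = new String(raw, StandardCharsets.UTF_8);
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, String> e : bookieTenants.entrySet()) {
            if (tenant.equals(e.getValue())) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }
}
```

Because metadata values are raw byte[], both sides must agree on an encoding; the sketch fixes UTF-8 explicitly rather than relying on the platform default.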
Add Prometheus stats provider
A new Prometheus stats provider is introduced in 4.5.0. It simplifies metric collection when running BookKeeper on Kubernetes.
Add more tools in BookieShell
BookieShell is the tool provided by Apache BookKeeper to operate clusters. Multiple important tools are introduced in 4.5.0, for example decommissionbookie, expandstorage, lostbookierecoverydelay, and triggeraudit.
For the complete list of commands in BookieShell, please read the BookKeeper CLI tool reference.
Full list of changes
JIRA
Sub-task
- [BOOKKEEPER-552] - 64 Bits Ledger ID Generation
- [BOOKKEEPER-553] - New LedgerManager for 64 Bits Ledger ID Management in ZooKeeper
- [BOOKKEEPER-588] - SSL support
- [BOOKKEEPER-873] - Enhance CreatedLedger API to accept ledgerId as input
- [BOOKKEEPER-949] - Allow entryLog creation even when bookie is in RO mode for compaction
- [BOOKKEEPER-965] - Long Poll: Changes to the Write Path
- [BOOKKEEPER-997] - Wire protocol change for supporting long poll
- [BOOKKEEPER-1017] - Create documentation for ZooKeeper ACLs
- [BOOKKEEPER-1086] - Ledger Recovery - Refactor PendingReadOp
- [BOOKKEEPER-1087] - Ledger Recovery - Add a parallel reading request in PendingReadOp
- [BOOKKEEPER-1088] - Ledger Recovery - Add a ReadEntryListener to callback on individual request
- [BOOKKEEPER-1089] - Ledger Recovery - allow batch reads in ledger recovery
- [BOOKKEEPER-1092] - Ledger Recovery - Add Test Case for Parallel Ledger Recovery
- [BOOKKEEPER-1093] - Piggyback LAC on ReadResponse
- [BOOKKEEPER-1094] - Long Poll - Server and Client Side Changes
- [BOOKKEEPER-1095] - Long Poll - Client side changes
Bug
- [BOOKKEEPER-852] - Release LedgerDescriptor and master-key objects when not used anymore
- [BOOKKEEPER-903] - MetaFormat BookieShell Command is not deleting UnderReplicatedLedgers list from the ZooKeeper
- [BOOKKEEPER-907] - for ReadLedgerEntriesCmd, EntryFormatter should be configurable and HexDumpEntryFormatter should be one of them
- [BOOKKEEPER-908] - Case to handle BKLedgerExistException
- [BOOKKEEPER-924] - addEntry() is susceptible to spurious wakeups
- [BOOKKEEPER-927] - Extend BOOKKEEPER-886 to LedgerHandleAdv too (BOOKKEEPER-886: Allow to disable ledgers operation throttling)
- [BOOKKEEPER-933] - ClientConfiguration always inherits System properties
- [BOOKKEEPER-938] - LedgerOpenOp should use digestType from metadata
- [BOOKKEEPER-939] - Fix typo in bk-merge-pr.py
- [BOOKKEEPER-940] - Fix findbugs warnings after bumping to java 8
- [BOOKKEEPER-952] - Fix RegionAwarePlacementPolicy
- [BOOKKEEPER-955] - in BookKeeperAdmin listLedgers method currentRange variable is not getting updated to next iterator when it has run out of elements
- [BOOKKEEPER-956] - HierarchicalLedgerManager doesn't work for ledgerid of length 9 and 10 because of order issue in HierarchicalLedgerRangeIterator
- [BOOKKEEPER-958] - ZeroBuffer readOnlyBuffer returns ByteBuffer with 0 remaining bytes for length > 64k
- [BOOKKEEPER-959] - ClientAuthProvider and BookieAuthProvider Public API used Protobuf Shaded classes
- [BOOKKEEPER-976] - Fix license headers with "Copyright 2016 The Apache Software Foundation"
- [BOOKKEEPER-980] - BookKeeper Tools doesn't process the argument correctly
- [BOOKKEEPER-981] - NullPointerException in RackawareEnsemblePlacementPolicy while running in Docker Container
- [BOOKKEEPER-984] - BookieClientTest.testWriteGaps tested
- [BOOKKEEPER-986] - Handle Memtable flush failure
- [BOOKKEEPER-987] - BookKeeper build is broken due to the shade plugin for commit ecbb053e6e
- [BOOKKEEPER-988] - Missing license headers
- [BOOKKEEPER-989] - Enable travis CI for bookkeeper git
- [BOOKKEEPER-999] - BookKeeper client can leak threads
- [BOOKKEEPER-1013] - Fix findbugs errors on latest master
- [BOOKKEEPER-1018] - Allow client to select older V2 protocol (no protobuf)
- [BOOKKEEPER-1020] - Fix Explicit LAC tests on master
- [BOOKKEEPER-1021] - Improve the merge script to handle github reviews api
- [BOOKKEEPER-1031] - ReplicationWorker.rereplicate fails to call close() on ReadOnlyLedgerHandle
- [BOOKKEEPER-1044] - Entrylogger is not readding rolled logs back to the logChannelsToFlush list when exception happens while trying to flush rolled logs
- [BOOKKEEPER-1047] - Add missing error code in ZK setData return path
- [BOOKKEEPER-1058] - Ignore already deleted ledger on replication audit
- [BOOKKEEPER-1061] - BookieWatcher should not do ZK blocking operations from ZK async callback thread
- [BOOKKEEPER-1065] - OrderedSafeExecutor should only have 1 thread per bucket
- [BOOKKEEPER-1071] - BookieRecoveryTest is failing due to a Netty4 IllegalReferenceCountException
- [BOOKKEEPER-1072] - CompactionTest is flaky when disks are almost full
- [BOOKKEEPER-1073] - Several stats provider related changes.
- [BOOKKEEPER-1074] - Remove JMX Bean
- [BOOKKEEPER-1075] - BK LedgerMetadata: more memory-efficient parsing of configs
- [BOOKKEEPER-1076] - BookieShell should be able to read the 'FENCE' entry in the log
- [BOOKKEEPER-1077] - BookKeeper: Local Bookie Journal and ledger paths
- [BOOKKEEPER-1079] - shell lastMark throws NPE
- [BOOKKEEPER-1098] - ZkUnderreplicationManager can build up an unbounded number of watchers
- [BOOKKEEPER-1101] - BookKeeper website menus not working under https
- [BOOKKEEPER-1102] - org.apache.bookkeeper.client.BookKeeperDiskSpaceWeightedLedgerPlacementTest.testDiskSpaceWeightedBookieSelectionWithBookiesBeingAdded is unreliable
- [BOOKKEEPER-1103] - LedgerMetadataCreateTest bug in ledger id generation causes intermittent hang
- [BOOKKEEPER-1104] - BookieInitializationTest.testWithDiskFullAndAbilityToCreateNewIndexFile testcase is unreliable
Improvement
- [BOOKKEEPER-612] - RegionAwarePlacement Policy
- [BOOKKEEPER-748] - Move fence requests out of read threads
- [BOOKKEEPER-757] - Ledger Recovery Improvement
- [BOOKKEEPER-759] - bookkeeper: delay ensemble change if it doesn't break ack quorum requirement
- [BOOKKEEPER-772] - Reorder read sequnce
- [BOOKKEEPER-874] - Explict LAC from Writer to Bookies
- [BOOKKEEPER-881] - upgrade surefire plugin to 2.19
- [BOOKKEEPER-887] - Allow to use multiple bookie journals
- [BOOKKEEPER-922] - Create a generic (K,V) map to store ledger metadata
- [BOOKKEEPER-935] - Publish sources and javadocs to Maven Central
- [BOOKKEEPER-937] - Upgrade protobuf to 2.6
- [BOOKKEEPER-944] - Multiple issues and improvements to BK Compaction.
- [BOOKKEEPER-945] - Add counters to track the activity of auditor and replication workers
- [BOOKKEEPER-946] - Provide an option to delay auto recovery of lost bookies
- [BOOKKEEPER-961] - Assing read/write request for same ledger to a single thread
- [BOOKKEEPER-962] - Add more journal timing stats
- [BOOKKEEPER-963] - Allow to use multiple journals in bookie
- [BOOKKEEPER-964] - Add concurrent maps and sets for primitive types
- [BOOKKEEPER-966] - change the bookieServer cmdline to make conf-file and option co-exist
- [BOOKKEEPER-968] - Entry log flushes happen on log rotation and cause long spikes in IO utilization
- [BOOKKEEPER-970] - Bump zookeeper version to 3.5
- [BOOKKEEPER-971] - update bk codahale stats provider version
- [BOOKKEEPER-998] - Increased the max entry size to 5MB
- [BOOKKEEPER-1001] - Make LocalBookiesRegistry.isLocalBookie() public
- [BOOKKEEPER-1002] - BookieRecoveryTest can run out of file descriptors
- [BOOKKEEPER-1003] - Fix TestDiskChecker so it can be used on /dev/shm
- [BOOKKEEPER-1004] - Allow bookie garbage collection to be triggered manually from tests
- [BOOKKEEPER-1007] - Explicit LAC: make the interval configurable in milliseconds instead of seconds
- [BOOKKEEPER-1008] - Move to netty4
- [BOOKKEEPER-1010] - Bump up Guava version to 20.0
- [BOOKKEEPER-1022] - Make BookKeeperAdmin implement AutoCloseable
- [BOOKKEEPER-1039] - bk-merge-pr.py ask to run findbugs and rat before merge
- [BOOKKEEPER-1046] - Avoid long to Long conversion in OrderedSafeExecutor task submit
- [BOOKKEEPER-1048] - Use ByteBuf in LedgerStorageInterface
- [BOOKKEEPER-1050] - Cache journalFormatVersionToWrite when starting Journal
- [BOOKKEEPER-1051] - Fast shutdown for GarbageCollectorThread
- [BOOKKEEPER-1052] - Print autorecovery enabled or not in bookie shell
- [BOOKKEEPER-1053] - Upgrade RAT maven version to 0.12 and ignore Eclipse project files
- [BOOKKEEPER-1055] - Optimize handling of masterKey in case it is empty
- [BOOKKEEPER-1056] - Removed PacketHeader serialization/deserialization allocation
- [BOOKKEEPER-1063] - Use executure.execute() instead of submit() to avoid creation of unused FutureTask
- [BOOKKEEPER-1066] - Introduce GrowableArrayBlockingQueue
- [BOOKKEEPER-1068] - Expose ByteBuf in LedgerEntry to avoid data copy
- [BOOKKEEPER-1069] - If client uses V2 proto, set the connection to always decode V2 messages
- [BOOKKEEPER-1083] - Improvements on OrderedSafeExecutor
- [BOOKKEEPER-1084] - Make variables finale if necessary
- [BOOKKEEPER-1085] - Introduce the AlertStatsLogger
- [BOOKKEEPER-1090] - Use LOG.isDebugEnabled() to avoid unexpected allocations
- [BOOKKEEPER-1096] - When ledger is deleted, along with leaf node all the eligible branch nodes also should be deleted in ZooKeeper.
New Feature
- [BOOKKEEPER-390] - Provide support for ZooKeeper authentication
- [BOOKKEEPER-391] - Support Kerberos authentication of bookkeeper
- [BOOKKEEPER-575] - Bookie SSL support
- [BOOKKEEPER-670] - Longpoll Read & Piggyback Support
- [BOOKKEEPER-912] - Allow EnsemblePlacementPolicy to choose bookies using ledger custom data (multitenancy support)
- [BOOKKEEPER-928] - Add custom client supplied metadata field to LedgerMetadata
- [BOOKKEEPER-930] - Option to disable Bookie networking
- [BOOKKEEPER-941] - Introduce Feature Switches For controlling client and server behavior
- [BOOKKEEPER-948] - Provide an option to add more ledger/index directories to a bookie
- [BOOKKEEPER-950] - Ledger placement policy to accomodate different storage capacity of bookies
- [BOOKKEEPER-969] - Security Support
- [BOOKKEEPER-983] - BookieShell Command for LedgerDelete
- [BOOKKEEPER-991] - bk shell - Get a list of all on disk files
- [BOOKKEEPER-992] - ReadLog Command Enhancement
- [BOOKKEEPER-1019] - Support for reading entries after LAC (causal consistency driven by out-of-band communications)
- [BOOKKEEPER-1034] - When all disks are full, start Bookie in RO mode if RO mode is enabled
- [BOOKKEEPER-1067] - Add Prometheus stats provider
Story
- [BOOKKEEPER-932] - Move to JDK 8
Task
- [BOOKKEEPER-931] - Update the committers list on website
- [BOOKKEEPER-996] - Apache Rat Check Failures
- [BOOKKEEPER-1012] - Shade and relocate Guava
- [BOOKKEEPER-1027] - Cleanup main README and main website page
- [BOOKKEEPER-1038] - Fix findbugs warnings and upgrade to 3.0.4
- [BOOKKEEPER-1043] - Upgrade Apache Parent Pom Reference to latest version
- [BOOKKEEPER-1054] - Add gitignore file
- [BOOKKEEPER-1059] - Upgrade to SLF4J-1.7.25
- [BOOKKEEPER-1060] - Add utility to use SafeRunnable from Java8 Lambda
- [BOOKKEEPER-1070] - bk-merge-pr.py use apache-rat:check goal instead of rat:rat
- [BOOKKEEPER-1091] - Remove Hedwig from BookKeeper website page
Test
- [BOOKKEEPER-967] - Create new testsuite for testing RackAwareEnsemblePlacementPolicy using ScriptBasedMapping.
- [BOOKKEEPER-1045] - Execute tests in different JVM processes
- [BOOKKEEPER-1064] - ConcurrentModificationException in AuditorLedgerCheckerTest
- [BOOKKEEPER-1078] - Local BookKeeper enhancements for testability
- [BOOKKEEPER-1097] - GC test when no WritableDirs
Wish
- [BOOKKEEPER-943] - Reduce log level of AbstractZkLedgerManager for register/unregister ReadOnlyLedgerHandle