B-Trees
B-trees are a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. They are a generalization of binary search trees and are optimized for systems that read and write large blocks of data, such as disk drives.
Key Characteristics
- Order (m)
- Each node (except the root) can have between
m/2andmchildren, and must contain betweenm/2 - 1andm - 1keys. The root can have at least 2 children. - Keys and Children
- A node with
kkeys hask+1children. Keys within a node are stored in sorted order. - Balancing
- All leaf nodes are at the same depth. This ensures that the tree remains balanced and operations have a consistent logarithmic time complexity.
- Disk Optimization
- B-trees are designed to minimize disk I/O operations. The high branching factor means the tree is wide and shallow, reducing the number of nodes that need to be read from disk.
Operations
Insertion
Insertion starts by searching for the appropriate leaf node. If the node has space, the key is inserted. If the node is full, it is split into two nodes, and the middle key is promoted to the parent node. This splitting process can propagate up the tree, potentially creating a new root.
// Conceptual Insertion Logic
function insert(tree, key) {
let node = findLeafNode(tree, key);
if (!node.isFull()) {
insertKeyIntoNode(node, key);
} else {
splitNodeAndPropagate(tree, node, key);
}
}
Deletion
Deletion involves finding the key. If the key is in a leaf node, it's removed. If it's in an internal node, it's replaced by its in-order predecessor or successor (from a leaf node), which is then deleted from the leaf. If a node becomes underfull (fewer than m/2 - 1 keys) after deletion, it needs to be rebalanced by borrowing a key from a sibling or by merging with a sibling. This can also propagate up the tree.
// Conceptual Deletion Logic
function deleteKey(tree, key) {
let node = findNodeContainingKey(tree, key);
if (node.isLeaf()) {
removeKeyFromLeaf(node, key);
} else {
replaceWithSuccessor(tree, node, key);
}
if (node.isUnderfull()) {
rebalance(tree, node);
}
}
Search
Searching in a B-tree is similar to a binary search tree but is performed within each node. Starting from the root, the appropriate child to traverse is determined by comparing the search key with the keys in the current node. This process continues until the key is found or a leaf node is reached without finding the key.
// Conceptual Search Logic
function search(node, key) {
let i = 0;
while (i < node.keys.length && key > node.keys[i]) {
i++;
}
if (i < node.keys.length && key === node.keys[i]) {
return true; // Found
}
if (node.isLeaf()) {
return false; // Not found
}
return search(node.children[i], key); // Recurse into child
}
Applications
- Database indexing (e.g., MySQL, PostgreSQL)
- File systems (e.g., NTFS, HFS+)
- Efficient storage and retrieval of large datasets on disk.