panthema / 2010 / stx-cbtreedb / stx-cbtreedb-0.7.0 / mainpage.dox (Download File)
// -*- mode: c++; mode: flyspell; fill-column: 79 -*-
// $Id: mainpage.dox 5 2010-04-14 09:57:44Z tb $

/*
 * STX Constant B-Tree Database Template Classes v0.7.0
 * Copyright (C) 2010 Timo Bingmann
 *
 * This library is free software; you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General Public License as published by the
 * Free Software Foundation; either version 2.1 of the License, or (at your
 * option) any later version.
 *
 * This library is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 * for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this library; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/** \mainpage stx::CBTreeDB - STX Constant B-Tree Database Template Classes README

\author Timo Bingmann (Mail: tb a-with-circle idlebox dot net)
\date 2010-04-14

\section sec_summary Summary

The stx::CBTreeDB is a collection of C++ classes with which read-only key-value
database files can be created and read. A database efficiently maps a large
number of integral fixed-length keys to opaque binary value blobs.
Variable-length or duplicate keys are currently not supported. Keys are
organized into a highly compact index structure, which is very similar to a
B-tree and allows very fast key lookups. Both keys and values are stored in
order and thus queries in a local proximity can benefit from caching effects.
All applications mapping a large number of constant, integral keys to string or
data blobs can benefit from this library.

Key features of the database classes are:
- key locality due to ordered storage of keys and values
- lookups in close proximity benefit from caching effects
- fixed-length integral keys
- opaque binary blobs as values
- compact B-tree-like index structure
- sequential storage of value data with minimal overhead
- O(1) page cache to keep hot pages in memory
- simple C++ API for \ref stx::CBTreeDB::Reader "Reader" and two \ref stx::CBTreeDB::Writer "Writer" classes
- CRC32 checksum of header and SHA256 digest of b-tree and value data for integrity
- extensive test suite with 91.3% line coverage

The complete source code is contained in the header file
include/stx-cbtreedb.h.

\section sec_website Website / API Docs / Bugs / License

The current source package can be downloaded from
http://idlebox.net/2010/stx-cbtreedb/

The classes are extensively documented using doxygen. The compiled doxygen HTML
documentation is included in the source package or can be viewed online at
http://idlebox.net/2010/stx-cbtreedb/stx-cbtreedb-0.7.0/doxygen-html/ (if you
are not reading it right now).

The tools directory contains an example1.cc, which is further documented below
(\ref sec_example) and an all-purpose <tt>cbtreedb-tool</tt> for use with
standard database formats.

If bugs should become known they will be posted on the above web page together
with patches or corrected versions.

The B-tree source code is released under the GNU Lesser General Public License
v2.1 (LGPL) which can be found in the file COPYING. Other parts of the source
code were copied from the Botan library and are under a BSD-license.

\section sec_application Original Application

The original purpose of this library is to organize lookups of 32-bit integer
identifiers to constant data blobs. In my application a few million
non-sequential identifier keys are mapping to data blobs, which are unalterable
short strings or small (1-10 kb) data files. Most key lookups occur in close
proximity, usually in ascending order.

The resulting data volume is around 20 GB in size and was previously stored
using the BerkeleyDB library. However, since the key and values in my
application are read-only, the overhead both in file size and processing time
introduced by BerkeleyDB became unacceptable. Using stx::CBTreeDB access to the
value records is faster than before, and file sizes were reduced to a minimum
due to the compact sequential storage in the read-only databases.

An alternative to the B-tree database classes are the well-known cdb or tinycdb
libraries. However, these are basically hash tables and thus do not preserve
key locality. Thus retrieval of 10 keys in ascending order requires 10 disk
accesses at pseudo-random places in the database. With the B-tree library the
disk areas read are stored in ascending order, just like the keys.

\section sec_architecture Design Principles and Database Architecture

Most design principles follow directly from the intended application. 

Maybe the most important design aspect was to store the data blobs without
separation in the most compact way possible. Thus the read-only database file
simply stores all data blobs in sequence, without putting them onto data pages
or similar overhead typical of a read-write database.

To accelerate key lookups an index structure very similar to a B-tree is
prepended to the data area. This "packed, sequential" B-tree is constructed by
the Writer classes and only differs from the original B-tree in one aspect. The
nodes of each level containing the highest keys may be less than half full. The
basic idea of the "packed, sequential" B-tree is to use only full nodes and
creating inner nodes and levels as needed. All nodes except the one with the
highest key are full! Thus the number of nodes used by the search tree is
minimal and all lookup properties of the B-tree are retained. A drawing of the
B-tree structure can be seen below:

\htmlonly
<div style="text-align: center">
<p>Structure of the packed, sequential B-tree</p>
<object type="image/svg+xml" data="drawing-2.svg" style="width: 90%; max-height: 20em"><p>SVG viewer required</p></object>
</div>
\endhtmlonly

All B-tree nodes are stored "in order" after the database file's header. The
root node is always first, followed by all nodes of the next level, which in
turn are followed by the next level until the leaf level is reached.

See the corresponding structures for the data fields in these nodes: \ref
stx::CBTreeDB::InnerNode and \ref stx::CBTreeDB::LeafNode. These structures use
some less obvious optimizations to further reduce overhead:

The in order storage of all B-tree nodes makes saving of offsets for each child
node of an inner node unnecessary: the n+1 children nodes are stored
consecutively starting after the first node. Thus the page offset of a child
node can be calculated from only the first node's offset and the child number.
This optimization removes about half of ordinarily needed fields in an inner
node (\ref stx::CBTreeDB::InnerNode) and allows a higher fan-out.

Each leaf node (\ref stx::CBTreeDB::LeafNode) contains both an array of keys
and corresponding data offsets. For size optimization all data offsets are
32-bit values relative to a base 64-bit starting offset. This reduces the size
of the offset array and allows more keys to fit into a leaf node. This also
imposes a restriction on data size: the sum of all data handled by one leaf
must not exceed 2^32. This constraint is currently unchecked by the library,
but should not occur in practice. Data sizes are calculated from subsequent
data offsets. Furthermore, each leaf node contains one more offset number than
strictly necessary: the offset of the data item following the highest key in
the leaf. This offset is used to calculate the data size for the highest key
without requiring retrieval of the following leaf.

\section sec_implementation Implementation Overview

All classes of the cbtreedb library are enclosed in the top-level template
class \ref stx::CBTreeDB. This class is parametrized by two types and two
integer values. See the \ref stx::CBTreeDB "class documentation" for more
information on the template parameters.

The library contains two writer classes, which are used to create read-only
databases from a set of key-value pairs. The difference between the two classes
is the order in which key-value pairs are delivered to the class. The
stx::CBTreeDB::Writer allows keys to be added in <b>random order</b> to the
internal std::map. When finished the B-tree is constructed and the whole set is
written to the database in one function call.

The obvious problem with Writer is that with large databases all key-value data
must be stored in memory until written. This prohibits use of this writer class
for very large databases, which are a main goal of the library. For purpose of
writing large databases the sequential writer class \ref
stx::CBTreeDB::WriterSequential can be used. With this writer the key sequence
must be delivered <b>in ascending order</b> and the database is constructed in
two phases: in the first phase key and value-lengths are <i>declared</i> and
the B-tree is constructed, and in the second phase the key-value pairs are
<i>delivered</i> and written directly to the file with no extra buffering.

Note that both writer classes create _identical_ databases for equal input
sets.

There is only one \ref stx::CBTreeDB::Reader class. The Reader class itself is
a pointer implementation to a referenced counted implementation object
(stx::CBTreeDB::ReaderImpl), so you can easily copy Reader objects. Note
however, that non of the classes are reentrent or thread-safe. For
multi-threaded applications access must be guarded by mutexes, patches welcome.

Each Reader object may optionally have an associated \ref
stx::CBTreeDB::PageCache object, which is then used to cache hot B-tree pages
like the root. The PageCache contains the most recently used pages (LRU
replacement strategy) up to the maximum number specified. It features a
structure allowing O(1) Store() and Retrieve() functions. A PageCache object
can be shared between different database Readers. For more information see the
corresponding documentation page.

A Reader object can load exactly one database using the \ref
stx::CBTreeDB::Reader::Open() "Open()" function. Note that the class template
parameters must match those used to write the database. Some parameters are
checked by the \ref stx::CBTreeDB::Reader::Open() "Open()" function and
appropriate error messages are returned.

Once opened the database can be queried using \ref
stx::CBTreeDB::Reader::Exists() "Exists()", \ref
stx::CBTreeDB::Reader::Lookup() "Lookup()", \ref
stx::CBTreeDB::Reader::GetIndex() "GetIndex()" and the operator[] functions as
if it were a map.

\section sec_testsuite Test Suite

The B-tree distribution contains an extensive test suite. According to gcov
91.9% of the stx-cbtreedb.h implementation is covered. The remaining lines are
all copies or unimportant.

\section sec_example Example Usage

These example snippets are based on code from \ref tools/example1.cc

Most programs will first use a typedef to specify the template parameters of
stx::CBTreeDB. In this example simple uint32_t keys are used in ascending
order:

\code
// declare cbtreedb parameters
typedef stx::CBTreeDB< uint32_t, std::less<uint32_t> > cbtreedb;
\endcode

A read-only database can be created using stx::CBTreeDB::Writer as follows:

\code
const unsigned int items = 64;

// create random order writer object
cbtreedb::Writer writer;

// add some key-values into writer map
for(unsigned int i = 0; i < items; ++i)
{
    std::ostringstream oss;
    oss << "value " << i;

    writer.Add(i * i, oss.str());
}

// write out database via ofstream
std::ofstream testdb("example1.db");
writer.Write(testdb);
\endcode

Once created the database file can be opened and read by a
stx::CBTreeDB::Reader object. Note that the file ifstream must exist as long as
the Reader is used.

\code
// create reader object
cbtreedb::Reader reader;

// set up page cache to keep hot pages in memory
cbtreedb::PageCache cache(128);
reader.SetPageCache(&cache);

// attach an ifstream to the reader and open db
std::ifstream testdb("example1.db");
std::string errorstring;
if (!reader.Open(testdb, &errorstring))
{
    std::cout << "Error loading database: " << errorstring << std::endl;
    return -1;
}	   
\endcode

The key-value pairs in the database can then be accessed by different
functions.

\code
// pick out some items via operator[]
std::cout << "sqrt(2500) -> " << reader[2500] << std::endl;
std::cout << "sqrt(2501) -> " << reader[2501] << std::endl;
std::cout << "sqrt(2601) -> " << reader[2601] << std::endl;

// check existance of keys
std::cout << "isSquare(2704) -> " << reader.Exists(2704) << std::endl;
std::cout << "isSquare(2705) -> " << reader.Exists(2705) << std::endl;

// full lookup function
std::string out;

if (reader.Lookup(2809, out))
    std::cout << "Lookup 2809 -> " << out << " (was found.)" << std::endl;
else
    std::cout << "Lookup 2809 has failed." << std::endl;
\endcode

All pairs can also be enumerated using the GetIndex() functions:

\code
// iterate through all items in the database
std::cout << "Full listing:" << std::endl;
for(unsigned int i = 0; i < reader.Size(); ++i)
{
    uint32_t key;
    std::string value;
    
    reader.GetIndex(i, key, value);

    std::cout << "key " << key << " -> " << value << ", ";
}
std::cout << std::endl;
\endcode

Data integrity is protected by the database format using CRC32 and SHA256
checksums of both B-tree keys and data area. The full database check is rather
expensive and can be run using the Verify() function.

\code
// run all internal checks
if (!reader.Verify())
{
    std::cout << "Database integrity failed!" << std::endl;
}
\endcode

When no longer needed the Reader object can either be destroyed or Close() can
explicitly be called to release the file association.

\code
// release file stream
reader.Close();
\endcode

Alternatively, the sequential order writer stx::CBTreeDB::WriterSequential can
be used to create the same database. Note that the value data is not necessary
during phase 1.

\code
const unsigned int items = 64;

// create sequential order writer object
cbtreedb::WriterSequential writer;

// phase 1: declare key and values lengths
for(unsigned int i = 0; i < items; ++i)
{
    writer.Add(i*i, strlen("value ") + (i > 0 ? (log10(i) + 1) : 1));
}

// write out header and B-tree to database via ofstream
std::ofstream testdb("example1.db");
writer.WriteHeader(testdb);

// phase 2: deliver key and value data
for(unsigned int i = 0; i < items; ++i)
{
    std::ostringstream oss;
    oss << "value " << i;

    writer.WriteValue(i*i, oss.str());
}

// finalize database by updating signature page
writer.WriteFinalize();
\endcode

*/

// LocalWords: stx CBTreeDB cbtreedb doxygen online lookups checksum