Skip to content

Commit

Permalink
readme update
Browse files Browse the repository at this point in the history
  • Loading branch information
mkirchner committed Jul 10, 2023
1 parent 01c2d31 commit f1683bd
Showing 1 changed file with 45 additions and 92 deletions.
137 changes: 45 additions & 92 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,23 @@
# libhamt

A hash array-mapped trie (HAMT) implementation in C99. The implementation
loosely follows Bagwell's 2000 paper[[1]][bagwell_00_ideal], with a focus
on code clarity.
*FIXME: Add removal and persistence docs.*

A HAMT is a data structure that can be used to efficiently implement
[*persistent*][wiki_persistent_data_structure] associative arrays (aka maps)
and sets, see the [Introduction](#introduction).

The original motivation for this effort was the desire to understand and
implement an efficient persistent data structure with structural sharing for
maps and sets for [my own Lisp implementation][stutter].
A hash array-mapped trie (HAMT) implementation in C99. A HAMT is a data
structure that can be used to efficiently implement
[*persistent*][wiki_persistent_data_structure] associative arrays (aka maps,
dicts) and sets, see the [Introduction](#introduction). The implementation here
loosely follows Bagwell's 2000 paper[[1]][bagwell_00_ideal], with a focus on
code clarity.

What prompted the somewhat detailed writeup was the realization that there is
not a lot of in-depth documentation for HAMTs beyond the original Bagwell
paper[[1][bagwell_00_ideal]] Some of the more helpful posts are [Karl Krukow's
paper[[1][bagwell_00_ideal]]. Some of the more helpful posts are [Karl Krukow's
intro to Clojure's `PersistentHashMap`][krukov_09_understanding], [C. S. Lim's
C++ template implementation][chaelim_hamt], and [Adrian Coyler's morning paper
post][coyler_15_champ] on compressed HAMTs. There is more, but it's all in bits
and pieces. This is an attempt to (partially) improve that situation.
C++ template implementation][chaelim_hamt], [Adrian Coyler's morning paper
post][coyler_15_champ] and the original [Steindoerfer/Vinju compressed HAMT
article it summarizes][steindoerfer_15_optimizing]. The rest mostly seems to be
all bits and pieces and this document is an attempt to (partially) improve that
situation.

## Quickstart

Expand All @@ -40,43 +39,6 @@ For basic performance comparison with AVL and red-black trees (from `libavl`)
and the HashTree from GLib, see [the benchmarking repo][hamt_bench_github].



## Table of Contents

* [Introduction](#introduction)
* [API](#api)
* [HAMT lifecycle](#hamt-lifecycle)
* [Memory management](#memory-management)
* [Query](#query)
* [Iterators](#iterators)
* [Modification: Insertion & Removal](#modification-insertion--removal)
* [Using the HAMT as an efficient persistent data structure](#using-the-hamt-as-an-efficient-persistent-data-structure)
* [Examples](#examples)
* [Example 1: ephemeral HAMT w/ standard allocation](#example-1-ephemeral-hamt-w-standard-allocation)
* [Example 2: Changes required for garbage collection and persistence](#example-2-changes-required-for-garbage-collection-and-persistence)
* [Example 3: Using iterators](#example-3-using-iterators)
* [Implementation](#implementation)
* [Setup](#setup)
* [Project structure](#project-structure)
* [Building the project](#building-the-project)
* [Design](#design)
* [Foundational data structures](#foundational-data-structures)
* [Hashing](#hashing)
* [Hash exhaustion: hash generations and state management](#hash-exhaustion-hash-generations-and-state-management)
* [Table management](#table-management)
* [Putting it all together](#putting-it-all-together)
* [Search](#search)
* [Insert](#insert)
* [Remove](#remove)
* [Iterators](#iterators-1)
* [Persistent data structures and structural sharing](#persistent-data-structures-and-structural-sharing)
* [Basic idea: path copying](#basic-idea-path-copying)
* [Insert](#insert-1)
* [Remove](#remove-1)
* [Appendix](#appendix)
* [Unit testing](#unit-testing)
* [Footnotes](#footnotes)

# Introduction

A *hash array mapped trie (HAMT)* is a data structure that can be used to
Expand All @@ -95,35 +57,34 @@ versions of maps and sets.

The remaining documentation starts with a description of the `libhamt` API and
two examples that demonstrate the use of a HAMT as an ephemeral and persistent
data structure, respectively. I then detail the implementation: starting from
data structure, respectively. It then details the implementation: starting from
the foundational data structures and the helper code required for hash
exhaustion and table management, we cover search, insertion, removal, and
iterators. The final implementation section introduces path copying and explains
the changes required to support persistent insert and remove operations. We
close with an outlook and an appendix.
the changes required to support persistent insert and remove operations. It
closes with an outlook and an appendix.

# API

## HAMT lifecycle

The core data type exported in the `libhamt` interface is `HAMT`. In order to
create a `HAMT` instance, one must call `hamt_create()`, which requires a
The core data type exported in the `libhamt` interface is `struct hamt`. In order to
create a `struct hamt` instance, one must call `hamt_create()`, which requires a
hash function of type `hamt_key_hash_fn` to hash keys, a comparison function of
type `hamt_cmp_fn` to compare keys, and a pointer to a `hamt_allocator` instance.
`hamt_delete()` deletes `HAMT` instances that were created with `hamt_create()`.
`hamt_delete()` deletes `struct hamt` instances that were created with `hamt_create()`.


```c
/* The libhamt core data structure is a handle to a hash array-mapped trie */
typedef struct hamt_impl *HAMT;

/* Function signature definitions for key comparison and hashing */
typedef int (*hamt_cmp_fn)(const void *lhs, const void *rhs);
typedef uint32_t (*hamt_key_hash_fn)(const void *key, const size_t gen);

/* API functions for lifecycle management */
HAMT hamt_create(hamt_key_hash_fn key_hash, hamt_cmp_fn key_cmp, struct hamt_allocator *ator);
void hamt_delete(HAMT);
struct hamt *hamt_create(hamt_key_hash_fn key_hash, hamt_cmp_fn key_cmp, struct hamt_allocator *ator);
void hamt_delete(struct hamt *);
```
The `hamt_key_hash_fn` takes a `key` and a generation `gen`. The expectation is
Expand Down Expand Up @@ -162,8 +123,8 @@ example](#example-2-garbage-collected-persistent-hamts)).
## Query

```c
size_t hamt_size(const HAMT trie);
const void *hamt_get(const HAMT trie, void *key);
size_t hamt_size(const struct hamt *trie);
const void *hamt_get(const struct hamt *trie, void *key);
```
The `hamt_size()` function returns the size of the HAMT in O(1). Querying the
Expand All @@ -175,8 +136,8 @@ not exist in the HAMT.
The API also provides key/value pair access through the `hamt_iterator` struct.
```c
size_t hamt_size(const HAMT trie);
const void *hamt_get(const HAMT trie, void *key);
size_t hamt_size(const struct hamt *trie);
const void *hamt_get(const struct hamt *trie, void *key);
```

Iterators are tied to a specific HAMT and are created using the
Expand All @@ -190,7 +151,7 @@ key/value pair. In order to delete an existing and/or exhausted iterator, call
```c
typedef struct hamt_iterator_impl *hamt_iterator;

hamt_iterator hamt_it_create(const HAMT trie);
hamt_iterator hamt_it_create(const struct hamt *trie);
void hamt_it_delete(hamt_iterator it);
bool hamt_it_valid(hamt_iterator it);
hamt_iterator hamt_it_next(hamt_iterator it);
Expand All @@ -217,8 +178,8 @@ of hash function and (where applicable) seed.
### Ephemeral modification
```c
const void *hamt_set(HAMT trie, void *key, void *value);
void *hamt_remove(HAMT trie, void *key);
const void *hamt_set(struct hamt *trie, void *key, void *value);
void *hamt_remove(struct hamt *trie, void *key);
```

`hamt_set()` takes a pair of `key` and `value` pointers and adds the pair to the HAMT,
Expand All @@ -237,7 +198,7 @@ modificiation functions return a new HAMT. Modification of a persistent HAMT
therefore requires a reassignment idiom if the goal is modification only:

```c
HAMT h = hamt_create(...)
const struct hamt *h = hamt_create(...)
...
/* Set a value and drop the reference to the old HAMT; the GC
* will take care of cleaning up remaining unreachable allocations.
Expand All @@ -251,8 +212,8 @@ sharing such that the overhead is limited to *~log<sub>32</sub>(N)* nodes (where
number of nodes in the graph).

```c
const HAMT hamt_pset(const HAMT trie, void *key, void *value);
const HAMT hamt_premove(const HAMT trie, void *key);
const struct hamt *hamt_pset(const struct hamt *trie, void *key, void *value);
const struct hamt *hamt_premove(const struct hamt *trie, void *key);
```
`hamt_pset()` inserts or updates the `key` with `value` and returns an opaque
Expand Down Expand Up @@ -297,7 +258,7 @@ int main(int argn, char *argv[])
/* ... */
};
HAMT t;
struct hamt *t;
/* create table */
t = hamt_create(hash_string, strcmp, &hamt_allocator_default);
Expand Down Expand Up @@ -347,7 +308,7 @@ int main(int argc, char *argv[]) {
NULL to avoid explicit freeing of memory.
*/
struct hamt_allocator gc_alloc = {GC_malloc, GC_realloc, nop};
t = hamt_create(hash_string, strcmp, &gc_alloc);
const struct hamt *t = hamt_create(hash_string, strcmp, &gc_alloc);
...
}
```
Expand All @@ -368,7 +329,7 @@ iterator is valid. In every interation we print the current key/value pair to
```c
...
HAMT t = hamt_create(hash_string, strcmp, &hamt_allocator_default);
struct hamt *t = hamt_create(hash_string, strcmp, &hamt_allocator_default);
/* load table */
...
Expand Down Expand Up @@ -427,15 +388,6 @@ To build the library and run the tests:
$ make && make test
```

And, optionally, to run the performance tests:

```
$ make perf
```

The latter requires a somewhat current Python 3 installation with
`matplotlib` and `pandas` packages for graphing.

## Design

### Introduction
Expand Down Expand Up @@ -839,8 +791,8 @@ and expects users to provide a suitable function pointer as part of the call to
```c
/* ... see below for a practical definition of my_keyhash_string */

HAMT t = hamt_create(my_keyhash_string, my_keycmp_string,
&hamt_allocator_default);
struct hamt *t = hamt_create(my_keyhash_string, my_keycmp_string,
&hamt_allocator_default);
```

There are multiple [good, practical choices][why_simple_hash_functions_work]
Expand Down Expand Up @@ -888,8 +840,8 @@ static uint32_t my_keyhash_string(const void *key, const size_t gen)

/* ... */

HAMT t = hamt_create(my_keyhash_string, my_keycmp_string,
&hamt_allocator_default);
struct hamt *t = hamt_create(my_keyhash_string, my_keycmp_string,
&hamt_allocator_default);

```
Expand Down Expand Up @@ -937,7 +889,7 @@ static inline hash_state *hash_next(hash_state *h)
{
h->depth += 1;
h->shift += 5;
if (h->shift > 30) {
if (h->shift > 25) {
h->hash = h->hash_fn(h->key, h->depth / 5);
h->shift = 0;
}
Expand Down Expand Up @@ -1442,7 +1394,7 @@ and attempts to find (and return) a key/value pair specified by `key`. Its
implementation uses `search_recursive()` from above:

```c
const void *hamt_get(const HAMT trie, void *key)
const void *hamt_get(const struct hamt *trie, void *key)
{
hash_state *hash = &(hash_state){.key = key,
.hash_fn = trie->key_hash,
Expand Down Expand Up @@ -1477,7 +1429,7 @@ entries that may have had the same key but a different value.
The internal function that implements this behavior is `set()`:
```c
static const hamt_node *set(HAMT h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
static const hamt_node *set(struct hamt *h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
hamt_cmp_fn cmp_fn, void *key, void *value)
```

Expand All @@ -1494,7 +1446,7 @@ pair can be placed correctly. Cases (2) and (3) are covered by the
`insert_kv()` and `insert_table()` helper functions, respectively.

```c
static const hamt_node *set(HAMT h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
static const hamt_node *set(struct hamt *h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
hamt_cmp_fn cmp_fn, void *key, void *value)
{
hash_state *hash = &(hash_state){.key = key,
Expand Down Expand Up @@ -1630,7 +1582,7 @@ The implementation of the external API for inserting and updating values in
the HAMT is straighforward:
```c
const void *hamt_set(HAMT trie, void *key, void *value)
const void *hamt_set(struct hamt *trie, void *key, void *value)
{
const hamt_node *n =
set(trie, trie->root, trie->key_hash, trie->key_cmp, key, value);
Expand All @@ -1648,7 +1600,7 @@ a pointer to the value of the new key.

## Persistent data structures and structural sharing

### Basic idea: path copying
### Path copying

### Insert

Expand Down Expand Up @@ -1825,6 +1777,7 @@ summary or [here][c_templating] for a more in-depth treatise.
[python_dict_pre36]: https://stackoverflow.com/a/9022835
[python_dictobj]: https://github.com/python/cpython/blob/main/Objects/dictobject.c
[sedgewick_11_algorithms]: https://www.amazon.com/Algorithms-4th-Robert-Sedgewick/dp/032157351X
[steindoerfer_15_optimizing]: https://michael.steindorfer.name/publications/oopsla15.pdf
[stutter]: https://github.com/mkirchner/stutter
[wiki_associative_array]: https://en.wikipedia.org/wiki/Associative_array
[wiki_avl_trees]: https://en.wikipedia.org/wiki/AVL_tree
Expand Down

0 comments on commit f1683bd

Please sign in to comment.