Build a Hash Table in C

Goal: Implement a hash table with separate chaining from scratch in C. Support insert, lookup, and delete. Test with string keys.

Prerequisites: Hash Tables, C Language Essentials, Pointers and Memory, Memory Allocation


The Problem

We need a data structure that maps string keys to string values with O(1) average-case lookup. The approach: hash the key to an array index, handle collisions with linked lists (separate chaining).

key "alice" → hash("alice") → 738294 → 738294 % 16 → index 6
table[6] → ("alice", "engineer") → ("bob", "designer") → NULL

Step 1: Define the Data Structures

// hashtable.h
#ifndef HASHTABLE_H
#define HASHTABLE_H
 
#include <stddef.h>
 
typedef struct ht_entry {
    char *key;
    char *value;
    struct ht_entry *next;   // chaining for collisions
} ht_entry;
 
typedef struct {
    ht_entry **buckets;      // array of linked list heads
    size_t capacity;         // number of buckets
    size_t size;             // number of entries
} ht;
 
ht    *ht_create(size_t capacity);
void   ht_free(ht *table);
void   ht_set(ht *table, const char *key, const char *value);
char  *ht_get(ht *table, const char *key);
void   ht_delete(ht *table, const char *key);
 
#endif

Key insight: buckets is a pointer to an array of pointers. Each bucket is a linked list head (NULL if empty).


Step 2: Hash Function

A good hash function distributes keys uniformly across buckets. We’ll use DJB2 — simple, fast, good enough:

// hashtable.c
#include "hashtable.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
 
static size_t hash(const char *key, size_t capacity) {
    size_t h = 5381;
    int c;
    while ((c = *key++))
        h = ((h << 5) + h) + c;   // h * 33 + c
    return h % capacity;
}

Why DJB2 works

The magic number 33 (via h * 33 + c = h << 5 + h + c) gives good avalanche behavior — small changes in input produce large changes in output. The modulo maps the hash into bucket range.

When it breaks

Any hash function can degenerate if all keys hash to the same bucket. That’s why we need resizing (covered in exercises).


Step 3: Create and Free

ht *ht_create(size_t capacity) {
    ht *table = malloc(sizeof(ht));
    table->capacity = capacity;
    table->size = 0;
    table->buckets = calloc(capacity, sizeof(ht_entry *));  // all NULL
    return table;
}
 
static void free_entry(ht_entry *entry) {
    free(entry->key);
    free(entry->value);
    free(entry);
}
 
void ht_free(ht *table) {
    for (size_t i = 0; i < table->capacity; i++) {
        ht_entry *e = table->buckets[i];
        while (e) {
            ht_entry *next = e->next;
            free_entry(e);
            e = next;
        }
    }
    free(table->buckets);
    free(table);
}

Note: calloc zero-initializes memory, so all bucket pointers start as NULL. We strdup keys and values (see set) so the table owns its own copies.


Step 4: Set (Insert or Update)

void ht_set(ht *table, const char *key, const char *value) {
    size_t idx = hash(key, table->capacity);
    ht_entry *e = table->buckets[idx];
 
    // Check if key already exists — update it
    while (e) {
        if (strcmp(e->key, key) == 0) {
            free(e->value);
            e->value = strdup(value);
            return;
        }
        e = e->next;
    }
 
    // Key not found — insert at head of chain
    ht_entry *new = malloc(sizeof(ht_entry));
    new->key = strdup(key);
    new->value = strdup(value);
    new->next = table->buckets[idx];   // prepend to chain
    table->buckets[idx] = new;
    table->size++;
}

Inserting at the head of the chain is O(1). strdup allocates and copies the string so we own it.


Step 5: Get (Lookup)

char *ht_get(ht *table, const char *key) {
    size_t idx = hash(key, table->capacity);
    ht_entry *e = table->buckets[idx];
 
    while (e) {
        if (strcmp(e->key, key) == 0)
            return e->value;
        e = e->next;
    }
    return NULL;  // not found
}

Walk the chain at the bucket. Average chain length = size / capacity (the load factor).


Step 6: Delete

void ht_delete(ht *table, const char *key) {
    size_t idx = hash(key, table->capacity);
    ht_entry *e = table->buckets[idx];
    ht_entry *prev = NULL;
 
    while (e) {
        if (strcmp(e->key, key) == 0) {
            if (prev)
                prev->next = e->next;
            else
                table->buckets[idx] = e->next;
            free_entry(e);
            table->size--;
            return;
        }
        prev = e;
        e = e->next;
    }
}

Standard linked list removal. Need prev to unlink the node.


Step 7: Test It

// main.c
#include "hashtable.h"
#include <stdio.h>
#include <assert.h>
 
int main(void) {
    ht *table = ht_create(16);
 
    ht_set(table, "alice", "engineer");
    ht_set(table, "bob", "designer");
    ht_set(table, "carol", "manager");
 
    assert(strcmp(ht_get(table, "alice"), "engineer") == 0);
    assert(strcmp(ht_get(table, "bob"), "designer") == 0);
    assert(ht_get(table, "dave") == NULL);
 
    // Update existing key
    ht_set(table, "alice", "CTO");
    assert(strcmp(ht_get(table, "alice"), "CTO") == 0);
 
    // Delete
    ht_delete(table, "bob");
    assert(ht_get(table, "bob") == NULL);
 
    printf("All tests passed! (size=%zu)\n", table->size);
 
    ht_free(table);
    return 0;
}

Build and run

gcc -Wall -Wextra -g -o hashtable main.c hashtable.c
./hashtable
# All tests passed! (size=2)
 
# Check for leaks
valgrind --leak-check=full ./hashtable
# Should report: All heap blocks were freed -- no leaks are possible

Verify: Load Distribution

// Add this to main.c to see bucket distribution
void ht_dump(ht *table) {
    for (size_t i = 0; i < table->capacity; i++) {
        int count = 0;
        for (ht_entry *e = table->buckets[i]; e; e = e->next)
            count++;
        if (count > 0)
            printf("bucket[%zu]: %d entries\n", i, count);
    }
}

With 16 buckets and 3 entries, most buckets are empty and the populated ones have 1 entry each. This is good — short chains mean fast lookups.


Exercises

  1. Resizing: When size / capacity > 0.75, double the capacity and rehash all entries. Add this to ht_set. Test by inserting 1000 entries into a table that starts with capacity 4.

  2. Iterator: Implement ht_entry *ht_next(ht *table, size_t *bucket, ht_entry **current) that iterates over all entries. Use it to print all key-value pairs.

  3. Open addressing: Replace separate chaining with linear probing. What changes in set/get/delete? What happens when the table is nearly full?

  4. Benchmark: Insert 100,000 random strings into your hash table. Time it against a sorted array with binary search. When does the hash table win?


Next: 02 - Implement Merge Sort in C — divide and conquer with O(n log n) guaranteed.