Abstract:
A new formula for linearly growing arrays.

Created by Peter Kankowski
Last changed
Filed under Low-level code optimization

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

Dynamic arrays in C

The simplest way to implement a resizeable array in C is calling realloc each time you need to add a new item. However, this method is slow, especially under Windows.

Linearly growing arrays

Capacity is the allocation size; length is the number of items in array

To reduce the number of realloc calls, you should allocate memory in chunks. Usually, length (the number of used items) and capacity (the number of allocated items) are stored with the array:

typedef struct {
    ITEM * items;
    size_t length;
    size_t capacity;
} LIST;

void InitList(LIST * list) {
    memset(list, 0, sizeof(LIST));
}

bool AddItem(LIST * list, ITEM item) {
    const size_t chunk_size = 1024;
    if (list->length >= list->capacity) {
        // Try to extend the array
        ITEM * new_items = (ITEM *)realloc(list->items,
            (list->capacity + chunk_size) * sizeof(ITEM));
        if (!new_items)
            return false;
        list->items = new_items;
        list->capacity += chunk_size;
    }
    list->items[list->length++] = item;
    return true;
}

Remember that realloc does not free the old memory block when it fails. Hence you should check the returned value from realloc before assigning it to list->items, or else you will create a memory leak.

The capacity variable is unnecessary. If you add the items one-by-one, you can resize the array when its size reaches the next chunk boundary:

bool AddItem(LIST * list, ITEM item) {
    const size_t chunk_size = 1024;
    if (list->length % chunk_size == 0) {
        ITEM * new_items = (ITEM *)realloc(list->items,
            (list->length + chunk_size) * sizeof(ITEM));
        if (!new_items)
            return false;
        list->items = new_items;
    }
    list->items[list->length++] = item;
    return true;
}

The chunk_size constant should be a power of two so that your compiler can replace the slow modulus division with logical AND.

If you add more than one item, you could keep this invariant: capacity is always equal to length rounded to the next multiple of chunk_size.

inline size_t IsPowerOf2(size_t num) {
    return ((num - 1) & num) == 0;
}

// Round to the next multiple of powerof2
inline size_t Ceil(size_t num, size_t powerof2) {
    assert(IsPowerOf2(powerof2));
    return (num + powerof2 - 1) & -(intptr_t)powerof2;
}

bool NeedResize1(size_t length, size_t added_len, size_t chunk_size) {
    size_t capacity = Ceil(length, chunk_size);
    return length + added_len > capacity;
}

ITEM * AddItems(LIST * list, size_t added_len) {
    const size_t chunk_size = 1024;
    if (NeedResize1(list->length, added_len, chunk_size)) {
        ITEM * new_items = (ITEM *)realloc(list->items,
            Ceil(list->length + added_len, chunk_size) * sizeof(ITEM));
        if (!new_items)
            return NULL;
        list->items = new_items;
    }
    ITEM * items = list->items + list->length;
    list->length += added_len;
    return items;
}

My formula

There is a simpler formula:

bool NeedResize2(size_t length, size_t added_len, size_t chunk_size) {
    assert(IsPowerOf2(chunk_size));
    return ((-(intptr_t)length) & (chunk_size - 1)) < added_len;
}

The left side of the inequality is the number of remaining items in the current chunk (capacitylength). Because capacity is divisible by chunk_size and chunk_size is a power of two, you can express it as

   capacity - length =
= (capacity - length) % chunk_size =
           = (-length) % chunk_size =
           = (-length) & (chunk_size - 1)

   length + added_len > capacity   <=>
   added_len > capacity - length   <=>
   added_len > (-length) & (chunk_size - 1)

The formula requires 3 operations versus 4 in NeedResize1 (assuming powerof2 is a constant). It's equivalent to NeedResize1 when (length + added_len) fits into machine word.

The formula from Hacker's Delight

Hacker's Delight contains the following formula for checking if the range of addresses from length to length + added_len crosses boundary of a chunk:

chunk_size - (length & (chunk_size - 1)) < added_len

However, it's not suitable for NeedResize: at the beginning of a chunk, when (length & (chunk_size − 1)) == 0, the range does not cross the boundary of the chunk, but touches it. So, the test fails when it should succeed.

Change all pointers to offsets

If you keep pointers to the array items in some data structure, they may become invalid after realloc. So, instead of the pointers, you should always use offsets:

char * str = AddItems(list, str_length);
AddToHashTable(hash_table, str); // Error!

AddToHashTable(hash_table, (size_t)(str - list->items)); // Correct

It's useful to have a debug version of realloc that always returns a new memory block instead of resizing the old one. Without this, pointers to the resized array will be sometimes valid and sometimes invalid, which is really hard to debug.

Exponentially growing array

Linear growth: 10 reallocations, exponential growth: 5 reallocations, wasted memory

In some tasks, the array may be large, and too much realloc calls are needed to grow it linearly. There is a different algorithm for such cases: double the capacity of array each time it cannot hold the new items. At most one half of the memory will be wasted, but realloc will be called only log(N) times.

bool AddItem(LIST * list, ITEM item) {
    size_t initital_capacity = 16;
    if (list->length >= list->capacity) {
        size_t new_capacity = (list->capacity < initital_capacity) ?
            initital_capacity : list->capacity * 2;
        ITEM * new_items = (ITEM *)realloc(list->items,
            new_capacity * sizeof(ITEM));
        if (!new_items)
            return NULL;
        list->items = new_items;
        list->capacity = new_capacity;
    }
    list->items[list->length++] = item;
    return true;
}

size_t CeilLog2(size_t x) {
    // clp2 function by Henry Warren, Jr.
    x -= 1;
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
#if __SIZEOF_SIZE_T__ == 8 || _WIN64
    x |= (x >> 32);
#elif !(__SIZEOF_SIZE_T__ == 4 || _WIN32)
#error Unknown machine word size
#endif
    return x + 1;
}

ITEM * AddItems(LIST * list, size_t added_len) {
    size_t initital_capacity = 16;
    if (list->length + added_len > list->capacity) {
        size_t new_capacity = CeilLog2(list->capacity + added_len);
        if (new_capacity < initital_capacity)
            new_capacity = initital_capacity;
        ITEM * new_items = (ITEM *)realloc(list->items,
            new_capacity * sizeof(ITEM));
        if (!new_items)
            return NULL;
        list->items = new_items;
        list->capacity = new_capacity;
    }
    ITEM * items = list->items + list->length;
    list->length += added_len;
    return items;
}

For example, to linearly grow an array to 1 million items, you need 977 realloc calls (when chunk_size = 1024). With the doubling algorithm, you need to reallocate the array only 16 times.

It's possible to get rid of capacity variable in this case, but it does not make much sense. The array already wastes a lot of memory, so one machine word will not change anything.

Variable-length arrays in C99 and other languages

C99 Standard allows variable-length arrays:

bool read_and_process_file(const char * file_name) {
    FILE * f = fopen(file_name, "r");
    if (!f)
        return false;
    char file_content[get_file_size(f)]; // Variable-length array
    fread(file_content, 1,
        sizeof(file_content), f); // sizeof is not a constant!
    fclose(f);
    // Do something with file_content
    return true;
}

The array is freed when the function returns; it's not possible to resize it. So, the variable-length arrays are syntax sugar for alloca function. They are implemented in GCC and Pelles C, but not in MSVC++ compiler. Just like alloca, they make your program to crash if the array does not fit on the stack, so it's better to use malloca / freea instead of them.

The D programming language and Google Go include resizeable dynamic arrays. In Go, you can also change the default reallocation strategy (search for AppendByte on their page).

Conclusion

If you need a lot of small dynamic arrays, you can grow them linearly and get rid of capacity variable to save memory. If the array may be large, you can double its capacity to avoid the unnecessary realloc calls.

Related articles

The perils of alloca function

Peter Kankowski
Peter Kankowski

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Created by Peter Kankowski
Last changed

2 comments

Alexey,

What about using mremap() instead of realloc() (for large chunks)?

Peter Kankowski,
Thank you! I don't know much about Linux programming, but the description of mremap() looks promising. In Windows, there is no such function. You only can reserve several extra pages, then allocate them with VirtualAlloc.

Your name:


Comment: