Centralized Memory Management

Abstract

In this article we are going to explore patterns for centralizing and minimizing allocations. We will start with a brief overview of how memory management evolved and why RAII became the culturally dominant way of managing memory in C++ and other low-level systems programming languages, followed by a presentation of the core ideas and strategies of Centralized Memory Management.

The fundamental nature of computers

Programs are fundamentally about transforming data. Even our basic model of computing, the Von Neumann machine, has three core components: an input device, a processing unit, and an output device. Tasks like rendering graphics or sending data over a network are, at bottom, just about manipulating data.

Transforming data, however, is not free. Data must exist in memory and requires computing power to transform. Both of those resources are limited, and much of the problem we are trying to solve as developers comes from balancing those limitations correctly and efficiently while still achieving our goal.

Resource management in the early days

Back in the day, hardware was much simpler and the machine would only run one program at a time, with that program having full control of the computer. You knew the exact amount of memory available, you could use it however you wanted with no consideration for other applications, and there was no reason not to use as much of it as you possibly could. You could even count the duration of every single instruction and know exactly what your execution time would be. The problem was achieving as much functionality as possible with the limited amount of resources available.

Today it is not that simple though.

The evolution of memory management

As computers evolved, managing resources became more complex. CPUs started using memory caches and out-of-order execution, so counting the execution time of your program was no longer trivial. Operating systems started to support multiple processes, so your program now had to be a good citizen and share resources with other applications. The address space was virtualized and memory now had to be requested from the OS, so you no longer knew ahead of time the exact amount of memory available to you. Resource availability had to be queried and managed at runtime, which made resources harder to manage.

As a programmer, you had to change your view from a singular piece of hardware to a spectrum of different pieces of hardware.

This sudden change in how we manage resources led to a lot of early mistakes. We switched to a malloc/free model, where we get dynamic amounts of memory at runtime, and this created new issues such as memory leaks, double frees, running out of memory, and fragmentation. Education on how to write programs in this new era was still in development, so many programmers made mistakes that turned out to be very costly.

This gave rise to the two methods of managing memory that are mainstream nowadays.

Garbage Collection

Garbage collection was popularized by Java and functional languages because it provided safety and made development much simpler. The garbage collector handles all the memory management, allowing the programmer to focus on the main task and not worry about memory; it also prevents many of the errors that come from programmers mismanaging memory.

However, garbage collection was not ideal for a lot of cases. The cost was simply too great for many applications, particularly soft real-time programs that were constrained by memory and CPU time, or that had to reason about problems such as cache locality or allocation cost. These problems are much harder to reason about when you work with a garbage collector, in part because the GC was not made to solve them.

RAII

RAII stands for “Resource Acquisition Is Initialization” and was popularized by C++. It is one of the fundamental features of C++ and of most major systems programming languages such as D and Rust. RAII also has uses besides memory management (e.g. correctly locking and unlocking a mutex in a function), but in this article we will focus on how RAII is used for, and how it affects, memory management. And to understand RAII we must first go back to C.

The following is a classic generic example of managing memory in C:

// Resource acquisition
T* data = malloc(sizeof(T));
init_data(data);

// Use the data
...

// Later, release the data
cleanup(data);
free(data);

As you can see in that example, there are four steps that the programmer usually has to follow:

  1. Allocate the memory.
  2. Initialize the memory.
  3. Do any necessary cleanup.
  4. Free the memory.

The problem here is that if the programmer accidentally made a mistake, such as forgetting to initialize the memory or freeing the data twice, it would cause hard-to-track bugs in the program. Knowledge about the lifetime and ownership of the data was easy to lose, and there was no language-level construct to help you be explicit about them.

So how did C++ address this?

C++ first formalized the first two steps into new and the last two steps into delete. This ensures that the programmer can’t forget to initialize the data or clean up after it when allocating or freeing memory.

T* data = new T{};

delete data;

Then this process was automated using RAII: instead of the programmer having to manually call delete in the correct place, the compiler generates a call to delete at the end of the scope, when the object’s lifetime ends.

{
    // Memory acquired and initialized
    std::unique_ptr<T> data = std::make_unique<T>();

    // Destructor frees the memory here at the end of the scope
}

This proved to be very useful: most allocations have a known lifetime, so if you can map that lifetime to a scope, RAII does what you would do by default anyway, but guarantees correctness. And even when you are not 100% sure about the lifetime of an object, C++ has other smart pointers such as std::shared_ptr.

So, as we can see, RAII is pretty good! That is the one thing that the majority of C++ users seem to agree on. It allows for safe and simple resource management without the performance penalties of a garbage collector.

Let’s consider a more practical example: how would you write a function that reads the lines of a text file into an array in C++?

#include <fstream>
#include <string>
#include <vector>

using namespace std;

vector<string> get_lines(string filename)
{
    ifstream f { filename };
    vector<string> result;
    string line;

    while (getline(f, line))
    {
        result.push_back(line);
    }

    return result;
}

Now, let’s compare it with the Java version:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

ArrayList<String> getLines(String filename) throws FileNotFoundException
{
	Scanner s = new Scanner(new File(filename));
	ArrayList<String> list = new ArrayList<String>();

	while (s.hasNextLine())
	{
	    list.add(s.nextLine());
	}

	s.close();
	return list;
}

As we can see, the two versions look very similar; only some of the names have changed. This is one of the things people like about RAII: it allows for automatic memory management, much like a garbage collector, but without the garbage collector’s cost.

This example particularly showcases how RAII makes it very easy to use containers like std::vector and std::string correctly. In a language like C, you would have to manually make sure that those containers and the memory they own are managed correctly, which is very error-prone.

Because C++ makes it easy to use containers everywhere, it becomes possible to write code that looks very similar to what you would write in Java. But unlike in Java, memory management is not handled by a garbage collector; instead, each object manages its own memory, with the compiler generating correct calls to destructors, copy constructors, and move constructors.

While this is generally considered a good thing, it is important to also ask what the cost of this feature is and how it scales. RAII, in combination with other C++ features such as copy constructors, makes it possible to create smart containers and objects that are easy to use, but this in turn encourages us to abuse them, which can lead to problems we are going to explore.

Case study: std::string

Consider the following simple function:

void read_string(const std::string&);

The intent of the author here is to read an array of characters. What happens, however, when we call this function like so:

read_string("Hello Sailor, how are you today? I am fine, thank you!");

This call actually causes a memory allocation, which is completely unnecessary: all we wanted was to read some data, so why allocate memory at all?

This happens because the compiler hides a constructor call from us, one which creates a std::string that must allocate on the heap (the literal is too long for the small string optimization). You might think this is a tiny, trivial example, but it can have an impact at scale.

[Image about chrome memory]

For example, Google Chrome had a performance bug caused by a flood of temporary allocations: typing a single character in the Omnibox (the search bar at the top of Chrome) resulted in over 25,000 allocations.

This was because they used std::string quite a lot, and it resulted in a lot of useless copies. Because std::string was so easy to use, it became easy to abuse, and programmers would reach for it mindlessly without thinking about the performance implications. C++17 has since added std::string_view to address this issue; we will discuss it more later.

Case Study: Move semantics

Another example comes from a CppCon talk by Nicolai Josuttis titled “The Nightmare of Move Semantics for Trivial Classes”.

In this talk, Nicolai presents a simple class and asks the question: how do we write the constructor for this class so that it never makes a useless copy?

class Customer
{
    std::string first;
    std::string last;
    int id;
};

By the end of the talk, after going through all the ways in which you can initialize a class in C++ and how to write a constructor that avoids every possible useless copy, he ends up with something that looks like this:

class Customer
{
    std::string first;
    std::string last;
    int id;

public:
    template<typename S1, typename S2 = std::string,
             typename = std::enable_if_t<!std::is_same_v<S1, Customer&>>>
    Customer(S1&& f, S2&& l = "", int i = 0)
    : first(std::forward<S1>(f)), last(std::forward<S2>(l)), id(i)
    {
    }
};

The reality is that most people would have a hard time even understanding that, let alone writing it on their own. There are simpler ways to achieve what he wants, but the overall point stands: you must have deep knowledge of C++ to write the most performant code. By default, chances are you will write something less optimal than you need, not because you designed your algorithms to be slow, but because containers make it very easy to create useless copies, and writing code that ensures moves always happen correctly can be difficult.

A lot of expert C++ programmers seem to agree. In one of his talks, Herb Sutter, the chair of the ISO C++ committee, asks everyone who thinks that “move semantics are kinda mysterious and they don’t fully understand them” to raise their hands, and most people in the room, even the creator of the language, raise them.

Case study: std::string_view

In C++17, std::string_view was added to the standard library precisely for cases like the one above: it is a non-owning view over a sequence of characters, so passing one around requires no allocation and no copy of the underlying data.
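Here is a minimal sketch of how the earlier example changes with it (the caller function is just for illustration):

#include <string_view>

// Reading the data through a view: no std::string is constructed
// and no heap allocation happens at the call site; the view simply
// points at the string literal's characters.
void read_string(std::string_view text);

void caller()
{
    read_string("Hello Sailor, how are you today? I am fine, thank you!");
}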

Still, this is kind of bad: C++ wants to be the language of high performance, so why does the language of high performance make it so easy to create performance issues? Simplicity is definitely a good thing, but if you are working in C++ you are probably doing so because you have certain performance or scale requirements, so performance and results are clearly high on your priority list.

Now, I believe there are ways to use C++ that avoid these issues at scale, and I think it is important to consider other patterns for managing memory. One such set of ideas is Centralized Memory Management (CMM).

Centralized Memory Management

CMM is an architecture in which memory allocations are grouped based on their type, lifetime, or other requirements and are accessed via weak references from a central system.

Such an architecture can often be found in high-performance, embedded, or (soft) real-time software such as video games or operating systems.

But how does this look in practice?

Understanding memory allocations

If you reason about your memory allocations, you will soon notice that most of them fit into two categories:

  1. Persistent allocations
  2. Temporary allocations

Let’s focus on the persistent allocations first.

At the level of a module or subsystem, there are some data structures that are persistent, and these usually represent the input, state, and output of that module.

In centralized memory management, the idea is to identify the allocations that are persistent in the module and group them together. Procedures from the subsystem can then access those resources through weak references, which can be pointers or some form of handles/indices.

This simplifies the problem: we now reason about the lifetime of those persistent allocations together instead of reasoning about lots of tiny individual objects. It helps avoid inefficient copies and minimizes the points of failure, since the allocations are grouped in one place. The input, state, and output become clearly visible and sit at the center of your system.

This pattern can be applied in C++ by first identifying those containers that represent the input, state, and output of the module and then using spans/views, indices or other types of weak references (such as ids) to reference those resources.

While they are not perfect, std::string_view introduced in C++17 and std::span introduced in C++20 fix a big hole in the toolset of a C++ developer. You can use them to pass around references into centralized buffers with different read/write permissions, which explicitly showcases the data flow.
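For example, a minimal sketch of how the read/write permissions show up in the signatures themselves (the function names are made up for illustration):

#include <span>

// Read-only view into a centralized buffer: the signature itself
// documents that this function cannot modify the data.
float average(std::span<const float> values);

// Mutable view: this function is allowed to write into the buffer.
void normalize(std::span<float> values);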

You must, however, be careful: one of the problems people encounter when adopting views/spans is that they can lead to hard-to-track bugs if you didn’t reason about the lifetime of the container they reference in the first place, or if you have tons of owning containers being copied around everywhere. This is why centralizing those containers is so important: it makes it much easier and clearer to reason about their lifetime and to understand when the memory might get invalidated.

Looking back at the example from Nicolai’s talk, we could rewrite that customer class as such:

struct Customer 
{
    std::string_view first;
    std::string_view last;
    int id;
};

In this version, there would be a central buffer used to store the first and last names of the customers and the individual objects would be simply referencing those.

This makes the class trivial or what is usually called a POD (plain old data). PODs are in general much easier to use than RAII smart objects and don’t come with all the pitfalls of hidden allocations and lifetime management.
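As a minimal sketch of that idea (the customer_storage type and its intern function are assumptions made for illustration, not a fixed design):

#include <string>
#include <string_view>
#include <vector>

struct customer_storage
{
    std::string name_pool;           // central buffer owning all the characters
    std::vector<Customer> customers; // PODs holding views into name_pool

    // Copies a name into the pool and returns a view to it.
    std::string_view intern(std::string_view name)
    {
        auto offset = name_pool.size();
        name_pool.append(name);
        return std::string_view(name_pool).substr(offset, name.size());
    }
};

// Usage: reserve the pool up front so appending never reallocates,
// since reallocation would invalidate the views already handed out.
void example()
{
    customer_storage storage;
    storage.name_pool.reserve(1024);
    storage.customers.push_back({ storage.intern("Ada"), storage.intern("Lovelace"), 1 });
}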

In the Chrome case, using something like std::string_view would also have helped avoid all those useless copies.

Temporary allocations

Dealing with temporary allocations is often viewed as an optimization, but it can have architectural implications as well. As we saw in the Google Chrome example, their main issue was lots of tiny temporary memory allocations, which overusing RAII smart objects can lead to.

There are several strategies you can consider to avoid temporary allocations:

  1. Consider lazy evaluation to avoid heap allocations for certain operations. Instead of storing the result of an algorithm in an array, run the algorithm each time you need to iterate over the elements, if it’s not too expensive (see the sketch after this list).

  2. Consider using a garbage collector!
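As a minimal sketch of the first strategy (the function names are made up for illustration):

#include <vector>

// Eager version: materializes all the results into a heap-allocated vector.
std::vector<int> squares(int n)
{
    std::vector<int> out;
    for (int i = 0; i < n; i++) out.push_back(i * i);
    return out;
}

// Lazy version: computes each element on demand and hands it to the
// caller, so no temporary array is ever allocated.
template <typename F>
void for_each_square(int n, F&& visit)
{
    for (int i = 0; i < n; i++) visit(i * i);
}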

Garbage Collection

When people hear the G-word they instantly think of the horror and bad performance one might encounter in certain high-level languages; this doesn’t have to be the case, however.

Garbage collection can take many forms, and it doesn’t have to be slow. A common pattern in low-level systems such as games, for example, is to use a temporary frame allocator, which is essentially a second stack that you can dynamically allocate from and that is reset as a whole every frame.

Temporary memory arenas like this are an effective and performant way of doing garbage collection in a low-level system.

Essentially, an arena is a big buffer in which you can allocate fast (with a stack-like algorithm) and whose memory is freed/reset as a whole at some point. This allows you to just allocate-and-forget into the large buffer, similar to how you would with a garbage collector. Data that remains useful can simply be moved to persistent storage once you finish working with it in the temporary buffer.
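A minimal sketch of such an arena might look like this (the names and the alignment policy are assumptions for illustration; it assumes align is a power of two and that the buffer itself is suitably aligned):

#include <cstddef>

struct temp_arena_t
{
    unsigned char* buffer;   // one big preallocated block
    std::size_t    capacity;
    std::size_t    used = 0;

    // Allocate by bumping an offset; no per-allocation bookkeeping.
    void* alloc(std::size_t size, std::size_t align = alignof(std::max_align_t))
    {
        std::size_t aligned = (used + align - 1) & ~(align - 1);
        if (aligned + size > capacity) return nullptr; // out of temporary memory
        used = aligned + size;
        return buffer + aligned;
    }

    // Free everything at once.
    void reset() { used = 0; }
};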

Using temporary memory arenas is also very easy; your function might change from this:

vector<string> get_lines(string text);

To this:

template <typename T>
using temp_vector = vector<T, temp_alloc<T>>;

using temp_string = basic_string<char, char_traits<char>, temp_alloc<char>>;

temp_vector<temp_string> get_lines(string_view text);

This method is often used in games, where the temporary memory arena is reset at the beginning of each frame, and any temporary allocations made during the frame can simply use it as their allocator.

void main_loop()
{
    //Arena is reset every frame
    temp_arena.reset();

    //Later...
    auto temp_result = do_stuff_that_allocates(some_string, temp_arena);
}

In this case, the arena is scope-based: it is invalidated at the end of the scope that represents the main loop of the game.

But temporary memory arenas can be used in many other ways too!

std::async(std::launch::async, temp_task([](temp_allocator talloc)
{
    auto temp = do_stuff_that_allocates(text, talloc);
}));

In this example, we are passing a temporary memory arena to a task to be executed in a job system. You might do this in a program such as a server or a compiler, where each task has its own memory arena.

Temporary arenas work great with RAII

RAII can help as an optimization on top of temporary memory arenas. RAII objects are freed in LIFO order, much like stack-allocated objects, and a temporary memory arena behaves like a stack, so if you pair the two you can reduce the amount of memory the arena uses: RAII can automatically free some of the allocations in the arena before the full reset.
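A minimal sketch of that pairing, building on the hypothetical temp_arena_t from above:

// RAII helper that rolls the arena back to a saved mark at the end
// of its scope, freeing everything allocated after the mark.
struct arena_scope
{
    temp_arena_t& arena;
    std::size_t   mark;

    explicit arena_scope(temp_arena_t& a) : arena(a), mark(a.used) {}
    ~arena_scope() { arena.used = mark; } // stack-like rollback
};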

Memory arenas are a great way to manage your program’s resources

Use arrays as much as possible

Arrays are very powerful. Because of how modern CPUs work, arrays often have very good performance: the elements are next to each other in memory, so cache locality is much better.

In general, consider arrays as your default data structure, but even with other data structures such as graphs you can leverage arrays to your advantage.

For example, if you have a graph, instead of storing the data in the actual nodes, store the data in an array and keep in each node an index to the associated element in the array (see the sketch after this list). This approach has several advantages:

  1. Your node struct is smaller and can be the same regardless of the data type you are interested in.
  2. It works great when you don’t have templates (like in C) and in C++ it can help you avoid them.
  3. Because the nodes are much smaller you can also store them in an array instead of randomly on the heap which avoids fragmentation.
  4. You will avoid polluting the cache while traversing the nodes because you don’t have to store the data structs in the nodes.
  5. It becomes easier to centralize the allocations into big arrays which you can preallocate with a maximum size.
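A minimal sketch of that layout (all names here are illustrative):

#include <vector>

// Nodes don't own their data; they just hold an index into a
// central payload array.
struct node
{
    int data_index;          // index into graph::payloads
    std::vector<int> edges;  // indices of neighboring nodes
};

struct graph
{
    std::vector<node>  nodes;    // all nodes live linearly in one array
    std::vector<float> payloads; // all node data, centralized
};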

Another common data structure that you might have difficulties with is a polymorphic class hierarchy. Consider the following example:

struct Base
{
	int base_data;
};

struct DerivedA : Base
{
	float other_data;
};

struct DerivedB : Base
{
	double other_data;
};

vector<unique_ptr<Base>> elements;

Here, our vector stores pointers to the elements, but the elements themselves are scattered around the heap. You could use a memory arena here so that the elements end up closer together in memory, but another, simpler approach that you should consider is using tagged unions.

With a tagged union, the example above becomes this:

struct Base
{
	int base_data;
};

struct DerivedA : Base
{
	float other_data;
};

struct DerivedB : Base
{
	double other_data;
};

struct Element : public std::variant<DerivedA, DerivedB>
{
};

vector<Element> elements;

Here we used the STL tagged union, std::variant; in C you would have to implement one by hand. A tagged union is a type-safe union that remembers which type the element actually holds. The advantage is that every element now has the same size, the size of the biggest alternative, so you can store them linearly in an array. You might not always want this, since you will waste some memory, but it lets you keep your elements in a linear layout very easily. Newer languages such as Rust and Odin even have this feature built in at the language level, whereas in C++ it is a library feature added in C++17.
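As a small usage sketch (assuming the types above; the field updates are arbitrary):

#include <variant>
#include <vector>

void process(std::vector<Element>& elements)
{
    for (Element& e : elements)
    {
        // Query which alternative the tagged union currently holds.
        if (std::holds_alternative<DerivedA>(e))
            std::get<DerivedA>(e).other_data += 1.0f;
        else
            std::get<DerivedB>(e).other_data += 1.0;
    }
}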

Real life example: Rayfork

Raylib is a very popular open-source game development library made in C. I recently forked Raylib into a new library called Rayfork, because I wanted to use Raylib to develop a professional game but realized that certain design choices in Raylib make that hard.

Raylib was designed to teach students game development, and from that perspective its design choices make sense. I, however, wanted a platform-independent, single-header library that gives its user full control over memory allocations.

The part of Rayfork’s development that I would like to showcase here is removing all the allocations. While Raylib is not particularly big, clocking in at 185,739 lines of code if you count the dependencies (which are all included as source), turning it into a single-header, platform-independent collection of libraries was still difficult, especially having to reason about all the allocations in it.

After reducing Raylib to just a couple of header-only libraries, I started searching for all the memory allocations. By simply searching for malloc, I quickly compiled a list of places where memory was allocated and managed to group them into three categories:

  1. Persistent allocations required for the renderer.
  2. Various temporary allocations.
  3. Persistent allocations from the asset loading function that the user of the library would manage.

Let’s look at these categories case by case.

Dealing with the persistent allocations required for the renderer

These allocations were for buffers required for the renderer to work (after all, it’s a games library). The lifetime of these allocations is pretty much the lifetime of the library’s use, which in something like a game usually means the lifetime of the whole program.

While examining these, though, I noticed something quite interesting. Here is an example:

_rf_ctx->gl_ctx.vertex_data[i].vertices = (float*) RF_MALLOC(sizeof(float) * 3 * 4 * RF_MAX_BATCH_ELEMENTS); // 3 float by vertex, 4 vertex by quad

_rf_ctx->gl_ctx.vertex_data[i].texcoords = (float*) RF_MALLOC(sizeof(float) * 2 * 4 * RF_MAX_BATCH_ELEMENTS); // 2 float by texcoord, 4 texcoord by quad

_rf_ctx->gl_ctx.vertex_data[i].colors = (unsigned char*) RF_MALLOC(sizeof(unsigned char) * 4 * 4 * RF_MAX_BATCH_ELEMENTS); // 4 unsigned chars by color, 4 colors by quad

Do you notice it?

All those allocations have a static size, meaning their size is known at compile time. We are essentially allocating the maximum amount of memory the renderer would ever need. I was also able to identify a statically known maximum for all the renderer allocations, so the solution here was quite easy:

struct rf_memory  
{  
    rf_dynamic_buffer vertex_data[RF_MAX_BATCH_BUFFERING];  
    int default_shader_locs[RF_MAX_SHADER_LOCATIONS];  
    rf_draw_call draw_calls[RF_MAX_DRAWCALL_REGISTERED];  
  
    rf_char_info default_font_chars[RF_DEFAULT_FONT_CHARS_COUNT];  
    rf_rec default_font_recs[RF_DEFAULT_FONT_CHARS_COUNT];  
};

What I did was extract all those buffers and place them as statically sized arrays inside one struct. The user of the library can now allocate that one struct however they want and just pass a pointer to Rayfork to initialize it. By designing the API this way, I gave the user full control, and I only have to reason about one allocation instead of a bunch of smaller, separate ones. The user can easily decide how to allocate that memory; they can even put it in the global segment of the executable:

rf_context rf_ctx;  
rf_memory rf_mem;  
  
void on_init(void)  
{  
  //Load opengl with glad  
  gladLoadGL();  
  
  //Initialise rayfork  
  rf_renderer_init_context(&rf_ctx, &rf_mem, screen_width, screen_height);  
}

So, when you can, consider defining a static layout with arrays sized to the maximum your program needs to run. This can often greatly simplify things, but it requires the discipline of not reaching for dynamic arrays like std::vector by default, and instead first considering how much memory you actually need.

Dealing with the various temporary allocations

Temporary memory allocations were quite common in Raylib and they were not always easy to remove. Before attempting any generic solution, such as asking the user for a temporary memory allocator, I wanted to fix these in simpler ways.

The simplest to fix were the ones where I could just allocate a big enough buffer on the stack and work with that. This works great when you are dealing with things such as string manipulation. Some cases, however, required changing the way something worked.
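For example, a sketch of the stack-buffer approach (the function and the buffer size are made up for illustration):

#include <cstdio>

void log_full_path(const char* directory, const char* filename)
{
    // A fixed-size stack buffer replaces a temporary heap allocation;
    // 1024 bytes is an arbitrary upper bound chosen for illustration.
    char buffer[1024];
    std::snprintf(buffer, sizeof(buffer), "%s/%s", directory, filename);
    std::puts(buffer);
}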

One particular temporary allocation that caught my eye was in an initialization function. Raylib uses a few OpenGL extensions. To query if those extensions are available you need to parse a string from OpenGL. The API, however, is different between OpenGL 3 and OpenGL ES 2, and Raylib had to support both.

The solution in Raylib looked something like this:

#if defined(RF_GRAPHICS_API_OPENGL_33)
	// Allocate numExt strings pointers
	const char** extList = RF_MALLOC(sizeof(const char*) * numExt);

	// Get extensions strings
	for (int i = 0; i < numExt; i++) extList[i] = (const char*) glGetStringi(GL_EXTENSIONS, i);
#else //if defined RF_GRAPHICS_API_OPENGL_ES2
	const char** extList = RF_MALLOC(sizeof(const char*) * 512);
	const char* extensions = (const char*) glGetString(GL_EXTENSIONS); // One big const string
	int len = strlen(extensions) + 1;
	char* extensionsDup = (char*) RF_MALLOC(len); //NOTE: Duplicated string (extensionsDup) must be deallocated
	memset(extensionsDup, 0, len);
	strcpy(extensionsDup, extensions);
	extList[numExt] = extensionsDup;

	for (int i = 0; i < len; i++)
	{
		if (extensionsDup[i] == ' ')
		{
			extensionsDup[i] = '\0';
			numExt++;
			extList[numExt] = &extensionsDup[i + 1];
		}
	}
#endif

for (int i = 0; i < numExt; i++) { /* Check extensions */ }

Here, depending on the OpenGL version, they would parse and add the strings with all the available extensions into an array and then loop over them.

This is quite a common pattern: compute something, put it in an array, then loop through the array to compute something else. It can be improved by fusing the first computation, which fills the array, with the computation you then want to do on the elements. With that change, the code became this:

#if defined(RAYFORK_GRAPHICS_BACKEND_OPENGL_33)  
{  
	glGetIntegerv(GL_NUM_EXTENSIONS, &num_ext);  
	
	for (int i = 0; i < num_ext; i++)  
	{  
		const char* ext = (const char *) glGetStringi(GL_EXTENSIONS, i);  
        _rf_set_gl_extension_if_available(ext, strlen(ext));  
    }  
}
#elif defined(RAYFORK_GRAPHICS_BACKEND_OPENGL_ES2)  
{  
	const char* extensions_str = (const char*) glGetString(GL_EXTENSIONS);
	const char* curr_ext = extensions_str;  
  
	while (*extensions_str)  
	{ 
		//If we get to a space that means we got a new extension name  
		if (*extensions_str == ' ')  
		{ 
			num_ext++;  
			const int curr_ext_len = (int) (extensions_str - curr_ext);  
			_rf_set_gl_extension_if_available(curr_ext, curr_ext_len); 
			curr_ext = extensions_str + 1;  
		}  
		extensions_str++; 
	}
}
#endif

I extracted the code that checks whether an extension is available by name into a separate function, and as I parse the extension names I check them on the spot; this way I don’t need to allocate an array to store them.

As mentioned in the section about temporary memory allocations, there are a lot of ways to design functions such that they don’t need to allocate. C++20 Ranges can help you with that for example, but often you will have to redesign how something works.

Persistent allocations from the asset loading function that the user of the library would manage

Raylib has a lot of functions for loading different kinds of assets, such as 3D models, 2D images, and audio; in fact, the vast majority of its dependencies are related just to asset loading.

When a user calls a function like LoadImage in Raylib, the function allocates a buffer for the image using malloc, and the user has to manage it, freeing it when they want to. Most loading functions also need to allocate multiple buffers, sometimes including temporary ones.

If a function only needs one buffer that it then fills with data, you can split it into two functions: one that returns the size of the result buffer for a given input, and one that actually computes the result into a caller-provided buffer, like so:

int get_size_of_buffer_for_foo(input in);
void foo(input in, void* buffer);
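The calling pattern then looks like this (a sketch using the hypothetical functions above):

// Ask for the required size, allocate however you want, then compute.
int size = get_size_of_buffer_for_foo(in);
void* buffer = malloc(size);
foo(in, buffer);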

An example of this was the function rf_vec4* rf_get_image_data_normalized(rf_image image), which converts an image, pixel by pixel, into an array of rf_vec4 structs (a struct of 4 floats). While the function still exists in Rayfork, I wanted to offer an alternative that doesn’t allocate, which looks like this: rf_vec4 rf_get_pixel_normalized(rf_image image, int pixel). I extracted the computation from rf_get_image_data_normalized into a function that performs it for only one pixel. Now someone can loop through all the pixels of the image, compute the normalized version of each one, and store or use it however they want. The old function still exists as a convenience wrapper.
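Usage would look something like this (a sketch; the width and height fields are assumed to exist on rf_image):

// Normalize the image pixel by pixel without allocating an
// intermediate array.
void use_pixels(rf_image image)
{
    for (int i = 0; i < image.width * image.height; i++)
    {
        rf_vec4 pixel = rf_get_pixel_normalized(image, i);
        // ...store or use the pixel however you want...
    }
}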

However, I couldn’t do this for most functions, as I wanted to keep Raylib’s nice API, so what I did instead was design all the loading functions to ask for allocators, like so:

rf_image rf_load_image(const void* data, int data_size, rf_allocator allocator);
void rf_image_format(rf_image* image, int new_format, rf_allocator temporary_allocator);

Here, the rf_load_image function takes a buffer containing a compressed image in some format (like PNG) and an allocator. The function uses the provided allocator to allocate the buffer for the image, and the returned rf_image struct contains all the data about the image, a pointer to the buffer, and the allocator.

The rf_image_format function takes a pointer to an image and converts it to another format. Depending on the new format, it might need to allocate a new buffer for the image, but because the rf_image struct carries its allocator, the function can just use that. It also makes some temporary allocations, so it asks for a separate allocator to handle those.

A user who doesn’t care about custom allocators can just pass RF_MALLOC as the allocator, which wraps malloc and free in the format of our allocator. In C++ this would be even nicer, since the allocator parameters could default to RF_MALLOC.

Using allocators this way gives the user more granular control over the allocations and enables patterns such as the ones we described in the section about Temporary Memory Arenas. The allocators in Rayfork have a pointer to user-defined data, meaning they can be stateful and allow users to use memory arenas as allocators.
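To give an idea of the shape of such an interface, here is a sketch (illustrative only, not Rayfork’s exact definition):

// A stateful allocator: user_data can point to a memory arena,
// a pool, or anything else the user wants.
struct rf_allocator
{
    void* user_data;
    void* (*allocate)(void* user_data, int size_in_bytes);
    void  (*deallocate)(void* user_data, void* ptr);
};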

Learning from Rust

In Rust it is much harder to get away with making copies everywhere because, unlike C++, Rust is move-by-default instead of copy-by-default, and the borrow checker does not let you slack off and avoid reasoning about the lifetime of your objects.

Because of this, a lot of people have noticed that Rust pushes you toward a more centralized form of resource management, since that gives you more flexibility: the borrow checker has fewer lifetimes to complain about. Because centralized memory management is about minimizing the number of owning pointers in your program, the borrow checker has less work to do and lets the programmer be more effective.

In the closing talk of RustConf 2018, a developer from Chucklefish explained how Rust, and the borrow checker specifically, pushed them toward a more centralized form of memory management, and the benefits of that approach: it is much easier to reason about lifetimes and satisfy the borrow checker when there are only a couple of centralized buffers whose lifetimes you have to think about. Check out her talk to see the solution she arrived at because of the borrow checker.

So if you are interested in Rust, you should also consider Centralized Memory Management. Rust developers are discovering, thanks to the borrow checker, that it is a very effective way of reasoning about the allocations in a project, since it lowers the number of owning pointers and groups them in a way that is easy to deal with.

Regarding new programming languages

Zig and Odin are two examples of new programming languages designed with centralized memory management in mind.

In Zig, for example, every function in the standard library that allocates memory takes an allocator as a parameter, and the language itself doesn’t offer destructors, pushing you in the direction of a centralized memory allocation system.

In Odin, the language has a built-in context struct that sets a persistent storage allocator and a temporary storage allocator that functions can then use accordingly.

More and more, new low-level systems programming languages are moving in the direction of centralized memory management.

Conclusion

Switching to a centralized memory management system is not easy and takes time and effort; however, it can certainly improve the scalability and performance of a system.

If you have been overusing containers in C++, consider going on an RAII diet and try adopting new features like std::string_view, std::span and ranges.

In general, reason about your allocations in batches, rather than individual objects, and try to keep your data structs as PODs.

As a project grows, it becomes harder and harder to reason about individual objects and their lifetimes, so centralizing and grouping allocations helps with the understandability of the code and the data flow.

C++ and other languages are moving in this direction, as the examples we covered from Rust, Zig, and Odin show.

Centralized Memory Management is not just about performance; the architecture in general makes it easier to reason about the code. However, you must weigh the advantages and disadvantages of each particular solution when deciding what to do next.