> and the times you find yourself "at the mercy" of the garbage collector is pretty damn frustrating.
You're still at the mercy of the malloc implementation. I've seen some fairly nasty behaviour involving memory leaks and weird pauses on free coming from a really hostile allocation pattern causing fragmentation in jemalloc's internal data.
Which is why you almost never use the standard malloc to do your piecemeal allocations. A fair number of codebases I've seen allocate their big memory pools at startup, and then have custom allocators which provide memory for (often little-'o') objects out of that pool. You really aren't continually asking the OS for memory on the heap.
In fact, doing that is often a really bad idea in general because of the extreme importance of cache effects. In a high-performance game engine, you need to have a fine degree of control over where your game objects get placed, because you need to ensure your iterations are blazingly fast.
Doesn’t this just change semantics? Whatever custom handlers you wrote for manipulating that big chunk of memory are now the garbage collector. You’re just asking for finer grained control than what the native garbage collection implementation supports, but you are not omitting garbage collection.
Ostensibly you could do the exact same thing in e.g. Python if you wanted, by disabling collection with the gc module and writing custom allocation and cleanup in e.g. Cython. Probably similar in many other managed-environment languages.
I mean, nobody is suggesting they leave the garbage around and not clean up after themselves.
But instead what you can do is to reuse the "slots" you are handing out from your allocator's memory arena for allocations of some specific type/kind/size/lifetime. If you are controlling how that arena is managed, you will find yourself coming across many opportunities to avoid doing things a general purpose GC/allocator would choose to do in favor of the needs dictated by your specific use case.
For instance you can choose to draw the frame and throw away all the resources you used to draw that frame in one go.
The semantics matter. A lot of game engines use a mark-and-release per-frame allocation buffer. It is temporary throwaway data for that frame's computation. It does not get tracked or freed piecemeal - it gets blown away.
Garbage collection emulates the intent of this method with generational collection strategies, but it has to use a heuristic to do so. You can optimize your code to behave very similarly under a GC, but the interface to that strategy is full of workarounds. It is more invasive to your code than applying an actual manual allocator.
> A lot of game engines use a mark-and-release per-frame allocation buffer.
I've heard of this concept but a search for "mark-and-release per-frame allocation buffer" returned this thread. Is there something else I could search?
It’s just a variation of arena allocation. You allocate everything for the current frame in an arena. When the frame is complete, you free the entire arena, without needing any heap walking.
A generational GC achieves a similar end result, but has to heuristically discover the generations, whereas an arena allocator achieves the same result deterministically and without extra heap walking.
"Linear allocator" and "stack allocator" are other common terms. It's just a memory arena where an allocation is a pointer bump, and you free the whole buffer at once by returning the pointer to the start of the arena.
Getting rid of this buffer costs literally nothing. No free is needed on the individual objects; you just forget there was anything there and use the same buffer for the next frame. Compare that with waiting for a GC to detect thousands of unused objects in that buffer and discard them, meanwhile creating a new batch of thousands of objects and having to figure out where to put those.
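A minimal sketch of such a mark-and-release (bump) arena in Go, to make the "allocation is a pointer bump, free is resetting the pointer" idea concrete (the Particle type and the capacity are made up for illustration):

```go
package main

import "fmt"

// Particle is a hypothetical per-frame object; any plain struct works.
type Particle struct {
	X, Y, VX, VY float64
}

// FrameArena hands out slots from one preallocated backing slice.
// Alloc is a pointer bump; Reset "frees" everything at once.
type FrameArena struct {
	backing []Particle
	next    int
}

func NewFrameArena(capacity int) *FrameArena {
	return &FrameArena{backing: make([]Particle, capacity)}
}

func (a *FrameArena) Alloc() *Particle {
	if a.next == len(a.backing) {
		panic("frame arena exhausted") // a real engine would grow or fall back
	}
	p := &a.backing[a.next]
	a.next++
	return p
}

// Reset discards every allocation from the frame in O(1):
// no per-object free, no heap walking.
func (a *FrameArena) Reset() { a.next = 0 }

func main() {
	arena := NewFrameArena(1024)
	for frame := 0; frame < 3; frame++ {
		for i := 0; i < 100; i++ {
			p := arena.Alloc()
			p.X = float64(i)
		}
		fmt.Println("frame", frame, "used", arena.next, "slots")
		arena.Reset()
	}
}
```

The one rule this scheme imposes: pointers handed out during a frame must not be kept across Reset, since the slots get reused by the next frame.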
You can do many things in many languages. You may realize in the process that doing useful things is made harder when your use case is not a common concern in the language.
C's free() gives memory back to the operating system(1), whereas, as a performance optimization, many GCd languages don't give memory back after they run a garbage collection (see https://stackoverflow.com/questions/324499/java-still-uses-s...). Every Python program is using a "custom allocator," only it is built into the Python runtime. You may argue that this is a dishonest use of the term custom allocator, but custom is difficult to define (it could be defined as any allocator used in only one project, but that definition has multiple problems). The way I see it, there are allocators that free to the OS and those that don't or usually don't (hereafter referred to as custom).
In C, a custom allocator conceivably could be built into, say, a game engine. You might call ge_free(ptr) which would signal to the custom allocator that chunk of memory is available and ge_malloc() would use the first biggest chunk of internally allocated memory, calling normal malloc() if necessary.
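For illustration, the same idea can be sketched as a free-list pool. This version is in Go rather than C, the Entity type is an assumption, and Get/Put stand in for the hypothetical ge_malloc/ge_free names above:

```go
package main

import "fmt"

// Entity is a stand-in for a fixed-size game object.
type Entity struct {
	ID   int
	next *Entity // intrusive free-list link, unused while the slot is live
}

// Pool grabs one big block up front and threads a free list
// through the unused slots.
type Pool struct {
	slots []Entity
	free  *Entity
}

func NewPool(n int) *Pool {
	p := &Pool{slots: make([]Entity, n)} // one allocation at startup
	for i := range p.slots {
		p.slots[i].next = p.free
		p.free = &p.slots[i]
	}
	return p
}

// Get plays the role of ge_malloc: pop a slot off the free list.
func (p *Pool) Get() *Entity {
	if p.free == nil {
		return nil // a real engine might fall back to the general heap here
	}
	e := p.free
	p.free = e.next
	e.next = nil
	return e
}

// Put plays the role of ge_free: mark the slot available again.
func (p *Pool) Put(e *Entity) {
	e.next = p.free
	p.free = e
}

func main() {
	pool := NewPool(2)
	a := pool.Get()
	b := pool.Get()
	fmt.Println(pool.Get() == nil) // pool exhausted: prints true
	pool.Put(a)
	pool.Put(b)
	fmt.Println(pool.Get() != nil) // slot reused: prints true
}
```

Because every slot is the same size and comes out of one contiguous block, there is no fragmentation and both operations are O(1).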
Custom allocators in C are a bit more than just semantics, and affect performance (for allocation-heavy code). Furthermore, they are distinct from GC, as they can work with allocate/free semantics, rather than allocate/forget (standard GC) semantics.
Yes, one could technically change any GCd language to use a custom allocator written oneself. But Python can't use allocate/free semantics (so don't expect any speedup). Python code never attempts manual memory management (i.e. third-party functions allocate on the heap all the time without calling free()), because that is how Python is supposed to work. To use manual memory management semantics in Python, you would need to rewrite every Python method that touches a string or any user-defined type to properly free.
(1) malloc implementations generally allocate a page at a time and give the page back to the OS when all objects in the page are gone. ptr = malloc(1); malloc(1); free(ptr); doesn't give the single allocated page back to the OS.
Python is a bad example to talk about GC, because it uses a different garbage collector than most languages. It is also the primary reason why getting rid of the GIL while retaining performance is so hard. Python uses reference counting, and as soon as the reference count drops to 0 it immediately frees the object, so in a way it is more predictable. It also has a traditional GC, and I guess that's what was mentioned above that you can disable. The reason for it is that reference counting won't free memory if there is a cycle (e.g. object A references B and B references A; in that case both have reference count 1 even though nothing is using them), so that's where the traditional GC steps in.
It's not a performance optimisation not to give space back. GCs could easily give space back after a GC if they know a range (bigger than a page) is empty, it's just that they rarely know it is empty unless they GC everything, and even then there is likely to be a few bytes used. Hence the various experiments with generational GC, to try to deal with fragmentation.
Many C/C++ allocators don't release to the OS often or ever.
That's true, and it's why the alternative to GC is generally not "malloc and free" or "RAII" but "custom allocators."
Games are very friendly to that approach: with a bit of thought you can use arenas and object pools to cover 99% of what you need, and cut out all of the failure modes of a general purpose GC or malloc implementation.
Due to the low throughput of Go's GC (which trades a lot of it in favor of short pause duration), you risk running out of memory if you have a lot of allocations and you don't run your GC often enough.
In a context where you don't allocate memory, you lose a lot of the language's features (for instance, you almost cannot use interfaces, because indirect calls cause parameters to those calls to be judged escaping and unconditionally allocated on the heap).
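That escape-through-interfaces effect can be observed directly with testing.AllocsPerRun, which works outside of tests too. This is a rough demonstration; exact allocation counts depend on the compiler version, and `go build -gcflags=-m` will print the escape-analysis decisions themselves:

```go
package main

import (
	"fmt"
	"testing" // AllocsPerRun is usable from ordinary programs
)

type vec struct{ x, y, z float64 }

var sink interface{}

func main() {
	// Converting a value to an interface boxes it: the vec is copied
	// to the heap because the compiler judges it escaping.
	viaInterface := testing.AllocsPerRun(1000, func() {
		sink = vec{1, 2, 3}
	})

	// Keeping the concrete type lets the value stay on the stack.
	var total float64
	direct := testing.AllocsPerRun(1000, func() {
		v := vec{1, 2, 3}
		total += v.x
	})
	_ = total

	fmt.Println("allocs via interface:", viaInterface) // typically 1
	fmt.Println("allocs direct:", direct)              // typically 0
}
```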
Go is a good language for web backend and other network services, but it's not a C replacement.
If you allocate a large block of memory manually at the start of the program, then trigger the GC manually when it suits you, won't you get the best of both worlds?
You can't call native libraries without going through cgo. So unless you can live without audio, text rendering, and access to the graphics APIs, you'll need cgo, which is really slow due to Go's runtime. For game dev, that's a no go (pun intended).
Additionally, the Go compiler doesn't try very hard to optimize your code, which makes it several times slower on CPU-bound tasks. That's for a good reason: for Go's use case, compile time is a priority over performance.
Saying that there are no drawbacks in Go is just irrational fandom…
Go was pushed as a C replacement, but very few C programmers switched to it; it seems it instead won the hearts of some Python, Ruby, Java, etc. programmers.
I would very much prefer a stripped-down version of Go used for these situations rather than throwing more C at it. The main benefits of using Go are not the garbage collection; it's the tooling, the readability (and thus maintainability) of the code base, and the large number of folks who are versed in using it.
Large user base? C is number 2. Go isn't even in the top 10.[1]
Tooling? C has decades of being one of the most commonly used languages, and a general culture of building on existing tools instead of jumping onto the latest hotness every few months. As a result, C has a very mature tool set.
Unfortunately the excellent standard library is a major benefit of Go, and it uses the GC, so if you set GOGC=off you're left to write your own standard library.
I would also like to see a stripped-down version of Go that disables most heap allocations, but I have no idea what it would look like.
I'd be willing to wager that C programmers would be more comfortable working with a Golang codebase than Golang programmers would be working with a C codebase.
There may be more "C programmers" by number but a Golang codebase is going to be more accessible to a wider pool of applicants.
In my experience it takes a few days for a moderate programmer to come up to speed on Go, whereas it takes several months for C. You need to hire C programmers for a C position, you can hire any programmers for a Go position.
How do people learn C without knowing about manual memory management? They learn about it as they learn the language. This can be done in any language that allows for manual memory management (and most have much better safeguards and documentation than C, which has a million ways to shoot yourself in the foot).
You’re writing in a much improved C. Strong type system (including closures/interfaces/arrays/slices/maps), sane build tooling (including dead simple cross compilation), no null-terminated strings, solid standard library, portability, top notch parallelism/concurrency implementation, memory safety (with far fewer caveats, anyway), etc. Go has its own issues and C is still better for many things, but “Go with manually-triggered GC” is still far better than C for 99.9% of use cases.
Go’s compiler is not at all optimized for generating fast floating point instructions like AVX, and it’s very cumbersome to add any kind of intrinsics. This might not matter for light games, but it’s an issue when you want to simply switch to wide floating point operations to optimize some math.
GCC can compile both C and Go. I searched for benchmarks but found none for GCC 9 that compares the performance of C and Go. Do you have any sources on this?
I don’t have a source, but it’s common knowledge in the Go community. Not sure how GCC works, but it definitely produces slower binaries than gc (the standard Go compiler). There are probably some benchmarks where this is not the case, but the general rule is that gcc is slower. gc purposefully doesn’t optimize as aggressively in order to keep compile times low.
Personally I would love a --release mode that had longer compile times in exchange for C-like performance, but I use Python by day (about 3 orders of magnitude slower than C) so I’d be happy to have speeds that were half as fast as C. :)
Yes, the idea is that you must invoke the GC when you’re not in a critical section. Alternatively you can just avoid allocations using arenas or similar. (You can use arrays and slices without the GC).
To make sure I understand, is this an accurate expansion of your comment?
Yes, it would leak; to avoid leaking you could invoke the GC when you’re not in a critical section. Alternatively, if you don't use maps and instead structure all your data into arrays, slices and structs, you can just avoid allocations using arenas or similar. (You can use arrays and slices without the GC, but maps require it.)
Yes, that is correct. Anything that allocates on the heap requires GC or it will leak memory. Go doesn’t have formal semantics about what allocates on the heap and what allocates on the stack, but it’s more or less intuitive and the toolchain can tell you where your allocations are so you can optimize them away. If you’re putting effort into minimizing allocations, you can probably even leave the GC on and the pause times will likely be well under 1ms.