Even in single-threaded applications, a custom allocator tailored to your workload can give significant speedups over the system malloc(). One of the assignments I enjoyed in college was writing a memory allocator. By the time I had finished optimizing it (including replacing all structs with raw pointer arithmetic via #define macros), I was, IIRC, getting over 10,000 single-threaded allocations per second vs. ~3,000 from malloc().