I don't know how C# does it, but Rust async/await doesn't allocate anything on the heap ...
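
A minimal sketch of that claim, assuming a hand-rolled executor: `block_on` and `noop_raw_waker` are hypothetical helpers (not from any runtime), and `add`/`compute` are toy functions. The future returned by `compute()` is an ordinary value whose size is fixed at compile time. It is pinned on the stack with `std::pin::pin!` and polled to completion without any `Box`.

```rust
use std::future::Future;
use std::pin::pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical no-op waker: every vtable entry does nothing.
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(ptr::null(), &VTABLE)
}

// Hypothetical minimal executor: pins the future on the stack (no Box) and busy-polls it.
// Only suitable for futures that make progress on every poll, like the toy ones below.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: all vtable functions are no-ops, so the RawWaker contract is trivially upheld.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn add(a: u64, b: u64) -> u64 {
    a + b
}

async fn compute() -> u64 {
    let x = add(1, 2).await;
    add(x, 4).await
}

fn main() {
    let fut = compute();
    // The whole state machine is a plain value; its size is known at compile time.
    println!("compute() future: {} bytes", std::mem::size_of_val(&fut));
    println!("result: {}", block_on(fut));
}
```

Real runtimes typically do heap-allocate once per spawned task to store the future, so "no allocation" describes the future itself rather than every executor.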

evntdrvn · on May 21, 2023

For C#, you might want to try a different version with ValueTask instead of Task. It’s more memory-friendly.

It would also be interesting to try Native AOT compiling both versions…

noveltyaccount · on May 21, 2023

+1 to Native AOT; I'd love to see the data. It's pretty easy to do: add a line of XML to the csproj and use a modified `dotnet publish` command.

https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...

cyberax · on May 22, 2023

Just tried it. No difference whatsoever.

neonsunset · on May 21, 2023

NativeAOT will have negligible impact on memory usage or performance here. On average, JIT performance is higher than NAOT's.

noveltyaccount · on May 27, 2023

Yeah, and I think NAOT still embeds the same(?) runtime GC into the native binary, so for memory usage I would expect it to be nearly or exactly the same.

ygra · on May 22, 2023

I may be remembering wrongly, but wasn't ValueTask mainly the go-to option when the task is expected to complete synchronously most of the time? I think in the truly asynchronous case, with a state machine, you just end up boxing the ValueTask and get pretty much the same allocations as with Task, just in a slightly different way.

evntdrvn · on May 22, 2023

Check out this article: https://devblogs.microsoft.com/dotnet/async-valuetask-poolin...

cyberax · on May 22, 2023

There is no magic.

Rust also has to allocate (box) when you need recursive async calls: https://rust-lang.github.io/async-book/07_workarounds/04_rec...
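
A minimal sketch of the recursive case, using a hypothetical `fib` function and the same kind of hand-rolled, hypothetical `block_on` executor (not part of any runtime). Each recursive call returns a `Pin<Box<dyn Future>>`, so every level of recursion costs one heap allocation; without the `Box`, the future type would have to contain itself.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(ptr::null(), &VTABLE)
}

// Hypothetical busy-polling executor; fine for futures that never wait on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: all vtable functions are no-ops.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// An unboxed `async fn fib` awaiting itself would need an infinitely sized
// future type, which rustc rejects (error E0733). Boxing breaks the cycle.
fn fib(n: u32) -> Pin<Box<dyn Future<Output = u64>>> {
    // One heap allocation per call.
    Box::pin(async move {
        if n < 2 {
            n as u64
        } else {
            fib(n - 1).await + fib(n - 2).await
        }
    })
}

fn main() {
    println!("fib(10) = {}", block_on(fib(10)));
}
```

Since Rust 1.77 an `async fn` may also call itself directly as long as the recursive call is boxed (for example `Box::pin(fib(n - 1)).await`); either way, the allocation per level remains.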

What Rust does is remove one layer of allocations by using the Future to store the contents of the state machine (basically, the stack frame of the async function).
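
A small sketch of what "storing the stack frame in the Future" means, using hypothetical toy functions (`yield_point`, `holds_buffer`, `drops_buffer`): a local that is still needed after an `.await` becomes part of the future's state, so the future's size grows with it. No executor is needed here, because `std::mem::size_of_val` only inspects the unpolled futures.

```rust
// A trivial future; awaiting it still creates a suspension point in the state machine.
async fn yield_point() {}

// `buf` is read after the .await, so it has to be saved in the future's state.
async fn holds_buffer(seed: u8) -> u32 {
    let buf = [seed; 1024];
    yield_point().await;
    buf.iter().map(|&b| u32::from(b)).sum()
}

// `buf` goes out of scope before the .await, so it does not need to be saved,
// and this future is typically much smaller.
async fn drops_buffer(seed: u8) -> u32 {
    let total: u32 = {
        let buf = [seed; 1024];
        buf.iter().map(|&b| u32::from(b)).sum()
    };
    yield_point().await;
    total
}

fn main() {
    // Sizes are fixed at compile time; exact numbers depend on the compiler version.
    println!("holds_buffer future: {} bytes", std::mem::size_of_val(&holds_buffer(1)));
    println!("drops_buffer future: {} bytes", std::mem::size_of_val(&drops_buffer(1)));
}
```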

withoutboats3 · on May 22, 2023

It allows removing all allocations within a task, except for recursive calls. This puts the entire stack of a task in one allocation (if the task is not recursive, which is almost always the case, for exactly this reason), exactly like the advantage you describe for Go. And unlike Go, that stack space is perfectly sized and will never need to grow, whereas Go has to reallocate the stack when you hit the limit, an expensive operation that could occur at an unknown point in your program.
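
A sketch of the "one allocation per task" point, with hypothetical toy functions `leaf`, `middle` and `top`: while awaiting, each caller's future embeds its callee's future by value, so the outermost future is at least as large as everything nested inside it. Boxing it once (as a runtime typically does when spawning a task) yields a single allocation of exactly that compile-time size.

```rust
use std::future::Future;
use std::pin::Pin;

async fn yield_point() {}

// Innermost "frame": its 512-byte scratch buffer is held across an .await.
async fn leaf(seed: u8) -> u32 {
    let scratch = [seed; 512];
    yield_point().await;
    scratch.iter().map(|&b| u32::from(b)).sum()
}

// Each caller's future stores its callee's future by value, like nested stack frames.
async fn middle(seed: u8) -> u32 {
    leaf(seed).await + 1
}

async fn top(seed: u8) -> u32 {
    middle(seed).await * 2
}

fn main() {
    let leaf_size = std::mem::size_of_val(&leaf(1));
    let top_size = std::mem::size_of_val(&top(1));

    // One Box holds every nested "frame"; its size never changes after creation.
    let task: Pin<Box<dyn Future<Output = u32>>> = Box::pin(top(1));
    let boxed_size = std::mem::size_of_val(&*task);

    println!("leaf = {leaf_size} B, top = {top_size} B, boxed task = {boxed_size} B");
}
```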

cyberax · on May 22, 2023

Again, no magic. The Future trait impl stores all of the state machine that Rust can see statically. If you have complicated code that needs dynamic dispatch, or if you need recursive calls, Rust will also have to allocate.

This is not all that different from Go, except that Go preallocates the stack without doing any analysis.
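
A sketch of the dynamic-dispatch case mentioned above, with hypothetical functions `quick`, `with_scratch` and `pick`: when a function may return one of several different future types chosen at runtime, a common approach is to erase the type behind `Pin<Box<dyn Future>>`, which costs one heap allocation per boxed future, sized to whichever concrete future it holds.

```rust
use std::future::Future;
use std::pin::Pin;

async fn yield_point() {}

async fn quick() -> u32 {
    1
}

async fn with_scratch(seed: u8) -> u32 {
    let scratch = [seed; 256];
    yield_point().await;
    scratch.iter().map(|&b| u32::from(b)).sum()
}

// The two branches produce different anonymous future types, so returning
// either one requires type erasure; here that means one Box per call.
fn pick(fast: bool, seed: u8) -> Pin<Box<dyn Future<Output = u32>>> {
    if fast {
        Box::pin(quick())
    } else {
        Box::pin(with_scratch(seed))
    }
}

fn main() {
    let jobs = vec![pick(true, 0), pick(false, 3)];
    for (i, job) in jobs.iter().enumerate() {
        // size_of_val on the trait object reports the size of the concrete future behind it.
        println!("job {i}: {} bytes on the heap", std::mem::size_of_val(&**job));
    }
}
```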

Actually... I can fix that! I can use Go's escape analysis machinery to check statically, during compilation, whether the stack size can be bounded by a number lower than the default 2 KB. This way, I can get it down to about 600 bytes per goroutine. It will also speed up the code a bit by eliding the "morestack" checks.

It can be squeezed a bit more by checking whether the goroutine uses timers (~100 bytes) and defers (another ~100 bytes), but this will require some tricks.