It's not really fair to compare async/await systems with Go and Java.
Go and Java with lightweight threads provide a full-blown system, without any limitations. You don't have to "color" your code into async and blocking functions, everything "just works". Go can even interrupt tight inner loops with async pre-emption.
The downside is that Go needs around 2k of stack minimum for each goroutine. Java is similar, but has a lower per-thread constant.
The upside is that Go can be _much_ faster due to these contiguous stacks. With async/await, each yield point is basically isomorphic to a segment of a segmented stack. I did a quick benchmark to show this: https://blog.alex.net/benchmarking-go-and-c-async-await
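To make the "everything just works" point concrete, here's a rough Go sketch (the function and numbers are made up for illustration, not taken from the benchmark): a plain blocking function runs concurrently with nothing but the go keyword, and the scheduler parks each goroutine during the blocking call with no async/await annotations anywhere.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetch is an ordinary blocking function: no async keyword, no special
// return type. The time.Sleep (a stand-in for blocking I/O) simply parks
// the goroutine; the scheduler runs other goroutines in the meantime.
func fetch(id int) string {
	time.Sleep(100 * time.Millisecond)
	return fmt.Sprintf("result %d", id)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(id int) { // the only "annotation" is the go keyword at the call site
			defer wg.Done()
			fmt.Println(fetch(id))
		}(i)
	}
	wg.Wait()
}
```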
On the contrary I think it is critical to compare them. You can then decide if the tradeoff is worth it for your use case.
Even at a million tasks Go is still under 3 GiB of memory. So that is roughly 3KiB of memory overhead per task. That is likely negligible if your tasks are doing anything significant and most people don't need to worry about this number of tasks.
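If you want to sanity-check that number yourself, a rough sketch along these lines works (all figures depend on the Go version and on what the goroutines actually do; this just parks a million of them and divides the memory delta):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 1_000_000

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var wg sync.WaitGroup
	block := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			wg.Done()
			<-block // park the goroutine so it stays alive
		}()
	}
	wg.Wait() // all goroutines have started

	runtime.GC()
	runtime.ReadMemStats(&after)
	perTask := float64(after.Sys-before.Sys) / n
	fmt.Printf("~%.0f bytes of process memory per goroutine\n", perTask)

	close(block)
}
```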
So this comparison shows that in many cases it is worth paying that price to avoid function colouring. But of course there are still some use cases where the price isn't worth it.
One thing I wanted to add is that in golang, you end up passing context.Context to all asynchronous functions to handle cancellations and timeouts, so you “color” them regardless. Java folks with structured concurrency have the right idea here.
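A hedged sketch of what that "context coloring" looks like in Go (fetchUser/handleRequest are made-up names): every function in the chain grows a ctx parameter so timeouts and cancellation can propagate, much like await has to propagate in async code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Every function that might block takes ctx as its first parameter,
// so cancellation and deadlines can propagate down the call chain.
func fetchUser(ctx context.Context, id int) (string, error) {
	select {
	case <-time.After(50 * time.Millisecond): // stand-in for a network call
		return fmt.Sprintf("user-%d", id), nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func handleRequest(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Millisecond)
	defer cancel()
	_, err := fetchUser(ctx, 42)
	return err
}

func main() {
	err := handleRequest(context.Background())
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true: the timeout fired
}
```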
The correct interpretation here is that golang chose explicit context passing where java chose implicit.
It's similar to explicit DI vs implicits.
The function coloring metaphor doesn't quite fit here, since the calling convention is the same and there are no extra function keywords (`async` vs non-async).
This isn't true. I have use cases that don't require cancellations or timeout. The tasks I'm running don't involve the network, they either succeed or error after an expensive calculation.
This is an interesting post. My understanding: most of the use cases for async code are I/O-bound operations. So you fire off a bunch of async I/O requests and wait to be notified. Logically, I/O requests normally need a timeout and/or cancel feature.
However, you raise a different point:
> The tasks I'm running don't involve the network, they either succeed or error after an expensive calculation.
This sounds like CPU-bound, not I/O-bound. (Please correct me if I misunderstand.) Can you please confirm if you are using Go or a different language? If Go, I guess it still makes sense, as green threads are preferred over system threads. If not Go, it would be nice to hear more about your specific scenario. HN is a great place to learn about different use cases for a technology.
I think I just responded too hastily. I am working in Go. There is file IO going on in addition to the calculation (which because of a NAS or whatever could also be network IO). As a practical matter I had never felt the need to offer cancellation or timeout for these use cases, but I probably should, so mea culpa.
What's the point of multiplexing tasks on a particular core if the tasks don't do any I/O? Then it will be strictly faster to execute the tasks serially, spread across as many cores as possible.
It's not _quite_ the same: you can't call async code from a sync context (hence the color issue), but I can always pass a "context.Background()" or such as a context value if I don't already have one.
> you can't call async code from a sync context (hence the color issue), but I can always pass a "context.Background()" or such as a context value if I don't already have one.
You can always pass context.Background, in this metaphor creating a new tree of color.
You can always call "runtime.block_on(async_handle)", in this metaphor also creating a new tree of color.
You can always pass the async executor to the sync code and spawn async coroutines into it. And you can keep it in a global context as well to avoid parameters. E.g. there is `Handle::current()` for exactly this purpose in Tokio. Function coloring is just a theoretical disadvantage - people like to bring it up in discussions, but it almost never matters in practice, and it is not even universally considered a bad thing. I actually like to see in the signature if a function I'm calling into can suspend for an arbitrary amount of time.
It's not the same because you can have interleaving functions which don't know about context.
Say you have a foreach function that calls each function in a list. In async-await contexts you need a separate version of that function which is itself async and calls await.
With context you can pass closures that already have the context applied.
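A hedged Go sketch of what that looks like (forEach/process are illustrative names; generics need Go 1.18+): one generic forEach serves every caller, because the closure already carries the context.

```go
package main

import (
	"context"
	"fmt"
)

// One forEach works for everything: it knows nothing about Context,
// and no second "async" variant is needed.
func forEach[T any](items []T, f func(T) error) error {
	for _, it := range items {
		if err := f(it); err != nil {
			return err
		}
	}
	return nil
}

func process(ctx context.Context, id int) error {
	if err := ctx.Err(); err != nil { // honours cancellation
		return err
	}
	fmt.Println("processing", id)
	return nil
}

func main() {
	ctx := context.Background()
	// The closure already has the context applied.
	_ = forEach([]int{1, 2, 3}, func(id int) error {
		return process(ctx, id)
	})
}
```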
No, because if the function that includes those lines is itself async, it will now block, while the equivalent go coroutine will still preserve the stackful async-ness. I.e. you can't close over the yield continuation in rust, while it is implicitly done in go.
For a more concrete example: let's say you have a generic function that traverses a tree. You want to compare the leaves of two trees without flattening them, by traversing them concurrently with a coroutine [1]. AFAIK in rust you currently need two versions of traverse, one sync and one async, as you can neither close over nor abstract over async. In go, where you have stackful coroutines, this works fine, even when closing over Context (see the rough sketch below).
So yes, in some way Context is a color, but it is a first class value, so you can copy it, close over it and abstract over it, while async-ness (i.e. stackless coroutines) are typically second class in most languages and do not easily mesh with the rest of the language.
[1] this is known as the "same fringe problem" and it is the canonical example of turning internal iterators into external ones.
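For the curious, a rough Go sketch of the same-fringe idea (the types and names are assumed for illustration): a single ordinary traverse closes over a channel (it could just as well close over a Context), and each tree is walked in its own goroutine, so no second "async" variant of traverse is needed.

```go
package main

import "fmt"

type Tree struct {
	Left, Right *Tree
	Leaf        int // only meaningful when Left and Right are nil
}

// traverse is one ordinary recursive function; it simply closes over
// the output channel via its parameter. No async variant needed.
func traverse(t *Tree, out chan<- int) {
	if t == nil {
		return
	}
	if t.Left == nil && t.Right == nil {
		out <- t.Leaf
		return
	}
	traverse(t.Left, out)
	traverse(t.Right, out)
}

func leaves(t *Tree) <-chan int {
	out := make(chan int)
	go func() { // walk each tree in its own goroutine
		traverse(t, out)
		close(out)
	}()
	return out
}

// sameFringe compares leaf sequences without flattening either tree.
// (A production version would take a Context so an early mismatch can
// cancel the other walker instead of leaking its goroutine.)
func sameFringe(a, b *Tree) bool {
	ca, cb := leaves(a), leaves(b)
	for {
		x, okA := <-ca
		y, okB := <-cb
		if okA != okB || x != y {
			return false
		}
		if !okA {
			return true
		}
	}
}

func main() {
	t1 := &Tree{Left: &Tree{Leaf: 1}, Right: &Tree{Left: &Tree{Leaf: 2}, Right: &Tree{Leaf: 3}}}
	t2 := &Tree{Left: &Tree{Left: &Tree{Leaf: 1}, Right: &Tree{Leaf: 2}}, Right: &Tree{Leaf: 3}}
	fmt.Println(sameFringe(t1, t2)) // true: both fringes are 1, 2, 3
}
```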
How do you figure? What’s requiring my async function to take a context in the first place? Even if it takes one, what’s stopping it from calling a function that doesn’t take one? Similarly, what’s stopping a function that doesn’t take a context from calling one which does?
Erlang / Elixir processes use about 0.5kb per process, the BEAM has a preemptive scheduler to make sure work stays consistent, and each process gets an isolated heap to avoid stop-the-world garbage collection.
It’s as good as it gets for this type of thing as it was designed for it from the ground up.
This was the first thing that occurred to me when I saw the error - should have been pretty straightforward to tweak the settings to give a clearer picture.
Thanks! Looked back at the results and that's about what I'd expect. Erlang/Elixir isn't a silver bullet, but amazing in its own way for other reasons. Ultimate max performance and memory efficiency definitely isn't one of them :).
> Erlang / Elixir processes use about 0.5kb per process
Last I checked it used about 2.5K. BEAM reserves around 300 words per process (depends on exact options), but a word is 8 bytes, not 1.
You can get it lower (obviously at a cost, as soon as you start using the process and need space), but nowhere near 512 bytes; just the process overhead is around 800 bytes.
I don't know about C#, but Rust async/await doesn't allocate anything on the heap by itself. So it is not a universal property of all async/await implementations, contrary to what you imply in your blog post.
Yeah and I think NAOT still embeds the same(?) runtime GC into the native binary. So, for memory usage, I would expect it to be nearly/exactly the same.
I may remember wrongly here, but wasn't ValueTask merely the go-to option when it's expected that the task would finish synchronously most of the time? I think for the really async case with a state machine you just end up boxing the ValueTask and ending up with pretty much the same allocations as Task, just in a slightly different way.
What Rust does is let you remove one layer of allocations by using the Future itself to store the contents of the state machine (basically, the stack frame of the async function).
It lets you remove all allocations within a task, except for recursive calls. This results in the entire stack of a task living in one allocation (if the task is not recursive, which is almost always the case, for exactly this reason), exactly like the advantage you describe for Go. And unlike Go, that stack space is perfectly sized and will never need to grow, whereas Go has to reallocate the stack if you hit the limit, an expensive operation that can occur at an unknown point in your program.
Again, no magic. The Future impl stores all of the state machine that Rust can see statically. If you have complicated code that needs dynamic dispatch, or if you need recursive calls, Rust will also have to allocate.
This is not at all that different from Go, except that Go preallocates stack without doing any analysis.
Actually... I can fix that! I can use Go's escape analysis machinery to statically check during the compilation if the stack size can be bounded by a lower number than the default 2kb. This way, I can get it to about ~600 bytes per goroutine. It will also help to speed up the code a bit, by eliding the "morestack" checks.
It can be crunched a bit more, by checking if the goroutine uses timers (~100 bytes) and defers (another ~100 bytes). But this will require some tricks.
I'm NOT trying to show that Go is faster than async/await or anything similar. I'm showing that nested async/await calls are incredibly expensive compared to regular nested function calls.
You need to add the go keyword to turn a normal function call into a goroutine.
If you removed async/await and the Task/return plumbing from the C# code example, it would perform pretty much the same as Go.
If you want to show that async/await calls are expensive, then you should have shown two code samples of C#, one with async/await and one without.
Or you could have done the same for Go: show one example with goroutines and one without (roughly like the sketch below).
But I think everyone already knows that async/await and goroutines have their costs.
The problem is more that you are comparing Go without goroutines (without its allocation costs) to a C# example with a poor implementation of async/await.
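For what it's worth, a hedged sketch of that kind of side-by-side in Go (not the article's benchmark, just an illustration): run the same nested work once as plain calls and once wrapped in goroutines, so the spawn/scheduling cost shows up in isolation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// work is the same trivial nested computation in both variants.
func work(n int) int {
	if n == 0 {
		return 1
	}
	return n + work(n-1)
}

func main() {
	const tasks = 100_000

	start := time.Now()
	for i := 0; i < tasks; i++ {
		_ = work(100) // plain nested calls, no concurrency machinery
	}
	fmt.Println("plain calls:", time.Since(start))

	start = time.Now()
	var wg sync.WaitGroup
	wg.Add(tasks)
	for i := 0; i < tasks; i++ {
		go func() { // same work, plus the goroutine spawn/schedule cost
			defer wg.Done()
			_ = work(100)
		}()
	}
	wg.Wait()
	fmt.Println("goroutines: ", time.Since(start))
}
```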
I'm wondering that as well. The C# code seems really unrealistic though, using a lot of tasks for CPU-bound work.
A fair comparison would at least need to involve some degree of IO.
Does go maybe automatically parallelize everything it can? That would be one potential answer to the initial question.
The author doesn't have the understanding to have even gotten close to the capability for nuance you're asking for. They just copied code from ChatGPT and ran it and timed it and made graphs and it somehow got on HN.
Racket has full-blown green threads (virtual threads)[1] that serve as an alternative to the async/await paradigm. It also supports use of native (OS) threads[2].
C++ has now both async (built-in) and multiple flavors of stackful coroutines (as external libraries). You can run both on top of ASIO so that you can measure purely difference of the coroutine implementation as opposed to the runtime.
With async/await (which I think C# also uses) you actually unwind the entire stack and start a new one every time you do async callbacks. With fibers / green threads / whatever you instead store stacks in memory.
It took me a while to figure this out, thanks to articles I came across (such as “What Color Is Your Function?”).
There are memory / speed trade-offs, as per usual. If you have a lot of memory, and can keep all the stacks in memory, then go ahead and do that. It will save on all the frivolous construction / destruction of objects.
Having said that, my own experience suggests that when you have a startup, you should just build single-threaded applications that clean up after every request (such as with PHP) and spawn many of them. They will share database pool connections etc., but for the most part it will keep your app safer than if they all shared the same objects and process. The benchmarks say that PHP-FPM is only about 50% slower than Swoole, for instance. So why bother? Starting safe beats a 2x speed boost.
And by the way, you should be building distributed systems. There is no reason why some client would have 1 trillion rows in a database, unless the client themselves is a giant centralized platform. Each client should be isolated and have their own database, etc. You can have messaging between clients.
If you think this is too hard, just use https://github.com/Qbix it does it for you out of the box. Even the AI is being run locally this way too.
I missed my opportunity to reply to your comment, but I really appreciate it, and I wanted to find a way to get this back to you. The comment in question:
"Well, I was one of the engineers that made the change :) I'm not sure how much I can tell, but the public reason was: "to make pricing more predictable".
Basically, one of the problems was customers who just set the spot price to 10x of the nominal price and leave the bids unattended. This was usually fine, when the price was 0.2x of the nominal price. But sometimes EC2 instance capacity crunches happened, and these high bids actually started competing with each other. As a result, customers could easily get 100 _times_ higher bill than they expected."
There was more to it than that, but I figure that's a good enough reference point.
Thank you for these improvements. It doesn't change anything, in terms of how much savings I can get by following the latest generations and exotic instance-types, but it does help with the reliability of my workloads.
It's been a huge benefit to me, personally, that I can provide some code that tolerates servers dying, with the benefit of 80% cost savings without using RIs.
You can also try another product of my former team: https://aws.amazon.com/savingsplans/ - it's similar to RI, but cheaper because it doesn't provide an ironclad guarantee that the instance will be available at all times. It's still a bit more expensive than spot, but not by much.
> You don't have to "color" your code into async and blocking functions
Unpopular opinion: this is a good idea. It encourages you to structure your code in a way such that computation and I/O are separated completely. The slight tedium of refactoring your functions into a different color encourages good upfront design to achieve this separation.
Good upfront design? This stinks of implementation detail leakage affecting high-level design, which should be a cardinal sin.
One should design based on constraints that best match the problem at hand, not some ossified principle turned "universal" that only really exists to mask lower-level deficiencies.
When the implementation detail involves whether or not the function will perform I/O, it is better to let that leak.
Excessive hiding of implementation detail is what leads to things like fetching a collection of user IDs from a database and then fetching each user from an ID separately (the 1+N problem). Excessive hiding of implementation detail is what leads to accidental O(N^2) algorithms. Excessive hiding of implementation detail is what leads to most performance problems.
You don't necessarily know ahead of time if an async or a sequential strategy is better. With colored functions you kind of have to pick ahead of time and hope you picked right or pay a big effort penalty to do a side by side comparison.
async/sync and sequential/parallel are orthogonal concerns. You can write sync code which does work in parallel (on different threads/cores), for something like a numerical algorithm that can be parallelized in some way. Deciding whether something is sync or async is about whether it needs to suspend (in practice, mostly whether it needs to do IO), which is much easier to understand up front. Sometimes it changes of course, in which case you have to do some refactoring. But in a decade of programming professionally in Scala and Rust (both of which have "colored" functions) I can count on one hand the number of times where I had to change something from sync to async and it took more than a few minutes of refactoring to do it.
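To make that orthogonality concrete, a rough sketch in Go terms (sumSquares is a made-up example): the function below is completely synchronous from the caller's point of view, it just blocks until it's done, yet it spreads the numerical work across all cores.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares has a plain blocking signature: callers see no async-ness,
// yet the work is split across all available cores.
func sumSquares(xs []float64) float64 {
	workers := runtime.NumCPU()
	partial := make([]float64, workers)
	chunk := (len(xs) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(xs) {
			break
		}
		hi := lo + chunk
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, x := range xs[lo:hi] {
				partial[w] += x * x
			}
		}(w, lo, hi)
	}
	wg.Wait() // block until every worker is done: sequential from the caller's view

	var total float64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	xs := make([]float64, 1_000_000)
	for i := range xs {
		xs[i] = float64(i)
	}
	fmt.Println(sumSquares(xs))
}
```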