The case of a leaky goroutine (brainbaking.com)
74 points by surprisetalk on March 25, 2024 | 84 comments


I wish Go recorded the creation timestamp of each goroutine and let you access them.

An app I work on recently had a bug where goroutines would slowly build up over time. It turned out the bug was in the Growthbook SDK [1]. We can monitor the number of goroutines, but having a large number of goroutines waiting at the location that gets stuck is normal; the problem is only visible over multiple days, as the minimum value slowly creeps up.

If Go could report the timestamp of the oldest goroutines as part of the pprof dump, we could alert on that, and it would catch any such leak.

[1] https://github.com/growthbook/growthbook-golang/pull/28


Execution traces include a goroutine profile that outputs the count of goroutines, which could also be used for an alert - though it would require parsing the trace output. They recently made some changes to give a structured API over trace data - maybe use that?


Language support would be great, but could you add logs that record the creation and destruction of goroutines, giving each a unique UUID so you can track which ones haven't exited?

Edit: Also, maybe the tool at this comment could've helped you? https://news.ycombinator.com/item?id=39817775


How would one log that? In our case the bug was in a goroutine started by a third-party library, so we don't have control over when it starts or exits.

I don't think goleak would have helped here, because I believe it doesn't support concurrent use. It's really designed to run in tests, not in production. It parses and searches stack traces, so it's not going to be performant.


I guess you’d need to fork the third-party library (or, if you can build locally, just yolo and chmod +w) and inject something to track those goroutine metrics. But I suppose by the time you’re doing that, you already know where the problem is.


Thread leaks, increasingly common due to threaded async abstractions such as the goroutine, are discussed less often than memory or CPU leaks but are much more dangerous in multi-tenant container environments.

A thread leak can lock up your entire node, including all the control-plane processes. A container spec doesn't provide an easy way to control thread/nproc/ulimit limits (you can still do it, but it's not straightforward), which in turn leaves pretty much every k8s deployment misconfigured and vulnerable to thread leaks.


Goroutines in a single process map onto a fixed number of threads. Even if you have goroutine leaks, you should not have thread leaks. Your program may deadlock or run out of memory, but it will not take the whole system down (at least, not in this way).


> Goroutines in a single process map onto a fixed number of threads

Not necessarily true if you're using cgo


I'd love to find a deep dive on how goroutines and cgo actually interact.

As I understand it, a cgo function call yields the caller goroutine and then runs the C code directly on the underlying thread, blocking it from use by goroutines. When the function returns, the thread is freed up, and goroutines including the caller can be scheduled again. I'm not sure if the caller is guaranteed to run next, or if other goroutines can crowd it out. I would imagine it probably does run first, if only to receive the return values from the C function and then release its thread affinity. This whole process is notable for introducing some overhead to cgo calls which can be significant if cgo is used frequently.

While you can create new threads in C and thus create thread leaks that way, I don't think any of those threads will be used by the goroutine scheduler, which sticks with the pool of threads it manages.

EDIT: reading the runtime docs, it seems that GOMAXPROCS is not as hard of a limit as I thought:

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

I think cgo calls count as "system calls on behalf of Go code" for this purpose. Thus if you have GOMAXPROCS=1 and more than one goroutine and you make a cgo call from one of them, the scheduler may create a new thread so the other goroutines can still run. You don't need cgo to do this though, syscalls (explicitly or through Go's stdlib) can exhibit the same behavior.

So I think it is possible to leak threads this way, but to do so you would need to spawn goroutines calling cgo faster than the C code can return.


My #1 complaint about Rust is that leaking a future is safe. It means the compiler can’t check for async coroutine leaks, and it breaks the borrow checker’s ability to say “nothing else has a reference to this any more”.

Anyway, we’re using golang for some stuff at work, and holy crap, I forgot how terrible it was to work in high level languages that don’t statically check for correct synchronization.

If C++-style concurrency is like a chainsaw, then golang concurrency is like a chainsaw in a bouncy castle.


> My #1 complaint about Rust is that leaking a future is safe.

There’s no other way given the leakpocalypse decision. You’d need an entirely new leak-proof language to fix that, and that means you need alternatives for Rc and Arc (or a way to prevent them creating a cycle).


I'd gladly give up Rc and Arc if it meant the borrow checker treated spawn() like a normal function call.

All the code I write is async, so the borrow checker is effectively broken for me. (Wrapping everything in Arc creates weird false sharing at runtime, and I don't want to spend time debugging that class of performance nonsense.)


For what it's worth, I totally agree. I wrote a fairly large project only to find that goroutines were mysteriously blocking in extremely hard-to-debug, subtle ways. So I had to rewrite a huge chunk. I think goroutines are an Admiral Ackbar "It's a trap" kind of trap.

I was liberal in my use of goroutines and channels. But after that experience, I decided to religiously track each goroutine in my head. If I reached the point of not being able to mentally map all the goroutines running, I would cut the goroutine out. I also started using callbacks more to avoid the pernicious blocking I had experienced.


> My #1 complaint about Rust is that leaking a future is safe.

Do you mean futures that aren't polled to completion, tasks that aren't joined, or literal memory leaks that happen to own futures?


The first and third thing. (They're basically equivalent.)

You can start polling then do std::mem::forget on the future. At that point, the borrow checker thinks the future no longer exists. So, it is unsound to pass a reference with a bounded lifetime into a future (which is why you need all the references to be 'static if you pass something into a spawn, or you need to spray Arc everywhere).


They're very different, because a future may be dropped before it is polled to completion (aka, cancellation). If you leak it, then the drop handlers are not called, and this can affect program correctness.

It is not unsound for a future to own a reference (in fact it's super common - how else would async methods work?). If you leak the future, it can't be polled and it can't be dropped, so any references will never be dereferenced. But also that's a pretty contrived example.

Like you could call std::mem::forget on a future that owns a tokio::sync::MutexGuard and then you'd have some problems with deadlock... but that's not an async issue, it's always incorrect to leak an RAII guard (same as if you leaked std::sync::MutexGuard)

tokio::spawn has a 'static bound because it simplifies things, not because there's some fundamental limitation of futures owning references.


What is a "correct synchronization"?


Synchronization such that the whole program is well-defined according to some memory model.


Which is the case for Go:

https://go.dev/ref/mem


A quote from your link:

> programmers are strongly encouraged to use appropriate synchronization to avoid data races

Any time you need to "encourage" programmers to do the right thing, you have already failed in your language design.

And I think OP agrees with me here. OP says "static checking of correct synchronization" which is irresponsibly absent from Go.


So it's absent from every language but Rust?


Javascript is even better, by not having multithreading at all. I am not joking, Go is much worse than Java and C#, but Javascript and Rust are the only mainstream languages where I've seen non-experts reliably write correct concurrent code. Maybe it's true for languages like Elixir as well, but I haven't tried.


My experience of JavaScript is nowhere near that good.

While it does not have parallelism, the concurrency is as unrestricted as it is in go, forgetting awaits is a very common issue leading to wild tasks running unbounded and unchecked, and when you need any other synchronisation mechanism you have to write them yourself and they’re tricky indeed.

$dayjob’s JS team keeps chasing concurrency issues, and breaking builds because of them.


It's 95% true for Elixir as well; it's very difficult to mess up parallelization or just leave a system in an incoherent state with it. Immutability and message passing help a lot.


I don't know of any other multithreaded systems programming languages that check for data race freedom at compile time.

Single threaded languages are usually data race free. I imagine some multithreaded, purely functional languages are too (everything is immutable, and therefore cannot be modified in race with reads). Of course, SQL running in strictly serializable mode is too.

Of those, the only one that's an appropriate choice for systems software development is Rust. The C++ Core Guidelines are a runner-up in my opinion: they dictate a safer subset of C++, with the goal of backporting Rust's memory safety properties to C++. Swift has also done a lot in this space.


Alternatively a language can choose not to expose anything low level enough to cause synchronization issues and only make available high level APIs that are correct by construction.

Go does neither. That's why there's this thread on HN that I bookmarked: https://news.ycombinator.com/item?id=31698503


Isn't this addressed by CPU requests/limits + pid limiting?

https://kubernetes.io/docs/concepts/policy/pid-limiting/


It's just more annoying to set up and isn't as widely known, and doesn't work on a per-pod basis.

> PID limiting is an important sibling to compute resource requests and limits. However, you specify it in a different way: rather than defining a Pod's resource limit in the .spec for a Pod, you configure the limit as a setting on the kubelet. Pod-defined PID limits are not currently supported.


This article doesn't go into depth about capturing the profile to identify the leak, so if you want more instruction on one way to do that, I wrote an article recently called Profiling Caddy, which is geared toward Caddy but really works for any Go programs: https://caddyserver.com/docs/profiling


People hate on Haskell async exceptions (with good reason), but one cool thing about them and the Haskell RTS is that you can almost [1] always cancel a thread from the outside. No need for the thread to cooperate like in Golang.

The entire `async` package is built on this. The `race` combinator is an especially cool application.

After doing a big project in Golang, I appreciated this more. We had our fair share of goroutine leaks.

[1] iirc, if the thread is not blocking on a syscall or allocating memory, it will not be yielding to the RTS.


Rust's futures also get this by being poll-based: you cancel a future by dropping it. Futures compose really easily, so you can combine a bunch of them into interesting trees of selects and joins, and any that are not completed get cleaned up automatically, with little or no overhead, when they go out of scope or when the task they're part of completes.

You don't think about them as separate chunks of work; you think of them as types you can await to yield a value, and the compiler flattens it all out into a state machine. All of the sync and composition combinators are implemented purely in the traits/types of the tokio/futures libraries, because the poll/waker abstraction is low-level and versatile enough. As long as you don't go out of your way to write bad async code, there are no leaks, for the same reasons there are no leaks in ordinary Rust code.

It feels a lot like the monadic composition of Haskell even if the means it achieves it are very different.


> As long as you don't go out of your way to write bad async code

I've seen the light of async Rust and I believe (heh, sorry for the semi-flippant sarcastic remark here but I do genuinely love it when I do things right with async Rust) but I feel that writing bad async code is also not something that the compiler will actively dissuade you from. It goes to certain lengths but not quite far enough IMO.

I didn't want to do it but at one point I started reading how is async implemented and that actually lifted a big part of the mystical veil and helped me understand it better. Now if I can also completely internalize the lifetime semantics combined with async I'd be very proud of myself. (But it doesn't help that I am not working with Rust currently.)


Didn't Uber have some leaky goroutine detector? I vaguely remember seeing something like that, 5 years ago...

Ah yeah it's here.

https://github.com/uber-go/goleak


Uber also made something called fx, which is fantastic.

You don't have to use it, but when you do, it helps ensure that you organize your code in a way that becomes very easily testable. It enforces a modular approach to composing together golang services.

Being more easily testable helps prevent bugs, like these leaky goroutines.


`fx` is mostly just Dependency Injection in Go which has been a thing in Java forever.

I'm curious though, when do you reach for `fx` in a non-industrial project and when do you not? I still use the same patterns of separating out the implementation from the interface but I've been wiring in the dependencies by hand. I'm curious if folks reach for `fx` immediately or if it's something that requires thought to add. There's also Google's wire library [1] that does similar stuff but takes a compile time approach so it's a little easier to reason about if struct initialization screws up due to weird implicit things.

I still wire dependencies up by hand, but I'm curious what others do.

[1]: https://github.com/google/wire/tree/main


I co-founded Java @ Apache, so my background is Java and DI was the first thing I was looking for when I started down the golang path. I tried out a bunch of different DI options for golang and settled on fx. fx is actually more than just DI, it is a whole framework for starting and stopping "services" as well.

I realized quickly it wasn't absolutely necessary to use it since most people just make a package in golang, in order to get the separation they need. But when I started using it more and more, I noticed that taking advantage of the DI features of fx, also ensured that I wrote code that had clear separation of concerns.

In golang, it is too easy to just `new` the things you need from another package right in the function, instead of passing them in as arguments. This, of course, makes the code much harder to test, since you can't easily mock what you need.

The binary I built was distributed across tens of thousands of servers in multiple data centers, and had to run perfectly on every release as it took a lot of time/effort to even do updates. This meant comprehensive testing before deployment, so I wanted to optimize my unit/integration tests as much as possible.

I'm not sure it would be necessary for just simple api endpoint microservices, but for a complicated application binary that needs perfect testing, I can't imagine writing golang code without it. The benefits far outweigh the negatives.


Yeah when I've written Go at scale we've used fx for similar reasons. But for smaller projects I go back-and-forth. Fx makes it quick and easy to start using DI and avoid the repetitive hand initialization and injection, but for smaller codebases it's also much easier to reason about. FWIW I haven't used wire before as if I'm at a scale smaller than fx I'm just wiring in structs by hand.


I still wire things up by hand: pass concrete parameters, use defer foo.Close(), use context.Context for signaling close. It’s super obvious how stuff works. In a 200k line code base, there are maybe 2 components where I haven’t been able to simplify the shutdown sequence to my satisfaction, but I doubt a framework would add clarity there.

Maybe I just don’t know what I’m missing.


I found fx to be good for fast iteration, think quickly shipping startup code. You can just sorta spaghetti wire everything together and it's mostly okay. Though one can ask why in that situation you aren't using a dynamic language which is a good question.


I have seen fx used in production and it was an unholy mess. I never wish this upon anyone. It makes Go into Java.


I guess there are things one could do to screw it up. But in your opinion, what exactly made it a mess?


If Go allowed something like "handle = go foo()", the goroutine could be automatically terminated when the handle goes out of scope or becomes dead and is garbage collected. You could also use the handle to cancel a goroutine, etc.

Go's designers specifically avoided this model (of having a goroutine "id") for reasons I don't remember any more (maybe to avoid making goroutines heavier weight?), but this would be one way to stop leaky goroutines.


It's because just terminating a goroutine isn't safe with respect to I/O and defer chains.

I think Go has the internal plumbing to theoretically support this, though it might require inserting checks more often. Another way would be to make contexts first-class and automatically insert context checks even when not done (e.g. selects). And also all I/O has to be cancellable.

I suspect Go's designers prefer the current approach, in which cancellation is explicit.


> just terminating a goroutine isn't safe with respect to I/O and defer chains.

Agreed. The idea is to panic() if a goroutine has to be forcibly terminated due to GC, instead of a slow leak. Requires more thought though.


So the idea is to randomly take down the program when the GC runs?


Let me try to explain. The current way (which must continue working the same way even if the language is changed) does not allow you to distinguish a goroutine that should have terminated but hasn't, due to some bug, from a goroutine that can legitimately run for a long time. Making the goroutine "id" explicit can allow you to distinguish the two cases: store the id in some global or long-lived variable or array if you want the goroutine to run for a long time; otherwise, carefully control the scope of the id so that the runtime has a chance to catch the first case.

That is my initial thinking, but it would need to be fleshed out more. For instance, there should be a way to test that a goroutine has terminated. Currently you do this explicitly by passing a channel and waiting for a message.


The Go way is to pass the long running goroutines a different ctx from the short running.

Or even one ctx per goroutine and cancel them dynamically according to whatever logic.


The convention of passing ctx does almost the same thing, though. Make a new context.WithCancel and pass it to the goroutine.

It just requires programmer cooperation, but as long as you pass ctx all the way down the stack and handle err on the way back, it is not often you deal with it explicitly.


It does, but it feels extremely bolted-on (because it was). To do it properly, nearly every single function and method has to take a context, which gets repetitive. Then every single thing in the call stack needs to actually check for cancellation.

It feels like something similar should be baked in and inherited from the parent by default (but overridable), and a cancel would cancel the whole call stack. It would be nice to make this cleaner in a Go 2, I think.


If you actually code in Go, you're expected to handle this yourself by not orphaning goroutines. Errgroups or similar will take care of it. Basically it's people writing bad code because they can; options to handle it are already available.


Just wondering if threads in Rust can suffer such problems?

Background: I am coming from the JS/TS/Node world, and have decided to jump onto a compiled language. I narrowed down my choices to Go and Rust and eventually decided to go with Rust, because it didn't use GC for memory management.


Less likely, but it certainly can. The problem isn't the garbage collector; it's the overall approach to threading. Rust has a different culture that makes problems like this probably less of an issue, but nothing stops you implementing a memory leak.

In fact, since it doesn't have a GC, you can trivially create a memory leak by creating a reference loop... though the ownership checker makes that in itself really difficult, and so it's again less likely to happen than it otherwise would be. At the cost of loops being hard to make even if you want them.


I have several production Rust projects under my belt and while I am nowhere near the level of most full-time Rust devs (I don't work with Rust full-time, it's opportunity based for me) I have noticed that it's very difficult to introduce such leaks with Rust.

It is possible to do it by using Arc/Rc and cyclical references (which the borrow checker makes infuriatingly difficult -- for good reason!) but you really have to go out of your way for it. And while there are real projects where you need such idioms, I have found that you should not try to twist Rust's arm, and should just opt for something like an arena allocator.


Every language can leak memory and threads.


The explanation of the issue in ToDoneInterface really is not clear to me because of this:

> The defer close() seems to close well, but it’s on the wrong channel.

The `done` input channel is supposed to be closed by a caller, and the goroutine is closing the output channel, surely that's the point?

Now from what I know of go channels and understand of the code involved, the `done` channel may never get closed by the parents (and can never get closed at all if it's nil?), in which case the goroutine never receives a signal, never terminates, and leaks. But the explanation below the snippet confuses me completely.

And if that's it... what's the fix? Aside from not doing this sort of conversion? Just "git good scrub", try to make sure you don't rely on cancellation for progress, hope you don't use raw background channels again, and don't forget to cancel your non-background channels?


At this point, I think done chan should be an anti-pattern. Just use context for coordination and cancellation.


I get how a Context can be used for cancellation but how is it used for coordination?


A lot of these problems come from accepting a line in the OP, "A Goroutine is essentially a coroutine". The rest of the sentence is "...that maps onto green threads that map onto real native threads on your OS in an NxM way".

This is not a coroutine at all, calling them goroutines was a clever hacker pun along the lines of "GNU's Not Unix". If you treat a preëmptively-scheduled primitive as though it's a cooperatively-scheduled primitive, you're going to have a bad time.

Goroutines are threads, basically, with all the memory-management headaches that implies. Caveat emptor.


I was under the impression that goroutines were cooperative but that they yield pretty aggressively (on blocking channel operations, disk and network I/O, cgo calls, and certain syscalls). What makes you think they're preemptive?


New implicit yield points started being added in Go 1.2, which put a yield point in the function prologue (so any function call, even with no I/O whatsoever, could yield). I think later releases also added yield points on allocation and stack growth.

This culminated in 1.14, which made the runtime preemptive on most platforms (https://go.dev/doc/go1.14#runtime) in order to fix the last sticking point where a goroutine might not yield: a tight numerical loop might never yield.

This was an issue, because the GC relied on scheduling to slip in its STW pauses, so the GC would trigger STW, progressively pause every goroutine reaching a yield point, but would be unable to ever pause the last goroutine, and the program would pretty much grind to a halt until it was done.

There are ways to handle this (e.g. insert trapping reads in various control structures), but ultimately preemption was considered a better and more useful solution.


I don't have any experience with Go but to my untrained eyes this looks very much like a general problem that I've noticed with coroutines in other languages, e.g. JavaScript or Python¹: Coroutines are so lightweight that people tend to "fire & forget" them, when in reality coroutines take up memory and can easily leak. One should keep track of them and garbage-collect them but last time I checked there wasn't a great out-of-the-box solution for that.

¹ Same thing in frameworks like RxJS, where observers in some sense take on the role of coroutines.


It's a pity Go didn't have structured concurrency: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

There's a library for it: https://github.com/sourcegraph/conc

But this goes to one of the things I've been kind of banging on about languages, which is that if it's not in the language, or at least the standard library right at the beginning, sometimes it almost might as well not exist. Sometimes a new language can be valuable, even if it has no "new" language features, just to get a chance to reboot the standard library it has and push for patterns that older languages are theoretically capable of, but they just don't play well with any of the libraries in the language. Having it as a much-later 3rd party library just isn't good enough.

(In fact if I ever saw a new language start up and that was basically its pitch, I'd be very intrigued; it would show a lot of maturity in the language designer.)


Guy Steele's talk "Growing A Language" gets into this. It's definitely worth looking up if you haven't seen it.

I think we've seen two ways languages have successfully been open to evolution:

Java was specifically designed to allow it to build on the standard library over time in fully backwards-compatible ways. It has required the central governance committee to adopt proposals, because reflection is slow and poorly-optimized, but it is a far more fully-featured language today than it was at inception, without ever needing a reset. By keeping the surface area small & the strong "Everything Is An Object" paradigm in place, it has had remarkable longevity and has avoided the Python versioning pain.

The second is the "sharp knives" languages: Javascript, Ruby, and to a lesser extent C++ (only because DLL hell is very real).

All three of these can be used to write software in any paradigm (including Aspects, if one is a masochist), and so let engineers invest in their own productivity. Languages where the standard libraries are indistinguishable from custom libraries require more skill and collective team alignment to use productively and safely, but also allow for solutions highly-opinionated languages can't support.


Structured concurrency is part of the Swift standard library, and was added at the same time when first-class support for concurrency was added.

TaskGroup in the standard library - https://developer.apple.com/documentation/swift/taskgroup

Explore structured concurrency in Swift (WWDC 21) - https://developer.apple.com/videos/play/wwdc2021/10134/


> The concept was formulated in 2016 by Martin Sústrik (creator of ZeroMQ) with his C library libdill, with goroutines as a starting point.

It's fairly new; the thing (and I think you address it too) is that the pattern did not exist yet when Go was introduced. Go is averse to adding more things to its standard library, or indeed changing its core fundamentals; I think it's better to have one well-defined way of doing things in a language, instead of adding the mental overhead of deciding between one or the other.

And I doubt Go will remove support for their original concurrency, like, ever. I'd love to see forks of Go made with core elements (like concurrency) swapped out though.


Yes, I do know Go predates a solid description of the paradigm, though I hid that in what otherwise looks like a bizarre verb tense in my first sentence. :)

Part of the reason I rhapsodize about new languages off of that observation is precisely that Go can't add it. It almost wouldn't even matter if they tried to put it in the language proper, because by backwards compatibility the old ways would still work, and it would take a very long time to get the entire ecosystem to the new way.


> it's better to have one well-defined way of doing things in a language, instead of adding the mental overhead of deciding between one or the other

That was the hypothesis of Golang, for sure.

I think we've seen that it is true in specific contexts. It seems like it's been particularly valuable to massive teams with less-experienced developers coming from academic computer science backgrounds, frequent turnover and high-coordination projects.

I don't think that property has proven to be valuable more globally. Consistency for consistency's sake is particularly costly when the "consistent" solution has significant downsides in some contexts.

On small teams, having consistency result from actual alignment is incredibly valuable, and a sign of a high-performing team. In those contexts I haven't seen consistency itself, enforced by an outside group and especially without a way to work around it when the "consistent" approach has a reason it sucks for a particular application, be similarly valuable.


> It's fairly new

That specific formulation is new but the concept had been floating around for a while. For instance Rust originally got scoped threads in 2015 (before they had to be removed for being unsound).


Structured concurrency is to concurrency what structured control flow is to program control.

I.e. unstructured concurrency is like GOTOs. Not necessarily wrong, but certainly nerve-wracking.


That's a lovely analogy, and I concur :o)


I'm gonna be that guy. The old man yelling at cloud. I don't get all this high level crap. Coroutines, goroutines, fibers, async/await. It's supposed to make concurrency easy and safe. But I just fail to build a working mental model for it. I get the rough idea, but every time there's an await I wonder where execution might jump next. And then you read stuff like this, how these super high level comfortable languages fuck you over if you're holding them wrong, and even a Go dev has to admit they needed to stare at the code for an hour to get it. I don't understand half the words in that post, but it makes me want to stay away from that language.

I prefer multi threaded programming. Everything is off to the races (pun intended), you need to think long and hard about lifecycle management, who creates a resource, who will clean it up, how do you synchronize, where do you synchronize. It might be hard to get right sometimes. But the concept is simple. The tools you have available are simple. There's no "well everything works fine automagically unless it doesn't because these 10 lines of code".


> I prefer multi threaded programming

Then you've had a sheltered and privileged career, which I'm finding applies to a lot of folks on HN, apparently. So hold on to your cushy job, because if you leave it you might find out the world moved on long ago.

Rust's async/await in particular is instrumental to services that literally saturate their network link and would go higher if the link allowed more bandwidth. Normal multi-threaded code folds under pressure somewhere around the 50,000 requests/sec mark on an average small-ish VPS, while the async/await code is saturating a 10GbE link.

Neither Golang nor Rust made perfect design choices though -- that much is true, sadly. But they are a big improvement over the status quo.

I'll grant you that Golang replaced one class of problems with another -- something I dislike as well. Wish it was stricter but it's good enough for 95% of all projects everywhere, if we have to be brutally honest with ourselves.


Most of the world shouldn't be writing services that saturate their network links. (Actually, no one should: without slack in the system, it becomes incredibly error-prone. But even running at 80% saturation, most people shouldn't be writing those services.)

There are a small number of very large companies where their architecture is based on that sort of thing. But in most cases in most places the right approach in that situation is to ask, "why is this a thing you think you need?" And then to either inline the service or employ horizontal scaling strategies instead of trying to stuff more cycles onto one network card.

I do agree that these language choices are very likely a symptom of the misuse of service architectures. People are treating services like objects (or worse, Singletons) that live on a different machine, relying on the network stack as their interpreter. It makes "modern" software buggy and fragile. Neither of these languages solve that problem, but they do capitalize on it.


> Most of the world shouldn't be writing services that saturate their network links

Very strange hill to die on, and it feels like a side attack on a point you chose to ignore: namely that classic multi-threading is fragile and does not scale well. Doing the async/await thing is objectively better. I get why people who invested so much in a skill are grumpy that it's now not as needed -- I was one of them, in fact, coming from C pthreads and then Java -- but being a curmudgeon and making a torpedo attack on a discussion of multi-threading vs. async/await is something I can't endorse.

> "why is this a thing you think you need?" And then to either inline the service or employ horizontal scaling strategies instead of trying to stuff more cycles onto one network card.

Very often false: a single machine with 1-2 hot backups and a load balancer is the lowest-maintenance setup I've run in all of my 22 years of career. Feel free to disagree, but (1) most programmers will never work at the scale of AWS or Facebook, and (2) horizontal scaling / distribution comes with many, many new failure modes.

KISS is an art that is being forgotten, it seems.


On that note, I'd argue KISS is the multi-threaded approach. I'm maintaining a server software that is multi-threaded with plain old blocking I/O and it easily saturates a 10GBit/s link without breaking a sweat on quad-core CPUs from a decade ago.


I guess it really depends. In all my practice with C pthreads and Java threads, this code always spiraled to unholy messes due to constantly adding to requirements.


Yes, as an example, if you look at something like PostgreSQL or MySQL the code is absolutely intimidating because they use crazy data structures and synchronization techniques to squeeze out another nanosecond under heavy load. But I couldn't tell if async/await would make this any better while still delivering similar performance.


Just because your position is unpopular doesn't make it wrong.

async/await is particularly damaging because it breaks the paradigm of Javascript. That no one can draw a picture explaining it is a huge problem, and it encourages people to write callback hell code by hiding how ugly it is. Unfortunately, whether the code is pretty or ugly the massive webs of nested callbacks are still a source of massive complexity and potential failures.

I constantly see tests in the wild that are passing even though the assertions fail because people don't understand the concurrency model they are using. And at least 60% of the time there was no reason for that concurrency to exist in the first place, except that it's what the code example looks like.


A goroutine is practically just a thread


I only skimmed the article but if I understand correctly, the problem was in a dependency (library), not in the code. That could happen to anyone and is (arguably) not really a fault in the language.

From the other comments here it’s not clear that there is a modern language that doesn’t have this problem.


I believe the OP isn't arguing that the language shouldn't have the problem, but rather that we shouldn't be relying on the language to try to solve this problem.

By hiding the complexity of multi-threaded programming behind these facades, we let people write multi-threaded programs without understanding what the machinery is doing. That in turn leads people to believe they don't need to understand parallelism to write thread-safe programs, and it simply isn't true.

Companies don't want to pay for actual expertise, so we are all pressured to do things we don't understand. Languages can enable us to succeed some of the time, but they can't make distributed systems actually simple. The errors that pop up as a result are a predictable consequence of relying on under-trained workers without the resources we need to do our jobs well.


While the White House (and actual Rust and Go enjoyers) advocates for these memory-safe languages, what is really going on is that the warts these languages have are just not as well known yet. In a few years' time Go will be just as hated as C++, and there'll be some new darling programming language that'll "solve all our problems".

To be fair, I do look forward to logic programming languages getting their time in the sun.


We can't go from X% error rate to 0% error rate, sadly, even though we want to.

But if we can go to (X/2)% error rate then that's still a win.

I wouldn't mind if we replace Golang and Rust in 10-ish years or so. For now they are definitely doing better than C++, especially since the old guard is gradually retiring and the newer generation isn't as good with it.

You seem disappointed that we haven't found the one true universal language yet. I am as well, but no need to trash-talk the current iterative improvements. Apparently that's how we'll get to that ultimate thing.


You have to squint pretty hard to think that I'm "trash talking" anything in my post.



