
We deal with Haskell resource leaks the same way you would in C++ or Java.

We have production monitors on every host that show basic metrics like memory, disk, and CPU utilization. On top of that, we added a tracker for the number of suspended Haskell threads (that is, threads that are not blocked on I/O but are also not running).

We found that the machines are usually able to handle requests as soon as they come in, so if the number of Haskell threads goes above 0 for any length of time, the machine is about an hour away from melting down.

We can restart the process without losing any connections, so this leaves us a very comfortable margin of error.

Once we know we have a problem, it's usually pretty simple to run the heap profiler on the process and look at recent commits. We continuously deploy, so there's only about a 10 minute delay before a particular commit is running in front of customers. This makes tracking regressions down really fast.

Even in cases where we can't figure out why a bit of code is leaking, we can almost always identify it and revert it until we understand what's going on.



> We can restart the process without losing any connections

Would you mind expanding on this a bit? I'm not too familiar with Haskell, but I am familiar with various ways of blocking new connections while allowing existing connections to complete, either at the load-balancer level or built into each individual process.

What Haskell stack are you using, and how are graceful restarts accomplished?

Thanks.


One of my coworkers wrote a really cool bit of software to do this. I want him to open source it.

Basically, you can share a single socket amongst many servers. The OS ensures that just one process accepts each connection.

You can therefore have a manager process that owns the socket and passes it on to application processes.

To update, start new processes, then politely tell the old ones to go away.


One really cool thing in Linux is that you can actually pass file descriptors between processes over unix domain sockets.


Windows has supported this for ~14 years too.


Good to know. Does it work for everything that's an fd in Linux? I know you've got to treat sockets and files differently in some cases (or at least did once)...


It works for most kernel handles. Sockets might be a little more normal starting with Win7, but I stopped doing Windows development around then.

Here are the official docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ms72...


Looks like there's a separate function for sockets.

Still, cool stuff there too.


einhorn [1] implements this model and is pretty effective. Used in production at Stripe and other places. (It's written in Ruby, but can run application processes in any language.)

[1] https://github.com/stripe/einhorn


Basically, catch SIGINT, then stop listening to a socket/port. Finish all current requests and exit. The "watcher" parent process will restart the process with the new executable. Repeat for all other processes listening to the socket/port.
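A rough sketch of the "catch SIGINT, finish current requests, exit" part in Haskell, assuming a POSIX system and the `unix` package that ships with GHC. There is no real socket here: `startRequest` is a hypothetical stand-in for a request in progress, and `raiseSignal` stands in for the watcher process sending the signal.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar, tryPutMVar)
import Control.Exception (finally)
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, raiseSignal, sigINT)

-- | Hypothetical stand-in for a request in progress: runs on its own
-- thread and fills the returned MVar when it finishes, even if it throws.
startRequest :: Int -> IO (MVar ())
startRequest micros = do
  done <- newEmptyMVar
  _ <- forkIO (threadDelay micros `finally` putMVar done ())
  pure done

main :: IO ()
main = do
  shutdown <- newEmptyMVar
  -- The handler only records that a shutdown was requested.
  _ <- installHandler sigINT (Catch (void (tryPutMVar shutdown ()))) Nothing
  inFlight <- mapM startRequest [100000, 200000]
  raiseSignal sigINT        -- assumption: normally sent by the watcher
  takeMVar shutdown         -- a real server would stop listening here
  mapM_ takeMVar inFlight   -- wait for every in-flight request to finish
  putStrLn "drained; exiting"
```

The watcher side (starting the new executable and handing it the socket) is left out.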


I can't answer for grandparent, but you should check out https://github.com/notogawa/graceful


Except in Haskell you can build ekg right into your server. http://ocharles.org.uk/blog/posts/2012-12-11-24-day-of-hacka...


"we added a tracker for the number of suspended Haskell threads" - would you mind sharing how you did that? I couldn't see any obvious GHC APIs for it.


It looks like you're right. I misspoke.

We track total threads, working or not. It works great as an indicator because it tends to stay below the number of CPU cores on the server.


I take it you mean OS-level threads then?


We track Haskell threads.

edit: Found the code. :)

We rolled our own implementation. Our WAI application action increments a counter when an HTTP request is received and decrements it again when the request completes.

It doesn't track threads created as a part of HTTP request handling, but we don't allow those actions to forkIO anyway. There hasn't been any demand for it.
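For anyone curious, a minimal sketch of that kind of counter using only `base`. This is not the code from the comment above; `withInFlight` and `handleRequest` are made-up names, and the handler is a stand-in for a real WAI application.

```haskell
module Main where

import Control.Exception (bracket_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- | Run an action while counting it as "in flight".
-- bracket_ guarantees the decrement runs even if the action throws.
withInFlight :: IORef Int -> IO a -> IO a
withInFlight counter =
  bracket_
    (atomicModifyIORef' counter (\n -> (n + 1, ())))
    (atomicModifyIORef' counter (\n -> (n - 1, ())))

-- | Hypothetical request handler: returns the gauge value it
-- observes while it is being served.
handleRequest :: IORef Int -> IO Int
handleRequest counter = withInFlight counter (readIORef counter)

main :: IO ()
main = do
  counter <- newIORef 0
  during <- handleRequest counter
  after <- readIORef counter
  putStrLn ("in flight during request: " ++ show during) -- prints 1
  putStrLn ("in flight after request: " ++ show after)   -- prints 0
```

In a WAI app the same wrapper would presumably go around the application action itself (the exact shape depends on the WAI version), and a monitoring thread would read the `IORef` periodically.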


Ah, right, that makes sense - thanks.


... so, how? Isn't that what he was asking?



