Yeah, because Linux's memory management is quite poor, and running out of RAM without swap will often mean a hard reboot. Swap definitely helps a lot, even if it doesn't fully solve the problem.
To be honest I don't know why it's such an issue on Linux. Mac and Windows don't have this issue at all. Windows presumably because it doesn't over-commit memory. I'm not sure why Mac is so much better than Linux at memory management.
My eventual solution was to just buy a PC with a ton of RAM (128 GB). Haven't had any hard reboots due to OOM since then!
> To be honest I don't know why it's such an issue on Linux. Mac and Windows don't have this issue at all. Windows presumably because it doesn't over-commit memory
To be fair, my Windows system grinds to a halt (not literally, but it becomes very noticeably less responsive in basically everything) whenever a JetBrains IDE is installing an update (mind you, I only have SSDs, with all the JetBrains stuff on an NVMe drive). I don't know what JetBrains is doing, but it consistently makes itself noticeable when it's updating.
I have had this happen in the past (not very often though), and another saving grace of Windows is that you can press Ctrl-Alt-Del, which somehow seems to pause the rest of the system's activity, and then see a process list and choose which one to kill.
Linux doesn't have anything like that. KDE seems to have a somewhat functional Ctrl-Alt-Del menu - I have been able to access it when the rest of the shell got screwed up (not due to OOM). But inexplicably the only options it offers are Sleep, Restart, Shutdown or Log out!! Where is the "emergency shell", or "process manager", or even "run a program"? Ridiculous.
I think Linux GUIs often have this weird fetish for designing as if nothing will ever go wrong, which is clearly not how the real world works - especially on Linux. I've genuinely heard people claim that most Linux users will never need to use a terminal, for example.
> To be honest I don't know why it's such an issue on Linux.
edit: I wrote all this before realizing you had already answered it yourself, so below is my very elaborate explanation of what you said:
> Windows presumably because it doesn't over-commit memory.
I'm no expert, but from what I've gathered this ultimately boils down to Linux going with fork() for multiprocessing, while Windows focused on threads.
With fork, you clone the process. Since it's a clone, it gets a copy of all the memory of the parent process. To make fork faster and consume less physical memory, Linux went with copy-on-write for the process' memory. This avoids an expensive up-front copy, and also avoids duplicating memory that will only ever be read.
The downside is that Linux has no idea how much of the shared memory the clone or the parent will modify after the fork call. If the clone just does a small job and exits quickly, neither it nor the parent touches many pages, so most of them are never actually copied. The fastest work is the work you never perform, so this is indeed fast.
However, in some cases the clone is long-lived, and a lot of memory might eventually end up getting copied. Linux needs to back those copies with physical memory, so if there isn't enough around, it has to evict something. And while Linux scrambles to perform the copy, the process that triggered it has to wait.
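For the curious, here's a minimal C sketch of what that looks like in practice (the 1 GiB figure is just an arbitrary illustration, and error handling is trimmed):

    /* fork + copy-on-write: the child inherits the parent's 1 GiB
       buffer, but no physical copy happens at fork time; a page is
       only duplicated when one side writes to it. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t size = 1UL << 30;          /* 1 GiB */
        char *buf = malloc(size);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 'p', size);           /* parent touches every page */

        pid_t pid = fork();               /* cheap: pages are shared CoW */
        if (pid == 0) {
            buf[0] = 'c';                 /* child dirties one page; the
                                             kernel copies just that page
                                             and the rest stay shared */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        free(buf);
        return 0;
    }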
AFAIK one can configure Linux for strict accounting (vm.overcommit_memory=2), where the kernel charges the full size of every copy-on-write mapping against its commit limit, as if the worst case happens and everything gets copied. However, in almost all normal cases this grossly overestimates the required memory, so allocations get refused (or you have to provision a lot of swap as backing) when technically it isn't needed.
On Windows this is very different. Instead of spawning a cloned process to do extra work, you spawn a thread, and all threads belonging to a process share the same memory. There is no need to clone memory, no need for the copy-on-write optimization, and thus Windows has much better knowledge of how much free physical memory it actually has to work with.
Of course a thread on Windows can still allocate a huge amount of memory and trigger swapping that way, but Windows will never suddenly find itself also having to scramble to copy shared pages.
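Roughly, the Win32 model looks like this (a minimal sketch; every thread sees the exact same address space, so nothing has to be cloned at creation):

    /* Win32 threads: creation commits a stack but clones nothing;
       all threads read and write the same memory directly. */
    #include <windows.h>
    #include <stdio.h>

    static int shared_counter = 0;        /* visible to every thread */

    DWORD WINAPI worker(LPVOID arg) {
        (void)arg;
        shared_counter++;                 /* writes the parent's memory */
        return 0;
    }

    int main(void) {
        HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        if (!t) { fprintf(stderr, "CreateThread failed\n"); return 1; }
        WaitForSingleObject(t, INFINITE);
        CloseHandle(t);
        printf("counter = %d\n", shared_counter);  /* prints 1 */
        return 0;
    }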
Yeah, that wasn't correct. It will however cause the kernel to refuse memory allocations[1] that could have been allowed, and a lot of programs don't handle that gracefully.
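Concretely, under strict accounting an oversized request just fails up front, and the program actually has to check for it (a sketch; the 64 GiB figure is arbitrary):

    /* Under vm.overcommit_memory=2, malloc can return NULL instead of
       the OOM killer striking later; robust code must handle that. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t huge = (size_t)64 << 30;   /* 64 GiB request */
        void *p = malloc(huge);
        if (p == NULL) {
            /* Reachable under strict accounting; many programs assume
               malloc never fails and crash here instead. */
            fprintf(stderr, "allocation of %zu bytes refused\n", huge);
            return 1;
        }
        free(p);
        return 0;
    }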
My experience is different. Running out of RAM without swap causes the most memory-hungry process to die, whereupon systemd restarts it. Running out of RAM with swap causes thrashing, and then you can't serve any requests or ssh logins; someone has to press the reset button.
If I do something (like try to decompress an archive) and run out of memory, I want that process to be killed.
If my system/config is simply not up to scratch and the normal services are causing thrashing, that needs to be addressed directly; the OOM killer isn't intended to help with that, I don't think.