If you start with an impure lazy programming language, you will very quickly discover that you may as well make it pure and handle side effects through an IO type. Specifically, it would be far too hard to perform effects in the correct order without an IO monad. I wrote up an explanation:
Side effects like IO are very difficult to handle in a pure language. I believe Simon Peyton Jones' argument was that, if the language is strict, there is too much temptation to say "screw it" and abandon purity.
Those were all designed after monadic IO was introduced in Haskell. The ability to mark IO operations in types (and the do notation) was a game-changer.
The same way a pure and lazy programming language does: with the IO monad. The abstraction works under both lazy and eager evaluation because internally it just passes a token from function to function.
When you have something like `doThis >> doThat` (two actions in sequence), the first action produces a token (RealWorld#) that must be passed into the second. Even though the order in which the two action expressions are evaluated is left open in a lazy language (it is fixed in a strict one), the token being passed between them imposes a definite order on the effects.
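For illustration, here is a small self-contained sketch of that shape; `MyIO` and `thenIO` are local stand-ins for the (simplified) definitions GHC uses internally:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (State#, RealWorld)

-- An IO action is a function that consumes the current RealWorld token and
-- returns a new token together with its result.
newtype MyIO a = MyIO (State# RealWorld -> (# State# RealWorld, a #))

-- Sequencing threads the token: the second action needs the token produced
-- by the first, and that data dependency is what fixes the order.
thenIO :: MyIO a -> MyIO b -> MyIO b
thenIO (MyIO m) (MyIO k) = MyIO (\s0 ->
  case m s0 of
    (# s1, _ #) -> k s1)
```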
In a lazy language you have real difficulty ensuring the order of operations, for example printing a prompt and then reading the answer. Before someone (Moggi?) realized that a category-theoretic monad with specific rules made IO sequencing trivial, the state of the art (IIRC) was world-passing. (With world-passing, you have a value that represents the state of everything outside the program; any IO operation consumes the current world value and returns a new one. Only the most recent value actually works.)
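A toy sketch of what that style looks like; every name here is made up for illustration (Clean's uniqueness-typed *World is the best-known real example of the approach):

```haskell
data World = World   -- stands in for "the state of everything outside the program"

readLineW :: World -> (String, World)
readLineW w = ("dummy input", w)   -- a real implementation would perform the read here

printLineW :: String -> World -> World
printLineW _ w = w                 -- a real implementation would perform the write here

-- The prompt must be printed before the read: each operation consumes the
-- newest world value and returns the next one, so using a stale world is a bug.
greet :: World -> World
greet w0 =
  let w1         = printLineW "What is your name?" w0
      (name, w2) = readLineW w1
  in  printLineW ("Hello, " ++ name) w2
```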
I don't know if it is still the case, but the last time I poked around, GHC's IO monad was implemented in terms of that older world-passing code.
In a strict language, the answer is obvious. In a non-strict language, line 4 has a data dependency on line 1 and so always executes after it, and likewise line 3 depends on line 2. But how the two groups get interleaved is completely unpredictable, so if you're really committed to non-strict evaluation, you need a way to introduce data dependencies between all four lines so that the evaluation order is pinned down. Once you achieve that, you have a bunch of ugly-looking machinery peppered across your whole codebase.
Monadic IO (and do-notation in particular) gives us a way to write this sort of code without feeling the urge to gouge our eyes out.
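For instance, assuming the four lines in question were something like two reads followed by two prints in the opposite order, the do-block below keeps them in a fixed sequence, and its desugaring shows that each `>>=`/`>>` is exactly the threading machinery described above, just hidden behind the notation:

```haskell
main :: IO ()
main = do
  a <- getLine       -- "line 1"
  b <- getLine       -- "line 2"
  putStrLn b         -- "line 3": depends only on line 2
  putStrLn a         -- "line 4": depends only on line 1

-- The same program desugared: every action is chained onto the previous one.
mainDesugared :: IO ()
mainDesugared =
  getLine >>= \a ->
  getLine >>= \b ->
  putStrLn b >>
  putStrLn a
```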
Laziness requires immutability to work well, and that means you need to represent mutations such as IO operations inside an immutable wrapper like the IO monad.
You've got it backwards: laziness requires mutation to work well. Laziness is just a thunk (a closure) that gets overwritten by a value at a later time. It is impossible to implement laziness if you cannot mutate memory: if the unevaluated closure and the evaluated value cannot live at the same address, you would have to rewrite every reference to the closure into a reference to the value, and you cannot do that without mutating memory either!
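A toy model of that idea (not GHC's actual machinery; all names here are invented): a thunk is a mutable cell that starts out holding an unevaluated computation and is overwritten with the value the first time it is forced.

```haskell
import Data.IORef

data Cell a = Unevaluated (IO a) | Evaluated a

newtype Thunk a = Thunk (IORef (Cell a))

delay :: IO a -> IO (Thunk a)
delay act = Thunk <$> newIORef (Unevaluated act)

force :: Thunk a -> IO a
force (Thunk ref) = do
  cell <- readIORef ref
  case cell of
    Evaluated v     -> return v
    Unevaluated act -> do
      v <- act
      writeIORef ref (Evaluated v)   -- the mutation: the thunk is overwritten by its value
      return v
```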
People often assume Haskell's garbage collector must somehow be different because of immutability. But thanks to laziness it works much like Java's garbage collection, precisely because laziness requires mutation.
I don't quite understand. You don't update any references when thunks are evaluated, GHC's runtime does. The runtime shields you from any mutation, doesn't it? (Unless you explicitly ask it to let you mutate a value in memory, of course. But that goes beyond the normal evaluation of thunks.)
Oh yes. I just realized we are talking past each other. I'm talking about the implementation details of laziness. The mutation is internal and usually invisible. The only knob I know of to adjust it is -feager-blackholing, which controls whether evaluating a thunk results in two mutations or one (except when switching threads, where it's always two).