They are not completely independent. It's a good assumption though. If a model encounters something out of distribution then all five of the generations will fail. If the model knows and went in a wrong direction (due to lack of reliability), within five generations, it can be corrected. You need evals, runtime verifiers as basic harness for AI systems.