I do most of my experiments with Jupyter Notebooks and Keras on top of TensorFlow. Keras has a ModelCheckpoint callback (https://keras.io/callbacks/#modelcheckpoint) which saves the model to disk after each epoch, is super easy to implement (1 LOC), and is a good idea even if I weren't training on a preemptible instance. In the event of an unexpected preemption, I can just retransform the data (easy with a Jupyter-organized workflow), load the last-saved model (1 LOC), and resume training.
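For reference, a minimal sketch of what that looks like; the filename, toy model, and dummy data here are just placeholders, not my actual setup:

```python
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint

# Toy model and data just to make the sketch self-contained.
x_train = np.random.rand(256, 10)
y_train = np.random.rand(256, 1)
model = Sequential([Dense(32, activation='relu', input_shape=(10,)), Dense(1)])
model.compile(optimizer='adam', loss='mse')

# The one line that matters: save the full model to disk after every epoch.
checkpoint = ModelCheckpoint('checkpoint.h5')
model.fit(x_train, y_train, epochs=5, callbacks=[checkpoint])

# After a preemption: reload the last-saved model and resume training.
model = load_model('checkpoint.h5')
model.fit(x_train, y_train, epochs=5, callbacks=[checkpoint])
```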
The drawback is that if the epochs are long, you risk losing more progress than you'd like to a preemption.
That's really odd that the Keras API's interval is measured in epochs (which is a different wall-clock interval for every model/dataset/hardware configuration). It's much more common to checkpoint based on a time interval.
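As far as I know Keras doesn't ship a time-based option, but it's only a few lines as a custom callback; the 10-minute interval and filename here are arbitrary:

```python
import time
from keras.callbacks import Callback

class TimedCheckpoint(Callback):
    """Save the model every `interval_sec` seconds of wall-clock time."""
    def __init__(self, filepath, interval_sec=600):
        super(TimedCheckpoint, self).__init__()
        self.filepath = filepath
        self.interval_sec = interval_sec
        self.last_save = time.time()

    def on_batch_end(self, batch, logs=None):
        # Check elapsed time after each batch, not each epoch.
        if time.time() - self.last_save >= self.interval_sec:
            self.model.save(self.filepath)
            self.last_save = time.time()

# Usage: model.fit(x_train, y_train, callbacks=[TimedCheckpoint('timed.h5')])
```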
Oh interesting, I've never seen checkpointing on a time interval. Most Torch examples just dump the model to disk after the epoch finishes.
One reason to use epoch checkpointing is that it ensures all samples of the training data have been seen the same number of times. If your data is large and diverse, with heavy enough augmentation it might not matter very much.