“With machine learning, you spent most of the time copying memory between the CPU and GPU”
- this is a sign that you are most likely doing it wrong. Yes, some operations are inherently bandwidth bound, but the most important ones, such as large matrix multiplies (transformers) and convolutions, are compute bound.
TIP: if you need to emulate a bigger batch with less GPU RAM available, use the gradient accumulation trick. It is super easy to implement in PyTorch, and it is already available as a single flag (accumulate_grad_batches) in PyTorch Lightning. A sketch follows below.
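Roughly, the trick in plain PyTorch looks like this (a minimal sketch; the model, data, and accumulation_steps value are all placeholder assumptions, not anything from the thread):

    import torch

    # Placeholders: a dummy model, loss, optimizer, and random data.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(128, 10).to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    dataset = torch.utils.data.TensorDataset(
        torch.randn(64, 128), torch.randint(0, 10, (64,)))
    loader = torch.utils.data.DataLoader(dataset, batch_size=8)

    accumulation_steps = 4  # effective batch size = 8 * 4 = 32

    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.to(device), targets.to(device)
        loss = criterion(model(inputs), targets)
        # Scale the loss so the summed gradients average over the
        # larger effective batch rather than over each mini-batch.
        (loss / accumulation_steps).backward()
        # Only step the optimizer once every accumulation_steps batches.
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

In PyTorch Lightning the equivalent is the single flag mentioned above: Trainer(accumulate_grad_batches=4).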
Your gradient accumulation trick involves multiple CPU-to-GPU transfers, which is precisely what the parent is trying to avoid by fitting a larger batch in GPU memory.