On the NVIDIA GPT- 2 implementation: >What would be the largest model one could ...

gwern · on Aug 20, 2019

They haven't released such models, though, and I don't know if it would be drop-in compatible with the OA GPT-2-774M checkpoint (they're training their own GPT-2s using their own webtext corpus).

JonathanFly · on Aug 20, 2019

I haven't look into at all myself, but he also said:

>We do provide training code that should work out of the box for gpt2 117M/345M

https://twitter.com/TheRealRPuri/status/1161319745259393024

p1esk · on Aug 20, 2019

It would take forever (or $$$) to train even 117M model from scratch.

JonathanFly · on Aug 20, 2019

I read that meaning you can start with the actual pre-trained GPT-2 models but I never got an answer when I specifically asked if that was the case.