Hacker News

Insane, but also expected. When I tried it out after it was posted here and saw it took multiple minutes to warm up, I knew it was probably expensive.

>And it's currently costing 30-40 cents per download.

Is there no way to have a single hosted instance rather than downloading again for each user?



> Is there no way to have a single hosted instance rather than downloading again for each user?

This might make the problem worse; then they'd have to do the processing server side, rather than offloading it to the client. I dunno whether that would be more or less expensive than the initial download, but the torrent they put up seems cheaper either way.


Weren’t they already doing the processing server side? If it were client side then it wouldn’t be costing so much to run git clone every time, as the download would be from GitHub to the user’s computer. It would be free, in fact.

My impression of the situation is that every user who tried to play would result in a new instance to spin up on Google’s cloud services and then begin downloading a fresh copy of the repo from GitHub. This is what cost so much in bandwidth.


A client side request to GitHub would require them to serve up a relevant CORS header, but I do think you're right about me misunderstanding where execution is taking place. I'm unfamiliar with Jupyter Notebooks, and assumed "downloading" meant "to the client". I, too, am now confused about why this is set up like it is. Probably some constraint of Jupyter Notebooks that I'm unaware of.


If you install and run Jupyter on your local machine, it spins up a web server on localhost that you then connect to in your browser. All of the Python code runs on the server, and only the results are sent to the client to be displayed in the browser.
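A quick way to see that split, assuming a hosted notebook: a cell executes on whatever machine the kernel runs on, so asking for the hostname reports the server/VM, not the computer running the browser.

```python
import socket

# This runs wherever the Jupyter kernel lives (the server/VM).
# In a hosted notebook it prints the remote machine's hostname,
# not the hostname of the laptop running the browser.
print(socket.gethostname())
```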


> Weren’t they already doing the processing server side?

Yes, but they were using Google Colab because Colab gives each user their own dedicated Nvidia K80 for free. Google spins up a new instance to back each user's Colab session, but on Google's dime rather than the researcher's or the user's. The downside, though, is paying for the data egress, which can be avoided if users download to Colab from somewhere else, or download from somewhere else to their own machines with a GPU that has 12 GB of onboard memory.
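For scale, the quoted 30-40 cents per download implies a payload of a few GB per user. The $0.12/GB rate below is an assumed typical internet-egress price, not a figure from the thread:

```python
# Back-of-envelope egress math. The $0.12/GB rate is an assumption
# (typical cloud internet-egress pricing), not a number from the thread.
egress_price_per_gb = 0.12
for cost in (0.30, 0.40):  # quoted 30-40 cents per download
    gb = cost / egress_price_per_gb
    print(f"${cost:.2f} at ${egress_price_per_gb}/GB -> ~{gb:.1f} GB per user")
```

That lands around 2.5-3.3 GB per download, which is plausible for model weights plus a repo clone.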


I'm pretty sure it's not downloading to the client, since the dataset is apparently pretty massive. It looks like it's downloading to a VM or something, and creating a new instance of the service for every user.



