
If you're "learning about ML" there is no point in buying anything. Just get the cloud compute instead, and for home use and testing literally anything will do. I have friends who work with ML professionally and even they say it's just hard to justify running any computations at home once you factor in the electricity and hardware cost - GCP compute just beats the cost, easily.


What's the (pre-Ampere) GCP price for a V100? On AWS it was $3/hr, so at 100% use and market prices a Titan V would pay for itself vs the cloud inside a month. Is GCP significantly cheaper? Or are we talking about pricing at ~0% utilization?


100% utilization is a pretty huge assumption.

And if you ARE actually running it that hard, you'd better budget for fairly frequent replacement cards.


At 50% utilization it beats the cloud in 2 months. 10% utilization, it beats the cloud within a year. If you're dabbling, definitely go with the cloud, but if you're turning around experiments on a regular basis, buying gets attractive quickly.
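The break-even arithmetic above can be sketched in a few lines. The $3,000 card price (roughly the Titan V's launch price) and $3/hr cloud rate are assumptions taken from the thread; electricity is ignored here since it's small next to $3/hr:

```python
# Rough break-even: days of ownership until cumulative cloud rental
# would have exceeded the card's purchase price.
# Assumes ~$3,000 card and ~$3/hr cloud V100; ignores electricity.

def break_even_days(card_cost, cloud_rate_per_hr, utilization):
    """Days until renting at the given utilization costs more than buying."""
    return card_cost / (cloud_rate_per_hr * utilization * 24)

if __name__ == "__main__":
    for util in (1.0, 0.5, 0.1):
        days = break_even_days(3000, 3.0, util)
        print(f"{util:>4.0%} utilization: {days:6.0f} days to break even")
```

At 100% utilization this comes out to about six weeks; halve the utilization and the break-even point doubles.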

And no, cards don't just keel over in a few months at 100%. Crypto miners ran that experiment. A typical card has years of 100% in it.


Only if your power is free (it isn't) and the machine the card is in is free (it isn't) and said machine produces no heat or noise (it does).


Power doesn't cost nearly enough for TCO to get anywhere near even the "preemptible price" of a V100 (and probably the A100 when it's ready) over a period of half a year. And now that the 3090 has 24GB, which is needed for larger models, a solution with a couple of consumer cards is even more competitive for experimentation. You can also sell your cards in a year or two and recover some of the costs. (All that, of course, if Nvidia's gimping of consumer cards doesn't significantly affect your code.)
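As a sanity check on the electricity point, here's a back-of-envelope ownership-vs-cloud comparison. Every number is an assumption for illustration, not a quote: ~500 W whole-system draw under load, $0.15/kWh electricity, a ~$1,500 3090-class card, and $0.74/hr for a preemptible V100:

```python
# Back-of-envelope TCO: owning a card vs renting, electricity included.
# All inputs are illustrative assumptions, not quoted prices.

def ownership_cost(card_cost, system_watts, kwh_price, hours):
    """Card price plus electricity for the given hours of load."""
    return card_cost + (system_watts / 1000) * kwh_price * hours

def cloud_cost(rate_per_hr, hours):
    return rate_per_hr * hours

if __name__ == "__main__":
    hours = 6 * 30 * 24 * 0.5  # half a year at 50% utilization = 2160 hours
    print(f"own:   ${ownership_cost(1500, 500, 0.15, hours):,.0f}")
    print(f"cloud: ${cloud_cost(0.74, hours):,.0f}")
```

Under these assumptions electricity adds about 10% to the card's cost over half a year, i.e. it's a rounding error next to the hardware itself, and resale value isn't even counted.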


So for the people I know who do it professionally, the answer is really simple: when they run calculations for clients, they can add GCP compute time to the invoice. You can't bill a client for your own electricity usage. I mean, you can factor it into the price of your service, but that brings a whole pile of other issues with it.


Cloud if:

- Someone else is paying

- You expect to dabble

- You need burst capability

Buy if:

- Cost sensitive & capable


I rounded down AWS's price by $.06/hr, more than the price of electricity and cooling around here.


Wait, what? I used to mine crypto on GPU back when that was a profitable thing to do, which involves leaving the card maxed out for long periods of time... no real damage.

Are modern cards really so fragile you can expect them to die off under heavy use even if properly cooled and not overclocked?


No. Obviously people who use them for gaming can play for thousands of hours at 100% utilization and they will last for many years on average.


No, they are not that fragile.


Agreed. If you’re just learning or building hobby stuff, you can use Colab, Paperspace, or any number of other services for free or very cheaply.


I want a gaming computer that won't limit my future ML learning. Are there any suggestions for that use case?


There's no such thing as long as you buy an actual mid to high tier GPU. Even an ancient GTX1070 would be more than enough - and for sufficiently large datasets even an RTX3090 will take hours to process whatever you're crunching.

Just buy a PC that you like for gaming (with an Nvidia GPU) and don't worry about ML yet - it's incredibly unlikely that you can pick something that would limit you in any way. Small datasets will run on anything; large datasets will take hours to process no matter what you run them on. It's not a "limit".


Some off-the-shelf gaming PCs are not very Linux friendly though, so they should watch out for that, especially the laptop varieties. Getting a lot of the ML stuff working locally in Windows is a nightmare.


You're probably better off building your own machine and dual booting Windows and Linux. Here's a good guide for ML requirements, only a little out of date (published before the release of the 3080):

http://timdettmers.com/2018/12/16/deep-learning-hardware-gui...


just make sure it's NVidia. whatever graphics card you want -- all their consumer cards will work great for deep learning.

make sure your motherboard and processor support whatever the newest version of PCIe is -- a major factor with deep learning is bandwidth moving data on/off the GPU.
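To make the bandwidth point concrete, a quick back-of-envelope using nominal x16 link rates (~16 GB/s for PCIe 3.0, ~32 GB/s for PCIe 4.0; real-world throughput is somewhat lower, and the batch shape is just an example):

```python
# How long does one training batch take to cross the PCIe bus?
# Nominal x16 bandwidths used here; actual throughput is lower.

def transfer_ms(num_bytes, gb_per_s):
    """Milliseconds to move num_bytes at the given GB/s link rate."""
    return num_bytes / (gb_per_s * 1e9) * 1e3

if __name__ == "__main__":
    # 256 images, 224x224x3, float32 -> ~154 MB per batch
    batch_bytes = 256 * 224 * 224 * 3 * 4
    print(f"PCIe 3.0: {transfer_ms(batch_bytes, 16):.1f} ms/batch")
    print(f"PCIe 4.0: {transfer_ms(batch_bytes, 32):.1f} ms/batch")
```

A few milliseconds per batch sounds small, but it's paid on every step, which is why the PCIe generation (and data-loading pipeline) can matter as much as raw GPU speed.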

AMD GPUs can theoretically be used for machine learning, but right now software support is lacking -- you will spend more time configuring and installing than learning. (AMD CPUs are fine though.)

it doesn't really matter that much though -- any gaming PC with a new-ish NVidia card can be used to do quite a bit of interesting ML.


This is also a reason why it might make sense to hold off unless you have some kind of time-sensitive project.

Nvidia came to dominate the market at a time when AMD wasn't making particularly competitive GPUs, but that isn't really the case anymore. For anything not so expensive that nobody is really going to buy it anyway, the current and expected (in less than a month) AMD GPUs are competitive on performance.

The result is that a lot of large customers, who see value in not being locked into a single supplier, are going to be pushing for frameworks that work across multiple vendors. And then you could plausibly be wasting your time learning Nvidia-specific technology which is about to become disfavored. So you might want to wait and see.


I tried to go red twice. Red team has been winning at perf/$ for a decade! I thought I did my homework and established compatibility and suitability for the purposes I cared about. Unfortunately, both times I eventually ran into unanticipated incompatibilities I couldn't work around. I wound up paying the green tax anyway and also the price spread + ebay fees. Oof.

Twice bitten... once shy? In any case, I'm going to let someone else be the guinea pig this time.


i think most people would just use TF/PyTorch and ignore the specific technology on the backend. not much GPU specific stuff to learn -- very, very few deep learning people write their own CUDA code.

so the question is just -- when will it be very simple to install these packages for AMD GPUs, with enough mathematical operations implemented and optimized to let you do the things you want to do.

right now things sort of work, but it's definitely in a bleeding edge early adopter state. it's seemed like AMD is on the cusp of catching up for a couple years now, but it's taken longer than I expected.


> very few deep learning people write their own CUDA code.

True, but even once TF/PyTorch support AMD well it's highly possible that an unanticipated CUDA dependency will pop up in one's computational journey. NVidia subsidized CUDA seminars for a decade and now it's all over the place, both in the flagship frameworks and in the nooks and crannies.


That's terrible advice. While Navi2 might finally be competitive(?), not being able to run most models due to CUDA/ROCm differences would seriously limit one's ML work.


Buy whatever makes you feel happy. I agree with gambiting that anything you choose won't limit you.



