AI Dungeon 2 costing over $10k/day to run on GCS/Colab (twitter.com/nickwalton00)
415 points by _ps6d on Dec 9, 2019 | 202 comments


Previous HN thread since I missed it: https://news.ycombinator.com/item?id=21717022


Cloud bandwidth costs are a rip-off. For this particular situation though, make sure to replicate your files across multiple buckets, one in each GCP zone, otherwise you incur the cross-zone transfer costs.

We do something similar to serve Julia downloads, because it turns out that most downloads are from people running on the cloud (so we basically replicate our binaries to every cloud provider and then to every region for that cloud provider). Everything else then goes through Fastly for us (we also use Fastly to serve custom redirects if your request comes from one of the aforementioned cloud providers). That works pretty well. You do have to monitor it though, to make sure that load doesn't suddenly shift to a cloud provider you didn't account for. For example, after GitHub Actions became widely used, we suddenly started seeing TB/day traffic from Azure, which we hadn't deployed any caches to, so our bandwidth utilization on Fastly shot through the roof. (Side note: shout-out to Fastly for hosting our binaries for free!)

Without the cloud provider caching setup, we'd probably be at similar $/day costs, but this way it's basically free (even without the free Fastly service, we'd only be at ~$1000/month or so).
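The replication step itself can be pretty dumb. A rough sketch of the per-region copy in Python with the google-cloud-storage client (bucket names are made up, and the real setup also pushes to other providers' object stores):

  # replicate_release.py -- illustrative sketch, not the actual Julia release tooling
  from google.cloud import storage

  # One bucket per region we expect heavy in-cloud download traffic from (made-up names).
  REGION_BUCKETS = [
      "julia-releases-us-east1",
      "julia-releases-europe-west1",
      "julia-releases-asia-east1",
  ]

  def replicate(source_bucket_name: str, blob_name: str) -> None:
      client = storage.Client()
      source_bucket = client.bucket(source_bucket_name)
      blob = source_bucket.blob(blob_name)
      for target_name in REGION_BUCKETS:
          # Server-side copy: you pay any inter-region transfer once per release,
          # not once per download.
          source_bucket.copy_blob(blob, client.bucket(target_name), blob_name)

  replicate("julia-releases-master", "julia-1.3.0-linux-x86_64.tar.gz")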


Can you describe a little more how you direct downloads to an in-zone copy?


I imagine that, using the known IP ranges (https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.... & https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges....), you could redirect to local resources within their region.

I'd probably use Lambda/Node.js with a ~20 minute in-memory range cache that should stay under 128MB of memory per instance. Perhaps I'd store all the range starts and ends, each as two 64-bit ints, in a database of your choice for persistence, indexing, and comparison. Finally, some code to convert IPv4 and IPv6 addresses into 128-bit (2x64-bit int) IPv6 integer space and back.
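To make the range-matching half concrete, here's a rough sketch in Python (the parent suggests Node.js on Lambda; the CIDRs below are placeholders, not real provider ranges):

  # ip_range_index.py -- sketch of mapping a client IP to a cloud provider region
  import bisect
  import ipaddress

  _V4_OFFSET = 0xffff << 32  # IPv4-mapped IPv6 space (::ffff:a.b.c.d)

  def to_v6_int(addr: str) -> int:
      """Map an IPv4 or IPv6 address into 128-bit IPv6 integer space."""
      ip = ipaddress.ip_address(addr)
      return int(ip) + _V4_OFFSET if ip.version == 4 else int(ip)

  class RangeIndex:
      """Sorted, non-overlapping (start, end, region) ranges; lookup via bisect."""

      def __init__(self, ranges):
          # ranges: iterable of (cidr, region), e.g. parsed from a provider's published JSON
          rows = []
          for cidr, region in ranges:
              net = ipaddress.ip_network(cidr)
              lo, hi = int(net.network_address), int(net.broadcast_address)
              if net.version == 4:
                  lo, hi = lo + _V4_OFFSET, hi + _V4_OFFSET
              rows.append((lo, hi, region))
          self.rows = sorted(rows)
          self.starts = [r[0] for r in self.rows]

      def lookup(self, addr: str):
          x = to_v6_int(addr)
          i = bisect.bisect_right(self.starts, x) - 1
          if i >= 0 and self.rows[i][0] <= x <= self.rows[i][1]:
              return self.rows[i][2]
          return None  # not a known cloud range -> fall back to the normal CDN

  # Placeholder documentation ranges, not real provider data:
  idx = RangeIndex([("198.51.100.0/24", "us-east-1"), ("2001:db8::/32", "eu-west-1")])
  print(idx.lookup("198.51.100.7"))  # -> us-east-1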

A second service could listen for IP range updates avoiding any bandwidth fiasco if a new range opens up with a sudden influx of traffic.


I think they were looking for a solution where you replicate the data natively and direct to that copy for a certain region, rather than cache and serve


The cloud providers all have public lists of their IP ranges, so there's a Fastly ACL for each provider/zone pair, and for matching IPs some custom VCL serves an HTTP redirect to the URL for the appropriate cloud provider's storage service (S3, GCS, etc.). On the VMs, these URLs resolve to the appropriate internal IPs, so the data comes from the storage service without charge.


OVH, BuyVM, Scaleway, etc. provide cheap bandwidth: serverhunter.com

Disclaimer: serverhunter.com sponsored my game


This is good advice for only a particularly narrow use case. If your clients are likely to be running in the same cloud as you, I think most cloud providers will charge you local data transfer fees (i.e. nothing) rather than egress fees, even if the request is made over the public IP address.

If your users aren't mostly in the cloud with you, then this strategy doesn't help at all.


> If your users aren't mostly in the cloud with you, then this strategy doesn't help at all.

Sure, but in the situation in the linked tweet, 100% of the users were in the same cloud, just in the wrong region. For Julia, the distribution is much more mixed, which is why we have the Fastly fallback, which works as a traditional CDN. Still, caching locally in each of the clouds is useful, as people often download a fresh tarball of nightly Julia when they run CI for their packages, so the load from the clouds is quite high.


Not blaming them, but this is still a nice example to illustrate the myth that 'the cloud' will let you operate without a sysadmin (someone who knows about setting up and managing infrastructure). Yes, there might not be servers to rack or electrical power to provision, but the number and granularity of the parameters and decisions about the virtual fabric, which have to be made and then continuously monitored for changes and opportunities that can have an enormous impact on performance and cost, is orders of magnitude higher.


Also, at least from what I've witnessed, the kind of sysadmin activities that the cloud is often said to eliminate were already becoming a vanishingly small part of the day-to-day work of being a sysadmin.

It was already going away, and cloud was not the reason. But with the whole 'cloud = ditch sysadmins' idea, and the weird way that 'devops' was interpreted to mean a similar thing, a whole lot of baby got thrown out with the bathwater.

Virtualization, the maturity of modern networking, compute and storage systems, and the sheer capacity of each atomic piece of hardware mean that kind of stuff had already become a once-every-couple-of-years kind of task for most sysadmins, except the few still stuck in some kind of backwards, anachronistic environment.


> Also, at least from what I've witnessed, the kind of sysadmin activities that the cloud is often said to eliminate were already becoming a vanishingly small part of the day-to-day work of being a sysadmin.

This. Most of my work is not technical and consists of technical-ish meetings, and budget arguments.

Automation is part of that, but also that a lot of stuff is fairly robust and works (mostly) after initial config. Three weeks of setup, plus some shakedown (and a few late nights when the new mission-critical system stutters), but then it's mostly autopilot, or scheduled maintenance.


Also not blaming him/them, but isn't this an indicator of a lack of cost control at research facilities? Surely someone should have costed this out before making it public?

Here in the UK, running at $10k/day would have ruined a department's budget for an entire year!


Costing for cloud solutions is often not trivial, and a single overlooked parameter can do you in. I at least blame Google partially for having quotas and spending limits set to unlimited by default.


Don't forget Amazon's incomprehensible billing practices for AWS. It seems to be designed to be impossible to figure out what you're going to spend ahead of time.


Amazon really seems to be doing this on purpose. Even tasks as simple as starting an EC2 instance require referencing multiple pages to find out how much this instance will cost you, while some other providers just show the predicted costs/month in the instance creation dialog.


In addition to how difficult costing cloud services is, what if you get it perfect but underestimate the success of your product?

Great: "it would have been good, except we were too popular." Do you then cut off your popular product while it's gaining traction, and thus almost guarantee you kill it in its infancy?


The non cloud version is your website goes offline for a day due to high traffic. Depending on the website that may or may not be preferable to variable costs.


The Twitter feed says the $10k+ is for serving about 60k visitors a day. This isn't exactly high traffic. A dedicated server from Hetzner with unlimited traffic and a 1 Gbit/s uplink, a 16-core Threadripper, a 1 TB SSD and 128 GB of RAM costs €140/month and would be absolute overkill for 60k visitors. It doesn't have GPUs, though.


They are seeing 60k downloads of 6GB models per day, which is about 33Gbps of bandwidth (assuming no burstiness in when people visit, which is a poor assumption).
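Back-of-the-envelope for that figure:

  60,000 downloads/day * 6 GB * 8 bits/byte ≈ 2.9e15 bits/day
  2.9e15 bits / 86,400 s/day ≈ 33 Gbps sustained (higher at peak)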

That is starting to get out of the range of what is easily available. 10Gbps circuits are commodities (I had one at my desk at my last job), but 100Gbps circuits are still pretty pricey. And, it's not necessarily trivial to get that kind of throughput on file serving out of the box; this bandwidth is something like CPU <-> video card, not disk <-> cpu, or cpu <-> network. Some tweaking is for sure going to be necessary if you are self-hosting this, and now you're tweaking network parameters and writing a custom file server instead of writing your game.

The cloud here is making possible something that otherwise wouldn't have been possible, which is pretty cool. Being able to go from zero infrastructure to 30Gbps of file serving without lifting a finger is somewhat impressive... but with those fast iteration times comes the entity that did all the work wanting its cut. It seems fair to me, though perhaps not economically viable. Such is life.


> 60k downloads of 6GB models per day

Wait... you mean the actual computation is running client-side in the browser? I didn't even open this "game", but I assumed the high cost was because there is a separate GPT-2 running on a GPU for each and every user.


No, the computation is not running on the client in the browser. This is the traffic to transfer the model from GCS to Google Colab.

This is what makes the price so surprising - you are copying data from one Google Service to another, but it's billed as egress.


But if you were hosting it yourself you wouldn't transfer 6G of data around per user. You'd be a bit more intelligent about it.


They send it to Google Colab. It allows you to run the model on a powerful server for free. 6 GB of download, even at this crazy bandwidth cost, would still be cheap compared to providing that kind of beast of a server themselves to 60k people each day. I remember when I tried it, I saw that it used 10 GB of memory; that was crazy!


You'd save on transferring the bytes around, but now you would have to self-host jupyter and the GPUs it uses. That is going to be even more expensive than IP transit because now you have to have 60,000 12GB GPUs in your datacenter.

Like I said before, this is one of those things that wouldn't exist without the Cloud. If you run things on your user's computers, you have to send them a lot of bits. If you run things on your own computers, you're spared that bandwidth, but now have to have enough "computers" to satisfy your users. It's simply something that's not super cheap to run these days.

I will admit that it is surprising that Google <-> Google traffic is billed at the normal egress rates, but the reasoning does make sense -- a 30Gbps flow is nothing to sneeze at. That is using some tangible resources.


No, the models are running in Colab - but in the end-user's account - so each 'run' costs the lab 6GB of internet egress when the model is downloaded from GCS to the Colab VM.


It absolutely is high traffic if each and every one of them is downloading your NN model


I see the model size is 6G - a bit larger than I thought :) 100gbit/s upstream would cost an additional 1k€/month.

The larger problem is that people didn't actually download the model, but apparently got custom server instances that got a copy of the model plus a gpu to run them on.


The server instances are free, though; Google only charged them for the data transfer to those instances.


About a year and a half ago, I rented a Hetzner server with a 1080 GPU, a 16-core i7, etc. It was only about $105/month. I had a job queue for TensorFlow/Keras experiments and set it to mine ether when there were no ML jobs. I did this until I bought a System76 laptop with a 1070 GPU.


For similar price point (€105.62), from e.g. Azure you can get their "E2 v3" VM, with 2 vCPUs and 16 GB RAM, plus 50GB temporary storage. That's...quite a difference.


This isn't 2005. It really doesn't need to happen anymore.


Says the person not paying the $10k/day unexpected bill...


Wait, wait, this is a university developing something that needs to be downloaded, they chose a commercial solution for that, and it costs them $10k per day? They don't have their own file-hosting solutions? At a university with an AI dept?

With that money ("for a couple of days"), you could buy your own server and do it yourself, for a fraction of the price. And don't tell me you can't find people who'd be capable of doing that, if you've got AI development going on.

What am I missing?


Colab is effectively JupyterLab/JupyterHub with Google's own add-ins (like integration with Google Drive). JupyterHub is a huge PITA to manage, and Colab also offers a degree of free compute and limited access to a dated, but still free K80 GPU.

From the tweet thread, it seems like there was some misunderstanding over where the files are being stored and being executed. This is a pretty common issue with Google Drive. I.e. if someone shares a file with me, and I copy it to a folder, it's just a pointer to the original file. Only after clicking "Add to My Drive" does it count against my storage allocation, and only then is it a distinct copy.

My guess is that the researchers expected each user to be able to run the game in their own personal free Colab environment, not be running it against the university's compute and storage budget.


Google Colab now comes with free T4 GPUs.


Occasionally. You randomly get a K80 or T4. It's annoying.


As an "SRE engineer", I don't agree. Certainly in a university/R&D environment, you cannot compete with what 'the cloud' is able to offer within the reach of a few mouse clicks. It would be absolutely moronic to set up an entire infra yourself with GPUs, ... specifically tailored to one research project, only to throw it away after a few months or see the scope change every two weeks.

One of the reasons such AI tech is able to move this fast is that the cloud has become a commodity. People start to expect this kind of flexibility. It's impossible to self-host every possible thing some guy might want to try out.

Once you have a stable application, sure, you might want to invest in your own hosting for it, but even then it's an expensive up-front investment for something that might not go anywhere. I know a few places where they have a bunch of maxed-out Nvidia DGX systems gathering dust for exactly this reason.


At 10k per day it is absolutely worth it to hire someone to take care of a bunch of cheap servers.


That's what, $2.5MM/yr?


Servers with GPU's are not going to be cheap :)


Servers are a CapEx purchase, and can be depreciated or re-purposed.

10k/day makes sense for a short project, but if this is going to be an on-going thing -- like 2-3 years -- then absolutely buy that hardware and have an internal team run it.


It doesn't seem to be as simple as downloading a binary.

We (those who read HN) could probably download the code and run it at home, but I think the authors want non-technical users to play. To do that, they need an accessible Python runtime, so they're hosting the game in a Colab notebook. The download in question is referring to downloading the weights of the neural net into the VM running the notebook.

If only redistributing Python apps wasn't so difficult.


The game's GitHub page[1] states that you would need a "beefy" GPU (~12 GB) and CUDA to play the game locally.

I think that's why the author was serving the game through Colab since the majority of users probably don't have a 12GB GPU.

[1]https://github.com/AIDungeon/AIDungeon/


Ooof, yes, now I see where the $10K/day is coming from...

As I have demonstrated, I've really not much of a clue when it comes to AI, but do users really need 12GB of GPU RAM, 100% of the time? Maybe it's possible to use one GPU for multiple users?


Google Colab gives each user a dedicated Nvidia Tesla K80 GPU for 12 hours for free, which is super cool and presumably why the project is on Colab. But as each user spins up their own Colab instance, it pulls down the 6GB of GPT-2 model weights, incurring 30-40 cents of data egress charges against the Cloud Storage bucket the data is stored in.

60k users yesterday * 6GB each -> 360TB of data egress!

Normally, a scenario like this wouldn't involve bandwidth costs because GCP -> GCP same-region bandwidth is free, but Colab is technically not part of GCP, so the bandwidth charge is being assessed as egress to the public internet, which is pricy for that much data. Though it's probably still a lot cheaper than paying for the GPU-hours for that many users.


This is sort of a killer example for the layperson of what ML can do, so hopefully Google will recognize this and comp most/all of the data egress since it's a drop in the bucket for them, but every person that uses it can still go "Wow, Google's services allow for some amazing stuff."


Oh, it does. NOT.FOR.FREE, though; that's the entire reason it exists at all.

In other words, yes, this is an amazing amount of raw power - with a corresponding price tag.


But you could set up your own file-serving node, and then data transfer to end-user Colab instances would be free, right? ("Free" — costing only as much as your CDN costs for 360TB/day, which is still quite pricey, I think, but not as much as $10K/day. I.e. Google wouldn't charge you for data transfer here.)


Yup. I think I saw yesterday that they were looking to move the model to BitTorrent.


The $10K/day was actually coming from the large egress fees they were getting for transferring the models and agents from Google Cloud Storage to the Colab notebooks. I think if you were to serve the game as a web app you definitely wouldn't need one instance of a 12 GB GPU for each user. But the thing about Colab is that you need a Google account to use it, and you run your own notebook, independent from the author's account.


$10k/day is just for file transfer, not for GPU.


Ok, ok, let me see if I now got this right: The AI needs a VM (because users rarely have 12GB GPUs at home), and this VM then downloads about 6GB from GitHub each time a user opens up a new session instead of sourcing that from a locally cached copy?

EDIT: Almost, the VM runs on Colab, which only works with Google, and Google's charging for the upload to Colab? ...more like collaborator, amirite? dodges rotten eggs


Yeah, now you almost got it right! Colab is a service provided by Google.

The funny thing is the high egress fees Google charges for transferring data between two of its services (GCS -> Colab).


Funny, cheap and cheerful, little hidden costs in the fine-print. Oh my Google, how could we ever be mad at you? opens a can of laughter


I played it on a cpu. I downloaded it to remove the profanity filter. It's several seconds for each answer, so it's not ideal, but workable.


A lot of HN users couldn't run it and have a reasonable experience because they don't own computers with GPUs capable of generating a response in a timely manner.


Could something like PyInstaller[1] work?

[1]https://www.pyinstaller.org


I agree in theory, but I'd like to disagree with some assumptions. AI devs don't necessarily have experience with efficient network infrastructure. (And ops people don't necessarily have experience with AI development.)

You can't just "buy servers for the same price". Where are you going to put them? How are you going to power them? How is the bandwidth for them provisioned? At some scale these are non-trivial questions - you can't just buy a rack, stick it next to your desk and plug into an extension cord.

The system deployment is something you need to spend time on as well. Bare metal provisioning and deployment of GPU libraries to make things run smoothly takes time.

And finally when the hype dies down in a week or two, what are you going to do with that infrastructure?

Cloud services are not trivial either. But they do have some advantages.


Possibly the author of the parent comment was puzzled why an AI lab does not have a specialist who provisions the bare metal. I would think that if the hype died in a week, an AI lab would have other projects which require similar equipment to run.

I agree with your post in general, as intuition tells me (without further details about the ops situation) that the cloud is a competitive fit in this scenario.


At this "scale" a single 3Gbps guaranteed pipe is ~330€/mo on OVH. You could simply buy ten of them just to be sure and still pay a third per month of what this would cost you for a day in the cloud. That includes all the maintenance and power for the equipment; you simply have to know how to use the OS.


My understanding is that each user runs a separate Colab instance with its own hardware, it's not the kind of thing you could replicate easily with one server and a couple GPUs.


I see, thanks. That's no longer the case now, is it? I'm presently downloading the torrent, and it seems like it will then run completely local?

Well, I guess that was the cost of getting it to go viral. The torrent seems quite active now.


> What am I missing?

There are a couple of things. First off, academic funding for things like AI-capable clusters is a whole thing, let alone the support staff for them. This is difficult for a host of reasons, some good and some bad.

Secondly, this isn't really about hosting files, it's about providing a suitable environment to run the model in. You can expect an AI research lab to have figured out how to do training in a reasonable way, but what they have here is more of an inference-in-production problem. That's really outside the expected expertise of a research lab. In the old days you just wouldn't be able to access this so simply. Today there are turn-key(ish) solutions with built-in scaling; they used one, it scaled, and now they are wincing at the bill.

This isn't crazy, as the google->google egress fees are not obvious.


> What am I missing?

They overbudgeted the "infrastructure" section of their grants and have money to burn; and nobody's interested in setting up another datacenter.


Politics is what you are likely missing.


Someone, probably: People who don't know how their services work are doomed to spend lots of money on them.

From the GCS pricing page:

  Network egress within Google Cloud applies when you move or copy data from one bucket in Cloud Storage to another or when another Google Cloud service accesses data in your bucket.

  Within the same location (for example, US-EAST1 to US-EAST1 or EU to EU) -- Free
From the original tweet (not the linked reply):

  For reference most of the fees are from transferring from NA to EU and APAC
It's costing them assloads of money because they're moving data between regions. AWS works the same way. Azure is probably also the same but their pricing page is incomprehensible so who knows.

Lesson: if you're going to use data in a given location, you need to host data in the same location.
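If you control both ends, co-locating the bucket with the compute is a one-liner at bucket-creation time. A minimal sketch with the google-cloud-storage Python client (bucket name and region are placeholders; note this doesn't help for Colab specifically, which, as pointed out further down, you can't pin to a region):

  from google.cloud import storage

  client = storage.Client()
  # Create the bucket in the same region the VMs/jobs will read from,
  # so reads stay same-region instead of being billed as cross-region
  # or internet egress.
  bucket = client.create_bucket("my-model-bucket", location="us-east1")
  print(bucket.location)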


[deleted]


I checked a while ago and IBM Cloud cost the same price, down to the dollar.

IBM bills by the network interface: hundreds of dollars per month to get a dual bonded NIC for private and public transfer. If you use the bandwidth close to capacity all day long for the whole month, it ends up costing the same as you would pay on Amazon for transferring that many GB.


So be more constructive: I'd like to learn more about IBM Cloud, as I have no experience with it. Is there a Lambda-like service, pay-by-use 'serverless' database hosting, and support for direct client-to-DB connections with RLS/ACLs?


Are you looking for something inside IBM cloud? If not, FaunaDB might be interesting? It's a serverless database, pay as you go (with free tier), security layer for client to DB connections, strongly consistent, scalable and distributed (multiple regions). Disclaimer: I work for them.


I looked it up, and I do like it. Thanks for the reference. But, there's one big thing missing and that is the DB needs to be open sourced. Do that and I'll move my projects to your hosted service and promote it everywhere I possibly can.

But, it needs to be open source. Otherwise I'd rather stick with AWS Cognito + Dynamo/AppSync, primarily because I can depend on it being around in five years. Even if I believe FaunaDB Cloud will be around, I know I won't be able to convince my other stakeholders. But if it's open source, I can always argue that if FaunaDB-as-a-service disappears, I can host it myself (as a last resort).


[flagged]


Per the HN Guidelines:

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

Hard to say their cloud "sucks" when they offer a solution featuring the very thing being complained about in regards to other cloud services. Or maybe IBM's cloud really is terrible, but it would be hard to know that as an outsider since you failed to provide any substantiating info. If you could expand upon your initial assertion, that would surely help the rest of us see your point and assist you in proving your case.


What value does this comment add?


What value does asking what value that comment adds add?


If we assume you are a reflective/intelligent person, it improves the quality of this community and the content posted here.


The GCS bucket just needs to be set to a region? See documentation:

Data egress from your bucket to a non-Cloud Storage Google Cloud service is free in the following cases:

  Your bucket is located in a region, the Google Cloud service is located in a multi-region, and both locations are on the same continent. For example, accessing data in a US-EAST1 bucket with a US App Engine instance.

  Your bucket is located in a multi-region, the Google Cloud service is located in a region, and both locations are on the same continent. For example, accessing data in an EU bucket with an EU-WEST1 GKE instance.

EDIT 2:

The Cloud AI documentation suggests the correct setting is regional/regional:

https://cloud.google.com/ml-engine/docs/regions

Cloud Storage You should run your AI Platform job in the same region as the Cloud Storage bucket that you're using to read and write data for the job.

You should use the Standard Storage class for any Cloud Storage buckets that you're using to read and write data for your AI Platform job.

EDIT 3:

----------- I don't think you can set a region for colab, so I am not sure you can make egress free. ----------


EU-USA data costs a lot for a good reason.


Can I be controversial and ask what justifies such expenses? Why is this game so important that it needs to be up and running so badly?

From what I've seen so far the "game" (if you can even call it that) highlights the weaknesses of GPT-2 much more than its strengths (the model's answers to user actions are random, the story is incoherent, the world is inconsistent). I don't get the feeling it was set up to demonstrate _weakness_ though. I think it was meant to demonstrate strengths.

I suppose it's just advertising for the BYU Perception, Control and Cognition Lab, but it sounds awfully expensive for advertising for an academic group.


What a cynical input into the conversation.

First you assume that this project even existing is some kind of statement about how important it is; a connection I can't make myself unless I try to be extremely cynical. If you actually look at the project website you'll see that the main mirrors are currently down due to high download costs.

Then you smugly dismiss the entire project as a GPT-2 weakness highlighter, but not without wrapping it in some passive aggressive faux concern ("Oh is this trying to be good? Silly me, I thought this was a showcase for how to be bad!").

And then you assume that this game that thousands of people (although none of them were you) have enjoyed playing is probably an advertisement for an academic lab. Despite there being almost no evidence for it.

Can I be controversial now and ask: What's with the bile?


>> First you assume that this project even existing is some kind of statement about how important it is; a connection I can't make myself unless I try to be extremely cynical.

That would be cynical, but I did not make this assumption. I questioned the justification of the high cost to maintain the project, not the existence of the project per se.

I did not doubt that people enjoyed the project but, again, I don't understand how a research lab justifies paying such a high cost to provide entertainment.

If the project could be maintained for free then I would not have any questions. But if a lab is spending $10k to keep a game running then yes, I have to wonder why.

I do think that the game shows up GPT-2's weaknesses. Do you really think it's "bile" or "cynical" to recognise weaknesses of a technology?

Perhaps I'm cynical to assume it's advertisement. I apologise if that's the case.

Note also that accusing me of bile is a personal comment that I think is unnecessary.


Also, can I please ask you to tone down the god-moding if you want to have a disagreement? "passive aggressive faux concern", "bile", what's all that about? Why do you think you have an insight to my state of mind and emotions after reading a comment I made on the internet? This is just frustrating. Is that how you want the internet to be, really? A bunch of people assuming each other are either assholes or idiots, and treating each other accordingly?


Agreed. That was much more aggressive than it should have been. I don't think I have an insight into your state of mind, and I'm sorry for using that to land some stupid burn. The point would have come across just as well without my bile. You're right, that's not what I want the Internet to be.

In that spirit, I would very politely suggest that you read your original comment and consider if it represents what you want the Internet to be.


Well, thank you, that's very welcome and I feel bad for making you apologise. I'm glad you see things this way!

So, I'm looking at my comment again and I think I should have omitted the following two sentences:

>> I don't get the feeling it was set up to demonstrate _weakness_ though. I think it was meant to demonstrate strengths.

>> I suppose it's just advertising for the BYU Perception, Control and Cognition Lab, but it sounds awfully expensive for advertising for an academic group.

The first one does sound as if I'm taking a swipe at the research team. This was not my intention but it came out all wrong.

The second one makes assumptions about the motivation of the BYU research team, and I should have kept those to myself.

My original question, what justifies the high cost of the product, I feel is valid so I wouldn't change it. But, clearly, I took it farther than I should have. I apologise that I didn't think my comment through enough so as to avoid having it come across as an attack on the BYU team and I'm sorry it upset you.


HN is the only community I know of that has people that will not only call out this kind of attitude, but will do so in an absolutely scathing, bedrock-hitting way.

I hope this is something that never changes about HN. That was a pleasure to read.


>Can I be controversial and ask what justifies such expenses? Why is this game so important that it needs to be up and running so badly?

Most popular Games, Movies, etc. cost orders of magnitude more. The value is typically entertainment.


The hype is not rocket science: The game is a lot of fun, and a lot of people in the RPG communities are excited about this.


I agree that it's entertaining and I see the appeal to role-players. It appeals to me, too (well, I'm a gamer). What I don't understand is how come someone's paying $10k a day to keep it running if it's not really returning anything. $10k is not a small sum. It sounds like a heck of a lot to pay to keep a game running just because people find it fun.

Basically, I don't think I've heard of anything like this before. Usually when people put something on the internet for free either it's very cheap for them, or they have a way to recoup the costs, e.g. by serving ads or asking for donations etc. To just throw money at something that doesn't return anything is very uncommon.


Welcome to the hype cycle


Why justify it, when it's public money? It's free!


> Brigham Young University is a private research university located in Provo, Utah and owned by The Church of Jesus Christ of Latter-day Saints.


Definitely worth it:

She blushes slightly and smiles shyly. "Oh, I'm sure we will. But first, let me take off my clothes".

> say "no, that's prohibited"

"No, it isn't". She says with a smile. "But if you insist on not marrying me, then at least don't touch me". > say "I will marry you"

She nods happily and kisses you passionately on your lips. The two of you embrace each other as you kiss her deeply. It is only after this that you realize that you are actually married.


Sounds awesome, it's like reading those curated short stories written by AI. The tech isn't quite there yet, but it makes it all the more hilarious. It's like an alien who's been watching our TV for years comes and tries to write a sitcom.


AI has advanced so far it might be able to win the Bulwer-Lytton Fiction Contest.


To be clear, this was because of crazy high network egress fees, not the fancy neural network compute.


So to catch up, AID2 is leveraging a cute offering by Google Colab which will provide any Google account a few hours of free access to a fairly powerful GPU.

To use Colab under this arrangement, each person playing the game has to load the model into their personal VM, which is being billed by Google as 6GB of data transfer from GCP into Colab.

The obvious questions:

1) Was there a place they could have hosted the model closer to Colab so that they weren’t being charged for egress bandwidth — and also, ideally, so that 6GB of data wasn’t actually being moved very far?

2) The underlying model is 6GB, but I’m curious how much memory is required for an individual user’s world state and how hard it would be to have a single GPU handling multiple user sessions?

Presumably it would be possible to multiplex multiple sessions with a single GPU? You would have to serialize the game state, receive the next input, load the prior state, feed the new input through the model, return the resulting text output, and re-serialize the state until the next input comes through.

What I don't know is whether that's at all practical, given the amount of data that would have to be serialized. Is the 6GB model data separate and static throughout the game, with an isolated block of data for the current world-state? Or does playing the game fundamentally alter the state of the model, meaning you would have to reload the whole thing just to process the next command?


"World state" is the text people see on script. It can reside in the browser.

So a server can be completely stateless, i.e. it receives the text so far (say, 5 KB), applies GPT-2's generate, and returns the result.

The problem is that GPT-2 generation seems to be very computationally intensive. As I understand it, it actually does number crunching with all 6 GB of weights, so it takes 5-10 seconds even on a GPU (a K80, at least, is that slow).

Is a GPU capable of running multiple GPT-2 generations in parallel? No idea.

Assuming that a high-end GPU would be able to produce a response in 2 seconds, you could only run maybe 10 concurrent sessions per server if you want a fast response time.
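For what a stateless endpoint like that could look like, here's a minimal sketch using Flask and the Hugging Face transformers library rather than AI Dungeon's actual stack (model name, route, and sampling parameters are all illustrative):

  # stateless_gpt2_server.py -- illustrative sketch, not AI Dungeon's real server
  import torch
  from flask import Flask, jsonify, request
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  app = Flask(__name__)
  device = "cuda" if torch.cuda.is_available() else "cpu"
  tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the game uses a much larger fine-tuned model
  model = GPT2LMHeadModel.from_pretrained("gpt2").eval().to(device)

  @app.route("/generate", methods=["POST"])
  def generate():
      # The client sends the full story so far; no per-user state is kept server-side.
      story = request.json["story"]
      input_ids = tokenizer.encode(story, return_tensors="pt").to(device)
      with torch.no_grad():
          output = model.generate(
              input_ids,
              max_length=input_ids.shape[1] + 60,  # continue by ~60 tokens
              do_sample=True,
              top_k=40,
              temperature=0.8,
          )
      continuation = tokenizer.decode(output[0][input_ids.shape[1]:])
      return jsonify({"continuation": continuation})

  if __name__ == "__main__":
      app.run(host="0.0.0.0", port=8080)

Whether one GPU could batch several such requests at once (the multiplexing question above) is a separate problem; this only shows that the client needs to send nothing but text.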


Google Drive is actually perfect for this as it lets you mount the drive (presumably as NAS) so you can read from it directly. I set up a notebook for it but unfortunately the shared folder got rate-limited. It still works if you make a copy of the files in your own Google drive though.

https://colab.research.google.com/github/Akababa/AIDungeon/b...


The obvious question to me is why the cost must fall on the provider instead of the player. If micropayments were easy enough, this would've been sustainable.


Update: the issue has been fixed, apparently:

"Should note for anyone who comes and sees this that's no longer how were hosting the model. :) Model is now hosted on a peer to peer torrent network so no more costs for us."

https://twitter.com/nickwalton00/status/1204064712394076160


It appears that it has been updated to use bittorrent as a temporary solution: https://colab.research.google.com/github/nickwalton/AIDungeo...


Still not clear why a university research lab would keep this running 'for a few more days' rather than pulling the plug immediately until they either have the costs under control or some sort of reasonable matching benefit in place.


All the AI and CS labs I worked for were perpetually short of funding. The end-of-year budget thing is an accounting pitfall that I have met many times in industry, government and academia alike. It is the result of money allocated but not spent before the end of the budget year not just being taken away, but also, with high probability, leading to a budget reduction the next year, as 'you clearly didn't need it'.

Needs, OTOH, aren't clearly delineated by the budget calendar.


Probably because they have to spend their grant money some way.


One hears stories about researchers with unspent grant money scrambling madly to find something, anything, to spend it on before the grant expires. Use it or lose it.

My Dad told me how back in the 1970s he worked at a government-funded research lab. One time they called up a laboratory glassware supplier and said "We'd like to order $10,000 worth of glassware". The supplier asked "Sure, what specifically would you like to order?" The lab replied "We don't care, whatever you have in stock, so long as it costs us $10,000 and we pay you today – if we don't spend the money today we lose it forever".


I came across a similar thing in the 90s; one department had an expensive piece of equipment with a three year lifespan.

They weren't allowed to amortise that, and any budget increase every three years would have been denied, so they had to include the whole replacement cost in every budget, and find ways of spending that money every year, or lose it.


Uh, I mean, of course I don't know any better than you, but I think parent wasn't entirely serious... obviously. Was he?


I have zero proof of this, but I've heard of the US Navy having jets fly out, dump fuel, and come back to refuel minutes later, just to fill out a budget. Use it or lose it is not uncommon, unfortunately.


He was absolutely serious. It's complete nonsense, of course, but this is also reality.


It sounds like there is a missing incentive to go under budget. There's something wrong with the system if people feel like they just have to burn money.


> It sounds like there is a missing incentive to go under budget.

Other way around. Right now there exists a negative incentive to go under budget, being that if you need extra budget next year you suddenly don't have a buffer.

The correct way to fix this is to prohibit end-of-FY budget-keeping purchases (e.g. by mandatory review of purchases) while at the same time allocating a company/organization-wide buffer for immediate or projected needs that were not in the planned budget.


I like how Google did employee travel. They calculated a cap = 2/3 * historical employee spend + 1/3 * the current cost of a business flight.

Obviously, for common routes such as SF to LA, NYC to SF, etc., they hard-coded values.

For every $1 under the cap, you received 50 cents to use on a future travel plan over the calculated cap.

You could fly economy on an international flight saving money, then bank 50% of savings to buy a first class ticket some other time.

This created a negative feedback loop keeping travel costs in check.
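With made-up numbers, the scheme works out like this:

  cap    = 2/3 * $900 (historical spend) + 1/3 * $1,500 (current business fare) = $1,100
  actual = $700 economy ticket
  banked = 50% * ($1,100 - $700) = $200 credit toward a future over-cap ticket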


Yep, back in the day when I worked for an IT dealer/distie, at certain times of the year, the client conversations went:

"How many <Brand> laptops can I get for £xK?"

"What kind of spec are you looking for?"

"Don't care, just as many as I can get of 'something' for £xK."


Because it’s probably government funded. If it’s not burning a hole in your own pocket it’s not worth staying up all night over.


BYU is not federally funded.


You are Faif, a ranger living in the kingdom of Larion. You have a hunting bow and a quiver of arrows. You spot the deer and are ready to finish your hunt when suddenly you hear a voice calling out to you from behind a tree.

> nosaving

Saving turned off.

> shoot the deer

You quickly turn around and see a young man with a long beard standing there. He is dressed in rags and his face is covered by a large bandage. His eyes are red and puffy and he has a large scar across his cheek

> shoot the man

You fire an arrow at him and it hits him square in the chest. The arrow flies through the air and slams into the ground. You look up and see that the man is lying on the ground motionless. You can't believe what just happened

> Check his pockets

You pick up the arrow and examine it. It's made of bone and is very sharp. You wonder if you should take it. You could use this to kill someone else.

Doesn't seem to understand me too well?


It's hit or miss. You'll have better results being more specific in your commands, because it gives the model more to work off of and lets you crystallize elements of the scene through reaffirmation.

"Fire an arrow at the deer" or "Check the pockets of the corpse".

I'm often surprised at how well the game can understand me and that it has anything to say in reply at all.


I realise Cloudflare is a dirty word around here but you could pay $20/mo on their pro plan and serve this file, along with any other static assets, through their CDN.

I do this for assets for my own games hobby site. Granted, not getting tens of thousands of downloads a day, but there's nothing in their Ts&Cs to indicate to me that it wouldn't work even if I were.


Why is Cloudflare a dirty word around here? I'm afraid I'm out of the loop


There's one particular commentator who is very vocal about his disdain for Cloudflare.


Never noticed. What's his problem though?


I had a lot of fun playing the game. I think this game shows us how interesting AI technology will become in the 2020s. It’s open ended but somewhat incoherent right now, but I think we’ll figure out how to update this kind of technology to have an open ended yet internally consistent world by the end of the 2020s.


As it turns out, it’s actually possible to bona fide win the game.

In my case, I was dating two girls, one was uncomfortable with the other girl, and broke up with me, so I asked the remaining girl if she would marry me. At this point, she said yes, we rode off in to the sunset and the game proclaimed “CONGRATS YOU WIN” then it saves the game for me.

I guess I could load the game and deal with domestic squabbles, having children, growing old together, but I’m not sure this training set is optimized to generate a domestic married situation comedy story.

The torrent trick works. Right now, the game can be played at http://www.aidungeon.io/


I got a win-game blurb for "> RESPECT WOMEN" in a game that had become really NSFW, really quick, and offensively so, from completely innocuous commands.

Then another, not surprisingly, for "> WIN GAME".


Torrent trick?


Instead of downloading the six-gigabyte trained model from the cloud ($$$), they use a BitTorrent client to download it, to keep costs down. It works, as long as considerate users seed the file.


Insane but also expected. When I tried it out after it was posted here and saw that it took multiple minutes to warm up, I knew it was probably expensive.

>And it's currently costing 30-40 cents per download.

Is there no way to have a single hosted instance rather than downloading again for each user?


> Is there no way to have a single hosted instance rather than downloading again for each user?

This might make the problem worse; then they'd have to do processing server side, rather than offloading it on the client. I dunno whether this would be more or less expensive than the initial download, but the torrent they put up seems cheaper either way.


Weren’t they already doing the processing server side? If it were client side then it wouldn’t be costing so much to run git clone every time, as the download would be from GitHub to the user’s computer. It would be free, in fact.

My impression of the situation is that every user who tried to play would result in a new instance to spin up on Google’s cloud services and then begin downloading a fresh copy of the repo from GitHub. This is what cost so much in bandwidth.


A client side request to GitHub would require them to serve up a relevant CORS header, but I do think you're right about me misunderstanding where execution is taking place. I'm unfamiliar with Jupyter Notebooks, and assumed "downloading" meant "to the client". I, too, am now confused about why this is set up like it is. Probably some constraint of Jupyter Notebooks that I'm unaware of.


If you install and run Jupyter on your local machine, it’ll spin up a web server on localhost and then connect to it in your browser. All of the Python code runs on the server and only the results are sent to the client, to be displayed in the browser.


> Weren’t they already doing the processing server side?

Yes, but they were using Google Colab because Colab will give each user their own dedicated Nvidia K80 for free. Google will spin up a new instance to back each user's Colab session, but on Google's rather than the researcher's or the user's dime. The downside though is paying for the data egress, which can be avoided if the users download to Colab from somewhere else, or download from somewhere else to their own machines that have a GPU with 12GB of onboard memory.


I'm pretty sure it's not downloading to the client, since the dataset is apparently pretty massive. It looks like it's downloading it to a VM or something and creating a new instance of the service for every user.


To play backseat problem solver... You have free ingress? So start up a few $5 DO droplets and serve files from there. That gives you 1TB transfer per month.

I haven't tried this, but my understanding is that's per droplet. So when droplet A is almost exhausted, start B and switch over the traffic. Then shut down A. Then start C and shut down B, etc. Unlimited transfer? (Until your account gets banned, anyway.)


OVH has unmetered (i.e. effectively unlimited) bandwidth and is well-established - enough that they have their own gTLD, at least. I often recommend them for situations like these where 1TB/month/instance may not be enough.


DigitalOcean Droplet bandwidth is pro-rated: you don't get the full pool when you create the Droplet, you accrue it over the course of the Droplet running across 28 days.


Ah, I was missing that. Good to know, I guess, should it ever matter.


You might be able to do this with the droplets like you said.

With DO Spaces, it's a $5/mo subscription to Spaces. You get 250GB of storage and 1TB of transfer. Anything more costs 1c per GB transferred and 2c per GB stored. You can create as many buckets as you want.


You can buy bandwidth from a simple CDN for cheap enough to not need to do this dance.


That’s the real story here. Paying market rate for bandwidth is like buying soda from a restaurant. They’ll gouge you and make a profit but you’re thirsty and it’s too late to shop around.


I'm pretty sure Cloudflare would do this for nothing for .edu.


Also this


If they had put the bucket behind a CDN costs would have been dramatically lower.


GCP charges ~$0.60/hour for the GPU/CPU/MEM equivalent of Colab. If bandwidth is costing $10k per day, how much is the free Colab compute costing Google?


Probably less. I doubt the average use time is more than half an hour. Pretty cheap advertising for the service, really.


When will we come full circle and do "edge computing" on our own devices again? I'm getting sick of exorbitant cloud costs and the moat that is forming.



The ominous opacity of the AWS bill – a cautionary tale (taloflow.ai) https://news.ycombinator.com/item?id=21694835


I think the key architectural misstep is putting a Colab session per user session. A fix is having the user call out to a service API, which, behind the scenes, is executed on a pool of Colabs with reuse. I don't know of any technology to make that easy, though. The symptom that could have been a trigger for cost investigations was the long startup time.


I know almost nothing about AI/ML.

I can imagine that running it on a single server and responding to requests is intractable because of how the game feeds your quest back into its model.

But what would it take to package it as a local application?

I really love this game. You can't beat this:

https://twitter.com/ptychomancer/status/1203246078989987840

> $ Give rousing speach to my fellow mud beings

> "Mud creatures! Mud creatures! We must unite against this enemy!"

> The other mud creatures nod eagerly, and begin chanting, "We will fight! We will fight! We will fight! We will fight! We will..".


I remarked how impressed I was with the state of technology that made it possible to freely spin up a VM with ~14GB of memory and beefy compute.

Now it all made sense--of course someone had to pay for this all. Doh!


Google is paying for the Colab notebook and compute; that's free for users. The problem was the ~6GB model hosted on GCS, and GCS has very high network egress costs.


Why would a user need a 12GB GPU to run the game locally? The deep learning model is already trained, and I can't imagine one needs a GPU to just evaluate the model.


It's a big model.


It's a 6GB download. Mid-to-high-end GPUs are 8GB. So what makes it need 12GB? Specifically, how is that memory used up: by various types of decompression, intermediate calculations, etc.?


I believe that the 1.5B model weights take up the 6GB themselves. Presumably f32 weight values? Since they are all needed for an evaluation pass they will eat up 6GB on the GPU off the bat. Not too surprising that everything else can't fit in 2GB, since that's going to have to fit the entire model architecture and all intermediary values.
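Rough arithmetic, assuming 32-bit floats as above:

  1.5e9 parameters * 4 bytes ≈ 6e9 bytes ≈ 6 GB for the weights alone,
  before activations, attention buffers for the context window, and framework overhead.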


What exactly are these intermediate values composed of that makes them notable in size compared to the model itself? Are there resources I should read for how a model like this executes?

Will this model work with half precision weights? Is it very awkward to use "brain" 16 bit floats?


The model has 1.5 billion parameters, so requiring a beefy GPU to evaluate the model is unsurprising.


You can use your CPU, certainly. It'll take a couple seconds per reply.


I was able to get it working without downloading files by mounting Google Drive, although once the shared folder is rate-limited you need to make a copy in your own Google drive:

https://colab.research.google.com/github/Akababa/AIDungeon/b...


Shouldn't this be free egress, as the traffic goes straight to a Google product / Colab server, or is this being downloaded in the browser?

Also, the cost was before a CDN was enabled. This kind of traffic generally costs fractions of a cent per GiB after signing a contract with negotiation.

“”” Egress to Google products (such as YouTube, Maps, Drive), whether from a VM in Google Cloud with an external IP address or an internal IP address No charge “””


Speaking as someone who runs a bunch of large websites - out of my own pocket, for profit.... I'm confused.

How did we get from 60K users to a $10K per day expense?

For comparison, I serviced millions of users per month for years from a single virtual server... (granted, that was after making our site super lean from a data & CPU perspective)

How much resources is each user consuming?


I guess you don't instantiate a 5Gb image / user. It's the data transfer that costs 30/40c / user.


Cloudflare ? or just one machine at OVH ?


https://wasabi.com/cloud-storage-pricing/

Wasabi’s pricing model of $.0059 per GB/mo ($5.99 per TB/month) with no additional charges for egress or API requests means you don’t pay to access your data

I don't know, could be a couple of dollars only ?


One would need to read the install.sh base file to know where to copy the model from the torrent, and then manually install the other dependencies. If someone could make a friendlier version, or at least better instructions, more people would give it a try.


Could someone explain the appeal in keeping this going long-term? When it was posted here, I played it and read several adventure logs here and on reddit. In every case the story is nonsense. Sure, some parts read like something a human would write, but anytime you go beyond a few sentences you can see contradictions and a lack of flow that a good human author would never produce.

Don't get me wrong, it's a cool demo of how far we have gone beyond Markov chains. Am I missing something or just spoiled from those Infocom games I played as a kid?


Man, why so cynical? I don't see how those Infocom games spoiled you since they were limited to whatever a team of writers could come up with. And a well-written, dynamical-feeling CRPG is so rare that we still trumpet the handful that were any good from 20 years ago like Planescape: Torment.

Here's an example of how this game is fun: https://twitter.com/ptychomancer/status/1203246078989987840

It's just fun to play with.

> you can see contradictions and a lack of flow that a good human author would never produce.

That doesn't seem like a sensible goal post. Unless you think the technology is magic, why would you go into this thinking it's going to compete with a master-planned work of fiction by a human writer?

I, on the other hand, am inspired by the game. Imagine Crusader Kings 2 (free on Steam btw) where the events are randomly generated by this kind of story-telling technology. Right now it's kind of boring wondering which of the finite human-written events are going to show up. After playing for a while you go from wondering what crazy event will happen next to knowing all of the events and waiting for your favorite ones to show up.

We're a ways off from embedding a game in this technology, but I think we are within reach of embedding this technology inside a narrative-driven game.

Another example is Dungeons & Dragons. The fun is the sandbox and interacting with the narrative even though the human-driven dynamic storyline is often complete nonsense if you were to read a transcription of what actually happened.

This is the second comment I've read so far that seems to gleefully pat itself on the back for wondering how someone could enjoy something else.


I only dug into this because people were raving about it. Never played D&D, but it did remind me of those text adventure games I played as a kid.

When humans play D&D, as the game progresses are the rules of the world that people establish supposed to be internally consistent, and is there a planned plot arc? Can you say there's no gravity and then later drop your sword?


GPT-2 is trained on huge amounts of text. It contains every book people haven't read, and can more or less stitch pieces together seamlessly. This is exciting from a research perspective, but the hype is out of control already. Despite breaking previous benchmarks, language benchmarks themselves are not great ways to describe language. I get the excitement, but this overselling to the general public is soon gonna lead to a trough of disillusionment.


A 6GB neural net is going to be expensive to run anywhere. Is it possible to break the neural net into services?


IIRC, bandwidth costs have been essentially flat for the past decade or so. Not coming down.


Out of curiosity, how much more time would it take to run this on the general-purpose CPU of a user's computer?


It takes a couple seconds per reply. GPUs aren't as powerful as everyone would have it, at least not for ML; you'd be lucky to get a 10x speedup.


How does one play this without a Google account?


GCS egress is $0.12 per GB, so 3GB per user?


The model weighs about 6.2 GB; you can check it in install.sh [1], which currently downloads it via torrent (it was S3 a few days ago, I think).

[1] https://github.com/AIDungeon/AIDungeon/blob/master/install.s...


I am, personally, amazed by this. The following might be kinda NSFW, though.

> You are Jesus, a knight living in the kingdom of Larion. You have a steel longsword and a wooden shield. You are on a quest to defeat the evil dragon of Larion. You've heard he lives up at the north of the kingdom. You set on the path to defeat him and walk into a dark forest. As you enter the forest you see a huge black dragon flying above it's lair. The beast looks very angry as you approach.

+ suck his d-ck

> Your screams are silenced when you feel the head of the dragon's ---- entering your mouth. He begins thrusting inside of you and soon you're filled with his ---. You collapse onto the ground and die

Well. That went really more wild than I could have ever expected.


> suck his d-ck

Thank you for shading my eye from this indignity.


Still a better love story than Twilight.


I don't get it. What's NSFW here? Suck his what?

Also, yeah I think it's funny how chatbots can generate undesirable or NSFW responses and there's basically no way to stop it. I know GPT-2 can parrot old-timey scientific racism back at me, just like the redditors who trained it!


NSFW as don't go loudly reading it in an office.


I'm too scared Google will charge my credit card just by opening this lab page... I've got a gaming rig, why can't I just run this locally?


I have an unprivileged user account named "wilson" designed specifically for situations like this.


Because you don't need a gaming rig, you need a compute rig. Apparently, the model runs on K80s and requires 12GB of GPU RAM. To put that in context, the flagship gaming card, the RTX 2080 Ti, "only" has 11GB.


After having read Rizwan Virk's "The Simulation Hypothesis", and playing this amazing new game, I can say that AI-generated text adventure games are an important step on the road to the Simulation Point (the point at which it would be technologically possible for us to construct a simulation as all-encompassing as the one in The Matrix).


Have you tried psychedelics yet? Or maybe learned how to lucid dream? I think such technologies will be preferable to you, even if our electronics can catch up with our biological evolution.

Sure, maybe dream machines are fun and cool, but the ones in our future will be made for profit by companies like Facebook and Alphabet which severely diminishes any potential they may have had. Real dreams are libre, gratis, and uninterrupted by ads.


I've never tried any psychedelic. But I am able to lucid dream sometimes, I just never got deep into it. I guess I should read more about it.

I agree with your second statement, this so-called dream machine sounds a lot like a future iteration of Facebook's Oculus Rift.


Well, that was obviously foreseeable and dumb then. Use the user's resources or charge money.


The unexpected thing was that Google Colab and GCS are separate, such that transfer costs between them often end up as international external egress fees.


I don't know anything about your architecture, but you can quickly drop in something like BunnyCDN as a caching proxy for about 4% of price of Google egress. Even if it's a Google->Google transfer, since ingress is free, Google->Caching Proxy->Google should be much better.

You could also consider a webhost like Hetzner, on which you can get a bunch of 1 gbps machines very cheap, though you have to manage them yourself.

You could try Cloudflare, but they may well consider it abuse and cut you off (though they like to pretend on HN that they don't do this).

Finally, I know that Google is handing out substantial credits for game developers and this could very well be up their alley.

If you can optimize the download size a bit and swap to a cheaper delivery method, it should become feasible to run on a hobbyist budget.


And that's just GCS. For example, if you use their managed Kubernetes service, you will get a fresh load balancer for every service you expose to the internet. Not a shared load balancer, a new one.

Unless you set up an alternative, you'll get absolutely rinsed by the cost of the instance and then the egress charges on top.


All of the load balancers on GCP are shared. Maybe you meant to say you get a new, fresh IP address which is true but also not very expensive.

"Cloud Load Balancing is a fully distributed, software-defined, managed service for all your traffic. It is not an instance- or device-based solution, so you won’t be locked into physical load balancing infrastructure or face the HA, scale, and management challenges inherent in instance-based LBs."

https://cloud.google.com/load-balancing/


Nothing in that paragraph indicates it's shared. Also: it might be shared in the implementation, but you will still be billed for every single HTTPS LB that you use (or NLB, if you're doing TCP load balancing).

Every unique Kubernetes ingress resource WILL spin up a NEW, uniquely billed HTTPS LB. Every unique Kubernetes service with specific annotations will spin up a NEW, unique LB (internal or external). The author is correct.


You are not billed by the "load balancer", you are billed by the "forwarding rule" which makes it very obvious that the infrastructure is shared and that you will have additional costs with every K8S ingress.


That must have changed. I was billed for individual LBs a couple of years back.


Sorry, can you please try to explain that in another way? I don't understand what you mean.


The user is already providing their own Google Colab instance. It's the downloads that cost them $10k per day, and those cost the same regardless of whether the user downloads the model to their local PC or to Colab.


I am trying to think of some solutions, but the crux is that users may be expecting a unique story tailored specifically to them. If that assumption is false, then we have some solutions:

- save top or similar stories and make them pre-determined to avoid calling the ai services

- decrease the number of times the user can keep the story going: users can only give input 3 times instead of X

- charge people for the game

Probably there are more ideas out there.

One last idea: package the code and instruct users to run it on their own machine(s), or have them run it on their own GCS account.


Part of the reason this idea was pretty viral/successful is because anyone could try it out right away, without the usual nonsense that comes with running open source code locally.


There are no AI services. Every user runs their own instance of the game.


I think the game should be made runnable on a user's own machine too.

Make the data licensed AGPL if the author is afraid of people copying it and making a profit off it without including them!


AGPL does not prevent someone from copying data and making a profit off it and not including them.



