
I'm certainly aware of what you are saying. But: if you can scale up the infrastructure by 130% in one week, then you have certainly baked the notion of scalability into your initial design. The fact that scaling to 130% helped makes me think that adding more capacity will help more.

As such, it is potentially also possible to scale up to 200 or 500%, giving even more users a chance to play. I don't expect it to scale linearly with the added hardware, but it will help a bit.
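To make the "won't scale linearly" point concrete, here's a rough back-of-envelope sketch (my own illustration, nothing from EA) using Amdahl's-law-style reasoning: if some fraction of the work is serialized on a shared resource (say, the database), added hardware gives diminishing returns. The 10% serial fraction below is a made-up number.

```python
def effective_speedup(extra_capacity, serial_fraction):
    """Amdahl's-law estimate of throughput gain.

    extra_capacity: hardware multiplier (2.3 == scaling to 230%)
    serial_fraction: share of work that cannot be parallelized
                     (e.g. a shared database) -- a guess here,
                     not a measured number.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / extra_capacity)

# With a hypothetical 10% serial bottleneck, 6x the hardware
# buys only about 4x the throughput:
for mult in (2.3, 3.0, 6.0):
    print(f"{mult}x hardware -> {effective_speedup(mult, 0.10):.2f}x throughput")
```

So even with a modest serial bottleneck, scaling to 200 or 500% of current hardware should still help noticeably, just not proportionally.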

From a technical standpoint, I've learned some things about their problem (from which I'm drawing my conclusions; bear with me):

0) the architecture is hosted on Amazon's infrastructure

1) as part of their mitigation, they have turned off the ability to increase simulation speed. This points to either a lack of CPU resources on the application servers or a lack of I/O resources on the backend database to keep up with the increased rate of changes to the data model. However, I've heard in various comments that they might not actually be storing all that state on the fly, but syncing it at the end of the session, at which point the database load would be independent of simulation speed.
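The end-of-session-sync idea can be sketched like this (pure speculation on my part, not their actual code): if changes accumulate in memory and get written in one batch when the session ends, the number of database writes depends on how much state changed, not on how many simulation ticks produced it.

```python
class FakeDB:
    """Stand-in database that just records batched writes."""
    def __init__(self):
        self.batches = []

    def write_batch(self, changes):
        self.batches.append(dict(changes))


class Session:
    """Hypothetical session that syncs state only at session end."""
    def __init__(self, db):
        self.db = db
        self.dirty = {}              # in-memory changes, keyed by field

    def on_tick(self, changes):
        # Called every simulation tick -- cheetah speed just means
        # more ticks per second. Changes stay in RAM.
        self.dirty.update(changes)

    def end_session(self):
        # One batched write, regardless of simulation speed.
        self.db.write_batch(self.dirty)
        self.dirty.clear()


db = FakeDB()
s = Session(db)
for tick in range(1000):             # faster simulation = more ticks
    s.on_tick({"city_funds": tick})
s.end_session()
print(len(db.batches))               # one write, not 1000
```

Under that model, throttling cheetah speed wouldn't relieve the database at all, which is part of why I lean toward the app-server-CPU explanation.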

I'm inclined to think that CPU starvation on the app servers was likely the reason why they disabled cheetah speed.

Adding more frontend CPU power is quite trivially done by adding more app servers and having each of them deal with fewer concurrent users. In the end, they could go as far as one app server per region (a region being ~10 cities, not all of them necessarily played concurrently) without having to do a lot of additional work synchronizing region state between machines.
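The one-server-per-region idea boils down to pinning each region to a single machine, for example via stable hashing (names and scheme below are hypothetical, just to illustrate):

```python
import hashlib

def server_for_region(region_id, servers):
    """Pin each region to one app server via stable hashing.

    All players in a region land on the same machine, so region
    state never has to be synchronized across servers.
    """
    h = int(hashlib.sha1(region_id.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["app-01", "app-02", "app-03"]
# Players joining the same region always hit the same server:
assert server_for_region("region-42", servers) == server_for_region("region-42", servers)
```

Plain modulo hashing reshuffles regions when the server count changes, so a real deployment would want consistent hashing or an explicit assignment table; but the point is that the region boundary is a natural partitioning line.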

2) https://news.ycombinator.com/item?id=5347611 was talking about database I/O issues. This is of course a problem you can't just solve by adding more machines, as many databases can't easily scale horizontally.

The way SimCity works, though, lends itself very well to sharding: run regions independently and balance them across multiple machines. Add more machines and rebalance the shards (with downtime if you have to; people weren't able to play anyway).
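What I mean by rebalancing, sketched out (an assumed design, not anything EA has described): keep an explicit shard map of region to database machine, and when new machines arrive, move only enough regions to even out the load, rather than rehashing everything.

```python
def rebalance(shard_map, new_machines):
    """Move regions onto new machines until loads are roughly even.

    shard_map: {region_id: machine}; mutated in place.
    Only some regions move -- the rest stay where they are,
    which keeps the migration (and the downtime) small.
    """
    machines = sorted(set(shard_map.values()) | set(new_machines))
    target = -(-len(shard_map) // len(machines))   # ceil(regions / machines)
    loads = {m: 0 for m in machines}
    for m in shard_map.values():
        loads[m] += 1
    for region, machine in shard_map.items():
        if loads[machine] > target:
            dest = min(loads, key=loads.get)       # least-loaded machine
            if loads[dest] < target:
                loads[machine] -= 1
                loads[dest] += 1
                shard_map[region] = dest
    return shard_map

# Six regions crammed onto one database machine, two new machines added:
shards = {f"region-{i}": "db-1" for i in range(6)}
rebalance(shards, ["db-2", "db-3"])
print(sorted(set(shards.values())))   # regions now spread over three machines
```

The downtime-tolerant version of this is about as simple as horizontal database scaling gets, which is why the region model strikes me as a gift for their ops team.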

This is all the information I had to build my opinion on. I did, however, reflect upon these facts before posting my opinion, by which I stand even now.

Being able to scale 130% in a week, having an easily shardable problem, and having frontend server CPU starvation tells me personally that adding more machines could be possible (though maybe not economically feasible).

Of course it's guesswork in the end. But half-technical articles like the submission I linked above do invite guesswork. As long as one has at least some facts to start with, a discussion is still warranted. I have no problem being proven wrong by additional facts, a different interpretation of the facts, or just one article that goes beyond trying to convert PR speech into technical facts (like the one linked above).



"Being able to scale 130% in a week, having an easily shardable problem, and having frontend server CPU starvation tells me personally that adding more machines could be possible (though maybe not economically feasible)."

I strongly feel that this is the real problem, especially since Amazon has extra-large instances available now.



