We are using Varnish, but we're storing the sources in the DB (to take advantage of automatic compression and all-around ease). It's trivial to get Varnish to cache the entire source and invalidate the cache when the document changes (basically when the public/private setting changes, so it doesn't accidentally serve a page that's been changed to private), so that's all good.
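For anyone curious, the invalidation side could look roughly like the sketch below. This is not our actual code: the Varnish address, the function names, and the use of an HTTP PURGE request are all assumptions for illustration, and PURGE only works if the VCL is configured to accept it.

```python
import urllib.request

# Assumed address of the Varnish instance; adjust for your setup.
VARNISH_URL = "http://127.0.0.1:6081"


def needs_purge(old_public, new_public, content_changed):
    """Purge when the content changed or the visibility flag flipped."""
    return content_changed or old_public != new_public


def build_purge_request(path, base_url=VARNISH_URL):
    """Build (but do not send) a PURGE request for one cached path."""
    return urllib.request.Request(base_url + path, method="PURGE")


def on_document_saved(path, old_public, new_public, content_changed,
                      send=urllib.request.urlopen):
    """Hypothetical save hook: evict the cached copy if it may be stale.

    `send` is injectable so the hook can be exercised without a network.
    Returns True if a purge was sent.
    """
    if needs_purge(old_public, new_public, content_changed):
        send(build_purge_request(path))
        return True
    return False
```

The point of checking the visibility flag separately is that a page flipped to private has to be evicted even when its content is untouched.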
The biggest problem (by far) is disk usage. The rest of the service is very easily sharded, as every user is isolated. Solr is also fantastic (much better than Sphinx; in hindsight we should have gone with Solr for TP), so that can also scale very well (there's even an implementation of it on Hadoop).
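To make the sharding point concrete, here is a minimal sketch of routing by user, assuming a fixed number of shards and a hash of the user ID. The function name and the hashing scheme are hypothetical, not what we run.

```python
import hashlib


def shard_for_user(user_id, num_shards):
    """Map a user ID to a shard index in [0, num_shards).

    Uses a stable hash so the same user always lands on the same shard,
    regardless of process or machine.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Since users never share data, each request only ever touches one shard. The catch with plain modulo is that changing the shard count moves most users, so a lookup table or consistent hashing would be the safer choice if shards get added often.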
We'll add the caching feature to the front page as soon as we finish the current round of A/B testing. Thanks again! That feature was basically an afterthought, so it was great that you noted its importance.