Because of the lucrative nature of affiliate payments and commissions in the travel industry, one of the biggest sources of what Google considered bad content was sites linking to hotels.
Look at this page: [http://www.travbuddy.com/hotels]. All those links to "Hawaii Hotels" and "New York Hotels" look exactly like an old-school linkfarm.
Ask yourself this: if you're Google, and somebody is typing in "Hawaii Hotels" to do a search, is your Hawaii Hotels page really something they want as a top result? I don't think it is. Your content is not really "canonical" for Hawaii Hotels, it's just a page that lists a few reviews you've managed to collect.
It's all speculation on my part, since I have no inside information, but essentially, I think you're sort of poisoning your own reputation at Google with all those links.
1) Not much content on that page exclusive of navigation.
2) Not enough site trust.
3) Not enough links pointing directly to that page, the category (state) pages, or leaf node pages, like the individual hotel ones.
Prior to Panda, a domain with a lot of links could have virtually infinite leaf node pages with no links but laser-targeted content and rank a large portion of them. These days, "lots of links to our homepage" plus virtually infinite pages with no links is a recipe for being Panda'd.
I don't think the core "top -> state -> hotel" structure is likely a problem. That sort of tree structure, with variations, is pretty much universal in large sites. Having a lot of links on a page will not, by itself, burn you to bits.
Hi Patrick - thanks for your feedback and analysis, I really appreciate it.
You're absolutely right that there is not much content on that page exclusive of navigation, but in its defense the whole point of the page is navigation :) And it is just one page out of hundreds of thousands of pages on the site, many of which do have a lot of content.
I like your point about leaf nodes. Many of the domains that rank now either have a small # of pages total, or are extremely large brands with a lot of "trust", however Google defines it. In fact, these factors seem to outweigh having any original or useful content on the page itself in many cases (see post below).
If the problem is about too many leaf nodes, then cutting down "thin" content nodes (pages with little content, photo pages, etc) should have a positive effect. We've done a lot of this, so it will be interesting to see if that is the case.
When I search Hawaii Hotels on Google, the top result is a 7-item list of hotels with a little bit more information (link to the official hotel site, review count, place page, etc). How is that different from travbuddy's list page except that extra information?
The top result I'm getting goes to TripAdvisor. Arguably, TripAdvisor is much bigger and has tons more reviews than TravBuddy. I can't speak to its quality but that does speak to its "canonicalness".
I think necrodome is referring to Google's own reviews, similar to TravBuddy's, presented as Google Places. I also get 7 hotels from Google Places listed above the link to TripAdvisor.
Hi Joel - Thanks for the feedback, that's really helpful and you do make a lot of valid points, especially about travel affiliates. We definitely aren't trying to rank for "Hawaii Hotels". Those links are intended as a sitemap to help people find our review pages, many of which I believe we do have valuable, original content.
As a concrete example following the Hawaii theme, we used to rank pretty well when someone searched for "Hale Koa Hotel" (a hotel in Hawaii).
It's got a few detailed reviews and candid photos from active members of the site. Certainly not perfect, but I think it's helpful and that it provides a different perspective. Right now it's ranked #55 on my browser, and here is a sampling of the sites that rank above it:
I'm not arguing we should be #1, or even #10, but we're certainly better than many of the sites between #1 and #55, and we've seen this pattern repeat itself for most of the hotels and pages we have original and unique content for. While we certainly don't have as many hotel reviews as TripAdvisor, we do have more quality reviews than most other sites, and we have to start somewhere :)
I guess a better question would be, if we have 75,000 reviews, what is the best way of internally linking to them within our site WITHOUT looking like a link farm? We're just trying to put our best content forward, so any suggestions would be appreciated!
Outside of Panda, in terms of your goals for your business: do you really want to be in the business of arbitraging existing travel brands like Hale Koa Hotel and charging them (Hale Koa Hotel) money for customers who already knew of them? That strikes me as not a great place to be in life, for a lot of reasons. For one: what does Google need you for in that scenario? (Answer: nothing after they debut Google Travel and give it 80% of the non-ad real estate on the search page.) For another, you're in perpetual competition to be in the top ~10 of the 1,000 sites who are adding equally little value, and your best answer on why you should be there is guaranteed to be "OK so we're meh-tastic but come on we're 2% better than our meh-tastic competition doing the exact same thing we do."
Hi Patrick - Our goal isn't to charge money to customers who already know about existing brands, but to provide first-hand information ABOUT that brand that customers wouldn't otherwise learn from other sites (including the brand itself).
We've been focused on hotels in this specific example, but the same argument could be applied to any travel destination. If you're planning a trip somewhere, would you only want to get information from the official tourist bureau and the hotels themselves, or would it also be beneficial to see candid reviews, blogs, and photos, and interact directly with other travelers who have been to the same place? I would argue that the value provided by that information is worth a lot more than 2%.
The average traveler researches 2-5 sites before booking, and a large % of them are motivated by a desire to read reviews (http://www.phocuswright.com/library/fyi/427). Clearly there is some value provided there, and if we can provide relevant and unique information that leads to a sale, or profit off the sharing of such information through unobtrusive advertisements, then I don't see any problem with that.
And if we can provide more useful information than our competition (many of those sites listed above have almost no original information at all), then I also see no problem with ranking higher in the search results.
That's not to say we are 100% there yet for every destination, but I believe we do have a ton of valuable information, and our goal is to get to 100%.
I would be interested in knowing how your rankings change if you put some AdSense into your pages. You said in your blog that you removed advertisements from your site. I assume that those were ads that didn't go through Google. I wonder if Google rewards those who do advertise with AdSense.
Let users rate the quality of reviews, like amazon does.
Also, get backlinks into your site for keywords like "hotel reviews". To do this you can ask your reviewers to post on their blog, Twitter, or Facebook that they just wrote a smashing review, and include a link to it.
I've run The Online Slang Dictionary (http://onlineslangdictionary.com/) since 1996. On April 11, the date of one of the Panda updates, traffic from Google dropped 20%. Between April 11 and a month ago, I made the following changes:
* Correct spelling, grammar, and capitalization errors
* Where spelling "errors" are legitimate "slang terms", link to the definition pages for those slang terms (gonna, wanna, ain't, dat's, etc.)
* Remove unnecessary extra punctuation, e.g. example sentences ending in "???" or "!!!"
* Checked keywords on top landing pages for searches like "slang", "slang dictionary", "slang thesaurus"
* Remove unnecessary extra spaces in the middle of sentences
* Use complete sentences even where completely unnecessary
* Restructure entire /definition+of/word+goes+here directory structure to be /meaning-of/word-goes-here, since people search for "meaning of" more frequently than "definition of", and since Google seems to have stopped treating + as a word separator in some cases.
* Reworked meta descriptions and page titles
* Removed meta keywords. I know Google doesn't use them, but maybe they treat them as a negative indicator of site quality? Who knows?
* Completely re-designed site's front page
* Switched from Google Custom Search Engine to a custom search implementation, because I think I can provide better results, improve the user experience, and reduce search exit rate
* Delay-load Twitter widget to reduce page load time
* Use rel="canonical" on every page on the site, since Google is indexing the IP address of the site and showing it separately in SERPs
* Removed <priority> nodes from the sitemap, just in case Google knows better
* Fixed the "HTML suggestions" on GWT, such as short meta descriptions and duplicate meta descriptions
* Excluded directories via robots.txt like /word-of-the-day/, since the words of the day are already spidered elsewhere on the site, in case Google is giving me a duplicate content penalty of some kind
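For anyone attempting a similar restructure, the URL change and the rel="canonical" item in the list above could be sketched roughly like this in Python. This is only an illustration, not the site's actual code: the function names, the http scheme, and the idea of building the tag from a fixed hostname are my assumptions.

```python
from html import escape
from urllib.parse import quote

OLD_PREFIX = "/definition+of/"  # old directory structure described above
NEW_PREFIX = "/meaning-of/"     # new directory structure described above


def migrate_path(old_path):
    """Map /definition+of/word+goes+here to /meaning-of/word-goes-here.

    Returns None for paths outside the old directory (hypothetical helper).
    """
    if not old_path.startswith(OLD_PREFIX):
        return None
    term = old_path[len(OLD_PREFIX):]
    # Caveat: a term that already contains "-" becomes ambiguous after this.
    return NEW_PREFIX + term.replace("+", "-")


def canonical_link(host, path):
    """Build a rel="canonical" tag that always names the hostname,
    so the same page served from the bare IP address points back to it.
    The http scheme is an assumption."""
    url = "http://" + host + quote(path)
    return '<link rel="canonical" href="%s">' % escape(url, quote=True)


new_path = migrate_path("/definition+of/word+goes+here")
print(new_path)
print(canonical_link("onlineslangdictionary.com", new_path))
```

Old URLs would presumably also need a 301 to the result of migrate_path so existing links keep working.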
I've also made plenty more changes in the past month, but I haven't written up a list yet.
Since the Panda update, I've seen no improvement in traffic from Google to my site.
Hi Walter - Thanks for posting your site and describing the changes you've made as well. It's interesting that you've focused on spelling/grammar errors (despite it being a slang site), and still haven't noticed any improvements.
We have a lot of members from around the world, who may not have English as their first language. We've thought about correcting some spelling/grammar programmatically, but since we have a UGC site, and each review is tied to a real person, it seems disingenuous to rewrite someone else's words.
"Grammar, punctuation and style are part of how we decide if a review is credible. For Zappos to artificially inflate the credibility of reviews by changing those qualities is a form of fraud IMO."
I don't know if I'd call it fraud, but it certainly artificially inflates the credibility of reviews, and that's something (in addition to the rewriting someone else's thoughts part) we don't feel comfortable doing.
(For your site, since it's a dictionary, it seems like a better fix, and I'm surprised that it hasn't yielded any improvements.)
i did not hear from a single big travel site that won in the so called google panda update.
and to be honest, i understand it
the spammiest verticals are not only PPP (pills, porn, poker), it's PPPIT (pills, porn, poker, insurances, travel). i worked in poker, i worked in travel, i did insurances and yeah from an SEO perspective they are all very disgusting verticals.
and from what it looks like: the whole travel segment got a hit. so yeah, you can try to get out of "panda" but then you should realize that "panda" is nothing you can get out of. it's not a "penalty", it's a new set of rules. try what works, try what doesn't and then iterate. so simple.
Hope you don't mind if I ask you here, I can't find your email. I saw on your Twitter feed you recommended two great non-obvious resources for SEO: In the Plex and schema.org. Any other resources (sites/books/etc.) you can recommend?
I run a large site that was unexpectedly affected by Google's "Panda" update. There has been a lot of talk on the subject, but most of it is FUD, and I haven't seen many large sites lay everything out for discussion. This is the first post (of many planned ones) about our experience with Google's "Panda" update.
Hopefully it will generate some good discussion from those facing a similar situation, and help some other people out.
The site should 410 removed pages instead of 404ing... 404 just means "not found", without saying whether the page is gone for good, so Google will wait to remove it from their index until they have revisited the page a few more times over days/weeks.
A 410 on the other hand indicates that this page has been intentionally removed, and google tends to act quicker on those.
Alternatively, they could 301 the removed pages to the most relevant parent page if that would be a better user experience for those few who did navigate to them. (in my experience 301ed pages also get removed from the index faster than 404s)
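To make the 404 / 410 / 301 distinction concrete, here is a minimal Python sketch of how a server might pick a status for a URL it no longer serves. The function name, the two lookup tables, and the example paths are all hypothetical; how quickly Google acts on each status is the experience described above, not something the code shows.

```python
def status_for_missing(path, redirects, gone):
    """Return (status_code, location) for a path the site no longer serves.

    redirects: dict mapping a removed path to its most relevant parent page
    gone: set of paths that were removed on purpose
    """
    if path in redirects:
        # 301 Moved Permanently: send visitors and crawlers to the parent page.
        return 301, redirects[path]
    if path in gone:
        # 410 Gone: the page was intentionally removed and will not return.
        return 410, None
    # 404 Not Found: nothing known about this path either way.
    return 404, None


# Hypothetical example data.
redirects = {"/hotels/hawaii/old-photo-page": "/hotels/hawaii"}
gone = {"/hotels/hawaii/empty-review"}

print(status_for_missing("/hotels/hawaii/old-photo-page", redirects, gone))
print(status_for_missing("/hotels/hawaii/empty-review", redirects, gone))
print(status_for_missing("/no-such-page", redirects, gone))
```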
Thanks, that's helpful information as well. I didn't know about the 410. We used to 301 some pages, but ran into duplicate content issues where Google would continually spider old 301'd URLs (even when there were no longer links pointing to the original bad URL).
Google's intention with search, to return exactly the most relevant information for any query, seems a bit like striving for a grand unified theory of physics. It's a noble ambition, but there are going to be a lot of mistakes and missteps along the way; it's probably going to take an awfully long time to arrive there, if it ever happens; and when we do get there a lot of people are going to be upset by it. I really appreciate that you've stepped forward with both information about the effects of the changes and what you've done to remedy the situation. I'm keen to see your followup posts, and I will be keen to see what future changes Google implements and how they improve things for you, me, and everyone.
But as time has shown, holding out hope for Google to do anything specifically helpful has been a disappointment in so many instances I've lost count.
An observation based on mentions of "low quality" in Amit Singhal's blog post at http://googlewebmastercentral.blogspot.com/2011/05/more-guid... -- you mention that you have no-indexed thin content pages, but is it possible that if Google sees that the volume of "noindex" pages is high compared to the overall volume of pages on the site, that Google still views your site as overall of "low-quality" (relative to the other sites it indexes for the same keywords)?
Rather than noindex pages, I am thinking using robots.txt to prevent access to these pages might be better -- Google can't perhaps then tell what to make of these pages you've hidden via robots.txt. Just thinking...not really an expert on this.
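A rough Python sketch of the difference between the two approaches, using the standard library's urllib.robotparser. The /photos/ directory, the example paths, and the helper name are made up for illustration. One thing to keep in mind: a crawler that obeys a robots.txt block never fetches the page, so it never sees a noindex tag on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a directory of thin pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photos/
"""

# The noindex alternative: the page is still crawled, but asks to stay
# out of the index.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(path, agent="Googlebot"):
    """True if the rules above let the given user agent fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/photos/12345", "/hotels/hawaii"):
    print(path, "crawlable" if crawl_allowed(path) else "blocked by robots.txt")
```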
I think this is all nonsense. Google's work should be to separate good content from bad content. Now, they are inverting the relationship and saying that webmasters should be responsible for handing that information to them.
This is wrong on several levels. First, Google starts to dictate what is acceptable or not in their index, using tools like webmaster central and all the "semantic web" talk -- Why should I care?
Second, it creates the incentives and the opportunity for bad guys to do well. If you need to do all this overhaul of a web site, the only people willing to do the work will be the very same ones that created content farms in the first place. After all, they are the ones that make big bucks from Google, not the after-hours hobbyist who maintains a single web site.
Google is in a tough position. If they keep their algorithms secret, they are accused of not being transparent. If they reveal their algorithms, they get gamed by spammers. If they reveal only vague hints and guidelines, they are accused of manipulating content. And if they don't use ever more elaborate algorithms, they get overrun with spam.
By becoming de facto ruler of the internet, Google has put themselves in a position of outrageous power and responsibility. They are fighting a vicious war with spammers and content farms and the rest of the web is caught in the crossfire.
Personally, I agree with you that Google should not be telling people how to run their sites. They should never have said a word about how their algorithm works, or even what it's named, when it's updated, etc. They should tell people "just make great web sites, searching them is our problem".
I agree with that. But Google is weakening its position by creating these extra semantic levels that only benefit people that are making a lot of money in this game.
You probably didn't read what came before this sentence... The point is exactly that, with a good web search engine, content writers shouldn't have to do anything other than writing content in order to appear on the search results. Google is the one that should be actively looking for content and ranking it accordingly, instead of myself doing any work other than writing.
>You probably didn't read what came before this sentence
I did. I guess what I'm saying is: why does it matter to you if google's search algorithms aren't perfect and could operate better with a bit of metadata from the site? After all, you're free to opt out from playing along. If, like you suspect, it means that their algorithm is optimized for web spam, they'll sort it out eventually.
It just sounds like you're bitching that the cost of doing business has gone up in the meantime, and if only Google could be perfect you wouldn't have to worry about it. But this is what you get for relying on the behavior of someone that owes you nothing.
What still confuses me is the duplicate content issue. Doesn't it make sense to make some information reachable in different ways?
For example, in a typical blog, the same text can be found via direct link, latest articles, categories... How does one make that go away - and should it be done?
This is great. I work for a blog network that got hit similarly hard by both Panda updates, and we've done a lot of the same stuff (removing and/or de-indexing short/no page view content, improving our site map and linking system), but scrapers continue to be our biggest issue, especially the ones we don't syndicate to. These can rank higher than us in Google as well, and we continue to try to find a way to flag them or submit removal requests without having to do each one manually.
This is a great write up though, it's good to see that others are trying the same things as we are.
I'd like to talk to you a bit about your site. Please send me an email, I don't see an obvious contact form on your blog....
johng a t forum foundry d o t com
Hi Jerry - I don't know anyone who works in search quality at Google, and they have said a few times that they aren't going to make any manual exceptions. I've filed a "reconsideration request" with Google, and only got a boilerplate response saying our site has "no manual penalties". Unfortunately, any spam report, reconsideration request, or communication via the Webmaster Tools interface seemingly goes to a black hole. I wouldn't even know who to contact or how to contact them.
I can understand why they can't reveal too much about their algorithm, but at the same time if they want to build a "healthier web ecosystem" I think they have to do a better job of communicating with legitimate webmasters (this is going to be the subject of a future post). There is so much uncertainty out there that I think even legitimate site owners are afraid to speak out for fear of being punished, when instead they should be working together with Google to help improve things. It's like the attitude is: if you're doing well, don't say anything or you might draw Google's scrutiny (especially because you're probably doing something shady), and if you are doing badly don't say anything either because Google will just punish you more (because they have not told you why you have been punished in the first place). So instead people are driven to anonymous postings on random forums where wild misinformation spreads.
The intent of this post wasn't to focus on why our particular site wasn't ranking well anymore in Google, but to try to see if there were other website owners out there who feel comfortable coming out with their stories, and to share anything they may have learned. Again, there is so little real information coming from Google that most people I know have just been grasping at straws, so it would be great to hear other people's stories. I recently read about another seemingly legit site with lots of good content punished (http://www.google.com/support/forum/p/Webmasters/thread?tid=...), so there must be more of them out there.
They've admitted that there's an algorithm, and also that sites can be manually 'reviewed'... in layman's terms... yeah, they made exceptions and got called out hardcore on it with this last panda release, and at least they 'claim' that they're working on fixing it. search google for panda site slap.
you'll see lists of all the big ones that went down...