It appears that a spammer has found out how to infiltrate the Google index without being caught. Here's what is happening in a nutshell:
- Some searches (very specific phrases, and I won't list any of them right now - Google knows which they are) return results with a large number of .cn (Chinese) sites.
- The .cn sites often carry content scraped from legitimate U.S. websites.
- The legitimate sites are being ranked below the scam .cn sites for these competitive keywords.
- When a user clicks on one of the .cn sites in the result set, the user is redirected to an entirely different page that attempts to install one or more pieces of malware on the user's computer. If the user is not protected, the computer becomes infected. I don't know the specifics of the infection, as I AM well protected.
- The .cn sites don't appear to be hosted ANYWHERE. They are simply redirected domain names. How they got ranked in Google in such a short period of time for fairly competitive keywords is a mystery. Google's index even shows legitimate content for the .cn sites.
- It appears that the fake sites are redirecting the Googlebot to a location where content can be indexed, while recognizing normal users and redirecting them to a site that hosts the malware mentioned earlier. Showing the search engine different content than users see (cloaking) is an obvious violation of Google's guidelines, but the spammers have found a way to hide it from the Googlebot.
- These sites number in the millions, cover many different keywords and phrases, and appear to be generated automatically. Because of privacy laws, it's hard to track down who owns the domain names. Google has the power to do so, but so far there has been exactly zero information from Google about the problem, and even many SEO experts and webmasters are not picking up on it.
So what does all this mean? First, don't click on a .cn domain returned by Google.com. If you need to search for a Chinese site, use Google.cn instead of Google.com. Second, watch your own SERPs and see whether you are suddenly dropping below sites with a .cn TLD. If you find that happening, report it here. Third, don't panic. Google is remaining mum on this for a number of reasons. If the public stopped trusting Google, it could cause major upheavals in the search engine business. If the problem were just spam, the public wouldn't even notice; however, since malware is involved, this could hit the major media with a giant bang and cause a panic. That could affect traffic to some sites in a major way, especially those specifically optimized for the Google search engine.
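For the second point, if you record the result URLs for one of your target phrases in order, a few lines of code can tell you which .cn hosts sit above your own site. This is a minimal sketch: it does not query Google, the input list is assumed to be collected by you, and the function name `cn_sites_above` is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlparse


def cn_sites_above(results: List[str], my_domain: str) -> Optional[List[str]]:
    """Return .cn hosts ranked above my_domain, or None if my_domain is absent.

    results: result URLs in ranking order (collected by hand; hypothetical input).
    my_domain: your bare domain, e.g. "example.com"; subdomains also count.
    """
    my_domain = my_domain.lower()
    above: List[str] = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            return above  # reached our own listing
        if host.endswith(".cn"):
            above.append(host)
    return None  # our site is not in the list at all
```

Returning `None` when your site does not appear at all keeps "nothing above me" (an empty list) distinct from "I've dropped out of the results entirely".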
A Major Infrastructure Problem?
If a smart spammer has really found a way to game the Google search results with spoofed or cloaked sites, and Google still doesn't have a fix, this could point to a major issue with the underlying infrastructure of the entire Google operation. I've seen hints that a significant infrastructure change is taking place; is this spam issue the reason? Could it be that Google was actually hacked, rather than someone spamming the index? If so, webmasters may be waiting a long time for the expected PageRank update while Google fixes the leaks.
Time to Worry?
This is the first time I've ever been worried that Google's own index has been hacked. The obvious and blatant circumvention of a guideline that the Googlebot normally enforces quickly is worrisome; a normal website pulling this would be banned almost instantly. Even scarier is that none of the sites have real content, and they don't appear to be hosted anywhere. How did millions of sites get indexed if they don't exist?
The volatility of the SERPs lately suggests that the Google algorithm is being updated and tested, and often. Together with the fact that Google's usual quarterly Toolbar PageRank update didn't occur at the beginning of August, this points to Google making some major changes. It's not a giant leap of logic to assume that Google may be trying to figure out a way to stop the spamming of its index, and is looking for some sort of heuristic to identify these sites without hurting legitimate U.S. and European websites. The length of time it's taking is scary, but I'd rather they fix the issue than put a Band-Aid on the problem (Microsoft, are you paying attention?) and hope it goes away.
If anyone has any other observations on this problem, post them here.