Showing posts with label Sitemaps. Show all posts

Monday, August 27, 2007

Google sitemaps - why isn't there an "official" tool to create one?

I'm not a big fan of sitemaps. If I've built my linking structure correctly, the Googlebot should be able to find and index all my pages in a natural way. However, because I've been trained to jump through the Goops (Google hoops), I build a sitemap for many of my sites. I use an outside tool that sometimes does a good job and sometimes does not. It's a crapshoot that often ends with me doing more work than I should, for a process that I'm not convinced provides any useful benefit to me.

Why doesn't Google have its own in-house "official" sitemap builder? An application written by Google engineers who understand the specification behind the sitemap and know the internal workings of the Googlebot that uses the sitemap to traverse websites. This tool could be built into the webmaster tools and, once a site has been verified, create the sitemap with a single click. It could also include an HTML validator that shows errors upfront, before they hit the sitemap.
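To be fair, the file format itself is tiny. Here is a rough sketch of the bare minimum such a builder would have to output, assuming the public sitemaps.org 0.9 format with nothing but a `loc` entry per page. The `build_sitemap` function and the example.com URLs are made up for illustration; this is not how Google or any existing tool does it, and the hard part (crawling the site to find the pages) is left out entirely.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing each URL in `urls`.

    Hypothetical helper for illustration: it only writes <loc> entries and
    skips the optional <lastmod>, <changefreq>, and <priority> elements.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# A made-up six-page site; the page names are placeholders.
pages = ["http://www.example.com/"] + [
    f"http://www.example.com/page{i}.html" for i in range(2, 7)
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Saving that output as sitemap.xml in the site root would give you something to submit through the webmaster tools. Everything that makes the outside tools hit-or-miss happens before this step, when they try to discover the pages.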

So why hasn't Google done this yet? It can't be a matter of dollars and cents - they have more money than God. Maybe it's a bean-counter thing with the accountants asking "how does this make us any money?". Whatever the reason, it's time for Google to step up and either build their own sitemap creation tool or admit that the sitemaps don't really mean all that much.

Thursday, August 23, 2007

Are sitemaps really worth the time? The Google Sitemap experiment

I've long wondered exactly how effective a sitemap is in getting a site fully indexed. Like the other Goops (Google hoops) I jump through, I do it without thinking. 90% of the sites that I build are tracked through the Google webmaster tools and have been verified and sitemapped. I've never known exactly why I create a sitemap, other than that Google claims it makes a site more "Google friendly".

According to the Webmaster Guidelines:

"Google uses your Sitemap to learn about the structure of your site and to increase our coverage of your webpages."

I haven't been able to quantify Google's claim, so I put it to a test. Here is how I set it up:

  • I created three sites with unique content
  • Each of the three sites had 6 pages
  • The navigation structure was identical on each page - I used a "link bar" at the top and bottom of each page that linked to all pages in the website.
  • The three sites had different topics. I originally wanted to create three identical sites with different URLs, but the chance that two of the sites would be seen as duplicate (and penalized) held me back. Instead, I went for three separate but equally innocuous topics about mundane tasks.
  • Each site had 1 image per page and between 200 and 300 words
I set up the three sites through my Google webmaster tools in three different ways:
  • The first site (Site A) was entered into the webmaster tools, verified, and had a complete sitemap submitted describing its structure
  • The second site (Site B) was entered into the webmaster tools and verified, but I refrained from submitting a sitemap
  • The third site (Site C) wasn't even entered into the webmaster tools. Google had to find this third website naturally
To give each site a chance to be found, I created three footer links on one of my higher ranking sites. The links were all on the same page and used "click here" as the anchor text (so that Google didn't try to weigh a site's value higher or lower because of fouled up linking text). I checked each site's indexing by using Google's site: operator in the search engine. I checked at least once per hour, except when I slept. I wanted to find out which site became fully indexed the fastest.

The results of my research were surprising to say the least.
  • The first site to be fully indexed by Google was Site B - the site listed in the webmaster tools and verified but not sitemapped. It took about 24 hours to be fully indexed.
  • The second site to be fully indexed was the site without any listing in the webmaster tools (Site C). Google found it through a natural link and indexed it completely. Oddly, this was the last site to have its first page indexed, but all of its pages were indexed at the same time. Site C took almost 6 days to become fully indexed.
  • The last site to become fully indexed was Site A - the website loaded into the webmaster tools, verified, and sitemapped. It was also the first site added to the tools and verified. It only took several hours (less than 8, but I can't be sure because I was sleeping) for the home page to get indexed, but more than 1 week for the entire site to be indexed.
I know that my little experiment was hardly scientific, but it's still surprising how it turned out. Does Google give more credence to sites it finds naturally? (I can't prove it, but I suspect it does.) Does the sitemap help speed up the indexing process? (Probably not.) I will continue to create sitemaps for my websites, but it would be nice to have more information about exactly why Google puts so much emphasis on creating sitemaps (often a time-consuming process with large sites).