Wednesday, May 21, 2008

XML Sitemaps

A few days back, I published a post on 10 basic rules of thumb for SEO success. I got many queries about the 7th point, i.e. the XML sitemap, so I am publishing this post for those who want to learn more about XML sitemaps.

What is a Sitemap?

Sitemaps are an easy way for webmasters to inform search engines about the pages on their websites that are available for crawling and indexing. A sitemap is essentially a tree view of the website, showing the hierarchy of the pages and the overall structure of the site's architecture and navigation. Search engine crawlers usually discover new pages by following links within the site and from other sites; a sitemap supplies this list of pages to the crawlers directly, so pages that are hard to reach through links can still be found.

Sitemaps come in two main types: HTML and XML. HTML sitemaps are simple HTML pages containing links to the individual pages of the website.
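
As a rough illustration, the Python sketch below builds such an HTML sitemap from a list of (URL, title) pairs. The function name html_sitemap and the example.com addresses are placeholders for this post, not part of any standard.

```python
from html import escape

def html_sitemap(pages):
    """Build a minimal HTML sitemap page: one <li> link per (url, title) pair."""
    items = "\n".join(
        # escape() turns & < > " ' into HTML entities so URLs and titles stay valid
        f'    <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in pages
    )
    return f"<html>\n<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>"

# Hypothetical pages on a placeholder domain
print(html_sitemap([
    ("http://www.example.com/", "Home"),
    ("http://www.example.com/about.html", "About us"),
]))
```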

XML Sitemap

In its simplest form, an XML Sitemap is an XML file that lists the URLs of a site along with optional metadata about each URL: when it was last modified, how often it usually changes, and how important it is relative to the other URLs on the site. This metadata gives search engine crawlers additional information about the site.

The Sitemap protocol format consists of XML tags. The file must be UTF-8 encoded, and all data values in it must be entity-escaped. The XML Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag

  • Specify the namespace (protocol standard) within the <urlset> tag

  • Include a <url> entry for each URL, as a parent XML tag

  • Include a <loc> child entry for each <url> parent tag

All other tags, such as <lastmod>, <changefreq> and <priority>, are optional, and support for these optional tags may vary among search engines.
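
Because data values must be entity-escaped, a URL containing an ampersand cannot be pasted into the file as-is. The sketch below uses Python's standard xml.sax.saxutils.escape, which handles &, < and >, and passes the two quote entities explicitly. The helper name escape_sitemap_value is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() already converts &, < and >; the quote characters are added here
# so that all five characters the protocol lists are entity-escaped.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_sitemap_value(value: str) -> str:
    """Entity-escape a data value (e.g. a URL) before placing it in a Sitemap."""
    return escape(value, QUOTE_ENTITIES)

print(escape_sitemap_value("http://www.example.com/view.php?id=7&lang=en"))
# → http://www.example.com/view.php?id=7&amp;lang=en
```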

Another very important point to remember is that all URLs in a Sitemap must come from a single host.
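
A quick way to check the single-host rule before publishing is to compare the host part of every URL. This is a minimal sketch using Python's urllib.parse; is_single_host is a hypothetical helper, and it compares host names only (ignoring ports).

```python
from urllib.parse import urlsplit

def hosts_in(urls):
    """Collect the distinct host names (lower-cased by urlsplit) used by the URLs."""
    return {urlsplit(url).hostname for url in urls}

def is_single_host(urls):
    """True when every URL in the list points at the same host."""
    return len(hosts_in(urls)) == 1

print(is_single_host(["http://www.example.com/a.html", "http://store.example.com/"]))
# → False
```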

A Sample XML Sitemap

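A minimal sitemap with only the required tags can be generated with Python's standard xml.etree.ElementTree module. The sketch below is an illustration under assumptions: minimal_sitemap is a hypothetical function name and the example.com URLs are placeholders. It emits the <urlset> root with the protocol namespace, one <url> per page and its <loc> child, and ElementTree takes care of escaping &, < and >.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def minimal_sitemap(urls):
    """Build a UTF-8 Sitemap containing only the required tags."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)   # namespace declared on <urlset>
    for url in urls:
        entry = ET.SubElement(urlset, "url")          # one <url> parent per page
        ET.SubElement(entry, "loc").text = url        # required <loc> child
    # xml_declaration=True (Python 3.8+) prepends <?xml ... encoding='UTF-8'?>
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True).decode("utf-8")

print(minimal_sitemap(["http://www.example.com/", "http://www.example.com/about.html"]))
```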

A Sample XML Sitemap with All Optional Tags

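The same approach extends to the optional tags. In this sketch (sitemap_with_metadata and its dictionary keys are hypothetical names), <lastmod> is written as a W3C date such as 2008-05-21, <changefreq> is checked against the protocol's allowed values, and <priority> is checked to lie between 0.0 and 1.0.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def sitemap_with_metadata(entries):
    """entries: dicts with 'loc' plus optional 'lastmod' (a date), 'changefreq', 'priority'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for e in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = e["loc"]
        if "lastmod" in e:
            ET.SubElement(url, "lastmod").text = e["lastmod"].isoformat()  # e.g. 2008-05-21
        if "changefreq" in e:
            if e["changefreq"] not in CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {e['changefreq']}")
            ET.SubElement(url, "changefreq").text = e["changefreq"]
        if "priority" in e:
            if not 0.0 <= e["priority"] <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = f"{e['priority']:.1f}"
    return ET.tostring(urlset, encoding="unicode")

print(sitemap_with_metadata([{"loc": "http://www.example.com/", "lastmod": date(2008, 5, 21),
                              "changefreq": "monthly", "priority": 0.8}]))
```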

Using Sitemap index files (to group multiple sitemap files)

You can also provide multiple Sitemap files. Each Sitemap file must contain no more than 50,000 URLs and must be no larger than 10MB, so to list more than 50,000 URLs you must create multiple Sitemap files.
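
Splitting a long URL list to respect these limits is simple arithmetic. This sketch (chunk_urls and fits_size_limit are hypothetical helpers) cuts the list into groups of at most 50,000 and checks the encoded size of a finished file against 10MB.

```python
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10MB = 10,485,760 bytes

def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into groups small enough for one Sitemap file each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def fits_size_limit(sitemap_text: str) -> bool:
    """True if the UTF-8 encoded Sitemap stays within the size limit."""
    return len(sitemap_text.encode("utf-8")) <= MAX_BYTES

urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
print([len(part) for part in chunk_urls(urls)])
# → [50000, 50000, 20000]
```

Each group can then be written to its own Sitemap file and listed in a Sitemap index file.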

If you create multiple Sitemaps, list each Sitemap file in a Sitemap index file.

The Sitemap index file must:

  • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag

  • Include a <sitemap> entry for each Sitemap as a parent XML tag

  • Include a <loc> child entry for each <sitemap> parent tag

The optional <lastmod> tag is also available for Sitemap index files.
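
A Sitemap index can be built the same way as a Sitemap. This is a sketch with a hypothetical sitemap_index function and placeholder file names: each Sitemap file gets a <sitemap> entry with its <loc>, plus <lastmod> only when a date is known.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def sitemap_index(sitemaps):
    """sitemaps: list of (location, last_modified_date_or_None) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")            # one <sitemap> per file
        ET.SubElement(entry, "loc").text = loc             # required <loc> child
        if lastmod is not None:
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()  # optional
    return ET.tostring(index, encoding="unicode")

print(sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", date(2008, 5, 21)),
    ("http://www.example.com/sitemap2.xml.gz", None),
]))
```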

Syndication Feeds

If the site already has a syndication feed, an RSS (Really Simple Syndication) 2.0, Atom 0.3 or Atom 1.0 feed can also be provided as the Sitemap. Make sure that the feed is located in the highest-level directory. Search engines extract the information from the feed as follows:
  • <link> field - indicates the URL
  • modified date field (the <pubDate> field for RSS feeds and the <updated> date for Atom feeds) - indicates when each URL was last modified
Use of the modified date field is optional.
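
To see what a search engine can read from an RSS 2.0 feed, the sketch below pulls each item's <link> and optional <pubDate> with Python's standard library. It covers RSS only (not Atom), urls_from_rss is a hypothetical name, and the feed is a made-up placeholder.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def urls_from_rss(rss_xml: str):
    """Return (url, last_modified_or_None) pairs from an RSS 2.0 feed's <item> elements."""
    root = ET.fromstring(rss_xml)
    pairs = []
    for item in root.iter("item"):
        link = item.findtext("link")
        pub = item.findtext("pubDate")  # optional; RFC 822 style date in RSS
        pairs.append((link, parsedate_to_datetime(pub) if pub else None))
    return pairs

feed = """<rss version="2.0"><channel><title>Example</title>
<item><link>http://www.example.com/post1.html</link><pubDate>Wed, 21 May 2008 10:00:00 GMT</pubDate></item>
<item><link>http://www.example.com/post2.html</link></item>
</channel></rss>"""
for url, modified in urls_from_rss(feed):
    print(url, modified)
```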

Text File

You can also provide a simple text file that contains one URL per line. Follow these guidelines when creating the text file:

  • The text file must have one URL per line. The URLs cannot contain embedded newlines

  • You must fully specify URLs, including the http:// prefix

  • Each text file can contain a maximum of 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If your site includes more than 50,000 URLs, you can separate the list into multiple text files and add each one separately

  • The text file must use UTF-8 encoding. You can specify this when you save the file (for instance, in Notepad, this is listed in the Encoding menu of the Save As dialog box)

  • The text file should contain no information other than the list of URLs

  • The text file should contain no header or footer information

  • If you would like, you may compress your Sitemap text file using gzip to reduce your bandwidth requirement

  • You should upload the text file to the highest-level directory you want search engines to crawl and make sure that you don't list URLs in the text file that are located in a higher-level directory
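
Most of these rules can be checked automatically before upload. This sketch (text_sitemap_bytes is a hypothetical helper) rejects URLs that are not fully specified or contain newlines, enforces the 50,000-URL and 10MB limits, encodes the list as UTF-8 with no header or footer, and optionally gzips it. It also accepts https:// URLs, which is an assumption beyond the http prefix mentioned above.

```python
import gzip

MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes

def text_sitemap_bytes(urls, compress=False):
    """Return the contents of a text Sitemap: one URL per line, UTF-8, nothing else."""
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must be fully specified: {url}")
        if "\n" in url or "\r" in url:
            raise ValueError(f"URL contains an embedded newline: {url!r}")
    if len(urls) > MAX_URLS:
        raise ValueError("more than 50,000 URLs; split them into several files")
    data = ("\n".join(urls) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("file would exceed 10MB")
    return gzip.compress(data) if compress else data  # gzip is optional

print(text_sitemap_bytes(["http://www.example.com/", "http://www.example.com/about.html"]).decode("utf-8"))
```

The returned bytes can then be saved, for example with pathlib's Path.write_bytes, to a file in the highest-level directory you want crawled.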

Location of a Sitemap File

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. An XML Sitemap located in a particular directory can include any URLs in that directory or its subdirectories, but not URLs outside that directory (for example, in a parent or sibling directory).

So, to include all the pages of the website, the Sitemap should be placed in the root directory.
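
The location rule can be expressed as a prefix check on the URL path. In this sketch (in_sitemap_scope is a hypothetical helper, and the /catalog/ and /images/ paths are placeholders), a page is in scope when it shares the Sitemap's protocol and host and its path starts with the directory that holds the Sitemap.

```python
from urllib.parse import urlsplit

def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url may be listed in the Sitemap located at sitemap_url."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    # Directory holding the Sitemap, e.g. "/catalog/" for "/catalog/sitemap.xml"
    base_dir = s.path[: s.path.rfind("/") + 1]
    same_site = (s.scheme, s.hostname) == (p.scheme, p.hostname)
    return same_site and (p.path or "/").startswith(base_dir)

print(in_sitemap_scope("http://www.example.com/catalog/sitemap.xml",
                       "http://www.example.com/images/logo.gif"))
# → False
```

A Sitemap at the root (http://www.example.com/sitemap.xml) gives a base directory of "/", which is why it can cover every page on the host.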

Informing the Search Engines

After you have created the Sitemap and placed it on your web server, inform the search engines that support this protocol of its location. This can be done by:
  • Submitting it to the search engine via their submission interface

  • Specifying the location in the robots.txt file

  • Sending an HTTP request

The search engines can then retrieve the Sitemap and make the URLs available to their crawlers.
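
The robots.txt and HTTP request options both boil down to a single line of text. The sketch below builds the robots.txt "Sitemap:" directive and a ping URL of the form <searchengine_URL>/ping?sitemap=<URL-encoded Sitemap address>. The search engine address is a hypothetical placeholder, and since several search engines have since retired their ping endpoints, check each engine's current documentation before relying on that form.

```python
from urllib.parse import quote

def robots_txt_sitemap_line(sitemap_url: str) -> str:
    """The robots.txt directive that tells crawlers where the Sitemap lives."""
    return f"Sitemap: {sitemap_url}"

def ping_url(search_engine_base: str, sitemap_url: str) -> str:
    """Build a ping request URL; the Sitemap address is fully URL-encoded."""
    return f"{search_engine_base.rstrip('/')}/ping?sitemap={quote(sitemap_url, safe='')}"

print(robots_txt_sitemap_line("http://www.example.com/sitemap.xml"))
# → Sitemap: http://www.example.com/sitemap.xml
print(ping_url("http://searchengine.example", "http://www.example.com/sitemap.xml"))
# → http://searchengine.example/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```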

E Marketing Strategies Copyright © 2009 Blogger Template Designed by Bie Blogger Template