What is a Sitemap?
Sitemaps are an easy way for webmasters to tell search engines about all the pages on their websites that are available for crawling and indexing. A sitemap lays out the site's pages, typically as a tree that reflects the page hierarchy and the site's architecture and navigation. Search engine crawlers usually discover new pages by following links within the site and from other sites; a sitemap gives them this information directly.
Sitemaps come in two types: HTML and XML. An HTML sitemap is a simple HTML file containing links to the individual pages of the website.
XML Sitemap
In its simplest form, an XML Sitemap is an XML file that lists the URLs of a site along with additional metadata about each URL: the date it was last modified, how often it changes, and its importance relative to the other URLs of the site. Search engine crawlers use this metadata as extra hints.
The Sitemap protocol format consists of XML tags. The file must be UTF-8 encoded, and data values in the Sitemap must be entity escaped. The XML Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag
- Specify the namespace (protocol standard) within the <urlset> tag
- Include a <url> entry for each URL, as a parent XML tag
- Include a <loc> child entry for each <url> parent tag
Another important point to remember is that all URLs in a Sitemap must be from a single host, such as www.example.com.
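The requirements above can be sketched in Python with the standard library. This is a minimal illustration, not a reference implementation: the function name build_sitemap and the dictionary layout of each entry are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries, host):
    """Return Sitemap XML as bytes.

    entries is a list of dicts with a 'loc' key and optional
    'lastmod', 'changefreq' and 'priority' keys (hypothetical layout).
    """
    # The namespace is declared on the enclosing <urlset> tag.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        # All URLs in one Sitemap must come from a single host.
        if urlsplit(entry["loc"]).netloc != host:
            raise ValueError(f"{entry['loc']} is not on host {host}")
        url = ET.SubElement(urlset, "url")              # parent tag per URL
        ET.SubElement(url, "loc").text = entry["loc"]   # required child tag
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree entity-escapes text content (& becomes &amp;) and
    # writes the XML declaration with the UTF-8 encoding.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    pages = [
        {"loc": "http://www.example.com/", "lastmod": "2008-05-21",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/page1.html", "changefreq": "weekly"},
    ]
    print(build_sitemap(pages, "www.example.com").decode("utf-8"))
```

The xml_declaration argument of ET.tostring requires Python 3.8 or later.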
A Sample XML Sitemap
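<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-05-21</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/page1.html</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/page2.html</loc>
    <lastmod>2008-05-21</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/page3.html</loc>
    <lastmod>2008-04-20T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://www.example.com/page4.html</loc>
    <lastmod>2008-03-21</lastmod>
  </url>
</urlset>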
You can also provide multiple Sitemap files. Each Sitemap file must contain no more than 50,000 URLs and must be no larger than 10MB, so to list more than 50,000 URLs you must create multiple Sitemap files.
When you create multiple Sitemaps, list each Sitemap file in a Sitemap index file.
The Sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag
- Include a <sitemap> entry for each Sitemap, as a parent XML tag
- Include a <loc> child entry for each <sitemap> parent tag
The optional <lastmod> tag is also available for Sitemap index files.
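As a rough sketch of the splitting and indexing described above, the following Python breaks a URL list into chunks of at most 50,000 and builds an index file that points at one Sitemap per chunk. The function names and the sitemap1.xml style file names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit

def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into lists of at most `size` URLs each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return a Sitemap index (bytes) listing each Sitemap file's URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")        # parent tag
        ET.SubElement(sitemap, "loc").text = sitemap_url  # required child
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod  # optional
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    urls = [f"http://www.example.com/page{i}.html" for i in range(120_001)]
    chunks = chunk_urls(urls)
    # Hypothetical file naming: sitemap1.xml, sitemap2.xml, ...
    names = [f"http://www.example.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(len(chunks), "Sitemap files needed")
    print(build_sitemap_index(names, lastmod="2008-05-21").decode("utf-8"))
```

The sketch only enforces the URL count; a real generator would also need to check each file against the size limit.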
An RSS (Really Simple Syndication) 2.0 feed or an Atom 0.3 or 1.0 feed can also be provided, which is generally done when the site already has a syndication feed. Make sure that the feed is located in the highest-level directory. Search engines extract the following information from the feed:
- <link> field - indicates the URL
- modified date field (the <pubDate> field for RSS feeds; for Atom feeds, <modified> in Atom 0.3 or <updated> in Atom 1.0) - indicates when each URL was last modified
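To make the two fields concrete, here is a small Python sketch that pulls the same information out of an RSS 2.0 feed. It illustrates the idea only and does not describe how any particular search engine reads feeds; the function name rss_urls and the sample feed are made up for this example.

```python
import xml.etree.ElementTree as ET

def rss_urls(feed_xml):
    """Return (url, pubDate) pairs for every <item> in an RSS 2.0 feed.

    pubDate is None when an item has no <pubDate> field.
    """
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        link = item.findtext("link")         # the URL
        if link:
            pub_date = item.findtext("pubDate")  # last modified date
            results.append((link.strip(), pub_date))
    return results

if __name__ == "__main__":
    sample_feed = """<rss version="2.0">
      <channel>
        <title>Example</title>
        <link>http://www.example.com/</link>
        <item>
          <title>Page 1</title>
          <link>http://www.example.com/page1.html</link>
          <pubDate>Wed, 21 May 2008 00:00:00 GMT</pubDate>
        </item>
      </channel>
    </rss>"""
    for url, modified in rss_urls(sample_feed):
        print(url, modified)
```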
Text File
You can provide a simple text file that contains one URL per line. Follow these guidelines when creating the text file:
- The text file must have one URL per line. The URLs cannot contain embedded new lines
- You must fully specify URLs, including the http
- Each text file can contain a maximum of 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If your site includes more than 50,000 URLs, you can separate the list into multiple text files and add each one separately
- The text file must use UTF-8 encoding. You can specify this when you save the file (for instance, in Notepad, this is listed in the Encoding menu of the Save As dialog box)
- The text file should contain no information other than the list of URLs
- The text file should contain no header or footer information
- If you would like, you may compress your Sitemap text file using gzip to reduce your bandwidth requirement
- You should upload the text file to the highest-level directory you want search engines to crawl and make sure that you don't list URLs in the text file that are located in a higher-level directory
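The guidelines above could be checked in code along these lines. This is a minimal Python sketch; the function name build_text_sitemap and the output file name are assumptions made for this example.

```python
import gzip
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 10_485_760  # 10MB

def build_text_sitemap(urls):
    """Return the UTF-8 bytes of a text Sitemap with one URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError("too many URLs for one text file")
    for url in urls:
        # URLs cannot contain embedded new lines.
        if "\n" in url or "\r" in url:
            raise ValueError(f"embedded new line in {url!r}")
        # URLs must be fully specified, including the scheme.
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a fully specified URL: {url!r}")
    # No header, footer, or anything else besides the URLs.
    data = ("\n".join(urls) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("text file exceeds the size limit")
    return data

if __name__ == "__main__":
    data = build_text_sitemap([
        "http://www.example.com/",
        "http://www.example.com/page1.html",
    ])
    # Optional gzip compression to reduce bandwidth.
    with open("sitemap.txt.gz", "wb") as handle:
        handle.write(gzip.compress(data))
```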
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. An XML Sitemap located at http://www.example.com/dir1/sitemap.xml can include any URL starting with http://www.example.com/dir1/ but cannot include URLs starting with http://www.example.com/dir2/.
So, to cover all the pages of the website, place the Sitemap in the root directory.
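The location rule amounts to a prefix check, which the following Python sketch illustrates. The function names allowed_prefix and in_scope are hypothetical.

```python
from urllib.parse import urlsplit

def allowed_prefix(sitemap_url):
    """Return the URL prefix that a Sitemap at sitemap_url may cover."""
    parts = urlsplit(sitemap_url)
    # Drop the file name and keep the directory, with a trailing slash.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"

def in_scope(sitemap_url, page_url):
    """True if page_url may be listed in the Sitemap at sitemap_url."""
    return page_url.startswith(allowed_prefix(sitemap_url))

if __name__ == "__main__":
    sitemap = "http://www.example.com/dir1/sitemap.xml"
    print(allowed_prefix(sitemap))
    print(in_scope(sitemap, "http://www.example.com/dir1/page.html"))
    print(in_scope(sitemap, "http://www.example.com/dir2/page.html"))
```

A Sitemap at http://www.example.com/sitemap.xml yields the prefix http://www.example.com/, which is why the root directory covers the whole site.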
Informing the Search Engines
After creating the Sitemap and placing it on the web server, you must inform the search engines that support this protocol of its location. This can be done by:
- Submitting it to the search engine via their submission interface
- Specifying the location in the robots.txt file
- Sending an HTTP request
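For the robots.txt option, the location is given on a line of the form "Sitemap: <full URL>". The Python sketch below appends such a line to existing rules and then reads it back with the standard library's robots.txt parser as a sanity check. The function names and the sample rules are assumptions made for this example, and RobotFileParser.site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

def robots_with_sitemap(existing_rules, sitemap_url):
    """Return robots.txt content with a Sitemap line appended."""
    body = existing_rules.rstrip("\n")
    directive = f"Sitemap: {sitemap_url}"
    # Keep existing rules untouched; separate the new line with a blank line.
    return f"{body}\n\n{directive}\n" if body else f"{directive}\n"

def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if absent

if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    content = robots_with_sitemap(rules, "http://www.example.com/sitemap.xml")
    print(content)
    print(declared_sitemaps(content))
```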