Sitemap


A sitemap is a file in which you list the web pages (URLs) of your site, and which you then submit to search engines. Search engine web crawlers such as Googlebot read this file to crawl and index the listed URLs. A sitemap can also provide valuable metadata associated with the pages you list in it:

1. When the page was last updated
2. How often the page is changed
3. The importance of the page relative to other URLs.

You can also use a sitemap to provide Google with metadata about specific types of content on your pages, including video, image, and mobile content.

Why do we need a sitemap?

If your site’s pages are properly linked, search engine web crawlers can discover most of your pages. Even so, a sitemap can improve the crawling of your site, particularly if your site meets one of the following criteria:

1. The site has a large archive of pages that are isolated or not well linked to each other.
2. The site is large and has a great number of pages.
3. The site is new and has few external links to it.

 

General Sitemap Guidelines

1. Use consistent, fully-qualified URLs. Google will crawl your URLs exactly as listed.
2. Don’t include session IDs in the URLs you list in your sitemap. This reduces duplicate crawling of those URLs.
3. Point out translated versions of a URL to Google for crawling and indexing by listing the canonical URLs for each language in your sitemap file and by using hreflang annotations.
4. Sitemap files must be UTF-8 encoded.
5. Break up large sitemaps into smaller sitemaps to prevent your server from being overloaded.
6. Number of URLs in a sitemap: a sitemap file can’t contain more than 50,000 URLs and must be no larger than 10 MB uncompressed.
7. Use a sitemap index file (sitemapindex.xml) to list all your sitemaps and submit this single file to Google rather than submitting individual sitemaps.
8. Non-alphanumeric and non-Latin characters. Your sitemap file must be UTF-8 encoded. A sitemap URL can contain only ASCII characters; it can’t contain special characters such as * and {}. If your sitemap URL has these characters, you’ll get an error when you try to add it.

Use the following entity escape codes for these characters in any data value, including URLs:

  • Ampersand (&): &amp;
  • Single Quote ('): &apos;
  • Double Quote ("): &quot;
  • Greater Than (>): &gt;
  • Less Than (<): &lt;

Here is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&): http://www.example.com/ümlat.html&q=name
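To make the example URL above safe for a sitemap, the non-ASCII character is percent-encoded and the ampersand is entity-escaped. The following is a minimal Python sketch using only the standard library; the function name escape_sitemap_url and the set of characters left unencoded are assumptions made for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def escape_sitemap_url(url: str) -> str:
    """Return a URL prepared for use inside a <loc> element (illustrative helper)."""
    # Percent-encode non-ASCII characters as UTF-8 bytes. The usual URL
    # delimiters and any existing % escapes are left untouched (assumed set).
    ascii_url = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # Entity-escape the XML special characters (&, <, > plus both quotes).
    return escape(ascii_url, {"'": "&apos;", '"': "&quot;"})


print(escape_sitemap_url("http://www.example.com/ümlat.html&q=name"))
# prints: http://www.example.com/%C3%BCmlat.html&amp;q=name
```

Percent-encoding is applied first so that the ampersands introduced by entity escaping are not themselves altered.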

Multiple Sitemaps

Suppose you have a number of sitemap files, such as sitemap1.xml, sitemap2.xml and sitemap3.xml. You can create a sitemap index file (sitemapindex.xml) and list all of these sitemap files in it. You then need to submit only the sitemap index file, and the search engine will pick up the other sitemap files automatically. A sitemap index file uses these XML tags:

1. sitemapindex: the parent tag surrounding the whole file
2. sitemap: the parent tag for each sitemap listed in the file
3. loc: the location of the child sitemap
4. lastmod: the date the sitemap file was last modified (optional)

The following sitemap index, in XML format, lists three sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.abc.com/sitemap1.xml</loc>
<lastmod>2015-10-01</lastmod>
</sitemap>
<sitemap>
<loc>http://www.abc.com/sitemap2.xml</loc>
<lastmod>2015-01-01</lastmod>
</sitemap>
<sitemap>
<loc>http://www.abc.com/sitemap3.xml</loc>
<lastmod>2015-01-01</lastmod>
</sitemap>
</sitemapindex>

Save all your sitemaps to the same location on your host server. You can submit up to 500 sitemap index files for each site.
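Splitting a long URL list and generating the index file can be automated. The sketch below is one possible approach in Python using the standard library; the helper names chunk_urls and build_sitemap_index, and the page URLs in the usage lines, are hypothetical, and the 50,000 limit is the one given in the guidelines above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # limit stated in the guidelines above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML (UTF-8 bytes) listing the given child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        if lastmod is not None:
            # lastmod is optional; the same date is used for every entry here.
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Usage: 120,000 made-up page URLs need three child sitemaps.
pages = [f"http://www.abc.com/page{i}.html" for i in range(120_000)]
chunks = chunk_urls(pages)
children = [f"http://www.abc.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
print(build_sitemap_index(children, lastmod="2015-10-01").decode("utf-8"))
```

Each chunk would then be written to its own sitemap file, with the index pointing at those files.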

Submit your Sitemap to Google

There are three ways to submit a sitemap to Google:

1. Submit it to Google using the Search Console Sitemaps tool.
2. Reference it in your robots.txt file by adding a line with the path to your sitemap, for example:
Sitemap: http://www.abc.com/sitemap.xml

3. Submit it via an HTTP request, for example:
http://www.google.com/ping?sitemap=http://www.abc.com/sitemap.xml
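To check that a robots.txt file declares the sitemap as intended, the Sitemap line can be read back with Python’s standard urllib.robotparser module (the site_maps() method needs Python 3.8 or later). This is a small sketch; the robots.txt content is a made-up example built around the Sitemap line shown above.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (assumed for illustration).
robots_txt = """\
User-agent: *
Allow: /
Sitemap: http://www.abc.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```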

Sitemaps XML format

The following example shows a sitemap that contains two URLs and uses all optional tags.
The lastmod, changefreq and priority tags are optional.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/abc.html</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
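A sitemap in this format can be read back with any XML parser, as long as the sitemap namespace is taken into account. The following Python sketch uses the standard xml.etree.ElementTree module; the function name read_sitemap and the shortened sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used in the search paths below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry; missing optional tags become None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            tag: url.findtext(f"sm:{tag}", namespaces=NS)
            for tag in ("loc", "lastmod", "changefreq", "priority")
        })
    return entries


for entry in read_sitemap(SITEMAP_XML):
    print(entry)
```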

XML Tag Definitions

The available XML tags are described below.

Tag (status): Description
<urlset> (required): Encapsulates the file and references the current protocol standard.
<url> (required): Parent tag for each URL entry. The remaining tags are children of this tag.
<loc> (required): URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it.
<lastmod> (optional): The date of last modification of the file. The date can use the format YYYY-MM-DD.
<changefreq> (optional): How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value “always” should be used to describe documents that change each time they are accessed. The value “never” should be used to describe archived URLs.

Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked “hourly” less frequently than that, and they may crawl pages marked “yearly” more frequently than that. Crawlers may periodically crawl pages marked “never” so that they can handle unexpected changes to those pages.

<priority> (optional): The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites; it only lets the search engines know which pages you deem most important for the crawlers.

The default priority of a page is 0.5.

Please note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine’s result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
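The rules above for changefreq and priority are simple enough to check before a sitemap is published. Below is a minimal Python sketch of such a check; check_url_entry is a hypothetical helper name and is not part of any sitemap library.

```python
# Valid changefreq values and the priority range, as listed above.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # used when a URL entry has no <priority> tag


def check_url_entry(changefreq=None, priority=None):
    """Return a list of problems found; an empty list means the values are valid."""
    problems = []
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"invalid changefreq: {changefreq!r}")
    if priority is not None:
        try:
            value = float(priority)
        except (TypeError, ValueError):
            problems.append(f"priority is not a number: {priority!r}")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append(f"priority out of range 0.0 to 1.0: {priority!r}")
    return problems


print(check_url_entry(changefreq="monthly", priority="0.8"))
print(check_url_entry(changefreq="sometimes", priority="1.5"))
```

Both tags are optional, so omitted values are treated as valid.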

Sitemap Formats

Google supports several sitemap formats, which are listed below.

All formats limit a single sitemap to 10MB (uncompressed) and 50,000 URLs. If you have a larger file or more URLs, you will have to break your list into multiple sitemaps. You can optionally create a sitemap index file (a file that points to a list of sitemaps) and submit that single index file to Google. You can submit multiple sitemaps and/or sitemap index files to Google.

  1. XML
  2. Text
    •    Encode the file in UTF-8.
    •    The file should contain nothing but a list of URLs, with one URL per line.
    •    The file name can be anything, but it must use the .txt extension (e.g. sitemap.txt).
    •    Each text file can contain a maximum of 50,000 URLs and must be no larger than 10MB.
  3. RSS, mRSS, and Atom
    • You can provide an RSS (Really Simple Syndication) 2.0 feed or an Atom 0.3 or 1.0 feed.
    • You can submit the feed’s URL as a sitemap.
    • <link rel="alternate" type="application/rss+xml" title="ABC Feed" href="http://www.abc.com/feed/">
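The text format rules listed above (UTF-8, one URL per line, size limits) can be sketched as a short Python function. The name build_text_sitemap is hypothetical, and the limits are the ones stated in this article.

```python
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10MB, the limit stated in this article


def build_text_sitemap(urls):
    """Return UTF-8 bytes for a text sitemap with one URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError(f"too many URLs: {len(urls)} (maximum {MAX_URLS})")
    data = ("\n".join(urls) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError(f"file too large: {len(data)} bytes (maximum {MAX_BYTES})")
    return data


content = build_text_sitemap(["http://www.abc.com/", "http://www.abc.com/abc.html"])
print(content.decode("utf-8"), end="")
```

The returned bytes would be saved under a name ending in .txt, such as sitemap.txt.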
