One of the benefits of using a static website is complete control over how everything in your site works. With dynamic websites, automatically generated sitemaps don't always come out correctly. What if your site uses AMP? Will the site map point to the correct URL for the AMP version of each page?

Sitemaps are one of the most overlooked tools website developers have for helping a site's search result rankings, and creating one the right way helps search engines find and index your content. If your site is designed correctly, you can generate a complete site map with only a few lines of code. Better yet, when you add new content, it will automatically be added to your site map.

Why Create a Site Map

The income a website can generate is typically tied to how high it ranks in search results. Sitemaps are not very difficult to create, and with just a few minutes of your time, you can have yours built.

In case that isn't reason enough, here are a few things to think about that show the value of a site map.

Efficient Crawling

If you aren't already aware, Google has to index your site for it to be available in its search results. You may also know that your pages aren't indexed the moment you publish them. This is because of something called the crawl budget.

Your crawl budget basically depends on how stale the content of your site is and how high the demand for your content is. If you don't publish new data very often or if you don't expect your site to grow very large, then you may not need to worry about the crawl budget. Creating a site map won't increase the demand for your data - only good content and good marketing will do that - but it will let Google know when you have new or updated content.

Automatic Page Discovery

Your site map tells Google when you've published new content or updated existing content. When you write the code for your site map, you'll write it in a way that automatically updates the site map whenever new content is added.

Eliminate Duplicate Search Results

Google is continuously seeking to improve its search results for users. In the past, it did that by interpreting the data your site serves and adding that to its search results. This led to some problems, such as people stealing content and republishing it as their own. Duplication can also happen on your own site. For instance, if you run a blog that has category pages, it's likely that content from the main page is duplicated on the category pages.

With a site map, you can list the canonical URL for each page, pointing search engines to the original version. Combine that with the noindex tag on the duplicate pages, and you help Google determine which page is the most relevant. You help the engine by telling it which version of your content most users would want to see. This allows Google to be more helpful for its users.

Understanding an XML Site Map

There is specific guidance about how sitemaps should be constructed, which elements are required, and which ones you can include. XML is a very strict metalanguage that requires proper structure for it to be valid. This makes it easy to add the correct information that search engines will look for in your file.

All sitemaps start with the XML declaration, which specifies the XML version and which encoding is used. Sitemaps are required to use UTF-8 encoding to be valid.

<?xml version="1.0" encoding="utf-8"?>

You then need to include a urlset tag that declares the sitemap XML namespace. This tag brackets your list of URLs and tells search engines that your document conforms to the sitemaps XML namespace. You only need to include the xmlns:xhtml portion if you're using AMP links. If you're not using AMP, then you can leave this part off.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">

Next, you need to include the following information for each URL that you want to be indexed on your site. You'll put in the unique URL for each address, the last modification date - including the time if it's modified throughout the day - and how frequently your data is changed. Technically, you just need the URL, but I highly recommend including the other two. For lastmod, follow the pattern of YYYY-MM-DDThh:mm:ss followed by a time zone offset such as +00:00 - though as I mentioned before, the time component is optional, and a plain YYYY-MM-DD date is enough.

<url>
  <loc>https://buildstaticwebsites.com</loc>
  <lastmod>2019-02-05</lastmod>
  <changefreq>weekly</changefreq>
</url>
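
If you generate the lastmod value from code, a small helper keeps the format consistent. The following is only a sketch: the function name format_lastmod is made up for illustration, and it assumes that a datetime without time zone information should be treated as UTC.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Return a string suitable for a <lastmod> element (hypothetical helper)."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        # A time component needs a time zone offset; assume UTC if none is set.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return value.isoformat(timespec="seconds")
    # A plain date needs neither a time nor an offset.
    return value.isoformat()


print(format_lastmod(date(2019, 2, 5)))                 # 2019-02-05
print(format_lastmod(datetime(2019, 2, 5, 14, 30, 0)))  # 2019-02-05T14:30:00+00:00
```

Passing a date gives the short form used in the example above, while passing a datetime adds the time and offset.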

Optionally, you can include the priority for your page, a value between 0.0 and 1.0. This tells Google the significance of that page relative to the other pages on your site.

You can also include an XHTML link to the AMP version of a page. This is done with the following syntax:

<xhtml:link rel="amphtml" href="[url_to_amp_page]" />

Site Map Example

To help you get an idea of how it should look, here's a section from my site map for this website:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://buildstaticwebsites.com</loc>
    <lastmod>2019-02-04</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://buildstaticwebsites.com/guides/why-you-need-a-site-map-and-how-to-create-one-with-flask</loc>
    <xhtml:link rel="amphtml" href="https://buildstaticwebsites.com/guides/why-you-need-a-site-map-and-how-to-create-one-with-flask/amp" />
    <lastmod>2019-02-05</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
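
Because XML is strict, a single unclosed tag makes the whole file invalid, so it can be worth checking a site map before you publish it. Here is a minimal sketch using only Python's standard library. The function name list_sitemap_urls is hypothetical, and the check only covers well-formedness and the root element, not the full sitemap schema.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def list_sitemap_urls(xml_text):
    """Parse sitemap XML and return the text of every <loc> element.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


sample = """<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://buildstaticwebsites.com</loc>
    <lastmod>2019-02-04</lastmod>
  </url>
</urlset>"""

print(list_sitemap_urls(sample))
```

A missing closing bracket or a mismatched tag raises a ParseError instead of returning a list.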

How to Create a Site Map (using Flask and Flask-Flatpages)

I’m sure you’re thinking that maintaining this level of detail on your website will be tedious, not to mention error-prone. Luckily, there’s a better way. If you’re using a dynamic environment to develop your site, you can easily create this dynamically. I use Flask - a microframework for Python - to build my sites, and I leverage Flask-FlatPages along with Frozen-Flask to allow me to create a static version of my site.

Leveraging Flask-FlatPages

Flask-FlatPages is a library that creates a list of your articles. You typically pass this article collection through your routes to create your pages. You can leverage the same list to build your site map.

Your articles are files that you create using Markdown, which allows you to create metadata for your article.

Article Metadata

For this to work, there are a few pieces of metadata that you need to use in your markdown files for your articles. In the metadata for your article, make sure you set the published date, the date last modified, the page ID to uniquely identify each page, and the category - if you're using categories. When your article list is passed to the site-map route, it contains this metadata, which you can build into the sitemap.
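
A quick check can catch articles that are missing one of these fields before they reach the site map. This is only a sketch: the key names (published, modified, pageID, category) are the ones used by the route and template in this article, the helper name missing_metadata is made up, and the plain dictionaries stand in for the metadata that each article would carry.

```python
# Keys the site-map route and template in this article rely on.
REQUIRED_KEYS = ("published", "modified", "pageID", "category")


def missing_metadata(meta):
    """Return the required keys that are absent or empty in an article's metadata."""
    return [key for key in REQUIRED_KEYS if not meta.get(key)]


# Hypothetical metadata for one article, written as a plain dict.
article_meta = {
    "title": "Why You Need a Site Map",
    "published": "2019-02-05",
    "modified": "2019-02-05",
    "pageID": "why-you-need-a-site-map-and-how-to-create-one-with-flask",
    "category": "guides",
}

print(missing_metadata(article_meta))         # []
print(missing_metadata({"pageID": "draft"}))  # ['published', 'modified', 'category']
```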

Putting it All Together

First, you need to create a route that points to the site-map template. I like to sort my article collection by the published date so that my newest articles will be at the bottom of my site map.

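# Assumes `app` (Flask) and `flatpages` (FlatPages) are already set up, and render_template is imported from flask.
@app.route('/sitemap.xml')
def site_map():
  # Sort oldest first so the newest articles land at the bottom of the site map.
  articles = sorted(flatpages, key=lambda item: item.meta['published'], reverse=False)
  return render_template('sitemap_template.xml', articles=articles, base_url="https://buildstaticwebsites.com")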

Then create the sitemap_template.xml file that your route calls. This template creates an entry for your site's base URL, then loops through each article in your articles collection and creates a URL entry for that article. The AMP URL is automatically generated for every page you've created.

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>{{ base_url }}</loc>
    <lastmod>2019-02-04</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  {%- for article in articles %}
  <url>
    <loc>{{ base_url }}/{{ article.category }}/{{ article.pageID }}</loc>
    <xhtml:link rel="amphtml" href="{{ base_url }}/{{ article.category }}/{{ article.pageID }}/amp" />
    <lastmod>{{ article.modified }}</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  {%- endfor -%}
</urlset>
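
If you'd like to see the same structure without a template engine, the sketch below builds an equivalent document with Python's standard xml.etree.ElementTree module. This is an alternative for illustration, not the template approach used above: the function name build_sitemap is hypothetical, and the articles are assumed to be plain dictionaries with category, pageID, and modified keys. Note that tag names are case-sensitive in XML, so the closing tag has to match urlset exactly. Generating the tags from code avoids that kind of mismatch.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(base_url, articles):
    """Return sitemap XML (as bytes) for the base URL plus one entry per article."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")

    # Entry for the site's base URL.
    home = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(home, f"{{{SITEMAP_NS}}}loc").text = base_url

    for article in articles:
        page_url = f"{base_url}/{article['category']}/{article['pageID']}"
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # AMP version of the page, following the /amp URL pattern used above.
        ET.SubElement(url, f"{{{XHTML_NS}}}link", rel="amphtml", href=f"{page_url}/amp")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = str(article["modified"])
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = "weekly"

    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


articles = [{"category": "guides", "pageID": "example-article", "modified": "2019-02-05"}]
print(build_sitemap("https://buildstaticwebsites.com", articles).decode("utf-8"))
```

The xml_declaration argument to ET.tostring requires Python 3.8 or newer.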

The Result

After this code is integrated into your static site, whenever you create a new page, it will automatically be added to your site map. Your site map is created by calling the /sitemap.xml route, so when you call the freeze method of Frozen-Flask, it will create a new version of your sitemap.xml file.

Final Thoughts

Sitemaps are a handy tool to help search engines understand your site and make it as useful as possible to users. Creating a sitemap will help search engines understand which versions of your pages are canonical, and allow you to tell Google when you have new articles or have updated old ones. Set up your sitemap.xml file so that your framework creates it dynamically. This will make the file easier to manage and help eliminate the possibility of human error when re-typing data into it.

Related Questions

Are there any limitations to sitemaps? - Yes, each sitemap file must be no larger than 50 MB uncompressed and can contain at most 50,000 URLs. If you need more than that, you can create multiple files and use a sitemap index instead.
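
As a rough sketch of how splitting could work, the code below breaks a URL list into chunks of at most 50,000 and builds a sitemap index that points to one file per chunk. The function names and the sitemap-N.xml file naming scheme are assumptions made for illustration. Only the sitemapindex, sitemap, and loc elements come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

ET.register_namespace("", SITEMAP_NS)


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, file_count):
    """Return sitemap index XML (as bytes) listing sitemap-1.xml .. sitemap-N.xml."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for n in range(1, file_count + 1):
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        # Hypothetical naming scheme for the individual sitemap files.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


urls = [f"https://buildstaticwebsites.com/guides/page-{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
print(len(chunks))  # 3
print(build_sitemap_index("https://buildstaticwebsites.com", len(chunks)).decode("utf-8"))
```

Each chunk would then be written to its own sitemap file, and search engines would be pointed at the index file instead of a single sitemap.xml.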

What fields are required for sitemaps? - The only tag that is required for each URL is the <loc> tag. However, you will also need to make sure that you include the XML declaration and the structural elements (<urlset><url></url></urlset>) around each URL. The rest of the fields are optional, but they can help search engines understand your content better.

Do you need to include AMP links in your site map? - You don't need to include AMP links in your site map file. Google will be able to find your AMP pages as long as you include the rel="amphtml" links on the canonical version of your pages. However, if you're just getting started or you can programmatically add AMP references to your sitemap.xml file, you might as well do so.