Keep in mind that it's your job to validate your URLs and lastmod dates.
Sitemaps are important, especially for big websites. It is always a good idea to develop your website with SEO in mind. Unfortunately, many developers ignore this part. This article describes the general idea and shows how to implement your sitemaps with Python. I wrote it for myself in the first place, because I tend to forget things.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/foo.html</loc>
<lastmod>2022-06-04</lastmod>
</url>
</urlset>
Sitemaps help search engines discover your website's pages. You list your most important URLs in one or more XML files. Different sitemaps can contain different types of content: plain URLs, images, videos, and news entries. Images, videos, and news entries are just URLs with additional metadata.
Sitemaps are especially important if you have a website with a lot of pages. Now, I will not go into details, because obviously you're a smart person and will find everything at Google Search Central or sitemaps.org.
Just a few simple rules for you:
Don't forget to sign up at Google Search Console and submit your sitemaps.
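Validating URLs and lastmod dates yourself, as noted at the top of this article, might look roughly like the sketch below. The helper names is_valid_loc and is_valid_lastmod are made up for this illustration. The 2,048-character limit for loc comes from sitemaps.org. The date check is stricter than the sitemap format allows: it relies on datetime.fromisoformat, so reduced forms such as YYYY or YYYY-MM are rejected.

```python
from datetime import datetime
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # sitemaps.org: <loc> must be shorter than 2,048 characters


def is_valid_loc(url):
    """Return True if url is an absolute http(s) URL short enough for <loc>."""
    parts = urlsplit(url)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and len(url) < MAX_LOC_LENGTH
    )


def is_valid_lastmod(value):
    """Return True for YYYY-MM-DD or a full ISO 8601 timestamp (hypothetical helper)."""
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


# filter a list of (url, lastmod) pairs before they reach the sitemap
candidates = [
    ("https://example.com/url-1", "2022-04-07"),
    ("/relative/path", "2022-04-07"),
    ("https://example.com/url-2", "07.04.2022"),
]
valid = [
    (url, lastmod)
    for url, lastmod in candidates
    if is_valid_loc(url) and is_valid_lastmod(lastmod)
]
```

Only the first pair survives the filter here; the other two fail the URL check and the date check respectively.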
Yes, you can list several sitemaps in robots.txt. The Sitemap directive can be used multiple times. Here is a real-world example:
Sitemap: https://zip.international/sitemaps/sitemaps.en.us.xml
Sitemap: https://zip.international/sitemaps/sitemaps.en.gb.xml
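If you want to check which sitemaps a robots.txt file advertises, the standard library can read these directives. This is a small sketch using urllib.robotparser (site_maps() needs Python 3.8 or newer). The User-agent and Allow lines are assumptions added to make the sample file complete; only the two Sitemap lines come from the example above.

```python
from urllib.robotparser import RobotFileParser

# sample robots.txt content; parsed from a string, so no network access is needed
robots_txt = """\
User-agent: *
Allow: /

Sitemap: https://zip.international/sitemaps/sitemaps.en.us.xml
Sitemap: https://zip.international/sitemaps/sitemaps.en.gb.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of URLs, or None when there is no Sitemap line
sitemaps = parser.site_maps()
for sitemap_url in sitemaps or []:
    print(sitemap_url)
```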
Here is the idea. You'll need three modules: xml, os, and, optionally, gzip. This snippet shows how a sitemap can be created.
import os
import gzip
# cElementTree was removed in Python 3.9; alias ElementTree so the name used below still works
from xml.etree import ElementTree as cElementTree


def add_url(root_node, url, lastmod):
    doc = cElementTree.SubElement(root_node, "url")
    cElementTree.SubElement(doc, "loc").text = url
    cElementTree.SubElement(doc, "lastmod").text = lastmod
    return doc


def save_sitemap(root_node, save_as, **kwargs):
    compress = kwargs.get("compress", False)

    # split "sitemaps/sitemap" into directory and file name
    dest_path, sitemap_name = os.path.split(save_as)
    sitemap_name = f"{sitemap_name}.xml"
    if compress:
        sitemap_name = f"{sitemap_name}.gz"
    save_as = os.path.join(dest_path, sitemap_name)

    # create sitemap path if it does not exist yet
    if dest_path:
        os.makedirs(dest_path, exist_ok=True)

    tree = cElementTree.ElementTree(root_node)
    if not compress:
        tree.write(save_as, encoding='utf-8', xml_declaration=True)
    else:
        # gzip sitemap; the with-block closes the file even if writing fails
        with gzip.open(save_as, 'wb') as gzipped_sitemap_file:
            tree.write(gzipped_sitemap_file, encoding='utf-8', xml_declaration=True)

    return sitemap_name


# create root XML node
sitemap_root = cElementTree.Element('urlset')
sitemap_root.attrib['xmlns'] = "http://www.sitemaps.org/schemas/sitemap/0.9"

# add urls
add_url(sitemap_root, "https://example.com/url-1", "2022-04-07")
add_url(sitemap_root, "https://example.com/url-2", "2022-04-07")
add_url(sitemap_root, "https://example.com/url-3", "2022-04-07")

# save sitemap. xml extension will be added automatically
save_sitemap(sitemap_root, "sitemaps/sitemap")

# if you want to gzip sitemap
save_sitemap(sitemap_root, "sitemaps/sitemap", compress=True)
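A quick way to sanity-check a generated file is to parse it back and list its URLs. The sketch below is self-contained: read_locs is a hypothetical helper, and the demo writes a tiny gzipped sitemap into a temporary directory instead of reusing the files created above.

```python
import gzip
import os
import tempfile
from xml.etree import ElementTree

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_locs(path):
    """Return every <loc> value from a plain or gzipped sitemap file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as sitemap_file:
        root = ElementTree.parse(sitemap_file).getroot()
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


# demo: build a one-URL sitemap, gzip it, and read it back
root = ElementTree.Element("urlset", xmlns=NS["sm"])
url_node = ElementTree.SubElement(root, "url")
ElementTree.SubElement(url_node, "loc").text = "https://example.com/url-1"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sitemap.xml.gz")
    with gzip.open(path, "wb") as gz_file:
        ElementTree.ElementTree(root).write(
            gz_file, encoding="utf-8", xml_declaration=True
        )
    locs = read_locs(path)
```

If the file is not well-formed XML, ElementTree.parse raises a ParseError, which is usually what you want from a check like this.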
If you want to add image, video, or news sections, you'll need to add the matching XML namespace attributes to your root node.
# create root XML node
sitemap_root = cElementTree.Element('urlset')
sitemap_root.attrib['xmlns'] = "http://www.sitemaps.org/schemas/sitemap/0.9"

# for images add
sitemap_root.attrib["xmlns:image"] = "http://www.google.com/schemas/sitemap-image/1.1"

# for videos add
sitemap_root.attrib["xmlns:video"] = "http://www.google.com/schemas/sitemap-video/1.1"

# for news add
sitemap_root.attrib["xmlns:news"] = "http://www.google.com/schemas/sitemap-news/0.9"


# add this snippet to attach image to url
def add_url_image(url_node, image_url):
    image_node = cElementTree.SubElement(url_node, "image:image")
    cElementTree.SubElement(image_node, "image:loc").text = image_url
    return image_node


# now when you want to add image to url
# (no trailing comma after add_url: it would turn url_1 into a tuple)
url_1 = add_url(sitemap_root, "https://example.com/url-1", "2022-04-07")
add_url_image(url_1, "https://example.com/image-1.jpg")
I will not describe here how to add videos or news to your url, because with this code you can easily do it yourself.
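For reference, here is one way the same pattern could be extended to news entries. This is only a sketch: add_url_news is a hypothetical helper that mirrors add_url_image, and the tag names follow Google's news sitemap example (publication name and language, publication date, title).

```python
from xml.etree import ElementTree


def add_url_news(url_node, name, language, publication_date, title):
    """Attach a <news:news> block to an existing <url> node (hypothetical helper)."""
    news_node = ElementTree.SubElement(url_node, "news:news")
    publication = ElementTree.SubElement(news_node, "news:publication")
    ElementTree.SubElement(publication, "news:name").text = name
    ElementTree.SubElement(publication, "news:language").text = language
    ElementTree.SubElement(news_node, "news:publication_date").text = publication_date
    ElementTree.SubElement(news_node, "news:title").text = title
    return news_node


# root node with the default and the news namespace declared
root = ElementTree.Element("urlset")
root.attrib["xmlns"] = "http://www.sitemaps.org/schemas/sitemap/0.9"
root.attrib["xmlns:news"] = "http://www.google.com/schemas/sitemap-news/0.9"

url_node = ElementTree.SubElement(root, "url")
ElementTree.SubElement(url_node, "loc").text = "http://www.example.org/business/article55.html"
add_url_news(
    url_node,
    "The Example Times",
    "en",
    "2008-12-23",
    "Companies A, B in Merger Talks",
)

xml_text = ElementTree.tostring(root, encoding="unicode")
```

A video helper would follow the same shape, with video: tags and the video namespace on the root node.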
If you have a lot of pages on your website, or you simply want to split your sitemaps into sections, you'll need index sitemaps. An index sitemap is just an XML file whose root tag is sitemapindex and whose sitemap tags contain the URLs of your sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml</loc>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml</loc>
</sitemap>
</sitemapindex>
Let's improve our code to create an index sitemap. Add the function add_sitemap_url at the beginning of your file.
def add_sitemap_url(root_node, sitemap_url):
    sitemap_url_node = cElementTree.SubElement(root_node, "sitemap")
    cElementTree.SubElement(sitemap_url_node, "loc").text = sitemap_url
    return sitemap_url_node
Then use it whenever you need it.
# create sitemapindex tag
sitemap_index_node = cElementTree.Element('sitemapindex')
sitemap_index_node.attrib['xmlns'] = "http://www.sitemaps.org/schemas/sitemap/0.9"

# append links to other sitemaps
add_sitemap_url(sitemap_index_node, "https://example.com/sitemap1.xml")
add_sitemap_url(sitemap_index_node, "https://example.com/sitemap2.xml")

# save the index node created above
save_sitemap(sitemap_index_node, "sitemaps/sitemap")
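For a large site, the two ideas combine naturally: split the URL list into chunks, build one sitemap per chunk, and list them all in an index. The sketch below assumes the sitemaps.org limit of 50,000 URLs per sitemap. build_sitemaps, chunked, the file naming scheme, and the base URL are illustrative choices, not part of the code above.

```python
from xml.etree import ElementTree

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from sitemaps.org


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemaps(urls, base_url, chunk_size=MAX_URLS_PER_SITEMAP):
    """Return ({file name: urlset node}, sitemapindex node) for (url, lastmod) pairs."""
    index_root = ElementTree.Element("sitemapindex", xmlns=SITEMAP_NS)
    sitemaps = {}
    for number, chunk in enumerate(chunked(urls, chunk_size), start=1):
        urlset = ElementTree.Element("urlset", xmlns=SITEMAP_NS)
        for url, lastmod in chunk:
            node = ElementTree.SubElement(urlset, "url")
            ElementTree.SubElement(node, "loc").text = url
            ElementTree.SubElement(node, "lastmod").text = lastmod
        file_name = f"sitemap{number}.xml"
        sitemaps[file_name] = urlset
        # register the chunk in the index under its public URL
        sitemap_node = ElementTree.SubElement(index_root, "sitemap")
        ElementTree.SubElement(sitemap_node, "loc").text = f"{base_url}/{file_name}"
    return sitemaps, index_root


# demo with a tiny chunk size so the split is easy to see
urls = [(f"https://example.com/url-{i}", "2022-04-07") for i in range(1, 6)]
sitemaps, index_root = build_sitemaps(urls, "https://example.com/sitemaps", chunk_size=2)
```

Each returned node can then be written to disk with a function like save_sitemap.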
You can find the code here. Feel free to comment or ask questions.
Here is the code: GitHub
And the package: PyPI
Now, for small sitemaps, it's all pretty easy. If you need to generate lots of sitemaps with image, video, or news metadata, your code will become messy at some point. I created sitemapa as a little abstraction over the XML burden.
Sitemapa is a small package that reduces the work of generating sitemaps. You describe your sitemaps with a JSON-like structure (Python dictionaries and lists). Sitemapa is framework-agnostic and does not crawl or index your website; it only generates sitemaps from your description. Nothing more. I use it to generate sitemaps for millions of URLs on my websites.
pip install sitemapa
# import in your script
from sitemapa import Sitemap, IndexSitemap
To create a standard sitemap, import the Sitemap class: from sitemapa import Sitemap. The Sitemap class has two methods: append_url and save.
append_url(url, url_data=None)
Parameters: url(str) — Website URL
url_data(Optional[dict]) — URL Description
url_data can contain the following keys:
- lastmod
- changefreq. Ignored by Google
- priority. Ignored by Google
- images. To describe URL images
- videos. To describe URL videos
- news. To describe URL news
Return type: dict. Dictionary with all urls and url_data
# ------
save(save_as, **kwargs)
Parameters: save_as(str) — Sitemap name and where to save. For example: sitemap1.xml or sitemap1.xml.gz
Return type: str. For example sitemap1.xml or sitemap1.xml.gz
Let's create a sitemap like this and save it as sitemap1.xml.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/url1.html</loc>
</url>
<url>
<loc>http://www.example.com/foo.html</loc>
<lastmod>2022-06-04</lastmod>
</url>
</urlset>
And this is the implementation with sitemapa:
from sitemapa import Sitemap
standard_sitemap = Sitemap()
standard_sitemap.append_url("http://www.example.com/url1.html")
standard_sitemap.append_url("http://www.example.com/foo.html", {
    "lastmod": "2022-06-04"
})
# method 'save' will reset inner dictionary with URLs
sitemap1_name = standard_sitemap.save("sitemap1.xml")
# now, if you want to create new sitemap, just do this:
standard_sitemap.append_url("http://www.example.com/url-2.html")
standard_sitemap.append_url("http://www.example.com/url-3.html")
sitemap2_name = standard_sitemap.save("sitemap2.xml")
Let's take this example from Google.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://example.com/sample1.html</loc>
<image:image>
<image:loc>http://example.com/image.jpg</image:loc>
</image:image>
<image:image>
<image:loc>http://example.com/photo.jpg</image:loc>
</image:image>
</url>
<url>
<loc>http://example.com/sample2.html</loc>
<image:image>
<image:loc>http://example.com/picture.jpg</image:loc>
</image:image>
</url>
</urlset>
To do so, we'll use the url_data description.
from sitemapa import Sitemap
sitemap_with_images = Sitemap()
sitemap_with_images.append_url("http://example.com/sample1.html", {
    "images": [
        "http://example.com/image.jpg",
        "http://example.com/photo.jpg"
    ]
})

# you can also describe like this
sitemap_with_images.append_url("http://example.com/sample2.html", {
    "images": [
        {
            "loc": "http://example.com/picture.jpg",
            "lastmod": "2022-05-05"
        }
    ]
})
sitemap_with_images.save("sitemap.xml")
As you can see, you can use a list of image URLs or a list of dictionaries. I prefer the first option, since Google deprecated all keys except loc.
This is where it gets a little tricky. Videos have a more complex structure. Let's dive into details, using an example from Google.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
<loc>http://www.example.com/videos/some_video_landing_page.html</loc>
<video:video>
<video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
<video:title>Grilling steaks for summer</video:title>
<video:description>Alkis shows you how to get perfectly done steaks every
time</video:description>
<video:content_loc>
http://streamserver.example.com/video123.mp4</video:content_loc>
<video:player_loc>
http://www.example.com/videoplayer.php?video=123</video:player_loc>
<video:duration>600</video:duration>
<video:expiration_date>2021-11-05T19:20:30+08:00</video:expiration_date>
<video:rating>4.2</video:rating>
<video:view_count>12345</video:view_count>
<video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
<video:family_friendly>yes</video:family_friendly>
<video:restriction relationship="allow">IE GB US CA</video:restriction>
<video:price currency="EUR">1.99</video:price>
<video:requires_subscription>yes</video:requires_subscription>
<video:uploader
info="http://www.example.com/users/grillymcgrillerson">GrillyMcGrillerson
</video:uploader>
<video:live>no</video:live>
</video:video>
</url>
</urlset>
from sitemapa import Sitemap
sitemap = Sitemap()
sitemap.append_url("http://www.example.com/videos/some_video_landing_page.html", {
    "videos": [
        {
            "thumbnail_loc": "http://www.example.com/thumbs/123.jpg",
            "title": "Grilling steaks for summer",
            "description": "Alkis shows you how to get perfectly done steaks every time",
            "content_loc": "http://streamserver.example.com/video123.mp4",
            "player_loc": "http://www.example.com/videoplayer.php?video=123",
            "duration": "600",
            "expiration_date": "2021-11-05T19:20:30+08:00",
            "rating": "4.2",
            "view_count": "12345",
            "publication_date": "2007-11-05T19:20:30+08:00",
            "family_friendly": "yes",
            "restriction": {
                "$value": "IE GB US CA",
                "relationship": "allow"
            },
            "price": {
                "$value": "1.99",
                "currency": "EUR"
            },
            "requires_subscription": "yes",
            "uploader": {
                "info": "http://www.example.com/users/grillymcgrillerson",
                "$value": "GrillyMcGrillerson"
            },
            "live": "no"
        }
    ]
})
sitemap.save("sitemap.xml")
You can see that each item in the videos list is a description of one <video:video>. Take a look at the "restriction" key. Each property (except $value) becomes an attribute of <video:restriction>. $value is a special property: it is the content of the tag. So basically it works like this: <video:restriction relationship="allow">restriction[$value]</video:restriction>.
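To make the $value convention concrete, here is a rough sketch of how such a description could be turned into XML. It is not sitemapa's actual implementation; build_tag is a hypothetical function written only to illustrate the mapping between dictionary keys, attributes, and tag content.

```python
from xml.etree import ElementTree


def build_tag(parent, name, description):
    """Create a child tag from a plain string or a dict with "$value" and attributes."""
    node = ElementTree.SubElement(parent, name)
    if isinstance(description, dict):
        for key, value in description.items():
            if key == "$value":
                node.text = value      # "$value" becomes the tag content
            else:
                node.set(key, value)   # every other key becomes an attribute
    else:
        node.text = description        # plain strings are just tag content
    return node


video = ElementTree.Element("video:video")
build_tag(video, "video:duration", "600")
build_tag(video, "video:restriction", {
    "$value": "IE GB US CA",
    "relationship": "allow"
})

xml_text = ElementTree.tostring(video, encoding="unicode")
```

Here xml_text contains <video:restriction relationship="allow">IE GB US CA</video:restriction>, matching the Google example.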
Keep in mind that Google requires a news sitemap to contain only recently published articles. Read more about this here.
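Filtering articles by age before they go into the news sitemap could look like the sketch below. The two-day window is an assumption based on Google's news sitemap guidance at the time of writing, so please verify the current value. recent_articles and the article dictionaries are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# assumption: Google asks for articles from the last 2 days; check the current docs
NEWS_WINDOW = timedelta(days=2)


def recent_articles(articles, now=None):
    """Keep only the articles published within NEWS_WINDOW before now."""
    now = now or datetime.now(timezone.utc)
    return [a for a in articles if now - a["published"] <= NEWS_WINDOW]


# a fixed "now" keeps the demo reproducible
now = datetime(2022, 6, 4, 12, 0, tzinfo=timezone.utc)
articles = [
    {
        "url": "https://example.org/fresh.html",
        "published": datetime(2022, 6, 3, 18, 0, tzinfo=timezone.utc),
    },
    {
        "url": "https://example.org/stale.html",
        "published": datetime(2022, 5, 20, 9, 0, tzinfo=timezone.utc),
    },
]
fresh = recent_articles(articles, now=now)
```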
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>http://www.example.org/business/article55.html</loc>
<news:news>
<news:publication>
<news:name>The Example Times</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2008-12-23</news:publication_date>
<news:title>Companies A, B in Merger Talks</news:title>
</news:news>
</url>
</urlset>
And this is the implementation with sitemapa:
from sitemapa import Sitemap
sitemap = Sitemap()
sitemap.append_url("http://www.example.org/business/article55.html", {
    "news": [
        {
            "publication": {
                "$tags": {
                    "name": "The Example Times",
                    "language": "en"
                }
            },
            "publication_date": "2008-12-23",
            "title": "Companies A, B in Merger Talks"
        }
    ]
})
sitemap.save("sitemap.xml")
As you can see, we just added new tags (<news:name> and <news:language>) inside <news:publication> using the $tags key.
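The $tags convention can be pictured as a small recursive walk over the description. Again, this is only an illustration and not how sitemapa is necessarily implemented; build_news_tag is a hypothetical function that handles nothing except plain strings and $tags.

```python
from xml.etree import ElementTree


def build_news_tag(parent, name, description):
    """Create <news:name>; a "$tags" dict inside description becomes child tags."""
    node = ElementTree.SubElement(parent, f"news:{name}")
    if isinstance(description, dict):
        for child_name, child in description.get("$tags", {}).items():
            build_news_tag(node, child_name, child)  # recurse into nested tags
    else:
        node.text = description
    return node


news = ElementTree.Element("news:news")
build_news_tag(news, "publication", {
    "$tags": {
        "name": "The Example Times",
        "language": "en"
    }
})
build_news_tag(news, "title", "Companies A, B in Merger Talks")

xml_text = ElementTree.tostring(news, encoding="unicode")
```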
sitemap.append_url("http://www.example.org/business/article55.html", {
    "lastmod": "",
    "images": [],
    "videos": [],
    "news": []
})
We'll use the example index sitemap shown earlier in this article. Import IndexSitemap from sitemapa. The IndexSitemap class has two methods: append_sitemap and save.
from sitemapa import IndexSitemap
index_sitemap = IndexSitemap()
index_sitemap.append_sitemap("http://www.example.com/sitemap1.xml")
index_sitemap.append_sitemap("http://www.example.com/sitemap2.xml")
index_sitemap.save("index-sitemap.xml")
This article is my summary of sitemaps. I hope it helps you on your journey. Don't forget to verify everything with official resources. If you have any questions or you see mistakes in this text, don't be shy and drop me a line.