The amount of content held on the average website has increased dramatically over the years, and the internet as a whole keeps growing rapidly. Almost every SEO campaign I’ve worked on has involved adding content to a website, so, barring extreme circumstances, the total number of URLs on a site only goes one way: up. Websites that use every available tool to show what is happening on the site, from a URL perspective, tend to see better outcomes in search engines.
Most sites understand the importance of an XML sitemap, but a very common issue is that the sitemap they submit to search engines simply isn’t correct. Problems I see frequently include URLs that shouldn’t be there (pages created by accident, test URLs and so on), URLs that are missing from the sitemap altogether, and optional tags such as lastmod or priority being used incorrectly.
It can really pay dividends to audit your XML sitemap regularly. That doesn’t have to mean sifting through thousands of URLs; alerts from third-party tools can often pinpoint errors within your sitemap. Fixing these quickly can significantly improve the number of your URLs showing in Google and help save crawl budget too.
How do I identify problems with my sitemap?
This is probably one of the main issues that my SEO auditing and crawling software won’t fully flag, so additional steps are often required. I would still run a crawler of your choice, be that Semrush, Ahrefs, Screaming Frog or Sitebulb, but a quick way to see whether there is a problem with your sitemap is through Google Search Console.
In the Coverage report, open the Valid tab and there will often be two categories: submitted and indexed URLs, and “Indexed, not submitted in sitemap”. The latter lists URLs that Google has indexed and can show in search results but that don’t appear in your XML sitemap.
Also in the Coverage report, you can filter by the URLs submitted in each sitemap, which lets you isolate errors to one sitemap rather than all URLs. This can be really useful for a large site with multiple sitemaps.
The final way to validate, in my experience, is to look at the sitemap itself. Check that the lastmod dates are accurate and that there are no URLs that shouldn’t be there; I rarely use the priority tag, so that is less of a concern for me. Google Search Console also has a dedicated Sitemaps report, which can highlight additional errors.
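If your site is large enough to need several sitemaps, splitting it into separate files tied together by a sitemap index is what makes per-sitemap filtering possible. Below is a minimal Python sketch using only the standard library. The file naming scheme (`sitemap-1.xml`, `sitemap-2.xml` and so on) and the function names are hypothetical; the 50,000-URL cap per sitemap comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_urlset(urls):
    """Return a <urlset> document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


def build_sitemaps(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return (index_xml, {filename: urlset_xml}) for a list of page URLs."""
    files = {}
    for n, chunk in enumerate(chunk_urls(urls, size), start=1):
        files[f"sitemap-{n}.xml"] = build_urlset(chunk)
    index_xml = build_index(f"{base_url}/{name}" for name in files)
    return index_xml, files
```

Splitting by count is the simplest option; splitting by site section (blog, products, categories) is often more useful, because an error reported against one sitemap then points to a specific part of the site.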
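Part of that manual review can be scripted. The sketch below parses a sitemap and flags URLs matching patterns that usually indicate accidental or test pages, lastmod dates in the future, lastmod values it cannot read, and the tell-tale case where every URL shares the same lastmod. The `SUSPICIOUS_MARKERS` patterns are hypothetical placeholders, and the date check only reads the leading `YYYY-MM-DD` part of each lastmod, so shorter W3C forms such as `2024-05` will be reported as unreadable.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical markers for URLs that probably should not be in a sitemap;
# replace these with patterns matching your own accidental or test pages.
SUSPICIOUS_MARKERS = ("/test/", "/staging/", "?")


def audit_sitemap(xml_text, today):
    """Return a list of (url, problem) pairs found in a <urlset> document."""
    problems = []
    lastmods = []
    root = ET.fromstring(xml_text)
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        if any(marker in loc.lower() for marker in SUSPICIOUS_MARKERS):
            problems.append((loc, "suspicious URL"))
        raw = url_el.findtext("sm:lastmod", namespaces=NS)
        if raw is None:
            continue  # lastmod is optional in the protocol
        try:
            # Only the YYYY-MM-DD prefix is checked in this sketch.
            modified = date.fromisoformat(raw.strip()[:10])
        except ValueError:
            problems.append((loc, "unparseable lastmod"))
            continue
        if modified > today:
            problems.append((loc, "lastmod in the future"))
        lastmods.append(modified)
    # An identical lastmod on every URL often means the sitemap's build date
    # is being written instead of each page's real modification date.
    if len(lastmods) > 1 and len(set(lastmods)) == 1:
        problems.append(("*", "every URL shares the same lastmod"))
    return problems
```

You could feed it the contents of your live sitemap and a fixed `today` date, then review anything it reports by hand; a flagged URL is a prompt to investigate, not proof of an error.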
How do I fix XML sitemap problems?
The fix usually depends on how the sitemap is generated in the first place. For example, if you are using WordPress, a plugin may be managing the XML sitemap, and you’ll need to log in and adjust its settings accordingly. Almost all the sitemaps I fix have problems rooted in how they are generated; some sites, for example, use Screaming Frog to generate their XML sitemaps because there is no automated fallback. The easiest approach is to identify what is missing or broken and troubleshoot from there.
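Identifying what is missing or extra boils down to comparing two sets of URLs. A minimal sketch, assuming you already have a list of the URLs that *should* be in the sitemap (for example an export from your CMS, or the live, indexable pages from a crawl); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_urls(xml_text):
    """Return the set of <loc> values in a <urlset> document."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(LOC_TAG) if el.text}


def compare(expected_urls, xml_text):
    """Return (missing, unexpected) URL sets.

    missing:    expected URLs that the sitemap does not list
    unexpected: URLs the sitemap lists that you did not expect
    """
    listed = sitemap_urls(xml_text)
    expected = set(expected_urls)
    return expected - listed, listed - expected
```

A long "missing" list usually points at the generator's include rules, while "unexpected" URLs tend to be test pages, redirects or deleted content that the generator never dropped.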
How do I prevent XML sitemap issues for SEO?
Once you have audited your website and the sitemap is working as intended, a new risk appears: the sitemap could stop working without anyone noticing, so you need a process that catches this. Aside from a regular site audit, a quick check I use is to publish a new page and then view the XML sitemap to confirm the page appears. Doing this every now and again shows that the sitemap is still working, and keeping on top of Search Console errors can also alert you when things aren’t working as intended. Of course, regular professional SEO audits are one of the easiest ways to give your site a clean bill of health.
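The "publish a page, then check the sitemap" routine is easy to automate. This sketch fetches a sitemap with the standard library and reports whether a given URL is listed. It assumes a single `<urlset>` file rather than a sitemap index, and an uncompressed sitemap; the example URLs in the usage note are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def fetch(sitemap_url, timeout=10):
    """Download the raw sitemap bytes (assumes an uncompressed XML file)."""
    with urllib.request.urlopen(sitemap_url, timeout=timeout) as resp:
        return resp.read()


def url_in_sitemap(xml_data, page_url):
    """Return True if page_url appears as a <loc> in the sitemap."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    return any((el.text or "").strip() == page_url for el in root.iter(LOC_TAG))
```

For example, `url_in_sitemap(fetch("https://example.com/sitemap.xml"), "https://example.com/new-page")` returning `False` a day after publishing would be a sign the sitemap has stopped updating. Scheduling a check like this is a cheap complement to Search Console alerts, not a replacement for them.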
More comments on XML sitemaps
Occasionally it can pay to have a dedicated image XML sitemap, depending on the size of your site.
Most CMSs have robust XML sitemap generation tools; if one doesn’t entirely fit your needs, you can usually find additional plugins or services to generate the sitemap.
These notes focus on the standard XML sitemap rather than news and image sitemaps, which have their own nuances you need to be aware of.
If you regenerate your XML sitemap daily or on some other schedule, make sure each lastmod date is the date the page actually changed, not the date the sitemap was last regenerated. Otherwise you will likely undermine the credibility of the sitemap.
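A minimal generator sketch showing the distinction: each lastmod is taken from the page's own modification date, never from the time the sitemap is built. The `(url, last_modified_date)` input is a hypothetical stand-in for whatever your CMS or database actually stores.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> string from (url, last_modified_date) pairs.

    The dates are assumed to come from each page's own record in the CMS
    or database (a hypothetical data source for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
        # Use the page's own modification date, never date.today():
        # the build date says nothing about when the content changed.
        ET.SubElement(url_el, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([("https://example.com/about", date(2024, 1, 10))]))
```

Because the dates travel with the pages, regenerating this sitemap every day leaves unchanged pages with the same lastmod, which is exactly the behaviour you want search engines to see.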