How to Audit Hreflang Tags at Scale
How to audit hreflang tags across large websites with thousands of pages. Covers crawling tools, common errors to check, prioritization, and ongoing monitoring.
Checking hreflang tags on a five-page site takes a few minutes. Checking them across a site with 10,000 pages in 8 languages is a different challenge entirely: that is 80,000 URLs, and if every URL lists all 8 versions (itself included), 640,000 individual annotations to cross-check. Manual inspection is not realistic at that scale. You need a systematic approach, the right tools, and a clear understanding of which errors matter most.
This guide walks through how to audit hreflang implementations on large websites, what to look for, and how to prioritize fixes. For a general introduction to hreflang, see our complete hreflang guide.
Why Large-Scale Hreflang Audits Matter
Small hreflang errors on a handful of pages are unlikely to cause noticeable problems. But when errors are systematic -- the same mistake repeated across thousands of pages -- the impact compounds. Google may ignore entire groups of hreflang annotations, leading to wrong-version serving across your site.
Common scenarios that create widespread hreflang issues:
- A CMS update changes how hreflang tags are generated
- A new language version is added but not properly linked to existing versions
- A site migration changes URL patterns without updating hreflang references
- Template-level errors that affect every page using that template
- A developer removes or modifies hreflang logic without understanding the requirements
At scale, you often do not notice these problems until international traffic drops or the wrong language version starts showing up in search results. Regular audits catch issues before they affect your search performance.
Tools for Large-Scale Hreflang Audits
Screaming Frog SEO Spider
Screaming Frog is one of the most commonly used tools for hreflang audits. It crawls your site and extracts all hreflang tags, then cross-references them to find errors.
What it checks:
- Missing return links (page A references page B, but page B does not reference page A)
- Missing self-referencing tags
- Non-200 URLs in hreflang tags (redirects, 404s, 5xx errors)
- Inconsistent hreflang sets across pages
- Invalid language or region codes
For large sites, configure Screaming Frog to crawl in list mode using your sitemap URLs rather than a full crawl. This focuses the audit on pages you know should have hreflang and avoids crawling non-essential URLs.
Sitebulb
Sitebulb provides a visual interface for hreflang auditing with detailed reports on error types and affected pages. It is particularly useful for explaining issues to non-technical stakeholders because it generates clear visualizations of the problems.
Ahrefs Site Audit
Ahrefs crawls your site and reports on hreflang issues as part of its broader site audit. It checks for missing return links, inconsistent tags, and invalid codes. The advantage is that it runs in the cloud, so you do not need to keep your machine running for large crawls.
Custom scripts
For very large sites (millions of pages), off-the-shelf tools may struggle with memory or time constraints. A custom script that processes your XML sitemaps (where hreflang is often implemented at scale) can check for errors more efficiently. The logic is straightforward: parse each sitemap, extract hreflang entries, and verify that every referenced URL has a reciprocal entry.
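The parse-extract-verify logic described above can be sketched with the Python standard library. This is a minimal sketch, not a production crawler: it assumes hreflang is declared with xhtml:link elements inside each url entry, and the function names and the SAMPLE sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by sitemaps that carry hreflang annotations.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

def parse_sitemap_hreflang(xml_text):
    """Return {page_url: {hreflang_code: alternate_url}} for one sitemap."""
    root = ET.fromstring(xml_text)
    annotations = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates = {}
        for link in url.findall("xhtml:link", NS):
            if link.get("rel") == "alternate" and link.get("hreflang"):
                alternates[link.get("hreflang")] = link.get("href")
        annotations[loc] = alternates
    return annotations

def missing_reciprocals(annotations):
    """List (source, target) pairs where target does not point back to source."""
    missing = []
    for source, alternates in annotations.items():
        for target in alternates.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            if source not in annotations.get(target, {}).values():
                missing.append((source, target))
    return missing

# Hypothetical two-page sitemap: the French entry omits the English alternate.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/page/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page/"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page/"/>
  </url>
  <url>
    <loc>https://example.com/fr/page/</loc>
    <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page/"/>
  </url>
</urlset>"""

print(missing_reciprocals(parse_sitemap_hreflang(SAMPLE)))
```

For sitemaps too large to hold in memory, the same logic could be adapted to a streaming parser; this sketch loads each file whole for simplicity.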
Google Search Console
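Search Console used to report hreflang errors in the International Targeting report, but Google deprecated that report in 2022 and there is currently no dedicated hreflang report. It is still worth checking regularly as a complement to crawl-based auditing, because it reflects how Google actually treats your pages: the Performance report filtered by country shows which URLs are served in each market, and URL Inspection shows the canonical Google selected for a page. Keep in mind that changes can take weeks to surface.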
What to Check
Missing return links
This is the most common and most impactful hreflang error. Hreflang is bidirectional: if page A says "my French version is page B," then page B must say "my English version is page A." If the return link is missing, Google may ignore both tags.
At scale, missing return links usually happen because:
- A new language version was added but the existing versions were not updated to reference it
- One language version has fewer pages than others, creating orphaned hreflang references
- Template logic generates hreflang tags based on assumptions that do not hold for every page
How to check: Export all hreflang pairs from your crawl data. For each pair (URL A, hreflang to URL B), verify that URL B has a hreflang entry pointing back to URL A. Any unpaired reference is a missing return link.
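The pair check can be sketched in a few lines of Python. The row format (source URL, hreflang code, target URL) is an assumption about what your crawl export looks like, and the function name is hypothetical.

```python
def find_missing_return_links(rows):
    """rows: iterable of (source_url, hreflang_code, target_url) tuples.

    Returns sorted (source, target) pairs that have no link in the
    opposite direction. Self-references are ignored.
    """
    links = {(src, tgt) for src, _code, tgt in rows if src != tgt}
    return sorted((src, tgt) for src, tgt in links if (tgt, src) not in links)

rows = [
    ("https://example.com/en/", "en", "https://example.com/en/"),
    ("https://example.com/en/", "fr", "https://example.com/fr/"),
    ("https://example.com/fr/", "fr", "https://example.com/fr/"),
    ("https://example.com/fr/", "en", "https://example.com/en/"),
    ("https://example.com/en/", "de", "https://example.com/de/"),
]
print(find_missing_return_links(rows))
```

Because the lookup uses a set, the check stays roughly linear in the number of pairs. URLs are compared as exact strings, so normalize trailing slashes and protocol before running it.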
Missing self-referencing tags
Every page must include a hreflang tag pointing to itself. A page at example.com/en/page/ with hreflang="en" must have a tag with href="https://example.com/en/page/". Missing self-references are common when hreflang is generated dynamically and the current page is accidentally excluded from the loop.
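A self-reference check is short once the annotations are in a dictionary. The sketch below assumes a hypothetical structure of {page URL: {hreflang code: alternate URL}} and compares URLs as exact strings.

```python
def pages_missing_self_reference(annotations):
    """annotations: {page_url: {hreflang_code: alternate_url}}.

    Returns the pages whose own URL does not appear among their alternates.
    """
    return sorted(
        page for page, alternates in annotations.items()
        if page not in alternates.values()
    )

annotations = {
    "https://example.com/en/page/": {
        "en": "https://example.com/en/page/",
        "fr": "https://example.com/fr/page/",
    },
    # The French page lists the English version but not itself.
    "https://example.com/fr/page/": {
        "en": "https://example.com/en/page/",
    },
}
print(pages_missing_self_reference(annotations))
```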
Non-200 hreflang URLs
Every URL referenced in a hreflang tag should return a 200 status code. URLs that redirect (301, 302), return errors (404, 500), or are blocked by robots.txt are unreliable hreflang targets, and Google may ignore annotations that point to them. Do not rely on Google following redirects in hreflang annotations.
How to check: Crawl every URL that appears in a hreflang tag and record the HTTP status code. Flag anything that is not a 200.
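A status check along these lines can be sketched with urllib from the standard library. The handler below stops urllib from following redirects so that a 301 or 302 is reported as such rather than as the final 200. The function names are hypothetical, and the sketch assumes the server answers HEAD requests; some do not, in which case GET is the fallback.

```python
import urllib.error
import urllib.request

class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect;
        # the 3xx response then surfaces as an HTTPError.
        return None

def fetch_status(url, timeout=10):
    """Return the HTTP status for url without following redirects, or None."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # 3xx, 4xx and 5xx all arrive here
    except OSError:
        return None       # DNS failure, timeout, refused connection

def non_200_targets(annotations, get_status=fetch_status):
    """annotations: {page_url: {hreflang_code: alternate_url}}.

    Returns {target_url: status} for every hreflang target that is not a 200.
    """
    targets = {t for alternates in annotations.values() for t in alternates.values()}
    statuses = {url: get_status(url) for url in sorted(targets)}
    return {url: status for url, status in statuses.items() if status != 200}
```

Passing the status lookup in as get_status makes it easy to swap in a cached or rate-limited fetcher, which matters when the target list runs to tens of thousands of URLs.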
Inconsistent hreflang sets
All pages in a hreflang set should reference the same group of alternates. If your English page references English, French, and German versions, but your French page only references English and French (missing German), the set is inconsistent. Google may still process the tags it can validate, but inconsistencies reduce reliability.
How to check: For each group of pages connected by hreflang, extract the set of hreflang codes. Every page in the group should have an identical set. Differences indicate inconsistencies.
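One way to sketch this check is to treat hreflang links as edges in a graph, collect each connected group of pages, and compare the set of codes each page declares. The dictionary structure and function name below are hypothetical.

```python
from collections import defaultdict

def inconsistent_clusters(annotations):
    """annotations: {page_url: {hreflang_code: alternate_url}}.

    Returns one {page_url: [codes]} dict per group of linked pages whose
    members do not all declare the same set of hreflang codes.
    """
    # Treat every hreflang link as an undirected edge between two pages.
    neighbours = defaultdict(set)
    for page, alternates in annotations.items():
        neighbours[page]  # make sure pages without links still get a node
        for target in alternates.values():
            neighbours[page].add(target)
            neighbours[target].add(page)

    seen, report = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk to collect the connected group.
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        code_sets = {p: frozenset(annotations.get(p, {})) for p in cluster}
        if len(set(code_sets.values())) > 1:
            report.append({p: sorted(c) for p, c in sorted(code_sets.items())})
    return report
```

A page that is referenced but was never crawled shows up with an empty code list, which flags the group as inconsistent. That is usually the desired outcome, since it means part of the set is missing from the audit data.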
Invalid language or region codes
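Codes must follow ISO 639-1 for languages and ISO 3166-1 alpha-2 for regions, with an optional ISO 15924 script subtag in between (for example zh-Hant or zh-Hant-TW). Common invalid codes include en-UK (should be en-GB) and fr-QC (should be fr-CA). A bare zh is syntactically valid but does not distinguish Simplified from Traditional Chinese, so a script or region subtag is usually the better choice. See our hreflang language codes reference for the full list.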
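A first-pass code check can be sketched with a regular expression. Note the limits: this only tests the shape of the value (two-letter language, optional four-letter script, optional two-letter region) and a small hypothetical table of known mistakes. It does not confirm that the subtags exist in the ISO lists, which would require loading the full code tables.

```python
import re

# language[-Script][-REGION], e.g. "en", "en-GB", "zh-Hant", "zh-Hant-TW"
HREFLANG_SHAPE = re.compile(r"^[a-z]{2}(-[a-z]{4})?(-[a-z]{2})?$", re.IGNORECASE)

# Hypothetical lookup of frequent mistakes; extend it with your own findings.
KNOWN_MISTAKES = {"en-uk": "en-GB", "fr-qc": "fr-CA"}

def code_problem(code):
    """Return a description of the problem, or None if the shape looks fine."""
    lowered = code.strip().lower()
    if lowered == "x-default":
        return None
    if lowered in KNOWN_MISTAKES:
        return f"use {KNOWN_MISTAKES[lowered]} instead"
    if not HREFLANG_SHAPE.match(code.strip()):
        return "malformed code"
    return None  # shape is plausible; the subtags themselves are not verified

for code in ["en-GB", "en-UK", "english", "x-default"]:
    print(code, "->", code_problem(code))
```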
Canonical and hreflang conflicts
If a page has a canonical tag pointing to a different URL, but hreflang tags reference the non-canonical URL, Google has conflicting instructions. The canonical says "this is the main version." The hreflang says "this other URL is the alternate." Make sure hreflang tags always reference canonical URLs. For the full explanation, see hreflang and canonical tags.
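Once a crawl has recorded each page's canonical URL, the conflict check is a lookup. The sketch below assumes two hypothetical dictionaries: the hreflang annotations per page and the canonical URL per crawled page.

```python
def canonical_conflicts(annotations, canonicals):
    """Find hreflang targets whose canonical tag points somewhere else.

    annotations: {page_url: {hreflang_code: alternate_url}}
    canonicals:  {url: canonical_url} as recorded by the crawl
    Returns sorted (page, code, target, canonical_of_target) tuples.
    """
    conflicts = []
    for page, alternates in annotations.items():
        for code, target in alternates.items():
            canonical = canonicals.get(target)
            # Targets without crawl data are skipped rather than guessed at.
            if canonical is not None and canonical != target:
                conflicts.append((page, code, target, canonical))
    return sorted(conflicts)

annotations = {
    "https://example.com/en/page/": {
        "en": "https://example.com/en/page/",
        "fr": "https://example.com/fr/page/?ref=nav",
    },
}
canonicals = {
    "https://example.com/en/page/": "https://example.com/en/page/",
    "https://example.com/fr/page/?ref=nav": "https://example.com/fr/page/",
}
print(canonical_conflicts(annotations, canonicals))
```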
Hreflang on non-indexable pages
Pages with noindex meta tags, pages blocked by robots.txt, or pages that redirect should not appear in hreflang annotations. If page A has a hreflang reference to page B, but page B is noindexed, Google cannot serve page B in search results, making the hreflang reference pointless.
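Indexability of a hreflang target can be approximated with the standard library. This is a rough sketch under stated assumptions: the function names are hypothetical, the noindex check only looks at robots and googlebot meta tags plus the X-Robots-Tag header, and Python's robots.txt parser does not reproduce every detail of how Google interprets the file.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class _RobotsMeta(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot">."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            self.directives.append(attr.get("content") or "")

def _blocks_indexing(value):
    lowered = value.lower()
    tokens = {t.strip() for t in lowered.split(",")}
    return "noindex" in lowered or "none" in tokens

def is_noindexed(html, headers=None):
    """True if the page HTML or its response headers carry a noindex directive."""
    parser = _RobotsMeta()
    parser.feed(html)
    lowered_headers = {k.lower(): v for k, v in (headers or {}).items()}
    values = parser.directives + [lowered_headers.get("x-robots-tag", "")]
    return any(_blocks_indexing(v) for v in values)

def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt content disallows url for agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)
```

Any hreflang target for which either function returns True is a candidate for removal from the annotation set, or for having its noindex or robots.txt rule reviewed.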
Prioritizing Fixes
Not all hreflang errors have equal impact. When you find hundreds or thousands of issues, prioritize by:
High priority
- Missing return links on high-traffic pages. These directly affect which version Google serves for your most important queries.
- Non-200 hreflang URLs on high-traffic pages. Broken references on pages that get significant search traffic mean Google cannot serve the correct alternate version.
- Systematic template-level errors. An error in a template affects every page that uses it. Fixing one template can resolve thousands of individual errors.
Medium priority
- Missing self-referencing tags. Important for compliance but less impactful than missing return links if the rest of the set is correct.
- Inconsistent hreflang sets. Reduces reliability but Google can often work around partial sets.
- Invalid language codes. Google ignores annotations with invalid codes, but the valid annotations in the same set still work.
Low priority
- Hreflang on low-traffic pages. The same errors, but on pages with minimal search visibility. Fix them, but after addressing high-traffic issues.
- Minor inconsistencies in unused language versions. If a language version gets almost no traffic, errors in its hreflang do not have meaningful impact.
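The tiers above can be turned into a sort order for an error list. The sketch below is one possible interpretation, not a fixed rule: the error type names, the monthly clicks input, and the 1,000-click threshold for "high traffic" are all assumptions to adjust for your own site.

```python
# Tier per error type, following the high/medium split described above.
TYPE_TIER = {
    "missing_return_link": 0,
    "non_200_target": 0,
    "missing_self_reference": 1,
    "inconsistent_set": 1,
    "invalid_code": 1,
}

def prioritize(errors, monthly_clicks, high_traffic_threshold=1000):
    """Sort errors so the most urgent come first.

    errors: list of {"type": str, "url": str}
    monthly_clicks: {url: clicks}, e.g. exported from your analytics
    """
    def sort_key(error):
        clicks = monthly_clicks.get(error["url"], 0)
        tier = TYPE_TIER.get(error["type"], 1)
        if clicks < high_traffic_threshold:
            tier = 2  # low-traffic pages go last, whatever the error type
        return (tier, -clicks, error["url"])
    return sorted(errors, key=sort_key)

errors = [
    {"type": "invalid_code", "url": "/en/pricing/"},
    {"type": "missing_return_link", "url": "/en/archive/2015/"},
    {"type": "missing_return_link", "url": "/en/"},
]
clicks = {"/en/pricing/": 5000, "/en/archive/2015/": 10, "/en/": 8000}
for error in prioritize(errors, clicks):
    print(error["type"], error["url"])
```

Template-level errors are not modeled here. In practice they tend to reveal themselves as many errors of the same type under one URL pattern, and are worth fixing first regardless of per-page traffic.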
The Audit Process
Step 1: Gather your hreflang data
Start by collecting all hreflang annotations from your site. If hreflang is implemented in HTML <head> tags, crawl the site and extract them. If hreflang is implemented in XML sitemaps, download the sitemaps and parse them.
For sites with both implementations, check both. If they conflict, that is an error in itself. You should use one implementation method, not both.
Step 2: Build a cross-reference matrix
Create a matrix where each row is a URL and each column is a language/region code. Each cell contains the URL that the row's page references for that language/region. This matrix makes it easy to spot:
- Empty cells (missing references)
- Inconsistent URLs (different pages referencing different alternates for the same code)
- Self-reference gaps (the cell in the column for the row's own language should contain the row's own URL)
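The matrix can be built directly from the collected annotations. This is a minimal sketch with hypothetical function names, assuming the annotations are stored as {page URL: {hreflang code: alternate URL}}.

```python
def build_matrix(annotations):
    """Return (header, rows): one row per URL, one column per hreflang code."""
    codes = sorted({code for alts in annotations.values() for code in alts})
    header = ["url"] + codes
    rows = [
        [page] + [alts.get(code, "") for code in codes]
        for page, alts in sorted(annotations.items())
    ]
    return header, rows

def empty_cells(header, rows):
    """List (url, code) for every cell where a page has no reference."""
    return [
        (row[0], code)
        for row in rows
        for code, cell in zip(header[1:], row[1:])
        if cell == ""
    ]

annotations = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "fr": "https://example.com/fr/",
    },
    "https://example.com/fr/": {
        "fr": "https://example.com/fr/",
    },
}
header, rows = build_matrix(annotations)
print(header)
print(empty_cells(header, rows))
```

The header and rows can be written out with the csv module for review in a spreadsheet, where empty cells and mismatched URLs are easy to spot by eye.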
Step 3: Validate URLs
Check every URL that appears in the matrix. Confirm it returns a 200 status, is indexable (no noindex, no robots.txt block), and has the correct canonical tag.
Step 4: Check return links
For every pair of URLs connected by hreflang, verify the reverse connection exists. This is the most labor-intensive check but also the most important.
Step 5: Report and prioritize
Group errors by type and severity. Identify template-level issues that can be fixed in one place. Prioritize fixes by traffic impact.
Automate regular audits
A one-time audit finds existing problems. Regular audits (monthly or quarterly) catch new issues before they affect traffic. Set up automated crawls and configure alerts for hreflang error counts exceeding a threshold.
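A threshold alert can be as simple as counting errors by type after each crawl and comparing the counts to agreed limits. The error type names and threshold values below are hypothetical; how the alert is delivered (email, chat, ticket) is left out.

```python
from collections import Counter

def threshold_alerts(errors, thresholds):
    """Return (error_type, count, limit) for every type over its limit.

    errors: list of {"type": str, "url": str} from the latest crawl
    thresholds: {error_type: maximum acceptable count}
    """
    counts = Counter(error["type"] for error in errors)
    return [
        (error_type, counts[error_type], limit)
        for error_type, limit in sorted(thresholds.items())
        if counts[error_type] > limit
    ]

errors = [
    {"type": "missing_return_link", "url": "/en/a/"},
    {"type": "missing_return_link", "url": "/en/b/"},
    {"type": "invalid_code", "url": "/en/c/"},
]
thresholds = {"missing_return_link": 1, "invalid_code": 5}
for error_type, count, limit in threshold_alerts(errors, thresholds):
    print(f"{error_type}: {count} errors (limit {limit})")
```

Comparing each run's counts with the previous run's is a useful extension, since a sudden jump after a deployment usually points to a template-level change.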
Ongoing Monitoring
Auditing is not a one-time event. Hreflang issues reappear when:
- New pages are published without proper hreflang tags
- Existing pages are moved or deleted without updating references
- CMS or plugin updates change hreflang generation logic
- New language versions are launched
Set up a monitoring process:
- Monthly automated crawls that check for common hreflang errors
- Google Search Console monitoring, using the Performance report filtered by country to spot the wrong language version being served
- Pre-launch checks when adding new languages or migrating URLs
- Post-deployment verification after CMS updates or template changes
For a broader checklist of hreflang issues, see our hreflang troubleshooting guide. For a comparison of tools that help with this, see our hreflang checker tools compared.
Summary
Auditing hreflang at scale requires crawling tools, systematic cross-referencing, and clear prioritization. Focus first on missing return links and non-200 URLs on high-traffic pages. Fix template-level errors for maximum impact. Then set up ongoing monitoring to catch new issues before they affect your international search performance.