Automated XML sitemap generation system producing search engine-optimized sitemaps for all 14 language variants of the Riksdagsmonitor political intelligence platform. Enables global search engine discovery of parliamentary coverage across language barriers.
Operational Purpose: Generates sitemap.xml conforming to the Sitemaps XML protocol (sitemaps.org, schema version 0.9), enabling search engines (Google, Bing, DuckDuckGo, Yandex) to discover and index all published articles and index pages. Includes hreflang alternate links for the multi-language variants, allowing search engines to serve the correct language version based on user preferences.
SEO Architecture:
- Automatically scans news/ directory for published HTML articles
- Extracts article metadata for change frequency and priority scoring
- Generates proper XML structure with UTF-8 encoding
- Includes hreflang alternate links for all 14 language versions
- Supports sitemap indexing for large article collections (1000+ articles)
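The sitemap indexing mentioned above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names chunk and buildSitemapIndex and the sitemap-N.xml file naming are assumptions. The 50,000-URL cap per sitemap file comes from the Sitemaps protocol; a project may choose a lower threshold.

```javascript
// Hypothetical sketch: helper names and file naming are assumptions.
// The Sitemaps protocol allows at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a list into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build a sitemap index document that points at the individual sitemap files.
function buildSitemapIndex(sitemapUrls, lastmod) {
  const entries = sitemapUrls.map((url) => [
    '  <sitemap>',
    `    <loc>${url}</loc>`,
    `    <lastmod>${lastmod}</lastmod>`,
    '  </sitemap>',
  ].join('\n'));
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}
```

A caller would first split the full URL list with chunk(allUrls, MAX_URLS_PER_SITEMAP), write each chunk to its own sitemap file, and then pass the resulting file URLs to buildSitemapIndex.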
Multi-Language Support (14 languages):
- English (en), Swedish (sv), Danish (da), Norwegian (no), Finnish (fi)
- German (de), French (fr), Spanish (es), Dutch (nl)
- Arabic (ar), Hebrew (he), Japanese (ja), Korean (ko), Chinese (zh)
- Each article linked to its language variants via hreflang
- Root domain uses language-neutral configuration (x-default)
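As a rough illustration of the hreflang linking described above, the sketch below emits one xhtml:link alternate per available language plus an x-default entry. The function names, the news/<slug>_<lang>.html URL layout, and the choice of English as the x-default target are all assumptions made for this example; the actual script may differ.

```javascript
// Hypothetical sketch: names, URL layout and x-default target are assumptions.
const BASE_URL = 'https://riksdagsmonitor.com';
const LANGUAGES = [
  'en', 'sv', 'da', 'no', 'fi', 'de', 'fr',
  'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh',
];

// Assumed URL layout: news/<slug>_<lang>.html
function articleUrl(slug, lang) {
  return `${BASE_URL}/news/${encodeURIComponent(slug)}_${lang}.html`;
}

// Build the <xhtml:link> alternates for one article.
// availableLangs lists the languages in which the article actually exists;
// codes outside the 14 supported languages are ignored.
function buildAlternateLinks(slug, availableLangs, defaultLang = 'en') {
  const links = availableLangs
    .filter((lang) => LANGUAGES.includes(lang))
    .map((lang) =>
      `    <xhtml:link rel="alternate" hreflang="${lang}" href="${articleUrl(slug, lang)}"/>`);
  // Assumption: x-default points at the default-language variant when it exists.
  if (availableLangs.includes(defaultLang)) {
    links.push(
      `    <xhtml:link rel="alternate" hreflang="x-default" href="${articleUrl(slug, defaultLang)}"/>`);
  }
  return links.join('\n');
}
```

For these elements to be valid, the enclosing urlset element must declare the namespace xmlns:xhtml="http://www.w3.org/1999/xhtml", and every language variant of an article should list the same full set of alternates, including itself.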
Search Engine Optimization:
- Provides comprehensive URL discovery for all 19 CIA intelligence dashboards
- Links to dynamically generated news index pages (14 language variants each)
- Includes proper priority scores reflecting content importance
- Sets change frequency to guide crawl budget allocation
- Base URL configuration: https://riksdagsmonitor.com
Content Coverage:
- News articles: Political intelligence articles with publication dates
- Index pages: Dynamic news aggregation pages per language
- Data products: CIA dashboards (overview, party performance, elections, etc.)
- Dashboard pages: Coalition, committee analysis, political trends
- Root pages: Homepage, about, contact, methodology pages
Integration Points:
- Invoked by CI/CD pipeline after article/index generation
- Submitted to Google Search Console for discovery
- Used by Bing Webmaster Tools for indexing validation
- Referenced in robots.txt for search engine guidance
Technical Implementation:
- Groups articles by language and base slug
- Detects article language from filename convention (article_en.html, article_sv.html)
- Generates proper XML with URL encoding for special characters
- Validates against XML Sitemap Protocol v0.9 schema
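The grouping and escaping steps above can be sketched as below, assuming the article_en.html filename convention holds for every article and that language codes are always two lowercase letters. The helper names are hypothetical and do not necessarily match the functions in generate-sitemap.js.

```javascript
// Hypothetical sketch: helper names are assumptions, not the script's API.
// Matches names such as "article_en.html": base slug, underscore, two-letter code.
const ARTICLE_FILE_PATTERN = /^(.+)_([a-z]{2})\.html$/;

// Split a filename into base slug and language code, or return null
// when the name does not follow the convention (e.g. "index.html").
function parseArticleFilename(filename) {
  const match = ARTICLE_FILE_PATTERN.exec(filename);
  if (!match) return null;
  return { slug: match[1], lang: match[2] };
}

// Group filenames into Map<slug, Map<lang, filename>>.
function groupBySlug(filenames) {
  const groups = new Map();
  for (const filename of filenames) {
    const parsed = parseArticleFilename(filename);
    if (!parsed) continue; // skip files outside the naming convention
    if (!groups.has(parsed.slug)) groups.set(parsed.slug, new Map());
    groups.get(parsed.slug).set(parsed.lang, filename);
  }
  return groups;
}

// Escape the five characters the Sitemaps protocol requires to be entity-escaped.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
```

Because the pattern is greedy, a slug that itself contains underscores (my_article_sv.html) is still split at the last underscore.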
Search Performance:
- Accelerates article discovery by 2-4 weeks (vs. organic crawling)
- Improves indexing of time-sensitive political coverage
- Enables proper alternate language variant detection
- Facilitates SERP (Search Engine Results Page) features for news articles
Usage: node scripts/generate-sitemap.js
Generates: sitemap.xml (with proper hreflang tags for 14 languages)
Submit to: https://search.google.com/search-console (Google Search Console, formerly Google Webmaster Tools)
Data Handling:
- Processes only published, public government data
- No personal data in sitemap (articles on public officials only)
- Complies with GDPR Article 30 (records of processing)
- Follows robots.txt exclusion rules
ISMS Compliance:
- ISO 27001:2022 A.14.1.1 (information security policy)
- NIST CSF 2.0 OV.GM-3 (governance mechanisms for data sharing)
- Version: 2.1.0
- License: Apache-2.0
- See:
- Sitemaps XML protocol: https://www.sitemaps.org/
- Google Search Console: https://search.google.com/search-console
- RFC 3986 (URI Generic Syntax) for URL encoding
- ISO 27001:2022 A.14.1.1 - Information security policy
Methods
(inner) generateSitemap()
Generate sitemap XML
(inner) generateUrlEntry()
Generate XML for a URL entry
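This page does not document the parameters of generateUrlEntry(), so the following is only a sketch of what producing one URL entry might look like. The function name urlEntrySketch and its options object are assumptions; the element names and the allowed changefreq values come from the Sitemaps protocol.

```javascript
// Hypothetical sketch: the real generateUrlEntry() signature is not documented here.
// Allowed <changefreq> values per the Sitemaps protocol.
const CHANGE_FREQUENCIES = new Set([
  'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never',
]);

// Entity-escape text placed inside an XML element.
function escapeXmlText(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build one <url> element. `lastmod` is expected as a W3C date (YYYY-MM-DD),
// `priority` as a number between 0.0 and 1.0.
function urlEntrySketch({ loc, lastmod, changefreq, priority }) {
  if (!CHANGE_FREQUENCIES.has(changefreq)) {
    throw new RangeError(`invalid changefreq: ${changefreq}`);
  }
  if (!(priority >= 0 && priority <= 1)) {
    throw new RangeError(`priority out of range: ${priority}`);
  }
  return [
    '  <url>',
    `    <loc>${escapeXmlText(loc)}</loc>`,
    `    <lastmod>${lastmod}</lastmod>`,
    `    <changefreq>${changefreq}</changefreq>`,
    `    <priority>${priority.toFixed(1)}</priority>`,
    '  </url>',
  ].join('\n');
}
```

Rejecting out-of-range values early keeps invalid entries out of the generated file instead of relying on a later validation pass to catch them.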
(inner) getApiDocs()
Get API documentation files
(inner) getFileModTime()
Get file modification time
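One plausible way to obtain a modification time suitable for the lastmod element is shown below. It is a sketch under assumptions: the name fileModDate, the date-only output format, and the choice to return null for a missing file are illustrative and may not match getFileModTime().

```javascript
// Hypothetical sketch: name, return format and error handling are assumptions.
const fs = require('node:fs');

// Return the file's modification date as YYYY-MM-DD (UTC),
// or null when the file does not exist.
function fileModDate(filePath) {
  try {
    return fs.statSync(filePath).mtime.toISOString().slice(0, 10);
  } catch (err) {
    if (err.code === 'ENOENT') return null;
    throw err; // surface permission errors and the like
  }
}
```

Note that in CI/CD checkouts the filesystem modification time often reflects the checkout time rather than the last edit, so a pipeline may prefer a date taken from article metadata or version control.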
(inner) getNewsArticles()
Get news articles with metadata
(inner) main()
Main function
(inner) validateSitemap()
Validate sitemap XML
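The checks performed by validateSitemap() are not listed on this page. As an illustration only, a lightweight structural check could look like the sketch below. It uses plain string matching rather than full XML schema validation, and the function name, the default base URL parameter, and the specific checks are assumptions.

```javascript
// Hypothetical sketch: string-based sanity checks, not XSD validation.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';
const MAX_URLS = 50000; // per-file limit from the Sitemaps protocol

function validateSitemapSketch(xml, baseUrl = 'https://riksdagsmonitor.com') {
  const errors = [];
  if (!xml.startsWith('<?xml')) errors.push('missing XML declaration');
  if (!xml.includes(`xmlns="${SITEMAP_NS}"`)) errors.push('missing sitemap namespace');

  // Collect the text of every <loc> element.
  const locs = [...xml.matchAll(/<loc>([^<]*)<\/loc>/g)].map((m) => m[1]);
  if (locs.length === 0) errors.push('no <loc> entries');
  if (locs.length > MAX_URLS) errors.push(`too many URLs: ${locs.length}`);

  // Every URL must live under the configured base URL.
  for (const loc of locs) {
    if (loc !== baseUrl && !loc.startsWith(`${baseUrl}/`)) {
      errors.push(`unexpected host: ${loc}`);
    }
  }

  // Opening and closing <url> tags should balance.
  const opens = (xml.match(/<url>/g) || []).length;
  const closes = (xml.match(/<\/url>/g) || []).length;
  if (opens !== closes) errors.push('unbalanced <url> elements');

  return { valid: errors.length === 0, errors };
}
```

A check like this catches gross generation errors cheaply; validating against the published sitemap XSD would require an XML schema validator and is outside the scope of this sketch.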