Automated news index generation system that dynamically scans published news articles across all 14 supported languages and generates corresponding index pages with proper metadata, filtering capabilities, and SEO optimization for parliamentary intelligence.
Operational Context: This script solves the critical maintenance problem of hardcoded article arrays in static index HTML files. Instead of manually updating article lists for 14 language variants, the system autonomously discovers published articles and generates index pages with consistent structure, metadata, and search optimization.
Multi-Language Support (14 languages):
- English (en), Swedish (sv), Danish (da), Norwegian (no), Finnish (fi)
- German (de), French (fr), Spanish (es), Dutch (nl)
- Arabic (ar), Hebrew (he), Japanese (ja), Korean (ko), Chinese (zh)
- Each language includes localized titles, keywords, breadcrumbs, and filtering UI
Core Functionality:
- Scans news/ directory recursively for published HTML article files
- Extracts article metadata: title, date, description, language, category tags
- Aggregates articles by language code for proper index organization
- Generates dynamic filter controls: article type, topic category, sort order
- Creates SEO-optimized index pages with proper JSON-LD schema markup
- Implements responsive UI with accessibility features (WCAG 2.1 AA)
Intelligence Integration:
- Enables tracking of parliamentary activity coverage, refreshed on every pipeline run
- Identifies news gaps and coverage imbalances across political topics
- Supports rapid content discovery for international audience segments
- Maintains consistent intelligence narrative across language variants
Article Discovery & Categorization:
- Prospective news: Upcoming parliamentary events (week-ahead, committee agendas)
- Retrospective news: Completed parliamentary activities (votes, decisions)
- Analysis pieces: Strategic interpretation of political developments
- Breaking news: Urgent parliamentary developments and emergency situations
Topic Categories:
- Parliament (Riksdag structure, committee reports, legislative process)
- Government (cabinet decisions, ministry statements, regulatory actions)
- Defense (national security, military policy, NATO/EU coordination)
- Environment (climate policy, emissions trading, sustainability)
- Committees (specific committee activities and cross-committee coordination)
- Legislation (bill tracking, proposal analysis, amendments)
SEO & Accessibility:
- Implements Open Graph meta tags for social media sharing
- Generates JSON-LD structured data for search engine indexing
- Provides hreflang tags for multi-language version discovery
- Includes alt text for all images and proper heading hierarchy
- Mobile-responsive design with proper viewport configuration
Localization Features:
- Translated UI elements: filter labels, breadcrumbs, no-results messages
- Localized date formats and sort options
- Language-specific keyword optimization for search engines
- Proper locale configuration (en_US, sv_SE, etc.)
Integration Points:
- Invoked by CI/CD pipeline after news generation scripts
- Feeds article discovery service for dashboard widgets
- Consumed by search functionality and site navigation
- Referenced by analytics tracking for page visit metrics
Data Integrity:
- Validates article file existence before inclusion
- Handles missing or malformed metadata gracefully
- Provides diagnostic output for troubleshooting
- Complies with ISO/IEC 27001:2022 Annex A 8.32 (change management)
Usage: node scripts/generate-news-indexes.js
Generates: news/index.html, news/index_sv.html, ... news/index_zh.html
- Version: 3.0.0
- License: Apache-2.0
- See:
  - NEWS_WORKFLOW_EXECUTIVE_SUMMARY.md for context
  - generate-news-enhanced.js (produces articles consumed by this indexer)
  - html-utils.js (provides HTML entity escaping)
  - WCAG 2.1 AA accessibility standards
Methods
(inner) buildSlugToLanguagesMap(articlesByLang) → {Object}
Build map of base slugs to available languages for cross-language discovery
Detects articles with the same base slug (e.g., "2026-02-14-week-ahead") across different languages and maps slug -> [language codes].
Parameters:
| Name | Type | Description |
|---|---|---|
| articlesByLang | Object | Articles grouped by language |
Returns:
Map of slug -> array of language codes
- Type: Object
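A minimal sketch of the slug-to-languages mapping, assuming (hypothetically) that each article object exposes its language-independent base slug as a `slug` property; the script's real field name may differ. The result is a plain object keyed by slug, matching the documented `Object` return type.

```javascript
/**
 * Sketch: map each base slug to the language codes that contain it.
 * Assumes (hypothetically) every article carries a `slug` property.
 */
function buildSlugToLanguagesMap(articlesByLang) {
  const slugToLangs = {};
  for (const [lang, articles] of Object.entries(articlesByLang)) {
    for (const article of articles) {
      // Create the entry on first sight, then record each language once
      const langs = (slugToLangs[article.slug] ??= []);
      if (!langs.includes(lang)) langs.push(lang);
    }
  }
  return slugToLangs;
}

const map = buildSlugToLanguagesMap({
  en: [{ slug: "2026-02-14-week-ahead" }],
  sv: [{ slug: "2026-02-14-week-ahead" }, { slug: "2026-02-13-vote" }],
});
console.log(map["2026-02-14-week-ahead"]); // [ 'en', 'sv' ]
```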
(inner) classifyArticleType()
Classify article type based on content and filename. Supports detection keywords in all 14 languages.
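One way keyword-based classification could look. The four type names follow the categories listed under Article Discovery & Categorization; the `(filename, content)` signature, the priority order, and the keyword lists are illustrative assumptions, whereas the real script covers all 14 languages.

```javascript
// Hypothetical keyword lists (a few English/Swedish/German samples per type).
const TYPE_KEYWORDS = {
  breaking: ["breaking", "senaste nytt", "eilmeldung"],
  prospective: ["week-ahead", "veckan som kommer", "committee-agenda"],
  retrospective: ["vote", "votering", "decision", "beslut"],
  analysis: ["analysis", "analys", "analyse"],
};

// Checked in this order, so an urgent "breaking" match wins (assumption).
const TYPE_PRIORITY = ["breaking", "prospective", "retrospective", "analysis"];

function classifyArticleType(filename, content = "") {
  // Search filename and body text together, case-insensitively
  const haystack = `${filename} ${content}`.toLowerCase();
  for (const type of TYPE_PRIORITY) {
    if (TYPE_KEYWORDS[type].some((kw) => haystack.includes(kw))) return type;
  }
  return null; // no keyword matched
}
```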
(inner) extractDateFromJSONLD(html) → {string|null}
Extract date from JSON-LD structured data
Parameters:
| Name | Type | Description |
|---|---|---|
| html | string | HTML content |
Returns:
Date in YYYY-MM-DD format or null
- Type: string | null
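A sketch of JSON-LD date extraction, assuming the date lives in a `datePublished` property of a top-level object (or array element) inside a `<script type="application/ld+json">` block. Nested `@graph` structures are not handled in this sketch.

```javascript
function extractDateFromJSONLD(html) {
  // Match every JSON-LD script block; attribute quoting may vary
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed JSON-LD instead of failing the whole scan
    }
    // A block may hold a single object or an array of objects
    for (const node of Array.isArray(data) ? data : [data]) {
      const raw = node && node.datePublished; // assumed property name
      const m = typeof raw === "string" && raw.match(/^(\d{4}-\d{2}-\d{2})/);
      if (m) return m[1];
    }
  }
  return null;
}
```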
(inner) extractFromFilename()
Extract date from filename (YYYY-MM-DD format)
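A minimal filename-date extractor, assuming the `YYYY-MM-DD` prefix convention seen in slugs such as `2026-02-14-week-ahead`. The `filename` parameter, the calendar range check, and the `null` return for no match are assumptions, since the documented signature lists no parameters.

```javascript
function extractFromFilename(filename) {
  // First YYYY-MM-DD run anywhere in the name
  const m = /(\d{4})-(\d{2})-(\d{2})/.exec(filename);
  if (!m) return null;
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Reject obviously invalid calendar values such as 2026-13-40
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  return m[0];
}
```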
(inner) extractMetaContent()
Extract content from meta tags
The meta-tag regex handles apostrophes and other special characters inside the content value.
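The apostrophe handling can be illustrated with a backreference: capture whichever quote opens the `content` value and accept every character except that same quote. The `(html, attr, name)` signature is hypothetical, and this sketch assumes the `name`/`property` attribute appears before `content` in the tag.

```javascript
function extractMetaContent(html, attr, name) {
  // Escape regex metacharacters in the attribute value (e.g. "og:title")
  const safeName = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (["']) captures the opening quote; the value may contain anything but
  // that same quote, so "Sweden's" survives inside a double-quoted value
  const re = new RegExp(
    `<meta\\s+[^>]*${attr}=["']${safeName}["'][^>]*content=(["'])((?:(?!\\1)[\\s\\S])*)\\1`,
    "i"
  );
  const m = html.match(re);
  return m ? m[2] : null;
}
```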
(inner) extractTags()
Extract tags from article:tag meta tags
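A sketch for collecting every `article:tag` value, using the same quote-aware pattern with the global flag so `String.prototype.matchAll` yields one match per tag. Trimming and de-duplication are assumptions about desirable behavior.

```javascript
function extractTags(html) {
  const tags = [];
  // One match per <meta property="article:tag" content="..."> element
  const re = /<meta\s+[^>]*property=["']article:tag["'][^>]*content=(["'])((?:(?!\1)[\s\S])*)\1/gi;
  for (const m of html.matchAll(re)) {
    const tag = m[2].trim();
    if (tag && !tags.includes(tag)) tags.push(tag); // skip empties and duplicates
  }
  return tags;
}
```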
(inner) extractTitle()
Extract the article title from the page HTML
(inner) extractTopics()
Extract topics from article tags. Supports topic detection keywords in all 14 languages.
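A sketch of tag-to-topic mapping using the six categories listed under Topic Categories. The keyword lists are a small hypothetical sample (English and Swedish only), not the script's real multilingual tables; substring matching lets a compound such as `Försvarsutskottet` hit both `defense` and `committees`.

```javascript
// Hypothetical keyword sample per topic (not the script's real lists).
const TOPIC_KEYWORDS = {
  parliament: ["riksdag", "parliament", "parlament"],
  government: ["government", "regering", "cabinet"],
  defense: ["defense", "defence", "försvar", "nato"],
  environment: ["climate", "klimat", "emission", "environment", "miljö"],
  committees: ["committee", "utskott"],
  legislation: ["bill", "proposition", "legislation", "lagstiftning"],
};

function extractTopics(tags) {
  const lowered = tags.map((t) => t.toLowerCase());
  // A topic applies when any tag contains one of its keywords
  return Object.keys(TOPIC_KEYWORDS).filter((topic) =>
    TOPIC_KEYWORDS[topic].some((kw) => lowered.some((tag) => tag.includes(kw)))
  );
}
```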
(inner) generateAllIndexes()
Main entry point: scans published articles and writes one index page per supported language
(inner) generateAvailableLanguages(languages, currentLang) → {string}
Generate "Available in" text with language badges
Parameters:
| Name | Type | Description |
|---|---|---|
| languages | Array | Array of language codes |
| currentLang | string | Current display language |
Returns:
HTML for available languages display
- Type: string
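A sketch of the "Available in" output. It assumes the current language is omitted from the list, uses a hypothetical three-entry `LANGUAGE_NAMES` table, and defines a local `escapeHtml` in place of the html-utils.js helper, whose export name is not shown here. The English label would be localized in practice.

```javascript
// Hypothetical subset of display names; unknown codes fall back to uppercase.
const LANGUAGE_NAMES = { en: "English", sv: "Svenska", de: "Deutsch" };

// Local stand-in for the escaping helper in html-utils.js (assumption)
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function generateAvailableLanguages(languages, currentLang) {
  // Nothing to advertise if the article exists only in the current language
  const others = languages.filter((lang) => lang !== currentLang);
  if (others.length === 0) return "";
  const badges = others
    .map((lang) => {
      const label = LANGUAGE_NAMES[lang] ?? lang.toUpperCase();
      return `<span class="lang-badge" lang="${escapeHtml(lang)}">${escapeHtml(label)}</span>`;
    })
    .join(" ");
  return `<div class="available-languages">Available in: ${badges}</div>`;
}
```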
(inner) generateHreflangTags()
Generate hreflang tags for all languages
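A hreflang sketch that follows the documented file naming (news/index.html for English, news/index_<code>.html otherwise). `BASE_URL` is a placeholder, and pointing `x-default` at the English index is an assumption.

```javascript
const LANGUAGES = ["en", "sv", "da", "no", "fi", "de", "fr", "es", "nl", "ar", "he", "ja", "ko", "zh"];
const BASE_URL = "https://example.org/news"; // placeholder, not the real site URL

function indexFileFor(lang) {
  // English uses index.html; every other language uses index_<code>.html
  return lang === "en" ? "index.html" : `index_${lang}.html`;
}

function generateHreflangTags() {
  const tags = LANGUAGES.map(
    (lang) => `<link rel="alternate" hreflang="${lang}" href="${BASE_URL}/${indexFileFor(lang)}">`
  );
  // x-default: the version served when no listed language matches (assumed English)
  tags.push(`<link rel="alternate" hreflang="x-default" href="${BASE_URL}/index.html">`);
  return tags.join("\n");
}
```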
(inner) generateIndexHTML(langKey, languageArticles, allArticlesByLang)
Generate index HTML for a specific language
Each language index displays only articles in that specific language. Articles include metadata about which other languages they're available in for cross-language discovery indicators.
Parameters:
| Name | Type | Description |
|---|---|---|
| langKey | string | Language code (en, sv, etc.) |
| languageArticles | Array | Articles in the target language only |
| allArticlesByLang | Object | All articles grouped by language |
(inner) generateLanguageBadge(lang, isRTL) → {string}
Generate language badge HTML for an article
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| lang | string | | Language code (e.g., 'en', 'sv') |
| isRTL | boolean | false | Whether the current display language is RTL |
Returns:
HTML for language badge
- Type: string
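A possible badge implementation. The CSS class name and the use of `dir="ltr"` to keep the Latin language code from reordering inside RTL pages are assumptions, not confirmed behavior of the script.

```javascript
function generateLanguageBadge(lang, isRTL = false) {
  // Keep only characters that are safe inside an attribute and text node
  const code = String(lang).toLowerCase().replace(/[^a-z-]/g, "");
  // On RTL pages, isolate the Latin code so it does not reorder with neighbours
  const dirAttr = isRTL ? ' dir="ltr"' : "";
  return `<span class="language-badge" lang="${code}"${dirAttr}>${code.toUpperCase()}</span>`;
}
```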
(inner) generateLanguageNotice()
Generate language availability notice for non-EN/SV indexes
(inner) generateLanguageSwitcherNav(currentLang) → {string}
Generate language switcher navigation for news index pages
Parameters:
| Name | Type | Description |
|---|---|---|
| currentLang | string | Current language code |
Returns:
HTML for language switcher nav
- Type: string
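A switcher sketch reusing the documented index file names. `SWITCHER_LANGUAGES` is a hypothetical subset of the 14 languages, and marking the active link with `aria-current="page"` is one accessible choice, not necessarily the script's.

```javascript
// Hypothetical subset of the 14 supported languages with native labels
const SWITCHER_LANGUAGES = { en: "English", sv: "Svenska", ar: "العربية" };

function generateLanguageSwitcherNav(currentLang) {
  const links = Object.entries(SWITCHER_LANGUAGES).map(([code, label]) => {
    const href = code === "en" ? "index.html" : `index_${code}.html`;
    // Mark the active page for assistive technology
    const current = code === currentLang ? ' aria-current="page"' : "";
    return `<a href="${href}" hreflang="${code}" lang="${code}"${current}>${label}</a>`;
  });
  return `<nav class="language-switcher" aria-label="Language">${links.join("")}</nav>`;
}
```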
(inner) generateRTLStyles()
Generate minimal RTL-specific styles. All other styles now live in styles.css under the .news-page scope.
(inner) getAllArticlesWithLanguageInfo(articlesByLang) → {Array}
Get all articles with language information for cross-language discovery
NOTE: This function is currently UNUSED in production but preserved for potential future use. It was implemented for Issue #155's cross-language discovery feature but the requirement changed to language-specific filtering (each index shows only articles in its target language).
If cross-language discovery is needed again, this function can be used instead of passing articlesByLang[langKey] to generateIndexHTML() on line 958.
This function collects ALL articles from all languages and enriches each with metadata about which language versions are available for the same slug.
Parameters:
| Name | Type | Description |
|---|---|---|
| articlesByLang | Object | Articles grouped by language |
- Deprecated: Currently unused; kept for potential future cross-language discovery
Returns:
All articles with availableLanguages field
- Type: Array
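Should cross-language discovery return, the function's behavior can be sketched as two passes over `articlesByLang`. A `slug` property on each article is assumed (hypothetical field name), and the added `lang` field is illustrative.

```javascript
function getAllArticlesWithLanguageInfo(articlesByLang) {
  // Pass 1: slug -> set of languages (same idea as buildSlugToLanguagesMap)
  const slugToLangs = new Map();
  for (const [lang, articles] of Object.entries(articlesByLang)) {
    for (const { slug } of articles) {
      if (!slugToLangs.has(slug)) slugToLangs.set(slug, new Set());
      slugToLangs.get(slug).add(lang);
    }
  }
  // Pass 2: copy every article and attach the languages sharing its slug
  return Object.entries(articlesByLang).flatMap(([lang, articles]) =>
    articles.map((article) => ({
      ...article,
      lang, // illustrative field (assumption)
      availableLanguages: [...slugToLangs.get(article.slug)],
    }))
  );
}
```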
(inner) normalizeDateString(dateStr) → {string}
Normalize a date string to YYYY-MM-DD format. Handles full ISO timestamps, simple dates, and similar inputs.
Parameters:
| Name | Type | Description |
|---|---|---|
| dateStr | string | Date string in various formats |
Returns:
Date in YYYY-MM-DD format
- Type: string
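A sketch of the normalization: take the leading `YYYY-MM-DD` from ISO-style input directly, otherwise fall back to `new Date()`. Reading local calendar fields avoids the off-by-one day that `toISOString()` can introduce for strings parsed as local midnight. Returning unparseable input unchanged is an assumption.

```javascript
function normalizeDateString(dateStr) {
  const s = String(dateStr).trim();
  // Already YYYY-MM-DD, or an ISO timestamp that starts with it
  const iso = /^(\d{4}-\d{2}-\d{2})/.exec(s);
  if (iso) return iso[1];
  // Other formats: let the JS date parser try (non-ISO strings parse as local time)
  const d = new Date(s);
  if (Number.isNaN(d.getTime())) return s; // leave unparseable input unchanged (assumption)
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}
```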
(inner) parseArticleMetadata()
Parse HTML file to extract article metadata
(inner) scanNewsArticles()
Scan news directory and group articles by language
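A sketch of scanning and grouping, assuming (hypothetically) that article files are named `<date>-<slug>-<lang>.html` and that generated `index*.html` pages should be skipped. It walks subdirectories with `fs.readdirSync(dir, { withFileTypes: true })` rather than the newer `recursive` option, so it also runs on older Node versions.

```javascript
const fs = require("fs");
const path = require("path");

const SUPPORTED_LANGS = new Set([
  "en", "sv", "da", "no", "fi", "de", "fr", "es", "nl", "ar", "he", "ja", "ko", "zh",
]);

// Recursively collect .html file paths under dir
function listHtmlFiles(dir) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listHtmlFiles(full));
    else if (entry.isFile() && entry.name.endsWith(".html")) out.push(full);
  }
  return out;
}

// Hypothetical convention: the language code is the final "-xx" before ".html"
function languageFromFilename(filename) {
  const m = /-([a-z]{2})\.html$/.exec(filename);
  return m && SUPPORTED_LANGS.has(m[1]) ? m[1] : null;
}

function groupByLanguage(filenames) {
  const byLang = {};
  for (const name of filenames) {
    if (name.startsWith("index")) continue; // generated index pages, not articles
    const lang = languageFromFilename(name);
    if (lang) (byLang[lang] ??= []).push(name);
  }
  return byLang;
}

function scanNewsArticles(newsDir) {
  return groupByLanguage(listHtmlFiles(newsDir).map((f) => path.basename(f)));
}
```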