INTELLIGENCE OPERATIVE PERSPECTIVE
This module verifies that news articles published in languages other than Swedish are fully translated, blocking partially translated articles that could damage credibility with international audiences. In multilingual intelligence dissemination, translation completeness acts as a quality gate that maintains reader trust.
SUPPORTED LANGUAGES (14 Total):
- English and Nordic: English (EN), Danish (DA), Norwegian (NO), Finnish (FI)
- European: German (DE), French (FR), Spanish (ES), Dutch (NL)
- Middle Eastern: Arabic (AR), Hebrew (HE)
- Asian: Japanese (JA), Korean (KO), Chinese Simplified (ZH)
- Swedish (SV) - Baseline/Development Language
TRANSLATION VALIDATION MECHANISM: The validator identifies untranslated content by detecting markers left on Swedish text embedded in non-Swedish articles:
- Marker: HTML span elements with the data-translate="true" attribute
- Meaning: the content failed machine translation or was manually marked as untranslated
- Purpose: prevents accidental publication of incomplete translations
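The marker check can be sketched as follows. This is a minimal illustration, not the module's code: the regular expression and the name `findUntranslatedSpans` are assumptions, and it assumes marked spans contain plain text without nested span elements. A production validator might prefer a real HTML parser.

```javascript
// Hypothetical helper: collect the text of <span data-translate="true"> elements.
// Assumes marked spans do not contain nested <span> elements.
const MARKER_RE =
  /<span\b[^>]*\bdata-translate\s*=\s*["']true["'][^>]*>([\s\S]*?)<\/span>/gi;

function findUntranslatedSpans(html) {
  const found = [];
  for (const match of html.matchAll(MARKER_RE)) {
    // Drop inline tags inside the span and collapse whitespace.
    const text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    found.push(text);
  }
  return found;
}
```

An empty result means no markers were found; any non-empty result would fail the quality gate described here.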
DETECTION ALGORITHM:
- Identify article language from filename pattern (lang-code)
- If non-Swedish language detected, scan HTML for translation markers
- Collect sample untranslated strings for error reporting
- Calculate translation completion percentage
- Fail validation if any untranslated content found
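The steps above can be sketched roughly as below. The filename convention (a trailing `-<code>.html`), the function names, and the idea that the caller supplies a total segment count are all assumptions made for illustration; the source does not specify how the module derives them.

```javascript
// Language codes listed in this document; Swedish (sv) is the baseline.
const SUPPORTED = ['en', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl',
                   'ar', 'he', 'ja', 'ko', 'zh', 'sv'];

// Assumption: filenames end in "-<code>.html", e.g. "budget-review-de.html".
function languageFromFilename(filename) {
  const match = /-([a-z]{2})\.html$/i.exec(filename);
  if (!match) return null;
  const code = match[1].toLowerCase();
  return SUPPORTED.includes(code) ? code : null;
}

// Share of segments that carry no untranslated marker, as a percentage.
function completionPercentage(totalSegments, untranslatedCount) {
  if (totalSegments === 0) return 100;
  return ((totalSegments - untranslatedCount) * 100) / totalSegments;
}

function validateArticle(filename, totalSegments, untranslatedCount) {
  const lang = languageFromFilename(filename);
  // Swedish and unrecognised files are not scanned in this sketch.
  if (lang === null || lang === 'sv') {
    return { lang, skipped: true, valid: true };
  }
  return {
    lang,
    skipped: false,
    completion: completionPercentage(totalSegments, untranslatedCount),
    valid: untranslatedCount === 0, // any untranslated segment fails the gate
  };
}
```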
QUALITY STANDARDS:
- 100% Translation: All content translated to target language
- Zero Markers: No data-translate="true" attributes present
- Consistent Language: No code-switching or mixed Swedish/target language
- Proper Character Encoding: UTF-8 for all special characters
OPERATIONAL INTEGRATION:
- Pre-publication CI/CD gate (blocks deployment if incomplete)
- Part of automated article generation pipeline
- Runs after machine translation, before human review
- Provides detailed error samples for editor investigation
ERROR REPORTING:
- Exit code 0: All articles fully translated
- Exit code 1: Untranslated content found or errors occurred
Sample error output includes:
- Article filename and language code
- Number and percentage of untranslated segments
- Sample untranslated text snippets (first 80 chars each)
- Specific location information for manual fixing
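As an illustration of the report contents listed above, the sketch below formats one failing article and derives the process exit code. The result shape (`file`, `lang`, `untranslated`, `totalSegments`), the limit of three samples, and the output layout are hypothetical; only the 80-character snippet length and the 0/1 exit codes come from this document. Location information is omitted.

```javascript
const MAX_SAMPLE_LENGTH = 80; // snippet length stated in this document
const MAX_SAMPLES = 3;        // assumed sample limit, for illustration only

// result: { file, lang, untranslated: string[], totalSegments } (hypothetical shape)
// Assumes totalSegments > 0 whenever untranslated segments exist.
function formatFailure(result) {
  const count = result.untranslated.length;
  const percent = ((count * 100) / result.totalSegments).toFixed(1);
  const lines = [
    `${result.file} [${result.lang}]: ${count} untranslated segment(s) (${percent}%)`,
  ];
  for (const text of result.untranslated.slice(0, MAX_SAMPLES)) {
    lines.push(`  - ${text.slice(0, MAX_SAMPLE_LENGTH)}`);
  }
  return lines.join('\n');
}

// 0 when every article is clean, 1 when any article has untranslated content.
function exitCodeFor(results) {
  return results.some((r) => r.untranslated.length > 0) ? 1 : 0;
}
```

A command-line wrapper could pass the value from `exitCodeFor` to `process.exit` once all reports have been printed.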
INTELLIGENCE APPLICATIONS:
- Prevents distribution of partially-translated intelligence briefings
- Ensures consistent messaging across language editions
- Catches machine translation failures before publication
- Supports quality metrics for translation services
PERFORMANCE CHARACTERISTICS:
- Single article validation: ~20ms
- Batch validation (100 articles): ~2 seconds
- Memory usage: Minimal (one article at a time)
- Parallelizable: No state, can run on multiple files
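Because validation is described as stateless, files can be checked concurrently. The sketch below shows one way to do that with a bounded number of in-flight checks; `validateInParallel`, the `validateOne` callback, and the default limit of 8 are assumptions, not part of the module.

```javascript
// Runs a stateless async validator over many files with bounded concurrency.
// `limit` is assumed to be at least 1; results keep the order of `files`.
async function validateInParallel(files, validateOne, limit = 8) {
  const results = new Array(files.length);
  let next = 0;

  async function worker() {
    // Each worker repeatedly claims the next unprocessed index.
    while (next < files.length) {
      const index = next++;
      results[index] = await validateOne(files[index]);
    }
  }

  const workerCount = Math.min(limit, files.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Bounding concurrency keeps memory use close to the one-article-at-a-time profile noted above while still overlapping file I/O.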
ERROR HANDLING:
- File not found: Reports with filepath and exit code 1
- File read errors: Detailed error message and exit code 1
- Invalid UTF-8: Logs encoding warning but continues
- Graceful degradation: Validates what can be read
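One way to realise this behaviour is sketched below: read failures are returned as errors, while invalid UTF-8 produces a warning and a lenient decode so validation can continue. The function names and the returned object shape are hypothetical; `TextDecoder` with `fatal: true` is the standard mechanism used here to detect bad byte sequences.

```javascript
const fs = require('fs');

// Decode bytes as UTF-8; on invalid input, warn and fall back to lenient decoding.
function decodeWithWarning(bytes, label) {
  try {
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return { text, warnings: [] };
  } catch (err) {
    // Lenient decode replaces invalid sequences with U+FFFD.
    const text = new TextDecoder('utf-8').decode(bytes);
    return { text, warnings: [`${label}: invalid UTF-8 sequence`] };
  }
}

// Read one article; missing or unreadable files are reported, not thrown.
function readArticle(filepath) {
  let bytes;
  try {
    bytes = fs.readFileSync(filepath);
  } catch (err) {
    // The caller is expected to turn this into exit code 1.
    return { ok: false, error: `${filepath}: ${err.message}` };
  }
  return { ok: true, ...decodeWithWarning(bytes, filepath) };
}
```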
GDPR COMPLIANCE:
- No personal data processing (content pattern matching only)
- No data storage (validates on-the-fly, discards after check)
- Translation completeness supports data accuracy requirement
- Audit log provides compliance evidence
Version: 2.0.0
Since: 2024-08-10
License: Apache-2.0
See:
- scripts/translate-articles-llm.js (Translation Generation)
- tests/validate-news-translations.test.js (Test Suite)
- Issue #121 (Translation Quality Gates)
Methods
(inner) checkFileForUntranslatedContent()
Check if a file contains untranslated Swedish content markers
(inner) getAllHtmlFiles()
Get all HTML files in a directory (recursive)
(inner) getLanguageCode()
Determine language code from filename
(inner) validateNewsTranslations()
Main validation function
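The implementations of these methods are not shown on this page. As a rough idea of what a recursive collector such as getAllHtmlFiles() might do, here is a sketch under a different, hypothetical name (`listHtmlFiles`); the case-insensitive `.html` match and the sorted output are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical recursive collector: returns paths of all .html files under dir.
function listHtmlFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listHtmlFiles(fullPath)); // descend into subdirectories
    } else if (entry.isFile() && entry.name.toLowerCase().endsWith('.html')) {
      files.push(fullPath);
    }
  }
  return files.sort(); // stable order for reproducible reports
}
```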