Module: Validation/TranslationQuality

INTELLIGENCE OPERATIVE PERSPECTIVE

This module verifies that news articles published in non-Swedish languages are fully translated before they are released. In multilingual intelligence dissemination, translation completeness acts as a quality gate: a partially translated article damages credibility with international audiences and erodes reader trust.

SUPPORTED LANGUAGES (14 Total):

  • English and Nordic: English (EN), Danish (DA), Norwegian (NO), Finnish (FI)
  • European: German (DE), French (FR), Spanish (ES), Dutch (NL)
  • Middle Eastern: Arabic (AR), Hebrew (HE)
  • Asian: Japanese (JA), Korean (KO), Chinese Simplified (ZH)
  • Swedish (SV) - Baseline/Development Language

TRANSLATION VALIDATION MECHANISM:

The validator identifies untranslated content by detecting markers left on Swedish text inside non-Swedish articles:

  • Marker: HTML span elements with the data-translate="true" attribute
  • Meaning: the content failed machine translation or was manually marked as untranslated
  • Purpose: prevents accidental publication of incomplete translations
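
As an illustration of the marker check described above, here is a minimal sketch that extracts the text of marked spans with a regular expression. The helper name findUntranslatedSpans and the pattern are assumptions made for this example, not the module's actual code; a regex also does not handle nested span elements, so an HTML parser may be a better fit in practice.

```javascript
// Hypothetical helper, for illustration only.
// Matches <span ... data-translate="true" ...>text</span> and captures the inner text.
const MARKER_PATTERN =
  /<span\b[^>]*\bdata-translate\s*=\s*["']true["'][^>]*>([\s\S]*?)<\/span>/gi;

function findUntranslatedSpans(html) {
  const samples = [];
  for (const match of html.matchAll(MARKER_PATTERN)) {
    // Drop any nested tags and collapse whitespace so the sample is readable.
    const text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    samples.push(text);
  }
  return samples;
}

const html = '<p><span data-translate="true">Riksdagen röstade</span> on Tuesday.</p>';
console.log(findUntranslatedSpans(html)); // [ 'Riksdagen röstade' ]
```

An empty result means no markers were found in the given HTML.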

DETECTION ALGORITHM:

  1. Identify the article language from the filename pattern (lang-code)
  2. If a non-Swedish language is detected, scan the HTML for translation markers
  3. Collect sample untranslated strings for error reporting
  4. Calculate the translation completion percentage
  5. Fail validation if any untranslated content is found
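
The five steps can be sketched roughly as follows. This is not the module's implementation: the function name validateArticle, the assumed filename suffix ("-en.html"), the fallback to Swedish for unmatched names, and the choice of "all span elements" as the denominator for the completion percentage are all assumptions made for illustration.

```javascript
// Sketch only: names, filename pattern, and denominator are assumptions.
const LANG_SUFFIX = /-([a-z]{2})\.html$/i; // assumed pattern, e.g. "budget-en.html"
const MARKED_SPAN =
  /<span\b[^>]*\bdata-translate\s*=\s*["']true["'][^>]*>([\s\S]*?)<\/span>/gi;
const ANY_SPAN = /<span\b/gi;

function validateArticle(filename, html, maxSamples = 3) {
  // Step 1: language code from the filename; unmatched names are treated as Swedish.
  const match = LANG_SUFFIX.exec(filename);
  const lang = match ? match[1].toLowerCase() : 'sv';
  if (lang === 'sv') {
    return { lang, ok: true, untranslated: 0, completion: 100, samples: [] };
  }

  // Step 2: scan the HTML for translation markers.
  const marked = [...html.matchAll(MARKED_SPAN)];

  // Step 3: keep a few samples, truncated to 80 characters.
  const samples = marked.slice(0, maxSamples).map((m) => m[1].trim().slice(0, 80));

  // Step 4: completion = share of span elements that carry no marker (assumption).
  const total = (html.match(ANY_SPAN) || []).length;
  const completion =
    total === 0 ? 100 : Math.round(((total - marked.length) / total) * 100);

  // Step 5: any marker at all fails the article.
  return { lang, ok: marked.length === 0, untranslated: marked.length, completion, samples };
}

const result = validateArticle(
  'budget-en.html',
  '<span>Done</span><span data-translate="true">Ej klar</span>'
);
console.log(result.ok, result.completion, result.samples);
```

Because step 5 fails on a single marker, the percentage is informational only; it helps editors gauge how much work remains.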

QUALITY STANDARDS:

  • 100% Translation: All content translated to target language
  • Zero Markers: No data-translate="true" attributes present
  • Consistent Language: No code-switching or mixed Swedish/target language
  • Proper Character Encoding: UTF-8 for all special characters

OPERATIONAL INTEGRATION:

  • Pre-publication CI/CD gate (blocks deployment if incomplete)
  • Part of automated article generation pipeline
  • Runs after machine translation, before human review
  • Provides detailed error samples for editor investigation

ERROR REPORTING:

  • Exit code 0: All articles fully translated
  • Exit code 1: Untranslated content found or errors occurred

Sample error output includes:

  • Article filename and language code
  • Number and percentage of untranslated segments
  • Sample untranslated text snippets (first 80 chars each)
  • Specific location information for manual fixing
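
To make the report contents concrete, the sketch below formats one result and derives the process exit code. The result shape (file, lang, untranslated, total, samples with text and offset) and the function names formatReport and exitCodeFor are hypothetical; the actual output layout of the script may differ.

```javascript
// Hypothetical report formatter; field names and layout are illustrative.
function formatReport(result) {
  const percent =
    result.total > 0
      ? ((result.untranslated / result.total) * 100).toFixed(1)
      : '0.0';
  const lines = [
    `${result.file} [${result.lang}]`,
    `  untranslated: ${result.untranslated} of ${result.total} segments (${percent}%)`,
  ];
  for (const sample of result.samples) {
    // Show at most the first 80 characters of each snippet, with its position.
    lines.push(`  - offset ${sample.offset}: "${sample.text.slice(0, 80)}"`);
  }
  return lines.join('\n');
}

// Exit code 1 if any article has untranslated content or hit an error, else 0.
function exitCodeFor(results) {
  return results.some((r) => r.untranslated > 0 || r.error) ? 1 : 0;
}

const results = [
  {
    file: 'news/budget-en.html',
    lang: 'en',
    untranslated: 1,
    total: 4,
    samples: [{ text: 'Ej klar', offset: 120 }],
  },
];
console.log(formatReport(results[0]));
console.log('exit code:', exitCodeFor(results));
```

In a CLI wrapper, the returned value would typically be assigned to process.exitCode so that the CI/CD gate sees the failure.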

INTELLIGENCE APPLICATIONS:

  • Prevents distribution of partially-translated intelligence briefings
  • Ensures consistent messaging across language editions
  • Catches machine translation failures before publication
  • Supports quality metrics for translation services

PERFORMANCE CHARACTERISTICS:

  • Single article validation: ~20ms
  • Batch validation (100 articles): ~2 seconds
  • Memory usage: Minimal (one article at a time)
  • Parallelizable: No state, can run on multiple files

ERROR HANDLING:

  • File not found: Reports with filepath and exit code 1
  • File read errors: Detailed error message and exit code 1
  • Invalid UTF-8: Logs encoding warning but continues
  • Graceful degradation: Validates what can be read
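
One way these cases could be handled is sketched below, assuming Node.js with its built-in fs module and the global TextDecoder. The function name readArticle and the returned object shape are assumptions for this example; the caller would be responsible for turning a failed read into exit code 1.

```javascript
const fs = require('node:fs');

// Hypothetical reader: returns either an error or the HTML plus any warnings.
function readArticle(filepath) {
  let bytes;
  try {
    bytes = fs.readFileSync(filepath);
  } catch (err) {
    // Missing file and other read failures are both reported with the path.
    const reason =
      err.code === 'ENOENT' ? 'File not found' : `Read error (${err.message})`;
    return { ok: false, error: `${reason}: ${filepath}` };
  }

  const warnings = [];
  try {
    // A fatal decoder throws on malformed UTF-8 instead of silently replacing bytes.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    warnings.push(`Invalid UTF-8 in ${filepath}; continuing with replacement characters`);
  }

  // Graceful degradation: decode leniently and validate whatever can be read.
  return { ok: true, html: bytes.toString('utf8'), warnings };
}

console.log(readArticle('does-not-exist-en.html'));
```

Separating the strict decode (for the warning) from the lenient decode (for the content) lets validation continue on a file with a few bad bytes.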

GDPR COMPLIANCE:

  • No personal data processing (content pattern matching only)
  • No data storage (validates on-the-fly, discards after check)
  • Translation completeness supports data accuracy requirement
  • Audit log provides compliance evidence

Version:
  • 2.0.0
Since:
  • 2024-08-10
Author:
  • Hack23 AB (Multilingual Intelligence & Quality Assurance)
License:
  • Apache-2.0
See:
  • scripts/translate-articles-llm.js (Translation Generation)
  • tests/validate-news-translations.test.js (Test Suite)
  • Issue #121 (Translation Quality Gates)

Methods

(inner) checkFileForUntranslatedContent()

Check if a file contains untranslated Swedish content markers

(inner) getAllHtmlFiles()

Get all HTML files in a directory (recursive)
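
A recursive listing of this kind might look like the sketch below. It is written under the name listHtmlFiles to make clear that it is an illustration and not the source of getAllHtmlFiles(); the case-insensitive ".html" match and the sorted output are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Illustrative recursive walk; not the module's own getAllHtmlFiles().
function listHtmlFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Descend into subdirectories and merge their results.
      files.push(...listHtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.toLowerCase().endsWith('.html')) {
      files.push(fullPath);
    }
  }
  // Sorted output keeps reports stable between runs.
  return files.sort();
}
```

Calling listHtmlFiles on a directory returns the full paths of every HTML file beneath it; a missing directory makes fs.readdirSync throw, which the caller would need to handle.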

(inner) getLanguageCode()

Determine language code from filename

(inner) validateNewsTranslations()

Main validation function
