Module: HTML Utilities - DOM Manipulation and XSS Prevention Framework

Essential utility library providing safe HTML generation and sanitization functions used across the political intelligence platform. Prevents XSS vulnerabilities while enabling dynamic content generation for multi-language news articles and dashboards.

Core Functionality:

  • HTML entity encoding: Converts &, <, >, ", ' to their safe HTML entity equivalents
  • XSS attack prevention: Blocks script injection through user-generated or external content
  • Safe HTML generation: Enables dynamic DOM construction without direct innerHTML risks
  • JSON-LD integration: Escapes special characters for embedded structured data

Security Architecture:

  • Whitelist-based entity encoding (only necessary entities escaped)
  • Prevents DOM-based XSS attacks through proper output encoding
  • Supports generation of safe HTML templates for 14 languages
  • Compatible with content security policy (CSP) headers

Integration Usage Across Codebase:

  • generate-news-indexes.js: Escapes article titles and metadata in dynamic index pages
  • generate-news-backport.js: Sanitizes article content during legacy migration
  • generate-sitemap.js: Escapes URL parameters and article descriptions
  • generate-news-enhanced.js: Handles safe HTML generation for multi-language articles
  • validate-articles-playwright.js: Validates generated HTML content integrity

OWASP Security Standards:

  • Implements Output Encoding from OWASP Top 10 (A03:2021 - Injection)
  • Prevents Stored XSS through proper entity escaping
  • Complies with OWASP API Security #2 (Broken Authentication/Authorization)

Data Protection:

  • No sensitive data storage; pure utility functions
  • Operates on publicly available political content
  • Complies with ISO 27001:2022 A.13.1.3 (segregation of networks)
  • Supports GDPR Article 32 (security of processing)

Multi-Language Support:

  • Handles UTF-8 text across all supported scripts (Latin, CJK, Arabic, Hebrew, etc.)
  • Preserves linguistic integrity while ensuring security
  • Supports bidirectional text (Hebrew, Arabic) in HTML attributes

Performance Considerations:

  • Lightweight regex-based character replacement
  • Minimal memory footprint for bulk article processing
  • Suitable for high-volume content generation pipelines

Functions:

  • escapeHtml(text): Escapes HTML special characters for safe inclusion in HTML/JSON-LD

Usage Example: import { escapeHtml } from './html-utils.js'; const safeTitle = escapeHtml(userProvidedTitle); const jsonLd = "headline": "${escapeHtml(articleTitle)}";

Version:
  • 2.1.0
Author:
  • Hack23 AB (Security Team)
License:
  • Apache-2.0
Source:
See:
  • OWASP Output Encoding
  • CWE-79: Improper Neutralization of Input During Web Page Generation
  • ISO 27001:2022 A.13.1.3 - Network segregation
  • GDPR Article 32 - Security of processing

Methods

(static) escapeHtml(text) → {string}

Escape HTML special characters for safe inclusion in HTML/JSON-LD. Prevents XSS by converting &, <, >, ", ' to their HTML entity equivalents.

Parameters:
Name Type Description
text string

Raw text to escape

Source:
Returns:

Escaped text safe for HTML insertion

Type
string