Module: Infrastructure/WorkflowOrchestration

INTELLIGENCE OPERATIVE PERSPECTIVE

This module coordinates three independent news generation workflows that run on different schedules. Shared workflow state prevents duplicate article generation, which would waste computational and editorial resources, and keeps editorial output consistent. In intelligence operations terms, state management is what prevents information redundancy.

WORKFLOW ARCHITECTURE: The platform operates three independent content generation workflows:

  1. Realtime Monitor (news-realtime-monitor.md)
     Schedule: 2x daily (morning + afternoon)
     Content: Event-driven breaking news, voting updates, crisis response
     Intelligence value: Rapid notification of parliamentary surprises
     Latency: Real-time (5-15 minute response to events)

  2. Evening Analysis (news-evening-analysis.md)
     Schedule: Daily at 17:00 (5 PM Swedish time)
     Content: Deep analytical synthesis, international context, forward assessment
     Intelligence value: End-of-day intelligence briefing format
     Latency: Structured analysis (1-2 hour research + writing)

  3. Article Generators (news-article-generator.md)
     Schedule: Variable (triggered by content calendar or on-demand)
     Content: Committee reports, motions, propositions, week-ahead
     Intelligence value: Systematic coverage of all parliamentary products
     Latency: Scheduled batch processing (hourly to daily)

DEDUPLICATION FRAMEWORK: The coordinator prevents duplicate article generation using similarity analysis:

  • Similarity Threshold: 70%
    Computes Levenshtein distance on article titles and keyword sets
    Articles more than 70% similar are considered duplicates
    Prevents wasted generation of already-covered topics

  • Time-Window Filtering: 6 hours
    Checks whether a similar article was generated in the last 6 hours
    Allows coverage of the same topic once sufficient time has passed
    Prevents rapid-fire duplicates while allowing topic revisits

  • Topic-Based Tracking
    Logs article topics (votes, bills, committees, etc.)
    Enables intelligent filtering at generation time
    Supports trending topic analysis
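
The threshold and time-window rules above can be sketched as follows. This is a minimal illustration, not the coordinator's actual code: the function names, the (title, timestamp) input shape, and the normalization of edit distance to a 0-1 similarity score are all assumptions, and the keyword-set comparison mentioned above is omitted for brevity.

```python
from datetime import datetime, timedelta, timezone

SIMILARITY_THRESHOLD = 0.70        # from the text: more than 70% similar is a duplicate
DEDUP_WINDOW = timedelta(hours=6)  # from the text: 6-hour time window


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def title_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / longer length, case-insensitive."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(title, now, recent_articles):
    """recent_articles: iterable of (title, generated_at) pairs (hypothetical shape)."""
    for other_title, generated_at in recent_articles:
        if now - generated_at > DEDUP_WINDOW:
            continue  # outside the window, so a topic revisit is allowed
        if title_similarity(title, other_title) > SIMILARITY_THRESHOLD:
            return True
    return False
```

Under these assumptions, "Riksdag votes on budget bill" and "Riksdag votes on the budget bill" differ by four insertions over 32 characters, giving a similarity of 0.875. The second title is therefore blocked if the first was generated within the last 6 hours and allowed after that.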

MCP QUERY CACHING: To avoid redundant API calls to riksdag-regering MCP platform:

  • Cache TTL: 2 hours
    Stores results of expensive queries (voting patterns, full-text search)
    Reduces MCP server load during peak hours
    Ensures consistency across multiple workflow invocations

  • Query Fingerprinting
    Creates a deterministic hash of MCP query parameters
    Enables cache hits even if queries are structured differently
    Supports query normalization

  • Staleness Handling
    Fresh data (within 2 hours) is used for analysis
    Older data triggers an MCP refresh
    Prevents stale intelligence from being published
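
One possible way to combine fingerprinting with TTL-based staleness handling is sketched below. The class name, the tool/parameter shape of a query, and the choice of SHA-256 over canonical JSON are assumptions made for illustration; the source only states that the hash is deterministic and that entries expire after 2 hours.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 2 * 60 * 60  # from the text: 2-hour TTL


def fingerprint(tool: str, params: dict) -> str:
    """Deterministic hash; sorted keys make parameter order irrelevant."""
    canonical = json.dumps({"tool": tool, "params": params},
                           sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class QueryCache:
    """Hypothetical in-memory cache keyed by query fingerprint."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # fingerprint -> (expires_at, result)

    def get(self, tool, params):
        key = fingerprint(tool, params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: the caller should re-query MCP
            return None
        return result

    def put(self, tool, params, result):
        key = fingerprint(tool, params)
        self._entries[key] = (self._clock() + self._ttl, result)
```

Because the JSON is serialized with sorted keys, two queries that list the same parameters in a different order map to the same cache entry, which is one plausible reading of "cache hits even if queries are structured differently".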

STATE MANAGEMENT: Persistent state file (news/metadata/workflow-state.json) tracks:

  • Last workflow execution timestamp and results
  • Recently generated articles (content + timestamp)
  • MCP query cache with expiration times
  • Workflow coordination metadata
  • Running task list for cross-workflow visibility

OPERATIONAL WORKFLOW:

  1. Workflow begins: Load current state from persistent storage
  2. Query Analysis: Check if similar article was recently generated
  3. Cache Check: Retrieve cached MCP queries if available (<2hr old)
  4. Generation: Create new article (or skip if duplicate)
  5. State Update: Log article and update cache
  6. Persistence: Write updated state for next workflow invocation
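
The six steps above can be sketched as a single function. This is a hedged outline only: the state field names (recentArticles, mcpCache, expiresAt, generatedAt) are hypothetical, and is_duplicate, fetch, and generate are caller-supplied stand-ins for the similarity check, the MCP query, and article generation.

```python
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 2 * 60 * 60  # from the text: cached queries are valid for 2 hours


def run_workflow(state_path, title, query_key, is_duplicate, fetch, generate, now=None):
    """Run steps 1-6 once; returns the article, or None if it was skipped."""
    now = time.time() if now is None else now
    path = Path(state_path)

    # 1. Load current state (start empty if no state file exists yet).
    if path.exists():
        state = json.loads(path.read_text(encoding="utf-8"))
    else:
        state = {"recentArticles": [], "mcpCache": {}}

    # 2. Query analysis: skip if a similar article was recently generated.
    if is_duplicate(title, state["recentArticles"]):
        return None

    # 3. Cache check: reuse the cached MCP result while it is still fresh.
    entry = state["mcpCache"].get(query_key)
    if entry is not None and entry["expiresAt"] > now:
        data = entry["data"]
    else:
        data = fetch(query_key)
        state["mcpCache"][query_key] = {"expiresAt": now + CACHE_TTL_SECONDS,
                                        "data": data}

    # 4. Generation.
    article = generate(title, data)

    # 5. State update: log the new article.
    state["recentArticles"].append({"title": title, "generatedAt": now})

    # 6. Persistence: write the state for the next workflow invocation.
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return article
```

In this sketch a skipped duplicate leaves the state file untouched; whether the real coordinator also records skipped attempts is not stated in the source.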

INCIDENT SCENARIOS:

  • Double-Generation: Realtime Monitor and Article Generator both cover the same vote
    Solution: Similarity detection blocks the duplicate and records it in state

  • Stale Analysis: Evening Analysis uses MCP data from the morning
    Solution: 2-hour cache expiration triggers fresh queries

  • Missed Coverage: A topic isn't covered by any workflow
    Solution: State logs enable gap analysis and manual workflow triggers

  • Cache Corruption: Stale query results cause analytical errors
    Solution: TTL-based expiration automatically refreshes stale entries

INTELLIGENCE APPLICATIONS:

  • Prevents topic redundancy (editorial efficiency)
  • Ensures consistent coverage across workflows
  • Enables gap analysis (which topics are missed?)
  • Supports workflow optimization (timing, triggers)
  • Provides audit trail for editorial decisions

PERFORMANCE OPTIMIZATION:

  • MCP cache reduces API calls by estimated 60-70%
  • Reduces computational load on MCP platform during peaks
  • Faster generation cycles (cache lookups faster than API calls)
  • Enables more frequent workflow execution

FAILURE MODES & RECOVERY:

  • State file corruption: Graceful fallback to generation without deduplication
  • Cache miss during load: Automatic MCP refresh triggered
  • Timestamp drift: UTC normalization prevents timezone confusion
  • Concurrent workflow execution: Lock-based synchronization
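
Three of the recovery behaviors above (fallback on a corrupt state file, UTC normalization, and lock-based synchronization) might look like the sketch below. All names are hypothetical. Treating a missing file the same as a corrupt one and using an exclusive lock file as the locking mechanism are assumptions, not details confirmed by the source.

```python
import json
import os
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path


def load_state(path):
    """Return (state, dedup_enabled); an unreadable file disables deduplication."""
    try:
        state = json.loads(Path(path).read_text(encoding="utf-8"))
        if not isinstance(state, dict):
            raise ValueError("state root must be a JSON object")
        return state, True
    except (OSError, ValueError):
        # json.JSONDecodeError is a ValueError subclass, so corrupt JSON lands here.
        return {}, False


def to_utc_iso(moment: datetime) -> str:
    """Normalize a timezone-aware datetime to an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).isoformat()


@contextmanager
def state_lock(lock_path):
    """Exclusive lock file; raises FileExistsError if another workflow holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

With this approach, 17:00 at UTC+02:00 normalizes to 15:00 UTC, so timestamps written by different workflows compare correctly regardless of the local timezone in which each runs.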

SCALABILITY CONSIDERATIONS:

  • State file size grows ~50KB per month (manageable)
  • Cache memory: ~5MB typical, scales with coverage breadth
  • Similarity computation: O(n) in articles, automated pruning at 180 days
  • MCP query cache: Automatic cleanup of expired entries
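
The 180-day pruning and expired-cache cleanup described above can be sketched as follows. The state field names (recentArticles, generatedAt, mcpCache, expiresAt) and the use of ISO 8601 timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=180)  # from the text: automated pruning at 180 days


def prune_state(state, now):
    """Drop articles older than the retention period and expired cache entries."""
    cutoff = now - RETENTION
    state["recentArticles"] = [
        article for article in state.get("recentArticles", [])
        if datetime.fromisoformat(article["generatedAt"]) >= cutoff
    ]
    state["mcpCache"] = {
        key: entry for key, entry in state.get("mcpCache", {}).items()
        if datetime.fromisoformat(entry["expiresAt"]) > now
    }
    return state
```

Running this on every state write would keep both the similarity scan and the state file bounded by the retention period.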

GDPR COMPLIANCE:

  • Member mentions in articles tracked in state
  • Data retention policies enforced (180-day pruning)
  • Audit trail supports member rights requests
  • No personal data stored in cache beyond article references

Version: 2.2.0
Since: 2024-10-15
Author: Hack23 AB (Editorial Operations & Workflow Optimization)
License: Apache-2.0
See:
  • news/metadata/workflow-state.json (State Persistence)
  • Issue #150 (Workflow Coordination Enhancement)
  • docs/WORKFLOW_ARCHITECTURE.md (Complete Architecture)

Classes

WorkflowStateCoordinator