Module: DataPipeline/CIADataLoader

CIA Intelligence Data Loader & Pipeline Orchestrator

Core data acquisition module that loads multi-source intelligence data from the Citizen Intelligence Agency (CIA) Platform. Manages CSV export ingestion for 19+ intelligence product categories and a JSON fallback for model-generated electoral forecasts. Provides a resilient data pipeline with a local-first strategy and remote fallback.

Data Pipeline Architecture

Multi-Tier Source Strategy:

Tier 1 (Local):    ../cia-data/{category}/*.csv (deployed assets)
Tier 2 (JSON):     ../data/cia-exports/current/*.json (model outputs)
Tier 3 (Fallback): GitHub Raw API (authoritative source)

Benefits:

  • Performance: Local CSV loads ~10x faster than GitHub API
  • Resilience: Graceful degradation from local CSV → local JSON → remote
  • Offline: Works with locally deployed data packages
  • Freshness: GitHub fallback ensures latest data availability

Intelligence Product Categories

19 CIA Platform Export Types:

Structural Intelligence

  1. personStatus - Active MP counts by status
  2. riskByParty - Party-level risk aggregation
  3. riskLevels - Aggregate risk distribution
  4. annualBallots - Yearly voting activity

Performance Metrics

  1. documents - Document production statistics
  2. attendance - Chamber/committee participation
  3. productivity - Legislative output metrics
  4. effectiveness - Bill passage rates

Risk Assessment

  1. riskScores - Quantitative risk scores (0-10 scale)
  2. ethicsConcerns - Top 10 ethics cases
  3. electoralRisk - Constituency vulnerability
  4. crisisResilience - Crisis response effectiveness

Behavioral Analysis

  1. votingAnomalies - Anomaly detection classification
  2. partyDiscipline - Voting cohesion metrics
  3. coalitionStability - Coalition behavior patterns

Temporal Intelligence

  1. seasonalPatterns - Quarterly activity trends
  2. electionCycles - Election period comparisons
  3. historicalTrends - Multi-year pattern analysis

Predictive Models

  1. electionForecasts - 2026 election predictions (JSON)
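
The grouping above could be held in a small registry so that callers can enumerate products and tell CSV from JSON sources. This is only a sketch: the object shape, the group keys, and the helper names listProducts and sourceFormat are assumptions, not part of the module's documented API.

```javascript
// Illustrative registry of the 19 export types, grouped as in the list above.
// Group keys and helper names are hypothetical.
const PRODUCT_CATEGORIES = Object.freeze({
  structural: ['personStatus', 'riskByParty', 'riskLevels', 'annualBallots'],
  performance: ['documents', 'attendance', 'productivity', 'effectiveness'],
  risk: ['riskScores', 'ethicsConcerns', 'electoralRisk', 'crisisResilience'],
  behavioral: ['votingAnomalies', 'partyDiscipline', 'coalitionStability'],
  temporal: ['seasonalPatterns', 'electionCycles', 'historicalTrends'],
  predictive: ['electionForecasts'],
});

// Products delivered as JSON rather than CSV (only the forecasts, per the list).
const JSON_PRODUCTS = new Set(['electionForecasts']);

// Flatten the groups into a single array of product names.
function listProducts() {
  return Object.values(PRODUCT_CATEGORIES).flat();
}

// Return 'json' or 'csv' for a known product; throw for unknown names.
function sourceFormat(product) {
  if (!listProducts().includes(product)) {
    throw new Error(`Unknown product: ${product}`);
  }
  return JSON_PRODUCTS.has(product) ? 'json' : 'csv';
}
```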

Data Source Mapping

CSV Sources (Real PostgreSQL Views):

  • Local: ../cia-data/{category}/{view_name}.csv
  • Remote: https://raw.githubusercontent.com/Hack23/cia/master/service.data.impl/sample-data/{view_name}.csv

JSON Sources (Model-Generated):

  • Local: ../data/cia-exports/current/{product_name}.json
  • Schema: CIA Platform JSON export format v2.0
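
As a rough illustration of the mapping above, the templates can be expanded with a small helper. The function name buildSourcePaths and its parameter object are assumptions for this sketch; only the base paths and the remote URL come from the mapping itself.

```javascript
// Base locations copied from the source mapping above.
const LOCAL_CSV_BASE = '../cia-data';
const LOCAL_JSON_BASE = '../data/cia-exports/current';
const REMOTE_CSV_BASE =
  'https://raw.githubusercontent.com/Hack23/cia/master/service.data.impl/sample-data';

// Hypothetical helper: expand the {category}, {view_name} and {product_name}
// placeholders into concrete paths for each tier.
function buildSourcePaths({ category, viewName, productName }) {
  return {
    localCsv: `${LOCAL_CSV_BASE}/${category}/${viewName}.csv`,
    localJson: `${LOCAL_JSON_BASE}/${productName}.json`,
    remoteCsv: `${REMOTE_CSV_BASE}/${viewName}.csv`,
  };
}
```

Note that the remote path has no {category} segment, so only the view name is needed to reach the remote copy.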

Intelligent Loading Strategy

Load Priority Algorithm:

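// Method of the loader class. Each tier throws on failure,
// which hands control to the next tier.
async loadData(category) {
  try {
    return await this.loadLocal(category);      // Tier 1: Local CSV
  } catch {
    try {
      return await this.loadJSON(category);     // Tier 2: Local JSON
    } catch {
      // Tier 3: GitHub. A failure here propagates to the caller.
      return await this.loadRemote(category);
    }
  }
}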

Error Handling:

  • Network failures: Retry with exponential backoff (3 attempts)
  • Parse errors: Fall back to the next tier
  • Missing data: Return empty dataset with warning
  • CORS errors: Proxy through service worker (if available)
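
The retry and missing-data rules above might look roughly like the following. This is a minimal sketch, not the module's actual code: withRetry, loadOrEmpty, and the 200ms base delay are assumptions, and the sketch retries on any error rather than distinguishing network failures from other kinds.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run an async operation up to `attempts` times,
// doubling the wait between attempts (baseDelayMs, 2x, 4x, ...).
async function withRetry(operation, { attempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Hypothetical helper: if every attempt fails, log a warning and
// return an empty dataset instead of throwing.
async function loadOrEmpty(operation, label, options) {
  try {
    return await withRetry(operation, options);
  } catch (err) {
    console.warn(`No data for ${label}: ${err.message}`);
    return [];
  }
}
```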

Data Validation Pipeline

Quality Assurance Steps:

  1. Format Validation: CSV structure, delimiter, encoding (UTF-8)
  2. Schema Validation: Required columns, data types
  3. Range Validation: Numeric bounds, date ranges
  4. Completeness: Missing value checks, null handling
  5. Freshness: Timestamp validation (< 24 hours for real-time data)
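
A few of these steps (schema, completeness, freshness) can be sketched over already-parsed rows. The function name validateDataset, the shape of its options, and the issue strings are assumptions for illustration; exportedAt is assumed to be an ISO 8601 timestamp string.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical validator: returns a list of human-readable issues,
// or an empty array when the dataset passes all checks.
function validateDataset(rows, { requiredColumns, exportedAt, now = Date.now() }) {
  const issues = [];
  if (!Array.isArray(rows) || rows.length === 0) {
    issues.push('format: no rows parsed');
    return issues;
  }

  // Schema: every required column must exist on the first row.
  const present = [];
  for (const column of requiredColumns) {
    if (column in rows[0]) present.push(column);
    else issues.push(`schema: missing column "${column}"`);
  }

  // Completeness: flag null, undefined, or empty values in present columns.
  rows.forEach((row, index) => {
    for (const column of present) {
      const value = row[column];
      if (value === null || value === undefined || value === '') {
        issues.push(`completeness: row ${index} has no "${column}"`);
      }
    }
  });

  // Freshness: data must be younger than 24 hours; an unparseable
  // timestamp yields NaN and is treated as stale.
  const ageMs = now - Date.parse(exportedAt);
  if (!(ageMs < DAY_MS)) {
    issues.push('freshness: export is older than 24 hours or has no valid timestamp');
  }
  return issues;
}
```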

Validation Rules:

  • Risk scores: 0.0 ≤ score ≤ 10.0
  • Years: 2002 ≤ year ≤ 2025
  • Quarters: 1 ≤ quarter ≤ 4
  • Party codes: Must match official Riksdag codes (S, M, SD, etc.)
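
The rules above translate fairly directly into per-field checks. In this sketch the field names (riskScore, year, quarter, party) and the function validateRecord are hypothetical, and the default party set holds only the three codes named above, so it is deliberately incomplete and should be replaced with the full official list.

```javascript
// True when value is a finite number within [min, max].
const inRange = (value, min, max) =>
  typeof value === 'number' && Number.isFinite(value) && value >= min && value <= max;

// One predicate per rule; bounds are taken from the rules listed above.
const RULES = {
  riskScore: (v) => inRange(v, 0, 10),
  year: (v) => Number.isInteger(v) && inRange(v, 2002, 2025),
  quarter: (v) => Number.isInteger(v) && inRange(v, 1, 4),
};

// Incomplete on purpose: only the codes the documentation names.
const EXAMPLE_PARTY_CODES = new Set(['S', 'M', 'SD']);

// Hypothetical validator: checks only the fields present on the record
// and returns a list of error messages (empty when valid).
function validateRecord(record, partyCodes = EXAMPLE_PARTY_CODES) {
  const errors = [];
  for (const [field, rule] of Object.entries(RULES)) {
    if (field in record && !rule(record[field])) {
      errors.push(`${field} out of range: ${record[field]}`);
    }
  }
  if ('party' in record && !partyCodes.has(record.party)) {
    errors.push(`unknown party code: ${record.party}`);
  }
  return errors;
}
```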

Performance Characteristics

Load Times (typical):

  • Local CSV: ~50ms for 1000 rows
  • Local JSON: ~30ms (pre-parsed)
  • GitHub API: ~500ms + network latency

Memory Usage:

  • Per dataset: ~1-5MB raw data
  • Total cache: ~50MB for all 19 products
  • Browser limit: localStorage quota is browser-dependent, typically 5-10MB per origin

Caching Strategy

Not Implemented in This Module: Caching is the responsibility of consumer modules (party-dashboard.js, risk-dashboard.js, etc.), which use localStorage with appropriate TTLs. This module provides pure data loading without side effects.
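
For illustration, a consumer module could wrap localStorage in a small TTL cache along these lines. This is a sketch under assumptions, not code from party-dashboard.js or risk-dashboard.js: createTtlCache and the stored entry shape are hypothetical, and the storage object is injected so the same code works with window.localStorage or any object exposing getItem, setItem, and removeItem.

```javascript
// Hypothetical consumer-side cache with a time-to-live per entry.
function createTtlCache(storage, { ttlMs, now = () => Date.now() }) {
  return {
    // Store a JSON-serializable value; returns false if the write fails
    // (for example when the storage quota is exceeded).
    set(key, value) {
      try {
        storage.setItem(key, JSON.stringify({ storedAt: now(), value }));
        return true;
      } catch {
        return false;
      }
    },
    // Return the cached value, or undefined if absent, expired, or corrupted.
    get(key) {
      const raw = storage.getItem(key);
      if (raw == null) return undefined;
      try {
        const { storedAt, value } = JSON.parse(raw);
        if (now() - storedAt < ttlMs) return value;
      } catch {
        // Corrupted entry: fall through and remove it.
      }
      storage.removeItem(key);
      return undefined;
    },
  };
}
```

In a browser this might be used as createTtlCache(window.localStorage, { ttlMs: 60 * 60 * 1000 }), with the loader called only when get returns undefined.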

GDPR Compliance

Version:
  • 2.0.0
Since:
  • 2024
Author:
  • Hack23 AB - Data Pipeline Engineering
License:
  • Apache-2.0