
Severe Convective Storm

Geospatial analysis system for severe storm hazards & TIV loss modeling

Technologies Used

Python · Machine Learning · Geospatial Analysis · NumPy · Pandas · Scikit-learn · XGBoost

Project Overview

This project is research work done under IRMII (Institute for Risk Management and Insurance Innovation). It is a comprehensive geospatial analysis system for severe convective storm hazards with two primary components:

Forecast Verification Framework

Evaluates forecast skill through rigorous comparison of Practically Perfect Hindcasts (PPH) against operational Storm Prediction Center (SPC) convective outlooks. Processes multi-decadal datasets (2010-2024) to quantify forecasting performance using industry-standard verification metrics.
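As a minimal sketch of the headline metric, using made-up probabilities and outcomes rather than project data, the PPH-versus-SPC comparison reduces to Brier scores computed against a shared observation vector and climatological baseline:

import numpy as np

def brier_score(forecast_prob, observed):
    # Mean squared error between forecast probabilities and binary outcomes
    return np.mean((forecast_prob - observed) ** 2)

def brier_skill_score(forecast_prob, observed, climatology):
    # Skill relative to a climatological baseline: 1 is perfect, <= 0 is no skill
    bs = brier_score(forecast_prob, observed)
    bs_ref = brier_score(np.full_like(forecast_prob, climatology), observed)
    return 1.0 - bs / bs_ref

# Hypothetical values on a flattened grid: PPH vs. SPC outlook probabilities
pph = np.array([0.05, 0.15, 0.45, 0.30])
spc = np.array([0.05, 0.15, 0.30, 0.15])
obs = np.array([0.0, 0.0, 1.0, 1.0])   # hail report verified within the cell
print(brier_skill_score(pph, obs, climatology=obs.mean()))
print(brier_skill_score(spc, obs, climatology=obs.mean()))

Scoring both forecast sources against the same observations and baseline is the fair-comparison requirement described under Technical Implementation below.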

Total Insured Value (TIV) Loss Model

A machine learning-based catastrophe modeling system that predicts property-level insurance losses from hail events. Combines meteorological data (MESH radar, PPH forecasts), property characteristics (tax assessments, building features), and spatial factors to estimate percentage and dollar losses for individual properties.

Architecture

High-Level Design

Data Acquisition → Processing Pipeline → Verification Engine → Analysis & Visualization
      ↓                    ↓                    ↓                      ↓
  - NOAA/NCEI APIs    - Grid Alignment     - Brier Scores      - Comparative Maps
  - SPC Shapefiles    - Projection        - Performance       - Time Series  
  - Storm Reports       Handling            Diagrams          - Overlap Analysis
  - Property Data     - Feature            - TIV Loss         - Loss Predictions
  - MESH Radar          Engineering          Modeling          - Risk Assessment

Core Components

  • Data Acquisition Layer (download_*.py): Retrieves data from NOAA, NCEI, and SPC sources
  • Spatial Processing Engine (hail_analysis.ipynb): NAM212 grid operations with shapefile-to-grid conversion
  • Verification Framework (hail_validation.ipynb): Statistical analysis engine implementing multiple verification metrics
  • TIV Loss Model (TIV_model.ipynb): Random Forest-based catastrophe model for property-level hail damage prediction
  • Caching System (cache/): Pickle-based persistence layer reducing recomputation time (see the sketch after this list)
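
On the caching system: recomputing gridded verification products for 15 years of reports is expensive, so intermediate results are pickled to disk. A minimal sketch of such a decorator, with hypothetical key and file-layout choices (the repository's actual cache/ contents may differ):

import functools
import hashlib
import os
import pickle

CACHE_DIR = "cache"

def disk_cached(func):
    # Memoize a function's result to cache/<hash>.pkl so reruns skip recomputation
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        os.makedirs(CACHE_DIR, exist_ok=True)
        key = hashlib.md5(repr((func.__name__, args, kwargs)).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, f"{key}.pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@disk_cached
def build_pph_grid(year):   # hypothetical expensive computation
    ...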

Technical Implementation

Data Flow Architecture

  • NAM212 Grid System: 129×185 cells at ~40 km resolution covering CONUS
  • Temporal Alignment: 1200Z-1200Z verification windows matching operational forecast cycles
  • Spatial Buffering: 40 km verification radius around storm reports, following SPC standards (see the sketch after this list)
  • Fair Comparison Logic: Ensures identical validation sets and climatological baselines when comparing PPH and SPC forecasts
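
To make the spatial buffering concrete, here is a minimal geopandas sketch with hypothetical report coordinates (the real ones come from the NCEI downloads). Buffering in latitude/longitude degrees would distort distances, so the points are projected to a meter-based CRS first:

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical storm report locations (lon, lat)
reports = gpd.GeoDataFrame(
    {"event": ["hail", "hail"]},
    geometry=[Point(-97.5, 35.2), Point(-96.9, 32.8)],
    crs="EPSG:4326",
)

# Project to CONUS Albers (meters), buffer 40 km, project back to lat/lon
buffered = (
    reports.to_crs("EPSG:5070")
           .buffer(40_000)
           .to_crs("EPSG:4326")
)
# A NAM212 cell verifies if it intersects any buffered report polygon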

TIV Model Architecture

The TIV loss model implements a Random Forest regressor (sketched after the feature list below) with:

Loss Prediction: TIV_Loss = F(X) × 100%, where F maps a 12-dimensional feature vector X to a fractional loss, reported as a percentage of total insured value

Feature Categories:

  • Weather Features (4D): MESH hail size, PPH probability, NCEI distance, storm duration
  • Property Features (5D): Market value, building age, living area, construction quality, frame type
  • Building Features (3D): Footprint area, complexity ratio, local building density
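
With those definitions, the model itself is a standard scikit-learn regressor. The sketch below shows the shape of the pipeline with placeholder data; the feature-column names and hyperparameters are assumptions, not the tuned production settings:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FEATURES = [
    "mesh_hail_size", "pph_probability", "ncei_distance", "storm_duration",   # weather (4D)
    "market_value", "building_age", "living_area", "quality", "frame_type",   # property (5D)
    "footprint_area", "complexity_ratio", "building_density",                 # building (3D)
]

rng = np.random.default_rng(42)
X = rng.random((500, len(FEATURES)))   # placeholder for the real 12D feature matrix
y = rng.random(500) * 0.2              # placeholder fractional losses F(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

loss_pct = model.predict(X_test) * 100   # TIV_Loss = F(X) × 100%

Dollar losses then follow by multiplying each predicted fraction by the property's total insured value from the tax-assessment data.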

Key Features

  • Multi-Decadal Analysis: Processes historical data from 2010 to 2024 for comprehensive trend analysis
  • Industry-Standard Metrics: Implements Brier Skill Scores, Performance Diagrams, and ROC curves (see the sketch after this list)
  • Property-Level Predictions: Machine learning model for individual building loss estimation
  • Geospatial Processing: Advanced coordinate system handling and grid alignment
  • Automated Workflows: Streamlined data acquisition and processing pipelines
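
The performance-diagram metrics flagged above all derive from a 2×2 contingency table of hits, misses, and false alarms. A minimal sketch with hypothetical counts, not project results:

def contingency_metrics(hits, misses, false_alarms):
    # POD (y-axis), success ratio (x-axis), and CSI (contours) of a performance diagram
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms)        # success ratio = 1 - false alarm ratio
    csi = hits / (hits + misses + false_alarms)
    return pod, sr, csi

# Hypothetical counts from thresholding an outlook contour against verified cells
print(contingency_metrics(hits=120, misses=40, false_alarms=60))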

System Requirements

  • Python: 3.8+ (tested on 3.9-3.11)
  • Memory: 16GB+ recommended for full dataset processing
  • Storage: 50GB+ for complete historical data archive
  • Dependencies: NumPy, Pandas, XArray, GeoPandas, Scikit-learn, XGBoost, LightGBM

Data Sources

  • NCEI Storm Reports: National Centers for Environmental Information storm database
  • SPC Convective Outlooks: NOAA Storm Prediction Center operational forecasts
  • MESH Radar Data: Multi-Radar Multi-Sensor maximum hail size estimates
  • Property Tax Data: Dallas County Appraisal District assessments
  • Building Footprints: Microsoft Global ML Building Footprints dataset
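
The download_*.py scripts listed under Core Components pull these sources over HTTP. A minimal sketch of the streaming pattern; the URL below is a placeholder, not a real endpoint:

import requests

def download_file(url: str, dest: str) -> None:
    # Stream a remote archive to disk in chunks to keep memory use flat
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# Placeholder URL: the real SPC/NCEI endpoints live in the download_*.py scripts
download_file("https://example.com/spc_outlook_archive.zip", "data/spc_outlook.zip")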

Results & Impact

  • Comprehensive Verification: Rigorous statistical validation of forecast performance across multiple decades
  • Operational Insights: Seasonal and monthly resolution analysis revealing forecasting patterns
  • Loss Modeling: Property-level hail damage predictions for catastrophe risk assessment
  • Research Applications: Framework designed for academic and operational meteorology use

What I Learned

This project deepened my understanding of:

  • Geospatial Analysis: Advanced coordinate system transformations and grid-based computations
  • Statistical Verification: Implementation of meteorological forecast verification standards
  • Machine Learning for Insurance: Catastrophe modeling and risk assessment methodologies
  • Large-Scale Data Processing: Handling multi-decadal meteorological datasets efficiently
  • Scientific Computing: Integration of diverse data sources for comprehensive analysis