
Severe Convective Storm

Geospatial analysis system for severe storm hazards & TIV loss modeling

Technologies Used

Python · Machine Learning · Geospatial Analysis · NumPy · Pandas · Scikit-learn · XGBoost

Project Overview

This project is research work done under IRMII (Institute for Risk Management and Insurance Innovation). It is a comprehensive geospatial analysis system for severe convective storm hazards with two primary components:

Forecast Verification Framework

Evaluates forecast skill through rigorous comparison of Practically Perfect Hindcasts (PPH) against operational Storm Prediction Center (SPC) convective outlooks. Processes multi-decadal datasets (2010-2024) to quantify forecasting performance using industry-standard verification metrics.
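As a minimal sketch of the headline metric, using made-up probabilities and outcomes rather than project data, the PPH-versus-SPC comparison reduces to Brier scores computed against a shared observation vector and climatological baseline:

import numpy as np

def brier_score(forecast_prob, observed):
    # Mean squared error between forecast probabilities and binary outcomes
    return np.mean((forecast_prob - observed) ** 2)

def brier_skill_score(forecast_prob, observed, climatology):
    # Skill relative to a climatological baseline: 1 is perfect, <= 0 is no skill
    bs = brier_score(forecast_prob, observed)
    bs_ref = brier_score(np.full_like(forecast_prob, climatology), observed)
    return 1.0 - bs / bs_ref

# Hypothetical values on a flattened grid: PPH vs. SPC outlook probabilities
pph = np.array([0.05, 0.15, 0.45, 0.30])
spc = np.array([0.05, 0.15, 0.30, 0.15])
obs = np.array([0.0, 0.0, 1.0, 1.0])   # hail report verified within the cell
print(brier_skill_score(pph, obs, climatology=obs.mean()))
print(brier_skill_score(spc, obs, climatology=obs.mean()))

Scoring both forecast sources against the same observations and baseline is the fair-comparison requirement described under Technical Implementation below.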

Total Insured Value (TIV) Loss Model

A machine learning-based catastrophe modeling system that predicts property-level insurance losses from hail events. Combines meteorological data (MESH radar, PPH forecasts), property characteristics (tax assessments, building features), and spatial factors to estimate percentage and dollar losses for individual properties.

Architecture

High-Level Design

Data Acquisition → Processing Pipeline → Verification Engine → Analysis & Visualization
      ↓                    ↓                    ↓                      ↓
  - NOAA/NCEI APIs    - Grid Alignment     - Brier Scores      - Comparative Maps
  - SPC Shapefiles    - Projection        - Performance       - Time Series  
  - Storm Reports       Handling            Diagrams          - Overlap Analysis
  - Property Data     - Feature            - TIV Loss         - Loss Predictions
  - MESH Radar          Engineering          Modeling          - Risk Assessment

Core Components

  • Data Acquisition Layer (download_*.py): Retrieves data from NOAA, NCEI, and SPC sources
  • Spatial Processing Engine (hail_analysis.ipynb): NAM212 grid operations with shapefile-to-grid conversion
  • Verification Framework (hail_validation.ipynb): Statistical analysis engine implementing multiple verification metrics
  • TIV Loss Model (TIV_model.ipynb): Random Forest-based catastrophe model for property-level hail damage prediction
  • Caching System (cache/): Pickle-based persistence layer reducing recomputation time (see the sketch after this list)
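
On the caching system: recomputing gridded verification products for 15 years of reports is expensive, so intermediate results are pickled to disk. A minimal sketch of such a decorator, with hypothetical key and file-layout choices (the repository's actual cache/ contents may differ):

import functools
import hashlib
import os
import pickle

CACHE_DIR = "cache"

def disk_cached(func):
    # Memoize a function's result to cache/<hash>.pkl so reruns skip recomputation
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        os.makedirs(CACHE_DIR, exist_ok=True)
        key = hashlib.md5(repr((func.__name__, args, kwargs)).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, f"{key}.pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@disk_cached
def build_pph_grid(year):   # hypothetical expensive computation
    ...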

Technical Implementation

Data Flow Architecture

  • NAM212 Grid System: 129×185 cells at ~40 km resolution covering CONUS
  • Temporal Alignment: 1200Z-1200Z verification windows matching operational forecast cycles
  • Spatial Buffering: 40 km verification radius around storm reports, following SPC standards (see the sketch after this list)
  • Fair Comparison Logic: Ensures identical validation sets and climatological baselines when comparing PPH and SPC forecasts
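
To make the spatial buffering concrete, here is a minimal geopandas sketch with hypothetical report coordinates (the real ones come from the NCEI downloads). Buffering in latitude/longitude degrees would distort distances, so the points are projected to a meter-based CRS first:

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical storm report locations (lon, lat)
reports = gpd.GeoDataFrame(
    {"event": ["hail", "hail"]},
    geometry=[Point(-97.5, 35.2), Point(-96.9, 32.8)],
    crs="EPSG:4326",
)

# Project to CONUS Albers (meters), buffer 40 km, project back to lat/lon
buffered = (
    reports.to_crs("EPSG:5070")
           .buffer(40_000)
           .to_crs("EPSG:4326")
)
# A NAM212 cell verifies if it intersects any buffered report polygon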

TIV Model Architecture

The TIV loss model implements a Random Forest regressor (sketched after the feature list below) with:

Loss Prediction: TIV_Loss = F(X) × 100%, where F maps a 12-dimensional feature vector X to a fractional loss, reported as a percentage of total insured value

Feature Categories:

  • Weather Features (4D): MESH hail size, PPH probability, NCEI distance, storm duration
  • Property Features (5D): Market value, building age, living area, construction quality, frame type
  • Building Features (3D): Footprint area, complexity ratio, local building density
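
With those definitions, the model itself is a standard scikit-learn regressor. The sketch below shows the shape of the pipeline with placeholder data; the feature-column names and hyperparameters are assumptions, not the tuned production settings:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FEATURES = [
    "mesh_hail_size", "pph_probability", "ncei_distance", "storm_duration",   # weather (4D)
    "market_value", "building_age", "living_area", "quality", "frame_type",   # property (5D)
    "footprint_area", "complexity_ratio", "building_density",                 # building (3D)
]

rng = np.random.default_rng(42)
X = rng.random((500, len(FEATURES)))   # placeholder for the real 12D feature matrix
y = rng.random(500) * 0.2              # placeholder fractional losses F(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

loss_pct = model.predict(X_test) * 100   # TIV_Loss = F(X) × 100%

Dollar losses then follow by multiplying each predicted fraction by the property's total insured value from the tax-assessment data.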

Key Features

  • Multi-Decadal Analysis: Processes historical data from 2010 to 2024 for comprehensive trend analysis
  • Industry-Standard Metrics: Implements Brier Skill Scores, Performance Diagrams, and ROC curves (see the sketch after this list)
  • Property-Level Predictions: Machine learning model for individual building loss estimation
  • Geospatial Processing: Advanced coordinate system handling and grid alignment
  • Automated Workflows: Streamlined data acquisition and processing pipelines
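
The performance-diagram metrics flagged above all derive from a 2×2 contingency table of hits, misses, and false alarms. A minimal sketch with hypothetical counts, not project results:

def contingency_metrics(hits, misses, false_alarms):
    # POD (y-axis), success ratio (x-axis), and CSI (contours) of a performance diagram
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms)        # success ratio = 1 - false alarm ratio
    csi = hits / (hits + misses + false_alarms)
    return pod, sr, csi

# Hypothetical counts from thresholding an outlook contour against verified cells
print(contingency_metrics(hits=120, misses=40, false_alarms=60))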

System Requirements

  • Python: 3.8+ (tested on 3.9-3.11)
  • Memory: 16GB+ recommended for full dataset processing
  • Storage: 50GB+ for complete historical data archive
  • Dependencies: NumPy, Pandas, XArray, GeoPandas, Scikit-learn, XGBoost, LightGBM

Data Sources

  • NCEI Storm Reports: National Centers for Environmental Information storm database
  • SPC Convective Outlooks: NOAA Storm Prediction Center operational forecasts
  • MESH Radar Data: Multi-Radar Multi-Sensor maximum hail size estimates
  • Property Tax Data: Dallas County Appraisal District assessments
  • Building Footprints: Microsoft Global ML Building Footprints dataset
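
The download_*.py scripts listed under Core Components pull these sources over HTTP. A minimal sketch of the streaming pattern; the URL below is a placeholder, not a real endpoint:

import requests

def download_file(url: str, dest: str) -> None:
    # Stream a remote archive to disk in chunks to keep memory use flat
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# Placeholder URL: the real SPC/NCEI endpoints live in the download_*.py scripts
download_file("https://example.com/spc_outlook_archive.zip", "data/spc_outlook.zip")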

Results & Impact

  • Comprehensive Verification: Rigorous statistical validation of forecast performance across multiple decades
  • Operational Insights: Seasonal and monthly resolution analysis revealing forecasting patterns
  • Loss Modeling: Property-level hail damage predictions for catastrophe risk assessment
  • Research Applications: Framework designed for academic and operational meteorology use

What I Learned

This project deepened my understanding of:

  • Geospatial Analysis: Advanced coordinate system transformations and grid-based computations
  • Statistical Verification: Implementation of meteorological forecast verification standards
  • Machine Learning for Insurance: Catastrophe modeling and risk assessment methodologies
  • Large-Scale Data Processing: Handling multi-decadal meteorological datasets efficiently
  • Scientific Computing: Integration of diverse data sources for comprehensive analysis