Project Overview
This project is research work done under IRMII (Institute for Risk Management and Insurance Innovation): a comprehensive geospatial analysis system for severe convective storm hazards with two primary components:
Forecast Verification Framework
Evaluates forecast skill through rigorous comparison of Practically Perfect Hindcasts (PPH) against operational Storm Prediction Center (SPC) convective outlooks. Processes multi-decadal datasets (2010-2024) to quantify forecasting performance using industry-standard verification metrics.
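To illustrate the kind of verification metric involved, a Brier score measures the mean squared error between forecast probabilities and binary outcomes, and a Brier Skill Score compares it against a climatological baseline. This is a minimal sketch with toy values, not the project's actual implementation:

```python
import numpy as np

def brier_score(p_forecast, observed):
    """Mean squared error between forecast probabilities and binary (0/1) outcomes."""
    p_forecast = np.asarray(p_forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((p_forecast - observed) ** 2))

def brier_skill_score(p_forecast, observed, p_climatology):
    """BSS = 1 - BS / BS_ref; positive values indicate skill over climatology."""
    bs = brier_score(p_forecast, observed)
    reference = np.full(len(observed), p_climatology)
    bs_ref = brier_score(reference, observed)
    return 1.0 - bs / bs_ref

# Toy example: four forecast probabilities vs. whether hail was observed
forecasts = [0.9, 0.1, 0.7, 0.3]
outcomes = [1, 0, 1, 0]
print(brier_score(forecasts, outcomes))  # 0.05
```

A perfect forecast scores 0; forecasting the climatological rate every day gives a BSS of exactly 0, which is why BSS is the fairer cross-decade comparison.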
Total Insured Value (TIV) Loss Model
A machine learning-based catastrophe modeling system that predicts property-level insurance losses from hail events. Combines meteorological data (MESH radar, PPH forecasts), property characteristics (tax assessments, building features), and spatial factors to estimate percentage and dollar losses for individual properties.
Architecture
High-Level Design
Data Acquisition → Processing Pipeline → Verification Engine → Analysis & Visualization

- Data Acquisition: NOAA/NCEI APIs, SPC shapefiles, storm reports, property data, MESH radar
- Processing Pipeline: grid alignment, projection handling, feature engineering
- Verification Engine: Brier scores, performance diagrams, TIV loss modeling
- Analysis & Visualization: comparative maps, time series, overlap analysis, loss predictions, risk assessment
Core Components
- Data Acquisition Layer (`download_*.py`): Retrieves data from NOAA, NCEI, and SPC sources
- Spatial Processing Engine (`hail_analysis.ipynb`): NAM212 grid operations with shapefile-to-grid conversion
- Verification Framework (`hail_validation.ipynb`): Statistical analysis engine implementing multiple verification metrics
- TIV Loss Model (`TIV_model.ipynb`): Random Forest-based catastrophe model for property-level hail damage prediction
- Caching System (`cache/`): Pickle-based persistence layer reducing computation time
Technical Implementation
Data Flow Architecture
- NAM212 Grid System: 129×185 cells at ~40km resolution covering CONUS
- Temporal Alignment: 1200z-1200z verification windows matching operational forecast cycles
- Spatial Buffering: 40km verification radius around storm reports following SPC standards
- Fair Comparison Logic: Ensures identical validation sets and climatological baselines
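The temporal and spatial alignment steps above can be sketched as follows. The 1200z-1200z convention means a report is credited to the verification day that started at the previous 1200 UTC; at ~40 km grid spacing, a 40 km verification radius is roughly a one-cell dilation of the report mask. Both helpers are illustrative assumptions about the pipeline, not its actual code:

```python
from datetime import date, datetime, timedelta

import numpy as np

def verification_day(t_utc: datetime) -> date:
    """Map a UTC timestamp to its 1200z-1200z verification day, labeled by start date."""
    return (t_utc - timedelta(hours=12)).date()

def buffer_reports(report_mask: np.ndarray) -> np.ndarray:
    """Approximate a 40 km verification radius on a ~40 km grid: one-cell 8-connected dilation."""
    n, m = report_mask.shape
    padded = np.pad(report_mask, 1)
    out = np.zeros_like(report_mask, dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out |= padded[1 + di:1 + di + n, 1 + dj:1 + dj + m]
    return out

# A report at 0300z on June 2 verifies the outlook cycle that began June 1 at 1200z
print(verification_day(datetime(2015, 6, 2, 3, 0)))  # 2015-06-01
```

A single report cell thus expands to a 3×3 block of verifying cells, mirroring SPC's practice of counting a forecast as correct if a report falls within 40 km.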
TIV Model Architecture
The TIV loss model implements a Random Forest regressor with:
Loss Prediction: TIV_Loss = F(X) × 100%
where F maps a 12-dimensional feature vector X to a fractional loss in [0, 1]
Feature Categories:
- Weather Features (4D): MESH hail size, PPH probability, NCEI distance, storm duration
- Property Features (5D): Market value, building age, living area, construction quality, frame type
- Building Features (3D): Footprint area, complexity ratio, local building density
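The mapping F over the 12-dimensional feature vector (4 weather + 5 property + 3 building) can be sketched as below. The synthetic data, feature ordering, hyperparameters, and the $350,000 TIV are illustrative assumptions, not the project's actual schema or tuning:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins for the three feature groups (4 weather + 5 property + 3 building = 12D)
X = rng.random((n, 12))
# Toy target: fractional loss driven mainly by hail size (column 0), clipped to [0, 1]
y = np.clip(0.6 * X[:, 0] + 0.1 * rng.normal(size=n), 0.0, 1.0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Predict a fractional loss for one property, then convert to percentage and dollar loss
x_new = rng.random((1, 12))
loss_pct = model.predict(x_new)[0] * 100   # TIV_Loss = F(X) x 100%
tiv = 350_000                              # hypothetical total insured value ($)
dollar_loss = tiv * loss_pct / 100
print(f"predicted loss: {loss_pct:.1f}% -> ${dollar_loss:,.0f}")
```

The percentage output makes the model portable across portfolios: the dollar loss is just the predicted fraction applied to each property's own TIV.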
Key Features
- Multi-Decadal Analysis: Processes historical data from 2010-2024 for comprehensive trend analysis
- Industry-Standard Metrics: Implements Brier Skill Scores, Performance Diagrams, and ROC curves
- Property-Level Predictions: Machine learning model for individual building loss estimation
- Geospatial Processing: Advanced coordinate system handling and grid alignment
- Automated Workflows: Streamlined data acquisition and processing pipelines
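Performance diagrams plot forecasts in POD versus success-ratio space, with CSI and frequency-bias isolines; all four scores derive from a 2×2 contingency table. A minimal sketch with toy counts (the numbers are not project results):

```python
def contingency_scores(hits, misses, false_alarms):
    """Standard verification scores from a 2x2 contingency table.

    POD  = hits / (hits + misses)            probability of detection
    SR   = hits / (hits + false_alarms)      success ratio (1 - FAR)
    CSI  = hits / (hits + misses + FA)       critical success index
    bias = forecast events / observed events frequency bias
    """
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    bias = (hits + false_alarms) / (hits + misses)
    return {"POD": pod, "SR": sr, "CSI": csi, "bias": bias}

scores = contingency_scores(hits=80, misses=20, false_alarms=40)
print(scores)  # POD 0.8, SR ~0.667, CSI ~0.571, bias 1.2
```

A bias above 1 (as here) means the forecasts flagged more events than were observed, which the diagram makes visible at a glance.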
System Requirements
- Python: 3.8+ (tested on 3.9-3.11)
- Memory: 16GB+ recommended for full dataset processing
- Storage: 50GB+ for complete historical data archive
- Dependencies: NumPy, Pandas, XArray, GeoPandas, Scikit-learn, XGBoost, LightGBM
Data Sources
- NCEI Storm Reports: National Centers for Environmental Information storm database
- SPC Convective Outlooks: NOAA Storm Prediction Center operational forecasts
- MESH Radar Data: Multi-Radar Multi-Sensor maximum hail size estimates
- Property Tax Data: Dallas County Appraisal District assessments
- Building Footprints: Microsoft Global ML Building Footprints dataset
Results & Impact
- Comprehensive Verification: Rigorous statistical validation of forecast performance across multiple decades
- Operational Insights: Seasonal and monthly resolution analysis revealing forecasting patterns
- Loss Modeling: Property-level hail damage predictions for catastrophe risk assessment
- Research Applications: Framework designed for academic and operational meteorology use
What I Learned
This project deepened my understanding of:
- Geospatial Analysis: Advanced coordinate system transformations and grid-based computations
- Statistical Verification: Implementation of meteorological forecast verification standards
- Machine Learning for Insurance: Catastrophe modeling and risk assessment methodologies
- Large-Scale Data Processing: Handling multi-decadal meteorological datasets efficiently
- Scientific Computing: Integration of diverse data sources for comprehensive analysis