Precipitation Forecasting
A Thesis Workflow Visualizing a Network-Based LSTM Approach
Phase 1 & 2: Data Foundation & Network Construction
These first two phases prepare the raw data and construct a geographical network from statistical correlations between nodes. The quality of this foundation directly affects how well the model can learn and predict.
Network Linking Explained
An adjacency matrix is built by computing the Spearman rank correlation between every pair of nodes using data from 1979-2014. Wherever the correlation is above 0.8, the two nodes are linked. The resulting network tells the LSTM model which neighboring nodes are influential for each node.
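The linking rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the `(time_steps, nodes)` array layout, and the synthetic demo data are hypothetical, and ties (common in precipitation records with many dry periods) are handled with average ranks, which is the usual Spearman convention but is not confirmed by the source.

```python
import numpy as np

def average_ranks(values):
    """Rank a 1-D array; tied values receive the mean of the ranks they span."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # highest 1-based rank in each tie group
    first = last - counts + 1      # lowest 1-based rank in each tie group
    return ((first + last) / 2.0)[inverse]

def spearman_adjacency(series, threshold=0.8):
    """series has shape (time_steps, nodes); returns a boolean (nodes, nodes) matrix."""
    ranks = np.column_stack([average_ranks(col) for col in series.T])
    corr = np.corrcoef(ranks, rowvar=False)  # Pearson on ranks equals Spearman
    adjacency = corr > threshold             # link only strongly correlated pairs
    np.fill_diagonal(adjacency, False)       # a node is not linked to itself
    return adjacency

# Synthetic demo (not thesis data): node 1 is a monotone function of node 0,
# node 2 is independent of both.
rng = np.random.default_rng(0)
base = rng.gamma(2.0, 1.0, 200)
demo = np.column_stack([base, base ** 2, rng.gamma(2.0, 1.0, 200)])
adjacency = spearman_adjacency(demo)
```

Because Spearman correlation works on ranks, node 1 is linked to node 0 even though their relationship is nonlinear, which is one reason a rank correlation may be preferred over Pearson for skewed precipitation data.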
Node Data Splitting
Nodes are divided into a training set (80% of the nodes) and a test set (the remaining 20%) so that the model is evaluated on nodes it never saw during training.
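A minimal sketch of such a node-level split follows. The function name, the random seed, and the use of a random shuffle are assumptions, since the source does not say how nodes are assigned. The total of 40 nodes is also an assumption, chosen only because 8 test nodes at a 20% share would imply 40 nodes overall.

```python
import numpy as np

def split_nodes(n_nodes, test_fraction=0.2, seed=42):
    """Randomly assign node indices to disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_nodes)
    n_test = int(round(n_nodes * test_fraction))
    test_nodes = np.sort(shuffled[:n_test])
    train_nodes = np.sort(shuffled[n_test:])
    return train_nodes, test_nodes

# Hypothetical node count: 40 nodes gives 32 training and 8 test nodes.
train_nodes, test_nodes = split_nodes(40)
```

Fixing the seed keeps the split reproducible, so the same test nodes are used every time the workflow is rerun.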
Phase 3 & 4: The LSTM Modeling Engine
With the network defined, the core of the project involves preparing the time series data and then training and deploying the LSTM model. This multi-step process is designed to capture complex temporal patterns for accurate forecasting.
3.1 Stationarize Data
The precipitation time series is transformed to remove trends and seasonality. This makes the data approximately stationary, meaning its mean and variance no longer drift over time, which is an important preprocessing step for reliable LSTM performance.
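The source does not name the transformation used. One common option is seasonal differencing, sketched below on the assumption of monthly data (a period of 12). The function names and the synthetic series are hypothetical. Subtracting the value from one period earlier cancels a repeating seasonal cycle and reduces a linear trend to a constant.

```python
import numpy as np

def seasonal_difference(series, period=12):
    """Subtract the value observed one seasonal period earlier."""
    series = np.asarray(series, dtype=float)
    return series[period:] - series[:-period]

def invert_seasonal_difference(differenced, history, period=12):
    """Map differenced values back to the original scale.

    history must hold at least the last `period` original values that
    precede the first differenced value.
    """
    values = list(np.asarray(history, dtype=float)[-period:])
    for d in differenced:
        values.append(d + values[-period])  # x[t] = d[t] + x[t - period]
    return np.array(values[period:])

# Synthetic monthly series (not thesis data): annual cycle plus linear trend.
t = np.arange(120)
series = 5.0 + 3.0 * np.sin(2 * np.pi * t / 12) + 0.1 * t
stationary = seasonal_difference(series)
restored = invert_seasonal_difference(stationary, series[:12])
```

The inverse function matters in practice: a model trained on differenced data produces differenced forecasts, which have to be converted back to precipitation units before they are compared with observations.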
4.2 Train LSTM Model
The LSTM model is trained on 80% of the nodes using historical data from 1979-2014. The model learns the complex temporal dependencies and network influences during this phase.
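Before an LSTM can be trained, each series is usually cut into fixed-length input windows paired with the value to predict. The sketch below shows one way to do that while including a network feature. The lookback length, the choice of the mean of the linked nodes as the second feature, and all names are assumptions, because the source does not describe the input layout.

```python
import numpy as np

def make_windows(target, neighbor_mean, lookback=12):
    """Build LSTM-style samples from a node series and a neighbor feature.

    Returns X with shape (samples, lookback, 2) and y with shape (samples,),
    where y[i] is the target value immediately after window X[i].
    """
    target = np.asarray(target, dtype=float)
    neighbor_mean = np.asarray(neighbor_mean, dtype=float)
    stacked = np.stack([target, neighbor_mean], axis=-1)  # (time, 2)
    n_samples = len(target) - lookback
    X = np.stack([stacked[i:i + lookback] for i in range(n_samples)])
    y = target[lookback:]
    return X, y

# Toy series (not thesis data) to make the shapes easy to inspect.
X, y = make_windows(np.arange(20), np.arange(20) * 10, lookback=5)
```

The `(samples, time steps, features)` layout is the input shape that LSTM layers in common deep learning frameworks expect, so the arrays can be passed to whichever framework is used for training.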
4.3 Forecast Future
The trained model forecasts precipitation for 2015-2023 at the held-out test nodes (the remaining 20% of nodes), using each test node's own past data together with recent data from the training nodes it is linked to.
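Finding the linked training nodes for a test node amounts to a row lookup in the adjacency matrix. The sketch below is illustrative only: the function name, the toy adjacency matrix, and the node indices are hypothetical, and the source does not say what happens when a test node has no linked training nodes.

```python
import numpy as np

def linked_training_nodes(adjacency, node, train_nodes):
    """Return the training nodes that are linked to `node` in the network."""
    neighbors = np.flatnonzero(adjacency[node])   # all nodes linked to `node`
    return np.intersect1d(neighbors, train_nodes) # keep only training nodes

# Toy network (not thesis data): node 3 is linked to nodes 0, 1 and 2,
# but only nodes 0 and 2 belong to the training set.
toy_adjacency = np.array([
    [False, False, False, True],
    [False, False, False, True],
    [False, False, False, True],
    [True,  True,  True,  False],
])
toy_train_nodes = np.array([0, 2])
linked = linked_training_nodes(toy_adjacency, 3, toy_train_nodes)
```

Restricting the lookup to training nodes keeps the forecast for one test node from depending on data from another test node.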
Phase 5: Results Visualization & Evaluation
The final phase is to rigorously evaluate the model's performance. This is achieved by comparing the forecasted values against the actual, ground-truth data using both qualitative visualizations and quantitative error metrics.
Time Series Overlay
This plot shows the actual precipitation trend for a sample test node, with the model's forecast overlaid to visually assess the fit and accuracy over time.
Predicted vs. Actual
A scatter plot comparing forecasted values (Y-axis) against actual values (X-axis). A tight cluster around the diagonal line indicates high model accuracy.
Model Performance: RMSE Scores
The Root Mean Squared Error (RMSE) is calculated for each of the 8 test nodes. Lower values signify a smaller error margin and better predictive performance.
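The per-node RMSE is the square root of the mean squared difference between forecast and observation over the forecast period. A minimal sketch, assuming forecasts and observations are stored as arrays of shape `(time_steps, nodes)`; the function name and the toy numbers are hypothetical and are not thesis results.

```python
import numpy as np

def rmse_per_node(actual, predicted):
    """Return one RMSE value per node (column) for (time_steps, nodes) arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2, axis=0))

# Toy values for two nodes over three time steps.
toy_actual = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
toy_predicted = np.array([[1.0, 3.0], [2.0, 4.0], [5.0, 0.0]])
scores = rmse_per_node(toy_actual, toy_predicted)
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones. RMSE is expressed in the same units as the data it is computed on.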