Phase 1 & 2: Data Foundation & Network Construction

These first two phases prepare the raw data and construct a geographical network based on statistical correlations between nodes. A solid foundation is critical for the model's ability to learn and predict accurately.

Total Nodes: 40 distinct geographical points

Data Timeframe: 45 years (1979-2023)

Correlation Threshold: Spearman rank coefficient ρ > 0.8

Network Linking Explained

An adjacency matrix is built by computing the Spearman rank correlation between every pair of nodes, using data from 1979-2014. Two nodes are linked when their correlation exceeds 0.8, and the resulting network tells the LSTM model which neighboring nodes are influential.

[Diagram: Node A linked to Node B, ρ = 0.85]
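The linking rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function names and the input layout (one column per node, one row per time step, restricted to 1979-2014) are assumptions. Spearman's ρ is computed as the Pearson correlation of average ranks, so tied values such as repeated zero-precipitation readings are handled.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks inside each group of equal values by their average.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman_adjacency(series, threshold=0.8):
    """Build a boolean adjacency matrix from pairwise Spearman correlations.

    series: array of shape (n_timesteps, n_nodes), assumed layout.
    Returns (adjacency, rho), both of shape (n_nodes, n_nodes).
    """
    series = np.asarray(series, dtype=float)
    ranks = np.column_stack([average_ranks(column) for column in series.T])
    rho = np.corrcoef(ranks, rowvar=False)  # Pearson on ranks = Spearman
    adjacency = rho > threshold
    np.fill_diagonal(adjacency, False)      # a node is not its own neighbor
    return adjacency, rho
```

With 40 nodes, `adjacency` would be a 40 x 40 symmetric matrix, and row `i` lists the nodes linked to node `i`.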

Node Data Splitting

The 40 nodes are divided into 32 training nodes (80%) and 8 test nodes (20%), so that the model is evaluated on nodes it never saw during training.
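One way to produce such a split is a seeded random shuffle of the node indices. The infographic does not say how the test nodes were chosen, so the random selection, the seed, and the function name below are assumptions made for illustration.

```python
import numpy as np


def split_nodes(n_nodes=40, test_fraction=0.2, seed=0):
    """Randomly split node indices into training and test sets (assumed scheme)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_nodes)
    n_test = int(round(n_nodes * test_fraction))
    test_nodes = np.sort(shuffled[:n_test])
    train_nodes = np.sort(shuffled[n_test:])
    return train_nodes, test_nodes


train_nodes, test_nodes = split_nodes()
print(len(train_nodes), "training nodes,", len(test_nodes), "test nodes")
```

Fixing the seed keeps the split reproducible, so the same nodes are held out on every run.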

Phase 3 & 4: The LSTM Modeling Engine

With the network defined, the core of the project involves preparing the time series data and then training and deploying the LSTM model. This multi-step process is designed to capture complex temporal patterns for accurate forecasting.

3.1 Stationarize Data

The precipitation time series is transformed to remove trend and seasonality, so that its statistical properties stay roughly constant over time. Stationary inputs generally make LSTM training more stable and the resulting forecasts more reliable.
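The infographic does not name the transformation used. A common choice, shown here purely as an assumed example, is seasonal differencing followed by first differencing. The `period=12` default assumes monthly data, which the source does not confirm.

```python
import numpy as np


def stationarize(series, period=12):
    """Remove a repeating seasonal cycle and a linear trend by differencing.

    period: length of the seasonal cycle in time steps (12 assumes monthly data).
    The output is period + 1 steps shorter than the input.
    """
    series = np.asarray(series, dtype=float)
    # Seasonal difference: compare each value with the same season one cycle earlier.
    seasonal_diff = series[period:] - series[:-period]
    # First difference: removes the constant offset left behind by a linear trend.
    return np.diff(seasonal_diff)
```

Differencing twice can over-smooth a series that has no trend to begin with, so in practice each step would be checked against the data before being applied.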

4.2 Train LSTM Model

The LSTM model is trained on 80% of the nodes using historical data from 1979-2014. The model learns the complex temporal dependencies and network influences during this phase.
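Before training, each node's series is usually cut into fixed-length input windows paired with the value that follows. The sketch below shows that windowing step only. The `lookback` length, the function name, and the choice of the first column as the forecast target are assumptions, not details taken from the project.

```python
import numpy as np


def make_windows(series, lookback):
    """Turn a time series into (samples, lookback, features) windows and targets.

    series: shape (n_timesteps,) or (n_timesteps, n_features).
    The target for each window is the next value of the first feature column.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, np.newaxis]
    n_samples = len(series) - lookback
    if n_samples <= 0:
        raise ValueError("series must be longer than lookback")
    windows = np.stack([series[i:i + lookback] for i in range(n_samples)])
    targets = series[lookback:, 0]
    return windows, targets
```

The three-dimensional output matches the (batch, time, features) layout that LSTM layers in common deep learning libraries accept.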

4.3 Forecast Future

The trained model forecasts 2015-2023 precipitation for the held-out test nodes (20% of all nodes), using each test node's own past data together with recent data from the training nodes it is linked to.
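Gathering the inputs for one test node could look like the sketch below, which looks up the node's linked training nodes in the adjacency matrix and stacks their series next to its own. The function names and the `data` layout (time steps in rows, nodes in columns) are hypothetical.

```python
import numpy as np


def linked_training_nodes(adjacency, node, train_nodes):
    """Return the training nodes that are linked to the given node."""
    neighbors = np.flatnonzero(adjacency[node])
    return np.intersect1d(neighbors, train_nodes)


def build_features(data, adjacency, node, train_nodes):
    """Stack a node's own series with the series of its linked training nodes.

    data: array of shape (n_timesteps, n_nodes), assumed layout.
    Returns an array of shape (n_timesteps, 1 + n_linked), own series first.
    """
    linked = linked_training_nodes(adjacency, node, train_nodes)
    columns = np.concatenate(([node], linked))
    return data[:, columns]
```

The number of linked nodes differs from node to node, so a model with a fixed input width would need padding or an aggregate such as the neighbor mean. The infographic does not say which approach was taken.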

Phase 5: Results Visualization & Evaluation

The final phase evaluates the model's performance by comparing the forecasted values against the actual, ground-truth data, using both qualitative visualizations and quantitative error metrics.

Time Series Overlay

This plot shows the actual precipitation trend for a sample test node, with the model's forecast overlaid to visually assess the fit and accuracy over time.

Predicted vs. Actual

A scatter plot comparing forecasted values (Y-axis) against actual values (X-axis). A tight cluster around the diagonal line indicates high model accuracy.

Model Performance: RMSE Scores

The Root Mean Squared Error (RMSE) is calculated for each of the 8 test nodes. Lower values signify a smaller error margin and better predictive performance.
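For reference, RMSE is the square root of the mean squared difference between forecast and observation. A minimal per-node computation, assuming the actual and predicted values are stored with one column per test node, might look like this:

```python
import numpy as np


def rmse_per_node(actual, predicted):
    """Compute RMSE separately for each node (column).

    actual, predicted: arrays of shape (n_timesteps, n_nodes), assumed layout.
    Returns an array of shape (n_nodes,).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2, axis=0))
```

With 8 test nodes the result holds 8 scores. RMSE is expressed in the same units as the values being compared.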