[ClimODE] Weather Forecasting Using Neural ODEs
3 main points
✔️ Propose ClimODE, a Neural ODE for weather forecasting
✔️ Introduce two networks to acquire short- and long-range interactions
✔️ Achieves state-of-the-art at global and regional levels
ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs
written by Yogesh Verma, Markus Heinonen, Vikas Garg
(Submitted on 15 Apr 2024)
Comments: Accepted as ICLR 2024 Oral. Project website: this https URL
Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
This research proposes ClimODE, a Neural ODE system for weather forecasting. Weather involves interactions at several spatial scales, so ClimODE captures local dependencies with local convolution operations and global dependencies with a global attention mechanism. As a result, the method outperforms conventional methods and achieves state-of-the-art performance at the global and regional levels, despite having fewer parameters (Table 1 shows a comparison with conventional methods based on deep learning). The study also discusses how to incorporate uncertainty into weather forecasts and, as a result, successfully predicts temperature variations due to the day-night cycle.
Background
Weather forecasting has traditionally relied on numerical simulation. Global forecasts in particular require enormous computations on supercomputers. However, the sheer computational cost, together with the loss of accuracy as errors accumulate when future states are derived step by step from past ones, makes accurate forecasting extremely difficult. In that sense, accurate weather forecasting remains a long-held dream of mankind. This research approaches weather prediction using Neural ODEs.
Related Research
Here is a brief summary of two lines of existing weather forecasting work related to this study.
Numerical Climate Models
Current numerical climate models can be categorized into short-term weather forecasting and long-term climate prediction models. In particular, one of the most advanced models is the Earth System Model (ESM), which integrates the physics in the atmosphere, cryosphere, land, and oceans. However, although these models have achieved some success, they suffer from problems such as sensitivity to initial values, structural discrepancies among the models, regional differences, and high computational burden. These have hindered the development of numerical climate models.
Climate Prediction by Deep Learning
Encouraged by the strong predictive performance of deep learning, many studies have applied it to climate prediction, using basic neural networks, graph neural networks, Transformers, and other architectures. However, these methods essentially forecast from weather data alone and do not take physical mechanisms into account. In addition, they do not estimate the uncertainty of the forecast.
Proposed Method
Neural Transport Model
This section briefly introduces each element of the climate model introduced in this paper. Figure 1 shows a schematic of the ClimODE proposed in this paper.
Advection equation
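In this paper, the climate is modeled as a spatio-temporal process of $K$ physical quantities, denoted $u(x,t) = (u_1(x,t), \ldots, u_K(x,t))$, where $x$ is a location and $t$ is time. The paper also assumes that each quantity $u_k$ is carried by its own flow velocity $v_k(x,t)$ and obeys the following advection equation:

$$\frac{\partial u_k}{\partial t} = -\, v_k \cdot \nabla u_k \;-\; u_k \, \nabla \cdot v_k = -\nabla \cdot (u_k v_k)$$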
It shows that at a given point, the time variation of a physical quantity is described by advection and compression. In other words, this can be thought of as describing a conservation law for a particular physical quantity.
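As a rough illustration of the advection and compression terms, the following NumPy sketch evaluates the right-hand side of a conservation-law form, $\partial u/\partial t = -v \cdot \nabla u - u \, \nabla \cdot v$, on a small grid. The function name `advection_rhs`, the periodic boundaries, and the central differences are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def advection_rhs(u, vx, vy, dx=1.0, dy=1.0):
    """Return du/dt = -(v . grad u) - u (div v) on a periodic 2-D grid.

    u, vx, vy have shape (H, W); axis 0 is y, axis 1 is x.
    Central differences with wrap-around (np.roll) are an illustrative
    choice, not the discretisation used in the paper.
    """
    def ddx(f):
        return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2.0 * dx)

    def ddy(f):
        return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * dy)

    transport = vx * ddx(u) + vy * ddy(u)      # v . grad u  (advection)
    compression = u * (ddx(vx) + ddy(vy))      # u * div v   (compression)
    return -(transport + compression)


# Toy example: a smooth bump carried by a uniform eastward flow.
H, W = 32, 64
yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
u0 = np.exp(-((xx - 20.0) ** 2 + (yy - 16.0) ** 2) / 30.0)
vx0 = np.full((H, W), 0.5)
vy0 = np.zeros((H, W))
dudt = advection_rhs(u0, vx0, vy0)
```

With this discretisation the tendencies sum to zero over the periodic grid, which is the discrete counterpart of the conservation law: the quantity is moved around but not created or destroyed.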
Flow velocity
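In this paper, following previous studies, the authors model the flow velocity with a second differential equation whose right-hand side is a neural network $f_\theta$:

$$\frac{\partial v_k}{\partial t} = f_\theta\big(u, \nabla u, v, \psi\big)$$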
In other words, this equation models the time variation of the flow velocity of a given physical quantity as determined by the physical quantity, its spatial gradient, the flow velocity, and the space-time embedding vector ($\psi$).
Governing equations
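Using the two equations above, a physical quantity and its flow velocity can be described by the following coupled governing equations:

$$\frac{\partial}{\partial t} \begin{bmatrix} u_k \\ v_k \end{bmatrix} = \begin{bmatrix} -\nabla \cdot (u_k v_k) \\ f_\theta(u, \nabla u, v, \psi) \end{bmatrix}$$

Integrating this system forward in time from an initial state with an ODE solver yields the forecast.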
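To make the idea of integrating a quantity and its flow velocity together more concrete, here is a minimal one-dimensional sketch. It assumes the quantity follows $\partial u/\partial t = -\nabla \cdot (u v)$ and replaces the learned velocity network with a hand-written placeholder, `toy_velocity_rhs`, which is purely hypothetical. The Runge-Kutta integrator and the periodic grid are likewise choices made for this illustration, not the paper's setup.

```python
import numpy as np

def grad(f, dx=1.0):
    """Central difference on a periodic 1-D grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def toy_velocity_rhs(u, du, v):
    """Hypothetical stand-in for the learned velocity network.

    It relaxes v toward a constant and reacts weakly to the gradient of u;
    the real model replaces this with a trained neural network.
    """
    return -0.1 * (v - 0.5) - 0.01 * du

def coupled_rhs(state):
    """Right-hand side of the joint system for the stacked state (u, v)."""
    u, v = state
    du = grad(u)
    u_dot = -grad(u * v)                  # -div(u v): advection + compression
    v_dot = toy_velocity_rhs(u, du, v)
    return np.stack([u_dot, v_dot])

def integrate(state0, dt, n_steps):
    """Integrate the coupled system with classical fourth-order Runge-Kutta."""
    state = state0.copy()
    for _ in range(n_steps):
        k1 = coupled_rhs(state)
        k2 = coupled_rhs(state + 0.5 * dt * k1)
        k3 = coupled_rhs(state + 0.5 * dt * k2)
        k4 = coupled_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

x = np.arange(128)
u_init = 1.0 + np.exp(-((x - 40.0) ** 2) / 50.0)
v_init = np.full_like(u_init, 0.5)
u_final, v_final = integrate(np.stack([u_init, v_init]), dt=0.1, n_steps=200)
```

Because the quantity is updated in flux form, its total over the periodic domain stays constant during integration, whatever the velocity model returns.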
Modeling of short- and long-range interactions
In the flow velocity model shown above, the time variation of flow velocity at a given point is described by the physical quantity at that point, its spatial gradient, and the flow velocity of that physical quantity. In real weather, however, the flow velocity at a point is also expected to change through long-range interactions, so these must be modeled as well. In this paper, the time variation of flow velocity is therefore implemented as the sum of two networks:

$$f_\theta(u, \nabla u, v, \psi) = f_{\mathrm{conv}}(u, \nabla u, v, \psi) + \gamma \, f_{\mathrm{att}}(u, \nabla u, v, \psi)$$

where $\gamma$ is a learnable weight.
In other words, the network was designed so that the first term describes local interactions by a convolutional network and the second term describes long-range interactions by a network with an attention mechanism.
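The difference between the two branches can be sketched in plain NumPy. The code below is not the authors' architecture: it uses a single 3x3 convolution as the local branch, one head of dot-product self-attention over all grid cells as the global branch, and a fixed scalar `gamma` to weight the attention term. All names, shapes, and weights are hypothetical.

```python
import numpy as np

def local_conv3x3(x, kernel):
    """3x3 convolution per channel with periodic padding.

    x: (C, H, W) features, kernel: (3, 3) weights shared across channels.
    Each output cell only sees its 8 neighbours: a short-range interaction.
    """
    out = np.zeros_like(x)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += kernel[di + 1, dj + 1] * np.roll(x, (-di, -dj), axis=(1, 2))
    return out

def global_attention(x, wq, wk, wv):
    """Single-head dot-product self-attention over all H*W grid cells.

    Every cell attends to every other cell: a long-range interaction.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                 # (H*W, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])         # (H*W, H*W)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ v).T.reshape(c, h, w)

def velocity_update(x, kernel, wq, wk, wv, gamma=0.1):
    """Sum of the local branch and the gamma-weighted global branch."""
    return local_conv3x3(x, kernel) + gamma * global_attention(x, wq, wk, wv)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 16
features = rng.normal(size=(C, H, W))
kernel = rng.normal(size=(3, 3)) / 9.0
wq, wk, wv = (rng.normal(size=(C, C)) / np.sqrt(C) for _ in range(3))
v_dot = velocity_update(features, kernel, wq, wk, wv)
```

Perturbing one grid cell changes the convolution output only in its immediate neighbourhood, whereas the attention output changes everywhere, which is the motivation for combining the two.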
Uncertainty quantification (emission model)
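In addition, this paper also addresses the quantification of uncertainty. As a simple approach, the authors assume that each observed physical quantity follows a Gaussian distribution centered on the ODE prediction plus a bias:

$$u_k^{\mathrm{obs}}(x,t) \sim \mathcal{N}\big(u_k(x,t) + \mu_k(x,t),\; \sigma_k^2(x,t)\big)$$

where the bias $\mu_k$ and the standard deviation $\sigma_k$ are output by an additional network.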
This makes it possible to represent both a systematic offset (bias) from the mean behavior and the variance around it. The setup is an attempt to model the uncertainty of weather with a very simple Gaussian distribution. Note that it has no physical basis; it is a very strong assumption, adopted for ease of handling, that the quantities follow a Gaussian distribution. Throughout this article, this model is referred to as the emission model.
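A small numerical sketch may help show what a Gaussian emission model provides in practice. It assumes the observation is distributed as a Gaussian whose mean is the ODE state plus a bias and whose spread is a predicted standard deviation. The numbers and the helper `emission_interval` are invented for illustration only.

```python
import numpy as np

def emission_interval(u_pred, bias, sigma, z=1.96):
    """Mean and central interval of N(u_pred + bias, sigma^2).

    z = 1.96 gives an approximate 95% interval for a Gaussian.
    """
    mean = u_pred + bias
    return mean, mean - z * sigma, mean + z * sigma

# Hypothetical values for one grid cell over four 6-hour steps (kelvin).
u_pred = np.array([285.0, 284.0, 286.5, 288.0])   # state from the ODE
bias = np.array([-1.0, -1.5, 0.5, 1.5])           # e.g. a day-night offset
sigma = np.array([0.4, 0.6, 0.9, 1.2])            # predicted std. deviation

mean, lower, upper = emission_interval(u_pred, bias, sigma)

# Draw samples from the emission distribution and check empirical coverage.
rng = np.random.default_rng(0)
samples = rng.normal(mean, sigma, size=(10_000, 4))
coverage = ((samples >= lower) & (samples <= upper)).mean(axis=0)
```

If the Gaussian assumption held for real observations, roughly 95% of them would fall inside the interval; checking this kind of coverage on held-out data is one way to judge whether the predicted uncertainty is trustworthy.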
Loss function
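The loss function introduced in this paper is the negative log-likelihood of the emission model combined with a prior on the predicted spread:

$$\mathcal{L}_\theta = -\frac{1}{NKHW} \sum_{i=1}^{N} \Big[ \log \mathcal{N}\big(y_i \mid u(t_i) + \mu(t_i),\, \mathrm{diag}\,\sigma^2(t_i)\big) + \log \mathcal{N}_{+}\big(\sigma(t_i) \mid 0,\, \lambda_\sigma^2 I\big) \Big]$$

Here $y_i$ is the observation at time $t_i$, $N$ is the number of time points, $K$ the number of physical quantities, $H \times W$ the grid size, and $\lambda_\sigma$ the scale of the prior. The first term represents the loss due to errors between observations and forecasts. The second term is a regularization term on the variance of the forecast, which prevents the magnitude of the variance from diverging.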
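The two-term structure of such a loss can be written down in a few lines. The sketch below assumes a Gaussian negative log-likelihood for the data term and a quadratic penalty on the predicted standard deviation (the negative log of a zero-mean Gaussian prior, up to a constant) for the regularizer. The function name and the parameter `lambda_sigma` are labels chosen for this example, and the exact normalization used by the authors may differ.

```python
import numpy as np

def emission_loss(y, u_pred, bias, sigma, lambda_sigma=1.0):
    """Gaussian negative log-likelihood plus a penalty on the spread.

    First term: -log N(y | u_pred + bias, sigma^2), per entry.
    Second term: sigma^2 / (2 * lambda_sigma^2), which discourages the
    predicted standard deviation from growing without bound (assumed form).
    The result is averaged over all entries.
    """
    mean = u_pred + bias
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + 0.5 * ((y - mean) / sigma) ** 2
    penalty = 0.5 * (sigma / lambda_sigma) ** 2
    return float(np.mean(nll + penalty))

# Hypothetical observations and predictions for three grid cells.
y_obs = np.array([284.2, 282.0, 287.5])
u_hat = np.array([285.0, 284.0, 286.5])
mu_hat = np.array([-1.0, -1.5, 0.5])
sigma_hat = np.array([0.4, 0.6, 0.9])
loss = emission_loss(y_obs, u_hat, mu_hat, sigma_hat)
```

Without the penalty, a model could lower the data term for badly predicted points simply by inflating `sigma`; the second term makes that increasingly expensive.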
Experimental results
In this paper, as an example, the authors consider the prediction of climate conditions 6 to 36 hours ahead. The data set was created by extracting data from the ERA5 reanalysis at a spatial resolution of 5.625° and 6-hour increments. As physical quantities, 2 m surface temperature (t2m), atmospheric temperature (t), geopotential (z), and the 10 m wind components (u10, v10) were selected as validation targets. For comparison, several conventional methods were also prepared: ClimaX, based on the Transformer (trained on the same dataset as in this study); FourCastNet (FCN), which applies a large-scale adaptive Fourier neural operator; and a plain Neural ODE. The Integrated Forecasting System (IFS), well known as the "European model" and based on state-of-the-art physics simulations, was also included for comparison.
Comparison of Global Forecasts
Figure 2 and Table 2 compare the root mean square error (RMSE) and the anomaly correlation coefficient (ACC) of each physical quantity predicted by ClimODE and the other methods. The results suggest that ClimODE predicts weather more accurately than conventional deep learning methods. It should also be noted that ClimODE's performance is close to that of the state-of-the-art IFS.
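For readers unfamiliar with how such scores are computed on a global grid, here is a minimal sketch of a root mean square error together with an anomaly correlation coefficient, a standard companion skill score for gridded forecasts. The cosine-latitude weighting (to account for shrinking cell area toward the poles), the simplified correlation formula, and the synthetic data are assumptions of this sketch and may differ from the exact evaluation protocol in the paper.

```python
import numpy as np

def latitude_weights(lat_deg):
    """cos(latitude) weights normalised to mean 1 (accounts for cell area)."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()

def weighted_rmse(pred, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (H, W)."""
    w = latitude_weights(lat_deg)[:, None]
    return float(np.sqrt(np.mean(w * (pred - truth) ** 2)))

def weighted_acc(pred, truth, climatology, lat_deg):
    """Latitude-weighted correlation of anomalies from a climatology."""
    w = latitude_weights(lat_deg)[:, None]
    p = pred - climatology
    t = truth - climatology
    num = np.sum(w * p * t)
    den = np.sqrt(np.sum(w * p**2) * np.sum(w * t**2))
    return float(num / den)

# 5.625 degree grid: 32 latitudes x 64 longitudes (cell centres).
lat = np.linspace(-87.1875, 87.1875, 32)
rng = np.random.default_rng(0)
climatology = 280.0 + 20.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((32, 64))
truth = climatology + rng.normal(scale=2.0, size=(32, 64))   # synthetic "truth"
pred = truth + rng.normal(scale=0.5, size=(32, 64))          # synthetic forecast

rmse = weighted_rmse(pred, truth, lat)
acc = weighted_acc(pred, truth, climatology, lat)
```

Lower RMSE and higher anomaly correlation (closer to 1) both indicate a better forecast; the correlation is computed on departures from climatology so that a model cannot score well merely by reproducing the seasonal average.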
Comparison of local forecasts in several regions
In addition to the global forecast described above, the authors compared forecast performance limited to a few regions. Table 3 shows the results. The results also suggest the superiority of ClimODE over conventional methods.
Uncertainty quantification and the effect of the emission model
The authors tested how the emission model, which was introduced to quantify uncertainty, affects forecasting. They visualized the predicted time series of surface temperature at a specific location, including the uncertainty. Figure 3 shows the results, which indicate that introducing the emission model adequately captures the temperature variations at that location.
In an interesting experiment, the authors also visualized the global spatial distribution of the bias and variance at 12:00 AM UTC. Figure 4 shows the results. They confirm that the emission model properly extracts the day-night cycle as a bias. In addition, the uncertainty at each location is visualized as variance. The results indicate that ClimODE predicts with relatively high confidence over the ocean, but with relatively low confidence over land and toward the north. This ability to attach a level of confidence to each prediction is a remarkable feature of ClimODE. However, as noted above, caution must be exercised regarding the physical and other interpretations of this uncertainty, which require further discussion in the future.
Effectiveness of introduced components
The authors performed an ablation analysis on ClimODE to verify the effect of each introduced component on performance. Figure 5 shows the results. It can be seen that each component contributes to the performance improvement in an integrated manner.
Summary
This article introduced ClimODE, a data-driven forecasting model that takes the physics of continuous transport in weather into account. Its performance even approaches that of IFS. This result supports the effectiveness of a data-driven approach that introduces physical regularization. On the other hand, the study focused on relatively short-term forecasts of a few dozen hours. As the authors point out, it is therefore still unclear whether ClimODE can accurately predict climate change over a long time span, and further discussion of ClimODE-based methodologies is needed. Nevertheless, the authors' attempt is ambitious and has high potential, and we look forward to its further development.