Integration Of Scientific Knowledge And Machine Learning
3 main points
✔️ A review of integrated models that compensate for the respective shortcomings of scientific and machine learning models and produce synergies
✔️ A number of integrated models have been shown to reduce computational load and improve accuracy compared with purely physical simulators
✔️ The field has been developing rapidly in recent years and still leaves plenty of room for growth
Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems
written by Jared Willard, Xiaowei Jia, Shaoming Xu, Michael Steinbach, Vipin Kumar
(Submitted on 10 Mar 2020 (v1), last revised 23 Jul 2021 (this version, v5))
Comments: Accepted by ACM Computing Surveys.
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
The images used in this article are from the paper or created based on it.
Introduction
When machine learning is applied to scientific modeling, it has been less successful than in fields such as images, natural language, and speech. This is because ML models require huge amounts of data, struggle to produce physically consistent results, and fail to generalize to out-of-sample scenarios. Research has therefore begun to explore the continuum between scientific knowledge and ML models and to integrate the two synergistically. Unlike the traditional way of applying domain knowledge in ML, namely feature engineering and preprocessing, these approaches integrate scientific knowledge directly into the ML framework. Workshops and symposia dealing with this area have already started (see references [1-6]). In this review paper, the authors first introduce a classification by objective, followed by a description of the different integration methods.
Objectives of physics/machine learning integration from an application perspective
Fig. 1 is an abstract representation of a generic scientific problem: taking the time-varying variables x_t and the constants s as input, the mechanistic model F yields the output y_t, i.e., y_t = F(x_t, s).
We will go through each objective in Table 1.
Replacing or improving state-of-the-art physical models
Although scientific models based on physical laws are widely used, not everything about the actual process is known, so the models are approximations. In addition, the models contain many parameters whose exact values cannot be observed, so estimates are often substituted. ML models, on the other hand, can outperform physics-based models in many domains, because NNs can extract complex problem structures and patterns that cannot be expressed explicitly.
Downscaling
Downscaling methods are used when physical variables need to be modeled at finer resolution but doing so is prohibitively expensive computationally. There are two categories: statistical downscaling and dynamical downscaling. The former is an empirical model that predicts fine-resolution variables from coarse-resolution variables; this has traditionally been difficult because it requires capturing complex nonlinearities, but NNs are showing promise. The latter dynamically simulates the relevant physical processes over a region where high-resolution, domain-specific simulation is required; it remains computationally expensive, but ML is expected to mitigate this. The latest ML methods can be applied to both, but the open issues are whether the learned ML component stays consistent with established physical laws and whether overall simulation performance actually improves.
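As a concrete illustration, below is a minimal sketch of statistical downscaling framed as a super-resolution task in PyTorch. The network, grid sizes, and random tensors are illustrative assumptions, not an architecture from the reviewed papers.

```python
# Minimal sketch: statistical downscaling framed as super-resolution.
# Assumes paired coarse-resolution inputs and fine-resolution targets;
# all tensors here are random placeholders for real climate fields.
import torch
import torch.nn as nn

class DownscaleCNN(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a finer grid
        )

    def forward(self, coarse):
        return self.net(coarse)

model = DownscaleCNN(scale=4)
coarse = torch.randn(8, 1, 16, 16)     # e.g., coarse 16x16 temperature fields
fine_true = torch.randn(8, 1, 64, 64)  # matching 64x64 high-resolution fields
loss = nn.functional.mse_loss(model(coarse), fine_true)
loss.backward()
```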
Parameterization
Parameterization is often used to account for physical processes that a complex physical model cannot resolve explicitly: complex dynamic processes are replaced by simplified approximations represented by static parameters. A common approach is to find optimal parameter values with a grid search. Another is to replace the parameterization with a dynamic or static ML process, which has already been done successfully in several areas. The main advantage is the reduction in computation time compared with traditional simulations.
Currently, standard black-box ML parameterizations are used, but there is growing interest in integrating physical and ML models, since this is expected to improve robustness and generalization performance and to reduce the amount of training data required.
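The following sketch illustrates the idea of replacing a parameterization with an ML surrogate: a small MLP is fitted offline to input-output pairs from a stand-in process-level model and then called in its place at run time. All names and shapes here are hypothetical.

```python
# Sketch: replacing a static parameterization with a small MLP surrogate.
# 'subgrid_flux' stands in for an expensive process-level simulation that
# generates training pairs; the name and shapes are illustrative only.
import torch
import torch.nn as nn

def subgrid_flux(state):            # placeholder for the detailed process model
    return torch.sin(state).sum(dim=1, keepdim=True)

surrogate = nn.Sequential(nn.Linear(5, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for _ in range(200):                # fit the surrogate offline on simulated pairs
    state = torch.randn(128, 5)     # resolved-scale model state
    target = subgrid_flux(state)    # expensive parameterized quantity
    loss = nn.functional.mse_loss(surrogate(state), target)
    opt.zero_grad(); loss.backward(); opt.step()

# At run time the host model calls surrogate(state) instead of subgrid_flux.
```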
Reduced-order models
Reduced-order models (ROMs) are computationally inexpensive representations of complex models. ML is beginning to help construct ROMs that are more accurate and less computationally expensive. One approach is an ML-based surrogate model; others are ML surrogates for already existing ROMs, or ML models that learn the dimensionality-reducing map from the full-dimensional model to the reduced one. Applying ML has the potential to significantly extend the performance of ROMs.
One area of recent focus, as a method of dimensionality reduction, is approximating the dominant modes of the Koopman (or composition) operator. The Koopman operator is an infinite-dimensional linear operator that encodes the time evolution of system states under nonlinear dynamics [41]. It allows linear analysis methods to be applied to nonlinear systems, and properties to be inferred for dynamical systems too complex for traditional analytical methods. Deep learning is being used to approximate Koopman operator embeddings, and adding physics-based knowledge to the training of Koopman operators has the potential to broaden their generalization and explanatory power.
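A minimal sketch of the deep-learning approach to Koopman embeddings is given below: an autoencoder maps states to a latent space in which one time step is a single learned linear map. The sizes, losses, and toy data are assumptions, not the cited papers' exact setups.

```python
# Sketch of a deep Koopman embedding: an autoencoder whose latent state is
# advanced by a learned *linear* map K, approximating the Koopman operator.
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    def __init__(self, n_state=2, n_latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_state, 32), nn.Tanh(), nn.Linear(32, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 32), nn.Tanh(), nn.Linear(32, n_state))
        self.K = nn.Linear(n_latent, n_latent, bias=False)  # linear Koopman step

    def forward(self, x_t):
        z_t = self.enc(x_t)
        z_next = self.K(z_t)          # linear evolution in latent space
        return self.dec(z_t), self.dec(z_next)

model = KoopmanAE()
x_t, x_next = torch.randn(64, 2), torch.randn(64, 2)  # consecutive state snapshots
recon, pred_next = model(x_t)
loss = nn.functional.mse_loss(recon, x_t) + nn.functional.mse_loss(pred_next, x_next)
loss.backward()
```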
Partial differential equations
For many physical systems, even when the governing equations are known, general finite element and finite difference methods for solving partial differential equations can be very expensive. Using ML models, especially NN solvers, can significantly reduce the computational burden, and the resulting solution is differentiable and has a closed analytic form that can be transferred to any subsequent calculation. This has been used successfully for quantum many-body problems and the many-electron Schrödinger equation. Recently, Li et al. defined a Fourier neural operator that lets an NN learn an entire family of partial differential equations, mapping any functional parameter dependence to a solution in Fourier space.
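As a toy illustration of an NN-based solver, the sketch below trains a network to satisfy the residual of the simple ODE du/dt = -u with u(0) = 1 (exact solution exp(-t)) at random collocation points, with no labeled solution data. This is a generic physics-informed NN pattern, not the specific solvers cited above.

```python
# Minimal physics-informed NN (PINN) sketch: the equation residual is penalized
# at collocation points, so no labeled solution data is needed.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    t = torch.rand(128, 1, requires_grad=True)        # collocation points in [0, 1]
    u = net(t)
    du_dt, = torch.autograd.grad(u.sum(), t, create_graph=True)
    residual = du_dt + u                               # enforce du/dt + u = 0
    ic = (net(torch.zeros(1, 1)) - 1.0) ** 2           # initial condition u(0) = 1
    loss = (residual ** 2).mean() + ic.mean()
    opt.zero_grad(); loss.backward(); opt.step()
```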
Inverse modeling
An inverse model uses the (potentially noisy) output of a system to estimate the true physical parameters or inputs. Inverse problems are important in the physics-based modeling community because they can reveal valuable information that cannot be observed directly. An example is computed tomography (CT), where x-ray measurements are used to reconstruct 3D images of the internal structure of the human body.
In many cases, solving the inverse problem is computationally expensive, since characterizing the posterior distribution of the physical parameters can require millions of forward model evaluations. ML-based reduced models are becoming a realistic option because they model high-dimensional phenomena from large amounts of data and are much faster than physical simulators.
In addition to computed tomography and seismic data processing, there has been much interest in the inverse design of materials, which takes desired physical properties as input and uses models to determine the atomic- and micro-scale structures that possess those properties [147].
Integrating prior physical knowledge is a common approach to inverse problems, and integration with ML models has the potential to improve data efficiency and the ability to solve ill-posed inverse problems.
Governing equation search
In many disciplines (neuroscience, cellular physiology, economics, ecology, epidemiology), dynamical systems do not have formal analytical descriptions. Even when data are abundant, the governing equations remain elusive. Integrating principles of applied mathematics and physics with ML models to discover the governing equations has become an active research area.
In earlier work [36, 232], symbolic regression was applied to the difference between computed and analytical derivatives to determine the underlying dynamical system. More recently, sparse regression over a dictionary of candidate functions and partial derivatives has been used to construct the governing equations; Lagergren et al. built the function dictionary using ANNs. This sparse identification approach is based on the principle of Occam's razor: the goal is to represent a nonlinear system with only a few equation terms.
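A minimal sketch of the sparse-regression idea follows: a library of candidate terms is fitted to numerically estimated derivatives, and sequential thresholding prunes it down to a few terms. The toy system dx/dt = -2x, the library, and the threshold are assumptions for illustration.

```python
# Sketch of sparse identification of dynamics (SINDy-style): fit dx/dt as a
# sparse combination of candidate library terms via thresholded least squares.
import numpy as np

t = np.linspace(0, 2, 200)
x = np.exp(-2 * t)                       # trajectory of the toy system dx/dt = -2x
dx = np.gradient(x, t)                   # numerical derivative estimate

library = np.column_stack([np.ones_like(x), x, x**2, x**3])  # candidate terms
names = ["1", "x", "x^2", "x^3"]

xi = np.linalg.lstsq(library, dx, rcond=None)[0]
for _ in range(10):                      # sequential thresholding (Occam's razor)
    small = np.abs(xi) < 0.1
    xi[small] = 0.0
    big = ~small
    if big.any():                        # re-fit using only the surviving terms
        xi[big] = np.linalg.lstsq(library[:, big], dx, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, xi) if c != 0.0})  # ~{'x': -2.0}
```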
Data generation
Data generation is useful for running virtual simulations of scientific data under specific conditions. Traditionally, physics-based simulations have been used, but they are computationally time-consuming. cGANs can generate the kind of data a physics-based model would produce at reduced computational cost. Farimani et al. showed that cGANs can learn heat conduction and fluid flow from observations alone, without using the governing equations. In addition, work is ongoing to incorporate prior physical knowledge of physical laws and invariant properties into GANs, for example by adding conservation laws and constraints on the energy spectrum to the loss function.
Uncertainty Quantification
Uncertainty quantification (UQ) is important in many areas of computational science (e.g., climate modeling, fluid dynamics, systems engineering). UQ requires accurately characterizing the entire predictive distribution, which makes it possible to judge whether predictions are acceptable, analyze sensitivity to input features, and so on.
The traditional approach with physical models is Monte Carlo sampling, which requires a huge number of forward evaluations to converge. ML surrogates such as Gaussian processes are computationally far less demanding [94, 178, 256]. However, since most ML models do not natively quantify uncertainty, methods such as Bayesian NN variants (using stochastic dropout or distributions over weights and biases) and ensembles of NNs that produce uncertainty-quantifying distributions have been proposed.
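Below is a minimal sketch of one such approach, Monte Carlo dropout: dropout is kept active at prediction time, and repeated stochastic forward passes yield a predictive mean and spread. The architecture and data are placeholders.

```python
# Sketch of uncertainty quantification via Monte Carlo dropout: dropout stays
# active at inference, and repeated stochastic forward passes approximate a
# predictive distribution.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 1))
x = torch.randn(10, 3)

net.train()                      # keep dropout stochastic during inference
with torch.no_grad():
    samples = torch.stack([net(x) for _ in range(100)])  # 100 MC forward passes

mean = samples.mean(dim=0)       # predictive mean
std = samples.std(dim=0)         # predictive uncertainty per input
```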
Integrating physical models into ML for UQ has the potential to characterize uncertainty better, for example by suppressing physically inconsistent predictions.
Integrated physics/machine learning methods
There are four categories of integration methods.
Physics-guided loss functions
Standard ML models have difficulty capturing, directly from data, the highly complex relationships between physical variables that vary across scales in time and space. This is one reason they fail to generalize to scenarios absent from the training data. Researchers therefore incorporate physical knowledge into the loss function so that the ML model learns dynamic patterns that generalize in a way consistent with established physics.
One of the most common techniques is to incorporate the constraints of the physical model into the loss function of the ML model, as in the following equation:

Loss = Loss_TRN(Y_true, Y_pred) + λ R(W) + γ Loss_PHY(Y_pred)

In addition to the usual supervised loss Loss_TRN and the regularization term R(W), a third term is added, the physics-based loss Loss_PHY, where γ is a hyperparameter that sets its weight relative to the other losses.
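To make the pattern concrete, here is a minimal sketch of such a loss in PyTorch, loosely inspired by the lake-temperature example discussed below: the physics term penalizes density inversions (water density should not decrease with depth), using a simplified stand-in density-temperature relation rather than the paper's exact formula.

```python
# Sketch of a physics-guided loss: a supervised term plus a physics-based
# penalty Loss_PHY that needs no labels (it depends only on predictions).
import torch
import torch.nn as nn

def physics_loss(pred_temp_by_depth):
    # Penalize density inversions: density should not decrease with depth.
    # Simplified stand-in relation; freshwater density peaks near 4 degrees C.
    density = 1.0 - 6.8e-6 * (pred_temp_by_depth - 4.0) ** 2
    violation = torch.relu(density[:, :-1] - density[:, 1:])  # shallow denser than deep
    return violation.mean()

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 10))
x, y_true = torch.randn(32, 8), torch.randn(32, 10)   # 10 depths per sample
y_pred = model(x)

gamma = 0.5  # weight of the physics term relative to the supervised loss
loss = nn.functional.mse_loss(y_pred, y_true) + gamma * physics_loss(y_pred)
loss.backward()
```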
Constraining ML predictions to be consistent with physics has the following advantages:
- It can help ensure consistency with physical laws and reduces the search space of the ML model.
- Regularization by physical constraints facilitates learning even with unlabeled data. This is because the physics-based loss function does not require observed data.
- An ML model that follows the desired physical properties is better suited to generalizing to out-of-sample scenarios than a baseline ML model.
Note, however, that a physics-based loss term is only a soft constraint: it does not guarantee physical consistency or generalization.
In the lake temperature prediction model shown in Fig. 2, the law of conservation of energy is included in the loss function.
Other applications include partial differential equation solving, governing equation search, inverse modeling, parameterization, downscaling, uncertainty quantification, and generative modeling.
Physics-guided initialization
Using the physical model to set the initial parameter values accelerates learning and reduces the amount of data required. Transfer learning is one way to do this: the ML model is pre-trained on simulation data from a physics-based model. Jia et al. applied this method to the lake temperature prediction model described above. Other applications include object localization in robotics, pre-training for autonomous driving, and chemical process modeling.
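A minimal sketch of this two-stage scheme, with placeholder simulated and observed data, might look as follows.

```python
# Sketch of physics-guided initialization: pre-train on cheap simulator output,
# then fine-tune on scarce observations. Data here are random placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_sim, y_sim = torch.randn(5000, 4), torch.randn(5000, 1)  # abundant simulated pairs
x_obs, y_obs = torch.randn(50, 4), torch.randn(50, 1)      # scarce real observations

for _ in range(100):   # stage 1: pre-train on physics-based simulations
    loss = nn.functional.mse_loss(model(x_sim), y_sim)
    opt.zero_grad(); loss.backward(); opt.step()

opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # lower LR for fine-tuning
for _ in range(100):   # stage 2: fine-tune on the few real observations
    loss = nn.functional.mse_loss(model(x_obs), y_obs)
    opt.zero_grad(); loss.backward(); opt.step()
```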
Physics-guided initialization can also be performed using self-supervised learning, in which a discriminative representation is learned from pseudo-labels generated by a predefined pretext task. The pretext task is designed to extract complex patterns relevant to the target prediction task; for example, it can be defined to predict intermediate physical variables that play a fundamentally important role. The physics-based model can then be used to simulate these intermediate variables, which in turn pre-train the ML model by adding supervision to its hidden layers.
Physics-guided architecture design
The two aforementioned methods constrain the search space during training, but the ML architecture itself remains a black box: they do not encode physical consistency or physical properties into the architecture. Recent research has moved toward building ML architectures that exploit specific properties of the problem being solved. As a bonus, incorporating physics-based guidance into the architecture design makes the black box more explainable.
Intermediate physical variables
One way to embed physical principles into an NN design is to attribute physical meaning to certain neurons; it is also possible to explicitly declare physically relevant variables. Daw et al. incorporated physically meaningful intermediate variables into an LSTM structure, and Muralidhar et al. used a similar approach to insert physically constrained variables as intermediate variables in a CNN.
An additional advantage is that we can extract physically meaningful hidden representations that can be interpreted by expert scientists.
Another way is to fix some weights to physically meaningful values or parameters and keep them unchanged during training. This has been used in inverse modeling to infer subsurface parameters from seismic data.
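The sketch below illustrates the mechanics of pinning a weight to a known physical constant while the remaining parameters stay learnable; the toy model and constant are illustrative, not the cited seismic architectures.

```python
# Sketch of fixing a weight to a physically meaningful value during training:
# 'g' is pinned to a known constant while 'drag' is learned from data.
import torch
import torch.nn as nn

class FallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.g = nn.Parameter(torch.tensor(9.81), requires_grad=False)  # frozen physics
        self.drag = nn.Parameter(torch.tensor(0.1))                     # learnable

    def forward(self, t, v0):
        return v0 + (self.g - self.drag * v0) * t  # simplified velocity update

model = FallModel()
trainable = [p for p in model.parameters() if p.requires_grad]  # only 'drag'
opt = torch.optim.Adam(trainable, lr=1e-2)
```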
Invariants and Symmetry Encodings
In physics, the symmetries, invariants, and dynamics of a system are deeply connected. Deep learning models encode certain invariances from the outset: RNNs encode temporal invariance, and CNNs encode spatial translation invariance.
Ling et al. incorporated rotational invariance into NNs with a tensor-based architecture; Anderson et al. learned the behavior and properties of complex many-body physical systems with rotationally covariant NN architectures; Wang et al. encoded translational symmetry, rotational symmetry, scale invariance, and uniform motion into CNNs using specially designed convolutional layers.
Symmetry, by informing the structure of the solution space, can also reduce the search space of an ML algorithm. This approach is useful for governing equation search: because the space of mathematical terms and operators is exponentially large, Udrescu et al. constructed a recursive multivariate version of symbolic regression that uses physical knowledge to narrow the search space, with NNs discovering hidden signs of simplicity such as symmetry and separability.
In molecular dynamics, an NN is used for each atom to compute that atom's contribution to the total energy, and constraints are placed on the weights of each NN to preserve conservation of energy. Schütt et al. used continuous-filter convolution layers so that a CNN can model objects at arbitrary positions, such as atoms in a molecule, rather than on a Cartesian grid like an image. The per-atom layers operate on interatomic distances, enabling models that respect quantum-chemical constraints such as rotationally invariant energy predictions and energy-conserving force predictions.
Architectural modifications incorporating symmetry are also found extensively in dynamical systems involving differential equations. Mathematical theory has been used to design CNNs based on fundamental properties of partial differential equations: anisotropic filtering defines a parabolic CNN, and Hamiltonian systems define hyperbolic CNNs. Parabolic CNNs smooth the output and reduce energy, while hyperbolic CNNs conserve the system's energy. Work on solving partial differential equations with NNs has concentrated on learning in Euclidean space, but architectures such as the Fourier neural operator have recently been proposed that generalize this to function space.
The Hamiltonian formalism is primarily used for modeling the time evolution of systems with conserved quantities, but until recently it had not been integrated with NNs. Greydanus et al. constructed an NN architecture with an energy conservation constraint for a simple mass-spring system: instead of predicting the physical state directly, the network predicts the system's Hamiltonian, which is then reintegrated. Hamiltonian-parameterized NNs have since been extended to architectures that perform a differential-equation-based integration step using derivative approximations from Hamiltonian networks.
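A minimal sketch of the Hamiltonian NN idea follows: the network outputs a scalar H(q, p), and the state derivatives are obtained from Hamilton's equations via automatic differentiation, so the learned dynamics conserve the predicted energy by construction (up to the integrator). The mass-spring data and network sizes are toy assumptions.

```python
# Sketch of a Hamiltonian neural network: the net predicts a scalar H(q, p),
# and time derivatives come from Hamilton's equations via autograd.
import torch
import torch.nn as nn

h_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def dynamics(qp):                            # qp = (q, p), shape (batch, 2)
    qp = qp.requires_grad_(True)
    H = h_net(qp).sum()
    dH, = torch.autograd.grad(H, qp, create_graph=True)
    dq_dt, dp_dt = dH[:, 1:2], -dH[:, 0:1]   # dq/dt = dH/dp, dp/dt = -dH/dq
    return torch.cat([dq_dt, dp_dt], dim=1)

qp = torch.randn(32, 2)                      # observed mass-spring states
dqp_dt_true = torch.cat([qp[:, 1:2], -qp[:, 0:1]], dim=1)  # true H = (q^2 + p^2)/2
loss = nn.functional.mse_loss(dynamics(qp), dqp_dt_true)
loss.backward()
```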
Encoding other domain-specific physical knowledge
Physical knowledge from other domains is also encoded in architectures. While not corresponding to known invariants, it provides meaningful structure for the optimization process: domain-informed convolutions in CNNs, GAN discriminators given additional domain information, and structures informed by the physical properties of the problem. Fast Fourier transform layers and physics-guided convolutional layers have been added via physics-informed pre-training. In a non-NN example, Baseman et al. introduced a Markov random field that encodes the spatio-temporal properties of computer memory into corresponding probabilistic dependencies.
Auxiliary tasks in multitask learning
Multitask learning performs multiple learning tasks simultaneously, exploiting commonalities and differences between them, and a physical model can supply one of those tasks. De Oliveira et al. added an auxiliary task to the discriminator of a GAN that produces jet images of particle energy, so that its output satisfies certain properties of the particle response.
Physics-guided Gaussian process regression
Gaussian process regression (GPR) is a nonparametric Bayesian approach to regression. Glielmo et al. proposed a vectorial GPR that uses a matrix-valued kernel function, encoding the rotation and reflection symmetries of interatomic forces into the Gaussian process through a specifically designed invariance-preserving covariant kernel.
Hybrid physics-ML models
Residual modeling
A common technique that directly addresses the incompleteness of the physics-based model is residual modeling: an ML model (often simple linear regression) is built on top of the physics-based model and predicts its errors, the residuals (Fig. 3). The key idea is to learn the physics model's error with respect to observations and use it to correct the physics model's predictions. A limitation of residual modeling is that it cannot enforce physics-based constraints, since it models errors rather than the physical quantities themselves.
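A minimal sketch of residual modeling, with a stand-in physics model and synthetic observations, is shown below.

```python
# Sketch of residual modeling: a simple ML model is trained on the error of a
# physics-based prediction, and the final output adds the learned correction.
# 'physics_model' is an illustrative placeholder for a mechanistic simulator.
import numpy as np
from sklearn.linear_model import LinearRegression

def physics_model(x):                    # imperfect mechanistic prediction
    return 2.0 * x[:, 0]

x = np.random.randn(500, 3)
y_obs = 2.0 * x[:, 0] + 0.5 * x[:, 1] + 0.1 * np.random.randn(500)  # observations

residual = y_obs - physics_model(x)      # what the physics model misses
corrector = LinearRegression().fit(x, residual)

y_hybrid = physics_model(x) + corrector.predict(x)  # physics + learned residual
```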
It is often used in combination with reduced-order models (ROMs). DR-RNN, for example, captures the dynamic structure of partial differential equations with stacked RNNs in which each layer solves the residual equations.
Physical model output → ML input
Karpatne et al. fed the output of the physical model into the ML model as one of the input features (Fig. 4).
ML replacement of a part of the physical model
Here, some components of the physical model are replaced by ML models, or ML predicts intermediate quantities that the physical model gets wrong. To address inaccuracies in Reynolds-Averaged Navier-Stokes (RANS) solvers in fluid mechanics, variables in turbulence models are predicted by NN models [200]. Parts of mechanistic models and of power-system state prediction models have likewise been replaced by ML models.
Coupling physical model and ML predictions
The physical model and the ML model are coupled to produce the overall prediction, with weights that depend on the forecasting regime: for example, more weight is given to the physical model for long-term forecasts and to the data-driven model for short-term forecasts.
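A minimal sketch of such horizon-dependent weighting follows; the exponential schedule and its time constant are purely illustrative choices.

```python
# Sketch of coupling physical and ML predictions with a horizon-dependent
# weight: trust the data-driven model at short lead times and the physical
# model at long lead times.
import numpy as np

def combined_forecast(y_phys, y_ml, lead_time_hours, tau=48.0):
    w_ml = np.exp(-lead_time_hours / tau)   # ML weight decays with horizon
    return w_ml * y_ml + (1.0 - w_ml) * y_phys

print(combined_forecast(10.0, 12.0, lead_time_hours=6))    # mostly ML
print(combined_forecast(10.0, 12.0, lead_time_hours=240))  # mostly physics
```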
ML-informed inverse modeling / augmenting physical models
Hybrid models are increasingly being used in inverse modeling. First, direct inversion is performed using a physics-based model, followed by deep learning to improve the prediction accuracy of the inverse problem. This is used in computed tomography, MRI, and related applications.
Requirements and advantages of each method
Table 2 summarizes, for each method, the technical requirements for applying it and the benefits it offers.
Potential for cross-fertilization
Table 3 shows a matrix of the application objectives presented in this review against the methodological classification. It reveals many intersections with few or no research examples. Some combinations are of course technically difficult, but even taking this into account, there is still plenty of room for research.
In addition, since this review provides an overview of the whole picture, I think it has provided a good foundation from which new ideas for combinations can emerge. I hope that many researchers will make use of it.
Some studies did not fit this classification. For example, forecasting future events with ideas from data assimilation continuously updates the model state; this has been used in time-series models and in COVID-19 epidemiological models. Another direction is coupling physical and ML models to aid decision-making.
Summary
One issue this article's author has dealt with in the past is that physical simulations contain so many unknown reaction coefficients that building an ML model is impractical, since it would require a huge amount of training data. I had thought that a combination of simulation and ML could be a promising direction.
This review has shown that the same situation arises openly in environmental and medical systems and in many engineering challenges, and that numerous attempts are being advanced rapidly with a wide range of objectives. As the paper's authors write, I believe such an overview will provide further stimulus for new ideas. As a field close to practical application, I look forward to its future growth.