My motivation for reading on this topic is closely connected to my interest in geometric data, in particular learning functions for data supported on non-Euclidean domains such as manifolds and graphs. One reason is that many practically important problems involve data with some underlying structure for which it is more suitable to assume a locally Euclidean domain (e.g. a smooth manifold); and of course graph-structured data is ubiquitous and of great interest in many fields of study.

Motivations: SPDE approach in Spatial statistics and beyond

For someone with a spatial statistics background, it is natural to take the Gaussian process (GP) as the first step in function learning on manifolds, given its connections to SPDEs and RKHSs. The problem is in fact well motivated: the pioneering work of Whittle showed that the Matérn class of GPs (used extensively in spatial statistics) arises as solutions of certain SPDEs driven by Gaussian white noise [1]. This idea has resurfaced in recent decades as the ‘SPDE method’ for kriging by Lindgren et al. [2], and has been implemented in the r-INLA package, which is increasingly popular in applied studies with an element of spatial statistics, such as ecology, epidemiology and climate science. In particular, their approach tackles the big-$N$ problem by approximating the GP with a Gaussian Markov random field (GMRF), whose conditional independence structure yields a sparse precision matrix for the finite-sample multivariate Gaussian distribution. This approximation reduces the computational cost dramatically, to typically $\calO(N^{3/2})$ for spatial domains and $\calO(N^2)$ in space and time. It is a hard-to-beat solution to the big-$N$ problem for GPs on spatial and temporal domains; however, the FEM approach is rarely applied to high-dimensional problems ($d>3$), which are of interest in some machine learning tasks, such as semi-supervised classification with image data.

The approach that Borovitskiy et al. [3] have taken is to exploit the spectral properties of the Laplace-Beltrami operator (or the Laplace operator in the Euclidean case), and to use a truncated Karhunen-Loève expansion of the GP to give a low-rank approximation to the covariance function, in a similar spirit to an FEM basis representation. Additionally, it enables the exploitation of recent advances in sparse GPs such as inducing point methods [5], and various optimisation schemes such as mini-batch training. Assuming the spectral measure of the kernel is known, the eigenpairs need only be computed once for each manifold, independently of the subsequent optimisation procedure. Their implementation, however, depends on solving the eigenvalue problem for the Laplace-Beltrami operator on the manifold numerically with an FEM solver.
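As a concrete illustration of this truncated expansion, here is a minimal numpy sketch of the resulting low-rank Matérn kernel, assuming the eigenpairs $(\lambda_k, \phi_k)$ have already been computed (e.g. by an FEM solver) and evaluated at the data points; the names and the normalisation below are illustrative rather than those of [3].

```python
import numpy as np

def matern_kernel_truncated(Phi_x, Phi_y, lam, nu, kappa, d):
    """Low-rank Matern kernel from a truncated Karhunen-Loeve expansion.

    Phi_x: (n, K) eigenfunctions phi_k evaluated at the points x
    Phi_y: (m, K) eigenfunctions phi_k evaluated at the points y
    lam:   (K,)   Laplace-Beltrami eigenvalues (non-negative)
    """
    s = nu + d / 2.0                   # power in (2 nu / kappa^2 - Delta)^{s/2}
    tau = 2.0 * nu / kappa**2
    spec = (tau + lam) ** (-s)         # truncated spectral coefficients
    spec /= spec.sum()                 # illustrative normalisation of the variance
    return (Phi_x * spec) @ Phi_y.T    # sum_k spec_k phi_k(x) phi_k(y)
```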

Alternatively, Sanz-Alonso & Yang [4] considered a discrete elliptic SPDE based on the graph Laplacian of an $\epsilon$-neighborhood graph built from the observations, which does not depend on a domain triangulation and is therefore suitable in the manifold learning setting: point cloud data sampled i.i.d. from a low-dimensional manifold embedded in a high-dimensional Euclidean space. The Karhunen-Loève expansion of the GP is then approximated by that of its finite-dimensional counterpart, and the spectral convergence of the finite expansion is established using tools from optimal transport.
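A minimal sketch of this graph-based construction, under simplifying assumptions (unweighted $\epsilon$-graph, unnormalised Laplacian; [4] uses different weights and normalisations):

```python
import numpy as np

def eps_graph_laplacian(X, eps):
    """Unnormalised Laplacian of the epsilon-neighborhood graph of a point cloud X (n, D)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = (D2 < eps**2).astype(float)                      # connect points closer than eps
    np.fill_diagonal(W, 0.0)                             # no self-loops
    return np.diag(W.sum(axis=1)) - W

def graph_matern_precision(L, tau, s):
    """Precision matrix (tau I + L)^s of a graph Matern-type field, via eigendecomposition."""
    lam, U = np.linalg.eigh(L)                           # L is symmetric positive semi-definite
    return (U * (tau + lam) ** s) @ U.T
```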

An even more general approach is due to a line of work by Li et al. [9], who proposed a neural architecture that learns the integral operator (i.e. convolution with the Green’s function) of some elliptic PDE from an $\epsilon$-neighborhood graph constructed from data; the method is mesh-free and requires no knowledge of the PDE. Although it has a finite number of trainable parameters, the network can be regarded as an operator mapping one function space to another, and the prediction obtained from the network is the solution of the underlying PDE. Note that the solution on the entire domain is obtained from that on the sample points via the Nyström method. It would be interesting to see how to extend the neural operator to solving elliptic or even parabolic SPDEs on manifolds.

Connecting elliptic SPDEs to Gaussian fields

Our primary goal in this article is to construct an RKHS from an elliptic SPDE driven by white noise on a compact Riemannian manifold without boundary. Previously, we have seen the formal definition of a regular generalised Gaussian field $\frakX$, and how its nuclear integral operator $\calK$ identifies an RKHS for a Gaussian random element $X$ such that $\frakX(f) = \inner{X, f}_{\calH}$ on a separable Hilbert space $\calH$. We first restrict our attention to $\calH = \calL^2(G)$, where $G\subset \R^d$ is a bounded domain and a Dirichlet boundary condition is imposed, and construct a Gaussian white noise on this measurable space. We are then ready to find the GP given by the solution to an SPDE on a Euclidean domain. The generalisation to square-integrable functions supported on a compact Riemannian manifold without boundary is straightforward, given the spectral properties of the Laplace-Beltrami operator $\Delta_{LB}$, as we will see later.

Constructing the RKHS from an elliptic SPDE

To motivate the sequel, we will first have a look at an example:

The Matérn GP [3] on $\calX = \R^d$ satisfies, for $f:\calX\to\R$,

$$ \left( \frac{2\nu}{\kappa^2} - \Delta \right)^{\frac{\nu}{2} + \frac{d}{4}}f = \dot{\calW} $$

where we note the RHS is a (renormalised) Gaussian white noise.
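To see why this recovers the Matérn class, note heuristically that taking Fourier transforms on both sides, and using that white noise has a flat spectrum, yields the spectral density of the solution (up to convention-dependent constants),

$$ S(\xi) \propto \left( \frac{2\nu}{\kappa^2} + \|\xi\|^2 \right)^{-\left(\nu + \frac{d}{2}\right)}, $$

which is exactly the spectral density Whittle [1] identified for the Matérn covariance.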

In the following, we will first define Gaussian white noise on $\calL^2(G)$, then proceed to construct the RKHS that results from the solution of elliptic SPDEs.

Recall that if $\frakX$ is a Gaussian white noise on a separable Hilbert space $\calH$, then its integral operator is the identity, $\calK=\Id$, i.e.

$$ \E[\frakX(f)\frakX(g)] = \inner{f,g}_{\calH} $$

For $\calH = \calL^2(G)$, the Gaussian white noise $\frakX$ is usually denoted by $\dot{\calW}$, with the alternative notation

$$ \dot{\calW}(f) = \int_G f(x)d\calW(x). $$

Additionally, as a consequence of Theorem 2, there is no $\calL^2(G)$-valued random element $X$ such that $\dot{\calW}(f) = \inner{X, f}_{\calL^2(G)}$. However, one can still give a chaos expansion in terms of a CONS $\{m_k\}_{k\geq 1}$ of $\calL^2(G)$, denoted by

$$ \dot{\calW} = \sum_k \dot{\calW}(m_k)m_k. $$
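A quick numerical sanity check of this expansion on $G=(0,1)$ with the Fourier sine CONS $m_k(x) = \sqrt{2}\sin(k\pi x)$: truncating the series gives $\dot{\calW}(f) \approx \sum_{k\leq K}\xi_k\inner{f, m_k}$ with $\xi_k$ i.i.d. standard Gaussians, and the variance of the truncation approaches $\|f\|^2_{\calL^2(0,1)}$. A sketch (the quadrature is deliberately crude):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2000                                    # truncation level of the chaos expansion
xs = np.linspace(0.0, 1.0, 4001)
dx = xs[1] - xs[0]
inner = lambda g, h: np.sum(g * h) * dx     # crude L^2(0,1) inner product

m = lambda k: np.sqrt(2.0) * np.sin(k * np.pi * xs)  # CONS of L^2(0,1)
f = xs * (1.0 - xs)                         # a test function in L^2(0,1)

xi = rng.standard_normal(K)                 # xi_k = W(m_k), i.i.d. N(0, 1)
coeffs = np.array([inner(f, m(k)) for k in range(1, K + 1)])

W_f = xi @ coeffs                           # one sample of W(f)
print(W_f, coeffs @ coeffs, inner(f, f))    # truncated variance vs exact ||f||^2
```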

Now we are ready to consider elliptic SPDEs driven by Gaussian white noise on $\calL^2(G)$. Let $L_{\kappa} = \nabla \cdot(\kappa(x)\nabla)$ denote an elliptic operator, and let us continue to write $\Delta$ for the Laplace operator. The problem of interest is of the form

$$ \begin{align} \label{Eq:SPDE} L_{\kappa} \frakX = \dot{\calW} \end{align} $$

where $\frakX$, the solution to the above SPDE, is a zero-mean generalised Gaussian random field (zGGRF for short) over $\calL^2(G)$. For any $f\in\calL^2(G)$, $\frakX$ is defined by

$$ \frakX(L_{\kappa}^{\ast} f) = \dot{\calW}(f) $$

The following theorem from [7] gives a way to construct the RKHS for $\frakX$.

Theorem 3 (Thm 4.2.2 [7]). Let $\frakB$ be a zGGRF over a Hilbert space $\calH$, and let $B:Y\to\calH$ be a bounded linear bijection. Then the zero-mean generalised Gaussian random field $\frakX$ over $Y$ defined by

$$ \frakX(y) = \frakB((B^{-1})^{\ast}y) $$

is the unique solution to the equation $B\frakX = \frakB$.

Hence, the unique solution to \eqref{Eq:SPDE} is given by $\frakX(f) = \dot{\calW}((L_{\kappa}^{-1})^{\ast}f)$. Since the integral operator of $\dot{\calW}$ is the identity,

$$ \begin{align*} \E[\frakX(f)\frakX(g)] &= \E[\dot{\calW}((L_{\kappa}^{-1})^{\ast}f)\dot{\calW}((L_{\kappa}^{-1})^{\ast}g)]\\ &= \inner{(L_{\kappa}^{-1})^{\ast}f, (L_{\kappa}^{-1})^{\ast}g}_{\calL^2(G)}\\ &= \inner{f, L_{\kappa}^{-1}(L_{\kappa}^{-1})^{\ast} g}_{\calL^2(G)}\\ &= \inner{f, \calK g}_{\calL^2(G)}. \end{align*} $$
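When $L_{\kappa}^{-1}$ is an integral operator whose kernel is the Green’s function $G_{\kappa}$ of $L_{\kappa}$, i.e. $(L_{\kappa}^{-1}f)(x) = \int_G G_{\kappa}(x,z)f(z)\,dz$, the covariance kernel of $\calK = L_{\kappa}^{-1}(L_{\kappa}^{-1})^{\ast}$ can be written explicitly as

$$ k(x,y) = \int_G G_{\kappa}(x,z)\,G_{\kappa}(y,z)\,dz. $$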

Let us go back to the Matérn GP example with the reparameterisation $\tau = \frac{2\nu}{\kappa^2}$ and $s = \nu + \frac{m}{2}$, where $m$ is the dimension of the domain:

$$ (\tau I - \Delta)^{\frac{s}{2}}f(x) = \dot{\calW}(x) $$

where the marginal variance of $f$ is

$$ \sigma^2 = \frac{\Gamma(s-\frac{m}{2})}{(4\pi)^{\frac{m}{2}}\Gamma(s)\tau^{s-\frac{m}{2}}}; $$

as before, $\dot{\calW}$ is the zero-mean Gaussian white noise with unit variance. Since in this case the elliptic operator $(\tau I - \Delta)^{\frac{s}{2}}$ is self-adjoint, the integral operator admits the form (up to a scaling factor due to the variance of $f$)

$$ \calK = \tau^{s-\frac{m}{2}}(\tau I - \Delta)^{-s} $$

The kernel function therefore admits the following expansion (up to a constant scaling), due to the spectral properties of the Laplace operator and the Borel functional calculus:

$$ k(x,y) = \tau^{s-\frac{m}{2}} \sum_k (\tau + \lambda_k)^{-s} \phi_k(x)\phi_k(y) $$

where $\{\lambda_k\}_k$ and $\{\phi_k\}_k$ are the eigenvalues and eigenfunctions of $-\Delta$ respectively. The Karhunen-Loève expansion of the Gaussian random element is then

$$ f(x) = \tau^{\frac{s}{2}-\frac{m}{4}} \sum_k (\tau + \lambda_k)^{-\frac{s}{2}} \phi_k(x)\xi_k $$

where $\{\xi_k\}_k$ is a set of i.i.d. standard Gaussian random variables.
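For intuition, here is a self-contained sketch sampling this truncated expansion on the unit circle, where the eigenpairs of $-\Delta$ are available in closed form ($\lambda_k = k^2$, with a constant plus sine/cosine eigenfunctions); the variance normalisation below is the Euclidean one above and should be taken as indicative only on a general manifold.

```python
import numpy as np

rng = np.random.default_rng(1)
nu, kappa, m = 1.5, 0.5, 1                  # smoothness, lengthscale parameter, dimension
tau, s = 2 * nu / kappa**2, nu + m / 2.0
K = 200                                     # truncation level
x = np.linspace(0.0, 2 * np.pi, 512)

# eigenpairs of -Delta on the circle: lambda_k = k^2, constant + cos/sin eigenfunctions
lam = np.concatenate([[0.0], np.repeat(np.arange(1, K + 1) ** 2, 2)])
Phi = [np.full_like(x, 1.0 / np.sqrt(2 * np.pi))]
for k in range(1, K + 1):
    Phi += [np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)]
Phi = np.stack(Phi, axis=1)                 # (n, 2K + 1) matrix of phi_k(x)

xi = rng.standard_normal(lam.size)          # i.i.d. standard Gaussian coefficients
coeff = tau ** (s / 2 - m / 4) * (tau + lam) ** (-s / 2)
f = Phi @ (coeff * xi)                      # one truncated Karhunen-Loeve sample path
```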

The Matérn class of GPs can be extended to a non-stationary version by replacing the Laplace operator with $\nabla\cdot(\kappa(x)\nabla)$ and letting $\tau(x)$ (which governs the lengthscale of the process) vary spatially, so that the SPDE becomes

$$ \left(\tau(x)I - \nabla\cdot(\kappa(x)\nabla) \right)^{\frac{s}{2}} f(x) = \dot{\calW}(x) $$

which gives a Karhunen-Loève expansion of similar form to the stationary case, but with an additional normalising factor for the variance,

$$ f(x) = \tau(x)^{\frac{s}{2}-\frac{m}{4}}\gamma(x)^{\frac{m}{4}} \sum_k (\lambda_k)^{-\frac{s}{2}} \phi_k(x)\xi_k. $$

The above indeed applies to compact Riemannian manifolds without boundary, given the spectral properties of the Laplace-Beltrami operator on such manifolds. We provide a brief review in the following.

Spectral theory for elliptic operators on compact Riemannian manifolds without boundary

Let $(\calM, g)$ be a compact connected Riemannian manifold without boundary, where $g$ is the Riemannian metric defining an inner product on the tangent space $\calT_p\calM$ of $\calM$ at each point $p$. Let $\Delta_g$ denote the Laplace-Beltrami operator on $\calL^2(\calM)$, the space of square-integrable functions on $\calM$ w.r.t. the Riemannian volume measure. Then the operator $-\Delta_g$ is self-adjoint and positive.

In this setting, we will see that $-\Delta_g$ has spectral properties analogous to those of the Laplace operator on a bounded Euclidean domain.

Theorem 4 (Sturm-Liouville decomposition). Let $(\calM, g)$ be a compact Riemannian manifold without boundary. Then there exists an orthonormal basis $\{\phi_i\}_{i\geq 0}$ of $\calL^2(\calM)$ such that $-\Delta_g \phi_i = \lambda_i \phi_i$ with $0=\lambda_0 < \lambda_1 \leq \dots\leq \lambda_i \to \infty$ as $i\to \infty$. Moreover, $-\Delta_g$ admits the representation

$$ -\Delta_g f = \sum_{i=0}^{\infty} \lambda_i \inner{f, \phi_i}_{\calL^2}\phi_i $$

which converges unconditionally in $\calL^2(\calM)$.

This is analogous to the Dirichlet Laplacian case; the spectrum and eigenfunctions are obtained from the resolvent operator $(1-\Delta_g)^{-1}$, which is compact, self-adjoint and positive. In addition, for any Borel function $\Phi:[0,+\infty)\to\R$, the functional calculus allows for the decomposition of $\Phi(-\Delta_g)$ in the form

$$ \Phi(-\Delta_g) f = \sum_{i=0}^{\infty} \Phi(\lambda_i) \inner{f, \phi_i}_{\calL^2}\phi_i $$
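In a discretised setting, this functional calculus is exactly what one implements numerically: replace $-\Delta_g$ by a symmetric positive semi-definite matrix $L$ (e.g. a graph Laplacian) and apply $\Phi$ to its eigenvalues. A sketch:

```python
import numpy as np

def apply_Phi(L, Phi, f):
    """Compute Phi(L) f for a symmetric PSD matrix L standing in for -Delta_g.

    Mirrors Phi(-Delta_g) f = sum_i Phi(lambda_i) <f, phi_i> phi_i.
    """
    lam, U = np.linalg.eigh(L)        # eigenpairs of the discrete operator
    return U @ (Phi(lam) * (U.T @ f))

# e.g. applying a Matern-type covariance operator to a vector f:
# g = apply_Phi(L, lambda lam: (tau + lam) ** (-s), f)
```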

References

  1. Whittle P. Stochastic-processes in several dimensions. Bulletin of the International Statistical Institute. 1963 Jan 1;40(2):974-94.
  2. Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2011 Sep;73(4):423-98.
  3. Borovitskiy V, Terenin A, Mostowsky P, Deisenroth MP. Matérn Gaussian processes on Riemannian manifolds. NeurIPS 2020.
  4. Sanz-Alonso D, Yang R. The SPDE Approach to Matérn Fields: Graph Representations. arXiv preprint arXiv:2004.08000. 2020 Apr 16.
  5. Hensman J, Durrande N, Solin A. Variational Fourier features for Gaussian processes. The Journal of Machine Learning Research. 2017 Jan 1;18(1):5537-88.
  6. Dutordoir V, Durrande N, Hensman J. Sparse Gaussian Processes with Spherical Harmonic Features. ICML 2020.
  7. Lototsky SV, Rozovsky BL. Stochastic partial differential equations. New York: Springer; 2017 Jul 6.
  8. Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine. 2017 Jul 11;34(4):18-42.
  9. Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, Anandkumar A. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485. 2020 Mar 7.