Part 3: Mercer's theorem and the spectral perspective of RKHS

Disclaimer. This post is served as a summarisation of my study notes on RKHS theory [1], and classes I took on relevant topics, containing theoratical details from [1, 2, 3] along with my interpretations directed by my personal interests on those topics. I am by no means an expert in functional analysis and spectral theory, so please feel free to point out any mistakes in the post.

In this article, we assume that $\calX$ is a locally compact Hausdorff space equipped with a positive Borel measure $\mu$. Moreover, let $\calL^2(\calX, \mu)$ be the separable Hilbert space of real functions defined on $\calX$. We will mainly consider the spectral property of the integral operator for $k\in \calL^2(\calX\times\calX, \mu)$, which is defined as

$$ \begin{align} \calK (f)(y) = \int_{\calX} f(x)k(x,y)\;d\mu(x),\quad \calK: \calL^2(\calX) \to \calL^2(\calX),\label{Eq:HSIO} \end{align} $$

where $\calK$ is a Hilbert-Schmidt integral operator (abbreviated as integral operator in the following) as will be seen in Section 2. For completeness, we first present a brief introduction to the spectral theory of compact self-adjoint operator on Hilbert space. Then we show that the spectral theory is also applicable to the integral operator $\calK$ in our case. In the end, we will see how Mercer’s theorem bridges spectral theory and RKHS, and finally its implication in ML.

Spectral theorem in Hilbert space

The spectral theorem for finite dimensional Hermitian matrices (over $\R$) are of particular interest in Linear algebra, since its eigenfunctions provide with an orthogonal basis, so that the linear map is diagonalisable. Compact operators in Hilbert space consisting of real functions may be viewed as an extension from the finite dimensional case for its spectral properties. We will see in this section, how we may obtain a CONS from a compact operator in $\calH$.

In the article, we use $\calH$ to denote separable Hilbert space if without specification; and we use ‘linear operator’ and ‘linear map’ interchangeably. The theory can be easily extended to spaces of complex-valued functions, but we here only present the results with real-valued functions for simplicity.

Definition 4 (Compact operator). Let $\calH$ be a Hilbert space and $L(\calH)$ be the set of bounded operators on $\calH$, i.e. for any $T\in L(\calH),\; T:\calH\to\calH$, its induced operator norm is finite. Then, $T$ is compact if the image of each bounded set under $T$ is relatively compact (its closure is compact).

Definition 5 (Self-adjoint operator). A bounded operator $T$ as in Definition 4 is called self-adjoint if $A^{\ast} = A$, where $A^{\ast}$ is the adjoint of $A$.

Now, we are ready to present the spectral theorem for compact operators in separable Hilbert space $\calH$.

Theorem 5 (Thm 6.27, M.Einsiedler 2017). Let $\calH$ be a separable Hilbert space, and $T$ a compact self-adjoint operator on $\calH$. Then there exists a sequence of real eigenvalues $\{\lambda_n\}_n$ with $\abs{\lambda_n}\to 0$ as $n\to\infty$; and an orthonormal basis $\{v_n\}_n$ of eigenfunctions with $A v_n = \lambda_n v_n\; \forall n\geq 1$.

It follows directly that such $T$ is diagonalisable, and each non-zero eigenvalue has finite-multiplicity. In Section 2, we will see that an integral operator $\calK$ defined in \eqref{Eq:HSIO} is indeed compact.

Mercer expansion of integral operator

We first show the $\calK$ is Hilbert-Schmidt (HS).

Lemma 1. Operator $\calK$ defined in \eqref{Eq:HSIO} is Hilbert-Schmidt.

Proof. Recall that an operator $T$ on a separable Hilbert space is HS if for any Complete orthogonal system (CONS) $\{\psi_i\}_{i=1}^{\infty}$, we have the following

$$ \norm{T}_{HS}^2 = \sum_{i=1}^{\infty}\norm{T\psi_i}^2 < \infty. $$

Since $k\in \calL^2(\calX\times\calX, \mu)$ then we have $k(x,\cdot) = \sum_i \inner{k(x,\cdot), \psi_i}_{\calL^2} \psi_i$, and

$$ \begin{align*} \int \abs{k(x,y)}^2 \;dy &= \inner{k(x, \cdot), k(x, \cdot)}_{\calL^2}\\ &= \sum_{i=1}^{\infty}\abs{\inner{k(x,\cdot), \psi_i}_{\calL^2}}^2 \\ &=\sum_{i=1} \abs{\int k(x,y)\psi_i(y) \;dy}^2\\ &=\sum_{i=1} \abs{\calK (\psi_i)(x)}, \end{align*} $$

then integrate w.r.t $x$ gives

$$ \int\int \abs{k(x,y)}^2 \;dydx = \sum_i \norm{\calK\psi_i}_{\calL^2}^2 = \norm{\calK}_{HS}^2. $$

Since $k\in \calL^2(\calX\times\calX, \mu)$, the term on the LHS is bounded. Then we complete the proof.

$\square$

Remark. Interestingly, for any HS operator $T$ on separable $\calL^2(\calX, \mu)$, there is a square integrable function $k_T\in \calL^2(\calX\times\calX, \mu)$ such that $T$ can be written in the form of \eqref{Eq:HSIO} [3].

Remark. We have seen in Part 2 that an RKHS can be characterised with any CONS of the separable Hilbert space $\calH$, however it is dependent on the choice for the basis. On the other hand, the norm of HS integral operator $\calK$ is specified irrespective of the CONS.

Proof (informal). Consider any HS operator $T$ on $\calH$ with two sets of CONS $\{\psi_i\}_{i=1}^{\infty}$ and $\{\varphi_i\}_{i=1}^{\infty}$, then

$$ \begin{align*} \norm{T}_{HS} &=\sum_{i=1}^{\infty}\norm{T\psi_i}_{\calH}^2 = \sum_{i,j=1}^{\infty}\abs{\inner{\varphi_i, T\psi_i}_{\calH}}^2\\ &= \sum_{i,j=1}^{\infty} \abs{\inner{T^{\ast}\varphi_j, \psi_i}_{\calH}}^2 = \sum_{j=1}^{\infty} \norm{T^{\ast}\varphi_j}_{\calH}^2, \end{align*} $$

where the second equality is due to $T\psi_i = \sum_{j=1}^{\infty} \inner{T\psi_i, \varphi_j}_{\calH}\varphi_j$.

$\square$

Furthermore, the following Proposition gives that the HS-integral operator is compact and self-adjoint.

Proposition 1 (Hilbert-Schmidt, M.Einsiedler 2017). Let $(\calX, \calB_{\calX}, \mu)$, $(\calY, \calB_{\calY}, \nu)$ be $\sigma$-finite measure spaces. Let $k\in \calL^2(\calX \times\calY, \mu\times\nu)$. Then the HS integral operator $T_k:\calL^2(\calX, \mu)\to \calL^2(\calY, \nu)$ defined by

$$ T_k(f)(y) = \int_{\calX} f(x)k(x,y)\;d\mu(x) $$

for $\nu$-almost everywhere $y\in\calY$ defines a compact operator.

In addition, $T_k$ in Proposition 1 is also continuous thus bounded [3], and it is self-adjoint as long as $k(x,y) = k(y,x)$ (which is automatically satisfied if $k$ is a real-valued PSD kernel). Clearly, Proposition 1 applies to our integral operator $\calK\in L^2(\calX\times\calX, \mu)$; with the assumption that $\calL^2$ is a separable Hilbert space, $\calK$ admits a set of non-zero eigenvalues $\{\lambda_j\}_j$ and eigenfunctions $\{\phi_j\}_j$. Moreover, by the property of HS operator, we have that $\calK = \sum_{i}\lambda_i (\phi_i \otimes \phi_i)$, which can be seen as a generalisation to eigen-decomposition for Hermitian matrices [3].

Finally, Theorem 6 says that for positive definite quadratic form (condition 2 in Theorem 6), operator $\calK$ has positive eigenvalues. We present here a modification of Mercer’s theorem in [1] to accommodate the setting given in the beginning, however, it can be shown to apply to a more general scenario.

Theorem 6 (Mercer’s theorem, S.Saitoh 2016). For $\mu$ and $\calX$ defined previously, assume $k$ satisfies the following assumptions:

$k(x,y) = k(y,x)\;\forall x,y\in\supp{\mu}$;
$\int\int_{\calX\times\calX} k(x,y)f(x)f(y)\;d\mu(y) \geq 0, \;\forall f\in\calL^2(\calX, \mu).$

Then $\calK$ admits a set of positive eigenvalues and corresponding orthonormal eigenfunctions, where $\calK = \sum_{i}\lambda_i (\phi_i \otimes \phi_i)$, where the convergence is absolute and uniform on $\supp{\mu\times\mu}$.

Since condition 2 is equivalent to the definition of positive definite quadratic form, Mercer’s theorem is generally applicable to all reproducing kernels in Part 1.

Construction of RKHS from a spectral perspective

Finally, as a consequence of Theorem 6, we present the explicit expression of RKHS using the eigen-decomposition of the HS integral operator (which clearly is a CONS according to Theorem 5).

Theorem 7 (Thm 18, K.Fukumizu 2010). Under the same notation, we have a RKHS with $k$ s.t.

$$ H_k = \left\{f\in\calL^2(\calX, \mu): f = \sum_i^{\infty} a_i \phi_i, \; \left(\frac{\abs{a_i}^2}{\lambda_i}\right)\in\ell^2\right\}, $$

and the inner product of RKHS is of the form

$$ \inner{f, g}_{H_k} = \sum_i^{\infty} \frac{a_i b_i}{\lambda_i} $$

where $f = \sum_i^{\infty} a_i \phi_i, \, g = \sum_i^{\infty} b_i \phi_i$, and $a_i, b_i = 0$ for $\lambda_i=0$.

Notably, it show that $k(x,x) = \sum_i^{\infty}\lambda_i \phi_i(x)^2<\infty$ by Mercer’s theorem, hence $k(x,\cdot) = \sum_i^{\infty} \lambda_i \phi_i(x)\phi_i(\cdot) \in H_k$. Now, the reproducing property follows directly:

$$ \inner{f, k(x,\cdot)}_{H_k} = \sum_i^{\infty} \frac{\lambda_i^2 \phi_i(x)}{\lambda_i} = f(x). $$

Finally, we end the article with a remark on the implication of this expression of RKHS.

Remark. We can observe that dividing each element of $\ell^2$ inner product by $\lambda_i$ gives $\inner{\cdot, \cdot}_{H_k}$. So we have an isomorphism between Hilbert spaces as followed, for $\calK = \calT\calT^{\ast}$ s.t.

$$ \begin{align*} \calT^{\ast}: \calL^2(\calX, \mu) &\to H_k\\ \sum_i a_i\phi_i &\mapsto \sum_i a_i\sqrt{\lambda_i}\phi_i. \end{align*} $$

This in effect imposes a smoothness condition on $H_k$ (in comparison to $\calL^2$), in the sense that for $f = \sum_i^{\infty} a_i \phi_i$ to be in $H_k$, $a_i$ must go to zero fast enough so that $\norm{f}_{H_k} = \sum_i^{\infty} a_i^2/\lambda_i <\infty$ [4].

References

Saitoh, S and Sawano, Y. Theory of reproducing kernels and applications. Springer, 2016.
Einsiedler, M, and Ward, T. Functional analysis, spectral theory, and applications. Vol. 276. Springer, 2017.
Fukumizu, K. Kernel Method: Data Analysis with Positive Definite Kernels, Lecture 5. Tokyo Institute of Technology, 2010.
Jordan, M. Stat241B: Advanced Topics in Learning & Decision Making, Reproducing kernel Hilbert spaces I. University of California Berkeley, 2004.