J. Taylor
September 3-6, 2017
\[ \newcommand{\sqbinom}[2]{\begin{bmatrix} #1 \\ #2 \end{bmatrix}} \newcommand{\Ee}{\mathbb{E}} \newcommand{\Pp}{\mathbb{P}} \newcommand{\real}{\mathbb{R}} \newcommand{\hauss}{{\cal H}} \newcommand{\lips}{{\cal L}} \newcommand{\mink}{{\cal M}} \]
Kinematic Fundamental Formulae (KFF)
Gaussian Kinematic Formula (GKF)
Volume of tubes
Inverting KKT conditions
Kac-Rice formula
Slepian models
\[ \hauss_3\left( \text{Tube}([0,a] \times [0,b] \times [0,c],r)\right) = abc + 2r \cdot ( ab+bc+ac) + (\pi r^2) \cdot (a+b+c) + \frac{4\pi r^3}{3} \]
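As a quick sanity check (not part of the lecture material), the Steiner expansion above can be compared with a Monte Carlo estimate of the tube volume; the box dimensions and radius below are arbitrary choices.

```python
import numpy as np

# Steiner formula for the tube around a box in R^3, checked by Monte Carlo.
a, b, c, r = 1.0, 2.0, 3.0, 0.5

# Closed form from the slide: volume + surface + edge + corner terms.
exact = a*b*c + 2*r*(a*b + b*c + a*c) + np.pi*r**2*(a + b + c) + 4*np.pi*r**3/3

# Monte Carlo: sample uniformly from a bounding box of the tube and estimate
# the fraction of points within distance r of the box [0,a] x [0,b] x [0,c].
rng = np.random.default_rng(0)
n = 2_000_000
lo, hi = np.array([-r, -r, -r]), np.array([a + r, b + r, c + r])
pts = rng.uniform(lo, hi, size=(n, 3))
resid = pts - np.clip(pts, [0.0, 0.0, 0.0], [a, b, c])  # residual of projecting onto the box
mc = ((resid**2).sum(axis=1) <= r**2).mean() * np.prod(hi - lo)

print(exact, mc)   # the two estimates should agree to 2-3 digits
```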
Another fundamental place where \(\lips_j(\cdot)\)’s appear.
KFF considers the "average" curvature measures of \(M_1 \cap g M_2\)
Isometry group \(G_N\) of rigid motions of \(\real^N\), \[ G_N \sim \real^N \rtimes O(N) \]
Fix a Haar measure: \[ \nu_N \left(\left\{g_N \in G_N: g_Nx \in A\right\}\right) \ =\ \hauss_N(A) \]
\[ \begin{aligned} \int_{G_N} \lips_i \left(M_1 \cap g_N M_2 \right) \; d\nu_N(g_N) & =\ \sum_{j=0}^{N-i} \sqbinom{i+j}{i} \sqbinom{N}{j}^{-1} \lips_{i+j}(M_1) \lips_{N-j}(M_2) \end{aligned} \]
Forgetting all the constants, the answer decomposes into a sum of products.
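A minimal numeric check (not in the slides) in the simplest case \(N=1\), \(i=0\), \(M_1=[0,a]\), \(M_2=[0,b]\): the right-hand side collapses to \(\lips_0(M_1)\lips_1(M_2) + \lips_1(M_1)\lips_0(M_2) = a + b\), and the left-hand side can be estimated by averaging the Euler characteristic over random rigid motions.

```python
import numpy as np

# KFF sanity check on the line: M1 = [0, a], M2 = [0, b], i = 0.
a, b = 1.5, 0.7
rng = np.random.default_rng(1)

# A rigid motion of R^1 is a reflection s in {+1, -1} followed by a translation t.
# Sample t uniformly on a window wide enough to contain every overlap.
T, n = 10.0, 1_000_000
t = rng.uniform(-T, T, n)
s = rng.choice([-1.0, 1.0], n)

# chi([0, a] intersect g[0, b]) is 1 exactly when the two intervals overlap.
left = np.minimum(t, s * b + t)
right = np.maximum(t, s * b + t)
chi = (right >= 0) & (left <= a)

lhs = chi.mean() * 2 * T        # integral over the translation part of G_1
print(lhs, a + b)               # should agree to ~2 digits
```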
Gauss-Bonnet theorem: \(\lips_0(\cdot) = \chi(\cdot)\).
Recall what we were trying to study \[ \begin{aligned} \Ee \left\{\chi\left(M \cap f^{-1}[u,+\infty)\right)\right\} & = \int_{\Omega} \lips_0(M \cap f(\omega)^{-1}[u,+\infty)) \; \Pp(d\omega) \\ & = \sum_{j=0}^{\text{dim}(M)} \lips_j(M) \rho_j(u) \\ \end{aligned} \]
This looks like KFF where \(g_NM_2\) is replaced by \(f^{-1}[u,+\infty)=f^{-1}D\).
Can replace \(f\) with vector version \(f=(f_1, \dots, f_k)\).
Starting from a Gaussian process on a manifold \(M\), we have recovered analogues of classical integral geometric results without embedding \(M\) in Euclidean space!
Is there some embedding hidden here?
We should first make our link between classical KFF and Gaussian processes precise…
Let \(f=(f_1, \dots, f_k)\) be made of IID copies of a Gaussian field
Consider the additive functional on \(\real^k\) that takes a rejection region and computes \[ D \mapsto \Ee \left\{\chi\left(M \cap f^{-1}D \right)\right\}. \]
How do the \(\lips_j\)’s enter into this functional?
How about \(D\)?
Define the functionals \(\mink^{\gamma_k}_j(\cdot)\) implicitly by \[ \gamma_k \left(\left\{y \in \real^k: d(y, D) \leq r \right\}\right) = \sum_{j \geq 0} \frac{(2\pi)^{j/2} r^j}{j!} \mink^{\gamma_k}_j(D), \] with \(\gamma_k \overset{D}{=} N(0, I_{k \times k})\).
Then: \[ \Ee \left\{\chi\left(M \cap f^{-1}D \right)\right\}= \sum_j \lips_j(M) \cdot \mink^{\gamma_k}_j(D) \]
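A worked special case (not spelled out in the slides): for \(k=1\) and \(D=[u,+\infty)\), \(\gamma_1(\text{Tube}(D,r)) = \bar\Phi(u-r)\), with \(\bar\Phi\) the standard normal survival function; differentiating in \(r\) gives \(\mink^{\gamma_1}_0(D)=\bar\Phi(u)\) and \(\mink^{\gamma_1}_j(D)=(2\pi)^{-j/2}H_{j-1}(u)\varphi(u)\) for \(j \geq 1\), where \(\varphi\) is the standard normal density and \(H_j\) are the probabilists' Hermite polynomials. These are exactly the EC densities \(\rho_j(u)\) from before.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import eval_hermitenorm, factorial

# Gaussian tube expansion for D = [u, oo) in R^1: compare partial sums of
#   sum_j (2*pi)^(j/2) r^j / j! * M_j(D)
# with the exact tube probability Phi_bar(u - r).
u, r = 1.0, 0.3

def M(j):
    if j == 0:
        return norm.sf(u)
    return (2 * np.pi) ** (-j / 2) * eval_hermitenorm(j - 1, u) * norm.pdf(u)

series = sum((2 * np.pi) ** (j / 2) * r**j / factorial(j) * M(j) for j in range(12))
print(series, norm.sf(u - r))    # partial sum vs exact tube probability
```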
A Gaussian process on \(M\) can be thought of as a map \[ M \ni x \overset{\Psi}{\mapsto} R_x \in {\cal H}_f \] with \({\cal H}_f\) the RKHS of \(f\).
This is the hidden embedding!
The Riemannian metric is the pullback of \(\langle , \rangle_{{\cal H}_f}\) under \(\Psi\).
There is also a KFF on the Euclidean sphere \(S(\real^M)\), where the rigid motions are \(O(M)\).
It is possible to arrive at the GKF by studying the KFF for \(f^M \dots\)
Finishing this calculation requires more differential geometry than we will go into here.
Some magical (?) combinatorial identities as well.
One can directly compute \[ \Ee \left(\chi(M \cap f^{-1}D)\right) \] using Morse theory and the Kac-Rice formula.
We will see Kac-Rice after considering tubes a little more.
Suppose \(\omega \sim G\), a nice distribution on \(\real^p\) with density \(g\).
For a convex set \(K\), suppose we want to compute \[ \Pp(d(\omega, K) \leq \epsilon) \] with \[ d(y,A) = \inf_{x \in A} \|y-x\|_2 \] the Euclidean distance function; the metric projection \(x^*(\omega)\) is the minimizer of \(x \mapsto \frac{1}{2} \|\omega-x\|^2_2\) over \(K\).
The KKT conditions read \[ x^*(\omega) - \omega + \eta^*(\omega) = 0 \] where \(\eta^*(\omega)\) is an outward pointing normal vector to \(K\) at \(x^*(\omega)\).
Or, \[ \omega = x^*(\omega) + \eta^*(\omega). \]
The metric projection onto \(K\) determines a map \[ \real^p \ni \omega \overset{\psi}{\mapsto} (x^*(\omega), \eta^*(\omega)) \in N(K). \]
As we discussed yesterday, this map can be inverted \[ \omega(x,\eta) \overset{\phi=\psi^{-1}}{=} x + \eta \]
The inverse is piecewise well-behaved over different parts of the tube.
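A tiny illustration of the inversion (the box \(K = [-1,1]^p\) below is an arbitrary choice): project \(\omega\), read off \((x^*, \eta^*)\), and recover \(\omega = x^* + \eta^*\).

```python
import numpy as np

# Metric projection onto the box K = [-1, 1]^p and the inversion omega = x* + eta*.
rng = np.random.default_rng(2)
p = 5
omega = 2.0 * rng.normal(size=p)

x_star = np.clip(omega, -1.0, 1.0)   # projection onto K
eta_star = omega - x_star            # outward normal vector at x*

# Inversion: omega is recovered exactly from (x*, eta*).
assert np.allclose(omega, x_star + eta_star)

# eta* lies in the normal cone at x*: nonzero only in coordinates where x*
# sits on the boundary, and pointing outward there.
on_boundary = np.abs(x_star) == 1.0
assert np.all((eta_star == 0) | on_boundary)
assert np.all(eta_star * x_star >= 0)
print(x_star, eta_star)
```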
Our event is \[ \left\{\omega: d(\omega,K) \leq \epsilon \right\} \]
After reparametrizing by \(\phi\) this event is \[ \left\{(x,\eta): \|\eta\|_2 \leq \epsilon \right\}. \]
Yields \[ \Pp(d(\omega, K) \leq \epsilon) = \sum_{H \in N(K)} \int_H 1_{\{\|\eta\|_2\leq \epsilon\}} g(x + \eta) J_{\phi}(x, \eta) \; d\hauss_p(x,\eta) \]
The Hausdorff notation is meant to indicate that each piece of \(N(K)\) is \(p\)-dimensional.
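To make the sum over pieces of \(N(K)\) concrete, here is an illustrative computation (not from the slides) with \(K=[-1,1]^2\) and \(\omega\) standard Gaussian: the tube decomposes into the interior, four flat edge strips, and four circular corner sectors.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import dblquad

# Decompose P(d(omega, K) <= eps) over the pieces of N(K) for K = [-1, 1]^2
# and omega ~ N(0, I_2).
eps = 0.5
side = norm.cdf(1) - norm.cdf(-1)                          # P(coordinate in [-1, 1])

p_interior = side ** 2                                     # omega already in K
p_edges = 4 * (norm.cdf(1 + eps) - norm.cdf(1)) * side     # four flat strips

# One corner sector, around (1, 1), in polar coordinates (r, theta).
def corner_density(r, theta):
    x, y = 1 + r * np.cos(theta), 1 + r * np.sin(theta)
    return r * np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

p_corner, _ = dblquad(corner_density, 0, np.pi / 2, lambda th: 0.0, lambda th: eps)
total = p_interior + p_edges + 4 * p_corner

# Monte Carlo check of the same probability.
rng = np.random.default_rng(3)
omega = rng.normal(size=(1_000_000, 2))
resid = omega - np.clip(omega, -1, 1)
mc = ((resid**2).sum(axis=1) <= eps**2).mean()
print(total, mc)    # should agree to ~3 digits
```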
In the classical volume-of-tubes computation, \(g \equiv 1\) (Lebesgue measure) and everything interesting is baked into the Jacobian.
Written in polar coordinates (fixing the length of \(\eta\)) this Jacobian is \[ \det \begin{pmatrix} I & 0 \\ \dots & I + r \cdot C_{\eta_x} \end{pmatrix} = \det\left(I + r \cdot C_{\eta_x}\right) = \sum_{j=0}^{d} r^j\,\text{detr}_j(C_{\eta_x}) \] when \(x\) lies in a \(d\)-dimensional piece of the boundary of \(K\).
The quantity \(\text{detr}_j(A)\) is the \(j\)-th elementary symmetric polynomial of the eigenvalues of \(A\).
The principal curvatures are the eigenvalues of \(C_{\eta_x}\).
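A quick numeric confirmation of the expansion \(\det(I + r\,C) = \sum_{j} r^j \text{detr}_j(C)\) (the symmetric matrix below is random, purely for illustration).

```python
import numpy as np
from itertools import combinations

# Check det(I + r*C) = sum_j r^j detr_j(C) for a random symmetric C.
rng = np.random.default_rng(4)
d, r = 4, 0.3
A = rng.normal(size=(d, d))
C = (A + A.T) / 2
eig = np.linalg.eigvalsh(C)

def detr(j, lam):
    # j-th elementary symmetric polynomial of the eigenvalues
    return sum(np.prod(sub) for sub in combinations(lam, j))

lhs = np.linalg.det(np.eye(d) + r * C)
rhs = sum(r**j * detr(j, eig) for j in range(d + 1))
print(lhs, rhs)    # equal up to floating point error
```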
After gathering terms, the final answer has expressions like \[ \int_{H \cap \{\|\eta\|_2=1\}} \text{detr}_j(C_{\eta_x}) \; d\hauss_{p-1}(x,\eta) \]
Weyl noted that these integrals were Riemannian invariants.
Formally, \(C_{\eta_x}\) depends on how \(M\) is embedded in \(\real^p\).
The values of these integrals do not depend on the embedding.
To derive an expression for \(\mink^{G}_j(K)\), take a Taylor expansion of \[ (x, \eta) \mapsto g(x + \eta), \qquad \|\eta\|_2 \leq r \]
The coefficients of the powers \(r^j\) can be expressed as integrals over \(N(K)\).
Messy but not impossible to work with.
Somewhat (?) surprising that these quantities show up in EC heuristic.
Let \(H^*(\omega)\) denote the piece of \(N(K)\) containing \((x^*(\omega),\eta^*(\omega))\).
We can also condition on the value of \(H^*(\omega)\) \[ \begin{aligned} \Ee \left(h(x^*,\eta^*) \bigl \vert H^*=H \right ) &= \frac{\int_H h(x,\eta) g(x + \eta) J_{\phi}(x, \eta) d\hauss_p(x,\eta)} {\int_H g(x + \eta) J_{\phi}(x, \eta) d\hauss_p(x,\eta)} \\ \end{aligned} \]
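Continuing the illustrative square example from above (\(K=[-1,1]^2\), standard Gaussian \(\omega\)): conditioning on the right edge, where \(J_{\phi} \equiv 1\), the formula reduces to a ratio of one-dimensional Gaussian integrals, which can be checked by rejection sampling.

```python
import numpy as np
from scipy.stats import norm

# E(||eta*||  |  H* = right edge of [-1,1]^2) from the ratio-of-integrals
# formula (the phi(t) factor along the edge cancels top and bottom).
formula = (norm.pdf(1) - norm.sf(1)) / norm.sf(1)

# Rejection sampling: keep draws whose projection lands on the right edge.
rng = np.random.default_rng(5)
omega = rng.normal(size=(2_000_000, 2))
on_right_edge = (omega[:, 0] > 1) & (np.abs(omega[:, 1]) < 1)
eta_norm = omega[on_right_edge, 0] - 1          # ||eta*|| for those draws
print(formula, eta_norm.mean())                 # should agree to ~2-3 digits
```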
In this case \(K=\left\{r : \|X^Tr\|_{\infty} \leq \lambda \right\}\).
Similar to estimator augmentation (Zhou, 2014; Zhou, 2016), which we will revisit tomorrow.
For our metric projection problem, there is always a unique critical point, since the problem is a strongly convex minimization problem.
Our Gaussian processes from Lecture 1 can have many critical points.
We may still want to understand the critical points of a random \(f\) \[ \text{Crit}(f) = \left\{x: \nabla f(x)=0 \right\} \]
Simplest task is counting the number of critical points in expectation \[ \Ee \left({\cal H}_0(\text{Crit}(f))\right)? \]
\[ \begin{aligned} \Ee \left({\cal H}_0(\text{Crit}(f))\right) = \int_M \Ee\left(|\det(\nabla^2 f(x))| \bigl \vert \nabla f(x)=0 \right) \; p_{\nabla f(x)}(0) \; dx \end{aligned} \]
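In one dimension this is Rice's formula for the zeros of \(f'\); a sketch under an arbitrary choice of smooth stationary process (a random trigonometric polynomial on \([0, 2\pi]\)):

```python
import numpy as np

# Kac-Rice in 1D: E #{t in [0, 2*pi]: f'(t) = 0} = (T / pi) * sqrt(lam4 / lam2)
# for a stationary Gaussian process with Var f' = lam2, Var f'' = lam4.
rng = np.random.default_rng(6)
k = np.arange(1, 6)
sigma = 1.0 / k                                   # arbitrary spectral weights

lam2 = np.sum(sigma**2 * k**2)                    # Var f'(t)
lam4 = np.sum(sigma**2 * k**4)                    # Var f''(t)
kac_rice = (2 * np.pi / np.pi) * np.sqrt(lam4 / lam2)

t = np.linspace(0, 2 * np.pi, 4000, endpoint=False)
S, C = np.sin(np.outer(t, k)), np.cos(np.outer(t, k))
counts = []
for _ in range(2000):
    a, b = rng.normal(size=k.size), rng.normal(size=k.size)
    fprime = S @ (-sigma * k * a) + C @ (sigma * k * b)   # f'(t) on the grid
    counts.append(np.sum(np.sign(fprime) != np.sign(np.roll(fprime, -1))))
print(kac_rice, np.mean(counts))                  # should agree closely
```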
Morse theory represents the Euler characteristic of \(f^{-1}[u,+\infty)\) in terms of critical points of \(f\) with value larger than \(u\) and their indices.
In this case \(g(x)=\nabla^2 f(x)\).
For \(M\) without boundary \[ \begin{aligned} \Ee \left(\chi(M \cap f^{-1}[u,+\infty))\right) &= \sum_{j=0}^{\text{dim}(M)} (-1)^j \int_M \Ee(|\det(\nabla^2 f(x))| 1_{\{f(x) \geq u, \text{index}(-\nabla^2 f(x))=j\}} \bigl \vert \nabla f(x)=0) \; p_{\nabla f(x)}(0) \; dx \\ &= \int_M \Ee(\det(-\nabla^2 f(x)) 1_{\{f(x) \geq u\}} \bigl \vert \nabla f(x)=0) \; p_{\nabla f(x)}(0) \; dx \\ \end{aligned} \]
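A one-dimensional check of this formula (same arbitrary trigonometric model as in the Kac-Rice sketch above, on the circle so there is no boundary): the Euler characteristic of the excursion set is the number of upcrossings of \(u\), and its expectation is given by Rice's upcrossing formula.

```python
import numpy as np

# On the circle, chi(excursion set above u) = number of upcrossings of u
# (ignoring the negligible event that the whole circle is above u), and
# E #upcrossings over [0, 2*pi] = sqrt(lam2 / lam0) * exp(-u^2 / (2 * lam0)).
rng = np.random.default_rng(7)
k = np.arange(1, 6)
sigma = 1.0 / k
lam0, lam2 = np.sum(sigma**2), np.sum(sigma**2 * k**2)

u = 1.5
expected_chi = np.sqrt(lam2 / lam0) * np.exp(-u**2 / (2 * lam0))

t = np.linspace(0, 2 * np.pi, 4000, endpoint=False)
S, C = np.sin(np.outer(t, k)), np.cos(np.outer(t, k))
chis = []
for _ in range(4000):
    a, b = rng.normal(size=k.size), rng.normal(size=k.size)
    f = C @ (sigma * a) + S @ (sigma * b)
    above = f >= u
    chis.append(np.sum(~above & np.roll(above, -1)))    # upcrossings (periodic)
print(expected_chi, np.mean(chis))                       # should agree closely
```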
Remaining integral is straightforward to compute.
Some differential geometry required to recognize the answer as \(\lips_j(M) \dots\)
For \(M\) with boundary, we get a sum over parts of the boundary \[ \begin{aligned} \Ee \left(\chi(M \cap f^{-1}[u,+\infty))\right) &= \sum_{j=0}^{\text{dim}(M)} \int_{\partial_j M} \Ee(\det(-\nabla^2 f_{|\partial_j M}(x)) 1_{\{f(x) \geq u,\ -\nabla f^{\perp, \partial_j M}(x) \in N_x M\}} \bigl \vert \nabla f_{|\partial_j M}(x)=0) \; p_{\nabla f(x)}(0) \; dx \\ \end{aligned} \]
Without going into the gory details this might remind us of our formula for metric projection \[ \begin{aligned} \Ee \left(h(x^*,\eta^*) \right ) &= \sum_{H \in N(K)}\int_H h(x,\eta) g(x + \eta) J_{\phi}(x, \eta) d\hauss_p(x,\eta) \end{aligned} \]
We could try applying the Kac-Rice formula to the random function \[ x \overset{f}{\mapsto} \frac{1}{2} \|x-\omega\|^2_2, \qquad x \in K \]
In this notation \(\eta = -\nabla f^{\perp, \partial_j K}\).
The Hessian of the restriction of \(f\) to \(\partial_j K\) involves the curvature matrix \(C_{\eta}\), which we also see in \(J_{\phi}\).
All possible random marks can be expressed in terms of \((x,\eta)\) because \(\omega \mapsto (x, \eta)\) is invertible.
Our explicit metric projection formula is a special case of Kac-Rice.
\[ \begin{aligned} \left|\Ee \left(\chi(M \cap f^{-1}[u,+\infty))\right) - \Ee\left(\text{local maxima of $f$ above $u$}\right)\right| &\leq \int_M \Ee(|\det(-\nabla^2 f(x))| 1_{\{-\nabla^2 f(x) \ngeq 0\}} \bigl \vert \nabla f(x)=0) \; p_{\nabla f(x)}(0) \; dx \\ & \overset{u \to \infty}{\lessapprox} e^{-\alpha u^2/2} \end{aligned} \] for some \(\alpha > 1\).
Suppose that \(f_{x^*} = \sup_{x \in M} f_x\) (a.s. unique for Gaussian processes).
That is, \(x^*\) is a critical point of \(f\) and \[ f_{x^*} \geq f_y, \qquad \forall y \in M \]
Equivalent to \(x^*\) is a critical point of \(f\) and \[ f_{x^*} - \text{Cov}(f_{x^*}, f_y) f_{x^*} \geq f_y - \text{Cov}(f_{x^*}, f_y) \cdot f_{x^*}, \qquad \forall y \in M \]
Equivalent to \(x^*\) is a critical point of \(f\) and \[ f_{x^*} \geq \frac{f_y - \text{Cov}(f_{x^*}, f_y) \cdot f_{x^*}}{1 - \text{Cov}(f_{x^*}, f_y)}, \qquad \forall y \in M \]
Equivalent to \(x^*\) is a critical point of \(f\) and \[f_{x^*} \geq \sup_{y \in M \setminus\{x^*\}} \widetilde{f}^{x^*}_y.\]
By construction the process \(\widetilde{f}^{x}_y\) is independent of \(f_x\) for each \(x \in M\).
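A two-point sanity check of this construction (unit variances, as implicit in the decomposition above; the correlation \(\rho\) below is arbitrary): the residual is uncorrelated with, hence independent of, \(f_x\).

```python
import numpy as np

# With unit variances and Cov(f_x, f_y) = rho, the Slepian residual
#   tilde_f = (f_y - rho * f_x) / (1 - rho)
# is uncorrelated with f_x, hence independent of it in the Gaussian case.
rng = np.random.default_rng(8)
rho, n = 0.6, 1_000_000
fx = rng.normal(size=n)
fy = rho * fx + np.sqrt(1 - rho**2) * rng.normal(size=n)

tilde_f = (fy - rho * fx) / (1 - rho)
print(np.corrcoef(fx, tilde_f)[0, 1])    # ~0 up to Monte Carlo error
```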
This characterization is closely connected to tomorrow’s polyhedral lemma.
Recall the law when we condition on \(H^*\) \[ \begin{aligned} \Ee \left(h(x^*,\eta^*) \bigl \vert H^*=H \right ) &= \frac{\int_H h(x,\eta) g(x + \eta) J_{\phi}(x, \eta) d\hauss_p(x,\eta)} {\int_H g(x + \eta) J_{\phi}(x, \eta) d\hauss_p(x,\eta)} \\ \end{aligned} \]
Nothing stops us also from conditioning on \(\eta^*\) (which a fortiori fixes \(H^*\)) yielding \[ \begin{aligned} \Ee \left(h(x^*,\eta^*) \bigl \vert H^*=H, \eta^*=\eta \right ) &= \frac{\int_{\pi(H)} h(x,\eta) g(x + \eta) J_{\phi}(x, \eta) d\hauss_d(x)} {\int_{\pi(H)} g(x + \eta) J_{\phi}(x, \eta) d\hauss_d(x)} \\ \end{aligned} \] when \(\eta\) is such that \(x\) lies in \(\pi(H)\), a \(d\)-dimensional part of the boundary of \(K\).
We could also condition on \(x^*\) (which also fixes \(H^*\)) \[ \begin{aligned} \Ee \left(h(x^*,\eta^*) \bigl \vert H^*=H, x^*=x \right ) &= \frac{\int_{H_x} h(x,\eta) g(x + \eta) J_{\phi}(x, \eta) d\hauss_{p-d}(\eta)} {\int_{H_x} g(x + \eta) J_{\phi}(x, \eta) d\hauss_{p-d}(\eta)} \\ \end{aligned} \] where \(H_x = \{\eta: (x,\eta) \in H\}\) is the fiber of \(H\) over \(x\).
This conditions our convex program to have its one and only critical point (i.e. its solution) at \(x\).
The point here is that this change of measure makes conditioning straightforward.
For the convex program, there is only one critical point so it is easy to talk about conditioning the objective function to have a critical point at \(x^*\), say.
For general \(f\), we can also condition \(f\) on having a critical point at \(x \in M\).
Let \(\bar{f}^x\) be the Gaussian process \(f - \Ee_0(f|\nabla f(x))\) with law \(\Pp^x\).
Define a new process with law \(\mathbb{Q}^x\) \[ \frac{d\mathbb{Q}^x}{d\Pp^x}(\bar{f}) \propto |\det(\nabla^2 \bar{f}(x))|. \]
Conditioning on a local maximum above the level \(u\) at \(x\) yields the law \(\bar{\mathbb{Q}}^{x,u,+}\) \[ \frac{d\bar{\mathbb{Q}}^{x,u,+}}{d\Pp^x}(\bar{f}) \propto \det(-\nabla^2 \bar{f}(x)) \cdot 1_{\{f(x) \geq u, \nabla^2 f(x) < 0 \}} \]
\(P\)-values for signal detection in fMRI and other imaging modalities have been considered by several authors, e.g. Chumbley et al. (2009) and Schwartzman and Cheng (2017).
If you squint enough, the law \(\mathbb{Q}^x\) is very similar to \(\bar{\Pp}^x\) for our metric projection.
Intrinsic volumes are determined by the metric projection problem.
The metric projection is invertible (globally for convex \(K\); locally for nice enough \(M\)).
Change of measure for metric projection is essentially Kac-Rice.
Kac-Rice is the basic tool for computing expectations involving the critical points of a smooth random function.
Slepian models can be used to condition a process to have critical points at fixed locations.
The explicit inverse of metric projection makes Slepian conditioning very clear.