Summer School 2024


1-4 September 2024

Activity lead

Christian Mazza


Prof. Christian Mazza, Université de Fribourg

Ms Caroline Gillardin, CUSO coordinator

Co-organized with EPFL



Prof. Bartek Blaszczyszyn, INRIA, Paris, France

Prof. Roland Langrock, Bielefeld University, Germany

Prof. Antonietta Mira, Università della Svizzera italiana, Switzerland



Prof. Bartek Blaszczyszyn, INRIA, Paris, France


Title: Ergodic Learning of Spatial Geometric Structures



Ergodicity serves as a crucial link between probability theory and statistics. In spatial statistics, it connects various spatial averages to their corresponding mathematical expectations. A remarkable, and perhaps underemphasized, implication of ergodicity is that, in theory, a single complete realization of a stationary ergodic model almost surely allows one to estimate the underlying distribution of the model.


In the first lesson, we will revisit these foundational results in the context of point processes. The second lesson will explore how ergodicity can be leveraged to develop generative models for point processes, learned from a single realization. In the final lesson, we will narrow our focus to learning one striking feature of the model, namely hyperuniformity, and provide mathematical limiting results that justify this approach to ergodic learning.
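As a small illustration of the ergodic principle described above (not taken from the course material), the sketch below simulates one realization of a homogeneous Poisson point process on a square and estimates its intensity by the spatial average "number of points divided by window area" over growing windows. The function names, the intensity value and the window sizes are assumptions chosen for the example.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate as the number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_poisson_process(intensity, side, rng):
    """One realization of a homogeneous Poisson point process on [0, side]^2."""
    n = sample_poisson(intensity * side * side, rng)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def estimate_intensity(points, window_side):
    """Spatial average: points falling in [0, window_side)^2 divided by the window area."""
    inside = sum(1 for x, y in points if x < window_side and y < window_side)
    return inside / window_side ** 2


rng = random.Random(0)  # fixed seed so the single realization is reproducible
true_intensity = 5.0    # hypothetical value, chosen only for this illustration
points = simulate_poisson_process(true_intensity, side=40.0, rng=rng)

# The same realization is reused for every window: only the window grows.
estimates = {w: estimate_intensity(points, w) for w in (5.0, 10.0, 20.0, 40.0)}
for w, est in estimates.items():
    print(f"window side {w:5.1f}: estimated intensity {est:.3f}")
```

For a stationary ergodic model, such window averages converge almost surely to the corresponding expectation as the window grows, which is why the estimates tend to settle near the true intensity.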


**Bibliography for the course:**

1. Blaszczyszyn, B. *Lecture Notes on Random Geometric Models*; hal:cel-01654766.

2. Brochard, A., Blaszczyszyn, B., Mallat, S., and Zhang, S. (2022). *Particle Gradient Descent Model for Point Process Generation*. *Statistics and Computing*; arXiv:2010.14928.

3. Mastrilli, G., Blaszczyszyn, B., Lavancier, F. (2024). *Estimating the Hyperuniformity Exponent of Point Processes*. arXiv:2407.16797.








Prof. Roland Langrock, Bielefeld University, Germany


Title: Hidden Markov models

Abstract: Hidden Markov models (HMMs) are flexible statistical models for sequences of observations that are driven by underlying states. Over the last two decades, this class of models has become increasingly popular in applied statistics since many real-world phenomena naturally translate to the HMM framework: for example, observed animal movement depends on not directly observed behavioural modes, financial share returns depend on the underlying market volatility, and medical measurements depend on the patient's underlying health state. In such scenarios, HMMs allow for comprehensive statistical inference, including forecasting, state decoding and investigations of the system's response to internal and external drivers.

This mini-course will introduce the HMM framework, covering the following topics:

– overview & basic model formulation

– fitting an HMM to data

– model selection & model checking

– state decoding

– incorporating covariates, random effects and seasonality

– extensions of the basic model formulation
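To make two of the topics above concrete, here is a minimal sketch (an illustration only, not the course's own code) of the scaled forward algorithm, which evaluates the HMM log-likelihood used in model fitting, and of Viterbi state decoding. The two-state toy model, the symbols `a`/`b` and all names (`init`, `trans`, `emission`) are assumptions made for the example; all probabilities are assumed strictly positive.

```python
import math


def forward_log_likelihood(init, trans, emis_probs):
    """Scaled forward algorithm.

    init[i]          : P(S_1 = i)
    trans[i][j]      : P(S_{t+1} = j | S_t = i)
    emis_probs[t][i] : probability (or density) of observation t under state i
    """
    n_states = len(init)
    alpha = [init[i] * emis_probs[0][i] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(emis_probs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emis_probs[t][j]
                for j in range(n_states)
            ]
        scale = sum(alpha)          # rescale at each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def viterbi(init, trans, emis_probs):
    """Most likely state sequence (global decoding), computed on the log scale."""
    n_states = len(init)
    score = [math.log(init[i]) + math.log(emis_probs[0][i]) for i in range(n_states)]
    back = []
    for t in range(1, len(emis_probs)):
        pointers, new_score = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + math.log(trans[i][j]))
            pointers.append(best)
            new_score.append(
                score[best] + math.log(trans[best][j]) + math.log(emis_probs[t][j])
            )
        back.append(pointers)
        score = new_score
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers from the last time step
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model with persistent states and two observable symbols.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emission = {"a": [0.8, 0.2], "b": [0.2, 0.8]}  # emission[symbol][state]
observations = "aaabbb"
emis_probs = [emission[symbol] for symbol in observations]

print("log-likelihood:", forward_log_likelihood(init, trans, emis_probs))
print("decoded states:", viterbi(init, trans, emis_probs))
```

In practice the log-likelihood returned by the forward algorithm would be maximized numerically over the model parameters; the fixed parameters here only serve to show the recursion.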




Prof. Antonietta Mira, Università della Svizzera italiana, Switzerland


Title: How can Bayesian statistics help in dimensionality reduction?


Abstract: I will introduce the Bayesian paradigm for statistical inference, and then explain how it can be exploited to estimate the intrinsic dimension (ID) of data and to answer questions related to dimensionality reduction, which are becoming more pressing as the size of available data grows. Indeed, real-world datasets tend to show a high degree of (possibly) non-linear correlations and constraints between their features. This means that, despite a very large embedding dimensionality, data typically lie on a manifold characterized by a much lower ID, which, in the presence of noise, may depend on the scale at which the data are analysed. This fact raises interesting questions: How many variables, or combinations thereof, are necessary to describe a real-world dataset without significant information loss? What is the appropriate scale at which one should analyse and visualize the data? These two issues, which are often considered unrelated, are actually strongly entangled and can be addressed within a unified framework. We introduce an approach in which the optimal number of variables and the optimal scale are determined self-consistently, recognizing and bypassing the scale at which the data are affected by noise. To this end, we estimate the data ID in an adaptive way and exploit it as a summary statistic in Approximate Bayesian Computation for inference on network-type data. Sometimes, within the same dataset, it is possible to identify more than one ID, meaning that different subsets of the data points lie on manifolds with different IDs. Identifying these manifolds provides a clustering of the data, and in many real-world applications a simple topological feature like the ID allows one to uncover a rich data structure and improves our insight into subsequent statistical analysis.
Examples of these applications range from gene expression to protein folding, pandemic evolution and fMRI, all the way to finance, sport data and the analysis of the representations of neural networks.
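To give a feel for what estimating an ID involves, the sketch below uses a simple nearest-neighbour-ratio estimator in maximum-likelihood form (in the spirit of the TWO-NN estimator). It is a swapped-in illustration, not necessarily the adaptive estimator presented in the lectures. It relies on the fact that, for locally uniform data on a d-dimensional manifold, the ratio of the second to the first nearest-neighbour distance is Pareto distributed with shape d. The synthetic dataset, a two-dimensional surface embedded non-linearly in five dimensions, and all names are assumptions made for the example.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Maximum-likelihood ID estimate from ratios of 2nd to 1st nearest-neighbour distances.

    Assumes all points are distinct (otherwise a zero distance would appear).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest distance from p
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        log_ratios.append(math.log(r2 / r1))
    # log(r2 / r1) is exponential with rate d, so the MLE of d is N / sum of log-ratios.
    return len(points) / sum(log_ratios)


# Hypothetical data: 2 free coordinates (u, v) mapped non-linearly into 5 dimensions.
rng = random.Random(1)
data = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    data.append((u, v, math.sin(u), math.cos(v), u * v))

id_estimate = two_nn_intrinsic_dimension(data)
print(f"embedding dimension 5, estimated intrinsic dimension {id_estimate:.2f}")
```

Because only the two nearest neighbours of each point are used, the estimate probes a single, very local scale; the scale dependence and the effect of noise discussed in the abstract are not addressed by this sketch.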






slides 1 and slides 2 for the Bayesian part of the lectures







Program (subject to change)


Lectures and coffee breaks: Hôtel des Masques (2-minute walk from Hôtel Eden), Place du Village 7, 1972 Anzère


Welcome tea, Sunday: Hôtel Eden


Aperitif, Sunday: Hôtel Eden bar


Breakfast, lunch and dinner: Hôtel Eden



