Detailed course information
Winter School 2018
4-7 February 2018
Mr. Yves Tillé, UNINE (Chair), Ms. Caroline Gillardin, UNINE (coordinator)
Prof. Patrice Bertail, Université Paris Nanterre
Prof. Peter Bühlmann, ETH Zurich
Prof. Adele Cutler, Utah State University
Professor Patrice Bertail
(Université Paris Nanterre)
Title: Survey sampling for non-parametric statistics and big-data
In many situations, statisticians have at their disposal not only data but also weights arising from a survey sampling plan. Ignoring the method used to form the database can often result in a significant bias, thereby jeopardizing estimation in non-parametric statistics. To avoid such drawbacks, one can adopt variants of the estimators that incorporate the underlying sampling design through the inclusion probabilities. The beginning of the Big Data era provides an additional motivation for investigating the impact of survey designs on statistical procedures. In contrast to naive sub-sampling, sampling designs with unequal probabilities offer control over the efficiency of estimators. Appropriate sampling algorithms could serve as a natural alternative to distributed/parallelized strategies combined with fully random data splitting.
In this course we will present:
- some basic tools for studying Horvitz-Thompson type empirical processes. In particular, we will show how the notion of negative association makes it possible to obtain functional CLTs for many sampling plans.
- some applications of these ideas in several fields: how to efficiently estimate the tail of a random variable when too much data is available and a subset must be selected, and how to perform gradient descent in likelihood problems with too many observations.
We will conclude the course with some recent concentration results for some sampling designs and their extensions.
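As a rough illustration of why the inclusion probabilities matter, the following minimal Python sketch compares a Horvitz-Thompson estimate of a population mean with the unweighted sample mean. It is not taken from the course material: the exponential population, the Poisson sampling design, the inclusion-probability formula, and all function names are assumptions chosen only for illustration.

```python
import random


def poisson_sample(values, incl_probs, rng):
    """Poisson sampling: keep unit i independently with probability incl_probs[i]."""
    return [(y, p) for y, p in zip(values, incl_probs) if rng.random() < p]


def horvitz_thompson_mean(sample, population_size):
    """Horvitz-Thompson estimate of the population mean: (1/N) * sum(y_i / pi_i)."""
    return sum(y / p for y, p in sample) / population_size


def naive_mean(sample):
    """Unweighted sample mean, which ignores the sampling design."""
    return sum(y for y, _ in sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is reproducible
N = 20000

# Hypothetical population: exponential values with mean 1.
population = [rng.expovariate(1.0) for _ in range(N)]

# Hypothetical unequal-probability design: larger values are more likely
# to be selected, so the unweighted sample over-represents them.
incl_probs = [min(1.0, 0.02 + 0.05 * y) for y in population]

true_mean = sum(population) / N
sample = poisson_sample(population, incl_probs, rng)
ht = horvitz_thompson_mean(sample, N)
naive = naive_mean(sample)

print(f"sample size: {len(sample)} of {N}")
print(f"true mean:   {true_mean:.3f}")
print(f"HT estimate: {ht:.3f}")
print(f"naive mean:  {naive:.3f}")
```

Under this assumed design, the weighted estimate should land close to the true mean, while the unweighted mean is pulled upward because large values are selected more often.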
Professor Peter Bühlmann
Title: Statistics for High-Dimensional Data
We will discuss the topic of high-dimensional statistical inference, a theme which has attracted a lot of interest in statistics, machine learning and a wide range of data analysis problems in science and engineering. The presentations will cover methodology, algorithms, theory as well as applications (mainly) from genetics and genomics.
Professor Adele Cutler
(Utah State University)
Title: Random Forests, Then and Now
Abstract: Random forests (Breiman, 2001) are a popular tool in data science. They are particularly appealing because they are accurate predictors that usually work well without needing to be tuned. Among the lesser-known features of random forests is the proximity information they provide, which can be used to visualize the data from a "forest-eye" point of view. They also provide measures of local variable importance that can help with interpretation. This course gives a comprehensive introduction to the random forests algorithm, including an introduction to tree-based methods and ensembles. Throughout the course, examples will be provided so that students can see random forests applied to real data from various fields. R code for the examples will be made available.
Les Diablerets (VD)
Getting to Les Diablerets:
BY CAR: A9 motorway, direction Grand St-Bernard, exit Aigle. Then take the Aigle - Les Diablerets - Col du Pillon road (20 km). BY PLANE: International airports at:
CUSO doctoral student, double room: 200 CHF
Payment to postal account: