Description
Professor Christian Genest from McGill University, Montréal, Canada
Title: Introduction to Copula Modeling and Its Implementation in R
Overview: Copulas are multivariate distributions with uniform margins on the unit interval. They are fundamental to the study of stochastic dependence and provide a handy tool for constructing dependence models between variables whose distributions are heterogeneous or involve covariates. Owing to their flexibility, copula models have recently become quite popular in statistics, biostatistics, finance, insurance, and risk management. These lectures will provide an introduction to statistical inference for copula models and its implementation in the R Project for Statistical Computing.

Lecture 1: Copula basics (concepts and uses); copula models (construction and simulation); examples of application.

Lecture 2: Exploratory data analysis (graphical tools and formal tests of independence); model construction and fitting, with emphasis on rank-based methods; model validation through goodness-of-fit testing.

Lecture 3: Inference for copula models (implementation with R); recent developments (copula modeling for extremes and discrete data).

Instructor: Christian Genest, PhD, PStat, is Professor of Statistics and holder of the Canada Research Chair in Stochastic Dependence Modeling at McGill University, Montréal, Canada. He is one of the main contributors to the copula modeling literature and received the 2011 Gold Medal of the Statistical Society of Canada for his work.
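As a small taste of the R implementation covered in the lectures, here is a minimal sketch assuming the 'copula' package (an illustrative choice; the copula family, parameter value and sample size are assumptions, not material from the course): simulate from a Clayton copula, reduce the data to rank-based pseudo-observations as emphasized in Lecture 2, and fit by maximum pseudo-likelihood.

# A minimal sketch, assuming the 'copula' package; the model, parameter
# value and sample size below are purely illustrative.
library(copula)

set.seed(42)
cc <- claytonCopula(param = 2, dim = 2)  # Kendall's tau = theta / (theta + 2) = 0.5
u  <- rCopula(500, cc)                   # 500 simulated pairs with uniform margins

# With real data the margins are unknown, so first reduce to ranks
# (pseudo-observations), then fit by maximum pseudo-likelihood.
u.hat <- pobs(u)
fit <- fitCopula(claytonCopula(dim = 2), u.hat, method = "mpl")
summary(fit)

# Parametric-bootstrap goodness-of-fit test (small N to keep it quick).
gofCopula(claytonCopula(dim = 2), u.hat, N = 100)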
_
Professor Mike Kenward from London School of Hygiene & Tropical Medicine, UK
Title: Statistical Analysis with Missing Data
Overview: In many areas of research, especially in the life sciences, the problem of missing data is almost ubiquitous. We can define missing data as observations we intended to collect but, for one reason or another, could not. Early work on the problem was largely computational and addressed the resulting lack of balance; developments such as the EM algorithm, for example, allowed "complete data" tools to be applied in a wide variety of incomplete-data settings. Following a seminal 1976 paper by Rubin, however, the focus shifted to the inferential aspects of the missing data problem, and this has dominated the subsequent literature. The topic has now become a mainstream research area in statistics, with major practical implications. In this series of lectures I focus on post-1976 developments and cover the principal themes that make up the subject today.

Lecture 1: Introduction to missing data
This lecture provides a broad introduction to the subject. Rubin's key definitions, Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR), are introduced both through the conventional definitions in terms of probability statements about the missing value mechanism, and through a more recent formulation in terms of Directed Acyclic Graphs. The relevance of the analysis goal (or estimand) for these definitions is emphasized, as is the distinction between missing outcomes and missing covariates. It is seen how assumptions about the missing value mechanism guide appropriate analysis procedures but how, at the same time, the data at hand cannot be used to fully justify such assumptions. As a consequence, sensitivity analysis is seen to play a key role in the statistical handling of missing data. Three broad classes of model are introduced: selection, pattern-mixture and shared parameter; their potential use under the different missing data mechanisms, and for sensitivity analysis, is sketched.

Lecture 2: Principled analyses for missing data
A distinction is made between ad hoc methods of handling missing data and statistically principled methods, and the disadvantages of the former are briefly sketched. Fully parametric model-based analyses using selection, pattern-mixture and shared parameter models are developed, and additional issues surrounding missing covariates are considered. A brief treatment is given of semi-parametric approaches that use simple, and augmented, inverse probability weighted estimating equations, and of their relationship with double robust estimators. The important tool of multiple imputation is introduced (a minimal R sketch follows this description), links with the previous approaches are established, and the key issues of congeniality and auxiliary variables are discussed.

Lecture 3: Sensitivity analysis
The methodology introduced in Lecture 2 is developed for a variety of different forms of sensitivity analysis, with a particular emphasis on attrition in longitudinal data settings. Such analyses typically explore the impact of departures from the missing at random assumption. There are broadly two ways of approaching them. In the first, a postulated missing value (or attrition) mechanism is modified in a non-random manner. In the second, the statistical behaviour of the conditional distribution of the missing data given the observed data is directly modified, again in a specific non-random manner. The two approaches are very different in formulation, but each has advantages in particular settings. Some examples are given.
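To make the multiple imputation workflow of Lecture 2 concrete, here is a minimal sketch, assuming the 'mice' package and its bundled 'nhanes' example data (both illustrative choices, not necessarily the tools used in the lectures): impute m = 5 times, fit the analysis model to each completed data set, and pool the results with Rubin's rules.

# A minimal multiple-imputation sketch, assuming the 'mice' package and
# its bundled 'nhanes' data (age complete; bmi, hyp, chl partly missing).
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)  # 5 imputed data sets
fits <- with(imp, lm(chl ~ age + bmi))  # analysis model on each completed set
summary(pool(fits))                     # combine estimates via Rubin's rules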
_
Professor Ray Chambers from University of Wollongong, Australia
Title: Survey Sampling and Survey Methodology
Lecture 1: Model-Based Sample Survey Inference
The traditional inferential paradigm for survey sampling is repeated sampling: the inferential properties of any estimator are derived from the distribution of that estimator under repeated sampling from a fixed set of values corresponding to the finite population of interest. Although technically free of any population modelling assumptions, this paradigm has become increasingly irrelevant as the complexity of survey outputs has increased and auxiliary data sources have become more accessible. Alternative model-based approaches to survey inference were first formulated in the 1960s and 1970s, and now represent a viable approach to meeting the inferential demands made of sample survey data. In this talk I will describe the basics of such a model-based approach to sample survey inference in the context of prediction of a finite population total under a model for the regression of the survey variable on a set of auxiliary variables over the target population. The general development will follow that set out in Chapters 3 to 7 of Chambers and Clark (2012). The idea of calibrated sample weighting (Deville and Särndal 1992) will be described, as well as the relationship between calibration and optimal linear prediction (Chambers 1996). Extensions to the use of non-parametric regression models for finite population inference will also be considered, as well as issues of robustness to model misspecification and to representative sample outliers (Chambers 1986). An important application of the preceding theory is the prediction of the finite population distribution of a survey variable. The model-based approach of Chambers and Dunstan (1986), which applies the so-called 'smearing' concept (Duan 1983) to prediction, will be described, as well as the alternative calibration approach of Harms and Duchesne (2006). A realistic large-survey application involving a significant level of dislocation between the surveyed and non-surveyed components of the population will then be used to illustrate the benefits of a nonparametric regression approach to prediction of the finite population distribution function.
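To fix ideas, the base-R sketch below (population, sample size and working model are all hypothetical, chosen only for illustration) shows the core of model-based prediction of a finite population total: fit a regression model on the sample, predict the survey variable for the non-sampled units, and add the observed and predicted parts.

# A minimal sketch of model-based prediction of a population total under a
# linear working model; population, sample size and model are assumptions.
set.seed(1)
N <- 1000
pop <- data.frame(x = rgamma(N, shape = 2, scale = 10))  # auxiliary variable, known for all N units
pop$y <- 5 + 2 * pop$x + rnorm(N, sd = 8)                # survey variable, seen only in the sample

s <- sample.int(N, 100)            # simple random sample of n = 100
fit <- lm(y ~ x, data = pop[s, ])  # working regression model fitted to the sample

# Predicted total = observed sample sum + model predictions for non-sample units.
t.hat <- sum(pop$y[s]) + sum(predict(fit, newdata = pop[-s, ]))
c(estimate = t.hat, truth = sum(pop$y))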
References
Chambers, R.L. (1986). Outlier robust finite population estimation. Journal of the American Statistical Association, 81, 1063-1069.
Chambers, R.L. (1996). Robust case-weighting for multipurpose establishment surveys. Journal of Official Statistics, 12, 3-32.
Chambers, R.L. and Clark, R.G. (2012). An Introduction to Model-Based Survey Sampling with Applications. Oxford: Oxford University Press.
Chambers, R.L. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, 73, 597-604.
Deville, J.C. and Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.
Duan, N. (1983). Smearing estimate: a nonparametric retransformation method. Journal of the American Statistical Association, 78, 605-610.
Harms, T. and Duchesne, P. (2006). On calibration estimation for quantiles. Survey Methodology, 32, 37-52.

Lecture 2: Applications of M-quantile Modelling in Sample Surveys
M-quantile regression models, as a generalization of quantile regression models (Koenker and Bassett 1978; Koenker 2005), were introduced in Breckling and Chambers (1988). Unlike standard regression models, which characterise the behaviour of a conditional mean, i.e. the expected value of the response given the values of the explanatory variables, these models characterise the behaviour of the complete conditional distribution of the response given these explanatory values. This is done either via models for the quantiles of this distribution, leading to quantile regression, or more generally via models for the M-quantiles of this distribution, leading to M-quantile regression. An important modern application of M-quantile modelling is in small area estimation (Chambers and Tzavidis 2006; Tzavidis et al. 2010). This is the part of sample survey methodology concerned with predictive model-based inference for unplanned domains in cases where standard domain-specific estimates are too imprecise. In many applications these domains are area-based, and the imprecision arises because domain sample sizes are small, hence the descriptor 'small area' estimation. The standard approach to dealing with this issue is to use mixed models for the overall population, i.e. models made up of fixed effects characterised by population-level relationships as well as random effects that characterise differences between the small areas. An alternative is to use M-quantile regression to characterise between-area differences in the regression relationship linking the response variable to the explanatory variables. In this talk, I will outline the basic ideas underpinning the application of M-quantile models to small area estimation, and illustrate how these can be extended to small area data that are spatially non-homogeneous (Salvati et al. 2012) and to outlier-contaminated data (Chambers et al. 2010). Applications to poverty mapping (Tzavidis et al. 2008) will be described, as well as current research extending the M-quantile modelling approach to binary data.
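The core M-quantile fitting step can be sketched in a few lines of base R as iteratively reweighted least squares with a Huber influence function tilted by the level q; the helper below is a hypothetical illustration written for this description, in the spirit of Breckling and Chambers (1988), not code taken from the references.

# A minimal M-quantile regression sketch via iteratively reweighted least
# squares; 'mq_fit' is a hypothetical helper and k is the usual Huber
# tuning constant.
mq_fit <- function(y, X, q = 0.5, k = 1.345, maxit = 50, tol = 1e-6) {
  beta <- qr.solve(X, y)                        # least-squares start
  for (it in seq_len(maxit)) {
    r <- as.vector(y - X %*% beta)
    s <- median(abs(r)) / 0.6745                # robust (MAD) scale estimate
    u <- r / s
    psi <- pmax(-k, pmin(k, u))                 # Huber influence function
    psi.q <- 2 * psi * ifelse(u > 0, q, 1 - q)  # asymmetric tilt at level q
    w <- ifelse(abs(u) < 1e-8, 1, psi.q / u)    # non-negative IRLS weights
    beta.new <- lm.wfit(X, y, w)$coefficients
    if (max(abs(beta.new - beta)) < tol) return(beta.new)
    beta <- beta.new
  }
  beta
}

# Usage: X must carry an intercept column, e.g. mq_fit(y, cbind(1, x), q = 0.25).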
References
Breckling, J.U. and Chambers, R.L. (1988). M-quantiles. Biometrika, 75, 761-771.
Chambers, R., Chandra, H., Salvati, N. and Tzavidis, N. (2010). Outlier robust small area estimation. Working Paper 16-09, Centre for Statistical and Survey Methodology, University of Wollongong. URL cssm.uow.edu.au/publications.
Chambers, R. and Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika, 93, 255-268.
Koenker, R. (2005). Quantile Regression. New York: Cambridge University Press.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33-50.
Salvati, N., Pratesi, M., Tzavidis, N. and Chambers, R. (2012). Small area estimation via M-quantile geographically weighted regression. Test, 21, 1-28.
Tzavidis, N., Marchetti, S. and Chambers, R. (2010). Robust estimation of small area means and quantiles. Australian and New Zealand Journal of Statistics, 52, 167-186.
Tzavidis, N., Salvati, N., Pratesi, M. and Chambers, R. (2008). M-quantile models with application to poverty mapping. Statistical Methods and Applications, 17, 393-411.

Lecture 3: Using Likelihood-Based Ideas to Combine Survey and Auxiliary Data Sources
A key feature of modern survey data analysis is the complexity of the designs of the surveys involved and the availability of relevant auxiliary data sources. In particular, the aim is typically to combine the information from these auxiliary data sources with data from the (typically probability-based) survey in order to improve efficiency. Making efficient use of such varied and complex data sources is usually beyond the capacity of traditional survey data analysis methods. In contrast, modern likelihood-based theory for surveys offers the prospect of such integration. Much attention has been devoted to the analysis of complex survey data over the last two decades (Chambers and Skinner, 2003). In particular, it is now clear that statistical methods that assume the distribution of the sample data and the distribution of the population data are identical generally lead to biased inference, since they take no account of either the complex sample design or the availability of auxiliary data for characterising differences between sampled and non-sampled population units. Three likelihood-based inferential frameworks are generally used to deal with this problem: pseudo-likelihood (Pfeffermann 1993), sample likelihood (Krieger and Pfeffermann 1992), and full information likelihood (Breckling et al. 1994). Under the pseudo-likelihood approach, unknown sufficient statistics in the population-level likelihood estimating equations are replaced by sample-weighted estimators, while the sample likelihood approach uses Bayes' Theorem to integrate the population model for the sample data with the model for the sampling process. In contrast, under the full information likelihood approach the auxiliary information and the sampling design are directly accounted for in the likelihood, which is itself defined by a joint model for the population distribution of the survey variables, the sampling process, and the auxiliary information. Typically, joint inference then proceeds via application of the Missing Information Principle. In this talk, I will develop the full information approach to maximum likelihood inference using both sample survey data and data from auxiliary sources, focussing on the situation where the target of inference is the population regression relationship between two survey variables.
The ideas that I will use are set out in detail in Chapters 2, 3 and 8 of Chambers et al. (2012).
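Of the three frameworks, pseudo-likelihood is the simplest to demonstrate in standard software, since it amounts to solving sample-weighted score equations. The sketch below assumes the 'survey' package and uses simulated data and weights (all names and values are hypothetical placeholders, not material from the lectures).

# A minimal pseudo-likelihood sketch, assuming the 'survey' package;
# the data frame and weights below are simulated placeholders.
library(survey)

set.seed(7)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 1 + 0.5 * d$x + rnorm(n)
d$w <- runif(n, 1, 10)  # stand-in for survey design weights

des <- svydesign(ids = ~1, weights = ~w, data = d)  # single-stage design object
fit <- svyglm(y ~ x, design = des)  # solves the sample-weighted score equations
summary(fit)                        # design-based standard errors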
References
Breckling, J.U., Chambers, R.L., Dorfman, A.H., Tam, S.M. and Welsh, A.H. (1994). Maximum likelihood inference from sample survey data. International Statistical Review, 62, 349-363.
Chambers, R.L. and Skinner, C.J. (editors) (2003). Analysis of Survey Data. Chichester: Wiley.
Chambers, R.L., Steel, D.G., Wang, S. and Welsh, A.H. (2012). Maximum Likelihood Estimation for Sample Surveys. Boca Raton: CRC Press/Chapman and Hall.
Krieger, A.M. and Pfeffermann, D. (1992). Maximum likelihood from complex sample surveys. Survey Methodology, 18, 225-239.
Pfeffermann, D. (1993). The role of sampling weights when modelling survey data. International Statistical Review, 61, 317-337.
PROGRAMME
|             | Wednesday 12.09      | Thursday 13.09   | Friday 14.09                 | Saturday 15.09 |
| 8h30-10h00  |                      | Mike Kenward     | Christian Genest             | Ray Chambers   |
| 10h00-10h30 |                      | coffee break     | coffee break                 | coffee break   |
| 10h30-12h00 |                      | Christian Genest | Ray Chambers                 | Mike Kenward   |
| 12h00-14h00 |                      | lunch break      | lunch break                  | departure      |
| 14h00-16h00 |                      |                  |                              |                |
|             | welcome tea from 16h | coffee break     | coffee break                 |                |
| 17h00-18h30 | Christian Genest     | Ray Chambers     | Mike Kenward                 |                |
| 18h30-19h00 | aperitif             | aperitif         | Scientific Committee meeting |                |
| 19h15-22h00 | dinner               | dinner           | dinner                       |                |