These examples and code are taken from the upcoming book ‘Bayesian Statistics: A Primer for Data Scientists, with R and Python’ by Agresti, Kateri and Mira.
In this example we find Bayesian posterior mean estimates of the probability \(\pi\) of success for each of three binomial experiments that have the same data, \(y = 10\) successes in \(n = 10\) trials: (1) a British woman claims that in tasting a cup of tea with milk, she can tell whether the milk was poured before or after the tea; (2) a professor of 18th-century musicology claims to be able to tell, for any pair of pages of music, one composed by Mozart and one by Haydn, who composed each; (3) a person in a drunken state claims to be able to predict whether your flip of a balanced coin will result in a head or a tail.
For the tea taster, if we have no a priori reason to expect any particular value for \(\pi\), we could select a uniform prior distribution, which is the Beta(1, 1) distribution. By contrast, for coin flips we may be highly skeptical of any claim about predictions and decide to use a beta prior distribution with mean 0.50 but with small standard deviation, say 0.03, so that the beta distribution is highly concentrated near 0.50. We might have greater faith in the musicologist’s claim, perhaps using a beta prior distribution with a mean of 0.90 and standard deviation 0.10.
alpha1 = 138; alpha2 = 138 # beta hyperparameters for predictor of coin flips
alpha1/(alpha1+alpha2) # mean of beta prior distribution
## [1] 0.5
sqrt((alpha1*alpha2)/((alpha1+alpha2)^2 * (alpha1+alpha2+1)))
## [1] 0.03004209
# standard deviation of beta prior distribution
alpha1 = 7; alpha2 = 7/9 # beta hyperparameters for musicologist
alpha1/(alpha1+alpha2) # mean of beta prior distribution
## [1] 0.9
sqrt((alpha1*alpha2)/((alpha1+alpha2)^2 * (alpha1+alpha2+1)))
## [1] 0.1012579
# standard deviation of beta prior distribution
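More generally, for a target prior mean \(\mu_0\) and standard deviation \(\sigma_0\), the beta mean and variance formulas can be solved for the hyperparameters: \(\alpha_1 + \alpha_2 = \mu_0(1 - \mu_0)/\sigma_0^2 - 1\) and \(\alpha_1 = \mu_0(\alpha_1 + \alpha_2)\). The following helper function is an illustrative sketch, not part of the book's code:
beta.hyperpar <- function(mu0, sigma0) { # solve for beta hyperparameters from prior mean and std. dev.
  total <- mu0*(1 - mu0)/sigma0^2 - 1 # implied value of alpha1 + alpha2
  c(alpha1 = mu0*total, alpha2 = (1 - mu0)*total)
}
beta.hyperpar(0.50, 0.03) # approximately (138.4, 138.4); compare with the values 138 and 138 used above
beta.hyperpar(0.90, 0.10) # approximately (7.2, 0.8); compare with the values 7 and 7/9 used above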
We now find the posterior mean \(\tilde{\pi}\) for the three chosen prior distributions. Combining a Beta(\(\alpha_1, \alpha_2\)) prior distribution with \(y\) successes in \(n\) trials yields a Beta(\(\alpha_1 + y, \alpha_2 + n - y\)) posterior distribution, so \(\tilde{\pi} = (\alpha_1 + y)/(\alpha_1 + \alpha_2 + n)\):
alpha1 <- 1; alpha2 <- 1 # hyperparameter values for uniform prior distribution
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for British tea taster
## [1] 0.9166667
qbeta(0.5, alpha1post, alpha2post) # posterior median for British tea taster
## [1] 0.9389309
alpha1 = 138; alpha2 = 138 # beta hyperparameters with mean 0.50, standard dev. 0.03
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for predictor of coin flips
## [1] 0.5174825
alpha1 = 7; alpha2 = 7/9 # beta hyperparameters with mean 0.90, standard dev. 0.10
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for musicologist
## [1] 0.95625
qbeta(0.5, alpha1post, alpha2post) # posterior median for musicologist
## [1] 0.971962
With these prior distributions, the posterior mean estimates of \(\pi\) are 0.917 for the tea taster and 0.956 for the musicologist but only 0.517 for the predictor of coin flips. The posterior mean for coin flipping is very close to the prior mean of 0.50, the data having relatively little influence when the prior distribution is so sharp.
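Equivalently, the posterior mean is a weighted average of the sample proportion and the prior mean, which makes the dominance of the sharp prior explicit. For the predictor of coin flips,
\[ \tilde{\pi} = \frac{n}{n + \alpha_1 + \alpha_2}\,\hat{\pi} + \frac{\alpha_1 + \alpha_2}{n + \alpha_1 + \alpha_2}\left(\frac{\alpha_1}{\alpha_1 + \alpha_2}\right) = \frac{10}{286}(1.0) + \frac{276}{286}(0.50) = 0.517. \]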
The 2018 General Social Survey in the U.S. asked “Do you believe in hell?” The percentage who responded yes was 74.0% for the 727 subjects with a high school education or less, 70.1% for the 304 subjects with a junior college or bachelor’s degree, and 56.8% for the 111 subjects with a graduate degree. We shall find posterior intervals for the population proportion \(\pi\) of those with a graduate degree who would respond yes.
qbeta(c(0.025, 0.975), 63 + 1, 48 + 1)
## [1] 0.4744648 0.6560523
# quantiles of beta posterior dist. for 95% equal-tail percentile int. with uniform prior
qbeta(c(0.025, 0.975), 63.5, 48.5)
## [1] 0.4746530 0.6570094
# quantiles of beta posterior dist. for 95% equal-tail percentile int. with Jeffreys prior
library(binom)
binom.bayes(63, 111, conf.level=0.95, type="central", prior.shape1=1, prior.shape2=1)
# 95% equal-tail percentile int. for uniform prior dist.
binom.bayes(63, 111, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# 95% HPD interval for uniform prior dist.
binom.bayes(63, 111, conf.level=0.95, prior.shape1=0.5, prior.shape2=0.5)
# 95% HPD interval for Jeffreys prior dist.
binom.confint(63, 111, conf.level=0.95, method="asymptotic") # 95% Wald confidence int.
binom.confint(63, 111, conf.level=0.95, method="wilson")
# 95% score confidence int. (better than Wald CI if n small or pi near 0 or 1)
As the sample size \(n\) increases, the influence of the prior distribution weakens and the posterior distribution more closely resembles the likelihood function. In the limit, Bayes estimates of \(\pi\) become more similar to the ML estimate, and Bayesian posterior intervals become more similar to frequentist confidence intervals. We illustrate this with a sequence of cases in which \(\hat{\pi} = 0.70\), computing for increasing values of \(n\) the frequentist Wald confidence interval and the Bayesian HPD interval obtained with a uniform prior distribution (\(\alpha_1 = \alpha_2 = 1\)):
library(binom)
binom.confint(7, 10, conf.level=0.95, method="asymptotic") # n = 10
# frequentist Wald CI
binom.bayes(7, 10, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval for uniform prior distribution
binom.confint(70, 100, conf.level=0.95, method="asymptotic") # n = 100
# frequentist Wald CI
binom.bayes(70, 100, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval
binom.confint(7000, 10000, conf.level=0.95, method="asymptotic") # n=10000
# frequentist Wald CI
binom.bayes(7000, 10000, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval
For a particular sample size \(n\) (set to 100 in the R code below), a Bayesian interval becomes more similar to the frequentist confidence interval as the prior distribution becomes more diffuse, that is, as \(\alpha_1 = \alpha_2 = \alpha\) decreases toward 0:
library(binom)
binom.bayes(70, 100, conf.level=0.95, prior.shape1=100, prior.shape2=100) # beta hyperpara's=100
# narrow interval when prior dist. has small variance
binom.bayes(70, 100, conf.level=0.95, prior.shape1=10, prior.shape2=10) # beta hyperpara's=10
binom.bayes(70, 100, conf.level=0.95, prior.shape1=1, prior.shape2=1) # beta hyperpara's=1
binom.bayes(70, 100, conf.level=0.95, prior.shape1=0.1, prior.shape2=0.1) # beta hyperpara's=0.1
binom.bayes(70, 100, conf.level=0.95, prior.shape1=0.01, prior.shape2=0.01) # beta hyperpara's=0.01
binom.confint(70, 100, conf.level=0.95, method="asymptotic") # frequentist Wald CI
With the uniform prior, the posterior distribution is beta with hyperparameters \(\alpha_{1}^* = y + 1 = 64\) and \(\alpha_{2}^* = n - y + 1 = 49\). We next use this posterior distribution to find \(P(\pi < 0.50 \mid y)\) and \(P(\pi > 0.50 \mid y)\):
pbeta(0.50, 64, 49) # cumulative probability at 0.50 for beta posterior distribution
## [1] 0.07803315
1 - pbeta(0.50, 64, 49) # posterior P(pi > 0.50 | y)
## [1] 0.9219668
There is substantial but not overly strong evidence that the majority believe in hell.
A corresponding frequentist statistical inference reports the \(P\)-value:
1 - pbinom(62, 111, 0.50) # one-sided (right-tail) P-value for Binom(111, 0.50) dist.
## [1] 0.09182859
Returning to the example of the British tea taster, we illustrate the concept of Bayesian prediction of future observations. Combining the uniform prior distribution with the observed 10 successful guesses in 10 cups, we obtained a beta posterior distribution for the probability of a successful prediction, with \(\alpha_{1}^{*} = 11\) and \(\alpha_{2}^{*} = 1\), for which the posterior mean is \(\mu^{*} = 11/12 = 0.9167\). For the next \(n_f = 5\) cups, here is the beta-binomial distribution for the number of correct guesses \(Y_f\):
library(extraDistr)
dbbinom(c(0,1,2,3,4,5), 5, alpha = 11, beta = 1) # displays P(Y_f = 0), P(Y_f = 1), ... , P(Y_f = 5)
## [1] 0.0002289377 0.0025183150 0.0151098901 0.0654761905 0.2291666667
## [6] 0.6875000000
The expected number of successful guesses is \(n_f\mu^{*} = 5(11/12) = 4.58\), and the probability is 0.6875 of getting all five correct.
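As a check on the final probability, the beta-binomial pmf is \(P(Y_f = k) = \binom{n_f}{k} B(\alpha_1^* + k, \alpha_2^* + n_f - k)/B(\alpha_1^*, \alpha_2^*)\), so
\[ P(Y_f = 5) = \frac{B(16, 1)}{B(11, 1)} = \frac{1/16}{1/11} = \frac{11}{16} = 0.6875. \]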
We now compare the proportions of females and of males who believe in hell, again using data from the 2018 GSS. The sample proportions are \(\hat{\pi}_1 = 498/674 = 0.739\) for females and \(\hat{\pi}_2 = 316/468 = 0.675\) for males. With independent binomial samples and independent beta prior distributions for \(\pi_1\) and \(\pi_2\), we can form a Bayesian posterior interval for the plausible values of \(\pi_1 - \pi_2\). The following R code shows a posterior 95% percentile interval based on simulating five million random draws from each posterior beta density. We also show the HPD interval computed from the simulated draws and the frequentist 95% Wald and score confidence intervals:
pi1 <- rbeta(5000000, 499, 177) # simulating from posterior dist. of pi_1
pi2 <- rbeta(5000000, 317, 153) # simulating from posterior dist. of pi_2
quantile(pi1 - pi2, c(0.025, 0.975)) # 0.025 and 0.975 quantiles of simulated difference
## 2.5% 97.5%
## 0.01018281 0.11759764
# simulated 95% percentile interval for pi1 - pi2
hist(pi1 - pi2) # posterior distribution is bell-shaped
plot(density(pi1 - pi2), ylab="Density", xlab="Difference of Probabilities", main="")
#install.packages('HDInterval')
library(HDInterval)
hdi(pi1 - pi2, credMass=0.95) # simulated 95% HPD interval for pi1 - pi2
## lower upper
## 0.01015136 0.11756478
## attr(,"credMass")
## [1] 0.95
prop.test(c(498, 316), c(674, 468), correct=FALSE) # data c(y1, y2), c(n1, n2)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(498, 316) out of c(674, 468)
## X-squared = 5.4675, df = 1, p-value = 0.01937
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.009809589 0.117507867
## sample estimates:
## prop 1 prop 2
## 0.7388724 0.6752137
# frequentist 95% Wald CI for pi1 - pi2
#install.packages('PropCIs')
library(PropCIs)
diffscoreci(498, 674, 316, 468, conf.level=0.95) # arguments y1, n1, y2, n2
## 95 percent confidence interval:
## 0.01019264 0.11786252
# frequentist 95% score CI for pi1 - pi2
quantile(pi1/pi2, c(0.025, 0.975)) # simulated 95% percentile interval for risk ratio
## 2.5% 97.5%
## 1.014403 1.184039
Using a simulated posterior distribution for \(\pi_1 - \pi_2\), we can precisely approximate posterior probabilities such as \(P(\pi_1 > \pi_2 \mid y_1, y_2)\) by the proportion of simulated draws from the joint posterior distribution for which \(\pi_1 > \pi_2\). For analyzing belief in hell by gender, we used uniform prior distributions. As the following R code shows, the posterior probability that females are more likely than males to believe in hell is 0.990. It seems highly likely that \(\pi_1 > \pi_2\).
pi1 <- rbeta(5000000, 499, 177)
pi2 <- rbeta(5000000, 317, 153)
mean(pi1 < pi2); mean(pi1 > pi2) # simulated posterior P(pi1 < pi2), P(pi1 > pi2)
## [1] 0.0098148
## [1] 0.9901852
prop.test(c(498, 316), c(674, 468), correct=FALSE) # data c(y1, y2), c(n1, n2)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(498, 316) out of c(674, 468)
## X-squared = 5.4675, df = 1, p-value = 0.01937
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.009809589 0.117507867
## sample estimates:
## prop 1 prop 2
## 0.7388724 0.6752137
# X-squared = chi-squared statistic = square of z test stat; doesn't show direction
The R output also shows the result of the frequentist significance test of \(H_0: \pi_1 = \pi_2\). It is a chi-squared statistic that is the square of the \(z\) test statistic.
In practice, the number of groups to be compared often exceeds two. We can construct the inferences so that the error probability applies to the entire family of inferences rather than to each individual one. The method for constructing simultaneous posterior intervals is called the Bonferroni method. To illustrate, for the 2018 General Social Survey responses to “Do you believe in hell?” by education level, the (yes, no) totals were (538, 189) for those with at most a high-school education, (213, 91) for those with a junior college or bachelor’s degree, and (63, 48) for those with a graduate degree. When we use posterior probability 0.98 for each of the 3 comparisons of pairs of education levels, the entire set of 3 posterior intervals holds with probability at least 1-3(0.02) = 0.94. We next simulate percentile intervals to do this, using the posterior distributions generated with uniform prior distributions:
pi1 <- rbeta(5000000, 538+1, 727-538+1) # posterior dist. with uniform prior, high school or less
pi2 <- rbeta(5000000, 213+1, 304-213+1) # posterior dist. with uniform prior, junior college or bachelor's
pi3 <- rbeta(5000000, 63+1, 111-63+1) # posterior dist. with uniform prior, graduate degree
quantile(pi1 - pi2, c(0.01, 0.99)) # posterior probability = 0.99 - 0.01 = 0.98
quantile(pi1 - pi3, c(0.01, 0.99))
quantile(pi2 - pi3, c(0.01, 0.99))
At this probability level, we conclude that belief in hell is less common for those with a graduate degree than for the other two education levels, because the posterior intervals for \(\pi_1 - \pi_3\) and for \(\pi_2 - \pi_3\) contain only positive values. For example, we infer that the population proportion believing in hell is between 0.06 and 0.29 higher for those with at most a high school degree than for those with a graduate degree. The intervals are wide, and belief in hell could be substantially less common for those with a graduate degree. However, it is plausible that the proportions believing in hell are equal for the first two education levels, because the posterior interval for \(\pi_1 - \pi_2\) contains 0.
The 2021 General Social Survey asked respondents whether they considered themselves to be very happy, pretty happy, or not too happy. The sample counts of the 4003 respondents in the (very happy, pretty happy, not too happy) categories were (778, 2304, 921), for percentages (19.4, 57.6, 23.0). We shall find posterior intervals for the proportions of adults in the U.S. who characterize themselves in each category. Here are 99% percentile intervals for \(\pi_1\), \(\pi_2\), and \(\pi_3\):
qbeta(c(0.005, 0.995), 779, 3227) # quantiles, beta hyperparameters for very happy
## [1] 0.1786434 0.2108461
# 99% posterior percentile interval
qbeta(c(0.005, 0.995), 2305, 1701) # quantiles, beta hyperparameters for pretty happy
## [1] 0.5552075 0.5954250
# 99% posterior percentile interval
qbeta(c(0.005, 0.995), 922, 3084) # quantiles, beta hyperparameters for not too happy
## [1] 0.2132832 0.2475323
# 99% posterior percentile interval
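The beta hyperparameters in these qbeta commands correspond to a uniform Dirichlet(1, 1, 1) prior distribution for \((\pi_1, \pi_2, \pi_3)\): the posterior distribution is then Dirichlet(779, 2305, 922), and the marginal posterior distribution of each \(\pi_j\) is beta, for instance \(\pi_1 \mid \boldsymbol{y} \sim \text{Beta}(y_1 + 1, n - y_1 + 2) = \text{Beta}(779, 3227)\).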
For example, the probability is 0.99 that the population proportion who would report being very happy is between 0.179 and 0.211. By the Bonferroni method, all three posterior intervals contain the true population proportion values with probability at least 1-3(0.01) = 0.97.
Bayesian methods also apply for evaluating \(H_0\): independence in a two-way contingency table. For example, the Bayes factor compares the two models corresponding to the hypotheses of independence and of dependence. The Bayes factor in favor of the statistical dependence model over the independence model has a complex expression involving several gamma functions; its formula does not provide intuition and we omit it, but R provides it, as shown in the following code. Here the data form a 3 \(\times\) 3 table that cross-classifies the 4003 respondents of the previous example by income (three levels, forming the rows) and happiness (the columns):
data <- matrix(c(282,324,172,670,966,668,198,301,422), nrow=3, ncol=3) # 3 rows and 3 columns
data
## [,1] [,2] [,3]
## [1,] 282 670 198
## [2,] 324 966 301
## [3,] 172 668 422
chisq.test(data) # Pearson chi-squared test
##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 133.46, df = 4, p-value < 2.2e-16
library(BayesFactor)
BF <- contingencyTableBF(data, priorConcentration = 1, sampleType = "jointMulti")
# priorConcentration = 1 (the default) specifies uniform Dirichlet prior distributions
# use sampleType = "indepMulti" for independent multinomial sampling in the rows
BF
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 3.305774e+23 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, joint multinomial
We estimate that the data would be \(3.3 \cdot 10^{23}\) times more likely under the dependence model than under the independence one. This is extremely strong evidence against the hypothesis that happiness and income are statistically independent.
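As a supplementary calculation (not part of the book's code), if the two models have equal prior probabilities, this Bayes factor translates into a posterior probability for the independence model:
BF.value <- 3.305774e23 # Bayes factor reported above
1/(1 + BF.value) # posterior P(independence | data); approximately 3.0e-24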
This example simulates a sample of 1000 people to mimic a poll taken before the 2020 U.S. presidential election, in which the Democratic candidate Joseph Biden faced the Republican candidate Donald Trump. Let \(\pi_i\) be the population proportion in the actual election that voted for Biden in state \(i\) (\(i = 1, \dots , 51\), where \(i = 8\) is District of Columbia), out of those that voted for Biden or Trump. We take the number of observations \(n_i\) in the simulated sample from state \(i\) proportional to the number of people in state \(i\) who voted in that election, subject to \(\sum_i n_i = 1000\). The simulated number \(y_i\) voting for Biden is the outcome of a Binom(\(n_i,\pi_i\)) random variable.
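A minimal sketch of how such a sample could be generated, assuming hypothetical vectors votes (the state vote totals) and pi.actual (the actual Biden proportions \(\pi_i\)):
n <- round(1000*votes/sum(votes)) # sample sizes roughly proportional to state turnout
# (a small adjustment may be needed so that the n_i sum to exactly 1000)
y <- rbinom(51, size=n, prob=pi.actual) # one Binom(n_i, pi_i) draw per state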
Many states have small \(n_i\), and we can obtain better estimates by borrowing information from the whole data set. The essence of the empirical Bayes approach with a beta prior distribution is that we regard the data as being generated in the following way: 51 randomly generated values from a Beta(\(\alpha_1,\alpha_2\)) distribution determine the population proportions \(\{\pi_i\}\) voting for Biden. Then, a randomly generated value from the Binom(\(n_i,\pi_i\)) distribution yields the number \(y_i\) in the sample of size \(n_i\) from state \(i\) voting for Biden, \(i = 1, \dots , 51\). The sample proportions \(\{\hat{\pi}_i = y_i/n_i\}\) exhibit greater variability than if the \(\{y_i\}\) came from outcomes of {Binom(\(n_i,\pi\))} binomial random variables that all had the same success probability \(\pi\). We do not know \(\alpha_1\) and \(\alpha_2\) for the beta distribution that generated the \(\{\pi_i\}\), but once we observe the counts \(\{y_i\}\) and sample sizes \(\{n_i\}\), software can use iterative methods to find marginal ML estimates \(\hat{\alpha}_1\) and \(\hat{\alpha}_2\) by maximizing the beta-binomial likelihood function for the data. Then we perform ordinary Bayesian inference using the Beta(\(\hat{\alpha}_1,\hat{\alpha}_2\)) prior distribution. The posterior mean estimate of a population proportion \(\pi_i\) is a weighted average of the sample proportion \(\hat{\pi}_i\) and the mean \(\hat{\alpha}_1/(\hat{\alpha}_1+\hat{\alpha}_2)\) of the estimated beta prior distribution.
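In formula form, the empirical Bayes estimate computed in the code below is
\[ \tilde{\pi}_i = \frac{y_i + \hat{\alpha}_1}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2} = \left(\frac{n_i}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2}\right)\hat{\pi}_i + \left(\frac{\hat{\alpha}_1 + \hat{\alpha}_2}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2}\right)\frac{\hat{\alpha}_1}{\hat{\alpha}_1 + \hat{\alpha}_2}, \]
so estimates for states with small \(n_i\) are shrunk more strongly toward the estimated prior mean.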
For the election polling data, the following R code shows that maximizing the marginal beta-binomial likelihood yields estimated hyperparameters \(\hat{\alpha}_1 = 32.49\) and \(\hat{\alpha}_2 = 29.39\) for the beta distribution. The code also shows results for the first five states in the list.
Election <- read.table("Election2020.dat", header=TRUE)
head(Election, 5)
library(VGAM) # package that can find ML fit of beta-binomial dist.
fit.bb <- vglm(cbind(y, n-y) ~ 1, betabinomialff, data=Election) # Fits beta binomial
Coef(fit.bb) # to find marginal ML estimates of beta hyperpara's
## shape1 shape2
## 32.49346 29.38929
# for empirical Bayes approach
# estimated beta hyperparameters
Popul.prop <- Election$pi; State <- Election$state
Sample.prop <- Election$y/Election$n
Emp.Bayes.est <- (Election$y+Coef(fit.bb)[1])/(Election$n+Coef(fit.bb)[1]+Coef(fit.bb)[2])
Results <- data.frame(State, Popul.prop, Sample.prop, Emp.Bayes.est)
head(Results, 5)
sqrt((sum(Election$n*(Sample.prop - Popul.prop)^2))/1000)
## [1] 0.09737371
# root mean square of sample proportions around population proportions
sqrt((sum(Election$n*(Emp.Bayes.est - Popul.prop)^2))/1000)
## [1] 0.0737974
# root mean square of empirical Bayes estimates around population prop's
The normal distribution has a bell-shaped pdf that is symmetric. We denote the normal distribution with mean \(\mu\) and variance \(\sigma^2\) by \(N(\mu,\sigma^2)\), where \(\mu\) can be any real number and \(\sigma > 0\). When we use the normal distribution as a prior distribution for \(\log[\pi/(1 - \pi)]\), we refer to the distribution for \(\pi\) itself as the logit-normal distribution. When we take \(\mu = 0\) for the normal distribution, the corresponding logit-normal distribution on the \(\pi\) scale has pdf with a symmetric shape around 0.50. It is unimodal when \(\sigma^2 \leq 2\) and bimodal when \(\sigma^2 > 2\), but always tapering off toward 0 as \(\pi\) approaches 0 or 1. Specifically, it is mound-shaped for \(\sigma = 1\), roughly uniform except near the boundaries when \(\sigma \approx 1.5\), and with more pronounced peaks for the modes when \(\sigma\) is about 2 or larger.
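These shapes can be displayed with a short sketch (not part of the book's code) that evaluates the logit-normal pdf by the change-of-variables formula \(f(\pi) = \phi(\log[\pi/(1-\pi)]; \mu, \sigma)/[\pi(1-\pi)]\), where \(\phi\) denotes the normal pdf:
dlogitnorm <- function(p, mu=0, sigma=1) dnorm(log(p/(1-p)), mu, sigma)/(p*(1-p)) # logit-normal pdf
p <- seq(0.001, 0.999, 0.001)
plot(p, dlogitnorm(p, 0, 1), type="l", ylab="Density", xlab="pi") # mound-shaped for sigma = 1
lines(p, dlogitnorm(p, 0, 1.5), lty=2) # roughly uniform over most of (0, 1) for sigma = 1.5
lines(p, dlogitnorm(p, 0, 2), lty=3) # bimodal, with modes near 0 and 1, for sigma = 2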
We will not consider here the methodology for finding the posterior distribution for the logit parameter, \(L = \log[\pi/(1 - \pi)]\). However, it is simple to construct it using software. Once we do this, a posterior interval for \(L\) yields a posterior interval for \(\pi\) using the inverse transformation \[ \pi = e^L/(1 + e^L). \]
To illustrate, the same GSS that asked about belief in hell also asked respondents if they believed in an afterlife. Of 1129 respondents, 904 said yes and 225 said no. We next find posterior intervals for the population proportion \(\pi\) saying yes using Bayesian approaches with (i) a uniform prior distribution for \(\pi\), and (ii) a \(N(0,\sigma^2)\) prior distribution for \(\log[\pi/(1 - \pi)]\) with \(\sigma = 2\). Here, we first use the binom package in R with the uniform prior, and then the brms (Bayesian Regression Models using ‘Stan’) package. The brms package uses ‘bernoulli’ to specify the binomial with a single trial and uses ‘normal(mu, sigma)’ to specify a normal distribution, with the standard deviation rather than the variance as the second argument.
library(binom)
binom.bayes(904, 1129, conf.level=0.95, type="central", prior.shape1=1, prior.shape2=1)
# 95% posterior interval for pi for uniform prior
y <- c(rep(1, 904), rep(0, 225)) # the sample of 904 `yes' and 225 `no' outcomes
Results <- data.frame(y)
library(brms)
fit.bayes <- brm(y ~ 1, family=bernoulli(link=identity), refresh = 0,
data=Results, prior = prior(uniform(0,1), lb=0, ub=1, class=Intercept), silent = 2)
print(fit.bayes, digits=3)
## Family: bernoulli
## Links: mu = identity
## Formula: y ~ 1
## Data: Results (Number of observations: 1129)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.800 0.012 0.775 0.823 1.001 1418 1804
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# 95% post. int. for pi for uniform prior
fit.bayes2 <- brm(y ~ 1, family=bernoulli(link=logit), refresh = 0,
data=Results, prior = prior(normal(0,2), class=Intercept), silent = 2)
print(fit.bayes2, digits=3)
## Family: bernoulli
## Links: mu = logit
## Formula: y ~ 1
## Data: Results (Number of observations: 1129)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.389 0.076 1.243 1.540 1.004 1304 1991
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# 95% posterior interval for logit using normal prior
exp(fixef(fit.bayes2))/(1 + exp(fixef(fit.bayes2)))
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.8003536 0.5189663 0.7761063 0.8234435
# 95% post. interval for proportion using normal prior for logit (the transformed Est.Error column is not meaningful)
Rather than use noninformative normal priors with large \(\sigma\), we could select a relatively small \(\sigma\) based on historical information. Past elections suggest that all (or nearly all) states tend to have proportion \(\pi_i\) favoring the Democratic candidate between about 0.25 and 0.75, for which logit(\(\pi_i\)) falls between -1.1 and +1.1. Since nearly the entire normal distribution falls within 3 standard deviations of its mean, we could take the normal prior to have \(\mu \pm 3\sigma\) equal to \(\pm1.1\), or \(\mu = 0\) and \(\sigma \approx 1.1/3 = 0.37\). To obtain posterior intervals, we use the brms package in R to model the logit transform of \(\pi\) in terms of the state identification.
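A quick numerical check of this calibration (a supplementary calculation, not in the book's code):
qlogis(0.75) # logit(0.75) = log(3), approximately 1.10
qlogis(0.75)/3 # approximately 0.37, the suggested prior standard deviation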
Election <- read.table("Election2020ungrouped.dat", header=TRUE)
head(Election)
library(brms)
fit.bayes <- brm(y ~ -1 + factor(state), family=bernoulli(link=logit), data=Election,
refresh = 0, prior = prior(normal(0,0.37), class=b))
print(fit.bayes, digits=3) # Alaska posterior mean logit = -0.127
## Family: bernoulli
## Links: mu = logit
## Formula: y ~ -1 + factor(state)
## Data: Election (Number of observations: 1000)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## factorstateAK -0.127 0.348 -0.796 0.555 1.002 7405 3398
## factorstateAL -0.316 0.301 -0.909 0.274 1.001 6923 2955
## factorstateAR -0.110 0.325 -0.750 0.522 1.000 6896 2832
## factorstateAZ 0.202 0.291 -0.374 0.790 1.000 7336 2768
## factorstateCA 0.369 0.172 0.043 0.712 1.001 6488 3136
## factorstateCO 0.166 0.288 -0.404 0.746 1.000 6707 2899
## factorstateCT 0.297 0.316 -0.327 0.915 1.001 8643 3148
## factorstateDC 0.125 0.366 -0.595 0.839 1.000 6335 2715
## factorstateDE 0.062 0.356 -0.636 0.765 1.005 7374 2995
## factorstateFL -0.412 0.198 -0.806 -0.018 1.001 6162 2989
## factorstateGA 0.067 0.260 -0.441 0.568 1.001 8158 3090
## factorstateHI 0.122 0.359 -0.575 0.807 1.003 8063 2789
## factorstateID -0.178 0.340 -0.842 0.480 1.000 8433 3094
## factorstateIL 0.296 0.241 -0.172 0.775 1.003 6624 2997
## factorstateIN 0.204 0.289 -0.381 0.755 1.003 6538 3125
## factorstateIO -0.252 0.306 -0.846 0.365 1.002 7359 2893
## factorstateKS -0.055 0.322 -0.696 0.586 1.001 6822 2704
## factorstateKY -0.179 0.301 -0.746 0.403 1.001 6662 2754
## factorstateLA 0.001 0.303 -0.611 0.586 1.001 7088 2988
## factorstateMA 0.040 0.284 -0.532 0.580 1.001 7501 2444
## factorstateMD 0.296 0.285 -0.274 0.855 1.001 7275 3022
## factorstateME -0.058 0.334 -0.707 0.587 1.001 6768 2744
## factorstateMI 0.344 0.252 -0.139 0.845 1.004 6042 2593
## factorstateMN 0.045 0.281 -0.493 0.597 1.001 7028 2759
## factorstateMO 0.208 0.291 -0.360 0.769 1.001 6572 2820
## factorstateMS 0.216 0.314 -0.376 0.844 1.001 6436 2908
## factorstateMT 0.003 0.348 -0.659 0.686 1.001 7079 2667
## factorstateNC -0.030 0.252 -0.516 0.462 1.000 6594 3261
## factorstateND 0.007 0.348 -0.683 0.699 1.003 6664 2767
## factorstateNE -0.116 0.344 -0.786 0.571 1.000 6984 2780
## factorstateNH 0.170 0.330 -0.460 0.819 1.002 6460 2963
## factorstateNJ 0.175 0.256 -0.337 0.675 1.004 7470 3503
## factorstateNM -0.001 0.345 -0.687 0.678 1.002 6758 2960
## factorstateNV -0.153 0.328 -0.799 0.469 1.001 5741 2922
## factorstateNY 0.411 0.221 -0.020 0.865 1.001 6836 2673
## factorstateOH -0.057 0.258 -0.552 0.439 1.002 6519 2795
## factorstateOK -0.304 0.323 -0.934 0.338 1.001 6984 3005
## factorstateOR 0.228 0.294 -0.347 0.805 1.000 6563 3162
## factorstatePA -0.055 0.233 -0.513 0.406 1.000 7241 2972
## factorstateRI 0.059 0.350 -0.617 0.759 1.002 8058 3259
## factorstateSC -0.003 0.290 -0.557 0.558 1.001 7015 3106
## factorstateSD -0.062 0.360 -0.784 0.659 1.000 5767 2918
## factorstateTN -0.120 0.292 -0.705 0.463 1.000 6412 2827
## factorstateTX -0.101 0.199 -0.485 0.292 1.002 6979 3003
## factorstateUT -0.157 0.326 -0.788 0.490 1.003 7874 2878
## factorstateVA 0.142 0.266 -0.392 0.680 1.004 6934 2978
## factorstateVT 0.133 0.358 -0.568 0.830 1.000 5824 2935
## factorstateWA 0.119 0.265 -0.403 0.635 1.002 7832 2324
## factorstateWI 0.199 0.274 -0.342 0.744 1.002 6451 3029
## factorstateWV -0.292 0.345 -0.968 0.383 1.000 7445 3032
## factorstateWY -0.005 0.371 -0.734 0.723 1.002 7445 2467
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(fit.bayes))/(1 + exp(fixef(fit.bayes)))# Alaska estimated Biden proportion = 0.468
## Estimate Est.Error Q2.5 Q97.5
## factorstateAK 0.4681858 0.5860871 0.3109682 0.6352632
## factorstateAL 0.4215750 0.5747137 0.2871985 0.5680110
## factorstateAR 0.4726228 0.5806029 0.3208246 0.6275355
## factorstateAZ 0.5504309 0.5722914 0.4075261 0.6877787
## factorstateCA 0.5913125 0.5429591 0.5106286 0.6708075
## factorstateCO 0.5413232 0.5715056 0.4004280 0.6782704
## factorstateCT 0.5736763 0.5784440 0.4190782 0.7140294
## factorstateDC 0.5311975 0.5906072 0.3554713 0.6982241
## factorstateDE 0.5155996 0.5881897 0.3460605 0.6824782
## factorstateFL 0.3985014 0.5494008 0.3088173 0.4955420
## factorstateGA 0.5168315 0.5647461 0.3914980 0.6383472
## factorstateHI 0.5303417 0.5887076 0.3601553 0.6914281
## factorstateID 0.4557211 0.5842925 0.3011608 0.6177167
## factorstateIL 0.5734240 0.5599927 0.4570649 0.6845089
## factorstateIN 0.5507556 0.5717867 0.4059120 0.6803537
## factorstateIO 0.4373951 0.5758553 0.3002518 0.5901834
## factorstateKS 0.4862079 0.5799294 0.3327878 0.6424037
## factorstateKY 0.4552992 0.5746197 0.3216815 0.5993779
## factorstateLA 0.5002403 0.5751929 0.3519336 0.6423518
## factorstateMA 0.5099844 0.5704918 0.3700017 0.6410241
## factorstateMD 0.5734397 0.5706610 0.4318236 0.7015441
## factorstateME 0.4853814 0.5827061 0.3301862 0.6426925
## factorstateMI 0.5851374 0.5626283 0.4654126 0.6995252
## factorstateMN 0.5111656 0.5699129 0.3792896 0.6449530
## factorstateMO 0.5519142 0.5721220 0.4109254 0.6834034
## factorstateMS 0.5536915 0.5779392 0.4071646 0.6993466
## factorstateMT 0.5008307 0.5860597 0.3410266 0.6650377
## factorstateNC 0.4923909 0.5625466 0.3738710 0.6135764
## factorstateND 0.5017260 0.5862111 0.3356250 0.6679328
## factorstateNE 0.4710492 0.5852288 0.3131143 0.6390303
## factorstateNH 0.5424281 0.5816403 0.3869220 0.6940141
## factorstateNJ 0.5436278 0.5637228 0.4165364 0.6626930
## factorstateNM 0.4997995 0.5853537 0.3345897 0.6633523
## factorstateNV 0.4619239 0.5812927 0.3102529 0.6152047
## factorstateNY 0.6014457 0.5550674 0.4950174 0.7037436
## factorstateOH 0.4858735 0.5642551 0.3653363 0.6081286
## factorstateOK 0.4246635 0.5799794 0.2821768 0.5837510
## factorstateOR 0.5568056 0.5729947 0.4140851 0.6911431
## factorstatePA 0.4863316 0.5578693 0.3746026 0.6002464
## factorstateRI 0.5147141 0.5867006 0.3504670 0.6812263
## factorstateSC 0.4991797 0.5719535 0.3642510 0.6359850
## factorstateSD 0.4846082 0.5890497 0.3134561 0.6589824
## factorstateTN 0.4700937 0.5723713 0.3306614 0.6138267
## factorstateTX 0.4747691 0.5497056 0.3811374 0.5725632
## factorstateUT 0.4608449 0.5807910 0.3125964 0.6200614
## factorstateVA 0.5355606 0.5661266 0.4033266 0.6637902
## factorstateVT 0.5332601 0.5885822 0.3617889 0.6963712
## factorstateWA 0.5296369 0.5657452 0.4004778 0.6535576
## factorstateWI 0.5495514 0.5681782 0.4154326 0.6779303
## factorstateWV 0.4275097 0.5853337 0.2753343 0.5946661
## factorstateWY 0.4986431 0.5917653 0.3243641 0.6733495