These examples and code are taken from the upcoming book ‘Bayesian Statistics: A Primer for Data Scientists, with R and Python’ by Agresti, Kateri and Mira.
In this example we find Bayesian posterior mean estimates of the probability \(\pi\) of success for each of three binomial experiments that have the same data, \(y = 10\) successes in \(n = 10\) trials: (1) a British woman claims that in tasting a cup of tea with milk, she can tell whether the milk was poured before or after the tea; (2) a professor of 18th-century musicology claims to be able to tell, for any pair of pages of music, one composed by Mozart and one by Haydn, who composed each; (3) a person in a drunken state claims to be able to predict whether your flip of a balanced coin will result in a head or a tail.
For the tea taster, if we have no a priori reason to expect any particular value for \(\pi\), we could select a uniform prior distribution, which is the Beta(1, 1) distribution. By contrast, for coin flips we may be highly skeptical of any claim about predictions and decide to use a beta prior distribution with mean 0.50 but with small standard deviation, say 0.03, so that the beta distribution is highly concentrated near 0.50. We might have greater faith in the musicologist’s claim, perhaps using a beta prior distribution with a mean of 0.90 and standard deviation 0.10.
alpha1 = 138; alpha2 = 138 # beta hyperparameters for predictor of coin flips
alpha1/(alpha1+alpha2) # mean of beta prior distribution
## [1] 0.5
sqrt((alpha1*alpha2)/((alpha1+alpha2)^2 * (alpha1+alpha2+1)))
## [1] 0.03004209
# standard deviation of beta prior distribution
alpha1 = 7; alpha2 = 7/9 # beta hyperparameters for musicologist
alpha1/(alpha1+alpha2) # mean of beta prior distribution
## [1] 0.9
sqrt((alpha1*alpha2)/((alpha1+alpha2)^2 * (alpha1+alpha2+1)))
## [1] 0.1012579
# standard deviation of beta prior distribution
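More generally, for a target prior mean \(\mu_0\) and standard deviation \(\sigma_0\), the beta mean and variance formulas can be solved for the hyperparameters: \(\alpha_1 + \alpha_2 = \mu_0(1 - \mu_0)/\sigma_0^2 - 1\) and \(\alpha_1 = \mu_0(\alpha_1 + \alpha_2)\). The following helper function is an illustrative sketch, not part of the book's code:
beta.hyperpar <- function(mu0, sigma0) { # solve for beta hyperparameters from prior mean and std. dev.
  total <- mu0*(1 - mu0)/sigma0^2 - 1 # implied value of alpha1 + alpha2
  c(alpha1 = mu0*total, alpha2 = (1 - mu0)*total)
}
beta.hyperpar(0.50, 0.03) # approximately (138.4, 138.4); compare with the values 138 and 138 used above
beta.hyperpar(0.90, 0.10) # approximately (7.2, 0.8); compare with the values 7 and 7/9 used above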
We now find the posterior mean \(\tilde{\pi}\) for the three chosen prior distributions. Combining a Beta(\(\alpha_1, \alpha_2\)) prior distribution with \(y\) successes in \(n\) trials yields a Beta(\(\alpha_1 + y, \alpha_2 + n - y\)) posterior distribution, so \(\tilde{\pi} = (\alpha_1 + y)/(\alpha_1 + \alpha_2 + n)\):
alpha1 <- 1; alpha2 <- 1 # hyperparameter values for uniform prior distribution
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for British tea taster
## [1] 0.9166667
qbeta(0.5, alpha1post, alpha2post) # posterior median for British tea taster
## [1] 0.9389309
alpha1 = 138; alpha2 = 138 # beta hyperparameters with mean 0.50, standard dev. 0.03
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for predictor of coin flips
## [1] 0.5174825
alpha1 = 7; alpha2 = 7/9 # beta hyperparameters with mean 0.90, standard dev. 0.10
alpha1post <- 10 + alpha1; alpha2post <- alpha2 # hyperpara's for posterior when y=n=10
alpha1post/(alpha1post + alpha2post) # beta posterior mean for musicologist
## [1] 0.95625
qbeta(0.5, alpha1post, alpha2post) # posterior median for musicologist
## [1] 0.971962
With these prior distributions, the posterior mean estimates of \(\pi\) are 0.917 for the tea taster and 0.956 for the musicologist but only 0.517 for the predictor of coin flips. The posterior mean for coin flipping is very close to the prior mean of 0.50, the data having relatively little influence when the prior distribution is so sharp.
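Equivalently, the posterior mean is a weighted average of the sample proportion and the prior mean, which makes the dominance of the sharp prior explicit. For the predictor of coin flips,
\[ \tilde{\pi} = \frac{n}{n + \alpha_1 + \alpha_2}\,\hat{\pi} + \frac{\alpha_1 + \alpha_2}{n + \alpha_1 + \alpha_2}\left(\frac{\alpha_1}{\alpha_1 + \alpha_2}\right) = \frac{10}{286}(1.0) + \frac{276}{286}(0.50) = 0.517. \]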
The 2018 General Social Survey in the U.S. asked “Do you believe in hell?” The percentage who responded yes was 74.0% for the 727 subjects with a high school education or less, 70.1% for the 304 subjects with a junior college or bachelor’s degree, and 56.8% for the 111 subjects with a graduate degree. We shall find posterior intervals for the population proportion \(\pi\) of those with a graduate degree who would respond yes.
qbeta(c(0.025, 0.975), 63 + 1, 48 + 1)
## [1] 0.4744648 0.6560523
# quantiles of beta posterior dist. for 95% equal-tail percentile int. with uniform prior
qbeta(c(0.025, 0.975), 63.5, 48.5)
## [1] 0.4746530 0.6570094
# quantiles of beta posterior dist. for 95% equal-tail percentile int. with Jeffreys prior
library(binom)
binom.bayes(63, 111, conf.level=0.95, type="central", prior.shape1=1, prior.shape2=1)
# 95% equal-tail percentile int. for uniform prior dist.
binom.bayes(63, 111, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# 95% HPD interval for uniform prior dist.
binom.bayes(63, 111, conf.level=0.95, prior.shape1=0.5, prior.shape2=0.5)
# 95% HPD interval for Jeffreys prior dist.
binom.confint(63, 111, conf.level=0.95, method="asymptotic") # 95% Wald confidence int.
binom.confint(63, 111, conf.level=0.95, method="wilson")
# 95% score confidence int. (better than Wald CI if n small or pi near 0 or 1)
As the sample size \(n\) increases, the influence of the prior distribution weakens and the posterior distribution more closely resembles the likelihood function. In the limit, Bayes estimates of \(\pi\) become more similar to the ML estimate, and Bayesian posterior intervals become more similar to frequentist confidence intervals. We illustrate this with a sequence of cases in which \(\hat{\pi} = 0.70\), computing for increasing values of \(n\) the frequentist Wald confidence interval and the Bayesian HPD interval obtained with a uniform prior distribution (\(\alpha_1 = \alpha_2 = 1\)):
library(binom)
binom.confint(7, 10, conf.level=0.95, method="asymptotic") # n = 10
# frequentist Wald CI
binom.bayes(7, 10, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval for uniform prior distribution
binom.confint(70, 100, conf.level=0.95, method="asymptotic") # n = 100
# frequentist Wald CI
binom.bayes(70, 100, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval
binom.confint(7000, 10000, conf.level=0.95, method="asymptotic") # n=10000
# frequentist Wald CI
binom.bayes(7000, 10000, conf.level=0.95, prior.shape1=1, prior.shape2=1)
# Bayesian HPD interval
For a particular sample size \(n\) (set to 100 in the R code below), a Bayesian interval becomes more similar to the frequentist confidence interval as the prior distribution becomes more diffuse, that is, as \(\alpha_1 = \alpha_2 = \alpha\) decreases toward 0:
library(binom)
binom.bayes(70, 100, conf.level=0.95, prior.shape1=100, prior.shape2=100) # beta hyperpara's=100
# narrow interval when prior dist. has small variance
binom.bayes(70, 100, conf.level=0.95, prior.shape1=10, prior.shape2=10) # beta hyperpara's=10
binom.bayes(70, 100, conf.level=0.95, prior.shape1=1, prior.shape2=1) # beta hyperpara's=1
binom.bayes(70, 100, conf.level=0.95, prior.shape1=0.1, prior.shape2=0.1) # beta hyperpara's=0.1
binom.bayes(70, 100, conf.level=0.95, prior.shape1=0.01, prior.shape2=0.01) # beta hyperpara's=0.01
binom.confint(70, 100, conf.level=0.95, method="asymptotic") # frequentist Wald CI
With the uniform prior, the posterior distribution is beta with hyperparameters \(\alpha_{1}^* = y + 1 = 64\) and \(\alpha_{2}^* = n - y + 1 = 49\). We next use this posterior distribution to find \(P(\pi < 0.50 \mid y)\) and \(P(\pi > 0.50 \mid y)\):
pbeta(0.50, 64, 49) # cumulative probability at 0.50 for beta posterior distribution
## [1] 0.07803315
1 - pbeta(0.50, 64, 49) # posterior P(pi > 0.50 | y)
## [1] 0.9219668
There is substantial but not overly strong evidence that the majority believe in hell.
A corresponding frequentist statistical inference reports the \(P\)-value:
1 - pbinom(62, 111, 0.50) # one-sided (right-tail) P-value for Binom(111, 0.50) dist.
## [1] 0.09182859
Returning to the example of the British tea taster, we illustrate the concept of Bayesian prediction of future observations. Combining the uniform prior distribution with the observed 10 successful guesses in 10 cups, we obtained a beta posterior distribution for the probability of a successful prediction, with \(\alpha_{1}^{*} = 11\) and \(\alpha_{2}^{*} = 1\), for which the posterior mean is \(\mu^{*} = 11/12 = 0.9167\). For the next \(n_f = 5\) cups, here is the beta-binomial distribution for the number of correct guesses \(Y_f\):
library(extraDistr)
dbbinom(c(0,1,2,3,4,5), 5, alpha = 11, beta = 1) # displays P(Y_f = 0), P(Y_f = 1), ... , P(Y_f = 5)
## [1] 0.0002289377 0.0025183150 0.0151098901 0.0654761905 0.2291666667
## [6] 0.6875000000
The expected number of successful guesses is \(n_f\mu^{*} = 5(11/12) = 4.58\), and the probability is 0.6875 of getting all five correct.
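As a check on the final probability, the beta-binomial pmf is \(P(Y_f = k) = \binom{n_f}{k} B(\alpha_1^* + k, \alpha_2^* + n_f - k)/B(\alpha_1^*, \alpha_2^*)\), so
\[ P(Y_f = 5) = \frac{B(16, 1)}{B(11, 1)} = \frac{1/16}{1/11} = \frac{11}{16} = 0.6875. \]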
We now compare the proportions of females and of males who believe in hell, again using data from the 2018 GSS. The sample proportions are \(\hat{\pi}_1 = 498/674 = 0.739\) for females and \(\hat{\pi}_2 = 316/468 = 0.675\) for males. With independent binomial samples and independent beta prior distributions for \(\pi_1\) and \(\pi_2\), we can form a Bayesian posterior interval for the plausible values of \(\pi_1 - \pi_2\). The following R code shows a posterior 95% percentile interval based on simulating five million random draws from each posterior beta density. We also show the HPD interval computed from the simulated draws and the frequentist 95% Wald and score confidence intervals:
pi1 <- rbeta(5000000, 499, 177) # simulating from posterior dist. of pi_1
pi2 <- rbeta(5000000, 317, 153) # simulating from posterior dist. of pi_2
quantile(pi1 - pi2, c(0.025, 0.975)) # 0.025 and 0.975 quantiles of simulated difference
## 2.5% 97.5%
## 0.01018281 0.11759764
# simulated 95% percentile interval for pi1 - pi2
hist(pi1 - pi2) # posterior distribution is bell-shaped
plot(density(pi1 - pi2), ylab="Density", xlab="Difference of Probabilities", main="")
#install.packages('HDInterval')
library(HDInterval)
hdi(pi1 - pi2, credMass=0.95) # simulated 95% HPD interval for pi1 - pi2
## lower upper
## 0.01015136 0.11756478
## attr(,"credMass")
## [1] 0.95
prop.test(c(498, 316), c(674, 468), correct=FALSE) # data c(y1, y2), c(n1, n2)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(498, 316) out of c(674, 468)
## X-squared = 5.4675, df = 1, p-value = 0.01937
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.009809589 0.117507867
## sample estimates:
## prop 1 prop 2
## 0.7388724 0.6752137
# frequentist 95% Wald CI for pi1 - pi2
#install.packages('PropCIs')
library(PropCIs)
diffscoreci(498, 674, 316, 468, conf.level=0.95) # arguments y1, n1, y2, n2
## 95 percent confidence interval:
## 0.01019264 0.11786252
# frequentist 95% score CI for pi1 - pi2
quantile(pi1/pi2, c(0.025, 0.975)) # simulated 95% percentile interval for risk ratio
## 2.5% 97.5%
## 1.014403 1.184039
Using a simulated posterior distribution for \(\pi_1 - \pi_2\), we can precisely approximate posterior probabilities such as \(P(\pi_1 > \pi_2 \mid y_1, y_2)\) by the proportion of simulated draws from the joint posterior distribution for which \(\pi_1 > \pi_2\). For analyzing belief in hell by gender, we used uniform prior distributions. As the following R code shows, the posterior probability that females are more likely than males to believe in hell is 0.990. It seems highly likely that \(\pi_1 > \pi_2\).
pi1 <- rbeta(5000000, 499, 177)
pi2 <- rbeta(5000000, 317, 153)
mean(pi1 < pi2); mean(pi1 > pi2) # simulated posterior P(pi1 < pi2), P(pi1 > pi2)
## [1] 0.0098148
## [1] 0.9901852
prop.test(c(498, 316), c(674, 468), correct=FALSE) # data c(y1, y2), c(n1, n2)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(498, 316) out of c(674, 468)
## X-squared = 5.4675, df = 1, p-value = 0.01937
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.009809589 0.117507867
## sample estimates:
## prop 1 prop 2
## 0.7388724 0.6752137
# X-squared = chi-squared statistic = square of z test stat; doesn't show direction
The R output also shows the result of the frequentist significance test of \(H_0: \pi_1 = \pi_2\). It is a chi-squared statistic that is the square of the \(z\) test statistic.
In practice, the number of groups to be compared often exceeds two. We can construct the inferences so that the error probability applies to the entire family of inferences rather than to each individual one. The method for constructing simultaneous posterior intervals is called the Bonferroni method. To illustrate, for the 2018 General Social Survey responses to “Do you believe in hell?” by education level, the (yes, no) totals were (538, 189) for those with at most a high-school education, (213, 91) for those with a junior college or bachelor’s degree, and (63, 48) for those with a graduate degree. When we use posterior probability 0.98 for each of the 3 comparisons of pairs of education levels, the entire set of 3 posterior intervals holds with probability at least 1-3(0.02) = 0.94. We next simulate percentile intervals to do this, using the posterior distributions generated with uniform prior distributions:
pi1 <- rbeta(5000000, 538+1, 727-538+1) # posterior dist. with uniform prior, high school or less
pi2 <- rbeta(5000000, 213+1, 304-213+1) # posterior dist. with uniform prior, junior college or bachelor's
pi3 <- rbeta(5000000, 63+1, 111-63+1) # posterior dist. with uniform prior, graduate degree
quantile(pi1 - pi2, c(0.01, 0.99)) # posterior probability = 0.99 - 0.01 = 0.98
quantile(pi1 - pi3, c(0.01, 0.99))
quantile(pi2 - pi3, c(0.01, 0.99))
At this probability level, we conclude that belief in hell is less common for those with a graduate degree than for the other two education levels, because the posterior intervals for \(\pi_1 - \pi_3\) and for \(\pi_2 - \pi_3\) contain only positive values. For example, we infer that the population proportion believing in hell is between 0.06 and 0.29 higher for those with at most a high school degree than for those with a graduate degree. The intervals are wide, and belief in hell could be substantially less common for those with a graduate degree. However, it is plausible that the proportions believing in hell are equal for the first two education levels, because the posterior interval for \(\pi_1 - \pi_2\) contains 0.
The 2021 General Social Survey asked respondents whether they considered themselves to be very happy, pretty happy, or not too happy. The sample counts of the 4003 respondents in the (very happy, pretty happy, not too happy) categories were (778, 2304, 921), for percentages (19.4, 57.6, 23.0). We shall find posterior intervals for the proportions of adults in the U.S. who characterize themselves in each category. Here are 99% percentile intervals for \(\pi_1\), \(\pi_2\), and \(\pi_3\):
qbeta(c(0.005, 0.995), 779, 3227) # quantiles, beta hyperparameters for very happy
## [1] 0.1786434 0.2108461
# 99% posterior percentile interval
qbeta(c(0.005, 0.995), 2305, 1701) # quantiles, beta hyperparameters for pretty happy
## [1] 0.5552075 0.5954250
# 99% posterior percentile interval
qbeta(c(0.005, 0.995), 922, 3084) # quantiles, beta hyperparameters for not too happy
## [1] 0.2132832 0.2475323
# 99% posterior percentile interval
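The beta hyperparameters in these qbeta commands correspond to a uniform Dirichlet(1, 1, 1) prior distribution for \((\pi_1, \pi_2, \pi_3)\): the posterior distribution is then Dirichlet(779, 2305, 922), and the marginal posterior distribution of each \(\pi_j\) is beta, for instance \(\pi_1 \mid \boldsymbol{y} \sim \text{Beta}(y_1 + 1, n - y_1 + 2) = \text{Beta}(779, 3227)\).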
For example, the probability is 0.99 that the population proportion who would report being very happy is between 0.179 and 0.211. By the Bonferroni method, all three posterior intervals contain the true population proportion values with probability at least 1-3(0.01) = 0.97.
Bayesian methods also apply for evaluating \(H_0\): independence in a two-way contingency table. For example, the Bayes factor compares the two models corresponding to the hypotheses of independence and of dependence. The Bayes factor in favor of the statistical dependence model over the independence model has a complex expression involving several gamma functions; its formula does not provide intuition and we omit it, but R provides it, as shown in the following code. Here the data form a 3 \(\times\) 3 table that cross-classifies the 4003 respondents of the previous example by income (three levels, forming the rows) and happiness (the columns):
data <- matrix(c(282,324,172,670,966,668,198,301,422), nrow=3, ncol=3) # 3 rows and 3 columns
data
## [,1] [,2] [,3]
## [1,] 282 670 198
## [2,] 324 966 301
## [3,] 172 668 422
chisq.test(data) # Pearson chi-squared test
##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 133.46, df = 4, p-value < 2.2e-16
library(BayesFactor)
BF <- contingencyTableBF(data, priorConcentration = 1, sampleType = "jointMulti")
# priorConcentration = 1 (the default) specifies uniform Dirichlet prior distributions
# use sampleType = "indepMulti" for independent multinomial sampling in the rows
BF
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 3.305774e+23 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, joint multinomial
We estimate that the data would be \(3.3 \cdot 10^{23}\) times more likely under the dependence model than under the independence one. This is extremely strong evidence against the hypothesis that happiness and income are statistically independent.
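As a supplementary calculation (not part of the book's code), if the two models have equal prior probabilities, this Bayes factor translates into a posterior probability for the independence model:
BF.value <- 3.305774e23 # Bayes factor reported above
1/(1 + BF.value) # posterior P(independence | data); approximately 3.0e-24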
This example simulates a sample of 1000 people to mimic a poll taken before the 2020 U.S. presidential election, in which the Democratic candidate Joseph Biden faced the Republican candidate Donald Trump. Let \(\pi_i\) be the population proportion in the actual election that voted for Biden in state \(i\) (\(i = 1, \dots , 51\), where \(i = 8\) is District of Columbia), out of those that voted for Biden or Trump. We take the number of observations \(n_i\) in the simulated sample from state \(i\) proportional to the number of people in state \(i\) who voted in that election, subject to \(\sum_i n_i = 1000\). The simulated number \(y_i\) voting for Biden is the outcome of a Binom(\(n_i,\pi_i\)) random variable.
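A minimal sketch of how such a sample could be generated, assuming hypothetical vectors votes (the state vote totals) and pi.actual (the actual Biden proportions \(\pi_i\)):
n <- round(1000*votes/sum(votes)) # sample sizes roughly proportional to state turnout
# (a small adjustment may be needed so that the n_i sum to exactly 1000)
y <- rbinom(51, size=n, prob=pi.actual) # one Binom(n_i, pi_i) draw per state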
Many states have small \(n_i\), and we can obtain better estimates by borrowing information from the whole data set. The essence of the empirical Bayes approach with a beta prior distribution is that we regard the data as being generated in the following way: 51 randomly generated values from a Beta(\(\alpha_1,\alpha_2\)) distribution determine the population proportions \(\{\pi_i\}\) voting for Biden. Then, a randomly generated value from the Binom(\(n_i,\pi_i\)) distribution yields the number \(y_i\) in the sample of size \(n_i\) from state \(i\) voting for Biden, \(i = 1, \dots , 51\). The sample proportions \(\{\hat{\pi}_i = y_i/n_i\}\) exhibit greater variability than if the \(\{y_i\}\) came from outcomes of {Binom(\(n_i,\pi\))} binomial random variables that all had the same success probability \(\pi\). We do not know \(\alpha_1\) and \(\alpha_2\) for the beta distribution that generated the \(\{\pi_i\}\), but once we observe the counts \(\{y_i\}\) and sample sizes \(\{n_i\}\), software can use iterative methods to find marginal ML estimates \(\hat{\alpha}_1\) and \(\hat{\alpha}_2\) by maximizing the beta-binomial likelihood function for the data. Then we perform ordinary Bayesian inference using the Beta(\(\hat{\alpha}_1,\hat{\alpha}_2\)) prior distribution. The posterior mean estimate of a population proportion \(\pi_i\) is a weighted average of the sample proportion \(\hat{\pi}_i\) and the mean \(\hat{\alpha}_1/(\hat{\alpha}_1+\hat{\alpha}_2)\) of the estimated beta prior distribution.
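In formula form, the empirical Bayes estimate computed in the code below is
\[ \tilde{\pi}_i = \frac{y_i + \hat{\alpha}_1}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2} = \left(\frac{n_i}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2}\right)\hat{\pi}_i + \left(\frac{\hat{\alpha}_1 + \hat{\alpha}_2}{n_i + \hat{\alpha}_1 + \hat{\alpha}_2}\right)\frac{\hat{\alpha}_1}{\hat{\alpha}_1 + \hat{\alpha}_2}, \]
so estimates for states with small \(n_i\) are shrunk more strongly toward the estimated prior mean.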
For the election polling data, the following R code shows that maximizing the marginal beta-binomial likelihood yields estimated hyperparameters \(\hat{\alpha}_1 = 32.49\) and \(\hat{\alpha}_2 = 29.39\) for the beta distribution. The code also shows results for the first five states in the list.
Election <- read.table("Election2020.dat", header=TRUE)
head(Election, 5)
library(VGAM) # package that can find ML fit of beta-binomial dist.
fit.bb <- vglm(cbind(y, n-y) ~ 1, betabinomialff, data=Election) # Fits beta binomial
Coef(fit.bb) # to find marginal ML estimates of beta hyperpara's
## shape1 shape2
## 32.49346 29.38929
# for empirical Bayes approach
# estimated beta hyperparameters
Popul.prop <- Election$pi; State <- Election$state
Sample.prop <- Election$y/Election$n
Emp.Bayes.est <- (Election$y+Coef(fit.bb)[1])/(Election$n+Coef(fit.bb)[1]+Coef(fit.bb)[2])
Results <- data.frame(State, Popul.prop, Sample.prop, Emp.Bayes.est)
head(Results, 5)
sqrt((sum(Election$n*(Sample.prop - Popul.prop)^2))/1000)
## [1] 0.09737371
# root mean square of sample proportions around population proportions
sqrt((sum(Election$n*(Emp.Bayes.est - Popul.prop)^2))/1000)
## [1] 0.0737974
# root mean square of empirical Bayes estimates around population prop's
The normal distribution has a bell-shaped pdf that is symmetric. We denote the normal distribution with mean \(\mu\) and variance \(\sigma^2\) by \(N(\mu,\sigma^2)\), where \(\mu\) can be any real number and \(\sigma > 0\). When we use the normal distribution as a prior distribution for \(\log[\pi/(1 - \pi)]\), we refer to the distribution for \(\pi\) itself as the logit-normal distribution. When we take \(\mu = 0\) for the normal distribution, the corresponding logit-normal distribution on the \(\pi\) scale has pdf with a symmetric shape around 0.50. It is unimodal when \(\sigma^2 \leq 2\) and bimodal when \(\sigma^2 > 2\), but always tapering off toward 0 as \(\pi\) approaches 0 or 1. Specifically, it is mound-shaped for \(\sigma = 1\), roughly uniform except near the boundaries when \(\sigma \approx 1.5\), and with more pronounced peaks for the modes when \(\sigma\) is about 2 or larger.
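These shapes can be displayed with a short sketch (not part of the book's code) that evaluates the logit-normal pdf by the change-of-variables formula \(f(\pi) = \phi(\log[\pi/(1-\pi)]; \mu, \sigma)/[\pi(1-\pi)]\), where \(\phi\) denotes the normal pdf:
dlogitnorm <- function(p, mu=0, sigma=1) dnorm(log(p/(1-p)), mu, sigma)/(p*(1-p)) # logit-normal pdf
p <- seq(0.001, 0.999, 0.001)
plot(p, dlogitnorm(p, 0, 1), type="l", ylab="Density", xlab="pi") # mound-shaped for sigma = 1
lines(p, dlogitnorm(p, 0, 1.5), lty=2) # roughly uniform over most of (0, 1) for sigma = 1.5
lines(p, dlogitnorm(p, 0, 2), lty=3) # bimodal, with modes near 0 and 1, for sigma = 2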
We will not consider here the methodology for finding the posterior distribution for the logit parameter, \(L = \log[\pi/(1 - \pi)]\). However, it is simple to construct it using software. Once we do this, a posterior interval for \(L\) yields a posterior interval for \(\pi\) using the inverse transformation \[ \pi = e^L/(1 + e^L). \]
To illustrate, the same GSS that asked about belief in hell also asked respondents if they believed in an afterlife. Of 1129 respondents, 904 said yes and 225 said no. We next find posterior intervals for the population proportion \(\pi\) saying yes using Bayesian approaches with (i) a uniform prior distribution for \(\pi\), and (ii) a \(N(0,\sigma^2)\) prior distribution for \(\log[\pi/(1 - \pi)]\) with \(\sigma = 2\). Here, we first use the binom package in R with the uniform prior, and then the brms (Bayesian Regression Models using ‘Stan’) package. The brms package uses ‘bernoulli’ to specify the binomial with a single trial and uses ‘normal(mu, sigma)’ to specify a normal distribution, with the standard deviation rather than the variance as the second argument.
library(binom)
binom.bayes(904, 1129, conf.level=0.95, type="central", prior.shape1=1, prior.shape2=1)
# 95% posterior interval for pi for uniform prior
y <- c(rep(1, 904), rep(0, 225)) # the sample of 904 `yes' and 225 `no' outcomes
Results <- data.frame(y)
library(brms)
fit.bayes <- brm(y ~ 1, family=bernoulli(link=identity), refresh = 0,
data=Results, prior = prior(uniform(0,1), lb=0, ub=1, class=Intercept), silent = 2)
print(fit.bayes, digits=3)
## Family: bernoulli
## Links: mu = identity
## Formula: y ~ 1
## Data: Results (Number of observations: 1129)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.800 0.012 0.775 0.823 1.001 1418 1804
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# 95% post. int. for pi for uniform prior
fit.bayes2 <- brm(y ~ 1, family=bernoulli(link=logit), refresh = 0,
data=Results, prior = prior(normal(0,2), class=Intercept), silent = 2)
print(fit.bayes2, digits=3)
## Family: bernoulli
## Links: mu = logit
## Formula: y ~ 1
## Data: Results (Number of observations: 1129)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.389 0.076 1.243 1.540 1.004 1304 1991
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# 95% posterior interval for logit using normal prior
exp(fixef(fit.bayes2))/(1 + exp(fixef(fit.bayes2)))
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.8003536 0.5189663 0.7761063 0.8234435
# 95% post. interval for proportion using normal prior for logit (the transformed Est.Error column is not meaningful)
Rather than use noninformative normal priors with large \(\sigma\), we could select a relatively small \(\sigma\) based on historical information. Past elections suggest that all (or nearly all) states tend to have proportion \(\pi_i\) favoring the Democratic candidate between about 0.25 and 0.75, for which logit(\(\pi_i\)) falls between -1.1 and +1.1. Since nearly the entire normal distribution falls within 3 standard deviations of its mean, we could take the normal prior to have \(\mu \pm 3\sigma\) equal to \(\pm1.1\), or \(\mu = 0\) and \(\sigma \approx 1.1/3 = 0.37\). To obtain posterior intervals, we use the brms package in R to model the logit transform of \(\pi\) in terms of the state identification.
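A quick numerical check of this calibration (a supplementary calculation, not in the book's code):
qlogis(0.75) # logit(0.75) = log(3), approximately 1.10
qlogis(0.75)/3 # approximately 0.37, the suggested prior standard deviation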
Election <- read.table("Election2020ungrouped.dat", header=TRUE)
head(Election)
library(brms)
fit.bayes <- brm(y ~ -1 + factor(state), family=bernoulli(link=logit), data=Election,
refresh = 0, prior = prior(normal(0,0.37), class=b))
print(fit.bayes, digits=3) # Alaska posterior mean logit = -0.127
## Family: bernoulli
## Links: mu = logit
## Formula: y ~ -1 + factor(state)
## Data: Election (Number of observations: 1000)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## factorstateAK -0.127 0.348 -0.796 0.555 1.002 7405 3398
## factorstateAL -0.316 0.301 -0.909 0.274 1.001 6923 2955
## factorstateAR -0.110 0.325 -0.750 0.522 1.000 6896 2832
## factorstateAZ 0.202 0.291 -0.374 0.790 1.000 7336 2768
## factorstateCA 0.369 0.172 0.043 0.712 1.001 6488 3136
## factorstateCO 0.166 0.288 -0.404 0.746 1.000 6707 2899
## factorstateCT 0.297 0.316 -0.327 0.915 1.001 8643 3148
## factorstateDC 0.125 0.366 -0.595 0.839 1.000 6335 2715
## factorstateDE 0.062 0.356 -0.636 0.765 1.005 7374 2995
## factorstateFL -0.412 0.198 -0.806 -0.018 1.001 6162 2989
## factorstateGA 0.067 0.260 -0.441 0.568 1.001 8158 3090
## factorstateHI 0.122 0.359 -0.575 0.807 1.003 8063 2789
## factorstateID -0.178 0.340 -0.842 0.480 1.000 8433 3094
## factorstateIL 0.296 0.241 -0.172 0.775 1.003 6624 2997
## factorstateIN 0.204 0.289 -0.381 0.755 1.003 6538 3125
## factorstateIO -0.252 0.306 -0.846 0.365 1.002 7359 2893
## factorstateKS -0.055 0.322 -0.696 0.586 1.001 6822 2704
## factorstateKY -0.179 0.301 -0.746 0.403 1.001 6662 2754
## factorstateLA 0.001 0.303 -0.611 0.586 1.001 7088 2988
## factorstateMA 0.040 0.284 -0.532 0.580 1.001 7501 2444
## factorstateMD 0.296 0.285 -0.274 0.855 1.001 7275 3022
## factorstateME -0.058 0.334 -0.707 0.587 1.001 6768 2744
## factorstateMI 0.344 0.252 -0.139 0.845 1.004 6042 2593
## factorstateMN 0.045 0.281 -0.493 0.597 1.001 7028 2759
## factorstateMO 0.208 0.291 -0.360 0.769 1.001 6572 2820
## factorstateMS 0.216 0.314 -0.376 0.844 1.001 6436 2908
## factorstateMT 0.003 0.348 -0.659 0.686 1.001 7079 2667
## factorstateNC -0.030 0.252 -0.516 0.462 1.000 6594 3261
## factorstateND 0.007 0.348 -0.683 0.699 1.003 6664 2767
## factorstateNE -0.116 0.344 -0.786 0.571 1.000 6984 2780
## factorstateNH 0.170 0.330 -0.460 0.819 1.002 6460 2963
## factorstateNJ 0.175 0.256 -0.337 0.675 1.004 7470 3503
## factorstateNM -0.001 0.345 -0.687 0.678 1.002 6758 2960
## factorstateNV -0.153 0.328 -0.799 0.469 1.001 5741 2922
## factorstateNY 0.411 0.221 -0.020 0.865 1.001 6836 2673
## factorstateOH -0.057 0.258 -0.552 0.439 1.002 6519 2795
## factorstateOK -0.304 0.323 -0.934 0.338 1.001 6984 3005
## factorstateOR 0.228 0.294 -0.347 0.805 1.000 6563 3162
## factorstatePA -0.055 0.233 -0.513 0.406 1.000 7241 2972
## factorstateRI 0.059 0.350 -0.617 0.759 1.002 8058 3259
## factorstateSC -0.003 0.290 -0.557 0.558 1.001 7015 3106
## factorstateSD -0.062 0.360 -0.784 0.659 1.000 5767 2918
## factorstateTN -0.120 0.292 -0.705 0.463 1.000 6412 2827
## factorstateTX -0.101 0.199 -0.485 0.292 1.002 6979 3003
## factorstateUT -0.157 0.326 -0.788 0.490 1.003 7874 2878
## factorstateVA 0.142 0.266 -0.392 0.680 1.004 6934 2978
## factorstateVT 0.133 0.358 -0.568 0.830 1.000 5824 2935
## factorstateWA 0.119 0.265 -0.403 0.635 1.002 7832 2324
## factorstateWI 0.199 0.274 -0.342 0.744 1.002 6451 3029
## factorstateWV -0.292 0.345 -0.968 0.383 1.000 7445 3032
## factorstateWY -0.005 0.371 -0.734 0.723 1.002 7445 2467
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(fit.bayes))/(1 + exp(fixef(fit.bayes)))# Alaska estimated Biden proportion = 0.468
## Estimate Est.Error Q2.5 Q97.5
## factorstateAK 0.4681858 0.5860871 0.3109682 0.6352632
## factorstateAL 0.4215750 0.5747137 0.2871985 0.5680110
## factorstateAR 0.4726228 0.5806029 0.3208246 0.6275355
## factorstateAZ 0.5504309 0.5722914 0.4075261 0.6877787
## factorstateCA 0.5913125 0.5429591 0.5106286 0.6708075
## factorstateCO 0.5413232 0.5715056 0.4004280 0.6782704
## factorstateCT 0.5736763 0.5784440 0.4190782 0.7140294
## factorstateDC 0.5311975 0.5906072 0.3554713 0.6982241
## factorstateDE 0.5155996 0.5881897 0.3460605 0.6824782
## factorstateFL 0.3985014 0.5494008 0.3088173 0.4955420
## factorstateGA 0.5168315 0.5647461 0.3914980 0.6383472
## factorstateHI 0.5303417 0.5887076 0.3601553 0.6914281
## factorstateID 0.4557211 0.5842925 0.3011608 0.6177167
## factorstateIL 0.5734240 0.5599927 0.4570649 0.6845089
## factorstateIN 0.5507556 0.5717867 0.4059120 0.6803537
## factorstateIO 0.4373951 0.5758553 0.3002518 0.5901834
## factorstateKS 0.4862079 0.5799294 0.3327878 0.6424037
## factorstateKY 0.4552992 0.5746197 0.3216815 0.5993779
## factorstateLA 0.5002403 0.5751929 0.3519336 0.6423518
## factorstateMA 0.5099844 0.5704918 0.3700017 0.6410241
## factorstateMD 0.5734397 0.5706610 0.4318236 0.7015441
## factorstateME 0.4853814 0.5827061 0.3301862 0.6426925
## factorstateMI 0.5851374 0.5626283 0.4654126 0.6995252
## factorstateMN 0.5111656 0.5699129 0.3792896 0.6449530
## factorstateMO 0.5519142 0.5721220 0.4109254 0.6834034
## factorstateMS 0.5536915 0.5779392 0.4071646 0.6993466
## factorstateMT 0.5008307 0.5860597 0.3410266 0.6650377
## factorstateNC 0.4923909 0.5625466 0.3738710 0.6135764
## factorstateND 0.5017260 0.5862111 0.3356250 0.6679328
## factorstateNE 0.4710492 0.5852288 0.3131143 0.6390303
## factorstateNH 0.5424281 0.5816403 0.3869220 0.6940141
## factorstateNJ 0.5436278 0.5637228 0.4165364 0.6626930
## factorstateNM 0.4997995 0.5853537 0.3345897 0.6633523
## factorstateNV 0.4619239 0.5812927 0.3102529 0.6152047
## factorstateNY 0.6014457 0.5550674 0.4950174 0.7037436
## factorstateOH 0.4858735 0.5642551 0.3653363 0.6081286
## factorstateOK 0.4246635 0.5799794 0.2821768 0.5837510
## factorstateOR 0.5568056 0.5729947 0.4140851 0.6911431
## factorstatePA 0.4863316 0.5578693 0.3746026 0.6002464
## factorstateRI 0.5147141 0.5867006 0.3504670 0.6812263
## factorstateSC 0.4991797 0.5719535 0.3642510 0.6359850
## factorstateSD 0.4846082 0.5890497 0.3134561 0.6589824
## factorstateTN 0.4700937 0.5723713 0.3306614 0.6138267
## factorstateTX 0.4747691 0.5497056 0.3811374 0.5725632
## factorstateUT 0.4608449 0.5807910 0.3125964 0.6200614
## factorstateVA 0.5355606 0.5661266 0.4033266 0.6637902
## factorstateVT 0.5332601 0.5885822 0.3617889 0.6963712
## factorstateWA 0.5296369 0.5657452 0.4004778 0.6535576
## factorstateWI 0.5495514 0.5681782 0.4154326 0.6779303
## factorstateWV 0.4275097 0.5853337 0.2753343 0.5946661
## factorstateWY 0.4986431 0.5917653 0.3243641 0.6733495