Homework #1
Stat4DS2+DS
1) Sample survey: Suppose we are going to sample 100 individuals from a county (of size much larger
than 100) and ask each sampled person whether they support policy Z or not. Let $Y_i = 1$ if person i in
the sample supports the policy, and $Y_i = 0$ otherwise.
a) Assume $Y_1, \dots, Y_{100}$ are, conditional on θ, i.i.d. binary random variables with expectation θ. Write down
the joint distribution $\Pr(Y_1 = y_1, \dots, Y_{100} = y_{100} \mid \theta)$ in a compact form. Also write down the form of
$\Pr\left(\sum_{i=1}^{n} Y_i = y \mid \theta\right)$.
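As a quick check of the two compact forms, here is a minimal Python sketch (the assignment asks for R; the logic carries over, and the names `joint_pmf` and `sum_pmf` are hypothetical). The joint pmf depends on the data only through $s = \sum_i y_i$, namely $\theta^{s}(1-\theta)^{n-s}$, while the sum is Binomial(n, θ), which multiplies that joint pmf by the $\binom{n}{y}$ sequences sharing the same total.

```python
from math import comb


def joint_pmf(ys, theta):
    """Pr(Y_1 = y_1, ..., Y_n = y_n | theta) = theta^s (1 - theta)^(n - s), s = sum(ys)."""
    s, n = sum(ys), len(ys)
    return theta ** s * (1 - theta) ** (n - s)


def sum_pmf(y, n, theta):
    """Pr(Y_1 + ... + Y_n = y | theta): the Binomial(n, theta) pmf."""
    return comb(n, y) * theta ** y * (1 - theta) ** (n - y)


# Every 0/1 sequence with 57 ones among 100 has the same joint probability
ys = [1] * 57 + [0] * 43
print(sum_pmf(57, 100, 0.5), comb(100, 57) * joint_pmf(ys, 0.5))
```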
#
b) For the moment, suppose you believed that θ ∈ {0.0, 0.1, …, 0.9, 1.0}. Given that the result of the
survey was $\sum_{i=1}^{n} Y_i = 57$, compute $\Pr\left(\sum_{i=1}^{n} Y_i = 57 \mid \theta\right)$
for each of these 11 values of θ and plot these probabilities as a function of θ.
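A minimal Python sketch of the grid computation (in R the same vector is `dbinom(57, 100, theta)`, and a spike plot such as `plot(theta, lik, type = "h")` displays it). The likelihood is zero at θ = 0 and θ = 1 and peaks at the grid value closest to 0.57.

```python
from math import comb

n, y = 100, 57
thetas = [i / 10 for i in range(11)]  # the grid 0.0, 0.1, ..., 1.0

# Pr(sum Y_i = 57 | theta): the Binomial(100, theta) pmf evaluated at 57
lik = [comb(n, y) * t ** y * (1 - t) ** (n - y) for t in thetas]

for t, p in zip(thetas, lik):
    print(f"theta = {t:.1f}   Pr = {p:.3e}")
```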
#
c) Now suppose you originally had no prior information to believe one of these θ-values over another,
and so $\Pr(\theta = 0.0) = \Pr(\theta = 0.1) = \dots = \Pr(\theta = 0.9) = \Pr(\theta = 1.0)$. Use Bayes' rule to compute
$\pi\left(\theta \mid \sum_{i=1}^{n} Y_i = 57\right)$ for each θ-value. Make a plot of this posterior distribution as a function of θ.
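With a flat prior over the 11 grid values, the prior weight cancels in Bayes' rule, so the posterior is just the likelihood vector rescaled to sum to 1. A Python sketch of that computation:

```python
from math import comb

n, y = 100, 57
thetas = [i / 10 for i in range(11)]
prior = [1 / 11] * 11  # uniform prior over the 11 grid values

lik = [comb(n, y) * t ** y * (1 - t) ** (n - y) for t in thetas]

# Bayes' rule on a finite grid: posterior ∝ prior × likelihood, then normalise
unnorm = [p * l for p, l in zip(prior, lik)]
total = sum(unnorm)
post = [u / total for u in unnorm]

for t, p in zip(thetas, post):
    print(f"theta = {t:.1f}   posterior = {p:.4f}")
```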
#
d) Now suppose you allow θ to be any value in the interval Θ = [0, 1]. Using the uniform prior density for
θ ∈ [0, 1], so that $\pi(\theta) = I_{[0,1]}(\theta)$, plot $\pi(\theta) \times \Pr\left(\sum_{i=1}^{n} Y_i = 57 \mid \theta\right)$ as a function of θ.
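One way to draw this curve is to evaluate $\pi(\theta)\Pr(\sum_i Y_i = 57 \mid \theta)$ on a fine grid (the step 0.001 is an arbitrary choice). The Python sketch below also checks numerically that the area under the curve is $1/(n+1)$, the known value of $\int_0^1 \binom{n}{y}\theta^y(1-\theta)^{n-y}\,d\theta$, so the curve is an unnormalized posterior density.

```python
from math import comb

n, y = 100, 57


def unnorm_post(theta):
    """pi(theta) * Pr(sum Y_i = 57 | theta) with pi(theta) = 1 on [0, 1], 0 elsewhere."""
    if not 0.0 <= theta <= 1.0:
        return 0.0
    return comb(n, y) * theta ** y * (1 - theta) ** (n - y)


step = 1 / 1000
grid = [i / 1000 for i in range(1001)]
values = [unnorm_post(t) for t in grid]

area = sum(values) * step  # Riemann sum; the curve is 0 at both endpoints
print(grid[values.index(max(values))], area, 1 / (n + 1))
```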
#
e) As discussed in chapter 3 of Peter Hoff’s book, the posterior distribution of θ is Beta(1 + 57, 1 + 100 − 57).
Plot the posterior density as a function of θ. Discuss the relationships among all of the plots you have
made for this exercise.
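The Beta(58, 44) density can be coded on the log scale with `math.lgamma`. Since $(n+1)\binom{n}{y} = \Gamma(n+2)/\{\Gamma(y+1)\Gamma(n-y+1)\}$, it coincides exactly with the curve of part d) multiplied by $n + 1$, which is one concrete way to relate the plots. A Python sketch (helper names are hypothetical):

```python
from math import comb, exp, lgamma, log

n, y = 100, 57
a, b = 1 + y, 1 + n - y  # Beta(58, 44) posterior under the uniform prior


def beta_pdf(theta, a, b):
    """Beta(a, b) density, computed on the log scale for numerical stability."""
    if not 0.0 < theta < 1.0:
        return 0.0
    log_const = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_const + (a - 1) * log(theta) + (b - 1) * log(1 - theta))


def rescaled_part_d(theta):
    """(n + 1) * pi(theta) * Pr(sum Y_i = y | theta): part d) divided by its area 1/(n + 1)."""
    return (n + 1) * comb(n, y) * theta ** y * (1 - theta) ** (n - y)


for t in (0.4, 0.5, 0.57, 0.65):
    print(t, beta_pdf(t, a, b), rescaled_part_d(t))
print("posterior mean:", a / (a + b), "mode:", (a - 1) / (a + b - 2))
```

The mode 57/100 matches the peak of the part d) curve, and the grid posterior of part c) is a coarse discretization of the same shape.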
#
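2) Consider a normal statistical model $X_i \mid \theta \overset{\text{i.i.d.}}{\sim} N(\theta, \lambda = 1/\sigma^2)$, where the second argument is the
precision λ and the precision parameter is known.
As the prior distribution on the (conditional) mean θ, use a Normal distribution with prior mean µ and prior precision ν.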
a) derive the general formula of the prior predictive distribution for a single observation X
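Writing $X = \theta + \varepsilon$ with $\theta$ Normal (mean µ, variance 1/ν) independent of $\varepsilon$ Normal (mean 0, variance 1/λ) shows that the prior predictive is Normal with mean µ and variance $1/\lambda + 1/\nu$, i.e. precision $\lambda\nu/(\lambda + \nu)$: variances add, precisions do not. A Monte Carlo sketch in Python that checks this by composition sampling; the values of µ, ν, λ are hypothetical.

```python
import random
import statistics


def prior_predictive_draws(mu, nu, lam, size, rng):
    """Composition sampling: theta ~ N(mu, var 1/nu), then X | theta ~ N(theta, var 1/lam)."""
    draws = []
    for _ in range(size):
        theta = rng.gauss(mu, (1 / nu) ** 0.5)  # gauss() takes a standard deviation
        draws.append(rng.gauss(theta, (1 / lam) ** 0.5))
    return draws


mu, nu, lam = 0.0, 0.5, 2.0  # hypothetical values, for illustration only
xs = prior_predictive_draws(mu, nu, lam, 100_000, random.Random(1))
print("MC mean/variance:", statistics.fmean(xs), statistics.variance(xs))
print("theory:          ", mu, 1 / lam + 1 / nu)
```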
#
b) derive the general formula of the posterior predictive distribution for a single observation X
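By conjugacy, after observing $x_1, \dots, x_n$ the posterior of θ is Normal with precision $\nu_n = \nu + n\lambda$ and mean $\mu_n = (\nu\mu + \lambda\sum_i x_i)/\nu_n$. Repeating the prior predictive argument with (µ, ν) replaced by (µ_n, ν_n) gives a Normal posterior predictive with mean µ_n and variance $1/\lambda + 1/\nu_n$. A Python sketch (function names are hypothetical):

```python
def normal_posterior(xs, mu, nu, lam):
    """Conjugate Normal update for theta: returns (posterior mean, posterior precision)."""
    n = len(xs)
    nu_n = nu + n * lam
    mu_n = (nu * mu + lam * sum(xs)) / nu_n
    return mu_n, nu_n


def posterior_predictive(xs, mu, nu, lam):
    """Mean and variance of the Normal posterior predictive of one new observation."""
    mu_n, nu_n = normal_posterior(xs, mu, nu, lam)
    return mu_n, 1 / lam + 1 / nu_n


# With no data the posterior predictive reduces to the prior predictive
print(posterior_predictive([], mu=1.0, nu=2.0, lam=4.0))  # prints (1.0, 0.75)
```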
#
2
c) Elicit your prior distribution on the unknown θ in such a way that your prior mean is 0 and you believe
that the unknown θ is in the interval [−5, 5] with prior probability 0.96
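With a Normal prior centred at 0, the interval [−5, 5] is symmetric, so prior probability 0.96 inside leaves 0.02 in each tail and 5 must be the 0.98 quantile: $5 = z_{0.98}/\sqrt{\nu}$, i.e. $\nu = (z_{0.98}/5)^2$. A Python sketch using `statistics.NormalDist` (in R, `qnorm(0.98)` plays the role of `inv_cdf(0.98)`):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.98)  # standard Normal 0.98 quantile, about 2.054
prior_sd = 5 / z                # prior standard deviation, about 2.43
nu = 1 / prior_sd ** 2          # prior precision, about 0.169

prior = NormalDist(0.0, prior_sd)
print(nu, prior.cdf(5) - prior.cdf(-5))  # the second value should be 0.96
```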
#
d) assume that the known value of λ is 1/3 and suppose you have observed the following data:
−1.25 8.77 1.18 10.66 11.81 −6.09 3.56 10.85 4.03 2.13
derive your posterior distribution and represent it graphically
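Plugging λ = 1/3, the elicited prior (mean 0, precision $\nu = (z_{0.98}/5)^2$) and the ten observations (reading the sixth value as −6.09) into the conjugate update gives a Normal posterior. The Python sketch below computes it together with a grid of density values that any plotting tool can draw.

```python
from statistics import NormalDist

data = [-1.25, 8.77, 1.18, 10.66, 11.81, -6.09, 3.56, 10.85, 4.03, 2.13]
lam = 1 / 3                                  # known precision of X_i | theta
mu0 = 0.0                                    # prior mean from part c)
nu0 = (NormalDist().inv_cdf(0.98) / 5) ** 2  # prior precision from part c)

nu_n = nu0 + len(data) * lam                 # posterior precision
mu_n = (nu0 * mu0 + lam * sum(data)) / nu_n  # posterior mean
posterior = NormalDist(mu_n, nu_n ** -0.5)   # theta | data

print(f"posterior: N(mean={mu_n:.3f}, sd={posterior.stdev:.3f})")
# Points for plotting the posterior density around its mean
grid = [mu_n + k * 0.05 for k in range(-60, 61)]
curve = [posterior.pdf(t) for t in grid]
```

The posterior mean lies between the prior mean 0 and the sample mean 4.565, much closer to the latter because the data precision nλ dominates ν.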
#
e) derive your favorite point estimate and interval estimate and motivate your choices
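For a Normal posterior the mean, median and mode coincide, and the equal-tail 95% credible interval is also the highest-posterior-density interval, which makes these natural (though not the only) choices. A self-contained Python sketch that repeats the update above:

```python
from statistics import NormalDist

data = [-1.25, 8.77, 1.18, 10.66, 11.81, -6.09, 3.56, 10.85, 4.03, 2.13]
lam, mu0 = 1 / 3, 0.0
nu0 = (NormalDist().inv_cdf(0.98) / 5) ** 2
nu_n = nu0 + len(data) * lam
mu_n = (nu0 * mu0 + lam * sum(data)) / nu_n
posterior = NormalDist(mu_n, nu_n ** -0.5)

point = posterior.mean               # = median = mode for a Normal posterior
ci_low = posterior.inv_cdf(0.025)    # equal-tail 95% interval,
ci_high = posterior.inv_cdf(0.975)   # which is also the HPD interval here
print(f"point estimate {point:.3f}, 95% interval [{ci_low:.3f}, {ci_high:.3f}]")
```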
#
3) As an alternative model for the previous 10 observations
−1.25 8.77 1.18 10.66 11.81 −6.09 3.56 10.85 4.03 2.13
consider the following statistical model, where the $X_i \mid \theta$ are i.i.d. with
$X_i \mid \theta \sim f(x \mid \theta) = \frac{1}{20} I_{[\theta-10,\, \theta+10]}(x)$
Use the same prior elicitation for θ as in the model of the previous exercise.
a) Provide a fully Bayesian analysis of these data, explaining all the basic ingredients and steps for carrying
it out. In particular, compare your final inference on the unknown $\theta = E[X \mid \theta]$ with the one you
derived in exercise 2).
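Under this model the likelihood is $\prod_i \frac{1}{20} I(\theta - 10 \le x_i \le \theta + 10) = 20^{-n} I(\max_i x_i - 10 \le \theta \le \min_i x_i + 10)$, so the posterior is the elicited Normal prior truncated to [max x − 10, min x + 10] = [1.81, 3.91]. A Python sketch of the resulting summaries, using the standard truncated-Normal mean formula (names are hypothetical):

```python
from statistics import NormalDist

data = [-1.25, 8.77, 1.18, 10.66, 11.81, -6.09, 3.56, 10.85, 4.03, 2.13]
prior = NormalDist(0.0, 5 / NormalDist().inv_cdf(0.98))  # elicited prior
std = NormalDist()

# The likelihood equals 20^-n on this interval and zero outside it
lower, upper = max(data) - 10, min(data) + 10
Z = prior.cdf(upper) - prior.cdf(lower)  # prior mass of the interval = normalising constant


def post_pdf(theta):
    """Posterior density: the prior truncated to [lower, upper] and renormalised."""
    return prior.pdf(theta) / Z if lower <= theta <= upper else 0.0


# Mean of a Normal(m, s^2) truncated to [lower, upper]
a = (lower - prior.mean) / prior.stdev
b = (upper - prior.mean) / prior.stdev
post_mean = prior.mean + prior.stdev * (std.pdf(a) - std.pdf(b)) / (std.cdf(b) - std.cdf(a))


def post_quantile(p):
    """Inverse CDF of the truncated-Normal posterior."""
    return prior.inv_cdf(prior.cdf(lower) + p * Z)


print(f"support [{lower:.2f}, {upper:.2f}], posterior mean {post_mean:.3f}")
print("95% equal-tail interval:", post_quantile(0.025), post_quantile(0.975))
```

Unlike the Normal-model posterior, this one has bounded support fixed by the two extreme observations, and the prior centred at 0 pulls its mass towards the lower end of that interval.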
#
b) Write the formula of the prior predictive distribution of a single observation and explain how you can
simulate i.i.d. random draws from it. Use the simulated values to approximate the predictive
density in a plot and compare it with the prior predictive density of a single observation under the previous
model.
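Integrating θ out gives $m(x) = \frac{1}{20}\Pr(x - 10 \le \theta \le x + 10) = \frac{1}{20}\{\Phi((x+10)\sqrt{\nu}) - \Phi((x-10)\sqrt{\nu})\}$, and i.i.d. draws come from composition: draw θ from the prior, then X | θ uniformly on [θ − 10, θ + 10]. The Python sketch below compares a crude histogram estimate with m(x) and with the Normal model's prior predictive (mean 0, variance 3 + 1/ν, taking λ = 1/3); the seed and bin width are arbitrary choices.

```python
import random
import statistics
from statistics import NormalDist

prior = NormalDist(0.0, 5 / NormalDist().inv_cdf(0.98))  # elicited prior for theta
rng = random.Random(2024)


def unif_prior_predictive_pdf(x):
    """Exact m(x) = (1/20) * Pr(x - 10 <= theta <= x + 10) under the prior."""
    return (prior.cdf(x + 10) - prior.cdf(x - 10)) / 20


# Composition sampling: theta ~ prior, then X | theta ~ Uniform(theta - 10, theta + 10)
draws = []
for _ in range(50_000):
    theta = rng.gauss(prior.mean, prior.stdev)
    draws.append(rng.uniform(theta - 10, theta + 10))


def hist_density(x, half_width=0.5):
    """Fraction of draws in [x - h, x + h] divided by the bin width 2h."""
    inside = sum(1 for d in draws if x - half_width <= d <= x + half_width)
    return inside / (len(draws) * 2 * half_width)


# Prior predictive of the Normal model with lambda = 1/3: mean 0, variance 3 + 1/nu
normal_pred = NormalDist(0.0, (3 + prior.variance) ** 0.5)

for x in (-15, -10, -5, 0, 5, 10, 15):
    print(f"x={x:4d}  MC {hist_density(x):.4f}  exact {unif_prior_predictive_pdf(x):.4f}"
          f"  Normal model {normal_pred.pdf(x):.4f}")
# Var(X) = Var(theta) + Var(Uniform(-10, 10)) = 1/nu + 100/3
print("MC variance", statistics.variance(draws), "theory", prior.variance + 100 / 3)
```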
#
c) Consider the same discrete (finite) grid of values as parameter space Θ for the conditional mean θ in
both models. Use this simplified parametric setting to decide whether one should use the Normal model
rather than the Uniform model in light of the observed data.
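On a common finite grid Θ with prior weights proportional to the elicited Normal prior density, each model's marginal likelihood is a finite sum, $m_k(x) = \sum_{\theta \in \Theta} \pi(\theta) f_k(x \mid \theta)$, and their ratio is the Bayes factor. The grid (−10 to 10 in steps of 0.01) and the use of λ = 1/3 for the Normal model are assumptions of this Python sketch.

```python
import math
from statistics import NormalDist

data = [-1.25, 8.77, 1.18, 10.66, 11.81, -6.09, 3.56, 10.85, 4.03, 2.13]
prior = NormalDist(0.0, 5 / NormalDist().inv_cdf(0.98))

grid = [-10 + 0.01 * i for i in range(2001)]  # hypothetical common grid for theta
dens = [prior.pdf(t) for t in grid]
total = sum(dens)
w = [d / total for d in dens]                 # discretised prior weights, summing to 1


def loglik_normal(theta, lam=1 / 3):
    """Sum of Normal log-densities with known precision lam."""
    return sum(0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * (x - theta) ** 2 for x in data)


def lik_uniform(theta):
    """prod_i (1/20) * 1{theta - 10 <= x_i <= theta + 10}."""
    inside = all(theta - 10 <= x <= theta + 10 for x in data)
    return 20.0 ** -len(data) if inside else 0.0


m_normal = sum(wi * math.exp(loglik_normal(t)) for wi, t in zip(w, grid))
m_uniform = sum(wi * lik_uniform(t) for wi, t in zip(w, grid))
print("marginal likelihoods:", m_normal, m_uniform)
print("log Bayes factor (Uniform vs Normal):", math.log(m_uniform) - math.log(m_normal))
```

Run as written, the comparison favours the Uniform model by many orders of magnitude: with the variance fixed at 3, observations spread from −6.09 to 11.81 are very unlikely under the Normal model.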
#
4) Acceptance-rejection (A-R) algorithm
a) show how it is possible to simulate from a standard Normal distribution using pseudo-random
deviates from a standard Cauchy and the A-R algorithm
b) provide your R code for the implementation of the A-R
c) evaluate numerically (approximately by MC) the acceptance probability
d) write your theoretical explanation of how you constructed your Monte Carlo estimate of the
acceptance probability
e) save the rejected simulations and provide a graphical representation of the empirical distribution
(histogram or density estimation)
f) derive the underlying density corresponding to the rejected random variables and try to compare
it with the empirical distribution
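Some standard facts for this pair: with $g(x) = 1/\{\pi(1+x^2)\}$, the ratio $\phi(x)/g(x) = \sqrt{\pi/2}\,(1+x^2)e^{-x^2/2}$ is maximised at $x = \pm 1$, giving $M = \sqrt{2\pi/e} \approx 1.520$ and acceptance probability $1/M \approx 0.658$. A rejected proposal has density $\{g(x) - \phi(x)/M\}/(1 - 1/M)$. The Python sketch below (the assignment asks for R; names and seed are hypothetical) runs a fixed number of proposals, so the acceptance rate is the mean of i.i.d. Bernoulli(1/M) indicators: unbiased, with variance $(1/M)(1 - 1/M)/n$.

```python
import math
import random
import statistics

M = math.sqrt(2 * math.pi / math.e)  # sup phi/g, attained at x = ±1


def phi(x):
    """Standard Normal density (target)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cauchy_pdf(x):
    """Standard Cauchy density (proposal g)."""
    return 1 / (math.pi * (1 + x * x))


def ar_normal(n_proposals, rng):
    """Run n_proposals accept-reject trials; return accepted and rejected proposals."""
    accepted, rejected = [], []
    for _ in range(n_proposals):
        y = math.tan(math.pi * (rng.random() - 0.5))  # Cauchy draw by inverse CDF
        if rng.random() <= phi(y) / (M * cauchy_pdf(y)):
            accepted.append(y)
        else:
            rejected.append(y)
    return accepted, rejected


def rejected_pdf(x):
    """Density of a rejected proposal: (g(x) - phi(x)/M) / (1 - 1/M)."""
    return (cauchy_pdf(x) - phi(x) / M) / (1 - 1 / M)


n_prop = 80_000
acc, rej = ar_normal(n_prop, random.Random(42))
p_hat = len(acc) / n_prop  # MC estimate of Pr(accept)
print("acceptance: MC", p_hat, "theory", 1 / M)
print("accepted mean/sd:", statistics.fmean(acc), statistics.stdev(acc))
```

A histogram of `rej` can then be overlaid with `rejected_pdf`; note its Cauchy-like tails.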
#
5) Marginal likelihood evaluation for a Poisson data model. Simulate 10 observations from a known
Poisson distribution with expected value 2. Use set.seed(123) before starting your simulation. Use a
Gamma(1, 1) prior distribution and compute the corresponding marginal likelihood in 3 different ways:
a) exact analytic computation
b) by Monte Carlo approximation using a sample from the posterior distribution and the harmonic
mean approach. Try to evaluate the random behaviour by repeating the approximation $\hat{I}$ a
sufficiently large number of times and show that the approximation tends to be (positively) biased.
Use these simulations to approximate the corresponding variance and mean squared error
c) by Monte Carlo Importance sampling choosing an appropriate Cauchy distribution as auxiliary
distribution for the simulation. Compare its performance with respect to the previous harmonic
mean approach.
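A Python sketch of the three computations. Important assumption: Python's `random.Random(123)` is not R's `set.seed(123)`, so the ten simulated counts differ from those obtained in R; the Poisson sampler `rpois` (Knuth's multiplication method) is a hypothetical helper, and centring the Cauchy proposal at the posterior mean with scale equal to the posterior standard deviation is one reasonable tuning choice, not the only one. With a Gamma(a, rate b) prior, $m(x) = \frac{b^a}{\Gamma(a)}\frac{\Gamma(a+s)}{(b+n)^{a+s}}\prod_i \frac{1}{x_i!}$ with $s = \sum_i x_i$, and the posterior is Gamma(a + s, rate b + n).

```python
import math
import random
import statistics


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(123)  # NOT equivalent to R's set.seed(123): the data will differ
x = [rpois(2.0, rng) for _ in range(10)]
n, s = len(x), sum(x)
a, b = 1.0, 1.0  # Gamma(shape a, rate b) prior
log_fact = sum(math.lgamma(xi + 1) for xi in x)


def log_lik(theta):
    return s * math.log(theta) - n * theta - log_fact


# a) exact marginal likelihood
m_exact = math.exp(a * math.log(b) - math.lgamma(a) + math.lgamma(a + s)
                   - (a + s) * math.log(b + n) - log_fact)


def harmonic_mean(size, rng):
    """b) 1/m = E_post[1/L(theta)], posterior Gamma(a + s, rate b + n)."""
    draws = [rng.gammavariate(a + s, 1 / (b + n)) for _ in range(size)]  # scale = 1/rate
    return 1 / statistics.fmean([math.exp(-log_lik(t)) for t in draws])


post_mean, post_sd = (a + s) / (b + n), math.sqrt(a + s) / (b + n)


def importance_sampling(size, rng):
    """c) m = E_g[prior(theta) L(theta) / g(theta)] with a Cauchy proposal g."""
    total = 0.0
    for _ in range(size):
        t = post_mean + post_sd * math.tan(math.pi * (rng.random() - 0.5))
        if t <= 0:
            continue  # the Gamma prior puts no mass on theta <= 0
        log_prior = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(t) - b * t
        g = 1 / (math.pi * post_sd * (1 + ((t - post_mean) / post_sd) ** 2))
        total += math.exp(log_prior + log_lik(t)) / g
    return total / size


reps, size = 200, 1000
hm = [harmonic_mean(size, rng) for _ in range(reps)]
imp = [importance_sampling(size, rng) for _ in range(reps)]
print("data:", x, "exact m(x):", m_exact)
for name, est in (("harmonic mean", hm), ("importance sampling", imp)):
    bias = statistics.fmean(est) - m_exact
    var = statistics.pvariance(est)
    print(f"{name}: bias {bias:.3e}  variance {var:.3e}  MSE {bias ** 2 + var:.3e}")
```

The positive bias of the harmonic mean estimator comes from regions where the prior has mass but the posterior almost never samples (here θ close to 0): they contribute heavily to $E_{post}[1/L(\theta)]$ yet are rarely visited, so the estimate of $1/m$ is usually too small.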
#
6) Provide two alternative implementations of 10000 i.i.d. simulations from the following left-truncated
normal distribution
$f(x) \propto \exp\left\{-\frac{1}{2}\frac{(x+1)^2}{10}\right\} I_{(-2,\infty)}(x)$
using:
a) integral transform i.e. using the inverse CDF
b) Acceptance-rejection
Try to write R functions that generalize to arbitrary parameters (number of simulations, truncation
point, mean and standard deviation of the underlying Gaussian density). Provide theoretical arguments to
justify your implementation and show the agreement between the theoretical distribution and the empirical
distribution of your simulations.
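Two sketches of samplers with the requested general parameters (number of draws, truncation point, mean, standard deviation), written in Python with defaults matching this exercise (mean −1, standard deviation √10, lower bound −2); the function names are hypothetical. The inverse-CDF version inverts the truncated CDF $F_T(x) = \{F(x) - F(-2)\}/\{1 - F(-2)\}$, evaluated through the upper tail for numerical stability; the A-R version uses the untruncated Normal as proposal. A Kolmogorov-Smirnov distance serves here as one possible numerical check of the match with the theoretical distribution.

```python
import random
from statistics import NormalDist


def rtruncnorm_inv(n, mean=-1.0, sd=10 ** 0.5, lower=-2.0, rng=random):
    """Inverse-CDF sampler for N(mean, sd^2) truncated to (lower, inf).

    With S(x) = 1 - F(x) the parent survival function, V ~ Uniform(0, S(lower)]
    and X = S^{-1}(V) = mean - sd * Phi^{-1}(V) give Pr(X > x) = S(x) / S(lower).
    """
    z = NormalDist()
    tail = 1 - z.cdf((lower - mean) / sd)  # S(lower) = Pr(parent > lower)
    return [mean - sd * z.inv_cdf(tail * (1 - rng.random())) for _ in range(n)]


def rtruncnorm_ar(n, mean=-1.0, sd=10 ** 0.5, lower=-2.0, rng=random):
    """Accept-reject with the untruncated parent N(mean, sd^2) as proposal g.

    f/g = 1{x > lower} / S(lower) <= M = 1 / S(lower), so a proposal is accepted
    with probability f / (M g) = 1{x > lower}; the acceptance rate is S(lower).
    """
    out = []
    while len(out) < n:
        x = rng.gauss(mean, sd)
        if x > lower:
            out.append(x)
    return out


def trunc_cdf(x, mean=-1.0, sd=10 ** 0.5, lower=-2.0):
    """Theoretical CDF of the truncated Normal."""
    if x <= lower:
        return 0.0
    parent = NormalDist(mean, sd)
    return (parent.cdf(x) - parent.cdf(lower)) / (1 - parent.cdf(lower))


def ks_stat(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - c, c - i / n) for i, c in enumerate(map(cdf, xs)))


rng = random.Random(7)
sims_inv = rtruncnorm_inv(10_000, rng=rng)
sims_ar = rtruncnorm_ar(10_000, rng=rng)
print("KS distance, inverse CDF:  ", ks_stat(sims_inv, trunc_cdf))
print("KS distance, accept-reject:", ks_stat(sims_ar, trunc_cdf))
```

Here the A-R acceptance rate is about 0.62, but it collapses when the truncation point lies far in the right tail; a translated exponential proposal is a common remedy in that case.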


