\[ p(\theta \mid y)=\frac{p(y\mid\theta)\,p(\theta)}{p(y)} \]
Where:
- \(p(\theta)\): prior
- \(p(y\mid\theta)\): likelihood
- \(p(y)\): evidence
- \(p(\theta\mid y)\): posterior
Evidence:
\[ p(y)=\int p(y\mid\theta)p(\theta)\,d\theta \]
\[ p(\theta\mid y)\propto p(y\mid\theta)\,p(\theta) \]
We drop \(p(y)\) because it does not depend on \(\theta\).
After observing data \(y\), we obtain a posterior distribution:
\[ p(\theta \mid y) \]
This represents our updated beliefs about \(\theta\).
Bayesian inference computes expectations under this posterior:
\[ \mathbb{E}[g(\theta) \mid y]=\int g(\theta)\,p(\theta \mid y)\,d\theta \]
Intuition: Bayesian inference = take averages using the posterior distribution.
Examples: the posterior mean \(\mathbb{E}[\theta\mid y]\), posterior probabilities such as \(P(\theta>c\mid y)\), and posterior predictive probabilities are all expectations of this form.
Assume
\[ y\mid \theta\sim \mathrm{Binomial}(n,\theta) \]
Likelihood:
\[ p(y\mid\theta)=\binom{n}{y}\theta^y(1-\theta)^{n-y} \]
Prior:
\[ \theta \sim \mathrm{Beta}(\alpha,\beta) \]
\[ p(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \]
A prior is conjugate if the posterior lies in the same family as the prior.
For Beta–Binomial:
\[ \theta\mid y\sim \mathrm{Beta}(\alpha+y,\;\beta+n-y) \]
\[ \mathbb{E}[\theta\mid y]=\frac{\alpha+y}{\alpha+\beta+n} \]
Interpretation: the posterior mean is a weighted average of the prior mean \(\alpha/(\alpha+\beta)\) and the sample proportion \(y/n\).
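The conjugate update is a one-line computation. A minimal sketch in Python (the function name and the example numbers are illustrative):

```python
def beta_binomial_posterior(alpha, beta, y, n):
    """Conjugate update: a Beta(alpha, beta) prior combined with y successes
    in n Binomial trials gives a Beta(alpha + y, beta + n - y) posterior."""
    a_post = alpha + y
    b_post = beta + n - y
    post_mean = a_post / (a_post + b_post)
    return a_post, b_post, post_mean

# Example: uniform Beta(1, 1) prior, 7 successes in 10 trials
# -> Beta(8, 4) posterior with mean 8/12.
a, b, mean = beta_binomial_posterior(1.0, 1.0, 7, 10)
print(a, b, mean)  # 8.0 4.0 0.666...
```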
The prior is a continuous distribution:
\[ \theta \sim \mathrm{Beta}(\alpha,\beta),\quad 0<\theta<1 \]
It models uncertainty about a probability parameter \(\theta\).
Prior mean:
\[ \mathbb{E}[\theta]=\frac{\alpha}{\alpha+\beta} \]
Concentration (prior strength):
\[ \alpha+\beta \]
Pseudo-count intuition: \(\alpha\) acts like a count of prior successes and \(\beta\) like a count of prior failures. Larger \(\alpha+\beta\) → stronger (more concentrated) prior.
Posterior predictive integrates over \(\theta\) uncertainty:
\[ P(Y_{n+1}=1 \mid y)=\int P(Y_{n+1}=1 \mid \theta)\,p(\theta \mid y)\,d\theta \]
For Bernoulli/Binomial:
\[ P(Y_{n+1}=1 \mid \theta)=\theta \]
So:
\[ P(Y_{n+1}=1 \mid y)=\int \theta\,p(\theta \mid y)\,d\theta=\mathbb{E}[\theta \mid y] \]
Under Beta–Binomial:
\[ P(Y_{n+1}=1 \mid y)=\mathbb{E}[\theta \mid y]=\frac{\alpha+y}{\alpha+\beta+n} \]
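The identity above can be verified numerically: average draws from the Beta posterior and compare against the closed form. A sketch (seed, draw count, and the specific numbers are illustrative):

```python
import random

random.seed(0)
alpha, beta_, y, n = 2.0, 2.0, 7, 10

# Closed form: P(Y_{n+1} = 1 | y) = (alpha + y) / (alpha + beta + n)
closed_form = (alpha + y) / (alpha + beta_ + n)

# Monte Carlo check: the mean of posterior draws approximates E[theta | y].
draws = [random.betavariate(alpha + y, beta_ + n - y) for _ in range(200_000)]
mc_estimate = sum(draws) / len(draws)
print(closed_form, mc_estimate)  # both should be close to 9/14
```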
If
\[ I=\mathbb{E}[g(X)]=\int g(x)\,p(x)\,dx \]
then \(p(x)\) is the sampling distribution (density/pmf) of \(X\), and \(g(x)\) is the function applied to each draw.
Draw \(X_1,\dots,X_N\sim p(x)\) and compute
\[ \hat I_N=\frac{1}{N}\sum_{i=1}^{N} g(X_i) \]
Sampling from \(p(x)\) automatically applies the probability weighting.
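The estimator \(\hat I_N\) needs only a sampler for \(p(x)\) and the function \(g\). A minimal sketch (helper name and example are illustrative), checking against a case with a known answer, \(\mathbb{E}[X^2]=1\) for \(X\sim\mathcal{N}(0,1)\):

```python
import random

random.seed(1)

def mc_expectation(g, sampler, N):
    """Plain Monte Carlo: draw X_i ~ p via sampler(), average g(X_i)."""
    return sum(g(sampler()) for _ in range(N)) / N

# E[X^2] for X ~ N(0, 1) is exactly 1 (the variance).
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
print(est)  # close to 1.0
```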
Let \(X_1,\dots,X_N\) be i.i.d. with \(\mathbb{E}[X_i]=\mu\). Define
\[ \bar X_N=\frac{1}{N}\sum_{i=1}^N X_i \]
Then:
\[ \bar X_N \xrightarrow{P} \mu \quad \text{as } N\to\infty \]
Monte Carlo connection:
If \(\hat I_N=\frac{1}{N}\sum_{i=1}^N g(X_i)\), then \(\hat I_N \to \mathbb{E}[g(X)]\).
Intuition: more simulations → the estimate stabilizes around the true expectation.
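The stabilization is easy to see empirically: the running mean of i.i.d. draws settles near the true mean as \(N\) grows. A sketch with Bernoulli(0.3) draws (seed and sample sizes are illustrative):

```python
import random

random.seed(2)
p = 0.3
draws = [1 if random.random() < p else 0 for _ in range(100_000)]

# The running mean approaches p = 0.3 as N grows (law of large numbers).
for N in (100, 10_000, 100_000):
    print(N, sum(draws[:N]) / N)
```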
Let \(X_1,\dots,X_N\) be i.i.d. with \(\mathbb{E}[X_i]=\mu\), \(\mathrm{Var}(X_i)=\sigma^2\). Then:
\[ \sqrt{N}(\bar X_N-\mu)\xrightarrow{d}\mathcal{N}(0,\sigma^2) \]
Practical form (large \(N\)):
\[ \bar X_N \approx \mathcal{N}\left(\mu,\frac{\sigma^2}{N}\right) \]
Monte Carlo connection:
\[ \hat I_N=\frac{1}{N}\sum g(X_i)\approx \mathcal{N}\left(\mathbb{E}[g(X)],\frac{\mathrm{Var}(g(X))}{N}\right) \]
Intuition: CLT describes the distribution of the estimator and gives standard error \(\sigma/\sqrt{N}\).
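In practice the CLT is used to attach a standard error \(\hat\sigma/\sqrt{N}\) to a Monte Carlo estimate. A sketch using Exp(1) draws, which have mean 1 and variance 1 (seed and \(N\) are illustrative):

```python
import random
import statistics

random.seed(3)
N = 50_000
xs = [random.expovariate(1.0) for _ in range(N)]  # Exp(1): mean 1, variance 1

mean = statistics.fmean(xs)
se = statistics.stdev(xs) / N ** 0.5  # CLT standard error: sigma_hat / sqrt(N)
print(mean, se)  # mean near 1.0, se near 1/sqrt(50000) ~ 0.0045
```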
If we can sample:
\[ \theta^{(1)},\dots,\theta^{(N)}\sim p(\theta\mid y) \]
we compute posterior expectations:
\[ \mathbb{E}[g(\theta)\mid y]=\int g(\theta)\,p(\theta\mid y)\,d\theta \]
Monte Carlo approximation:
\[ \mathbb{E}[g(\theta)\mid y]\approx \frac{1}{N}\sum_{i=1}^N g(\theta^{(i)}) \]
Sampling gives draws; averaging \(g(\theta)\) over draws gives the quantity we care about.
\(g(\theta)\) is the function of \(\theta\) whose posterior expectation we want.
Examples:
- \(g(\theta)=\theta\): posterior mean
- \(g(\theta)=\mathbf{1}\{\theta>c\}\): posterior probability that \(\theta>c\)
- \(g(\theta)=\theta^2\): combined with the mean, gives the posterior variance
Key point: sampling from the posterior is not the goal—the goal is \(\mathbb{E}[g(\theta)\mid y]\).
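Putting the pieces together for the Beta–Binomial case, where we can sample the posterior directly: draw from \(\mathrm{Beta}(\alpha+y,\beta+n-y)\) and average different choices of \(g\). A sketch (seed and numbers are illustrative; Beta(8, 4) corresponds to a Beta(1, 1) prior with 7 successes in 10 trials):

```python
import random

random.seed(4)
# Posterior from the Beta-Binomial example: theta | y ~ Beta(8, 4)
draws = [random.betavariate(8.0, 4.0) for _ in range(100_000)]

# g(theta) = theta: posterior mean, should be near 8/12
post_mean = sum(draws) / len(draws)

# g(theta) = 1{theta > 0.5}: posterior probability that theta exceeds 0.5
prob_gt_half = sum(t > 0.5 for t in draws) / len(draws)

print(post_mean, prob_gt_half)
```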
Use MCMC when the posterior is not a standard distribution you can sample from directly:
\[ p(\theta\mid y)\propto p(y\mid\theta)p(\theta) \]
but you can evaluate the unnormalized posterior for candidate \(\theta\).
Model:
\[ y_i\mid \theta\sim \mathrm{Bernoulli}(\sigma(\theta)),\quad \sigma(\theta)=\frac{1}{1+e^{-\theta}} \]
If \(k\) successes out of \(n\):
\[ p(y\mid\theta)=\sigma(\theta)^k(1-\sigma(\theta))^{n-k} \]
Prior:
\[ \theta\sim \mathcal{N}(0,10^2) \]
Posterior (unnormalized):
\[ p(\theta\mid y)\propto \sigma(\theta)^k(1-\sigma(\theta))^{n-k}\exp\left(-\frac{\theta^2}{200}\right) \]
Not a known distribution \(\Rightarrow\) cannot sample directly \(\Rightarrow\) MCMC.
Define unnormalized log posterior:
\[ \log\tilde p(\theta)=k\log\sigma(\theta)+(n-k)\log(1-\sigma(\theta))-\frac{\theta^2}{200} \]
Algorithm (random-walk Metropolis): at step \(r\), propose \(\theta'=\theta^{(r)}+\varepsilon\) with \(\varepsilon\sim\mathcal{N}(0,s^2)\), then accept with probability
\[ \alpha=\min\left(1,\exp\left(\log\tilde p(\theta')-\log\tilde p(\theta^{(r)})\right)\right) \]
setting \(\theta^{(r+1)}=\theta'\) on acceptance and \(\theta^{(r+1)}=\theta^{(r)}\) otherwise.
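A random-walk Metropolis sampler for this model can be sketched as follows. The proposal standard deviation, iteration count, burn-in length, and seed are illustrative choices, and the log-sigmoid is written in a numerically stable form:

```python
import math
import random

random.seed(5)

def log_unnorm_post(theta, k, n):
    """log p~(theta) = k log sigma(theta) + (n-k) log(1 - sigma(theta)) - theta^2/200."""
    # Stable log-sigmoid: log sigma(t) = -log(1 + e^{-t}) for t > 0,
    # and t - log(1 + e^{t}) otherwise; also log(1 - sigma(t)) = log sigma(t) - t.
    if theta > 0:
        log_sig = -math.log1p(math.exp(-theta))
    else:
        log_sig = theta - math.log1p(math.exp(theta))
    log_one_minus = log_sig - theta
    return k * log_sig + (n - k) * log_one_minus - theta ** 2 / 200.0

def metropolis(k, n, n_iter=50_000, step=1.0):
    theta = 0.0
    samples = []
    for _ in range(n_iter):
        prop = theta + random.gauss(0.0, step)     # symmetric random-walk proposal
        log_alpha = log_unnorm_post(prop, k, n) - log_unnorm_post(theta, k, n)
        if math.log(random.random()) < log_alpha:  # accept with prob min(1, ratio)
            theta = prop
        samples.append(theta)
    return samples

# k = 7 successes out of n = 10; with this weak prior, the posterior mean of
# sigma(theta) should land near the sample proportion 0.7.
samples = metropolis(7, 10)
sig = [1.0 / (1.0 + math.exp(-t)) for t in samples[5_000:]]  # drop burn-in
print(sum(sig) / len(sig))
```

The chain only ever evaluates \(\log\tilde p\), so the unknown normalizing constant \(p(y)\) cancels in the acceptance ratio, which is exactly why MCMC works here.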