Bayesian inference updates beliefs about unknown parameters using observed data. This cheat sheet covers priors, likelihoods, posteriors, posterior summaries, and common conjugate prior models, and is designed as a quick reference for interpreting formulas and choosing standard models. It is especially useful because Bayesian calculations often become much simpler when the prior family is matched to the likelihood. The central rule is Bayes' theorem: the posterior is proportional to the likelihood times the prior. A prior distribution encodes information available before seeing the data, while the likelihood measures how compatible each parameter value is with the observed data. A conjugate prior keeps the posterior in the same distribution family as the prior, which gives closed-form updates. Posterior means, variances, credible intervals, and predictive distributions then summarize the updated uncertainty.
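To make "posterior is proportional to likelihood times prior" concrete, here is a minimal grid-approximation sketch in Python. It assumes a single parameter on (0, 1) and uses hypothetical data (7 successes in 10 Binomial trials) with a uniform prior; the function and variable names are illustrative, not from any library. Summing likelihood times prior over the grid approximates the normalizing constant p(y), and dividing by it yields the posterior.

```python
import math


def grid_posterior(prior, likelihood, grid):
    """Approximate p(theta | y) on an evenly spaced grid via Bayes' theorem.

    Returns the normalized posterior density at each grid point and an
    approximation of the marginal likelihood p(y).
    """
    step = grid[1] - grid[0]
    # Numerator of Bayes' theorem: p(y | theta) * p(theta) at each grid point.
    unnormalized = [likelihood(t) * prior(t) for t in grid]
    # Midpoint-rule approximation of p(y) = integral of p(y | theta) p(theta).
    marginal = sum(unnormalized) * step
    posterior = [u / marginal for u in unnormalized]
    return posterior, marginal


# Hypothetical example: y = 7 successes in n = 10 Binomial trials.
n, y = 10, 7


def uniform_prior(theta):
    return 1.0  # Beta(1, 1) density on (0, 1)


def binomial_likelihood(theta):
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)


grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints covering (0, 1)
posterior, p_y = grid_posterior(uniform_prior, binomial_likelihood, grid)
step = grid[1] - grid[0]
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * step
print(f"p(y) ~ {p_y:.4f}, posterior mean ~ {posterior_mean:.4f}")
```

Because the uniform prior is Beta(1, 1), the exact posterior here is Beta(8, 4), so the grid mean should land very close to 8/12 and p(y) close to 1/11. Grids like this are practical only for one or two parameters; conjugate formulas avoid the numerical work entirely.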

Key Facts

  • Bayes' theorem for a parameter is $p(\theta \mid y)=\frac{p(y \mid \theta)p(\theta)}{p(y)}$, where $p(y)=\int p(y \mid \theta)p(\theta)\,d\theta$.
  • The posterior kernel is $p(\theta \mid y) \propto p(y \mid \theta)p(\theta)$, so constants not involving $\theta$ can be ignored during proportional calculations.
  • For $Y \mid \theta \sim \operatorname{Binomial}(n,\theta)$ and $\theta \sim \operatorname{Beta}(\alpha,\beta)$, the posterior is $\theta \mid y \sim \operatorname{Beta}(\alpha+y,\beta+n-y)$.
  • For $Y_i \mid \lambda \sim \operatorname{Poisson}(\lambda)$ and $\lambda \sim \operatorname{Gamma}(\alpha,\beta)$ using rate $\beta$, the posterior is $\lambda \mid y \sim \operatorname{Gamma}(\alpha+\sum_i y_i,\beta+n)$.
  • For $Y_i \mid \mu \sim N(\mu,\sigma^2)$ with known $\sigma^2$ and $\mu \sim N(\mu_0,\tau_0^2)$, the posterior variance is $\tau_n^2=\left(\frac{1}{\tau_0^2}+\frac{n}{\sigma^2}\right)^{-1}$.
  • For the normal mean model with known $\sigma^2$, the posterior mean is $\mu_n=\tau_n^2\left(\frac{\mu_0}{\tau_0^2}+\frac{n\bar{y}}{\sigma^2}\right)$.
  • A $95\%$ credible interval for $\theta$ is an interval $[a,b]$ such that $P(a \le \theta \le b \mid y)=0.95$.
  • The posterior predictive distribution is $p(\tilde{y} \mid y)=\int p(\tilde{y} \mid \theta)p(\theta \mid y)\,d\theta$, assuming a new observation $\tilde{y}$ is conditionally independent of $y$ given $\theta$.
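The three conjugate rules listed above reduce to simple parameter arithmetic. The Python sketch below encodes them directly; the function names are illustrative (not from any library), the Gamma update assumes the rate convention used in the list, and the inputs are hypothetical values chosen so the results are easy to check by hand.

```python
def beta_binomial_update(alpha, beta, y, n):
    """Beta(alpha, beta) prior + y successes in n Binomial trials."""
    return alpha + y, beta + n - y


def gamma_poisson_update(alpha, rate, counts):
    """Gamma(alpha, rate) prior (rate convention) + independent Poisson counts."""
    return alpha + sum(counts), rate + len(counts)


def normal_known_variance_update(mu0, tau0_sq, sigma_sq, n, ybar):
    """N(mu0, tau0^2) prior on mu + n observations from N(mu, sigma^2), sigma^2 known."""
    tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)          # posterior variance
    mu_n = tau_n_sq * (mu0 / tau0_sq + n * ybar / sigma_sq)  # posterior mean
    return mu_n, tau_n_sq


# Hypothetical inputs, chosen so the results are easy to check by hand.
a_post, b_post = beta_binomial_update(1, 1, y=7, n=10)
shape_post, rate_post = gamma_poisson_update(1, 1, counts=[1, 2, 3])
mu_n, tau_n_sq = normal_known_variance_update(0.0, 1.0, 1.0, n=3, ybar=2.0)

print("Beta posterior:", (a_post, b_post), "mean", a_post / (a_post + b_post))
print("Gamma posterior:", (shape_post, rate_post), "mean", shape_post / rate_post)
print("Normal posterior: mean", mu_n, "variance", tau_n_sq)
```

Returning plain parameter tuples keeps the sketch dependency-free; the posterior means follow from the usual formulas $\alpha/(\alpha+\beta)$ for the Beta and $\alpha/\beta$ for the rate-parameterized Gamma.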

Vocabulary

Prior distribution
A probability distribution $p(\theta)$ that represents uncertainty about a parameter before observing the current data.
Likelihood
The function $p(y \mid \theta)$ that measures how plausible the observed data are for each possible value of $\theta$.
Posterior distribution
The updated distribution $p(\theta \mid y)$ for a parameter after combining the prior distribution with the likelihood.
Conjugate prior
A prior distribution that produces a posterior distribution in the same family after being updated by a specified likelihood.
Marginal likelihood
The normalizing constant $p(y)=\int p(y \mid \theta)p(\theta)\,d\theta$ that makes the posterior integrate to $1$.
Credible interval
An interval that contains the parameter with a stated posterior probability, such as $P(a \le \theta \le b \mid y)=0.95$.
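When a posterior can be sampled, a credible interval and the posterior predictive can both be estimated by simulation. The Python sketch below assumes a hypothetical Beta(8, 4) posterior (for example, a Beta(1, 1) prior after 7 successes in 10 trials) and uses only the standard library's `random.betavariate` and `statistics.quantiles`. It builds an equal-tailed interval, which is one common choice; a highest-posterior-density interval is another.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible
a_post, b_post = 8, 4    # hypothetical Beta posterior
draws = [rng.betavariate(a_post, b_post) for _ in range(100_000)]

# Equal-tailed 95% credible interval: the 2.5% and 97.5% posterior quantiles.
cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]
coverage = sum(lower <= t <= upper for t in draws) / len(draws)

# Posterior predictive for one new Bernoulli trial: draw theta, then y_new | theta.
y_new = [1 if rng.random() < t else 0 for t in draws]
p_next_success = statistics.fmean(y_new)

print(f"95% credible interval ~ ({lower:.3f}, {upper:.3f}), coverage {coverage:.3f}")
print(f"P(next trial succeeds | y) ~ {p_next_success:.3f}")
```

For this model the exact predictive probability of a success equals the posterior mean, $8/12$, so the simulated value should sit close to it; the simulation approach carries over to models without closed-form predictive distributions.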

Common Mistakes to Avoid

  • Confusing the likelihood with the posterior is wrong because $p(y \mid \theta)$ is a function of $\theta$ based on fixed data, while $p(\theta \mid y)$ is a probability distribution over $\theta$.
  • Dropping terms that contain the parameter is wrong because only constants independent of $\theta$ can be ignored when using $p(\theta \mid y) \propto p(y \mid \theta)p(\theta)$.
  • Mixing Gamma rate and scale conventions is wrong because $\operatorname{Gamma}(\alpha,\beta)$ with rate $\beta$ has mean $\frac{\alpha}{\beta}$, while using scale gives mean $\alpha\beta$.
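  • Interpreting a frequentist confidence interval as a Bayesian credible interval is wrong because the confidence level describes how often the interval procedure covers the true $\theta$ across repeated samples, while a credible interval makes a probability statement about $\theta$ conditional on the observed data.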
  • Using a conjugate update without checking the likelihood form is wrong because conjugacy depends on the exact sampling model, such as Binomial with Beta or Poisson with Gamma.
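The rate/scale mix-up is easy to make in code. Python's standard-library `random.gammavariate(alpha, beta)` documents `beta` as a scale parameter (mean `alpha * beta`), so a rate-parameterized Gamma must be sampled with scale `1 / rate`. The sketch below uses a hypothetical Gamma(4, rate 2), whose mean is 2, and shows what happens when the rate is passed where a scale is expected.

```python
import random
import statistics

rng = random.Random(0)
shape, rate = 4.0, 2.0  # hypothetical Gamma(4, rate 2), so the mean is 4 / 2 = 2

# random.gammavariate(alpha, beta) treats beta as a SCALE, so pass 1 / rate.
correct = [rng.gammavariate(shape, 1.0 / rate) for _ in range(100_000)]
# Passing the rate where a scale is expected silently targets mean 4 * 2 = 8.
mixed_up = [rng.gammavariate(shape, rate) for _ in range(100_000)]

print(f"rate used correctly: sample mean ~ {statistics.fmean(correct):.3f}")
print(f"rate passed as scale: sample mean ~ {statistics.fmean(mixed_up):.3f}")
```

Checking a sample mean against $\alpha/\beta$ is a cheap sanity test whenever a library's Gamma parameterization is unfamiliar.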

Practice Questions

  1. Let $Y \mid \theta \sim \operatorname{Binomial}(20,\theta)$ and $\theta \sim \operatorname{Beta}(3,5)$. If $y=14$, find the posterior distribution of $\theta$.
  2. Let $Y_1,\ldots,Y_5 \mid \lambda \sim \operatorname{Poisson}(\lambda)$ with observations $2,0,3,1,4$, and let $\lambda \sim \operatorname{Gamma}(2,3)$ using rate $3$. Find the posterior distribution of $\lambda$.
  3. For $Y_i \mid \mu \sim N(\mu,4)$ with known variance $4$, prior $\mu \sim N(10,9)$, sample size $n=16$, and sample mean $\bar{y}=12$, compute $\tau_n^2$ and write the formula for $\mu_n$.
  4. Explain why a very concentrated prior can strongly influence the posterior even when the likelihood is based on real data.
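Question 4 can also be explored numerically. The Python sketch below assumes the Beta-Binomial model and uses hypothetical data (9 successes in 10 trials, not the numbers from Question 1). Under a Beta$(\alpha,\beta)$ prior the posterior mean is a weighted average of the prior mean and the sample proportion, with weight $(\alpha+\beta)/(\alpha+\beta+n)$ on the prior, so $\alpha+\beta$ behaves like a count of prior pseudo-observations. The helper names are illustrative.

```python
def posterior_mean(alpha, beta, y, n):
    """Posterior mean of theta under a Beta(alpha, beta) prior and y successes in n trials."""
    return (alpha + y) / (alpha + beta + n)


def prior_weight(alpha, beta, n):
    """Weight on the prior mean when the posterior mean is written as a weighted average."""
    return (alpha + beta) / (alpha + beta + n)


y, n = 9, 10  # hypothetical data: 9 successes in 10 trials (sample proportion 0.9)
priors = {"flat Beta(1, 1)": (1, 1), "concentrated Beta(50, 50)": (50, 50)}

for label, (a, b) in priors.items():
    w = prior_weight(a, b, n)
    mean = posterior_mean(a, b, y, n)
    # Same value via the weighted-average form: w * prior mean + (1 - w) * y / n.
    check = w * a / (a + b) + (1 - w) * y / n
    print(f"{label}: prior weight {w:.2f}, posterior mean {mean:.3f} (check {check:.3f})")
```

With the flat prior the posterior mean stays near the data ($10/12$), while the concentrated prior, worth 100 pseudo-observations against 10 real ones, pulls it to $59/110$, close to its own mean of $0.5$.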