This vignette is the fourth
chapter of the worked-case companion. The target is a strongly
correlated bivariate posterior whose covariance is known in closed form,
so the geometry the sampler recovers can be checked against the
Cramer-Rao bound, the inverse Fisher information. The call is
gpu_metropolis(method = "de") of v0.4.0.
Let \(y_1, \ldots, y_n\) be drawn from a bivariate normal with unknown mean \(\mu = (\mu_1, \mu_2)\) and a known covariance \(\Sigma\) with correlation \(\rho = 0.9\). With a flat prior on \(\mu\), the posterior is
\[ \mu \mid y \sim \mathcal{N}_2\!\left(\bar y,\; \Sigma / n\right), \]
a bivariate normal carrying the same correlation \(0.9\). A diagonal random-walk proposal cannot represent that correlation in a single move; the difference proposal of Differential Evolution can.
library(gpumetropolis)
set.seed(1)
rho <- 0.9
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
n <- 500
L <- chol(Sigma)
y <- matrix(stats::rnorm(n * 2), n, 2) %*% L
y[, 1] <- y[, 1] + 2 # truth: mu1 = 2
y[, 2] <- y[, 2] - 1 # truth: mu2 = -1
ybar <- colMeans(y)
post_cov <- Sigma / n # closed-form posterior covarianceThe per-observation Gaussian log-likelihood uses \(\Sigma^{-1}\) as constants. When a
coefficient of the formula is a number computed in R, the supported way
to put it into the DSL is to build the formula text with
sprintf() and turn it into a formula with
as.formula(), as below; the entries of \(\Sigma^{-1}\) enter as literal constants
the kernel reads.
Both methods sample this unimodal posterior correctly; the difference is efficiency. The diagonal random walk pays for ignoring the correlation with a lower effective sample size per iteration.
dat <- list(y1 = y[, 1], y2 = y[, 2])
fit_rwm <- gpu_metropolis(model, data = dat, n_iter = 8000, n_chains = 16,
method = "rwm", seed = 1, backend = "cpu")
fit_de <- gpu_metropolis(model, data = dat, n_iter = 8000, n_chains = 16,
method = "de", seed = 1, backend = "cpu")
eff <- function(fit) c(
ess_mu1 = ess(fit$draws[, , "mu1"], 0),
ess_mu2 = ess(fit$draws[, , "mu2"], 0),
corr = stats::cor(as.vector(fit$draws[, , "mu1"]),
as.vector(fit$draws[, , "mu2"]))
)
rbind(rwm = eff(fit_rwm), de = eff(fit_de))
#> ess_mu1 ess_mu2 corr
#> rwm 2223.738 2168.689 0.8992094
#> de 8387.983 8606.421 0.9008327The recovered correlation sits near the target \(0.9\) for both, and the Differential Evolution run reaches a higher effective sample size on the correlated geometry.
For \(n\) independent draws from a normal with known \(\Sigma\), the Fisher information about the mean is \(I(\mu) = n\,\Sigma^{-1}\), so the Cramer-Rao lower bound on the covariance of an unbiased estimator is
\[ I(\mu)^{-1} = \Sigma / n , \]
which is exactly the posterior covariance here. This model is regular and the posterior is Gaussian, so the Bernstein-von Mises identification of the posterior covariance with the inverse Fisher information is not an approximation but an equality; the comparison is a clean check that the sampler recovers the information-bound geometry rather than a frequentist claim layered onto a Bayesian fit. The estimator-theory reading is that the posterior mean here is consistent and asymptotically efficient: it concentrates on the true mean and attains the Cramer-Rao bound, so the bound is the natural yardstick for its spread.
The package computes the bound directly with
gpum_crlb(), which forms the observed Fisher information by
a central-difference Hessian of the same compiled log-likelihood the
sampler used, then inverts it. It returns the bound beside the empirical
posterior spread.
cr <- gpum_crlb(fit_de, data = dat)
cr
#> <gpum_crlb>
#> parameter posterior_sd cramer_rao_sd ratio
#> mu1 0.04447 0.04472 0.9944
#> mu2 0.04468 0.04472 0.9992
#>
#> posterior correlation 0.901, Cramer-Rao 0.900
#>
#> Frequentist asymptotic reference (inverse Fisher information), prior-free. A weak/flat prior is assumed; an informative prior makes the posterior tighter.The reported bound matches the analytic inverse Fisher information
solve(n * Sigma^{-1}) = Sigma / n for this model, and the
posterior spread from the Differential Evolution draws tracks it to
within Monte Carlo error, correlation included.
analytic_crlb <- solve(n * Si) # = Sigma / n, in closed form
round(cbind(gpum_crlb = cr$crlb_sd,
analytic = sqrt(diag(analytic_crlb))), 4)
#> gpum_crlb analytic
#> [1,] 0.0447 0.0447
#> [2,] 0.0447 0.0447The package draws the comparison in one call.
gpum_pairs(fit, crlb = cr) shows the marginal posteriors
with their highest density intervals on the diagonal and, off the
diagonal, the bivariate draws with their credible-region contour in
black and the Cramer-Rao ellipse in red. Reading it: the black empirical
region and the red theoretical ellipse coincide when the recovered
spread matches the information bound, which is the case here.
The plot() method of the Cramer-Rao object gives the
same verdict as a bar chart, the posterior standard deviation against
the bound, parameter by parameter.
The Cramer-Rao comparison at a single \(n\) is one slice of a sharper statement: as the data grow, the posterior mean concentrates on the truth (consistency) and its spread shrinks at the rate \(n^{-1/2}\) set by the bound (asymptotic efficiency). The appendix below makes both visible by refitting the same model at a sequence of sample sizes and reading off the error of the posterior mean against the truth and the posterior standard deviation against the closed-form \(\sqrt{1/n}\) reference. This is a simulation check, possible only because the truth is known here; it is a validation of the theory, not a per-fit diagnostic.
ns <- c(125, 500, 2000)
rows <- lapply(ns, function(nn) {
yy <- matrix(stats::rnorm(nn * 2), nn, 2) %*% L
yy[, 1] <- yy[, 1] + 2
yy[, 2] <- yy[, 2] - 1
f <- gpu_metropolis(model, data = list(y1 = yy[, 1], y2 = yy[, 2]),
n_iter = 4000, n_chains = 12, method = "de",
seed = 1, backend = "cpu")
m1 <- as.vector(f$draws[, , "mu1"])
data.frame(n = nn,
mean_error = abs(mean(m1) - 2), # distance of the estimate to truth
post_sd = stats::sd(m1),
crlb_sd = sqrt(1 / nn)) # Sigma_11 / n, here Sigma_11 = 1
})
tab <- do.call(rbind, rows)
tab$sd_ratio <- tab$post_sd / tab$crlb_sd
tab
#> n mean_error post_sd crlb_sd sd_ratio
#> 1 125 0.031489967 0.08889299 0.08944272 0.9938538
#> 2 500 0.009717197 0.04525701 0.04472136 1.0119774
#> 3 2000 0.019773891 0.02247282 0.02236068 1.0050150The error of the posterior mean stays within the posterior standard
deviation at every size and shrinks with it as the sample grows, the
signature of consistency; the single-replication error is itself noisy,
so it is the trend and the scale, not a strict ordering, that carry the
point. The posterior standard deviation tracks the \(\sqrt{1/n}\) reference with a ratio near
one at every size, the signature of asymptotic efficiency: it roughly
halves each time \(n\) quadruples, as
the printed post_sd column against crlb_sd
shows. Consistency is the more robust of the two: it can hold on
identifiable models where the normal approximation behind the efficiency
rate does not, which is why gpum_crlb() reports the bound
only where the stronger conditions also hold.
The agreement above is what theory predicts for a regular Gaussian
model; it is a validation of the sampler, not a discovery. The reference
is worth stating precisely because it does not transfer to the harder
targets the package also serves: the bimodal M2 of the
parallel-tempering chapter and any copula parameter near the
independence boundary violate the regularity the Cramer-Rao bound
assumes, and the posterior there is not asymptotically normal. The
lesson of this chapter is narrow and exact: on a regular correlated
target, method = "de" recovers the full covariance,
correlation included, that the inverse Fisher information
prescribes.