This vignette is the second
chapter of the package’s worked-case companion. The first chapter,
m2_parallel_tempering, used a synthetic separated bimodal
target to show why method = "pt" is needed. This chapter
takes the same machinery to a real-world question: when annual extremes
come from two distinct weather regimes, can the package recover both
regimes and decide whether a single Gumbel is enough?
The setting is the textbook one for univariate block maxima: each year contributes its maximum, and the resulting series is modelled as a Gumbel or a Generalised Extreme Value distribution (Coles 2001). When the site sits at the edge of two climatic regimes the same series carries two families of extremes, say summer-driven and winter-driven maxima, and a single Gumbel is then the wrong model.
evd::oxford records the annual maximum daily temperature
in degrees Fahrenheit at Oxford, England, from 1901 to 1980
(n = 80). It is a standard didactic dataset in
evd (Stephenson 2002).
data(oxford)
y_ox <- as.numeric(oxford)
summary(y_ox)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 75.00 83.00 85.00 85.33 88.25 95.00
diptest::dip.test(y_ox)
#>
#> Hartigans' dip test for unimodality / multimodality
#>
#> data: y_ox
#> D = 0.05625, p-value = 0.0531
#> alternative hypothesis: non-unimodal, i.e., at least bimodalThe dip test sits right at the conventional boundary, p
near 0.05. This is the “tie zone” described in CATALOG_DESIGN.md:
the test does not clearly reject unimodality and does not comfortably
accept it. The catalogue policy is then to fit both a unimodal candidate
and a mixture challenger and compare by WAIC.
The Gumbel log-density up to an additive constant is
-(y - mu)/beta - exp(-(y - mu)/beta) - log(beta); with the
scale reparametrised as lb = log(beta) the parameter is
unbounded:
model_gumbel <- gpum_model(
loglik = ~ -(y - mu) / exp(lb)
- exp(-(y - mu) / exp(lb))
- lb,
params = c("mu", "lb"), data = "y"
)
fit_single <- gpu_metropolis(
model_gumbel, data = list(y = y_ox),
proposal_sd = c(0.5, 0.1),
init = matrix(c(median(y_ox), log(IQR(y_ox) / 2)),
nrow = 4, ncol = 2, byrow = TRUE),
n_iter = 4000, n_chains = 4, seed = 1, backend = "cpu"
)
fit_single
#> <gpum_fit>
#> parameters : mu, lb
#> method : rwm
#> backend : cpu
#> chains : 4
#> iterations : 2000 per chain (4000 raw, 2000 adaptive warmup discarded)
#> accept_rate : 0.242 to 0.266
#> mu : posterior mean 83.1812 (sd 0.5041)
#> lb : posterior mean 1.4341 (sd 0.0793)The posterior summary recovers mu near 83 (degrees
Fahrenheit) and beta = exp(lb) near 4.2, the textbook fit.
gpum_diagnose(fit_single) opens the convergence panel.
gpum_diagnose(fit_single)
#> <gpum_diagnose: Converged>
#> method rwm, backend cpu, chains 4, iterations 2000 (warmup 2000, adaptive)
#>
#> parameter mean sd q2.5 q50 q97.5 Rhat ESS MCSE
#> mu 83.1812 0.50414 82.2337 83.1708 84.1708 1.0037 717.1473 0.01883
#> lb 1.4341 0.07926 1.2818 1.4319 1.5945 1.0010 966.3131 0.00255
#>
#> R-hat below 1.01 and ESS at or above 400 in every parameter.When the modes overlap heavily, as they do in Oxford, the mixture
posterior on (p, mu_1, beta_1, mu_2, beta_2) is weakly
identified: the data can be reasonably explained by a single Gumbel, so
the posterior has a wide ridge where one component dominates and the
other becomes a “ghost” whose parameters drift. Fitting a mixture on
Oxford without informative priors or hard order constraints produces a
fit whose nominal cold-chain summary is unstable. The catalogue layer of
v0.6.0 will handle this through default priors and an order constraint;
the next section shows the mixture recovery in a regime where the data
carry it cleanly.
A small simulation captures the structure of a station near a climatic edge. Each year is in either a cool-extreme regime or a warm-extreme regime with respective Gumbel parameters; the annual maximum is drawn from the corresponding distribution.
set.seed(2026)
rgumbel <- function(n, mu, beta) mu - beta * log(-log(runif(n)))
n_years <- 60
regime <- rbinom(n_years, size = 1, prob = 0.65)
y <- ifelse(regime == 1,
rgumbel(n_years, mu = 36, beta = 2.5),
rgumbel(n_years, mu = 22, beta = 3.0))
hist(y, breaks = 25, freq = FALSE,
main = "Simulated annual maxima at a climatic-edge station",
xlab = "annual maximum (deg)", col = "grey85", border = "white")The dip test does not reject unimodality on this sample, even though by construction the data come from a mixture:
That is the case the catalogue worries about: the dip test gives the “unimodal” verdict, yet the data come from a mixture. The decision between a single Gumbel and a mixture is settled below by WAIC, the predictive criterion the package computes from the pointwise log-likelihood, once both models are fitted to this sample.
The 2-component Gumbel mixture log-density is
log(p * f_1(y) + (1 - p) * f_2(y)), the canonical
log-sum-exp pattern that the DSL handles through log,
exp, +. The reparametrisation maps every
parameter to the real line so random-walk Metropolis can propose
freely:
eta_p in R, with
p = 1 / (1 + exp(-eta_p));lb1, lb2 in R, with
beta_j = exp(lb_j);mu1, mu2 in R,
unconstrained.model_mix <- gpum_model(
loglik = ~ log(
(1 / (1 + exp(-eta_p))) *
exp(-((y - mu1) / exp(lb1)) - exp(-((y - mu1) / exp(lb1)))) / exp(lb1)
+ (1 - 1 / (1 + exp(-eta_p))) *
exp(-((y - mu2) / exp(lb2)) - exp(-((y - mu2) / exp(lb2)))) / exp(lb2)
),
params = c("eta_p", "mu1", "lb1", "mu2", "lb2"),
data = "y"
)The posterior of a mixture has at least two equivalent labellings (component 1 and component 2 can switch with no change in likelihood), so the posterior is genuinely multimodal in the parameter space even when the data marginal is unimodal. Parallel tempering is the right sampler. The cold chain visits each labelling and the swap step keeps the chain from sticking to one of them; a post-processing step then fixes a canonical labelling.
qs <- quantile(y, c(0.2, 0.8))
init <- cbind(eta_p = rep(0, 8),
mu1 = rep(qs[1], 8) + rnorm(8, 0, 1),
lb1 = rep(log(2), 8),
mu2 = rep(qs[2], 8) + rnorm(8, 0, 1),
lb2 = rep(log(2), 8))
fit_mix <- gpu_metropolis(
model_mix, data = list(y = y),
proposal_sd = c(0.3, 0.4, 0.1, 0.4, 0.1),
init = init, n_iter = 8000, n_chains = 8,
method = "pt", seed = 11, backend = "cpu"
)
fit_mix
#> <gpum_fit>
#> parameters : eta_p, mu1, lb1, mu2, lb2
#> method : pt
#> backend : cpu
#> chains : 8
#> iterations : 4000 per chain (8000 raw, 4000 adaptive warmup discarded)
#> accept_rate : 0.039 to 0.430
#> swap accept : pairs (1-7) mean 0.410 to 0.772
#> eta_p (T=1) : posterior mean -1.2134 (sd 0.3312)
#> mu1 (T=1) : posterior mean 22.4242 (sd 1.2085)
#> lb1 (T=1) : posterior mean 1.1800 (sd 0.3030)
#> mu2 (T=1) : posterior mean 36.0542 (sd 0.3745)
#> lb2 (T=1) : posterior mean 0.8072 (sd 0.1330)The promise made above, that a mixture predicts this sample better
than a single Gumbel, is now computed rather than asserted. The single
Gumbel is fitted to the same simulated y, and
gpum_waic() reads the predictive accuracy of each from its
pointwise log-likelihood. Both log-likelihoods are normalised (the
single Gumbel carries its - lb term, the mixture its
/ exp(lb) factors), so the comparison is valid; lower WAIC
is the better predictor.
fit_single_y <- gpu_metropolis(
model_gumbel, data = list(y = y),
proposal_sd = c(0.5, 0.1),
init = matrix(c(median(y), log(IQR(y) / 2)), nrow = 4, ncol = 2,
byrow = TRUE),
n_iter = 4000, n_chains = 4, seed = 1, backend = "cpu"
)
waic_single <- gpum_waic(fit_single_y, data = list(y = y))
waic_mix <- gpum_waic(fit_mix, data = list(y = y))
c(single = waic_single$waic, mixture = waic_mix$waic)
#> single mixture
#> 418.0165 361.6134The mixture has the lower WAIC, the verdict the dip test could not
give. The WAIC is invariant to the label switching of the next section,
so it is read here directly from the cold-chain draws.
gpum_loo() reports the same ranking through Pareto-smoothed
leave-one-out cross-validation, with the Pareto k
diagnostic flagging any observation whose estimate is unreliable.
if (requireNamespace("loo", quietly = TRUE)) gpum_loo(fit_mix, data = list(y = y))
#>
#> Computed from 4000 by 60 log-likelihood matrix.
#>
#> Estimate SE
#> elpd_loo -180.9 7.3
#> p_loo 4.3 0.7
#> looic 361.8 14.6
#> ------
#> MCSE of elpd_loo is 0.0.
#> MCSE and ESS estimates assume independent draws (r_eff=1).
#>
#> All Pareto k estimates are good (k < 0.7).
#> See help('pareto-k-diagnostic') for details.The mixture log-density is invariant under swapping component 1 and
component 2 (and simultaneously flipping eta_p to
-eta_p). The canonical labelling enforced here is
mu1 < mu2: every cold-chain draw in which
mu1 > mu2 has its component labels swapped.
draws <- fit_mix$draws[, 1L, , drop = TRUE]
swap_idx <- draws[, "mu1"] > draws[, "mu2"]
if (any(swap_idx)) {
draws[swap_idx, ] <- draws[swap_idx, c("eta_p", "mu2", "lb2", "mu1", "lb1")]
draws[swap_idx, "eta_p"] <- -draws[swap_idx, "eta_p"]
}
p <- 1 / (1 + exp(-draws[, "eta_p"]))
mu_cool <- draws[, "mu1"]; beta_cool <- exp(draws[, "lb1"])
mu_warm <- draws[, "mu2"]; beta_warm <- exp(draws[, "lb2"])
post <- data.frame(
parameter = c("p (cool)", "mu_cool", "beta_cool", "mu_warm", "beta_warm"),
mean = c(mean(p), mean(mu_cool), mean(beta_cool),
mean(mu_warm), mean(beta_warm)),
sd = c(sd(p), sd(mu_cool), sd(beta_cool),
sd(mu_warm), sd(beta_warm)),
truth = c(0.35, 22, 3.0, 36, 2.5)
)
print(post, row.names = FALSE)
#> parameter mean sd truth
#> p (cool) 0.2341972 0.05766285 0.35
#> mu_cool 22.4241515 1.20853642 22.00
#> beta_cool 3.4168659 1.16149372 3.00
#> mu_warm 36.0542177 0.37449097 36.00
#> beta_warm 2.2615647 0.30441433 2.50Posterior means recover the four Gumbel parameters within a posterior
standard deviation; the mixing weight is slightly underestimated, the
usual signal that for n = 60 the data identify the modes
but not their exact balance.
A direct visual check: generate new annual maxima from the fitted
mixture and compare their distribution against the observed data.
gpum_ppc() draws the posterior predictive sample, with the
family’s simulator passed in one line (pick a component by its weight,
then a Gumbel draw), and gpum_density_compare() overlays
the densities. Label switching does not matter here, since the
predictive distribution is the same in either labelling.
pp <- gpum_ppc(fit_mix, function(q) {
if (stats::runif(1) < 1 / (1 + exp(-q["eta_p"]))) {
q["mu1"] - exp(q["lb1"]) * log(-log(stats::runif(1)))
} else {
q["mu2"] - exp(q["lb2"]) * log(-log(stats::runif(1)))
}
})
gpum_density_compare(y, list(mixture = pp),
main = "Posterior predictive vs observed",
xlab = "annual maximum (deg)")The two densities overlap; the slight differences at the two peaks
are within the sampling variation of n = 60 annual
maxima.
A fair benchmark of gpumetropolis against existing R
practice on extreme-value mixtures asks two distinct questions: how does
the single Gumbel fit compare to the established evd::fgev
route, and how does the 2-component Gumbel mixture fit compare to what
users would otherwise reach for. The two answers are quite different and
the package’s value proposition is in the second one.
evd::fgev is the standard one-liner for fitting a Gumbel
(or full GEV) by maximum likelihood. On the Oxford series:
fit_evd <- evd::fgev(y_ox, shape = 0) # shape = 0 -> Gumbel
fit_evd$estimate
#> loc scale
#> 83.199562 4.157983
sqrt(diag(fit_evd$var.cov)) # standard errors
#> [1] 0.4932118 0.3370603The MLE point estimates of loc (mu) and
scale (beta) agree with the Bayesian posterior
means of fit_single above to within the posterior standard
deviation. The MLE pipeline is faster (a single optimisation) and is the
right choice when the goal is just the point estimate plus delta-method
standard errors. The Bayesian path through gpumetropolis
adds: a full posterior, propagation of uncertainty into predictive
quantities (return levels, quantile credible bands), and the portability
of one kernel source across CPU, CUDA and Vulkan when the log-likelihood
is expensive over a large series.
Honest verdict on the single-component case: the established
evd pipeline is competitive in accuracy and faster in
wall-clock time; gpumetropolis is the right choice when the
user already wants a full Bayesian workflow or wants the same code path
for downstream mixture and copula inference. Neither dominates the other
on the Tier A unimodal Gumbel taken alone.
There is no R package today that fits a k-component
Gumbel mixture through a single function call. The closest options
are:
| Path | What it covers | Effort |
|---|---|---|
evd, extRemes, ismev,
eva |
single Gumbel / GEV / GP | low, mature |
evmix |
bulk-and-tail hybrid models (Normal bulk plus GP tail) | low for the supported families, not a k-component
Gumbel mixture |
mclust, mixR, flexmix,
mixtools |
finite mixtures, almost exclusively of Gaussians | low for Gaussian mixtures, requires custom distribution code for Gumbel |
evdbayes |
Bayesian inference for the single GEV via Metropolis | low for single GEV, requires manual extension for mixtures |
Custom rstan / cmdstanr |
any user-coded mixture, including Gumbel | medium to high; needs Stan code, posterior label-switching handling and warm-up tuning |
Custom nimble |
same as Stan with BUGS-style syntax | medium; nimble does not ship dgumbel, the user writes a
custom distribution |
The honest comparison for the mixture case is therefore against
custom-written Stan or nimble code. gpumetropolis offers
three things in that comparison:
gpumetropolis it is method = "pt". In a
Stan or nimble workflow the user typically reaches for relabelling
tricks, ordered constraints or hand-coded swap moves.n = 80 the GPU is not faster than the CPU; the portability
matters when the workflow scales to longer series or to many parallel
chains.A fair speed comparison is harder to state in one number because the
mixture pipeline is not standardised. The closest informal benchmark:
running the simulation fit above with nimble requires (i) a
custom dgumbel distribution, (ii) a careful order
constraint on the means, (iii) a longer warm-up to manage
label-switching without tempering. The user who is willing to write that
code gets results comparable to the gpumetropolis mixture
in wall-clock; the user who wants a one-formula path benefits from this
package.
The catalogue layer of v0.6.0 (next section) is the design that
closes the remaining gap:
gpum_fit_catalog(y, catalog = c("gumbel", "gumbel_mix2"))
will rank the two candidates by WAIC and pick the better one, the
practical equivalent of the established workflow for single Gumbel plus
the missing standard for the mixture.
The whole script above is the v0.3.0 user experience for an
extreme-value mixture: write the DSL formula, reparametrise to unbounded
coordinates, set per-chain initial values that bracket the modes, pick
method = "pt", post-process the labels.
The v0.6.0 release, the marginal auto-selection described in CATALOG_DESIGN.md,
will collapse the workflow to a single call:
# Not available until v0.6.0
result <- gpum_fit_catalog(
data = y,
catalog = c("gumbel", "gumbel_mix2"),
ranking = "WAIC"
)
plot(result)At that point the catalogue layer chooses the candidates from the
data type, runs the unimodal Gumbel and the 2-component Gumbel mixture
in parallel, ranks them by WAIC, handles the reparametrisation, enforces
the mu_1 < mu_2 constraint, and opens a posterior
predictive plot for the top model. Until then the workflow above is the
honest manual path.