6. Reference Guide
6.1. Probability density
Collect probability densities
Collect abstract base probability density classes
- class pypmc.density.base.LocalDensity[source]
Bases:
object
Abstract base class for a local probability density. Can be used as proposal for the Markov chain sampler.
- class pypmc.density.base.ProbabilityDensity[source]
Bases:
object
Abstract base class of a probability density. Can be used as proposal for the importance sampler.
- evaluate(x)[source]
Evaluate log of the density to propose x, namely log(q(x)).
- Parameters:
x – Vector-like array; the proposed point.
- multi_evaluate(x, out=None)[source]
Evaluate log of the density to propose x, namely log(q(x)) for each row in x.
- Parameters:
x – Matrix-like array; the proposed points. Expect the i-th point to be accessible as x[i].
out – Vector-like array of length len(x), optional; if provided, the output is written into this array.
6.1.1. Gauss
Collect Gaussian probability densities
- class pypmc.density.gauss.Gauss(self, mu, sigma)
Bases:
ProbabilityDensity
A Gaussian probability density. Can be used as a component in MixtureDensities.
- Parameters:
mu – Vector-like array; the Gaussian’s mean \(\mu\)
sigma – Matrix-like array; the Gaussian’s covariance matrix \(\Sigma\)
- evaluate(self, ndarray x)
Evaluate log of the density to propose x, namely log(q(x)).
- Parameters:
x – Vector-like array; the proposed point.
- multi_evaluate(self, ndarray x, ndarray out=None)
Evaluate log of the density to propose x, namely log(q(x)) for each row in x.
- Parameters:
x – Matrix-like array; the proposed points. Expect the i-th point to be accessible as x[i].
out – Vector-like array of length len(x), optional; if provided, the output is written into this array.
- propose(self, int N=1, rng=_np.random.mtrand)
- propose(self, N=1, rng=numpy.random.mtrand) → None
Propose N points using the random number generator rng.
- Parameters:
N – Integer; the number of random points to be proposed
rng – State of a random number generator like numpy.random.mtrand
Important
rng must meet the requirements of LocalGauss.propose.
- update(self, mu, sigma)
Re-initialize the density with new mean and covariance matrix.
- Parameters:
mu – Vector-like array; the Gaussian’s mean \(\mu\)
sigma – Matrix-like array; the Gaussian’s covariance matrix \(\Sigma\)
Note
On LinAlgError, the old mu and sigma are plugged in and the proposal remains in a valid state.
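- Example (a minimal sketch; the mean, covariance, and number of proposed points are illustrative):
import numpy as np
import pypmc

# 2D Gaussian density with mean mu and covariance sigma
mu    = np.array([1.0, -1.0])
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
gauss = pypmc.density.gauss.Gauss(mu, sigma)

samples = gauss.propose(100)             # (100 x 2) array of proposed points
log_q   = gauss.multi_evaluate(samples)  # log(q(x)) for every row of samples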
- class pypmc.density.gauss.LocalGauss(self, sigma)
Bases:
LocalDensity
A multivariate local Gaussian density with redefinable covariance.
- Parameters:
sigma – Matrix-like array; covariance-matrix.
- evaluate(self, ndarray x, ndarray y)
Evaluate log of the density to propose x given y, namely log(q(x|y)).
- Parameters:
x – Vector-like array; the proposed point
y – Vector-like array; the current point
- propose(self, y, rng=_np.random.mtrand)
- propose(self, y, rng=numpy.random.mtrand) → None
Propose a new point given y using the random number generator rng.
- Parameters:
y – Vector-like array; the current position
rng – State of a random number generator like numpy.random.mtrand
Important
rng must return a numpy array of N samples from rng.normal(0,1,N): standard Gaussian distribution
- update(self, sigma)
Re-initialize the proposal with a new covariance matrix.
- Parameters:
sigma – Matrix-like array; the new covariance-matrix.
Note
On LinAlgError, the old covariance matrix is plugged in and the proposal remains in a valid state.
6.1.2. StudentT
Collect Student’s t probability densities
- class pypmc.density.student_t.LocalStudentT(self, sigma, double dof)
Bases:
LocalGauss
A multivariate local Student’s t density with redefinable covariance.
- Parameters:
sigma – Matrix-like array; the covariance-matrix
dof – Float; the degrees of freedom
- evaluate(self, x, y)
Evaluate log of the density to propose x given y, namely log(q(x|y)).
- Parameters:
x – Vector-like array; the proposed point
y – Vector-like array; the current point
- propose(self, y, rng=_np.random.mtrand)
- propose(self, y, rng=numpy.random.mtrand) → None
Propose a new point given y using the random number generator rng.
- Parameters:
y – Vector-like array; the current position
rng – State of a random number generator like numpy.random.mtrand
Important
rng must return
a numpy array of N samples from rng.normal(0,1,N): standard Gaussian distribution
a sample as float from rng.chisquare(degree_of_freedom): chi-squared distribution
- class pypmc.density.student_t.StudentT(self, mu, sigma, double dof)
Bases:
ProbabilityDensity
A Student’s t probability density. Can be used as a component in MixtureDensities.
- Parameters:
mu – Vector-like array; the mean \(\mu\)
sigma – Matrix-like array; the covariance matrix \(\Sigma\)
dof – Float; the degrees of freedom \(\nu\)
- evaluate(self, ndarray x)
Evaluate log of the density to propose x, namely log(q(x)).
- Parameters:
x – Vector-like array; the proposed point.
- multi_evaluate(self, ndarray x, ndarray out=None)
Evaluate log of the density to propose x, namely log(q(x)) for each row in x.
- Parameters:
x – Matrix-like array; the proposed points. Expect the i-th point to be accessible as x[i].
out – Vector-like array of length len(x), optional; if provided, the output is written into this array.
- propose(self, int N=1, rng=_np.random.mtrand)
- propose(self, N=1, rng=numpy.random.mtrand) → None
Propose N points using the random number generator rng.
- Parameters:
N – Integer; the number of random points to be proposed
rng – State of a random number generator like numpy.random.mtrand
Important
rng must meet the requirements of LocalStudentT.propose.
- update(self, mu, sigma, double dof)
Re-initialize the density with new mean, covariance matrix and degrees of freedom.
- Parameters:
mu – Vector-like array; the mean \(\mu\)
sigma – Matrix-like array; the covariance matrix \(\Sigma\)
dof – Float; the degrees of freedom \(\nu\)
Note
On LinAlgError, the old mu, sigma, and dof are plugged in and the proposal remains in a valid state.
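- Example (a minimal sketch; the mean, covariance, and degrees of freedom are illustrative):
import numpy as np
from pypmc.density.student_t import StudentT

t = StudentT(np.zeros(2), np.eye(2), 5.0)   # 2D Student's t with dof = 5
points = t.propose(100)                     # 100 proposed points
log_q  = t.multi_evaluate(points)           # log(q(x)) for each point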
6.1.3. Mixture
Collect mixture probability densities
- class pypmc.density.mixture.MixtureDensity(self, components, weights=None)
Bases:
ProbabilityDensity
Base class for mixture probability densities.
- Parameters:
components – Iterable of ProbabilityDensities; the mixture’s components
weights – Iterable of floats; the weights of the components (will be normalized automatically during initialization)
- evaluate(self, x, individual=False)
Evaluate log of the density to propose x, namely log(q(x)).
- Parameters:
x – Vector-like array; the proposed point.
individual – Bool; if True, return the evaluation of each component at x as an array.
Important
This function assumes that the weights are normalized, i.e. that self.normalized() returns True.
- multi_evaluate(self, ndarray x, ndarray out=None, ndarray individual=None, components=None)
Evaluate the density at all points in x for all components specified in components and return them in individual. Return log(q(x)) for each point (if components is None).
Use individual if you need the density at x for each component and you want to compute it only once to increase efficiency.
- Parameters:
x – (N x D) array; one D-dim. sample per row.
out – (N) array, optional; write output into this array (if components is None).
individual – (N x K) array; density of the k-th component at the n-th sample.
components – Iterable of integers; calculate individual for these components only.
- normalize(self)
Normalize the component weights to sum up to 1.
- normalized(self)
Check if the component weights are normalized.
- propose(self, N=1, rng=_np.random.mtrand, trace=False, shuffle=True)
Propose N points using the random number generator rng.
- Parameters:
N – Integer; the number of random points to be proposed
rng – State of a random number generator like numpy.random.mtrand
shuffle – Bool; if True (default), the samples are shuffled. Otherwise, the samples are ordered by component.
trace – Bool; if True, return the proposed samples and an array containing the index of the component responsible for each sample; otherwise just return the samples.
Important
rng must:
return a numpy array of N samples from the multinomial distribution with probabilities pvals when calling rng.multinomial(N, pvals)
shuffle an array in place when calling rng.shuffle(array)
Important
This function assumes that the weights are normalized, i.e. that self.normalized() returns True.
- prune(self, threshold=0.0)
Remove components with weight less than or equal to threshold. Return a list of removed components and weights in the form [(index, component, weight), …].
- Parameters:
threshold – Float; components with lower weight are deleted
- pypmc.density.mixture.create_gaussian_mixture(means, covs, weights=None)
Create a MixtureDensity with Gaussian (Gauss) components. The output can be used for the clustering algorithms.
- Parameters:
means – Vector-like array; the means of the Gaussian mixture
covs – 3d-matrix-like array; the covariances of the Gaussian mixture. covs[i] will be interpreted as the i-th covariance matrix.
weights – Vector-like array, optional; the component weights. If not provided, all components will get equal weight.
Note
The weights will automatically be normalized.
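- Example (a minimal sketch of a two-component mixture; means, covariances, and weights are illustrative):
import numpy as np
from pypmc.density.mixture import create_gaussian_mixture

means   = np.array([[-2.0, 0.0], [2.0, 0.0]])
covs    = np.array([np.eye(2), np.eye(2)])
weights = np.array([0.7, 0.3])             # normalized automatically

mix = create_gaussian_mixture(means, covs, weights)
points = mix.propose(1000)                 # draw from the mixture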
- pypmc.density.mixture.create_t_mixture(means, covs, dofs, weights=None)
Create a MixtureDensity with Student t (StudentT) components. The output can be used for the clustering algorithms.
- Parameters:
means – Vector-like array; the means of the mixture
covs – 3d-matrix-like array; the covariances of the mixture. covs[i] will be interpreted as the i-th covariance matrix.
dofs – Vector-like array; the degrees of freedom
weights – Vector-like array, optional; the component weights. If not provided, all components will get equal weight.
Note
The weights will automatically be normalized.
- pypmc.density.mixture.recover_gaussian_mixture(mixture)
Extract the means, covariances, and component weights from a MixtureDensity.
- Parameters:
mixture – MixtureDensity with Gaussian (Gauss) components; the mixture to be decomposed.
- pypmc.density.mixture.recover_t_mixture(mixture)
Extract the means, covariances, degrees of freedom, and component weights from a MixtureDensity.
- Parameters:
mixture – MixtureDensity with Student t (StudentT) components; the mixture to be decomposed.
6.2. Sampler
Collect the sampler modules
6.2.1. Markov Chain
Collect Markov Chain
- class pypmc.sampler.markov_chain.MarkovChain(target, proposal, start, indicator=None, rng=numpy.random.mtrand)[source]
A Markov chain to generate samples from the target density.
- Parameters:
target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.
proposal – The proposal density q. Should be of type pypmc.density.base.LocalDensity.
Hint
If your proposal density is symmetric, define the member variable proposal.symmetric = True. This will omit calls to proposal.evaluate in the Metropolis-Hastings steps.
start – The starting point of the Markov chain. (numpy array)
indicator – The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True, otherwise the proposed point is rejected without call to target. Use this function to specify the support of the target.
See also
prealloc – Integer; the number of Markov chain samples for which memory in self.samples is allocated. If more memory is needed, it will be allocated on demand.
Hint
Preallocating memory can speed up the calculation, in particular if it is known in advance how long the chains are run.
save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values.
rng – The rng passed to the proposal when calling proposal.propose.
Important
rng must return a sample from the uniform distribution in [0,1) when calling rng.rand().
See also
rng must also fulfill the requirements of your proposal: pypmc.density.base.LocalDensity.propose
- class pypmc.sampler.markov_chain.AdaptiveMarkovChain(target, proposal, start, indicator=None, rng=numpy.random.mtrand)[source]
A Markov chain with proposal covariance adaptation as in [HST01] to generate samples from the target density.
- Parameters:
target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.
proposal – The proposal density q. Should be of type pypmc.density.base.LocalDensity.
Hint
If your proposal density is symmetric, define the member variable proposal.symmetric = True. This will omit calls to proposal.evaluate in the Metropolis-Hastings steps.
start – The starting point of the Markov chain. (numpy array)
indicator – The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True, otherwise the proposed point is rejected without call to target. Use this function to specify the support of the target.
See also
prealloc – Integer; the number of Markov chain samples for which memory in self.samples is allocated. If more memory is needed, it will be allocated on demand.
Hint
Preallocating memory can speed up the calculation, in particular if it is known in advance how long the chains are run.
save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values.
rng – The rng passed to the proposal when calling proposal.propose.
Important
rng must return a sample from the uniform distribution in [0,1) when calling rng.rand().
See also
rng must also fulfill the requirements of your proposal: pypmc.density.base.LocalDensity.propose
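- Example (a minimal sketch; the target, proposal covariance, and run/adapt cycle are illustrative and follow the usage in the pypmc examples, where run draws a batch of samples and adapt updates the proposal covariance):
import numpy as np
import pypmc

def log_target(x):
    # unnormalized log of a standard 2D Gaussian target
    return -0.5 * x.dot(x)

prop  = pypmc.density.gauss.LocalGauss(np.eye(2))    # local proposal q(x|y)
start = np.array([2.0, 2.0])
mc    = pypmc.sampler.markov_chain.AdaptiveMarkovChain(log_target, prop, start)

for _ in range(10):
    mc.run(100)     # draw 100 Markov chain samples
    mc.adapt()      # update the proposal covariance as in [HST01]

samples = mc.samples[:]   # all visited points as one array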
6.2.2. Importance Sampling
Some useful tools for importance sampling. The main class is ImportanceSampler and there are some utility functions.
- pypmc.sampler.importance_sampling.calculate_covariance(samples, weights)[source]
Calculates the covariance matrix of weighted samples (like the output of an importance-sampling run).
- Parameters:
samples – Matrix-like numpy array; the samples to be used.
weights – Vector-like numpy array; the (unnormalized) importance weights.
- pypmc.sampler.importance_sampling.calculate_expectation(samples, weights, f)[source]
Calculate the expectation value of function f using weighted samples (like the output of an importance-sampling run).
Denoting \(x_n\) as sample n and \(w_n\) as its (normalized) weight, the following is returned:
\[\sum_{n=1}^{N} w_n f(x_n) \mathrm{\ \ where\ \ } \sum_{n=1}^{N} w_n \overset{!}{=} 1\]
- Parameters:
samples – Matrix-like numpy array; the samples to be used.
weights – Vector-like numpy array; the (unnormalized) importance weights.
f – Callable, the function to be evaluated.
- pypmc.sampler.importance_sampling.calculate_mean(samples, weights)[source]
Calculate the mean of weighted samples (like the output of an importance-sampling run).
- Parameters:
samples – Matrix-like numpy array; the samples to be used.
weights – Vector-like numpy array; the (unnormalized) importance weights.
- pypmc.sampler.importance_sampling.combine_weights(samples, weights, proposals)[source]
Calculate the deterministic mixture weights according to [Cor+12] given samples, standard weights, and their proposals for a number of steps in which importance samples are computed for the same target density but different proposals.
Return the weights as a pypmc.tools.History such that the weights for each proposal are easily accessible.
- Parameters:
samples – Iterable of matrix-like arrays; the weighted samples whose importance weights shall be combined. One sample per row in each array; one array for each step, or different proposal.
weights – Iterable of 1D arrays; the standard importance weights \(P(x_i^t)/q_t(x_i^t)\). Each array in the iterable contains all weights of the samples of step t; the array’s size has to match the t-th entry in samples.
proposals – Iterable of pypmc.density.base.ProbabilityDensity instances; the proposal densities from which the samples have been drawn.
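- Example (a minimal sketch combining two importance-sampling steps with different proposals; the target, the proposals, and the one-column layout assumed for the weight histories are illustrative):
import numpy as np
import pypmc
from pypmc.sampler.importance_sampling import ImportanceSampler, combine_weights

def log_target(x):
    return -0.5 * x.dot(x)

q1 = pypmc.density.gauss.Gauss(np.zeros(2), 3.0 * np.eye(2))
q2 = pypmc.density.gauss.Gauss(np.zeros(2), 2.0 * np.eye(2))

samples, weights = [], []
for q in (q1, q2):
    s = ImportanceSampler(log_target, q)
    s.run(10**3)
    samples.append(s.samples[:])
    weights.append(s.weights[:][:, 0])    # standard weights of this step

combined = combine_weights(samples, weights, [q1, q2])   # a pypmc.tools.History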
- class pypmc.sampler.importance_sampling.ImportanceSampler(target, proposal, indicator=None, prealloc=0, rng=numpy.random.mtrand)[source]
An importance sampler; generates weighted samples from target using proposal.
- Parameters:
target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.
proposal – The proposal density q. Should be of type pypmc.density.base.ProbabilityDensity.
indicator – The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True. Otherwise, the proposed point will get zero weight without a call to target. Use this function to specify the support of the target.
See also
prealloc – Integer; the number of samples for which memory is preallocated. If more memory is needed, it will be allocated on demand.
Hint
Preallocating memory can speed up the calculation, in particular if it is known in advance how many samples are drawn.
save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values.
rng – The rng passed to the proposal when calling proposal.propose.
Important
rng must fulfill the requirements of your proposal: pypmc.density.base.ProbabilityDensity.propose
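- Example (a minimal sketch; the target and proposal are illustrative, and run plus the samples/weights histories are used as in the pypmc examples):
import numpy as np
import pypmc

def log_target(x):
    return -0.5 * x.dot(x)      # unnormalized log of a 2D Gaussian

prop    = pypmc.density.gauss.Gauss(np.zeros(2), 2.0 * np.eye(2))
sampler = pypmc.sampler.importance_sampling.ImportanceSampler(log_target, prop)
sampler.run(10**4)              # draw 10000 weighted samples

samples = sampler.samples[:]
weights = sampler.weights[:][:, 0]
mean    = pypmc.sampler.importance_sampling.calculate_mean(samples, weights)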
6.3. Mixture adaptation
Collect algorithms to adapt a pypmc.density.base.ProbabilityDensity to data.
6.3.1. Hierarchical clustering
Hierarchical clustering as described in [GR04]
- class pypmc.mix_adapt.hierarchical.Hierarchical(input_components, initial_guess)[source]
Bases:
object
Hierarchical clustering as described in [GR04].
Find a Gaussian mixture density \(g\) with components \(g_j\) that most closely matches the Gaussian mixture density specified by \(f\) and its components \(f_i\), but with fewer components. The algorithm is an iterative EM procedure alternating between a regroup and a refit step, and requires an initial_guess of the output density that defines the maximum number of components to use.
- Parameters:
input_components – pypmc.density.mixture.MixtureDensity with Gaussian (pypmc.density.gauss.Gauss) components; the Gaussian mixture to be reduced.
initial_guess – pypmc.density.mixture.MixtureDensity with Gaussian (pypmc.density.gauss.Gauss) components; initial guess for the EM algorithm.
- run(eps=0.0001, kill=True, max_steps=50, verbose=False)[source]
Perform the clustering on the input components, updating the initial guess. The result is available in the member self.g.
Return the number of iterations at convergence, or None.
- Parameters:
eps – If the relative change of the distance between the current and the last step falls below eps, declare convergence:
\[0 < \frac{d^t - d^{t-1}}{d^t} < \varepsilon\]
kill – If a component is assigned zero weight (no input components), it is removed.
max_steps – Perform at most this number of update steps.
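- Example (a minimal sketch reducing a 10-component mixture to at most 3 components; the random means and unit covariances are illustrative):
import numpy as np
from pypmc.density.mixture import create_gaussian_mixture
from pypmc.mix_adapt.hierarchical import Hierarchical

input_mix     = create_gaussian_mixture(np.random.randn(10, 2),
                                        np.array([np.eye(2)] * 10))
initial_guess = create_gaussian_mixture(np.random.randn(3, 2),
                                        np.array([np.eye(2)] * 3))

h = Hierarchical(input_mix, initial_guess)
h.run()
reduced = h.g    # the reduced MixtureDensity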
6.3.2. Variational Bayes
Variational clustering as described in [Bis06]
- pypmc.mix_adapt.variational.Dirichlet_log_C(alpha)
Compute the normalization constant of the Dirichlet distribution on the log scale, (B.23) of [Bis06].
- class pypmc.mix_adapt.variational.GaussianInference(self, ndarray data, int components=0, weights=None, initial_guess='first', **kwargs)
Bases:
object
Approximate a probability density by a Gaussian mixture with a variational Bayes approach. The motivation, notation, and derivation are explained in detail in chapter 10.2 of [Bis06].
Typical usage: call run until convergence. If interested in clustering/classification, extract the responsibility matrix as the attribute r. Else get the Gaussian mixture density at the mode of the variational posterior using make_mixture.
See also
Another implementation can be found at https://github.com/jamesmcinerney/vbmm.
- Parameters:
data – Matrix-like array; each of the \(N\) rows contains one \(D\)-dimensional sample from the probability density to be approximated.
components – Integer; \(K\) is the number of Gaussian components in the approximating Gaussian mixture. Will be detected from initial_guess if provided.
weights – Vector-like array; the i-th of the \(N\) entries contains the weight of the i-th sample in data. Weights must be nonnegative and finite.
initial_guess – String or pypmc.density.mixture.MixtureDensity with Gaussian (pypmc.density.gauss.Gauss) components.
Allowed string values:
“first”: initially place the components (defined by the mean parameter m) at the first K data points.
“random”: like “first”, but randomly select K data points. For reproducibility, set the seed with numpy.random.seed(123).
If a MixtureDensity, override other (default) values of the parameters m, W and alpha.
Default: “first”
All keyword arguments are processed by set_variational_parameters.
- E_step(self)
Compute expectation values and summary statistics.
- M_step(self)
Update parameters of the Gaussian-Wishart distribution.
- likelihood_bound(self)
Compute the lower bound on the true log marginal likelihood \(L(Q)\) given the current parameter estimates.
- make_mixture(self)
Return the mixture-density defined by the mode of the variational-Bayes estimate.
- posterior2prior(self)
Return references to posterior values of all variational parameters as dict.
Hint
GaussianInference(…, **output) creates a new instance using the inferred posterior as prior.
- prior_posterior(self)
Return references to prior and posterior values of all variational parameters as dict.
- prune(self, threshold=1.)
Delete components with an effective number of samples \(N_k\) below the threshold.
- Parameters:
threshold – Float; the minimum effective number of samples a component must have to survive.
- run(self, iterations=1000, prune=1., rel_tol=1e-10, abs_tol=1e-5, verbose=False)
Run variational-Bayes parameter updates and check for convergence using the change of the log likelihood bound of the current and the last step. Convergence is not declared if the number of components changed, or if the bound decreased. For the standard algorithm, the bound must increase, but for modifications, this useful property may not hold for all parameter values.
Return the number of iterations at convergence, or None.
- Parameters:
iterations – Maximum number of updates.
prune – Call prune after each update; i.e., remove components whose associated effective number of samples is below the threshold. Set prune=0 to deactivate. Default: 1 (effective samples).
rel_tol – Relative tolerance \(\epsilon\). If two consecutive values of the log likelihood bound, \(L_t, L_{t-1}\), are close, declare convergence. More precisely, check that
\[\left\| \frac{L_t - L_{t-1}}{L_t} \right\| < \epsilon .\]
abs_tol – Absolute tolerance \(\epsilon_{a}\). If the current bound \(L_t\) is close to zero (\(L_t < \epsilon_{a}\)), declare convergence if
\[\| L_t - L_{t-1} \| < \epsilon_a .\]
- set_variational_parameters(self, *args, **kwargs)
Reset the parameters to the submitted values or default.
Use this function to set the prior value (indicated by the subscript \(0\) as in \(\alpha_0\)) or the initial value (e.g., \(\alpha\)) used in the iterative procedure to find the values of the hyperparameters of the variational posterior distribution.
Every parameter can be set in two ways:
1. It is specified for only one component, then it is copied to all other components.
2. It is specified separately for each component as a \(K\) vector.
The prior and posterior variational distributions of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Lambda}\) for each component are given by
\[q(\boldsymbol{\mu}, \boldsymbol{\Lambda}) = q(\boldsymbol{\mu}|\boldsymbol{\Lambda}) q(\boldsymbol{\Lambda}) = \prod_{k=1}^K \mathcal{N}(\boldsymbol{\mu}_k|\boldsymbol{m_k},(\beta_k\boldsymbol{\Lambda}_k)^{-1}) \mathcal{W}(\boldsymbol{\Lambda}_k|\boldsymbol{W_k}, \nu_k),\]
where \(\mathcal{N}\) denotes a Gaussian and \(\mathcal{W}\) a Wishart distribution. The weights \(\boldsymbol{\pi}\) follow a Dirichlet distribution
\[q(\boldsymbol{\pi}) = Dir(\boldsymbol{\pi}|\boldsymbol{\alpha}).\]
Warning
This function may delete results obtained by update.
- Parameters:
alpha (alpha0,) – Float or \(K\) vector; parameter of the mixing coefficients’ probability distribution (prior: \(\alpha_0\), posterior initial value: \(\alpha\)).
\[\alpha_i > 0, i=1 \dots K.\]
A scalar is promoted to a \(K\) vector as
\[\boldsymbol{\alpha} = (\alpha,\dots,\alpha),\]
but a \(K\) vector is accepted, too.
Default:
\[\alpha = 10^{-5}.\]
beta (beta0,) – Float or \(K\) vector; \(\beta\) parameter of the probability distribution of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Lambda}\). The same restrictions as for alpha apply. Default:
\[\beta_0 = 10^{-5}.\]
nu (nu0,) – Float or \(K\) vector; degrees of freedom of the Wishart distribution of \(\boldsymbol{\Lambda}\). A well-defined Wishart distribution requires:
\[\nu_0 \geq D - 1.\]
The same restrictions as for alpha apply. Default:
\[\nu_0 = D - 1 + 10^{-5}.\]
m (m0,) – \(D\) vector or \(K \times D\) matrix; mean parameter for the Gaussian \(q(\boldsymbol{\mu_k}|\boldsymbol{m_k}, \beta_k \Lambda_k)\).
Default:
For the prior of each component:
\[\boldsymbol{m}_0 = (0,\dots,0)\]For initial value of the posterior, \(\boldsymbol{m}\): the sequence of \(K \times D\) equally spaced values in [-1,1] reshaped to \(K \times D\) dimensions.
Warning
If all \(\boldsymbol{m}_k\) are identical initially, they may remain identical. It is advisable to randomly scatter them in order to avoid singular behavior.
W (W0,) – \(D \times D\) or \(K \times D \times D\) matrix-like array; \(\boldsymbol{W}\) is a symmetric positive-definite matrix used in the Wishart distribution. Default: identity matrix in \(D\) dimensions for every component.
- update(self)
Recalculate the parameters (M step) and expectation values (E step) using the update equations.
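- Example (a minimal sketch; the data and number of components are illustrative):
import numpy as np
from pypmc.mix_adapt.variational import GaussianInference

data = np.random.randn(500, 2)             # 500 two-dimensional samples

vb = GaussianInference(data, components=5)
converged_at = vb.run()                    # number of iterations, or None
mix  = vb.make_mixture()                   # MixtureDensity at the posterior mode
resp = vb.r                                # (N x K) responsibility matrix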
- class pypmc.mix_adapt.variational.VBMerge(self, input_mixture, N, components=0, initial_guess='first', **kwargs)
Bases:
GaussianInference
Parsimonious reduction of Gaussian mixture models with a variational-Bayes approach [BGP10].
The idea is to reduce the number of components of an overly complex Gaussian mixture while retaining an accurate description. The original samples are not required, hence it is much faster compared to standard variational Bayes. The great advantage compared to hierarchical clustering is that the number of output components is chosen automatically. One starts with (too) many components, updates, and removes those components with vanishing weight using prune(). All the methods the typical user wants to call are taken over from and documented in GaussianInference.
- Parameters:
input_mixture – MixtureDensity with Gauss components; the input to be compressed.
N – The number of (virtual) input samples that the input_mixture is based on. For example, if input_mixture was fitted to 1000 samples, set N to 1000.
components – Integer; the maximum number of output components.
initial_guess – MixtureDensity with Gauss components, optional; the starting point for the optimization. If provided, its number of components defines the maximum possible and the parameter components is ignored.
All other keyword arguments are documented in GaussianInference.set_variational_parameters.
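- Example (a minimal sketch compressing a 20-component mixture to at most 5 components; the input mixture and the virtual sample count N are illustrative):
import numpy as np
from pypmc.density.mixture import create_gaussian_mixture
from pypmc.mix_adapt.variational import VBMerge

big_mix = create_gaussian_mixture(np.random.randn(20, 2),
                                  np.array([np.eye(2)] * 20))

vb = VBMerge(big_mix, N=10000, components=5)
vb.run()
compressed = vb.make_mixture()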
- pypmc.mix_adapt.variational.Wishart_H(D, nu, log_det)
Entropy of the Wishart distribution, (B.82) of [Bis06].
- pypmc.mix_adapt.variational.Wishart_expect_log_lambda(D, nu, log_det)
\(E[\log |\Lambda|]\), (B.81) of [Bis06].
- pypmc.mix_adapt.variational.Wishart_log_B(D, nu, log_det)
Compute the first part of a Wishart distribution’s normalization, (B.79) of [Bis06], on the log scale.
- Parameters:
D – Dimension of the parameter vector; i.e. W is a DxD matrix.
nu – Degrees of freedom of a Wishart distribution.
log_det – The determinant of W, \(|W|\).
6.3.3. PMC
Collect Population Monte Carlo
- class pypmc.mix_adapt.pmc.PMC(self, ndarray samples, density, weights=None, latent=None, rb=True, mincount=0, **kwargs)
Bases:
object
Adapt a Gaussian or Student t mixture density using the (M-)PMC algorithm according to [Cap+08], [Kil+09], and [HOD12]. It turns out that running multiple PMC updates using the same samples is often useful.
- Parameters:
samples – Matrix-like array; the samples to be used for the PMC run.
Warning
The samples are NOT copied!
density – MixtureDensity with Gauss or StudentT components; the density that proposed the samples and shall be updated.
weights – Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.
Warning
The weights are NOT copied!
latent – Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.
Warning
The latent variables are NOT copied!
rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.
mincount – Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.
Important
Only possible if latent is provided.
See also
Additional keyword arguments are passed to the standalone PMC function.
- log_likelihood(self)
Calculate the log likelihood of the current density according to equation (5) in [Cap+08].
- run(self, iterations=1000, prune=0., rel_tol=1e-10, abs_tol=1e-5, verbose=False)
Run PMC updates and check for convergence using the change of the log likelihood of the current and the last step. Convergence is not declared if the likelihood decreased or if components are removed.
Return the number of iterations at convergence, or None.
The output density can be accessed via self.density.
- Parameters:
iterations – Maximum number of updates.
prune – Call MixtureDensity.prune after each update; i.e., remove components whose component weight is below the threshold. Set prune=0 (default) to deactivate.
rel_tol – Relative tolerance \(\epsilon\). If two consecutive values of the log likelihood, \(L_t, L_{t-1}\), are close, declare convergence. More precisely, check that
\[\left\| \frac{L_t - L_{t-1}}{L_t} \right\| < \epsilon .\]
abs_tol – Absolute tolerance \(\epsilon_{a}\). If the current log likelihood \(L_t\) is close to zero (\(L_t < \epsilon_{a}\)), declare convergence if
\[\| L_t - L_{t-1} \| < \epsilon_a .\]
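- Example (a minimal sketch of one importance-sampling step followed by PMC updates; the target, initial mixture, sample count, and the one-column layout assumed for the weight history are illustrative):
import numpy as np
import pypmc

def log_target(x):
    return -0.5 * x.dot(x)

mix = pypmc.density.mixture.create_gaussian_mixture(np.random.randn(5, 2),
                                                    np.array([np.eye(2)] * 5))

sampler = pypmc.sampler.importance_sampling.ImportanceSampler(log_target, mix)
sampler.run(10**4)
samples = sampler.samples[:]
weights = sampler.weights[:][:, 0]

pmc = pypmc.mix_adapt.pmc.PMC(samples, mix, weights)
pmc.run(iterations=10)
adapted = pmc.density       # the adapted MixtureDensity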
- pypmc.mix_adapt.pmc.gaussian_pmc(ndarray samples, density, weights=None, latent=None, rb=True, mincount=0, copy=True)
Adapt a Gaussian mixture density using the (M-)PMC algorithm according to [Cap+08] and [Kil+09] and return the updated density.
- Parameters:
samples – Matrix-like array; the samples to be used for the PMC run.
density – MixtureDensity with Gauss components; the density which proposed the samples and shall be updated.
weights – Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.
latent – Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.
rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.
mincount – Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.
Important
Only possible if latent is provided.
See also
copy – Bool; If True (default), the parameter density remains untouched. Otherwise, density is overwritten by the adapted density.
- pypmc.mix_adapt.pmc.student_t_pmc(ndarray samples, density, weights=None, latent=None, rb=True, dof_solver_steps=100, mindof=1e-5, maxdof=1e3, mincount=0, copy=True)
Adapt a Student t mixture density using the (M-)PMC algorithm according to [Cap+08], [Kil+09], and [HOD12] and return the updated density.
- Parameters:
samples – Matrix-like array; the samples to be used for the PMC run.
density – MixtureDensity with StudentT components; the density which proposed the samples and shall be updated.
weights – Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.
latent – Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.
rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.
dof_solver_steps – Integer; if 0, the Student t’s degrees of freedom are not updated, otherwise an iterative algorithm is run for at most dof_solver_steps steps.
Note
There is no closed-form solution for the optimal degree of freedom. If dof_solver_steps is not 0, len(density) first-order equations must be solved numerically, which can take a while.
maxdof (mindof,) – Float; degree-of-freedom adaptation is a one-dimensional root-finding problem. The numerical root finder used in this function (scipy.optimize.brentq) needs an interval in which to search.
mincount – Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.
Important
Only possible if latent is provided.
See also
copy – Bool; If True (default), the parameter density remains untouched. Otherwise, density is overwritten by the adapted density.
6.3.4. Gelman-Rubin R-value
Functions associated with the Gelman-Rubin R value [GR92].
- pypmc.mix_adapt.r_value.make_r_gaussmix(data, K_g=15, critical_r=2.0, indices=None, approx=False)[source]
Use data from multiple “Iterative Simulations” (e.g. Markov chains) to form a Gaussian mixture. This approach refers to the “long patches” in [BC13].
The idea is to group chains according to their R value as in r_group and form K_g Gaussian components per chain group. Once the groups are found by r_group, the data from each chain group is partitioned into K_g parts (using pypmc.tools.partition). For each of these parts, a Gaussian with its empirical mean and covariance is created.
Return a pypmc.density.mixture.MixtureDensity with pypmc.density.gauss.Gauss components.
See also
- Parameters:
data – Iterable of matrix-like arrays; the individual items are interpreted as points from an individual chain.
Important
Every chain must bring the same number of points.
K_g – Integer; the number of components per chain group.
critical_r – Float; the maximum R value a chain group may have.
indices – Iterable of integers; use the R value in these dimensions only. Default is all.
Note
If K_g is too large, some covariance matrices may not be positive definite. Reduce K_g or increase len(data)!
approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.
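- Example (a minimal sketch; the four synthetic chains and K_g are illustrative):
import numpy as np
from pypmc.mix_adapt.r_value import make_r_gaussmix

# four chains of equal length; two explore a different region than the others
chains = [np.random.randn(1000, 2) + shift for shift in (0.0, 0.1, 5.0, 5.1)]
mix = make_r_gaussmix(chains, K_g=10, critical_r=2.0)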
- pypmc.mix_adapt.r_value.make_r_tmix(data, K_g=15, critical_r=2.0, dof=5.0, indices=None, approx=False)[source]
Use data from multiple “Iterative Simulations” (e.g. Markov chains) to form a Student t mixture. This approach refers to the “long patches” in [BC13].
The idea is to group chains according to their R value as in r_group and form K_g Student t components per chain group. Once the groups are found by r_group, the data from each chain group is partitioned into K_g parts (using pypmc.tools.partition). For each of these parts, a Student t component with its empirical mean, covariance, and degree of freedom is created.
Return a pypmc.density.mixture.MixtureDensity with pypmc.density.student_t.StudentT components.
See also
- Parameters:
data – Iterable of matrix-like arrays; the individual items are interpreted as points from an individual chain.
Important
Every chain must bring the same number of points.
K_g – Integer; the number of components per chain group.
critical_r – Float; the maximum R value a chain group may have.
dof – Float; the degree of freedom the components will have.
indices – Iterable of integers; use the R value in these dimensions only. Default is all.
approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.
- pypmc.mix_adapt.r_value.r_group(means, variances, n, critical_r=2.0, approx=False)[source]
Group m (Markov) chains whose common r_value is less than critical_r in each of the D dimensions.
- Parameters:
means – (m x D) Matrix-like array; the mean value estimates.
variances – (m x D) Matrix-like array; the variance estimates.
n – Integer; the number of samples used to determine the estimates passed via means and variances.
critical_r – Float; group the chains such that their common R value is below critical_r.
.approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.
- pypmc.mix_adapt.r_value.r_value(means, variances, n, approx=False)[source]
Calculate the Gelman-Rubin R value (Chapter 2.2 in [GR92]).
The R value can be used to quantify the mixing of “multiple iterative simulations” (e.g. Markov chains) in parameter space. An R value “close to one” indicates that all chains explored the same region of parameter space.
Note
The R value is defined only in one dimension.
- Parameters:
means – Vector-like array; the sample mean of each chain.
variances – Vector-like array; the sample variance of each chain.
n – Integer; the number of samples used to determine the estimates passed via means and variances.
.approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.
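- Example (a minimal sketch for four chains in one dimension; the estimates are illustrative):
import numpy as np
from pypmc.mix_adapt.r_value import r_value

means     = np.array([0.10, -0.05, 0.02, 0.00])   # sample mean of each chain
variances = np.array([1.05,  0.98, 1.02, 1.00])   # sample variance of each chain
print(r_value(means, variances, n=1000))          # close to one: good mixing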
6.4. Tools
Helper functions for general purposes
6.4.1. Convergence diagnostics
Provide functions to rate the quality of weighted samples.
- pypmc.tools.convergence.ess(weights)[source]
Calculate the normalized effective sample size \(ESS\) [LC95] of samples with weights \(\omega_i\). \(ESS=0\) is terrible and \(ESS=1\) is perfect.
\[ESS = \frac{1}{1+C^2}\]
where
\[C^2 = \frac{1}{N} \sum_{i=1}^N (N \bar{\omega}_i - 1)^2\]
\[\bar{\omega}_i = \frac{\omega_i}{\sum_i \omega_i}\]
- Parameters:
weights – Vector-like array; the samples’ weights
- pypmc.tools.convergence.perp(weights)[source]
Calculate the normalized perplexity \(\mathcal{P}\) of samples with weights \(\omega_i\). \(\mathcal{P}=0\) is terrible and \(\mathcal{P}=1\) is perfect.
\[\mathcal{P} = \exp(H) / N\]
where
\[H = - \sum_{i=1}^N \bar{\omega}_i \log \bar{\omega}_i\]
\[\bar{\omega}_i = \frac{\omega_i}{\sum_i \omega_i}\]
- Parameters:
weights – Vector-like array; the samples’ weights
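- Example (a minimal sketch; the weights are illustrative):
import numpy as np
from pypmc.tools import convergence

weights = np.random.exponential(size=1000)    # some positive importance weights
print(convergence.perp(weights), convergence.ess(weights))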
6.4.2. History
- class pypmc.tools.History(dim, prealloc=1)[source]
Bases:
object
Save a history of 1d-arrays. Each call to append is counted as a new “run”.
- Parameters:
dim – Integer; the length of 1d-arrays to be saved.
prealloc – Integer; indicates for how many points memory is allocated in advance. When more memory is needed, it will be allocated on demand.
Access: self[run_nr] and self[run_begin:run_end] return one array that includes the samples for the runs specified (excluding run_end).
Warning
Index access returns a reference. Modification changes the history.
Hint
Negative numbers are supported; for example, self[-1] returns the latest run.
- Example:
>>> h = History(2)
>>> for i in range(2):
>>>     a = h.append(i+1)
>>>     a[:] = i+1
>>> h[0]           # first run
array([[ 1., 1.]])
>>> h[1]           # second run
array([[ 2., 2.],
       [ 2., 2.]])
>>> h[:]           # entire history
array([[ 1., 1.],
       [ 2., 2.],
       [ 2., 2.]])
>>> len(h)         # number of runs
2
6.4.3. Indicator
Collect generators of typical indicator functions.
- pypmc.tools.indicator.ball(center, radius=1.0, bdy=True)[source]
Returns the indicator function of a ball.
- Parameters:
center – A vector-like numpy array, defining the center of the ball. len(center) fixes the dimension.
radius – Float or int, the radius of the ball
bdy – Bool; when x is at the ball’s boundary, ball_indicator(x) returns True if and only if bdy=True.
- pypmc.tools.indicator.hyperrectangle(lower, upper, bdy=True)[source]
Returns the indicator function of a hyperrectangle.
- Parameters:
lower – Vector-like numpy array, defining the lower boundary of the hyperrectangle. len(lower) fixes the dimension.
upper – Vector-like numpy array, defining the upper boundary of the hyperrectangle.
bdy – Bool; when x is at the hyperrectangle’s boundary, hr_indicator(x) returns True if and only if bdy=True.
- pypmc.tools.indicator.merge_function_with_indicator(function, indicator, alternative)[source]
Returns a function such that a call to it is equivalent to:
if indicator(x):
    return function(x)
else:
    return alternative
Note that function is not called if indicator evaluates to False.
- Parameters:
function – The function to be called when indicator returns True.
indicator – Bool-returning function; the indicator
alternative – The object to be returned when indicator returns False
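- Example (a minimal sketch restricting a log-target to the unit hypercube; the target function is illustrative):
import numpy as np
from pypmc.tools import indicator
from pypmc.tools.indicator import merge_function_with_indicator

unit_cube = indicator.hyperrectangle(np.zeros(2), np.ones(2))

def log_target(x):
    return -0.5 * x.dot(x)

# outside the cube, return -inf without ever calling log_target
safe_log_target = merge_function_with_indicator(log_target, unit_cube, -np.inf)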
6.4.4. Parallel sampler
Run sampling algorithms in parallel using mpi4py
- class pypmc.tools.parallel_sampler.MPISampler(sampler_type, comm=MPI.COMM_WORLD, mpi_tag=0, *args, **kwargs)[source]
An MPI4Py parallelized sampler. Parallelizes any pypmc.sampler.
- Parameters:
sampler_type – A class defined in pypmc.sampler; the class of the sampler to be run in parallel. Example: sampler_type=ImportanceSampler.
comm – mpi4py communicator; the communicator to be used.
kwargs (args,) – Additional arguments which are passed to the constructor of sampler_type.
6.4.5. Partition
- pypmc.tools.partition(N, k)[source]
Distribute N into k parts such that each part takes the value N//k or N//k + 1, where // denotes integer division; i.e., perform the minimal lexicographic integer partition.
Example: N = 5, k = 2 --> return [3, 2]
- pypmc.tools.patch_data(data, L=100, try_diag=True, verbose=False)[source]
Patch data (for example Markov chain output) into parts of length L. Return a Gaussian mixture where each component gets the empirical mean and covariance of one patch.
- Parameters:
data – Matrix-like array; the points to be patched. Expect data[i] to be the d-dimensional i-th point.
L – Integer; the length of one patch. The last patch will be shorter if L is not a divisor of len(data).
try_diag – Bool; if some patch does not define a proper covariance matrix, it cannot define a Gaussian component. try_diag defines how to handle that case: if True (default), the off-diagonal elements are set to zero and it is tried again to form a Gaussian with that matrix. If that fails as well, the patch is skipped. If False, the patch is skipped directly.
6.4.6. Plot
- pypmc.tools.plot_mixture(mixture, i=0, j=1, center_style={'s': 0.15}, cmap='nipy_spectral', cutoff=0.0, ellipse_style={'alpha': 0.3}, solid_edge=True, visualize_weights=False)[source]
Plot the (Gaussian) components of the mixture density as one-sigma ellipses in the (i,j) plane.
- Parameters:
center_style – If a non-empty dict, plot the mean value with the style passed to scatter.
cmap – The color map to which components are mapped in order to choose their face color. It is unaffected by the cutoff. The meaning depends on visualize_weights.
cutoff – Ignore components whose weight is below the cutoff.
ellipse_style – Passed on to define the properties of the Ellipse.
solid_edge – Draw the edge of the ellipse as a solid opaque line.
visualize_weights – Colorize the components according to their weights if True. One can call plt.colorbar() after this function, and the bar allows one to read off the weights. If False, coloring is based on the component index and the total number of components. This option makes it easier to track components by assigning them the same color in subsequent calls to this function.
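- Example (a minimal sketch; the mixture is illustrative and matplotlib is required):
import numpy as np
import matplotlib.pyplot as plt
from pypmc.density.mixture import create_gaussian_mixture
from pypmc.tools import plot_mixture

mix = create_gaussian_mixture(np.array([[0.0, 0.0], [3.0, 3.0]]),
                              np.array([np.eye(2)] * 2))
plot_mixture(mix, i=0, j=1, visualize_weights=True)
plt.colorbar()
plt.show()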
- pypmc.tools.plot_responsibility(data, responsibility, cmap='nipy_spectral')[source]
Classify the 2D data according to the responsibility and make a scatter plot of each data point with the color of the component it is most likely from. The responsibility is normalized internally such that each row sums to unity.
- Parameters:
data – matrix-like; one row = one 2D sample
responsibility – matrix-like; one row = probabilities that sample n is from the 1st, 2nd, … component. The number of rows has to agree with data
cmap – colormap; defines how component indices are mapped to the color of the data points