6. Reference Guide

6.1. Probability density

Collect probability densities

Collect abstract base probability density classes

class pypmc.density.base.LocalDensity[source]

Bases: object

Abstract base class for a local probability density. Can be used as proposal for the Markov chain sampler.

evaluate(x, y)[source]

Evaluate log of the density to propose x given y, namely log(q(x|y)).

Parameters:
  • x – Vector-like array; the proposed point

  • y – Vector-like array; the current point

propose(self, y, rng=numpy.random.mtrand)[source]

Propose a new point given y using the random number generator rng.

Parameters:
  • y – Vector-like array; the current position

  • rng – State of a random number generator like numpy.random.mtrand

class pypmc.density.base.ProbabilityDensity[source]

Bases: object

Abstract base class for a probability density. Can be used as proposal for the importance sampler.

evaluate(x)[source]

Evaluate log of the density to propose x, namely log(q(x)).

Parameters:

x – Vector-like array; the proposed point.

multi_evaluate(x, out=None)[source]

Evaluate log of the density to propose x, namely log(q(x)) for each row in x.

Parameters:
  • x – Matrix-like array; the proposed points. The i-th point is expected to be accessible as x[i].

  • out – Vector-like array of length len(x), optional; if provided, the output is written into this array.

propose(self, N=1, rng=numpy.random.mtrand)[source]

Propose N points using the random number generator rng.

Parameters:
  • N – Integer; the number of random points to be proposed

  • rng – State of a random number generator like numpy.random.mtrand

6.1.1. Gauss

Collect Gaussian probability densities

class pypmc.density.gauss.Gauss(self, mu, sigma)

Bases: ProbabilityDensity

A Gaussian probability density. Can be used as a component of a MixtureDensity.

Parameters:
  • mu – Vector-like array; the Gaussian’s mean \(\mu\)

  • sigma – Matrix-like array; the Gaussian’s covariance matrix \(\Sigma\)

evaluate(self, ndarray x)

Evaluate log of the density to propose x, namely log(q(x)).

Parameters:

x – Vector-like array; the proposed point.

multi_evaluate(self, ndarray x, ndarray out=None)

Evaluate log of the density to propose x, namely log(q(x)) for each row in x.

Parameters:
  • x – Matrix-like array; the proposed points. The i-th point is expected to be accessible as x[i].

  • out – Vector-like array of length len(x), optional; if provided, the output is written into this array.

propose(self, N=1, rng=numpy.random.mtrand)

Propose N points using the random number generator rng.

Parameters:
  • N – Integer; the number of random points to be proposed

  • rng – State of a random number generator like numpy.random.mtrand

Important

rng must meet the requirements of LocalGauss.propose.

update(self, mu, sigma)

Re-initialize the density with new mean and covariance matrix.

Parameters:
  • mu – Vector-like array; the Gaussian’s mean \(\mu\)

  • sigma – Matrix-like array; the Gaussian’s covariance matrix \(\Sigma\)

Note

On LinAlgError, the old mu and sigma are plugged in and the proposal remains in a valid state.
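
A minimal usage sketch of the Gauss density (imports follow the class path documented above; the shapes in the comments are what the documented signatures imply):

    import numpy as np
    from pypmc.density.gauss import Gauss

    # 2D Gaussian with mean (1, -1) and unit covariance
    gauss = Gauss(np.array([1., -1.]), np.eye(2))

    log_q  = gauss.evaluate(np.array([0., 0.]))   # log q(x) at the origin
    points = gauss.propose(100)                   # 100 proposed points, one per row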

class pypmc.density.gauss.LocalGauss(self, sigma)

Bases: LocalDensity

A multivariate local Gaussian density with redefinable covariance.

Parameters:

sigma – Matrix-like array; covariance-matrix.

evaluate(self, ndarray x, ndarray y)

Evaluate log of the density to propose x given y, namely log(q(x|y)).

Parameters:
  • x – Vector-like array; the proposed point

  • y – Vector-like array; the current point

propose(self, y, rng=numpy.random.mtrand)

Propose a new point given y using the random number generator rng.

Parameters:
  • y – Vector-like array; the current position

  • rng

    State of a random number generator like numpy.random.mtrand

    Important

    rng must return a numpy array of N samples from the standard Gaussian distribution when calling rng.normal(0, 1, N).

update(self, sigma)

Re-initialize the proposal with a new covariance matrix.

Parameters:

sigma – Matrix-like array; the new covariance-matrix.

Note

On LinAlgError, the old covariance matrix is plugged in and the proposal remains in a valid state.

6.1.2. StudentT

Collect Student’s t probability densities

class pypmc.density.student_t.LocalStudentT(self, sigma, double dof)

Bases: LocalGauss

A multivariate local Student’s t density with redefinable covariance.

Parameters:
  • sigma – Matrix-like array; the covariance-matrix

  • dof – Float; the degrees of freedom

evaluate(self, x, y)

Evaluate log of the density to propose x given y, namely log(q(x|y)).

Parameters:
  • x – Vector-like array; the proposed point

  • y – Vector-like array; the current point

propose(self, y, rng=numpy.random.mtrand)

Propose a new point given y using the random number generator rng.

Parameters:
  • y – Vector-like array; the current position

  • rng

    State of a random number generator like numpy.random.mtrand

    Important

    rng must return

    • a numpy array of N samples from the standard Gaussian distribution when calling rng.normal(0, 1, N)

    • a float sampled from the chi-squared distribution with the given degrees of freedom when calling rng.chisquare(degree_of_freedom)

class pypmc.density.student_t.StudentT(self, mu, sigma, double dof)

Bases: ProbabilityDensity

A Student’s t probability density. Can be used as a component in MixtureDensities.

Parameters:
  • mu – Vector-like array; the mean \(\mu\)

  • sigma – Matrix-like array; the covariance matrix \(\Sigma\)

  • dof – Float; the degrees of freedom \(\nu\)

evaluate(self, ndarray x)

Evaluate log of the density to propose x, namely log(q(x)).

Parameters:

x – Vector-like array; the proposed point.

multi_evaluate(self, ndarray x, ndarray out=None)

Evaluate log of the density to propose x, namely log(q(x)) for each row in x.

Parameters:
  • x – Matrix-like array; the proposed points. The i-th point is expected to be accessible as x[i].

  • out – Vector-like array of length len(x), optional; if provided, the output is written into this array.

propose(self, N=1, rng=numpy.random.mtrand)

Propose N points using the random number generator rng.

Parameters:
  • N – Integer; the number of random points to be proposed

  • rng

    State of a random number generator like numpy.random.mtrand

    Important

    rng must meet the requirements of LocalStudentT.propose.

update(self, mu, sigma, double dof)

Re-initialize the density with new mean, covariance matrix and degrees of freedom.

Parameters:
  • mu – Vector-like array; the mean \(\mu\)

  • sigma – Matrix-like array; the covariance matrix \(\Sigma\)

  • dof – Float; the degrees of freedom \(\nu\)

Note

On LinAlgError, the old mu, sigma, and dof are plugged in and the proposal remains in a valid state.

6.1.3. Mixture

Collect mixture probability densities

class pypmc.density.mixture.MixtureDensity(self, components, weights=None)

Bases: ProbabilityDensity

Base class for mixture probability densities.

Parameters:
  • components – Iterable of ProbabilityDensities; the mixture’s components

  • weights – Iterable of floats; the weights of the components (will be normalized automatically during initialization)

evaluate(self, x, individual=False)

Evaluate log of the density to propose x, namely log(q(x)).

Parameters:
  • x – Vector-like array; the proposed point.

  • individual – Bool; if True, return the evaluation of each component at x as an array.

Important

This function assumes that the weights are normalized, i.e. that self.normalized() returns True.

multi_evaluate(self, ndarray x, ndarray out=None, ndarray individual=None, components=None)

Evaluate the density at all points in x for all components specified in components and write the result into individual. If components is None, return log(q(x)) for each point.

Use individual if you need the density at x for each component and you want to compute it only once to increase efficiency.

Parameters:
  • x – (N x D) array; one D-dim. sample per row.

  • out – (N) array, optional; write output into this array (if components is None).

  • individual – (N x K) array; density of k-th component at the n-th sample.

  • components – Iterable of integers; calculate individual for these components only.

normalize(self)

Normalize the component weights to sum up to 1.

normalized(self)

Check if the component weights are normalized.

propose(self, N=1, rng=_np.random.mtrand, trace=False, shuffle=True)

Propose N points using the random number generator rng.

Parameters:
  • N – Integer; the number of random points to be proposed

  • rng – State of a random number generator like numpy.random.mtrand

  • shuffle – Bool; if True (default), the samples are returned in random order. Otherwise, the samples are ordered by component.

  • trace – bool; if True, return the proposed samples and an array containing the number of the component responsible for each sample, otherwise just return the samples.

Important

rng must:
  • return a numpy array of N samples from the multinomial distribution with probabilities pvals when calling rng.multinomial(N, pvals)

  • shuffle an array in place when calling rng.shuffle(array)

Important

This function assumes that the weights are normalized, i.e. that self.normalized() returns True.

prune(self, threshold=0.0)

Remove components whose weight is less than or equal to threshold. Return a list of the removed components and weights in the form [(index, component, weight), …].

Parameters:

threshold – Float; components with lower weight are deleted

pypmc.density.mixture.create_gaussian_mixture(means, covs, weights=None)

Create a MixtureDensity with Gaussian (Gauss) components. The output can be used for the clustering algorithms.

Parameters:
  • means – Vector-like array; the means of the Gaussian mixture

  • covs – 3d-matrix-like array; the covariances of the Gaussian mixture. cov[i] will be interpreted as the i’th covariance matrix.

  • weights

    Vector-like array, optional; the component weights. If not provided all components will get equal weight.

    Note

    The weights will automatically be normalized.
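
A short sketch of building and decomposing a two-component Gaussian mixture (the order of the arrays returned by recover_gaussian_mixture is assumed to be means, covariances, weights, following its docstring):

    import numpy as np
    from pypmc.density.mixture import create_gaussian_mixture, recover_gaussian_mixture

    means   = np.array([[-5., 0.], [5., 0.]])
    covs    = np.array([np.eye(2), np.eye(2)])
    weights = np.array([7., 3.])                  # normalized automatically to [0.7, 0.3]

    mix = create_gaussian_mixture(means, covs, weights)

    log_q  = mix.evaluate(np.zeros(2))            # log of the mixture density at the origin
    points = mix.propose(1000)                    # 1000 samples from the mixture

    means_out, covs_out, weights_out = recover_gaussian_mixture(mix)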

pypmc.density.mixture.create_t_mixture(means, covs, dofs, weights=None)

Create a MixtureDensity with Student t (StudentT) components. The output can be used for the clustering algorithms.

Parameters:
  • means – Vector-like array; the means of the mixture

  • covs – 3d-matrix-like array; the covariances of the mixture. cov[i] will be interpreted as the i’th covariance matrix.

  • dofs – Vector-like array; the degrees of freedom

  • weights

    Vector-like array, optional; the component weights. If not provided all components will get equal weight.

    Note

    The weights will automatically be normalized.

pypmc.density.mixture.recover_gaussian_mixture(mixture)

Extract the means, covariances and component weights from a MixtureDensity.

Parameters:

mixture – MixtureDensity with Gaussian (Gauss) components; the mixture to be decomposed.

pypmc.density.mixture.recover_t_mixture(mixture)

Extract the means, covariances, degrees of freedom and component weights from a MixtureDensity.

Parameters:

mixture – MixtureDensity with Student t (StudentT) components; the mixture to be decomposed.

6.2. Sampler

Collect the sampler modules

6.2.1. Markov Chain

Collect Markov Chain

class pypmc.sampler.markov_chain.MarkovChain(target, proposal, start, indicator=None, rng=numpy.random.mtrand)[source]

A Markov chain to generate samples from the target density.

Parameters:
  • target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.

  • proposal

    The proposal density q. Should be of type pypmc.density.base.LocalDensity.

    Hint

    If your proposal density is symmetric, define the member variable proposal.symmetric = True. This will omit calls to proposal.evaluate in the Metropolis-Hastings steps.

  • start – The starting point of the Markov chain. (numpy array)

  • indicator

    The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True; otherwise, the proposed point is rejected without a call to target. Use this function to specify the support of the target.

  • prealloc

    Integer; the number of Markov chain samples for which memory in self.samples is allocated. If more memory is needed, it will be allocated on demand.

    Hint

    Preallocating memory can speed up the calculation, in particular if it is known in advance how long the chains are run.

  • save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values

  • rng

    The rng passed to the proposal when calling proposal.propose

    Important

    rng must return a sample from the uniform distribution in [0,1) when calling rng.rand()

    See also

    rng must also fulfill the requirements of your proposal pypmc.density.base.LocalDensity.propose

class pypmc.sampler.markov_chain.AdaptiveMarkovChain(target, proposal, start, indicator=None, rng=numpy.random.mtrand)[source]

A Markov chain with proposal covariance adaptation as in [HST01] to generate samples from the target density.

Parameters:
  • target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.

  • proposal

    The proposal density q. Should be of type pypmc.density.base.LocalDensity.

    Hint

    If your proposal density is symmetric, define the member variable proposal.symmetric = True. This will omit calls to proposal.evaluate in the Metropolis-Hastings steps.

  • start – The starting point of the Markov chain. (numpy array)

  • indicator

    The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True; otherwise, the proposed point is rejected without a call to target. Use this function to specify the support of the target.

  • prealloc

    Integer; the number of Markov chain samples for which memory in self.samples is allocated. If more memory is needed, it will be allocated on demand.

    Hint

    Preallocating memory can speed up the calculation, in particular if it is known in advance how long the chains are run.

  • save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values

  • rng

    The rng passed to the proposal when calling proposal.propose

    Important

    rng must return a sample from the uniform distribution in [0,1) when calling rng.rand()

    See also

    rng must also fulfill the requirements of your proposal pypmc.density.base.LocalDensity.propose
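
A sketch of a typical adaptive run; the run() and adapt() methods and the samples history follow the pypmc examples and are not documented in this section, so treat them as assumptions about the interface:

    import numpy as np
    from pypmc.density.gauss import LocalGauss
    from pypmc.sampler.markov_chain import AdaptiveMarkovChain

    def log_target(x):
        # unnormalized log of a 2D standard Gaussian
        return -0.5 * np.dot(x, x)

    proposal = LocalGauss(0.1 * np.eye(2))        # local Gaussian proposal
    start = np.array([2., 2.])

    mc = AdaptiveMarkovChain(log_target, proposal, start)

    # alternate sampling and proposal-covariance adaptation (assumed interface)
    for _ in range(10):
        mc.run(500)
        mc.adapt()

    samples = mc.samples[:]                       # accumulated history of samples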

6.2.2. Importance Sampling

Some useful tools for importance sampling. The main class is ImportanceSampler and there are some utility functions.

pypmc.sampler.importance_sampling.calculate_covariance(samples, weights)[source]

Calculates the covariance matrix of weighted samples (like the output of an importance-sampling run).

Parameters:
  • samples – Matrix-like numpy array; the samples to be used.

  • weights – Vector-like numpy array; the (unnormalized) importance weights.

pypmc.sampler.importance_sampling.calculate_expectation(samples, weights, f)[source]

Calculate the expectation value of function f using weighted samples (like the output of an importance-sampling run).

Denoting \(x_n\) as the sample n and \(w_n\) as its (normalized) weight, the following is returned:

\[\sum_{n=1}^{N} w_n f(x_n) \mathrm{\ \ where\ \ } \sum_{n=1}^{N}w_n \overset{!}{=} 1\]
Parameters:
  • samples – Matrix-like numpy array; the samples to be used.

  • weights – Vector-like numpy array; the (unnormalized) importance weights.

  • f – Callable, the function to be evaluated.

pypmc.sampler.importance_sampling.calculate_mean(samples, weights)[source]

Calculate the mean of weighted samples (like the output of an importance-sampling run).

Parameters:
  • samples – Matrix-like numpy array; the samples to be used.

  • weights – Vector-like numpy array; the (unnormalized) importance weights.
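
The three helpers above share the same calling convention; a small sketch with hand-made weighted samples:

    import numpy as np
    from pypmc.sampler.importance_sampling import (calculate_mean,
                                                   calculate_covariance,
                                                   calculate_expectation)

    samples = np.array([[0., 0.], [1., 1.], [2., 2.]])
    weights = np.array([1., 2., 1.])              # unnormalized importance weights

    mean = calculate_mean(samples, weights)       # -> array([1., 1.])
    cov  = calculate_covariance(samples, weights)

    # expectation of an arbitrary function, here the componentwise second moment
    second_moment = calculate_expectation(samples, weights, lambda x: x**2)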

pypmc.sampler.importance_sampling.combine_weights(samples, weights, proposals)[source]

Calculate the deterministic mixture weights according to [Cor+12] given samples, standard weights and their proposals for a number of steps in which importance samples are computed for the same target density but different proposals.

Return the weights as a pypmc.tools.History such that the weights for each proposal are easily accessible.

Parameters:
  • samples – Iterable of matrix-like arrays; the weighted samples whose importance weights shall be combined. One sample per row in each array; one array per step, i.e., per proposal.

  • weights – Iterable of 1D arrays; the standard importance weights \(P(x_i^t)/q_t(x_i^t)\). Each array in the iterable contains all weights of the samples of step t; the array’s size has to match the t-th entry in samples.

  • proposals – Iterable of pypmc.density.base.ProbabilityDensity instances; the proposal densities from which the samples have been drawn.

class pypmc.sampler.importance_sampling.ImportanceSampler(target, proposal, indicator=None, prealloc=0, rng=numpy.random.mtrand)[source]

An importance sampler that generates weighted samples from target using proposal.

Parameters:
  • target – The target density. Must be a function accepting a 1d numpy array and returning a float, namely \(\log(P(x))\), the log of the target P.

  • proposal – The proposal density q. Should be of type pypmc.density.base.ProbabilityDensity.

  • indicator

    The indicator function receives a numpy array and returns bool. The target is only called if indicator(proposed_point) returns True. Otherwise, the proposed point is assigned zero weight without a call to target. Use this function to specify the support of the target.

  • prealloc

    Integer; the number of samples for which memory is preallocated. If more memory is needed, it will be allocated on demand.

    Hint

    Preallocating memory can speed up the calculation, in particular if it is known in advance how many samples are drawn.

  • save_target_values – Bool; if True, store the evaluated target at every visited point in self.target_values

  • rng

    The rng passed to the proposal when calling proposal.propose

    Important

    rng must fulfill the requirements of your proposal pypmc.density.base.ProbabilityDensity.propose
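
A sketch of an importance-sampling run; run() and the samples and weights histories follow the pypmc examples and are not documented in this section, so their exact behavior (in particular the weight layout) is an assumption here:

    import numpy as np
    from pypmc.density.student_t import StudentT
    from pypmc.sampler.importance_sampling import ImportanceSampler

    def log_target(x):
        # unnormalized log of a 2D standard Gaussian
        return -0.5 * np.dot(x, x)

    # broad Student's t proposal to cover the target's tails
    proposal = StudentT(np.zeros(2), 5. * np.eye(2), 5.)

    sampler = ImportanceSampler(log_target, proposal)
    sampler.run(10**4)                            # assumed interface, as in the pypmc examples

    samples = sampler.samples[:]                  # history of proposed points
    weights = sampler.weights[:][:, 0]            # importance weights (assumed (N x 1) layout)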

6.3. Mixture adaptation

Collect algorithms to adapt a pypmc.density.base.ProbabilityDensity to data.

6.3.1. Hierarchical clustering

Hierarchical clustering as described in [GR04]

class pypmc.mix_adapt.hierarchical.Hierarchical(input_components, initial_guess)[source]

Bases: object

Hierarchical clustering as described in [GR04].

Find a Gaussian mixture density \(g\) with components \(g_j\) that most closely matches the Gaussian mixture density specified by \(f\) and its components \(f_i\), but with fewer components. The algorithm is an iterative EM procedure alternating between a regroup and a refit step, and requires an initial_guess of the output density that defines the maximum number of components to use.

Parameters:
  • input_components – MixtureDensity with Gaussian (Gauss) components; the input density \(f\).

  • initial_guess – MixtureDensity with Gaussian (Gauss) components; the initial guess for the output density \(g\). Its number of components sets the maximum number of output components.

run(eps=0.0001, kill=True, max_steps=50, verbose=False)[source]

Perform the clustering on the input components updating the initial guess. The result is available in the member self.g.

Return the number of iterations at convergence, or None.

Parameters:
  • eps

    If the relative change of the distance between the current and the last step falls below eps, declare convergence:

    \[0 < \frac{d^t - d^{t-1}}{d^t} < \varepsilon\]

  • kill – If a component is assigned zero weight (no input components), it is removed.

  • max_steps – Perform a maximum number of update steps.
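
A sketch of compressing an overly fine Gaussian mixture with Hierarchical, using create_gaussian_mixture from Section 6.1.3 to build the input and the initial guess:

    import numpy as np
    from pypmc.density.mixture import create_gaussian_mixture
    from pypmc.mix_adapt.hierarchical import Hierarchical

    # overly fine input mixture: 10 one-dimensional Gaussians along a line
    input_mix = create_gaussian_mixture(np.linspace(-5., 5., 10).reshape(-1, 1),
                                        np.array([np.eye(1)] * 10))

    # initial guess with at most 3 output components
    guess = create_gaussian_mixture(np.array([[-4.], [0.], [4.]]),
                                    np.array([np.eye(1)] * 3))

    h = Hierarchical(input_mix, guess)
    h.run()
    reduced = h.g                                 # the compressed mixture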

pypmc.mix_adapt.hierarchical.kullback_leibler(c1, c2)[source]

Kullback-Leibler divergence of two Gaussians, \(KL(1||2)\)

6.3.2. Variational Bayes

Variational clustering as described in [Bis06]

pypmc.mix_adapt.variational.Dirichlet_log_C(alpha)

Compute normalization constant of Dirichlet distribution on log scale, (B.23) of [Bis06] .

class pypmc.mix_adapt.variational.GaussianInference(self, ndarray data, int components=0, weights=None, initial_guess='first', **kwargs)

Bases: object

Approximate a probability density by a Gaussian mixture with a variational Bayes approach. The motivation, notation, and derivation are explained in detail in chapter 10.2 of [Bis06].

Typical usage: call run until convergence. If interested in clustering/classification, extract the responsibility matrix as the attribute r. Otherwise, obtain the Gaussian mixture density at the mode of the variational posterior using make_mixture.

See also

Another implementation can be found at https://github.com/jamesmcinerney/vbmm.

Parameters:
  • data – Matrix-like array; each of the \(N\) rows contains one \(D\)-dimensional sample from the probability density to be approximated.

  • components – Integer; \(K\) is the number of Gaussian components in the approximating Gaussian mixture. Will be detected from initial_guess if provided.

  • weights – Vector-like array; The i-th of the \(N\) entries contains the weight of the i-th sample in data. Weights must be nonnegative and finite.

  • initial_guess

    string or pypmc.density.mixture.MixtureDensity with Gaussian (pypmc.density.gauss.Gauss) components;

    Allowed string values:

    • “first”: initially place the components (defined by the mean parameter m) at the first K data points.

    • “random”: like “first”, but randomly select K data points. For reproducibility, set the seed with numpy.random.seed(123).

    If a MixtureDensity, override other (default) values of the parameters m, W and alpha.

    Default: “first”

All keyword arguments are processed by set_variational_parameters.

E_step(self)

Compute expectation values and summary statistics.

M_step(self)

Update parameters of the Gaussian-Wishart distribution.

likelihood_bound(self)

Compute the lower bound on the true log marginal likelihood \(L(Q)\) given the current parameter estimates.

make_mixture(self)

Return the mixture-density defined by the mode of the variational-Bayes estimate.

posterior2prior(self)

Return references to posterior values of all variational parameters as dict.

Hint

GaussianInference(…, **output) creates a new instance using the inferred posterior as prior.

prior_posterior(self)

Return references to prior and posterior values of all variational parameters as dict.

prune(self, threshold=1.)

Delete components with an effective number of samples \(N_k\) below the threshold.

Parameters:

threshold – Float; the minimum effective number of samples a component must have to survive.

run(self, iterations=1000, prune=1., rel_tol=1e-10, abs_tol=1e-5, verbose=False)

Run variational-Bayes parameter updates and check for convergence using the change of the log likelihood bound of the current and the last step. Convergence is not declared if the number of components changed, or if the bound decreased. For the standard algorithm, the bound must increase, but for modifications, this useful property may not hold for all parameter values.

Return the number of iterations at convergence, or None.

Parameters:
  • iterations – Maximum number of updates.

  • prune – Call prune after each update; i.e., remove components whose associated effective number of samples is below the threshold. Set prune=0 to deactivate. Default: 1 (effective samples).

  • rel_tol

    Relative tolerance \(\epsilon\). If two consecutive values of the log likelihood bound, \(L_t, L_{t-1}\), are close, declare convergence. More precisely, check that

    \[\left\| \frac{L_t - L_{t-1}}{L_t} \right\| < \epsilon .\]

  • abs_tol

    Absolute tolerance \(\epsilon_{a}\). If the current bound \(L_t\) is close to zero, (\(L_t < \epsilon_{a}\)), declare convergence if

    \[\| L_t - L_{t-1} \| < \epsilon_a .\]

set_variational_parameters(self, *args, **kwargs)

Reset the parameters to the submitted values or default.

Use this function to set the prior value (indicated by the subscript \(0\) as in \(\alpha_0\)) or the initial value (e.g., \(\alpha\)) used in the iterative procedure to find the values of the hyperparameters of the variational posterior distribution.

Every parameter can be set in two ways:

1. It is specified for only one component, then it is copied to all other components.

2. It is specified separately for each component as a \(K\) vector.

The prior and posterior variational distributions of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Lambda}\) for each component are given by

\[q(\boldsymbol{\mu}, \boldsymbol{\Lambda}) = q(\boldsymbol{\mu}|\boldsymbol{\Lambda}) q(\boldsymbol{\Lambda}) = \prod_{k=1}^K \mathcal{N}(\boldsymbol{\mu}_k|\boldsymbol{m_k},(\beta_k\boldsymbol{\Lambda}_k)^{-1}) \mathcal{W}(\boldsymbol{\Lambda}_k|\boldsymbol{W_k}, \nu_k),\]

where \(\mathcal{N}\) denotes a Gaussian and \(\mathcal{W}\) a Wishart distribution. The weights \(\boldsymbol{\pi}\) follow a Dirichlet distribution

\[q(\boldsymbol{\pi}) = Dir(\boldsymbol{\pi}|\boldsymbol{\alpha}).\]

Warning

This function may delete results obtained by update.

Parameters:
  • alpha0, alpha –

    Float or \(K\) vector; parameter of the mixing coefficients’ probability distribution (prior: \(\alpha_0\), posterior initial value: \(\alpha\)).

    \[\alpha_i > 0, i=1 \dots K.\]

    A scalar is promoted to a \(K\) vector as

    \[\boldsymbol{\alpha} = (\alpha,\dots,\alpha),\]

    but a K vector is accepted, too.

    Default:

    \[\alpha = 10^{-5}.\]

  • beta0, beta –

    Float or \(K\) vector; \(\beta\) parameter of the probability distribution of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Lambda}\). The same restrictions as for alpha apply. Default:

    \[\beta_0 = 10^{-5}.\]

  • nu0, nu –

    Float or \(K\) vector; degrees of freedom of the Wishart distribution of \(\boldsymbol{\Lambda}\). A well-defined Wishart distribution requires:

    \[\nu_0 \geq D - 1.\]

    The same restrictions as for alpha apply.

    Default:

    \[\nu_0 = D - 1 + 10^{-5}.\]

  • m0, m –

    \(D\) vector or \(K \times D\) matrix; mean parameter for the Gaussian \(q(\boldsymbol{\mu_k}|\boldsymbol{m_k}, \beta_k \Lambda_k)\).

    Default:

    For the prior of each component:

    \[\boldsymbol{m}_0 = (0,\dots,0)\]

    For initial value of the posterior, \(\boldsymbol{m}\): the sequence of \(K \times D\) equally spaced values in [-1,1] reshaped to \(K \times D\) dimensions.

    Warning

    If all \(\boldsymbol{m}_k\) are identical initially, they may remain identical. It is advisable to randomly scatter them in order to avoid singular behavior.

  • W0, W – \(D \times D\) or \(K \times D \times D\) matrix-like array; \(\boldsymbol{W}\) is a symmetric positive-definite matrix used in the Wishart distribution. Default: identity matrix in \(D\) dimensions for every component.

update(self)

Recalculate the parameters (M step) and expectation values (E step) using the update equations.
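
A sketch of variational-Bayes fitting on synthetic data, using only the methods and attributes documented above:

    import numpy as np
    from pypmc.mix_adapt.variational import GaussianInference

    # two well separated blobs of 2D data
    np.random.seed(42)
    data = np.vstack([np.random.normal(-5., 1., size=(500, 2)),
                      np.random.normal(+5., 1., size=(500, 2))])

    vb = GaussianInference(data, components=5)
    vb.run(iterations=100)                        # prunes superfluous components by default

    mix  = vb.make_mixture()                      # mixture at the mode of the variational posterior
    resp = vb.r                                   # responsibility matrix (N x K)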

class pypmc.mix_adapt.variational.VBMerge(self, input_mixture, N, components=0, initial_guess='first', **kwargs)

Bases: GaussianInference

Parsimonious reduction of Gaussian mixture models with a variational-Bayes approach [BGP10].

The idea is to reduce the number of components of an overly complex Gaussian mixture while retaining an accurate description. The original samples are not required, hence it is much faster than standard variational Bayes. The great advantage compared to hierarchical clustering is that the number of output components is chosen automatically. One starts with (too) many components, updates, and removes those components with vanishing weight using prune(). All the methods the typical user wants to call are taken over from and documented in GaussianInference.

Parameters:
  • input_mixture – MixtureDensity with Gauss components, the input to be compressed.

  • N – The number of (virtual) input samples that the input_mixture is based on. For example, if input_mixture was fitted to 1000 samples, set N to 1000.

  • components – Integer; the maximum number of output components.

  • initial_guess – MixtureDensity with Gauss components, optional; the starting point for the optimization. If provided, its number of components defines the maximum possible and the parameter components is ignored.

All other keyword arguments are documented in GaussianInference.set_variational_parameters.

pypmc.mix_adapt.variational.Wishart_H(D, nu, log_det)

Entropy of the Wishart distribution, (B.82) of [Bis06] .

pypmc.mix_adapt.variational.Wishart_expect_log_lambda(D, nu, log_det)

\(E[\log |\Lambda|]\), (B.81) of [Bis06] .

pypmc.mix_adapt.variational.Wishart_log_B(D, nu, log_det)

Compute first part of a Wishart distribution’s normalization, (B.79) of [Bis06], on the log scale.

Parameters:
  • D – Dimension of parameter vector; i.e. W is a DxD matrix.

  • nu – Degrees of freedom of a Wishart distribution.

  • log_det – The log determinant of W, \(\log |W|\).

6.3.3. PMC

Collect Population Monte Carlo

class pypmc.mix_adapt.pmc.PMC(self, ndarray samples, density, weights=None, latent=None, rb=True, mincount=0, **kwargs)

Bases: object

Adapt a Gaussian or Student t mixture density using the (M-)PMC algorithm according to [Cap+08], [Kil+09], and [HOD12]. It turns out that running multiple PMC updates using the same samples is often useful.

Parameters:
  • samples

    Matrix-like array; the samples to be used for the PMC run.

    Warning

    The samples are NOT copied!

  • density – MixtureDensity with Gauss or StudentT components; the density that proposed the samples and shall be updated.

  • weights

    Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.

    Warning

    The weights are NOT copied!

  • latent

    Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.

    Warning

    The latent variables are NOT copied!

  • rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.

  • mincount

    Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.

    Important

    Only possible if latent is provided.

Additional keyword arguments are passed to the standalone PMC function.

log_likelihood(self)

Calculate the log likelihood of the current density according to equation (5) in [Cap+08].

run(self, iterations=1000, prune=0., rel_tol=1e-10, abs_tol=1e-5, verbose=False)

Run PMC updates and check for convergence using the change of the log likelihood of the current and the last step. Convergence is not declared if the likelihood decreased or if components are removed.

Return the number of iterations at convergence, or None.

The output density can be accessed via self.density.

Parameters:
  • iterations – Maximum number of updates.

  • prune – Call MixtureDensity.prune after each update; i.e., remove components whose component weight is below the threshold. Set prune=0 (default) to deactivate.

  • rel_tol

    Relative tolerance \(\epsilon\). If two consecutive values of the log likelihood, \(L_t, L_{t-1}\), are close, declare convergence. More precisely, check that

    \[\left\| \frac{L_t - L_{t-1}}{L_t} \right\| < \epsilon .\]

  • abs_tol

    Absolute tolerance \(\epsilon_{a}\). If the current log likelihood \(L_t\) is close to zero, (\(L_t < \epsilon_{a}\)), declare convergence if

    \[\| L_t - L_{t-1} \| < \epsilon_a .\]
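
A sketch of the PMC interface; the samples and weights below are placeholders standing in for the output of an importance-sampling run that used density as its proposal:

    import numpy as np
    from pypmc.density.mixture import create_gaussian_mixture
    from pypmc.mix_adapt.pmc import PMC

    # placeholder weighted samples; in practice they come from importance sampling
    np.random.seed(0)
    samples = np.random.rand(1000, 2)
    weights = np.random.rand(1000)

    # the mixture that (nominally) proposed the samples
    density = create_gaussian_mixture(np.array([[0.3, 0.3], [0.7, 0.7]]),
                                      np.array([np.eye(2), np.eye(2)]))

    pmc = PMC(samples, density, weights)
    pmc.run(iterations=10)
    adapted = pmc.density                         # the updated mixture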

pypmc.mix_adapt.pmc.gaussian_pmc(ndarray samples, density, weights=None, latent=None, rb=True, mincount=0, copy=True)

Adapt a Gaussian mixture density using the (M-)PMC algorithm according to [Cap+08] and [Kil+09] and return the updated density.

Parameters:
  • samples – Matrix-like array; the samples to be used for the PMC run.

  • density – MixtureDensity with Gauss components; the density which proposed the samples and shall be updated.

  • weights – Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.

  • latent – Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.

  • rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.

  • mincount

    Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.

    Important

    Only possible if latent is provided.

  • copy – Bool; If True (default), the parameter density remains untouched. Otherwise, density is overwritten by the adapted density.

pypmc.mix_adapt.pmc.student_t_pmc(ndarray samples, density, weights=None, latent=None, rb=True, dof_solver_steps=100, mindof=1e-5, maxdof=1e3, mincount=0, copy=True)

Adapt a Student t mixture density using the (M-)PMC algorithm according to [Cap+08], [Kil+09], and [HOD12] and return the updated density.

Parameters:
  • samples – Matrix-like array; the samples to be used for the PMC run.

  • density – MixtureDensity with StudentT components; the density which proposed the samples and shall be updated.

  • weights – Vector-like array of floats; The (unnormalized) importance weights. If not given, assume all samples have equal weight.

  • latent – Vector-like array of integers, optional; the latent variables (indices) of the generating components for each sample.

  • rb – Bool; If True, the component which proposed a sample is considered as a latent variable (unknown). This implements the Rao-Blackwellized algorithm. If False, each sample only updates its responsible component. This non-Rao-Blackwellized scheme is faster but only an approximation.

  • dof_solver_steps

    Integer; If 0, the Student t’s degrees of freedom are not updated, otherwise an iterative algorithm is run for at most dof_solver_steps steps.

    Note

    There is no closed-form solution for the optimal degrees of freedom. If dof_solver_steps is not 0, len(density) first-order equations must be solved numerically, which can take a while.

  • mindof, maxdof – Float; degree-of-freedom adaptation is a one-dimensional root-finding problem. The numerical root finder used in this function (scipy.optimize.brentq) needs an interval to search in.

  • mincount

    Integer; The minimum number of samples a component has to generate in order not to be ignored during updates. A value of zero (default) disables this feature. The motivation is that components with very small weight generate few samples, so the updates become unstable and it is more efficient to simply assign weight zero.

    Important

    Only possible if latent is provided.

  • copy – Bool; If True (default), the parameter density remains untouched. Otherwise, density is overwritten by the adapted density.

6.3.4. Gelman-Rubin R-value

Functions associated with the Gelman-Rubin R value [GR92].

pypmc.mix_adapt.r_value.make_r_gaussmix(data, K_g=15, critical_r=2.0, indices=None, approx=False)[source]

Use data from multiple “Iterative Simulations” (e.g. Markov Chains) to form a Gaussian Mixture. This approach refers to the “long patches” in [BC13].

The idea is to group chains according to their R-value as in r_group and form K_g Gaussian Components per chain group. Once the groups are found by r_group, the data from each chain group is partitioned into K_g parts (using pypmc.tools.partition). For each of these parts a Gaussian with its empirical mean and covariance is created.

Return a pypmc.density.mixture.MixtureDensity with pypmc.density.gauss.Gauss components.

See also

make_r_tmix

Parameters:
  • data

    Iterable of matrix-like arrays; the individual items are interpreted as points from an individual chain.

    Important

    Every chain must bring the same number of points.

  • K_g – Integer; the number of components per chain group.

  • critical_r – Float; the maximum R value a chain group may have.

  • indices – Iterable of integers; use the R value in these dimensions only. Default: all.

  • approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.

Note

If K_g is too large, some covariance matrices may not be positive definite. Reduce K_g or increase len(data)!
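
A sketch with three synthetic “chains”; with the shifts chosen below, the chain centered at 5 should end up in its own group:

    import numpy as np
    from pypmc.mix_adapt.r_value import make_r_gaussmix

    # output of three "iterative simulations" (e.g. Markov chains);
    # every chain must contribute the same number of points
    np.random.seed(0)
    chains = [np.random.normal(shift, 1., size=(2000, 2)) for shift in (0., 0.1, 5.)]

    mix = make_r_gaussmix(chains, K_g=10, critical_r=2.)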

pypmc.mix_adapt.r_value.make_r_tmix(data, K_g=15, critical_r=2.0, dof=5.0, indices=None, approx=False)[source]

Use data from multiple “Iterative Simulations” (e.g. Markov Chains) to form a Student t Mixture. This approach refers to the “long patches” in [BC13].

The idea is to group chains according to their R-value as in r_group and form K_g Student t Components per chain group. Once the groups are found by r_group, the data from each chain group is partitioned into K_g parts (using pypmc.tools.partition). For each of these parts a Student t component with its empirical mean, covariance and degree of freedom is created.

Return a pypmc.density.mixture.MixtureDensity with pypmc.density.student_t.StudentT components.

See also

make_r_gaussmix

Parameters:
  • data

    Iterable of matrix-like arrays; the individual items are interpreted as points from an individual chain.

    Important

    Every chain must bring the same number of points.

  • K_g – Integer; the number of components per chain group.

  • critical_r – Float; the maximum R value a chain group may have.

  • dof – Float; the degree of freedom the components will have.

  • indices – Iterable of integers; use the R value in these dimensions only. Default: all.

  • approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.

pypmc.mix_adapt.r_value.r_group(means, variances, n, critical_r=2.0, approx=False)[source]

Group m (Markov) chains whose common r_value is less than critical_r in each of the D dimensions.

Parameters:
  • means – (m x D) Matrix-like array; the mean value estimates.

  • variances – (m x D) Matrix-like array; the variance estimates.

  • n – Integer; the number of samples used to determine the estimates passed via means and variances.

  • critical_r – Float; group the chains such that their common R value is below critical_r.

  • approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.

pypmc.mix_adapt.r_value.r_value(means, variances, n, approx=False)[source]

Calculate the Gelman-Rubin R value (Chapter 2.2 in [GR92]).

The R value can be used to quantify mixing of “multiple iterative simulations” (e.g. Markov Chains) in parameter space. An R value “close to one” indicates that all chains explored the same region of parameter space.

Note

The R value is defined only in one dimension.

Parameters:
  • means – Vector-like array; the sample mean of each chain.

  • variances – Vector-like array; the sample variance of each chain.

  • n – Integer; the number of samples used to determine the estimates passed via means and variances.

  • approx – Bool; If False (default), calculate the R value as in [GR92]. If True, neglect the uncertainty induced by the sampling process.

6.4. Tools

Helper functions for general purposes

6.4.1. Convergence diagnostics

Provide functions to rate the quality of weighted samples.

pypmc.tools.convergence.ess(weights)[source]

Calculate the normalized effective sample size \(ESS\) [LC95] of samples with weights \(\omega_i\). \(ESS=0\) is terrible and \(ESS=1\) is perfect.

\[ESS = \frac{1}{1+C^2}\]

where

\[C^2 = \frac{1}{N} \sum_{i=1}^N (N \bar{\omega}_i - 1)^2\]
\[\bar{\omega}_i = \frac{\omega_i}{\sum_i \omega_i}\]
Parameters:

weights – Vector-like array; the samples’ weights

pypmc.tools.convergence.perp(weights)[source]

Calculate the normalized perplexity \(\mathcal{P}\) of samples with weights \(\omega_i\). \(\mathcal{P}=0\) is terrible and \(\mathcal{P}=1\) is perfect.

\[\mathcal{P} = \exp(H) / N\]

where

\[H = - \sum_{i=1}^N \bar{\omega}_i \log \bar{\omega}_i\]
\[\bar{\omega}_i = \frac{\omega_i}{\sum_i \omega_i}\]
Parameters:

weights – Vector-like array; the samples’ weights
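
A quick check of both diagnostics on hand-made weights:

    import numpy as np
    from pypmc.tools import convergence

    uniform = np.ones(1000)                       # all samples equally important
    convergence.ess(uniform)                      # -> 1.0
    convergence.perp(uniform)                     # -> 1.0

    skewed = np.ones(1000)
    skewed[0] = 1000.                             # one sample dominates
    convergence.ess(skewed)                       # noticeably below 1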

6.4.2. History

class pypmc.tools.History(dim, prealloc=1)[source]

Bases: object

Save a history of 1d-arrays. Each call to append is counted as a new “run”.

Parameters:
  • dim – Integer; the length of 1d-arrays to be saved.

  • prealloc – Integer; indicates for how many points memory is allocated in advance. When more memory is needed, it will be allocated on demand.

Access:

self[run_nr] and self[run_begin:run_end] return one array that includes the samples for the runs specified (excluding run_end).

Warning

Index access returns a reference. Modification changes the history.

Hint

Negative numbers are supported, for example self[-1] returns the latest run.

Example:
>>> h = History(2)
>>> for i in range(2):
...     a = h.append(i+1)
...     a[:] = i+1
>>> h[0] # first run
array([[ 1.,  1.]])
>>> h[1] # second run
array([[ 2.,  2.],
       [ 2.,  2.]])
>>> h[:] # entire history
array([[ 1.,  1.],
       [ 2.,  2.],
       [ 2.,  2.]])
>>> len(h) # number of runs
2
append(new_points_len)[source]

Allocate memory for a new run and return a reference to that memory wrapped in an array of size (new_points_len, self.dim).

Parameters:

new_points_len – Integer; the number of points to be stored in the target memory.

clear()[source]

Delete the history.

6.4.3. Indicator

Collect generators of typical indicator functions.

pypmc.tools.indicator.ball(center, radius=1.0, bdy=True)[source]

Returns the indicator function of a ball.

Parameters:
  • center

    A vector-like numpy array, defining the center of the ball.

    len(center) fixes the dimension.

  • radius – Float or int, the radius of the ball

  • bdy – Bool; when x is at the ball’s boundary, ball_indicator(x) returns True if and only if bdy=True.

pypmc.tools.indicator.hyperrectangle(lower, upper, bdy=True)[source]

Returns the indicator function of a hyperrectangle.

Parameters:
  • lower

    Vector-like numpy array, defining the lower boundary of the hyperrectangle.

    len(lower) fixes the dimension.

  • upper – Vector-like numpy array, defining the upper boundary of the hyperrectangle.

  • bdy – Bool; when x is at the hyperrectangle’s boundary, hr_indicator(x) returns True if and only if bdy=True.

pypmc.tools.indicator.merge_function_with_indicator(function, indicator, alternative)[source]

Returns a function such that a call to it is equivalent to:

    if indicator(x):
        return function(x)
    else:
        return alternative

Note that function is not called if indicator evaluates to False.

Parameters:
  • function – The function to be called when indicator returns True.

  • indicator – Bool-returning function; the indicator

  • alternative – The object to be returned when indicator returns False
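
A sketch that restricts a log-target to the unit square; outside the support, -inf is returned without evaluating the target:

    import numpy as np
    from pypmc.tools import indicator

    def log_target(x):
        return -0.5 * np.dot(x, x)

    unit_square = indicator.hyperrectangle(np.zeros(2), np.ones(2))
    restricted_log_target = indicator.merge_function_with_indicator(log_target, unit_square, -np.inf)

    restricted_log_target(np.array([0.5, 0.5]))   # calls log_target
    restricted_log_target(np.array([2.0, 0.5]))   # -> -inf, log_target is not called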

6.4.4. Parallel sampler

Run sampling algorithms in parallel using mpi4py

class pypmc.tools.parallel_sampler.MPISampler(sampler_type, comm=MPI.COMM_WORLD, mpi_tag=0, *args, **kwargs)[source]

An mpi4py-parallelized sampler; it parallelizes any sampler from pypmc.sampler.

Parameters:
  • sampler_type – A class defined in pypmc.sampler; the class of the sampler to be run in parallel. Example: sampler_type=ImportanceSampler.

  • comm – mpi4py communicator; the communicator to be used.

  • args, kwargs – Additional arguments which are passed to the constructor of sampler_type.

6.4.5. Partition

pypmc.tools.partition(N, k)[source]

Distribute N into k parts such that each part takes the value N//k or N//k + 1 where // denotes integer division; i.e., perform the minimal lexicographic integer partition.

Example: N = 5, k = 2 -> return [3, 2]

pypmc.tools.patch_data(data, L=100, try_diag=True, verbose=False)[source]

Patch data (for example Markov chain output) into parts of length L. Return a Gaussian mixture where each component gets the empirical mean and covariance of one patch.

Parameters:
  • data – Matrix-like array; the points to be patched. Expect data[i] as the d-dimensional i-th point.

  • L – Integer; the length of one patch. The last patch will be shorter if L is not a divisor of len(data).

  • try_diag – Bool; decides how to handle a patch that does not define a proper covariance matrix and hence cannot define a Gaussian component. If True (default), the off-diagonal elements are set to zero and another attempt is made to form a Gaussian with that matrix. If that fails as well, the patch is skipped. If False, the patch is skipped directly.

6.4.6. Plot

pypmc.tools.plot_mixture(mixture, i=0, j=1, center_style={'s': 0.15}, cmap='nipy_spectral', cutoff=0.0, ellipse_style={'alpha': 0.3}, solid_edge=True, visualize_weights=False)[source]

Plot the (Gaussian) components of the mixture density as one-sigma ellipses in the (i,j) plane.

Parameters:
  • center_style – If a non-empty dict, plot mean value with the style passed to scatter.

  • cmap – The color map to which components are mapped in order to choose their face color. It is unaffected by the cutoff. The meaning depends on visualize_weights.

  • cutoff – Ignore components whose weight is below the cut off.

  • ellipse_style – Passed on to define the properties of the Ellipse.

  • solid_edge – Draw the edge of the ellipse as a solid opaque line.

  • visualize_weights – If True, colorize the components according to their weights; one can call plt.colorbar() after this function and read the weights off the bar. If False, coloring is based on the component index and the total number of components, which makes it easier to track components by assigning them the same color in subsequent calls to this function.
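
A sketch of visualizing a two-component mixture (requires matplotlib):

    import numpy as np
    import matplotlib.pyplot as plt
    from pypmc.density.mixture import create_gaussian_mixture
    from pypmc.tools import plot_mixture

    mix = create_gaussian_mixture(np.array([[-1., 0.], [1., 0.]]),
                                  np.array([np.eye(2), 0.5 * np.eye(2)]),
                                  weights=np.array([0.2, 0.8]))

    plot_mixture(mix, visualize_weights=True)     # one-sigma ellipses, colored by weight
    plt.colorbar()                                # read the component weights off the bar
    plt.show()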

pypmc.tools.plot_responsibility(data, responsibility, cmap='nipy_spectral')[source]

Classify the 2D data according to the responsibility and make a scatter plot of each data point with the color of the component it is most likely from. The responsibility is normalized internally such that each row sums to unity.

Parameters:
  • data – matrix-like; one row = one 2D sample

  • responsibility – matrix-like; one row = probabilities that sample n is from 1st, 2nd, … component. The number of rows has to agree with data

  • cmap – colormap; defines how component indices are mapped to the color of the data points