sincei.ExponentialFamily module

class sincei.ExponentialFamily.ExponentialFamily(family_params=None, **kwargs)

Bases: object

Encodes an exponential family distribution using PyTorch autodiff structures.

ExponentialFamily is the superclass that provides the backbone for subclasses, one per exponential family distribution. Each subclass should implement the following methods, defined according to the chosen distribution (same notation as in Mourragui et al, 2023):

  • sufficient_statistics (\(T\))

  • natural_parametrization (\(\eta\))

  • log_partition (\(A\))

  • invert_g (\(g^{-1}\))

  • initialize_family_parameters: computes parameters used in other methods, e.g., gene-level dispersion for Negative Binomial.

We added a "base_measure" method for the sake of completeness, but it is not necessary for running GLM-PCA. The log-likelihood and exponential term are defined directly from the aforementioned methods.
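To illustrate how these pieces fit together, here is a minimal standalone sketch for a Poisson distribution with rate \(\lambda\). It does not use sincei or PyTorch: the function names mirror the methods listed above, but their signatures are assumptions made for illustration only. The log-likelihood is assembled as \(\eta \, T(x) - A(\eta) + \log h(x)\), where \(h\) is the base measure.

```python
import math

# Hypothetical standalone functions. The names mirror the methods listed
# above, but the signatures are illustrative assumptions, not the sincei API.

def sufficient_statistics(x):
    # T(x) = x for the Poisson distribution
    return float(x)

def natural_parametrization(rate):
    # eta = log(lambda)
    return math.log(rate)

def log_partition(eta):
    # A(eta) = exp(eta), which equals lambda
    return math.exp(eta)

def log_base_measure(x):
    # log h(x) = -log(x!)
    return -math.lgamma(x + 1)

def log_likelihood(x, rate):
    # eta * T(x) - A(eta) + log h(x)
    eta = natural_parametrization(rate)
    return eta * sufficient_statistics(x) - log_partition(eta) + log_base_measure(x)

def poisson_log_pmf(x, rate):
    # Direct evaluation of log(lambda^x * exp(-lambda) / x!) for comparison
    return math.log(rate ** x * math.exp(-rate) / math.factorial(x))
```

Both functions return the same value for a given count and rate, which is the sense in which the log-likelihood follows directly from the listed methods. The base measure only adds a term that does not depend on \(\eta\).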

Parameters

family_name : str

Name of the family.

class sincei.ExponentialFamily.Gaussian(family_params=None, **kwargs)

Bases: ExponentialFamily

Gaussian with standard deviation one.

GLM-PCA with the Gaussian family is equivalent to standard PCA.
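The reason can be checked numerically. For a unit-variance Gaussian, \(T(x) = x\), \(A(\eta) = \eta^2 / 2\) and \(\log h(x) = -x^2/2 - \log(2\pi)/2\), so the log-likelihood collapses to \(-(x - \eta)^2 / 2\) plus a constant. Maximizing it over the parameters therefore amounts to minimizing a squared reconstruction error, as in PCA. The sketch below is standalone and its function names are hypothetical, not part of sincei.

```python
import math

# Hypothetical helper names, used for illustration only.

def gaussian_exp_family_log_likelihood(x, eta):
    # eta * T(x) - A(eta) + log h(x) for a unit-variance Gaussian
    log_h = -0.5 * x ** 2 - 0.5 * math.log(2 * math.pi)
    return eta * x - 0.5 * eta ** 2 + log_h

def negative_half_squared_error(x, eta):
    # -(x - eta)^2 / 2 plus the same constant term
    return -0.5 * (x - eta) ** 2 - 0.5 * math.log(2 * math.pi)
```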

class sincei.ExponentialFamily.Bernoulli(family_params=None, **kwargs)

Bases: ExponentialFamily

Bernoulli distribution.

family_params of interest:
  • "max_val" (int) corresponding to the max value (replaces infinity). Empirically, values above 10 yield similar results.

class sincei.ExponentialFamily.Poisson(family_params=None, **kwargs)

Bases: ExponentialFamily

Poisson distribution.

family_params of interest:
  • "min_val" (int) corresponding to the min value (replaces 0).

class sincei.ExponentialFamily.Beta(family_params=None, **kwargs)

Bases: ExponentialFamily

Beta distribution, using a standard formulation.

Original formulation presented in [Mourragui et al, 2023].

family_params of interest:
  • "min_val" (int): min data value (replaces 0 and 1).

  • "n_jobs" (int): number of jobs, specifically for computing the "nu" parameter.

  • "method" (str): method used to compute the "nu" parameter per feature. Two options: "MLE" and "MM". Defaults to "MLE".

  • "eps" (float): minimum difference used for inverting the g function. Defaults to 1e-4.

  • "maxiter" (int): maximum number of iterations for the inversion of the g function. Defaults to 100.
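
As an illustration of how a tolerance and an iteration cap like "eps" and "maxiter" typically interact, here is a minimal sketch of inverting an increasing function by bisection. This is an assumption for illustration: the source does not say which numerical scheme sincei uses to invert g, and the function name, the bracketing arguments and the stopping rule below are hypothetical.

```python
def invert_monotone(g, target, lower, upper, eps=1e-4, maxiter=100):
    """Approximate x in [lower, upper] such that g(x) = target.

    Assumes g is increasing on the interval and that the solution is
    bracketed by lower and upper. Illustrative only, not the sincei code.
    """
    for _ in range(maxiter):
        # Stop once the bracket is narrower than the tolerance
        if upper - lower < eps:
            break
        mid = 0.5 * (lower + upper)
        if g(mid) < target:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)
```

For example, `invert_monotone(lambda x: x ** 3, 8.0, 0.0, 10.0)` returns a value within 1e-4 of 2. A smaller "eps" tightens the result at the cost of more iterations, while "maxiter" bounds the work when the tolerance cannot be reached.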

class sincei.ExponentialFamily.SigmoidBeta(family_params=None, **kwargs)

Bases: Beta

Beta distribution re-parametrized using a Sigmoid.

This distribution is similar to the previous Beta (which it inherits from), but the natural parameter is re-parametrized using a sigmoid. This has been shown experimentally to stabilize the optimisation by removing the ]0,1[ constraint.
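
The idea behind such a re-parametrization can be sketched in a few lines: the optimiser works on an unconstrained real value, and a sigmoid maps it into ]0,1[. The snippet below is a generic, standalone illustration with a hypothetical function name. It does not reproduce how SigmoidBeta applies the transform internally.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in exp()
    # by only exponentiating non-positive values.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Any real input lands strictly between 0 and 1 (for moderate magnitudes
# in floating point), so no explicit constraint is needed during optimisation.
unconstrained_values = [-30.0, -1.0, 0.0, 2.5, 30.0]
constrained_values = [sigmoid(z) for z in unconstrained_values]
```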

family_params of interest:
  • "min_val" (int): min data value (replaces 0 and 1).

  • "n_jobs" (int): number of jobs, specifically for computing the "nu" parameter.

  • "method" (str): method used to compute the "nu" parameter per feature. Two options: "MLE" and "MM". Defaults to "MLE".

  • "eps" (float): minimum difference used for inverting the g function. Defaults to 1e-4.

  • "maxiter" (int): maximum number of iterations for the inversion of the g function. Defaults to 100.

class sincei.ExponentialFamily.Gamma(family_params=None, **kwargs)

Bases: ExponentialFamily

Gamma distribution using a standard formulation.

Original formulation presented in [Mourragui et al, 2023].

family_params of interest:
  • "min_val" (int): min data value. Defaults to 1e-5.

  • "max_val" (int): max data value. Defaults to 1e7.

  • "n_jobs" (int): number of jobs, specifically for computing the "nu" parameter.

  • "method" (str): method used to compute the "nu" parameter per feature. Two options: "MLE" and "MM". Defaults to "MLE".

  • "eps" (float): minimum difference used for inverting the g function. Defaults to 1e-4.

  • "maxiter" (int): maximum number of iterations for the inversion of the g function. Defaults to 100.

class sincei.ExponentialFamily.LogNormal(family_params=None, **kwargs)

Bases: ExponentialFamily

Log-normal distribution using a standard formulation.

Original formulation presented in [Mourragui et al, 2023].

family_params of interest:
  • "min_val" (int): min data value. Defaults to 1e-5.

  • "max_val" (int): max data value. Defaults to 1e7.

  • "n_jobs" (int): number of jobs, specifically for computing the "nu" parameter.

  • "method" (str): method used to compute the "nu" parameter per feature. Two options: "MLE" and "MM". Defaults to "MLE".

  • "eps" (float): minimum difference used for inverting the g function. Defaults to 1e-4.

  • "maxiter" (int): maximum number of iterations for the inversion of the g function. Defaults to 100.