API Reference

Linear Models

class gaugefixer.HierarchicalModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, generating_orbits: list[tuple] | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

Linear hierarchical model for sequence-function relationships

Parameters:
alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

generating_orbits : list of tuple, optional

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.
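
To make the role of generating_orbits concrete, the sketch below expands a generating set into every subset of sites, as the description above states. It is an illustration only and does not call gaugefixer: the helper name expand_generating_orbits and the representation of an orbit as a tuple of 0-based site indices are assumptions.

```python
from itertools import combinations


def expand_generating_orbits(generating_orbits):
    """Return every subset of sites of every generating orbit, without duplicates."""
    orbits = set()
    for orbit in generating_orbits:
        sites = tuple(sorted(orbit))
        # Subsets of every size, from the empty orbit (constant term)
        # up to the generating orbit itself.
        for size in range(len(sites) + 1):
            orbits.update(combinations(sites, size))
    # Sort by interaction order first, then by site indices.
    return sorted(orbits, key=lambda o: (len(o), o))


orbits = expand_generating_orbits([(0, 1), (1, 2)])
print(orbits)
# [(), (0,), (1,), (2,), (0, 1), (1, 2)]
```

Note that the pair (0, 2) is absent: it is not a subset of either generating orbit.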

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.
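
Evaluating a linear model amounts to summing the parameters of the binary features present in each sequence. The sketch below shows that sum on a toy example. The (positions, characters) feature encoding, the plain dict used for theta, and the helper evaluate_linear_model are assumptions made for illustration; the feature format used by gaugefixer may differ, and this sketch returns a plain list rather than a pd.Series.

```python
def evaluate_linear_model(theta, seqs):
    """Sum the parameters of all features present in each sequence."""
    values = []
    for seq in seqs:
        total = 0.0
        for (positions, chars), value in theta.items():
            # A binary feature equals 1 when the sequence carries the given
            # characters at the given positions, and 0 otherwise.
            if all(seq[p] == c for p, c in zip(positions, chars)):
                total += value
        values.append(total)
    return values


# Hypothetical parameters for a length-2 binary model.
theta = {
    ((), ()): 1.0,              # constant term
    ((0,), ("1",)): 0.5,        # effect of "1" at position 0
    ((1,), ("1",)): -0.25,      # effect of "1" at position 1
    ((0, 1), ("1", "1")): 2.0,  # pairwise interaction
}
f = evaluate_linear_model(theta, ["00", "10", "11"])
print(f)  # [1.0, 1.5, 3.25]
```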

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.
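
The pi_lc argument is described as a list of arrays holding per-position character probabilities. The sketch below builds two such lists with NumPy: a uniform one, matching the pi_lc=1/n_alleles convention of the 'zero-sum' option, and one concentrated on a single sequence. The helper names, and the assumption that each array follows the order of the alphabet at that position, are hypothetical; the layout gaugefixer expects should be checked against the library.

```python
import numpy as np


def uniform_pi_lc(alphabet_list):
    """One uniform probability vector per position."""
    return [np.full(len(alphabet), 1.0 / len(alphabet)) for alphabet in alphabet_list]


def one_hot_pi_lc(alphabet_list, seq):
    """Probability 1 on the character of `seq` at each position."""
    pi_lc = []
    for alphabet, char in zip(alphabet_list, seq):
        pi = np.zeros(len(alphabet))
        pi[alphabet.index(char)] = 1.0
        pi_lc.append(pi)
    return pi_lc


# Assumed layout: one alphabet (list of characters) per position.
alphabet_list = [["A", "C", "G", "T"]] * 3
pi_uniform = uniform_pi_lc(alphabet_list)
pi_single = one_hot_pi_lc(alphabet_list, "ACG")

# Every vector is a probability distribution over the characters.
assert all(np.isclose(pi.sum(), 1.0) for pi in pi_uniform + pi_single)
```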

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.AllOrderModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

All-order model for sequence-function relationships

Parameters:
alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.
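
A quick count shows why an all-order model needs gauge fixing. Assuming one binary feature for every choice of a subset of positions and a character at each chosen position (constant term included), the number of features exceeds the number of sequences, so many parameter vectors describe the same landscape. The function names below are hypothetical and the count is a sketch under that assumption, not a call into gaugefixer.

```python
from math import prod


def count_sequences(alphabet_sizes):
    """Number of distinct sequences: one character choice per position."""
    return prod(alphabet_sizes)


def count_all_order_features(alphabet_sizes):
    """Number of features when every subset of positions is included."""
    # Each position is either absent from a feature (1 option) or
    # contributes one of its characters (alphabet-size options).
    return prod(1 + size for size in alphabet_sizes)


sizes = [4, 4, 4]  # e.g. three DNA positions
n_seqs = count_sequences(sizes)
n_features = count_all_order_features(sizes)
print(n_seqs, n_features)  # 64 125
```

Under these assumptions the 125 parameters carry 125 - 64 = 61 redundant degrees of freedom.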

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, lda: float | np.ndarray | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for the model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

  • 'trivial': No gauge fixing (lambda=0).

  • 'euclidean': Euclidean gauge fixing (lambda=1).

  • 'equitable': Equitable gauge fixing (lambda=n_alleles).

  • None: Custom gauge fixing with the provided lda (lambda) and pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

lda : float, np.ndarray or None, optional

Lambda parameter, which controls how much variance should be explained by the lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta_fixed : pd.Series of shape (n_features,)

Gauge-fixed parameters indexed by features.
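
The lambda values listed for the named gauges can be summarized as a small lookup. The sketch below only restates the list above; lambda_for_gauge is a hypothetical helper, it handles scalar lambda only, and it makes no claim about how gaugefixer resolves the gauge argument internally.

```python
def lambda_for_gauge(gauge, n_alleles, lda=None):
    """Map a gauge name to the lambda value quoted in the option list."""
    named = {
        "trivial": 0.0,                 # lambda=0
        "euclidean": 1.0,               # lambda=1
        "equitable": float(n_alleles),  # lambda=n_alleles
    }
    if gauge is None:
        # Custom gauge: the caller must supply lambda explicitly.
        if lda is None:
            raise ValueError("gauge=None requires an explicit lda value")
        return float(lda)
    if gauge not in named:
        raise ValueError(f"no lambda value is listed for gauge {gauge!r}")
    return named[gauge]


print(lambda_for_gauge("equitable", n_alleles=4))  # 4.0
```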

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_landscape(f: pandas.Series) -> None

Define the model parameters theta from function values.

Parameters:
f : pd.Series

Landscape values indexed by sequences.
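
As an illustration of the expected input, the sketch below builds a pd.Series of landscape values indexed by sequences over a complete sequence space. The placeholder values (number of "A" characters) are invented for the example, and whether set_landscape requires every sequence to be present is not stated in this reference, so covering the full space here is an assumption.

```python
from itertools import product

import pandas as pd

alphabet = ["A", "C", "G", "T"]
L = 2

# Enumerate every sequence of length L over the alphabet.
seqs = ["".join(chars) for chars in product(alphabet, repeat=L)]

# Placeholder landscape: the number of "A" characters in each sequence.
f = pd.Series([float(seq.count("A")) for seq in seqs], index=seqs)

print(len(f))   # 16
print(f["AA"])  # 2.0
```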

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.KorderModel(K: int, alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

K-order model for sequence-function relationships

Parameters:
K : int

The order of the model (size of interaction terms).

alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.
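
One way to picture the order K is through the orbits it admits. The sketch below assumes that a K-order model contains every subset of at most K positions, including the empty subset for the constant term; that reading, the 0-based site indices, and the helper name k_order_orbits are assumptions rather than documented behavior.

```python
from itertools import combinations


def k_order_orbits(L, K):
    """All subsets of range(L) with at most K sites, ordered by size."""
    orbits = []
    for size in range(K + 1):
        orbits.extend(combinations(range(L), size))
    return orbits


orbits = k_order_orbits(L=3, K=2)
print(orbits)
# [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```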

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.
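
For the simplest case K=1, fixing parameters relative to a wild-type sequence can be written out by hand. The sketch below assumes an additive model stored as a constant theta0 plus an (L, n_alleles) array theta_lc; this storage layout and the helper name are hypothetical and are not how gaugefixer indexes its parameters. The transformation moves the wild-type effects into the constant term, so predictions are unchanged and all wild-type parameters become zero.

```python
import numpy as np


def wild_type_gauge_additive(theta0, theta_lc, wt_idx):
    """Re-express additive parameters relative to a wild-type sequence.

    theta_lc has shape (L, n_alleles); wt_idx[l] is the column of the
    wild-type character at position l.
    """
    L = theta_lc.shape[0]
    wt_values = theta_lc[np.arange(L), wt_idx]
    fixed0 = theta0 + wt_values.sum()           # value of the wild-type sequence
    fixed_lc = theta_lc - wt_values[:, None]    # effects relative to wild type
    return fixed0, fixed_lc


theta0 = 1.0
theta_lc = np.array([[0.5, 1.5], [2.0, -1.0]])
wt_idx = np.array([0, 1])
fixed0, fixed_lc = wild_type_gauge_additive(theta0, theta_lc, wt_idx)

# Predictions are identical before and after, here for the sequence (1, 0).
before = theta0 + theta_lc[0, 1] + theta_lc[1, 0]
after = fixed0 + fixed_lc[0, 1] + fixed_lc[1, 0]
assert np.isclose(before, after)
```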

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.PairwiseModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

Pairwise model for sequence-function relationships

Parameters:
alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.
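
The expected length of theta for a pairwise model can be estimated with a short calculation. The sketch assumes the same alphabet size at every position and a model made of a constant term, one additive feature per position and character, and one pairwise feature per pair of positions and pair of characters. The function name is hypothetical, and the count should be checked against len(model.get_features()) in practice.

```python
from math import comb


def count_pairwise_features(L, alpha):
    """Constant + additive + pairwise feature counts for alphabet size alpha."""
    constant = 1
    additive = L * alpha
    pairwise = comb(L, 2) * alpha ** 2
    return constant + additive + pairwise


n_features = count_pairwise_features(L=3, alpha=4)
print(n_features)  # 61
```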

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.AdditiveModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

Additive model for sequence-function relationships

Parameters:
alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.
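
For an additive model, the zero-sum gauge with uniform background frequencies has a simple closed form: subtract each position's mean effect and absorb it into the constant term. The sketch below works on a hypothetical storage layout, a constant theta0 plus an (L, n_alleles) array theta_lc, which is an assumption made for illustration and not the feature-indexed pd.Series that gaugefixer returns.

```python
import numpy as np


def zero_sum_gauge_additive(theta0, theta_lc):
    """Center the additive effects at every position.

    theta_lc has shape (L, n_alleles). The per-position means are moved
    into the constant term, so model predictions do not change.
    """
    means = theta_lc.mean(axis=1)
    fixed0 = theta0 + means.sum()
    fixed_lc = theta_lc - means[:, None]
    return fixed0, fixed_lc


theta0 = 1.0
theta_lc = np.array([[1.0, 3.0], [2.0, 6.0]])
fixed0, fixed_lc = zero_sum_gauge_additive(theta0, theta_lc)

# Effects now sum to zero at each position.
assert np.allclose(fixed_lc.sum(axis=1), 0.0)

# The prediction for the sequence (0, 1) is unchanged: 1 + 1 + 6 = 7 - 1 + 2.
assert np.isclose(theta0 + theta_lc[0, 0] + theta_lc[1, 1],
                  fixed0 + fixed_lc[0, 0] + fixed_lc[1, 1])
```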

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.KadjacentModel(K: int, alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

K-adjacent model for sequence-function relationships

Parameters:
K : int

The order of the model (size of interaction terms).

alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.
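
A plausible reading of "K-adjacent" is that the generating orbits are the windows of K consecutive positions. The sketch below enumerates those windows under that assumption; the helper name and the 0-based indexing are hypothetical, and the orbits actually used can be inspected with get_generating_orbits().

```python
def k_adjacent_generating_orbits(L, K):
    """Windows of K consecutive sites in a sequence of length L."""
    return [tuple(range(start, start + K)) for start in range(L - K + 1)]


windows = k_adjacent_generating_orbits(L=5, K=3)
print(windows)
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```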

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.

class gaugefixer.NeighborModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)

Neighbor model for sequence-function relationships

Parameters:
alphabet : list of str, optional

The set of possible characters that can appear in the sequences. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_name : str, optional

Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Unless alphabet_list is used, exactly one of alphabet or alphabet_name must be provided.

alphabet_list : list of list of str, optional

List of alphabets, one per position, where each alphabet is the list of characters that can appear at that position. The length of alphabet_list determines the sequence length. Cannot be combined with alphabet, alphabet_name, or L.

L : int or None, optional

The length of the sequences for which features are being generated.

positions : list of int, optional

Positions to use if different from range(self.L). If positions=None, all positions are included.

theta : pd.Series of shape (n_features,), optional

Model parameters indexed by features.

__call__(seqs: list[str]) -> numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:
seqs : list of str

List of input sequences. All sequences must be of length L.

Returns:
f : pd.Series

Landscape values indexed by sequences.

get_features() -> list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:
features : list[tuple]

Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:
gauge : str or None, optional

Specifies the type of gauge fixing to apply:

  • 'wild-type': Fix parameters relative to a wild-type sequence.

  • 'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).

  • 'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq : str or None, optional

Wild-type sequence, used when gauge='wild-type'.

pi_lc : list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.

use_dense_matrix : bool, optional

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:
theta : pd.Series of shape (n_features,)

Gauge-fixed model parameters indexed by features.

get_generating_orbits() -> list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:
orbits : list of tuple

Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() -> list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:
orbits : list of tuple

Model’s set of orbits.

get_params() -> pd.Series | pd.DataFrame

Returns the parameters of the linear model.

Returns:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) -> None

Define the values of the model parameters theta.

Parameters:
theta : pd.Series of shape (n_features,)

Model parameters indexed by features.

set_random_params() -> None

Initialize model with random values of the parameters.