API Reference

Linear Models

Linear hierarchical model for sequence-function relationships

Parameters:

alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
orbits (list of tuple, optional): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.
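
The rules relating alphabet, alphabet_name, alphabet_list, and L can be sketched as follows. The helper `resolve_alphabets` and the `PREDEFINED` table are hypothetical names written only for this illustration. The character sets shown for each predefined name, and the requirement that L be given when alphabet_list is absent, are assumptions rather than documented behavior.

```python
# Hypothetical helper illustrating the documented argument rules; not library code.
# The contents of PREDEFINED are assumed, and only some predefined names are shown.
PREDEFINED = {
    "dna": list("ACGT"),
    "rna": list("ACGU"),
    "binary": list("01"),
}


def resolve_alphabets(alphabet=None, alphabet_name=None, alphabet_list=None, L=None):
    """Return one alphabet (a list of characters) per sequence position."""
    if alphabet_list is not None:
        # alphabet_list fixes both the per-position alphabets and the length.
        if alphabet is not None or alphabet_name is not None or L is not None:
            raise ValueError(
                "alphabet_list cannot be combined with alphabet, alphabet_name, or L"
            )
        return [list(a) for a in alphabet_list]
    # Exactly one of alphabet / alphabet_name must be given.
    if (alphabet is None) == (alphabet_name is None):
        raise ValueError("provide exactly one of alphabet or alphabet_name")
    if L is None:
        # Assumption: a length is needed when no per-position list is supplied.
        raise ValueError("L is required when alphabet_list is not given")
    chars = list(alphabet) if alphabet is not None else PREDEFINED[alphabet_name]
    return [chars] * L


per_position = resolve_alphabets(alphabet_name="dna", L=3)
```

With alphabet_list, the sequence length is implied by the number of per-position alphabets, which is why passing L alongside it is rejected in this sketch.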

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.
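
As a minimal sketch of what evaluating a linear model on sequences involves, the example below sums the parameters of every binary feature present in each sequence. The feature representation (a tuple of (position, character) pairs, with the empty tuple as the constant term) and the function `evaluate` are assumptions made for illustration. theta is held in a plain dict here for brevity, whereas the documented models take a pd.Series.

```python
import pandas as pd


def evaluate(theta, seqs):
    """Sum the parameters of every feature that is present in each sequence."""
    values = []
    for seq in seqs:
        total = 0.0
        for feature, value in theta.items():
            # A feature is present when every (position, character) pair matches.
            # The empty feature () matches every sequence (constant term).
            if all(seq[pos] == char for pos, char in feature):
                total += value
        values.append(total)
    return pd.Series(values, index=seqs)


# Hypothetical parameters for sequences of length 2 over the characters A and B.
theta = {
    (): 1.0,
    ((0, "A"),): 0.5,
    ((1, "B"),): -0.25,
    ((0, "A"), (1, "B")): 2.0,
}
f = evaluate(theta, ["AA", "AB", "BB"])
```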

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.
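
To make the 'zero-sum' and 'wild-type' options concrete, the sketch below applies both to the simplest case, an additive model with a constant term and one parameter per position and character. The functions `landscape`, `zero_sum_gauge`, and `wild_type_gauge` are hypothetical and operate on plain NumPy arrays. They illustrate the idea under that assumption and are not the library's implementation. Both transformations change the parameters but leave every landscape value unchanged.

```python
import itertools

import numpy as np


def landscape(theta0, theta, alphabet):
    """Evaluate the additive model on every sequence of length theta.shape[0]."""
    L = theta.shape[0]
    return {
        "".join(alphabet[c] for c in idx): theta0
        + sum(theta[l, c] for l, c in enumerate(idx))
        for idx in itertools.product(range(len(alphabet)), repeat=L)
    }


def zero_sum_gauge(theta0, theta):
    """Move each position's mean effect into the constant term."""
    means = theta.mean(axis=1, keepdims=True)
    return theta0 + means.sum(), theta - means


def wild_type_gauge(theta0, theta, wt_idx):
    """Measure every effect relative to the wild-type character at its position."""
    wt_vals = theta[np.arange(theta.shape[0]), wt_idx][:, None]
    return theta0 + wt_vals.sum(), theta - wt_vals


rng = np.random.default_rng(0)
alphabet = "ACGT"
theta0, theta = 0.3, rng.normal(size=(3, 4))

zs0, zs = zero_sum_gauge(theta0, theta)
wt0, wt = wild_type_gauge(theta0, theta, wt_idx=[0, 2, 1])  # wild type "AGC"
```

In the zero-sum result each row of parameters sums to zero, while in the wild-type result the parameters of the wild-type characters are exactly zero and the constant term equals the wild-type landscape value.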

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
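
The relationship between generating orbits and the full set of orbits can be sketched as below, assuming each orbit is a tuple of position indices. The function `expand_orbits` is a hypothetical helper for illustration, not part of the documented API.

```python
from itertools import combinations


def expand_orbits(generating_orbits):
    """Return every subset of positions contained in some generating orbit."""
    orbits = set()
    for orbit in generating_orbits:
        for k in range(len(orbit) + 1):
            # combinations() yields each size-k subset as a sorted tuple.
            orbits.update(combinations(sorted(orbit), k))
    # Sort by size, then lexicographically, for a stable order.
    return sorted(orbits, key=lambda o: (len(o), o))


orbits = expand_orbits([(0, 1), (1, 2)])
```

Here the two generating orbits share position 1, so the expanded set contains the empty orbit, three single-site orbits, and the two pairs, but not the pair (0, 2).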

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

All-order model for sequence-function relationships

Parameters:

alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

Returns gauge-fixed parameters for the model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
'trivial': No gauge fixing (lambda=0).
'euclidean': Euclidean gauge fixing (lambda=1).
'equitable': Equitable gauge fixing (lambda=n_alleles).
None: Custom gauge fixing with provided lambda and pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

lda: float, np.ndarray or None, optional

Lambda parameter, which controls how much variance should be explained by the lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta_fixed (pd.Series of shape (n_features,)): Gauge-fixed parameters indexed by features.

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_landscape(f: pandas.Series) → None

Define the model parameters theta from function values.

Parameters:

f (pd.Series): Landscape values indexed by sequences.
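
One way to see how parameters can be derived from function values is the sketch below. It builds the binary design matrix of an all-order model for two binary sites and solves for parameters that reproduce a given landscape. Because an all-order model has more features than sequences, many parameter vectors fit equally well. This example picks the minimum-norm solution with np.linalg.lstsq, which is an assumption made for illustration and not necessarily the choice set_landscape makes. The feature representation as (position, character) pairs is likewise hypothetical.

```python
import itertools

import numpy as np

alphabet, L = "01", 2
seqs = ["".join(s) for s in itertools.product(alphabet, repeat=L)]

# All-order features: every subset of positions with every character assignment.
features = []
for k in range(L + 1):
    for positions in itertools.combinations(range(L), k):
        for chars in itertools.product(alphabet, repeat=k):
            features.append(tuple(zip(positions, chars)))

# Binary design matrix: X[i, j] = 1 when feature j is present in sequence i.
X = np.array(
    [[all(s[p] == c for p, c in feat) for feat in features] for s in seqs],
    dtype=float,
)

# Landscape values for the sequences "00", "01", "10", "11".
f = np.array([0.0, 1.0, 2.0, 5.0])

# Minimum-norm parameters that reproduce f exactly.
theta, *_ = np.linalg.lstsq(X, f, rcond=None)
```

The design matrix here has 4 rows and 9 columns, so the fit is exact while the parameters themselves are not unique, which is the situation gauge fixing is meant to address.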

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

K-order model for sequence-function relationships

Parameters:

K (int): The order of the model (size of interaction terms).
alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.
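
To give a sense of how K controls model size, the sketch below enumerates the orbits of a K-order model and counts its binary features. It assumes that such a model includes interactions among every set of up to K positions, with one feature per character combination, and that all positions share an alphabet of n_alleles characters. The helpers `k_order_orbits` and `n_features` are hypothetical.

```python
from itertools import combinations
from math import comb


def k_order_orbits(L, K):
    """All subsets of positions of size 0..K, smallest first."""
    return [orbit for k in range(K + 1) for orbit in combinations(range(L), k)]


def n_features(L, K, n_alleles):
    """Number of one-hot features: each size-k orbit has n_alleles**k of them."""
    return sum(comb(L, k) * n_alleles**k for k in range(K + 1))


orbits = k_order_orbits(L=4, K=2)
count = n_features(L=4, K=2, n_alleles=4)
```

Under these assumptions, K=1 corresponds to an additive model and K=2 to a pairwise model.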

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

Pairwise model for sequence-function relationships

Parameters:

alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

Additive model for sequence-function relationships

Parameters:

alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.
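
The role of pi_lc is easiest to see for an additive model. The sketch below assumes that, in this case, hierarchical gauge fixing amounts to shifting each position's parameters so that their mean under the background frequencies is zero, with the shift absorbed by the constant term. The function `hierarchical_gauge_additive` is a hypothetical illustration on NumPy arrays, not the library's implementation. With uniform frequencies it reduces to the zero-sum case, consistent with the description of 'zero-sum' above.

```python
import numpy as np


def hierarchical_gauge_additive(theta0, theta, pi_lc):
    """Shift additive parameters so each position has zero mean under pi_lc."""
    pi = np.vstack(pi_lc)  # shape (L, n_alleles); each row sums to 1
    means = (pi * theta).sum(axis=1, keepdims=True)
    return theta0 + means.sum(), theta - means


theta0 = 1.0
theta = np.array([[0.2, -0.4, 1.0], [0.5, 0.5, -1.5]])

# Uniform background frequencies: equivalent to the zero-sum gauge.
uniform = [np.full(3, 1 / 3) for _ in range(2)]
# Non-uniform background frequencies, one array per position.
skewed = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.1, 0.8])]

u0, u = hierarchical_gauge_additive(theta0, theta, uniform)
s0, s = hierarchical_gauge_additive(theta0, theta, skewed)
```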

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

K-adjacent model for sequence-function relationships

Parameters:

K (int): The order of the model (size of interaction terms).
alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.
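
A plausible reading of "K-adjacent" is that interactions are restricted to windows of K consecutive positions. Under that assumption, which this reference does not state explicitly, the generating orbits would be the sliding windows sketched below. The helper `k_adjacent_generating_orbits` is hypothetical.

```python
def k_adjacent_generating_orbits(L, K):
    """Windows of K consecutive positions in a sequence of length L (assumed)."""
    return [tuple(range(start, start + K)) for start in range(L - K + 1)]


windows = k_adjacent_generating_orbits(L=5, K=3)
```

Each window would then contribute all of its subsets as orbits, so positions 0 and 4 never interact directly in this example.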

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.

Neighbor model for sequence-function relationships

Parameters:

K (int): The order of the model (size of interaction terms).
alphabet (list of str, optional): The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
alphabet_name (str, optional): Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
alphabet_list (list of list of str, optional): List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
L (int or None, optional): The length of the sequences for which features are being generated.
positions (list of int, optional): Positions to use if different from range(self.L). If positions=None, all positions are included.
theta (pd.Series of shape (n_features,), optional): Model parameters indexed by features.

__call__(seqs: list[str]) → numpy.ndarray

Evaluate the model at specific input sequences.

Parameters:

seqs (list of str): List of input sequences. All sequences must be of length L.

Returns:

f (pd.Series): Landscape values indexed by sequences.

get_features() → list[tuple]

Get the features of the model, using user-specified positions if provided.

Returns:

features (list[tuple]): Model’s list of binary features.

get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) → pd.Series | pd.DataFrame

Returns gauge-fixed parameters for a linear model.

Parameters:

gauge: str or None, optional

Specifies the type of gauge fixing to apply:

'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.

wt_seq: str or None, optional

Wild-type sequence for gauge='wild-type' gauge fixing.

pi_lc: list of np.ndarray or None, optional

Pi parameter, which specifies the probability of each character at each position, used when computing the variance explained by lower-order features.

use_dense_matrix: bool

Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

get_generating_orbits() → list[tuple]

Get the generating orbits of the hierarchical model, using user-specified positions if provided.

Returns:

orbits (list of tuple): Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.

get_orbits() → list[tuple]

Get the orbits of the encoder, using user-specified positions if provided.

Returns:

orbits (list of tuple): Model’s set of orbits.

get_params() → pd.Series | pd.DataFrame

Returns parameters for a linear model.

Returns:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_params(theta: pd.Series | pd.DataFrame) → None

Define the values of the model parameters theta.

Parameters:

theta (pd.Series of shape (n_features,)): Model parameters indexed by features.

set_random_params() → None: Initialize model with random values of the parameters.