API Reference
Linear Models
- class gaugefixer.HierarchicalModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, generating_orbits: list[tuple] | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
Linear hierarchical model for sequence-function relationships
- Parameters:
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- generating_orbits: list of tuple, optional
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
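The statement that the model includes all features for all subsets of sites of each generating orbit can be illustrated with a short, library-independent sketch. It assumes that an orbit is a tuple of site indices and that the empty tuple stands for the constant term; the helper name expand_generating_orbits is hypothetical and is not part of gaugefixer.

```python
from itertools import combinations


def expand_generating_orbits(generating_orbits):
    """Return every subset of sites of every generating orbit (hypothetical helper)."""
    orbits = set()
    for orbit in generating_orbits:
        sites = tuple(sorted(orbit))
        # Include subsets of every size, from the empty set up to the full orbit.
        for size in range(len(sites) + 1):
            orbits.update(combinations(sites, size))
    # Sort by order (number of sites), then lexicographically.
    return sorted(orbits, key=lambda o: (len(o), o))


orbits = expand_generating_orbits([(0, 1), (1, 2)])
print(orbits)
```

Here sites 0 and 2 never interact directly, because no generating orbit contains both of them.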
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
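Evaluating a linear model on binary features amounts to summing the parameters of the features that are present in each sequence. The sketch below shows this idea in plain Python. The feature encoding (a tuple of (position, character) pairs) and the function name evaluate are assumptions made for illustration; they are not necessarily the representation that gaugefixer uses internally.

```python
def evaluate(theta, seq):
    """Sum the parameters of all features present in seq.

    theta maps a feature to a number. A feature is a tuple of
    (position, character) pairs and is present when every pair matches.
    The empty tuple is the constant term and is always present.
    """
    return sum(
        value
        for feature, value in theta.items()
        if all(seq[pos] == char for pos, char in feature)
    )


theta = {
    (): 1.0,                     # constant term
    ((0, "A"),): 0.5,            # additive effect of A at position 0
    ((1, "B"),): -0.25,          # additive effect of B at position 1
    ((0, "A"), (1, "B")): 2.0,   # pairwise interaction
}
values = {seq: evaluate(theta, seq) for seq in ["AA", "AB", "BB"]}
print(values)
```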
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
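For intuition about gauge='wild-type', the following sketch fixes the wild-type gauge of a purely additive model by hand: every effect is expressed relative to the wild-type character at its position, and the shifts are absorbed into the constant term, so the landscape is unchanged. This is a simplified illustration under the assumption of an additive model stored as nested dictionaries; the function names are hypothetical and this is not the library's implementation.

```python
def wild_type_gauge(theta0, theta_lc, wt_seq):
    """Additive-model sketch: theta_lc[l][c] is the effect of character c at position l."""
    # Effect of the wild-type character at each position.
    shifts = [theta_lc[l][wt_seq[l]] for l in range(len(wt_seq))]
    # Move the wild-type effects into the constant term.
    fixed0 = theta0 + sum(shifts)
    fixed_lc = [
        {c: v - shifts[l] for c, v in site.items()}
        for l, site in enumerate(theta_lc)
    ]
    return fixed0, fixed_lc


def landscape(theta0, theta_lc, seq):
    """Value of the additive model at seq."""
    return theta0 + sum(theta_lc[l][c] for l, c in enumerate(seq))


theta0 = 1.0
theta_lc = [{"A": 0.5, "B": -1.5}, {"A": 2.0, "B": 0.25}]
fixed0, fixed_lc = wild_type_gauge(theta0, theta_lc, wt_seq="AB")
print(fixed0, fixed_lc)
```

After fixing, the constant term equals the landscape value of the wild-type sequence and all wild-type effects are zero.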
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.AllOrderModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
All-order model for sequence-function relationships
- Parameters:
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, lda: float | np.ndarray | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for the model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
'trivial': No gauge fixing (lambda=0).
'euclidean': Euclidean gauge fixing (lambda=1).
'equitable': Equitable gauge fixing (lambda=n_alleles).
None: Custom gauge fixing with provided lambda (lda) and pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- lda: float, np.ndarray or None, optional
Lambda parameter, which controls how much variance should be explained by the lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta_fixed: pd.Series of shape (n_features,)
Gauge-fixed parameters indexed by features.
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_landscape(f: pandas.Series) -> None
Define the model parameters theta from function values.
- Parameters:
- f: pd.Series
Landscape values indexed by sequences.
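An all-order model can reproduce any landscape, and because of gauge freedom many different parameter vectors do so. The sketch below builds one such vector by placing each landscape value on the single highest-order feature that matches the corresponding sequence, leaving all lower-order parameters at zero. It only illustrates why a landscape determines a valid theta; it is not a description of how set_landscape computes its result, and the feature encoding and function names are hypothetical.

```python
from itertools import product


def highest_order_params(f):
    """Put each landscape value on the one full-length feature matching its sequence."""
    return {tuple(enumerate(seq)): value for seq, value in f.items()}


def evaluate(theta, seq):
    """Sum the parameters of all features present in seq."""
    return sum(
        value
        for feature, value in theta.items()
        if all(seq[pos] == char for pos, char in feature)
    )


alphabet, L = "AB", 2
# Toy landscape: an arbitrary value for each of the 4 possible sequences.
f = {"".join(chars): float(i) for i, chars in enumerate(product(alphabet, repeat=L))}
theta = highest_order_params(f)
print(theta)
```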
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.KorderModel(K: int, alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
K-order model for sequence-function relationships
- Parameters:
- K: int
The order of the model (size of interaction terms).
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
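To get a feel for how K controls model size, the sketch below enumerates the orbits of a K-order model and counts its features. It assumes that a K-order model contains every subset of at most K positions (including the empty set for the constant term) and that all positions share an alphabet of n_alleles characters. Both helper names are hypothetical and independent of gaugefixer.

```python
from itertools import combinations


def k_order_orbits(L, K):
    """All subsets of range(L) with at most K sites, smallest first."""
    return [orbit for k in range(K + 1) for orbit in combinations(range(L), k)]


def count_features(orbits, n_alleles):
    """An orbit of k sites carries n_alleles**k features."""
    return sum(n_alleles ** len(orbit) for orbit in orbits)


orbits = k_order_orbits(L=4, K=2)
n_features = count_features(orbits, n_alleles=4)
print(len(orbits), n_features)
```

For L=4 and K=2 this gives 1 constant orbit, 4 single-site orbits, and 6 pairs.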
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.PairwiseModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
Pairwise model for sequence-function relationships
- Parameters:
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
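The role of alphabet_list, where each position has its own alphabet, can be made concrete by listing the binary features of a pairwise model by hand. The sketch below assumes a pairwise model consists of a constant feature, one feature per (position, character), and one feature per pair of characters at two distinct positions. The tuple encoding and the function name pairwise_features are hypothetical and may differ from what get_features returns.

```python
from itertools import combinations, product


def pairwise_features(alphabet_list):
    """Enumerate constant, additive, and pairwise features for per-position alphabets."""
    L = len(alphabet_list)  # the number of alphabets determines the sequence length
    features = [()]  # constant term
    features += [((l, c),) for l in range(L) for c in alphabet_list[l]]
    features += [
        ((i, a), (j, b))
        for i, j in combinations(range(L), 2)
        for a, b in product(alphabet_list[i], alphabet_list[j])
    ]
    return features


features = pairwise_features([["A", "B"], ["A", "B", "C"], ["X", "Y"]])
print(len(features))
```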
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.AdditiveModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
Additive model for sequence-function relationships
- Parameters:
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
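For an additive model, the effect of pi_lc can be worked out by hand: the pi-weighted mean of the effects at each position is moved into the constant term, so that the remaining effects average to zero under pi_lc. With uniform pi_lc this reduces to the zero-sum gauge. The sketch below is a simplified illustration of that additive special case under these assumptions; the function names are hypothetical and it is not the library's implementation.

```python
def hierarchical_gauge_additive(theta0, theta_lc, pi_lc):
    """Additive-model sketch: theta_lc[l][c] and pi_lc[l][c] are indexed by position and character."""
    # pi-weighted mean effect at each position.
    means = [
        sum(p * t for p, t in zip(pi, site))
        for pi, site in zip(pi_lc, theta_lc)
    ]
    # Absorb the means into the constant term and center each position.
    fixed0 = theta0 + sum(means)
    fixed_lc = [[t - m for t in site] for site, m in zip(theta_lc, means)]
    return fixed0, fixed_lc


theta0 = 1.0
theta_lc = [[0.5, -1.5], [2.0, 0.0, -1.0, 1.0]]
# Uniform background frequencies (pi_lc = 1 / n_alleles) give the zero-sum gauge.
uniform = [[1.0 / len(site)] * len(site) for site in theta_lc]
fixed0, fixed_lc = hierarchical_gauge_additive(theta0, theta_lc, uniform)
print(fixed0, fixed_lc)
```

The landscape is unchanged: for any sequence, the constant term plus the selected effects gives the same value before and after fixing.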
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.KadjacentModel(K: int, alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
K-adjacent model for sequence-function relationships
- Parameters:
- K: int
The order of the model (size of interaction terms).
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
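One plausible reading of a K-adjacent model is that its generating orbits are the windows of K consecutive positions, with all subsets of each window included. The sketch below builds the orbits under that assumption, which is not confirmed by this reference; both helper names are hypothetical.

```python
from itertools import combinations


def k_adjacent_generating_orbits(L, K):
    """Windows of K consecutive positions (assumed definition)."""
    return [tuple(range(start, start + K)) for start in range(L - K + 1)]


def all_orbits(generating_orbits):
    """Every subset of sites of every generating orbit, ordered by size."""
    orbits = set()
    for orbit in generating_orbits:
        for size in range(len(orbit) + 1):
            orbits.update(combinations(orbit, size))
    return sorted(orbits, key=lambda o: (len(o), o))


generating = k_adjacent_generating_orbits(L=5, K=3)
orbits = all_orbits(generating)
print(generating)
print(len(orbits))
```

Under this reading, positions more than K-1 apart (for example 0 and 3 when K=3) never appear in the same orbit.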
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.
- class gaugefixer.NeighborModel(alphabet: list[str] | None = None, alphabet_name: str | None = None, alphabet_list: list[list[str]] | None = None, L: int | None = None, positions: list[int] | None = None, theta: pd.Series | pd.DataFrame | None = None)
Neighbor model for sequence-function relationships
- Parameters:
- K: int
The order of the model (size of interaction terms).
- alphabet: list of str, optional
The set of possible characters that can appear in the sequences. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_name: str, optional
Name of a predefined alphabet to use, one of {“dna”, “rna”, “protein”, “binary”, “ternary”, “decimal”}. Either alphabet or alphabet_name must be provided, but not both.
- alphabet_list: list of list of str, optional
List of alphabets, where each alphabet is a list of characters to sample from for that specific position. The length of alphabet_list determines the sequence length. Cannot be used with the alphabet, alphabet_name, or L parameters.
- L: int or None, optional
The length of the sequences for which features are being generated.
- positions: list of int, optional
Positions to use if different from range(self.L). If positions=None, all positions are included.
- theta: pd.Series of shape (n_features,), optional
Model parameters indexed by features.
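Assuming that a neighbor model keeps only the pairwise interactions between adjacent positions (an interpretation that this reference does not spell out), its size grows linearly with L, whereas a full pairwise model grows quadratically. The sketch below compares the two feature counts for a shared alphabet of n_alleles characters; both function names are hypothetical.

```python
from math import comb


def n_neighbor_features(L, n_alleles):
    """Constant + additive + interactions between the L - 1 adjacent pairs."""
    return 1 + L * n_alleles + (L - 1) * n_alleles ** 2


def n_pairwise_features(L, n_alleles):
    """Constant + additive + interactions between all comb(L, 2) pairs."""
    return 1 + L * n_alleles + comb(L, 2) * n_alleles ** 2


neighbor = n_neighbor_features(L=10, n_alleles=4)
pairwise = n_pairwise_features(L=10, n_alleles=4)
print(neighbor, pairwise)
```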
- __call__(seqs: list[str]) -> numpy.ndarray
Evaluate the model at specific input sequences.
- Parameters:
- seqs: list of str
List of input sequences. All sequences must be of length L.
- Returns:
- f: pd.Series
Landscape values indexed by sequences.
- get_features() -> list[tuple]
Get the features of the model, using user-specified positions if provided.
- Returns:
- features: list[tuple]
Model’s list of binary features.
- get_fixed_params(gauge: str | None = None, wt_seq: str | None = None, pi_lc: list[np.ndarray] | None = None, use_dense_matrix: bool = False) -> pd.Series | pd.DataFrame
Returns gauge-fixed parameters for a linear model.
- Parameters:
- gauge: str or None, optional
Specifies the type of gauge fixing to apply:
'wild-type': Fix parameters relative to a wild-type sequence.
'zero-sum': Use uniform background frequencies (pi_lc=1/n_alleles).
'hierarchical': Hierarchical gauge fixing with provided pi_lc.
- wt_seq: str or None, optional
Wild-type sequence for gauge='wild-type' gauge fixing.
- pi_lc: list of np.ndarray or None, optional
Pi parameter, which specifies the probability of each character at each position used when computing the variance explained by lower-order features.
- use_dense_matrix: bool
Fix the gauge by building the explicit dense projection matrix. Implemented mainly for testing and benchmarking.
- Returns:
- theta: pd.Series of shape (n_features,)
Gauge-fixed model parameters indexed by features.
- get_generating_orbits() -> list[tuple]
Get the generating orbits of the hierarchical model, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Generating set of orbits, such that the model includes all features for all subsets of sites for each generating orbit.
- get_orbits() -> list[tuple]
Get the orbits of the encoder, using user-specified positions if provided.
- Returns:
- orbits: list of tuple
Model’s set of orbits.
- get_params() -> pd.Series | pd.DataFrame
Returns the parameters of the linear model.
- Returns:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_params(theta: pd.Series | pd.DataFrame) -> None
Define the values of the model parameters theta.
- Parameters:
- theta: pd.Series of shape (n_features,)
Model parameters indexed by features.
- set_random_params() -> None
Initialize model with random values of the parameters.