{ "cells": [ { "cell_type": "markdown", "id": "116bcc5c", "metadata": {}, "source": [ "# Example\n", "\n", "In this section, we apply `GaugeFixer` to study the local sequence requirements around different peaks in the Shine-Dalgarno sequence landscape, as described in [Martí-Gómez et al. (2026)](https://www.biorxiv.org/content/10.64898/2025.12.08.693054v2)." ] }, { "cell_type": "code", "execution_count": 1, "id": "23db982a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import logomaker\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "from itertools import combinations, product\n", "from plot import heatmap_pairwise\n", "from scipy.stats import pearsonr\n", "from gaugefixer import AllOrderModel\n", "from gaugefixer.utils import get_subsets_of_set, get_orbits_features" ] }, { "cell_type": "markdown", "id": "ab3054d0", "metadata": {}, "source": [ "### Defining an all-order model for the Shine-Dalgarno sequence\n", "\n", "In this case, we use a complete combinatorial landscape previously inferred by [Martí-Gómez et al. (2026)](https://academic.oup.com/mbe/article/43/2/msag023/8456298) from high-throughput experimental data collected by [Kuo et al. (2020)](https://genome.cshlp.org/content/30/5/711) on nearly every possible 9-nucleotide sequence in the 5' UTR of the *dmsC* gene in *E. coli*. 
\n", "\n", "We start by loading the inferred landscape" ] }, { "cell_type": "code", "execution_count": 2, "id": "1ef8bc62", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "object", "type": "string" }, { "name": "f", "rawType": "float64", "type": "float" } ], "ref": "7d72ac85-df93-44e9-bb26-9ab116789062", "rows": [ [ "AAAAAAAAA", "0.5638568624854088" ], [ "AAAAAAAAC", "0.7015107274055481" ], [ "AAAAAAAAG", "0.6191882789134979" ], [ "AAAAAAAAU", "0.6287718787789345" ], [ "AAAAAAACA", "0.594683438539505" ], [ "AAAAAAACC", "0.5868444815278053" ], [ "AAAAAAACG", "0.6727708727121353" ], [ "AAAAAAACU", "0.6478897631168365" ], [ "AAAAAAAGA", "0.6396320313215256" ], [ "AAAAAAAGC", "0.5802233070135117" ], [ "AAAAAAAGG", "0.6302464604377747" ], [ "AAAAAAAGU", "0.6269779577851295" ], [ "AAAAAAAUA", "0.5150142461061478" ], [ "AAAAAAAUC", "0.6858302354812622" ], [ "AAAAAAAUG", "0.72210294008255" ], [ "AAAAAAAUU", "0.6273685991764069" ], [ "AAAAAACAA", "0.6003869958221912" ], [ "AAAAAACAC", "0.6278582885861397" ], [ "AAAAAACAG", "0.4741915259510278" ], [ "AAAAAACAU", "0.5944587886333466" ], [ "AAAAAACCA", "0.5504545196890831" ], [ "AAAAAACCC", "0.5203661322593689" ], [ "AAAAAACCG", "0.5586934387683868" ], [ "AAAAAACCU", "0.5673647373914719" ], [ "AAAAAACGA", "0.6045751720666885" ], [ "AAAAAACGC", "0.6105850348249078" ], [ "AAAAAACGG", "0.4990684352815151" ], [ "AAAAAACGU", "0.6573664750903845" ], [ "AAAAAACUA", "0.5626135319471359" ], [ "AAAAAACUC", "0.5640361309051514" ], [ "AAAAAACUG", "0.6872833054512739" ], [ "AAAAAACUU", "0.6237415429204702" ], [ "AAAAAAGAA", "0.6685855761170387" ], [ "AAAAAAGAC", "0.6549443751573563" ], [ "AAAAAAGAG", "0.6668623015284538" ], [ "AAAAAAGAU", "0.5943528115749359" ], [ "AAAAAAGCA", "0.5456972662359476" ], [ "AAAAAAGCC", "0.4914389997720718" ], [ "AAAAAAGCG", "0.6260806508362293" ], [ "AAAAAAGCU", "0.5608344525098801" ], [ "AAAAAAGGA", 
"0.6515529751777649" ], [ "AAAAAAGGC", "0.7578512728214264" ], [ "AAAAAAGGG", "0.4236851334571838" ], [ "AAAAAAGGU", "0.7181397974491119" ], [ "AAAAAAGUA", "0.5852416604757309" ], [ "AAAAAAGUC", "0.716905027627945" ], [ "AAAAAAGUG", "0.6548958718776703" ], [ "AAAAAAGUU", "0.6019890904426575" ], [ "AAAAAAUAA", "0.5940462350845337" ], [ "AAAAAAUAC", "0.7444006204605103" ] ], "shape": { "columns": 1, "rows": 262144 } }, "text/html": [ "
| | f |
|---|---|
| AAAAAAAAA | 0.563857 |
| AAAAAAAAC | 0.701511 |
| AAAAAAAAG | 0.619188 |
| AAAAAAAAU | 0.628772 |
| AAAAAAACA | 0.594683 |
| ... | ... |
| UUUUUUUGU | 0.615910 |
| UUUUUUUUA | 0.539058 |
| UUUUUUUUC | 0.548782 |
| UUUUUUUUG | 0.536610 |
| UUUUUUUUU | 0.564960 |
262144 rows × 1 columns
| | -13 | -12 | -11 | -10 | -9 | orbit | subseq |
|---|---|---|---|---|---|---|---|
| ((), ) | 1.861897 | 2.186185 | 2.211720 | 1.906598 | 1.101074 | () | |
| ((0,), A) | 0.000000 | 0.125384 | 0.225124 | 0.180145 | 0.095675 | (0,) | A |
| ((0,), C) | -0.646397 | -0.382700 | -0.500224 | -0.274297 | -0.256858 | (0,) | C |
| ((0,), G) | 0.149598 | 0.049337 | 0.161530 | -0.020173 | 0.140285 | (0,) | G |
| ((0,), U) | 0.000753 | 0.207979 | 0.113571 | 0.114324 | 0.020898 | (0,) | U |
| ... | ... | ... | ... | ... | ... | ... | ... |
| ((0, 1, 2, 3, 4, 5, 6, 7, 8), UUUUUUUGU) | 0.104381 | -0.015834 | 0.085375 | 0.000000 | 0.019661 | (0, 1, 2, 3, 4, 5, 6, 7, 8) | UUUUUUUGU |
| ((0, 1, 2, 3, 4, 5, 6, 7, 8), UUUUUUUUA) | 0.020460 | 0.006739 | 0.019462 | 0.061911 | 0.009043 | (0, 1, 2, 3, 4, 5, 6, 7, 8) | UUUUUUUUA |
| ((0, 1, 2, 3, 4, 5, 6, 7, 8), UUUUUUUUC) | 0.049570 | 0.081591 | 0.018703 | 0.046591 | -0.016188 | (0, 1, 2, 3, 4, 5, 6, 7, 8) | UUUUUUUUC |
| ((0, 1, 2, 3, 4, 5, 6, 7, 8), UUUUUUUUG) | -0.027452 | -0.080011 | 0.024590 | 0.010537 | 0.000000 | (0, 1, 2, 3, 4, 5, 6, 7, 8) | UUUUUUUUG |
| ((0, 1, 2, 3, 4, 5, 6, 7, 8), UUUUUUUUU) | -0.042578 | -0.008319 | -0.062755 | -0.119039 | -0.012198 | (0, 1, 2, 3, 4, 5, 6, 7, 8) | UUUUUUUUU |
1953125 rows × 7 columns
| | -13 | -12 | -11 | -10 | -9 | orbit | subseq | k |
|---|---|---|---|---|---|---|---|---|
| ((), ) | 1.861897 | 2.186185 | 2.211720 | 1.906598 | 1.101074 | () | | 0 |
| ((0,), A) | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | (0,) | A | 1 |
| ((0,), C) | -0.646397 | -0.966215 | -0.780846 | -0.678776 | -0.501391 | (0,) | C | 1 |
| ((0,), G) | 0.149598 | -0.152077 | -0.269654 | -0.416939 | -0.221410 | (0,) | G | 1 |
| ((0,), U) | 0.000753 | -0.699675 | -0.522361 | -0.472846 | -0.368357 | (0,) | U | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ((0, 1, 2, 3, 4), UUUGU) | -0.036878 | -0.095886 | 0.384084 | 0.153701 | -0.012828 | (0, 1, 2, 3, 4) | UUUGU | 5 |
| ((0, 1, 2, 3, 4), UUUUA) | 0.222282 | -0.106347 | 0.074687 | 0.016267 | -0.060240 | (0, 1, 2, 3, 4) | UUUUA | 5 |
| ((0, 1, 2, 3, 4), UUUUC) | 0.040322 | -0.345761 | -0.079473 | -0.121644 | -0.042035 | (0, 1, 2, 3, 4) | UUUUC | 5 |
| ((0, 1, 2, 3, 4), UUUUG) | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | (0, 1, 2, 3, 4) | UUUUG | 5 |
| ((0, 1, 2, 3, 4), UUUUU) | -0.133842 | -0.119921 | 0.181592 | 0.015053 | 0.109169 | (0, 1, 2, 3, 4) | UUUUU | 5 |
3125 rows × 8 columns
| | -13 | -12 | -11 | -10 | -9 | orbit | subseq | k |
|---|---|---|---|---|---|---|---|---|
| ((), ) | 1.861897 | 2.186185 | 2.21172 | 1.906598 | 1.101074 | () | | 0 |