What is fpseq?


The fpseq package provides tools for working with fluorescent protein (FP) sequences and mutations, including a simple Python function for retrieving FP sequences from fpbase.org. It forms the basis of sequence analysis on FPbase, but it can also be used independently as a basic way to retrieve, compare, and mutate FP sequences using the same HGVS notation that is typically used in papers.

Source code on GitHub:

Example usage:

In [1]: from fpseq import from_fpbase
# retrieve sequence from FPbase.org
In [2]: avGFP = from_fpbase('avgfp')
In [3]: avGFP # sequences from FPbase.org
Out[3]:
Protein
------------------------------------------------------
MSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT
GKLPVPWPTL VTTFSYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF
KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV
YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY
LSTQSALSKD PNEKRDHMVL LEFVTAAGIT HGMDELYK
In [4]: EGFP = from_fpbase('egfp')
# calculate HGVS mutation string
In [6]: avGFP.mutations_to(EGFP)
Out[6]: <MutationSet: M1_S2insV/F64L/S65T/H231L>
In [7]: mEGFP = from_fpbase('megfp')
In [8]: EGFP.mutations_to(mEGFP)
Out[8]: <MutationSet: A207K>
# A207K does not match the literature (A206K), because of the
# valine inserted after M1 in EGFP (V1a)...
# use the reference parameter to enforce
# position numbering relative to avGFP
In [9]: EGFP.mutations_to(mEGFP, reference=avGFP)
Out[9]: <MutationSet: A206K>
In [10]: mCherry = from_fpbase('mcherry')
# attempting to apply the 'mCherry2' mutations
# reported in Shen et al. (2017) raises an error
# because the positions do not align with the mCherry sequence
In [11]: newseq = mCherry.mutate('K92N/K138C/K139R/S147T/N196D/T202L')
---------------------------------------------------------------------
SequenceMismatch: Mutation K138C does not align with the parent seq: PSD>G<PVM...
But a match was found 5 positions away: K97N/K143C/K144R/S152T/N201D/T207L
# use correct_offset to apply a shift to the mutation set, if a match is found
In [12]: newseq, offset = mCherry.mutate('K92N/K138C/K139R/S147T/N196D/T202L', correct_offset=True)
UserWarning: An offset of 5 amino acids was detected between the sequence and the mutation set, and automatically corrected
In [13]: newseq == from_fpbase('mcherry2') # sequence equivalence checks
Out[13]: True
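
The offset correction shown in the session can be illustrated with a small standalone sketch. This is not fpseq's implementation: the helper names (parse_substitutions, aligns, find_offset, apply_substitutions), the max_shift search window, and the toy sequence are all hypothetical, and only simple substitutions such as K92N are handled. The idea is to check whether each mutation's parent residue sits at the stated position and, if not, to look for a single shift that makes every mutation in the set line up.

```python
import re

# Hypothetical helpers for illustration only; not part of the fpseq API.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(mutation_string):
    """Split 'K92N/K138C' into (parent_residue, position, new_residue) tuples."""
    parsed = []
    for token in mutation_string.split("/"):
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"unsupported mutation: {token!r}")
        parent, position, new = match.groups()
        parsed.append((parent, int(position), new))
    return parsed


def aligns(sequence, substitutions, offset=0):
    """Return True if every parent residue is found at its 1-based position + offset."""
    for parent, position, _ in substitutions:
        index = position + offset - 1
        if not 0 <= index < len(sequence) or sequence[index] != parent:
            return False
    return True


def find_offset(sequence, substitutions, max_shift=20):
    """Return the smallest-magnitude shift that aligns all substitutions, or None."""
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        if aligns(sequence, substitutions, shift):
            return shift
    return None


def apply_substitutions(sequence, substitutions, offset=0):
    """Return a new sequence with each substitution applied at position + offset."""
    residues = list(sequence)
    for _, position, new in substitutions:
        residues[position + offset - 1] = new
    return "".join(residues)


# Toy example: the mutations are numbered two positions lower than the sequence.
toy = "MAAKGELKT"
subs = parse_substitutions("K2N/K6R")
shift = find_offset(toy, subs)
print(shift, apply_substitutions(toy, subs, shift))  # 2 MAANGELRT
```

Requiring one shift to satisfy the whole mutation set, rather than shifting each mutation independently, keeps a chance match on a single residue from being mistaken for a real numbering offset.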