Database Schema

FPbase is a database designed specifically for fluorescent proteins. The goal is to come up with a database design that can categorize the majority of the many subtle properties that fluorescent proteins can possess. If you have suggestions for ways to extend the database model to incorporate additional properties, feel free to contact us.

The database schema determines the fields and information that can be stored for any given object in the database, and how the objects relate to each other. The schema changes periodically to accommodate new fields or to add features to the site. A graphical representation of the current database schema is shown below. For definitions of terms, see the glossary.

Figure: A simplified graphic of the database schema (Jan 2019)

Items in bold code correspond to database objects:

Proteins are represented with a name, common aliases, amino acid sequence, external accession IDs (currently GenBank, UniProt, Protein Data Bank, and NCBI Identical Protein Groups), aggregation type, photochromicity/switch type, and required cofactors (e.g. biliverdin). Every protein has a primary reference (the one that introduced the protein), additional references (limited to those that further characterize the protein), and the NCBI taxonomy ID of the parental organism from which the protein was derived (e.g. 6100 for Aequorea victoria). Protein lineages are stored recursively as parent-child relationships between two proteins, along with the mutation that generates the child sequence from the parent. Excerpts are snippets of text from references that convey key information about a protein that is otherwise difficult to capture within the current database schema; they appear on both the corresponding protein and reference pages.
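The parent-child lineage storage described above can be sketched in a few lines of Python. This is only an illustration and not FPbase's actual model code: the class and function names, the "S2T/K3R" mutation notation (slash-separated, 1-indexed substitutions), and the five-residue toy sequences are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protein:
    """Hypothetical, stripped-down protein record (not the real FPbase model)."""
    name: str
    sequence: str
    parent: Optional["Protein"] = None  # None marks the root of a lineage
    mutation: str = ""                  # substitutions relative to the parent


def apply_mutation(parent_seq: str, mutation: str) -> str:
    """Apply slash-separated substitutions such as 'S2T/K3R' (1-indexed)."""
    seq = list(parent_seq)
    for sub in filter(None, mutation.split("/")):
        old, pos, new = sub[0], int(sub[1:-1]), sub[-1]
        if seq[pos - 1] != old:
            raise ValueError(
                f"{sub}: expected {old} at position {pos}, found {seq[pos - 1]}"
            )
        seq[pos - 1] = new
    return "".join(seq)


def derive_child(parent: Protein, name: str, mutation: str) -> Protein:
    """Create a child whose sequence is generated from the parent's."""
    return Protein(name, apply_mutation(parent.sequence, mutation), parent, mutation)


def lineage(protein: Protein) -> List[str]:
    """Follow parent links back to the root; return names from root to leaf."""
    names = []
    node: Optional[Protein] = protein
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]


# Toy data: these names and sequences are invented for illustration.
root = Protein("rootFP", "MSKGE")
child = derive_child(root, "childFP", "S2T")
grandchild = derive_child(child, "grandchildFP", "K3R/E5D")
print(grandchild.sequence)
print(lineage(grandchild))
```

Storing only the parent link and the mutation on each child keeps the lineage compact, since any ancestor chain can be rebuilt by walking the links, and the mutation can be checked against the parent sequence when the child is created.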

Each protein can have one or more states that represent the protein in a certain intrinsic condition (e.g. pre/post photoactivation or photoconversion) or under a certain environmental condition (e.g. pH, calcium, etc.). States have typical fluorescence characteristics such as excitation and emission maxima, extinction coefficient, quantum yield, pKa, fluorescence lifetime, and full spectral data (stored as an array of wavelength/value pairs, along with metadata describing, for instance, the pH or solvent under which the spectrum was measured). Transitions represent conversions between two states in response to some stimulus, such as irradiation with a certain wavelength. Bleach measurements capture information about the photostability of a given protein state under one set of experimental conditions (such as microscope modality, light source and filter spectra, illumination power, temperature, fusion protein, cell type, etc.), along with the reference that made the measurement. OSER measurements are quantifications of the monomericity of the protein (Costantini et al. 2012), taken from a given reference. References are stored as DOIs, and the corresponding metadata (such as title, authors, date, journal, etc.) is pulled from Crossref. Every reference and author object has a dedicated page showing the corresponding proteins or references, respectively, attributed to that object.
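As a rough sketch of how states, spectra, and transitions fit together, the Python below stores each spectrum as a list of wavelength/value pairs and reads the maxima off the data. The field names (ext_coeff, qy, trans_wave, conditions) and all numeric values are assumptions invented for the example; they are not taken from the FPbase schema or from any real protein.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A spectrum as an array of (wavelength in nm, normalized value) pairs.
Spectrum = List[Tuple[float, float]]


@dataclass
class State:
    """Hypothetical state record holding a few of the listed characteristics."""
    name: str
    ext_coeff: float  # extinction coefficient, M^-1 cm^-1
    qy: float         # quantum yield
    ex_spectrum: Spectrum = field(default_factory=list)
    em_spectrum: Spectrum = field(default_factory=list)
    conditions: Dict[str, str] = field(default_factory=dict)  # e.g. pH, solvent


@dataclass
class Transition:
    """Conversion between two states, triggered by light of a given wavelength."""
    from_state: State
    to_state: State
    trans_wave: float  # stimulus wavelength in nm


def peak_wavelength(spectrum: Spectrum) -> float:
    """Return the wavelength at which the spectrum takes its largest value."""
    return max(spectrum, key=lambda pair: pair[1])[0]


# Made-up values for a photoconvertible protein with two states.
green = State(
    "green",
    ext_coeff=50_000,
    qy=0.6,
    ex_spectrum=[(480, 0.62), (488, 1.0), (500, 0.41)],
    em_spectrum=[(500, 0.55), (509, 1.0), (520, 0.78)],
    conditions={"pH": "7.4"},
)
red = State("red", ext_coeff=30_000, qy=0.5)
conversion = Transition(green, red, trans_wave=405)

print(peak_wavelength(green.ex_spectrum), peak_wavelength(green.em_spectrum))
```

Keeping the full spectrum alongside the scalar properties means the excitation and emission maxima can be recomputed from the data instead of being entered separately.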

When users register for an account at FPbase (which can be done either directly through FPbase, or using OAuth 2.0 authentication through Google or Twitter), a user object is created. Registered users can create protein collections and microscopes. Microscopes are stored as collections of optical configurations, each of which comprises a set of filters, a light source, and a camera, all of which are associated with a spectral data object.
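The microscope structure described above (microscope, then optical configurations, then components that each carry spectral data) might be modeled along the following lines. All class names, the kind field, and the example parts are hypothetical and chosen only to make the nesting concrete.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpectralData:
    """Spectrum attached to a component, as (wavelength in nm, value) pairs."""
    points: List[Tuple[float, float]]


@dataclass
class Component:
    """A filter, light source, or camera; each one owns a spectrum."""
    name: str
    kind: str  # "filter", "light", or "camera"
    spectrum: SpectralData


@dataclass
class OpticalConfig:
    name: str
    filters: List[Component]
    light: Component
    camera: Component

    def components(self) -> List[Component]:
        return [*self.filters, self.light, self.camera]


@dataclass
class Microscope:
    name: str
    configs: List[OpticalConfig] = field(default_factory=list)


def component_names(scope: Microscope) -> List[str]:
    """List each distinct component used by the microscope, in first-seen order."""
    seen: List[str] = []
    for config in scope.configs:
        for comp in config.components():
            if comp.name not in seen:
                seen.append(comp.name)
    return seen


# Invented parts: two configurations that share one light source and camera.
flat = SpectralData([(400, 1.0), (700, 1.0)])
lamp = Component("white LED", "light", flat)
cam = Component("demo camera", "camera", flat)
green_ex = Component("ex 470/40", "filter", SpectralData([(450, 0.9), (490, 0.9)]))
green_em = Component("em 525/50", "filter", SpectralData([(500, 0.9), (550, 0.9)]))
red_ex = Component("ex 560/40", "filter", SpectralData([(540, 0.9), (580, 0.9)]))
red_em = Component("em 630/75", "filter", SpectralData([(593, 0.9), (667, 0.9)]))

scope = Microscope("demo scope", [
    OpticalConfig("green", [green_ex, green_em], lamp, cam),
    OpticalConfig("red", [red_ex, red_em], lamp, cam),
])
print(component_names(scope))
```

Because components are referenced by the configurations instead of being copied into them, a shared light source or camera appears once even when several configurations use it.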