Proteins are represented with a name, common aliases, amino acid sequence, external accession IDs (currently GenBank, UniProt, Protein Data Bank, and NCBI Identical Protein Groups), aggregation type, photochromicity/switch type, and required cofactors (e.g. biliverdin). Every protein has a primary reference (the one that introduced the protein), additional references (limited to those that further characterize the protein), and the NCBI taxonomy ID of the parental organism from which the protein was derived (e.g. 6100 for Aequorea victoria). Protein lineages are stored recursively as parent-child relationships between two proteins, along with the mutation that generates the child sequence from the parent. Excerpts are snippets of text from references that convey key information about a protein that is otherwise difficult to capture within the current database schema; they appear on both the corresponding protein page and the reference page.
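The recursive parent-child lineage described above can be illustrated with a minimal Python sketch. It is not the database's actual schema or code: the `Protein` class, the `apply_mutations` and `derive_child` helpers, the slash-separated substitution notation (e.g. `S2T/E5D`, with 1-based positions), and the toy sequences are all assumptions made for illustration, and only simple substitutions are handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protein:
    """Hypothetical, simplified protein record (not the real schema)."""
    name: str
    sequence: str
    parent: Optional["Protein"] = None  # link to the parent protein, if any
    mutation: str = ""                  # mutation relative to the parent

    def ancestors(self) -> List[str]:
        """Walk the parent links up to the root, nearest ancestor first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


def apply_mutations(parent_seq: str, mutation: str) -> str:
    """Apply substitutions such as 'S2T/E5D' (assumed notation) to a sequence."""
    seq = list(parent_seq)
    for token in filter(None, mutation.split("/")):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unsupported mutation token: {token!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(seq) or seq[pos - 1] != old:
            raise ValueError(f"{token!r} does not match the parent sequence")
        seq[pos - 1] = new
    return "".join(seq)


def derive_child(parent: Protein, name: str, mutation: str) -> Protein:
    """Create a child whose sequence is generated from parent + mutation."""
    child_seq = apply_mutations(parent.sequence, mutation)
    return Protein(name=name, sequence=child_seq, parent=parent, mutation=mutation)


# Toy lineage with made-up names and sequences.
root = Protein(name="ProtA", sequence="MSKGEEL")
child = derive_child(root, "ProtB", "S2T/E5D")
grandchild = derive_child(child, "ProtC", "L7F")
print(grandchild.sequence, grandchild.ancestors())
```

Storing only the parent link and the mutation on each record keeps the lineage compact: any ancestor chain can be recovered by following parent links, and each child sequence can be checked against its parent by replaying the stored mutation.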