Administrative Instructions under the Patent Cooperation Treaty
Annex C, Appendix 2
Nucleotide and Amino Acid Symbols and Feature Table
Table 6: List of Feature Keys Related to Protein Sequences
| key | description |
|---|---|
| CONFLICT | different papers report differing sequences |
| VARIANT | authors report that sequence variants exist |
| VARSPLIC | description of sequence variants produced by alternative splicing |
| MUTAGEN | site which has been experimentally altered |
| MOD_RES | post-translational modification of a residue |
| ACETYLATION | N-terminal or other |
| AMIDATION | generally at the C-terminal of a mature active peptide |
| BLOCKED | undetermined N- or C-terminal blocking group |
| FORMYLATION | of the N-terminal methionine |
| GAMMA-CARBOXYGLUTAMIC ACID HYDROXYLATION | of asparagine, aspartic acid, proline or lysine |
| METHYLATION | generally of lysine or arginine |
| PHOSPHORYLATION | of serine, threonine, tyrosine, aspartic acid or histidine |
| PYRROLIDONE CARBOXYLIC ACID | N-terminal glutamate which has formed an internal cyclic lactam |
| SULFATATION | generally of tyrosine |
| LIPID | covalent binding of a lipidic moiety |
| MYRISTATE | myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue |
| PALMITATE | palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue |
| FARNESYL | farnesyl group attached through a thioether bond to a cysteine residue |
| GERANYL-GERANYL | geranyl-geranyl group attached through a thioether bond to a cysteine residue |
| GPI-ANCHOR | glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein |
| N-ACYL DIGLYCERIDE | N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages |
| DISULFID | disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link |
| THIOLEST | thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond |
| THIOETH | thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond |
| CARBOHYD | glycosylation site; the nature of the carbohydrate (if known) is given in the description field |
| METAL | binding site for a metal ion; the description field indicates the nature of the metal |
| BINDING | binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field |
| SIGNAL | extent of a signal sequence (prepeptide) |
| TRANSIT | extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody) |
| PROPEP | extent of a propeptide |
| CHAIN | extent of a polypeptide chain in the mature protein |
| PEPTIDE | extent of a released active peptide |
| DOMAIN | extent of a domain of interest on the sequence; the nature of that domain is given in the description field |
| CA_BIND | extent of a calcium-binding region |
| DNA_BIND | extent of a DNA-binding region |
| NP_BIND | extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field |
| TRANSMEM | extent of a transmembrane region |
| ZN_FING | extent of a zinc finger region |
| SIMILAR | extent of a similarity with another protein sequence; precise information relative to that sequence is given in the description field |
| REPEAT | extent of an internal sequence repetition |
| HELIX | secondary structure: Helices, for example, Alpha‑helix, 3(10) helix, or Pi‑helix |
| STRAND | secondary structure: Beta‑strand, for example, Hydrogen bonded beta‑strand, or Residue in an isolated beta‑bridge |
| TURN | secondary structure: Turns, for example, H‑bonded turn (3‑turn, 4‑turn or 5‑turn) |
| ACT_SITE | amino acid(s) involved in the activity of an enzyme |
| SITE | any other interesting site on the sequence |
| INIT_MET | the sequence is known to start with an initiator methionine |
| NON_TER | the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key |
| NON_CONS | non-consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them |
| UNSURE | uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment |
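
The endpoint conventions stated for DISULFID and NON_TER in the table above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the `Feature` record, its field names, and both helper functions are hypothetical and are not defined by the Administrative Instructions; positions are assumed to be 1-based residue numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one feature-table entry (all names are assumptions)."""
    key: str               # feature key, e.g. "DISULFID" or "NON_TER"
    start: int             # 'FROM' endpoint, 1-based residue position
    end: int               # 'TO' endpoint, 1-based residue position
    description: str = ""  # free-text description field, where the key allows one


def classify_disulfide(feature: Feature) -> str:
    """Apply the DISULFID rule: identical endpoints mean an interchain bond."""
    if feature.key != "DISULFID":
        raise ValueError(f"expected a DISULFID feature, got {feature.key!r}")
    return "interchain" if feature.start == feature.end else "intrachain"


def non_ter_meaning(feature: Feature, sequence_length: int) -> Optional[str]:
    """Interpret NON_TER at position 1 or at the last position.

    Returns None for any other position, which the table does not describe.
    """
    if feature.key != "NON_TER":
        raise ValueError(f"expected a NON_TER feature, got {feature.key!r}")
    if feature.start == 1:
        return "first position is not the N-terminus of the complete molecule"
    if feature.start == sequence_length:
        return "last position is not the C-terminus of the complete molecule"
    return None


if __name__ == "__main__":
    # Two different residues: an intra-chain bond between them.
    print(classify_disulfide(Feature("DISULFID", 12, 48)))
    # Identical endpoints: the bond links this residue to another chain.
    print(classify_disulfide(Feature("DISULFID", 30, 30, "INTERCHAIN")))
    # NON_TER at the last residue of a 120-residue sequence.
    print(non_ter_meaning(Feature("NON_TER", 120, 120), sequence_length=120))
```

Returning `None` for a NON_TER feature at an interior position is a design choice of this sketch; the table only gives a meaning for position 1 and for the last position.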