World Intellectual Property Organization

Administrative Instructions under the Patent Cooperation Treaty 

Annex C, Appendix 2

Nucleotide and Amino Acid Symbols and Feature Table

Table 6:  List of Feature Keys Related to Protein Sequences

key description
CONFLICT different papers report differing sequences
VARIANT authors report that sequence variants exist
VARSPLIC description of sequence variants produced by alternative splicing
MUTAGEN site which has been experimentally altered
MOD_RES post-translational modification of a residue
      ACETYLATION N-terminal or other
      AMIDATION generally at the C-terminal of a mature active peptide
      BLOCKED undetermined N- or C-terminal blocking group
      FORMYLATION of the N-terminal methionine
      GAMMA-CARBOXYGLUTAMIC ACID
      HYDROXYLATION of asparagine, aspartic acid, proline or lysine
      METHYLATION generally of lysine or arginine
      PHOSPHORYLATION of serine, threonine, tyrosine, aspartic acid or histidine
      PYRROLIDONE CARBOXYLIC ACID N-terminal glutamate which has formed an internal cyclic lactam
      SULFATATION generally of tyrosine
LIPID covalent binding of a lipidic moiety
      MYRISTATE myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue
      PALMITATE palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue
      FARNESYL farnesyl group attached through a thioether bond to a cysteine residue
      GERANYL-GERANYL geranyl-geranyl group attached through a thioether bond to a cysteine residue
      GPI-ANCHOR glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein
      N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages
DISULFID disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link
THIOLEST thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond
THIOETH thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond
CARBOHYD glycosylation site; the nature of the carbohydrate (if known) is given in the description field
METAL binding site for a metal ion; the description field indicates the nature of the metal
BINDING binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field
SIGNAL extent of a signal sequence (prepeptide)
TRANSIT extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody)
PROPEP extent of a propeptide
CHAIN extent of a polypeptide chain in the mature protein
PEPTIDE extent of a released active peptide
DOMAIN extent of a domain of interest on the sequence; the nature of that domain is given in the description field
CA_BIND extent of a calcium-binding region
DNA_BIND extent of a DNA-binding region
NP_BIND extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field
TRANSMEM extent of a transmembrane region
ZN_FING extent of a zinc finger region
SIMILAR extent of a similarity with another protein sequence; precise information relative to that sequence is given in the description field
REPEAT extent of an internal sequence repetition
HELIX secondary structure: Helices, for example, Alpha‑helix, 3(10) helix, or Pi‑helix
STRAND secondary structure: Beta‑strand, for example, Hydrogen bonded beta‑strand, or Residue in an isolated beta‑bridge
TURN secondary structure: Turns, for example, H‑bonded turn (3‑turn, 4‑turn or 5‑turn)
ACT_SITE amino acid(s) involved in the activity of an enzyme
SITE any other interesting site on the sequence
INIT_MET the sequence is known to start with an initiator methionine
NON_TER the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key
NON_CONS non-consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them
UNSURE uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment
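
As an illustration only, the endpoint rules stated for DISULFID and NON_TER in the table above could be checked in a program roughly as follows. This is a minimal sketch and not part of the Administrative Instructions: the names FEATURE_KEYS, Feature and interpret are hypothetical, the record layout is an assumption, and positions are taken to be counted from 1, in line with the table's reference to "position 1" as the first position.

```python
from dataclasses import dataclass

# Top-level feature keys copied from Table 6 (qualifiers of MOD_RES and
# LIPID, such as ACETYLATION or MYRISTATE, are deliberately left out).
FEATURE_KEYS = frozenset({
    "CONFLICT", "VARIANT", "VARSPLIC", "MUTAGEN", "MOD_RES", "LIPID",
    "DISULFID", "THIOLEST", "THIOETH", "CARBOHYD", "METAL", "BINDING",
    "SIGNAL", "TRANSIT", "PROPEP", "CHAIN", "PEPTIDE", "DOMAIN",
    "CA_BIND", "DNA_BIND", "NP_BIND", "TRANSMEM", "ZN_FING", "SIMILAR",
    "REPEAT", "HELIX", "STRAND", "TURN", "ACT_SITE", "SITE", "INIT_MET",
    "NON_TER", "NON_CONS", "UNSURE",
})


@dataclass(frozen=True)
class Feature:
    """Hypothetical in-memory record for one feature table entry."""
    key: str
    start: int             # 'FROM' endpoint, counted from 1 (assumption)
    end: int               # 'TO' endpoint, counted from 1 (assumption)
    description: str = ""  # free-text description field


def interpret(feature: Feature, sequence_length: int) -> str:
    """Return a short reading of a feature, following the Table 6 wording."""
    if feature.key not in FEATURE_KEYS:
        raise ValueError(f"unknown feature key: {feature.key!r}")

    if feature.key == "DISULFID":
        # Identical endpoints mark an interchain bond; otherwise the two
        # endpoints are the residues joined by an intra-chain bond.
        if feature.start == feature.end:
            return "interchain disulfide bond"
        return "intra-chain disulfide bond"

    if feature.key == "NON_TER":
        # NON_TER is only meaningful at an extremity of the sequence.
        if feature.start == 1:
            return "first position is not the N-terminus"
        if feature.start == sequence_length:
            return "last position is not the C-terminus"
        raise ValueError("NON_TER must refer to the first or last position")

    # Other keys need no endpoint-specific interpretation in this sketch.
    return feature.key
```

For example, interpret(Feature("DISULFID", 12, 12), 100) yields "interchain disulfide bond", while the same key with endpoints 12 and 48 is read as an intra-chain bond.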
