Administrative Instructions under the Patent Cooperation Treaty
Annex C, Appendix 2
Nucleotide and Amino Acid Symbols and Feature Table
Table 5: List of Feature Keys Related to Nucleotide Sequences

| key | description |
|---|---|
| allele | a related individual or strain contains stable, alternative forms of the same gene which differ from the presented sequence at this location (and perhaps others) |
| attenuator | (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; |
| C_region | constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain |
| CAAT_signal | CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT |
| CDS | coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation |
| conflict | independent determinations of the “same” sequence differ at this site or region |
| D-loop | displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein |
| D-segment | diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain |
| enhancer | a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter |
| exon | region of genome that codes for portion of spliced mRNA; may contain 5’UTR, all CDSs, and 3’UTR |
| GC_signal | GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG |
| gene | region of biological interest identified as a gene and for which a name has been assigned |
| iDNA | intervening DNA; DNA which is eliminated through any of several kinds of recombination |
| intron | a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it |
| J_segment | joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains |
| LTR | long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses |
| mat_peptide | mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS) |
| misc_binding | site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind) |
| misc_difference | feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base) |
| misc_feature | region of biological interest which cannot be described by any other feature key; a new or rare feature |
| misc_recomb | site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral) |
| misc_RNA | any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5’clip, 3’clip, 5’UTR, 3’UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA) |
| misc_signal | any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin) |
| misc_structure | any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop) |
| modified_base | the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value) |
| mRNA | messenger RNA; includes 5’ untranslated region (5’UTR), coding sequences (CDS, exon) and 3’ untranslated region (3’UTR) |
| mutation | a related strain has an abrupt, inheritable change in the sequence at this location |
| N_region | extra nucleotides inserted between rearranged immunoglobulin segments |
| old_sequence | the presented sequence revises a previous version of the sequence at this location |
| polyA_signal | recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA |
| polyA_site | site on an RNA transcript to which adenine residues will be added by post-transcriptional polyadenylation |
| precursor_RNA | any RNA species that is not yet the mature RNA product; may include 5’ clipped region (5’clip), 5’ untranslated region (5’UTR), coding sequences (CDS, exon), intervening sequences (intron), 3’ untranslated region (3’UTR), and 3’ clipped region (3’clip) |
| prim_transcript | primary (initial, unprocessed) transcript; includes 5’ clipped region (5’clip), 5’ untranslated region (5’UTR), coding sequences (CDS, exon), intervening sequences (intron), 3’ untranslated region (3’UTR), and 3’ clipped region (3’clip) |
| primer_bind | non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements |
| promoter | region on a DNA molecule involved in RNA polymerase binding to initiate transcription |
| protein_bind | non-covalent protein binding site on nucleic acid |
| RBS | ribosome binding site |
| repeat_region | region of genome containing repeating units |
| repeat_unit | single repeat element |
| rep_origin | origin of replication; starting site for duplication of nucleic acid to give two identical copies |
| rRNA | mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins |
| S_region | switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell |
| satellite | many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA |
| scRNA | small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote |
| sig_peptide | signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence |
| snRNA | small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions |
| source | identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible |
| stem_loop | hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA |
| STS | Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs |
| TATA_signal | TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T) |
| terminator | sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein |
| transit_peptide | transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle |
| tRNA | mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence |
| unsure | author is unsure of exact sequence in this region |
| V_region | variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments |
| V_segment | variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide |
| variation | a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others) |
| 3’clip | 3’-most region of a precursor transcript that is clipped off during processing |
| 3’UTR | region at the 3’ end of a mature transcript (following the stop codon) that is not translated into a protein |
| 5’clip | 5’-most region of a precursor transcript that is clipped off during processing |
| 5’UTR | region at the 5’ end of a mature transcript (preceding the initiation codon) that is not translated into a protein |
| -10_signal | Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT |
| -35_signal | a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ] |
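
Several of the signal keys in Table 5 give a consensus sequence. As an illustration only, the sketch below shows one way those consensus strings might be turned into regular expressions and used to scan a nucleotide sequence. The names `CONSENSUS_PATTERNS` and `find_consensus_matches` are hypothetical and not part of the Administrative Instructions. The sketch assumes that "(C or T)" can be read as a character class and that the lowercase letters in the bacterial consensuses can be treated as ordinary bases; the longer -35 alternative (TGTTGACA) is omitted because it already contains TTGACA.

```python
import re

# Consensus strings from Table 5, rewritten as regular expressions.
# "(C or T)" becomes the character class [CT]. Upper-casing the lowercase
# letters of the -10 and -35 consensuses is an assumption of this sketch.
CONSENSUS_PATTERNS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
    "-10_signal": "TATAAT",
    "-35_signal": "TTGACA",
}


def find_consensus_matches(sequence):
    """Return (feature_key, start, end) tuples, 1-based and inclusive."""
    seq = sequence.upper()
    hits = []
    for key, pattern in CONSENSUS_PATTERNS.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((key, match.start(1) + 1, match.end(1)))
    # Order hits by start position, then by key name.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    for hit in find_consensus_matches("GGCCAATCTACGTGGGCGG"):
        print(hit)
```

A match only shows that the consensus string occurs in the sequence. Whether the location qualifies for the corresponding feature key still depends on the biological context given in the description, such as its distance from the start point of transcription.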


