Genetic code: properties and functions

In the body's metabolism, the leading role belongs to proteins and nucleic acids.
Proteins form the basis of all vital cell structures, are unusually reactive, and have catalytic functions.
Nucleic acids are found in the most important organelle of the cell, the nucleus, as well as in the cytoplasm, ribosomes, mitochondria, etc. They play a primary role in heredity, variability of the organism, and protein synthesis.

The plan for protein synthesis is stored in the cell nucleus, while the synthesis itself takes place outside the nucleus, so a delivery service is needed to carry the encoded plan from the nucleus to the site of synthesis. This delivery service is performed by RNA molecules.

The process starts in the cell nucleus: part of the DNA "ladder" unwinds and opens. The RNA letters (nucleotides) then form bonds with the exposed DNA letters of one of the DNA strands, and an enzyme joins the RNA letters into a thread. In this way the letters of DNA are "rewritten" into the letters of RNA. The newly formed RNA chain separates, and the DNA "ladder" twists again. The process of reading information from DNA and synthesizing RNA on it as a template is called transcription, and the synthesized RNA is called informational RNA (i-RNA), also known as messenger RNA (mRNA).

After further modifications, the encoded i-RNA is ready. It leaves the nucleus and travels to the site of protein synthesis, where its letters are deciphered. Each set of three i-RNA letters forms a "word" that stands for one particular amino acid.

Another type of RNA finds this amino acid, captures it with the help of an enzyme, and delivers it to the site of protein synthesis. This RNA is called transfer RNA, or tRNA. As the mRNA message is read and translated, the chain of amino acids grows. The chain then twists and folds into a unique shape, creating one kind of protein. Even the folding process is remarkable: a computer that tried to calculate all the options would need 10^27 (!) years to fold a medium-sized protein of 100 amino acids. In the body, forming a chain of 20 amino acids takes no more than one second, and this process goes on continuously in all cells.

Genes, genetic code and its properties.

About 7 billion people live on Earth. Apart from 25-30 million pairs of identical twins, all people are genetically different: each person is unique and has unique hereditary characteristics, character traits, abilities and temperament.

These differences are explained by differences in genotypes - the sets of genes of an organism; each genotype is unique. The genetic traits of a particular organism are embodied in proteins; consequently, the structure of one person's proteins differs, although only slightly, from that of another person's.

This does not mean that people never have exactly the same proteins. Proteins that perform the same functions may be identical or may differ from each other by only one or two amino acids. But there are no two people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a section of a DNA molecule, the gene - a unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism makes up its genotype. Thus,

a gene is a unit of hereditary information of an organism, which corresponds to a separate section of DNA.

Hereditary information is encoded using the genetic code, which is universal for all organisms; what differs is only the alternation of nucleotides that form the genes and code for the proteins of specific organisms.

The genetic code consists of triplets of DNA nucleotides combined in different sequences (AAT, GCA, ACG, TGC, etc.), each of which encodes a specific amino acid (which will be built into the polypeptide chain).

Strictly speaking, the code is taken to be the sequence of nucleotides in the i-RNA molecule, because i-RNA copies the information from DNA (the process of transcription) and translates it into the sequence of amino acids in the synthesized protein molecules (the process of translation).
The composition of mRNA includes the nucleotides A, C, G and U, whose triplets are called codons: the DNA triplet CGT becomes the mRNA triplet GCA, and the DNA triplet AAG becomes the triplet UUC. It is the i-RNA codons that represent the genetic code in written form.
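The complementarity rule behind these triplet examples can be illustrated with a short Python sketch. The function name `transcribe_triplet` and the dictionary `DNA_TO_MRNA` are illustrative names, not part of the original text, and the sketch pairs bases position by position without modelling strand direction.

```python
# Base-pairing rule used during transcription: DNA template base -> mRNA base.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_triplet(dna_triplet):
    """Return the mRNA codon complementary to a DNA template triplet.

    Bases are paired position by position (a simplification that
    ignores the 5'/3' orientation of the two strands).
    """
    return "".join(DNA_TO_MRNA[base] for base in dna_triplet.upper())


print(transcribe_triplet("CGT"))  # GCA
print(transcribe_triplet("AAG"))  # UUC
```

A lookup table is used here because the pairing rule is a fixed one-to-one mapping; an unknown letter raises a `KeyError` rather than being silently passed through.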

Thus, the genetic code is a unified system for recording hereditary information in nucleic acid molecules in the form of a sequence of nucleotides. The genetic code is based on an alphabet of only four nucleotide letters that differ in their nitrogenous bases: A, T, G, C.

The main properties of the genetic code:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides that codes for one amino acid. Since proteins contain 20 amino acids, each of them obviously cannot be encoded by a single nucleotide (there are only four types of nucleotides in DNA, so 16 amino acids would remain uncoded). Two nucleotides are not enough either, since in that case only 16 amino acids could be encoded. This means that at least three nucleotides are needed to encode one amino acid. In this case, the number of possible nucleotide triplets is 4^3 = 64.

2. Redundancy (degeneracy) of the code is a consequence of its triplet nature and means that one amino acid can be encoded by several triplets (since there are 20 amino acids and 64 triplets), with the exception of methionine and tryptophan, each of which is encoded by only one triplet. In addition, some triplets perform specific functions: in the mRNA molecule, the triplets UAA, UAG and UGA are terminating codons, i.e. stop signals that end the synthesis of the polypeptide chain. The triplet corresponding to methionine (AUG), when it stands at the beginning of the coding sequence, performs the function of initiating reading (it is the start codon).

3. Unambiguity of the code - along with redundancy, the code has the property of unambiguity: each codon corresponds to only one specific amino acid.

4. Collinearity of the code, i.e. the sequence of nucleotides in a gene exactly corresponds to the sequence of amino acids in the protein.

5. The genetic code is non-overlapping and compact, i.e. it does not contain "punctuation marks". This means that reading does not allow codons (triplets) to overlap, and, starting at a certain codon, reading proceeds continuously, triplet by triplet, up to the stop signals (termination codons).

6. The genetic code is universal, i.e. the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and the systematic position of these organisms.
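The counting argument in property 1 can be checked with a few lines of Python. This is only a sketch; the names `BASES`, `AMINO_ACIDS` and `codeword_count` are chosen here for illustration.

```python
from itertools import product

BASES = "ACGU"      # the four mRNA nucleotides
AMINO_ACIDS = 20    # number of amino acids that must be encoded


def codeword_count(length):
    """Number of distinct words of the given length over a 4-letter alphabet."""
    return len(BASES) ** length


for length in (1, 2, 3):
    n = codeword_count(length)
    verdict = "enough" if n >= AMINO_ACIDS else "not enough"
    print(f"length {length}: {n} combinations, {verdict} for {AMINO_ACIDS} amino acids")

# Enumerate every triplet explicitly to confirm the count of 64.
all_triplets = ["".join(p) for p in product(BASES, repeat=3)]
print(len(all_triplets))
```

Words of length 1 and 2 give 4 and 16 combinations, fewer than 20, so three is the smallest length that can cover all amino acids.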

There are genetic code tables for deciphering i-RNA codons and building the chains of protein molecules.

Matrix synthesis reactions.

In living systems, there are reactions unknown in inanimate nature - matrix synthesis reactions.

In technology, the term "matrix" denotes the mold used for casting coins, medals and typographic type: the hardened metal exactly reproduces all the details of the mold. Matrix synthesis resembles casting in a mold: new molecules are synthesized in strict accordance with the plan laid down in the structure of already existing molecules.

The matrix principle underlies the most important synthetic reactions of the cell, such as the synthesis of nucleic acids and proteins. These reactions ensure an exact, strictly specific sequence of monomeric units in the synthesized polymers.

Here monomers are drawn in a directed way to a specific place in the cell - to the molecules that serve as the matrix on which the reaction takes place. If such reactions occurred through random collisions of molecules, they would proceed infinitely slowly. The synthesis of complex molecules on the matrix principle is fast and accurate. In matrix reactions, the role of the matrix is played by nucleic acid macromolecules, DNA or RNA.

The monomeric molecules from which the polymer is synthesized - nucleotides or amino acids - are arranged and fixed on the matrix in a strictly defined, predetermined order, in accordance with the principle of complementarity.

Then the monomer units are "crosslinked" into a polymer chain, and the finished polymer is released from the matrix.

After that, the matrix is ready for the assembly of a new polymer molecule. Clearly, just as only one kind of coin or one letter can be cast in a given mold, only one kind of polymer can be "assembled" on a given matrix molecule.

The matrix type of reaction is a specific feature of the chemistry of living systems. It is the basis of the fundamental property of all living things - the ability to reproduce their own kind.

Matrix synthesis reactions

1. DNA replication (from Latin replicatio - renewal) - the process of synthesizing a daughter molecule of deoxyribonucleic acid on the matrix of the parent DNA molecule. During the subsequent division of the mother cell, each daughter cell receives one copy of the DNA molecule, identical to the DNA of the original mother cell. This process ensures the accurate transmission of genetic information from generation to generation. DNA replication is carried out by a complex of 15-20 different proteins called the replisome. The material for synthesis is the free nucleotides present in the cytoplasm of cells. The biological meaning of replication lies in the exact transfer of hereditary information from the parent molecule to the daughter molecules, which normally occurs during the division of somatic cells.

The DNA molecule consists of two complementary strands. These chains are held together by weak hydrogen bonds that can be broken by enzymes. The DNA molecule is capable of self-doubling (replication), and a new half of it is synthesized on each old half of the molecule.
In addition, an mRNA molecule can be synthesized on a DNA molecule, which then transfers the information received from DNA to the site of protein synthesis.

Information transfer and protein synthesis follow a matrix principle, comparable to the work of a printing press in a printing house. Information from DNA is copied over and over again. If errors occur during copying, they will be repeated in all subsequent copies.

True, some errors made when a DNA molecule is copied can be corrected; the process of eliminating errors is called repair. The first reaction in the process of information transfer is the replication of the DNA molecule and the synthesis of new DNA strands.

2. Transcription (from Latin transcriptio - rewriting) - the process of RNA synthesis using DNA as a template, occurring in all living cells. In other words, it is the transfer of genetic information from DNA to RNA.

Transcription is catalyzed by the enzyme DNA-dependent RNA polymerase. RNA polymerase moves along the DNA molecule in the 3' → 5' direction. Transcription consists of the stages of initiation, elongation and termination. The unit of transcription is the operon, a fragment of the DNA molecule consisting of a promoter, a transcribed part and a terminator. i-RNA consists of one strand and is synthesized on DNA in accordance with the rule of complementarity, with the participation of an enzyme that activates the beginning and the end of the synthesis of the i-RNA molecule.

The finished mRNA molecule passes into the cytoplasm to the ribosomes, where the synthesis of polypeptide chains takes place.

3. Translation (from Latin translatio - transfer, movement) - the process of protein synthesis from amino acids on a template of informational (messenger) RNA (i-RNA, mRNA), carried out by the ribosome. In other words, it is the process of translating the information contained in the nucleotide sequence of i-RNA into the sequence of amino acids in the polypeptide.

4. Reverse transcription is the process of forming double-stranded DNA based on information from single-stranded RNA. It is called reverse transcription because the transfer of genetic information occurs in the "reverse" direction relative to transcription. The idea of reverse transcription was initially very unpopular, as it went against the central dogma of molecular biology, which assumed that DNA is transcribed into RNA and then translated into proteins.

However, in 1970 Temin and Baltimore independently discovered an enzyme called reverse transcriptase (revertase), and the possibility of reverse transcription was finally confirmed. In 1975, Temin and Baltimore were awarded the Nobel Prize in Physiology or Medicine. Some viruses (such as the human immunodeficiency virus, which causes HIV infection) are able to transcribe RNA into DNA. HIV has an RNA genome that is copied into DNA; as a result, the DNA of the virus can be combined with the genome of the host cell. The main enzyme responsible for synthesizing DNA from RNA is reverse transcriptase. One of its functions is to create complementary DNA (cDNA) from the viral genome. The associated enzyme ribonuclease cleaves the RNA, and reverse transcriptase completes the cDNA into a DNA double helix. The cDNA is integrated into the host cell genome by integrase. The result is the synthesis by the host cell of viral proteins that form new viruses. In the case of HIV, apoptosis (cell death) of T-lymphocytes is also programmed. In other cases, the cell may remain a distributor of viruses.

The sequence of matrix reactions in protein biosynthesis can be represented as a diagram: DNA → (transcription) → i-RNA → (translation) → protein.

Thus, protein biosynthesis is one of the types of plastic exchange, in which the hereditary information encoded in the genes of DNA is realized as a certain sequence of amino acids in protein molecules.

Protein molecules are essentially polypeptide chains made up of individual amino acids. But amino acids are not active enough to join each other on their own. Therefore, before they combine and form a protein molecule, amino acids must be activated. This activation occurs under the action of special enzymes.

As a result of activation, the amino acid becomes more labile and, under the action of the same enzyme, binds to t-RNA. Each amino acid corresponds to a strictly specific t-RNA, which finds "its" amino acid and carries it to the ribosome.

The ribosome therefore receives various activated amino acids linked to their t-RNAs. The ribosome works like a conveyor that assembles a protein chain from the various amino acids entering it.

Simultaneously with the t-RNA, on which its own amino acid "sits", the ribosome receives a "signal" from the DNA contained in the nucleus. In accordance with this signal, one protein or another is synthesized in the ribosome.

The directing influence of DNA on protein synthesis is exerted not directly but through a special intermediary - matrix or messenger RNA (mRNA or i-RNA), which is synthesized in the nucleus under the influence of DNA, so that its composition reflects the composition of DNA. The RNA molecule is, as it were, a cast of the DNA mold. The synthesized mRNA enters the ribosome and, so to speak, hands this structure the plan: in what order the activated amino acids entering the ribosome should be joined to each other in order to synthesize a certain protein. In other words, the genetic information encoded in DNA is transferred to mRNA and then to protein.

The mRNA molecule enters the ribosome and threads through it. The segment of mRNA that is in the ribosome at a given moment, defined by a codon (triplet), interacts in a completely specific way with a structurally matching triplet (anticodon) in the transfer RNA that brought the amino acid into the ribosome.

The transfer RNA with its amino acid approaches a certain codon of the mRNA and binds to it; another tRNA with a different amino acid joins the next, neighboring section of the i-RNA, and so on until the entire i-RNA chain has been read and all the amino acids have been strung together in the appropriate order, forming a protein molecule. The t-RNA that has delivered its amino acid to a specific site of the polypeptide chain is freed from the amino acid and leaves the ribosome.

Back in the cytoplasm, the required amino acid can join it again, and it will again carry that amino acid to the ribosome. Not one but several ribosomes (polyribosomes) take part in protein synthesis simultaneously.

The main stages of the transfer of genetic information:

1. Synthesis of i-RNA on DNA as a template (transcription).
2. Synthesis of the polypeptide chain in ribosomes according to the program contained in i-RNA (translation).

The stages are universal for all living beings, but the temporal and spatial relationships of these processes differ in prokaryotes and eukaryotes.

In prokaryotes, transcription and translation can occur simultaneously because the DNA is located in the cytoplasm. In eukaryotes, transcription and translation are strictly separated in space and time: the various RNAs are synthesized in the nucleus, after which the RNA molecules must leave the nucleus by passing through the nuclear membrane. The RNA is then transported through the cytoplasm to the site of protein synthesis.

The genetic code is a way of encoding the amino acid sequence of proteins using the sequence of nucleotides in the DNA molecule; it is characteristic of all living organisms.

The implementation of genetic information in living cells (i.e., the synthesis of a protein encoded in DNA) is carried out using two matrix processes: transcription (i.e., mRNA synthesis on a DNA template) and translation (synthesis of a polypeptide chain on an mRNA template).

DNA uses four nucleotides - adenine (A), guanine (G), cytosine (C), thymine (T). These "letters" make up the alphabet of the genetic code. RNA uses the same nucleotides, except for thymine, which is replaced by uracil (U). In DNA and RNA molecules, nucleotides line up in chains and, thus, sequences of “letters” are obtained.

In the nucleotide sequence of DNA there are code "words" for each amino acid of the future protein molecule - the genetic code. It consists in a certain sequence of nucleotides in the DNA molecule.

Three consecutive nucleotides encode the "name" of one amino acid, that is, each of the 20 amino acids is encrypted by a significant code unit - a combination of three nucleotides called a triplet or codon.

At present, the DNA code has been completely deciphered, and we can talk about certain properties that are characteristic of this unique biological system, which provides the translation of information from the "language" of DNA to the "language" of protein.

The carrier of genetic information is DNA, but since mRNA, a copy of one of the DNA strands, is directly involved in protein synthesis, the genetic code is most often written in the "RNA language".

Amino acid      Coding RNA triplets
Alanine         GCU GCC GCA GCG
Arginine        CGU CGC CGA CGG AGA AGG
Asparagine      AAU AAC
Aspartic acid   GAU GAC
Valine          GUU GUC GUA GUG
Histidine       CAU CAC
Glycine         GGU GGC GGA GGG
Glutamine       CAA CAG
Glutamic acid   GAA GAG
Isoleucine      AUU AUC AUA
Leucine         CUU CUC CUA CUG UUA UUG
Lysine          AAA AAG
Methionine      AUG
Proline         CCU CCC CCA CCG
Serine          UCU UCC UCA UCG AGU AGC
Tyrosine        UAU UAC
Threonine       ACU ACC ACA ACG
Tryptophan      UGG
Phenylalanine   UUU UUC
Cysteine        UGU UGC
STOP            UGA UAG UAA
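As an illustration of how such a table is used, the following Python sketch builds the standard codon table and reads an mRNA string triplet by triplet until a stop codon. The names `CODON_TABLE` and `translate` are hypothetical, amino acids are written with their one-letter symbols, and the sketch assumes reading starts at the first base of the input; locating the start codon is not modelled.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read mRNA in non-overlapping triplets from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":          # UAA, UAG or UGA: end of the polypeptide chain
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCUUGGUAA"))  # MAW (methionine, alanine, tryptophan)
```

The table is generated from a compact 64-letter string rather than typed out codon by codon, which keeps the example short and makes the ordering easy to check against the printed table.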

Properties of the genetic code

Three consecutive nucleotides (nitrogenous bases) encode the "name" of one amino acid; that is, each of the 20 amino acids is encrypted by a meaningful code unit - a combination of three nucleotides called a triplet or codon.

Triplet (codon) - a sequence of three nucleotides (nitrogenous bases) in a DNA or RNA molecule that determines the inclusion of a certain amino acid in the protein molecule during its synthesis.

  • Unambiguity (discreteness)

One triplet cannot encode two different amino acids; it encodes only one. A given codon corresponds to only one amino acid.

  • Degeneracy (redundancy)

Each amino acid can be specified by more than one triplet. The exceptions are methionine and tryptophan. In other words, several codons can correspond to the same amino acid.

  • Non-overlapping

The same base cannot belong to two adjacent codons at the same time.

  • Polarity

Some triplets do not encode amino acids but serve as "road signs" that mark the beginning and end of individual genes. The triplets UAA, UAG and UGA each mean the cessation of synthesis and are located at the end of each gene, so we can speak of the polarity of the genetic code.

  • Universality

In animals and plants, in fungi, bacteria and viruses, the same triplet encodes the same amino acid; that is, the genetic code is the same for all living beings. In other words, universality is the ability of the genetic code to work in the same way in organisms of different levels of complexity, from viruses to humans. The universality of the DNA code confirms the unity of origin of all life on our planet. Genetic engineering methods are based on the universality of the genetic code.

From the history of the discovery of the genetic code

The idea that a genetic code exists was first formulated by A. Down (Dounce) and G. Gamow in 1952-1954. The scientists showed that a nucleotide sequence that uniquely determines the synthesis of a particular amino acid must contain at least three units. Later it was proved that such a sequence consists of three nucleotides, called a codon or triplet.

The questions of which nucleotides are responsible for incorporating a certain amino acid into a protein molecule, and how many nucleotides determine this incorporation, remained unresolved until 1961. Theoretical analysis showed that the code cannot consist of one nucleotide, since in that case only 4 amino acids could be encoded. The code cannot be a doublet either: combinations of two nucleotides from a four-letter "alphabet" cannot cover all the amino acids, since only 16 such combinations are theoretically possible (4^2 = 16).

Three consecutive nucleotides are enough to encode 20 amino acids as well as a "stop" signal marking the end of the protein sequence, since the number of possible combinations is then 64 (4^3 = 64).

GENETIC CODE (Greek genetikos, referring to origin; syn.: code, biological code, amino acid code, protein code, nucleic acid code) - a system for recording hereditary information in the nucleic acid molecules of animals, plants, bacteria and viruses by alternating the sequence of nucleotides.

Genetic information is transmitted from cell to cell and from generation to generation, with the exception of RNA-containing viruses, by reduplication of DNA molecules (see Replication). The hereditary information of DNA is realized during the life of the cell through 3 types of RNA: informational (i-RNA, or mRNA), ribosomal (rRNA) and transfer (tRNA), which are synthesized on DNA as on a matrix with the help of the enzyme RNA polymerase. The sequence of nucleotides in a DNA molecule uniquely determines the sequence of nucleotides in all three types of RNA (see Transcription). The information of a gene encoding a protein molecule is carried only by mRNA. The end product of the realization of hereditary information is the synthesis of protein molecules, whose specificity is determined by the sequence of their constituent amino acids (see Translation).

Since only 4 different nitrogenous bases are present in DNA or RNA [in DNA - adenine (A), thymine (T), guanine (G), cytosine (C); in RNA - adenine (A), uracil (U), cytosine (C), guanine (G)], and their sequence determines the sequence of 20 amino acids in the protein, the problem of the genetic code arises, i.e. the problem of translating the 4-letter alphabet of nucleic acids into the 20-letter alphabet of polypeptides.

The idea of matrix synthesis of protein molecules, with a correct prediction of the properties of the hypothetical matrix, was first formulated by N. K. Koltsov in 1928. In 1944, Avery (O. Avery) et al. found that DNA molecules are responsible for the transfer of hereditary traits during transformation in pneumococci. In 1948, E. Chargaff showed that in all DNA molecules there is a quantitative equality of the corresponding nucleotides (A-T, G-C). In 1953, F. Crick, J. Watson and Wilkins (M. H. F. Wilkins), based on this rule and on data from X-ray diffraction analysis, came to the conclusion that a DNA molecule is a double helix consisting of two polynucleotide strands linked together by hydrogen bonds. Moreover, only T can be located opposite A of one chain in the second chain, and only C opposite G. This complementarity means that the nucleotide sequence of one chain uniquely determines the sequence of the other. The second significant conclusion that follows from this model is that the DNA molecule is capable of self-reproduction.

In 1954, G. Gamow formulated the problem of the genetic code in its modern form. In 1957, F. Crick put forward the adapter hypothesis, suggesting that amino acids interact with the nucleic acid not directly but through intermediaries (now known as tRNA). In the years that followed, all the principal links in the general scheme for the transmission of genetic information, initially hypothetical, were confirmed experimentally. In 1957, mRNAs were discovered [A. S. Spirin, A. N. Belozersky et al.; Volkin and Astrachan (E. Volkin, L. Astrachan)], as was tRNA [Hoagland (M. B. Hoagland)]; in 1960, DNA was synthesized outside the cell using existing DNA macromolecules as a template (A. Kornberg), and DNA-dependent RNA synthesis was discovered [Weiss (S. B. Weiss) et al.]. In 1961, a cell-free system was created in which protein-like substances were synthesized in the presence of natural RNA or synthetic polyribonucleotides [M. Nirenberg and Matthaei (J. H. Matthaei)]. The study of the genetic code consisted of investigating the general properties of the code and actually deciphering it, that is, finding out which combinations of nucleotides (codons) code for which amino acids.

The general properties of the code were elucidated independently of its deciphering, and mainly before it, by analyzing the molecular patterns of mutation formation (F. Crick et al., 1961; N. V. Luchnik, 1963). They come down to the following:

1. The code is universal, i.e. identical, at least in the main, for all living beings.

2. The code is triplet, that is, each amino acid is encoded by a triple of nucleotides.

3. The code is non-overlapping, i.e. a given nucleotide cannot be part of more than one codon.

4. The code is degenerate, that is, one amino acid can be encoded by several triplets.

5. Information about the primary structure of the protein is read from mRNA sequentially, starting from a fixed point.

6. Most of the possible triplets have "meaning", i.e., encode amino acids.

7. Of the three "letters" of the codon, only two (obligate) are of primary importance, while the third (optional) carries much less information.

Direct deciphering of the code would consist in comparing the nucleotide sequence in the structural gene (or the mRNA synthesized on it) with the amino acid sequence in the corresponding protein. However, this route was still technically impossible. Two other routes were taken: protein synthesis in a cell-free system using artificial polyribonucleotides of known composition as a matrix, and analysis of the molecular patterns of mutation formation. The first gave positive results earlier and historically played a major role in deciphering the genetic code.

In 1961, M. Nirenberg and Matthaei used a homopolymer as a matrix - synthetic polyuridylic acid (i.e. artificial RNA of the composition UUUU...) - and obtained polyphenylalanine. It followed that the codon of phenylalanine consists of several U, i.e., in the case of a triplet code, it is UUU. Later, along with homopolymers, polyribonucleotides consisting of different nucleotides were used. In this case only the composition of the polymers was known, while the arrangement of nucleotides in them was statistical, so the analysis of the results was also statistical and gave indirect conclusions. Quite quickly, at least one triplet was found for each of the 20 amino acids. It turned out that the presence of organic solvents, a change in pH or temperature, some cations, and especially antibiotics make the code ambiguous: the same codons begin to stimulate the inclusion of other amino acids, and in some cases one codon began to encode up to four different amino acids. Streptomycin affected the reading of information both in cell-free systems and in vivo, and was effective only on streptomycin-sensitive bacterial strains. In streptomycin-dependent strains, it "corrected" the reading of codons that had changed as a result of mutation. Such results gave reason to doubt the correctness of deciphering the genetic code with a cell-free system; confirmation was required, primarily from in vivo data.

The main in vivo data on the genetic code were obtained by analyzing the amino acid composition of proteins in organisms treated with mutagens with a known mechanism of action, for example nitrous acid, which causes the replacement of C by U and of A by G. Useful information is also provided by the analysis of mutations caused by non-specific mutagens, by comparison of differences in the primary structure of related proteins in different species, by the correlation between the composition of DNA and proteins, etc.

Deciphering of the genetic code on the basis of in vivo and in vitro data gave coinciding results. Later, three other methods for deciphering the code in cell-free systems were developed: binding of aminoacyl-tRNA (i.e. tRNA with an attached activated amino acid) to trinucleotides of known composition (M. Nirenberg et al., 1965), binding of aminoacyl-tRNA to polynucleotides starting with a certain triplet (Matthaei et al., 1966), and the use as mRNA of polymers in which not only the composition but also the order of nucleotides is known (H. Khorana et al., 1965). All three methods complement each other, and the results are consistent with the data obtained in experiments in vivo.

In the 1970s, especially reliable methods appeared for checking the results of deciphering the genetic code. It is known that mutations arising under the influence of proflavin consist in the loss or insertion of individual nucleotides, which leads to a shift of the reading frame. In the T4 phage, a number of mutations that changed the composition of lysozyme were induced by proflavin. This composition was analyzed and compared with the codons that should have resulted from a shift in the reading frame. There was a complete match. In addition, this method made it possible to establish which triplets of the degenerate code encode each of the amino acids. In 1970, Adams (J. M. Adams) and his collaborators managed to partially decipher the genetic code by a direct method: in the R17 phage, the base sequence was determined in a fragment 57 nucleotides long and compared with the amino acid sequence of its coat protein. The results were in complete agreement with those obtained by less direct methods. Thus, the code has been deciphered completely and correctly.
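The effect of a reading-frame shift can be illustrated with a small Python sketch. The sequence and both mutations are invented for illustration only (they are not taken from the T4 lysozyme experiments), and `split_codons` is a hypothetical helper name.

```python
def split_codons(seq):
    """Split a sequence into consecutive non-overlapping triplets.

    An incomplete triplet at the end, if any, is dropped.
    """
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "AUGGCUUGGAAAUAA"
# Hypothetical insertion: one extra nucleotide added right after the first codon.
inserted = original[:3] + "C" + original[3:]
# Hypothetical deletion: the first nucleotide of the second codon is lost.
deleted = original[:3] + original[4:]

print(split_codons(original))  # ['AUG', 'GCU', 'UGG', 'AAA', 'UAA']
print(split_codons(inserted))  # ['AUG', 'CGC', 'UUG', 'GAA', 'AUA']
print(split_codons(deleted))   # ['AUG', 'CUU', 'GGA', 'AAU']
```

Because the code is read without "punctuation marks", every codon downstream of the insertion or deletion changes, and in this example the original stop codon UAA is no longer read in frame.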

The results of decoding are summarized in a table. It lists the composition of codons and RNA. The composition of tRNA anticodons is complementary to mRNA codons, i.e. instead of U they contain A, instead of A - U, instead of C - G and instead of G - C, and corresponds to the codons of the structural gene (that strand of DNA, with which information is read) with the only difference being that uracil takes the place of thymine. Of the 64 triplets that can be formed by a combination of 4 nucleotides, 61 have "sense", i.e., encode amino acids, and 3 are "nonsense" (devoid of meaning). There is a fairly clear relationship between the composition of triplets and their meaning, which was discovered even when analyzing the general properties of the code. In some cases, triplets encoding a specific amino acid (eg, proline, alanine) are characterized by the fact that the first two nucleotides (obligate) are the same, and the third (optional) can be anything. In other cases (when coding, for example, asparagine, glutamine), two similar triplets have the same meaning, in which the first two nucleotides coincide, and any purine or any pyrimidine takes the place of the third.
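The figures quoted above (61 sense and 3 nonsense triplets, and amino acids whose third nucleotide can be anything) can be recomputed from the standard codon table with a short Python sketch. The variable names are illustrative, and amino acids are written with their one-letter symbols (P = proline, A = alanine, M = methionine, W = tryptophan).

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a nonsense codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group the 64 codons by the amino acid (or stop signal) they encode.
codons_for = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    codons_for[aa].append("".join(codon))

nonsense = sorted(codons_for.pop("*"))
sense_count = sum(len(codons) for codons in codons_for.values())

# Amino acids encoded by exactly one triplet.
single_codon = sorted(aa for aa, codons in codons_for.items() if len(codons) == 1)

# Amino acids with at least one two-letter prefix for which any third base works.
any_third_base = sorted(
    aa for aa, codons in codons_for.items()
    if any(all(prefix + b in codons for b in BASES)
           for prefix in {c[:2] for c in codons})
)

print(sense_count, nonsense)   # 61 ['UAA', 'UAG', 'UGA']
print(single_codon)            # ['M', 'W']
print(any_third_base)          # ['A', 'G', 'L', 'P', 'R', 'S', 'T', 'V']
```

Grouping codons by amino acid makes the degeneracy pattern directly visible: eight amino acids have a family of four codons that differ only in the third position.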

The nonsense codons, which have special names derived from the designations of phage mutants (UAA: ochre, UAG: amber, UGA: opal), do not encode any amino acid, but they are of great importance in reading the information, since they encode the end of the polypeptide chain.

Information is read in the 5' -> 3' direction, toward the 3' end of the nucleotide chain (see Deoxyribonucleic acids). Protein synthesis proceeds from the amino acid with a free amino group to the amino acid with a free carboxyl group. The start of synthesis is encoded by the triplets AUG and GUG, which in this case bind a specific initiator aminoacyl-tRNA, namely N-formylmethionyl-tRNA. The same triplets, when located within the chain, encode methionine and valine, respectively. The ambiguity is resolved by the fact that the start of reading is preceded by a nonsense codon. There is evidence that the boundary between mRNA regions encoding different proteins consists of more than two triplets and that the secondary structure of the RNA changes at these sites; this question is under investigation. If a nonsense codon occurs within a structural gene, the corresponding protein is built only up to the position of that codon.
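The last point, that a nonsense codon inside a gene truncates the protein, can be sketched as follows. This is a simplified illustration under stated assumptions: only AUG is treated as the start (the text also names GUG), the initiating residue is shown simply as methionine (M), and the two messages are hypothetical. The names `CODE` and `translate` are illustrative.

```python
from itertools import product

# Standard code in one-letter symbols; "*" marks nonsense codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO)}


def translate(mrna: str, start: str = "AUG") -> str:
    """Read 5' -> 3' from the first start codon to the first nonsense codon."""
    begin = mrna.find(start)
    if begin == -1:
        return ""  # no start codon: nothing is synthesized
    protein = []
    for i in range(begin, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":  # nonsense codon: the chain ends here
            break
        protein.append(aa)
    return "".join(protein)


normal = "GGAUGUUUUGGGCAUAAGG"  # hypothetical: AUG UUU UGG GCA UAA
mutant = "GGAUGUUUUGAGCAUAAGG"  # UGG -> UGA, a nonsense codon inside the gene

print(translate(normal))  # MFWA
print(translate(mutant))  # MF
```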

The discovery and decoding of the genetic code, an outstanding achievement of molecular biology, influenced all the biological sciences, in some cases laying the foundation for large specialized fields (see Molecular genetics). The impact of the discovery of the genetic code and of the research connected with it is compared with the impact of Darwin's theory on the biological sciences.

The universality of the genetic code is direct proof of the universality of the basic molecular mechanisms of life in all representatives of the organic world. Meanwhile, the large differences in the functions and structure of the genetic apparatus in the transition from prokaryotes to eukaryotes and from unicellular to multicellular organisms are probably associated with molecular differences, the study of which is a task for the future. Since research on the genetic code is only a matter of recent years, the significance of its results for practical medicine is so far only indirect: they help to understand the nature of diseases and the mechanism of action of pathogens and drugs. However, the discovery of such phenomena as transformation (see), transduction (see), and suppression (see) indicates that it is possible in principle to correct pathologically altered hereditary information, which is known as genetic engineering (see).

Table. Genetic code
Axes: first nucleotide of the codon; second nucleotide of the codon; third nucleotide of the codon.
Entries: phenylalanine, nonsense, tryptophan, histidine, glutamic acid, isoleucine, aspartic acid, methionine, asparagine, glutamine.
* Encodes the end of the chain.
** Also encodes the beginning of the chain.


RNA uses the same nucleotides, except that the thymine-containing nucleotide is replaced by a similar uracil-containing nucleotide, denoted by the letter U. In DNA and RNA molecules, nucleotides line up in chains, producing sequences of genetic letters.

The proteins of almost all living organisms are built from only 20 types of amino acids. These amino acids are called canonical. Each protein is a chain or several chains of amino acids connected in a strictly defined sequence. This sequence determines the structure of the protein, and therefore all its biological properties.

However, in the early 1960s, new data revealed the failure of the “comma-free code” hypothesis. Then experiments showed that codons, considered by Crick to be meaningless, could provoke protein synthesis in a test tube, and by 1965 the meaning of all 64 triplets had been established. It turned out that some codons are simply redundant, that is, a number of amino acids are encoded by two, four or even six triplets.

Properties

Correspondence tables of mRNA codons and amino acids

The genetic code common to most prokaryotes and eukaryotes. The table lists all 64 codons and the corresponding amino acids. The base order runs from the 5' end to the 3' end of the mRNA.

Standard genetic code

1st   2nd: U                     2nd: C                     2nd: A                     2nd: G                     3rd
U     UUU (Phe/F) Phenylalanine  UCU (Ser/S) Serine         UAU (Tyr/Y) Tyrosine       UGU (Cys/C) Cysteine       U
      UUC (Phe/F)                UCC (Ser/S)                UAC (Tyr/Y)                UGC (Cys/C)                C
      UUA (Leu/L) Leucine        UCA (Ser/S)                UAA Stop (Ochre)           UGA Stop (Opal)            A
      UUG (Leu/L)                UCG (Ser/S)                UAG Stop (Amber)           UGG (Trp/W) Tryptophan     G
C     CUU (Leu/L)                CCU (Pro/P) Proline        CAU (His/H) Histidine      CGU (Arg/R) Arginine       U
      CUC (Leu/L)                CCC (Pro/P)                CAC (His/H)                CGC (Arg/R)                C
      CUA (Leu/L)                CCA (Pro/P)                CAA (Gln/Q) Glutamine      CGA (Arg/R)                A
      CUG (Leu/L)                CCG (Pro/P)                CAG (Gln/Q)                CGG (Arg/R)                G
A     AUU (Ile/I) Isoleucine     ACU (Thr/T) Threonine      AAU (Asn/N) Asparagine     AGU (Ser/S) Serine         U
      AUC (Ile/I)                ACC (Thr/T)                AAC (Asn/N)                AGC (Ser/S)                C
      AUA (Ile/I)                ACA (Thr/T)                AAA (Lys/K) Lysine         AGA (Arg/R) Arginine       A
      AUG (Met/M) Methionine     ACG (Thr/T)                AAG (Lys/K)                AGG (Arg/R)                G
G     GUU (Val/V) Valine         GCU (Ala/A) Alanine        GAU (Asp/D) Aspartic acid  GGU (Gly/G) Glycine        U
      GUC (Val/V)                GCC (Ala/A)                GAC (Asp/D)                GGC (Gly/G)                C
      GUA (Val/V)                GCA (Ala/A)                GAA (Glu/E) Glutamic acid  GGA (Gly/G)                A
      GUG (Val/V)                GCG (Ala/A)                GAG (Glu/E)                GGG (Gly/G)                G

The AUG codon codes for methionine and is also the site of translation initiation: the first AUG codon in the mRNA coding region serves as the start of protein synthesis.

Reverse table (the codons for each amino acid are listed, as well as the start and stop codons)

Ala/A  GCU, GCC, GCA, GCG             Leu/L  UUA, UUG, CUU, CUC, CUA, CUG
Arg/R  CGU, CGC, CGA, CGG, AGA, AGG   Lys/K  AAA, AAG
Asn/N  AAU, AAC                       Met/M  AUG
Asp/D  GAU, GAC                       Phe/F  UUU, UUC
Cys/C  UGU, UGC                       Pro/P  CCU, CCC, CCA, CCG
Gln/Q  CAA, CAG                       Ser/S  UCU, UCC, UCA, UCG, AGU, AGC
Glu/E  GAA, GAG                       Thr/T  ACU, ACC, ACA, ACG
Gly/G  GGU, GGC, GGA, GGG             Trp/W  UGG
His/H  CAU, CAC                       Tyr/Y  UAU, UAC
Ile/I  AUU, AUC, AUA                  Val/V  GUU, GUC, GUA, GUG
START  AUG                            STOP   UAG, UGA, UAA
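
A reverse table of this kind can be derived mechanically from the codon table, which also makes the redundancy of the code easy to inspect. The sketch below is an illustration, not part of the source; `CODE`, `reverse`, and `by_degeneracy` are assumed names, and amino acids are written in one-letter symbols with "*" for stop codons.

```python
from collections import defaultdict
from itertools import product

# Standard code in one-letter symbols; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO)}

# Reverse table: amino acid -> list of codons that encode it.
reverse = defaultdict(list)
for codon, aa in CODE.items():
    reverse[aa].append(codon)

# Group amino acids by the number of codons that encode them (stops excluded).
by_degeneracy = defaultdict(list)
for aa, codon_list in reverse.items():
    if aa != "*":
        by_degeneracy[len(codon_list)].append(aa)

print(reverse["V"])  # ['GUU', 'GUC', 'GUA', 'GUG']
print(reverse["*"])  # ['UAA', 'UAG', 'UGA']
for n in sorted(by_degeneracy):
    print(n, "".join(sorted(by_degeneracy[n])))
# 1 MW
# 2 CDEFHKNQY
# 3 I
# 4 AGPTV
# 6 LRS
```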

Variations on the Standard Genetic Code

The first example of a deviation from the standard genetic code was discovered in 1979 during the study of human mitochondrial genes. Since then, several such variants have been found, including a variety of alternative mitochondrial codes and, in mycoplasmas, the reading of the stop codon UGA as a tryptophan codon. In bacteria and archaea, GUG and UUG are often used as start codons. In some cases, genes begin coding for a protein at a start codon different from the one normally used by the species.

In some proteins, non-standard amino acids such as selenocysteine and pyrrolysine are inserted when the ribosome reads a stop codon, which depends on sequences in the mRNA. Selenocysteine is now regarded as the 21st and pyrrolysine as the 22nd of the amino acids that make up proteins.

Despite these exceptions, the genetic code of all living organisms shares common features: codons consist of three nucleotides, of which the first two are decisive, and codons are translated into a sequence of amino acids by tRNA and ribosomes.

Deviations from the standard genetic code

Codon           Usual meaning  Read as                     Example
CUG             Leucine        Serine                      Some yeasts of the genus Candida
CU(U, C, A, G)  Leucine        Threonine                   Mitochondria, in particular of Saccharomyces cerevisiae
CGG             Arginine       Tryptophan                  Mitochondria of higher plants
UGA             Stop           Tryptophan                  Mitochondria (in all studied organisms without exception)
UGA             Stop           Cysteine or selenocysteine  Nuclear genome of the ciliate Euplotes
AUA             Isoleucine     Methionine = Start          Mitochondria of mammals, Drosophila, S. cerevisiae and many protozoa
GUG             Valine         Start                       Prokaryotes
CUG             Leucine        Start                       Eukaryotes (rare)
GUG             Valine         Start                       Eukaryotes (rare)
UUG             Leucine        Start                       Prokaryotes (rare)
ACG             Threonine      Start                       Eukaryotes (rare)
AGC, AGU        Serine         Stop                        Mammalian mitochondria
AGA             Arginine       Stop                        Drosophila mitochondria
AG(A, G)        Arginine       Stop                        Mammalian mitochondria
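
One way to think about such deviations is as a small set of overrides applied to the standard table. The sketch below assumes the UGA, AUA, and AG(A, G) rows listed for mammalian mitochondria; the names `STANDARD`, `MAMMAL_MITO`, and `translate`, and the example message, are hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard code in one-letter symbols; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO)}

# Overrides assumed from the mammalian-mitochondria rows of the table:
# UGA -> tryptophan, AUA -> methionine, AGA/AGG -> stop.
MAMMAL_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Translate from position 0 until the first stop codon of the given code."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGAUAUGAAGA"  # hypothetical: AUG AUA UGA AGA

print(translate(message, STANDARD))     # MI   (UGA is read as stop)
print(translate(message, MAMMAL_MITO))  # MMW  (AGA is read as stop)
```

The same message therefore yields different chains depending on which table the translating system uses.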

Evolution

It is believed that the triplet code was formed quite early in the course of the evolution of life. But the existence of differences in some organisms that appeared at different evolutionary stages indicates that it was not always so.

According to some models, at first the code existed in a primitive form, when a small number of codons denoted a relatively small number of amino acids. A more precise codon value and more amino acids could be introduced later. At first, only the first two of the three bases could be used for recognition [which depends on the structure of the tRNA].

- Lewin B. Genes. Moscow, 1987, p. 62.


Chapter USE: 2.6. Genetic information in a cell. Genes, genetic code and its properties. Matrix nature of biosynthetic reactions. Biosynthesis of protein and nucleic acids

More than 6 billion people live on Earth. Apart from 25-30 million pairs of identical twins, all people are genetically different. This means that each person is unique, with unique hereditary characteristics, character traits, abilities, temperament, and many other qualities. What determines such differences between people? The differences in their genotypes, i.e. the sets of genes in their organisms. Each person is unique, just as the genotype of an individual animal or plant is unique. The genetic characteristics of a given person are embodied in the proteins synthesized in that person's body. Consequently, the protein structure of one person differs, if only slightly, from that of another. This is why the problem of organ transplant compatibility arises, and why there are allergic reactions to food, insect bites, plant pollen, and so on. This does not mean that people have no identical proteins. Proteins that perform the same functions may be identical or may differ from each other by only one or two amino acids. But there are no people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a region of the DNA molecule, the gene. A gene is a unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism makes up its genotype.

Hereditary information is encoded using the genetic code. The code is similar to the well-known Morse code, which encodes information with dots and dashes. Morse code is universal for all radio operators, and the differences lie only in the translation of the signals into different languages. The genetic code is likewise universal for all organisms and differs only in the order of the nucleotides that form the genes and code for the proteins of specific organisms.

Properties of the genetic code: triplet nature, specificity, universality, redundancy, and non-overlapping.

So what is the genetic code? Initially, it consists of triplets of DNA nucleotides combined in different sequences, for example AAT, GCA, ACG, TGC, etc. Each triplet of nucleotides encodes a specific amino acid that will be built into the polypeptide chain. For example, the triplet CGT codes for the amino acid alanine, and the triplet AAG codes for the amino acid phenylalanine. There are 20 amino acids, and four nucleotides taken in groups of three give 64 possible combinations. Therefore, four nucleotides are enough to encode 20 amino acids, and this is why one amino acid can be encoded by several triplets. Some of the triplets do not encode amino acids at all, but start or stop protein biosynthesis.

The genetic code proper is the sequence of nucleotides in the mRNA molecule, because mRNA copies the information from DNA (the transcription process) and the information is then translated into a sequence of amino acids in the synthesized protein molecules (the translation process). mRNA is composed of the nucleotides A, C, G, and U. The nucleotide triplets of mRNA are called codons. The DNA triplets given above look like this on mRNA: the DNA triplet CGT becomes the mRNA triplet GCA, and the DNA triplet AAG becomes the triplet UUC. It is the codons of mRNA that represent the genetic code in written form. So, the genetic code is triplet, universal for all organisms on Earth, and degenerate (most amino acids are encoded by more than one codon). Between genes there are punctuation marks: triplets called stop codons. They signal the end of the synthesis of one polypeptide chain. There are tables of the genetic code that you need to be able to use to decipher mRNA codons and build chains of protein molecules (complementary DNA in brackets).
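
The two worked examples above, the DNA triplets for alanine and phenylalanine (CGT and AAG in standard A/C/G/T notation), can be traced step by step in code. This is a minimal sketch: `TO_RNA`, `transcribe`, and `CODE` are illustrative names, and the complement is taken base by base, as in the text, without modelling strand orientation.

```python
from itertools import product

# Standard code in one-letter symbols; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(t): aa for t, aa in zip(product("UCAG", repeat=3), AMINO)}

# Template DNA base -> complementary mRNA base (uracil pairs with adenine).
TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna_triplet: str) -> str:
    """Return the mRNA codon complementary to a DNA template triplet."""
    return "".join(TO_RNA[base] for base in dna_triplet)


for dna in ("CGT", "AAG"):
    codon = transcribe(dna)
    print(dna, "->", codon, "->", CODE[codon])
# CGT -> GCA -> A   (alanine)
# AAG -> UUC -> F   (phenylalanine)
```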
