Ali and Phukan (2013) arranged all the codons in the genetic code table (Table 1) by using the Cartesian product of the ring X i.e., and denote it as , where
Each codon of the form PQR is associated with the element (P, Q, R) of and thus a one-to-one correspondence can be established between set and . Next, a sum and a product operation are defined between the codons in the following way
With these two operations possesses ring structure and is isomorphic to . For example, the element has correspondence with the element . The genetic code, with the corresponding amino acids, is shown in Table 1.
(1) Corresponding elements of , (2) the codons, (3) the amino acids; “-” denotes the stop codons.
Table 1: The table of the genetic code
We propose the following definition.
Definition: Codons in which all bases are purines are termed even codons, and codons in which at least one base is a pyrimidine are termed odd codons.
We use the term even because the position of every base of these codons in P is even.
It is observed that the set of all even codons forms a subgroup of the group . The set of even codons is {AAA, AAG, GAA, GAG, AGA, AGG, GGA, GGG}.
The order of the elements of the group divides the group into three classes. The following table gives the order of the codons.
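The subgroup and order statements can be checked with a short sketch. It assumes that the sum of codons is componentwise addition modulo 4 and uses the encoding A=0, C=1, G=2, U=3. This particular assignment is an assumption (the text only fixes that purines occupy even positions), and `add` and `order` are illustrative helper names.

```python
from itertools import product

# Assumed encoding: purines (A, G) at even residues, pyrimidines (C, U) at odd ones.
CODE = {"A": 0, "C": 1, "G": 2, "U": 3}
BASE = {v: k for k, v in CODE.items()}

def add(x, y):
    """Componentwise sum modulo 4 (assumed form of the codon sum)."""
    return "".join(BASE[(CODE[a] + CODE[b]) % 4] for a, b in zip(x, y))

def order(codon):
    """Additive order: smallest n >= 1 such that n * codon is the zero element."""
    n, acc = 1, codon
    while acc != "AAA":          # AAA encodes (0, 0, 0) under the assumed encoding
        acc, n = add(acc, codon), n + 1
    return n

codons = ["".join(c) for c in product("ACGU", repeat=3)]
even = [c for c in codons if set(c) <= {"A", "G"}]

# A finite non-empty subset closed under the sum is a subgroup.
closed = all(add(x, y) in even for x in even for y in even)

# Group the 64 codons by their order.
classes = {}
for c in codons:
    classes.setdefault(order(c), []).append(c)

print(sorted(even), closed)
print({n: len(cs) for n, cs in sorted(classes.items())})
```

Under these assumptions the codons fall into three order classes (orders 1, 2 and 4), and every codon containing a pyrimidine has order 4.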
The transition and transversion mutations of codons are connected with changes in parity (a change from an odd codon to an even one or vice versa) and in the order of codons (order as an element of the ring). The following are a few connections that we have observed:
1. A one-point transition of any base keeps the codon parity as well as the codon order. These mutations cause no major change in the properties of the amino acids.
2. A transversion of the bases changes the codon parity as well as the codon order.
3. A transversion of codons having a pyrimidine as the second base (biologically the most significant position) keeps the codon parity as well as the codon order.
4. All odd codons have maximal order and all even codons have order less than that.
5. During a single-base transversion, even codons are always mutated to odd codons and, for each codon, the resulting mutated codons are algebraic inverses of one another. For example, the first-base transversions of the even codon AAG are CAG and UAG, which are algebraic inverses of one another.
6. In a first-base transversion, the even codons are changed to codons that code for a polar amino acid; for the second base the change is to a hydrophobic amino acid and for the third base it is to a small amino acid. Also, under a third-base transversion, a hydrophilic (hydrophobic) even codon changes to a hydrophilic (hydrophobic) codon.
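Two of the observations above lend themselves to a mechanical check: the parity part of observation 1 and observation 5. The sketch below assumes componentwise addition modulo 4 with the encoding A=0, C=1, G=2, U=3 (an assumption, since the text only places purines at even positions); `mutate`, `is_even` and `add` are hypothetical helper names.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "U": 3}   # assumed encoding
BASE = {v: k for k, v in CODE.items()}
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}
TRANSVERSIONS = {"A": "CU", "G": "CU", "C": "AG", "U": "AG"}

def is_even(codon):
    """Even codon: every base is a purine."""
    return all(b in "AG" for b in codon)

def add(x, y):
    """Componentwise sum modulo 4 (assumed form of the codon sum)."""
    return "".join(BASE[(CODE[a] + CODE[b]) % 4] for a, b in zip(x, y))

def mutate(codon, pos, base):
    """Replace the base at position pos (0-based)."""
    return codon[:pos] + base + codon[pos + 1:]

codons = ["".join(c) for c in product("ACGU", repeat=3)]

# Observation 1 (parity part): a one-point transition never changes parity.
transition_keeps_parity = all(
    is_even(mutate(c, i, TRANSITION[c[i]])) == is_even(c)
    for c in codons for i in range(3)
)

# Observation 5: each one-point transversion of an even codon yields two odd
# codons whose sum is AAA, the zero element under the assumed encoding.
obs5 = True
for c in filter(is_even, codons):
    for i in range(3):
        y, z = (mutate(c, i, b) for b in TRANSVERSIONS[c[i]])
        obs5 &= (not is_even(y)) and (not is_even(z)) and add(y, z) == "AAA"

print(transition_keeps_parity, obs5)
print(mutate("AAG", 0, "C"), mutate("AAG", 0, "U"), add("CAG", "UAG"))
```

The last line reproduces the AAG example from observation 5: the two first-base transversions are CAG and UAG, and their sum is the zero element.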
We have eight even codons and the substitution of the bases of the codons with respect to the Watson-Crick base pairing (Adenine to Uracil, Guanine to Cytosine) gives another eight codons which are not zero-divisors of the group .
Table 3: Substitution of the bases of all even codons w.r.t Watson-Crick base pairs
The even codons with their mutated codons (transversion) and the not zero-divisor codons with their mutated codons (transversion) partition the whole set of codons into two equal, disjoint subsets. The following table gives the even and not zero-divisor codons with their mutated codons:
Thus, we can define a function such that for ,
.
where,
An alternative way of defining the function is such that for
It is observed that every element of order less than 4 maps to an element of order 4, and the image of f is the set of all not zero-divisors of CG. The function f represents the triple-base mutation of all even codons in terms of Watson-Crick base pairs.
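A minimal sketch of this statement follows. It applies the Watson-Crick substitution (A to U, G to C) to the even codons and compares the result with the not zero-divisors found by brute force. The componentwise product modulo 4 and the encoding A=0, C=1, G=2, U=3 are assumptions, and the helper names are illustrative.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "U": 3}   # assumed encoding, not fixed by the text
WATSON_CRICK = {"A": "U", "G": "C"}       # substitution stated in the text

def encode(codon):
    return tuple(CODE[b] for b in codon)

def mul(x, y):
    """Componentwise product modulo 4 (assumed form of the codon product)."""
    return tuple((a * b) % 4 for a, b in zip(x, y))

TRIPLES = list(product(range(4), repeat=3))
ZERO = (0, 0, 0)

def is_zero_divisor(x):
    # True if some non-zero y gives x * y = 0 (this also flags the zero element).
    return any(y != ZERO and mul(x, y) == ZERO for y in TRIPLES)

codons = ["".join(c) for c in product("ACGU", repeat=3)]
even = [c for c in codons if set(c) <= {"A", "G"}]

# Image of the even codons under the Watson-Crick substitution.
image = sorted("".join(WATSON_CRICK[b] for b in c) for c in even)
not_zero_divisors = sorted(c for c in codons if not is_zero_divisor(encode(c)))

print(image)
print(image == not_zero_divisors)
```

Under these assumptions the image consists of the eight codons built only from pyrimidines, and it coincides with the set of not zero-divisors.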
The domain of f (the even codons) together with the set obtained by its transversions, and the range of f (the set of all not zero-divisors) together with the set obtained by its transversions, partition the whole set CG into two disjoint sets. In other words, if M is the set of even codons and their one-point transversions and N is the set of all not zero-divisors and their one-point transversions, then M ∪ N = CG and M ∩ N = ∅.
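The partition can be verified by direct enumeration. The sketch below needs only the purine/pyrimidine classification of the bases; it takes the not zero-divisors to be the codons built only from pyrimidines, as obtained from the even codons by the Watson-Crick substitution described above. The function name `one_point_transversions` is illustrative.

```python
from itertools import product

PURINES, PYRIMIDINES = "AG", "CU"

def one_point_transversions(codon):
    """All codons obtained by swapping one base for a base of the other class."""
    out = set()
    for i, b in enumerate(codon):
        for t in (PYRIMIDINES if b in PURINES else PURINES):
            out.add(codon[:i] + t + codon[i + 1:])
    return out

codons = {"".join(c) for c in product("ACGU", repeat=3)}
even = {c for c in codons if all(b in PURINES for b in c)}
# Images of the even codons under A -> U, G -> C (the not zero-divisors).
not_zero_divisors = {c for c in codons if all(b in PYRIMIDINES for b in c)}

M = even.union(*(one_point_transversions(c) for c in even))
N = not_zero_divisors.union(*(one_point_transversions(c) for c in not_zero_divisors))

print(len(M), len(N), M & N == set(), M | N == codons)
```

M collects the codons with at most one pyrimidine and N those with at most one purine, which is why the two sets have 32 codons each and do not overlap.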
Distances between amino acids and their biological significance
A distance matrix is defined between the codons. The distance between each pair of codons is taken to be the Hamming distance. This distance matrix of codons is then used to determine the distances between amino acids: the distance between two amino acids is the average of the distances between the codons that code for them.
If we consider the codons GGA and CGG, the number of base positions at which they differ gives the Hamming distance between them. Here the codons differ at the first and third positions, so the distance is 2.
Next, we consider the amino acids H and T. The distance between them is calculated from the Hamming distances between their corresponding codons.
The amino acid H is coded by the codons CAC and CAU, and the amino acid T is coded by the codons ACA, ACC, ACG and ACU. The distances between the codons of H and T are therefore d(CAC, ACA) = 3, d(CAC, ACC) = 2, d(CAC, ACG) = 3, d(CAC, ACU) = 3, d(CAU, ACA) = 3, d(CAU, ACC) = 3, d(CAU, ACG) = 3 and d(CAU, ACU) = 2.
Hence the distance between the amino acids H and T is the average 22/8 = 2.75.
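The worked example can be reproduced with a few lines of code. This is a minimal sketch; `hamming` and `amino_distance` are illustrative names, and the codon lists are the ones given in the text.

```python
from itertools import product

def hamming(x, y):
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(x, y))

def amino_distance(codons_a, codons_b):
    """Average Hamming distance over all codon pairs of two amino acids."""
    pairs = list(product(codons_a, codons_b))
    return sum(hamming(x, y) for x, y in pairs) / len(pairs)

H = ["CAC", "CAU"]                  # codons of histidine
T = ["ACA", "ACC", "ACG", "ACU"]    # codons of threonine

print(hamming("GGA", "CGG"))   # 2
print(amino_distance(H, T))    # 2.75
```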
In this way the distance matrix of the 20 amino acids is calculated, as shown in Table 5.
Table 5: The distance matrix of the 20 amino acids obtained from the Hamming distances between codons.
It is observed that, in most cases, as the distance values increase, the differences in the physico-chemical properties of the amino acids also increase. The distance values are also higher between most hydrophilic and hydrophobic amino acids. For example, the strongly hydrophilic amino acid Lysine and the strongly hydrophobic amino acid Phenylalanine have the maximum distance value 3. When the distance between two amino acids is small, there is also little difference in their properties. Further, it is observed that the distance obtained above defines a metric on the set of amino acids.
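The full matrix can be computed in the same way. The sketch below assumes the standard genetic code, written as the usual 64-letter string in UCAG order, and computes the distance only for pairs of distinct amino acids. The names `codon_table` and `distance` are illustrative.

```python
from itertools import combinations, product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks the stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of its codons.
codon_table = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        codon_table.setdefault(aa, []).append("".join(codon))

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def amino_distance(a, b):
    """Average Hamming distance between the codons of amino acids a and b."""
    pairs = list(product(codon_table[a], codon_table[b]))
    return sum(hamming(x, y) for x, y in pairs) / len(pairs)

# Distances for all unordered pairs of distinct amino acids.
distance = {
    (a, b): amino_distance(a, b)
    for a, b in combinations(sorted(codon_table), 2)
}

print(len(codon_table), len(distance))
print(distance[("F", "K")], distance[("H", "T")])
```

The Lysine (K) and Phenylalanine (F) pair gives 3, the largest value a Hamming distance between codons can take, in line with the example above.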
It may be noted that similar results were obtained by Sanchez et al. (2004), who use a different approach to obtain the distance table.
From the distance matrix of Table 5 we have obtained a graph of the amino acids as explained below. The vertices are the amino acids, and two vertices (amino acids) α and β are connected by an edge if their distance is less than some given threshold value. At first we take the average distance (2.21) as the threshold value. The corresponding graph is depicted in Fig. 1. Then we examine the graph structures for different thresholds. The graphs of the amino acids for the different threshold values are shown below.
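The graph construction can be sketched as follows, assuming the standard genetic code and the average Hamming distance between codons as the amino acid distance. The threshold 1.25 used in the example is purely illustrative and is not taken from the figures; `threshold_graph` and `isolated` are hypothetical helper names.

```python
from itertools import combinations, product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks the stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        codon_table.setdefault(aa, []).append("".join(codon))

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def amino_distance(a, b):
    pairs = list(product(codon_table[a], codon_table[b]))
    return sum(hamming(x, y) for x, y in pairs) / len(pairs)

def threshold_graph(threshold):
    """Edges between amino acids whose distance is below the threshold."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(codon_table), 2)
        if amino_distance(a, b) < threshold
    }

def isolated(edges):
    """Amino acids that are not an endpoint of any edge."""
    touched = set().union(*edges) if edges else set()
    return sorted(set(codon_table) - touched)

edges = threshold_graph(1.25)      # illustrative low threshold
print(sorted("".join(sorted(e)) for e in edges))
print(isolated(edges))
print(len(threshold_graph(2.21)))  # number of edges at the average distance
```

With this low threshold only pairs at distance 1 remain connected, so the edge set shrinks to a handful of pairs and many amino acids become isolated.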
From the graph structures in Fig. 1, Fig. 2, Fig. 3 and Fig. 4, we observe that as we lower the threshold value, the accessibility of one amino acid from another decreases. The graphs in Fig. 1 and Fig. 2 are connected while the others are disconnected. In Fig. 3, the amino acids A, G, P, T and V are isolated. These five amino acids differ from the remaining amino acids in that each of them is coded by four codons having the same first and second bases.
In Fig. 4, we observe that the amino acids V, L, F, R, G, S, A, T, P and Y are isolated and the amino acids I, W, K, E and Q are connected with M, C, N, D and H respectively. Here the non-isolated amino acids differ from the 10 isolated amino acids in that each can be obtained from its connected partner by a third-base mutation of a codon, and the corresponding codons of the connected amino acids have the same first and second bases. For the isolated amino acids, a third-base mutation of a codon of an amino acid produces a synonymous codon, that is, the mutated codon codes for the same amino acid.
Next we discuss a real-life example and observe that the distance value is usually small between frequently occurring codon mutations. First we check the distances for the single-point drug resistance mutations in the HIV-1 protease gene. Next we go through the respective gene of the HXB2 strain and the human beta-globin gene. In both cases the distance value obtained is 1 between most of the codons. It is also noted that even a small change in the physico-chemical properties of the amino acids in the human beta-globin gene changes the biological function of hemoglobin.
The Hamming distances of the mutations observed in the HIV protease gene that confer drug resistance relative to the wild-type HIV HXB2.
The Hamming distances of the mutations observed in the human beta-globin gene.
Therefore we can conclude that the physico-chemical properties of the amino acids are connected with the Hamming distances determined in the genetic code.
Conclusion
In this paper we discussed an algebraic structure of the genetic code which exhibits some interesting connections between the physico-chemical properties of amino acids and the algebraic structure. We observed that there is a close connection between the order of the codons and transition/transversion mutations. We have shown that the set of all codons which are not zero-divisors can be obtained from the even codons, and that these two sets together with their transversions partition the whole set of codons into disjoint subsets.
Next, a distance matrix of codons was obtained, from which a distance matrix of amino acids was constructed. The distance matrix reflects the fact that the difference in the physico-chemical properties of amino acids is related to the distance between amino acids. A graph of the amino acids was generated from the distance matrix. This graph structure roughly depicts the evolutionary pathway of the amino acids.
References
[1] Ali, T. and Phukan, C. K. (2013): Topology in genetic code algebra, Math. Sci. Int. Res. Jour., 2(2), 179-182.
[2] Antoneli, F., Braggion, L., Forger, M. and Hornos, J. E. M. (2003): Extending the search for symmetries in the genetic code, Int. J. Mod. Phys., B17, 3135-3204.
[3] Balakrishnan, J. (2002): Symmetry scheme for amino acid codons, Phys. Rev. E, 65, 021912-5.
[4] Bashford, J. D., Tsohantjis, I. and Jarvis, P. D. (1998): A supersymmetric model for the evolution of the genetic code, Proc. Natl. Acad. Sci. USA, 95, 987-992.
[5] Bashford, J. D. and Jarvis, P. D. (2000): The genetic code as a periodic table, Biosystems, 57, 147-161.
[6] Beland, P. and Allen, T. F. (1994): The origin and evolution of the genetic code, J. Theor. Biol., 170, 359-365.
[7] Gohain, N., Ali, T. and Akhtar, A. (2015): Lattice structure and distance matrix of genetic code, J. Bio. Sys., 23(3), 485-504.
[8] Hornos, J. E. M. and Hornos, Y. M. M. (1993): Algebraic model for the evolution of the genetic code, Phys. Rev. Lett., 71(26), 4401-4404.
[9] Lehmann, J. (2000): Physico-chemical constraints connected with the coding properties of the genetic system, J. Theor. Biol., 202, 129–144.
[10] Knight, R. D., Freeland, S. J. and Landweber, L. F. (1999): Selection, history and chemistry: the three faces of the genetic code, Trends Biochem. Sci., 24, 241–247.
[11] Sanchez, R., Morgado, E. and Grau, R. (2005c): Gene algebra from a genetic code algebraic structure, J. Math. Biol., 51, 431-457.
[12] Sanchez, R., Morgado, E., and Grau, R. (2004): The Genetic Code Boolean Lattice. MATCH Commun. Math. Comput. Chem., 52, 29-46.
[13] Sanchez, R., Morgado, E., and Grau, R. (2005a): A Genetic Code Boolean Structure. I. The Meaning of Boolean Deductions, Bull. Math. Biol., 67, 1-14.
[14] Sanchez, R., Perfetti, L.A., Morgado, E. and Grau, R. (2005b): A New DNA sequences Vector Space on a Genetic Code Galois Field, MATCH Commun. Math. Comput. Chem., 54(1), 3-28.
[15] Schuster, P., Fontana, W., Hofacker, I.L. (1994): From sequences to shapes and back: a case study in RNA secondary structures. Proc. Biol. Sci., 255, 279-284.
[16] Stadler, B., Stadler, P., Wagner, G. and Fontana, W. (2001): The topology of the possible: Formal spaces underlying patterns of evolutionary change, J. Theor. Biol., 213, 241-274.
[17] Watson, J. D. and Crick, F. H. C. (1953): A structure for deoxyribose nucleic acid, Nature, 171, 737-738.