Complete mitochondrial genome of Chitala chitala

Abstract

We report the complete mitochondrial genome of Chitala chitala, a member of the order Osteoglossiformes. The complete mitochondrial genome (mtDNA) sequence was determined on the Illumina HiSeq 2500 next-generation sequencing platform. The genome is 16,381 bp in length and contains the standard set of 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes, 13 protein-coding genes and a non-coding control region. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) indicates that 10 genes evolved under purifying selection.

Keywords-

Introduction

Chitala chitala (Hamilton, 1822), commonly known as the Indian featherback, is a freshwater fish belonging to one of the oldest extant groups of teleost freshwater fishes (subdivision Osteoglossomorpha, order Osteoglossiformes, family Notopteridae) (Mandal et al., 2009). Teleosts are the largest vertebrate group, with over 24,000 species, accounting for more than half of all vertebrates. The species is found in freshwater bodies of the Indian subcontinent, including Nepal, Bangladesh and Pakistan, as well as in Myanmar, Thailand and Cambodia (Roberts, 1992; Jayaram, 1999; Froese and Pauly, 2003).

In animals, mitochondrial DNA is the only genetic material outside the nucleus. Several of its characteristics make it well suited to analyzing genetic relationships: simple structure, low molecular weight, maternal inheritance, conserved gene content and an accelerated rate of nucleotide substitution (Lin et al., 2004; Cameron, 2014). Mitochondrial DNA has therefore been widely used for phylogenetic analysis of many groups and has become a popular marker for evolutionary studies (Simon et al., 1994; Dowton et al., 2002; Simon et al., 2006; Behura et al., 2011; Cameron, 2014).

Because of the high economic importance of C. chitala, there has been increased interest in its evolutionary history. Freshwater fishes are also significant for biogeographical studies: they do not disperse easily through saltwater, so their evolution may be tightly linked to the geological history of the landmasses on which it took place (Banarescu, 1990, pp. 11-55; Lundberg, 1993).

Recent progress in DNA sequencing technology allows rapid and cost-effective sequencing of complete mitochondrial genomes. Mitogenomic data have therefore become widely used in studies of molecular evolution, phylogenetic relationships, phylogeography and ecology (Wilson et al., 1985; Boore et al., 1995; Avise, 2000; Boore et al., 2005; Cui et al., 2011), and the complete mitochondrial genome sequence is important for the study of genome evolution and species phylogeny. In this study we present the first complete nucleotide sequence of the mitochondrial genome of Chitala chitala. We also report the organization, gene arrangement and codon usage of C. chitala mitochondrial DNA and compare them with those of other freshwater fishes. Finally, we conduct a phylogenetic analysis based on the protein-coding genes, with the main aim of investigating the phylogenetic position of Chitala chitala.

Materials and Methods

Sample preparation and DNA extraction

Blood samples were collected from ______ by caudal puncture and immediately preserved in 95% ethanol. Total genomic DNA was extracted using the phenol-chloroform protocol (Singh et al., 2012). DNA concentration was measured on a Picodrop spectrophotometer (Picodrop Ltd, Cambridge, UK), and DNA quality was assessed on a 0.8% agarose gel stained with ethidium bromide.

PCR amplification, library preparation and Sequencing

The mitogenome was amplified in its entirety by long PCR (Cheng et al., 1994). It was divided into two overlapping segments that were amplified with the primers ______ and _______. The reaction mixture contained 0.5 µl Takara LA-Taq™ DNA polymerase (Takara, Japan), 5 µl buffer (10X), 8 µl dNTPs (2.5 mM each), 0.5 µl of each primer (20 pM), 1 µl template DNA (500 ng/µl) and 34.5 µl nuclease-free water, in a final volume of 50 µl. PCR conditions were an initial denaturation at 94 °C for 5 min, followed by 30 cycles of denaturation at 94 °C for 30 s and annealing at 68 °C for 15 min, then a final extension at 72 °C for 10 min and a hold at 4 °C. Amplification was performed on a Bio-Rad C1000 thermal cycler.

The quality and quantity of the PCR products were checked on a NanoDrop™ 2000 spectrophotometer, and dsDNA was quantified on a Qubit® 2.0 Fluorometer. The library was prepared with the Ultra™ DNA library prep kit, and its quality and fragment length were validated on a TapeStation (Agilent, USA). Libraries were sequenced on the Illumina HiSeq 2500 next-generation sequencing platform using a 500-cycle Illumina HiSeq kit.

Annotation and Sequence analysis

Nucleotide composition was calculated with MEGA 6. The AT content, AT skew and GC skew were calculated to investigate the nucleotide compositional behavior of the mitogenome. The AT and GC asymmetries, termed AT skew and GC skew, were calculated using the formulas of Hassanin et al.: AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C).
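The two skew formulas can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MEGA 6 procedure used in the study; the function names `base_counts` and `skews` and the toy sequence are assumptions made for the example.

```python
from collections import Counter


def base_counts(sequence):
    """Count A, C, G and T in a nucleotide sequence, ignoring other symbols."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def skews(composition):
    """Return (AT skew, GC skew) from base counts or base percentages.

    Assumes A + T and G + C are both non-zero.
    """
    a, c, g, t = (composition[base] for base in "ACGT")
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


# Toy sequence (not real data): 4 A, 2 T, 1 G, 3 C
toy_at_skew, toy_gc_skew = skews(base_counts("AAAATTGCCC"))
```

Because the skews are ratios, the same function accepts percentages: the overall composition reported for C. chitala (A 32.1%, C 27.7%, G 15.3%, T 24.9%) gives an AT skew of about 0.13 and a GC skew of about -0.29.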

Most transfer RNA (tRNA) genes were identified with tRNAscan-SE 1.21 (Schattner et al., 2007), with the sequence source and genetic code set to vertebrate mitochondrial, the default search mode and a cut-off score of 5.

Codon usage of the thirteen protein-coding genes was summarized with MEGA 6. The same software was used to calculate the non-synonymous (Ka) and synonymous (Ks) substitution rates and their ratio (Ka/Ks) for the protein-coding genes, and to calculate the p-distance among the 15 species of the order Osteoglossiformes used as the ingroup in this study.
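The p-distance is the proportion of aligned sites at which two sequences differ. A minimal sketch follows, assuming two coding sequences that are already aligned and start in frame at codon position 1; the function name `p_distance`, its `positions` parameter and the toy sequences are hypothetical, and MEGA 6 may treat gaps and ambiguous bases differently.

```python
def p_distance(seq_a, seq_b, positions=(1, 2, 3)):
    """Proportion of differing sites between two aligned coding sequences.

    positions selects which codon positions (1, 2, 3) are compared; sites
    with a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for index, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        codon_position = index % 3 + 1
        if codon_position not in positions:
            continue
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


# Toy alignment (not real data): the sequences differ only at two third positions
toy_a = "ATGGCCAAA"
toy_b = "ATGGCTAAG"
full = p_distance(toy_a, toy_b)
first_second = p_distance(toy_a, toy_b, positions=(1, 2))
third = p_distance(toy_a, toy_b, positions=(3,))
```

Splitting the calculation by codon position in this way is what allows distances for the 1st and 2nd positions, the 3rd position and the full sequence to be compared.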

Codon usage bias patterns in C. chitala

The relative synonymous codon usage (RSCU) for all protein-coding genes of the 30 species was calculated with MEGA 6. A heat map was drawn with CIMminer (https://discover.nci.nih.gov/cimminer) (Weinstein et al., 1997) using the quantile binning method, and the mitochondrial RSCU values were clustered using Euclidean distance and the average-linkage algorithm.
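The clustering step, Euclidean distance combined with average linkage, can be illustrated with a small pure-Python sketch. It is not CIMminer's implementation; the function name `average_linkage`, the species labels and the RSCU values are all hypothetical.

```python
from math import dist


def average_linkage(profiles):
    """Agglomerative clustering of RSCU profiles.

    profiles maps a label to a list of RSCU values (same codon order for
    every label). Returns the merge history as (cluster_a, cluster_b,
    distance) tuples, where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean Euclidean distance over all member pairs
                pair_distances = [
                    dist(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                mean_distance = sum(pair_distances) / len(pair_distances)
                if best is None or mean_distance < best[0]:
                    best = (mean_distance, i, j)
        mean_distance, i, j = best
        merges.append((clusters[i], clusters[j], mean_distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical two-codon RSCU profiles for three made-up species
toy_profiles = {"sp1": [1.0, 1.0], "sp2": [1.0, 1.2], "sp3": [3.0, 0.5]}
toy_merges = average_linkage(toy_profiles)
```

The merge history is what a dendrogram beside the heat map would display: the two most similar profiles join first, and the remaining one joins at a larger distance.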

Phylogenetic relationships

For the phylogenetic analysis of C. chitala, the concatenated sequences of 12 protein-coding genes from 30 species were used, 15 of which belong to Osteoglossiformes. The remaining species, used as outgroups, belong to Hiodontiformes (1), Gonorynchiformes (1), Cypriniformes (2), Salmoniformes (3), Clupeiformes (2), Anguilliformes (3) and Polypteriformes (3). The mitogenomes of these 30 species were obtained from NCBI (Table __), and the 12 protein-coding genes (all except ND6) were aligned with BioEdit version 7.2.5 (Hall, 1999), which runs the ClustalW program.

Phylogenetic trees were constructed using the Maximum Likelihood (ML) and Neighbour-Joining (NJ) methods in MEGA 6 (Tamura et al., 2013). The GTR+G+I model was selected for the ML analysis because it showed the lowest BIC (Bayesian Information Criterion) and AICc (Akaike Information Criterion, corrected) values for these data sets. Both the NJ and ML analyses were run with 1000 bootstrap replicates.
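Model selection by lowest BIC or AICc can be sketched as follows, using the standard definitions of the two criteria. This is not the MEGA 6 code; the function names, the use of alignment length as the sample size, and the model names and log-likelihoods are all assumptions invented for the example.

```python
from math import log


def aicc(log_likelihood, n_parameters, n_sites):
    """Akaike information criterion with small-sample correction."""
    k, n = n_parameters, n_sites
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, n_parameters, n_sites):
    """Bayesian information criterion."""
    return n_parameters * log(n_sites) - 2 * log_likelihood


def best_model(fits, n_sites, criterion=bic):
    """Return the model name with the lowest criterion value.

    fits maps a model name to (log_likelihood, n_parameters).
    """
    return min(fits, key=lambda name: criterion(*fits[name], n_sites))


# Made-up fits for two hypothetical models on a 1000-site alignment
toy_fits = {"model_A": (-5200.0, 10), "model_B": (-5000.0, 20)}
chosen_by_bic = best_model(toy_fits, n_sites=1000)
chosen_by_aicc = best_model(toy_fits, n_sites=1000, criterion=aicc)
```

Both criteria penalize extra parameters, so a parameter-rich model such as GTR+G+I is only preferred when its gain in likelihood outweighs that penalty.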

Results and Discussion

Genome organization

The complete mitochondrial genome of Chitala chitala was found to be 16,381 bp in length, which is within the range of other teleost mitogenomes and similar to that of other Osteoglossiformes (Table 2). The genome contains 37 genes, as in the mtDNA of other animals: 13 protein-coding genes, 22 tRNA (transfer RNA) genes and 2 rRNA (ribosomal RNA) genes, together with a D-loop (Boore, 1999; Kilpert et al., 2006; Gissi et al., 2008) (Figure 1) (GenBank accession no. _____). There is no substantial deviation from the general organization of the mitochondrial genome.

Twelve of the thirteen protein-coding genes are encoded on the heavy strand, as is typical of vertebrate mitochondrial genomes; only ND6 is located on the light strand (Table 1). Two overlapping regions were found, one between ATPase8 and ATPase6 and one between CO1 and tRNA-Ser. The ATPase8-ATPase6 overlap was the larger of the two, at 9 nucleotides.

Overall Base composition

AT skew, GC skew and A+T content are generally used to investigate the nucleotide compositional behavior of mitochondrial genomes (Hassanin et al., 2005; Wei et al., 2010). The nucleotide composition was calculated with MEGA 6. The overall base composition of the C. chitala mitogenome is 32.1% A, 27.7% C, 15.3% G and 24.9% T, with an A+T bias of 57% (Table 3).

The H strand composition was 30.7% A, 29.5% C, 13.8% G and 25.7% T, with an A+T bias of 56.4%, whereas the L strand showed 43.8% A, 29.9% C, 12.6% G and 13.7% T, with an A+T bias of 57.5%. The D-loop showed the highest A+T content, at 70.2%.

The rRNA genes, tRNA genes and protein-coding genes showed positive AT skew values, unlike the D-loop, which showed a negative value. The overall AT skew was 0.13 and the overall GC skew was -0.29, indicating a bias towards cytosine over guanine.

Protein coding genes

The total length of the 13 protein-coding genes is 11,098 bp, accounting for 67.7% of the complete genome. Their base composition was 31.4% A, 13.8% G, 29.5% C and 25.3% T, and G was the least frequent nucleotide at all three codon positions. The A+T content of the protein-coding genes was 56.7%. The genes ranged in size from 168 bp (ATPase8) to 1839 bp (ND5). No obvious deviation from the general organization of the mitochondrial genome was seen.

The total number of codons in the protein-coding genes of Chitala chitala was 3696, including stop codons. AAU (asparagine, N; 4.56%) was the most frequent codon, followed by CCU (proline, P; 4.2%), ACA (threonine, T; 3.35%), AUU (isoleucine, I; 3.27%) and AGC (serine, S; 3.24%). GUG (valine, V; 0.08%) and GCG (alanine, A; 0.14%) were the least frequent. Under the vertebrate mitochondrial genetic code, all amino acids were encoded by two or four different codons except leucine and serine, which were each encoded by six (Table 4).

Evolutionary rates of protein coding genes

In protein-coding genes, synonymous substitutions (Ks) generally occur more frequently than non-synonymous substitutions (Ka) (Anton et al., 2002). In this study, three genes (ND5, ND6 and Cytb) departed from this pattern, showing lower Ks than Ka values. Accordingly, the average Ka/Ks values of the thirteen protein-coding genes of Chitala chitala were <1 for all genes except these three. Values <1 indicate purifying selection. Ka/Ks ranged from 0.0 (ND4L) to 2.84 (ND5), and most genes showed values below 0.5, the exceptions being CO3, ND4, ND5, ND6 and Cytb.
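The interpretation of Ka/Ks can be written as a short rule. The sketch below takes already-estimated Ka and Ks values as input; it does not estimate them. The function name `selection_regime`, the label for ratios above 1 (the usual reading, positive selection) and the input values are assumptions for illustration only.

```python
def selection_regime(ka, ks, tolerance=1e-9):
    """Classify a gene by its Ka/Ks ratio.

    Returns (ratio, label); the ratio is None when Ks is zero, because
    the ratio is then undefined.
    """
    if ks == 0:
        return None, "undefined (Ks = 0)"
    ratio = ka / ks
    if abs(ratio - 1) <= tolerance:
        return ratio, "neutral"
    if ratio < 1:
        return ratio, "purifying selection"
    return ratio, "positive selection"


# Made-up Ka and Ks values, chosen only to exercise each branch
toy_low = selection_regime(0.02, 0.40)
toy_high = selection_regime(0.50, 0.25)
toy_undefined = selection_regime(0.10, 0.0)
```

A ratio of exactly 0, as reported for ND4L, falls in the first category: no non-synonymous change was detected relative to synonymous change.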

ND5 and ND6 showed the highest Ka/Ks values, suggesting that selection pressure did not depend on the strand on which a gene is located. Non-adaptive forces such as random drift and mutation pressure form the necessary basis for genome evolution. However, the stronger the functional constraint on a gene, the slower its rate of substitution, so functional constraint in effect limits the accumulation of mutations (Michael et al., 2006).

The conservation of mitochondrial genes was studied using the p-distance among fifteen Osteoglossiformes species: Chitala chitala, Chitala ornata, Chitala blanci, Chitala lopis, Notopterus notopterus (Thai), Notopterus notopterus (India), Papyrocranus congoensis, Xenomystus nigri, Brienomyrus niger, Gnathonemus petersii, Gymnarchus niloticus, Scleropages formosus, Osteoglossum bicirrhosum, Heterotis niloticus and Arapaima gigas.

Of the 13 protein-coding genes, CO1 (0.046) showed the lowest p-distance and ND6 (0.335) the highest at the 1st and 2nd codon positions. At the 3rd codon position, Cytb (0.04) had the lowest value and ATPase6 (0.472) the highest, whereas for the full sequence CO3 (0.168) had the lowest value and ATPase8 (0.264) the highest. The overall p-distance was higher at the 3rd codon position than at the 1st and 2nd positions or for the full sequence, which is in line with the observation that in fishes most differences in protein-coding genes occur at the third codon position (Zhuang et al., 2013).

Heat Map for codon usage bias

Codon usage bias is an important feature reflecting the evolutionary pattern of a genome and has been reported in various organisms (Sharp et al., 1988). Codon usage bias was compared among 30 species, of which 15 belong to the order Osteoglossiformes and form the ingroup, while the rest belong to the orders Hiodontiformes, Cypriniformes, Clupeiformes, Gonorynchiformes, Salmoniformes, Anguilliformes and Polypteriformes and form the outgroup. The genetic code plays a crucial role in all living cells. Codon usage bias is affected by nucleotide composition (Osawa et al., 1988), tRNA abundance (Ikemura et al., 1981), protein structure (Orešič et al., 1998), gene length (Moriyama et al., 1998), gene function (Chiapello et al., 1998), translation processes (Sharp et al., 1986), environmental temperature (Sau et al., 2009), hydrophobicity (Romero et al., 2000) and other factors.

Codon usage patterns of the protein-coding genes of the mitochondrial genomes were investigated by calculating RSCU values (Supplementary table). The RSCU is the observed frequency of a codon divided by the frequency expected if all synonymous codons for that amino acid were used equally. RSCU values close to 1 indicate that synonymous codons are used without apparent bias, whereas values greater or less than 1 indicate codons used more or less frequently than expected, respectively. Codons with RSCU values below 0.1 were classified as rare codons. In the heat map (Figure 2), darker shades of red represent higher RSCU values and green represents lower values.
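The RSCU definition above can be sketched in Python. The example assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and an in-frame coding sequence; it is an illustration of the calculation, not the MEGA 6 implementation, and the names `CODON_TABLE` and `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... and "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(coding_sequence):
    """Return {codon: RSCU} for an in-frame coding sequence."""
    seq = coding_sequence.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families, one per amino acid
    synonymous = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            synonymous[amino_acid].append(codon)

    values = {}
    for amino_acid, family in synonymous.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        for codon in family:
            # Observed count divided by the count expected under equal usage
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy sequence (not real data): four leucine codons, CTT twice, CTA and CTG once
toy_rscu = rscu("CTTCTTCTACTG")
```

In the toy sequence leucine has six synonymous codons and four observations, so each codon is expected 4/6 times; CTT, seen twice, gets an RSCU of 3.0, while the unused TTA gets 0.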

Non-coding region

The control region, or D-loop, was located between tRNA-Pro and tRNA-Phe. It was 716 bp in length and had an overall base composition rich in A and T (A+T = 70.2%): 34.2% A, 16.6% C, 13.1% G and 36.0% T.

rRNA genes and tRNA genes

The 12S and 16S rRNA genes of C. chitala are 956 bp and 1704 bp long, respectively. They are located between tRNA-Phe and tRNA-Leu and are separated by the tRNA-Val gene. The A+T content of the rRNA genes was 54.8%, with a composition of 34.8% A, 24.8% C, 20.3% G and 20.0% T.

The twenty-two tRNA genes of the mitochondrial genome total 1553 bp. They are scattered among the rRNA and protein-coding genes and range in size from 66 to 75 bp. The A+T content of the tRNA genes was 56.5%, with a composition of 31.7% A, 25.0% C, 18.5% G and 24.8% T.

Phylogenetic analysis

Phylogenetic relatedness was studied using the mitogenomes of 29 related fish species taken from NCBI, together with that of C. chitala, giving 30 species in total including outgroups. The phylogenetic tree includes 15 Osteoglossiformes species representing 5 families and 11 genera. The orders used as outgroups were Hiodontiformes (1), Gonorynchiformes (1), Cypriniformes (2), Salmoniformes (3), Clupeiformes (2), Anguilliformes (3) and Polypteriformes (3).

The NJ and ML methods produced phylogenetically similar results (Figures 3 and 4). Most nodes were supported by high bootstrap values. The tree was rooted with a species belonging to the order Polypteriformes. Chitala chitala and Chitala lopis were placed together in the most derived clade. C. chitala forms a monophyletic group with Chitala lopis, Chitala ornata, Chitala blanci, Notopterus notopterus (Thai) and Notopterus notopterus (India). This comparative analysis sheds light on the evolutionary history of C. chitala.
