Discuss the advantages of Next Generation Sequencing technologies over conventional chain terminator technology and discuss their current and future applications in crop improvement
Introduction
With the global population projected to increase from around 7 billion people currently to well over 9.5 billion people in 2050 (USCB, 2015), and consumer trends expected to create more demand for animal products (Long, Marshall-Colon and Zhu, 2015), primary crop production must rise substantially faster than current yield trends indicate in order to meet the predicted 85% increase in demand for primary foodstuffs (Ray et al., 2013). In addition, the required reductions in agricultural inputs such as fertilizers and herbicides that contribute to global warming (Stuart, Schewe and McDermott, 2014), together with more meteorologically varied growing seasons and a higher likelihood of extreme weather events (Wolf et al., 2015), present plant breeders and scientists with significant challenges in achieving the doubling of yields required to meet current predictions.
It is important, then, that the agricultural sector utilizes all the tools available in order to meet these challenges. One such set of tools is DNA sequencing technologies. Put simply, these technologies allow for the determination of the specific base order of a length of DNA, from a small fragment up to the assembly of an entire genome. Since the turn of the century, both conventional and next generation sequencing (NGS) technologies have been employed to sequence the genomes of multiple plant species, ranging from the model organism Arabidopsis thaliana (Arabidopsis genome, 2000) to commercially important crops such as rice (Goff, 2002) and, more recently, wheat (Appels et al., 2018).
This essay aims to examine how the conventional chain terminator DNA sequencing technology works, and the advantages that newer NGS technologies have over it. The current and future applications of these technologies with regard to crop improvement will also be discussed.
Sanger Sequencing
In the mid-1970s the first DNA sequencing methods were developed. Sanger and colleagues published a chain termination based technique (Sanger, Nicklen and Coulson, 1977), and Maxam and Gilbert published a fragmentation-based technique in the same year (Maxam and Gilbert, 1977). The chain termination technique (also referred to as Sanger sequencing) gained prominence as it required less handling of the toxic chemicals and radioactive isotopes that were necessary for Maxam and Gilbert sequencing.
Sanger sequencing is also known as dideoxy sequencing as it is based upon the use of both normal deoxynucleotides (dNTPs) and dideoxynucleotides (ddNTPs). These ddNTPs are identical in structure to normal dNTPs except that they lack a hydroxyl group at their 3’ carbon. Because this 3’ hydroxyl group is essential for the phosphodiester bond that forms between nucleotides during synthesis, the incorporation of a ddNTP terminates synthesis of the growing DNA strand, as further dNTPs can no longer be added.
In this method, DNA sequencing requires a single-stranded DNA template, a sequencing primer, radioactively labelled dNTPs, ddNTPs, and DNA polymerase (Metzker, 2005). The primer, designed so that its 3’ end lies next to the DNA segment to be sequenced, anneals to the template strands. Once the primer has annealed, the DNA-containing solution is divided into four tubes, to each of which reagents containing normal dNTPs, DNA polymerase and one specific ddNTP (ddATP, ddGTP, ddCTP or ddTTP) are added. The specific ddNTP in each tube is added at a much lower concentration than the normal precursors.
As the DNA is synthesized, the growing chain will incorporate dNTPs and occasionally a ddNTP. Once this ddNTP is added, chain synthesis terminates. Because the start point of DNA synthesis is the same for all newly synthesized strands, chain termination events downstream of it will occur at all the possible incorporation points for the specific ddNTP, resulting in the synthesis of DNA fragments of varied lengths. After the reaction has finished, the DNA products are denatured to separate the template strands from the newly synthesized strands. The contents of the different reaction tubes then undergo electrophoresis in separate lanes of a polyacrylamide gel, which separates strands into bands by size, with smaller fragments migrating further through the gel than larger ones. The gel is then exposed to X-ray film (autoradiography), and the sequence of nucleotides is determined by reading the banding pattern across the four lanes.
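The logic of reading a four-lane gel can be illustrated with a short Python sketch. It is a deliberate simplification rather than a model of the chemistry: instead of simulating random ddNTP incorporation, it simply lists every possible termination point in each tube, and the function names and the example template are assumptions made for illustration only.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template: str) -> Dict[str, List[int]]:
    """Return, for each ddNTP tube, the lengths of all terminated fragments.

    The template is written 3'->5' so that the new strand grows left to
    right. Every position where the newly added base matches the tube's
    ddNTP is a possible termination point.
    """
    new_strand = [COMPLEMENT[base] for base in template]
    tubes: Dict[str, List[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)  # a fragment of this length ends in dd<base>
    return tubes


def read_gel(tubes: Dict[str, List[int]]) -> str:
    """Read the four lanes from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in tubes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


tubes = chain_termination_fragments("TACGGA")  # hypothetical template
sequence = read_gel(tubes)
print(tubes)     # fragment lengths found in each of the four lanes
print(sequence)  # sequence of the newly synthesized strand
```

Each fragment length appears in exactly one lane, so ordering the bands from shortest to longest recovers the newly synthesized strand one base at a time.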
Figure 1: An overview of the key steps involved in dye-terminator Sanger sequencing. When a ddNTP is added to the newly synthesized strand by the polymerase enzyme, synthesis ceases. The fragments produced can then be separated by size via capillary electrophoresis. A laser excites the fluorescent dyes attached to each ddNTP, with the colour of the light emitted corresponding to the type of ddNTP at that position. (Commons.wikimedia.org, 2012)
A later advancement to this technology, known as dye-terminator chemistry (Figure 1), greatly improved this method (Smith et al., 1986; Lee et al., 1997). The ddNTPs added are each tagged with a different fluorescent dye which, when excited, emits a unique wavelength of light. This meant that reactions no longer had to be separated into ddNTP-specific tubes and could instead all occur together, and it also removed the need to use radioisotopes. Additionally, this enabled more effective automation, as the products of the reaction could be run together and separated via capillary electrophoresis, where the negatively charged products migrate through a polymer-filled capillary and are separated by size. As the products travel through the capillary, a laser beam excites the fluorescent dyes and the resulting fluorescence is detected. Software can then be used to interpret the resulting signals and assemble the DNA sequence.
Various advancements in technology led to the release of the Applied Biosystems 3730xl automatic sequencing machine in 1998, which was able to sequence up to 2.9 Mb per day with a read length of up to 900 bases and was most notably used as the main technique for the completion of the human genome project (Collins, 2003). However, demands for reduced cost and a more efficient process ultimately led to the development of newer NGS technologies.
Next Generation Sequencing
The primary advantage of NGS technologies is the ability to produce large volumes of sequence data at little expense compared to the original Sanger sequencing technology. More specifically, the cost per base sequenced has fallen dramatically, as these technologies produce a very large number of individual sequence reads, allowing for very high sequence coverage of a target genome (Treangen and Salzberg, 2011). This makes them suitable for a variety of applications and has led to their large-scale uptake. However, there is no homogeneity between approaches, with different systems utilizing different technologies in order to set themselves apart. Given the variety of technologies on the market, this essay will address two of the main systems available, which differ significantly in their approach but share key underlying principles.
Roche 454/ pyrosequencer
This method of NGS relies upon the attachment of the template DNA to a bead support. Adaptors that contain universal priming sites are ligated onto the ends of the target DNA fragments. This process enables amplification to occur for all DNA fragments via universal primers. The ligated DNA fragments are captured onto agarose beads, with each bead binding to only one fragment. Emulsion oil and PCR reactants are added to the solution, followed by the initiation of PCR (this process is known as emulsion PCR). Once amplification has taken place, each bead will carry thousands of copies of its template sequence. The beads are subsequently deposited into individual wells of a PicoTiterPlate (PTP), with each PTP capable of holding millions of individual beads (Metzker, 2009).
The sequencing and subsequent reading of DNA occurs through a method known as pyrosequencing. Here, the pyrophosphate released when DNA polymerase adds a dNTP to the growing DNA strand is detected as bioluminescence. The light released is proportional to the number of dNTP bases incorporated by the polymerase enzyme (up to six nucleotides), which means the addition of the same base multiple times can be recorded. Small beads containing luciferase and sulphurylase, added to each well, catalyze this bioluminescence reaction (Figure 2). This method does not rely upon modified nucleotides such as ddNTPs to terminate the synthesis reaction; instead, dNTPs are washed over the well plate sequentially, with bases added when they are complementary to the template strand. As the different dNTPs are added in a cyclical fashion, the light intensity and order can be recorded as flowgrams, which indicate the DNA sequence of the newly synthesized strand.
Figure 2: The basic steps of the Roche 454 NGS approach. Individual ligated DNA strands are attached one per bead and clonally amplified via PCR until each bead carries thousands of copies of the same template. The beads are then loaded into separate wells of a PTP, where pyrosequencing takes place. The release of an inorganic pyrophosphate that occurs upon the incorporation of a nucleotide by DNA polymerase enables bioluminescence to occur (Adapted from Metzker, 2009).
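How a flowgram translates into a base sequence can be sketched in Python as follows. This is a minimal illustration that assumes idealized signals in which an intensity of roughly 1.0 corresponds to one incorporated base; the flow order, the intensity values and the function name are hypothetical and are not taken from any instrument's software.

```python
from typing import Sequence


def decode_flowgram(flow_order: str, intensities: Sequence[float]) -> str:
    """Convert per-flow light intensities into a base sequence.

    Each flow washes one dNTP over the well. The signal is rounded to the
    nearest whole number, which is taken as the number of identical bases
    incorporated in that flow (zero means no incorporation).
    """
    bases = []
    for flow_number, signal in enumerate(intensities):
        base = flow_order[flow_number % len(flow_order)]
        count = int(signal + 0.5)  # round to the nearest homopolymer length
        bases.append(base * count)
    return "".join(bases)


flow_order = "TACG"  # assumed cyclical order in which dNTPs are washed over the plate
intensities = [1.02, 0.05, 1.97, 0.0, 0.1, 1.1, 0.0, 2.9]  # hypothetical signals
read = decode_flowgram(flow_order, intensities)
print(read)
```

Because the run length of a homopolymer is inferred from the signal strength alone, small errors in intensity become more likely to produce the wrong count as the run gets longer, which is consistent with the limit of around six nucleotides noted above.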
Illumina/ Solexa Genome Analyzer
The Illumina method is similar to the Roche 454 NGS method in that it also requires a PCR stage to amplify its DNA templates; however, instead of an emulsion-based PCR method, solid phase amplification is used. Forward and reverse primers are covalently attached to a slide, with the ratio of primers to template determining the density of each amplified cluster. In this method, up to 200 million fixed and spatially separated template clusters can be created, to which universal primers can attach and begin the NGS reaction (Figure 3). This reaction occurs via a process known as cyclic reversible termination. Here, the DNA polymerase bound to each template can add only one new fluorescently modified nucleotide, which carries an attached terminator group. This group prevents further synthesis by the polymerase enzyme and so must be cleaved before further nucleotides can be added. Once the fluorescently tagged nucleotides have been incorporated and synthesis terminated, the slide is washed to remove any residual, unincorporated nucleotides. Imaging then takes place to capture the colour of the attached fluorophore, with each nucleotide type attached to a group emitting a different colour, before a cleavage step removes both the fluorescent molecule and the terminator that prevents further synthesis. A second wash step takes place and the cycle can then be repeated. Due to the fixed nature of each template cluster, the order of fluorescence recorded at each position corresponds to the base order complementary to the template strand of DNA at that position.
Figure 3: The process of Illumina/ Solexa NGS. Here solid phase amplification occurs (top right) in order to generate clusters of each template. These clusters are fixed to the slide, which means that when sequencing occurs, at any one position the sequence of fluorescence produced will correspond to the target strand amplified at that location. Instead of bioluminescence producing light (as occurs with Roche 454 above), lasers are used to excite the fluorophores, with the colour of the light released recorded (for an example, see the bottom left corner of the image). (Adapted from Metzker, 2009).
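The way in which fixed cluster positions allow a read to be built up one cycle at a time can be sketched in Python. This is an illustrative simplification that assumes four clean colour channels, one per base, and calls the brightest channel in each image; the cluster coordinates, intensity values and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]                     # (x, y) location of a cluster on the slide
Channels = Tuple[float, float, float, float]   # intensities in the A, C, G, T channels

BASES = ("A", "C", "G", "T")


def call_bases(cycles: List[Dict[Position, Channels]]) -> Dict[Position, str]:
    """Build one read per cluster from a series of per-cycle images.

    In every cycle exactly one terminated, labelled nucleotide is added to
    each cluster, so the brightest channel at a position gives that
    cycle's base for the read growing at that position.
    """
    reads: Dict[Position, str] = {}
    for image in cycles:
        for position, intensities in image.items():
            brightest = max(range(len(BASES)), key=lambda i: intensities[i])
            reads[position] = reads.get(position, "") + BASES[brightest]
    return reads


# Three hypothetical imaging cycles covering two clusters.
cycles = [
    {(0, 0): (0.9, 0.1, 0.0, 0.1), (5, 7): (0.0, 0.1, 0.8, 0.1)},
    {(0, 0): (0.1, 0.8, 0.1, 0.0), (5, 7): (0.1, 0.1, 0.9, 0.0)},
    {(0, 0): (0.0, 0.1, 0.1, 0.9), (5, 7): (0.7, 0.2, 0.0, 0.1)},
]
reads = call_bases(cycles)
print(reads)
```

Since one base is added per cycle, the read length equals the number of cycles, while the number of reads equals the number of clusters imaged, which is where the massively parallel nature of the approach comes from.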
NGS and Sanger sequencing
It is clear that NGS technologies and Sanger sequencing are fundamentally different in their methodologies. The Sanger/ chain terminator sequencing approach relies upon electrophoresis in order to separate sequenced fragments that vary in length due to the random incorporation of ddNTPs. NGS, by contrast, is based upon massively parallel sequencing, in which up to billions of separate short reads can be generated during every instrument run (Voelkerding, Dames and Durtschi, 2009). The table below illustrates some of the main differences between the two NGS technologies described and the Sanger based method, with the cost per million bases standing out as the main reason NGS technologies have become the preferred choice. This, combined with the ability to generate drastically larger volumes of data, has meant that genome sequencing has become far more efficient. However, the start-up costs of NGS technologies are high, with the machinery required often costing hundreds of thousands of dollars (Perkel and Fung, 2016), and the vast quantities of data produced require the upkeep and use of data storage platforms, which can often require specialist training (Wallace, 2016). Likewise, as NGS is based upon the sequencing of vast amounts of DNA, Sanger sequencing is often preferred when individual genes or only small regions are examined, owing to its high accuracy and long read length.
Platform | Template preparation | Chemistry | Read length (bases) | Reads per run | Run time (hours) | Cost per million bases (USD)
Roche 454 | Clonal emPCR | Pyrosequencing | 700 | 1 million | 24 | $10
Illumina HiSeq | Clonal bridge amplification | Reversible dye terminator | ~500 | ~2.5 billion | 1-11 | $0.1
Sanger | PCR/ plasmid cloning | Chain termination | ~1000 | N/A | 3 | $2400
Table 1: A summary of some of the key differences between two NGS technologies and Sanger sequencing. The high rate of progress within the field means that the figures obtained may vary depending upon the latest release of a particular system. (This data was adapted from Liu et al., 2012 and Besser et al., 2018).
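To put the cost column of Table 1 into perspective, the following Python sketch estimates the reagent cost of sequencing a genome on each platform. The costs per million bases are taken from Table 1, whereas the genome size of 400 Mb and the 30-fold coverage are hypothetical values chosen purely for illustration; instrument, labour and data storage costs are ignored.

```python
from typing import Dict

# Cost per million bases (USD), taken from Table 1.
COST_PER_MB_USD: Dict[str, float] = {
    "Roche 454": 10.0,
    "Illumina HiSeq": 0.1,
    "Sanger": 2400.0,
}


def sequencing_cost_usd(genome_size_mb: float, coverage: float, cost_per_mb: float) -> float:
    """Estimate the cost of sequencing a genome to a given depth of coverage.

    The number of megabases that must be read is the genome size multiplied
    by the coverage; this is then multiplied by the cost per megabase.
    """
    bases_needed_mb = genome_size_mb * coverage
    return bases_needed_mb * cost_per_mb


GENOME_SIZE_MB = 400.0  # hypothetical genome size, for illustration only
COVERAGE = 30.0         # hypothetical depth of coverage, for illustration only

costs = {
    platform: sequencing_cost_usd(GENOME_SIZE_MB, COVERAGE, cost_per_mb)
    for platform, cost_per_mb in COST_PER_MB_USD.items()
}
for platform, cost in costs.items():
    print(f"{platform}: ${cost:,.0f}")
```

Under these assumptions the estimate is around $1,200 for Illumina HiSeq, $120,000 for Roche 454 and $28.8 million for Sanger sequencing, which illustrates why deep sequencing of whole genomes only became routine with NGS.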
Applications of NGS in Crop Improvement
The ability to read large amounts of DNA accurately, quickly and cheaply has enormous potential in a whole range of scientific fields and disciplines. Indeed, NGS technologies have a plethora of current and potential uses within plant genetics. These include the ability to more easily sequence the complete genomes of a wide range of commercially important crops, transcriptome profiling, genomics-assisted breeding (GAB) and mining for genes of agricultural importance. Several of the applications of NGS for crop improvement are discussed below.
Molecular Marker Characterization
One of the great benefits of NGS technologies is the ability to sequence vast amounts of DNA, which has led to the ability to efficiently sequence the genomes of multiple individuals within a species. Once multiple copies of a sequence are available, they can be compared with one another. This has allowed the determination of subtle differences in sequence between individuals and, in particular, has allowed for the efficient discovery of single nucleotide polymorphisms (SNPs) that can act as markers for selection (Abdelkrim et al., 2009). An early example of this was the characterization of over 5000 maize SNPs in 2400 genes, identified through the comparison of multiple individuals within two inbred lines (Barbazuk et al., 2007). Because NGS technologies have made genome sequencing and transcriptome analysis highly accessible, this approach isn’t limited to the most commercially important crops such as maize. For instance, crops such as peppers (Nicolaï et al., 2012), tomato (Hamilton et al., 2012) and potato (Hamilton et al., 2011) have all had large numbers of SNP molecular markers generated through sequencing-by-synthesis comparisons of germplasm within their populations.
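The principle of SNP discovery by comparison can be sketched in Python as below. The sketch assumes that the sequences have already been aligned to one another and are of equal length, and it ignores sequencing errors, insertions and deletions, all of which real SNP-calling pipelines must handle; the line names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def find_snps(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return the positions (1-based) at which aligned sequences differ.

    Each result pairs a position with the base observed in every individual
    at that position.
    """
    lengths = {len(sequence) for sequence in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for index in range(lengths.pop()):
        alleles = {name: sequence[index] for name, sequence in aligned.items()}
        if len(set(alleles.values())) > 1:  # more than one base seen here
            snps.append((index + 1, alleles))
    return snps


# Hypothetical aligned sequences from three individuals.
aligned = {
    "line_A": "ATGCCGTA",
    "line_B": "ATGTCGTA",
    "line_C": "ATGCCGAA",
}
snps = find_snps(aligned)
for position, alleles in snps:
    print(position, alleles)
```

Each position reported in this way is a candidate marker: an individual's base at that position can later be used to infer which parental segment it has inherited.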
Marker Assisted Selection
The generation of large pools of molecular markers such as SNPs has multiple practical applications for plant breeders, and such markers are now employed for Marker Assisted Selection (MAS). Using molecular markers to track and screen for desired genes within a breeding population has allowed for greatly improved efficiency within plant breeding companies (Araus and Cairns, 2014). Traits that are difficult to select for phenotypically can now be selected for at a genetic level. Likewise, offspring from crosses can be tested at an early age to determine whether they carry genes of interest. This can improve efficiency and reduce costs for breeders, as germplasm that would eventually be discarded can be selected against earlier within a breeding program without having to wait for the phenotype to become apparent. MAS is particularly useful for complex traits such as disease resistance. For instance, it can be used for gene pyramiding, where more than one gene that confers resistance to a pathogen can be tracked. This allows for the accumulation of multiple partial resistance genes within elite cultivars, which ultimately leads to durable resistance (Feuillet, Langridge and Waugh, 2008) and enables the avoidance of the boom and bust cycles associated with the introduction of individual resistance genes (McDonald, 2009).
Genetic Mapping
The generation of a vast number of genetic markers has allowed for the creation of marker-rich genetic maps. These maps can be used to track complex traits and their co-segregation within a population (Rafalski, 2002). This approach also enables researchers to screen for and understand the genetic variation within a population by sequencing multiple individuals across a species range. This can lead to an understanding of the allelic variability available within a gene pool and can be used for more sophisticated selection of complex multi-gene traits such as yield.
Genetic mapping is particularly useful for bulked segregant analysis (BSA), where DNA from individuals that represent the extremes of a particular phenotype is pooled and screened together to speed up the identification of candidate genes by revealing the variation underlying the trait of interest (Michelmore, Paran and Kesseli, 1991). An example of this concept was a QTL-seq strategy that was able to identify partial blast resistance QTLs in rice (Takagi et al., 2013).
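The comparison of two bulks can be sketched in Python using the SNP-index, the proportion of reads at a site that carry the non-reference allele, and the difference in SNP-index between the bulks. This is a simplified illustration of the idea behind QTL-seq rather than the published pipeline: the read counts and the threshold of 0.5 are hypothetical, and real analyses average over sliding windows and use statistical confidence intervals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BulkCounts:
    """Read counts at one SNP site in the two phenotypic bulks."""
    position: int
    alt_high: int    # reads with the non-reference allele, high-phenotype bulk
    depth_high: int  # total reads at the site, high-phenotype bulk
    alt_low: int     # reads with the non-reference allele, low-phenotype bulk
    depth_low: int   # total reads at the site, low-phenotype bulk


def snp_index(alt_reads: int, depth: int) -> float:
    """Proportion of reads at a site that carry the non-reference allele."""
    if depth <= 0:
        raise ValueError("read depth must be positive")
    return alt_reads / depth


def delta_snp_index(site: BulkCounts) -> float:
    """SNP-index of the high bulk minus SNP-index of the low bulk."""
    return snp_index(site.alt_high, site.depth_high) - snp_index(site.alt_low, site.depth_low)


def candidate_sites(sites: List[BulkCounts], threshold: float = 0.5) -> List[int]:
    """Positions whose bulks differ by at least the (assumed) threshold."""
    return [s.position for s in sites if abs(delta_snp_index(s)) >= threshold]


# Hypothetical read counts at three SNP sites.
sites = [
    BulkCounts(position=1000, alt_high=10, depth_high=20, alt_low=11, depth_low=20),
    BulkCounts(position=2000, alt_high=19, depth_high=20, alt_low=2, depth_low=20),
    BulkCounts(position=3000, alt_high=9, depth_high=18, alt_low=10, depth_low=20),
]
candidates = candidate_sites(sites)
print(candidates)
```

Sites unlinked to the trait are expected to show similar allele proportions in both bulks, so a difference close to zero is uninformative, whereas a large difference points towards a region that co-segregates with the phenotype.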
Linkage drag
A common goal of plant breeding is the incorporation of a gene from a donor parent or from wild germplasm, for example a gene for disease resistance. However, the introgression of the desired allele will often also mean the introduction of other genes that lie genetically close to it. Whilst backcrossing allows for the introduction of the desired gene, there is often a reduction in the fitness of the elite cultivar as a result of linkage drag, where deleterious genes are introduced alongside the target one (Peng, Sun and Mumm, 2013). NGS technologies have become a useful resource for reducing the laborious backcrossing often required to remove genes acquired via linkage drag, as they can be used to identify the rare recombinant individuals in which the linkage has been broken. For instance, this technology has been used in rice, where individuals were selected that carried a desired allele for blast resistance but had broken the linkage drag associated with a gene that reduces grain quality (Fukuoka et al., 2009). This usage of NGS platforms has the capacity to dramatically speed up the development of elite cultivars, as it allows for much more efficient introgression of desired alleles into breeding populations by reducing the time usually required to remove linkage drag.
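Screening for such recombinants can be sketched in Python as a simple filter over marker genotypes. The sketch assumes that each plant has been scored as carrying either the donor or the elite allele at two markers, one tagging the target gene and one tagging the linked deleterious gene; the marker names, plant names and genotypes are hypothetical, and heterozygous genotypes are ignored for simplicity.

```python
from typing import Dict, List

Genotype = Dict[str, str]  # marker name -> "donor" or "elite"


def breaks_linkage_drag(genotype: Genotype, target_marker: str, drag_marker: str) -> bool:
    """True if a plant keeps the donor target allele but not the linked donor allele."""
    return genotype[target_marker] == "donor" and genotype[drag_marker] == "elite"


def select_recombinants(
    population: Dict[str, Genotype], target_marker: str, drag_marker: str
) -> List[str]:
    """Names of the plants in which the linkage has been broken."""
    return [
        plant
        for plant, genotype in population.items()
        if breaks_linkage_drag(genotype, target_marker, drag_marker)
    ]


# Hypothetical backcross population scored at two linked markers.
population = {
    "plant_1": {"M_resistance": "donor", "M_quality": "donor"},  # drag retained
    "plant_2": {"M_resistance": "donor", "M_quality": "elite"},  # desired recombinant
    "plant_3": {"M_resistance": "elite", "M_quality": "elite"},  # target gene lost
}
recombinants = select_recombinants(population, "M_resistance", "M_quality")
print(recombinants)
```

Because recombination between tightly linked genes is rare, the value of high-throughput genotyping lies in making it practical to screen populations large enough to contain plants such as the second one in this example.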
The Future
Newer, more advanced sequencing platforms are slowly coming onto the market that work in a fundamentally different way to the current NGS technologies. These so-called third and fourth generation technologies do not rely upon PCR (Check Hayden, 2009). The advent of these technologies is projected to further decrease the cost and increase the speed of DNA sequencing, leaving their potential applications for crop improvement limited only by scientific imagination.
Conclusion
In conclusion, genome sequencing technologies have revolutionised a myriad of different fields, ranging from medicine to crop improvement. Since their conception, these powerful technologies have dramatically increased our knowledge of how organisms behave and of the underlying genetic basis for phenotypic traits. We are now able to sequence even the most complex of genomes, such as that of wheat, which allows for the more efficient discovery of beneficial genetic variation, among numerous other uses. The advent of NGS approaches resulted in dramatic reductions in the costs of genome sequencing, which has subsequently led to wider uptake and therefore greater benefit from these technologies. The tools now available to plant breeders and researchers alike allow for the ever more efficient production of elite cultivars that are able to overcome challenges such as disease and drought and to deliver the yield increases required to feed a growing population.