Understanding Next Generation Sequencing and Collaborative Research for Disease Mutations

OPTION 2: SEQUENCING

Computer scientists are engaged in the analysis of large datasets generated through Next Generation Sequencing (NGS) approaches. Much of this type of research will require extensive collaboration with biologists. This course was designed to prepare students for NGS-based collaborative research by teaching the vocabulary and a basic understanding of the tools used in molecular biology labs as well as covering the vital roles nucleic acids play in prokaryotic and eukaryotic cells. In addition, CM505 aimed to prepare students for a more advanced lab course focused on the design of NGS experiments and preparation of libraries (MIP/BZ565). This exam hopes to test how good a job we did of that!

One common application of NGS is to discover the mutations responsible for diseases, including cancer and inherited disorders. For example, one recent study determined that mutations in the DCHS1 gene can cause Mitral Valve Prolapse (a leaky heart valve) by sequencing a region of the genome in 4 patients and comparing the sequence to that from people who don’t have this condition, in order to identify the single nucleotide difference causing the disease. The questions below are connected to this study, but you will not need to read the paper in detail in order to answer them.

(1) How can mutations like this arise? 2pts

Mutations such as the one discussed in this study can arise in several ways:

– The sex cells of the patient’s parent may have been exposed to radiation, mutagenic toxins, infection or other factors that can damage DNA.

– The patient’s somatic cells may have been exposed to the aforementioned DNA-damaging circumstances, and cells carrying the damaged DNA may then have divided, passing the change on to their daughter cells in the patient’s body.

– The patient’s DNA polymerase may have made an error during replication that was not caught by the proofreading process, and this error then became part of the patient’s genome.

– The patient’s parent’s DNA polymerase may have made an error that was not caught by proofreading and also affected the germline.

(2) There were almost 5000 SNPs between the patient and the non-patient genomes. What criteria could be used to select the SNPs that are likely to cause disease over those that are just natural variation? 2pts

SNPs that do not change the amino acid encoded by a given codon are less likely to cause disease, so we can first check whether each SNP is silent (synonymous) or not.

The SNP discussed in the article is a missense mutation, meaning the mRNA transcript contains a codon that is translated as a different amino acid, so the resulting protein is altered. Practically, to determine which SNPs are already known to be pathogenic, we can search NCBI’s SNP database for the gene in question and filter for pathogenic variants.
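The silent-versus-missense check can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard genetic code and that the reading frame and the affected codon are already known; the function name classify_snp and the example codons are hypothetical and are not taken from the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change within one coding-strand codon.

    codon: reference codon such as "GAA"; position: 0, 1 or 2;
    alt_base: the base observed in place of the reference base.
    Returns "silent", "missense" or "nonsense".
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":  # "*" marks a stop codon
        return "nonsense"
    return "missense"


# Hypothetical example codons (not from the paper):
print(classify_snp("GAA", 2, "G"))  # Glu -> Glu
print(classify_snp("CGC", 1, "A"))  # Arg -> His
print(classify_snp("TGG", 2, "A"))  # Trp -> stop
```

A filter like this would only narrow the list of candidates; SNPs outside coding regions (for example in promoters or splice sites) would need separate criteria.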

(3) If mutations like this are detrimental, why has evolution not resulted in organisms that don’t generate mutations? 2pts

Mutations are often beyond the control of the organism’s internal processes. An organism can only prepare itself so well to deal with outside threats, such as radiation, infection and toxins. Even if an organism managed to somehow evolve a perfectly efficient polymerase, some mutations would still occur.

It is also important to mention that mutations are not always deleterious; some mutations increase an organism’s fitness. A species that has evolved in such a way as to stop mutation (which would not be possible in any case) might have written its own death sentence, since no new beneficial mutations could arise that might help face new obstacles the genome is not currently equipped to deal with.

(4) This disease affects only the heart, yet the DNA for gene sequencing was isolated from blood. Explain how/why this worked. 2pts

The DNA in heart cells and blood cells is the same, because every nucleated cell in the body carries a copy of the same genome. The genes are expressed differently in blood and in muscle tissue, since the two types of cell perform different functions. We would need mitral valve cells to obtain RNA that would inform our research, but since we are only interested in DNA, we can take any nucleated cells; in blood, these are the white blood cells, since mature red blood cells have no nucleus. A reasonably large number of cells can be collected in the form of blood; it is difficult to obtain a large, homogeneous sample of cells from a human in any other way without being invasive.

(5) In the figure below, which is from the paper, why are there two bases detected at the position where the mutation is in the patient? 2pts

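The read shows two bases at this location in the patient’s genome. This tells us that the patient has DNA both with the deleterious SNP and without it; in other words, the patient is heterozygous, with one copy of the gene carrying the SNP and the other copy carrying the normal base. This is what we would expect if the patient inherited the SNP from one parent and the normal allele from the other, and it suggests that a single mutated copy is enough to cause the condition. A mutation acquired after birth (for example, through a polymerase error) would typically be present in only a fraction of the patient’s cells, and would be less likely to give two such clear signals in DNA taken from blood.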

The less likely alternative here is that there was an error in replication during PCR, and that some copies were made incorrectly at exactly this SNP. We assume that the methods used preclude this interpretation of the results.
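The idea of reading two bases at one position can be sketched as a simple rule on the four peak heights of a Sanger trace. This is an illustrative sketch only: the 30% threshold, the function names, and the peak heights are hypothetical assumptions, not values from the paper or from any particular base-calling software.

```python
def called_bases(peak_heights, min_fraction=0.3):
    """Return the bases whose peak is at least min_fraction of the tallest peak.

    peak_heights maps each base (A, C, G, T) to its signal at one position.
    min_fraction is an arbitrary illustrative cutoff for ignoring background.
    """
    tallest = max(peak_heights.values())
    if tallest <= 0:
        return []
    return sorted(
        base for base, height in peak_heights.items()
        if height >= min_fraction * tallest
    )


def is_heterozygous(peak_heights, min_fraction=0.3):
    """Two called bases at one position suggest two different alleles."""
    return len(called_bases(peak_heights, min_fraction)) == 2


# Hypothetical peak heights at the SNP position:
patient = {"A": 480, "C": 12, "G": 510, "T": 8}
control = {"A": 5, "C": 10, "G": 900, "T": 7}
print(called_bases(patient), is_heterozygous(patient))
print(called_bases(control), is_heterozygous(control))
```

Two peaks of roughly equal height are what a heterozygous position would be expected to look like; a PCR artifact or a low-level mosaic would more plausibly show a much smaller secondary peak.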

(6) Your family has a history of mitral valve prolapse, so you wonder whether you might have this mutation. In the next few questions you will be guided through the process of testing for that possibility!

a. Design primers that will allow you to amplify the correct region of your genome for sequencing. Note that the actual position of the changed nucleotide appears to be incorrect in the paper – it lies at nt7935 in the mRNA sequence and at nt31712 in the DNA sequence. Paste a screen shot below showing the position of these relative to the mutation and other pertinent information like product size, Tm etc. 8pts

The location of the SNP in relation to the primer binding sites is shown below, with the SNP location marked with a vertical black line.

I will choose primer pair 4 because of the low self-complementarity (which decreases the likelihood of forming primer dimers), the similar melting temperatures, similar GC content, and relatively short product.

| Attributes | Forward Primer | Reverse Primer |
| Size |  |  |
| Tm (C) | 60.83 | 60.55 |
| GC% | 55.0 | 52.38 |
| Self comp | 4.0 | 1.0 |
| Self 3’ comp | 3.0 | 0.0 |
| Product length | 357 |

Noted below is the relevant portion of the DNA sequence, with primer binding sites marked in pink and the SNP location in blue.

b. In the paper they used the FlexiGene kit to isolate DNA from blood. It is up to you how you extract your blood for testing (you do not need to include this in your answer!), but list the steps involved in isolating your DNA from blood, compare the approach described with the method we used to isolate DNA from bacteria in class and explain any differences 5pts

Steps for DNA isolation:

1. Lyse the blood cells in a lysis buffer (Buffer FG1). Pellet the nuclei and proteins by centrifugation; discard the supernatant.

2. Add a protease-buffer mixture to denature the proteins. Mix by vortexing; centrifuge. Incubate at 65C to encourage the protease to digest the proteins.

3. Precipitate out the DNA by adding isopropanol, mixing by inversion, and centrifuging to pellet the DNA. Dry the pellet.

4. Add ethanol to wash the DNA, mix by vortexing, and centrifuge to pellet the DNA again. Dry the pellet.

5. Resuspend DNA in hydration buffer.

Differences between this protocol and the one used in class to extract bacterial DNA:

– Lysis is different in the two protocols. For blood, a lysis buffer and centrifugation are enough, because human cells are enclosed only by a membrane and have no cell wall. For the bacteria, we needed lysis buffer and 5 minutes of 80C incubation, since the gram-negative cell wall also had to be broken open.

– Protease is added to digest the proteins in this case. In class, we pelleted the proteins and discarded them. This part of the protocol may have been designed to avoid users discarding the wrong component. Only supernatants are discarded in the blood DNA extraction protocol.

– The bacteria were incubated at a higher temperature early in the protocol. The blood was incubated later. This is probably because the protease needs incubation to digest the proteins; in the case of the bacteria, we precipitated out the proteins and pelleted them, discarding the pellet.

– I’m not sure what step in the blood DNA protocol removes RNA, but it is likely that one of the proprietary buffers contains an RNase. In the bacterial protocol we treated with RNase immediately after lysis.

– Otherwise, the procedures are similar. DNA is precipitated using isopropanol and washed with ethanol. Rather than aspirating the ethanol or pouring off the isopropanol, the blood protocol asks us to invert the tube onto a paper towel or something similar, preventing the supernatant from collecting around or under the pellet.

c. As you have received excellent training in CM505 you know that you have to do quality control on your DNA before you proceed with your PCR reaction. Here are your nanodrop readings. Do some analysis and report the concentration of your DNA and any other information you can infer. 4pts

A260 11.3

A280 5.7

A230 5.3

A260/A280 = 1.98: This ratio is a little higher than the ideal 1.7-1.9 range. The DNA probably is not contaminated by proteins, since this contamination would give us a ratio of < 1.7. A ratio closer to 2.0 can indicate that some RNA was carried over with the DNA.

A260/A230 = 2.13: This is within the ideal 2.0-2.2 range. We can conclude that the DNA sample is largely free of contaminants that absorb at 230 nm, such as salts, phenol, or guanidine.

DNA concentration: We do not have information about the A320 reading, which would be used to adjust the A260 reading before calculating concentration; I will assume this reading was zero. We also don’t have information about whether we diluted the sample before putting it into the Nanodrop; I will assume there was no dilution. Given these assumptions, we multiply our A260 reading by 50 to obtain our concentration in ng/ul.

11.3 * 50 = 565 ng/ul
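The same analysis can be written as a short Python sketch. It assumes, as stated above, an A320 reading of zero and no dilution, and uses the conversion of 50 ng/ul per A260 unit for double-stranded DNA; the function name nanodrop_summary and its parameters are hypothetical.

```python
def nanodrop_summary(a260, a280, a230, a320=0.0, dilution_factor=1.0):
    """Return purity ratios and dsDNA concentration from absorbance readings.

    a320 and dilution_factor default to the assumptions made in the text
    (no background reading available, sample not diluted).
    """
    corrected_a260 = a260 - a320  # background-corrected absorbance
    return {
        "A260/A280": round(a260 / a280, 2),
        "A260/A230": round(a260 / a230, 2),
        # 1 A260 unit of double-stranded DNA corresponds to 50 ng/ul
        "ng_per_ul": round(corrected_a260 * 50 * dilution_factor, 1),
    }


print(nanodrop_summary(a260=11.3, a280=5.7, a230=5.3))
```

If the sample had been diluted before measurement, passing the dilution factor would scale the reported concentration accordingly.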

d. Describe the cycling conditions you would use for PCR of your DCHS1 gene fragment (i.e. your PCR program!) and draw the results you would like to see on a gel. 5pts

| Temperature (C) | Duration | Cycles | Purpose |
|  | 2 min |  | Initial denaturation |
|  | 20 sec |  | {Denaturation … |
| 61* | 20 sec |  | … Annealing |
|  | 1 min |  | … Extension} (x 30 cycles) |
|  | 5 min |  | Final extension |
|  | forever |  | Holding |

*Tm given by NCBI Primer Blast was ~61 for each primer. I used Promega’s Tm calculator on each primer, though, and this gave me much higher Tms, varying between 66 and 71 for various buffer-primer combinations. I decided to choose 5 degrees below the Tm given for the combination of primers and 5x Green or Colorless GoTaq Reaction Buffer, which is what we used in class.

To describe the cycle more precisely, I would need to know exactly what buffer and polymerase I was using, as well as the concentration of my primers.
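The spread between Tm estimates is easy to reproduce with two simple textbook formulas. The sketch below is only an illustration of why calculators disagree: neither formula is claimed to be the one used by NCBI Primer Blast or by Promega’s calculator, which also account for salt and primer concentration, and the primer sequence is a hypothetical 20-mer with 55% GC, not one of the primers chosen above.

```python
def base_counts(primer):
    """Return (number of A/T bases, number of G/C bases) in a primer."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return at, gc


def gc_percent(primer):
    """GC content as a percentage of primer length."""
    _, gc = base_counts(primer)
    return 100.0 * gc / len(primer)


def tm_wallace(primer):
    """Wallace rule: 2 C for each A/T plus 4 C for each G/C."""
    at, gc = base_counts(primer)
    return 2 * at + 4 * gc


def tm_basic_gc(primer):
    """Basic GC formula: Tm = 64.9 + 41 * (G + C - 16.4) / N."""
    _, gc = base_counts(primer)
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


# Hypothetical primer, for illustration only.
primer = "AGCTGGACCTGAAGCTCCAT"
print(f"GC%: {gc_percent(primer):.1f}")
print(f"Wallace Tm: {tm_wallace(primer)} C")
print(f"Basic GC Tm: {tm_basic_gc(primer):.1f} C")
print(f"Annealing at Tm - 5: {tm_wallace(primer) - 5} C (Wallace)")
```

Even these two simple estimates differ by several degrees for the same sequence, which is why the annealing temperature is best confirmed empirically for the buffer and polymerase actually used.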

Below is a diagram of what I would expect to see on a gel.

e. What else would you need to do to figure out whether you have the mutation? Give details of how you would do it! 3pts

After amplification and quality control of my PCR product, I would need to prepare a sample of the amplified DNA to send for Sanger sequencing. This would involve removing the primers, enzyme, and free nucleotides from my PCR product. Once I received the sequence, I would inspect it for the deleterious SNP.

(7) In a follow-up study, the researchers found that another family with a high incidence of Mitral Valve Prolapse has a mutation in the promoter of DCHS1. They hypothesize that this mutation may result in reduced expression of the gene. You have mitral valve interstitial cells in culture from a patient and from their normal sibling. List the steps you would need to go through to test this hypothesis using dPCR, including appropriate controls. 8pts

Summary of response

– Isolate RNA from healthy sibling’s tissue and from patient’s tissue

– QC isolated RNA using various techniques

– Set up no-RT controls for each sample

– Set up reference gene control

– Use reverse transcription to make cDNA from each sample

– Run dPCR on two RT samples, two non-RT controls, two reference gene controls, and two non-RT reference gene controls

– Analyse dPCR data

Isolating RNA

To test gene expression, we need to sample RNA rather than DNA. I would first take cells from each of the tissue cultures and extract RNA from them. This would be done using a kit such as Qiagen’s RNeasy (protocol available here). The main steps in the protocol for this kit are:

1. Homogenize tissue; lyse the cells in lysis buffer. Centrifuge to pellet the remaining tissue, and pipet the supernatant into a clean tube. Discard the pellet. (The lysis buffer contains guanidine thiocyanate, which will inactivate nucleases that could damage the product.)

2. Add ethanol to lysate; mix well using pipet. (The ethanol provides the conditions under which the RNA binds to the spin column membrane.)

3. Transfer all contents of tube to spin column. Centrifuge; discard contents of collection tube.

4. Add first membrane wash buffer; centrifuge; discard contents of collection tube. (This washes proteins and other unwanted cell components from the membrane. )

5. Add second membrane wash buffer. Wash membrane three times with this wash buffer, centrifuging after each addition of buffer and discarding contents of collection tube. (This buffer washes any remaining salts from the membrane)

6. Elute.

Next, I would clean both samples by removing DNA from the solution using DNase. This involves mixing the RNA with a buffer and DNase, incubation at 37C, phenol extraction (which removes the DNase), and precipitation of RNA using ethanol (which leaves the digested nucleotides behind in the supernatant).

Quality Control

For each sample, I would then use a Nanodrop to test the quality and concentration of my isolate, and check for contamination using the A260/A280 and A260/A230 ratios. If I had access to a Qubit, I would compare the RNA-specific Qubit reading to the Nanodrop reading to ensure that I had successfully removed gDNA. Following these measures, I would run my sample on a nuclease-free denaturing gel (containing formaldehyde to prevent RNA folding) to check that my RNA was intact and that DNA was not present. Correctly-extracted RNA would form distinct bands at ~5000nt and ~1900nt (from the 28S rRNA and 18S rRNA, respectively). Because nucleic acids migrate toward the positive electrode and larger molecules move more slowly, gDNA contamination would form a heavy band near the wells at the negative end of the gel, while degraded RNA would form a low-molecular-weight smear toward the positive end. Using these three measures, I would decide whether to re-extract the RNA or proceed with dPCR.

Reference Gene Control

I would choose a reference gene that is abundantly expressed in mitral valve tissue and is expected to be expressed at similar levels in both the patient and his/her sibling. I would use this gene to set up a reference gene control to ensure the validity of my dPCR results. For each of the two individuals (the patient and the sibling), I would have a reference gene sample and a no-RT reference gene control.

Reverse Transcription

To be completely certain I have no gDNA contamination in my samples, I would set up a control with no reverse transcription for both the mitral valve prolapse patient’s sample and the sibling’s sample, as well as for the reference gene. Running PCR on this control should give me no product; if I later observe product in this sample, I will know to redo my experiment, since either gDNA was present, or my primers were poorly chosen and dimerized.

In order to use the RNA in dPCR, I would need to perform reverse transcription on the RNA to obtain cDNA. I would set up this reaction using RNA template, appropriate primers, dNTPs, and a buffer. The RNA would first be heat-denatured to undo any folding, then cooled to allow primers to bind. Then a reverse transcriptase would be added, which would be given time to extend, turning the RNA into cDNA.

Digital PCR

First, I would use a droplet generator to partition each reaction mixture into thousands of aqueous droplets suspended in oil. The droplets would then be carefully pipetted into wells in a plate. The plate would be sealed and put into the dPCR machine.

Digital PCR is typically 2-step rather than 3-step, meaning the annealing and extension portions of the cycle are done in one step.

| Temperature (C) | Duration | Cycles | Purpose |
|  | 5 min |  | Initial denaturation and heat activation of enzyme |
|  | 30 sec |  | {Denaturation … |
|  | 1 min |  | … Annealing/Extension} |
|  | 5 min |  | Stopping extension |
|  | 5 min |  | Droplet sealing |
|  | forever |  | Holding |

After obtaining the dPCR readings for my two samples and their corresponding controls, I would first verify that the no-RT controls had no product in them. I would also check that the mitral valve prolapse patient’s sample had less product than the sibling’s sample, since this is what we would expect if the hypothesis is correct. If the outcomes were the same, the experiment would be repeated to ensure that no errors were made, pipet tips were changed, etc., and that this outcome was actually valid.

(8) Here are the results of your assay. How is DCHS1 affected by the mutation (you will need to do a calculation to be specific enough to get full points!) 3 pts

| Sample | DCHS1 mRNA copy number | Reference gene mRNA copy number |
| Patient | 699 | 6831 |
| Normal Sibling | 5398 | 7852 |

The expression of DCHS1 is drastically reduced by the mutation. The mutated promoter causes DCHS1 to be expressed in the patient at only 699/5398 = 12.9% of the normal sibling’s expression of the same gene.

If we normalize to the expression of the reference gene in each person, the patient’s gene of interest (GOI) expression is 699/6831 = 0.1023, and the sibling’s expression is 5398/7852 = 0.6875. Using this correction, DCHS1 is expressed in the patient at 0.1023/0.6875 = 14.9% of the normal sibling’s expression. The correction changes the result only slightly, since the reference gene expression was similar in both individuals, especially when compared to the difference in GOI expression.
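The calculation can be checked with a short Python sketch using the copy numbers from the table. The function name relative_expression is hypothetical. Carrying the ratios through without intermediate rounding gives roughly 14.9% for the normalized figure; rounding each ratio to three decimals first gives a slightly lower value.

```python
def relative_expression(goi_patient, ref_patient, goi_control, ref_control):
    """Return (raw, normalized) patient expression relative to the control.

    raw compares gene-of-interest copy numbers directly; normalized first
    divides each by that individual's reference gene copy number.
    """
    raw = goi_patient / goi_control
    normalized = (goi_patient / ref_patient) / (goi_control / ref_control)
    return raw, normalized


# Copy numbers from the results table.
raw, normalized = relative_expression(
    goi_patient=699, ref_patient=6831, goi_control=5398, ref_control=7852
)
print(f"raw: {raw:.1%}, normalized: {normalized:.1%}")
```

Either way, the patient’s DCHS1 expression comes out at well under a fifth of the sibling’s, which is consistent with the reduced-expression hypothesis.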

Total 46 points
