Gene
To understand the gene, we must first understand the structure of DNA [1]. The double-helix structure of deoxyribonucleic acid (DNA) was first proposed by Watson and Crick in 1953[2]. DNA is a macromolecule that encodes the genetic instructions necessary for coordinating cellular life in all organisms, both prokaryotic and eukaryotic; the exception is some viruses, which use RNA as their genetic material[3-5]. DNA is composed of five elements: carbon, hydrogen, oxygen, nitrogen, and phosphorus, which are organized into highly ordered substructures called nucleotides[2]. A single DNA strand is a polymer of repeating nucleotides, each containing a five-carbon sugar residue (deoxyribose), an aromatic nitrogenous base called a nucleobase (or base), and a phosphate group. Alternating phosphate and sugar residues form the backbone of the DNA polymer, with a phosphodiester bond linking the fifth and third carbons of the deoxyribose moieties in adjacent nucleotides. DNA contains four different bases: adenine (A)[6], guanine (G)[7], cytosine (C)[8], and thymine (T)[9]. Each base is linked to the first carbon of its deoxyribose by a β-N-glycosidic bond. Each base on one strand pairs, through hydrogen bonds, with just one of the bases on the other strand: A with T and C with G. A gene is a segment of DNA that occupies a specific place on a chromosome. It is the basic unit of heredity. Genes act by directing the production of RNA, which in turn directs the synthesis of proteins that make up living matter and are the catalysts of all cellular processes. The proteins encoded by genetic DNA result in specific physical traits, such as the shape of a plant leaf, the coloration of an animal’s coat, or the texture of a person’s hair. Different forms of genes, called alleles, determine how these traits are expressed in a given individual.
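The base-pairing rule (A pairs with T, C pairs with G) means that the sequence of one strand fixes the sequence of the other. A minimal Python sketch of that idea follows; the function names are illustrative only and are not taken from any cited source.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The reversal in the second function reflects the fact that the two strands of the double helix run in opposite directions.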
Humans are thought to have approximately 35,000 genes, of which approximately 19,000 are protein-coding[10]; mice are also estimated to have about 35,000 genes, with the exact number depending on the analysis[11].
When searching for a candidate gene, it is important not to dismiss any possible option within a genomic region; all possible genes and other genetic components, such as regulatory elements, must be examined. Understanding the genetic elements within the genome region of interest ensures that no candidate gene will be missed. The first imperative step is to identify every possible genetic factor within this region. Although the genomes of humans and many animals have been officially sequenced, there are still gaps and errors in the sequence assemblies. It is essential to obtain not only accurate genome information, but also every possible candidate gene within the genome sequences. This seemingly simple step is crucial in the search for the candidate gene. With current bioinformatics, we can select candidate genes by hierarchically examining every nucleotide in the gene regulatory region[12]. First, all of the coding sequences of a chromosomal region of interest are identified. Second, introns and 5’ and 3’ sequences are determined. Third, nucleotide organization, gene order, and chromosomal structure are analyzed[2]. Importantly, gene regulatory elements in both non-coding and coding gene regions should also be carefully identified.
Gene Mutation
A gene mutation is a permanent alteration in the DNA sequence, such that the gene sequence differs from what is found in most people. A mutation can affect anything from a single nucleotide to a large segment of a chromosome that includes multiple genes [13, 14]. Gene mutations fall into two major categories: hereditary mutations and acquired mutations. Hereditary mutations are inherited from a parent and are present throughout a person’s life in all cells. These mutations are also called germline mutations because they are present in the parent’s germ cells. When an egg and a sperm cell unite, the resulting fertilized egg cell receives DNA from both parents. If this DNA has a mutation, the child that grows from the fertilized egg will have the mutation in each of his or her cells. Acquired mutations occur at some time during a person’s life and are present only in certain cells[15]. These changes can be caused by environmental factors such as radiation, or can occur if an error is made as DNA is copied during cell division. Acquired mutations in somatic cells cannot be passed to the next generation. Genetic changes that are described as new or de novo mutations can be either hereditary or acquired. In some cases, the mutation occurs in a person’s egg or sperm cell but is not present in any of the person’s other cells. In other cases, the mutation occurs in the fertilized egg shortly after the egg and sperm cells unite. As the fertilized egg divides, each resulting cell in the growing embryo will have the mutation[16]. De novo mutations may explain genetic disorders in which an affected child has a mutation in every cell in the body but the parents do not, and there is no family history of the disorder. Most disease-causing gene mutations are uncommon in the general population. However, other genetic changes occur more frequently.
Genetic alterations that occur in more than one percent of the population are called polymorphisms [17, 18]. They are common enough to be considered a normal variation in the DNA. Polymorphisms are responsible for many of the normal differences between people such as eye color, hair color, and blood type. Although many polymorphisms have no negative effects on a person’s health, some of these variations may influence the risk of developing certain disorders [18].
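The one-percent rule for calling a variant a polymorphism can be written as a simple frequency check. The sketch below is only an illustration; the function name and the carrier counts are hypothetical.

```python
def is_polymorphism(carriers: int, population: int, threshold: float = 0.01) -> bool:
    """Return True if a variant occurs in more than `threshold` of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return carriers / population > threshold


# Hypothetical counts: 30 carriers versus 5 carriers in 1,000 people.
print(is_polymorphism(30, 1000))  # True
print(is_polymorphism(5, 1000))   # False
```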
Genetic Disorders
Genes are the cornerstone of heredity; they pass from parents to their children and hold the DNA and instructions for producing proteins [19]. Proteins do most of the work in cells. They transfer molecules from one place to another, build structures, break down toxins, and perform many other maintenance tasks[20, 21]. Mutations sometimes occur in a gene or genes and this can affect the resulting protein. A mutation changes the make-up of a gene, and this can in turn change the protein so that it does not function correctly, is overexpressed, or is completely missing[22]. This may lead to a medical condition called a genetic disease. Genetic mutations can be inherited from one or both parents[23].
There are three types of genetic diseases:
- Single-gene disease: A disease caused by a mutation, deletion and/or insertion of nucleotides in one gene. The changes in DNA affect the product of the gene, usually a protein. The effect of the changes in the gene can be an altered or missing product. The characteristics of each disease are related to the specific gene that is affected, and the role its product plays in the body[24]. Tight skin disease is an example.
- A chromosomal abnormality: There is deletion or alteration of a chromosome (or part of a chromosome). Chromosomes are the structures that hold our genes. Down syndrome is a chromosomal disorder[25].
- Multifactorial diseases: These diseases are the result of changes in two or more genes, as well as lifestyle and environmental factors. The inheritance of these diseases is not clear-cut. They can be inherited or sporadic[26]. Spontaneous arthritis is an example.
Candidate Gene
A candidate gene is a gene that is believed to harbor alleles causing a genetic disorder, or contributing to a complex phenotype, based on an a priori understanding of that gene’s biochemical function or of the mutant phenotypes associated with that gene. For most chronic animal disease models, the causative mutation does not appear to be confined to well-known candidate genes. Understanding the genetic elements within the genome region of interest ensures that no candidate gene will be missed[27]. An imperative first step is to identify every possible genetic factor within this region. It is essential to obtain not only accurate genome information, but also every possible candidate gene within the genome sequences. Unfortunately, there are still gaps and errors in the sequence assembly. Candidate gene studies depend on markers chosen on the basis of an a priori hypothesis regarding the phenotypic effect of one selected gene or of multiple genes. The advantage of the candidate gene approach is that highly relevant genes can be prioritized and tested first. Researchers use various methods to detect and identify the genes that are most likely to be involved in a disease. One method is the pathway gene method, which involves several candidate genes that perform related functions. Examples are the pathways of genes associated with drug metabolism (pharmacokinetics) and drug response (pharmacodynamics)[28, 29]. The main advantage of the candidate pathway strategy is its ability to detect the aggregate effect of several genes on the phenotype, even when each individual gene has only a small influence on the downstream phenotype. However, its success largely depends on the hypothesis used to select the genes to be studied. A risk of this method is that an unexpected gene that plays an important role in pharmacokinetics or pharmacodynamics may not be studied [28, 29]. In general, experience with candidate gene studies across all traits has been disappointing.
Usually, a disease is related to many genes, and it is difficult to judge whether they are mutation genes or pathway genes [13, 30]. Candidate gene studies have failed to determine the genetic basis of common traits, which suggests that this method has some limitations. First, the selection of candidate genes may not be suitable. Second, the causative gene may lie upstream or downstream of the selected candidate in the signaling pathway. Third, the selected SNPs may provide incomplete coverage of all mutations in the gene under study. Fourth, most of the studies are underpowered and face issues of population stratification and of phenotypic and site heterogeneity. Lastly, candidate gene studies rely on previous hypotheses about disease mechanisms, which rules out the discovery of genetic variants in previously unknown pathways.
Gene Expression
Genes encode proteins, and proteins determine cell function. The thousands of genes expressed in a specific cell therefore determine what that cell can do[31]. In addition, every step in the flow of information from DNA to RNA to protein provides the cell with a potential control point for regulating its own functions by adjusting the amount and type of proteins it manufactures[32]. At any time, the amount of a particular protein in a cell reflects the balance between the biochemical pathways that synthesize and degrade that protein. On the synthesis side, protein production begins with transcription (DNA to RNA) and continues with translation (RNA to protein) [33-35]. Controlling these processes therefore determines which proteins are present in the cell and in what amounts [36]. The way in which a cell processes its RNA transcripts and newly made proteins also greatly influences protein levels. Depending on its function, a cell uses different genes to make proteins by copying the code of each gene into messenger RNA (mRNA) in a process called transcription [32, 35]. Transient transfection of specific genes into a cell line can be used to test the function of those genes. Transfection is the introduction of DNA into cells; in transient transfection, the introduced genes are expressed for only a short period of time[37].
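The flow of information from DNA to RNA to protein can be illustrated with a short Python sketch. It assumes that the input is the coding (sense) strand and uses only a five-codon excerpt of the standard genetic code; the example sequence and function names are hypothetical.

```python
# Excerpt of the standard genetic code: only the codons used in the example.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop codon
}


def transcribe(coding_strand: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGAAATAA")
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # MAWK
```

A complete implementation would need all 64 codons; the excerpt keeps the example short.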
Gene Expression Regulation
The amounts and types of mRNA molecules in each cell type can reflect the function of this cell [34]. In fact, thousands of transcripts are produced every second in each cell. The initiation of transcription, the main control point for gene expression, is usually at the very beginning of the protein production process. RNA transcription is an effective control point because many proteins can be made from a single mRNA molecule[34, 38, 39]. Transcript processing provides an additional level of regulation for eukaryotes, and the presence of a nucleus makes this possible. In prokaryotes, translation of transcripts begins before transcription is completed; ribosomes bind to mRNA as it is made [4, 5]. However, in eukaryotes, transcripts are modified in the nucleus before being exported to the cytoplasm for translation. Eukaryotic transcripts are more complex than prokaryotic transcripts. For example, a primary transcript synthesized by RNA polymerase contains a sequence that does not become part of the mature RNA[40]. These sequences are called introns[41]. The introns can be removed before the mature mRNA leaves the cell nucleus. The remaining regions of the transcript contain the protein coding region and are called exons. These regions are spliced together to produce the mature mRNA. Eukaryotic transcripts can be modified at the ends and this affects their stability and translation. Cells must also respond quickly to changing environmental conditions. In these situations, the regulatory control point may come well after transcription. In the case of degradation, cells can rapidly regulate their protein levels by enzymatic breakdown of RNA transcripts and existing protein molecules[41]. Both actions can reduce the amount of certain proteins. In general, this breakdown is associated with specific events in the cell. The eukaryotic cell cycle provides a good example of how protein breakdown correlates with cellular events. 
The cycle is divided into several phases, each controlled by different cyclins that act as its key regulators. Before a cell progresses from one stage of the cell cycle to the next, the cell must degrade the cyclin that controls that stage of the cycle.
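The intron removal described in this section can be sketched as joining the exon segments of a primary transcript. In the Python example below, the transcript and the exon coordinates are hypothetical, and the coordinates are assumed to be known in advance; real splice sites are recognized by the cellular splicing machinery.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order; introns are dropped."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical primary transcript: exon 1, an intron (GU...AG), then exon 2.
pre_mrna = "AUGGCU" + "GUAAGUCCAG" + "UGGAAAUAA"
mature_mrna = splice(pre_mrna, [(0, 6), (16, 25)])
print(mature_mrna)  # AUGGCUUGGAAAUAA
```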
Promoter and Enhancer
The promoter is a DNA sequence that defines where RNA polymerase begins transcription[33, 35]. It is typically located directly upstream of, that is, on the 5′ side of, the transcription initiation site. RNA polymerase and the necessary transcription factors bind to the promoter sequence and initiate transcription. The promoter sequence defines the direction of transcription and indicates which DNA strand will be transcribed; the transcribed strand is called the template (antisense) strand, and the opposite strand, whose sequence matches the RNA, is called the sense strand. Many eukaryotic genes have a conserved promoter sequence, called the TATA box, 25 to 35 base pairs upstream of the transcription start site[42-44]. Transcription factors bind to the TATA box and initiate the formation of an RNA polymerase transcription complex that promotes transcription. Promoters can also have enhancers that act on them in a cis-regulatory fashion to increase or decrease transcription. Enhancers can be located up to 1,000,000 base pairs away from the promoter and are 50 to 1,500 base pairs in size. They interact with promoters through DNA looping[45]. Through their interactions with promoters, enhancers can regulate the spatial and temporal patterns of gene expression [46, 47].
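The position of the TATA box relative to the transcription start site suggests a simple search, sketched below in Python. The motif TATA[AT]A is a simplified stand-in for the TATA box consensus and is an assumption of this example, as are the function name, the test sequence, and the choice to measure distance from the first base of the motif.

```python
import re


def find_tata_boxes(sequence: str, tss_index: int, window: tuple[int, int] = (25, 35)) -> list[int]:
    """Return the distances upstream of the start site at which a TATA-like motif begins."""
    nearest, farthest = window
    hits = []
    # The lookahead lets overlapping motif occurrences be reported as well.
    for match in re.finditer(r"(?=TATA[AT]A)", sequence.upper()):
        distance = tss_index - match.start()
        if nearest <= distance <= farthest:
            hits.append(distance)
    return hits


# Hypothetical sequence: the motif starts at index 20 and transcription starts at index 50.
sequence = "G" * 20 + "TATAAA" + "C" * 24 + "ACGT"
print(find_tata_boxes(sequence, tss_index=50))  # [30]
```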
Microarray
Microarray technology is a powerful exploratory tool[32]. It has been widely used in scientific research and clinical support. It provides a gene expression profile by measuring the expression of thousands of genes simultaneously[48]. DNA microarrays are microscope slides printed with thousands of small spots at defined locations, each spot containing a known DNA sequence or gene. These slides are usually called gene chips or DNA chips. The DNA molecules attached to each slide serve as probes for detecting gene expression, that is, the transcriptome, the set of messenger RNA (mRNA) transcripts expressed from a set of genes[32, 48-50]. For microarray analysis, mRNA molecules are typically collected from an experimental sample and a reference sample. For example, the reference sample may be collected from a healthy individual and the experimental sample from a diseased individual, such as someone with cancer. The two mRNA samples are then converted to complementary DNA (cDNA), and each sample is labeled with a fluorescent dye of a different color. For example, the experimental cDNA can be labeled with a red fluorescent dye and the reference cDNA with a green fluorescent dye. The two samples are then mixed together and applied to the microarray slide. The process by which cDNA molecules bind to the DNA probes on the slide is called hybridization[32, 48]. After hybridization, the microarray is scanned to measure the expression of each gene printed on the slide. If the expression of a specific gene is higher in the experimental sample than in the reference sample, the corresponding spot on the microarray appears red. In contrast, if the expression is lower in the experimental sample than in the reference sample, the spot appears green[50]. Finally, if expression is the same in both samples, the spot appears yellow.
The data collected by the microarray can be used to create a gene expression profile that shows simultaneous changes in the expression of many genes in response to specific conditions or treatments. Microarray data show only the difference in expression, not the fold change; they therefore cannot be treated as fully accurate, because a difference that appears small may correspond to a large fold change[48, 50].
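When the raw intensities of the two fluorescent channels are available, one common way to summarize the red and green comparison for a spot is the log2 ratio of the two values. The Python sketch below illustrates the red, green, and yellow reading described above under that assumption; the intensities and the tolerance of 0.5 log2 units are hypothetical and are not taken from the cited sources.

```python
import math


def log2_ratio(red: float, green: float) -> float:
    """Log2 of experimental (red) over reference (green) channel intensity."""
    return math.log2(red / green)


def spot_color(red: float, green: float, tolerance: float = 0.5) -> str:
    """Classify a spot by comparing its log2 ratio with a tolerance band around zero."""
    ratio = log2_ratio(red, green)
    if ratio > tolerance:
        return "red"      # higher expression in the experimental sample
    if ratio < -tolerance:
        return "green"    # lower expression in the experimental sample
    return "yellow"       # similar expression in both samples


print(log2_ratio(800, 200))  # 2.0, a four-fold increase
print(spot_color(800, 200))  # red
print(spot_color(200, 800))  # green
print(spot_color(500, 500))  # yellow
```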
GeneNetwork
GeneNetwork is a set of connected data sets and tools for the study of complex networks of genes, molecules, and higher order gene functions and phenotypes. GeneNetwork combines sequence data (SNPs) with extensive transcriptome data (expression genetics, or eQTL, datasets)[51, 52]. The quantitative trait locus (QTL) mapping module is optimized for fast online analysis of traits that are jointly controlled by genetic variation and environmental factors [53]. GeneNetwork can be used to study humans, mice (BXD, AXB, LXS, etc.), rats (HXB), fruit flies, etc. Most of these population datasets are associated with extensive genetic maps (genotypes) and can be used to locate genetic modifiers that cause expression and phenotypic differences, including disease susceptibility. GeneNetwork offers several mapping methods, including interval mapping, simple interval mapping, composite interval mapping, and pair scans. Interval mapping is the statistical testing of the association of trait values with the genotypes of marker loci across the genome. A significant association is interpreted to mean the presence of a QTL linked to the marker that shows the association. Simple interval mapping assesses the association between the trait values and the expected genotype of a hypothetical QTL (the target QTL) at multiple analysis points between each pair of adjacent marker loci. The analysis point that produces the most significant association can be used as the location of the putative QTL[51-54]. A bootstrap procedure can be performed to estimate the confidence interval of the QTL location. Composite interval mapping is like simple interval mapping in that it evaluates the likelihood of a target QTL at multiple analysis points across each interval. However, at each point it also includes in the analysis the effect of one or more markers at other locations in the genome.
These markers, also referred to as background markers, have previously been shown to be associated with the trait, so each may be close to another QTL (a background QTL)[54]. Pair scans evaluate all pairs of markers in a two-locus model that includes the main effect of each locus and the interaction between them. This allows multiple-QTL models to be fitted for complex phenotypes. For all mapping methods, a permutation test can also be selected to determine the empirical significance threshold [51, 54].
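The idea of an empirical significance threshold can be sketched in Python as follows. This is a minimal illustration, not GeneNetwork's implementation: the association statistic (absolute difference in mean trait value between two genotype classes coded 0 and 1), the genotypes, and the trait values are all assumptions made for the example. Trait values are shuffled across strains many times, the strongest association across all markers is recorded for each shuffle, and a high percentile of those maxima is taken as the genome-wide threshold.

```python
import random
from statistics import mean


def association(genotypes: list[int], trait: list[float]) -> float:
    """Absolute difference in mean trait value between genotype classes 0 and 1."""
    group0 = [value for g, value in zip(genotypes, trait) if g == 0]
    group1 = [value for g, value in zip(genotypes, trait) if g == 1]
    return abs(mean(group1) - mean(group0))


def permutation_threshold(markers, trait, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from the maximum statistic of shuffled data."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between genotype and trait
        maxima.append(max(association(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return maxima[index]


# Hypothetical data: ten strains, two markers, one trait.
markers = [
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
]
trait = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1, 3.0]

threshold = permutation_threshold(markers, trait)
for number, marker in enumerate(markers, start=1):
    print(number, association(marker, trait) > threshold)
```

Taking the maximum over all markers in each shuffle is what makes the threshold account for testing many markers at once.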
Genetic Correlation
Genetic correlations are a measure of the extent to which two traits are correlated through the same biological substrates in genetically similar individuals. Genetic correlations can be used to analyze the relationship between gene expression and a trait across different strains of interest[55-57]. A genetic correlation reflects only genetic causes; it indicates, for example, how readily traits appear together in breeding, or how traits that are genetically related to a selected trait change when only that one trait is selected. Genetic correlations can be estimated with the same methods used to determine heritability[56-58]. There are multiple types of genetic correlation analysis. One type is the correlation matrix/PCA, in which a correlation matrix compares the values of up to 100 traits in a trait set. The correlation matrices can be exported, and new composite phenotypes can be generated from the principal components derived from the trait set. QTL heatmaps are another type and can be used to analyze up to 100 traits simultaneously. These traits can be sorted by their similarity (hierarchical clustering) or by their order in the genome[55, 56]. QTL heatmaps make it easy to identify the common and unique genetic determinants of large sets of phenotypes. Comparing correlates is another type of analysis; it finds shared genetic correlations in a set of traits by correlating them with all records from any database. Lastly, network diagrams examine the network of associations among large sets of phenotypes. Most graphics are interactive and allow users to define interesting sets of traits that can be temporarily stored for further analysis in GeneNetwork [51, 55-58].
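A correlation matrix of the kind described above can be sketched in Python as pairwise Pearson correlations between traits measured in the same strains. The trait names and values below are hypothetical, and the use of one mean value per strain is an assumption of this example rather than a description of GeneNetwork's internals.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists of values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(traits: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation of every trait with every other trait, keyed by the pair of names."""
    names = list(traits)
    return {(a, b): pearson(traits[a], traits[b]) for a in names for b in names}


# Hypothetical strain means for three traits measured in the same four strains.
traits = {
    "expression_gene_x": [1.0, 2.0, 3.0, 4.0],
    "body_weight": [2.0, 4.0, 6.0, 8.0],
    "latency": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(traits)
print(round(matrix[("expression_gene_x", "body_weight")], 3))  # 1.0
print(round(matrix[("expression_gene_x", "latency")], 3))      # -1.0
```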
Sequencing
Sequencing is an important and useful tool for detecting genetic mutations. Emerging and future sequencing technologies are making the digitization of new genomes ever more accessible to researchers[59, 60]. Sequencing is a method that determines the order of nucleotides in a DNA molecule. This method was developed by Frederick Sanger and colleagues in the mid-1970s (the dideoxy version described here was published in 1977), and Sanger later won the Nobel Prize for it [61]. In sequencing, the DNA to be sequenced serves as a template. DNA primers are designed to act as the starting point for synthesis of a DNA fragment, and that fragment is then sequenced[60]. Four separate DNA synthesis reactions are performed. All four reactions include the normal A, G, C, and T deoxynucleotide triphosphates (dNTPs), and each also contains a low level of one of the four dideoxynucleotide triphosphates (ddNTPs): ddATP, ddGTP, ddCTP, or ddTTP[62]. When a ddNTP is incorporated into the growing nucleotide chain, synthesis is terminated, because the ddNTP molecule lacks the 3′ hydroxyl group needed to form a linkage with the next nucleotide in the chain. Since ddNTPs are incorporated at random, synthesis terminates at many different positions in each reaction[63]. After synthesis, the products of the A, G, C, and T reactions are loaded into four separate lanes of a single gel and separated by gel electrophoresis, a method of separating DNA fragments according to their size. The bands are detected and the sequence is read from the bottom to the top of the gel, taking the bands in all four lanes together[63]. For example, if the lowest band across all four lanes appears in the A reaction lane, then the first nucleotide in the sequence is A. If the next band from bottom to top appears in the T lane, the second nucleotide in the sequence is T, and so on[59, 63]. Because of the use of dideoxynucleotides in the reactions, Sanger sequencing is also referred to as “dideoxy” sequencing.
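Reading a four-lane gel from bottom to top amounts to sorting all bands by fragment length and noting the lane in which each band appears. The Python sketch below assumes that each lane is given as a list of fragment lengths; the data and function name are hypothetical.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the synthesized sequence from fragment lengths observed in each ddNTP lane."""
    # The shortest fragments run farthest, so sorting by length reads the gel bottom to top.
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Hypothetical gel: fragment lengths (in nucleotides) seen in each of the four lanes.
lanes = {"A": [1, 5], "T": [2], "G": [3, 6], "C": [4]}
print(read_gel(lanes))  # ATGCAG
```

The sequence read this way is that of the newly synthesized strand, which is complementary to the template.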
Immune System
The immune system is the body’s defense against infectious organisms and other intruders that cause disease. Without an immune system, our bodies would be open to attack from bacteria, viruses, parasites, and more[64, 65]. The immune response is the series of steps the immune system takes against these intruders. The immune system is made up of a network of cells, tissues, proteins, and organs that work together to protect the body. It is one of the most complex systems in the human body. White blood cells, or leukocytes, are among the most important cells of the immune system[65, 66]. Leukocytes are produced or stored in many locations in the body, including the thymus, spleen, and bone marrow. They circulate between the organs and lymph nodes via the lymphatic vessels and blood vessels[64]. In this way, the immune system works in a coordinated manner to find and respond to the causes of problems. There are two basic types of leukocytes: phagocytes and lymphocytes. Phagocytes engulf and digest invading organisms[66]. Lymphocytes allow the body to remember and recognize previous invaders and help the body destroy them. There are two kinds of lymphocytes, B lymphocytes and T lymphocytes. Lymphocytes start out in the bone marrow, where some stay and mature into B cells, while others leave for the thymus gland and mature into T cells. B cells and T cells have separate functions. B cells are more like the body’s military intelligence system, seeking out their targets and sending defenses to lock onto them. T cells are more like the soldiers, destroying the invaders that the intelligence system has identified[65-67]. Macrophages are also an integral part of the immune system. These are large white blood cells that work to remove debris and invaders from the body. Macrophages come from monocytes, which are white blood cells made by stem cells in our bone marrow. The monocytes move through the bloodstream and mature into macrophages once they leave the blood. Macrophages patrol our body and engulf unwanted particles[68-70].
Gene Therapy
Gene therapy is a technique to treat or prevent disease. It involves the insertion of one or more corrective genes designed in the laboratory. Currently, many laboratories are working to modify the genetic material of a patient’s cells in order to treat a genetic disorder[71]. The introduced gene or genes can alter the DNA or the RNA transcripts used to synthesize proteins, and the resulting proteins, through their specific functions, can correct the disease. Gene therapy is still at an experimental stage, so its use is not yet universal[72, 73]. In the future, this technique may allow doctors to treat a disease by inserting a gene or genes into a patient instead of using chemical drugs or surgery. Several approaches to gene therapy have been tested[71, 73]: 1) replacing a mutated gene that causes disease with a healthy copy of the gene; 2) inactivating, or “knocking out,” a mutated gene that is functioning improperly; and 3) introducing a new gene into the body to help fight a disease. Although gene therapy is a promising treatment option for many diseases, such as inherited disorders, cancer, and viral infections, the technique is still risky and is still being studied to make sure that it will be safe and effective [73]. For example, gene therapy using adenoviral vectors is being tested. In this approach, a new gene is inserted into an adenoviral vector, which is then used to introduce the modified DNA into human cells. There are still some doubts about viral vectors[74].
Animal Model
An animal model usually refers to an animal with a disease or a phenotype that is similar to a human condition. Animal models are widely used in studying genetic mapping and transgenes. There are many reasons for using animal models in genetic studies. The basic reason for using animal models in genetic research is to obtain data that cannot be obtained from humans[75]. In a broad sense, we can apply many procedures that we will not be able to use on humans. An example is gene therapy; genetic material can be put into animals while it is not yet possible to test on humans. Genetic studies need heritable materials such as DNA and RNA. We can easily obtain tissues such as the liver, heart, and brain from animals to extract RNA while it is difficult to get them from humans. Several benefits to using the animal model include: a short generation time, a short life span, large population size, easy manipulation of environmental conditions, diseases or conditions can be created that are similar to the human’s, and manipulation of genetic materials is possible. However, there is also a disadvantage of using animal models. Depending on different situations, sometimes the differences between humans and animals are so large that the results from animal models are not applicable to humans. Selection of a species of animal models should not be based on availability, familiarity, or cost. The selection should be based on the biology, the suitability for the disease or trait, and the impact of the study on humans. There are many animal models that have been well developed. Mammals have been widely used because of their obvious similarities in both structures and function to that of humans[76].
The mouse is an excellent model system for understanding mammalian development, especially human development. Mice and humans are very similar in development and physiology, so direct comparisons between the two systems can be made. A great deal is known about the human and mouse genomes. In most regions of the genome, the order of genes, and therefore the linkage of genes, is conserved between mice and humans. These homologous (syntenic) regions have driven the advancement of the human and mouse genome projects, because information about one system can be directly related to the other. In addition, since the human and mouse genomes have largely been sequenced, direct sequence comparisons are now possible. Moreover, because the mouse genome has been extensively characterized and its whole-genome sequence completed, gene-targeted “knockouts” of every gene and transgenic over-expression are feasible in mice. We will use the mouse model to discuss the methodology and application of animal models in QTL mapping [76].