Introduction
Since the discovery of genes, scientists have suspected that gene regulation is more complex than it first appeared (Raf 1996, Davidson 2001). How the nucleus regulates gene expression has long been of great scientific interest (Cremer & Cremer, 2001). Chromosomes were found to be organized in discrete territories, and the genes found in these territories appear to influence nuclear functions such as transcription and splicing (Cremer & Cremer, 2001). Using in situ hybridization, scientists were able to describe how these chromosome territories are composed: distinct chromosome arm domains and chromosome band domains form the chromosomal structure (Dietzel et al., 1998). Using these structures, scientists can pinpoint the positions of active and inactive genes (Volpi et al., 2000). All these discoveries were made 30-40 years ago, but they remain relevant today.
Since then, Sun et al. (2000) discovered that smaller chromosomes are mostly found in the center of the nucleus, while larger chromosomes are found at the periphery. Croft et al. (1999) provided further evidence on chromosome position, based on the number of genes each chromosome contains. They compared chromosomes 18 and 19, which are roughly the same size. Chromosome 18 territories (gene-poor) were found at the periphery of the nucleus, while chromosome 19 territories (gene-rich) were found in the interior (Croft et al., 1999).
The next question was how this new structural information could be used to better understand diseases. One answer was Genome Wide Association Studies (GWAS), which rely on genotyping technologies. To gather information about different diseases, scientists examined single nucleotide polymorphisms (SNPs) using microarray techniques (Tak & Farnham, 2015). SNPs are single-base variations in the human genome. 59% of the SNPs were observed in all populations studied: European, Asian, African American and Yoruban (Gabriel et al., 2002). Dramatic differences were also observed, depending on the origin of the individual (Cavalli-Sforza et al., 1994). These SNPs are markers that can be linked to a disease rather than necessarily causing it. By February 2015, ~15,000 SNPs had been identified for different diseases using GWAS, and the number is increasing rapidly thanks to new sequencing technologies (Tak & Farnham, 2015). Most of the SNPs discovered lie in non-coding regions of the genome, which raised the question of how a variant in a non-coding region can influence the appearance of a disease. One possible answer is that such SNPs alter gene expression levels rather than the function of the protein (Tak & Farnham, 2015). Further studies revealed that SNPs are enriched in enhancers (Ernst et al., 2011). Some SNPs were at a more distal position, but results showed that 68 unique enhancers were correlated with SNPs (Yao et al., 2014). In this dissertation I will examine each of these questions in turn and address them using relevant examples.
I will first present the techniques used to understand how chromosomal organization works and how it is linked to gene regulation, and then show how understanding higher order chromosome organization can lead to treatment for a particular type of cancer.
Hi-C technique and next generation: CHi-C
To study chromosomal interactions, scientists developed a technique called Hi-C to identify all the interactions in the genome in a single experiment (Lieberman-Aiden et al., 2009). Hi-C examines the genome in three dimensions by combining proximity-based ligation with massively parallel sequencing, and so detects chromatin interactions in the nucleus (Belton et al., 2012). The properties and structure of chromatin can be studied in this way (Belton et al., 2012).
Using this technique, scientists were able to map chromosomes at a resolution of 1 Mb (Lieberman-Aiden et al., 2009). Chromosomes have a 3D structure, which enables them to compartmentalize the nucleus and to bring elements into spatial proximity (Cremer & Cremer, 2001). Chromatin organization above the level of the nucleosome is not yet fully understood, but knowing the structure of chromatin and how chromosomes fold is expected to help scientists connect gene activity with how the cell functions (Lieberman-Aiden et al., 2009). Other techniques for studying chromosomes are available, such as Chromosome Conformation Capture (3C) and 5C (chromosome conformation capture carbon copy, a higher-throughput extension of 3C) (Dostie et al., 2006). The limitation of these techniques is that they target specific loci rather than the whole genome (Dekker et al., 2002).
In early experiments, Mifsud et al. (2015) detected only a few significant long-range interactions between genes and promoters because the depth of the Hi-C sequencing was not high enough. Hi-C made it possible to identify chromatin interactions across an entire genome (Lieberman-Aiden et al., 2009), but the information gathered was therefore spread throughout the whole genome rather than concentrated on a specific locus. To overcome this limitation, scientists developed capture Hi-C (CHi-C), which focuses on specific loci by combining hybridization-based capture with the Hi-C technique (Mifsud et al., 2015, Jäger et al., 2015). CHi-C allows scientists to target any locus and create high-resolution maps of the chromosome (Mifsud et al., 2015). Results showed 67-fold and 45-fold increases in significant promoter interactions with CHi-C, indicating that the technique was better suited to this experiment (Mifsud et al., 2015). Many SNPs that are associated with diseases interact with promoters through long-range regulation (Jäger et al., 2015). This is the first piece of evidence suggesting that higher order chromosomal organization regulates gene activity, and these techniques are important because they allow scientists to study chromosomes in depth.
In a study by Mifsud et al. (2015), experiments were conducted to explain how different chromosome conformations regulate the entire genome. They applied CHi-C to 22,000 promoters in two human blood cell types, GM12878 (a lymphoblastoid cell line transformed with Epstein-Barr virus) and CD34+ cells, in order to study the interactions between promoters and distal loci (Mifsud et al., 2015). The experiment showed that the interaction density between fragments in cis (on the same chromosome) was higher than between fragments in trans (on different chromosomes), as also seen by Jäger et al. (2015). The highest density was seen between fragments that were only 20 kb apart. This study showed that the genome is separated into distinct, highly self-interacting domains known as topologically associated domains (TADs) (Mifsud et al., 2015). Fragments that are in close linear proximity form contacts through DNA looping.
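The distinction between cis and trans contacts, and the role of linear distance, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the contact records, the function names and the 20 kb threshold parameter are hypothetical and are not taken from the pipeline of Mifsud et al. (2015).

```python
from collections import Counter

# Hypothetical contact records: (chrom_a, pos_a, chrom_b, pos_b), positions in base pairs.
contacts = [
    ("chr1", 100_000, "chr1", 115_000),
    ("chr1", 100_000, "chr1", 900_000),
    ("chr1", 250_000, "chr2", 400_000),
    ("chr3", 50_000, "chr3", 60_000),
]

def classify_contact(chrom_a, pos_a, chrom_b, pos_b, near_bp=20_000):
    """Label a contact as 'trans', 'cis-near' (<= near_bp apart) or 'cis-far'."""
    if chrom_a != chrom_b:
        return "trans"  # the two fragments lie on different chromosomes
    return "cis-near" if abs(pos_a - pos_b) <= near_bp else "cis-far"

def summarize(contact_list, near_bp=20_000):
    """Count how many contacts fall into each category."""
    return Counter(classify_contact(*c, near_bp=near_bp) for c in contact_list)

summary = summarize(contacts)
print(summary)
```

In a real data set, the counts per category would be normalized by the number of possible fragment pairs before densities are compared.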
Promoter interactions and how they work to influence gene activity
Promoters are regions where protein binding occurs more easily (Riethoven, 2010). Promoters interact with enhancers in order to regulate gene activity. Enhancers are specific DNA motifs that recruit transcription factors, and they can be found on the same chromosome as the gene or on another chromosome. Transcription factors are responsible for upregulating the interactions between enhancers and core promoters (Riethoven, 2010). It is not yet clear how enhancers function, but it is known that several transcription factors need to bind to an enhancer in cis to activate it (Riethoven, 2010). Chromatin looping helps bring enhancers into proximity with the core promoter of a specific gene (Riethoven, 2010). Whether promoters also interact with other promoters was studied next. Mifsud et al. (2015) showed that 85% and 90% of the interactions between baited promoters and non-baited fragments occurred in cis. Both inactive and active genes showed promoter interactions, and active genes had longer-range interactions than inactive ones. Overall, the interactions were symmetrically positioned in relation to the promoter, as shown in figure 1. For active genes, however, there was an asymmetry in proximal interactions: the promoter interacted more with the gene body than with the sequence upstream of the gene (Mifsud et al., 2015). This was expected for active promoters. This finding also suggests that activating regulatory elements are more likely to be concentrated in intronic regions (Tan-Wong et al., 2012). Tan-Wong et al. (2012) explained that genomes are transcribed into messenger RNAs (mRNAs) and noncoding RNAs (ncRNAs). Bidirectional promoters usually drive this transcription, and in the gene loop conformation the promoter is associated with the terminator.
Tan-Wong et al. (2012) showed that gene looping enforces transcriptional directionality of bidirectional promoters. Using CHi-C, Mifsud et al. (2015) showed that about two thirds of gene regulation involves interaction of the gene with the nearest promoter. The remaining third of genes appeared to be activated by promoters that were not in close proximity to the gene sequence. From these findings Mifsud et al. (2015) concluded that (1) activation of a gene does not depend on the close vicinity of the promoter and (2) gene-promoter regulation can extend over distances greater than hundreds of kilobases. Gene regulation is therefore more complex than previously thought: it is not linear, and a gene need not be activated by the promoter that lies immediately upstream of it. Next, Mifsud et al. (2015) assessed whether binding of the protein CTCF, which has insulator activity, impedes long-range promoter interactions, but this did not appear to be the case. Insulators can block enhancers and hence gene activation. They can also disrupt the enhancer-promoter interaction, but only if the insulator is situated between the two (Riethoven, 2010). Their experiment revealed that long-range promoter interactions skipped at least one CTCF binding site to reach a distant partner, so insulator elements had no apparent effect on promoters that interact with distant genes. Another question was how promoters regulate gene activation by interacting with other transcriptionally active elements. Mifsud et al. (2015) found that the active promoter regions “were also enriched for DNase I-hypersensitive and FAIRE (formaldehyde-assisted isolation of regulatory elements) sites”. RNA polymerase II and insulator factors were also found in higher concentrations than in inactive regions (Thurman et al., 2012).
Enhancer-associated histone marks were enriched, as were marks of transcriptional elongation. Fragments that interacted with weak promoters showed high concentrations of histone marks responsible for repressing the gene, such as trimethylation of histone H3 at lysine 9 (H3K9me3). These studies reveal that gene regulation is tightly controlled by histone marks and that there is cell type specificity. In agreement with the work of Pauler et al. (2009), the team concluded that gene silencing is mediated by changes in histone marks, which could occur in a subnuclear region. Whether a gene is active or inactive also needed to be considered. Results show that rather than having universal promoters, there are different sets of associations. The active MYC promoter showed distinct patterns of long-range interaction, indicating that each active gene has its own set of long-range interactions, which could account for the large number of transcription processes.
FISH assay, a different way of looking at chromosomes
Another way to look at chromosomal structure is the fluorescent in situ hybridization (FISH) assay. Over the past 30-40 years, scientists have used FISH to visualize specific DNA or RNA sequences in single cells (Cremer & Cremer, 2001). Schoenfelder et al. (2010) studied DNA and nascent mRNA using FISH and found co-localization between FISH loci, but only in a small part of the cell population. This finding led scientists to believe that a particular part of the population may have enriched chromosomal interactions. Chromosomal interactions are important because the DNA forms proximal and distal loops that organize the architecture of the nucleus, hence regulating gene activity (Miele and Dekker, 2008). When cells undergo meiosis, their chromosomes are rearranged. Because of that, scientists can assume that each chromosome is unique in its arrangement (Orlova et al., 2012). Looping is not only responsible for controlling gene activity; it also brings distal genes close to each other, forming “multigene complexes” (Schoenfelder et al., 2010). The study by Li et al. (2012) revealed that in 95% of metazoans, multigene complexes are the main way of transcription. When they further investigated the GREB1 multigene complex, they discovered that the complex was activated by estrogen receptor alpha. Upon disruption of estrogen receptor activity, the activity of all four interacting genes was disrupted. Even though these multigene complexes are not present throughout the whole population, Li et al. (2012) suggest that they are of high importance for gene regulation. To study these multigene complexes, Papantonis et al. (2012) stimulated genes with Tumour Necrosis Factor alpha (TNFα) to induce formation of a multigene complex. Unfortunately, both FISH and 3C assays failed to reveal how gene looping regulates gene activity, so Fanucchi et al. (2013) devised a new single-cell strategy.
This strategy involves TALE nucleases (TALENs), which disrupt gene loop formation and hence also affect chromosomal contacts. As mentioned before, GREB1 is a multigene complex that is regulated by estrogen receptor alpha (ERα). Upon targeting ERα with small interfering RNA (siRNA), which leads to degradation of the mRNA, Fanucchi et al. (2013) discovered that GREB1 expression was disrupted. To look at how DNA looping regulates gene activity, a single-cell assay was developed. This single-cell assay, a method also used by Raj et al. (2008), is shown in figure 2. The results show that looping occurred but not in the right conformation (Fanucchi et al., 2013). This is another piece of evidence suggesting that looping needs to occur correctly for the gene to function.
Diseases related to promoters
Dysfunctional promoters and genes are responsible for causing different diseases. Scientists found that some SNPs identified by GWAS lay close to promoter-interacting fragments. These SNPs were related to haematological diseases, diabetes and autoimmune diseases (Mifsud et al., 2015). Studies revealed that both active and inactive promoters may be linked to disease-associated SNPs, influencing disease either positively or negatively. Further experiments on CD34+ cells revealed a cluster of SNPs that interacted with a nuclear receptor cofactor gene situated 380 kb away from the SNPs, rather than with the proximal anti-inflammatory USP25 gene (Mifsud et al., 2015). Studies also show that the BCL6 promoter interacted with a specific region located 1.2 Mb away. This BCL6 promoter had high concentrations of repressive histone marks that would lead to silencing of the gene. These findings could help regulate cancer genes by enabling scientists to activate transcription factors that repress cancerous cells (Mifsud et al., 2015).
Cancer, promoters and gene silencing
NUT midline carcinoma (NMC) is one of the most aggressive epithelial cancers, and it often occurs in children (Bauer et al., 2012). It is caused by the BRD4-NUT fusion oncoprotein, which blocks the differentiation of epithelial cells and leads to excessive growth of NMC cells (Alekseyenko et al., 2015). Cancers normally arise from deregulation that prevents cells from dying and causes them to proliferate indefinitely (Hanahan & Weinberg, 2011). Another cause of cancer is defects in chromatin. In NMC, the only cause identified so far is the fusion of oncogenes, which creates “megadomains”. Megadomains were first described by Alekseyenko et al. (2015), and their name reflects how enormous they are: 100 kb to 2 Mb. The oncogenes that are fused, NUT, BRD3 (bromodomain containing protein 3), BRD4 and NSD3, cause the aggressiveness of NMC. As mentioned before, these fusions form “megadomains” that lead to tumorigenesis. Megadomains seem to arise from already existing enhancers that broaden progressively, but they stop when they encounter topologically associating domain (TAD) boundaries.
To test whether their theory about megadomains was correct, Alekseyenko et al. (2015) knocked down BRD4-NUT. The results showed that the NMC cells would resume their cycle, proliferate and then differentiate, with their function restored. This means that the BRD4-NUT megadomains play an important role in blocking differentiation, a block that leads to cancer (French et al., 2008). This matters for this paper because it is evidence that higher order chromosome organization is responsible for gene regulation. If scientists know what causes NMC, they can concentrate on finding a treatment for it. Indeed, Filippakopoulos et al. (2010) found such a strategy using BET inhibitors (lysine mimetic molecules). When studied both in vitro and in vivo, BET inhibitors appear to restore the function of NMC cells, leading to their differentiation. Interestingly, the hyperacetylated nuclear foci disappeared from the nucleus (Filippakopoulos et al., 2010). BET inhibitors are now in trials, and numerous studies show that they could be effective against other cancers as well. Delmore (2011) and Asangani (2014) show that non-mutant BRD4 is essential for normal cell growth. Grayson (2014) further studied the function of BRD4: it appears to be linked with genes responsible for cell identity and with a critical oncogene, MYC. Scientists also discovered three long ncRNAs (lncRNAs) that map next to MYC and up-regulate it: PVT1, CCAT1 and CASC19 (Kim et al., 2014). PVT1 plays a post-transcriptional role in MYC regulation (Alekseyenko et al., 2015). This finding could lead to treatments that regulate the MYC gene. Another gene involved in regulating cancer is MED24. Its role is in transcription regulation during embryonic development (Ito et al., 2002).
Its postembryonic role is to co-activate gene expression (Gu et al., 1999). MED24 appears to act as an oncogene, since cells regained normal function after it was knocked down.
Lastly, the epithelial developmental gene that Alekseyenko et al. (2015) examined was TP63, which maps to a specific region in megadomains, meaning that this gene is expressed in the majority of NMC cells. TP63 is sensitive to JQ1 treatment. Results showed that knocking down TP63 slowed the cancer but, interestingly, did not induce cell differentiation. Scientists concluded that TP63 is a megadomain-associated gene that is required for the viability of cancerous cells but plays no role in differentiation. This shows that BRD4-NUT megadomains act not only through the MYC gene but also through MED24 and p63. Together, these genes impede differentiation or sustain viability, allowing cancerous cells to grow indefinitely.
Megadomains, structure, definition and how they work
Megadomains seem to regulate NMC. The team of Alekseyenko et al. (2015) aimed to find out (1) which genomic loci are associated with the BRD4-NUT megadomains and (2) what effects the megadomains have on transcription. They found that in NMC, megadomains are “associated with the induction of BET inhibitor sensitive transcription of underlying DNA” (Alekseyenko et al., 2015). This BET inhibitor-sensitive dysregulation then leads to dysregulation of genes such as MYC, p63 and MED24, and of other potential targets. When megadomains were induced in naïve cells, they appeared at seed sites that were enhancers rather than superenhancers. This shows that megadomains grow from a seed that is an enhancer or regulatory region already present in the cell. Figure 3 shows the size difference between enhancers, superenhancers and megadomains, and how acetylated each of them is.
Furthermore, megadomains appear not to interact with the most active enhancers. They tend to spread along the chromatin until they reach the boundaries of topologically associated domains (TADs). TADs partition the chromosome into domains that are precisely delimited from one another (Dixon et al., 2012). TADs were identified using Hi-C and were defined as chromatin compartments whose positions are conserved across cells (Dixon et al., 2012). The conclusion that Alekseyenko et al. (2015) drew from looking at how TADs constrain megadomains is that the misregulation of chromatin occurs within the domain delimited by TAD boundaries.
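The idea that a megadomain spreads from a seed until it meets the flanking TAD boundaries can be sketched in Python. This is a deliberately simplified model that assumes the megadomain fills the whole TAD; the function name, the coordinates and the boundary list are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

def spread_to_tad(seed_pos, tad_boundaries):
    """Return (start, end) of the boundary interval that contains seed_pos.

    tad_boundaries must be sorted in ascending order (base pairs).
    Returns None when the seed lies outside the annotated boundaries.
    """
    i = bisect_right(tad_boundaries, seed_pos)
    if i == 0 or i == len(tad_boundaries):
        return None  # no flanking boundary on one side
    return tad_boundaries[i - 1], tad_boundaries[i]

# Hypothetical TAD boundaries along one chromosome, in base pairs.
boundaries = [0, 500_000, 1_800_000, 2_400_000]

# A seed enhancer at 750 kb spreads to the two boundaries that flank it.
domain = spread_to_tad(750_000, boundaries)
size_kb = (domain[1] - domain[0]) / 1000
print(domain, size_kb)
```

In this toy example the resulting domain spans 1,300 kb, which falls within the 100 kb to 2 Mb size range reported for megadomains.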
Megadomains can be classified into three categories based on their seed regions. Class 1 megadomains have only one seed enhancer; class 2 megadomains are more complex and have multiple seed enhancers; class 3 megadomains contain newly arising enhancer-like regions in addition to pre-existing enhancers. These three classes account for only 46.5% of the megadomains. The rest do not have a recognizable pattern or number of seeds. As mentioned before, most megadomains reach TAD boundaries and stop. The team of Alekseyenko et al. (2015) found that some megadomains have no defined boundaries because they can spread past TADs. This suggests that megadomains have a good balance of acetylation and deacetylation. This is a specific characteristic of BRD4-NUT that helps promote its oncogenic properties. Studies also showed that BRD4-NUT oncoproteins can recruit endogenous p300, which in turn activates histone acetyltransferase activity (Reynoird et al., 2010). Based on what was learned about p300, scientists proposed two models of how BRD4-NUT proteins cause cancer. In one model, BRD4-NUT impedes the expression of pro-differentiation genes by making p300 unavailable to them. In the other model, BRD4-NUT directs p300 only to genes that are responsible for proliferation (French et al., 2014).
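The three classes described above can be written as a small decision rule. The sketch below is one reading of the description in this section, not the published classification procedure of Alekseyenko et al. (2015); the function name and its two input counts are hypothetical.

```python
def megadomain_class(n_seed_enhancers, n_de_novo_regions=0):
    """Assign a megadomain to class 1, 2 or 3, or None if no pattern fits.

    n_seed_enhancers: number of pre-existing seed enhancers (assumed input).
    n_de_novo_regions: number of newly arising enhancer-like regions (assumed input).
    """
    if n_seed_enhancers >= 1 and n_de_novo_regions >= 1:
        return 3  # pre-existing enhancers plus newly arising enhancer-like regions
    if n_seed_enhancers == 1:
        return 1  # a single seed enhancer
    if n_seed_enhancers > 1:
        return 2  # multiple seed enhancers
    return None   # no recognizable seed pattern

examples = [(1, 0), (4, 0), (2, 3), (0, 0)]
labels = [megadomain_class(seeds, novel) for seeds, novel in examples]
print(labels)
```

Megadomains that return None here correspond to the remainder that fits none of the three classes.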
BRD4-NUT, chromatin and cancer treatment
Results show that BRD4-NUT associates with acetylated chromatin through the bromodomains of BRD4 (Grayson et al., 2014). These associations lead to the formation of 80-100 nuclear foci, easily seen in interphase and in metaphase during mitosis. Scientists now speculate that megadomains may help them identify loci responsible for defining heritable cell states (Alekseyenko et al., 2015). The large size of the foci suggests that protein interactions between the megadomain and chromatin occur in large aggregates (Alekseyenko et al., 2015, Reynoird et al., 2010). Further studies on naïve cells that lack BRD4-NUT oncoproteins revealed that expressing BRD4-NUT in these cells leads to de novo megadomain formation and increased transcription. As mentioned before, BRD4-NUT uses bromodomains to associate with acetylated chromatin and is responsible for blocking differentiation in NMC cells. To study whether BET inhibitors are an appropriate treatment for this type of cancer, scientists used the BET inhibitor JQ1 to induce differentiation in NMC cells (Filippakopoulos et al., 2010). Results showed that megadomain size decreased significantly within just 4 hours of JQ1 treatment (figure 4). The affected genes were those found within megadomains rather than those outside them, and down-regulation of expression was very pronounced: 85.8%. These results confirm yet again that BRD4-NUT targets histone acetylation and increases transcription. Another type of treatment appears to be bromodomain inhibition therapy, which can break the cycle of interactions between BRD4-NUT and acetylated lysines.
Conclusion
Knowing that higher order chromosome organization is responsible for gene regulation, scientists are now able to understand the types of interactions that occur in the genome. By understanding these interactions and knowing exactly which promoter or enhancer activates a specific gene, they can gain new insights into specific diseases. The data gathered by scientists all over the world can produce the largest and most extensive chromosomal map. Using this map, researchers can look at a specific SNP linked to a disease and understand why that disease arises in the first place. Once the cause of the disease is known, scientists can go on to look for treatments. This paper presented the techniques used to study chromosomes and explained how chromosomal interactions, through promoters and enhancers, regulate gene activity. It then explained how diseases are related to promoters and discussed NUT midline carcinoma. It described how megadomains dysregulate cell function and how these structures influence cancer. Because megadomains are now starting to be understood, scientists have devised treatments and tested them in vivo and in vitro (BET inhibitors such as JQ1). The questions that remain unanswered are: will scientists be able to completely understand how genes are regulated? And will they be able to find treatments for every disease using SNPs and chromosomal organization?