Trichomoniasis | EssaySauce.com

Trichomoniasis is caused by the parasitic protozoan Trichomonas vaginalis and affects approximately 300 million people annually worldwide. The majority of cases occur in developing nations; however, each year there are 5-8 million new cases in North America alone.1-3 The Centers for Disease Control and Prevention recognizes trichomoniasis as a neglected parasitic infection, which affects approximately 4 million people in the United States.4 Trichomonal infection increases individual susceptibility to severe and sometimes fatal conditions, including pelvic inflammatory disease, cervical cancer, and HIV-1.5,6 Moreover, evidence suggests that colonization by the parasite can increase the risk of both prostatic hyperplasia and prostate cancer.7-9 Trichomoniasis presents differently in men and women, which makes it more difficult to diagnose and treat effectively before it is transmitted. In males, T. vaginalis infection has been linked to asymptomatic urethritis and prostatitis. In females, complications of T. vaginalis infection include preterm delivery, low birth weight, and increased mortality, as well as predisposition to HIV infection, AIDS, and cervical cancer. T. vaginalis has also been reported in the urinary tract, fallopian tubes, and pelvis, and can cause pneumonia, bronchitis, and oral lesions. Condoms reduce, but do not wholly prevent, transmission of T. vaginalis.
Trichomoniasis is typically treated with 5-nitroimidazole drugs such as metronidazole and tinidazole; metronidazole has been the primary treatment in the United States since the 1960s.5,10 These compounds are largely inert toward both host and parasite cells until they are activated: under the anaerobic conditions of the parasite's hydrogenosome, they are readily reduced by redox enzymes that cycle through repeated electron transfers. Reduction forms toxic nitro radical anions that target thymine and adenine residues in the pathogen's DNA, causing DNA damage and cell death.5,10 The 4-/5-nitroimidazole class shares a preference for reductive activation with other nitroheterocycles, which makes nitroimidazoles an attractive antibacterial and antiparasitic scaffold. Resistance to metronidazole, however, is increasing: an estimated 5% of clinical trichomoniasis cases are caused by resistant T. vaginalis strains.10 The need to combat these strains has created a demand for new therapeutic strategies with novel mechanisms of action.
T. vaginalis is incapable of de novo synthesis of purines11 and pyrimidines12 and therefore relies on salvage pathway enzymes to scavenge nucleosides from host cells to obtain the nucleobases it needs. The enzymes used in this pathway are nucleoside hydrolases (NHs), which catalyze the hydrolysis of host-derived nucleosides at the N-glycosidic bond, yielding ribose and a free nucleobase. The parasite then uses the free nucleobases for nucleic acid synthesis. NHs are specific for ribonucleosides but vary in their preference for the nucleobase.13 Identifying NH inhibitors that block this pathway could lead to antiparasitic drugs that are mechanistically distinct from current treatments.
Nucleoside hydrolases comprise a superfamily of structurally related metalloproteins with a unique β-sheet topology. Functionally, NHs are glycosidases that hydrolyze the N-glycosidic bond of β-ribonucleosides, forming the free nucleobase and ribose (Figure 1). All characterized members impose stringent specificity for the ribose moiety but vary in their preference for the nucleobase. Sequence alignments highlight a recurring N-terminal DXDXXXDD motif (X = any amino acid) as a hallmark of NH activity.13
Figure 1: Hydrolysis of uridine by uridine nucleoside ribohydrolase via cleavage of the N-glycosidic bond.
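Because the DXDXXXDD fingerprint is a simple pattern, candidate occurrences can be located in a protein sequence with a regular expression. The following is a minimal Python sketch, assuming a one-letter amino acid sequence as input; the function name `find_nh_motifs` and the short example fragment are hypothetical and are not taken from any NH discussed here.

```python
import re

# DXDXXXDD: Asp, any residue, Asp, any three residues, Asp, Asp.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
NH_MOTIF = re.compile(r"(?=(D.D...DD))")


def find_nh_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each DXDXXXDD hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in NH_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical N-terminal fragment, for illustration only.
    fragment = "MKKIILDCDPGHDDAIAL"
    print(find_nh_motifs(fragment))  # [(7, 'DCDPGHDD')]
```

A hit is only a candidate; in practice it would be weighed together with its N-terminal position and an alignment against characterized NHs.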
NHs are widely distributed in nature and have been found in bacteria14,15, yeast16, protozoa17,18, insects19 and mesozoa.20 Genes containing the characteristic NH fingerprint motif are also present in plants, amphibians and fish. Surprisingly, neither NH activity nor NH-encoding genes have ever been detected in mammals. The metabolic role of NHs is well established only for parasitic protozoa (Trypanosoma, Leishmania, Giardia and others), where NHs are key enzymes of the salvage pathway that scavenges purines from the environment.21,22 In this pathway, NHs hydrolyze the assimilated nucleosides, allowing the purine bases and ribose to be recycled. Parasitic protozoa depend on purine salvage for survival because, unlike most other living organisms, they lack a de novo biosynthetic pathway for purines. Given this divergence in purine metabolism between parasite and host, parasitic NHs have been studied extensively in recent years as potential targets for chemotherapeutic intervention.
A distinctive feature of NH evolution is the emergence of a variety of quaternary arrangements built on the same common fold. Crystal structures of several archetypal NHs show remarkable variation in quaternary structure while conserving secondary and tertiary structure, as well as function. The structure of the inosine-uridine-preferring NH (IU-NH) from the trypanosomatid parasite Crithidia fasciculata (CfNH) was solved by Schramm and co-workers, both as the free enzyme29 and in complex with the inhibitor para-aminophenyliminoribitol (pAPIR).30 The same group also solved the crystal structure of the IU-NH from Leishmania major (LmNH). The crystal structures of the free inosine-adenosine-guanosine-preferring NH (IAG-NH) from Trypanosoma vivax (TvNH) and of its complex with the inhibitor 3-deaza-adenosine were reported in 2001.31 Consistent with their behavior in size exclusion chromatography, both IU-NHs crystallized as similar homotetramers, whereas the IAG-NH is a homodimer in the crystal. The monomeric subunits of both subgroups are similar in architecture and topology, each consisting of a single globular domain. Remarkably, the subunits of the IU-NH tetramers and the IAG-NH dimer are arranged in different quaternary architectures involving different subunit–subunit interfaces. The α/β core of the NH monomer is composed of a characteristic eight-stranded mixed β-sheet, with seven parallel strands and one antiparallel strand, surrounded by several α-helices. The first six strands of this β-sheet form a structure that resembles a dinucleotide-binding (Rossmann) fold.29 The active site is located at the C-terminal end of this central sheet and is flanked by two flexible loops. When the transition-state analogue pAPIR binds to CfNH, these loops change conformation to position additional side chains in the active site, restricting solvent access.30
NHs contain one deep, narrow active site per subunit, with a Ca²⁺ ion tightly bound at its bottom.31 This eight-coordinate metal is chelated through a conserved network of interactions involving the side-chain oxygens of Asp10, Asp15 and Asp261, the main-chain carbonyl oxygen of Thr137 (TvNH numbering) and three water molecules. On substrate binding, the ribose moiety is fixed deep inside the active-site cleft, and two of the Ca²⁺-bound water molecules are replaced by the 2′- and 3′-hydroxyl groups of the sugar. The single remaining Ca²⁺-bound water molecule interacts with a conserved aspartate (Asp10). In the various structures, this water molecule is located 3.2–3.3 Å from C1′, on the backside of the scissile bond, poised to attack the anomeric carbon. These features indicate strict specificity for the ribose and variable promiscuity toward the attached nucleobase. Studies have further shown that NH-catalyzed hydrolysis of β-ribonucleosides proceeds via an SN1 mechanism through an oxocarbenium-ion-like transition state, which in turn constrains inhibitor design: effective inhibitors should mimic this transition state. Figure 2 illustrates a related proposal for uracil-DNA glycosylase (UDG), in which enforced distortion of the substrate aligns the participating orbitals and so accounts for the unusual bond geometries that promote glycosidic bond cleavage.
Figure 2: Structure-based reaction mechanism that resolves the apparent orthogonality paradox for electron transpositions by altering the substrate stereochemistry. (A) A simplified valence-bond representation of glycosidic bond dissociation hides the paradox that the three electron pairs to be transposed occupy orthogonal orbitals. (B) In the normal anti conformation of deoxyuridine, the σ*-orbital involved in the anomeric effect and the π-orbital of the C2=O bond are orthogonal to one another, preventing orbital overlap. (C) Severe distortions of the deoxyribose and the glycosidic bond in the strained conformation of deoxyuridine enforced by the UDG active center align the pairs of atomic orbitals participating in each electron transposition, thereby electronically coupling the anomeric and σ–π(arom) effects to promote bond cleavage.32
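Whether an experimental or modeled structure places the nucleophilic water in the position described above can be checked with two simple measurements: the O···C1′ distance (3.2–3.3 Å in the NH structures discussed) and the O···C1′–N angle, which should approach 180° if the water sits on the backside of the scissile bond. The Python sketch below assumes the coordinates have already been extracted from a structure file; the coordinates shown are invented purely to illustrate the arithmetic.

```python
import math

Vec = tuple[float, float, float]


def distance(a: Vec, b: Vec) -> float:
    """Euclidean distance between two atom positions (same units as input, e.g. Å)."""
    return math.dist(a, b)


def angle_deg(a: Vec, vertex: Vec, c: Vec) -> float:
    """Angle a-vertex-c in degrees."""
    u = [a[i] - vertex[i] for i in range(3)]
    v = [c[i] - vertex[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Hypothetical coordinates (Å), chosen only to illustrate the calculation.
water_o = (3.25, 0.0, 0.0)   # Ca2+-bound nucleophilic water oxygen
c1_prime = (0.0, 0.0, 0.0)   # anomeric carbon C1'
n_glyc = (-1.47, 0.0, 0.0)   # glycosidic nitrogen of the leaving nucleobase

print(f"O...C1' distance = {distance(water_o, c1_prime):.2f} A")
print(f"O...C1'-N angle  = {angle_deg(water_o, c1_prime, n_glyc):.1f} deg")
```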
The consensus amino acid sequence DXDXXXDD13 was used to probe the T. vaginalis genome23 and identified at least three NHs: adenosine/guanosine nucleoside hydrolase (AGNH, TVAG_213720),24 guanosine/adenosine/cytidine nucleoside hydrolase (GACNH, TVAG_305790),25 and uridine nucleoside hydrolase (UNH, TVAG_092730).26 Kinetic characterization with reducing-sugar assays was used to determine their substrate specificities from kcat/Km (catalytic efficiency) values.24,25 AGNH efficiently hydrolyzes adenosine and guanosine but has barely detectable activity toward cytidine or uridine. GACNH has broad activity toward guanosine, adenosine and cytidine but does not hydrolyze uridine. UNH is highly specific for uridine, with only marginal activity toward cytidine and no measurable activity toward the other nucleosides. AGNH and GACNH likely have larger active sites than UNH, enabling them to accommodate the larger purine rings. A fourth putative NH, TVAG_424130, was also identified but has not yet been characterized to confirm its function. These four nucleoside hydrolases share a high degree of sequence homology (T-Coffee score of 86),24 and their putative function was independently annotated in the T. vaginalis genome database.
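Ranking substrates by catalytic efficiency, as in the characterizations above, is simple arithmetic once kcat and Km are known. A minimal Python sketch follows; the kinetic constants are hypothetical placeholders chosen to mimic a uridine-preferring enzyme and are not the published values for AGNH, GACNH or UNH.

```python
import math


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Return kcat/Km in M^-1 s^-1; zero when no activity is detectable."""
    if kcat_per_s == 0.0 or math.isinf(km_molar):
        return 0.0
    return kcat_per_s / km_molar


def rank_substrates(kinetics: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort substrates by kcat/Km, most efficiently hydrolyzed first."""
    scored = [(name, catalytic_efficiency(kcat, km)) for name, (kcat, km) in kinetics.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical (kcat in s^-1, Km in M) pairs; illustration only, NOT measured UNH data.
example = {
    "uridine": (10.0, 50e-6),
    "cytidine": (0.5, 500e-6),
    "adenosine": (0.0, math.inf),  # no measurable activity
}

for name, eff in rank_substrates(example):
    print(f"{name:10s} kcat/Km = {eff:.3g} M^-1 s^-1")
```

The ratio of kcat/Km values for two substrates is the usual measure of specificity between them; with these placeholder numbers, uridine would be preferred over cytidine by a factor of 200.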
The Stockman Laboratory at Adelphi University studies these three enzymes to identify parasite-specific inhibitors of these life-essential salvage pathways. The Stockman group's approach to finding potential inhibitors is outlined in Figure 3: a library of small-molecule fragments is screened against enzymatic and metabolic activity, and the hits then inform the organic synthesis of larger, more complex compounds with high specificity and a low half-maximal inhibitory concentration (IC50). A major challenge in this otherwise effective approach is the lack of crystal structures for the enzymes in question, since designing effective competitive inhibitors is immensely difficult without a thorough, high-resolution understanding of the targeted active site. Traditional crystallographic methods have not yet produced a working or complete model of these enzymes. This study focuses specifically on uridine nucleoside ribohydrolase.
Figure 3: Overview of the library fragment screening technique used to identify potential inhibitors of UNH and to inform subsequent organic synthesis of an anti-trichomonal drug.
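The IC50 used to rank compounds in such a screen is read from a dose-response curve. The Python sketch below estimates it by linear interpolation in log(concentration) between the two tested concentrations that bracket 50% residual activity. This is a simplified stand-in for the nonlinear fit (e.g., a four-parameter logistic) usually used, not the Stockman group's procedure, and the data are hypothetical.

```python
import math


def ic50_by_interpolation(concs: list[float], percent_activity: list[float]) -> float:
    """Estimate IC50 by linear interpolation in log10(concentration).

    Assumes concentrations are sorted ascending and activity falls with dose.
    Returns math.nan if 50% residual activity is never bracketed by the data.
    """
    points = list(zip(concs, percent_activity))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 50.0 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 50.0) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return math.nan


# Hypothetical dose-response data: inhibitor concentration (µM) vs % enzyme activity.
concs = [1.0, 10.0, 100.0, 1000.0]
activity = [95.0, 80.0, 20.0, 5.0]
print(f"IC50 ~ {ic50_by_interpolation(concs, activity):.1f} uM")
```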
The gap between the number of proteins with known sequences and the number with experimentally characterized structures and functions keeps growing. One way to narrow this gap is to develop advanced computational approaches that model structure and function from sequence. Alongside traditional crystallographic methods, computational techniques provide powerful tools for structure prediction and for the kinetic evaluation of macromolecular systems such as enzymes and their substrates. Protein homology modeling uses evolutionarily related proteins of known structure (templates), identified by sequence and functional similarity, to build a model of the unknown target, which can then be refined using molecular dynamics. Homology models can be built at various degrees of resolution (Figure 4).
Figure 4: Schematic representation of possible sources of error in modeling and important applications of theoretical protein structure models: required model accuracy versus functional application.33
Knowing the expected accuracy of a protein structure model is the first priority for any structural biologist or chemist who wants to apply that model, whichever of the above application categories the work falls under. Quality estimation is therefore a key element of homology modeling.
Applications of protein models have been a topic of interest in computational biochemistry for decades. Figure 4 was taken from a workshop titled “Applications of Protein Models in Biomedical Research” held at UCSF in July 2008.33 The quality of a model determines its usefulness, and the importance of quality estimation in modeling has been emphasized in the literature.34-40 Two essential sources of information support the estimation of the accuracy of homology models.
The first source is the availability of structural knowledge, determined primarily by the evolutionary distance between the query protein and template proteins of known structure. It rests on the observation that the sequence identity of a pair of proteins correlates directly with the structural similarity of their common core.41,42 These determinants of model accuracy fall into three major categories: sequence identity between target and template, currency of the template selection, and variability among available templates. Sequence identity between target and template is commonly taken as a first indicator of the expected accuracy of a model, as confirmed by various studies.41-44 Three zones of similarity are defined: the midnight zone (zone A, red), the twilight zone (zone B, yellow) and the safe zone (zone C, green). Zone A: in models based on a target-template alignment with less than 30% sequence identity, substantial alignment errors and suboptimal template selection are frequently observed.42,45 Careful validation of these models is strongly advised. Zone B: in models based on a target-template alignment with 30% to 50% sequence identity, alignment errors in non-conserved segments of the target protein, structural variation among templates, and incorrect reconstruction of loops (insertions and deletions) are frequent sources of model inaccuracy.45,46 Careful validation of the model quality and of the variability among template structures is advised. Zone C: models based on a target-template alignment with more than 50% sequence identity typically have the correct fold, and the alignments tend to be largely correct. Structural variation among templates and incorrect reconstruction of loops (insertions and deletions) are the main sources of model inaccuracy.46,47 Validation of the model quality and analysis of the variability among template structures is advised.
These classifications are aggregated into Figure 5.
Figure 5: Schematic representation of the 3 zones of sequence/structure similarity.
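The zone assignment described above can be computed directly from a pairwise alignment. The Python sketch below assumes an existing gapped alignment of target and template (gaps written as '-') and counts identity over columns where neither sequence has a gap. This is only one of several conventions (others divide by the full alignment length or by the length of the shorter sequence), and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def similarity_zone(identity: float) -> str:
    """Map target-template identity onto zones A/B/C (30% and 50% cutoffs)."""
    if identity < 30.0:
        return "A (midnight zone): validate carefully"
    if identity <= 50.0:
        return "B (twilight zone): check non-conserved segments and loops"
    return "C (safe zone): check loops and template variability"


# Hypothetical aligned target/template pair.
pid = percent_identity("ACDEFGHIKL", "ACDEYGH-KL")
print(f"{pid:.1f}% identity -> zone {similarity_zone(pid)}")
```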
Currency of template selection refers to whether a model is based on the best template available at the time of model building. It should therefore always be checked whether a newer template with considerably higher sequence identity to the query protein has since become available in the PDB.
Variability among available templates is the third determinant of model accuracy. In homology modeling, several evolutionarily related proteins with experimentally determined structures are often detected for a given query protein. Depending on the protein family, these templates may be structurally quite similar or vary considerably. Usually, some regions in the core of the templates agree closely (the “structural core”), while other parts, mainly protein surface loops, are less similar (the “structurally variable regions”). The structural core, which also tends to be more conserved in sequence, serves as the basis for structural extrapolation. Parts of the model inherited directly from the template(s) are generally more accurate than the remaining regions, which must be predicted from scratch. Structural variations among templates can have several causes, such as differences in experimental conditions, the presence or absence of ligands or cofactors, and evolutionary divergence. The variations may be characteristic of the family and a sign of flexibility or disorder. Many proteins are largely disordered, and their function can only be explained by taking into account the absence of a well-defined three-dimensional structure48,49 (Figure 6).
Figure 6 – Adenylate kinases catalyze the interconversion of adenine nucleotides and undergo large conformational changes from the open form (PDB id 4AKE, grey) to the enzymatically active closed conformation in the presence of a ligand (1AKE, colored according to the local deviation from the open form). In homology modeling, template selection in this case would strongly affect the explanatory value of the resulting model and its applicability to subsequent experiments.
The second source of information comes from analysis of the model's geometry. Especially when sequence identity is low, individual models may deviate considerably from the expected average quality, owing to the various sources of modeling error (see Figure 4) and to inaccuracies introduced by the modeling programs. It is therefore necessary to check the geometric plausibility and the “energy” of the model independently, for which scoring (or energy) functions have been developed.
In summary, validating a model requires two analyses. The first concerns template viability: is the model based on the best available template?
– Check that the template selection is up to date (record a “verification date”).
– Sequence identity correlates with modeling difficulty; also check the resolution of the experimental template structure.
– Check the experimental conditions and environment of the template (e.g., solvated, with or without ligand).
The second is the analysis of variability among templates:
– Regions that do not differ between templates (i.e., the structural core) can be inherited directly and are therefore potentially modeled more accurately than structurally variable regions (e.g., surface loops).
– Where is the structural variability located?
– Are flexible loops part of the active site?
– Are there shifts or distortions in the core of the protein (e.g., among secondary structure elements)? These would indicate a difficult modeling case with lower expected model accuracy.
– Variation may be a sign of flexibility in the protein family, or of disordered regions (i.e., regions not resolved in many templates). This flexibility may be needed for protein function; disorder prediction tools may help in this situation.49
Errors in models tend to increase as sequence identity to the available templates decreases (see Figure 5), and inaccuracies introduced by the modeling programs increase at the same time, so the geometry (or “energy”) of each model must be checked independently.
Several methods and scoring functions have been described in the literature analyzing different aspects of proteins and investigating both the global quality of the entire model as well as local aspects.50,51
Tools analyzing the stereochemical plausibility of protein structures were first designed in the early 1990s.34,52 Deviations from ideal stereochemical values are reported by programs such as PROCHECK and WHAT_CHECK, which are still widely used, especially in experimental structure determination, but which can also help identify “suspicious geometries” in models. Another category of methods examines the compatibility of individual amino acids, or of the entire sequence (threading energies, etc.), with the structural environment described by the model.53,54
The most widely used methods for assessing protein models are scoring functions based on statistical potentials, or potentials of mean force (PMFs).55 Statistical potentials are usually formulated as distance-dependent non-bonded interaction potentials,51,56-58 but other structural features, such as torsion angles, contacts, residue burial and hydrogen bonds, are also used. Combining different geometric features in a composite scoring function has been shown to further improve the performance of these methods in identifying good models.59-64
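Most knowledge-based statistical potentials rest on the inverse Boltzmann relation E(r) = −kT ln[P_obs(r) / P_ref(r)], where P_obs is the frequency of a residue or atom pair at distance r in known structures and P_ref is the frequency expected under a reference state. The Python sketch below assumes simple binned pair counts and a hypothetical uniform reference; published potentials differ mainly in how the reference state is defined, and the counts here are invented. A model would then be scored by summing the energies of its observed pairs.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal mol^-1 K^-1 (molar gas constant R)


def inverse_boltzmann(observed, reference, temperature=300.0, pseudocount=1.0):
    """Per-bin energies E = -kT ln(P_obs / P_ref) from raw pair counts.

    observed:  counts of one pair type per distance bin in known structures
    reference: counts expected per bin under the chosen reference state
    A pseudocount keeps empty bins from producing log(0).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    kt = K_B * temperature
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        p_obs = (n_obs + pseudocount) / obs_total
        p_ref = (n_ref + pseudocount) / ref_total
        energies.append(-kt * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts for one residue-pair type in three distance bins.
observed_counts = [9, 49, 19]
reference_counts = [19, 19, 19]
for i, e in enumerate(inverse_boltzmann(observed_counts, reference_counts)):
    print(f"bin {i}: {e:+.3f} kcal/mol")
```

Bins that are over-populated relative to the reference receive negative (favorable) energies, and under-populated bins receive positive ones.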
The modeling server used for the structural development of uridine nucleoside ribohydrolase is the I-TASSER system. I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure and function prediction. It first identifies structural templates in the PDB with the multiple-threading approach LOMETS and then constructs full-length atomic models by iterative template-fragment assembly simulations. Functional insights into the target are then derived by threading the 3D models through the protein function database BioLiP.65-67
The I-TASSER Suite pipeline consists of four general steps: threading template identification, iterative structure assembly simulation, model selection and refinement, and structure-based function annotation. In the first step, the query is threaded by LOMETS through a non-redundant structure library (PDB) to identify structural templates. LOMETS is a meta-threading method containing eight fold-recognition programs (PPAS, Env-PPAS, wPPAS, dPPAS, dPPAS2, wdPPAS, MUSTER and wMUSTER). These programs are generally based on sequence profile-to-profile alignments, but with various structural features combined. Such variation is important for generating complementary alignments, which increase the coverage of template detections.
Following the query-to-template alignments, the sequence is divided into threading-aligned and threading-unaligned regions. The topology of full-length models is constructed by reassembling the continuously aligned fragments excised from templates, where the structure of unaligned regions is built from scratch by ab initio folding. The structure folding and reassembly are conducted by replica-exchange Monte Carlo simulations under the guidance of an optimized knowledge-based force field, consisting of three major components: (i) generic statistical potentials, (ii) hydrogen-bonding networks and (iii) threading-based restraints from LOMETS.
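What distinguishes replica-exchange Monte Carlo from ordinary Monte Carlo is the periodic swap of conformations between replicas run at different temperatures, accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/kT. The Python sketch below shows only this swap step under generic assumptions; it does not reproduce I-TASSER's force field, move set or temperature schedule, and the "conformations" and energies are placeholders.

```python
import math
import random
from typing import Callable, Sequence


def swap_probability(beta_i: float, energy_i: float, beta_j: float, energy_j: float) -> float:
    """Acceptance probability for exchanging conformations between replicas i and j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(
    betas: Sequence[float],
    states: Sequence,
    energy: Callable[[object], float],
    rng: random.Random,
) -> list:
    """One pass of swap attempts between adjacent temperatures (betas ordered cold to hot)."""
    current = list(states)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], energy(current[k]), betas[k + 1], energy(current[k + 1]))
        if rng.random() < p:
            current[k], current[k + 1] = current[k + 1], current[k]
    return current


if __name__ == "__main__":
    # Placeholder "conformations" labelled by name, with made-up energies.
    energies = {"a": 5.0, "b": 1.0, "c": 3.0}
    betas = [2.0, 1.0, 0.5]  # cold -> hot
    result = attempt_neighbor_swaps(betas, ["a", "b", "c"], energies.__getitem__, random.Random(0))
    print(result)  # ['b', 'c', 'a']: lower-energy conformations migrate to colder replicas
```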
The lowest free-energy conformations are identified by structure clustering. A second round of assembly simulation is conducted, starting from the centroid models, to remove steric clashes and refine global topology. Final atomic structure models are constructed from the low-energy conformations by a two-step atomic-level energy minimization approach. The correctness of the global model is assessed by the confidence score, which is based on the significance of threading alignments and the density of structure clustering; the residue-level local quality of the structural models and B factor of the target protein are evaluated by a newly developed method, ResQ, built on the variation of modeling simulations and the uncertainty of homologous alignments through support vector regression training.
For function annotation, the structure models with the highest confidence scores are matched against the BioLiP database of ligand-protein interactions to detect homologous function templates. Functional insights on ligand-binding site (LBS), Enzyme Commission (EC) and Gene Ontology (GO) are deduced from the functional templates.
The I-TASSER Suite pipeline was tested in recent community-wide structure and function prediction experiments, including CASP10 and CAMEO.68,69 Overall, I-TASSER generated the correct fold, with a template modeling score (TM-score) >0.5, for 10 of the 36 “New Fold” (NF) targets in CASP10, which have no homologous templates in the PDB. Of the 110 template-based modeling targets, 92 had a TM-score >0.5, and 89 had final models drawn closer to the native structure than their templates, with an average RMSD improvement of 1.05 Å in the same threading-aligned regions.70 In CAMEO, COACH generated ligand-binding-site (LBS) predictions for 4,271 targets with an average accuracy of 0.86, 20% higher than that of the second-best method in the experiment.
The primary sequence of UNH (Figure 7) was submitted to the I-TASSER server and its associated suite tools ten times, varying the parameters with which the server identifies homologs and builds models. The highest-scoring model was then analyzed with the suite tools to build a case for the accuracy and validity of the proposed UNH model. Each initial homology-modeling run takes several days on average to return raw data from the I-TASSER server for analysis.
Figure 7 – Primary structure of UNH, by single letter amino acid code.
After the structure-assembly simulation, I-TASSER uses the TM-align program to match the first I-TASSER model against all structures in the PDB library, and its output reports the 10 PDB proteins with the closest structural similarity (i.e., the highest TM-score) to the predicted model. Because of this structural similarity, these proteins often share the target's function. The C-score is a confidence score for estimating the quality of models predicted by I-TASSER. It is calculated from the significance of the threading template alignments and the convergence parameters of the structure-assembly simulations. The C-score typically lies in the range [-5, 2], with higher values signifying higher confidence in the model and vice versa. The TM-score is a recently proposed measure of the structural similarity between two structures,71 designed to overcome the sensitivity of RMSD to local errors. Because RMSD averages the distances over all residue pairs in two structures, a local error (e.g., a misoriented tail) yields a large RMSD even when the global topology is correct. The TM-score, by contrast, weights small distances more strongly than large ones, which makes it insensitive to local modeling errors. A TM-score >0.5 indicates a model of correct topology, and a TM-score <0.17 indicates random similarity; these cutoffs do not depend on protein length.
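The contrast between RMSD and TM-score can be shown numerically. The TM-score is defined as TM = (1/L) Σ 1/[1 + (d_i/d0)²], with d0 = 1.24 (L − 15)^(1/3) − 1.8 Å, where L is the target length and d_i is the distance between the i-th pair of aligned residues. The Python sketch below assumes the per-residue distances after a single superposition are already known (the TM-score and TM-align programs additionally search for the superposition that maximizes the score), and the distances are hypothetical.

```python
import math


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (Å), floored at 0.5 Å for short chains."""
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition: (1/L) * sum of 1 / (1 + (d_i/d0)^2)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances: list[float]) -> float:
    """Root-mean-square deviation over the same residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue model: 95 residues within 1 Å, a 5-residue tail misplaced by 20 Å.
dists = [1.0] * 95 + [20.0] * 5
print(f"RMSD     = {rmsd(dists):.2f} A")
print(f"TM-score = {tm_score(dists, 100):.3f}")
```

The misplaced tail inflates the RMSD to several ångströms, while the TM-score stays well above 0.5, reflecting the correct global topology.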
TM-score (or RMSD) is a standard measure of structural similarity between two structures and is usually used to assess the accuracy of structure modeling when the native structure is known, whereas the C-score is a metric developed for I-TASSER to estimate confidence in the modeling. When the native structure is not known, the quality of the prediction must itself be predicted: how far is the predicted model from the native structure? To answer this question, the TM-score and RMSD of predicted models relative to their native structures were analyzed as a function of the C-score.
In a benchmark set of 500 non-homologous proteins, the C-score was highly correlated with TM-score and RMSD: the correlation coefficient between the C-score of the first model and its TM-score to the native structure is 0.91, and that between the C-score and RMSD to the native structure is 0.75. These data provide the basis for reliably predicting TM-score and RMSD from the C-score. I-TASSER reports the quality prediction (TM-score and RMSD) only for the first model, because the correlation between C-score and TM-score was found to be weak for lower-ranking models; the C-score is nevertheless listed for all models as a reference.
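Correlation coefficients like those quoted above are computed from paired values, here the C-score of each first model and its TM-score to the native structure; the sketch assumes the Pearson coefficient. A least-squares line fitted to such pairs is one simple way to turn a C-score into an estimated TM-score. The Python sketch below is a generic illustration with invented data, not I-TASSER's own calibration.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Slope and intercept of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (C-score, TM-score) pairs for illustration; NOT I-TASSER benchmark data.
c_scores = [-3.0, -2.0, -1.0, 0.0, 1.0]
tm_scores = [0.35, 0.42, 0.55, 0.66, 0.80]

r = pearson_r(c_scores, tm_scores)
slope, intercept = least_squares_line(c_scores, tm_scores)
print(f"r = {r:.3f}; estimated TM-score at C-score -0.5: {slope * -0.5 + intercept:.2f}")
```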
The I-TASSER sequence alignment yielded an array of PDB hits that, on inspection, are primarily non-mammalian enzymes that interact with or modify nucleotides, specifically at the N-glycosidic bond between the nucleobase and the ribose (Figure 8).
Figure 8 – I-TASSER sequence alignment, highlighting elucidated active site motif.
The most closely aligned hit in the PDB is 2MAS,72 a trypanosomal nucleoside hydrolase crystallized at 2.3 Å resolution with a bound calcium ion and the inhibitor 2-(4-aminophenyl)-5-hydroxymethylpyrrolidine-3,4-diol (i.e., pAPIR; Figure 9).
Figure 9 – 2-(4-amino-phenyl)-5-hydroxymethyl-pyrrolidine-3,4-diol
The percent identity between 2MAS and the T. vaginalis UNH primary sequence is ~30%. However, given the variability of quaternary structure among nucleoside hydrolases, the local comparison is more informative: in the sequence immediately surrounding the proposed active-site motif, identity is approximately 88%.
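The contrast between ~30% global identity and ~88% identity around the active-site motif comes down to the region over which identity is counted. The Python sketch below computes identity over an arbitrary window of alignment columns; the toy alignment is invented to show how a conserved motif flanked by divergent sequence gives a high local but modest global value, and the window boundaries are chosen by hand.

```python
def identity_in_window(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Percent identity over alignment columns [start, end), ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a[start:end].upper(), aligned_b[start:end].upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Toy alignment: a conserved DXDXXXDD-like core (columns 4-11) with divergent flanks.
target = "KKKKDIDPGVDDKKKK"
template = "RRRRDVDPGVDDRRRR"

print(identity_in_window(target, template, 0, len(target)))  # 43.75 (global)
print(identity_in_window(target, template, 4, 12))           # 87.5 (motif window)
```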
Next, the secondary structure of UNH was predicted. Figure 10 shows a section of the UNH sequence with its predicted secondary structure and the associated confidence scores. The active-site motif is highlighted, and the confidence scores in that region are particularly high.
Figure 10 – Predicted secondary structure and associated confidence score for UNH primary sequence
Furthermore, the solvent accessibility of residues in the top-rated model was predicted from its secondary, tertiary and quaternary structure.73 This analysis incorporates EDTSurf, an open-source program that constructs triangulated surfaces for macromolecules. EDTSurf can generate the three major macromolecular surfaces (the van der Waals surface, the solvent-accessible surface and the molecular, or solvent-excluded, surface) and identify internal cavities (Figure 11). It is important to verify that active-site residues are thoroughly buried within the enzyme scaffold, for three major reasons:
1) The active site must be protected from the surrounding environment, to prevent water molecules from entering it and/or to keep a fixed water molecule inside it (such molecules may be required for substrate fixation and/or catalysis).
2) Only interaction of the substrate with the active-site surface should trigger the correct conformational change, so that the substrate is accommodated correctly: fixed and oriented against the catalytic groups of the active site.
3) Correct accommodation of the substrate leads to rapid and correct chemical transformation (specific catalysis), because it lowers the activation energy of the reaction. The enzyme does not change the overall free energy change (ΔG) of the reaction (substrate fixation, chemical transformation and product release); it accelerates a reaction that is already exergonic and therefore thermodynamically favored.
Figure 11 – Prediction of exposure to solvent, highlighting active site motif
The residues of the active-site motif are predicted to be entirely buried within the enzyme, implying that the active site is effectively sequestered for catalysis and that the model shows reasonable and defensible secondary, tertiary and quaternary structure.
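A burial check of this kind reduces to thresholding per-residue relative solvent accessibility (RSA: a residue's accessible surface area divided by its maximum possible value). The Python sketch below assumes RSA values have already been computed from the model with a surface tool. The 0.25 cutoff is a common but arbitrary choice, and the residue numbers and values are hypothetical.

```python
def buried_positions(rsa: dict[int, float], cutoff: float = 0.25) -> set[int]:
    """Residue numbers whose relative solvent accessibility is below the cutoff."""
    return {pos for pos, value in rsa.items() if value < cutoff}


def exposed_motif_residues(rsa: dict[int, float], motif: list[int], cutoff: float = 0.25) -> list[int]:
    """Motif residues that are NOT buried (residues missing from rsa count as not buried).

    An empty list means the whole motif is predicted to be buried.
    """
    buried = buried_positions(rsa, cutoff)
    return [pos for pos in motif if pos not in buried]


# Hypothetical RSA values (fraction of maximum exposure) for a few residues.
rsa_values = {8: 0.04, 9: 0.31, 10: 0.02, 14: 0.10, 15: 0.00, 16: 0.07}
motif_positions = [8, 10, 14, 15]  # e.g. the Asp positions of a DXDXXXDD motif starting at 8
print(exposed_motif_residues(rsa_values, motif_positions))  # []
```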
The predicted burial of the active-site motif encouraged running this model through COACH ligand-binding prediction. COACH is a meta-server approach to protein-ligand binding-site prediction. Starting from the structure of a target protein, COACH generates complementary ligand-binding-site predictions with two comparative methods, TM-SITE and S-SITE, which recognize ligand-binding templates in the BioLiP database by substructure and binding-specific sequence-profile comparisons. These predictions are combined with results from other methods (including COFACTOR, FINDSITE and ConCavity) to generate the final ligand-binding-site predictions.74 Table 1 summarizes these ligand predictions. The C-score is the confidence score of a prediction and ranges from 0 to 1, with higher scores indicating more reliable predictions. Cluster size is the total number of templates in a cluster, and Lig Name is the name of the predicted binding ligand.
Table 1 – Aggregate ligand prediction calculations and associated confidence values.
From the perspective of modeling a nucleoside hydrolase, it is encouraging that even the unrefined structure yields a C-score of 0.74 for binding of nucleotide bases (DNB) and 0.46 for chelation of calcium ions, with negligible predicted binding of any other ligand. It is also encouraging that several of the predicted binding-site residues match the proposed active site motif identified in this enzyme and discussed earlier. The remaining predicted binding-site residues are also worth examining, to analyze where they lie in the three-dimensional model relative to the accepted motif and to one another. The full list of residues involved in DNB binding is aggregated in Table 2.
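Placing the extra predicted residues relative to the motif comes down to measuring distances in the model. The sketch below is a minimal illustration. It uses hypothetical C-alpha coordinates and residue labels, which in practice would be read from the model's PDB file, and reports each candidate's shortest distance to any motif residue using Python's standard `math.dist`.

```python
import math

# Hypothetical C-alpha coordinates (angstroms) keyed by residue label; in practice
# these would be parsed from the model's PDB file.
CA = {
    "D10": (12.1, 4.3, 7.8),
    "D15": (14.0, 6.1, 9.2),
    "N40": (15.2, 2.9, 6.5),
    "H82": (10.4, 8.8, 11.0),
    "F250": (30.5, -4.0, 2.2),
}


def distances_to_motif(candidates, motif, coords=CA):
    """For each candidate residue, the shortest C-alpha distance to any motif residue."""
    return {
        c: min(math.dist(coords[c], coords[m]) for m in motif)
        for c in candidates
    }


motif = ["D10", "D15", "N40"]
nearest = distances_to_motif(["H82", "F250"], motif)
for res, d in sorted(nearest.items(), key=lambda kv: kv[1]):
    print(f"{res}: {d:.1f} A from nearest motif residue")
```

Candidates that sit within a few angstroms of the motif are the ones most plausibly lining the same pocket. Distant ones may indicate a secondary site or a modeling artifact.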
Table 2 – Residues predicted to be involved in binding nucleotides in the active site of UNH
The final analytical step taken to support the veracity of the generated model is a gene ontology analysis. Gene Ontology (GO) is a bioinformatics initiative to unify the representation of genes, their gene products and their attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the project's data, and enable functional interpretation of experimental data using the GO, for example via enrichment analysis. MetaGO is an algorithm for predicting the GO terms of proteins. It consists of three pipelines that detect functional homologs through local and global structure alignment, sequence and sequence-profile comparison, and partners'-homology-based protein-protein interaction mapping. The final function predictions combine the three pipelines through logistic regression.75
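Logistic-regression combination of the three pipeline scores can be sketched as below. This is a minimal illustration of the technique, not MetaGO's implementation. The weights, bias and scores are hypothetical placeholders, because the fitted parameters are not given here.

```python
import math


def combine_pipelines(scores, weights, bias):
    """Logistic-regression combination of per-pipeline scores into one probability.

    p = 1 / (1 + exp(-(bias + sum(w_i * s_i))))
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias; MetaGO's fitted parameters are not reproduced here.
WEIGHTS = (2.5, 1.8, 1.2)   # structure, sequence/profile, protein-protein interaction
BIAS = -2.0

# Scores for one candidate GO term from the three pipelines (illustrative only).
print(round(combine_pipelines((0.9, 0.8, 0.4), WEIGHTS, BIAS), 3))
```

The sigmoid maps any weighted sum onto the interval 0 to 1, so the combined value can be read as a confidence. The learned weights decide how much each pipeline contributes.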
Figure 12 and Table 3 summarize the results of the gene ontology analysis.
Figure 12 – Gene ontology raw data. CscoreGO is a combined measure for evaluating global and local similarity between the query and template proteins. Its range is 0 to 1, and higher values indicate more confident predictions.
Table 3 – Gene ontology code translated from Open Biomedical Ontologies (OBO)
The gene ontology analysis predicts that an enzyme of this pedigree and design, even without refinement, functions in uridine catabolism and in pyrimidine ribonucleoside catabolism, each with a CscoreGO of 0.98. The remaining predictions appear to trace a logical evolutionary progression. This progression runs from salvage metabolism as a general mechanism for survival and reproduction to a highly efficient and specific catabolic process, in which nucleoside-specific enzymes target their substrates and cleave off the required nucleobases with exceptionally high affinity, allowing the parasite to prosper in a mammalian host.
The process of refinement is ongoing and elaborate. The structure is first analyzed atom by atom to make sure that reasonable interatomic distances are respected. It is then overlaid with the scaffold on which it was based, and the overlapping regions are compared. Any regions that do not overlap need to be modeled by hand ab initio. These are predominantly loops, which have the greatest degree of steric freedom. In terms of sequence composition, loops are the most variable parts of proteins and tend to undergo insertions, deletions and substitutions more frequently than secondary structure regions.76 Consequently, the accuracy of loop structure prediction by template-based methods is generally lower than that of other regions. Predicting the structure of protein loops is very challenging, mainly because loops are not necessarily subject to strong evolutionary pressure. This implies that, unlike for the rest of the protein, standard homology modeling techniques are not very effective at modeling their structure. However, loops are often involved in protein function, so inferring their structure is important for predicting protein structure as well as function.
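The two checks described above, interatomic distances and model-versus-template overlap, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The 2.2 Å clash cutoff and 2.0 Å deviation cutoff are illustrative values, not parameters from this study. The model and template C-alpha coordinates are assumed to be aligned residue by residue and already superposed in a common frame, so the superposition step itself is not shown. All coordinates are hypothetical.

```python
import math
from itertools import combinations


def find_clashes(atoms, min_dist=2.2):
    """Return pairs of atom labels closer than `min_dist` angstroms.

    `atoms` is a list of (label, (x, y, z)). The cutoff is an assumed illustrative
    value for non-bonded atoms; bonded pairs should be excluded beforehand.
    """
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(atoms, 2)
        if math.dist(pa, pb) < min_dist
    ]


def deviating_segments(model_ca, template_ca, cutoff=2.0):
    """Group consecutive residues whose model-template C-alpha distance exceeds `cutoff`.

    Both lists hold coordinates for the same aligned residues, already superposed.
    Returns inclusive (start_index, end_index) pairs: candidate regions to remodel.
    """
    segments, start = [], None
    for i, (m, t) in enumerate(zip(model_ca, template_ca)):
        if math.dist(m, t) > cutoff:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(model_ca) - 1))
    return segments


def rmsd(model_ca, template_ca):
    """Root-mean-square deviation over aligned, superposed C-alpha positions."""
    sq = [math.dist(m, t) ** 2 for m, t in zip(model_ca, template_ca)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical superposed coordinates: the second and third residues form a
# loop that departs from the template.
template = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0)]
model = [(0.2, 0, 0), (3.8, 3.0, 0), (7.6, 4.5, 0), (11.5, 0.1, 0), (15.2, 0, 0)]
print(deviating_segments(model, template), round(rmsd(model, template), 2))
```

Reporting contiguous segments rather than single residues matches how loops are typically rebuilt, as whole stretches between anchored secondary structure elements.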
Proton pump inhibitors based on benzimidazole scaffolds, such as omeprazole, pantoprazole, and rabeprazole, have been identified as potent fragment inhibitors of uridine nucleoside ribohydrolase.28 Benzimidazole scaffolds have also been shown to be effective inhibitors of poly(ADP-ribose) polymerase (PARP) enzymes.87 PARP is a family of proteins critical to DNA repair, genomic stability, and apoptosis, with four major domains of interest.88 The main role of PARP enzymes in the cell nucleus is to detect single-strand DNA breaks caused by metabolic, chemical, or radiation damage and to initiate a rapid cellular response. They do this by triggering a signaling cascade that activates the enzymes responsible for single-strand break repair. The four domains of interest are a DNA-binding domain, a caspase-cleaved domain, an auto-modification domain, and a catalytic domain.89 The DNA-binding domain is composed of two zinc finger motifs. In the presence of damaged (base-excised) DNA, it binds the DNA and induces a conformational shift. This binding has been shown to occur independently of the other domains, and this independence is integral to a model of programmed cell death based on inhibition of PARP by caspase cleavage. The auto-modification domain is responsible for releasing the protein from the DNA after catalysis, as well as for cleavage-induced inactivation. This mechanism implies a conserved specificity for ribose, combined with a variable specificity for the attached nucleobase, of the kind observed in nucleoside hydrolases.
The inhibition of both enzymes by benzimidazole-scaffolded compounds is an encouraging connection. Two entirely different enzymes that interact with nucleosides by similar mechanisms show a kinetic and dynamic vulnerability to similar electronic and steric inhibition platforms. This motivates continued investigation of ribose-targeting moieties for efficient inhibitor design. Such inhibitors could target both the UNH modeled in this study and the functionally homologous enzymes that the T. vaginalis parasite uses to harvest nucleobases through its metabolic salvage pathways.
This structural model will be optimized with molecular dynamics, starting from the active site, with the endogenous uridine substrate, the Ca2+ cation, and several experimentally identified fragment inhibitors. Keeping the docked ligands static while gradually relaxing residues radially outward makes it possible to control the degrees of freedom with which the tertiary and quaternary structure can be adjusted to relieve steric and electronic hindrance. These optimized structures, with their refined scaffolds, will then be used to study the catalytic pathway and identify possible inhibition mechanisms for this enzyme. They will also guide the rational design of synthetic organic inhibitors that bind competitively to the active site of UNH.
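The staged, radial relaxation described above needs a plan for which residues are released at each step. The Python sketch below is a hypothetical planning helper. It bins residues into shells by distance from the docked-ligand centroid and lists the unrestrained residues at each stage. The shell width, number of stages and all coordinates are assumptions for illustration. Applying the actual restraints would be done in whichever molecular dynamics package is used, and nothing here reflects the package or parameters chosen in this study.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def relaxation_stages(residue_coords, ligand_atoms, shell_width=4.0, n_stages=3):
    """Plan which residues are unrestrained at each stage of a radial relaxation.

    Residues are released shell by shell outward from the ligand centroid; residues
    beyond the last shell stay restrained. Returns one sorted label list per stage.
    """
    center = centroid(ligand_atoms)
    dist = {res: math.dist(xyz, center) for res, xyz in residue_coords.items()}
    return [
        sorted(r for r, d in dist.items() if d <= shell_width * (k + 1))
        for k in range(n_stages)
    ]


# Hypothetical residue centers and docked-ligand atoms (angstroms), illustration only.
residues = {"D10": (2.0, 0, 0), "N40": (5.0, 1.0, 0), "H82": (9.5, 0, 0), "F250": (20.0, 0, 0)}
ligand = [(-0.5, 0, 0), (0.5, 0, 0)]
stages = relaxation_stages(residues, ligand)
for k, free in enumerate(stages, start=1):
    print(f"stage {k}: free residues {free}")
```

Each stage is a superset of the previous one, so the structure gains degrees of freedom gradually while the ligand-contacting shell is settled first.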