Page 260 - Big Data Analytics for Intelligent Healthcare Management





               the better performances. Our study compares assembly tools using the bacterial genome of
               X. fastidiosa (strain DSM 10026). We conclude that Velvet gave its best performance at a hash
               length of 89 and, on this genome, was a better assembly tool than SOAPdenovo2.






               10.5 DATA COLLECTION, EXTRACTION OF GENES, AND SCREENING
               OF DRUGS
               In this work, we examine the gene interactions of Alzheimer’s disease (AD), analyzed using the
               STRING database, and use molecular docking analysis to determine the binding energy, binding
               residues, bond names, and bond lengths of the interactions involving the proteins most often
               targeted by drugs for AD. AD is characterized by the presence of amyloid β-mediated
               extracellular amyloid fibrils, which affect neuronal synaptic activity and the antioxidant
               response [43]. Besides genetic factors, other risk factors are associated with the etiology of
               AD. Over the last two decades, AD-associated research has gained considerable momentum, as AD
               is one of the major current healthcare issues in the developed world. The need for research on
               effective therapeutic approaches to AD has directed this chapter towards specific drug
               selection and the subsequent interaction of the drugs with their target molecules. In this
               study, we also searched three highly cited and rapidly growing databases for molecular data on
               AD to collect all possible genes and proteins. These sources are GWAS, UniProt, and GeneCards.
               We downloaded and prepared lists of 498 genes from UniProt, 385 genes from GWAS, and 384 genes
               from GeneCards. These gene lists were used for statistically identifying the most relevant
               genes/proteins. Merging the lists resulted in a superset of 1127 genes in total.
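
The merging step can be sketched in Python as a set union. This is only an illustration: the gene symbols below are placeholders rather than the lists downloaded in this study, and the `normalize` helper is an assumed clean-up step (trimming and uppercasing) that the text does not describe.

```python
def normalize(symbols):
    """Return a set of cleaned gene symbols.

    Assumed clean-up, not described in the chapter: trim whitespace,
    uppercase, and drop empty entries.
    """
    return {s.strip().upper() for s in symbols if s.strip()}


def merge_gene_lists(*gene_lists):
    """Merge several gene lists into one superset without duplicates."""
    superset = set()
    for genes in gene_lists:
        superset |= normalize(genes)
    return superset


# Placeholder lists for illustration only.
uniprot_genes = ["APP", "PSEN1", "MAPT ", "SNCA"]
gwas_genes = ["apoe", "APP", "BIN1"]
genecards_genes = ["APP", "APOE", "GSK3B", ""]

superset = merge_gene_lists(uniprot_genes, gwas_genes, genecards_genes)
print(len(superset), sorted(superset))
```

Using sets means that a gene reported by more than one database is counted once in the superset, so the superset is smaller than the sum of the individual list sizes whenever the lists overlap.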
                  To further investigate the genes that are reported in all three databases, we studied the
               Venn diagram produced by the online tool Venny 2.0. After this first-tier selection, we found
               that 10 genes fell within the intersection of all three databases (UniProt, GWAS, and
               GeneCards), whereas 130 genes were reported by only two of the three databases. We subjected
               this list of 130 genes to network analysis to find the within-group associations and weights,
               which we could use to further reduce our gene set to the genes relevant to the cause of AD.
               Gene-gene interaction studies helped in the second-tier selection of the gene list. The
               network analysis was carried out using the STRING database and the GeneMANIA database [44].
               The gene interaction network of AD, analyzed using the STRING database, is shown in
               Fig. 10.1.
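
The first-tier selection corresponds to two regions of a three-set Venn diagram, which can be computed directly with set operations. The sketch below is a minimal illustration with placeholder gene symbols; it does not reproduce the Venny 2.0 output or the counts reported above.

```python
def venn_regions(a, b, c):
    """Split three gene sets into the two regions used for selection.

    Returns (genes in all three sets, genes in exactly two sets).
    """
    a, b, c = set(a), set(b), set(c)
    in_all_three = a & b & c
    in_at_least_two = (a & b) | (a & c) | (b & c)
    in_exactly_two = in_at_least_two - in_all_three
    return in_all_three, in_exactly_two


# Placeholder sets for illustration only.
uniprot = {"APP", "PSEN1", "MAPT", "SNCA"}
gwas = {"APP", "APOE", "BIN1", "SNCA"}
genecards = {"APP", "APOE", "GSK3B", "MAPT"}

all_three, exactly_two = venn_regions(uniprot, gwas, genecards)
print(sorted(all_three))
print(sorted(exactly_two))
```

Subtracting the triple intersection from the pairwise overlaps keeps the two regions disjoint, so no gene is counted in both groups.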
                  GeneMANIA and STRING identified 28 and 25 genes, respectively, as the most significant for
               AD. For the gene SNCA, we found 70 interactions using GeneMANIA and 14 interactions using
               STRING. For GSK3B, we found 46 and 10 interactions using GeneMANIA and STRING, respectively.
               For CDK5, 43 and 18 interactions were found using GeneMANIA and STRING, respectively. We
               identified the genes common to these two subsets and selected ten genes that were highly
               associated with AD. This final subset of ten genes was then taken as the set of genes encoding
               the target proteins for which we searched for drug compounds.
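
The second-tier selection can be sketched as counting interactions per gene in each network and then keeping the genes found in both. The edge lists below are placeholders, not exports from GeneMANIA or STRING. Ranking the shared genes by their combined interaction count is an assumption made for this illustration, since the text does not state the exact criterion used to pick the final ten genes.

```python
def interaction_counts(edges):
    """Count distinct interaction partners per gene in an undirected edge list."""
    partners = {}
    for g1, g2 in edges:
        if g1 == g2:
            continue  # ignore self-interactions
        partners.setdefault(g1, set()).add(g2)
        partners.setdefault(g2, set()).add(g1)
    return {gene: len(p) for gene, p in partners.items()}


def shared_genes(counts_a, counts_b, top_n):
    """Return genes present in both networks.

    Assumed ranking (not from the chapter): combined interaction count,
    highest first, with ties broken alphabetically.
    """
    common = set(counts_a) & set(counts_b)
    ranked = sorted(common, key=lambda g: (-(counts_a[g] + counts_b[g]), g))
    return ranked[:top_n]


# Placeholder edge lists for illustration only.
genemania_edges = [("SNCA", "GSK3B"), ("SNCA", "CDK5"), ("SNCA", "MAPT"),
                   ("GSK3B", "CDK5"), ("APP", "MAPT")]
string_edges = [("SNCA", "CDK5"), ("CDK5", "GSK3B"), ("CDK5", "MAPT"),
                ("BIN1", "MAPT")]

gm_counts = interaction_counts(genemania_edges)
st_counts = interaction_counts(string_edges)
selected = shared_genes(gm_counts, st_counts, top_n=3)
print(selected)
```

With real data, `top_n` would be set to 10 to mirror the size of the final subset, and the edge lists would come from the network exports of the two tools.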