Page 260 - Big Data Analytics for Intelligent Healthcare Management
the better performances. Our study compares assembly tools using the bacterial genome of
X. fastidiosa (strain DSM 10026). We conclude that Velvet performed best at a hash length of 89
and that it outperformed SOAPdenovo2 as an assembly tool.
10.5 DATA COLLECTION, EXTRACTION OF GENES, AND SCREENING
OF DRUGS
In this work, we examine the gene interactions of Alzheimer’s disease (AD) using the STRING
database, and we use molecular docking analysis to determine the binding energy, binding residues,
bond names, and bond lengths for the interactions of the proteins most frequently targeted by
drugs for AD. AD is characterized by the presence of amyloid β-mediated
extracellular amyloid fibrils, which affect neuronal synaptic activity and the antioxidant response
[43]. Besides genetic factors, other risk factors are associated with the etiology of
AD. Over the last two decades, AD-associated research has gained considerable momentum,
as AD is one of the major current healthcare issues in the developed world. The need for
research on effective therapeutic approaches to AD has directed this chapter towards
the selection of specific drugs and their subsequent interaction with target molecules. In this study,
we also searched the molecular data on AD in three of the most cited and fastest-growing
databases to collect all possible genes and proteins: GWAS, UniProt, and GeneCards.
We downloaded and prepared gene lists of 498 genes from UniProt, 385 genes
from GWAS, and 384 genes from GeneCards. These gene lists were used to statistically identify
the most relevant genes/proteins. Merging the three lists resulted in a superset of 1127
genes in total.
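
The merging step can be illustrated with a short Python sketch. It is not the authors' procedure: the function name `merge_gene_lists`, the toy gene lists, and the normalization of symbols to upper case are assumptions made for illustration only. It shows why a superset is smaller than the sum of the list sizes (here 1127 versus 498 + 385 + 384 = 1267): a gene reported by several sources is counted only once.

```python
def merge_gene_lists(*gene_lists):
    """Return the union of several gene lists.

    Assumption: symbols are compared after trimming whitespace and
    upper-casing, so "app" and " APP " count as the same gene.
    """
    merged = set()
    for genes in gene_lists:
        merged.update(g.strip().upper() for g in genes if g.strip())
    return merged


# Toy lists for illustration only; these are NOT the chapter's data.
uniprot = ["APP", "PSEN1", "MAPT", "SNCA"]
gwas = ["APOE", "app", "BIN1"]
genecards = ["APOE", "APP", "GSK3B", " CDK5 "]

superset = merge_gene_lists(uniprot, gwas, genecards)
total_entries = len(uniprot) + len(gwas) + len(genecards)

print(f"{total_entries} list entries, {len(superset)} unique genes")
```

In this toy case, 11 list entries collapse to 8 unique genes because APP and APOE each appear in more than one list.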
To further investigate the genes that are reported in all three databases, we
drew a Venn diagram using the online tool Venny 2.0. After this first-tier selection, we found
that 10 genes fell within the intersection of all three databases (UniProt,
GWAS, and GeneCards), whereas 130 genes were reported by exactly two of the three databases.
We took these 130 genes and subjected the list to network analysis to find the
within-group associations and weights, which we could use to further reduce the gene set to those
relevant to the cause of AD. Gene-gene interaction studies supported this second-tier selection of
the gene list. The network analysis was carried out using the STRING database and the GeneMANIA
database [44]. The gene interactions of AD, as analyzed using the STRING database, are shown
in Fig. 10.1.
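
The first-tier Venn selection described above amounts to counting, for each gene, the number of sources that report it. The following Python sketch reproduces that logic without the Venny web tool. The function name `venn_tiers` and the toy gene sets are hypothetical, and the sketch assumes exactly three sources, so that "in all sources" and "in exactly two sources" are distinct regions.

```python
from collections import Counter


def venn_tiers(sources):
    """Split genes by how many sources report them.

    sources: dict mapping a source name to an iterable of gene symbols.
    Returns (genes found in every source, genes found in exactly two).
    """
    counts = Counter()
    for genes in sources.values():
        counts.update(set(genes))  # set() so a gene counts once per source
    in_all = {g for g, n in counts.items() if n == len(sources)}
    in_two = {g for g, n in counts.items() if n == 2}
    return in_all, in_two


# Toy gene sets for illustration only; NOT the chapter's data.
sources = {
    "UniProt": ["APP", "PSEN1", "MAPT", "SNCA"],
    "GWAS": ["APOE", "APP", "BIN1", "SNCA"],
    "GeneCards": ["APOE", "APP", "GSK3B", "MAPT"],
}

in_all, in_two = venn_tiers(sources)
print("all three:", sorted(in_all))
print("exactly two:", sorted(in_two))
```

Applied to the real gene lists, the two returned sets would correspond to the 10-gene and 130-gene regions reported in the text.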
GeneMANIA and STRING identified 28 and 25 genes, respectively, as the most significant for AD.
For the gene SNCA, we found 70 interactions using GeneMANIA and 14 interactions using STRING.
For GSK3B, we found 46 and 10 interactions using GeneMANIA and STRING, respectively.
For CDK5, 43 and 18 interactions were found using GeneMANIA and STRING, respectively.
We identified the genes common to these two subsets and selected ten genes that were highly
associated with AD. This final subset of ten genes was then taken as the genes encoding the target
proteins for which we searched for drug compounds.
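
The second-tier selection can be sketched as counting interactions per gene in each network and keeping the genes that rank as significant in both. The chapter does not state the exact significance criterion, so the degree threshold `min_degree`, the function names, and the toy edge lists below are all assumptions for illustration; the edges would in practice be exported from GeneMANIA and STRING.

```python
from collections import Counter


def interaction_counts(edges):
    """Count the interactions (edges) each gene takes part in.

    Self-loops are ignored, and (a, b) and (b, a) count as one edge.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique_edges:
        for gene in pair:
            degree[gene] += 1
    return degree


def common_top_genes(edges_a, edges_b, min_degree):
    """Genes with at least min_degree interactions in BOTH networks.

    Assumption: a simple degree threshold stands in for the
    significance criterion, which the text does not specify.
    """
    top_a = {g for g, d in interaction_counts(edges_a).items() if d >= min_degree}
    top_b = {g for g, d in interaction_counts(edges_b).items() if d >= min_degree}
    return sorted(top_a & top_b)


# Toy edge lists for illustration only; NOT the chapter's networks.
genemania_edges = [("SNCA", "GSK3B"), ("SNCA", "CDK5"), ("SNCA", "MAPT"),
                   ("GSK3B", "CDK5"), ("APP", "MAPT")]
string_edges = [("SNCA", "CDK5"), ("GSK3B", "CDK5"),
                ("SNCA", "APP"), ("GSK3B", "APP"), ("APP", "GSK3B")]

selected = common_top_genes(genemania_edges, string_edges, min_degree=2)
print(selected)
```

With these toy networks, MAPT passes the threshold only in the first network and APP only in the second, so neither is retained in the common subset.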