Page 257 - Big Data Analytics for Intelligent Healthcare Management

CHAPTER 10 COMPUTATIONAL BIOLOGY APPROACH ON GENETIC DISORDER

are deduced using this technique [31]. Additionally, the study of metagenomic reorganization plays an
important role in bioinformatics. MapReduce, a framework built on map and reduce operations, is able to
handle genomic diversity. Dey and coworker [32] studied the de Bruijn graph using the MapReduce
framework for metagenomic data classification. Their approach is based on a compact representation of
k-mers to learn the optimal path for genome assembly [32].
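
The idea of counting k-mers with map and reduce steps and then linking them into a de Bruijn graph can be illustrated with a small sketch. This is not the implementation of [32]: it runs in a single process rather than on a MapReduce cluster, the function names (`map_kmers`, `reduce_counts`, `de_bruijn_graph`, `walk`) are hypothetical, and the three reads are toy data chosen so that the graph has a single unbranched path.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct k-mer."""
    counts = Counter()
    for kmer, n in pairs:
        counts[kmer] += n
    return counts


def de_bruijn_graph(kmer_counts):
    """Add one edge per distinct k-mer, from its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in sorted(kmer_counts):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk(graph, start):
    """Follow unambiguous edges from `start` and spell out the sequence."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        seq += node[-1]
    return seq


# Toy overlapping reads (assumed for illustration, not real sequencing data).
reads = ["ATGGC", "TGGCG", "GGCGT"]
counts = reduce_counts(chain.from_iterable(map_kmers(r, 3) for r in reads))
graph = de_bruijn_graph(counts)
contig = walk(graph, "AT")
```

Here each read is mapped independently, which is what makes the counting step easy to distribute; only the reduce step needs to see all pairs for a given k-mer. Real assemblers additionally handle branching nodes, sequencing errors, and reverse complements, which this sketch ignores.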

10.4 DE NOVO ASSEMBLY, RE-SEQUENCING, TRANSCRIPTOMICS
SEQUENCING AND EPIGENETICS
De novo assembly is the reconstruction of the genome of an organism for which no reference
genome sequence is available. It gives a good understanding of the organism at the genome level
and assists in the prediction of protein-coding regions and of different pathways [33]. Re-sequencing
is the sequencing of an organism whose genome is already known, which helps in understanding the
relationship between phenotype and genotype [34]. Transcriptome sequencing covers a wide range
of applications, from simple mRNA profiling to analysis of the entire transcriptome, which
includes both coding mRNA and noncoding RNA [35]. Epigenetics concerns modifications of DNA that
affect gene expression without changing the underlying sequence [36]. Current sequencers produce
high-throughput sequencing data at a moderate cost and are accelerating biological research in
genomics and transcriptomics [37].
Next-generation sequencing (NGS) technology started with the introduction of second-generation
sequencing. The second-generation sequencing platforms include HiSeq, MiSeq, and GA from Illumina;
454 from Roche; Ion Torrent and SOLiD from Life Technologies; HeliScope from Helicos BioSciences;
and the RS system from Pacific Biosciences [38]. NGS outputs are typically available online in
the Sequence Read Archive (SRA) database [39]. Assembly of NGS data has been carried out with
several assemblers. Several tools have been developed to improve assembly in terms
of speed, contig length (a contig is a contiguous stretch of genomic sequence), and scaffold length.
Examples include Velvet, Minia, and SOAPdenovo2 (short oligonucleotide analysis package) [40].
The aim of the present study is to apply two of these assembly tools to bacterial genome sequence
data and see which performs best. Raw read files of the bacterium Xylella fastidiosa were analyzed
for assembly. X. fastidiosa is a fastidious, gram-negative, xylem-limited bacterium.
The whole-genome paired sequence of X. fastidiosa
was downloaded from the European Bioinformatics Institute. The genome had been sequenced
on the Illumina platform. For quality checking, the whole-genome sequence was examined with
the FastQC tool and then filtered with the NGS QC tool (a toolkit for the quality control (QC) of
next-generation sequencing (NGS) data). De novo assembly was performed using two different
software programs, Velvet [41] and SOAPdenovo2 [42], with different k-mer values (ranging
from 31 to 89). To evaluate assembly quality, the contig files were used to compute post-assembly
statistics, in particular the N50 value (a length-weighted median of contig lengths). Our study compares
assembly tools using bacterial genome sequence data. We used a sample of X. fastidiosa strain
DSM 10026, a plant-pathogenic bacterium. To compare assemblies of the bacterial
genome, we used two assembly tools, Velvet and SOAPdenovo2, with
