Phys. Rev. E 75, 011915 (2007) [10 pages]

Markov models of genome segmentation

Vivek Thakur (1), Rajeev K. Azad (2), and Ram Ramaswamy (1,3)
(1) Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India
(2) Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
(3) School of Physical Sciences, Jawaharlal Nehru University, New Delhi 110 067, India

Received 2 March 2006; revised 19 June 2006; published 17 January 2007

We introduce Markov models for the segmentation of symbolic sequences, extending an earlier segmentation procedure based on the Jensen-Shannon divergence. Higher-order Markov models are more sensitive to the details of local patterns; in genome analysis, this makes it possible to segment a sequence at biologically meaningful positions. We show the advantage of higher-order Markov-model-based segmentation in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species. Applied to the E. coli K12 genome, the procedure accurately identifies the boundaries of genomic islands, cryptic prophages, and horizontally acquired regions.
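
To make the idea concrete, the following is a minimal Python sketch of a Jensen-Shannon divergence between two adjacent segments under a Markov model of a given order. It is an illustration under stated assumptions, not the procedure from the paper: the abstract does not give the formulas, so the sketch assumes the order-m generalization replaces the Shannon entropy with the conditional entropy of a symbol given its m preceding symbols, and weights each segment by its length. The names `conditional_entropy`, `js_divergence`, and `best_split`, and the `min_len` parameter, are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(seq, order):
    """Entropy (bits) of a symbol given the `order` symbols preceding it.

    With order=0 this reduces to the ordinary Shannon entropy of the
    symbol composition.
    """
    context_counts = Counter()
    pair_counts = Counter()
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]          # empty string when order == 0
        context_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    total = sum(context_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for (ctx, _sym), c in pair_counts.items():
        # P(ctx, sym) * log2 P(sym | ctx)
        h -= (c / total) * log2(c / context_counts[ctx])
    return h


def js_divergence(left, right, order):
    """Assumed Markov-order JS divergence between two adjacent segments.

    Segments are weighted by length. For order > 0 the contexts that
    straddle the boundary are counted in the whole sequence only, so this
    is an approximation.
    """
    n = len(left) + len(right)
    w_left, w_right = len(left) / n, len(right) / n
    return (conditional_entropy(left + right, order)
            - w_left * conditional_entropy(left, order)
            - w_right * conditional_entropy(right, order))


def best_split(seq, order, min_len=10):
    """Scan candidate cut points and return (position, divergence) of the best."""
    best_pos, best_d = None, 0.0
    for pos in range(min_len, len(seq) - min_len + 1):
        d = js_divergence(seq[:pos], seq[pos:], order)
        if best_pos is None or d > best_d:
            best_pos, best_d = pos, d
    return best_pos, best_d


if __name__ == "__main__":
    # Two halves with identical base composition but different local patterns.
    left, right = "AC" * 20, "AACC" * 10
    print(js_divergence(left, right, 0))  # composition only
    print(js_divergence(left, right, 1))  # first-order Markov
```

In this toy case both halves contain equal numbers of A and C, so the order-0 divergence vanishes, while the first-order divergence is positive because the dinucleotide patterns differ. A full segmentation would also need a significance criterion for accepting a cut and a recursive application to the resulting segments, neither of which is sketched here.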

© 2007 The American Physical Society

URL: http://link.aps.org/doi/10.1103/PhysRevE.75.011915
DOI: 10.1103/PhysRevE.75.011915
PACS: 87.15.Cc