
Phys. Rev. E 65, 041905 (2002) [16 pages]

Analysis of symbolic sequences using the Jensen-Shannon divergence


Ivo Grosse (1,2), Pedro Bernaola-Galván (2,3), Pedro Carpena (2,3), Ramón Román-Roldán (4), Jose Oliver (5), and H. Eugene Stanley (2)
(1) Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724
(2) Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215
(3) Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga, E-29071 Málaga, Spain
(4) Departamento de Física Aplicada, Universidad de Granada, E-18071 Granada, Spain
(5) Departamento de Genética e Instituto de Biotecnología, Universidad de Granada, E-18071 Granada, Spain

Received 22 December 2000; revised 8 August 2001; published 25 March 2002

We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
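The following is a minimal Python sketch of the two ingredients the abstract names. It is illustrative only and is not the authors' implementation. It assumes the common form of the Jensen-Shannon divergence for a sequence of length N split into two parts of lengths n1 and n2, with length-proportional weights π_i = n_i / N: D = H(π1 P1 + π2 P2) - π1 H(P1) - π2 H(P2), where H is the Shannon entropy (here in bits) and P_i is the symbol composition of part i. The segmentation step recursively cuts at the position that maximizes D. The function names and the fixed `threshold` and `min_len` parameters are hypothetical. The paper derives significance from the statistics of D, and this sketch does not reproduce that criterion.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of the empirical distribution given by symbol counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def js_divergence(left, right):
    """Jensen-Shannon divergence between the compositions of two non-empty
    sequences, each weighted by its share of the total length (pi_i = n_i / N)."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    c1, c2 = Counter(left), Counter(right)
    # Pooled counts / N equal the mixture pi1*P1 + pi2*P2.
    mixture = c1 + c2
    return entropy(mixture) - (n1 / n) * entropy(c1) - (n2 / n) * entropy(c2)


def best_cut(seq, min_len=1):
    """Return (position, D) of the cut maximizing D, or (None, 0.0) if no cut
    gives D > 0. O(N^2) because each candidate cut is recounted from scratch."""
    best_pos, best_d = None, 0.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = js_divergence(seq[:i], seq[i:])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold, min_len=1, offset=0):
    """Recursively split seq at the D-maximizing cut while D exceeds a
    (hypothetical) fixed threshold. Returns sorted absolute cut positions."""
    if len(seq) < 2 * min_len:
        return []
    pos, d = best_cut(seq, min_len)
    if pos is None or d <= threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))


if __name__ == "__main__":
    # Two compositionally distinct halves: the only accepted cut is at 50.
    print(segment("A" * 50 + "T" * 50, threshold=0.5))
```

For example, `js_divergence("AAAA", "TTTT")` is 1 bit (two disjoint compositions with equal weights), and `js_divergence("AAAA", "AAAA")` is 0. A fixed threshold keeps the sketch short. In practice, the cutoff on the maximum of D should depend on the sequence length and alphabet size, which is the question the paper's statistical analysis of D addresses.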

© 2002 The American Physical Society

URL: http://link.aps.org/doi/10.1103/PhysRevE.65.041905
DOI: 10.1103/PhysRevE.65.041905
PACS: 87.15.Cc