Application of N-Gram Based Distances to Genetic Texts Comparison

Biosemiotics:1-15 (forthcoming)

Abstract

The article discusses the possible “physical” meaning of a distance between genetic sequences that is based on comparing the sets of all words of a fixed length (N-grams) occurring in two genomic sequences. The distances considered suitably describe phylogenetic relationships and allow genomes to be ranked by similarity in situations where alignment methods are practically inapplicable. A simulation shows that, for relatively small artificial evolutionary modifications, the distances between the N-gram distributions change almost linearly as the genome lengths grow. For the general case of comparing two genetic texts, a function for “calibrating” the distance between N-gram distributions is found. This makes it possible to interpret the distances considered in terms of the number of elementary operations performed when aligning the compared sequences.
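The abstract does not state which distance the authors use, so the following is only a minimal sketch of the general idea, assuming an L1 distance between the relative frequencies of overlapping N-grams. The function names `ngram_distribution` and `ngram_distance` are hypothetical and are not taken from the article.

```python
from collections import Counter


def ngram_distribution(seq, n):
    """Relative frequencies of all overlapping words of length n in seq."""
    if n <= 0 or len(seq) < n:
        raise ValueError("sequence must contain at least one word of length n")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def ngram_distance(seq_a, seq_b, n):
    """L1 distance between two N-gram distributions.

    The choice of L1 is an assumption made for illustration; the article
    may define its distance differently.
    """
    p = ngram_distribution(seq_a, n)
    q = ngram_distribution(seq_b, n)
    words = p.keys() | q.keys()
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)
```

Under this assumed definition the distance is 0 for identical sequences and reaches its maximum of 2 when the two sequences share no N-gram. No alignment is needed, which is why such a measure can still be computed when alignment is impractical.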

Similar books and articles

Distances between formal theories.Michele Friend, Mohamed Khaled, Koen Lefever & Gergely Székely - unknown - Review of Symbolic Logic 13 (3):633-654.
Conceptual Distance and Algebras of Concepts.Mohamed Khaled & Gergely Székely - forthcoming - Review of Symbolic Logic:1-16.
Distances, Hesitancy Degree and Flexible Querying via Neutrosophic Sets.A. A. Salama - 2014 - International Journal of Computer Applications 101 (10):7-12.
Polish metric spaces with fixed distance set.Riccardo Camerlo, Alberto Marcone & Luca Motto Ros - 2020 - Annals of Pure and Applied Logic 171 (10):102832.

Analytics

Added to PP
2021-08-20

Downloads
11 (#1,419,405)

6 months
3 (#1,471,455)
