TY - JOUR
T1 - Adaptive seeds tame genomic sequence comparison
AU - Kiełbasa, Szymon M.
AU - Wan, Raymond
AU - Sato, Kengo
AU - Horton, Paul
AU - Frith, Martin C.
PY - 2011/3
Y1 - 2011/3
AB - The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billionbase DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
UR - http://www.scopus.com/inward/record.url?scp=79952256999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952256999&partnerID=8YFLogxK
U2 - 10.1101/gr.113985.110
DO - 10.1101/gr.113985.110
M3 - Article
C2 - 21209072
AN - SCOPUS:79952256999
SN - 1088-9051
VL - 21
SP - 487
EP - 493
JO - Genome Research
JF - Genome Research
IS - 3
ER -