TY - JOUR
T1 - Development of a large-scale comparative genome system and its application to the analysis of mycobacteria genomes
AU - Sakakibara, Yasubumi
AU - Osana, Yasunori
AU - Popendorf, Kris
PY - 2007
Y1 - 2007
N2 - As the number of available whole genome sequences continues to increase rapidly, the raw scale of the sequence data is the first hurdle for comparative genome analysis. When performing whole genome alignments, large-scale rearrangements make it necessary to first identify roughly which short, well-conserved segments (termed anchors) correspond to which other segments. Successful results have been achieved by adapting tools like BLAT and BLASTZ on a problem-by-problem basis, but the work required to perform a single alignment is considerable. More recent programs such as Mauve and PatternHunter can handle slightly larger inputs, but their memory and time requirements for sequences like the human and chimpanzee X chromosomes are prohibitive for most computational environments. Our novel algorithm, which we have implemented in a program called Murasaki (available at http://murasaki.dna.bio.keio.ac.jp), makes it possible to identify anchors across multiple large sequences on the scale of several hundred megabases (e.g. three mammal chromosomes) in a matter of minutes. We also demonstrate an application of Murasaki to the comparative analysis of multiple mycobacteria genomes.
KW - Comparative genomics
KW - Dotplot
KW - Mycobacteria
KW - Pseudogene
KW - Sequence analysis
UR - http://www.scopus.com/inward/record.url?scp=35548955038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35548955038&partnerID=8YFLogxK
U2 - 10.5025/hansen.76.251
DO - 10.5025/hansen.76.251
M3 - Article
C2 - 17877037
AN - SCOPUS:35548955038
SN - 1342-3681
VL - 76
SP - 251
EP - 256
JO - Japanese Journal of Leprosy
JF - Japanese Journal of Leprosy
IS - 3
ER -