TY - JOUR
T1 - Directed acyclic graph kernels for structural RNA analysis
AU - Sato, Kengo
AU - Mituyama, Toutai
AU - Asai, Kiyoshi
AU - Sakakibara, Yasubumi
N1 - Funding Information: This work was supported in part by a grant from "Functional RNA Project" funded by the New Energy and Industrial Technology Development Organization (NEDO) of Japan, and was also supported in part by Grant-in-Aid for Scientific Research on Priority Area "Comparative Genomics" No. 17018029 from the Ministry of Education, Culture, Sports, Science and Technology of Japan. We thank Dr. S. Washietl and Dr. I. L. Hofacker for providing us with their large-scale dataset of multiple alignments of noncoding RNAs. We also thank our colleagues from the RNA Informatics Team at the Computational Biology Research Center (CBRC) for fruitful discussions.
PY - 2008/7/22
Y1 - 2008/7/22
AB - Background: Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results: We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion: Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.
UR - http://www.scopus.com/inward/record.url?scp=49649096693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49649096693&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-9-318
DO - 10.1186/1471-2105-9-318
M3 - Article
C2 - 18647390
AN - SCOPUS:49649096693
SN - 1471-2105
VL - 9
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 318
ER -