TY - JOUR
T1 - Convolutional neural networks for classification of alignments of non-coding RNA sequences
AU - Aoki, Genta
AU - Sakakibara, Yasubumi
N1 - Funding Information:
This work was supported by a Grant-in-Aid for Scientific Research in Innovative Areas [no. 221S0002] from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), by a Grant-in-Aid for Scientific Research (A) (KAKENHI) [no. 23241066] from the Japan Society for the Promotion of Science (JSPS) and by another JSPS KAKENHI grant [no. 16H06279]. This work was also funded by a MEXT-supported Program for the Strategic Research Foundation at Private Universities.
Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2018/7/1
Y1 - 2018/7/1
AB - Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified.
UR - http://www.scopus.com/inward/record.url?scp=85050806688&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050806688&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty228
DO - 10.1093/bioinformatics/bty228
M3 - Article
C2 - 29949978
AN - SCOPUS:85050806688
SN - 1367-4803
VL - 34
SP - i237-i244
JO - Bioinformatics
JF - Bioinformatics
IS - 13
ER -