TY - GEN
T1 - Recent methods for RNA modeling using stochastic context-free grammars
AU - Sakakibara, Yasubumi
AU - Brown, Michael
AU - Hughey, Richard
AU - Mian, Saira
AU - Sjölander, Kimmen
AU - Underwood, Rebecca C.
AU - Haussler, David
N1 - Funding Information:
Tools for analyzing RNA will become increasingly important as in vitro evolution and selection techniques produce greater numbers of synthesized RNA families to supplement those related by phylogeny. Recent efforts have applied stochastic context-free grammars (SCFGs) to the problems of statistical modeling, multiple alignment, discrimination and prediction of the secondary structure of RNA families. Our approach in applying SCFGs to modeling RNA is highly related to our work on modeling protein families and domains with HMMs [HKMS93, KBM+94]. In RNA, the nucleotides adenine (A), cytosine (C), guanine (G) and uracil (U) interact to form characteristic secondary-structure motifs such as helices, loops and bulges [Sae84, WPT89]. Intramolecular A-U and G-C Watson-Crick pairs as well as G-U and, more rarely, G-A base pairs constitute the so-called biological palindromes in the genome. When RNA sequences are aligned, both primary * Y. Sakakibara's current address is ISIS, Fujitsu Labs Ltd., 140, Miyamoto, Numazu, Shizuoka 410-03, Japan. We thank Anders Krogh, Harry Noller and Bryn Weiser for discussions and assistance, and Michael Waterman and David Searls for discussions. This work was supported by NSF grants CDA-9115268 and IRI-9123692 and NIH grant number GM17129. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.
Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1994.
PY - 1994
Y1 - 1994
N2 - Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences’ common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable “covariance model” (CM) to perform similar tasks. We compare and contrast our methods with theirs.
AB - Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences’ common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable “covariance model” (CM) to perform similar tasks. We compare and contrast our methods with theirs.
UR - http://www.scopus.com/inward/record.url?scp=85015182436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015182436&partnerID=8YFLogxK
U2 - 10.1007/3-540-58094-8_25
DO - 10.1007/3-540-58094-8_25
M3 - Conference contribution
AN - SCOPUS:85015182436
SN - 9783540580942
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 290
EP - 306
BT - Combinatorial Pattern Matching - 5th Annual Symposium, CPM 1994, Proceedings
A2 - Crochemore, Maxime
A2 - Gusfield, Dan
PB - Springer Verlag
T2 - 5th Annual Symposium on Combinatorial Pattern Matching, CPM 1994
Y2 - 5 June 1994 through 8 June 1994
ER -