TY - JOUR
T1 - Comparative analysis of base correlations in 5′ untranslated regions of various species
AU - Osada, Yuko
AU - Saito, Rintaro
AU - Tomita, Masaru
PY - 2006/6/21
Y1 - 2006/6/21
N2 - Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating sequence characteristics of these signals is important, particularly to infer translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been done, few have focused on base correlations that assess base dependencies in the signal sequences. We used relative entropy and mutual information to analyze base conservation and correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at -3 from the translational start site but no peak of mutual information at that position, indicating that the base at that position (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases and thus functions as a single base. We observed unexpected peaks of mutual information between positions -2 and -1 in most eukaryotes. Surprisingly, these base correlations also occurred in some bacteria and archaea, although there were no base preferences at either position. Various dinucleotide patterns existed at these positions, and the correlation between bases at -2 and -1 may be relevant to the context of translational initiation. Because dinucleotide patterns of correlated pairs of nucleotides at -2 and -1 were not unique within respective organisms, the correlation could not be found when analyzing single-nucleotide conservation. Therefore, mutual information allowed us to discover signals that were not found by simply analyzing base conservation.
AB - Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating sequence characteristics of these signals is important, particularly to infer translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been done, few have focused on base correlations that assess base dependencies in the signal sequences. We used relative entropy and mutual information to analyze base conservation and correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at -3 from the translational start site but no peak of mutual information at that position, indicating that the base at that position (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases and thus functions as a single base. We observed unexpected peaks of mutual information between positions -2 and -1 in most eukaryotes. Surprisingly, these base correlations also occurred in some bacteria and archaea, although there were no base preferences at either position. Various dinucleotide patterns existed at these positions, and the correlation between bases at -2 and -1 may be relevant to the context of translational initiation. Because dinucleotide patterns of correlated pairs of nucleotides at -2 and -1 were not unique within respective organisms, the correlation could not be found when analyzing single-nucleotide conservation. Therefore, mutual information allowed us to discover signals that were not found by simply analyzing base conservation.
KW - Kozak consensus sequence
KW - Mutual information
KW - Shine-Dalgarno sequence
KW - Start codon
KW - Translation initiation sites
UR - http://www.scopus.com/inward/record.url?scp=33744543456&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744543456&partnerID=8YFLogxK
U2 - 10.1016/j.gene.2006.02.018
DO - 10.1016/j.gene.2006.02.018
M3 - Article
C2 - 16618531
AN - SCOPUS:33744543456
SN - 0378-1119
VL - 375
SP - 80
EP - 86
JO - Gene
JF - Gene
IS - 1-2
ER -