TY - JOUR
T1 - Prediction of the cell-type-specific transcription of non-coding RNAs from genome sequences via machine learning
AU - Koido, Masaru
AU - Hon, Chung Chau
AU - Koyama, Satoshi
AU - Kawaji, Hideya
AU - Murakawa, Yasuhiro
AU - Ishigaki, Kazuyoshi
AU - Ito, Kaoru
AU - Sese, Jun
AU - Parrish, Nicholas F.
AU - Kamatani, Yoichiro
AU - Carninci, Piero
AU - Terao, Chikashi
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2023/6
Y1 - 2023/6
AB - Gene transcription is regulated through complex mechanisms involving non-coding RNAs (ncRNAs). As the transcription of ncRNAs, especially of enhancer RNAs, is often low and cell type specific, how the levels of RNA transcription depend on genotype remains largely unexplored. Here we report the development and utility of a machine-learning model (MENTR) that reliably links genome sequence and ncRNA expression at the cell type level. Effects on ncRNA transcription predicted by the model were concordant with estimates from published studies in a cell-type-dependent manner, regardless of allele frequency and genetic linkage. Among 41,223 variants from genome-wide association studies, the model identified 7,775 enhancer RNAs and 3,548 long ncRNAs causally associated with complex traits across 348 major human primary cells and tissues, such as rare variants plausibly altering the transcription of enhancer RNAs to influence the risks of Crohn’s disease and asthma. The model may aid the discovery of causal variants and the generation of testable hypotheses for biological mechanisms driving complex traits.
UR - http://www.scopus.com/inward/record.url?scp=85142364182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142364182&partnerID=8YFLogxK
U2 - 10.1038/s41551-022-00961-8
DO - 10.1038/s41551-022-00961-8
M3 - Article
C2 - 36411359
AN - SCOPUS:85142364182
SN - 2157-846X
VL - 7
SP - 830
EP - 844
JO - Nature Biomedical Engineering
JF - Nature Biomedical Engineering
IS - 6
ER -