TY - JOUR
T1 - Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data
AU - Shiraishi, Yuichi
AU - Okada, Ai
AU - Chiba, Kenichi
AU - Kawachi, Asuka
AU - Omori, Ikuko
AU - Mateos, Raúl Nicolás
AU - Iida, Naoko
AU - Yamauchi, Hirofumi
AU - Kosaki, Kenjiro
AU - Yoshimi, Akihide
N1 - Funding Information:
This work is supported by Grant-in-Aid for Scientific Research (KAKENHI 18H03327, 21H03549), Grant-in-Aid from the Japan Agency for Medical Research and Development (Platform Program for Promotion of Genome Medicine: 20km0405207h9905, Program for an Integrated Database of Clinical and Genomic Information: 20kk0205014h0005, Practical Research Project for Rare/Intractable Diseases: 20ek0109485h0001, Practical Research for Innovative Cancer Control: 21ck0106641h0001), and National Cancer Center Research and Development Funds (2020-A-7, 2021-A-3). A.Y. acknowledges support from the Japan Society for the Promotion of Science (JSPS) Home-Returning Researcher Development Research (grant number 19K24691), KAKENHI (grant number 21H04828), the Japan Agency for Medical Research and Development (grant number 21jm0210085h0002), and National Cancer Center Research and Development Funds (2020-A-2). We used the supercomputing resources provided by the Human Genome Center (The University of Tokyo) and the ROIS National Institute of Genetics. The results shown here are partly based upon data generated by the TCGA Research Network (https://cancergenome.nih.gov/) and the Genotype-Tissue Expression (GTEx) Project. The authors thank Yuichi Kodama for helpful suggestions on the International Nucleotide Sequence Database Collaboration Policy.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated, splicing-associated variants has become more important than ever. Most bioinformatics approaches for detecting splicing-associated variants require both genomic and transcriptomic data. However, few datasets provide both. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention) using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing datasets from a publicly available repository and identify 27,049 intron retention-associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. This in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge by making the most of massively accumulated, publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/).
AB - Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated, splicing-associated variants has become more important than ever. Most bioinformatics approaches for detecting splicing-associated variants require both genomic and transcriptomic data. However, few datasets provide both. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention) using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing datasets from a publicly available repository and identify 27,049 intron retention-associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. This in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge by making the most of massively accumulated, publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/).
UR - http://www.scopus.com/inward/record.url?scp=85138960958&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138960958&partnerID=8YFLogxK
U2 - 10.1038/s41467-022-32887-9
DO - 10.1038/s41467-022-32887-9
M3 - Article
C2 - 36175409
AN - SCOPUS:85138960958
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 5357
ER -