TY - JOUR
T1 - IMSindel
T2 - An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis
AU - Shigemizu, Daichi
AU - Miya, Fuyuki
AU - Akiyama, Shintaro
AU - Okuda, Shujiro
AU - Boroevich, Keith A.
AU - Fujimoto, Akihiro
AU - Nakagawa, Hidewaki
AU - Ozaki, Kouichi
AU - Niida, Shumpei
AU - Kanemura, Yonehiro
AU - Okamoto, Nobuhiko
AU - Saitoh, Shinji
AU - Kato, Mitsuhiro
AU - Yamasaki, Mami
AU - Matsunaga, Tatsuo
AU - Mutai, Hideki
AU - Kosaki, Kenjiro
AU - Tsunoda, Tatsuhiko
N1 - Funding Information:
This work was supported partially by The Ichiro Kanehara Foundation, Grant-in-Aid for Young Scientists (B) (Number: 16K19068) of Ministry of Education, Culture, Sports, Science and Technology, and JST CREST (Grant Number JPMJCR1412), Japan.
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/12/1
Y1 - 2018/12/1
AB - Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
UR - http://www.scopus.com/inward/record.url?scp=85044986698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044986698&partnerID=8YFLogxK
U2 - 10.1038/s41598-018-23978-z
DO - 10.1038/s41598-018-23978-z
M3 - Article
C2 - 29618752
AN - SCOPUS:85044986698
SN - 2045-2322
VL - 8
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 5608
ER -