TY - JOUR
T1 - Multi-Layer Combined Frequency and Periodicity Representations for Multi-Pitch Estimation of Multi-Instrument Music
AU - Matsunaga, Tomoki
AU - Saito, Hiroaki
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024
Y1 - 2024
AB - Multi-pitch estimation (MPE) is one of the most important tasks in automatic music transcription (AMT). Since music generally involves a wide variety of instruments, MPE should be applied to multi-instrument music. The combined frequency and periodicity (CFP) approach detects pitches in multi-instrument music by comparing the frequency-domain spectrum with the quefrency-domain cepstrum. Although CFP considers the matching peak positions of the spectrum and cepstrum mapped according to the equal-tempered scale as pitches, its pitch selection method lacks sufficient rationality when dealing with stacked harmonics. In this paper, we propose an unsupervised method to effectively detect pitches from the stacked harmonics by extending CFP with partial cepstra extracted by suitably liftering the cepstrum. The frequency-domain and quefrency-domain features are multilayered by the partial cepstra; thus, the proposed method is constructed as a multi-layer CFP (ML-CFP). We compare the proposed ML-CFP with existing state-of-the-art MPE methods on one single-instrument and three multi-instrument datasets and demonstrate that ML-CFP provides the best overall performance among unsupervised methods. In addition, the generalizability of ML-CFP in terms of the degree of polyphony, duration scale, and instrument type is evaluated on another large-scale multi-instrument dataset. The results reveal the performance differences for different values of each measure with the limitations on musical properties of music signals required for ML-CFP to perform well.
KW - Automatic music transcription
KW - multi-pitch estimation
KW - music signal processing
KW - partial cepstrum
UR - http://www.scopus.com/inward/record.url?scp=85196762267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196762267&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3416730
DO - 10.1109/TASLP.2024.3416730
M3 - Article
AN - SCOPUS:85196762267
SN - 2329-9290
VL - 32
SP - 3171
EP - 3184
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
ER -