TY - GEN
T1 - Music Source Separation with Generative Adversarial Network and Waveform Averaging
AU - Tanabe, Ryosuke
AU - Ichikawa, Yuto
AU - Fujisawa, Takanori
AU - Ikehara, Masaaki
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - The task of music source separation is to extract a target sound from a mixture. A popular approach to this task uses a DNN that learns the relationship between the spectrum of the mixed sound and that of the separated sound. However, many DNN algorithms do not consider the clearness of the output sound, which tends to produce artifacts in the output spectrum. We adopt a generative adversarial network (GAN) to improve the clearness of the separated sound. In addition, we propose data augmentation by pitch-shifting. The performance of a DNN strongly depends on the quantity of its training data; a limited variety of training data gives the network poor knowledge of unknown sound sources. Learning pitch-shifted signals compensates for this limited variety and makes the network robust in estimating sound spectra with various pitches. Furthermore, we separate the pitch-shifted signals and average the results to reduce artifacts. This proposal is based on the idea that the trained network can also separate pitch-shifted sound sources, not only the original ones. Compared with the conventional method, our method obtains well-separated signals with smaller artifacts.
AB - The task of music source separation is to extract a target sound from a mixture. A popular approach to this task uses a DNN that learns the relationship between the spectrum of the mixed sound and that of the separated sound. However, many DNN algorithms do not consider the clearness of the output sound, which tends to produce artifacts in the output spectrum. We adopt a generative adversarial network (GAN) to improve the clearness of the separated sound. In addition, we propose data augmentation by pitch-shifting. The performance of a DNN strongly depends on the quantity of its training data; a limited variety of training data gives the network poor knowledge of unknown sound sources. Learning pitch-shifted signals compensates for this limited variety and makes the network robust in estimating sound spectra with various pitches. Furthermore, we separate the pitch-shifted signals and average the results to reduce artifacts. This proposal is based on the idea that the trained network can also separate pitch-shifted sound sources, not only the original ones. Compared with the conventional method, our method obtains well-separated signals with smaller artifacts.
UR - http://www.scopus.com/inward/record.url?scp=85083305989&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083305989&partnerID=8YFLogxK
U2 - 10.1109/IEEECONF44664.2019.9048852
DO - 10.1109/IEEECONF44664.2019.9048852
M3 - Conference contribution
AN - SCOPUS:85083305989
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 1796
EP - 1800
BT - Conference Record - 53rd Asilomar Conference on Signals, Systems and Computers, ACSSC 2019
A2 - Matthews, Michael B.
PB - IEEE Computer Society
T2 - 53rd Asilomar Conference on Signals, Systems and Computers, ACSSC 2019
Y2 - 3 November 2019 through 6 November 2019
ER -