TY - GEN
T1 - Non-monologue HMM-based speech synthesis for service robots
T2 - 2014 IEEE International Conference on Robotics and Automation, ICRA 2014
AU - Sugiura, Komei
AU - Shiga, Yoshinori
AU - Kawai, Hisashi
AU - Misu, Teruhisa
AU - Hori, Chiori
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/22
Y1 - 2014/9/22
N2 - Robot utterances generally sound monotonous, unnatural, and unfriendly because their Text-to-Speech (TTS) systems are not optimized for communication but for text-reading. Here we present a non-monologue speech synthesis for robots. We collected a speech corpus in a non-monologue style in which two professional voice talents read scripted dialogues. Hidden Markov models (HMMs) were then trained with the corpus and used for speech synthesis. We conducted experiments in which the proposed method was evaluated by 24 subjects in three scenarios: text-reading, dialogue, and domestic service robot (DSR) scenarios. In the DSR scenario, we used a physical robot and compared our proposed method with a baseline method using the standard Mean Opinion Score (MOS) criterion. Our experimental results showed that our proposed method's performance was (1) at the same level as the baseline method in the text-reading scenario and (2) exceeded it in the DSR scenario. We deployed our proposed system as a cloud-based speech synthesis service so that it can be used without any cost.
UR - http://www.scopus.com/inward/record.url?scp=84928636828&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84928636828&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2014.6907168
DO - 10.1109/ICRA.2014.6907168
M3 - Conference contribution
AN - SCOPUS:84928636828
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 2237
EP - 2242
BT - Proceedings - IEEE International Conference on Robotics and Automation
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 31 May 2014 through 7 June 2014
ER -