Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis

Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

Research output: Contribution to journalArticlepeer-review

8 Citations (Scopus)

Abstract

This paper proposes a sentence selection technique for constructing phonetically and prosodically balanced compact recording scripts for speech synthesis. In the conventional corpus design of speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. However, for statistical parametric speech synthesis, balances of multiple phonetic and prosodic contextual factors are important as well as the coverage. To take account of both of the phonetic and prosodic contextual balances in sentence selection, we introduce an extended entropy of phonetic and prosodic contexts, such as biphone/triphone, accent/stress/tone, and sentence length. For detailed investigation, conventional and proposed techniques are evaluated using Japanese, English, and Chinese corpora. The objective experimental results show that the proposed technique achieves better coverage and balance of contexts. In addition, speech synthesis experiments based on hidden Markov models reveal that the generated speech parameters become closer to those of the natural speech compared with other conventional sentence selection techniques. Subjective evaluations show that the proposed sentence selection based on the extended entropy improves the naturalness of the synthetic speech while maintaining the similarity to the original sample.

Original languageEnglish
Pages (from-to)1107-1116
Number of pages10
JournalIEEE/ACM Transactions on Audio Speech and Language Processing
Volume25
Issue number5
DOIs
Publication statusPublished - 2017 May
Externally publishedYes

Keywords

  • Corpus design
  • entropy
  • hidden Markov model (HMM)
  • prosodic context
  • sentence selection
  • statistical parametric speech synthesis

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis'. Together they form a unique fingerprint.

Cite this