Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition

Shinsuke Okita, Yasue Mitsukura, Nozomu Hamada

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Visemes have been studied for automatic speech recognition, speech animation synthesis, speaker verification, and related applications. A viseme is a visually identifiable unit of utterance, the visual-domain equivalent of the phoneme in the audio domain. The classification and discrimination of visemes remain important topics. This paper focuses on the number of classification units and on a discrimination procedure for Japanese visemes. We extend the number of visemes from 6 to 9 to expand the set of words that can be represented as viseme series, and then propose hierarchical weighted discrimination using multiple discriminative analysis (MDA) to enhance discriminative ability. To verify and discuss the usefulness of our proposals, viseme discrimination and word recognition experiments were conducted. The results confirmed the validity of the proposed methods.

Original language: English
Title of host publication: Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013
Publisher: IEEE Computer Society
Pages: 62-67
Number of pages: 6
ISBN (Print): 9781479922093
DOIs
Publication status: Published - 2013
Event: 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013 - Kuala Lumpur, Malaysia
Duration: 2013 Dec 13 → 2013 Dec 15

Publication series

Name: Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013

Other

Other: 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 13/12/13 → 13/12/15

Keywords

  • image processing
  • pattern recognition
  • visemes
  • visual speech recognition

ASJC Scopus subject areas

  • Control and Systems Engineering
