TY - GEN
T1 - Demo
T2 - 17th ACM International Conference on Mobile Systems, Applications, and Services, MobiSys 2019
AU - Katayama, Shin
AU - Mathur, Akhil
AU - Okoshi, Tadashi
AU - Nakazawa, Jin
AU - Kawsar, Fahim
N1 - Publisher Copyright:
© 2019 Copyright held by the owner/author(s).
PY - 2019/6/12
Y1 - 2019/6/12
N2 - Conversational agents are increasingly becoming digital partners in our everyday computing experiences, offering a variety of purposeful information and utility services. Although rich in competency, these agents are today entirely oblivious to their users’ situational and emotional context and incapable of adjusting their interaction style and tone accordingly. To this end, we present a first-of-its-kind situation-aware conversational agent on a kinetic earable that dynamically adjusts its conversation style, tone, and volume in response to the user’s emotional, environmental, social and activity context, gathered through speech prosody, ambient sound and motion signatures. In particular, the system is composed of the following components: • Perception Builder: This component builds an approximate view of the user’s momentary experience by sensing his/her 1) physical activity, 2) emotional state, 3) social context and 4) environmental context using different purpose-built acoustic and motion sensory models [4, 5]. • Conversation Builder: This component enables a user to interact with the agent using a predefined dialogue base; for this demo, we have used Dialogflow [1] populated with a set of situation-specific dialogues. • Affect Adapter: This component guides the adaptation strategy for the agent’s response corresponding to the user’s context, taking into account the output of the Perception Builder and a data-driven rule engine. We have devised a set of adaptation rules, grounded in multiple quantitative and qualitative studies, that describe the prosody, volume and speed used to shape the agent’s response. • Text-to-Speech Builder: This component synthesises the agent’s response in a voice that accurately reflects the user’s situation using the IBM Bluemix Voice service [2]. This synthesis process interplays various voice attributes, e.g., pitch, rate, breathiness and glottal tension, to transform the agent’s voice according to the rules of the Affect Adapter.
AB - Conversational agents are increasingly becoming digital partners in our everyday computing experiences, offering a variety of purposeful information and utility services. Although rich in competency, these agents are today entirely oblivious to their users’ situational and emotional context and incapable of adjusting their interaction style and tone accordingly. To this end, we present a first-of-its-kind situation-aware conversational agent on a kinetic earable that dynamically adjusts its conversation style, tone, and volume in response to the user’s emotional, environmental, social and activity context, gathered through speech prosody, ambient sound and motion signatures. In particular, the system is composed of the following components: • Perception Builder: This component builds an approximate view of the user’s momentary experience by sensing his/her 1) physical activity, 2) emotional state, 3) social context and 4) environmental context using different purpose-built acoustic and motion sensory models [4, 5]. • Conversation Builder: This component enables a user to interact with the agent using a predefined dialogue base; for this demo, we have used Dialogflow [1] populated with a set of situation-specific dialogues. • Affect Adapter: This component guides the adaptation strategy for the agent’s response corresponding to the user’s context, taking into account the output of the Perception Builder and a data-driven rule engine. We have devised a set of adaptation rules, grounded in multiple quantitative and qualitative studies, that describe the prosody, volume and speed used to shape the agent’s response. • Text-to-Speech Builder: This component synthesises the agent’s response in a voice that accurately reflects the user’s situation using the IBM Bluemix Voice service [2]. This synthesis process interplays various voice attributes, e.g., pitch, rate, breathiness and glottal tension, to transform the agent’s voice according to the rules of the Affect Adapter.
UR - http://www.scopus.com/inward/record.url?scp=85069186099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069186099&partnerID=8YFLogxK
U2 - 10.1145/3307334.3328569
DO - 10.1145/3307334.3328569
M3 - Conference contribution
AN - SCOPUS:85069186099
T3 - MobiSys 2019 - Proceedings of the 17th Annual International Conference on Mobile Systems, Applications, and Services
SP - 657
EP - 658
BT - MobiSys 2019 - Proceedings of the 17th Annual International Conference on Mobile Systems, Applications, and Services
PB - Association for Computing Machinery, Inc
Y2 - 17 June 2019 through 21 June 2019
ER -