TY - JOUR
T1 - Representation learning of logic words by an RNN
T2 - From word sequences to robot actions
AU - Yamada, Tatsuro
AU - Murata, Shingo
AU - Arie, Hiroaki
AU - Ogata, Tetsuya
N1 - Funding Information:
This work was supported by a Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Young Scientists (A) (No. 16H05878), a JSPS Grant-in-Aid for JSPS Research Fellow (No. 17J1058), a JST CREST Grant (No. JPMJCR15E3), and the Program for Leading Graduate Schools, “Graduate Program for Embodiment Informatics” of the Ministry of Education, Culture, Sports, Science, and Technology.
Publisher Copyright:
Copyright © 2017 Yamada, Murata, Arie and Ogata.
PY - 2017
Y1 - 2017
AB - An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions, with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with words (nouns, adjectives, and verbs) that refer directly to real-world matters. In addition to these words, the current study simultaneously deals with logic words, such as “not,” “and,” and “or.” These words do not refer directly to the real world; rather, they are logical operators that contribute to the construction of meaning in sentences. These words may be used often in human–robot communication. The current study builds a recurrent neural network model with long short-term memory units and trains it to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network’s internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot’s own current state, and that logic words are represented by the model in accordance with their functions as logical operators. Words such as “true,” “false,” and “not” work as non-linear transformations that encode orthogonal phrases into the same area of the memory cell state space. The word “and,” which required the robot to lift up both of its hands, worked as if it were a universal quantifier. The word “or,” which required action generation that appeared random, was represented as an unstable space of the network’s dynamical system.
KW - Human–robot interaction
KW - Language understanding
KW - Logic words
KW - Neural network
KW - Sequence-to-sequence learning
KW - Symbol grounding
UR - http://www.scopus.com/inward/record.url?scp=85063305469&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063305469&partnerID=8YFLogxK
U2 - 10.3389/fnbot.2017.00070
DO - 10.3389/fnbot.2017.00070
M3 - Article
AN - SCOPUS:85063305469
SN - 1662-5218
VL - 11
JO - Frontiers in Neurorobotics
JF - Frontiers in Neurorobotics
M1 - 70
ER -