TY - GEN
T1 - Pose-aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
AU - Shibasaki, Kei
AU - Ikehara, Masaaki
N1 - Publisher Copyright:
© 2023 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Pose Guided Person Image Generation (PGPIG) is the task of transforming the pose of a person in an image, given the source image, its pose information, and the target pose information. Most existing PGPIG methods require additional pose information or auxiliary tasks, which limits their applicability. Moreover, they use CNNs, which can only extract features from neighboring pixels and cannot account for the consistency of the entire image. This paper proposes a PGPIG network that addresses these problems by using a module containing Axial Transformers with a large receptive field. The proposed method disentangles the PGPIG task into two subtasks: “rough pose transformation” and “detailed texture generation”. In the former, low-resolution feature maps are transformed by blocks containing Axial Transformers. The latter uses a CNN with Adaptive Instance Normalization. Experiments show that the proposed method achieves performance competitive with other state-of-the-art methods. Furthermore, despite this performance, the proposed network has significantly fewer parameters than existing methods.
AB - Pose Guided Person Image Generation (PGPIG) is the task of transforming the pose of a person in an image, given the source image, its pose information, and the target pose information. Most existing PGPIG methods require additional pose information or auxiliary tasks, which limits their applicability. Moreover, they use CNNs, which can only extract features from neighboring pixels and cannot account for the consistency of the entire image. This paper proposes a PGPIG network that addresses these problems by using a module containing Axial Transformers with a large receptive field. The proposed method disentangles the PGPIG task into two subtasks: “rough pose transformation” and “detailed texture generation”. In the former, low-resolution feature maps are transformed by blocks containing Axial Transformers. The latter uses a CNN with Adaptive Instance Normalization. Experiments show that the proposed method achieves performance competitive with other state-of-the-art methods. Furthermore, despite this performance, the proposed network has significantly fewer parameters than existing methods.
KW - Deep Learning
KW - Machine Learning
KW - Pose Guided Person Image Generation
KW - Pose Transfer
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85178355037&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178355037&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO58844.2023.10290101
DO - 10.23919/EUSIPCO58844.2023.10290101
M3 - Conference contribution
AN - SCOPUS:85178355037
T3 - European Signal Processing Conference
SP - 506
EP - 510
BT - 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 31st European Signal Processing Conference, EUSIPCO 2023
Y2 - 4 September 2023 through 8 September 2023
ER -