Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation

Kei Shibasaki, Masaaki Ikehara

Research output: Article › peer-review

Abstract

Pose Guided Person Image Generation (PGPIG) is the task of generating an image of a person in a target pose, given an image in a source pose together with the source and target pose information. Many existing PGPIG techniques require extra pose-related data or auxiliary tasks, which limits their applicability. In addition, they typically use CNNs as feature extractors; CNNs lack long-range dependencies, extracting features only from neighboring pixels, and therefore cannot account for the consistency of the whole image. This paper introduces a PGPIG network that addresses these challenges by incorporating modules built on Axial Transformers, which have wide receptive fields. The proposed approach disentangles the PGPIG task into two subtasks: 'rough pose transformation' and 'detailed texture generation.' In 'rough pose transformation,' lower-resolution feature maps are processed by Axial Transformer-based blocks. These blocks employ an encoder-decoder structure, which allows the network to exploit the pose information effectively and improves the stability and performance of training. The latter subtask employs a CNN with Adaptive Instance Normalization. Experimental results show that the proposed method is competitive with existing methods: it achieves the lowest LPIPS on the DeepFashion dataset and the lowest FID on the Market-1501 dataset. Remarkably, despite these results, the proposed network has significantly fewer parameters than existing methods.
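The abstract names two generic building blocks: axial self-attention for the 'rough pose transformation' stage and Adaptive Instance Normalization (AdaIN) for the 'detailed texture generation' stage. The paper's own implementation is not reproduced here; the minimal PyTorch sketch below only illustrates those two standard operations, and every class, function, and parameter name in it is hypothetical rather than taken from the paper.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Self-attention applied along one spatial axis at a time.

    Attending first along rows, then along columns, gives every position
    an image-wide receptive field at O(HW(H+W)) cost rather than the
    O((HW)^2) cost of full 2-D self-attention.
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Width axis: treat each of the B*H rows as a length-W sequence.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Height axis: treat each of the B*W columns as a length-H sequence.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)

def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization: replace the per-channel mean and
    std of the content features with externally supplied statistics."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True) + eps
    return style_std * (content - mean) / std + style_mean
```

Under this reading, the low-resolution stage would stack AxialAttention blocks inside an encoder-decoder, while the high-resolution CNN stage would inject appearance statistics through adain; how those statistics are predicted from the source image is specific to the paper and not shown here.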

Original language: English
Pages (from-to): 146054-146064
Number of pages: 11
Journal: IEEE Access
Volume: 11
DOI
Publication status: Published - 2023

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
