Improving Perceptual Loss Using Joint Image-Language Features for Super-Resolution

Go Ohtani, Hirokatsu Kataoka, Yoshimitsu Aoki

Research output: Article (peer-reviewed)

Abstract

Perceptual loss, computed with a VGG network pre-trained on ImageNet, has been widely employed in super-resolution tasks, enabling the generation of photo-realistic images. However, grid-like artifacts have been reported to appear frequently in the generated images. To address this problem, we consider that large-scale pre-trained models can contribute significantly to super-resolution across diverse scenes. In particular, by incorporating language, such models exhibit a strong capability to comprehend complex scenes, potentially enhancing super-resolution performance. This paper therefore proposes a new perceptual loss based on Contrastive Language-Image Pre-training (CLIP) with a Vision Transformer (ViT) backbone instead of the VGG network. The results demonstrate that the proposed perceptual loss can generate photo-realistic images without grid-like artifacts.
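Conceptually, a perceptual loss is a distance measured in the feature space of a pre-trained encoder rather than in pixel space; the proposal swaps the VGG encoder for CLIP's ViT image encoder. A minimal sketch of this idea, assuming a generic feature extractor (the `dummy_extractor` below is a hypothetical stand-in for illustration, not CLIP's actual encoder):

```python
import numpy as np

def perceptual_loss(feat_extractor, sr_img, hr_img):
    """Feature-space MSE between a super-resolved image and its ground truth.

    In the paper the extractor is a CLIP ViT image encoder; here any callable
    mapping an image array to a feature vector works.
    """
    f_sr = feat_extractor(sr_img)
    f_hr = feat_extractor(hr_img)
    return float(np.mean((f_sr - f_hr) ** 2))

# Hypothetical stand-in extractor (NOT CLIP): per-channel global average pooling.
def dummy_extractor(img):
    return img.mean(axis=(0, 1))

rng = np.random.default_rng(0)
hr = rng.random((64, 64, 3))            # "ground-truth" high-resolution image
sr = hr + 0.1 * rng.random((64, 64, 3)) # imperfect "super-resolved" output

print(perceptual_loss(dummy_extractor, hr, hr))  # identical inputs -> 0.0
print(perceptual_loss(dummy_extractor, sr, hr))  # positive for differing inputs
```

In practice the loss would be computed on intermediate ViT token features and combined with a pixel-wise reconstruction term during training.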

Translated title of the contribution: Improving Perceptual Loss with CLIP for Super-Resolution
Original language: Japanese
Pages (from-to): 217-223
Number of pages: 7
Journal: Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering
Volume: 90
Issue number: 2
DOI
Publication status: Published - 2024

Keywords

  • CLIP
  • large-scale pre-training
  • perceptual loss
  • super-resolution
  • vision transformer

ASJC Scopus subject areas

  • Mechanical Engineering
