Transformer Networks for Future Person Localization in First-Person Videos

Amar Alikadic, Hideo Saito, Ryo Hachiuma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Reliable and accurate forecasting of pedestrians’ future trajectories is necessary for systems such as autonomous vehicles and visual assistive devices to function correctly. Previous state-of-the-art methods relied on modeling social interactions with LSTMs, using videos captured by a static camera from a bird’s-eye view. Our paper presents a new method that leverages the Transformer architecture and offers a reliable way to model future trajectories in first-person videos captured by a body-mounted camera, without having to model any social interactions. Accurately forecasting future trajectories is challenging, mainly because human movement is unpredictable. We tackle this issue by using information about target persons’ previous locations, scales, and dynamic poses, as well as information about the camera wearer’s ego-motion. The proposed model predicts each target’s trajectory separately, without modeling complex social interactions between humans or interactions between targets and the scene. Experimental results show that our method overall outperforms previous state-of-the-art methods and yields better results in challenging situations where those methods fail.
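
The inputs named in the abstract (a target's previous locations, scales, and poses, plus the camera wearer's ego-motion) and the idea of treating each target separately can be illustrated with a small sketch. This is not the authors' implementation. The feature layout, the function names `build_frame_features` and `self_attention`, the identity query/key/value projections, and all toy numbers are assumptions made purely for illustration; a real Transformer layer would use learned projections, multiple heads, positional encodings, and a decoder or regression head for the future frames.

```python
import math

def build_frame_features(location, scale, pose, ego_motion):
    """Concatenate the per-frame cues for ONE target into a flat vector.
    The layout (2-D location, scalar scale, 2-D pose, 2-D ego-motion)
    is a hypothetical toy layout, not the one used in the paper."""
    return list(location) + [scale] + list(pose) + list(ego_motion)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(sequence):
    """Single-head scaled dot-product self-attention over one target's
    history, with identity Q/K/V projections (an assumption for brevity)."""
    d = len(sequence[0])
    attended = []
    for query in sequence:
        # Similarity of this frame to every frame in the same history.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in sequence
        ]
        weights = softmax(scores)
        # Weighted average of the frame features.
        attended.append([
            sum(w * value[i] for w, value in zip(weights, sequence))
            for i in range(d)
        ])
    return attended

# Toy history of three past frames for a single target (made-up values).
history = [
    build_frame_features((0.10, 0.50), 0.20, (0.00, 0.10), (0.01, 0.00)),
    build_frame_features((0.12, 0.50), 0.22, (0.05, 0.10), (0.01, 0.01)),
    build_frame_features((0.15, 0.51), 0.25, (0.10, 0.05), (0.02, 0.01)),
]

# Only this target's own frames attend to each other; no other person
# or scene feature enters the computation.
attended = self_attention(history)
print(len(attended), len(attended[0]))
```

Because the attention runs only over one target's own history, no social-interaction term appears anywhere in the computation, which mirrors the per-target modeling described above.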

Original language: English
Title of host publication: Advances in Visual Computing - 17th International Symposium, ISVC 2022, Proceedings
Editors: George Bebis, Bo Li, Angela Yao, Yang Liu, Ye Duan, Manfred Lau, Rajiv Khadka, Ana Crisan, Remco Chang
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (Print): 9783031207150
Publication status: Published - 2022
Event: 17th International Symposium on Visual Computing, ISVC 2022 - San Diego, United States
Duration: 2022 Oct 3 → 2022 Oct 5

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13599 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 17th International Symposium on Visual Computing, ISVC 2022
Country/Territory: United States
City: San Diego


Keywords

  • Future person localization
  • Trajectory forecasting
  • Transformer networks

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

