
Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation

  • Cheng Chen
  • Jiang Liu
  • Liao Yuan Zeng
  • Fang Duan
  • Sean McGrath
  • Tian Dan

Research output: Contribution to conference › Paper › peer-review

Abstract

In 3D human pose estimation, the effective use of temporal and spatial information is key. Transformers have shown considerable potential in this field. However, existing models often use basic temporal position embeddings, which restrict their ability to fully exploit temporal information. Additionally, while human body information such as bone lengths is known in some cases, current networks do not incorporate this prior information, which limits estimation accuracy. To address these issues, we propose a transformer-based network for 3D human pose estimation that uses cross-attention with Rotary Position Embedding (RoPE). The network combines RoPE with a window mechanism, allowing flexible inference across varying sequence lengths while maintaining strong relative position awareness. Furthermore, we introduce a bone length prior as an input to the network, together with a cross-attention module that integrates bone constraints into 3D pose estimation. Experiments show that including bone length information and using longer sequences significantly reduces estimation error and improves the continuity of pose sequences. Notably, our approach outperforms state-of-the-art methods, showcasing the benefits of incorporating bone priors and advanced position embeddings into 3D human pose estimation.
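The relative position awareness attributed to RoPE above can be illustrated with a small sketch. This is not the authors' implementation: the function names `rope` and `dot`, the base of 10000, and the toy vectors are assumptions chosen for illustration, and the window mechanism and bone-length cross-attention described in the abstract are not modeled. The sketch only shows the general RoPE property that the attention score between a rotated query and a rotated key depends on the offset between their frame indices, not on the absolute indices.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding to `vec` at position `pos`.

    Consecutive pairs (vec[i], vec[i + 1]) are rotated by an angle
    pos * base ** (-i / d), where d is the (even) embedding dimension.
    The base value is an assumption, not taken from the paper.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (hypothetical values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same frame offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))

# The two scores agree up to floating-point error.
print(abs(score_a - score_b) < 1e-9)
```

Because the score depends only on the offset, a model trained on one sequence length can, in principle, be evaluated on windows of a different length without relearning absolute position embeddings, which is consistent with the flexibility the abstract describes.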

Original language: English
Publication status: Published - 2024
Event: 35th British Machine Vision Conference, BMVC 2024 - Glasgow, United Kingdom
Duration: 25 Nov 2024 – 28 Nov 2024

Conference

Conference: 35th British Machine Vision Conference, BMVC 2024
Country/Territory: United Kingdom
City: Glasgow
Period: 25/11/24 – 28/11/24
