Abstract
In 3D human pose estimation, the effective use of temporal and spatial information is key, and transformers have shown considerable potential in this field. However, existing models often rely on basic temporal position embeddings, which restricts their ability to fully exploit temporal information. Additionally, while prior human body information such as bone lengths is known in some cases, current networks do not incorporate it, limiting estimation accuracy. To address these issues, we propose a transformer-based network for 3D human pose estimation that uses cross-attention with Rotary Position Embedding (RoPE). The network integrates RoPE with a windowing mechanism, allowing flexible inference across varying sequence lengths while maintaining strong relative-position awareness. Furthermore, we introduce a bone-length prior as an additional input, together with a cross-attention module that integrates bone constraints into 3D pose estimation. Experiments demonstrate that incorporating bone-length information and longer sequences significantly reduces estimation errors and improves the continuity of pose sequences. Notably, the performance surpasses state-of-the-art methods, showcasing the benefits of incorporating bone priors and advanced position embedding into 3D human pose estimation.
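The relative-position property that the abstract attributes to RoPE can be illustrated with a minimal sketch (not the paper's implementation): each pair of feature channels is rotated by an angle proportional to the token's position, so dot products between rotated queries and keys depend only on the relative offset between frames. The function name `rope` and the frequency base are illustrative assumptions.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Position Embedding along a sequence (illustrative sketch).

    x: array of shape (seq_len, dim) with dim even. Channel pair (2i, 2i+1)
    at position p is rotated by angle p * base**(-i / (dim/2)), so the dot
    product of a rotated query and key depends only on their relative offset.
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) / half)            # per-pair frequencies
    angles = np.arange(seq_len)[:, None] * theta[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                      # channel pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because attention scores under this rotation are a function of the position difference alone, windows of a long sequence can be processed with a shared embedding, which is what makes variable-length inference straightforward.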
| Original language | English |
|---|---|
| Publication status | Published - 2024 |
| Event | 35th British Machine Vision Conference, BMVC 2024 - Glasgow, United Kingdom Duration: 25 Nov 2024 → 28 Nov 2024 |
Conference
| Conference | 35th British Machine Vision Conference, BMVC 2024 |
|---|---|
| Country/Territory | United Kingdom |
| City | Glasgow |
| Period | 25/11/24 → 28/11/24 |
| Title | Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation |
|---|---|