TY - GEN
T1 - Revisiting Modality Imbalance In Multimodal Pedestrian Detection
AU - Das, Arindam
AU - Das, Sudip
AU - Sistu, Ganesh
AU - Horgan, Jonathan
AU - Bhattacharya, Ujjwal
AU - Jones, Edward
AU - Glavin, Martin
AU - Eising, Ciarán
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Multimodal learning, particularly for pedestrian detection, has recently received attention owing to its ability to function equally well in several critical autonomous driving scenarios such as low-light, night-time, and adverse weather conditions. However, in most cases, the training distribution largely emphasizes the contribution of one specific input, which biases the network towards one modality. Hence, the generalization of such models becomes a significant problem when the input modality that was non-dominant during training contributes more at inference. Here, we introduce a novel training setup with a regularizer in the multimodal architecture to resolve this disparity between the modalities. Specifically, our regularizer term makes the feature fusion method more robust by treating both feature extractors as equally important during training when extracting the multimodal distribution, which we refer to as removing the imbalance problem. Furthermore, our concept of decoupling the output stream helps the detection task by mutually sharing spatially sensitive information. Extensive experiments with the proposed method on the KAIST and UTokyo datasets show improvement over the respective state-of-the-art performance.
KW - Modality Imbalance
KW - Multimodal Feature Fusion
KW - Multimodal Learning
KW - Pedestrian Detection
UR - http://www.scopus.com/inward/record.url?scp=85180744595&partnerID=8YFLogxK
U2 - 10.1109/ICIP49359.2023.10222711
DO - 10.1109/ICIP49359.2023.10222711
M3 - Conference contribution
AN - SCOPUS:85180744595
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1755
EP - 1759
BT - 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Image Processing, ICIP 2023
Y2 - 8 October 2023 through 11 October 2023
ER -