Impact Factor:0.0
DOI number:10.1016/j.jag.2024.103918
Journal:International Journal of Applied Earth Observation and Geoinformation
Abstract:Fusing multi-modal information from visible (VIS) and thermal infrared (TIR) images is crucial for object detection that adapts fully to varied lighting conditions. However, existing models usually treat VIS and TIR images as independent information and extract the corresponding features with separate networks, owing to the scarcity of training data with labeled instances from registered VIS and TIR image pairs. To fill this gap, a novel Multi-Modal Fusion NETwork (M2FNet) based on the Transformer architecture is proposed in this paper, which contains two effective modules: the Union-Modal Attention (UMA) and the Cross-Modal Attention (CMA). The UMA module aggregates multi-spectral features from VIS and TIR images and then extracts multi-modal features via a convolutional neural network (CNN) backbone. The CMA module is designed to learn cross-attention features from pairwise VIS and TIR features using the Transformer architecture. Evaluation with the mean average precision (mAP) metric shows that M2FNet outperforms the baseline methods trained using only VIS or only TIR images by 10.71 % and 2.97 %, respectively. M2FNet also achieves higher mAP than existing multi-modal methods on two public datasets. Sensitivity analysis over eight illumination thresholds shows that M2FNet is robust under varied illumination conditions, with a maximum increase in accuracy of 25.6 %. Moreover, the method is applied to a new testing dataset, VI2DA (Visible-Infrared paired Video and Image DAtaset), acquired by diverse sensors and platforms to test the generalization ability of object detectors; the dataset will be publicly available at https://github.com/TIR-OD/Datasets.
Indexed by:Journal paper
Document Code:103918
Discipline:Natural Science
First-Level Discipline:Geography
Document Type:J
Volume:130
Page Number:103918
Translation or Not:no
Included Journals:SCI
Links to published journals:https://www.sciencedirect.com/science/article/pii/S1569843224002723
First Author:Chenchen Jiang
Correspondence Author:Huazhong Ren
All the Authors:Hong Yang,Hongtao Huo,Pengfei Zhu,Zhaoyuan Yao,Jing Li,Min Sun,Shihao Yang
Date of Publication:2024-05-23
任华忠
Date of Birth: 1985-10-05
Gender: Male
Education Level: Doctorate (with certificate of graduation)
Administrative Position: Associate Professor with Tenure
Alma Mater: Beijing Normal University
Paper Publications
M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images