Huazhong Ren (任华忠): Chinese Homepage, Publications

Huazhong Ren, Research Professor

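Research Professor, tenured Associate Professor, and doctoral supervisor at the Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University. Main research area: thermal infrared remote sensing, including methods for retrieving land surface temperature and emissivity from multi-source thermal infrared data, angular normalization of land surface temperature, object recognition in infrared remote sensing imagery, and thermal anomaly monitoring. Recipient of the Excellent Young Scientists Fund of the National Natural Science Foundation of China; awarded the University GIS Innovation Figure Award (2022), the Li Xiaowen Youth Award for Remote Sensing Science (2019), the Beijing Nova Program (2017), and the Remote Sensing Youth Science and Technology Talent Innovation Program of the National Remote Sensing Center of China, Ministry of Science and Technology (2016); selected as a Chinese Academy of Sciences / US National Academy of Sciences New Leader in Space Science (2019); serves as a member of the Scientific Committee of the International Symposium on Recent Advances in Quantitative Remote Sensing, China Remote Sensing Application...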


M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images

Posted: 2024-06-06

DOI: 10.1016/j.jag.2024.103918
Journal: International Journal of Applied Earth Observation and Geoinformation
Abstract: Fusing multi-modal information from visible (VIS) and thermal infrared (TIR) images is crucial for object detection that fully adapts to varied lighting conditions. However, because training data with labeled instances in registered VIS and TIR image pairs are scarce, existing models usually treat VIS and TIR images as independent information and extract their features with separate networks. To fill this gap, this paper proposes a novel Multi-Modal Fusion NETwork (M2FNet) based on the Transformer architecture, which contains two effective modules: Union-Modal Attention (UMA) and Cross-Modal Attention (CMA). The UMA module aggregates multi-spectral features from VIS and TIR images and then extracts multi-modal features with a convolutional neural network (CNN) backbone. The CMA module learns cross-attention features from pairwise VIS and TIR features using the Transformer architecture. In terms of mean average precision (mAP), M2FNet outperforms baseline methods trained only on VIS or only on TIR images by 10.71% and 2.97%, respectively. M2FNet also achieves higher mAP than existing multi-modal methods on two public datasets. A sensitivity analysis over eight illumination thresholds shows that M2FNet performs robustly under varied illumination conditions, with a maximum accuracy gain of 25.6%. Finally, the method is applied to a new test dataset, VI2DA (Visible-Infrared paired Video and Image DAtaset), collected by diverse sensors and platforms to test the generalization ability of object detectors; the dataset will be made publicly available at https://github.com/TIR-OD/Datasets.
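Paper type: Journal article
Article number: 103918
Discipline category: Science
First-level discipline: Geography
Document type: J (journal article)
Volume: 130
Page range: 103918
Translated work: No
Indexed in: SCI
Journal article link: https://www.sciencedirect.com/science/article/pii/S1569843224002723
First author: Chenchen Jiang
Corresponding author: Huazhong Ren
Co-authors: Hong Yang, Hongtao Huo, Pengfei Zhu, Zhaoyuan Yao, Jing Li, Min Sun, Shihao Yang
Publication date: 2024-05-23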
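The abstract says the CMA module learns cross-attention between paired VIS and TIR features using a Transformer. As a rough illustration of that general idea only (this page does not include the authors' M2FNet code), the sketch below implements standard scaled dot-product cross-attention in plain Python: each modality's tokens act as queries against the other modality's keys and values, and the two results are concatenated per token. The function names (`cross_attention`, `bidirectional_fusion`), the random projection weights, the token and channel sizes, and the concatenation step are all hypothetical assumptions; a trained model would learn its projections and would typically add multiple heads, residual connections, and a tensor library.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: queries come from one modality,
    keys and values come from the other."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(w_k[0]))  # sqrt of the key dimension
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)  # each row sums to 1
    return matmul(weights, v), weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def bidirectional_fusion(vis: Matrix, tir: Matrix, d_model: int,
                         rng: random.Random) -> Matrix:
    """Hypothetical fusion step: VIS attends to TIR, TIR attends to VIS,
    and the two outputs are concatenated token by token.
    Assumes both modalities share the same token grid and channel count;
    random weights stand in for learned projections."""
    channels = len(vis[0])
    weights: Dict[str, Dict[str, Matrix]] = {
        direction: {name: random_matrix(channels, d_model, rng) for name in ("q", "k", "v")}
        for direction in ("vis_to_tir", "tir_to_vis")
    }
    a = weights["vis_to_tir"]
    b = weights["tir_to_vis"]
    vis_ctx, _ = cross_attention(vis, tir, a["q"], a["k"], a["v"])
    tir_ctx, _ = cross_attention(tir, vis, b["q"], b["k"], b["v"])
    return [x + y for x, y in zip(vis_ctx, tir_ctx)]


if __name__ == "__main__":
    rng = random.Random(0)
    vis_tokens = random_matrix(4, 8, rng)  # e.g. a 2x2 feature map with 8 channels
    tir_tokens = random_matrix(4, 8, rng)
    fused = bidirectional_fusion(vis_tokens, tir_tokens, d_model=6, rng=rng)
    print(len(fused), len(fused[0]))  # 4 12
```

Using separate projections for each direction and concatenating the outputs is only one simple design choice; summation or a learned gate would be equally plausible, and this page does not state which fusion M2FNet actually uses.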

Attachment:

1. 蒋晨琛_JAG_2024.pdf