Enhancing crack segmentation in fused RGB-IR images with CSWin transformer and semantic feature pyramid network

  • Nguyen Ngoc Long

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Vu Manh Trung

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Phung Ngoc Hung

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Nguyen Dan Le

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Nguyen Ngoc Lan

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
Email: nngoclan@utc.edu.vn
Keywords: Crack segmentation, computer vision, cross-shaped window attention, semantic feature pyramid network, asphalt pavement.

Abstract

Surface crack segmentation is a critical task in structural health monitoring (SHM), because cracks are early indicators of structural deterioration and safety risks. Deep learning-based computer vision has recently become a dominant approach for automating defect detection, gradually replacing manual inspection. However, conventional convolutional neural networks (CNNs) often struggle to capture long-range dependencies. To address this limitation, this paper introduces CSWin-Semantic FPN, a model that combines a transformer backbone using cross-shaped window (CSWin) attention with a semantic feature pyramid network (Semantic FPN) to improve feature extraction. The study uses a multi-modal fusion dataset that combines optical (RGB) and thermal infrared (IR) images collected in real-world environments. This fusion enhances crack signals against complex backgrounds, which facilitates more effective model training. Experimental results show that CSWin-Semantic FPN achieves an intersection over union (IoU) of 70.53%, outperforming ResUNet (59.44%), SwinUNet (57.91%), and UNet (51.79%). These findings support the potential of hybrid transformer architectures combined with multi-modal data for reliable, automated SHM solutions.
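
For readers unfamiliar with the reported metric, the following is a minimal sketch of how intersection over union can be computed for a single pair of binary crack masks. It is an illustration only, not the authors' evaluation code: the function name `binary_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions, and the abstract does not state whether IoU is averaged per image or accumulated over the whole test set.

```python
import numpy as np


def binary_iou(pred, target):
    """IoU between two binary masks (nonzero = crack pixel).

    Hypothetical helper for illustration; returns 1.0 when both masks
    are empty, which is one common convention among several.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(intersection / union)


# Toy 3x4 masks: 4 predicted crack pixels, 4 labelled crack pixels,
# 3 of which overlap, so the union covers 5 pixels.
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 0]])
target = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 1, 1, 0]])

print(binary_iou(pred, target))  # 3 / 5 = 0.6
```

Because cracks occupy only a small fraction of the pixels in a pavement image, IoU on the crack class is a stricter measure than pixel accuracy, which would be dominated by the background.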

Received: 21/01/2026
Revised: 11/03/2026
Accepted: 17/03/2026
Published: 15/05/2026
Section: Scientific research articles