Urban traffic flow prediction and traffic state identification using CatBoost with SHAP-based analysis
Email:
hoanth@utc.edu.vn
Keywords:
Traffic prediction, CatBoost, machine learning, time series analysis, intelligent transportation systems, SHAP.
Abstract
Traffic flow forecasting is a critical component of intelligent transportation systems: it supports traffic management, helps reduce congestion, and improves the operational efficiency of urban road networks. The problem is challenging because traffic data vary over time and are influenced by many interacting factors. In this study, we propose a traffic flow forecasting method based on the CatBoost algorithm that exploits tabular data collected from traffic sensors. The dataset consists of 2,976 records containing temporal information and vehicle counts in four categories (cars, motorcycles, buses, and trucks). In addition to the original features, the study constructs supplementary temporal and time-series features (Hour, DayOfWeek, IsWeekend, Total_lag1, and Total_roll3) to help the model capture trends in traffic flow variation. On this basis, two independent machine learning tasks are defined: (i) total traffic flow forecasting as a regression problem, and (ii) classification of traffic conditions into four levels. Experimental results show that the proposed model achieves strong predictive performance on both tasks. Furthermore, feature importance analysis using SHAP shows that the vehicle-count variables, particularly CarCount and BusCount, have a significant impact on the predictions. The study indicates that CatBoost is an effective approach to traffic flow forecasting with tabular data and has strong potential for application in intelligent traffic management systems.
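As an illustration of the feature construction described in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It rests on several assumptions that the abstract does not state: Total is taken to be the sum of the four vehicle counts, Total_lag1 is the total of the preceding record, Total_roll3 is the mean of the three preceding totals, and the column names BikeCount and TruckCount as well as the 15-minute spacing of the sample records are hypothetical (only CarCount and BusCount are named in the abstract).

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical column names; only CarCount and BusCount appear in the abstract.
COUNT_COLUMNS = ("CarCount", "BikeCount", "BusCount", "TruckCount")


def build_features(records):
    """Derive temporal and lag features from time-ordered sensor records.

    Each record is a dict with a 'Time' datetime and the four count columns.
    Assumed definitions: Total = sum of the four counts, Total_lag1 = total of
    the previous record, Total_roll3 = mean of the three previous totals.
    """
    totals = [sum(r[c] for c in COUNT_COLUMNS) for r in records]
    rows = []
    for i, record in enumerate(records):
        ts = record["Time"]
        previous = totals[max(0, i - 3):i]  # up to three preceding totals
        rows.append({
            "Hour": ts.hour,
            "DayOfWeek": ts.weekday(),            # Monday = 0, Sunday = 6
            "IsWeekend": int(ts.weekday() >= 5),  # Saturday or Sunday
            # Lag features are left as None until enough history exists.
            "Total_lag1": totals[i - 1] if i >= 1 else None,
            "Total_roll3": mean(previous) if len(previous) == 3 else None,
            "Total": totals[i],
        })
    return rows


# Made-up sample: four records at an assumed 15-minute spacing,
# starting on Saturday 2024-01-06 at 23:30.
start = datetime(2024, 1, 6, 23, 30)
counts = [(10, 5, 2, 3), (14, 6, 2, 2), (9, 4, 1, 2), (11, 5, 2, 4)]
sample = [
    {"Time": start + timedelta(minutes=15 * i), **dict(zip(COUNT_COLUMNS, c))}
    for i, c in enumerate(counts)
]
rows = build_features(sample)
```

Using only preceding records for Total_lag1 and Total_roll3 keeps the current total out of its own predictors, which matters if Total is the regression target. Rows with None values would need to be dropped or imputed before being passed to a model.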
Received
15/03/2026
Revised
29/04/2026
Accepted
30/04/2026
Published
15/05/2026
Section
Scientific works
How to cite
Pham Thi, L., Nguyen Duc, D., & Nguyen Thi Hong, H. (2026). Urban traffic flow prediction and traffic state identification using CatBoost with SHAP-based analysis. Tạp Chí Khoa Học Giao Thông Vận Tải, 77(4), 571-582. https://doi.org/10.47869/tcsj.77.4.17





