Applying a two-step cluster algorithm in traffic accident data analysis

  • Khanh Giang Le

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Ho Thi Lan Huong

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Van Manh Do

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
  • Quang Hoc Tran

    University of Transport and Communications, No 3 Cau Giay Street, Hanoi, Vietnam
Email: gianglk@utc.edu.vn
Keywords: Traffic accident, clustering algorithms, two-step cluster, k-means, geographical information system

Abstract

Cluster analysis is often the first step in organizing heterogeneous data into homogeneous groups. Choosing an effective clustering method and an appropriate number of clusters for a traffic accident dataset can be challenging. This study compares the effectiveness of the k-means and two-step cluster methods, then applies the two-step cluster method together with a geographical information system (GIS) to traffic accident data from 2015 to 2017 in Hanoi, Vietnam. First, the two-step cluster method achieved a Silhouette score of 0.563, compared with 0.341 for the k-means method; a higher Silhouette score indicates better-defined clusters. Second, the study recommends combining the two-step cluster method with GIS for analyzing traffic accident datasets. The analysis identifies five typical types of accidents in Hanoi, and the locations of each type are shown on a map, enabling traffic officials to recommend precise and urgent countermeasures. Importantly, the clustering results reveal that the two-step cluster method yields a significantly higher rate of homogeneous data within clusters than the k-means method. The study thus demonstrates that the two-step cluster method is more effective than the k-means method not only in clustering ability but also in data pre-processing. The results give authorities a more detailed understanding of typical traffic accident patterns in Hanoi, and the methods could potentially be applied to other regions.
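
The Silhouette score used above to compare the two methods can be illustrated with a minimal sketch. The function name, the toy points, and both labelings below are hypothetical and are not taken from the Hanoi dataset or from the authors' software; the sketch assumes Euclidean distance and only shows why a labeling with compact, well-separated clusters scores higher than one that mixes them.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance from p to the other members of its own cluster
        # (the sum includes dist(p, p) == 0, hence the division by n - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
poor_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
poor_score = silhouette_score(points, poor_labels)
print(f"good labeling: {good_score:.3f}, poor labeling: {poor_score:.3f}")
```

The score lies between -1 and 1 for any labeling, so values such as 0.563 and 0.341 can be compared directly across clustering methods run on the same data.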

