Statistical and multivariate analysis of the IoT-23 dataset: a comprehensive approach to network traffic pattern discovery

Ghani, Humera, Salekzamankhani, Shahram and Virdee, Bal Singh (2025) Statistical and multivariate analysis of the IoT-23 dataset: a comprehensive approach to network traffic pattern discovery. Journal of Cybersecurity and Privacy, 5 (4) (112). pp. 1-22. ISSN 2624-800X

Abstract

The rapid expansion of Internet of Things (IoT) technologies has introduced significant challenges in understanding the complexity and structure of network traffic data, an understanding that is essential for developing effective cybersecurity solutions. This research presents a comprehensive statistical and multivariate analysis of the IoT-23 dataset to identify meaningful network traffic patterns and to assess how well various analytical methods serve IoT security research. The study applies descriptive statistics, inferential analysis, and multivariate techniques, including Principal Component Analysis (PCA), DBSCAN clustering, and factor analysis (FA), to the publicly available IoT-23 dataset. Descriptive analysis reveals clear evidence of non-normal distributions: for example, the features src_bytes, dst_bytes, and src_pkts have skewness values of −4.21, −3.87, and −2.98 and kurtosis values of 38.45, 29.67, and 18.23, respectively. These values indicate highly skewed, heavy-tailed distributions with frequent outliers. Correlation analysis reveals a strong positive correlation (0.97) between orig_bytes and resp_bytes and a strong negative correlation (−0.76) between duration and resp_bytes, while inferential statistics indicate that linear regression best models the relationships in the data. Key findings show that PCA is highly effective: it captures 99% of the dataset’s variance and enables substantial dimensionality reduction. DBSCAN clustering identifies six distinct clusters, highlighting the diversity of network traffic behaviors within IoT environments. In contrast, FA explains only 11.63% of the variance, indicating limited suitability for this dataset. These results establish benchmarks for future IoT cybersecurity research and show that PCA and DBSCAN are more effective than FA for analyzing complex IoT network traffic data.
The findings offer practical guidance for researchers in selecting appropriate statistical methods for IoT dataset analysis, ultimately supporting the development of more robust cybersecurity solutions.
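The descriptive, correlation, and PCA figures reported in the abstract rest on a few standard formulas. The following is a minimal, stdlib-only Python sketch of those computations, not the authors' code. It assumes moment-based (biased) estimators for skewness and excess kurtosis (the abstract does not state which estimator or kurtosis convention the study used), Pearson's r for correlation, and PCA explained-variance ratios taken from the eigenvalues of the sample covariance matrix, found here with cyclic Jacobi rotations. All function names and the `threshold` parameter are hypothetical, and whether the study standardized features before PCA is not stated in the abstract.

```python
import math
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def skewness(xs: Sequence[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (biased estimator)."""
    mu = mean(xs)
    m2 = mean([(x - mu) ** 2 for x in xs])
    m3 = mean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    mu = mean(xs)
    m2 = mean([(x - mu) ** 2 for x in xs])
    m4 = mean([(x - mu) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def covariance_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) of a row-per-observation table."""
    n, d = len(rows), len(rows[0])
    mus = [mean([r[j] for r in rows]) for j in range(d)]
    return [[sum((r[i] - mus[i]) * (r[j] - mus[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def symmetric_eigenvalues(a: List[List[float]], max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations A <- J^T A J."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * sum(a[i][i] ** 2 for i in range(n)):
            break  # off-diagonal mass is negligible: diagonal holds the eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def explained_variance_ratios(rows: List[List[float]]) -> List[float]:
    """Share of total variance carried by each principal component, largest first."""
    eig = sorted((max(v, 0.0) for v in symmetric_eigenvalues(covariance_matrix(rows))),
                 reverse=True)  # clip tiny negative round-off to zero
    total = sum(eig)
    return [v / total for v in eig]


def components_for(ratios: Sequence[float], threshold: float = 0.99) -> int:
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= threshold:
            return k
    return len(ratios)
```

Applied to a numeric feature table (for instance, IoT-23 columns such as orig_bytes and resp_bytes loaded as rows of floats), `components_for(explained_variance_ratios(rows))` gives the number of components needed to retain 99% of the variance. Because covariance-based PCA is scale-sensitive, one large-magnitude byte-count feature can dominate the leading component unless the features are standardized first.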

Documents
jcp-05-00112.pdf - Published Version
Available under License Creative Commons Attribution 4.0.
