Modified Weight Optimized XG Boost (MWO-XGB) for Concept Drift and Data Imbalance Problems in the Online Environment

Sagargouda S Patil; Dinesha H.A.

doi:https://doi.org/10.14445/22490183/IJETT-V70I6P232

Research Article | Open Access

Volume 70 | Issue 6 | Year 2022 | Article Id. IJETT-V70I6P232 | DOI : https://doi.org/10.14445/22490183/IJETT-V70I6P232

Received	Revised	Accepted	Published
15 Apr 2022	30 May 2022	04 Jun 2022	29 Jun 2022

Citation :

Sagargouda S Patil, Dinesha H.A., "Modified Weight Optimized XG Boost (MWO-XGB) for Concept Drift and Data Imbalance Problems in the Online Environment," International Journal of Engineering Trends and Technology (IJETT), vol. 70, no. 6, pp. 308-316, 2022. Crossref, https://doi.org/10.14445/22490183/IJETT-V70I6P232

Abstract

Many websites are now used for sharing information, connecting people, video streaming, browsing, and similar activities. These websites are accessed through links provided by the host, which normally supplies links with proper security and good content. Some sites, however, carry malicious Uniform Resource Locators (URLs) through which an attacker can access user information. When a user clicks or taps such a link or hyperlink, they are redirected to another website. The user is unaware that they are being attacked and are handing personal information to the attacker. This paper therefore uses a machine learning approach based on XGBoost to identify, classify, and remove malicious links. The proposed method, Modified Weight Optimized XGBoost (MWO-XGB), detects malicious URLs in an online environment subject to class imbalance and concept drift. The evaluation focuses on the popular NSL-KDD dataset and on social media datasets. In the experiments, MWO-XGB gives better results than existing systems such as XGBoost. The main aim of the model is to reduce malicious attacks in the online environment.
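
The abstract does not describe how MWO-XGB optimizes its weights, so the following is only a generic illustration of the two problems it names, not the authors' method. It is a minimal sketch, assuming binary labels (1 = malicious, 0 = benign), that tracks the ratio of negatives to positives over a sliding window of recent labels. A ratio of this kind is commonly supplied to XGBoost as its `scale_pos_weight` parameter to counter class imbalance, and recomputing it over a window lets it follow a drifting class balance. The class name `WindowedClassWeight` and the window sizes are hypothetical.

```python
from collections import deque


class WindowedClassWeight:
    """Track labels in a sliding window and derive a positive-class weight.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window_size: int = 1000) -> None:
        # Only the most recent `window_size` labels are kept, so the
        # weight follows the stream as the class balance drifts.
        self.labels = deque(maxlen=window_size)

    def update(self, label: int) -> None:
        """Record one label: 1 = malicious, 0 = benign."""
        self.labels.append(label)

    def pos_weight(self) -> float:
        """Return negatives / positives over the current window.

        Falls back to 1.0 when the window holds no positive examples.
        """
        positives = sum(self.labels)
        negatives = len(self.labels) - positives
        return negatives / positives if positives else 1.0


tracker = WindowedClassWeight(window_size=10)
for y in [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]:  # 8 benign, 2 malicious
    tracker.update(y)
weight_before = tracker.pos_weight()

for y in [1, 1, 1]:  # the stream drifts toward more malicious links
    tracker.update(y)
weight_after = tracker.pos_weight()
```

In an online setting, a weight like this would be recomputed before each periodic retraining of the classifier. How often to retrain and how large to make the window are design choices the abstract leaves open.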

Keywords

Malicious URL, MWO-XGBoost, NSL-KDD, Attack.
