Prediction of COVID-19 Vaccine Side Effects using SMOTE and Ensemble Machine Learning Models

© 2024 by IJETT Journal
Volume 72, Issue 4
Year of Publication: 2024
Authors: R Vaishali, B Sarojini, D Sobya
DOI: 10.14445/22315381/IJETT-V72I4P133

How to Cite?

R Vaishali, B Sarojini, D Sobya, "Prediction of COVID-19 Vaccine Side Effects using SMOTE and Ensemble Machine Learning Models," International Journal of Engineering Trends and Technology, vol. 72, no. 4, pp. 324-332, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I4P133

Abstract
While the rapid development of vaccines during the COVID-19 pandemic helped save millions of lives, a significant percentage of the population reported adverse reactions after vaccination. Post-vaccination surveys were conducted to understand the possible side effects. Analyzing these side effects gives the relevant stakeholders a better understanding of the vaccine and allows individuals to make informed decisions about whether to receive it. This work aims to identify a robust approach for predicting potential adverse symptoms. A dataset comprising 840 participants with 15 post-vaccination symptoms was considered for the study. Synthetic Minority Oversampling Technique (SMOTE) was used to handle the class imbalance of the dataset. A combination of SMOTE and ensemble machine learning models was used to predict adverse reactions to COVID-19 vaccines. The ensemble Machine Learning (ML) models considered for this study are Random Forest, Extreme Gradient Boosting Machine (XGBoost), Light Gradient Boosting Machine, and AdaBoost. Accuracy, precision, recall, and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) were used to measure the performance of the models. The dataset was pre-processed to handle missing values, and one-hot encoding was applied to convert categorical variables into numerical format. Insights into the data distribution and the relationships between variables were gained through exploratory data analysis and correlation analysis, respectively. Class imbalance in the target variables was addressed using SMOTE, resulting in a significantly improved F1-score and ROC-AUC score. A combined performance score was calculated by averaging the F1-score and accuracy to identify the best model. XGBoost obtained the highest performance score among the ensemble ML models, and its performance was further enhanced by adjusting the classification threshold using the maximum F1-score strategy. The findings suggest that the combination of SMOTE and ensemble learning models with threshold adjustment provides a more efficient prediction of adverse effects after COVID-19 vaccination, aiding healthcare decision-making.
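
The three techniques named in the abstract (SMOTE oversampling, the combined performance score, and maximum F1-score threshold adjustment) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, all function names are hypothetical, and the labels and probabilities are invented toy values rather than study data. In practice, a library such as imbalanced-learn (for SMOTE) together with an ensemble classifier would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Core SMOTE idea: build each synthetic row by interpolating between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def predict(y_prob, threshold):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if p >= threshold else 0 for p in y_prob]


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def combined_score(y_true, y_pred):
    """Average of F1-score and accuracy, as described in the abstract."""
    return (f1_score(y_true, y_pred) + accuracy(y_true, y_pred)) / 2


def best_f1_threshold(y_true, y_prob):
    """Maximum F1-score strategy: try every distinct predicted probability
    as a cut-off and keep the one that gives the highest F1-score."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: f1_score(y_true, predict(y_prob, t)))


# Toy values for illustration only (assumed, not taken from the study).
y_true = [0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.8]

tuned = best_f1_threshold(y_true, y_prob)
print("tuned threshold:", tuned)
print("combined score at 0.5:", combined_score(y_true, predict(y_prob, 0.5)))
print("combined score at tuned:", combined_score(y_true, predict(y_prob, tuned)))

# Three toy minority rows, oversampled with four synthetic rows.
print(smote_oversample([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=4, k=2))
```

When adapting such a sketch, the threshold would usually be tuned on held-out validation data, and oversampling applied only to the training portion, so that the reported scores are not inflated by synthetic or already-seen samples.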

Keywords
COVID-19, Ensemble Machine Learning, Exploratory Data Analysis, Imbalanced Dataset, SMOTE.
