Identifying Optimal Feature Set for Improved Autism Classification Using Machine Learning Techniques

	© 2024 by IJETT Journal
	Volume-72 Issue-4
	Year of Publication : 2024
	Author : Karpagam C, Deepa C
	DOI : 10.14445/22315381/IJETT-V72I4P131

How to Cite?

Karpagam C, Deepa C, "Identifying Optimal Feature Set for Improved Autism Classification Using Machine Learning Techniques," International Journal of Engineering Trends and Technology, vol. 72, no. 4, pp. 306-314, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I4P131

Abstract
Administering standard clinical diagnostic tools for Autism Spectrum Disorder (ASD) is time-consuming. Furthermore, only a trained and experienced professional can supervise the assessment. Failing to evaluate ASD at the right time leads to substantial medical care costs and strongly affects an individual's performance in regular activities. More flexible and readily accessible methods would help parents and caretakers overcome the hurdles of conventional clinical diagnosis. Our previous work presented an exploratory data analysis of the autism dataset. This work extends it by building a model that combines Machine Learning classification algorithms on selected feature sets to improve accuracy. The autism dataset used in the experiment was collected from a public repository and includes 1054 instances. RFE (Recursive Feature Elimination) and the Boruta method are used to identify the highest-ranked relevant features. Accuracy improves significantly when a Random Forest (RF) is ensembled with a Support Vector Machine (SVM), reaching 98.97% on the toddler dataset. The resulting model maximizes accuracy and reduces the effort practitioners spend on the extensive diagnostic process.
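The two feature-selection steps named in the abstract can be illustrated with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: it runs RFE wrapped around a Random Forest, plus a simplified Boruta-style test that compares each feature's importance with that of randomly permuted "shadow" copies. The data are synthetic (1054 rows to mirror the dataset size, 15 columns of which only the first five carry signal), and the helper names `rfe_select` and `boruta_like_select`, the tree counts, and the majority-vote rule are all assumptions. The original Boruta algorithm applies a statistical test to the hit counts rather than a plain majority vote.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


def rfe_select(X, y, n_features, seed=0):
    """Recursively drop the least important feature (per a Random Forest) until n_features remain."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    selector = RFE(forest, n_features_to_select=n_features, step=1).fit(X, y)
    return np.flatnonzero(selector.support_)  # indices of the kept columns


def boruta_like_select(X, y, n_iter=10, seed=0):
    """Hypothetical simplified Boruta: keep features that beat the best shadow feature in most rounds.

    Real Boruta applies a statistical test to the hit counts; a plain majority vote is used here.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: every column shuffled independently, which breaks any link to y.
        shadow = np.apply_along_axis(rng.permutation, 0, X)
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + i)
        forest.fit(np.hstack([X, shadow]), y)
        importance = forest.feature_importances_
        # A "hit" is a real feature that is more important than every shadow feature.
        hits += importance[:n_features] > importance[n_features:].max()
    return np.flatnonzero(hits > n_iter / 2)


# Synthetic stand-in data (assumption): 1054 rows; only columns 0-4 carry signal.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(1054, 15))
y = (X[:, :5].sum(axis=1) + 0.5 * data_rng.normal(size=1054) > 0).astype(int)

rfe_features = rfe_select(X, y, n_features=5)
boruta_features = boruta_like_select(X, y)
print("RFE keeps columns:", rfe_features)
print("Boruta-like test keeps columns:", boruta_features)
```

RFE ranks features by the wrapped estimator's `feature_importances_` (or `coef_` for linear models), so swapping in a different estimator changes the ranking criterion.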

Keywords
Autism Detection, Boruta, Random Forest, Recursive Feature Elimination, Support Vector Machine.
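The abstract does not state how the RF and SVM are combined. One common option, assumed in the sketch below, is soft voting with scikit-learn's `VotingClassifier`, which averages the two models' predicted class probabilities. The data are again synthetic, `build_rf_svm_ensemble` is a hypothetical helper, and the hyperparameters are placeholders; the printed accuracy says nothing about the 98.97% reported for the toddler dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_rf_svm_ensemble(seed=0):
    """Hypothetical helper: soft-voting ensemble that averages RF and SVM class probabilities."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    # SVMs are scale-sensitive, so standardize inputs inside the SVM branch only.
    # probability=True is required for soft voting (predict_proba via Platt scaling).
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=seed))
    return VotingClassifier(estimators=[("rf", rf), ("svm", svm)], voting="soft")


# Synthetic stand-in data (assumption): 1054 rows, 5 columns treated as an already-selected feature set.
rng = np.random.default_rng(0)
X_selected = rng.normal(size=(1054, 5))
y = (X_selected.sum(axis=1) + 0.5 * rng.normal(size=1054) > 0).astype(int)

# Stratified 5-fold cross-validated accuracy of the combined model.
scores = cross_val_score(build_rf_svm_ensemble(), X_selected, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Hard voting (`voting="hard"`) would instead take a majority of predicted labels, which with only two models ties whenever they disagree; averaging probabilities avoids that.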
