AQUAPHISH: Leveraging Metaheuristics and Automated Machine Learning for Precision Phishing Detection
Ph.D. in Financial Markets

  uma.shankar@uniq.edu.iq


Abstract

Phishing is a persistent and evolving cybersecurity threat that exploits user trust to capture sensitive data through fraudulent websites. Conventional detection systems typically rely on binary classification and static features, which makes them slow to adapt to new attack patterns. This paper presents a robust and interpretable phishing detection system that addresses the limitations of binary labeling by proposing a regression-based risk scoring model. The aim is to improve accuracy, feature interpretability, and suitability for real-time deployment. The proposed method combines the Whale Optimization Algorithm (WOA) for feature selection with H2O AutoML for model building and evaluation. A filtered dataset of 10,000 phishing and legitimate websites, described by 48 features, is used, and WOA reduces the feature set to 36. The final models, including ensemble learners, are tuned with H2O AutoML and evaluated on several regression metrics. Interpretability is provided through SHAP analysis. The best model achieved an R² of 0.9534, an RMSE of 0.1079, and an MSE of 0.0116, outperforming traditional classification-based phishing detectors. With only 36 features, training time decreased by 23.6% and inference latency by about 18%, with no loss in detection accuracy (98.3%). Regression-based scoring also supports adaptive threat ranking in real time. By framing phishing detection as a regression problem and integrating metaheuristic feature selection with AutoML, this work introduces a scalable and explainable framework ready for real-world deployment. The low-latency, high-accuracy model is well suited for integration into browser-level phishing filters and cloud-based threat intelligence platforms.
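
To make the feature-selection step more concrete, the following is a minimal sketch of how a binary variant of WOA might search over feature subsets. It is not the authors' implementation. The abstract does not specify the binarization rule, the fitness function, or the hyperparameters, so the names `woa_select` and `toy_fitness`, the 0.5 threshold, the population size, the iteration count, and the toy objective are all assumptions chosen for illustration. In practice the fitness would be a validation score from a trained model rather than the stand-in used here.

```python
import math
import random


def woa_select(fitness, n_features, n_whales=12, n_iter=40, seed=0):
    """Binary WOA sketch (assumed variant): each whale is a point in
    [0, 1]^n_features, and a feature is kept when its coordinate > 0.5."""
    rng = random.Random(seed)

    def to_mask(position):
        return [x > 0.5 for x in position]

    whales = [[rng.random() for _ in range(n_features)] for _ in range(n_whales)]
    best = list(max(whales, key=lambda w: fitness(to_mask(w))))
    best_fit = fitness(to_mask(best))

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # shrinks linearly from 2 towards 0
        for i, whale in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [r - A * abs(C * r - x) for r, x in zip(ref, whale)]
            else:
                # Spiral update around the best whale (shape constant b = 1).
                l = rng.uniform(-1, 1)
                spiral = math.exp(l) * math.cos(2 * math.pi * l)
                new = [abs(b - x) * spiral + b for b, x in zip(best, whale)]
            whales[i] = [min(1.0, max(0.0, v)) for v in new]  # clip to [0, 1]
            fit = fitness(to_mask(whales[i]))
            if fit > best_fit:
                best, best_fit = list(whales[i]), fit
    return to_mask(best), best_fit


# Hypothetical stand-in objective: reward three "informative" features and
# charge a small penalty for every feature kept.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    kept = [j for j, keep in enumerate(mask) if keep]
    return sum(1.0 for j in kept if j in INFORMATIVE) - 0.1 * len(kept)


mask, score = woa_select(toy_fitness, n_features=10)
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("fitness:", score)
```

The per-feature penalty in the toy objective stands in for the trade-off the abstract describes, where a smaller feature set lowers training time and latency as long as predictive quality holds.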