TY - JOUR
T1 - Ensemble learning and graph topological indices for predicting physical properties of mental disorder drugs
AU - Ejima, O.
AU - Abubakar, M. S.
AU - Sarkin Pawa, S. S.
AU - Ibrahim, A. H.
AU - Aremu, K. O.
N1 - Publisher Copyright: © 2024 The Author(s). Published by IOP Publishing Ltd.
PY - 2024/10/1
Y1 - 2024/10/1
N2 - In this paper, we use an ensemble machine learning technique to evaluate the strength of three supervised machine learning algorithms, namely random forest regression (RFR), support vector regression (SVR) and gradient boosting regression (GBR), in predicting the physical properties of mental disorder drugs from a small dataset. The model was implemented on a dataset in which neighborhood degree-based topological indices served as predictor variables and the physical properties of the drugs served as target variables. To compute the neighborhood degree-based indices, we employed an algorithm that utilizes the canonical SMILES notations of the drugs. The ensemble method identifies the neighborhood third Zagreb index (NM3(G)) as an efficient predictor of boiling point, flash point and enthalpy of vaporization. The neighborhood Randic index (NR(G)) provides better prediction for molar refractivity, molar volume and polarizability. In the same vein, the neighborhood sum connectivity index (NSC(G)) is an efficient predictor of surface tension, while the neighborhood reciprocal Randic index (NRR(G)) is most effective in the prediction of polar surface area. Furthermore, the comparison of the average performance between the ensemble method and the base models (RFR, SVR, GBR) over the neighborhood topological indices shows efficient performance of the individual models across multiple physical properties of mental disorder drugs when the neighborhood topological indices are used as the predictor or input feature. Overall, this research highlights the combination of three supervised machine learning models in an ensemble environment to mitigate the challenges associated with small datasets when applying machine learning models in QSPR analysis.
AB - In this paper, we use an ensemble machine learning technique to evaluate the strength of three supervised machine learning algorithms, namely random forest regression (RFR), support vector regression (SVR) and gradient boosting regression (GBR), in predicting the physical properties of mental disorder drugs from a small dataset. The model was implemented on a dataset in which neighborhood degree-based topological indices served as predictor variables and the physical properties of the drugs served as target variables. To compute the neighborhood degree-based indices, we employed an algorithm that utilizes the canonical SMILES notations of the drugs. The ensemble method identifies the neighborhood third Zagreb index (NM3(G)) as an efficient predictor of boiling point, flash point and enthalpy of vaporization. The neighborhood Randic index (NR(G)) provides better prediction for molar refractivity, molar volume and polarizability. In the same vein, the neighborhood sum connectivity index (NSC(G)) is an efficient predictor of surface tension, while the neighborhood reciprocal Randic index (NRR(G)) is most effective in the prediction of polar surface area. Furthermore, the comparison of the average performance between the ensemble method and the base models (RFR, SVR, GBR) over the neighborhood topological indices shows efficient performance of the individual models across multiple physical properties of mental disorder drugs when the neighborhood topological indices are used as the predictor or input feature. Overall, this research highlights the combination of three supervised machine learning models in an ensemble environment to mitigate the challenges associated with small datasets when applying machine learning models in QSPR analysis.
KW - ensemble learning
KW - gradient boosting regression
KW - mental health disorder
KW - random forest regression
KW - support vector regression
KW - topological indices
UR - http://www.scopus.com/inward/record.url?scp=85205392451&partnerID=8YFLogxK
U2 - 10.1088/1402-4896/ad79a4
DO - 10.1088/1402-4896/ad79a4
M3 - Article
AN - SCOPUS:85205392451
SN - 0031-8949
VL - 99
JO - Physica Scripta
JF - Physica Scripta
IS - 10
M1 - 106009
ER -