
Research Repository


Multiclass stand-alone and ensemble machine learning algorithms utilised to classify soils based on their physicochemical characteristics

Abbey, Samuel; Eyo, Eyo



Samuel Abbey
Senior Lecturer in Geotechnical Engineering

Eyo Eyo


This study presents an approach to soil classification based on machine learning (ML). Multiclass elements of stand-alone ML algorithms (logistic regression (LR) and artificial neural networks (ANN)), decision tree ensembles (decision forest (DF) and decision jungles (DJ)) and meta-ensemble models (stacking (SE) and voting (VE)) were used to classify soils based on their intrinsic physicochemical properties. Also, and for the first time, multiclass prediction was carried out across multiple cross-validation methods. Results indicated that the soils' clay fraction (CF) had the most influence on the multiclass prediction of natural soils' plasticity, while specific surface and carbonate content (CC) possessed the least significance within the context and nature of the dataset used in this study. Stand-alone ML models (LR and ANN) produced relatively less accurate predictive performance (accuracy of 0.45, average precision of 0.50 and average recall of 0.44) compared to tree-based models (accuracy of 0.68, average precision of 0.71 and recall rate of 0.68), while the meta-ensembles (SE and VE) outperformed all the models utilised for multiclass classification (accuracy of 0.75, average precision of 0.74 and average recall rate of 0.72). Sensitivity analysis of the meta-ensembles proved their capacities to discriminate between soil classes across the methods of cross-validation considered. ML training and validation using Monte Carlo and k-fold cross-validation methods enabled better prediction while also ensuring that the dataset was not overfitted by the ML models. Further confirmation of this phenomenon was depicted by the continuous rise of the cumulative lift curve (LC) of the best performing models when using the Monte Carlo cross-validation technique. Overall, this study demonstrated that soils' physicochemical properties do have a direct influence on plastic behaviour and therefore can be relied upon to classify soils.
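
The two cross-validation schemes named in the abstract differ in how they form test sets, and the reported scores are averages over classes. The sketch below illustrates both ideas using only the Python standard library. It is not the authors' implementation. The function names, the fold count, the test fraction and the soil-class labels are hypothetical, and macro (unweighted per-class) averaging is an assumption, since the abstract does not state which averaging was used.

```python
import random
from collections import Counter


def kfold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k disjoint folds.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def monte_carlo_splits(n_samples, n_repeats, test_fraction=0.2, seed=0):
    """Yield (train, test) index lists from repeated random hold-outs.

    Unlike k-fold, test sets from different repeats may overlap.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for multiclass labels."""
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    # A class with no predictions (or no true samples) contributes 0.0.
    precision = [true_pos[c] / pred_count[c] if pred_count[c] else 0.0
                 for c in classes]
    recall = [true_pos[c] / true_count[c] if true_count[c] else 0.0
              for c in classes]
    accuracy = sum(true_pos.values()) / len(y_true)
    return (accuracy,
            sum(precision) / len(classes),
            sum(recall) / len(classes))


if __name__ == "__main__":
    # Hypothetical plasticity labels, for illustration only.
    y_true = ["low", "low", "high", "high"]
    y_pred = ["low", "high", "high", "high"]
    print(macro_scores(y_true, y_pred))
    for train, test in kfold_splits(10, k=5):
        print("k-fold test indices:", sorted(test))
    for train, test in monte_carlo_splits(10, n_repeats=3, test_fraction=0.3):
        print("Monte Carlo test indices:", sorted(test))
```

In practice a classifier would be fitted on each `train` split and scored on the matching `test` split, with the per-split scores then averaged. Monte Carlo cross-validation allows the number of repeats to be chosen independently of the test-set size, whereas k-fold guarantees that every sample is used for testing exactly once.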


Abbey, S., & Eyo, E. (in press). Multiclass stand-alone and ensemble machine learning algorithms utilised to classify soils based on their physicochemical characteristics. Journal of Rock Mechanics and Geotechnical Engineering, 14(2).

Journal Article Type: Article
Acceptance Date: Aug 20, 2021
Deposit Date: Aug 20, 2021
Journal: Journal of Rock Mechanics and Geotechnical Engineering
Print ISSN: 1674-7755
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 14
Issue: 2
Keywords: Soil classification; physiochemistry; soil plasticity; machine learning; regression; logistic regression; machine learning ensembles; artificial neural network