Mapping spatial patterns of plant species based on machine-learning and regression models

Document Type: Research Paper


Department of Arid and Mountainous Regions Reclamation, Faculty of Natural Resources, University of Tehran, Karaj, Iran


Various statistical techniques have been used in species distribution modeling to predict the occurrence of a given species from environmental conditions. The current study compared the performance of three regression-based models (multivariate adaptive regression splines, generalized additive models, and generalized linear models) with that of three machine-learning algorithms (random forest, artificial neural networks, and generalized boosted models). Three sets of explanatory variables (climate-only, topography-only, and combined topography-climate) were quantified for each species (Achillea millefolium, Festuca rupicola, and Centaurea jacea), and the effect of the interaction between predictor set and modeling approach on prediction accuracy was tested. Model accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) and the true skill statistic (TSS). Regression-based approaches, especially the generalized additive model, performed better than the machine-learning algorithms. The combined topography-climate variables were the most important for mapping potentially suitable habitats of the target species. The response curves associated with these variables indicate that there are ecological thresholds for favorable growth of all plant species studied.
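For readers unfamiliar with the two accuracy measures, the following minimal Python sketch shows how AUC and TSS can be computed from observed presence/absence records and predicted suitability scores. It is an illustration only and not the authors' implementation: the function names, the sample data, and the 0.5 presence threshold used for TSS are all assumptions, since the abstract does not state how predictions were converted to presence/absence.

```python
def true_skill_statistic(observed, predicted, threshold=0.5):
    """TSS = sensitivity + specificity - 1, for 0/1 observations.

    The 0.5 threshold is an assumed default, not taken from the study.
    """
    tp = fn = tn = fp = 0
    for obs, score in zip(observed, predicted):
        is_presence = score >= threshold
        if obs == 1 and is_presence:
            tp += 1
        elif obs == 1:
            fn += 1
        elif is_presence:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of presences correctly predicted
    specificity = tn / (tn + fp)  # share of absences correctly predicted
    return sensitivity + specificity - 1


def area_under_curve(observed, predicted):
    """AUC as the probability that a random presence outscores a random absence.

    Ties count as half a win (the rank-based, Mann-Whitney formulation).
    """
    presences = [s for o, s in zip(observed, predicted) if o == 1]
    absences = [s for o, s in zip(observed, predicted) if o == 0]
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical records: 1 = species present, 0 = absent.
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

tss = true_skill_statistic(observed, predicted)
auc = area_under_curve(observed, predicted)
print(f"TSS = {tss:.3f}, AUC = {auc:.3f}")
```

AUC is independent of any threshold, whereas TSS depends on the cutoff chosen to turn scores into presence/absence, so the two measures can rank models differently.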