A new methodological framework by geophysical sensors combinations associated with machine learning algorithms to understand soil attributes

Geophysical sensors combined with machine learning algorithms have been used to understand the pedosphere system, landscape processes and to model soil attributes. In this research, we used parent material, terrain attributes and data from geophysical sensors in different combinations, to test and compare different and novel machine learning algorithms to 20 model soil attributes. Also, we analyzed the importance of pedoenvironmental variables in predictive models. For that, we collected soil physico-chemical and geophysical data (gamma-ray emission from uranium, thorium and potassium, magnetic susceptibility and apparent electric conductivity) by three sensors, gamma-ray spectrometer RS 230, susceptibilimeter KT10 – Terraplus and Conductivimeter – EM38 Geonics) at 75 points and, we performed soil analysis afterwards. The results showed varying models with the best performance (R > 0.2) for clay, sand, Fe2O3, TiO2, SiO2 and Cation Exchange 25 Capacity prediction. Modeling with selection of covariates at three phases (variance close to zero, removal by correction and removal by importance), demonstrated to be adequate to increase the parsimony. The prediction of soil attributes by machine learning algorithms demonstrated adequate values for field collected data, without any sample preparation, for most of the tested predictors (R ranging from 0.20 to 0.50). Also, the use of four regression algorithms proved important, since at least one of the predictors used one of the tested algorithms. The performances of the best algorithms for each predictor were 30 higher than the use of a mean value for the entire area comparing the values of Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The best combination of sensors that reached the best model performance to predict soil attributes were gamma-ray spectrometer and susceptibilimeter. The most important variables were parent material, digital elevation model, standardized height and magnetic susceptibility for most predictions. We concluded that soil attributes can be https://doi.org/10.5194/gmd-2021-153 Preprint. Discussion started: 16 July 2021 c © Author(s) 2021. CC BY 4.0 License.

efficiently modelled by geophysical data using machine learning techniques and geophysical sensors combinations. The 35 technique can bring light for future soil mapping with gain of time and environment friendly.

Introduction
The pedosphere is composed by soils and their connections with hydrosphere, lithosphere, atmosphere and biosphere (Targulian et al, 2019). Soils are the result of several processes and factors and their interactions, which produces specific soils types or horizons. The main soil processes are weathering and pedogenesis (Breemen and Buurman, 2003;Schaetzl and 40 Anderson, 2005), while the soil-forming factors are parent material, relief, climate, organisms and time (Jenny, 1994). Their interactions during soil genesis results in different soil attributes such as texture, mineralogy, color, structure, base saturation, clay activity and others.
In the last decades, there is a growing demand for soil resource information worldwide (Amundson et al., 2015;Montanarella et al., 2015). Soils are recognized as having a key influence on global issues such as, water availability, food 45 security, sustainable energy, climate change and environmental degradation (Amundson et al., 2015;Pozza and Field, 2020).
Understanding the role of spatial variations in surface and subsurface soil is fundamental for its sustainable use as well as other connected environmental resources and monitoring (Agbu et al., 1990). Therefore, it is necessary to increase the acquisition of information on the functional attributes of soils an ever-growing. To achieve this goal, relevant and reliable soil information, applicable from local to global scales is required (Arrouays et al., 2014). 50 The acquisition of soils data and their attributes are traditionally achieved by traditional soil survey techniques. However, new geotechnologies have emerged in the last decades, allowing the acquisition of data at shorter times, with non-invasive and accurate methods, such as reflectance spectroscopy, satellite imagery and geophysical techniques (Mello et al., 2020;Demattê et al., 2017Demattê et al., , 2007Fioriob, 2013;Fongaro et al., 2018;Mello et al., 2021;Terra et al., 2018aTerra et al., , 2018b. Among these technologies, geophysical sensors have been recently used in pedology to understand pedogenesis and the relationship 55 between these processes and soil attributes (Son et al., 2010;Schuler et al., 2011;Beamish, 2013;McFadden and Scott, 2013;Sarmast et al., 2017;Reinhardt and Herrmann, 2019). Among these geophysical techniques used, we highlight the gamma-spectrometry, magnetic susceptibility (κ) and apparent electrical conductivity (ECa).
Gamma-ray spectrometry can be defined as the measurements of natural gamma radiation emission from natural emitters, such as K 40 , the daughter radionuclides of U 238 and Th 232 , and total emissions from all elements in soils, rocks and sediments 60 (Minty, 1988). It is known that weathering and pedogenesis concomitantly with geochemical behavior of each radionuclide determine their distribution and concentration in the pedosphere (Dickson and Scott, 1997;Wilford and Minty, 2006;Mello et al., 2021). Therefore gamma-ray spectrometry can provide important information for comprehension of soil processes and attributes (Reinhardt and Herrmann, 2019), soil texture (Taylor et al., 2002a), mineralogy (Wilford and Minty 2006; Barbuena et al. 2013), pH (Wong and Harper, 1999) and organic carbon (Priori et al., 2016) and others. deepen the knowledge of the soil weathering, genesis and their relation to soil attributes. A standard approach to selecting 100 the bests input data to soil prediction models has yet to be developed (Levi and Rasmussen, 2014), mainly for geophysical sensors, little few used in soil science. The identification of such covariates may improve the understanding of the soil processes and attributes interplays, allowing an enhanced comprehension of soils from punctual to landscape scale, supporting digital soil mapping and better soil use and management.
Given this, the research aimed to: i) develop a new methodological framework on modelling soil attributes using combined 105 data from three different geophysical sensors in five different sensors combinations; ii) using different machine learning algorithms for prediction and selection of suitable models for each soil attribute evaluated; iii) evaluating the results and the importance of the variables and relate to pedogeomorphological processes. Our main hypothesis is that the combined use of three geophysical sensors data affords a better prediction of soil attributes by different machine learning algorithms and better model performance. Also, this research can provide an important background for geoscience studies and improving 110 geophysical and soil survey procedures.

Study area
The study area is located on a sugarcane farm of 184 hectares, located in São Paulo State, Brazil (23º 0' 31.37" to 22º 58' 115 53.97" S and 53º 39' 47.81" to 53º 37' 25.65" W), in the Capivari River catchment, part of the Paulista Peripheric Depression geomorphological unit (Fig. 1). The lithology is mainly composed by Paleozoic sedimentary rocks, dominant by Itararé formation (siltites/meta-siltites) crossed by intrusive diabase dykes of the Serra Geral Formation. The lowlands are covered by Quaternary alluvial sediments deposited by the Capivari River in ancient fluvial terraces (Fig. 2a).  The heterogeneity of landform and parent materials drove the formation of several soil types (Fig. 2b). Previous soil survey and mapping was performed in the study area by expert pedologists (Bazaglia Filho et al., 2013;Nanni and Demattê, 2006), 125 in which the main soil classes mapped were: Cambisols, Phaeozems, Nitisols, Acrisols and Lixisols (IUSS Working Group WRB, 2015). Besides the soil profiles, 75 subsamples from 75 points (0 -20 cm layer) were collected by augering for physicochemical analyses, according to  According to the Köppen classification the region's climate is subtropical, mesothermal (Cwa), with an average temperature from 18 °C (July -Winter) to 22 °C (February -Summer), and mean annual precipitation between 1100 and 1700 mm 135 (Alvares et al., 2013).

Laboratory physico-chemical analysis
For soil physical analyses, the soil samples were firstly air-dried, grounded and sieved to 2mm mesh and then, the granulometric analysis were performed. After that, clay, silt and sand contents were determined by the densimeter method (Camargo et al., 1986). Using the granulometry data, the textural groups were determined following EMBRAPA (2011), 140 methodology.
The soil chemical analysis were performed: The exchangeable cations aluminium, calcium and magnesium (Al 3+ , Ca 2+ and Mg 2+ ) were determined by KCl solution (1 mol L -1 ) and quantified by titration (Teixeira et al., 2017). Mehlich-1 solution were used to extract K + , which were quantified by flame photometry. Potential acidity (H + + Al 3 ) were determined using calcium acetate solution (0.5 mol L -1 ) at pH 7.0 and, for the pH in water determination, the soil:solution ratio of 1:2.5 was 145 used (Teixeira et al., 2017). More details about the analysis methods, can be found in (EMBRAPA, 2017). The determination of soil organic carbon was performed using the Walkley-Black method, by oxidation with potassium the method (EMBRAPA, 2017;Pansu, M., Gautheyrou, J., 2006). Total iron content were determined using selective dissolution by attack with sulfuric acid (EMBRAPA, 2017;Lim, C.H., Jackson, 1986). The resulting extract was used to determine the contents of silicon dioxide (SiO2) and titanium dioxide (TiO2) EMBRAPA (2017) methodology. All other chemical 150 parameters such as: Base Sum (BS) Cation Exchange Capacity (CEC), Base Saturation (V%) and Aluminum Saturation (m%) were determined using the analytical data obtained previously, following the methodology (EMBRAPA, 2017). The same methodology for physico-chemical soil analysis was used by Mello et al., (2020); Mello et al., (2021).

Radionuclides and gamma-ray spectrometry data 155
The radionuclide K 40 was quantified in total amount, measured by the absorption energy (1.46 MeV). The thorium (Th 232 ) and uranium (U 238 ) are quantified by absorption energy, (approximately 2.62 MeV and 1.76 MeV, respectively). This quantification is indirectly performed through thallium (Tl 208 ) and bismuth (Bi 214 ) derived by radioactive decay, respectively for Th 232 and U 238 , which are used by expression eTh and eU (equivalent thorium and uranium respectively).
For soil gamma spectrometric characterization, we used the near-gamma-ray spectrometer (GM) model Radiation Solution 160 RS 230 -Radiation Solution INC -Ontario -Canada (Fig. 1A). The sensor is able to quantify the radionuclides eTh and eU concentration in parts per million (ppm), while the K 40 is quantified in % due to its major content in pedosphere.
Firstly, the GM was automatically calibrated by switching on and leaving the sensor on the ground surface for five minutes until readings of eU, eTh and, K 40 contents be stabilized (Radiation Solutions, 2009). The measurements of radionuclides were taken in the "assay-mode" of highest precision for quantification, which the GM was kept at the soil surface for two minutes, in each sampling point (79 total collect points) (Fig. 1). The geographic position was taken by a GPS coupled in the GM (GPS -Radiation Solution INC -Ontario -Canadaprecision of 1m). The collected data in all points were 170 concatenated with their respective information from the soil physico-chemical analyzes for later geoprocessing. The same methodology for gamma-ray spectrometric data acquisition has been applied by Mello et al., (2021).

Magnetic susceptibility (κ)
For soil magneitic susceptibility (κ) characterization, surface readings were recorded at all 79 points using a geophysical 175 susceptibility meter sensor (KT10 -Terraplus) (Fig. 1b). This sensor is able to measure κ to a depth of 2 cm below the soil surface, with a precision 10 -6 SI units, expressed in m 3 kg -1 . To perform the readings, the sensor was firstly calibrated by determining the frequency of the outdoor oscillator. Then, we followed the sequence required to obtain the measurements performed in three steps: 1-determine the frequency and amplitude of the oscillator in free air; 2 -The frequency and amplitude of the oscillator were measured with the coil placed directly on the soil surface (sample) outcrop; 3 -We repeated 180 the step 1, and then the results were displayed. More information about the procedures see Sales, (2021). We performed the readings at scanner mode, which uses the best geometric correlation to direct κ readings, providing fast and accurate quantification. We performed three readings in triangulation around each augering/collected point and used the mean value of κ in all our analyses. This procedure was adopted to reduce noise. The same methodology for κ readings was performed by Mello et al., (2020). 185
The values of the ECa are a function of calibration, coil orientation, and coil separation (Heil and Schmidhalter, 2019). More details about EM38 operation is discussed in Hendrickx and Kachanoski, (2002).
After calibration, the ECa readings were performed at all 75collection points (Fig. 1), using the EM38 at vertical dipole orientation, which provide data from effective soil depth at 1.5 m. The field incursions to collect the data were undertaken in 195 dry season, bare soil and, at same hour interval during the day to reduce environmental variables influence. Also, all metal objects were kept distant from the EM 38 to avoid readings interferences.

Modelling processing 205
The modeling process is demonstrated in the flowchart (Fig. 3). The modeling can be divided into two parts: selection of covariates and training/test of the data. In the selection phase, the algorithm tries to produce the ideal set of covariates, following the principle of parsimony. This is performed by removing highly correlated variables, evaluating the importance of covariables and remove variables that have minor importance in training the model in the prediction process of each algorithm. Darst et al., (2018), considered joint application of the methods of selection of covariates by correlation and 210 importance (RFE), since only the use RFE reduces the effect of highly correlated covariates, but does not eliminate it.  The correlation selection process was used to calculate the correlation of the set of covariates and covariables, which were evaluated with a correlation greater than the limit (Pearson test > 95%). The pairs that showed higher values are evaluated due to their correlation with the complete set of covariates, eliminating the one with the highest value of the sum of the absolute correlation with the other covariables that started in this process. For this phase we applied the cor and find 220 correlation functions of the "stats" (Hothorn, 2021) and "caret" (Kuhn et al., 2020) packages, in R software, respectively (Kuhn and Johnson, 2013). In this phase, the covariables: curv_cross_secational and curv_longitudinal were eliminated for all tested sensor sets. The set of covariables that passed this phase joined the samples followed by the separation of samples from training and test.
The separation of training and test was performed using the "nested" leave one out (nested LOOCV) method (Clevers et al., 225 2007;Honeyborne et al., 2016;Rytky et al., 2020). It is important to highlight that our number of soil samples and readings with geophysical sensors is small (75), due to several difficulties encountered in the field in data collection (high sugar cane size, sloping terrain, dense forest, etc.). In this sense, the nested LOOCV method is indicated for small sample sets (values near 100 samples) to which other validation/test methods (as holdout validation) would not be viable due to the low sample set in the test and /or training group (Ferreira et al., 2021). This is one of the main innovations of this research. 230 The nested LOOCV method is a double loop process, where in the first loop the model is trained with a data set of size n-1, and the test is done in the second loop with the missing sample and used to validate the training performance (Jung et al., 2020;Neogi and Dauwels, 2019). The final result of the performance of the machine learning algorithm will be the mean performance indicators for all points (Training / test). This is a robust method to evaluate the performance of the algorithm and detects possible samples with problems in the collections or outliers. The training set generated in each loop went 235 through the process of selecting covariates for importance and subsequent training.
The selection of covariates by importance is made using the back forward method using the Recursive Feature Elimination (RFE) function contained in the "caret" package (Kuhn and Johnson, 2013). The RFE is unique for each algorithm, the result being the set of selected covariates used in the prediction of the final model in the same algorithm. The RFE is a selection method that eliminates the variables that least contribute to the model, based on a measure of importance for each algorithm 240 (Kuhn and Johnson, 2013). The algorithm will be applied to complete sets of data (variable by the set of tested sensors) and 18 more subsets 5,6,7, ... 19, 20 and 30 covariables. Reaching a set of fewer variables (more parsimonious), achieving better prediction performance. The optimization of the ideal covariate subset was based on leave one out (LOOCV), a repetition and 5 values of each of the internal hype parameters of each tested algorithm (tuneLength). The hyperparameters of each algorithm are described in the caret package manual in chapter 6. "Models described" available at 245 https://topepo.github.io/caret/train-models-by-tag.html. The metric for choosing the best subset for each model were R². linear algorithms. The algorithms used are commonly used in soil attribute mapping studies. At the end of the selection phase by importance, the most optimized set of covariates for training was generated for each algorithm. 250 The training was performed with the variables selected in the previous step each tested algorithm by using leave one out (LOOCV) and ten repetitions. Five values of each of the internal hype parameters of each tested algorithm were also tested (tuneLength). At the end of the training phase, a sample prediction was made that was not used in the training and the result was saved for the performance study. The performance of the prediction of the algorithms and set of sensors was performed with a set of samples from the outer loop of the nested LOOCV method. Three evaluation parameters are used: R-square -255 R2 (Eq. (1)), root mean squared error -RMSE (Eq. (2)), mean absolute error -MAE, (Eq. (3)).
For comparison purposes, null model values (NULL_RMSE and NULL_MAE) were also calculated. The null model considers using the average value quantified by the collected samples (EQ. 4 and EQ. 5). This methodology is widely used 260 and spatialization processes in kriging when the variable in which spatialization is desired has spatial dependence (pure nugget effect).
The NULL_RMSE and NULL_MAE values lower than those observed in the prediction of the algorithm in the validation phase show that the use of mean of the samples of the desired propriety consist with the model created by the algorithms of machine learning. The NULL_RMSE and NULL_MAE were calculated using the nullMode function of the caret package (Kuhn et al., 2020). The final result of the performance of the algorithms of each attribute was made using the 75 loops, the training results being 270 the average of the performance and the results of the test samples calculated from the 75 external loops results using equations 1, 2 and 3. The importance of the algorithms was calculated by the caret package (Kuhn and Johnson, 2013), each model presents its creation methodology. The final importance for each algorithm and attribute, was created from the importance created in the loop, being the average of the importance of the 75 repetitions.

Geophysical sensors combinations, models performance, uncertainty and covariates importance
The worst performance in modeling soil attributes occurred excluding the use of geophysical sensors, where only parent material and terrain attributes were used ( Table 2). In this case, the algorithms selected particular groups of terrain attributes for modelling each soil attributes ( Table 1). 280 The Cubist algorithm showed the best performance to predict soil texture, clay (R 2 of 0.386) and sand (R 2 of 0.292) content, with the highest R 2 and lowest RMSE and MAE, concomitantly ( prediction showed that minimal curvature, was the most important variable, contributing 100% on the decreasing of the mean accuracy. On the other hand, for clay content the most important variable was parent material. In addition, for clay and 290 sand the tangential curvature and DEM showed importance higher than 50% (Fig. 4).
The SVM algorithm presented moderate performance, for Fe2O3 (R 2 0.279), TiO2 (R 2 0.226); whereas for SiO2, the LM presented the best result, also with a moderate performance (R 2 0.247) ( Table 2). The selected models presented the highest R 2 and lowest RMSE and MAE, simultaneously. The most important covariates for Fe2O3 and TiO2 prediction by the SVM model, were parent material (100%) and DEM (more than 50%). For SiO2 prediction by LM model, the most important 295 covariates were DEM (100%) and standardized height (90%), while parent material contributed with 40% (Fig. 4).
For cation exchange capacity (CEC) the model with the best performance, after 75 runs was SVM, (R 2 of 0.223) ( Table 2).
The most important covariates for CEC prediction to mean accuracy were DEM (100%), topographic wetness index (80%) and parent material (75%) (Fig. 4). In all models, there was a very low performance in the prediction of base saturation (BS) and organic matter (OM), with R 2 between 0.001 and 0.1 (Table 2, 3, 4, 5 and 6). 305 The different combinations of geophysical sensors that contributed to moderate modeling performance of soil attributes were: Susceptibilimeter + Conductivimeter (S + C), Gamma-ray spectrometer + Conductivimeter (G + C), Combined use of the three geophysical sensors (G + S + C) (Tables 3, 4 and 6, respectively). The R 2 values presented some variation between the R 2 of best combination of geophysical sensors and the lowest R 2 values from the without the use geophysical sensors in predictive models (Tables 3, 4 and 6). Among all the values of R 2 evaluated for this session, we consider all the highest 310 values and, among the highest values the lowest values we considered the worst result.

330
For clay, the model with the best performance was the SVM algorithm (R 2 0.484) by S + C (Table 3), while the worst was by the Cubist algorithm (R 2 0.38) by (G + S + C) ( Table 6). For sand, the best model performance was the Cubist algorithm (R 2 0.365) by S + C (Table 3) and the worst also by Cubist (R 2 0.387) by (G + S + C). The most important covariates for clay prediction by the SVM model in S + C sensors combination were magnetic susceptibility (κ) (100%) and parent material (90%) (Fig. 5). For clay prediction by the Cubist model in G + S + C sensors combination, the most important covariate was 335 parent material (100%) (Fig. 6). With respect to sand prediction, the most important covariates by the Cubist model in S + C were minimal curvature (100%) and magnetic susceptibility (κ) (80%) (Fig. 5) On the other hand, for G + S + C, the covariates that most contributed for sand prediction were DEM (100%), general curvature (80%) and minimal curvature (75%) (Fig. 6). For the elemental composition, the models employed greatly variable performance.For Fe2O3 the best model performance, was reached by the LM algorithm (R 2 0.441) by G + S + C ( Table 6), while the worst performance was by the Cubist (R 2 0.282) by G + C ( Table 4). With respect to TiO2, the best model performance was by Cubist algorithm (R 2 0.358) by G + S 350 + C ( Table 6) and the worst was RF (R 2 0.248) by G + C ( Table 4). For SiO2, the best model performance was the Cubist algorithm (R 2 0.250) by S + C (Table 3) and the worst was the LM (R2 0.178) by G + C ( Table 4). The importance of covariates in predicting Fe2O3 by LM in G + S + C, demonstrated that magnetic susceptibility (κ), standardized height and DEM were the most important variables, contributing 100%, 65%, 55%, respectively (Fig. 6). For Fe2O3 predicted by the Cubist algorithm by G + C, the most important covariates were standardized height, parent material, ECa and DEM (100%) 355 (Fig. 7). For TiO2 prediction by the Cubist algorithm by G +S + C the most important covariate was magnetic susceptibility (κ) (100%) (Fig. 6), while for the RF algorithm by G + C were parent material (100%) and ECa (75%) (Fig. 7). In relation to SiO2 prediction by the Cubist by S + C, the most important covariates were standardized height, mid-slope position magnetic susceptibility (κ) and DEM (100%) (Fig. 5), while SiO2 predicted by the LM algorithm by G + C were DEM and standardized height (100% and 65%, respectively) to mean accuracy (Fig. 7).  In relation to CEC, the LM algorithm was the best model (R 2 0.317) by G + S + C ( Table 6) and the worst was the SVM 365 algorithm (R 2 0.223) by S + C ( Table 3). The most important covariate for prediction of CEC by LM algorithm by G + S + C and by S + C was magnetic susceptibility (κ) (100%) (Fig. 6 and 5).
Overall, the best combination of geophysical sensors, which allowed the best model performance for different algorithms in the prediction of soil attributes, was Gamma-ray spectrometer + Susceptibilimeter (G + S) ( Table 5).  Clay and sand content in g.kg -1 ; Fe2O3, TiO2 and SiO2 in g.kg -1 CEC in mmolc dm -3 ; Abbreviations: CEC: Cation Exchange Capacity; OM g.dm -3 ; BS: mmolc dm -3 . 390

370
For soil texture, the SVM and RF algorithms, showed the best performance for clay (R 2 0.494) and sand (R 2 0.422), respectively, by G + S, with the highest R 2 and lowest RMSE and MAE, simultaneously ( Table 5). The importance of covariates in predicting soil texture by the SVM (for clay) and the RF (for sand) demonstrated that, magnetic susceptibility (κ) was the most important covariate (100%). In addition, parent material contributed 60% for clay prediction and DEM 60% 395 for sand prediction (Fig. 8).
The LM algorithm presented the best performance for Fe2O3 (R 2 0.470) and TiO2 (R 2 0.328), by G + S, while for SiO2 was the Cubist algorithm (R 2 0.207), also by G + S ( Table 5). The most important covariates for Fe2O3 and TiO2 prediction by LM by G + S were magnetic susceptibility (κ) and standardized height (100% and 60%, respectively for both) (Fig. 8). For SiO2 prediction by the Cubist by G + S, the most important covariates were mid-slope position and magnetic susceptibility 400 (κ) (100% for both) (Fig. 8).
For CEC, the best model performance was the LM algorithm (R 2 0.303) by G + S ( Table 5). In this case, the covariates that most contributed to model prediction were magnetic susceptibility (κ) (100%) and DEM (60%) (Fig. 8).

Geophysical sensors combinations, models performance and uncertainty
The methodological approach optimized the prediction of soil variables by applying different geophysical sensors 410 combination, parent material and terrain attributes for selecting covariates and models, as well as for assessing prediction uncertainty.
In general, without the use of geophysical sensors the poorest results were obtained, in terms of R 2 , RMSE and MAE, for all the prediction algorithms used for modeling soil attributes ( Table 2). These results are consistent with Frihy et al., (1995) who also compared combined use and the non-use of sensors to model geochemical attributes of soil by the Cubist algorithm 415 and obtained the worst result without using the sensors. The worst performance of the models can be attributed to a very complex interaction between soil forming factors and processes, determining soil attributes (Jenny, 1994).
The moderate performance of the models can be attributed to the different combinations of the geophysical sensors pairwise, and the different data presented by the sensors contributed in different ways to the modelling process. In this concern, O'Rourke et al., (2016) also demonstrated a moderate performance of the models (R 2 ranging from 0.21 to 0.94) when using 420 data from the VisNir and, with R 2 ranging from 0.61 to 0.94, when using XRF sensor, to model soil attributes. The explanation can be related to correlations of different data provided by different sensors and, their relation with soil attributes.
The best combination of geophysical sensors was Gamma-ray spectrometer + Susceptibilimeter (G+S), with the highest values of R 2 and lowest values of RMSE and MAE, concomitantly, among all combinations of geophysical sensors and 425 algorithms used in the modeling processes (Table 5). Probably the explanation lies in the fact that the gamma-ray spectrometer and susceptibilimeter are more closely associated with pedogenesis, pedogeomorphology and soil attributes, as recently demonstrated by Mello et al. (2020); Mello et al. (2021), who modeled soil attributes such as texture, Fe2O3, TiO2, SiO2 and CEC in relation to Thorium, Uranium and Potassium (K 40 ) levels and magnetic susceptibility.
In general, the Cubist algorithm was the best model for clay and sand content prediction, ( Table 7). Similar results were 430 found by Greve and Malone, (2013); Ballabio et al., (2016);Nawar et al., (2016) and Silva, (2019) who used the Cubist and Earth algorithm to predict soil texture using different data source (3D imagery, Land Use and Cover Area frame Statistical survey and reflectance spectroscopy), reaching satisfactory performance. In all of these models the R 2 was not greater than 0.5 in all cases. The probable explanation is the small variation or limited distribution of the data set, which caused a poor prediction in the modeling. Zhang and Hartemink, (2020), states that textural classes with fewer samples presented more 435 unstable prediction performances than those with more samples, which agree with our results. Clay and sand content in g.kg -1 ; Fe2O3, TiO2 and SiO2 in g.kg -1 CEC in mmolc dm -3 ; Abbreviations: CEC: Cation Exchange

Capacity
The better model performance for elemental composition (Fe2O3, TiO2 and SiO2) was the Cubist (Table 7) with a R 2 (0.2 -445 0.47). This is contrasting with results obtained by Henrique et al., (2018), who showed that the best models for predicting soil mineralogy Fe2O3 and TiO2 (R 2 0.89 and 0.96, respectively) and RF only for Fe2O3 (R 2 0.95) by pXRF was the simple linear regression. The R 2 variation in our results for G + S combination is probably related to low correlation with parent material and consequently with soil mineralogy, or to the low representativeness by the limited number of samples, and high soil variability (Fiorio, 2013). However, it is important to highlight that in-situ have many intrinsic environmental influences 450 that can interfere in modelling processes. It can justify the low R 2 values obtained. For soil mineralogical attributes predicted by machine learning algorithms, results can be classified as satisfactory from 0.2 to 0.5, as for preliminary evaluation, since these values present more informative results (Beckett, 1971;Dobos, 2003;Malone et al., 2009). According to Nanni and Demattê (2006), the R 2 may be explained by standardized laboratory conditions during their determination, which have less environmental interference compared with direct field methods. 455 For CEC the best model performance was SVM (R 2 0.296) ( Table 5). This results is corroborated by Liao et al., (2014), who compared the models performance of multiple stepwise regression, artificial neural network models and SVM for CEC prediction, and attributed their results to a nonlinear relationship between CEC and soil physicochemical properties. In addition, other study (Jafarzadeh et al., 2016) demonstrated that, despite of the ability of SVM to predict CEC in acceptable limits, there is a poor performance in extrapolating the maximum and minimum values of CEC data. Despite this, 460 uncertainties estimated for SVM predictions may not be associated with an incorrect classification. As pointed out by Cracknell and Reading, (2013).
Even for the best combination of sensors (G + S) and the overall models' performance, the R 2 values were not greater than 0.5 ( Table 5). Models generated by field data, without sample preparation, R 2 values varying between 0.20 -0.50 can be considered satisfactory and reliable results (Dobos, 2003;Malone et al., 2009). In our study, low R 2 values can be related to 465 https://doi.org/10.5194/gmd-2021-153 Preprint. Discussion started: 16 July 2021 c Author(s) 2021. CC BY 4.0 License. the limited number of collecting points or field distribution, which does not represent the spatial variation of soil attributes, in agreement with Johnston et al. (1997) and Lesch et al. (1992), who evaluated soil salinity.
The best results for predictors of soil attributes through geophysical data, have the lowest values when compared to the values of NULL_RMSE and NULL_MAE. This demonstrates that the use of machine learning models has lesser errors than the use of means values for the entire area (Table 5) so that it shows better performance and accuracy. 470 There are little studies using NULL_RMSE and NULL_MAE as parameters for model evaluation and decision making.
These values can be used to evaluate the performance of the models. Algorithms that have RMSE and MAE values greater than the values found in the NULL method, perform less than the use of the mean value for the entire area. The values of NULL_RMSE and NULL_MAE can be used concurrently with kriging to evaluate the performance of the models. However, we could not apply ordinary kriging in our case because the most predictors did not have spatial dependence (pure nugget 475 effect), as demonstrated by Mello et al., (2021).

Variables importance, models performance and pedogeomorphology
In general, for all geophysical sensor combinations, the majority of terrain attributes used did influence significatively sand and clay content prediction (Fig. 4, 5, 6 and 8). However, in most cases parent material and magnetic susceptibility strongly 480 influences clay content prediction, except for G + C (Fig. 7). Ließ et al. (2012) found that the best performance was by the RF model with altitude and overland flow distance strongly affecting the model performance. According to Bauer (2010), the greater relation sand/clay ratio upslope is explained by selective transport of fine material downslope, whereas in the present study, clay content increased by the influence of parent material (diabase) as demonstrated by Mello et al. (2020).
The magnetic susceptibility (κ), followed by DEM and parent material were key the variables that contributed to sand and 485 clay content prediction by RF and SVM, respectively for G + S (Fig. 8). Siqueira et al. (2010) and Mello et al. (2020) found a positive correlation between soil magnetic susceptibility and clay content and a negative correlation between magnetic susceptibility and sand content. In fact, the mineralogical composition of parent material strongly affects soil magnetic susceptibility (Ayoubi et al., 2018), mainly in tropical soils under top of basalt spills (Da Costa et al., 1999), where our study was undertaken. 490 In general, for Fe2O3 and TiO2 the most important variables were parent material, magnetic susceptibility and DEM, which in most cases contributed 100% (Fig. 4, 5, 6, 7 and 8). In fact, the mineralogical composition of the parent material and pedoenvironmental conditions strongly influences the amount of Fe/Ti oxides in soils (Schwertmann and Taylor, 1989;Kämpf and Curi, 2000;Bigham et al., 2002), and faster redistributed by erosion downslope (Mello et al., 2020). Also, the mineralogical composition of parent material (Mullins, 1977;Ayoubi et al., 2018) and landform evolution (Blundell et al., 495 2009;Sarmast et al., 2017) controls the magnetic susceptibility of soil. Since the sensors used record the surface response and topography effect, it is expected that the most important variables indicated by the models would be related to surface processes. For the best combination of sensors (G + S), magnetic susceptibility and standardized height were more important variables in the prediction of Fe2O3 (100%) and TiO2 (55%) contents (Fig. 8), corroborating the expected surface processes https://doi.org/10.5194/gmd-2021-153 Preprint. Discussion started: 16 July 2021 c Author(s) 2021. CC BY 4.0 License. and materials in the magnetic susceptibility of the soil (Shenggao, 2000;Damaceno et al., 2017) and the relief in the 500 distribution of these materials (De Jong et al., 2000).
For SiO2, the most important variable was DEM which in most of cases contributed 100% (Fig. 4, 5, 6 and 7). The levels of SiO2 in soil is directly related to the nature of parent material and erosion processes at different topographic positions at the landscape (Bockheim et al., 2014;Breemen and Buurman, 2003). This can explain the greater contribution of the DEM in the prediction models. For the best sensor combination (G + S), the variable that most contributed was mid-slope position, 505 which also is related to topographic features.
For CEC, the variables DEM and magnetic susceptibility were the most important, contributing 100% in most of cases (Fig.   4, 5, 6, 7 and 8). This can be explained by the high correlation between magnetic susceptibility and clay content, and that with CEC (Siqueira et al., 2010;de Souza Bahia et al., 2017;Mello et al., 2020). They vary with parent material and surface geomorphic processes, concentrating the rich ferrimagnetic minerals (Frihy et al., 1995;Mello et al., 2020). 510 Considering that the gamma spectrometer sensor is composed of three channels (eU, eTh and K 40 ), we can call it "three sensors". Thus, considering the combination of sensors used, it is possible to create a performance graph of the modeling by the number of sensors used through learning curves (Fig. 9). A learning curve shows a measure of predictive performance of a given domain as a function of some measure of varying amounts of learning effort (Perlich, 2010). In our case, the varying amounts were the number of sensors: non-use of geophysical sensors (0 sensors), S + C (2 sensors), G + S (4 sensors) and S 515 + G + C (5 sensors). In this analysis, the combination of G + C sensors will not be used because they present the same number of G + S sensors (4 sensors). However, the combination G + C presented lower results than those for G + S.
The results show that for 5 soil properties (clay, sand, CEC, Fe2O3 and SiO3), the best results did not occur with a greater number of sensors, showing that increasing number of covariables can lead to lower performance (Fig. 9). This fact is associated with the addition of a new sensor as a covariate that may be leading to conflicting information to the set of other 520 sensors found, where the ECa may have presented conflicting values with the sensors generated by the gamma spectrometry channels, which generates a loss of performance when with sensor sets together. The application of the RFE importance selection method was able to amortize this, being a reliable method to reduce this effect.

General evaluation
For this study, the independent RMQS data set was not large enough (75 sites). So, validation using 74 sites provided erratic 530 and inconsistent results, mainly when compared different pedoenvironmental indicators, even considering that this dataset, in theory, provide "unbiased" estimates of forecast performance (Loiseau et al., 2020). Similarly, Lagacherie et al., (2019) showed that the location and number of samples used for independent assessment can significantly impact the value of these indicators. This indicates the greatest variations were observed for evaluation sets with less than 100 samples. Modeling soil attributes using relief and geophysical data presented promising results for geosciences studies and soil 535 scientists. The use of several algorithms from different "families", the training and validation method also made the study more robust and more reliable. In addition, machine learning models allowed to define the importance of covariates, which are, sometimes, not possible use ordinary spatialization methods, such as kriging and the inverse square of distance.
The "nested leave-one-out validation" method was usefulness with small samples, being a potential tool to be used in geosciences studies. However, still there is a poor knowledge in the academic community on the potential applicability of 540 machine learning techniques.

Conclusions
It is possible to model soil attributes satisfactorily, with easily acquired input data (parent material + DEM) combined with data set from different geophysical sensors. In addition, geophysical data from proximal sensors coupled with Cubist 545 algorithms can provide accurate estimates for several soil attributes. This may assist soil survey programs to reduce the need for new soil samples and wet chemistry.
The combination of geophysical sensors with the best model performance (higher R 2 and lower RMSE and MAE, concomitantly) for the prediction of soil attributes, was Gamma-ray spectrometer + Susceptibilimeter (G + S). The use of three equipment in simultaneous did not optimize model's performance. On the other hand, the Non-use of geophysical 550 sensors, presented the lower performance of soil attributes prediction by machine learning algorithms.
In general, the algorithms showed varying performances. In general, the Cubist was the best one for clay, sand, Fe2O3, TiO2, SiO2. For CEC the best performance was by SVM. The second-best algorithm performance was SVM for clay, RF for sand and LM for Fe2O3, TiO2, SiO2 and CEC.
The prediction performance for most soil attributes showed R 2 greater than 0.2, considered satisfactory for machine learning 555 algorithms applied to field data without expensive laboratory analysis, especially when compared with data from fieldwork with the use of remote sensing covariates. All soil attributes obtained superior performance considering an average value for the entire area.
The use of the null model methodology provided a way of comparing those generated by machine learning, when it is not possible to use other methods. The use of four algorithms proved necessary since at least one of the soils attributes 560 performed better in each of the tested algorithms.
The final model was more parsimonious with an ideal number of covariates with a three steps selection. This reduced the effect of overfitting by the use of a large number of covariates. Also, the nested leave-one-out validation methodology proved to be appropriate for small number of samples when compared to the hold-out validation and cross-validation.
The covariables that most contributed to the prediction of soil attributes (clay, sand, Fe2O3, TiO2, SiO2 and CEC), in the most 565 of algorithms used and sensors combinations were DEM, magnetic susceptibility, parent material and standardized height. For each study area, a conceptual pedogeomorphological and geophysical model must be created due to the complex interaction between environmental variables, pedogenesis and soil attributes. These factors affect geophysical variables which are detected and quantified by the sensors and will later serve as input data for the modeling processes.
The machine learning technique is a potential tool for modelling soil attributes with geophysical data, when only field data 570 with proximal sensors are available. The combined use of gamma-ray spectrometer and susceptibilimeter, allowed for an optimization of the models.

Declaration of non-availability of relevant model code for the manuscript 575
☒The authors declare that they have no code or data relevant to the paper.
☒The authors declare that they have used basic R software packages, described in details in material

Authors contribution 580
Danilo César de Mello: conceived of the presented idea, carried out the experiment, developed the theoretical formalism, contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript. He provided critical feedback and helped shape the research, analysis and manuscript.
Gustavo Vieira Veloso: designed the model and the computational framework and analysed the data, planned and carried 585 out the simulations, performed the analytic calculations and performed the numerical simulations, modelling processing, evaluate algorithms performance, variables importance and statistical analyses.
Marcos Guedes de Lana: contributed to the interpretation of the results, took the lead in writing the manuscript. Devised the project, the main conceptual ideas and proof outline. He worked out almost all of the technical details. All authors 590 provided critical feedback and helped shape the research, analysis and manuscript.
Fellipe Alcantara de Oliveira Mello: contributed to the interpretation of the results, took the lead in writing the manuscript.
All authors provided critical feedback and helped shape the research, analysis and manuscript. contributed to the interpretation of the results and verified the analytical methods. Encouraged the co-authors to investigate a specific aspect and supervised the findings of this work.

Acknowledgements 620
We would like to thank the National Council for Scientific and Technological Development (CNPq) for the first author