Geostatistical interpolation model selection based on ArcGIS and spatio-temporal variability analysis of groundwater level in piedmont plains, northwest China

Based on the geo-statistical theory and ArcGIS geo-statistical module, datas of 30 groundwater level observation wells were used to estimate the decline of groundwater level in Beijing piedmont. Seven different interpolation methods (inverse distance weighted interpolation, global polynomial interpolation, local polynomial interpolation, tension spline interpolation, ordinary Kriging interpolation, simple Kriging interpolation and universal Kriging interpolation) were used for interpolating groundwater level between 2001 and 2013. Cross-validation, absolute error and coefficient of determination (R2) was applied to evaluate the accuracy of different methods. The result shows that simple Kriging method gave the best fit. The analysis of spatial and temporal variability suggest that the nugget effects from 2001 to 2013 were increasing, which means the spatial correlation weakened gradually under the influence of human activities. The spatial variability in the middle areas of the alluvial–proluvial fan is relatively higher than area in top and bottom. Since the changes of the land use, groundwater level also has a temporal variation, the average decline rate of groundwater level between 2007 and 2013 increases compared with 2001–2006. Urban development and population growth cause over-exploitation of residential and industrial areas. The decline rate of the groundwater level in residential, industrial and river areas is relatively high, while the decreasing of farmland area and development of water-saving irrigation reduce the quantity of water using by agriculture and decline rate of groundwater level in agricultural area is not significant.

integrated in various interpolating methods. Therefore, selecting the best method to estimate the temporal and spatial variations of groundwater level has an important strategic significance for reasonable groundwater management and sustainable development of water resources.
Spatial interpolation is a method to estimate the data in contiguous area and forecast the unknown points (the information is missing or cannot be obtained) with available observation data (Chai et al. 2011;Losser et al. 2014), including geo-statistical interpolation and deterministic interpolation. Geo-statistical interpolation consists of ordinary Kriging interpolation (OK), simple Kriging interpolation (SK) and universal Kriging interpolation (UK); deterministic interpolation comprise global polynomial interpolation (GPI), local polynomial interpolation [inverse distance weighted interpolation (IDW), planar spline interpolation and local polynomial interpolation (LPI)].
The geo-statistical interpolation, or local space interpolation, is known as the unbiased optimal estimation method for describing regionalized variables (Gundogdu and Guney 2007), but has some defects in smooth effects. The local errors can be corrected without reducing the accuracy by posteriori method (Yamamoto 2005); GPI can analyze surface trend of regionalized variables, test the long-term and global trend effect. Based on the sample data, GPI is susceptible to extreme values and is suitable for small surface changes of regionalized variables (Ding et al. 2011;Mutua 2012;Wang et al. 2014); LPI can be used to establish smooth surface, calculate short-term variability and reflect local variability, but has errors for long-distance global interpolation; weight principle is used by IDW for interpolation analysis, as an accurate interpolation method (Xie et al. 2011), it requires sufficient and well-distributed samples (Rabah et al. 2011). Uneven distribution or outliers may lead to errors; tension spline interpolation (TSPLINE) is an accurate interpolation method (Hofierka et al. 2002) and the interpolation surface goes through every sample point, which is suitable for a large number of data interpolation. The accuracy of which is close to Kriging methods. TSPLINE has an advantage to avoid estimation of covariance function structure.
Interpolation methods have been widely used to analyze spatial variability of precipitation, evapotranspiration, temperature and groundwater level by comparing different interpolation methods. Cross-validation along with applicable conditions of different models are applied to obtain the best-fit interpolation model (Mardikis et al. 2005;Yang et al. 2011). With the development of GIS technology, interpolation analysis of groundwater level with geo-statistical modules has become easer and more operable (Salah 2009;Ta'any et al. 2009;Nikroo et al. 2010), which can characterize the spatial variability of variables in detail (Uyan and Cay 2013;Triki et al. 2013;Dinka et al. 2013;Bao et al. 2014).
Geo-statistical method is a good way for analyzing spatial variability of groundwater level by summarizing the previous researches. Based on seven interpolation methods and semi-variable function model in GIS geo-statistical module, the purposes of the study are (1) compare the prediction accuracies of different methods and select the best-fit interpolation model for piedmont plain area; analyze the spatial variability of the groundwater level in piedmont plain based on hydrogeological conditions and semi-variable function, (2) identify the spatial variability characteristics of different hydrogeological units, carry out groundwater level variability partitions considering the land use of piedmont plain and discuss effects of different land use on spatial variability characteristics of groundwater level.

Survey of study area
As shown in Fig. 1a, the study area is located in the piedmont plain in west of Beijing and covers an area of 550 km 2 . The topography is high in the northwest and low in the southeast (the elevation is between 30 and 100 m). Owing to the influence of continental monsoon climate, the spatial and temporal distribution of the precipitation is uneven, the mean annual precipitation is 551.4 mm. The piedmont plain is the place where the mountainous area transitions to the plain and the topographic slope is between 1 and 3 %, the aquifer is mainly composed of clay, spall and gravel, the permeability is strong. While the lithological in plain mainly comprises clay and silty, covered by quaternary sediments (Fig. 1b).
The land is mainly used for agriculture. The agricultural area, residential area, industrial area, forest area and the river cover 35.1, 20.5, 10.3, 26.2 and 6.9 % respectively. The water system in this area belongs to Wenyu River basin of the north canal river system. The groundwater regimes type is infiltration-exploitation. Groundwater level shows a seasonal change trend during the year. The drawdown of groundwater level during 2001 and 2013 reaches 22 m and the annual speed of reduction is approximately 2 m.

Thesis data
In order to monitor the dynamic changes of the groundwater level, 30 observation wells (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013) are set up in the area of 550 km 2 (the plain area and the piedmont area). Quaternary Stata in the piedmont plain is taken as the studying object. Water level measuring tape is used to monitor groundwater level and the observation is made every 5 days (i.e. observations are made on 1st, 6th, 11th, 21th, 26th days of each month). The land use data is collected from the aerial map in 2005, the resolution is 1 m.
To understand the character of the datasets, observed values of 30 observation wells for 2013 have been statistically analysed (Fig. 2). It was found that the groundwater level ranges from −7.8 to 61.4 m, the mean value is 26.4 m, and the standard deviation of the data is 17.8 m. The kurtosis is 2.66 and the skewness is 0.01. The dataset follows a normal distribution and the Kriging interpolation can be applied. In order to obtain the distribution characteristics of groundwater level, the selection of model parameters shall meet the principle that the error is minimum and the parameters are used to verify the model. The selected parameters are shown in Table 1.

Research method
The study method is mainly composed of the following steps: 1. Construction of basic information database ArcGIS data management is used to obtain the locator and attribute data of groundwater level, land use for each observation well and use them as the basic data for calculation and analysis with geo-statistical module. Inverse distance interpolation (IDW), GPI, LPI, TSPLINE, OK, SK and UK are applied to establish the groundwater level model, mean errors, root-mean-square errors, coefficient of determinations and absolute errors are calculated for each method to select the optimal model. The computational formulas are as follows (Nikroo et al. 2010;Sun et al. 2009):

Evaluation of different interpolation model
where Ẑ i is the estimated value, Z i is the measured value at sampling point i (i = 1, . . . , n), n is the number of values used for the estimation. 2. Root-mean-square error

Fig. 2 Interpolation effect of groundwater level in Beijing piedmont plain through the seven interpolation models in 2013
where Ẑ i is the estimated value, Z i is the measured value at sampling point i (i = 1, . . . , n), n is the number of parameters selected by the empirical formula; n is the number of values used for the estimation.

Coefficient of determination
The coefficient of determination R 2 is used to measure the correlation between the predicted value and the measured value. P ave is the averaged estimated value, Q ave is the averaged measured value; and n is the number of values used for estimation. 3. Semivariogram, also known as semi-variogram, is a unique function of geostatistical analysis. It can be used to describe the spatial variability of groundwater levels.
Assume that the mean of the random function is stable and the value is only related to the distance between samples, semivariogram r(h) may be defined as half the incremental variance of random function Z(x).
where r(h) is the semivariogram, Z(x) is the random function, h is the distance between samples. A spherical model of the tested semivariogram models was fitted to the experimental semivariograms. The spherical model is defined by the following equation: where H 0S is the nugget value arising from random components such as measurement error and physical factors, H S is the structural variance arising from spatial autocorrelation, H 0S + H S is the sill, and b is the distance at which the semivariogram equals 95 % of its sill variance.
(3) 4. Spatial variability characteristics, is represented by the nugget to sill radio. The nugget value represents the variability while the sill value represents the overall variability inside the variables. The nugget to sill radio shall be between 0 and 1. When the nugget effect is <0.25, the considered strongly spatially dependent; the variable is considered moderately spatially dependent as the nugget effect is between 0.25 and 0.75, while the nugget effect is >0.75, the variable is considered weakly spatially dependent (Ghazi et al. 2014). 5. Analysis of spatial distribution pattern: based on the hydrogeological conditions and different land use, the spatial variability pattern of typical partitions in the piedmont plain are analyzed, the effect of human activities on spatial variability of groundwater level is also discussed.

Model parameter selection and verification
In order to obtain the distribution characteristics of groundwater level, the selection of model parameters shall meet the principle that the mean errors and root-mean-square errors are close to 0 and 1, respectively. The parameters occur in the Geostatistical Analyst Model of Arcmap 10.3, which are used to verify the model and approach the best interpolation effect for each method. The selected parameters are shown in Table 1.

Interpolation calculation results
The cross validation method is a statistical analysis method used to verify the accuracy of interpolation model, the basic idea is to classify the original dataset into the train set and the validation set. The validation set is used to test the model obtained from the training set, which is the indicators to evaluate the accuracy of the model. The error being the minimum is the evaluation criteria for best-fit interpolation model (Dashtpagerdi et al. 2013). Table 2 is the cross validation results of the groundwater observation data in the last 5 years (2009)(2010)(2011)(2012)(2013). Take the year 2013 as an example, the root-mean-square error sorting is SK < UK < OK < IDW < LPI < TSPLINE < GPI; the mean error sorting is TSPL INE = SK < UK < OK < LPI < IDW < GPI, the R 2 sorting is GPI < TSPLINE < LPI < IDW < UK < OK < SK. By combining the results from different methods, GPI is the worse-fit model, which is based on the global interpolation and is not precise interpolation. With the increase of number of times, the interpolation effect is better but the complexity and error also increase (Johnson et al. 2001a). LPI interpolation method is suitable for local spatial interpolation and is of high accuracy in simulating short-term variability. With regard to analysis of groundwater level for a period of 10 years, the prediction results fluctuate greatly and doesn't fit well (Rajagopalan and Lall 1998). Because of extreme points, the mean error and coefficient of determination of IDW model as well as TSPLINE model are relatively high and the simulation accuracy is poor. The error results of three Kriging interpolation methods (OK, UK and SK) are close to each other. In combination with Table 2 and Fig. 3, the geostatistical interpolation method, including original Kriging, simple Kriging and UK interpolaiton method, can obtain good simulation results, and R 2 of simple Kriging is 0.99, which is considerd as the optimal method for interpolating groundwater level. Figure 2 shows the interpolation effect of groundwater level in Beijing piedmont plain from seven interpolation model for 2013. In order to evaluate the simulation results, the "buphthalmos" phenomenon is introduced . Which is caused by some extreme values in the processing of interpolation, it means the interpolation effect is not good.

Interpolation effect comparison among different models
As can be seen from Fig. 2, uneven distribution of observation points and the existence of some extreme value points leads to the "buphthalmos" phenomena for IDW and LPI methods (Rajagopalan and Lall 1998) and the interpolation effect is poor; as a function form of radial basis function (RBF) interpolation model, TSPLINE is suitable for establishing a smooth surface with a large amount of data and the interpolation effect is good for undulating surface. However, it is not suitable for the situation where the short-term variability is high, sample data errors and uncertainties exist. Therefore, the smoothness of the model is poor. GPI considers the global variation trend during the interpolation process and the smoothness is high, which is suitable for spatial variability interpolation whose variability is slow. However, GPI ignores the local variability and its fitting effect of the first-order model is not good (Johnston et al. 2001b). With the increase of order, it is difficult to explain the precise physical meaning (Apaydin et al. 2004).
As the unbiased optimal interpolation method, Kriging method combines the effects of distance parameters and direction parameters, it represents the spatially continuous and irregular change of variables. Therefore, the interpolation effect of Kriging model is better than others in theory. From the perspective of interpolation results, the SK interpolation is more approximate to the real situation and the groundwater surface looks  . 3 Comparison of simulated and measured groundwater levels smooth, which means the interpolation effect is good. Based on the above analysis, SK interpolation method is evaluated as the optimal interpolation model.

Spatial variability analysis of groundwater level
The SK interpolation is used to analyze the spatial variability characteristics of groundwater level between 2001 and 2013 to understand the distribution pattern and variability characteristics of the groundwater level. The spatial variability of the groundwater level is affected by both the natural factors and human factors. The nugget effect reflects the spatial correlation characteristics of the groundwater level (Ahmadi and Sedghamiz 2007;Desbarats et al. 2002). Factors such as precipitation, topographic inequality, aquifer lithology and different hydrogeological units will increase the spatial correlation of groundwater level, while factors such as large-scale construction of water conservancy establishments and human exploitation will decrease the spatial correlation. As can be seen from Table 3, the nugget effect of the groundwater level during 2001 and 2013 is gradually increasing, which means the spatial correlation decreases and the effect of human activities increases continuously. During 2001 and 2005, the nugget effect of the groundwater level is between 0.25 and 0.75, which means the groundwater level is affected by both natural factors and human factors, while natural factors have more significance than human factors. In 2006 and 2007, the spatial correlation decreases as a result of less precipitation, the nugget effect increases to 0.65. The precipitation in 2008 is abundant and the source of supply is sufficient, while Olympic Games in Beijing leads to a large-scale development in urban, the establish of Machikou emergency water sources and the expansion of Nankou water plant increase the consumption of the groundwater. The precipitation recharge cannot meet the shortfall of the groundwater and the storage resource is used. The groundwater level is affected mostly by human exploitation and the nugget effect increases to 0.73, the spatial correlation of groundwater level is weakened. During 2009 and 2013, the excessive exploration of the groundwater is controlled to some extent. The nugget effect begins to decrease while the spatial correlation increases. Figure 4 shows that flow filed of groundwater flows from the recharge area in the northwest to the plain area in the southeast for 2013. The groundwater level is between −7 and 61 m, the average groundwater level is 30 m. The spatial variability of the groundwater level in the middle part of the alluvial-proluvial fan is relatively higher than in the top and bottom. During 2001 and 2013, the groundwater level declines generally and the maximum drawdown is 22 m. The average reduction speed is 2 m annually (Fig. 5). The piedmont area is located in the top of alluvial-proluvial fan, the aquifer is the sand and gravel of coarse particles with good vertical infiltration and sufficient precipitation recharge. At the same time, groundwater level is deep, which is not conductive for exploitation, so the spatial variability of the groundwater level in the top area is small; the areas whose drawdown is between 16 and 22 m are mainly distributed in the middle and upper part of the alluvial-proluvial fan, the aquifer is of many layers of sand and the groundwater level is shallow, which make it the concentrated exploitation regions, Nankou water plant and Xiguan water plant are located in the north with annual water supply capacity of 3,000,000 m 3 . Machikou (No. 4) emergency water sources supply water for new urban district. In the vicinity, the groundwater drawdown is more than 20 m. Shahe reservoir and Shahe water plant are located in the south, the southeast area is the urban water supply partition. Due to speeding up of urbanization in recent years and large exploitation quantity of the groundwater, the spatial variability is very high in the middle and upper part of the alluvial-proluvial fan; the spatial variability is very small in the east part of the plain, which is located in the lower part of the alluvial-proluvial fan and the aqueous layer is not continuous. The cone of groundwater depression caused by overexploration cannot be effectively recharged. In recent years the prohibition of exploitation and restriction on exploitation are enforced, which increases the groundwater level.

Analysis of spatial distribution characteristics
In order to further study the temporal variation of groundwater level, groundwater level reduction rate of the typical regions for different land use was analyzed (Table 4). The results show that the groundwater level reduction speeds for the residential and industrial area are 0.77 and 0.65 m/a respectively during 2001 and 2006 (Fig. 6). Since the year 2000, the agricultural water proportion has decreased and the industrial water proportion and the domestic water proportion have increased. Therefore, the groundwater  . 6 Land use types of the study area level reduction speeds are high in the residential area and industrial area. At the same time, the enforcement of water-saving irrigation causes the agricultural water quantity to decrease continuously. The reduction speed of the groundwater level is only 0.25 m/a; for the upstream region of the river, the groundwater level reduction rate is not high because of the surface water infiltration recharge and sufficient piedmont lateral flow recharge.
Because of over-exploitation, the annual average reduction speed during 2007 and 2013 is higher than that during 2001 and 2006. Under the same land use, the change of exploitation way of the groundwater will cause the change of reduction speed of the groundwater level (Choi et al. 2012). Due to the rapid development of the urban area and increase of exploitation amount of the groundwater year by year, the groundwater level during 2006 and 2013 presents an overall downward trend. In the residential area and industrial area, the exploitation quantity of the groundwater is high and the groundwater level reduction speed increases significantly; the agriculture area is mainly distributed in the middle and lower plain area of the alluvial-proluvial fan. The exploitation quantity of the groundwater is of small amount and the reduction speed is small; in the middle and lower river area of the alluvial-proluvial fan, the reduction speed of the groundwater level increases significantly. The reasons are as follows: the severe shortage of the surface water decreases the lateral infiltration recharge quantity; excessive exploitation causes the lateral recharge rate of the piedmont groundwater to be smaller than the groundwater level reduction speed in the river area; the complete lining of the Jingmi diversion canal in the central part of the plain area also aggravates the situation that the groundwater cannot be recharged in time.

Conclusion
1. This paper is based on the ArcGIS geo-statistical module and semi-variable function model to analyze the spatial variability of the groundwater level, compare the simulation accuracies and prediction effects of seven interpolation models and select SK interpolation as the optimal interpolation model for the piedmont plain in Beijing.
The results show that the SK interpolation can make unbiased optimal estimation of unknown points while ensuring the local accuracy. 2. The spatial variability of the groundwater level is closely related to the hydrogeological units and land use. Owing to the excessive exploitation, the spatial variability of the groundwater level for the top and lower part of the alluvial-proluvial fan is small; the spatial variability of the groundwater level for the middle and upper part of the alluvial-proluvial fan is high; In the residential area, industrial area and river area, the reduction speed of the groundwater level is high; the establishment of new town causes the arable land to decrease and water-saving irrigation decreases the agricultural water quantity, and the reduction rate of the groundwater level is small in the agricultural area. 3. The simple Kriging method is applicable for the interpolation of groundwater level in piedmont plain area and the geostatistical interpolation models are of significance in studying the spatial and time variability of the groundwater level. It can also effectively optimize the exploited well distribution and slow down the decline rate of groundwater level.