 Research
 Open Access
Variable-intercept panel model for deformation zoning of a super-high arch dam
 Zhongwen Shi^{1, 2, 3},
 Chongshi Gu^{1, 2, 3} and
 Dong Qin^{1, 2, 3}
 Received: 28 December 2015
 Accepted: 16 June 2016
 Published: 27 June 2016
Abstract
This study determines dam deformation similarity indexes from an analysis of deformation zoning features and panel data clustering theory, taking into account the actual deformation law of super-high arch dams and the spatial–temporal features of dam deformation. Methods for measuring these indexes are studied. Based on the established deformation similarity criteria, the principle for determining the number of dam deformation zones is constructed through the entropy weight method. The study proposes a deformation zoning method for super-high arch dams together with its implementation steps, analyzes the effect of the special influencing factors of different dam zones on deformation, introduces dummy variables that represent the special effect of dam deformation, and establishes a variable-intercept panel model for deformation zoning of super-high arch dams. Based on the different patterns of the special effect in the variable-intercept panel model, two panel analysis models are established to monitor the fixed and random effects of dam deformation. The Hausman test for model selection and a method for assessing model effectiveness are discussed. Finally, the effectiveness of the established models is verified through a case study.
Keywords
 Deformation zoning
 Variable-intercept panel model
 Super-high arch dam
 Similarity index
 Validity checking
Background
The entire service period of a super-high arch dam can be divided into several stages, each with a different deformation behavior pattern. Although many analysis models have been developed for the complex influencing factors and deformation law of super-high arch dams, most of them are merely extensions of the traditional safety monitoring model of dam deformation. These models are based on the one-dimensional time series of a single measuring point, so the effects of measurement error, missing data, and collinearity on model precision are difficult to avoid. Moreover, spatial–temporal monitoring information on super-high arch dam deformation cannot simply be randomly obtained. The spatial–temporal deformation features of the entire dam structure throughout its service period, as well as their correlations, are difficult to comprehend. Therefore, the traditional deformation analysis model fails to completely reflect the deformation behavior of a super-high arch dam. In addition, the influencing factors of deformation behavior (e.g., load effect, constraint, material property, and environmental factors) differ significantly between regions of a super-high arch dam. The traditional method ignores these differences and analyzes all measuring points with the same factors: reservoir water level, temperature, and aging. As a result, the actual deformation laws of different regions of a super-high arch dam are difficult to depict. An analysis model that can reflect the different influencing factors of deformation should therefore be established.
Panel data contain deformation information in two dimensions, time and cross section, and therefore adequately reflect the spatial–temporal features of arch dam deformation. A deformation panel sequence is superior to a pure time series or a pure cross-section sequence because it carries richer information, has more degrees of freedom (DOF), and effectively reduces collinearity. Hence, panel data can be used to model and analyze the deformation behavior of a super-high arch dam. Before the panel model is established, all monitoring points on the super-high arch dam are zoned according to deformation similarity, considering the regional characteristics of overall deformation and the correlation of deformation among measuring points; this eliminates the influence on the model of differing deformation laws at different monitoring points and of measurement error. Furthermore, because the common influencing factors (reservoir water level, temperature, and aging) cannot easily depict the different deformation laws of different regions, this study introduces dummy variables that represent the effect of the special influencing factors of each region, called special deformation effect variables. On this basis, an analysis model that considers both common and special influencing factors of deformation is established, which has higher explanatory power and estimation precision. Studying the special deformation effect variables also helps to analyze the deformation behavior of super-high arch dams comprehensively and is expected to offset some shortcomings of the traditional analysis model.
Criteria for dam deformation zoning
Dam deformation zoning analysis has to address two key problems: (1) determining the statistical magnitude that represents the deformation similarity between measuring points and (2) determining the specific systematic zoning method, that is, the criteria used to judge deformation similarity between different dam regions. These criteria were constructed from the panel features of super-high arch dam deformation combined with the spatial–temporal information of deformation. They lead to a deformation zoning method for the super-high arch dam that studies structural deformation behavior in both the time dimension and the cross-section dimension.
Deformation similarity indexes
To solve the first key problem, this study constructed deformation similarity indexes based on the idea of panel data clustering.
The deformation data of super-high arch dams reveal at least three aspects: (1) the absolute deformation of the arch dam; (2) the dynamic level of the deformation time series, that is, the deformation growth with time; and (3) the fluctuation of deformation development, that is, the degree of variation. This section focuses on dam deformation zoning based on the similarity of deformation sequences when the dam structure has not obviously changed, that is, when the deformation time series changes smoothly. The similarity of the deformation sequences is reflected by combining the “absolute deformation” and the “deformation growth” of the dam.
Measuring method of indexes
During deformation data preprocessing, δ _{ it }(i = 1, 2, …, N; t = 1, 2, …, T) is often used to express the deformation dataset of the super-high arch dam, where N is the total number of measuring points on the dam and T is the total number of monitoring periods, that is, the length of the monitoring time series. With respect to the deformation dataset δ _{ it }, S _{ t } is defined as the standard deviation (SD) of deformation during t, and d _{ ij } is the direct distance between measuring points i and j. d _{ ij } meets the following basic axiom (He 2008).
Definition 1
Definition 2
 (1) Calculate the characteristic proportion of the ith evaluation index to the jth evaluation object as $$p_{ij} = \frac{r_{ij}}{\sum\nolimits_{j = 1}^{m} r_{ij}},\quad (i = 1,2, \ldots, n;\; j = 1,2, \ldots, m)$$ (5) where p _{ ij } ϵ [0, 1] and the original proportional relationship between deformation monitoring sequences remains undamaged, that is, \(d_{ij} \ge 0,\,\,\,\sum\nolimits_{i = 1}^{N} {d_{ij} > 0}\).
 (2) Calculate the entropy. The entropy of the ith evaluation index values is given as $$S_{i} = - \frac{1}{\ln m}\sum\limits_{j = 1}^{m} {p_{ij} } \ln p_{ij} ,\quad (i = 1,2, \ldots, n).$$ (6)
 (3) Determine the entropy weight. The entropy weight of the ith evaluation index is $$w_{i} = \frac{1 - S_{i}}{\sum\nolimits_{i = 1}^{n} (1 - S_{i})},\quad (i = 1,2, \ldots, n).$$ (7)
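The three entropy-weight steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `entropy_weights` and the sample matrix are hypothetical, and the convention 0·ln 0 = 0 is an added assumption for zero proportions.

```python
import math

def entropy_weights(r):
    """r[i][j]: non-negative value of evaluation index i for evaluation object j."""
    m = len(r[0])
    entropies = []
    for row in r:
        total = sum(row)
        p = [v / total for v in row]                                  # Eq. (5)
        # Eq. (6); terms with p = 0 are skipped (assumed convention 0 * ln 0 = 0)
        s = -sum(q * math.log(q) for q in p if q > 0) / math.log(m)
        entropies.append(s)
    # Eq. (7); undefined if every index has entropy 1 (all rows uniform)
    denom = sum(1 - s for s in entropies)
    return [(1 - s) / denom for s in entropies]

# Hypothetical example: index 0 is identical for all objects, index 1 varies.
weights = entropy_weights([[1.0, 1.0, 1.0, 1.0],
                           [1.0, 2.0, 3.0, 4.0]])
print(weights)
```

An index whose values are the same for every object has entropy 1 and therefore receives a weight of (numerically) zero, so all the weight goes to the index that actually discriminates between objects.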
The statistical magnitude of the similarity measurement is defined by Euclidean distance. The calculated w _{ i } is the weight coefficient of the ith evaluation index. Substituting the weights into Eq. (3) yields the comprehensive distance d _{ ij }(CED) between measuring points, which can be used as the criterion of deformation similarity. In this way, the first key problem, that is, determining the statistical magnitude that represents deformation similarity between measuring points, is solved. Similar statistical magnitudes can be obtained from other common distance forms (e.g., Mahalanobis distance and Lance distance) or from the correlation coefficient and the included angle cosine; these are not introduced in this paper (Li and He 2010). In the following text, the number of dam zones with different deformation similarities is determined through panel data system clustering, which leads to the systematic dam deformation zoning method.
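A minimal sketch of the two Euclidean distances and their weighted combination is given below. Eqs. (1) to (3) are not reproduced in this text, so the forms used here are assumptions taken from the usual panel clustering formulation: the "absolute" distance compares deformation values period by period, the "growth" distance compares relative growth ΔX _{ it }/X _{ it−1}, and the comprehensive distance is their weighted sum. All function names are hypothetical.

```python
import math

def absolute_distance(xi, xj):
    """Euclidean distance between two deformation series (assumed form of Eq. (1))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def growth_rates(x):
    """Relative growth (X_t - X_{t-1}) / X_{t-1}; requires non-zero X_{t-1}."""
    return [(x[t] - x[t - 1]) / x[t - 1] for t in range(1, len(x))]

def growth_distance(xi, xj):
    """Euclidean distance between growth-rate series (assumed form of Eq. (2))."""
    return absolute_distance(growth_rates(xi), growth_rates(xj))

def comprehensive_distance(xi, xj, w_abs, w_growth):
    """Weighted sum of both distances (assumed form of Eq. (3))."""
    return w_abs * absolute_distance(xi, xj) + w_growth * growth_distance(xi, xj)

# Hypothetical series: point j deforms twice as much as point i but grows at the same rate.
x_i = [10.0, 11.0, 12.1]
x_j = [20.0, 22.0, 24.2]
print(absolute_distance(x_i, x_j), growth_distance(x_i, x_j))
```

In this example the two points are far apart in absolute terms but essentially identical in growth, which is why both components are combined before zoning.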
Determining the number of dam deformation zones and zoning process
In Eq. (8), W _{ l } is the total sum of squares of the deviations of N _{ l } measured values, X _{ it } is the deformation of measuring point i during t, \(Y_{it} = \frac{{\Delta X_{it} }}{{X_{it - 1} }}\) is the relative deformation growth of measuring point i in G _{ l } during t, ΔX _{ it } = X _{ it } − X _{ it−1} is the difference of absolute deformation of measuring point i in G _{ l } between t and \(t - 1\), \(\overline{{X_{t} }} = \frac{1}{{N_{l} }}\sum\nolimits_{i = 1}^{{N_{l} }} {X_{it} }\), and \(\overline{{Y_{t} }} = \frac{1}{{N_{l} }}\sum\nolimits_{i = 1}^{{N_{l} }} {Y_{it} }\).
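The quantities defined above can be computed as follows. Because Eq. (8) itself is not shown in this text, the way the two deviation terms are combined (an unweighted sum over all points and periods) is an assumption made for illustration, and the function names are hypothetical.

```python
def relative_growth(x):
    """Y_t = (X_t - X_{t-1}) / X_{t-1} for t = 2..T; requires non-zero X_{t-1}."""
    return [(x[t] - x[t - 1]) / x[t - 1] for t in range(1, len(x))]

def period_means(rows):
    """Mean over the measuring points of a region, one value per period."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def within_region_ss(region):
    """Sum of squared deviations of X and Y from their period means.

    region: list of deformation series, one per measuring point in G_l.
    The unweighted sum of both parts is an assumption, not the paper's Eq. (8).
    """
    growth = [relative_growth(x) for x in region]
    x_bar = period_means(region)
    y_bar = period_means(growth)
    ss_abs = sum((v - m) ** 2 for x in region for v, m in zip(x, x_bar))
    ss_growth = sum((v - m) ** 2 for y in growth for v, m in zip(y, y_bar))
    return ss_abs + ss_growth

# Hypothetical region with two measuring points and two periods.
print(within_region_ss([[1.0, 2.0], [3.0, 4.0]]))
```

A region whose points share the same series gives a value of zero, so a small W _{ l } indicates a homogeneous zone.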
If S _{ l } differs only slightly from S _{ l+1} but significantly from S _{ l−1}, the corresponding regional distance D _{ l } can be used as the threshold of deformation zoning. Based on this threshold, the number of regions can then be determined.
The implementation method and steps for the second key problem are based on Ward clustering. A complete dam deformation zoning process is proposed below according to the deformation similarity criteria and the method for determining k.

Step 1 Calculate the “absolute” and “growing” deformation distances and regional distances through Eqs. (1) and (2).

Step 2 Calculate the entropy weights of the above two distances through Eqs. (5), (6), and (7).

Step 3 Substitute the calculated weight coefficients into Eq. (3) to calculate comprehensive distance d _{ ij }(CED) between two measuring points and matrix of regional distance D ^{(0)}.

Step 4 Initialize (iteration i = 1) by treating each measuring point as its own region, that is, k = N. Let D ^{(1)} = D ^{(0)} and the ith region be G _{ i } = {X _{(i)}}(i = 1, 2, …, N).

Step 5 Calculate regional distance matrix D ^{(i−1)}, and combine two regions with the minimum “comprehensive distance” into a new region according to the principle of minimum W.

Step 6 Calculate the comprehensive distance d _{ ij }(CED) between the new region and the other regions and obtain the new distance matrix D ^{(i)}. Repeat Steps 5 and 6 until all measuring points are merged into one region.

Step 7 Draw the hierarchical dendrogram.

Step 8 Derive the optimum zoning scheme and the optimum number of zones (K) according to the method for determining the dam deformation zoning threshold and the practical situation. Draw the dam deformation zoning distribution map.
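Steps 4 to 6 can be sketched as a small agglomerative loop. The version below is an assumption-laden illustration, not the authors' code: it takes a precomputed distance matrix (for example the comprehensive distances from Step 3), merges the closest pair at each iteration, and updates distances with the standard Lance-Williams recurrence for Ward's method, which is exact only for Euclidean distances. The function name `ward_zones` and the sample data are hypothetical.

```python
import math

def ward_zones(dist, k):
    """Merge N measuring points into k zones from a symmetric N x N distance matrix.

    Returns (zones, heights): the member lists of the k zones and the distance
    at which each merge happened (the heights of a dendrogram).
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}            # Step 4: one region per point
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    heights = []
    next_id = n
    while len(clusters) > k:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])   # Step 5: closest pair
        n_a, n_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c, members in clusters.items():
            if c in (a, b):
                continue
            n_c = len(members)
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Step 6: Lance-Williams update for Ward's criterion
            updated[c] = math.sqrt(
                ((n_a + n_c) * d_ac ** 2 + (n_b + n_c) * d_bc ** 2 - n_c * d_ab ** 2)
                / (n_a + n_b + n_c)
            )
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for c, v in updated.items():
            d[(c, next_id)] = v                      # new ids are always the largest
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        heights.append(d_ab)
        next_id += 1
    return sorted(sorted(m) for m in clusters.values()), heights

# Hypothetical example: four points on a line, two obvious groups.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
zones, heights = ward_zones(dist, k=2)
print(zones)    # [[0, 1], [2, 3]]
```

Running the loop down to k = 1 gives the full list of merge heights, which is the information a hierarchical dendrogram (Step 7) displays and from which a zoning threshold (Step 8) can be read.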
Establishment of variable-intercept panel model for deformation
Selection of deformation components
Variable-intercept panel model for dam deformation zoning
Expression form of the variable-intercept panel model
Panel model of fixed deformation effect
Panel model of random deformation effect
If the special deformation effect of different measuring points is viewed as a random variable, the random-effect model can be used to describe the actual state of dam deformation: the model parameters then focus on the main components of the monitored deformation, and the random effect reflects the special deformation component of each measuring point. In other words, the individual effect in the variable-intercept panel model (\(\eta_{i}\)) is viewed as a random variable. The following text studies the panel model of the monitored random deformation effect.
In summary, two kinds of influencing factors of super-high arch dam deformation (Y) exist: independent variables and the special effect. The independent variables \(X_{1} , \ldots ,X_{p}\) represent the common influencing factors of deformation at all measuring points (hydraulic pressure, temperature, and aging), whereas the special effect \(\alpha_{1} , \ldots ,\alpha_{i}\) reflects the deformation difference between measuring points. The special effect can take two forms, fixed or random, and the corresponding panel model is a fixed-effect model or a random-effect model. In practical engineering, the panel model that conforms to the deformation characteristics of the dam should be chosen, which means that the appropriate model must be selected according to test results on the deformation monitoring sequences.
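To make the fixed-effect idea concrete, the sketch below fits a variable-intercept model with a single common regressor using the within (demeaning) estimator. It assumes the standard form Y _{ it } = α _{ i } + βX _{ it } + u _{ it }; the model in this paper has several regressors (x1 to x10 in the case study), so this one-regressor version and the name `fixed_effects_fit` are illustrative simplifications only.

```python
def fixed_effects_fit(panel):
    """Within estimator for y_it = alpha_i + beta * x_it + u_it (one regressor).

    panel: {point_id: (x_series, y_series)}
    Returns (beta, {point_id: alpha_i}).
    """
    sxy = sxx = 0.0
    means = {}
    for pid, (x, y) in panel.items():
        x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
        means[pid] = (x_mean, y_mean)
        # Demeaning within each measuring point removes its intercept alpha_i
        sxy += sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
        sxx += sum((a - x_mean) ** 2 for a in x)
    beta = sxy / sxx
    alphas = {pid: y_mean - beta * x_mean for pid, (x_mean, y_mean) in means.items()}
    return beta, alphas

# Hypothetical data: common slope 2, point-specific intercepts 3 and 10.
panel = {
    "P1": ([1.0, 2.0, 3.0], [5.0, 7.0, 9.0]),
    "P2": ([1.0, 2.0, 3.0], [12.0, 14.0, 16.0]),
}
beta, alphas = fixed_effects_fit(panel)
print(beta, alphas)
```

The common slope plays the role of the shared influencing factors, while the recovered intercepts correspond to the special deformation effect of each measuring point.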
Model selection and effectiveness evaluation
Under the null hypothesis \(H_{0} : E(v_{it} \mid X_{it}) = 0\), the asymptotic distributions of both m _{2} and m _{3} are \(\chi_{K}^{2}\).
Based on the above analysis, if \(E(v_{it} \mid X_{it}) = 0\), the unobserved factors in the model vary randomly and are uncorrelated with the independent variables. Under this circumstance, the random-effect model should be chosen. If \(E(v_{it} \mid X_{it}) = 0\) does not hold, the unobserved factors are correlated with the independent variables, and their effect on the model can be tested. Therefore, the fixed-effect model should be chosen.
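The Hausman statistic reported in the case study has the quadratic form (b − B)'[(V_b − V_B)^(−1)](b − B), where b and B are the fixed-effect and random-effect coefficient estimates and V_b, V_B their covariance matrices. A minimal sketch for two coefficients follows; the function name and all numbers are hypothetical, and a real analysis would use the full coefficient vector.

```python
def hausman_statistic_2(b_fe, b_re, v_fe, v_re):
    """Hausman statistic (b - B)' (V_b - V_B)^(-1) (b - B) for two coefficients."""
    q0, q1 = b_fe[0] - b_re[0], b_fe[1] - b_re[1]
    # Elements of the 2 x 2 matrix M = V_b - V_B
    m00, m01 = v_fe[0][0] - v_re[0][0], v_fe[0][1] - v_re[0][1]
    m10, m11 = v_fe[1][0] - v_re[1][0], v_fe[1][1] - v_re[1][1]
    det = m00 * m11 - m01 * m10
    # q' M^(-1) q written out with the explicit 2 x 2 inverse
    return (m11 * q0 * q0 - (m01 + m10) * q0 * q1 + m00 * q1 * q1) / det

# Hypothetical estimates, for illustration only.
h = hausman_statistic_2(
    b_fe=[1.0, 2.0],
    b_re=[0.9, 2.1],
    v_fe=[[0.02, 0.0], [0.0, 0.03]],
    v_re=[[0.01, 0.0], [0.0, 0.01]],
)
print(h)
```

A small statistic means the two estimators agree, which supports the random-effect model; a large one, compared with the chi-square distribution whose degrees of freedom equal the number of coefficients tested, points to the fixed-effect model.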
When evaluating the effect of a model, the main consideration is whether the explanatory variables of the model can interpret changes in the dependent variable as much as possible. The evaluation requires some reference standard or guidelines; otherwise, it cannot be determined whether the chosen model is good, appropriate, or accurate in the empirical analysis (Hausman and Taylor 1981). In this study, the effect of the established variable-intercept panel model was evaluated from the overall goodness of fit and the significance of every variable.
Case study
Deformation zoning of the super-high arch dam
Model selection
Hausman test result of deformation time series of the measuring points in the area of section 1 of the dam
Hausman test  

b = consistent under Ho and Ha; obtained from xtreg  
B = inconsistent under Ha, efficient under Ho; obtained from xtreg  
Test: Ho: difference in coefficients not systematic  
chi2(2) = (b−B)’[(V_b−V_B)^(−1)](b−B) = 0.00  
Prob > chi2 = 1.0000 
Hausman test result of deformation time series of the measuring points in the area of section 3 of the dam
Hausman test  

b = consistent under Ho and Ha; obtained from xtreg  
B = inconsistent under Ha, efficient under Ho; obtained from xtreg  
Test: Ho: difference in coefficients not systematic  
chi2(2) = (b−B)’[(V_b−V_B)^(−1)](b−B) = 9.14  
Prob > chi2 = 0.0104 
According to the Hausman test results (the test statistic and its probability Prob, hereinafter referred to as P), Eq. (22), and the null hypothesis \(E(v_{it} \mid X_{it}) = 0\), the P values of all zones except zone III fail to reject the null hypothesis at the 5 % significance level, indicating that these zones conform to the random-effect model. For zone III, P = 0.0104, which rejects the null hypothesis at the 5 % significance level, so the fixed-effect model is used.
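The reported probabilities can be checked by hand, because the survival function of a chi-square distribution with 2 degrees of freedom is simply exp(−x/2). The short sketch below applies this to the two chi2(2) statistics in the tables above and then applies the 5 % decision rule; the function names are hypothetical, and the closed form holds only for 2 degrees of freedom.

```python
import math

def chi2_sf_2dof(stat):
    """P(chi-square with 2 dof > stat) = exp(-stat / 2)."""
    return math.exp(-stat / 2.0)

def choose_model(stat, alpha=0.05):
    """Reject H0 (random effect) when the p-value falls below alpha."""
    p = chi2_sf_2dof(stat)
    return ("fixed-effect" if p < alpha else "random-effect"), p

for zone, stat in (("section 1", 0.00), ("section 3", 9.14)):
    model, p = choose_model(stat)
    print(f"{zone}: chi2(2) = {stat:.2f}, P = {p:.4f} -> {model}")
# section 1: chi2(2) = 0.00, P = 1.0000 -> random-effect
# section 3: chi2(2) = 9.14, P = 0.0104 -> fixed-effect
```

Both values agree with the Prob > chi2 entries of the Hausman tables.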
Variable-intercept panel model for deformation zoning
Results of the random-effects model of the panel data in the area of section I of the dam
Random-effects model

Random-effects GLS regression  Number of obs = 17323
Group variable: _j  Number of groups = 17
R-sq: within = 0.0000  Obs per group: min = 1019
between = 0.0000  avg = 1019.0
overall = 0.1816  max = 1019
Wald chi2(10) = 15548.80
corr(u_i, X) = 0 (assumed)  Prob > chi2 = 0.0000
var  Coef.  Std.Err.  z  P > |z|  [95 % Conf. Interval]

x1  −159.307  827.6679  −0.19  0.847  −1781.51  1462.893 
x2  331.9333  1494.828  0.22  0.824  −2597.88  3261.743 
x3  −280.182  1192.674  −0.23  0.814  −2617.78  2057.416 
x4  91.74331  354.8331  0.26  0.796  −603.717  787.2034 
x5  0.067455  0.118763  0.57  0.57  −0.16532  0.300226 
x6  −0.97668  0.111943  −8.72  0  −1.19608  −0.75727 
x7  −0.01332  0.042572  −0.31  0.754  −0.09676  0.070115 
x8  −0.3112  0.020144  −15.45  0  −0.35068  −0.27172 
x9  0.006649  0.019384  0.34  0.732  −0.03134  0.04464 
x10  −0.05327  0.020773  −2.56  0.01  −0.09398  −0.01255 
_cons  23.4151  170.7861  0.14  0.891  −311.32  358.1497 
sigma_u  3.061757  
sigma_e  1.702147  
rho  0.763903  (fraction of variance due to u_i) 
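The significance columns of the table follow directly from the coefficients and standard errors: z is their ratio, and the 95 % interval is the coefficient plus or minus the normal critical value times the standard error. The sketch below recomputes three rows as a consistency check; the helper name is hypothetical and the critical value 1.959964 is the standard normal quantile, assumed here because the table reports z statistics.

```python
def z_and_interval(coef, std_err, crit=1.959964):
    """z statistic and two-sided 95 % interval under a normal approximation."""
    return coef / std_err, coef - crit * std_err, coef + crit * std_err

# (Coef., Std.Err.) copied from the random-effects table above.
rows = {
    "x6": (-0.97668, 0.111943),
    "x8": (-0.3112, 0.020144),
    "x10": (-0.05327, 0.020773),
}
for name, (coef, std_err) in rows.items():
    z, low, high = z_and_interval(coef, std_err)
    print(f"{name}: z = {z:.2f}, interval = [{low:.5f}, {high:.5f}]")
```

The recomputed z values (−8.72, −15.45, and −2.56) match the table, and the intervals agree with the reported bounds up to rounding of the printed coefficients.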
Results of the fixed-effects model of the panel data in the area of section III of the dam
Fixed-effects model

Fixed-effects (within) regression  Number of obs = 2038
Group variable: _j  Number of groups = 2
R-sq: within = 0.9602  Obs per group: min = 1019
between = .  avg = 1019.0
overall = 0.9602  max = 1019
F(10,2026) = 4892.37
corr(u_i, X) = 0 (assumed)  Prob > F = 0.0000
var  Coef.  Std.Err.  z  P > |z|  [95 % Conf. Interval]

x1  −8441.51  2756.353  −3.06  0.002  −13847.1  −3035.93 
x2  15257.36  4978.174  3.06  0.002  5494.487  25020.23 
x3  −12008.4  3971.919  −3.02  0.003  −19797.8  −4218.88 
x4  3522.763  1181.688  2.98  0.003  1205.313  5840.214 
x5  6.01408  0.395511  15.21  0  5.238429  6.789732 
x6  −2.62441  0.372799  −7.04  0  −3.35552  −1.8933 
x7  −0.78513  0.141776  −5.54  0  −1.06317  −0.50709 
x8  −1.45153  0.067085  −21.64  0  −1.58309  −1.31997 
x9  −0.10303  0.064552  −1.6  0.111  −0.22963  0.023561 
x10  −0.43057  0.06918  −6.22  0  −0.56624  −0.2949 
_cons  1710.663  568.7574  3.01  0.003  595.2524  2826.073 
sigma_u  0.02987277  
sigma_e  1.944306  
rho  0.000236  (fraction of variance due to u_i)  
F test that all u_i = 0: F(1, 2026) = 0.24 Prob > F = 0.6239 
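The rho entries in both result tables are the share of total variance attributed to the individual effect u_i, that is, sigma_u² / (sigma_u² + sigma_e²). A short check against the reported values (the function name is hypothetical):

```python
def individual_variance_share(sigma_u, sigma_e):
    """Fraction of variance due to the individual effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

# sigma_u and sigma_e copied from the two result tables.
rho_random = individual_variance_share(3.061757, 1.702147)    # section I
rho_fixed = individual_variance_share(0.02987277, 1.944306)   # section III
print(f"{rho_random:.6f} {rho_fixed:.6f}")
```

The results reproduce the tabulated 0.763903 and 0.000236. The very small share in section III is in line with the F test above (Prob > F = 0.6239), which does not reject the hypothesis that all u_i are zero.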
A comparison of the modeling results of the six deformation zones shows no significant individual difference among measuring points of the same zone. For measuring points of the same zone, max(Δα _{ i }) = 10.331 mm, which indicates that the structural deformations within each of the six zones respond in the same way to complex external factors and can be treated as structural deformation of the same nature in analysis and computation.
Conclusions
 (1)
According to the regional difference of dam deformation, all measuring points of a super-high arch dam are divided in this study into six zones on the basis of deformation similarity. Modeling analysis is carried out for every zone on the basis of the monitored deformation, thereby effectively eliminating the interference in the model of deformation differences among measuring points and of measurement errors.
 (2)
The effects of the special influencing factors of different dam zones on dam deformation were analyzed on the basis of the studied deformation features of the super-high arch dam. Dummy variables that represent the special effects of dam deformation were introduced to establish the variable-intercept panel model for the deformation zoning of the super-high arch dam. The variable-intercept panel models, except those for zones I and II, can adequately explain the actual deformation behavior of the dam, as manifested by the fitting effect. Furthermore, the inherent differences in the deformation laws among different dam zones can be discovered by analyzing the special deformation effect.
 (3)
By considering the different patterns of the special effect in the variable-intercept panel model, two panel models consistent with practical engineering situations were established based on the Hausman test: the fixed-effect model and the random-effect model. The evaluation method of model effectiveness was explored.
 (4)
Although the random-effect model can adequately explain the overall dam deformation behavior, an accurate fitting is difficult to achieve if only the special deformation effect is involved. For example, in this study, the deformation laws in zones I and II, which are related to various factors such as constraints and regional differences in foundation properties, are very complex because these two zones are close to the dam foundation. Therefore, dynamic adjustment of the model coefficients is needed during modeling analysis, which requires further research.
Declarations
Authors’ contributions
ZS, CG, and DQ made substantial contributions to the conception, design, acquisition of data, and analysis of data. ZS and DQ were involved in drafting the manuscript and revising it critically for important intellectual content. ZS, CG, and DQ gave final approval of the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final manuscript.
Acknowledgements
This study was funded by the China Scholarship Council, the National Natural Science Foundation of China (Grant Nos. 51379068, 51139001, 51279052, 51209077, 51179066, 51079046 and 51079086), the Research Fund for the Doctoral Program of Higher Education of China (Grant Nos. 20120094110005, 20120094130003, 20130094110010), the Program for New Century Excellent Talents in University (Grant Nos. NCET-11-0628, NCET-10-0359), and the Jiangsu Province “Six Talent Peaks” Project (Grant Nos. JY008, JY003).
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Chen X, Li G (2006) Spatial panel data model analysis of economic convergence in China. Econ Sci (5) (in Chinese)
 Gujarati DN (2000) Econometrics. China Renmin University Press, Beijing (in Chinese)
 Hausman JA (1978) Specification tests in econometrics. Econometrica 46:1251–1272
 Hausman JA, Taylor WE (1981) Panel data and unobservable individual effects. Econometrica 49:1377–1398
 He X (2008) Multivariate statistical analysis, 2nd edn. China Renmin University Press, Beijing, pp 57–79 (in Chinese)
 Hsiao C (1985) Benefits and limitations of panel data. Econom Rev 4(1):121–174
 Huixuan G (2005) Application of multivariate statistical analysis. Peking University Press, Beijing, pp 217–243
 Kaitai F, Enpei P (1982) Cluster analysis. Geological Publishing House, Beijing, pp 92–104
 Li Y, He X (2010) Panel clustering method and its application. Stat Res 27(9):73–79 (in Chinese)
 Lin P, Liu XL, Hu Y, Xu WB, Li QB (2013) Deformation stability analysis of Xiluodu arch dam under stress-seepage coupling condition. Chin J Rock Mech Eng 32(6):1137–1144
 Lin P, Zhou WY, Liu HY (2014a) Experimental study on cracking, reinforcement, and overall stability of the Xiaowan super-high arch dam. Rock Mech Rock Eng. doi:10.1007/s00603-014-0593-x
 Lin P, Ma TH, Liang ZZ, Tang CA, Wang RK (2014b) Failure and overall stability analysis on high arch dam based on DFPA code. Eng Fail Anal 45:164–184. doi:10.1016/j.engfailanal.2014.06.020
 Maddala GS (1971) The likelihood approach to pooling cross-section and time series data. Econometrica 39:939–953
 Mundlak Y (1978) On the pooling of time series and cross-section data. Econometrica 46:69–85