Optimality condition and iterative thresholding algorithm for \(l_p\)-regularization problems
- Hongwei Jiao^{1},
- Yongqiang Chen^{2} and
- Jingben Yin^{1}
Received: 22 May 2016
Accepted: 11 October 2016
Published: 26 October 2016
Abstract
This paper investigates the \(l_p\)-regularization problems, which have broad applications in compressive sensing, variable selection problems and sparse least squares fitting for high dimensional data. We derive exact lower bounds for the absolute value of nonzero entries in each global optimal solution of the model, which clearly demonstrate the relation between the sparsity of the optimum solution and the choice of the regularization parameter and norm. We also establish a necessary condition for global optimum solutions of \(l_p\)-regularization problems, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In addition, by selecting parameters carefully, a global minimizer with a desired sparsity can be obtained. Finally, an iterative thresholding algorithm is designed for solving the \(l_p\)-regularization problems, and any accumulation point of the sequence generated by the algorithm is shown to be a fixed point of the vector thresholding operator.
Background
In this article, we focus on deriving the characteristics of the global optimum solution of problem (1), inspired by Xu et al. (2012). The remaining sections of the paper are organized as follows. In the “Technical preliminaries” section, we present some important technical results. The “Lower bound and optimality conditions” section first develops the proximal operator associated with a non-convex \(l_p\) quasi-norm, which can be regarded as an extension of the well-known proximal operator associated with convex functions. Next, an exact lower bound for the absolute value of nonzero entries in every global optimum solution of (1) is derived, which clearly demonstrates the relation between the sparsity of the optimum solution and the choice of the regularization parameter and norm. We also establish a necessary condition for global optimum solutions of the \(l_p\)-regularization problems, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In the “Choosing the parameter λ for sparsity” section, we propose a sufficient condition on the selection of \(\lambda\) to meet the sparsity requirement of global minimizers of the \(l_p\)-regularization problems. The “Iterative thresholding algorithm and its convergence” section proposes an iterative thresholding algorithm for the \(l_p\)-regularization problems and shows that any accumulation point of the sequence produced by the algorithm is a fixed point of the vector thresholding operator. Numerical results are reported in the “Numerical experiments” section, and some conclusions are drawn in the “Conclusion” section.
Technical preliminaries
Lemma 1
- 1. \(s=h_{\lambda , p}(r)\) is a continuous function defined on \(({\bar{r}}, +\infty )\).
- 2. \(s=h_{\lambda , p}(r)\) is a differentiable function over \(({\bar{r}}, +\infty )\) and \(h_{\lambda , p}^{'}(r)=\frac{2}{2+\lambda p(p-1)h_{\lambda , p}^{p-2}(r)}\).
- 3. \(s=h_{\lambda , p}(r)\) is a strictly increasing function over \(({\bar{r}}, +\infty )\).
Moreover, if \(r>{\bar{r}}\), then \(s=h_{\lambda , p}(r)\) is the sole local minimizer of \(g_r(s)\) over \((0, +\infty )\).
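The definition of \(g_r(s)\) is not reproduced in this text. The sketch below assumes \(g_r(s)=(s-r)^2+\lambda s^p\) for \(s>0\) and \(0<p<1\), which is consistent with the derivative formula in item 2 of Lemma 1. Under that assumption, \(h_{\lambda , p}(r)\) is the root of \(g_r'(s)=2(s-r)+\lambda p s^{p-1}\) on the convex branch of \(g_r\), and a plain bisection recovers it. The function name `h_lp` and the closed forms used for the inflection point and for \({\bar{r}}\) are illustrative assumptions, not taken from the paper.

```python
def h_lp(r, lam, p, iters=100):
    """Local minimizer of the assumed g_r(s) = (s - r)**2 + lam * s**p on s > 0.

    Finds the root of g_r'(s) = 2*(s - r) + lam*p*s**(p - 1) by bisection
    on [s_bar, r], where s_bar is the inflection point of g_r.
    """
    if lam <= 0.0 or not (0.0 < p < 1.0):
        raise ValueError("need lam > 0 and 0 < p < 1")
    # g_r''(s) = 2 + lam*p*(p-1)*s**(p-2) vanishes at s_bar.
    s_bar = (lam * p * (1.0 - p) / 2.0) ** (1.0 / (2.0 - p))
    # g_r'(s_bar) < 0 exactly when r > r_bar (assumed form of the threshold r-bar).
    r_bar = s_bar * (2.0 - p) / (1.0 - p)
    if r <= r_bar:
        raise ValueError("no positive local minimizer for r <= r_bar")

    def dg(s):
        return 2.0 * (s - r) + lam * p * s ** (p - 1.0)

    lo, hi = s_bar, r  # dg(lo) < 0 < dg(hi), and dg is increasing on [lo, hi]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is chosen here only for robustness; on \((\bar{s}, r)\) the derivative \(g_r'\) is increasing, so Newton's method would also be a reasonable choice.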
Lemma 2
Proposition 1
Proof
Proposition 2
- 1. The function \(h_{\lambda }(r)\) is an odd function over \((-\infty , +\infty )\).
- 2. The function \(h_{\lambda }(r)\) is continuous over \((r^*, +\infty )\); furthermore, \(\lim _{r\downarrow r^*} h_{\lambda }(r)=L\).
- 3. The function \(h_{\lambda }(r)\) is differentiable over \((r^*, +\infty )\).
- 4. The function \(h_{\lambda }(r)\) is strictly increasing over \((r^*, +\infty )\).
Proof
This proposition follows from Proposition 1 and Lemma 1. \(\square\)
When \(p=1/2\), \(h_{\nu , p}(r)\) of (7) has the following analytic expression, given in Xu et al. (2012).
Corollary 1
Proof
Lower bound and optimality conditions
In this section, by using the separability of the function and the operator splitting technique, we derive the proximal operator associated with the \(l_p\) quasi-norm. Next, we present properties of the global optimum solutions of the \(l_p\)-regularization problems (1). For convenience, we first define the following thresholding function and thresholding operators.
Definition 1
(\(\mathrm {p}\) thresholding function) Let \(r\in {R}\). For any \(\lambda > 0\), the function \(h_\lambda (r)\) defined in (7) is called a \(\mathrm {p}\) thresholding function.
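Equation (7) is not reproduced in this text, so the following is only a sketch under assumptions: it takes \(h_\lambda (r)=0\) for \(|r|\le r^*\) and \(h_\lambda (r)=\mathrm{sign}(r)\,h_{\lambda , p}(|r|)\) otherwise, with \(r^*=\frac{2-p}{2(1-p)}[\lambda (1-p)]^{1/(2-p)}\) and \(L=[\lambda (1-p)]^{1/(2-p)}\). These forms agree with the threshold appearing in the proof of Lemma 5 and with the properties listed in Proposition 2. The names `p_threshold` and `vector_threshold` are hypothetical.

```python
def p_threshold(r, lam, p, iters=100):
    """Assumed p thresholding function: global minimizer of (s - r)**2 + lam*|s|**p."""
    r_star = (2.0 - p) / (2.0 * (1.0 - p)) * (lam * (1.0 - p)) ** (1.0 / (2.0 - p))
    a = abs(r)
    if a <= r_star:
        return 0.0
    # Bisection for the root of 2*(s - a) + lam*p*s**(p - 1) on [s_bar, a].
    lo = (lam * p * (1.0 - p) / 2.0) ** (1.0 / (2.0 - p))
    hi = a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * (mid - a) + lam * p * mid ** (p - 1.0) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s if r > 0 else -s  # odd extension to negative r


def vector_threshold(v, lam, p):
    """Apply the scalar thresholding function to every entry of a vector."""
    return [p_threshold(x, lam, p) for x in v]
```

Applied entrywise, the scalar function gives a vector-valued operator; every output entry is either zero or at least \(L\) in absolute value, which is where the sparsity of thresholded vectors comes from.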
Definition 2
Theorem 1
Proof
Theorem 2
Proof
Lemma 3
Proof
Theorem 3
Furthermore, we have: if \(s^*_i\in (-L, L)\), then \(s^*_i=0\).
Proof
Remark 1
In Theorem 3, a necessary condition for global optimum solutions of the \(l_p\)-regularization problems is established, in the form of a thresholding expression satisfied by the global optimum solutions. In particular, the global optimum solutions of problem (1) are fixed points of a vector-valued thresholding operator. The converse, however, does not hold in general, i.e., a point satisfying (16) is not necessarily a global optimum solution of the \(l_p\)-regularization problems (1). This depends on the nature of the matrix A; for instance, when \(A\equiv I\) and \(\mu =1\), a fixed point of (16) is the global optimum solution of the \(l_p\)-regularization problems (1) (i.e., Theorem 1).
Remark 2
In Theorem 3, the exact lower bound for the absolute value of nonzero entries in every global optimum solution of the model is also provided, which can be used to identify zero entries precisely in any global optimum solution. These lower bounds clearly demonstrate the relationship between the sparsity of the global optimum solution and the choices of the regularization parameter and norm. Therefore, our theorem can be used to select the desired model parameters and norms.
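To make the dependence on the parameters concrete, the sketch below evaluates the lower bound in the form \(L=(\lambda \mu (1-p))^{1/(2-p)}\). The statement of Theorem 3 is not reproduced in this text, so this expression is an assumption inferred from the bound used in the proof of Theorem 4; the function names are hypothetical.

```python
def lower_bound(lam, mu, p):
    """Assumed lower bound L = (lam*mu*(1-p))**(1/(2-p)) on nonzero entries."""
    if lam <= 0.0 or mu <= 0.0 or not (0.0 < p < 1.0):
        raise ValueError("need lam > 0, mu > 0 and 0 < p < 1")
    return (lam * mu * (1.0 - p)) ** (1.0 / (2.0 - p))


def violating_entries(s, lam, mu, p):
    """Indices of nonzero entries of s whose magnitude is below L.

    Under the assumed bound, a global minimizer has no such entries,
    so a nonempty result rules out global optimality of s.
    """
    bound = lower_bound(lam, mu, p)
    return [i for i, v in enumerate(s) if v != 0.0 and abs(v) < bound]
```

Increasing \(\lambda\) raises the bound, so fewer small entries can survive in a global minimizer, which is the link between the regularization parameter and sparsity described in the remark.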
Choosing the parameter \(\lambda\) for sparsity
In many applications, such as sparse solution reconstruction and variable selection, one needs to seek least squares estimators with no more than k nonzero entries. Chen et al. (2014) present a sufficient condition on \(\lambda\) under which global minimizers of the \(l_p\)-regularization problems have the desired sparsity, based on the lower bound theory for local optimum solutions. In this paper, we also present a sufficient condition on \(\lambda\) under which global minimizers of the \(l_p\)-regularization problems have the desired sparsity, but based on the lower bound theory for global optimum solutions.
Theorem 4
Proof
- 1. Assume that \(\lambda \ge \beta (k)\). We prove the claim by contradiction. If \(\Vert s^* \Vert _0\ge k \ge 1\), then by (3.11) and the definition of \(\beta (k)\) in (3.8), we have
$$\begin{aligned}\begin{array}{ll} f_\lambda (s^*) &> \lambda |T | (\lambda \mu (1-p))^{p/(2-p)} = k \lambda ^{2/(2-p)} (\mu (1-p))^{p/(2-p)} \\ &\ge k k^{-1} \Vert b \Vert ^2 \\ &= \Vert b \Vert ^2=f_\lambda (0).\\ \end{array} \end{aligned}$$
This contradicts the fact that \(s^*\) is a global minimizer of (1). Therefore, we have \(\Vert s^* \Vert _0< k\).
- 2. Assume that \(\lambda \ge \beta (1)\). We prove the claim by contradiction. If \(s^* \ne 0\), then there exists \(i_0\) satisfying \(s^*_{i_0}\ne 0\) and
$$\begin{aligned} f_\lambda (s^*)=\Vert As^*-b \Vert ^2+\lambda \Vert s^* \Vert ^p_p> \lambda |s^*_{i_0} |^p \ge \lambda (\lambda \mu (1-p))^{p/(2-p)}\ge \Vert b \Vert ^2=f_\lambda (0). \end{aligned}$$
This contradicts the fact that \(s^*\) is a global minimizer of (1). Therefore, \(s^*=0\).
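The definition of \(\beta (k)\) is not reproduced in this text. The sketch below assumes the smallest value that makes the chain of inequalities in the proof hold, namely \(\beta (k)=(\Vert b \Vert ^2/k)^{(2-p)/2}\,(\mu (1-p))^{-p/2}\), obtained by solving \(k \lambda ^{2/(2-p)} (\mu (1-p))^{p/(2-p)} = \Vert b \Vert ^2\) for \(\lambda\). It may differ from the paper's own definition and is meant only to illustrate how a threshold on \(\lambda\) could be computed for a target sparsity; the function name and arguments are hypothetical.

```python
def beta(k, b_norm_sq, mu, p):
    """Assumed threshold: lam >= beta(k) forces fewer than k nonzero entries.

    Solves k * lam**(2/(2-p)) * (mu*(1-p))**(p/(2-p)) = ||b||^2 for lam.
    """
    if k < 1 or b_norm_sq < 0.0 or mu <= 0.0 or not (0.0 < p < 1.0):
        raise ValueError("need k >= 1, ||b||^2 >= 0, mu > 0 and 0 < p < 1")
    return (b_norm_sq / k) ** ((2.0 - p) / 2.0) / (mu * (1.0 - p)) ** (p / 2.0)
```

Under this assumed form \(\beta (k)\) decreases with k, so requiring a sparser minimizer (smaller k) calls for a larger \(\lambda\), and \(\lambda \ge \beta (1)\) corresponds to the zero solution in item 2.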
Iterative thresholding algorithm and its convergence
We first give some important lemmas.
Lemma 4
Let \(0<\mu <\Vert A \Vert ^{-2}\) and let \(\{s^k \}\) be the sequence produced by the algorithm (22). Then the sequences \(\{ (f_\lambda (s^k))_k \}\) and \(\{ (f_\mu (s^{k+1}, s^k))_k \}\) are non-increasing.
Proof
This lemma demonstrates that the objective function \(f_\lambda (s)\) does not increase from iteration to iteration; in particular, the algorithm never produces an iterate with a worse objective value than its starting point. The algorithm (22) does not have a unique fixed point in general; therefore, it is important to analyze the fixed points in detail.
Lemma 5
Proof
A fixed point of the algorithm (22) is any \(s^*\) satisfying \(s^{*}=H_{\lambda \mu }(s^*+\mu A^T (b-As^*))\), i.e., \(s^{*}_i=h_{\lambda \mu }(s^*_i+\mu A_i^T (b-As^*))\). If \(i\in \Gamma _0\), the equality holds if and only if \(|\mu A_i^T (b-As^*)|\le \frac{2-p}{2(1-p)} [\lambda \mu (1-p) ]^{1/(2-p)}\), i.e., \(|A_i^T (b-As^*)|\le \frac{2-p}{2} \lambda ^{1/(2-p)} [\mu (1-p) ]^{(p-1)/(2-p)}\). Similarly, \(i\in \Gamma _1\) if and only if \(s^*_i=h_{\lambda \mu , p}(s^*_i+\mu A_i^T (b-As^*))\). \(\square\)
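The fixed-point equation above suggests the iteration \(s^{k+1}=H_{\lambda \mu }(s^k+\mu A^T (b-As^k))\). The following is a minimal dependency-free sketch of that iteration for \(p=1/2\), not the authors' Matlab code. It assumes the half-thresholding closed form of Xu et al. (2012) for the scalar problem \(\min _s (s-r)^2+\lambda |s|^{1/2}\), which is not reproduced in this text, and it picks the step size from the Frobenius norm, a conservative choice since \(\Vert A \Vert _2\le \Vert A \Vert _F\). All function names are hypothetical.

```python
import math


def half_threshold(r, lam):
    """Assumed closed form for p = 1/2: minimizer of (s - r)**2 + lam*|s|**0.5."""
    r_star = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(r) <= r_star:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(r) / 3.0) ** -1.5)
    return (2.0 / 3.0) * r * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def objective(A, b, s, lam):
    """f_lambda(s) = ||A s - b||^2 + lam * sum_i |s_i|**0.5."""
    res = [u - v for u, v in zip(matvec(A, s), b)]
    return sum(e * e for e in res) + lam * sum(math.sqrt(abs(v)) for v in s)


def ita_half(A, b, lam, max_iter=1000, tol=1e-10):
    """Iterate s <- H_{lam*mu}(s + mu * A^T (b - A s)) starting from s = 0."""
    fro_sq = sum(a * a for row in A for a in row)
    mu = 0.99 / fro_sq  # ||A||_2 <= ||A||_F, hence mu < ||A||_2**(-2)
    s = [0.0] * len(A[0])
    history = [objective(A, b, s, lam)]
    for _ in range(max_iter):
        res = [bi - vi for bi, vi in zip(b, matvec(A, s))]
        grad = rmatvec(A, res)
        s_new = [half_threshold(si + mu * gi, lam * mu) for si, gi in zip(s, grad)]
        step = math.sqrt(sum((u - v) ** 2 for u, v in zip(s_new, s)))
        s = s_new
        history.append(objective(A, b, s, lam))
        if step <= tol:  # stop once successive iterates barely change
            break
    return s, history, mu
```

Recording the objective history makes it easy to check the monotone decrease stated in Lemma 4 on a given instance, and the stopping rule relies on the distance between successive iterates becoming small.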
The following lemma demonstrates that the sequence \(\{s^k \}\) produced by the algorithm (22) is asymptotically regular, i.e., \(\lim _{k\rightarrow \infty } \Vert s^{k+1} -s^k \Vert _2=0\).
Lemma 6
Let \(f_\lambda (s^0)<\infty\) and \(0<\mu < \Vert A \Vert ^{-2}\), and let \(\{s^k \}\) be the sequence produced by the algorithm (22). Then for every \(\epsilon >0\) there exists K such that \(\Vert s^{k+1}-s^k \Vert _2^2\le \epsilon\) for all \(k>K\).
Proof
In the following, we present a very important property of the algorithm, i.e., any accumulation point of the sequence \(\{s^k \}\) is a fixed point of the algorithm (22). Therefore, we have the following theorem and conclusion.
Theorem 5
If \(f_\lambda (s^0)<\infty\) and \(0<\mu < \Vert A \Vert ^{-2}\), then we have the following conclusion: any accumulation point of the sequence \(\{s^k \}\) produced by the algorithm (22) is a fixed point of (22).
Proof
Numerical experiments
We now report numerical results comparing the performance of the iterative thresholding algorithm (ITA) with \(p=0.5\) for solving (1) (signal reconstruction) against LASSO in finding sparse solutions. The computational tests were conducted on a Dell desktop computer with an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00 GHz and 2.0 GB of memory, using Matlab R2010a.
Table 1 Comparison of ITA and LASSO algorithm
n | T | m | LASSO time | LASSO error | ITA time | ITA error |
---|---|---|---|---|---|---|
800 | 60 | 150 | 0.572 | 4.16e−4 | 0.375 | 1.15e−4 |
800 | 80 | 180 | 0.461 | 3.58e−4 | 0.252 | 1.06e−4 |
2000 | 160 | 300 | 0.853 | 5.75e−4 | 0.516 | 1.62e−4 |
2000 | 200 | 500 | 0.853 | 5.86e−4 | 0.553 | 1.73e−4 |
From Table 1 we find that ITA attains a smaller error than LASSO in a shorter time on all four test problems.
Conclusion
In this paper, an exact lower bound for the absolute value of nonzero entries in each global optimum solution of the problem (1) is established. A necessary condition for global optimum solutions of the \(l_p\)-regularization problems is also derived, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In addition, we have derived a sufficient condition on the selection of \(\lambda\) for the desired sparsity of global minimizers of the problem (1) with the given (A, b, p). Finally, an iterative thresholding algorithm is designed for solving the \(l_p\)-regularization problems, and the convergence of the algorithm is proved.
Declarations
Authors' contributions
All authors are joint first authors and contributed equally to the manuscript. All authors contributed to deriving the exact lower bounds, establishing the global optimum condition, designing the iterative thresholding algorithm, and performing the numerical experiments of this research work. All authors read and approved the final manuscript.
Acknowledgements
This paper is supported by the National Natural Science Foundation of China under Grant (11171094), the Natural Science Foundation of Henan Province (152300410097), the Key Scientific Research Project of Universities in Henan Province (14A110024), (16A110014) and (15A110022), the Major Scientific Research Projects of Henan Institute of Science and Technology (2015ZD07), the High-level Scientific Research Personnel Project for Henan Institute of Science and Technology (2015037), and the Science and Technology Innovation Project for Henan Institute of Science and Technology.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Bredies K, Lorenz DA, Reiterer S (2015) Minimization of non-smooth, non-convex functionals by iterative thresholding. J Optim Theory Appl 165:78–112
- Candès EJ, Wakin MB, Boyd SP (2008) Enhancing sparsity by reweighted \(l_1\) minimization. J Fourier Anal Appl 14:877–905
- Chartrand R (2007a) Nonconvex regularization for shape preservation. In: Proceedings of IEEE international conference on image processing
- Chartrand R (2007b) Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process Lett 14:707–710
- Chartrand R, Staneva V (2008) Restricted isometry properties and nonconvex compressive sensing. Inverse Probl 24:1–14
- Chartrand R, Yin W (2008) Iteratively reweighted algorithms for compressive sensing. In: Proceedings of international conference on acoustics, speech, signal processing (ICASSP)
- Chen X, Xu F, Ye Y (2010) Lower bound theory of nonzero entries in solutions of \(l_2\)-\(l_p\) minimization. SIAM J Sci Comput 32:2832–2852
- Chen X, Ge D, Wang Z, Ye Y (2014) Complexity of unconstrained \(l_2 -l_p\) minimization. Math Program 143:371–383
- Chen YQ, Xiu NH, Peng DT (2014) Global solutions of non-Lipschitz \(s_2-s_p\) minimization over the semidefinite cones. Optim Lett 8(7):2053–2064
- Fan Q, Wu W, Zurada JM (2016) Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks. SpringerPlus 2016(5):1–17
- Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
- Foucart S, Lai MJ (2009) Sparsest solutions of under-determined linear systems via \(l_q\) minimization for \(0 < q \le 1\). Appl Comput Harmonic Anal 26:395–407
- Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools (with discussion). Technometrics 35:109–148
- Ge D, Jiang X, Ye Y (2011) A note on the complexity of \(l_p\) minimization. Math Program 129:285–299
- Huang J, Horowitz JL, Ma S (2008) Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann Stat 36:587–613
- Knight K, Wu JF (2000) Asymptotics for lasso-type estimators. Ann Stat 28:1356–1378
- Lai M, Wang Y (2011) An unconstrained \(l_q\) minimization with \(0 < q < 1\) for sparse solution of under-determined linear systems. SIAM J Optim 21:82–101
- Natarajan BK (1995) Sparse approximate solutions to linear systems. SIAM J Comput 24:227–234
- Shehu Y, Iyiola OS, Enyi CD (2013) Iterative approximation of solutions for constrained convex minimization problem. Arab J Math 2:393–402
- Shehu Y, Cai G, Iyiola OS (2015) Iterative approximation of solutions for proximal split feasibility problems. Fixed Point Theory Appl 2015(123):1–18
- Tian M, Huang LH (2013) Iterative methods for constrained convex minimization problem in Hilbert spaces. Fixed Point Theory Appl 2013(105):1–18
- Tian M, Jiao S-W (2015) Regularized gradient-projection methods for the constrained convex minimization problem and the zero points of maximal monotone operator. Fixed Point Theory Appl 11:1–23
- Xu Z, Zhang H, Wang Y, Chang X (2010) \(l_{1/2}\) regularizer. Sci China Inf Sci 53:1159–1169
- Xu Z, Chang X, Xu F, Zhang H (2012) \(l_{1/2}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans Neural Netw Learn Syst 23:1013–1027