In this section, using the separability of the objective function and the operator splitting technique, we derive the proximal operator associated with the \(l_p\) quasi-norm. We then present properties of the global optimal solutions of the \(l_p\)-regularization problem (1). For convenience, we first define the following thresholding function and thresholding operator.
Definition 1
(\(\mathrm {p}\) thresholding function) Assume that \(r\in {R}\). For any \(\lambda > 0\), the function \(h_\lambda (r)\) defined in (7) is called a \(\mathrm {p}\) thresholding function.
Definition 2
(Vector \(\mathrm {p}\) thresholding operator) Assume that \(s\in {R}^n\). For any \(\lambda > 0\), the vector \(\mathrm {p}\) thresholding operator \(H_\lambda (s)\) is defined as
$$\begin{aligned} H_\lambda (s):=(h_\lambda (s_1),h_\lambda (s_2),\ldots ,h_\lambda (s_n))^T. \end{aligned}$$
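Since the explicit formula (7) for \(h_\lambda\) is given in an earlier section, a brief numerical sketch may help fix ideas. The following Python snippet (an illustrative assumption, not the authors' implementation) evaluates \(h_\lambda (y)\) as the minimizer of \((s-y)^2+\lambda |s|^p\), using the threshold \(r^*\) and the lower bound \(L\) that appear below in Theorem 1 and Theorem 3 (with \(\mu =1\)), and solves the nonzero branch by a simple fixed-point iteration on the stationarity equation of \(g_{y}(s)\); the names h_lam and H_lam are hypothetical.

```python
import numpy as np

def h_lam(y, lam, p, iters=50):
    """Sketch of the p thresholding function: minimizer of (s - y)**2 + lam*|s|**p."""
    L = (lam * (1.0 - p)) ** (1.0 / (2.0 - p))        # lower bound on nonzero outputs
    r_star = (2.0 - p) / (2.0 * (1.0 - p)) * L        # threshold r* on |y|
    a = abs(y)
    if a < r_star:
        return 0.0
    # For |y| >= r*, return the larger root of 2*(s - a) + lam*p*s**(p-1) = 0,
    # found by the monotone fixed-point iteration s <- a - (lam*p/2)*s**(p-1),
    # started from s = a (which always lies above that root).
    s = a
    for _ in range(iters):
        s = a - 0.5 * lam * p * s ** (p - 1.0)
    return np.sign(y) * s                             # at |y| == r*, the nonzero minimizer is chosen

def H_lam(y, lam, p):
    """Vector p thresholding operator of Definition 2: componentwise h_lam."""
    return np.array([h_lam(yi, lam, p) for yi in np.asarray(y, dtype=float)])
```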
One of the main results of this section is a proximal operator associated with the non-convex \(l_p~(0<p<1)\) quasi-norm, which can also be viewed as an extension of the well-known proximal operator associated with convex functions.
Theorem 1
For a given vector \(y\in {R}^n\) and constants \(\lambda >0, ~0<p<1\), let \(s^*\) be a global optimal solution of the following problem
$$\begin{aligned} \min _{s\in {R}^n}~f(s):=\Vert s-y \Vert _2^2+\lambda \Vert s \Vert _{p}^{p}, \end{aligned}$$
(11)
then \(s^*\) can be expressed as
$$\begin{aligned} s^*=H_{\lambda }(y). \end{aligned}$$
Furthermore, the exact number of global optimal solutions of the problem can be determined.
Proof
We have
$$\begin{aligned} f(s)&=\Vert s-y \Vert _2^2+\lambda \Vert s \Vert _{p}^{p}=\Vert s \Vert _2^2-2\langle s, y \rangle +\Vert y \Vert _2^2 +\lambda \Vert s \Vert _{p}^{p} \\&=\sum \limits _{i=1}^{n} \left( s^2_i- 2y_i s_i+\lambda {|s_i|}^p \right) +\Vert y \Vert _2^2. \end{aligned}$$
Let \(g_{y_i}(s_i)=s^2_i- 2y_i s_i+\lambda {|s_i|}^p\), then
$$\begin{aligned} f(s)=\sum \limits _{i=1}^{n} g_{y_i}(s_i)+\Vert y \Vert _2^2. \end{aligned}$$
Therefore, solving problem (11) is equivalent to solving the following \(n\) one-dimensional problems: for each \(i=1,2,\ldots ,n\),
$$\begin{aligned} \min _{s_i \in {R}}~ g_{y_i}(s_i). \end{aligned}$$
(12)
By Proposition 1, for each \(i=1,2,\ldots ,n\), it follows that
$$\begin{aligned} s_i^*=\arg \min_{s_i \in {R}}~ g_{y_i}(s_i)=h_{\lambda }(y_i), \end{aligned}$$
and if \(|y_i |=r^*:=\frac{2-p}{2(1-p)} [\lambda (1-p) ]^{1/(2-p)}\), problem (12) has two solutions; otherwise, it has a unique solution. Hence the exact number of global optimal solutions of (11) is \(2^k\), where \(k\) is the number of indices \(i\) with \(|y_i |=r^*\). The proof is thus complete. \(\square\)
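As a quick sanity check of Theorem 1 (an illustrative experiment, not part of the original argument), one can compare \(H_\lambda (y)\), computed with the sketch above, against a dense grid search on each separable subproblem (12); the data and grid below are arbitrary.

```python
rng = np.random.default_rng(0)
y = rng.normal(size=5)
lam, p = 0.5, 0.5

s_star = H_lam(y, lam, p)                      # closed-form solution from Theorem 1
grid = np.linspace(-3.0, 3.0, 200001)          # brute-force minimizer per coordinate
brute = np.array([grid[np.argmin((grid - yi) ** 2 + lam * np.abs(grid) ** p)] for yi in y])
print(np.max(np.abs(s_star - brute)))          # expected to be on the order of the grid spacing
```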
For any \(\lambda ,~ \mu > 0,~ 0<p<1,\) and \(z\in {R}^n\), let
$$\begin{aligned} f_{\mu }(s,z):=\mu (f_{\lambda }(s)-\Vert As-Az \Vert _2^2)+ \Vert s-z \Vert _2^2, \end{aligned}$$
(13)
where \(f_{\lambda }(s):=\Vert As-b \Vert _2^2+\lambda \Vert s \Vert _p^p\) denotes the objective function of problem (1). For simplicity, let
$$\begin{aligned} B_{\mu }(z):=z+{\mu } A^T(b-Az). \end{aligned}$$
(14)
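In code, the surrogate (13) and the map (14) can be written as follows (a minimal sketch, assuming \(A\) and \(b\) are the matrix and observation vector of problem (1); the names f_mu and B_mu are hypothetical):

```python
def B_mu(z, A, b, mu):
    """The map (14): B_mu(z) = z + mu * A^T (b - A z)."""
    return z + mu * A.T @ (b - A @ z)

def f_mu(s, z, A, b, lam, mu, p):
    """The surrogate (13): mu*(f_lambda(s) - ||A(s - z)||^2) + ||s - z||^2,
    with f_lambda(s) = ||A s - b||^2 + lam * ||s||_p^p as in problem (1)."""
    f_lambda = np.sum((A @ s - b) ** 2) + lam * np.sum(np.abs(s) ** p)
    return mu * (f_lambda - np.sum((A @ (s - z)) ** 2)) + np.sum((s - z) ** 2)
```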
Theorem 2
Let \(s^*\in {R}^n\) be the global minimizer of \(f_\mu (s,z)\) for any fixed \(\lambda>0, \mu >0\) and \(z\in {R}^n\). Then we have
$$\begin{aligned} s^*=H_{\lambda \mu }(B_{\mu }(z)). \end{aligned}$$
(15)
Proof
By direct computation, \(f_\mu (s,z)\) can be rewritten as
$$\begin{aligned} \begin{array}{lll} f_\mu (s,z)&=&\mu (\Vert As-b \Vert _2^2+\lambda \Vert s \Vert _p^p -\Vert As-Az \Vert _2^2)+ \Vert s-z \Vert _2^2\\ &=&\lambda \mu \Vert s \Vert _p^p+\Vert s \Vert _2^2-2\langle s, z+\mu A^T (b-Az) \rangle +\Vert z \Vert _2^2+\mu \Vert b \Vert _2^2-\mu \Vert Az \Vert _2^2\\ &=&\Vert s -B_\mu (z)\Vert _2^2+\lambda \mu \Vert s \Vert _p^p+ \Vert z \Vert _2^2+\mu \Vert b \Vert _2^2-\mu \Vert Az \Vert _2^2-\Vert B_\mu (z) \Vert _2^2. \end{array} \end{aligned}$$
Therefore, solving \(\min _{s\in {R}^n} f_\mu (s,z)\) for any fixed \(\lambda , \mu\) and \(z\) is equivalent to solving
$$\begin{aligned} \min _{s\in {R}^n}\left\{ \Vert s-B_\mu (z) \Vert _2^2+\lambda \mu \Vert s \Vert _p^p \right\} . \end{aligned}$$
The conclusion then follows from Theorem 1, and the proof is complete. \(\square\)
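The closed-form minimizer of Theorem 2 is easy to test numerically. The snippet below (an illustrative check reusing h_lam, H_lam, B_mu and f_mu from the sketches above, with randomly generated \(A\), \(b\) and \(z\)) compares \(f_\mu\) at \(H_{\lambda \mu }(B_\mu (z))\) with its values at random perturbations.

```python
rng = np.random.default_rng(1)
m, n = 8, 20
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
z = rng.normal(size=n)
lam, p = 0.1, 0.5
mu = 0.9 / np.linalg.norm(A, 2) ** 2                 # any mu > 0 works for Theorem 2

s_star = H_lam(B_mu(z, A, b, mu), lam * mu, p)       # candidate global minimizer (15)
best = f_mu(s_star, z, A, b, lam, mu, p)
trials = [f_mu(s_star + 0.1 * rng.normal(size=n), z, A, b, lam, mu, p) for _ in range(1000)]
print(best <= min(trials))                           # expected: True
```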
Lemma 3
If \(s^*\in {R}^n\) is a global minimizer of problem (1) for a fixed \(\lambda >0\) and for any fixed \(\mu\) satisfying \(0 <\mu \le \Vert A \Vert ^{-2}\), then \(s^*\) is also a global minimizer of \(f_\mu (s, s^*)\), that is,
$$\begin{aligned} f_{\mu }(s^*, s^*)\le f_{\mu }(s, s^*) \quad \mathrm{for\;all}\;s\;\in {R}^{n}. \end{aligned}$$
Proof
For any \(s\in {R}^{n}\), since \(0 <\mu \le \Vert A \Vert ^{-2}\), we have
$$\begin{aligned} \Vert s-s^* \Vert _2^2-\mu \Vert As-As^* \Vert _2^2\ge \Vert s-s^* \Vert _2^2-\mu \Vert A \Vert ^{2} \Vert s-s^* \Vert _2^2\ge 0. \end{aligned}$$
Hence,
$$\begin{aligned} \begin{array}{lll} f_{\mu }(s, s^*)&=&\mu (f_\lambda (s)-\Vert As-As^* \Vert _2^2)+ \Vert s-s^* \Vert _2^2\\ &=&\mu (\Vert As-b \Vert _2^2+\lambda \Vert s \Vert _p^p)+ (\Vert s-s^* \Vert _2^2-\mu \Vert As-As^* \Vert _2^2)\\ &\ge & \mu (\Vert As-b \Vert _2^2+\lambda \Vert s \Vert _p^p) \\ &=& \mu f_\lambda (s)\ge \mu f_\lambda (s^*) \\ &=& f_{\mu }(s^*, s^*) \\ \end{array} \end{aligned}$$
where the second inequality holds because \(s^*\) is a global minimizer of problem (1). The proof is complete. \(\square\)
Theorem 3
For any given \(\lambda >0,~0<\mu \le \Vert A \Vert ^{-2}\), if \(s^*\) is a global optimal solution of problem (1), then \(s^*\) satisfies
$$\begin{aligned} s^*=H_{\lambda \mu }(B_\mu (s^*)). \end{aligned}$$
(16)
In particular, we have
$$\begin{aligned} \begin{array}{ll} s^*_i & =h_{\lambda \mu }([B_\mu (s^*)]_i)\\ & =\left\{ \begin{array}{llll} h_{\lambda \mu , p}([B_\mu (s^*)]_i), & \quad \mathrm{if} \quad {[B_\mu (s^*)]}_i> r^* \\ L~\mathrm{or}~0, & \quad \mathrm{if} \quad {[B_\mu (s^*)]}_i= r^* \\ 0, & \quad \mathrm{if} \quad -r^*<{[B_\mu (s^*)]}_i< r^* \\ -L~\mathrm{or}~0, & \quad \mathrm{if} \quad {[B_\mu (s^*)]}_i= -r^* \\ -h_{\lambda \mu , p}(-[B_\mu (s^*)]_i), & \quad \mathrm{if} \quad {[B_\mu (s^*)]}_i< -r^* \end{array} \right. \end{array} \end{aligned}$$
(17)
where \(r^*:=\frac{2-p}{2(1-p)} [\lambda \mu (1-p) ]^{1/(2-p)}\) and \(L:=(\lambda \mu (1-p) )^{1/(2-p)}\).
Furthermore, if \(s^*_i\in (-L, L)\), then \(s^*_i=0\).
Proof
By Lemma 3, \(s^*\) is a global minimizer of \(f_\mu (s, z)\) with \(z=s^*\); hence, by Theorem 2, we directly obtain (16) and (17). By Proposition 2, it follows that
$$\begin{aligned} \lim _{r\downarrow r^*} h_{\lambda \mu }(r)=[\lambda \mu (1-p)]^{\frac{1}{2-p}} \;{=:}\;L. \end{aligned}$$
By Proposition 2, combined with the strict monotonicity of \(h_{\lambda \mu }(\cdot )\) on \(({\bar{r}}, +\infty )\) and \((-\infty , -{\bar{r}})\), it follows that \(s^*_i> L\) when \({[B_\mu (s^*)]}_i> r^*\), \(s^*_i< -L\) when \({[B_\mu (s^*)]}_i< -r^*\), and \(s^*_i\in \{0, L\}\) (respectively \(\{0, -L\}\)) when \({[B_\mu (s^*)]}_i= r^*\) (respectively \(-r^*\)). In every case, a nonzero entry satisfies \(|s^*_i|\ge L\), so \(s^*_i\in (-L, L)\) forces \(s^*_i=0\). Therefore, the proof is complete. \(\square\)
Remark 1
In Theorem 3, a necessary condition for global optimal solutions of the \(l_p\)-regularization problem is established in the form of a thresholding expression; in particular, the global optimal solutions of problem (1) are fixed points of a vector-valued thresholding operator. The converse does not hold in general: a point satisfying (16) need not be a global optimal solution of the \(l_p\)-regularization problem (1). Whether it is depends on the matrix \(A\); for instance, when \(A\equiv I\) and \(\mu =1\), a fixed point of (16) is a global optimal solution of problem (1) (cf. Theorem 1).
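Remark 1 notes that every global optimal solution of (1) is a fixed point of \(s\mapsto H_{\lambda \mu }(B_\mu (s))\). A natural way to look for such fixed points numerically is to iterate this map; the sketch below (an illustration reusing h_lam, H_lam and B_mu from the snippets above, not an algorithm stated in this section, and not guaranteed to reach a global minimizer) also checks the lower bound \(L\) of Theorem 3 on the nonzero entries.

```python
rng = np.random.default_rng(2)
m, n = 30, 60
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[:4] = rng.normal(size=4)
b = A @ x_true
lam, p = 0.05, 0.5
mu = 1.0 / np.linalg.norm(A, 2) ** 2                 # satisfies 0 < mu <= ||A||^{-2}

s = np.zeros(n)
for _ in range(500):
    s = H_lam(B_mu(s, A, b, mu), lam * mu, p)        # s^{k+1} = H_{lam*mu}(B_mu(s^k))

residual = np.max(np.abs(s - H_lam(B_mu(s, A, b, mu), lam * mu, p)))
L_bound = (lam * mu * (1.0 - p)) ** (1.0 / (2.0 - p))
print(residual)                                       # small once the iteration has settled
print(np.all((s == 0) | (np.abs(s) >= L_bound - 1e-12)))  # nonzeros respect the bound L (tiny float tolerance)
```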
Remark 2
In Theorem 3, an exact lower bound on the absolute value of the nonzero entries of every global optimal solution of the model is also provided, which can be used to identify the zero entries of any global optimal solution precisely. This lower bound clearly shows how the sparsity of the global optimal solution depends on the choice of the regularization parameter and the norm; therefore, our theorem can be used to select the desired model parameters and norms.