The q-G method

In this work, the q-Gradient (q-G) method, a q-version of the Steepest Descent method, is presented. The main idea behind the q-G method is the use of the negative of the q-gradient vector of the objective function as the search direction. The q-gradient vector, or simply the q-gradient, is a generalization of the classical gradient vector based on the concept of Jackson's derivative from the q-calculus. Its use provides the algorithm with an effective mechanism for escaping from local minima. The q-G method reduces to the Steepest Descent method when the parameter q tends to 1. The algorithm has three free parameters and is implemented so that the search process gradually shifts from global exploration in the beginning to local exploitation in the end. We evaluated the q-G method on 34 test functions and compared its performance with that of 34 optimization algorithms, including derivative-free algorithms and the Steepest Descent method. Our results show that the q-G method is competitive and has great potential for solving multimodal optimization problems.

of the secant line passing through the points [x, f(x)] and [qx, f(qx)]. It is immediately evident that the sign of the q-derivative can be either positive or negative, depending on the value of the parameter q. For q_1 and q_2 (Fig. 1), the sign of the q-derivative is positive and the q-G method will move to the left, as the Steepest Descent method would do. However, for q_3 the sign of the q-derivative is negative, which potentially allows the q-G method to move to the right, towards the global minimum of f. Figure 2 shows the contour lines of a quadratic function f(x), x ∈ R^2, shifted to x = (10, 10), and the negative of both the classical and the q-gradient vectors. The negative of the classical gradient vector of f, the steepest descent direction, is represented in Fig. 2, along with the negative of the q-gradient vector obtained for different values of the parameter q = (q_1, q_2) ∈ R^2. The line segments from 2 to 7 use positive values of q_1 and q_2 taken symmetrically around 1 [q = (q_1, q_2)], and the line segments from 2′ to 7′ use the same values of q_1 and q_2 but with inverted positions [q = (q_2, q_1)]. The line segments from 8 to 10 and from 8′ to 10′ have a similar description, but use negative values of q_1 and q_2. Note that as the parameter q approaches 1, the negative of the q-gradient tends to the steepest descent direction. As can be seen, depending on the value of the parameter q, the q-gradient can point in any direction, not only along descent directions. Figure 3 illustrates the points sampled by the q-G method, with two different strategies to generate the parameter q, over a multimodal function f(x), x ∈ R^2. The initial point of the search is x^0 = (11, 11) and the strategy adopted to define the step length is the same in both cases. Note that the function f has a local minimum at x = (10, 10) and a global minimum at x = (13, 13). In Fig. 3a, the parameter q_i^k is fixed and close to 1 along all iterations k (q_i^k ≈ 1, ∀i = 1, 2) and, consequently, the q-G method performs a local search, converging to the local minimum nearest to the starting point. In Fig. 3b, the parameters q_i^k are different from 1 in the beginning and tend to 1 along the iterative procedure (q_i^k → 1, ∀i = 1, 2), with the q-G method performing a global search, with exploration of the search space and exploitation of the vicinity of the global minimum. As can be seen in Fig. 3, the possibility of having values of the parameter q_i^k that are not close to 1 enables the method to move against the local descent direction, escape from the local minimum and sample other regions of the search space. In other words, when the values of q_i^k are sufficiently different from 1, the q-gradient vector can point to any region of the search space. This potentially allows the q-G method to move towards the global minimum.
These simple examples show that the use of the q-gradient, based on Jackson's derivative, offers a new mechanism for escaping from local minima. The algorithm for the q-G method is complemented with strategies to generate the parameter q and to compute the step length so that the search process gradually shifts from global in the beginning to almost local in the end. As proposed here, the q-G algorithm has only two free parameters to be adjusted: the initial standard deviation (σ_0) and the reduction factor (β). Although a bad choice may lead to some deterioration in performance, the q-G method has proven robust enough to remain capable of reaching the global minimum.
We evaluated the performance of the q-G method against 34 optimization methods on 34 test problems. First, we considered ten 2-D optimization problems, eight unimodal and two multimodal, as defined in (Luksan and Vlcek 2000), and compared the q-G method with 22 derivative-free algorithms described in (Rios and Sahinidis 2013). Second, we evaluated our approach on twelve 10-D and twelve 30-D test problems, ten unimodal and fourteen multimodal. These test problems were proposed as a benchmark for the CEC′2005 Special Session on Real-Parameter Optimization of the 2005 IEEE Congress on Evolutionary Computation. For this second set of problems, we compared the q-G method with 11 Evolutionary Algorithms (EAs) that participated in the competition, and with the Steepest Descent (SD) method.
The rest of the paper is organized as follows. The "q-gradient vector" section introduces the q-gradient vector. "The q-G method" section presents an algorithm for the q-G method. "Computational experiments" section shows the numerical results, and "Conclusions" section presents our main conclusions.

The q-gradient vector
Let f(x) be a differentiable one-variable function. The classical derivative of f is given by

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},

and it is related to infinitesimal translations of the independent variable x. The q-derivative, in contrast, is related to dilations of the independent variable x, which is multiplied by the parameter q, as follows (Jackson 1908):

D_q f(x) = \frac{f(qx) - f(x)}{(q - 1)\,x},

where x ≠ 0 and q ≠ 1. The parameter q can be any number different from 1. Therefore, the q-derivative can be written as (Koekoek and Koekoek 1993)

D_q f(x) = \begin{cases} \dfrac{f(qx) - f(x)}{(q - 1)\,x}, & x \neq 0,\ q \neq 1, \\[4pt] \dfrac{df(x)}{dx}, & \text{otherwise}. \end{cases}

Similarly, let f(x) be a differentiable function of n real variables; then the first-order partial q-derivative of f with respect to the variable x_i is given by (Soterroni et al. 2010)

D_{q_i, x_i} f(x) = \frac{f(x_1, \ldots, q_i x_i, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{(q_i - 1)\,x_i}, \quad x_i \neq 0,\ q_i \neq 1.

Thus, the q-gradient is the vector of the n first-order partial q-derivatives of f, defined as (Soterroni et al. 2010)

\nabla_q f(x) = \left[ D_{q_1, x_1} f(x), \ldots, D_{q_i, x_i} f(x), \ldots, D_{q_n, x_n} f(x) \right],

where the parameter q is a vector q = (q_1, ..., q_i, ..., q_n) ∈ R^n. The classical gradient is recovered in the limit q_i → 1 for all i = 1, ..., n.
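As a concrete illustration of these definitions, the q-gradient can be computed directly from function evaluations. The sketch below is our own illustrative Python code, not the authors' implementation; the function name, the tolerance h, and the fallback to a central finite difference when x_i = 0 or q_i = 1 (standing in for the limit cases of the definition) are assumptions.

```python
import numpy as np

def q_gradient(f, x, q, h=1e-8):
    """q-gradient of f at x for a vector of parameters q.

    Each component is the first-order partial q-derivative
    [f(..., q_i * x_i, ...) - f(x)] / ((q_i - 1) * x_i).
    When x_i == 0 or q_i == 1 the definition degenerates, so we
    fall back to a central finite difference (the classical limit).
    """
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    fx = f(x)
    for i in range(len(x)):
        if x[i] != 0.0 and q[i] != 1.0:
            xq = x.copy()
            xq[i] = q[i] * x[i]
            grad[i] = (f(xq) - fx) / ((q[i] - 1.0) * x[i])
        else:
            xp, xm = x.copy(), x.copy()
            xp[i] += h
            xm[i] -= h
            grad[i] = (f(xp) - f(xm)) / (2.0 * h)
    return grad
```

For f(x) = x², the q-derivative at x is (q + 1)x, which tends to the classical derivative 2x as q → 1, matching the limit stated above.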

The q-G method
Let a general nonlinear unconstrained optimization problem be defined as

\min_{x \in \mathbb{R}^n} f(x),

where x ∈ R^n is the vector of independent variables and f : R^n → R is the objective function. A common optimization strategy is to consider an iterative procedure that, starting from an initial point x^0, generates a sequence {x^k} given by

x^{k+1} = x^k + \alpha_k d^k,

where k is the iteration number, d^k is the search direction and α_k is the step length, i.e., the distance moved along d^k in iteration k. Gradient-based optimization methods can be characterized by the different strategies used to move through the search space. The Steepest Descent method, for example, sets d^k = −∇f(x^k) as the search direction, and the step length α_k is usually determined by a line-search technique that minimizes the objective function along d^k. In the q-G method, the search direction is the negative of the q-gradient of the objective function, −∇_q f(x).
Therefore, the optimization procedure defined by (7) becomes

x^{k+1} = x^k - \alpha_k \nabla_q f(x^k).

Key to the performance of the q-G method is the correct specification of the parameter q. Considering a function of n variables f(x), a set of n different parameters q_i ∈ R − {1} (i = 1, ..., n) is needed to compute the q-gradient vector of f. The strategy adopted here is to draw the values of q_i x_i, for all i = 1, ..., n, from a Gaussian probability density function (pdf) with mean equal to the current point of the search, x_i, and a standard deviation that decreases as the iterative search proceeds.
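The sampling strategy just described can be sketched as follows. This is an illustrative reading of the text, not the authors' code; the helper name and the handling of x_i = 0 (where q_i is undefined) are our own choices.

```python
import numpy as np

def sample_q(x, sigma, rng=None):
    """Draw q_i * x_i ~ N(x_i, sigma) and return the implied vector q.

    As sigma -> 0 the samples concentrate at x_i, so q_i -> 1 and the
    q-gradient approaches the classical gradient.
    """
    if rng is None:
        rng = np.random.default_rng()
    q = np.ones_like(x)
    for i, xi in enumerate(x):
        if xi != 0.0:                 # q_i is undefined at x_i = 0
            q[i] = rng.normal(xi, sigma) / xi
    return q
```

With a large standard deviation the sampled q can be far from 1 (global exploration); with a small one, q clusters tightly around 1 (local search).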
Starting from σ_0, the standard deviation of the pdf is decreased by the following "cooling" schedule, σ_{k+1} = β · σ_k, where 0 < β < 1 is the reduction factor (see Fig. 4d). As σ_k approaches zero, the values of q_i^k tend to 1, the algorithm reduces to the Steepest Descent method, and the search process becomes essentially local. In this sense, the role of the standard deviation here is reminiscent of that played by the temperature in a Simulated Annealing (SA) algorithm, that is, to make the algorithm sweep from global random sampling in the beginning to a local deterministic search in the end. As in an SA algorithm, the performance of the minimization algorithm depends crucially on the choice of the parameters σ_0 and β. A too rapid decrease of σ_k, for example, may cause the algorithm to be trapped in a local minimum. Figure 4a-c shows 500 line segments that represent possible search directions of the q-G method for a 2-D quadratic function at x^0 = (11, 11). The parameters q were generated by a Gaussian distribution with a standard deviation σ_k that decreases along the iterative procedure as σ_{k+1} = β · σ_k, from an initial standard deviation σ_0 with reduction factor β. In the beginning of the iterative procedure (Fig. 4a), the search directions can point anywhere but, as σ_k tends to 0, the parameters q_i^k tend to 1 and, consequently, the search directions tend to the steepest descent direction (Fig. 4c).

Fig. 4 a-c 500 possible search directions of the q-G method for a 2-D quadratic function at x^0 = (11, 11), obtained from parameters q generated by a Gaussian distribution with mean µ_k = x^0 and a standard deviation σ_k that decreases along the iterative procedure from σ_0 = 10 with reduction factor β = 0.99; d evolution of the standard deviation σ_k given by σ_{k+1} = β · σ_k, used to generate the 500 line segments in a-c.
The line segment in gray illustrates the steepest descent direction of the function.
The algorithm of the q-G method is completed with a strategy for calculating the step length α_k based on standard parabolic interpolation. Given the current point of the search x^k and the parameter q^k, we evaluate f at three points a, b and c along the search direction d^k in iteration k; the step length corresponds to the distance between the current point and the minimum of the parabola passing through a, b and c. Other strategies for computing the step length are, naturally, possible. Nevertheless, the use of a line-search technique is not recommended, since the search directions of the q-G method are not always descent directions, and descent directions are required for the success of this kind of one-dimensional minimization (Nocedal and Wright 2006).
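A generic three-point parabolic fit of this kind can be sketched as follows. The trial abscissas (0, 1/2, 1) along d^k and the function name are assumptions; the paper's exact construction of the three points a, b, c is not reproduced here.

```python
import numpy as np

def parabolic_step(f, x, d, t=(0.0, 0.5, 1.0)):
    """Step length along direction d from a 3-point parabolic fit.

    Evaluates f at x + t_j * d for three trial abscissas t_j (the
    spacing is an assumption, not the paper's choice) and returns
    the abscissa of the vertex of the fitted parabola.
    """
    t1, t2, t3 = t
    f1, f2, f3 = (f(x + tj * d) for tj in t)
    num = (t2 - t1) ** 2 * (f2 - f3) - (t2 - t3) ** 2 * (f2 - f1)
    den = (t2 - t1) * (f2 - f3) - (t2 - t3) * (f2 - f1)
    if den == 0.0:              # collinear samples: no curvature info
        return t3
    return t2 - 0.5 * num / den
```

For a quadratic objective the fit is exact, so the returned abscissa is the exact one-dimensional minimizer along d.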
Summarizing, the algorithm of the q-G method for unconstrained continuous global optimization problems is described as follows.
Algorithm for the q-G method
Given x^0 (initial point), σ_0 > 0 and β ∈ (0, 1), repeat for k = 0, 1, 2, ...:
1. Generate the parameters q^k by drawing q_i^k x_i^k from a Gaussian distribution with standard deviation σ_k and mean µ_k = x^k, and compute the q-gradient ∇_q f(x^k).
2. Compute α_k by parabolic interpolation, and set x^{k+1} = x^k − α_k ∇_q f(x^k).
3. Update σ_{k+1} = β · σ_k; if the stopping criterion is not attained, return to step 1.
4. Return x_best and f(x_best).
The q-G method stops when an appropriate stopping criterion is attained. In real-world applications (i.e., in problems for which the global minimum is not known), it can be the maximum number of function evaluations, or the value of the local gradient ‖∇f(x^k)‖ < ε (ε > 0), since the q-G method converges to the Steepest Descent method. The algorithm returns x_best, the point with the minimum value of the objective function f obtained during the iterative procedure, i.e., f(x_best) ≤ f(x^k), ∀k.
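Putting the pieces together, a minimal self-contained sketch of the whole loop might look like this. It follows our reading of the algorithm above (Gaussian sampling of q_i x_i, negative q-gradient direction, parabolic step, geometric cooling); the trial spacing of the parabolic fit, the guard against q_i x_i coinciding with x_i, and the fixed iteration budget are assumptions, not the authors' choices.

```python
import numpy as np

def qg_minimize(f, x0, sigma0, beta, max_iter=2000, seed=0):
    """Sketch of the q-G loop (our reading of the paper, not its code)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    sigma = float(sigma0)
    x_best, f_best = x.copy(), f(x)
    for _ in range(max_iter):
        fx = f(x)
        g = np.zeros_like(x)
        for i in range(len(x)):
            qx = rng.normal(x[i], sigma)     # q_i * x_i ~ N(x_i, sigma)
            if abs(qx - x[i]) > 1e-12:
                xq = x.copy()
                xq[i] = qx
                g[i] = (f(xq) - fx) / (qx - x[i])   # partial q-derivative
        d = -g                               # negative q-gradient direction
        # parabolic step: fit through f at x, x + d/2, x + d
        f1, f2, f3 = fx, f(x + 0.5 * d), f(x + d)
        curv = f1 - 2.0 * f2 + f3
        alpha = 0.5 + 0.25 * (f1 - f3) / curv if curv > 0.0 else 1.0
        x = x + alpha * d
        fn = f(x)
        if fn < f_best:                      # track the best point seen
            x_best, f_best = x.copy(), fn
        sigma *= beta                        # "cooling" schedule
    return x_best, f_best
```

Early on, the large σ lets the direction point anywhere in the search space; as σ shrinks, the loop degenerates into a noisy Steepest Descent with parabolic line steps, as the text describes.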

Computational experiments
The q-G method was tested on two sets of problems, followed by a systematic comparison with derivative-free algorithms. First, we applied our approach to ten 2-D test problems defined in (Luksan and Vlcek 2000), eight unimodal and two multimodal, and compared it with 22 derivative-free algorithms described in (Rios and Sahinidis 2013). Second, the q-G method was evaluated on twelve 10-D and twelve 30-D test problems, ten unimodal and fourteen multimodal, and compared with 11 Evolutionary Algorithms (EAs) that participated in the CEC′2005 Special Session on Real-Parameter Optimization of the 2005 IEEE Congress on Evolutionary Computation, and with the Steepest Descent method. The q-G and Steepest Descent methods were run on an iMac 2.7 GHz with an Intel Core i5 processor and 8 GB RAM, running Intel Fortran Composer XE for Mac OS X.
The q-G method has two free parameters, the initial standard deviation σ_0 and the reduction factor β. The optimal setting for these parameters is typically problem dependent. The value of σ_0 should not be so small that the algorithm behaves like its classical version too early, nor so large that the iterates frequently escape the search space. Extensive numerical tests have shown that σ_0 = κL, where κ = √(D/2), D is the dimension of the problem and L is the largest distance within the search space (the diagonal of the search box),

L = \sqrt{\sum_{i=1}^{n} \left( x_i^{max} - x_i^{min} \right)^2},

provides a simple heuristic for setting this parameter. For example, we set σ_0 = L for all the 2-D problems, since κ = 1 for D = 2. This strategy may not provide the best parameter setting for every test function, but it is good enough for a wide range of problems.
The value of β, which controls the speed with which the algorithm shifts from global to local search, also depends on D. As a rule of thumb, log10(1 − β) ∼ −κ gives good estimates for β. For example, for D ranging from 1 to 8, 0.9 ≤ β ≤ 0.99 is a good choice; for 8 ≤ D ≤ 18, we choose 0.99 ≤ β ≤ 0.999; for 18 ≤ D ≤ 32, 0.999 ≤ β ≤ 0.9999; and so on. Naturally, the choice of β should be balanced against the maximum number of function evaluations that one is willing, or limited, to perform.
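The two heuristics above, σ_0 = κL with κ = √(D/2) and log10(1 − β) ∼ −κ (i.e., β ≈ 1 − 10^(−κ)), can be sketched as follows; the function name and the list-of-pairs interface for the search box are our own choices.

```python
import math

def qg_parameters(bounds):
    """Heuristic free parameters sigma_0 and beta for the q-G method.

    bounds: list of (x_min_i, x_max_i) pairs defining the search box.
    sigma_0 = kappa * L, with kappa = sqrt(D / 2) and L the diagonal
    of the box; beta from log10(1 - beta) ~ -kappa, i.e.
    beta = 1 - 10 ** (-kappa).
    """
    D = len(bounds)
    L = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    kappa = math.sqrt(D / 2.0)
    return kappa * L, 1.0 - 10.0 ** (-kappa)
```

For a 2-D box this gives κ = 1, hence σ_0 = L and β = 0.9, consistent with the choices quoted in the text; for D = 10 it gives σ_0 = √5 L and β ≈ 0.994, within the stated range.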

First set of problems
The first ten 2-D test problems, eight unimodal and two multimodal, are described in Table 1. The value of the objective function at the global optimum, f(x*) (last column of Table 1), is used to define a successful run. A run is considered successful if the objective function value at x is within 1 % or 0.01 of the global optimum (Rios and Sahinidis 2013). For each problem, 10 independent runs were performed from random starting points obtained from a uniform distribution within the search space, as defined in (Rios and Sahinidis 2013). The same starting points were used for all methods, including our approach.

Table 1 A subset of the problems proposed in (Luksan and Vlcek 2000) for dimension 2-D
The column [x_min, x_max]^D shows the domain used to define L, and the column f(x*) gives the value of the function at the global optimum x*.

The stopping criterion used by all algorithms is N_max = 2500 evaluations of the objective function per run. Running all 22 derivative-free methods and the q-G method over this set of ten problems, each solved for 10 independent runs, results in a total of 100 optimization instances per algorithm. In order to perform a qualitative comparison of the different methods over this set of problems, we calculate for each algorithm the fraction of solved problems, given by the ratio between the number of successful runs and the total number of instances. This ratio is computed every 50 evaluations of the objective function. Figures 5, 6 and 7 show the fraction of multimodal, unimodal and all problems, respectively, solved by each method within the optimality tolerance along the iterative procedure.
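The fraction-of-solved-problems metric at one 50-evaluation checkpoint can be sketched as below. The data layout (one best-so-far objective value per instance, with its known optimum) is an assumption for illustration.

```python
def fraction_solved(best_values, f_star, tol_rel=0.01, tol_abs=0.01):
    """Fraction of instances solved at one checkpoint.

    best_values: best objective value found so far in each instance;
    f_star:      parallel list of global-optimum values.
    An instance counts as solved when its best value is within 1 %
    or 0.01 of the global optimum (Rios and Sahinidis 2013).
    """
    solved = 0
    for fb, fs in zip(best_values, f_star):
        if abs(fb - fs) <= max(tol_rel * abs(fs), tol_abs):
            solved += 1
    return solved / len(best_values)
```

Evaluating this ratio every 50 function evaluations produces the curves shown in Figs. 5, 6 and 7.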

The q-G method solved 100 % of the multimodal problems (see Fig. 5), 79 % of the unimodal problems (see Fig. 6) and, in total, 83 % of the problems, arriving in 7th position among the 23 methods. The comparison was performed under the same initial conditions and stopping criterion, and the q-G method used the same free parameters for all problems, namely σ_0 = L and β = 0.95.

Second set of problems
The performance of the q-G method was also evaluated on twelve 10-D and twelve 30-D test problems, ten unimodal and fourteen multimodal. Their characteristics are summarized in Table 2. These functions are a subset of the problems proposed at CEC′2005. The q-G method is compared with the top eleven Evolutionary Algorithms of the competition, among them the algorithm of Sinha et al. (2005) and SPC-PNX (Ballester et al. 2005). In addition, we applied the Steepest Descent (SD) method to the same test problems in order to compare the q-G method with its classical version. The test problems proposed at CEC′2005 are based on classical benchmark functions such as the Sphere, Rosenbrock's, Rastrigin's, Ackley's and Griewank's functions. They contain difficulties such as a huge number of local minima, a shifted global optimum, a rotated domain, noise, a global optimum outside the initialization range or within a very narrow basin, and combinations of different function properties. The function f_15, for example, is a composition of Rastrigin's, Weierstrass, Griewank's, Ackley's and Sphere functions. The selected subset of the CEC′2005 test problems comprises those for which at least one of the thirteen algorithms (the eleven EAs and the q-G and SD methods) was capable of achieving a fixed accuracy, or target function value, defined in the CEC′2005 evaluation criteria.
To ensure a fair comparison, we applied to the q-G and SD methods the same evaluation criteria defined for CEC′2005 and used by the EAs. For each function and dimension, the algorithms performed 25 independent runs from different starting points generated with a uniform random distribution within the search space (see column "[x_min, x_max]^D" of Table 2). The stopping criteria are either a termination error value of 10^−8 or less {i.e., [f(x_best) − f(x*)] < 10^−8, where x* is the global optimum} or a maximum number of function evaluations (Max_FEs) equal to 10,000 × D. More details on the functions and evaluation criteria can be found in the CEC′2005 report. Solving these twelve 10-D and twelve 30-D problems, each for 25 independent runs, results in a total of 600 optimization instances per algorithm. Here we also use fixed free parameters: σ_0 = √5 L with β = 0.995 for all 10-D problems, and σ_0 = √15 L with β = 0.9995 for all 30-D problems. The SD method uses the golden-section technique to generate the step length.

Table 2 A subset of the test problems proposed at CEC′2005 for dimensions 10-D and 30-D
The column "[x_min, x_max]^D" shows the domain used to generate the initial points of the search, where D is the dimension. The column f(x*) shows the value of the function at the global optimum x*. A complete description of these problems can be found in the CEC′2005 report.

For the multimodal functions, whenever the SD method terminates before reaching the termination error and the number of function evaluations is less than Max_FEs, it is restarted from a randomly generated point inside the search space.
To evaluate the performance of the methods on this second set of problems, we computed a "success rate" (SR) and a "success performance" (SP) for each algorithm and function, as given by Eqs. (9) and (10) below. SR represents the fraction of runs that were successful. A successful run for this set of problems is defined as a run in which the algorithm achieves a given accuracy level² using at most Max_FEs function evaluations. SP is the number of function evaluations needed for an algorithm to achieve the minimum, with a given accuracy, in a successful run. Table 3 presents the SR and SP values obtained by the q-G and SD methods for all the twelve 10-D and twelve 30-D test functions.
The performance comparison of the q-G method with its classical version (the SD method) and the top eleven EAs of CEC′2005 was made for two groups: the five unimodal and the seven multimodal test functions. For each group and dimension, 10-D and 30-D, we calculated the average success rate and the average success performance of each algorithm. The average success rate of an algorithm is the arithmetic mean of the SR values over a group of functions. The average success performance is the arithmetic mean of the SP values over the functions with at least one successful run. The two measures are defined as

SR = \frac{\text{number of successful runs}}{\text{total number of runs}}, \quad (9)

SP = \frac{\text{mean of FEs for successful runs} \times \text{total number of runs}}{\text{number of successful runs}}. \quad (10)

² The accuracy level is defined for each function in the CEC′2005 evaluation criteria.

Table 3 Success rate (SR) and success performance (SP) of the q-G and the SD methods for each function and dimension

Tables 4 and 5 show the resulting rankings of the algorithms for the unimodal and multimodal functions, respectively, and the dimensions 10-D and 30-D. The algorithms are ranked by the following criteria:
1. Highest value of the average success rate (column "Average SR %").
2. Number of solved functions in each group (column "SF"). A function is considered solved by an algorithm if at least one of its runs is successful, i.e., if its SR is different from 0.
3. Lowest value of the average success performance (column "Average SP").
For the unimodal problems, the q-G method does not perform very well, arriving in eleventh and tenth positions for dimensions 10-D and 30-D, respectively. The average success rates for the unimodal problems are 59 % for 10-D and 20 % for 30-D. Note that the increase in dimension affected the performance of all algorithms, in terms of either SR or the number of solved functions. Overall, the performance of the q-G method is not very different from that of its classical version, the SD method, which arrives in thirteenth and ninth positions for dimensions 10-D and 30-D, respectively.
This picture changes for the multimodal problems, where the q-G method performed well, arriving in fifth and second positions for 10-D and 30-D, respectively. The average success rates of the q-G method are 43 and 41 % for 10-D and 30-D, respectively. As expected, the SD method performs poorly on the multimodal problems, arriving in twelfth and eighth positions for dimensions 10-D and 30-D, respectively. Again, the increase in dimension affected the performance of all algorithms.
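The success measures used throughout this section, Eqs. (9) and (10), translate directly into code. The sketch below is illustrative; the data layout (per-run evaluation counts plus parallel success flags) is an assumption.

```python
def success_metrics(fes_per_run, successful):
    """Success rate (SR) and success performance (SP), Eqs. (9)-(10).

    fes_per_run: function evaluations used in each run;
    successful:  parallel list of booleans flagging successful runs.
    Returns (SR, SP); SP is None when no run succeeded.
    """
    total = len(fes_per_run)
    wins = [fe for fe, ok in zip(fes_per_run, successful) if ok]
    sr = len(wins) / total
    if not wins:
        return sr, None
    sp = (sum(wins) / len(wins)) * total / len(wins)
    return sr, sp
```

Note that SP penalizes unreliable algorithms: the mean cost of the successful runs is scaled up by the inverse of the success rate.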

Conclusions
In this paper we presented the q-G method, a generalization of the Steepest Descent method based on the use of the q-gradient vector to compute the search direction. This strategy provides the algorithm with an effective mechanism for escaping from local minima. As implemented here, the search process performed by the q-G method gradually shifts from global search in the beginning to local search in the end. Our computational results have shown that the q-G method is competitive and promising. For the multimodal functions in the two sets of problems, it performed well compared to the other derivative-free algorithms, some of them considered to be among the state of the art in the evolutionary computation and numerical optimization communities.
Although our preliminary results show that the method is effective, further research is necessary. A novel version of the q-G method, able to guarantee convergence of the algorithm to the global minimum in a probabilistic sense, is currently under development. This version is based on the generalized adaptive random search (GARS) framework for deterministic functions (Regis 2010). In addition, gains in the performance of the q-G method are expected from several improvements, such as the inclusion of bound, linear and nonlinear constraints, the development of better step-selection strategies, and others.