Interactive evolutionary computation with minimum fitness evaluation requirement and offline algorithm design

In interactive evolutionary computation (IEC), each solution is evaluated by a human user. Usually the total number of examined solutions is very small. In some applications such as hearing aid design and music composition, only a single solution can be evaluated at a time by a human user. Moreover, accurate and precise numerical evaluation is difficult. Based on these considerations, we formulated an IEC model with the minimum requirement for the fitness evaluation ability of human users under the following assumptions: they can evaluate only a single solution at a time, they can memorize only a single previous solution they have just evaluated, their evaluation result on the current solution is whether it is better than the previous one or not, and the best solution among the evaluated ones should be identified after a pre-specified number of evaluations. In this paper, we first explain our IEC model in detail. Next we propose a (µ + 1)ES-style algorithm for our IEC model. Then we propose an offline meta-level approach to automated algorithm design for our IEC model. The main feature of our approach is the use of a different mechanism (e.g., mutation, crossover, random initialization) to generate each solution to be evaluated. Through computational experiments on test problems, our approach is compared with the (µ + 1)ES-style algorithm where a solution generation mechanism is pre-specified and fixed throughout the execution of the algorithm.

can evaluate multiple solutions at a time. It is also assumed that a human user can assign a different rank to each solution. However, it is not always easy to assign a different rank to each solution. A simpler fitness evaluation scheme is the choice of a pre-specified number of good solutions from a population (e.g., to choose three from a population of ten solutions). The simplest setting under this scheme is a pair-wise comparison where two solutions are compared with each other (i.e., a better solution is selected from the presented two solutions). In pair-wise comparison-based IEC models (Fukumoto et al. 2010; Takagi and Pallez 2009), it is implicitly assumed that two solutions can be evaluated simultaneously. Thus, the comparison of two solutions is usually counted as a single evaluation. However, in some application tasks of IEC such as hearing aid design (Takagi and Ohsaki 2007) and music composition (Fernandez and Vico 2013), human users can evaluate only a single solution at a time. Our focus in this paper is such a situation where a pair-wise comparison is counted as two evaluations.
In this paper, we assume the following simplest fitness evaluation scenario: a single solution is evaluated at a time, the current solution is compared with the previous one that has just been evaluated, and the evaluation result is whether the current solution is better than the previous one or not. Based on this scenario, we formulated an IEC model with the minimum requirement for the fitness evaluation ability of human users (Ishibuchi et al. 2012, 2014a). More specifically, our IEC model is based on the following assumptions: (i) A human user can evaluate only a single solution at a time. (ii) A human user can memorize only a single previous solution. After the evaluation of a current solution is completed, his/her memory is replaced with the newly evaluated one independent of its evaluation result. (iii) A human user can evaluate the current solution in comparison with the previous solution in his/her memory. The evaluation result is whether the current solution is better than the previous one or not. (iv) A human user can evaluate a pre-specified number of solutions in total.
In addition to these assumptions, we further assume that the following requirement should be satisfied in order to identify a single final solution (Ishibuchi et al. 2012, 2014a): (v) When a pre-specified number of evaluations is completed, the best solution among the evaluated ones should be identified. One important issue in IEC is to decrease the burden of a human user in fitness evaluation (Sun et al. 2012). Our IEC model was formulated for this purpose by assuming the minimum requirement for a human user's fitness evaluation ability. As a result, the complexity of a human user's response is minimized. That is, a human user in our IEC model is supposed to answer the following yes-or-no question after the evaluation of each solution: "Is the current solution better than the previous one?" The simplicity of a human user's response may lead to the possibility of its automated recognition from his/her facial expression or brain wave activity in the future. This recognition task in our model is much simpler than in the case of a five-rank evaluation scheme. It may be very difficult to automatically classify a human user's reaction into one of five ranks. The use of the simple fitness evaluation scheme in our IEC model will make the automated recognition task much easier. Our future goal is the implementation of an IEC model with an automated recognition system. However, in this paper, we focus on the design of evolutionary algorithms to efficiently search for a good solution using a simple fitness evaluation scheme: whether the current solution is better than the previous one or not.
This paper is an extended version of our former conference papers (Ishibuchi et al. 2012, 2014a). In Ishibuchi et al. (2012), we proposed the basic idea of our IEC model with the minimum requirement for a human user's fitness evaluation ability. We also implemented a simple evolutionary algorithm for our IEC model, which was based on the (1 + 1) generation update mechanism of evolution strategy (ES). This algorithm was referred to as the (1 + 1)ES-style algorithm. In Ishibuchi et al. (2014a), we generalized the (1 + 1)ES-style algorithm to a (µ + 1)ES-style algorithm by proposing an archive maintenance mechanism, which was used to decrease the archive size from µ to 1 before the termination of the algorithm. Then we proposed an idea of automatically designing an evolutionary algorithm for our IEC model in Ishibuchi et al. (2014b). Our idea was to use an offline meta-level approach for the design of an IEC algorithm. An IEC algorithm was designed by specifying an operator (e.g., crossover, mutation, and random initialization) to generate each solution. In Ishibuchi et al. (2014b), an IEC algorithm with 200 evaluations was represented by an operator string of length 200. The i-th operator in each string was used to generate a solution for the i-th evaluation (i = 1, 2, . . . , 200). Each string was evaluated by applying it to a test problem 100 times. In this paper, we examine the effect of the following factors on the performance of automatically designed algorithms through computational experiments on a number of test problems:

The number of runs used for evaluating each string
Due to the stochastic nature of EC algorithms, a different solution is usually obtained from each run of the same EC algorithm. Thus performance evaluation needs multiple runs. This means that the fitness evaluation of a string in our offline meta-level approach needs multiple runs of the corresponding IEC algorithm. In this paper, we examine the relation between the number of runs for fitness evaluation and the performance of designed algorithms.

The string length
In Ishibuchi et al. (2014b), an IEC algorithm with 200 evaluations was coded by an integer string of length 200 where each integer shows an operator for generating a single solution. If we use six candidate operators as in Ishibuchi et al. (2014b), the size of the search space (i.e., the total number of different strings) is 6^200. Since the search space is large and the fitness evaluation has a stochastic nature, it is not likely that the optimal solution can be obtained. For the same reason, it is not easy to search for a good approximate solution, either. A simple idea for decreasing the size of the search space is the use of the same operator to generate a number of solutions. For example, if the same operator is used to generate 20 solutions, an IEC algorithm with 200 evaluations is coded by an integer string of length 10. The search space is decreased from 6^200 to 6^10. The first value of the string of length 10 is used to generate the first 20 solutions. In this paper, we examine the relation between the string length and the performance of designed algorithms.
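The block-wise decoding of a short operator string can be sketched as follows (a minimal illustration; the integer operator codes in the example string are arbitrary placeholders, not a string designed in our experiments):

```python
def expand_operator_string(short_string, total_evaluations):
    """Expand a short operator string so that each entry controls an
    equal-sized block of consecutive evaluations."""
    block_size = total_evaluations // len(short_string)
    return [op for op in short_string for _ in range(block_size)]

# A string of length 10 controls 200 evaluations in blocks of 20,
# shrinking the search space from 6^200 to 6^10.
schedule = expand_operator_string([5, 4, 3, 0, 5, 5, 1, 2, 4, 5], 200)
assert len(schedule) == 200
assert schedule[:20] == [5] * 20  # the first value generates the first 20 solutions
```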

The number of possible operators
In Ishibuchi et al. (2014b), one of six candidate operators was selected to generate a single solution. Other specifications of candidate operators can be possible. For example, we can use a sequence of operators such as "crossover & mutation" and "mutation & mutation" as a single candidate operator to generate a new solution. In this manner, we can increase the number of candidate operators for generating a solution. It is also possible to decrease the number of candidate operators by removing a specific operator (e.g., crossover). In this paper, we examine the relation between the specification of candidate operators and the performance of designed algorithms.
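The idea of enlarging or shrinking the candidate operator set can be sketched as follows (a minimal illustration; the primitive names and the particular sequences are illustrative examples, not the exact candidate sets used in our experiments):

```python
# Each candidate "operator" is a sequence of primitive steps applied in order.
# Single-primitive candidates and composite candidates such as
# "crossover & mutation" live in the same list.
candidate_operators = [
    ("mutation",),                 # a single primitive
    ("crossover", "mutation"),     # crossover followed by mutation
    ("mutation", "mutation"),      # mutation applied twice
]

# Removing a primitive (e.g., crossover) shrinks the candidate set:
without_crossover = [seq for seq in candidate_operators
                     if "crossover" not in seq]
assert without_crossover == [("mutation",), ("mutation", "mutation")]
```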
In this paper [and in our former studies (Ishibuchi et al. 2012, 2014a)], we use a test problem instead of a human decision maker in computational experiments. No actual IEC experiments with human decision makers are included. The practical usefulness of our offline meta-level approach totally depends on the similarity between an actual IEC problem and a test problem used in our computational experiments. Our intention is not to claim any practical usefulness of our approach in real-world IEC applications, but to discuss the design of IEC algorithms under severely limited information about the fitness of each solution. We believe that the idea of using a different operator to generate each solution will give new insight into the design of IEC algorithms and also into the design of EC algorithms in general. This paper is organized as follows. In "Our IEC model" section, we explain our IEC model. In "Our (µ + 1)ES-style IEC algorithm" section, we show how an archive maintenance mechanism in our former study (Ishibuchi et al. 2014b) was derived. Using the derived mechanism, we explain our (µ + 1)ES-style algorithm in its general form including the case of µ = 1. Its performance is also examined in "Our (µ + 1)ES-style IEC algorithm" section for different values of µ. In "Meta-level approach to the design of IEC algorithms" section, we show an offline meta-level approach for automatically designing an IEC algorithm. The performance of designed algorithms under various settings of our offline meta-level approach is also evaluated in comparison with the (µ + 1)ES-style algorithm in the same section. This paper is concluded in "Conclusion" section.

Our IEC model
The main feature of our IEC model is the necessity of solution re-evaluation for identifying the best solution among the evaluated ones. Some solutions may be re-evaluated several times. This is often the case in our everyday life. For example, we usually examine some pairs of glasses several times to compare them with each other before buying a single pair. It is very difficult for us to choose a single best solution after evaluating a number of solutions just once. Let us explain this feature using the following simple example with five solutions.
Example 1 (Ishibuchi et al. 2014b) Let us assume that we have five solutions x A , x B , x C , x D and x E with the preference ordering x C ≺ x B ≺ x A ≺ x E ≺ x D , where x ≺ y means that a solution y is preferred to a solution x. Thus x C is the worst and x D is the best. Let us evaluate the five solutions x A , x B , x C , x D and x E in this alphabetical order. First x A is shown to a human user. Next x B is evaluated in comparison with x A . The evaluation result is "x A is better than x B (i.e., x B ≺ x A )". Then x C is evaluated as x C ≺ x B . After the evaluation of the three solutions, we can say that x A is the best since x C ≺ x B ≺ x A . Next x D is evaluated as x C ≺ x D . After the evaluation of x D , we cannot say which is the best between x A and x D (since the available information is x C ≺ x B ≺ x A and x C ≺ x D ). Then x E is evaluated as x E ≺ x D . It is clear from this evaluation result that x E is not the best. However, we still cannot say which is the best between x A and x D (since the available information is x C ≺ x B ≺ x A , x C ≺ x D and x E ≺ x D ). Thus x A is re-evaluated in comparison with x E as the sixth evaluation. The evaluation result is x A ≺ x E . From this result, we can say that x D is the best solution since x A ≺ x E ≺ x D . This example explains the necessity of solution re-evaluation to identify the best solution among the examined ones. In our IEC model, the upper limit on the total number of evaluations is pre-specified (e.g., 200 in our computational experiments). An important requirement in our IEC model is that the best solution among the examined ones should be identified after the pre-specified number of evaluations without any additional re-evaluations. Let us assume that the upper limit on the total number of evaluations is seven in the above-mentioned example. The best solution x D was identified after six evaluations in the order of x A , x B , x C , x D , x E , x A . Since the total number of evaluations is six and its upper limit is seven, we can evaluate one more solution x F in comparison with the previously evaluated solution x A . If the evaluation result is x F ≺ x A , we can say that x D is the best solution among the examined six solutions. If the evaluation result is x A ≺ x F , we cannot say which is better between x D and x F .
In order to identify the best solution between them, we need to re-evaluate x D after the evaluation of x F . However, we cannot perform this re-evaluation since the given upper limit on the total number of evaluations is seven. This means that we cannot identify the best solution among the examined six solutions when the evaluation result is x A ≺ x F . In order to satisfy both requirements (i.e., the upper limit on the total number of evaluations and the identification of the best solution among the examined ones), we have to terminate the search after the sixth evaluation in the order of x A , x B , x C , x D , x E , x A . This example suggests the necessity of early termination before the total number of evaluations reaches the upper limit. In our IEC model, we assume that the decision maker can always answer the following question: "Is the current solution x t at the t-th evaluation better than the previous solution x t−1 ?" When the decision maker thinks that there is no difference between them, we assume that the decision maker's answer is "Yes". In our computational experiments on a minimization problem of an objective function f (x), it is assumed that the decision maker's answer is "Yes" if and only if f (x t−1 ) ≥ f (x t ).
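The simulated decision maker described above can be sketched as follows (a minimal Python sketch for a minimization problem; the function name is ours, for illustration only):

```python
def user_answer(f, x_prev, x_curr):
    """Simulated decision maker: answers "Yes" (True) if and only if
    f(x_prev) >= f(x_curr) for a minimization problem, i.e., the current
    solution is at least as good as the previous one."""
    return f(x_prev) >= f(x_curr)

f = lambda x: x * x  # minimization of f(x) = x^2
assert user_answer(f, 3.0, 2.0) is True    # improvement -> "Yes"
assert user_answer(f, 2.0, 2.0) is True    # no difference also counts as "Yes"
assert user_answer(f, 2.0, 3.0) is False   # worse -> "No"
```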
Let us denote the given upper limit on the total number of evaluations by T. The task in our IEC model is to search for a good solution using up to T evaluations. From the assumption (v) in "Background" section, the best solution among the evaluated ones should be identified when an IEC algorithm is terminated. As we have already explained, the algorithm may be terminated before T evaluations due to this requirement. In the next section, we discuss the identification of the best solution among the evaluated ones and the termination of an IEC algorithm.

Archive maintenance rule
Before explaining our (µ + 1)ES-style IEC algorithm, we explain how we can identify the best solution among the examined ones. Let x t be the solution to be evaluated at the t-th evaluation. We denote a set of candidate solutions for the best solution after the evaluation of x t by S t . That is, S t includes the examined solutions with the possibility to be the best solution. In the following, we first explain the update of S t depending on the evaluation result of x t at the t-th evaluation. Then we show how the best solution among the evaluated ones can be identified by re-evaluation.
After the first solution x 1 is evaluated, the candidate solution set is specified as S 1 = {x 1 } since no other solutions have been examined. Next x 2 is examined. If x 2 is better than x 1 (i.e., x 1 ≺ x 2 ), the set is updated as S 2 = {x 2 } since x 2 is the best solution among the examined ones. If x 1 is better than x 2 (i.e., x 1 ≻ x 2 ), the set is not changed: S 2 = S 1 = {x 1 }. Then x 3 is examined. Depending on the evaluation result of x 3 , S t is updated. For example, when S 2 = {x 1 } and x 2 ≺ x 3 , the set is updated as S 3 = {x 1 , x 3 } since both of x 1 and x 3 have the possibility to be the best solution. In this case, we have two options about the choice of the fourth solution x 4 : one is to generate a new solution, and the other is to re-evaluate the first solution x 1 to decrease the size of S t . When x 1 is re-evaluated as the fourth solution (i.e., x 4 = x 1 ), S t is updated as follows: S 4 = {x 1 } if x 3 ≺ x 4 , and S 4 = {x 3 } if x 4 ≺ x 3 . When a new solution x 4 is evaluated (instead of re-evaluating x 1 ) in the case of S 3 = {x 1 , x 3 }, S t is updated as follows: S 4 = {x 1 , x 4 } if x 3 ≺ x 4 , and S 4 = {x 1 , x 3 } if x 4 ≺ x 3 . Let us denote the cardinality of S t by |S t | (i.e., |S t | is the number of candidate solutions in S t ). The update of S t based on the evaluation result of x t is summarized as follows:
Case A: x t is a new solution:
A-1: If x t−1 ≺ x t and x t−1 ∈ S t−1 , then S t = (S t−1 − {x t−1 }) ∪ {x t }.
A-2: If x t ≺ x t−1 , then S t = S t−1 .
A-3: If x t−1 ≺ x t and x t−1 ∉ S t−1 , then S t = S t−1 ∪ {x t }.
Case B: x t is a re-evaluated candidate solution (i.e., x t = x q for some candidate solution x q ∈ S t−1 ):
B-1: If x t−1 ≺ x t and x t−1 ∈ S t−1 , then S t = (S t−1 − {x t−1 } − {x q }) ∪ {x t }.
B-2: If x t ≺ x t−1 , then S t = S t−1 − {x q }.
B-3: If x t−1 ≺ x t and x t−1 ∉ S t−1 , then S t = (S t−1 − {x q }) ∪ {x t }.
Since x t = x q holds in Case B, S t in B-1 and B-3 can also be written as S t = S t−1 − {x t−1 } and S t = S t−1 , respectively. The above formulations of S t in B-1 and B-3 are for explicitly explaining that x t ∈ S t always holds after the candidate solution set update when x t−1 ≺ x t (see also A-1 and A-3).
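The case analysis of Cases A and B can be summarized in a single update function (a sketch of the candidate-set update; solutions are represented by hashable labels, and the function name is ours):

```python
def update_candidates(S_prev, x_prev, x_curr, curr_is_better, is_new):
    """Update the candidate set after comparing x_curr with x_prev.
    curr_is_better: the user's answer to "is x_curr better than x_prev?".
    is_new: True if x_curr is a new solution (Case A),
            False if it re-evaluates a candidate solution (Case B)."""
    S = set(S_prev)
    if curr_is_better:
        S.discard(x_prev)      # x_prev can no longer be the best (A-1, B-1)
        S.add(x_curr)          # x_curr is always a candidate (A-1, A-3, B-1, B-3)
    elif not is_new:
        S.discard(x_curr)      # a re-evaluated candidate lost to x_prev (B-2)
    # A-2: a losing new solution is simply discarded; S stays unchanged
    return S

# Example from the text: S3 = {x1, x3}; re-evaluating x1 against x3 always
# reduces the candidate set to a single solution.
assert update_candidates({"x1", "x3"}, "x3", "x1", True,  is_new=False) == {"x1"}
assert update_candidates({"x1", "x3"}, "x3", "x1", False, is_new=False) == {"x3"}
```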
The evaluation of a new solution in Case A increases the number of candidate solutions only in A-3. In Case B, the number of candidate solutions can be decreased by the re-evaluation of a candidate solution whenever x t−1 ∈ S t−1 holds (i.e., in B-1 and B-2). Only in B-3 does the re-evaluation of a candidate solution in Case B not decrease the number of candidate solutions. However, in B-3, x t ∈ S t always holds after the re-evaluation of x t . As a result, the re-evaluation at the (t + 1)th evaluation always decreases the number of candidate solutions. This means that the number of candidate solutions can always be decreased by iterating the re-evaluation twice.
Let us discuss whether a new solution x t can be evaluated at the t-th evaluation. As explained in "Our IEC model" section, the upper limit on the total number of evaluations is given and denoted by T. First, let us consider the case of x t−1 ∈ S t−1 . In this case, the evaluation of a new solution x t at the t-th evaluation does not increase the number of candidate solutions (see A-1 and A-2). After the t-th evaluation, the upper limit on the number of remaining evaluations is (T − t). Since one candidate solution can be removed by iterating the re-evaluation twice, we can remove Int((T − t)/2) candidate solutions by iterating the re-evaluation (T − t) times after the t-th evaluation, where Int((T − t)/2) is the integer part of (T − t)/2. Thus we can evaluate a new solution x t when the following relation holds: |S t−1 | − 1 ≤ Int((T − t)/2). Next, let us consider the case of x t−1 ∉ S t−1 . In this case, the evaluation of a new solution x t at the t-th evaluation increases the number of candidate solutions from |S t−1 | to |S t | = |S t−1 | + 1 when the conditions in A-3 hold. In A-3, x t ∈ S t always holds after the evaluation of the new solution x t . Thus the number of candidate solutions can be decreased by the re-evaluation at the (t + 1)th evaluation from |S t | to |S t+1 | = |S t | − 1 = |S t−1 |. After the (t + 1)th evaluation, the upper limit on the number of remaining evaluations is (T − t − 1). We can remove Int((T − t − 1)/2) candidate solutions by iterating the re-evaluation (T − t − 1) times after the (t + 1)th evaluation. Thus we can evaluate a new solution x t when the following relation holds: |S t−1 | − 1 ≤ Int((T − t − 1)/2). Since the left hand side is also an integer, this inequality condition is equivalent to |S t−1 | ≤ (T − t + 1)/2.
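The budget conditions for the two cases can be written directly as a small predicate (a sketch; Python's integer division plays the role of Int(·), and the function name is ours):

```python
def can_evaluate_new_solution(size_S, prev_in_S, t, T):
    """Return True if a new solution may be evaluated at the t-th evaluation
    without losing the ability to reduce the candidate set to a single
    solution within the remaining budget of (T - t) evaluations."""
    if prev_in_S:
        # one candidate removable per two re-evaluations after step t
        return size_S - 1 <= (T - t) // 2
    # a new solution may add a candidate; one extra re-evaluation is needed
    return size_S <= (T - t + 1) / 2

T = 200
assert can_evaluate_new_solution(1, True, T, T)       # |S| = 1 at t = T: allowed
assert not can_evaluate_new_solution(1, False, T, T)  # second condition fails at t = T
```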
These discussions are summarized as the following archive maintenance rule:

Archive maintenance rule
A new solution x t is evaluated at the t-th evaluation in the following two cases:
(a) x t−1 ∈ S t−1 and |S t−1 | − 1 ≤ Int((T − t)/2),
(b) x t−1 ∉ S t−1 and |S t−1 | ≤ (T − t + 1)/2.
In all the other cases, x t should be a candidate solution randomly selected from S t−1 (excluding x t−1 ).
Let us discuss the solution evaluation at t = T. That is, let us examine whether our archive maintenance rule is valid for the last evaluation at t = T. When x T −1 ∈ S T −1 , there are two possibilities: |S T −1 | = 1 and |S T −1 | = 2. If |S T −1 | = 1 [i.e., when (a) is satisfied in the archive maintenance rule], a new solution x T can be evaluated and compared with x T −1 . The final solution is the better one between x T −1 and x T . Thus |S T | = 1 is satisfied. If |S T −1 | = 2 (i.e., when (a) is not satisfied), one candidate solution in S T −1 is x T −1 . The other candidate solution in S T −1 is re-evaluated and compared with x T −1 at t = T . The final solution is the better one in this comparison. Thus |S T | = 1 is satisfied. When x T −1 / ∈ S T −1 , |S T −1 | = 1 always holds from our archive maintenance rule. In this case, (b) is never satisfied since |S T −1 | = 1 and t = T. Thus a new solution is not examined. Since we have only a single candidate in S T −1 , its re-evaluation is meaningless. Thus no solution is evaluated at t = T. As a result, |S T | = 1 holds after the termination of the algorithm.
To demonstrate our archive maintenance rule, let us perform a simple computer simulation assuming a minimization problem of f (x) = x. We also assume that a new solution x t is generated as a random real number in the unit interval [0, 1]. Our archive maintenance rule is used for 200 evaluations (t = 1, 2, . . . , 200 and T = 200). Average results over 100 runs are shown by dotted lines in Fig. 1. The average number of candidate solutions in S t and the average number of evaluated new solutions are shown in Fig. 1a, b, respectively. In Fig. 1, results of a single run are also shown by solid lines. We can see from Fig. 1a that the number of candidate solutions first increases from |S t | = 1 at t = 1 to about 40 and then decreases to |S T | = 1 at T = 200.
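This simulation can be reproduced with a short script (a sketch combining the candidate-set update with the archive maintenance rule; a single seed is used here, so the exact numbers differ from the 100-run averages in Fig. 1):

```python
import random

def run_iec(T=200, seed=0):
    """Simulate the archive maintenance rule on minimization of f(x) = x,
    where every new solution is a uniform random number in [0, 1]."""
    rng = random.Random(seed)
    f = lambda x: x
    x_prev = rng.random()          # first evaluation (t = 1)
    S = {x_prev}
    sizes = [len(S)]
    for t in range(2, T + 1):
        # archive maintenance rule: may a new solution be evaluated?
        if x_prev in S:
            allow_new = len(S) - 1 <= (T - t) // 2
        else:
            allow_new = len(S) <= (T - t + 1) / 2
        if allow_new:
            x_curr, is_new = rng.random(), True
        else:
            if len(S) == 1:        # a single candidate remains: stop early
                break
            x_curr, is_new = rng.choice(list(S - {x_prev})), False
        # candidate-set update (cases A-1 to B-3)
        if f(x_prev) >= f(x_curr):     # "Yes": the current solution is better
            S.discard(x_prev)
            S.add(x_curr)
        elif not is_new:               # a re-evaluated candidate lost
            S.discard(x_curr)
        x_prev = x_curr
        sizes.append(len(S))
    return S, sizes

S, sizes = run_iec()
assert len(S) == 1     # a single best solution is identified at termination
assert max(sizes) > 1  # the candidate set grows before shrinking back to one
```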

Archive maintenance for (µ + 1)ES-style algorithms
By introducing the upper bound µ on the number of candidate solutions, we modify our archive maintenance rule in the previous subsection to design a (µ + 1)ES-style algorithm. Our idea is to re-evaluate a candidate solution whenever the number of solutions increases from µ to (µ + 1). That is, a new solution can be evaluated only when the number of candidate solutions is less than or equal to µ. This idea is combined into our archive maintenance rule as follows:

Archive maintenance rule for (µ + 1)ES-style algorithms
A new solution x t is evaluated at the t-th evaluation in the following two cases:
(a) x t−1 ∈ S t−1 , |S t−1 | ≤ µ and |S t−1 | − 1 ≤ Int((T − t)/2),
(b) x t−1 ∉ S t−1 , |S t−1 | ≤ µ and |S t−1 | ≤ (T − t + 1)/2.
In all the other cases, x t should be a candidate solution randomly selected from S t−1 (excluding x t−1 ).
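The µ-bounded rule amounts to one extra condition on top of the budget check (a sketch; the function name is ours):

```python
def can_evaluate_new_solution(size_S, prev_in_S, t, T, mu):
    """Archive maintenance rule with upper bound mu on |S|: a new solution
    is allowed only if the candidate set has not outgrown mu and the
    remaining budget suffices to shrink it to a single solution."""
    if size_S > mu:
        return False               # force re-evaluation until |S| <= mu again
    if prev_in_S:
        return size_S - 1 <= (T - t) // 2
    return size_S <= (T - t + 1) / 2

# With mu = 10, an 11th candidate solution triggers immediate re-evaluation.
assert not can_evaluate_new_solution(11, True, 50, 200, mu=10)
assert can_evaluate_new_solution(10, True, 50, 200, mu=10)
```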
To demonstrate the effect of incorporating the upper bound µ into our archive maintenance rule, we specify µ as µ = 10 and perform the same computer simulation as in Fig. 1. Average results over 100 runs are shown in Fig. 2 together with results of a single run. As shown in Fig. 2a, the number of candidate solutions is decreased to 10 by re-evaluating a candidate solution whenever it becomes 11. In the final stage, the number of candidate solutions is decreased to one. Slightly more new solutions are examined in Fig. 2b than in Fig. 1b. To examine this issue, we perform the same computer simulation for each of the following six settings of µ: µ = 1, 2, 5, 10, 20, 50. The average total number of examined new solutions over 100 runs is 146.8, 146.1, 144.6, 142.5, 138.8 and 134.2 for µ = 1, 2, 5, 10, 20, 50, respectively. Slightly more new solutions are examined when we use a small value of µ (i.e., slightly more re-evaluations are needed when we use a large value of µ).

Generation of new solutions
An important issue in the design of (µ + 1)ES-style algorithms is how to generate a new solution x t to be compared with the previous solution x t−1 at the t-th evaluation. A simple idea is the use of a mutation operator to generate a new solution x t from a randomly selected candidate solution in S t−1 . We used this idea in a (1 + 1)ES-style algorithm in Ishibuchi et al. (2012) and a (µ + 1)ES-style algorithm in Ishibuchi et al. (2014a). The basic framework of our (µ + 1)ES-style algorithm in Ishibuchi et al. (2014a) can be written as follows:
The basic framework of our (µ + 1)ES-style IEC algorithm
1. An initial solution x 1 is randomly generated. Initialize t and S t as t = 1 and S t = {x 1 }.
2. Update t as t + 1 (i.e., t = t + 1).
3. Decide whether a new solution can be evaluated at the t-th evaluation using the archive maintenance rule in "Archive maintenance for (µ + 1)ES-style algorithms" section.
4. If a new solution can be evaluated, x t is generated by a mutation operator from a randomly selected candidate solution in S t−1 . Otherwise, x t is a candidate solution randomly selected from S t−1 (excluding x t−1 ).
5. The current solution x t is compared with the previous solution x t−1 . Then update S t based on the comparison result.
6. If the termination condition is not satisfied, return to Step 2.
When two or more candidate solutions are stored in S t−1 , it is possible to use a crossover operator as in standard genetic algorithms to generate a new solution x t in Step 4. That is, a crossover operator is applied to a randomly selected pair of different candidate solutions for generating an offspring. Then a mutation operator is applied to the offspring to generate a new solution x t . It should be noted that we cannot use any fitness-based parent selection mechanism since no information is available about the fitness of each candidate solution (i.e., since no comparison has been performed among the candidate solutions in S t−1 ). Thus, each parent is randomly selected from the candidate solution set. When we use a crossover operator, we always select a pair of different candidate solutions. This is to make the crossover operator always meaningful.
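The fitness-free parent selection can be sketched as follows (a minimal sketch; since the candidate solutions have never been compared with one another, any two distinct candidates are equally likely to be chosen):

```python
import random

def select_parents(candidates, rng=random):
    """Select two distinct parents uniformly at random from the candidate
    set; no fitness-based selection is possible because no comparison has
    been performed among the candidate solutions."""
    return rng.sample(list(candidates), 2)

parents = select_parents({"a", "b", "c"})
assert len(parents) == 2 and parents[0] != parents[1]
```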

Computational experiments by our (µ + 1)ES-style IEC algorithm
In this subsection, we examine the search ability of our (µ + 1)ES-style IEC algorithm under various specifications of µ on six well-known continuous test problems: the Sphere, Rosenbrock, Griewank, Ackley, Levy and Rastrigin functions (e.g., see Surjanovic and Bingham 2013). The number of decision variables is specified as 50: x = (x 1 , x 2 , . . . , x n ) where n = 50. This 50-dimensional decision vector is represented by a real number string of length 50 in our computational experiments. The upper limit on the total number of evaluations is always specified as T = 200 throughout this paper. Four specifications of µ are examined: µ = 1, 2, 5, 10.
We examine the search ability of our (µ + 1)ES-style IEC algorithm for each combination of the four values of µ and the two settings for new solution generation mechanisms explained in the previous subsection (i.e., mutation only and crossover & mutation). For mutation, we use the polynomial mutation operator with P m = 1 and η m = 20 [for details, see Hamdan (2010)]. For crossover, we use the simulated binary crossover (SBX) with η c = 15 (Deb and Kumar 1995). When a new solution is to be generated by mutation only, the polynomial mutation is used with the probability 1.0. When a new solution is to be generated by crossover & mutation, both the SBX crossover and the polynomial mutation are used with the probability 1.0.
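For reference, simplified textbook forms of the two operators can be sketched as follows (an unbounded SBX plus a clipped polynomial mutation; this is a sketch only, and the exact bounded variants follow Hamdan (2010) and Deb and Kumar (1995)):

```python
import random

def polynomial_mutation(x, lower, upper, eta_m=20.0, p_m=1.0, rng=random):
    """Simplified polynomial mutation (per-variable probability p_m)."""
    y = list(x)
    for i in range(len(y)):
        if rng.random() <= p_m:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
            y[i] = min(max(y[i] + delta * (upper - lower), lower), upper)
    return y

def sbx_crossover(p1, p2, eta_c=15.0, rng=random):
    """Simplified simulated binary crossover; returns one offspring."""
    child = []
    for a, b in zip(p1, p2):
        u = rng.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
        child.append(0.5 * ((1.0 + beta) * a + (1.0 - beta) * b))
    return child

# "Crossover & mutation": SBX followed by polynomial mutation.
parent1, parent2 = [0.2] * 5, [0.8] * 5
mutant = polynomial_mutation(sbx_crossover(parent1, parent2),
                             lower=0.0, upper=1.0)
assert len(mutant) == 5
assert all(0.0 <= v <= 1.0 for v in mutant)  # mutation clips to the bounds
```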
The comparison of the current solution x t with the previous one x t−1 is simulated by a test function f (x) as follows: x t is preferred to x t−1 by the decision maker when f (x t ) ≤ f (x t−1 ) for the minimization problem of f (x). That is, the evaluation result is "Yes" (i.e., x t−1 ≺ x t ) if f (x t ) ≤ f (x t−1 ) and "No" (i.e., x t ≺ x t−1 ) otherwise.

Each test problem is a minimization problem of one of the following non-linear functions [16]:
Sphere: f (x) = Σ_{i=1}^{n} x_i^2,
Rosenbrock: f (x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2],
Griewank: f (x) = Σ_{i=1}^{n} x_i^2/4000 − Π_{i=1}^{n} cos(x_i/√i) + 1,
Ackley: f (x) = −20 exp(−0.2 √((1/n) Σ_{i=1}^{n} x_i^2)) − exp((1/n) Σ_{i=1}^{n} cos(2π x_i)) + 20 + e,
Levy: f (x) = sin^2(π ω_1) + Σ_{i=1}^{n−1} (ω_i − 1)^2 [1 + 10 sin^2(π ω_i + 1)] + (ω_n − 1)^2 [1 + sin^2(2π ω_n)], where ω_i = 1 + (x_i − 1)/4 for i = 1, 2, . . . , n,
Rastrigin: f (x) = 10n + Σ_{i=1}^{n} [x_i^2 − 10 cos(2π x_i)].
In Fig. 3, we show the shape of each function for the case of two decision variables [i.e., x = (x 1 , x 2 )]. The Sphere function is a simple quadratic function with no local minima. The Rosenbrock function has no local minima, either. The decision variables are not separable in the Rosenbrock function whereas they are separable in the Sphere function. The Griewank function has a large number of small local minima. Since they are very small, the function shape in Fig. 3c looks very simple. The Ackley function in Fig. 3d has many small and shallow local minima. The other two functions are complicated non-linear functions with many small but deep local minima, as shown in Fig. 3e, f. From Fig. 3, one may think that near-optimal solutions of the Sphere function can be easily found. This is almost always the case in the literature. However, it is not the case in this study for the following three reasons: (i) the fitness evaluation of each solution is the comparison with the previous solution, (ii) the upper limit on the number of evaluations is only 200, and (iii) each test problem has 50 decision variables. One may also think that multi-point global search algorithms with high diversification ability are needed to handle the highly non-linear Levy and Rastrigin functions. However, for the same three reasons, high convergence ability is very important to find a good solution even for those functions. Our task is to find a good solution of each test problem with 50 decision variables under the severely limited number of evaluations and the very simple fitness evaluation mechanism.
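Three of the six functions can be written down directly (a sketch using the standard definitions from Surjanovic and Bingham (2013); the global minimum value is 0 at the origin for the Sphere and Rastrigin functions and at x = (1, . . . , 1) for the Levy function):

```python
import math

def sphere(x):
    return sum(xi ** 2 for xi in x)

def rastrigin(x):
    return 10.0 * len(x) + sum(xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi)
                               for xi in x)

def levy(x):
    w = [1.0 + (xi - 1.0) / 4.0 for xi in x]
    term1 = math.sin(math.pi * w[0]) ** 2
    middle = sum((wi - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * wi + 1.0) ** 2)
                 for wi in w[:-1])
    term3 = (w[-1] - 1.0) ** 2 * (1.0 + math.sin(2.0 * math.pi * w[-1]) ** 2)
    return term1 + middle + term3

# n = 50 decision variables, as in our computational experiments.
assert sphere([0.0] * 50) == 0.0
assert abs(rastrigin([0.0] * 50)) < 1e-9
assert abs(levy([1.0] * 50)) < 1e-9
```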
Average results over 1000 runs of our (µ + 1)ES-style algorithm are summarized in Tables 1 and 2. Only mutation is used in Table 1 while both crossover and mutation are used in Table 2. No crossover is used when µ = 1 even in Table 2, so the same results are shown for µ = 1 in the two tables. The best result (i.e., the smallest average function value) for each test problem is highlighted in bold in each table. In these tables, the best or near-best results are obtained from our (µ + 1)ES-style algorithm with µ = 1. For the Levy and Rastrigin functions, the best results are obtained from our (µ + 1)ES-style algorithm with µ = 5 in Table 2 where both crossover and mutation are used. However, differences between those best results and the results by µ = 1 are small in Table 2 when compared with their standard deviations in parentheses. To visually examine their differences, we show the histogram of 1000 solutions obtained from each of the two settings (i.e., µ = 1 and µ = 5 in Table 2) for the Levy and Rastrigin functions in Fig. 4. We can see that the two histograms by µ = 1 and µ = 5 for each test problem are heavily overlapping in each plot in Fig. 4. In Fig. 4a, a long black bar around 45,000 may show that the search with µ = 1 is trapped in local minima of the Levy function in many of its runs. In Fig. 5, we show how the function value was decreased by 200 evaluations in each setting of our (µ + 1)ES-style algorithm with crossover and mutation in Table 2. Figure 5a-d clearly show the deterioration of the search ability by increasing the value of µ (i.e., by increasing the upper bound on the number of candidate solutions). In Fig. 5e, f, the best results are obtained from µ = 5 for the Levy and Rastrigin functions (see Table 2). However, as shown in Fig. 4, we cannot observe any clear performance improvement by increasing the value of µ in Fig. 5e, f.

Meta-level approach to the design of IEC algorithms
In our computational experiments in "Our (µ + 1)ES-style IEC algorithm" section, good results are obtained by the (1+1)ES-style algorithm where new solutions are always generated by mutation. No experimental results strongly support the necessity of multiple candidate solutions and crossover in our (µ + 1)ES-style algorithm. In this section, we further try to improve the performance of our (µ + 1)ES-style algorithm using an idea of offline meta-level design of IEC algorithms. The necessity of multiple candidate solutions and crossover is clearly shown for the Levy and Rastrigin functions in this section.
In general, an important issue in evolutionary computation is how to generate new solutions to be evaluated. This issue is more important in IEC algorithms since only a small number of solutions can be evaluated. Since re-evaluation of solutions is needed in our IEC model, standard EC algorithms cannot be directly used. Motivated by these discussions, we proposed an idea of offline meta-level design of IEC algorithms in our former study (Ishibuchi et al. 2014b). The basic idea in Ishibuchi et al. (2014b) is to represent an IEC algorithm by an integer string of length T. Each string (i.e., each IEC algorithm) is evaluated by applying it to a test problem. In this section, we examine various implementation issues of this idea such as the number of runs for evaluating each string, the string length, and the number of possible operators to generate a new solution.

Offline meta-level algorithm design approach in Ishibuchi et al. (2014b)
In this subsection, we explain the offline meta-level approach to the design of IEC algorithms in our former study (Ishibuchi et al. 2014b). In our offline meta-level approach, each IEC algorithm with T evaluations is coded by a string of length T as τ = τ_1 τ_2 ... τ_T, where τ_t shows how to generate the t-th solution x_t. In Ishibuchi et al. (2014b), τ_t is one of the following six operators:
• Operator 0: Re-evaluation (if inapplicable, random creation is used),
• Operator 1: Re-evaluation (if inapplicable, mutation is used),
• Operator 2: Random creation,
• Operator 3: Crossover (if inapplicable, random creation is used),
• Operator 4: Crossover (if inapplicable, mutation is used),
• Operator 5: Mutation.
Here, re-evaluation means the random selection of a candidate solution from S_{t−1} (excluding x_{t−1}). If S_{t−1} includes only x_{t−1} (i.e., S_{t−1} = {x_{t−1}}), re-evaluation is not applicable; in this case, random creation is used in Operator 0 while mutation is used in Operator 1. Mutation is applied to a randomly selected candidate solution from S_{t−1}. Except for the generation of the first solution, mutation is always applicable since we have at least one candidate solution. The first solution x_1 is always generated by random creation (since all of the other operators are inapplicable for generating the first solution). Crossover is applied to two candidate solutions randomly selected from S_{t−1}. If S_{t−1} contains only one candidate solution, crossover is not applicable; in this case, random creation is used in Operator 3 while mutation is used in Operator 4. It should be noted that the string τ is used to generate solutions together with our archive maintenance rule in "Archive maintenance rule" section without the upper limit µ on the number of candidate solutions. More specifically, τ_t is used to generate the t-th solution x_t only when the generation of a new solution is allowed by the archive maintenance rule.
Otherwise, the re-evaluation of a randomly selected candidate solution from S t−1 (excluding x t−1 ) is performed.
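To make the operator semantics concrete, the dispatch of the six operators can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in our experiments: the function names, the search range [lo, hi] and the mutation strength are assumptions.

```python
import random

def generate_solution(op, archive, x_prev, dim=10, lo=-5.0, hi=5.0):
    """Generate the next solution according to one of the six operator codes.

    `archive` is the candidate-solution set S_{t-1}; `x_prev` is the solution
    evaluated at step t-1. Parameter values are illustrative assumptions.
    """
    others = [x for x in archive if x is not x_prev]

    def random_creation():
        return [random.uniform(lo, hi) for _ in range(dim)]

    def mutation():
        parent = random.choice(archive)
        return [v + random.gauss(0.0, 0.1 * (hi - lo)) for v in parent]

    def crossover():
        p1, p2 = random.sample(archive, 2)
        return [random.choice(pair) for pair in zip(p1, p2)]  # uniform crossover

    if op == 0:  # re-evaluation, falling back to random creation
        return random.choice(others) if others else random_creation()
    if op == 1:  # re-evaluation, falling back to mutation
        return random.choice(others) if others else mutation()
    if op == 2:  # random creation
        return random_creation()
    if op == 3:  # crossover, falling back to random creation
        return crossover() if len(archive) >= 2 else random_creation()
    if op == 4:  # crossover, falling back to mutation
        return crossover() if len(archive) >= 2 else mutation()
    return mutation()  # op == 5
```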
The six operators are denoted by the corresponding integers in Ishibuchi et al. (2014b): τ_t ∈ {0, 1, 2, 3, 4, 5} for t = 1, 2, ..., T. Thus the search space size is 6^T. A simple evolutionary algorithm with the following components is used to search for the best integer string (i.e., the best IEC algorithm) in Ishibuchi et al. (2014b):
• Random creation of initial strings (i.e., a randomly generated initial population),
• Binary tournament selection for choosing a pair of parents,
• Uniform crossover,
• Mutation (the current value is replaced with a randomly specified integer),
• (µ + 1)ES-style generation update mechanism to construct the next population.
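The simple evolutionary algorithm built from the components above can be sketched as follows. This is a minimal illustration, not our exact implementation: the default population size, generation count and replacement rule are assumptions, and `evaluate` stands for the fitness function that runs the encoded IEC algorithm on a test problem (smaller is better).

```python
import random

def meta_ea(evaluate, T=200, n_ops=6, pop_size=20, generations=50, p_mut=None):
    """Evolve an operator string tau (an encoded IEC algorithm) offline."""
    p_mut = p_mut if p_mut is not None else 1.0 / T
    # random creation of initial strings
    pop = [[random.randrange(n_ops) for _ in range(T)] for _ in range(pop_size)]
    fit = [evaluate(tau) for tau in pop]

    def tournament():
        # binary tournament selection
        i, j = random.randrange(pop_size), random.randrange(pop_size)
        return pop[i] if fit[i] <= fit[j] else pop[j]

    for _ in range(generations):
        p1, p2 = tournament(), tournament()
        # uniform crossover followed by per-gene mutation
        child = [random.choice(g) for g in zip(p1, p2)]
        child = [random.randrange(n_ops) if random.random() < p_mut else g
                 for g in child]
        f = evaluate(child)
        # (mu+1)ES-style update: replace the worst string if not worse
        worst = max(range(pop_size), key=fit.__getitem__)
        if f <= fit[worst]:
            pop[worst], fit[worst] = child, f
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```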
The fitness of each string is evaluated by applying the corresponding IEC algorithm to a test problem (as in our computational experiments in "Computational experiments by our (µ + 1)ES-style IEC algorithm" section). In Ishibuchi et al. (2014b), the average result over 100 runs of the IEC algorithm on the test function is used as its fitness value.
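The fitness assignment just described can be sketched as follows, where `run_iec` is a hypothetical helper that performs one run of the IEC algorithm encoded by τ on the test problem and returns the function value of the identified best solution:

```python
import statistics

def fitness_of_string(tau, run_iec, n_runs=100):
    """Fitness of an operator string tau: the average final result over
    n_runs independent runs of the IEC algorithm that tau encodes.
    `run_iec(tau)` is an assumed helper (smaller return value is better)."""
    return statistics.fmean(run_iec(tau) for _ in range(n_runs))
```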

Various implementation issues of offline meta-level approach
In this section, we discuss various implementation issues of our offline meta-level approach to the design of IEC algorithms. The effect of each implementation issue on the performance of designed IEC algorithms is reported in the next subsection.

The number of possible operators
In Ishibuchi et al. (2014b), one of the six operators is used to generate a new solution for each evaluation. It is possible to use a different set of operators in our approach. For example, Operator 3 and Operator 4 can be removed for designing an IEC algorithm with re-evaluation, random creation and mutation. It is also possible to add "crossover & mutation" to the set of the six operators in Ishibuchi et al. (2014b). We examine the use of a different set of operators in the next subsection.

The number of runs used for evaluating each string
In Ishibuchi et al. (2014b), each string (i.e., each IEC algorithm) is evaluated by the average performance over its 100 runs. In general, the fitness evaluation becomes more accurate by increasing the number of runs. However, the increase in the number of runs leads to the increase in computation time. We examine the effect of the number of runs for the fitness evaluation on the performance of obtained IEC algorithms in the next subsection.

The string length
In Ishibuchi et al. (2014b), an IEC algorithm with 200 evaluations is coded by an integer string of length 200. This allows a different operator to be used to generate a new solution at each evaluation. Since we have six operators, the search space size is 6^200. One may think that we do not have to use a different operator to generate a solution at each evaluation. If we use the same operator for 10 evaluations, the string length is decreased from 200 to 20 as τ = τ_1 τ_2 ... τ_20, where τ_t is used to generate 10 solutions from the (10t − 9)-th evaluation to the 10t-th evaluation. In the next subsection, we examine various specifications of the string length (i.e., various specifications of the number of evaluations for which the same operator is used).
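The compressed coding can be illustrated by a small expansion routine that maps a short string back to one operator per evaluation (an illustrative sketch; the function name is an assumption):

```python
def expand_operator_string(tau, T=200):
    """Expand a compressed operator string so that each tau_t covers
    T // len(tau) consecutive evaluations (e.g., a string of length 20
    with T = 200 assigns each tau_t to 10 consecutive evaluations)."""
    k = T // len(tau)
    return [op for op in tau for _ in range(k)]
```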

Computational experiments of meta-level algorithm design
In our previous study (Ishibuchi et al. 2014b), our offline meta-level approach was applied to the Sphere and Rastrigin functions under the following setting, which is referred to as the basic setting in this paper:
• Coding: integer string of length 200 with values in {0, 1, 2, 3, 4, 5},
• Population size: 100,
• Termination condition: 1000 generations,
• Generation update model: (µ + 1)ES-style,
• Crossover: uniform crossover with crossover probability 1.0,
• Mutation: random generation of an integer value with mutation probability 1/(string length),
• Fitness evaluation of each string: average performance over 100 runs.
In this paper, we apply our approach to all the six test problems in "Our (µ + 1)ES-style IEC algorithm" section. Average results are calculated over ten runs of our approach. After the termination of our approach, a single string with the best fitness value in the final population is selected as the designed IEC algorithm. The designed IEC algorithm is evaluated by its additional 100 runs which are different from the 100 runs for fitness evaluation during the execution of our offline meta-level approach. The design of an IEC algorithm and its performance evaluation are iterated ten times. This means that the performance of our approach is evaluated by 1000 runs (i.e., 100 runs of each of the ten algorithms designed by our approach).
First, let us examine the effect of a set of operators for solution generation on the performance of designed algorithms. As explained in "Offline meta-level algorithm design approach in Ishibuchi et al. (2014b)" section, the six operators are used to generate new solutions in our former study (Ishibuchi et al. 2014b). In this paper, we also examine the following two settings with respect to possible operators in addition to the six operators in Ishibuchi et al. (2014b).

Four operators
In order to examine the necessity of crossover, we perform computational experiments using the set of the following four operators, obtained by removing the two crossover-based operators (Operators 3 and 4) from the six operators: Operator 0, Operator 1, Operator 2 and Operator 5 (i.e., re-evaluation, random creation and mutation).

Eight operators
For comparison, we also perform computational experiments using the following two operators in addition to the six operators in "Offline meta-level algorithm design approach in Ishibuchi et al. (2014b)" section (eight operators in total).

• Operator 6: Crossover & mutation (if crossover is inapplicable, random creation is used),
• Operator 7: Crossover & mutation (if crossover is inapplicable, mutation is used).
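The combined step shared by Operators 6 and 7 can be sketched as follows, assuming uniform crossover followed by Gaussian mutation (the fallbacks for an archive with fewer than two solutions are handled by the respective operator definitions; parameter values are illustrative):

```python
import random

def crossover_and_mutation(archive, sigma=0.1, lo=-5.0, hi=5.0):
    """Operators 6 and 7: crossover of two random archive members,
    then mutation of the offspring. Requires len(archive) >= 2."""
    p1, p2 = random.sample(archive, 2)
    child = [random.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
    return [v + random.gauss(0.0, sigma * (hi - lo)) for v in child]
```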
Average results over ten runs are summarized in Table 3. For comparison, we show the average results of the (1+1)ES-style algorithm in the second column of Table 3. The best average result for each test problem is highlighted in bold. We cannot observe any clear performance improvement over the (1+1)ES-style algorithm for the first four test problems in Table 3. This observation is consistent with the performance deterioration for those test problems caused by increasing the value of µ in "Our (µ + 1)ES-style IEC algorithm" section. For the last two test problems, however, we can observe a clear performance improvement by our approach.
As we have already explained, our approach is applied to each test problem ten times. Each of the ten designed algorithms is evaluated by its 100 runs after the termination of our approach (i.e., 1000 runs in total for each test problem). Figure 6 shows average results over those 1000 runs for the Levy and Rastrigin functions. For comparison, we also show the average results over 1000 runs of the (1+1)ES-style algorithm in "Our (µ + 1)ES-style IEC algorithm" section. In Fig. 6, we can observe a clear performance improvement by our approach with the six and eight operators. The inferior performance of the four-operator setting in comparison with the six-operator and eight-operator settings in Fig. 6 suggests the usefulness of crossover for the Levy and Rastrigin functions.

[Table 3: Average results over 10 runs of our offline meta-level approach with a different setting of solution generation operators. Standard deviations in parentheses are calculated over 1000 runs by the ten designed IEC algorithms for each problem.]

In Fig. 7, we show the histogram of 1000 solutions obtained by 100 runs of each of the ten designed algorithms with the six-operator setting. For comparison, we also show the histogram of 1000 solutions by the (1 + 1)ES-style algorithm in "Our (µ + 1)ES-style IEC algorithm" section. We can observe clear differences between the two histograms in each plot in Fig. 7.

Next, let us examine the effect of the number of runs for fitness evaluation on the performance of our offline meta-level approach. In the previous computational experiments, each string (i.e., each IEC algorithm) is evaluated by its 100 runs on a test problem. That is, the average result over the 100 runs is used as the fitness of each string. It is likely that a decrease in the number of runs for fitness evaluation leads to the performance deterioration of designed IEC algorithms.
To examine this issue, we perform computational experiments with three settings for fitness evaluation: 5 runs, 20 runs and 100 runs. All the other specifications are the same as in the basic setting (e.g., the six operators for solution generation). Our approach is applied to each test problem ten times using each setting of the number of runs for fitness evaluation. Average experimental results are summarized in Table 4. Experimental results on the Levy and Rastrigin functions are also shown in Fig. 8. As expected, the performance of the designed IEC algorithms deteriorated as the number of runs decreased. However, the deterioration is not so severe when compared with the improvement over the (1+1)ES-style algorithm for the Levy and Rastrigin functions, as shown in the last two rows of Table 4 and in Fig. 8.

[Table 4: Average results over 10 runs of our offline meta-level approach with a different setting of the number of runs for fitness evaluation. Standard deviations in parentheses are calculated over 1000 runs of the ten designed IEC algorithms (100 runs of each algorithm) for each problem.]

[Fig. 8: Average results over 1000 runs of the ten designed IEC algorithms (100 runs of each algorithm) for the three settings of the number of runs for fitness evaluation. a Levy function. b Rastrigin function.]

Finally, let us examine the effect of the string length on the performance of our meta-level algorithm design approach. In our previous computational experiments, an IEC algorithm with 200 evaluations is coded by an integer string τ of length 200 as τ = τ_1 τ_2 ... τ_200, where τ_t is used to generate a solution for the t-th evaluation. When we use the six operators, the total number of strings is 6^200. One may think that this problem size (i.e., 6^200) is too large. One may also think that it is not necessary to use a different operator for generating each solution. The string length can be decreased by using each τ_t to generate multiple solutions.
In this paper, we examine the following four settings: τ t is used for generating a single solution (i.e., the basic setting: string length 200), 5 solutions (string length 40), 10 solutions (string length 20), and 50 solutions (string length 4). Each setting is evaluated by ten runs of our offline meta-level approach.
Experimental results are summarized in Table 5. For the first four test problems, similar results are obtained from the four settings of the string length and the (1+1)ES-style IEC algorithm in Table 5. This observation may suggest that we do not have to use different operators for those test problems (i.e., mutation alone is enough). This issue will be further discussed in "Algorithm design" section. For the Levy and Rastrigin functions, however, we can observe clear performance deterioration when the string length is specified as 4. Experimental results on the Levy and Rastrigin functions are also shown in Fig. 9. In the case of string length 4, the same operator continues to be used to generate 50 solutions. That is, solution generation operators can change only after the 50th, 100th and 150th evaluations. This leads to an interesting shape of the solid blue line in each plot in Fig. 9. For example, we can observe slow performance improvement before the 50th evaluation and a speed-up after the 50th evaluation in Fig. 9a, b. Since almost the same results are obtained from the other settings (i.e., string lengths of 20, 40 and 200), we can see that changing the operator every ten solutions is sufficient (i.e., a different operator is not needed for every solution).

Further examination of designed algorithms
As shown in our computational experiments in this section, our offline meta-level approach found better algorithms than the (1+1)ES-style algorithm for the Levy and Rastrigin functions. In this subsection, we further examine the ten algorithms designed for each test problem under the best setting in Table 5 (i.e., six operators, 100 runs and string length 20 or 40). Each of the ten designed algorithms is an integer string of length 20 for the first three problems (Sphere, Rosenbrock and Griewank) and length 40 for the last three problems (Ackley, Levy and Rastrigin). In Table 6, we show the average percentage of each integer among the ten generated algorithms for each problem.
Each of the ten designed algorithms for each test problem is applied to the test problem 100 times. During this computational experiment, we monitor how each solution is generated. That is, we check which operator is actually used for generating each solution. Then we calculate the percentage of solutions generated by each operator. Our experimental results are summarized in Table 7. In Table 7, "Re-evaluation (operator)" and "Re-evaluation (archive)" mean the re-evaluation by the designed algorithm string and the archive maintenance rule, respectively.
We can observe clear differences in the experimental results in Table 7 between the first four problems and the last two problems. Crossover is mainly used to generate new solutions for the last two problems, whereas mutation is mainly used for the first four problems. More solutions are generated randomly for the last two problems.

[Fig. 9: Average results of the designed IEC algorithms for the four settings of the string length. a Levy function. b Rastrigin function.]

[Table 6: Average percentage of each integer among the ten IEC algorithms designed by ten runs of our offline meta-level approach.]

These differences are related to the shape of each function: the Levy and Rastrigin functions have a number of deep local minima. We can also see that the percentage of re-evaluations is almost the same for all test problems. This is because a single best solution should be identified within 200 evaluations. Slightly more re-evaluations are performed by the archive maintenance rule for the Levy and Rastrigin functions. This may be related to the number of candidate solutions (as we mentioned in "Archive maintenance for (µ + 1)ES-style algorithms" section with respect to the relation between the number of re-evaluations and the upper limit µ on the number of candidate solutions).
To discuss this issue, we calculate the average number of candidate solutions in our computational experiments with the ten designed algorithms for each test problem. Experimental results are shown in Fig. 10. It should be noted that different scales are used for the vertical axes of Fig. 10a-d and Fig. 10e, f. The number of candidate solutions in Fig. 10e, f is much larger than that for the first four test problems in Fig. 10a-d. This difference may be related to the difference in the average percentage of re-evaluations by the archive maintenance rule in Table 7 between the first four problems and the last two problems.
For the Levy and Rastrigin functions, we further check which operator is actually used to generate each solution. Then we calculate the percentage of each operator in each of the following four search phases: the 1st-50th evaluations, the 51st-100th evaluations, the 101st-150th evaluations and the 151st-200th evaluations. Our experimental results are summarized in Tables 8 and 9. We can obtain the following observations from both tables:
(1) New solutions in the first 50 evaluations are mainly generated randomly, whereas the percentages of random creation are very low in the other evaluations (i.e., the 51st-200th evaluations). This observation suggests that the designed IEC algorithms first search for promising areas randomly before generating new solutions from stored candidate solutions by crossover.
(2) The percentages of re-evaluation in the first 50 evaluations are clearly lower than those in the other evaluations. This observation corresponds to the increase in the number of candidate solutions in Fig. 10e, f during the first 50 evaluations.
(3) There are no large differences in the average percentage of each operator among the last three search phases (the 51st-100th, 101st-150th and 151st-200th evaluations). That is, the average percentages of mutation, crossover, random creation and re-evaluation (operator) are in [8, 16], [46, 51], [3, 12] and [24, 38] (%), respectively. This observation may suggest the necessity of totally different search strategies between the early and later search phases. It seems that the designed algorithms search for good starting points by randomly generating solutions in the early search phase. However, even in the first 50 evaluations, mutation is mainly used in Table 10 for the Sphere function, which has no local minima.
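The per-phase percentages discussed above can be computed with a sketch like the following, where `used_ops` is a hypothetical log of the operator actually applied at each of the 200 evaluations:

```python
from collections import Counter

def phase_percentages(used_ops, phase_len=50):
    """Percentage of each operator within consecutive evaluation phases
    (e.g., evaluations 1-50, 51-100, ... for phase_len = 50)."""
    phases = []
    for start in range(0, len(used_ops), phase_len):
        phase = used_ops[start:start + phase_len]
        counts = Counter(phase)
        phases.append({op: 100.0 * c / len(phase) for op, c in counts.items()})
    return phases
```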

Algorithm design
From our experimental results, we can see that the first four problems (Sphere, Rosenbrock, Griewank and Ackley) and the last two problems (Levy and Rastrigin) need totally different algorithms. For the first four problems, the (1 + 1)ES-style algorithm worked well. However, from Tables 8, 9 and 10, the examination of randomly generated solutions in the early stage seems to be a good idea not only for the last two test problems but also for the first four test problems. Therefore, we implement a slightly modified (1 + 1)ES-style algorithm in which random solutions are used in the first ten evaluations instead of mutated solutions. This algorithm is referred to as the "(1 + 1)ES-Random-10" algorithm. For comparison, we also implement the "(1 + 1)ES-Random-50" algorithm, where the first 50 solutions are generated randomly.

[Table 9: Average percentage of each operator in a different search phase for the Rastrigin function in Table 7.]

[Table 10: Average percentage of each operator in a different search phase for the Sphere function in Table 7.]

Experimental results are summarized in Fig. 11.

[Fig. 11: Experimental results by the (1 + 1)ES-style algorithm and its two variants: (1 + 1)ES-Random-10 and (1 + 1)ES-Random-50. a Sphere. b Rosenbrock. c Griewank. d Ackley. e Levy. f Rastrigin.]

Figure 11 shows that the use of random solutions in the first ten evaluations clearly improves the performance of the (1 + 1)ES-style algorithm for the last two test problems without degrading its performance for the first four test problems. For the last two test problems, we can further improve the performance of the (1 + 1)ES-style algorithm by increasing the archive size and using the crossover operator. However, its performance for the first four test problems is deteriorated by those changes.

Finally, we examine the generalization ability of the ten designed algorithms under the best setting in Table 5. Each algorithm designed for a test problem is applied to the other test problems to examine its generalization ability. In our computational experiments, we divide our six test problems into two groups: Group A = {Sphere, Griewank, Levy} and Group B = {Rosenbrock, Ackley, Rastrigin}. Group A and Group B include the three test problems in the left and right columns of each figure (e.g., Fig. 11), respectively. Each of the ten algorithms designed for a test problem in one group is applied to each test problem in the other group 100 times. Experimental results are summarized in Fig. 12. We can observe from Fig. 12 that the algorithms designed for one of the last two test problems work well on the other of those two problems in Fig. 12e, f. That is, the algorithms designed for Levy (Rastrigin) work well on Rastrigin (Levy). However, those algorithms do not work well on the first four test problems in Fig. 12a-d. We can also see that the algorithms designed for one of the first four test problems work well on the other three test problems in Fig. 12a-d. Our experimental results show that the designed algorithms have a limited but high generalization ability on similar test problems.
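The (1 + 1)ES-Random-10 variant can be sketched as follows. This is an illustrative sketch rather than our exact implementation: the numeric comparison stands in for the user's better/worse judgment in the IEC model, and the search range and mutation strength are assumptions.

```python
import random

def one_plus_one_es_random_k(f, T=200, k=10, dim=10, lo=-5.0, hi=5.0, sigma=0.3):
    """(1+1)ES-style search whose first k solutions are generated randomly.

    `f` is the minimized test function; with k = 10 this corresponds to the
    "(1+1)ES-Random-10" variant, and with k = 50 to "(1+1)ES-Random-50".
    """
    best, best_f = None, float("inf")
    for t in range(T):
        if t < k:
            x = [random.uniform(lo, hi) for _ in range(dim)]  # random creation
        else:
            x = [v + random.gauss(0.0, sigma) for v in best]  # mutation of the best
        fx = f(x)
        if fx <= best_f:  # keep the better of the two compared solutions
            best, best_f = x, fx
    return best, best_f
```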

Conclusion
We examined the performance of our offline meta-level approach to the design of IEC algorithms. The main feature of our approach is that a different operator can be used to generate each solution. In the basic setting of our approach, an IEC algorithm is coded as a string of operators whose length equals the number of solutions to be generated. We obtained promising results, where efficient multi-point search algorithms were designed for non-linear test problems with many local minima. The designed algorithms seemed to adjust the diversity-convergence balance over 200 evaluations by frequently changing the operators used to generate new solutions. With respect to the frequency of operator change, we obtained similar results from the following three settings: the same solution generation operator was used to generate a single solution, five solutions and ten solutions (Table 5). This observation suggests that we do not need to change the operator for every solution. However, when we used the same operator to generate 50 solutions, we observed clear performance deterioration of the designed algorithms. This observation suggests the need for changing operators more frequently than every 50 solutions.
As expected, different algorithms were designed for different test problems. One common feature among all the designed algorithms was the use of randomly generated solutions in an early stage of evolution. We demonstrated that the performance of the (1 + 1)ES-style algorithm was improved by using randomly generated solutions in its first ten evaluations (Fig. 11). We also demonstrated that an algorithm designed for one test problem worked well on another test problem when the two problems were similar to each other with respect to the shape of the fitness function (Fig. 12). This result suggests the possibility of designing a high-performance IEC algorithm for a real-world application problem if we have a similar test problem.

[Fig. 12: Examination of the generalization ability of the designed algorithms under the best setting in Table 5. Algorithms designed for a test problem in the left (right) column are applied to the other test problems in the right (left) column. a Sphere. b Rosenbrock. c Griewank. d Ackley. e Levy. f Rastrigin.]