The reverse network model is built on a BP network, which provides complex nonlinear mapping and self-learning. An artificial network simulates the real gene network according to its network characteristics: the average path length, average clustering coefficient, average degree, modularity, map density, etc.
Structure and algorithm of the BP neural network
The neural network is a computational model that was originally used to simulate the structure of biological neural networks and has lately been applied to other computational problems, for example, evaluating landslide susceptibility and predicting liver injury (Sukumar et al. 2012; Rampone and Valente 2011). The BP network algorithm we adopt in this paper has already been detailed elsewhere (Rampone and Valente 2011; Cao et al. 2016; Liu et al. 2016). The classical artificial neural network structure is a feed-forward network (Fig. 1) with multiple layers, consisting of an input layer, an output layer and a hidden layer, each with a different role. Each neuron of a given layer is connected to all the neurons of the next one, and each connection has an associated weight. The whole structure is built in three procedures governed by three equations. First, a neuron receives as input the weighted sum of the input patterns and/or of the other neurons' outputs:
$$o_{k} = f\left( {\sum\nolimits_{n} {w_{kn} o_{n} - b_{k} } } \right)$$
(1)
\(w_{kn}\): the weight from neuron n to neuron k; \(o_{n}\): the output of neuron n or the nth input; \(b_{k}\): the neuron threshold.
The transfer function chosen in our experiment is the sigmoid function:
$$f(x) = 1/(1 + {\text{e}}^{ - x} )$$
(2)
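A minimal sketch of this transfer function and its derivative (the derivative \(f^{\prime}(x) = f(x)(1 - f(x))\), expressed in terms of the output, is used later in the error terms); the function names are illustrative:

```python
import math

def sigmoid(x):
    """Transfer function of Eq. (2): f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(o):
    """Derivative written in terms of the output o = f(x): f'(x) = o * (1 - o)."""
    return o * (1.0 - o)
```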
The training procedure adopts the BP algorithm. During training, the weights and biases of the network are iteratively adjusted to reduce the difference (error) between the actual and the desired output, until the mean square error (MSE) of the system is minimized. The error \(E^{p}\) of a given pth pattern is calculated as
$$E^{p} = 1/2 \times \sum\limits_{j} {\left( {t_{j}^{P} - o_{j}^{P} } \right)}^{2}$$
(3)
\(t_{j}^{p}\): the jth desired output value for the pth pattern; \(o_{j}^{p}\): the output of the corresponding neuron.
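The per-pattern error of Eq. (3) can be sketched as follows; the helper name is hypothetical:

```python
def pattern_error(targets, outputs):
    """Eq. (3): E^p = 1/2 * sum_j (t_j^p - o_j^p)^2 for one pattern p."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
```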
The error BP algorithm used in this study proceeds as follows.

a.
Initialization: small random values are taken for the weights of each layer and the thresholds of each neuron; the maximum number of cycles and the minimum whole error are set as m and ɛ, respectively;

b.
Vector \(X^{p} = \left[ {x_{0}^{p} ,x_{1}^{p} , \ldots ,x_{n - 1}^{p} } \right]\) is the input data pattern, where p means the pth pattern, n is the number of neurons of the initial layer, and \(x_{i}^{p}\) is the ith input component passed to the hidden layer;

c.
The actual output of the hidden layer \(O_{j}^{p} = f\left( {\sum\nolimits_{i = 0}^{n - 1} {w_{ij} x_{i}^{p} - B_{j} } } \right)\) is calculated and used as input to the next layer, where f is the activation function;

d.
If the layer is the last layer (output layer), the actual error \(E^{P}\) is calculated as \(E^{P} = 1/2 \times \sum\nolimits_{j} {\left( {t_{j}^{P} - o_{j}^{P} } \right)^{2} }\); otherwise the output is calculated as in (c);

e.
The whole error E is calculated over all patterns as \(E = \sum\nolimits_{P} {E^{P} }\);

f.
Weights are adjusted from the last layer and going backwards (BP),
$$W_{ji} ({\text{new}}) = W_{ji} ({\text{old}}) + \eta \times \delta_{j}^{p} o_{i}^{p} + \alpha \times \Delta W_{ji} ({\text{old}})$$
(4)
where \(\Delta W_{ji}({\text{old}})\) is the previous weight change, and η (0 < η < 1) and α (0 < α < 1) are constants named learning rate and momentum, respectively; η measures the influence of the error term, whereas α determines the influence of the past weight change.
When neuron j is an output-layer neuron, the error term for pattern p is \(\delta_{j}^{p} = f^{\prime }\left( {o_{j}^{p} } \right)\left( {t_{j}^{p} - o_{j}^{p} } \right)\); when it is a hidden-layer neuron, \(\delta_{j}^{p} = f^{\prime }\left( {o_{j}^{p} } \right)\left( {\sum\nolimits_{k} {W_{kj} \times \delta_{k}^{p} } } \right)\);

g.
If the cycle count reaches m or the whole error is less than ɛ, training stops; otherwise go to (b).
The weight adjustment in the above algorithm aims to minimize the whole error E, performed by gradient descent so that the error decreases most steeply as the weights change (Bishop 1996). The η term measures how strongly the error influences the weight update, whereas the α term determines the influence of the past history of weight changes in the same formula. The single-layer neural network structure is a two-layer structure with only an input and an output layer. It maps more easily from the trained BP network to the gene regulatory network, so we use the single-layer BP network in this paper.
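Steps (a)–(g), specialized to the single-layer (n − 1)–1 case with a single sigmoid output neuron, can be sketched as follows. All names, the default learning rate, momentum and stopping values are illustrative assumptions, not the authors' implementation:

```python
import math
import random

def train_single_layer(patterns, targets, eta=0.5, alpha=0.5,
                       max_cycles=10000, eps=1e-2):
    """Steps (a)-(g) for a single-layer (n-1)-1 BP network.
    patterns: list of input vectors x^p; targets: list of scalar targets t^p."""
    n_in = len(patterns[0])
    # (a) initialization: small random weights and threshold, limits m and eps
    w = [random.uniform(-0.5, 0.5) for _ in range(n_in)]
    b = random.uniform(-0.5, 0.5)
    dw_old = [0.0] * n_in   # previous weight changes for the momentum term
    db_old = 0.0
    for _ in range(max_cycles):
        E = 0.0
        # (b) present each input pattern X^p in turn
        for x, t in zip(patterns, targets):
            # (c) actual output o = f(sum_i w_i x_i - b)
            net = sum(wi * xi for wi, xi in zip(w, x)) - b
            o = 1.0 / (1.0 + math.exp(-net))
            # (d)/(e) accumulate the whole error E = sum_p E^p
            E += 0.5 * (t - o) ** 2
            # (f) output-layer error term delta = f'(o) * (t - o), Eq. (4) update
            delta = o * (1.0 - o) * (t - o)
            for i in range(n_in):
                dw = eta * delta * x[i] + alpha * dw_old[i]
                w[i] += dw
                dw_old[i] = dw
            db = -eta * delta + alpha * db_old  # threshold enters with a minus sign
            b += db
            db_old = db
        # (g) stop when the whole error drops below eps or cycles run out
        if E < eps:
            break
    return w, b
```

For example, the unit can be trained to reproduce a logical-OR pattern, after which its output is above 0.5 whenever any input is 1.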
Establishment of gene networks based on BP ANN
The architecture diagram of the proposed model is shown in Fig. 2. The model takes microarray data as input and is trained as described in the flowchart: finding the relationship between any one gene and the other n − 1 genes, building the adjacency matrix, constructing the gene regulatory network, and obtaining the final gene network according to the weight ratio λ. The training is carried out for each group separately.
In the network, a gene is simplified as a node and a regulation as the connection between nodes (an edge); the gene regulatory network is composed of the node set V and the edge set E:
$$G = \, \left( {V, \, E} \right)$$
Given that the adjacency matrix can be used to describe the relationships between nodes in a network, the topology of the network is represented by adjacency matrix A:
$$A_{n \times n} = \left[ {\begin{array}{*{20}c} 0 &\quad {a_{12} } &\quad \cdots &\quad {a_{1n} } \\ {a_{21} } &\quad 0 &\quad \cdots &\quad {a_{2n} } \\ \cdots &\quad \cdots &\quad \cdots &\quad \cdots \\ {a_{n1} } &\quad {a_{n2} } &\quad \cdots &\quad 0 \\ \end{array} } \right]$$
where \(a_{ij}\) and \(a_{ji}\) in A represent the regulation between genes i and j. Assuming that the state changes of genes in a real regulatory network depend mainly on the effect of other genes, the self-regulation of a gene can be ignored. Then, the diagonal elements in A are 0, i.e. \(a_{ii} = 0\). Two kinds of regulatory relationship exist between genes, activation and inhibition, indicated by the positive or negative value of \(a_{ij}\).
First, a single-layer BP neural network with an (n − 1)–1 structure is adopted to construct the model (Fig. 3a). The n − 1 neurons in the input layer serve as temporary storage for the input data corresponding to n − 1 genes. The neuron in the output layer corresponds to the nth gene. Self-correlation is not considered in this model, which means the network weight \(w_{ii}\) does not exist. So, the input samples for the network are \(X^{p} = x_{1}^{p} , \cdots ,x_{n - 1}^{p}\), and the target sample is \(T^{p} = x_{n}^{p}\). The training samples are {\(X^{p}\), \(T^{p}\)}, i.e., the input and target samples are drawn from the same expression patterns. The p training samples are fed to the network to train it until the error or the number of cycles reaches the set value. Thereby, a weight vector \(W_{i}\) is obtained. This process is repeated from \(x_{1}\) to \(x_{n}\), and then a weight matrix W is obtained.
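The gene-by-gene training loop that assembles the weight matrix W can be sketched as follows, assuming expression values scaled to [0, 1]. The names `fit_gene` and `build_weight_matrix` are hypothetical, and the inline delta-rule update is a simplification of the full algorithm:

```python
import math
import random

def fit_gene(i, data, eta=0.5, cycles=5000):
    """Train an (n-1)-1 sigmoid unit predicting gene i from the other genes.
    data: list of expression patterns, each a length-n vector in [0, 1]."""
    n = len(data[0])
    others = [j for j in range(n) if j != i]
    w = [random.uniform(-0.5, 0.5) for _ in others]
    b = 0.0
    for _ in range(cycles):
        for pat in data:
            x = [pat[j] for j in others]
            t = pat[i]
            o = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) - b)))
            delta = o * (1.0 - o) * (t - o)
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            b -= eta * delta
    return {j: wi for j, wi in zip(others, w)}

def build_weight_matrix(data):
    """Repeat the fit for every gene x_1 ... x_n; W[j][i] is the learned
    influence of gene j on gene i, with W[i][i] fixed at 0 (no self-regulation)."""
    n = len(data[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j, wji in fit_gene(i, data).items():
            W[j][i] = wji
    return W
```

On synthetic data in which one gene tracks another, the learned weight between the correlated pair dominates the weight from an uncorrelated gene.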
$$X = F\left( {{\text{W}}^{T} {\text{X}}} \right)$$
(5)
$$\left[ \begin{aligned} o_{1} \hfill \\ o_{2} \hfill \\ \cdots \hfill \\ o_{n} \hfill \\ \end{aligned} \right] = F\left( {\left[ {\begin{array}{*{20}c} 0 &\quad {w_{21} } &\quad \cdots &\quad {w_{n1} } \\ {w_{12} } &\quad 0 &\quad \cdots &\quad {w_{n2} } \\ \cdots &\quad \cdots &\quad \cdots &\quad \cdots \\ {w_{1n} } &\quad {w_{2n} } &\quad \cdots &\quad 0 \\ \end{array} } \right]\left[ \begin{aligned} x_{1} \hfill \\ x_{2} \hfill \\ \cdots \hfill \\ x_{n} \hfill \\ \end{aligned} \right]} \right)$$
\(o_{i} = f\left( {w_{1i} x_{1} + \cdots + w_{ji} x_{j} + \cdots + w_{ni} x_{n} } \right), j \ne i.\) When the training error is very small [Eq. (3), E < 10^{−2}], \(o_{i} \approx x_{i}\) can be assumed; that is, any gene among the n genes can be expressed as a sigmoid function of a linear combination of the other n − 1 genes.
Second, the trained BP neural network is mapped into a directed gene regulatory network (Fig. 3b). With A = W, the weight \(w_{ij} (i \ne j)\) denotes the edge weight from neuron i to neuron j in the gene regulatory network, whereas \(w_{ji} (i \ne j)\) denotes the edge weight from neuron j to neuron i.
Third, a reasonable weight threshold is selected to choose the highly relevant genes and finally determine the gene regulatory network. The choice of threshold is very important. The threshold can be set according to the weight ratio \(\lambda = \left| {w_{ij} } \right|/\sum\nolimits_{i} {\left| {w_{ij} } \right|}\) or by other methods. In the regulatory network, the edges whose weight ratio is less than the threshold are removed to obtain the final gene regulatory network. For example, if the weight ratios corresponding to weights \(w_{1i}\) and \(w_{21}\) are assumed to be less than the threshold, these weights are deleted, and the corresponding edges in the regulatory network are removed as well (Fig. 3c).
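The pruning step under this weight-ratio rule can be sketched as follows; the function name and the row-to-column matrix layout are assumptions:

```python
def prune_by_weight_ratio(W, thresh):
    """Keep edge (i, j) only if lambda_ij = |w_ij| / sum_i |w_ij| >= thresh.
    W: n x n weight matrix (entry [i][j] is the edge weight from gene i to
    gene j); returns the pruned adjacency matrix A."""
    n = len(W)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # normalizing sum over all weights entering gene j
        col_sum = sum(abs(W[i][j]) for i in range(n))
        if col_sum == 0:
            continue
        for i in range(n):
            if i != j and abs(W[i][j]) / col_sum >= thresh:
                A[i][j] = W[i][j]
    return A
```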
Structural parameters of network
The network statistics used to describe the network structure are briefly explained in this section. G = (V, E) is assumed to be a complex network with node set V = {1, 2, …, N} and edge set E. The parameters are determined according to the following network statistics:

1.
Average path length (L)
The distance \(d_{ij}\) between nodes i and j in V is defined as the minimal number of edges connecting them. The average path length of the network is defined as \(L = \sum\nolimits_{i > j} {d_{ij} } /\left( {0.5{\text{N}}({\text{N}} - 1)} \right)\).

2.
Average clustering coefficient (C)
The clustering coefficient \(C_{i}\) of node i is the ratio of the number of actually existing edges to the number of possible edges among the adjacent nodes of i. The clustering coefficient C of the network is the average of the clustering coefficients of all nodes.

3.
Average degree (K)
The degree of a node is the number of other nodes connecting to this node. The average degree K of the network is the average of degrees of all nodes.
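The three statistics above can be computed on a small undirected graph with plain breadth-first search; this is an illustrative sketch, not the authors' tooling, and assumes a connected graph:

```python
from collections import deque

def network_stats(adj):
    """Average path length L, average clustering coefficient C and average
    degree K for a connected undirected graph {node: set_of_neighbours}."""
    nodes = list(adj)
    N = len(nodes)
    # K: mean degree over all nodes
    K = sum(len(adj[v]) for v in nodes) / N
    # L: mean shortest-path length over all node pairs (BFS from every node)
    total, pairs = 0, 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v in nodes:
            if v != s:
                total += dist[v]
                pairs += 1
    L = total / pairs
    # C: mean local clustering coefficient (edges among neighbours / possible)
    cs = []
    for v in nodes:
        nb = list(adj[v])
        k = len(nb)
        if k < 2:
            cs.append(0.0)
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nb[b] in adj[nb[a]])
        cs.append(2.0 * links / (k * (k - 1)))
    C = sum(cs) / N
    return L, C, K
```

On a triangle every pair is adjacent, so L = 1, C = 1 and K = 2.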

4.
Modularity (Q)
Network G is supposed to contain k communities G_{1}, G_{2}, …, G_{k}. A symmetric matrix H = (\(h_{ij}\))_{k × k} is defined, where \(h_{ij}\) denotes the ratio of the number of edges between two communities \(G_{i}\) and \(G_{j}\) to the total number of edges of the network.
Modularity is defined as
$$Q = \sum\nolimits_{i} {Q_{i} } = \sum\nolimits_{i} {\left( {h_{ii} - \alpha_{i}^{2} } \right)}.$$
where \(\alpha_{i}\) denotes the sum of the elements in the ith row of matrix H, which represents the ratio of the number of edges connecting to community \(G_{i}\) to the total number of edges.
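A sketch of this modularity definition, following the symmetric-H convention above in which an inter-community edge contributes to both \(h_{ij}\) and \(h_{ji}\) (conventions vary across the literature); the function name is illustrative:

```python
def modularity(edges, community):
    """Q = sum_i (h_ii - alpha_i^2). edges: list of undirected (u, v) pairs;
    community: dict mapping node -> community label."""
    m = len(edges)
    labels = sorted(set(community.values()))
    idx = {c: i for i, c in enumerate(labels)}
    k = len(labels)
    h = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        a, b = idx[community[u]], idx[community[v]]
        if a == b:
            h[a][a] += 1.0 / m          # edge inside community a
        else:
            h[a][b] += 1.0 / m          # symmetric entries for an edge
            h[b][a] += 1.0 / m          # between communities a and b
    Q = 0.0
    for i in range(k):
        alpha_i = sum(h[i])             # row sum: edges touching community i
        Q += h[i][i] - alpha_i ** 2
    return Q
```

For two triangles joined by a single bridge edge (7 edges in total), each community has \(h_{ii} = 3/7\) and \(\alpha_i = 4/7\), giving Q = 10/49.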

5.
Density of map (D)
The density of a map is the ratio of the total path length to the area of the map.