Adaption of the temporal correlation coefficient calculation for temporal networks (applied to a real-world pig trade network)
- Kathrin Büttner^{1},
- Jennifer Salau^{1} and
- Joachim Krieter^{1}
- Received: 18 May 2015
- Accepted: 12 February 2016
- Published: 24 February 2016
Abstract
The average topological overlap of two graphs of two consecutive time steps measures the amount of changes in the edge configuration between the two snapshots. This value has to be zero if the edge configuration changes completely and one if the two consecutive graphs are identical. Current methods depend on the number of nodes in the network or on the maximal number of connected nodes in the consecutive time steps. In the first case, this methodology breaks down if there are nodes with no edges. In the second case, it fails if the maximal number of active nodes is larger than the maximal number of connected nodes. In the following, an adaption of the calculation of the temporal correlation coefficient and of the topological overlap of the graph between two consecutive time steps is presented, which shows the expected behaviour mentioned above. The newly proposed adaption uses the maximal number of active nodes, i.e. the number of nodes with at least one edge, for the calculation of the topological overlap. The three methods were compared with the help of vivid example networks to reveal the differences between the proposed notations. Furthermore, these three calculation methods were applied to a real-world network of animal movements in order to detect influences of the network structure on the outcome of the different methods.
Keywords
- Temporal network
- Temporal correlation coefficient
- Topological overlap
- Pig trade network
Background
In contrast to the static situation, the time when edges are active and especially the chronological order of contacts play an important role in temporal networks. Both are essential elements for the representation of these dynamical systems (Holme and Saramäki 2012). In previous studies dealing with network analysis, the temporal information has been partly neglected by an aggregation of contacts over specific observation windows, which have been analysed separately (examples of animal trade networks are Bajardi et al. 2011; Büttner et al. 2015; Dubé et al. 2011; Nöremark et al. 2011; Rautureau et al. 2011; Vernon and Keeling 2009). Even in cases where the temporal information was available, this aggregation was performed because the methodological framework for the analysis of temporal networks is still in its infancy (Nicosia et al. 2013; Masuda and Holme 2013). Recently, however, new methods for the analysis of temporal networks have been developed, or methods of static network analysis have been adapted to temporal systems. Examples are the newly proposed parameter causal fidelity by Lentz et al. (2013) and the temporal correlation coefficient, which was derived from the local clustering coefficient of static networks (Nicosia et al. 2013; Tang et al. 2010). In the case of the temporal correlation coefficient, the novelty of temporal network analysis and the fact that its methodologies are still under development become obvious. Here, Pigott and Herrera (2014) presented a possible correction for the calculation of the temporal correlation coefficient proposed by Nicosia et al. (2013). The temporal correlation coefficient (hereinafter abbreviated C) is a measure of the overall average probability for an edge to persist across two consecutive time steps (Nicosia et al. 2013; Tang et al. 2010).
For the calculation of the temporal correlation coefficient, the average topological overlap of the graph, which measures the amount of change in the edge configuration between two consecutive time steps, is determined for each pair of consecutive time steps. The values for the average topological overlap range between zero and one, whereby zero indicates that the edge configuration of the two consecutive graphs is completely different and one indicates that it has not changed at all. Current methods depend on the number of nodes in the network (Nicosia et al. 2013), hereinafter referred to as Method 1, or on the maximal number of connected nodes in the consecutive time steps, hereinafter referred to as Method 2. Method 1 fails to deliver the value of one for identical consecutive graphs if there are nodes with no edges (Pigott and Herrera 2014), and Method 2 delivers values greater than one if the maximal number of nodes with at least one edge is greater than the maximal size of the greatest connected component in the two consecutive graphs. The newly proposed adaption, hereinafter referred to as Method 3, uses the maximal number of active nodes, i.e. the number of nodes with at least one edge, for the calculation of the topological overlap. This article provides small, comprehensible examples of graphs, where the results of the temporal correlation coefficient differ between the three methods. Additionally, using all three methods, the average topological overlaps were calculated for a real-world network describing animal movements. Influences of the network structure on the differences between methods were statistically analysed.
Methods
In the first part of the materials and methods section, the individual calculation steps of the temporal correlation coefficient are introduced, followed by a summary of the previous proposals and the adaption presented in this article with the help of vivid example networks. In the fifth part of the materials and methods section, the convergence behaviour of the three methods is compared, followed by a real-world example of a trade network of a pork supply chain.
Temporal correlation coefficient
The values of all three calculation steps range between zero and one, with one indicating a complete match of the edge configuration and zero indicating that no edges are shared.
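For orientation, the three calculation steps can be written compactly as below. This is a sketch rather than a reproduction of the numbered equations: the first line is the node-wise topological overlap as it is commonly stated for Nicosia et al. (2013), where \(a_{ij}(t_{m})\) denotes the adjacency matrix entry between nodes \(i\) and \(j\) in snapshot \(t_{m}\) (a symbol assumed here), and the second and third lines use the Method 1 normalisation by the total number of nodes \(N\).

\[
\begin{aligned}
C_{i}(t_{m}, t_{m+1}) &= \frac{\sum_{j} a_{ij}(t_{m})\, a_{ij}(t_{m+1})}{\sqrt{\left[\sum_{j} a_{ij}(t_{m})\right]\left[\sum_{j} a_{ij}(t_{m+1})\right]}} \\
C_{m} &= \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \\
C &= \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m}
\end{aligned}
\]

A node without edges in either snapshot turns the first expression into 0/0; the example calculations in the Results section assign such nodes \(C_{i} = 0\). Method 2 and Method 3 change only the factor \(1/N\) in the second line.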
Method 1: original calculation by Nicosia et al. (2013)
1st step: calculation of \(C_{i} \left( {t_{m} , t_{m + 1} } \right)\)
Compare Eq. (1) in 2.1.
2nd step: calculation of \(C_{m}\)
3rd step: calculation of \(C\)
Averaging over all possible \(C_{m}\) gives the temporal correlation coefficient \(C\), compare Eq. (2) in 2.1.
According to Nicosia et al. (2013), \(C_{m} = 1\) if and only if the two graphs of the two consecutive time steps \(t_{m}\) and \(t_{m+1}\) have exactly the same configuration of edges. \(C_{m} = 0\) if the two graphs do not share any edges. This claim is only true if all \(N\) nodes considered in the calculation have at least one edge (Pigott and Herrera 2014), i.e. are active. However, this is not applicable for networks containing unconnected nodes, since for these graphs the correlation between two snapshots is underestimated.
Method 2: proposed correction by Pigott and Herrera (2014)
However, if the maximal number of active nodes is higher than the maximal number of connected nodes, the proposed correction leads to an overestimation of the average topological overlap (\(C_{m} > 1\)).
Method 3: adaption of the calculation of the temporal correlation coefficient
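A minimal Python sketch of the three normalisations, assuming undirected snapshots given as edge lists and taking \(N(t)\) in Method 2 to be the size of the largest connected component, as described in the Background. All function names are hypothetical, and the four-node series at the end is an illustrative network chosen to reproduce the \(C_{i}\) values tabulated for the first example, not necessarily the network drawn in Fig. 2.

```python
from math import sqrt


def neighbours(edges, node):
    """Return the set of nodes adjacent to `node` in an undirected edge list."""
    return {v if u == node else u for u, v in edges if node in (u, v)}


def node_overlap(edges_a, edges_b, node):
    """1st step: topological overlap C_i of one node between two snapshots."""
    nb_a, nb_b = neighbours(edges_a, node), neighbours(edges_b, node)
    if not nb_a or not nb_b:
        return 0.0  # assumed convention: a node without edges contributes zero
    return len(nb_a & nb_b) / sqrt(len(nb_a) * len(nb_b))


def active_count(edges):
    """A(t): number of nodes with at least one edge."""
    return len({n for edge in edges for n in edge})


def largest_component(edges):
    """N(t) as assumed for Method 2: size of the largest connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for n in list(parent):
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def topological_overlap(edges_a, edges_b, n_nodes, method):
    """2nd step: average overlap C_m under the chosen normalisation."""
    total = sum(node_overlap(edges_a, edges_b, i) for i in range(1, n_nodes + 1))
    if method == 1:
        denom = n_nodes
    elif method == 2:
        denom = max(largest_component(edges_a), largest_component(edges_b))
    else:
        denom = max(active_count(edges_a), active_count(edges_b))
    return total / denom if denom else 0.0


def temporal_correlation(snapshots, n_nodes, method):
    """3rd step: mean of C_m over all consecutive snapshot pairs."""
    pairs = list(zip(snapshots, snapshots[1:]))
    return sum(topological_overlap(a, b, n_nodes, method) for a, b in pairs) / len(pairs)


# Hypothetical four-node series: a star followed by two identical snapshots
# that each consist of two separate components of size two.
star = [(1, 2), (1, 3), (1, 4)]
two_pairs = [(1, 2), (3, 4)]
series = [star, two_pairs, two_pairs]
results = {m: round(temporal_correlation(series, 4, m), 2) for m in (1, 2, 3)}
print(results)  # {1: 0.7, 2: 1.2, 3: 0.7}
```

For the two identical snapshots, `topological_overlap(two_pairs, two_pairs, 4, 2)` returns 2.0, because the sum over four nodes is divided by a largest component of size two. Methods 1 and 3 both return 1.0 here, since all four nodes are active.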
Convergence behaviour of the temporal correlation coefficient in the three example networks
In order to reveal the convergence behaviour of the three presented methods, the last snapshot, i.e. the graph at \(t_{M}\) of the example networks, was repeatedly attached to the existing time series until the length of the series equalled 100. For all \(m = 1, \ldots, M - 1\), an average topological overlap \(C_{m} \le 1\) is expected. Because the appended graphs are identical to the snapshot at \(t_{M}\), all the following values for the average topological overlap should equal 1. Therefore, this identical extension of the time series should show a convergence of the temporal correlation coefficient to one.
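The expected limit can be made explicit with a short derivation. As a sketch, let \(M'\) denote the length of the extended series and \(c\) the average topological overlap that a method assigns to two identical copies of the snapshot at \(t_{M}\); both symbols are introduced here for illustration only.

\[
C(M') = \frac{1}{M'-1}\left[\sum_{m=1}^{M-1} C_{m} + (M'-M)\,c\right], \qquad \lim_{M' \to \infty} C(M') = c
\]

The first sum is fixed by the original series, so its weight vanishes as \(M'\) grows and the coefficient approaches \(c\). Convergence to one therefore requires that a method returns \(c = 1\) for identical snapshots.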
Real-world example: pig trade network of a producer community in Northern Germany
Data basis
Pig movement data from a producer community in Northern Germany were recorded in an observation period from 1st June 2006 to 31st May 2009. The date of the movements, the supplier, the purchaser as well as the batch size and the type and age group of the delivered livestock were recorded. The holdings are represented by the nodes of the network and the edges illustrate the animal movements between them. In total, the data contained 4635 animal movements between 483 holdings.
Construction of networks with different time window lengths
In order to assess the influence of the chosen time window length on the temporal correlation coefficient, time windows with lengths increasing from 1 to 548 days were generated. This implies that 1096 snapshots of the network were constructed for a time window length of 1 day, 548 snapshots for a time window length of 2 days, and finally only 2 snapshots, whose edge configurations can be compared, for a time window length of 548 days. For time window lengths that do not divide 1096 evenly, an incomplete time window remains at the end of the observation period; snapshots resulting from such incomplete time windows were ignored in the analysis. For each time window length, the topological overlap of each pair of consecutive time steps was calculated using all three methods presented in the “Method 1: original calculation by Nicosia et al. (2013)”, “Method 2: proposed correction by Pigott and Herrera (2014)” and “Method 3: adaption of the calculation of the temporal correlation coefficient” sections. These values were afterwards summarized to the temporal correlation coefficient for each time window length.
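The snapshot counts quoted above follow from integer division of the observation period by the window length. A minimal sketch, with hypothetical function names and assuming that both end dates of the observation period are inclusive:

```python
from datetime import date

# Observation period stated in the text (both end dates assumed inclusive)
START, END = date(2006, 6, 1), date(2009, 5, 31)
TOTAL_DAYS = (END - START).days + 1


def complete_snapshots(window_length, total_days=TOTAL_DAYS):
    """Number of complete time windows; a trailing partial window is dropped."""
    return total_days // window_length


def overlap_pairs(window_length, total_days=TOTAL_DAYS):
    """Number of consecutive snapshot pairs for which C_m can be computed."""
    return max(complete_snapshots(window_length, total_days) - 1, 0)


for length in (1, 2, 3, 548):
    print(length, complete_snapshots(length), overlap_pairs(length))
```

A window length of 3 days, for example, yields 365 complete snapshots and leaves one day over, which is discarded.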
Statistical analysis
Main effects used for the analysis of variance
Effect | Group boundaries | Group size |
---|---|---|
TWL: length of the time window chosen to analyse the development of the graph over time | TWL = 1 | 1096 |
TWL | 2 ≤ TWL ≤ 4 | 1184 |
TWL | 5 ≤ TWL ≤ 12 | 1106 |
TWL | 13 ≤ TWL ≤ 35 | 1110 |
TWL | 36 ≤ TWL ≤ 105 | 1080 |
TWL | TWL ≥ 106 | 1166 |
Mean number: arithmetic mean of the number of connected components (containing more than one node) between two consecutive time steps | Mean number ≤ 4 | 2177 |
Mean number | 5 ≤ Mean number ≤ 11 | 2324 |
Mean number | Mean number ≥ 12 | 2248 |
Mean size: arithmetic mean of the average sizes of all connected components containing more than one node between two consecutive time steps | Mean size ≤ 3 | 1830 |
Mean size | 3 < Mean size ≤ 4.5 | 1692 |
Mean size | 4.5 < Mean size ≤ 23 | 1569 |
Mean size | Mean size > 23 | 1658 |
Mean edges: arithmetic mean of the number of edges between two consecutive time steps | Mean edges ≤ 20 | 2327 |
Mean edges | 21 ≤ Mean edges ≤ 125 | 2134 |
Mean edges | Mean edges ≥ 126 | 2288 |
Mean first: arithmetic mean of the sizes of the largest connected components between two consecutive time steps | Mean first ≤ 7 | 2235 |
Mean first | 8 ≤ Mean first ≤ 60 | 2228 |
Mean first | Mean first ≥ 61 | 2286 |
Mean active-first: arithmetic mean of the differences between active nodes and the size of the largest network component between two consecutive time steps | Mean active-first ≤ 8 | 2262 |
Mean active-first | 9 ≤ Mean active-first ≤ 35 | 2223 |
Mean active-first | Mean active-first ≥ 36 | 2264 |
Results
Comparison between the different methods based on vivid example networks
In the following, some general network examples are illustrated to reveal the differences between the three methods described above. For the example networks presented in Pigott and Herrera (2014), no differences between Method 2 and Method 3 could be obtained. Therefore, new example networks are presented in this article to identify the issues with the previously proposed formulas.
Time series without isolated nodes and identical unconnected components of equal size
Calculation of the temporal correlation coefficient C for time series without isolated nodes and identical unconnected graphs of equal size
Snapshots | 1st calculation step | 2nd calculation step | 3rd calculation step |
---|---|---|---|
\(t_{m}, t_{m+1}\) | \(C_{i=1}(t_{m}, t_{m+1}) = \frac{1}{\sqrt{3}}\) | Method 1: \(C_{m} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 1: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.70\) |
\(t_{m}, t_{m+1}\) | \(C_{i=2}(t_{m}, t_{m+1}) = 1\) | Method 2: \(C_{m} = \frac{1}{\max[N(t_{m}), N(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 2: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 1.20\) |
\(t_{m}, t_{m+1}\) | \(C_{i=3}(t_{m}, t_{m+1}) = 0\) | Method 3: \(C_{m} = \frac{1}{\max[A(t_{m}), A(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 3: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.70\) |
\(t_{m}, t_{m+1}\) | \(C_{i=4}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=1}(t_{m+1}, t_{m+2}) = 1\) | Method 1: \(C_{m+1} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 1\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=2}(t_{m+1}, t_{m+2}) = 1\) | Method 2: \(C_{m+1} = \frac{1}{\max[N(t_{m+1}), N(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 2\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=3}(t_{m+1}, t_{m+2}) = 1\) | Method 3: \(C_{m+1} = \frac{1}{\max[A(t_{m+1}), A(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 1\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=4}(t_{m+1}, t_{m+2}) = 1\) | | |
Time series with identical unconnected components of equal size and isolated node
Calculation of the temporal correlation coefficient C for time series with identical unconnected components of equal size and isolated node
Snapshots | 1st calculation step | 2nd calculation step | 3rd calculation step |
---|---|---|---|
\(t_{m}, t_{m+1}\) | \(C_{i=1}(t_{m}, t_{m+1}) = \frac{1}{\sqrt{3}}\) | Method 1: \(C_{m} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.32\) | Method 1: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.56\) |
\(t_{m}, t_{m+1}\) | \(C_{i=2}(t_{m}, t_{m+1}) = 1\) | Method 2: \(C_{m} = \frac{1}{\max[N(t_{m}), N(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 2: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 1.20\) |
\(t_{m}, t_{m+1}\) | \(C_{i=3}(t_{m}, t_{m+1}) = 0\) | Method 3: \(C_{m} = \frac{1}{\max[A(t_{m}), A(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 3: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.70\) |
\(t_{m}, t_{m+1}\) | \(C_{i=4}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m}, t_{m+1}\) | \(C_{i=5}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=1}(t_{m+1}, t_{m+2}) = 1\) | Method 1: \(C_{m+1} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 0.80\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=2}(t_{m+1}, t_{m+2}) = 1\) | Method 2: \(C_{m+1} = \frac{1}{\max[N(t_{m+1}), N(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 2\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=3}(t_{m+1}, t_{m+2}) = 1\) | Method 3: \(C_{m+1} = \frac{1}{\max[A(t_{m+1}), A(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 1\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=4}(t_{m+1}, t_{m+2}) = 1\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=5}(t_{m+1}, t_{m+2}) = 0\) | | |
Time series with identical unconnected components of different sizes and isolated nodes
Calculation of the temporal correlation coefficient C for time series with identical unconnected components of different sizes and isolated nodes
Snapshots | 1st calculation step | 2nd calculation step | 3rd calculation step |
---|---|---|---|
\(t_{m}, t_{m+1}\) | \(C_{i=1}(t_{m}, t_{m+1}) = \frac{1}{\sqrt{3}}\) | Method 1: \(C_{m} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.23\) | Method 1: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.47\) |
\(t_{m}, t_{m+1}\) | \(C_{i=2}(t_{m}, t_{m+1}) = 1\) | Method 2: \(C_{m} = \frac{1}{\max[N(t_{m}), N(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.39\) | Method 2: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 1.03\) |
\(t_{m}, t_{m+1}\) | \(C_{i=3}(t_{m}, t_{m+1}) = 0\) | Method 3: \(C_{m} = \frac{1}{\max[A(t_{m}), A(t_{m+1})]}\sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1}) \approx 0.32\) | Method 3: \(C = \frac{1}{M-1}\sum_{m=1}^{M-1} C_{m} \approx 0.66\) |
\(t_{m}, t_{m+1}\) | \(C_{i=4}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m}, t_{m+1}\) | \(C_{i=5}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m}, t_{m+1}\) | \(C_{i=6}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m}, t_{m+1}\) | \(C_{i=7}(t_{m}, t_{m+1}) = 0\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=1}(t_{m+1}, t_{m+2}) = 1\) | Method 1: \(C_{m+1} = \frac{1}{N}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) \approx 0.71\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=2}(t_{m+1}, t_{m+2}) = 1\) | Method 2: \(C_{m+1} = \frac{1}{\max[N(t_{m+1}), N(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) \approx 1.67\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=3}(t_{m+1}, t_{m+2}) = 0\) | Method 3: \(C_{m+1} = \frac{1}{\max[A(t_{m+1}), A(t_{m+2})]}\sum_{i=1}^{N} C_{i}(t_{m+1}, t_{m+2}) = 1\) | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=4}(t_{m+1}, t_{m+2}) = 0\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=5}(t_{m+1}, t_{m+2}) = 1\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=6}(t_{m+1}, t_{m+2}) = 1\) | | |
\(t_{m+1}, t_{m+2}\) | \(C_{i=7}(t_{m+1}, t_{m+2}) = 1\) | | |
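The arithmetic behind the seven-node example can be retraced in a few lines. The sketch below takes the node-wise values \(C_{i}\) from the table as given; the denominators for Method 2 and Method 3 are not stated explicitly in the text and are assumptions inferred from the tabulated results. The helper name `summarise` is hypothetical.

```python
from math import sqrt

# Node-wise overlaps C_i as tabulated for the seven-node example
step_one = [1 / sqrt(3), 1, 0, 0, 0, 0, 0]
step_two = [1, 1, 0, 0, 1, 1, 1]

# Denominators per method for (first step, second step). Method 1 uses N = 7;
# the values for Methods 2 and 3 are assumptions inferred from the table.
denominators = {1: (7, 7), 2: (4, 3), 3: (5, 5)}


def summarise(method):
    """Return (C_m, C_{m+1}, C) for one method, rounded to two decimals."""
    d_one, d_two = denominators[method]
    c_one = sum(step_one) / d_one
    c_two = sum(step_two) / d_two
    return tuple(round(x, 2) for x in (c_one, c_two, (c_one + c_two) / 2))


for method in (1, 2, 3):
    print(method, summarise(method))
```

With these denominators the sketch returns (0.23, 0.71, 0.47) for Method 1, (0.39, 1.67, 1.03) for Method 2 and (0.32, 1.0, 0.66) for Method 3, in agreement with the table.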
Convergence behaviour of the temporal correlation coefficient in the three example networks
For the example network of Fig. 2, Method 1 showed the same results as the newly proposed Method 3, since the maximal number of active nodes equalled the maximal number of all nodes in the network. Therefore, only differences for the example networks of Figs. 3 and 4 between Method 1 and Method 3 could be revealed. Here, the temporal correlation coefficient converged towards the fraction of active nodes in the added identical snapshots (Pigott and Herrera 2014), which is 0.8 or 0.71, respectively, with regard to the example networks of Figs. 3 and 4.
For all three example networks, Method 2 showed values larger than one for M ≥ 3. In all three example networks, Method 3 showed a convergence towards 1, which corresponds to the expected behaviour of the temporal correlation coefficient.
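The convergence can be checked numerically without rebuilding the graphs, because every appended snapshot pair contributes the same overlap value. The following sketch uses the rounded \(C_{m}\) values tabulated for the five-node example with one isolated node; the function name and the dictionary are hypothetical.

```python
def extended_coefficient(first_overlap, repeated_overlap, series_length):
    """Temporal correlation coefficient after padding with identical snapshots.

    first_overlap    -- C_m of the single step in which the graph changes
    repeated_overlap -- C_m between two identical snapshots under the same method
    series_length    -- total number of snapshots M after padding (M >= 2)
    """
    steps = series_length - 1
    return (first_overlap + (steps - 1) * repeated_overlap) / steps


# Rounded (C_m, C_{m+1}) values tabulated for the five-node example
tabulated = {
    "Method 1": (0.32, 0.80),
    "Method 2": (0.39, 2.0),
    "Method 3": (0.39, 1.0),
}

at_100 = {
    name: extended_coefficient(first, repeated, 100)
    for name, (first, repeated) in tabulated.items()
}
for name, value in at_100.items():
    print(f"{name}: C(100) = {value:.3f}")
```

At a series length of 100, the coefficient is about 0.795 for Method 1, 1.984 for Method 2 and 0.994 for Method 3, i.e. close to the respective limits 0.8, 2 and 1.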
Estimates of the distortions between methods
Averaged estimate errors for the topological overlap
Lower and upper boundaries for estimate errors in temporal correlation coefficients
Real-world network: trade network of a pork supply chain
Descriptive statistics
Descriptive statistics of the topological overlap values for the three different methods
Statistic | Method 1 | Method 2 | Method 3 |
---|---|---|---|
N | 6749 | 6749 | 6749 |
Min | 0 | 0 | 0 |
Max | 0.36 | 1.72 | 0.69 |
Mean | 0.10 | 0.39 | 0.24 |
Variance | 0.02 | 0.13 | 0.06 |
Skewness | 0.76 | 0.36 | 0.40 |
Kurtosis | 1.85 | 2.08 | 1.51 |
For time window lengths above 1 day (corresponding to observations number 1097 and higher), the values for the topological overlap obtained from Method 2 and Method 3 showed increasing behaviour up to a time window length of 53 days, which corresponds to observation number 4900 (Fig. 6). For larger time window lengths, the topological overlap values decreased again. In contrast, the values obtained from Method 1 increased until approximately observation 6200. For both Method 1 and Method 3, rising variation could be observed until observation 4900 in Fig. 6. In contrast to this, the variation of Method 2 was reduced from that moment. Additionally, the results obtained from Method 1 and Method 3 remained in [0, 1] defined for the topological overlap, whereas the results calculated with Method 2 exceeded the predefined upper limit of this parameter.
Descriptive statistics of the differences between the topological overlap values of the three methods
Statistic | Method 2 − Method 1 | Method 3 − Method 1 | Method 2 − Method 3 |
---|---|---|---|
N | 6749 | 6749 | 6749 |
Min | 0 | 0 | 0 |
Max | 1.66 | 0.42 | 1.42 |
Mean | 0.29 | 0.14 | 0.16 |
Variance | 0.10 | 0.02 | 0.06 |
Skewness | 0.97 | 0.36 | 1.65 |
Kurtosis | 3.22 | 1.60 | 4.89 |
Analysis of variance
As the additional interaction effect between Mean number and Mean first (see Table 1) has no influence on the models’ coefficients of determination, the results are restricted to the models including only linear effects.
Differences of the topological overlap between Method 2 and Method 1
The results of the analysis of variance using a linear model showed that all six main effects had a significant influence on the differences between the topological overlap values of Method 2 and Method 1 (p < 0.05). The model explained 82.4 % of the total variance (coefficient of determination). For the single main effects, most of the variance was explained by the time window length (effect size = 0.053), followed by the mean of the differences between active nodes and the size of the largest network component between two consecutive time steps (Mean active-first, see Table 1; effect size = 0.017) and the mean of the sizes of the largest network components between two consecutive time steps (Mean first, see Table 1; effect size = 0.016).
Differences of the topological overlap between Method 3 and Method 1
The results of the analysis of variance using a linear model showed that all six main effects had a significant influence on the difference between the topological overlap values of Method 3 and Method 1 (p < 0.05). The model explained 91.7 % of the total variance. For the single main effects, most of the variance was explained by the time window length (effect size = 0.039), followed by the arithmetic mean of the average sizes of all connected components containing more than one node (Mean size, see Table 1; effect size = 0.004) and Mean active-first (effect size = 0.004).
Differences of the topological overlap between Method 2 and Method 3
The results of the analysis of variance using a linear model showed that all six main effects had a significant influence on the difference between the topological overlap values of Method 2 and Method 3 (p < 0.001). The model explained 77.9 % of the total variance. For the single main effects, most of the variance was explained by the time window length (effect size = 0.044), followed by Mean size (see Table 1; effect size = 0.038) and Mean active-first (see Table 1; effect size = 0.020).
Discussion
The intention of this article was to eliminate uncertainties for the calculation of the topological overlap and the temporal correlation coefficient proposed by Nicosia et al. (2013) and its extension proposed by Pigott and Herrera (2014) and to give clear definitions of the network parameters used for their calculations. Therefore, we proposed comprehensive example networks which included more possible network configurations (e.g. the network contained more than one network component with more than one node) than the example networks included in Pigott and Herrera (2014). Additionally, we introduced the results of the topological overlap of a real-world network of animal movements, which revealed the problems of the previous formulas. The influences of the network structure on the outcome of the different methods were analysed with the help of this trade network.
Expected behaviour of the topological overlap and the temporal correlation coefficient
Since the topological overlap represents the probability for edges to persist across two consecutive time steps and the temporal correlation coefficient is the average over all topological overlap values, both should range between 0 and 1. Thus, values above the upper limit of one cannot be interpreted. The present article shows that only the results obtained from Method 1 and Method 3 remained in [0, 1], whereas the results calculated with Method 2 exceeded the upper limit of this range. This becomes obvious for the small example networks as well as for the real-world trade network. Additionally, the fact that values greater than one were determined for Method 2 suggests that the values within the expected range also overestimated the real topological overlap and, therefore, led to invalid results. Similarly, Method 1 converged towards a value smaller than one in Fig. 5b, c, where the number of all nodes did not equal the maximal number of active nodes. Here, the possible topological overlap and the temporal correlation coefficient were underestimated. A detailed discussion of the estimates of the distortions between the three methods is given in the following paragraph.
Estimates of the distortions between methods
Given the presence of isolated (i.e. not active) nodes in one of the snapshots \(t_{m}\) or \(t_{m+1}\), the originally proposed Method 1 systematically outputs a smaller topological overlap between those network snapshots than both recently proposed methods. This was illustrated, for example, in the calculation of \(C_{m}\) for the example network of Fig. 4. The ratios in Eq. (7) are always smaller than or equal to one and quantify the underestimation in the average topological overlap values for the time step from \(t_{m}\) to \(t_{m+1}\) obtained from Method 1 in comparison to Method 2 and Method 3 for a fixed \(m = 1, \ldots, M - 1\). Consequently, the right side of Eq. (8) states the averaged underestimation of the topological overlap caused by Method 1 compared to Method 2 over time. A similar estimation can be found in Pigott and Herrera (2014). Correspondingly, Method 1 underestimates the topological overlap on average, compared to the newly proposed Method 3, by the fraction \(\frac{{\mathop {\text{mean}}\nolimits_{m \le M - 1} \left( {\max \left[ {A\left( {t_{m} } \right),A\left( {t_{m + 1} } \right)} \right]} \right)}}{N} \le 1\).
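These ratios follow directly from the fact that all three methods share the same numerator. As a sketch, write \(S_{m} = \sum_{i=1}^{N} C_{i}(t_{m}, t_{m+1})\) and label the methods by superscripts; both notations are introduced here for illustration and are not the notation of Eq. (7). For \(S_{m} \ne 0\):

\[
\frac{C_{m}^{(1)}}{C_{m}^{(2)}} = \frac{S_{m}/N}{S_{m}/\max[N(t_{m}), N(t_{m+1})]} = \frac{\max[N(t_{m}), N(t_{m+1})]}{N}, \qquad
\frac{C_{m}^{(1)}}{C_{m}^{(3)}} = \frac{\max[A(t_{m}), A(t_{m+1})]}{N}
\]

For the second time step of the example network of Fig. 4, the tabulated values give \(C_{m+1}^{(1)} / C_{m+1}^{(3)} \approx 0.71 / 1\), which matches the fraction of active nodes, 0.71, reported for that network.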
If the maximal number of connected nodes \(\max[N(t_{m}), N(t_{m+1})]\) is not equal to the maximal number of active nodes \(\max[A(t_{m}), A(t_{m+1})]\) for a fixed \(m = 1, \ldots, M - 1\), the distortion in \(C_{m}\) between Method 2 and Method 3 is represented by the fraction \(\frac{\max[A(t_{m}), A(t_{m+1})]}{\max[N(t_{m}), N(t_{m+1})]} \ge 1\), since the number of connected nodes cannot exceed the number of active nodes. This is underpinned by the calculations for the example network of Fig. 2. Here \(C_{m+1} = 2\) when obtained from Method 2 and \(C_{m+1} = 1\) when obtained from Method 3, whilst \(\max[N(t_{m+1}), N(t_{m+2})] = 2\) and \(\max[A(t_{m+1}), A(t_{m+2})] = 4\).
As the average topological overlap \(C_{m}\) has no explanatory power concerning the complete dynamic network, the distortions between methods in the temporal correlation coefficient \(C\) should be considered in addition. Due to the double sum in the formula for \(C\), fewer transformations with an equality sign are possible, and estimations are necessary instead. The inequalities (9)–(11) give upper and lower boundaries using characteristics of the network, such as the maximal and minimal values of \(\max[N(t_{m+1}), N(t_{m+2})]\) and \(\max[A(t_{m+1}), A(t_{m+2})]\) over time. They might provide a valuable tool for assessing the distortion connected to the usage of the different methods.
Real-world network: trade network of a pork supply chain
For the pig trade network, the results of the topological overlap values showed for Method 2 a completely different behaviour than for Method 1 and Method 3 (Fig. 6). For Method 2, the topological overlap values varied over a huge range until observation 4900. This can be explained by the variation in the differences between the maximal number of connected and the maximal number of active nodes. These differences became smaller with increasing time window length, since for larger time window length the network formed larger network components which included the majority of the nodes. Thus, the differences between the maximal number of connected and active nodes decreased, which resulted in a smaller variation.
Results in analysis of variance
With regard to the real-world example given by the described pig trade network, the differences of \(C_{m}\) between methods (Method 2 − Method 1, Method 2 − Method 3, Method 3 − Method 1) were analysed with linear models containing six categorical variables chosen from the characteristics of the underlying network. The goal was to analyse the impact of the network structure on the differences between methods. As two snapshots are needed to calculate \(C_{m}\), the categorical variables (except for the time window length) are determined as the characteristics’ mean value between two consecutive snapshots. The models used successfully explained the variance in the target variables, as coefficients of determination ranged from 0.78 to 0.92. All six chosen effects were significant in all three cases, but the time window length was the strongest effect in all three considered differences and showed medium effect sizes from 0.038 to 0.055 (Cohen 1988). The remaining effects used the number and size of connected components or the total number of edges in the snapshots at \(t_{m}\) and \(t_{m+1}\). When the time windows for the aggregation of pig trade activities became longer, more edges and fewer but larger connected components are to be expected in the snapshots, but significant interaction effects between time window length and the remaining categorical variables have to be excluded in advance. The effect Mean active-first categorises the difference “number of active nodes − size of the largest connected component” averaged between the two considered snapshots. It was to be expected that its effect size was medium concerning Method 2 − Method 3 and only small for the other two target variables, since these two methods differ exactly in the terms \(\max[N(t_{m+1}), N(t_{m+2})]\) and \(\max[A(t_{m+1}), A(t_{m+2})]\).
General aspects
The description of temporal networks as well as the analysis of their structural characteristics is still under development (Nicosia et al. 2013). Therefore, there is still a lack of appropriate methods which help to analyse how the structure of temporal networks influences the dynamics of processes occurring on it, such as disease transmission. Furthermore, the question which characteristics of the network impact the dynamics is still not fully answered. Konschake et al. (2013) investigated the structural dynamics of a pig trade network and found that time-independent node centrality has to be treated with caution, whereas the stationary sampling of the nodes is still applicable for the network under representation. They also stated that similar results are expected for other pig trade networks since the processes in the pork supply chain are highly standardized and industrialized. A further issue, which was revealed in the present study, is the choice of an appropriate time window length. Also Clauset and Eagle (2012) stated, that the choice of the time window length effectively determines many of the statistical properties of the resulting network and that an incorrect choice may impose a strong bias on the resulting analysis and conclusion. Additionally, they could show that a time window length which displays the natural periodicity of the system should be chosen which depends on the interactions under investigation. For a pig trade network, Lentz et al. (2013) showed a periodical pattern of 180 days which represents the biological properties of pig production from farrowing to abattoir. Also Valdano et al. (2015) stated that the extent of the time window length may affect the prediction of the epidemic threshold and the spreading potential within a temporal network. 
Furthermore, their study confirmed the findings from other investigations that the network’s typical timescale and the temporal variability of its structure should definitely be considered for the analysis of dynamic systems. Therefore, the static aggregation of temporal networks should be treated with caution, because this approach neglects the temporal variation in the system, which is of special importance for the analysis of the speed and the extent of the spread of infectious diseases (Kempe et al. 2002; Holme and Saramäki 2012; Tantipathananandh et al. 2007). To sum up, given the dependencies and issues of temporal network analysis known to date, a measure like the temporal correlation coefficient, which evaluates the consistency of the edge configuration, could help to understand the structural dynamics of temporal networks.
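To make the role of the time window length more concrete, the following minimal Python sketch shows one possible way of aggregating timestamped trade contacts into consecutive snapshots. It is an illustration only and not the procedure used in this study: the function name `aggregate_snapshots`, the representation of a contact as a `(day, source, target)` tuple and the toy contacts are assumptions made for this example.

```python
from collections import defaultdict


def aggregate_snapshots(contacts, window_length):
    """Group timestamped contacts (day, source, target) into snapshots.

    Snapshot k holds every distinct edge observed on days
    [k * window_length, (k + 1) * window_length).
    """
    snapshots = defaultdict(set)
    for day, source, target in contacts:
        snapshots[day // window_length].add((source, target))
    if not snapshots:
        return []
    # Return a dense list so that windows without trade appear as empty sets.
    last = max(snapshots)
    return [snapshots.get(k, set()) for k in range(last + 1)]


# Hypothetical toy contacts: (day, supplier, purchaser)
contacts = [(0, "a", "b"), (3, "a", "b"), (8, "b", "c"), (15, "a", "b")]
weekly = aggregate_snapshots(contacts, 7)     # three snapshots
biweekly = aggregate_snapshots(contacts, 14)  # two snapshots with more edges each
```

With the longer window, the same contacts yield fewer snapshots that each contain more edges, which is the qualitative effect of the time window length discussed above.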
Conclusion
In this study, an adaption of a method to calculate the average topological overlap C _{ m } between two consecutive snapshots of a dynamic network was proposed and compared to the original method and another recently proposed adaption. The methods differ in the kind of nodes used to average the changes in edge configuration. The numerical differences between the methods were demonstrated using several small and clearly arranged example networks, and analytical estimations were given as well. A pig trade network was introduced and statistically analysed as a real-world example. The newly proposed Method 3 uses the maximal number of active nodes in two consecutive snapshots. Only for Method 3 does the temporal correlation coefficient show convergence behaviour towards one and, additionally, the topological overlap equals one (C _{ m } = 1) in all given examples in which consecutive snapshots are identical. Both are expected behaviours for a measure of temporal correlation between graphs.
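The difference between averaging over all nodes (Method 1) and averaging over the maximal number of active nodes (Method 3) can be illustrated with a minimal Python sketch. Several assumptions are made here: the snapshots are given as symmetric binary NumPy adjacency matrices (the pig trade network itself is directed), the node-wise overlap follows the definition of Nicosia et al. (2013), all function names are hypothetical, and a pair of snapshots without any active node is assigned an overlap of zero by convention.

```python
import numpy as np


def node_overlap(a_now, a_next):
    """Node-wise topological overlap C_i of two binary adjacency matrices."""
    shared = (a_now * a_next).sum(axis=1).astype(float)
    norm = np.sqrt(a_now.sum(axis=1) * a_next.sum(axis=1))
    # Nodes without edges in either snapshot contribute zero.
    return np.divide(shared, norm, out=np.zeros_like(shared), where=norm > 0)


def active_nodes(a):
    """Number of nodes with at least one edge (undirected snapshot assumed)."""
    return int((a.sum(axis=1) > 0).sum())


def overlap_method1(a_now, a_next):
    """Average the node-wise overlap over all N nodes of the network."""
    return node_overlap(a_now, a_next).sum() / a_now.shape[0]


def overlap_method3(a_now, a_next):
    """Average over the maximal number of active nodes in the two snapshots."""
    denom = max(active_nodes(a_now), active_nodes(a_next))
    # Assumption: two snapshots without any active node give an overlap of zero.
    return node_overlap(a_now, a_next).sum() / denom if denom else 0.0


# Toy example: four nodes, one edge between nodes 0 and 1, two isolated nodes.
snapshot = np.array([[0, 1, 0, 0],
                     [1, 0, 0, 0],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]])
c_method1 = overlap_method1(snapshot, snapshot)  # 0.5, lowered by isolated nodes
c_method3 = overlap_method3(snapshot, snapshot)  # 1.0 for identical snapshots
```

In this toy example the two snapshots are identical, yet averaging over all four nodes gives 0.5 because the isolated nodes enter the denominator, whereas averaging over the two active nodes gives the expected value of one.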
Declarations
Authors’ contributions
KB, JS and JK designed the study. KB participated in data analysis and drafted the manuscript; JS participated in data analysis and carried out the statistical analyses. All authors read and approved the final manuscript.
Acknowledgements
We gratefully acknowledge funding by the German Research Foundation (DFG).
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Bajardi P, Barrat A, Natale F, Savini L, Colizza V (2011) Dynamical patterns of cattle trade movements. PLoS One 6(5):e19869
- Büttner K, Krieter J, Traulsen I (2015) Characterization of contact structures for the spread of infectious diseases in a pork supply chain in northern Germany by dynamic network analysis of yearly and monthly networks. Transbound Emerg Dis 62(2):188–199. doi:10.1111/tbed.12106
- Clauset A, Eagle N (2012) Persistence and periodicity in a dynamic proximity network. arXiv preprint arXiv:1211.7343
- Cohen J (1988) Statistical power analysis for the behavioral sciences, vol 2. Lawrence Erlbaum Associates, Publishers, Hillsdale
- Dubé C, Ribble C, Kelton D, McNab B (2011) Estimating potential epidemic size following introduction of a long-incubation disease in scale-free connected networks of milking-cow movements in Ontario, Canada. Prev Vet Med 99(2–4):102–111
- Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519(3):97–125. doi:10.1016/j.physrep.2012.03.001
- Kempe D, Kleinberg J, Kumar A (2002) Connectivity and inference problems for temporal networks. J Comput Syst Sci 64(4):820–842. doi:10.1006/jcss.2002.1829
- Konschake M, Lentz HHK, Conraths FJ, Hövel P, Selhorst T (2013) On the robustness of in- and out-components in a temporal network. PLoS One 8(2):e55223
- Lentz HHK, Selhorst T, Sokolov IM (2013) Unfolding accessibility provides a macroscopic approach to temporal networks. Phys Rev Lett 110(11):118701
- Masuda N, Holme P (2013) Predicting and controlling infectious disease epidemics using temporal networks. F1000Prime Rep 5:6
- MATLAB (2015) Statistics and machine learning toolbox™ user’s guide (version 2014a). The MathWorks Inc., Natick
- Nicosia V, Tang J, Mascolo C, Musolesi M, Russo G, Latora V (2013) Graph metrics for temporal networks. In: Holme P, Saramäki J (eds) Temporal networks. Springer, Berlin Heidelberg, pp 15–40
- Nöremark M, Hakansson N, Lewerin SS, Lindberg A, Jonsson A (2011) Network analysis of cattle and pig movements in Sweden: measures relevant for disease control and risk based surveillance. Prev Vet Med 99(2–4):78–90
- Pigott F, Herrera M (2014) Proposal for a correction to the temporal correlation coefficient calculation for temporal networks. arXiv preprint arXiv:1403.1104
- Rautureau S, Dufour B, Durand B (2011) Structural vulnerability of the French swine industry trade network to the spread of infectious diseases. Animal 6(07):1152–1162. doi:10.1017/S1751731111002631
- Tang J, Scellato S, Musolesi M, Mascolo C, Latora V (2010) Small-world behavior in time-varying graphs. Phys Rev E 81(5):055101
- Tantipathananandh C, Berger-Wolf T, Kempe D (2007) A framework for community identification in dynamic social networks. In: Paper presented at the Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, San Jose, California, USA
- Valdano E, Ferreri L, Poletto C, Colizza V (2015) Analytical computation of the epidemic threshold on temporal networks. Phys Rev X 5(2):021005
- Vernon MC, Keeling MJ (2009) Representing the UK’s cattle herd as static and dynamic networks. Proc R Soc B 276(1656):469–476. doi:10.1098/rspb.2008.1009