Open Access

Mining precise cause and effect rules in large time series data of socio-economic indicators

SpringerPlus 2016 5:1625

https://doi.org/10.1186/s40064-016-3292-0

Received: 16 March 2016

Accepted: 11 September 2016

Published: 21 September 2016

Abstract

Discovery of cause–effect relationships, particularly in large databases of time series, is challenging because of continuous data with different characteristics and complex lagged relationships. In this paper, we propose a novel approach to extract cause–effect relationships from a large time series data set of socioeconomic indicators. The method extends the scope of relationship discovery to cause–effect relationships by identifying multiple causal structures such as binary, transitive, many to one and cyclic. We use temporal association and the temporal odds ratio to exclude noncausal associations and to ensure high reliability of the discovered causal rules. We assess the method with both synthetic and real-world datasets. The proposed method will help to build quantitative models for analyzing socioeconomic processes by generating precise cause–effect relationships between different economic indicators. The results show that the proposed method can effectively discover existing causality structures in large time series databases.

Keywords

Data mining; Cause–effect relationships; Causality; Temporal association; Temporal odds ratio

Background

A system, whether mechanical, biological or socio-economic, consists of interacting components. These components influence one another to sustain the system and achieve its goal, and the behavior of the system changes when a component is significantly changed or removed. This motivates us to find the cause behind a fault and to identify the causal parameters that explain the interactions among the components of a system or process. Causal discovery indicates not only that indicators are correlated, but also how changing a cause variable is expected to induce a change in an effect variable. For example, with known cause–effect relationships we can predict potential effects before taking any action (cause), which helps to prevent poor decisions or policies in a socio-economic system. Time series data can be used to extract a delayed relationship between two variables, for example, “CO2 emission occurring at a place might cause air pollution at another place after some delay”. Such lagged relationships are characterized by the time lag between the cause and effect parameters. Identifying lagged relationships between socioeconomic processes is challenging because of the complex dependencies present in the data. These dependencies among parameters also make it possible to identify relationships among parameters of different domains in time series data (Madsen 2007; Geweke 1984). Extracting cause–effect relationships for time series prediction is a step towards uncovering the causal relations that exist between different domains, such as employment, education, agriculture and rural development. Causal discovery has been used with great success in fields such as bioinformatics (Needham et al. 2007), biology (Shipley 2002) and earth sciences, for example to identify protein interactions (Sachs et al. 2005; Chen et al. 2010) and gene regulatory networks (Pinna et al. 2010; Friedman et al. 2007) and to study atmospheric teleconnections (Chu et al. 2005). It has also emerged in economics and the social sciences (Spirtes et al. 2000; Neapolitan 2004), for example to improve the economic development (Easterly and Levine 2003) and growth (Asafu-Adjaye 2000) of a country and to study the impact of climate change (Ebert-Uphoff and Deng 2014; Deng and Ebert-Uphoff 2014). Before describing the proposed method for extracting causal rules, we give the following example (Fig. 1) to motivate our research.
Fig. 1

Causal relationships

Suppose we have a set of indicators such as exercise, weight, diseases, calcium, alcohol and bone growth. Various causal relationships can exist among them. An indicator may affect another instantly or after some time. For example, if a person takes alcohol, he may feel a lack of energy (lethargy) instantly or after some time (Fig. 1a). If he takes alcohol frequently, the changes can be observed and it can be concluded that alcohol is one of the causes of tiredness. We could identify the time between alcohol intake and the occurrence of lethargy, as well as the dose of alcohol that tends to cause lethargy. Further relationships, such as transitive ones, can be analyzed between a set of indicators (shown in Fig. 1): lack of exercise increases weight, which increases the chance of diseases (Fig. 1b, c). A many to one relationship describes cases such as a person taking the proper dose of calcium and vitamin D to help bone growth, i.e. bone growth requires both calcium and vitamin D. Figure 1d describes the cyclic relationship, in which properties affect each other in a cyclic manner; for example, lethargy increases weight, which in turn increases lethargy. These extracted relationships are referred to as binary, transitive, many to one and cyclic, respectively.

In this paper, we propose a method to extract causal relationships such as binary, transitive, many to one and cyclic, together with properties such as the time required for an effect to occur (the lag value), the rate of change (of both cause and effect parameters) and the strength of a relationship, without using statistical information.

Related work and contributions

The common way to identify cause–effect relationships is to plan randomized controlled experiments, which are generally expensive and infeasible with a huge number of parameters. Therefore, much attention is needed on discovering cause–effect relationships from the rapidly growing amount of observational data, which is a demanding task. Pearl and Verma (1991) suggested a framework that discovers causal structures from conditional independence relationships, and some techniques have been developed on that basis to identify causal relationships. However, these still cannot discover causal structures effectively from large databases, and the computational cost of discovery is high. Probabilistic dependence is one technique used to represent causality. Probabilistic cause–effect relationships have been examined and suggested in the literature (Reinchenbach 1978; Reichenbach and Reichenbach 1991; Good 1959; Suppes 1970). More recently, Bayesian networks (Pearl 2014) and graphical causal modeling have emerged as leading techniques for discovering causal relationships. Several authors (Heckerman 1995, 1997; Zhang and Poole 1996; Waldmann and Martignon 1998; Nadkarni and Shenoy 2001) describe techniques for characterizing, interpreting and learning probabilistic independence among parameters. However, Bayesian network learning to discover complete cause–effect models is an NP-complete problem (Chickering 1996). Constraint-based techniques are more efficient because they avoid the search for a generic Bayesian network. Several constraint-based approaches have been implemented to identify causal relationships in large databases and have achieved satisfactory results (Cooper 1997; Silverstein et al. 2000; Mani et al. 2012; Pellet and Elisseeff 2008; Aliferis et al. 2010). These approaches use observational data to detect and learn causal structures using conditional independence among variables. Notably, these constraint-based approaches directly or indirectly implement the concept of Bayesian network learning by creating a directed acyclic graph (DAG) that describes the conditional independence between variables (parameters). Even though constraint-based methods have shown promising results with large databases, they are typically designed to detect causality with a few fixed structures in a DAG, such as Y structures (Mani et al. 2012), CCC (Cooper 1997) and CCU (Silverstein et al. 2000).

Another technique in this area is Granger causality (GC) (Granger 1969). It has been discussed in the previous literature (Lozano et al. 2009a, b; Arnold et al. 2007; Pang and Su 2010) and is well known in economic causal inference. The method calculates the impact of one time series on another by determining whether the prediction of the response can be improved by including knowledge of a predictor. GC is reported to perform well for stationary time series data but is sensitive to non-linearity. All these methods infer directed networks. Although these methods are fast, the inferred interactions are undirected. Moreover, these approaches are well suited for small sample data analysis (Veiga et al. 2007) but are not designed to detect combined causal parameters. Often, two or more parameters together enhance the strength of an effect: even when an individual parameter has little effect on its own, together they may have one. We note that discovering causal structures from observational data alone is insufficient, so the discovered relationships have to be verified with time series data and controlled experiments. Still, it is worthwhile to remove noncausal relationships discovered from data. Cause–effect relationship discovery aims to find a brief list of rules that are probably causal. These causal rules provide a set of statistically reliable relationships that can plausibly embed cause–effect relationships. This distinguishes causal rule discovery from normal rule discovery.

Association rule mining (Agrawal et al. 1993) is an efficient and versatile means of discovering relationships in data (Han et al. 2011). Several authors (Jin et al. 2012; Li et al. 2013; Ma et al. 2016) take advantage of association rule mining for causality discovery. Jin et al. (2012) discover causal relationships with multiple cause variables in large databases of binary variables and exclude non-causal associations. Li et al. (2013) and Ma et al. (2016) discover potential causal rules using the cohort study design (Euser et al. 2009; Fleiss et al. 2003) and are able to generate combined causal rules from observational data. Li et al. (2015) presented four approaches, PC, HITON-PC, CR-PA and CR-CS, for causality detection around a given target variable and discussed their efficiency. The PC and HITON-PC methods are based on Bayesian network learning theory and use conditional independence tests to eliminate non-persistent associations, CR-PA uses association rules and partial association, and CR-CS uses the concept of a cohort study.

These methods are able to find single and combined causal rules effectively in small and large databases with low and high dimensional data, but they are restricted to discrete data and cannot extract cyclic relationships or the strength of relationships, although causality can be present in various hidden relationships. Moreover, statistically predictable associations do not by themselves establish cause–effect relationships, even though causality is usually observed as an association in the dataset. Therefore, in this paper we first use the concepts of temporal association (Ji et al. 2011) and odds ratio (Fleiss et al. 2003) to extract binary causal relationships, from which the other relationships are then extracted.

To the best of our knowledge, there is no previous work on discovering cyclic and transitive causal relationships, together with properties such as the rate of change of parameters and the relationship strength, in time series data. We should note that discovering causal relationships only in observational and constraint-based data is insufficient.

The contributions of this work are listed in the following:
  • First, we present a method to extract cause–effect relationships like binary, transitive, many to one and cyclic in large time series database.

  • Second, we define the concept of temporal association lag rule and temporal odds ratio to extract cause–effect relationships between various parameters.

  • Third, we generate more specific cause–effect rules (binary, transitive, many to one and cyclic) together with their relationship strength, which is useful for strategic decisions.

Our proposed method is useful for extracting time-lagged relationships across indicators from different fields, which can be used to understand the lagged response of one indicator to another and relationships such as binary, cyclic, many to one and transitive. We show the utility of our approach by extracting some relationships between indicators from different fields. For example, the rule (Cereal production, D, 2 %, 2) \(\Rightarrow\) (Agricultural raw materials exports, 3 %) indicates that cereal production is directly related to agricultural raw materials exports: if it changes by 2 %, the export of agricultural raw materials changes by 3 % after 2 years. The proposed approach can be broadly applied to other problems in the temporal domain to extract various time-lagged relationships.

Preliminaries

In this section, we first define the terms used in this paper. Then we define the concepts needed to describe the proposed cause–effect relationship extraction method. Finally, we give the formal definitions of the various cause–effect relationships; discovering such causal relationships is the aim of this paper.

This paper deals with continuous parameters. Since the parameters have different ranges and we are interested in finding relationships, we use the rate of change instead of the absolute value of a parameter to extract the effect of a change in one parameter on another. Each time series value is categorized as a positive rate of change (U), a negative rate of change (D) or no rate of change (Q). To find an association between two parameters, a temporal association rule is used, defined using the following terms:
\(n\): Number of elements in a time series

\(z\): Number of parameters in database \(P\)

\(l\): Lag parameter, \(l \ne 0\)

\(l_{max}\): Maximum lag difference value

\(T_{k}\): Value of the kth time unit

\(P_{i,k}\): Value of parameter \(P_{i}\) in the kth time unit

\(\gamma_{i,k}\): Rate of change of parameter \(P_{i}\) in the kth year, calculated as:
$$\gamma_{i,k} = \frac{{P_{i,k} - P_{i,k - 1} }}{{P_{i,k - 1} }}$$
(1)
\(\delta\): Minimum rate of change used to consider a change significant

\(R_{i,k}\): Type of change of parameter \(P_{i}\) in the kth time unit, defined as:
$$R_{i,k} = \left\{ {\begin{array}{*{20}l} {U\quad if \quad \, \gamma_{i,k} \ge \delta } \hfill \\ {D \quad if\quad \,\gamma_{i,k} \le - \delta } \hfill \\ {Q \quad if\quad - \delta < \gamma_{i,k} < \delta } \hfill \\ \end{array} } \right\}$$
(2)

The time series of parameter \(P_{i}\) is converted into a set of tuples \(\langle P_{i}, T_{k}, R_{i,k} \rangle\), where \(T_{k}\) is the kth time period and \(R_{i,k} \in \{U, D, Q\}\) indicates a positive, negative or no rate of change for the kth time unit. For example, if GDP has a positive rate of change in 1970, then this is indicated by the tuple \(\langle GDP, 1970, U \rangle\).
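As an illustration, Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names `rate_of_change` and `discretize` are hypothetical, the values are assumed to be non-zero (Eq. (1) divides by the previous value), and a rate exactly equal to ±δ is resolved in favor of U or D, which is one reading of Eq. (2).

```python
from typing import List


def rate_of_change(values: List[float]) -> List[float]:
    """Eq. (1): relative change between consecutive time units."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def discretize(values: List[float], delta: float) -> List[str]:
    """Eq. (2): label each rate of change as U, D or Q."""
    labels = []
    for gamma in rate_of_change(values):
        if gamma >= delta:
            labels.append("U")   # significant increase
        elif gamma <= -delta:
            labels.append("D")   # significant decrease
        else:
            labels.append("Q")   # no significant change
    return labels


# Hypothetical values of one indicator over four time units, delta = 2 %.
print(discretize([100.0, 103.0, 103.5, 100.0], delta=0.02))
```

Note that a series of n values yields n − 1 labels, because the first value has no predecessor.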

Based on the above structure of the time series, the relationship between two parameters \(P_{i}\) and \(P_{j}\) for lag \(l\) is defined using the following terms:
\(D_{i,j,k,l}\)
Indicator of a direct relationship, defined as:
$$D_{i,j,k,l} = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad {if\,(R_{i,k} = U\,and\,R_{j,k + l} = U)\,or\,(R_{i,k} = D\,and\,R_{j,k + l} = D)} \hfill \\ 0 \hfill &\quad {otherwise} \hfill \\ \end{array} } \right\}$$
(3)
i.e. the rate of change of \(P_{i}\) matches the rate of change of \(P_{j}\) after time period \(l\)
\(S_{D} (P_{i} ,P_{j} ,l)\)
Support count of direct relationship, defined as:
$$S_{D} (P_{i} ,P_{j} ,l) = \mathop \sum \limits_{k = 1}^{n - l} D_{i,j,k,l}$$
(4)
\(\alpha_{D} (P_{i} ,P_{j} ,l)\)
Support percent of direct relationship, defined as:
$$\alpha_{D} (P_{i} ,P_{j} ,l) = \frac{{S_{D} (P_{i} ,P_{j} ,l)}}{n - l}$$
(5)
\(I_{i,j,k,l}\)
Indicator of an inverse relationship, defined as:
$$I_{i,j,k,l} = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad {if\,(R_{i,k} = U\, and\, R_{j,k + l} = D) \,or\, (R_{i,k} = D\, and\, R_{j,k + l} = U)} \hfill \\ 0 \hfill & \quad{otherwise} \hfill \\ \end{array} } \right\}$$
(6)
i.e. the rate of change of \(P_{i}\) is opposite to the rate of change of \(P_{j}\) after time period \(l\)
\(S_{I} (P_{i} ,P_{j} ,l)\)
Support count of inverse relationship, defined as:
$$S_{I} (P_{i} ,P_{j} ,l) = \mathop \sum \limits_{k = 1}^{n - l} I_{i,j,k,l}$$
(7)
\(\alpha_{I} (P_{i} ,P_{j} ,l)\)
Support percent of inverse relationship, defined as:
$$\alpha_{I} (P_{i} ,P_{j} ,l) = \frac{{S_{I} (P_{i} ,P_{j} ,l)}}{n - l}$$
(8)
\(\varTheta_{R}\)
Strength of relationship. It indicates how strong the relationship between the parameters is. The strength of the relationship between \(P_{i}\) and \(P_{j}\) is calculated as:
$$\begin{aligned} \varTheta_{R} \left( { P_{i} , P_{j} } \right) & = \alpha * \log \left( n \right),\quad {\rm where} \quad \\ \alpha & = \alpha_{D} \left( {P_{i} ,P_{j} ,l} \right)\quad or\quad \alpha_{I} \left( {P_{i} ,P_{j} ,l} \right) \\ \end{aligned}$$
(9)
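The support and strength definitions above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, both label series are assumed to have the same length n, and the base of the logarithm in the strength formula is not stated in the text, so base 10 is used here as an assumption.

```python
import math

CHANGED = ("U", "D")


def support_count(r_i, r_j, lag, kind="D"):
    """Count lagged pairs that change in the same (D) or opposite (I) direction."""
    pairs = zip(r_i, r_j[lag:])  # zip stops after n - lag pairs
    if kind == "D":
        return sum(a == b and a in CHANGED for a, b in pairs)
    return sum(a in CHANGED and b in CHANGED and a != b for a, b in pairs)


def support_percent(r_i, r_j, lag, kind="D"):
    """Support count divided by the n - lag available pairs, as a percentage."""
    return 100.0 * support_count(r_i, r_j, lag, kind) / (len(r_i) - lag)


def strength(alpha_percent, n, base=10):
    """Strength = alpha * log(n); the log base is an assumption."""
    return alpha_percent * math.log(n, base)


# Hypothetical label series of two parameters over six time units.
r_i = list("UUDQUD")
r_j = list("QUUDDU")
print(support_count(r_i, r_j, lag=1, kind="D"))
print(support_percent(r_i, r_j, lag=1, kind="D"))
```

In this toy series the lagged pairs are (U,U), (U,U), (D,D), (Q,D) and (U,U), so four of the five pairs count towards the direct relationship.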
With our approach, we first consider the temporal association between indicators \(P_{i}\) and \(P_{j}\), since an association is needed for a cause–effect relationship. The user-defined thresholds are as follows:

\(\alpha_{1}\): Support threshold for all causal relationships (set to 70 % in our experiments)

\(\beta\): Threshold for the temporal odds ratio (set to 3 in our experiments)

Since \(\alpha_{1}\) is set to 70 %, \(\beta\) is set to 3.

Definition 1

(Temporal association) Using the direct or inverse relationship [Eqs. (3)–(8)], temporal association can be defined as follows.

Temporal direct association Temporal direct association between two parameters P i and P j for time lag l is defined as \(P_{i} \mathop \to \limits^{l} P_{j} \,if\, \alpha_{D} (P_{i} ,P_{j} ,l) \ge \alpha_{1}\).

Temporal inverse association Temporal inverse association between two parameters P i and P j for time lag l is defined as \(P_{i} \mathop \to \limits^{l} P_{j} \,if\, \alpha_{I} (P_{i} ,P_{j} ,l) \ge \alpha_{1}\).

Next, we define the terms used to calculate the temporal odds ratio of temporally associated parameters, in order to check whether the temporal association rule \(P_{i} \mathop \to \limits^{l} P_{j}\) is also a causal rule.

\(C_{E} (P_{i} ,P_{j} ,l)\) = Count of the number of pairs when no rate of change in P i is associated with positive or negative rate of change in P j after time period l, defined as:
$$C_{E} (P_{i} ,P_{j} ,l) = \mathop \sum \limits_{k = 1}^{n - l} E_{i,j,k,l}$$
(10)
where \(E_{i,j,k,l}\) is the indicator of a neutral-change relationship, defined as:
$$E_{i,j,k,l} = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad {\, if\,\,(R_{i,k} = Q\,\, and\,\,R_{j,k + l} = U)\,\,or\,\, \left( {R_{i,k} = Q\,\, and\,\,R_{j,k + l} = D} \right)} \hfill \\ 0 \hfill &\quad {otherwise} \hfill \\ \end{array} } \right\}$$
(11)
\(C_{F} (P_{i} ,P_{j} ,l)\) = Count of the number of pairs when the positive or negative rate of change in P i is associated with no rate of change in P j after time period l, defined as:
$$C_{F} (P_{i} ,P_{j} ,l) = \mathop \sum \limits_{k = 1}^{n - l} F_{i,j,k,l}$$
(12)
where \(F_{i,j,k,l}\) is the indicator of a change-neutral relationship, defined as:
$$F_{i,j,k,l} = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad {if\,\,(R_{i,k} = U\,\,and\,\,R_{j,k + l} = Q) \,\,or\,\,(R_{i,k} = D\,\, and\,\,R_{j,k + l} = Q)} \hfill \\ 0 \hfill &\quad {otherwise} \hfill \\ \end{array} } \right\}$$
(13)
\(C_{N} (P_{i} ,P_{j} ,l)\) = Count of the number of pairs when no rate of change in P i is associated with no rate of change in P j after time period l, defined as:
$$C_{N} \left( {P_{i} ,P_{j} ,l} \right) = \mathop \sum \limits_{k = 1}^{n - l} N_{i,j,k,l}$$
(14)
where \(N_{i,j,k,l}\) is the indicator of a neutral relationship, defined as:
$$N_{i,j,k,l} = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad { if\,\,(R_{i,k} = Q\,\,and\,\,R_{j,k + l} = Q)} \hfill \\ 0 \hfill &\quad {otherwise} \hfill \\ \end{array} } \right\}$$
(15)

Definition 2

(Temporal odds ratio) It quantifies how strongly the presence or absence of a change in the value of parameter \(P_{i}\) affects a change in the value of parameter \(P_{j}\). Using the above terms [Eqs. (10)–(15)], the temporal odds ratio is defined as follows.

Temporal direct odds ratio Temporal direct odds ratio between two parameters P i and P j for time lag l is defined as:
$$OR_{D} \left( {P_{i} ,P_{j} ,l} \right) = Oddratio_{D} \left( {P_{i} ,P_{j} ,l} \right) = \frac{{S_{D} \left( {P_{i} ,P_{j} ,l} \right)*C_{N} \left( {P_{i} ,P_{j} ,l} \right)}}{{C_{E} \left( {P_{i} ,P_{j} ,l} \right)* C_{F} \left( {P_{i} ,P_{j} ,l} \right)}}$$
(16)
Temporal inverse odds ratio Temporal inverse odds ratio between two parameters P i and P j for time lag l is defined as:
$$OR_{I} \left( {P_{i} ,P_{j} ,l} \right) = Oddratio_{I} \left( {P_{i} ,P_{j} ,l} \right) = \frac{{S_{I} \left( {P_{i} ,P_{j} ,l} \right)*C_{N} \left( {P_{i} ,P_{j} ,l} \right)}}{{C_{E} \left( {P_{i} ,P_{j} ,l} \right)* C_{F} \left( {P_{i} ,P_{j} ,l} \right)}}$$
(17)
In our experiments, if the value of \(C_{N} (P_{i} ,P_{j} ,l)\), \(C_{E} (P_{i} ,P_{j} ,l)\) or \(C_{F} (P_{i} ,P_{j} ,l)\) between parameters is zero, we set it to 1 to avoid a zero or infinite temporal odds ratio.
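The temporal direct odds ratio, including the replacement of zero counts by 1, can be sketched as below. The function name `temporal_direct_odds_ratio` is hypothetical and the code is only an illustration of the definition for the direct case, not the authors' implementation.

```python
CHANGED = ("U", "D")


def temporal_direct_odds_ratio(r_i, r_j, lag):
    """OR_D = (S_D * C_N) / (C_E * C_F), with zero C counts replaced by 1."""
    pairs = list(zip(r_i, r_j[lag:]))
    s_d = sum(a == b and a in CHANGED for a, b in pairs)    # same-direction change
    c_e = sum(a == "Q" and b in CHANGED for a, b in pairs)  # no change -> change
    c_f = sum(a in CHANGED and b == "Q" for a, b in pairs)  # change -> no change
    c_n = sum(a == "Q" and b == "Q" for a, b in pairs)      # no change -> no change
    # Replace zero counts by 1, as described in the text.
    c_e, c_f, c_n = (c if c else 1 for c in (c_e, c_f, c_n))
    return s_d * c_n / (c_e * c_f)


# Hypothetical label series of two parameters over six time units.
print(temporal_direct_odds_ratio(list("UUQQUD"), list("QUUQDQ"), lag=1))
```

For this toy input the lagged pairs are (U,U), (U,U), (Q,Q), (Q,D) and (U,Q), so \(S_D = 2\) and the three other counts are each 1.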

Further causal rules are defined using the terms defined in Definitions 1 and 2.

Definition 3

(Binary rule) A binary causal rule \((P_{i} , D, l) \Rightarrow (P_{j} )\), exists between P i and P j if there is temporal association rule \(P_{i} \mathop \to \limits^{l} P_{j} \,{\text{and}}\, Oddratio_{D} (P_{i} ,P_{j} ,l) \ge \beta \,{\text{or}}\,Oddratio_{I} (P_{i} ,P_{j} ,l) \ge \beta\).

In experimentation results, we represent direct causal rule by \((P_{i} , D, l) \Rightarrow (P_{j} )\) and inverse by \((P_{i} , I, l) \Rightarrow (P_{j} )\).

This rule serves as a forward pruning criterion: all parameters that are not associated with another parameter at a non-zero lag value are excluded from the combinations searched later. The minimum required support keeps the search space manageable.

Definition 4

(Precise binary rule) A precise binary rule \((P_{i} , D, \delta_{1} ,l) \Rightarrow (P_{j} ,\delta_{2} )\) exists between \(P_{i}\) and \(P_{j}\) if the binary rule \((P_{i} , D, l) \Rightarrow (P_{j} )\) holds with \(\delta = \delta_{1}\) as the minimum rate of change of \(P_{i}\) and \(\delta = \delta_{2}\) as the minimum rate of change of \(P_{j}\), and the rule does not hold for either \(\delta > \delta_{1}\) for \(P_{i}\) or \(\delta > \delta_{2}\) for \(P_{j}\).

Definition 5

(\(fscore (\delta_{1} ,\delta_{2} )\)) A function used to calculate the specificity of a rule. In our experiments, it is defined as \(fscore (\delta_{1} ,\delta_{2} ) = \delta_{1}^{2} + \delta_{2}^{2}\). If the rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} )\) is satisfied for multiple values of \(\delta_{1} ,\delta_{2}\), then the rule that gives the maximum valid fscore is retained.

Based on the binary causal rule, we extract the other causal relationships: transitive, many to one (combined cause) and cyclic. We define these relationships as follows.

Definition 6

(Transitive rule) A transitive rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,D,\delta_{2} ,l_{2} ) \Rightarrow \left( {P_{k} , \delta_{3} } \right)\) exists between \(P_{i}\), \(P_{j}\) and \(P_{k}\) if there are rules \(r_{1}: (P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} )\), \(r_{2}: (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{k} ,\delta_{3} )\) and \((P_{i} , D, \delta_{1} ,l_{3} ) \Rightarrow (P_{k} ,\delta_{3} )\) such that \(l_{3} \ge l_{1} + l_{2}\) and \(r_{1} (P_{j} ) \cap r_{2} (P_{j} ) \ne \emptyset\).

Definition 7

(Combined cause rule) A many to one rule \(\left( {\left( {P_{i} , D, \delta_{1} ,l_{1} } \right),\left( {P_{j} ,D,\delta_{2} ,l_{2} } \right)} \right) \Rightarrow (P_{k} ,\delta_{3} ),\) exists between P i , P j and P k if there is \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{k} ,\delta_{3} ), (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{k} ,\delta_{3} ), S_{D} \left( {P_{i} ,P_{k} ,l_{1} } \right) \ge \alpha_{1} ,\) \(S_{D} (P_{j} ,P_{k} ,l_{2} ) \ge \alpha_{1} \, and\,S_{D} ((P_{i} , P_{j} ),P_{k} ,l_{1} ,l_{2} ) \ge \alpha_{1}\).

Definition 8

(Cyclic rule) A cyclic rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Leftrightarrow (P_{j} ,D,\delta_{2} ,l_{2} ),\) exists between P i and P j if there is \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} ), (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{i} ,\delta_{1} ) \,\,and\,\,S_{D} ((P_{i} , P_{j} ),l_{1} ,l_{2} ) \ge \alpha_{1}\).

Proposed method

In this section, we describe an algorithm based on the above definitions. The algorithm consists of five steps. Step 1 generates the binary causal rules. Step 2 generates more precise versions of the binary causal rules. Steps 3, 4 and 5 generate the transitive, many to one and cyclic rules, respectively. Each step is explained below. Table 1 lists the abbreviations used in the algorithm and in this paper. Let \(P\) be a time series database in discrete form, where each \(P_{i}\) is the time series of a parameter taking the values U, D and Q as described in the definitions, and \(z\) is the number of parameters in \(P\).
Table 1 Abbreviation table

| Abbreviation | Description |
|---|---|
| TOR | Temporal odds ratio |
| BRS | Binary rule set |
| SRS | Specific rule set |
| TRS | Transitive rule set |
| MOS | Many to one rule set |
| CRS | Cyclic rule set |
| AG | Agriculture land |
| AR | Arable land |
| ARME | Agricultural raw materials exports |
| CAB | Current account balance |
| CY | Cereal yield |
| CO2 | CO2 emissions |
| CP | Crop production |
| CPI | Crop production index |
| EDOE | Electronic data processing and office equipment |
| FDI | Foreign direct investment |
| FMP | Fuels and mining products |
| FR | Forest rents |
| GDP | Gross domestic product |
| GGR | General government revenue |
| GNS | Gross national savings |
| I1 to I10 | No of indicators (10) |
| ICEC | Integrated circuits and electronic components |
| IS | Iron and steel |
| OM | Other manufactures |
| OTE | Office and telecom equipment |
| TI | Total investment |
| VEG | Volume of exports of goods |
| VEGS | Volume of exports of goods and services |
| VIG | Volume of imports of goods |

Step 1: Binary rule generation

A causal rule may be generated for multiple lag values; the lag value that gives the maximum support for the rule is retained. Suppose \(P = \{P_{1}, P_{2}, P_{3}, P_{4}, P_{5}\}\) is a set of time series, and assume that step 1 generates the following BRS.

\(BRS = \{(P_{1}, P_{2}, D, 1, 75, 4), (P_{1}, P_{3}, D, 2, 73, 4), (P_{2}, P_{3}, I, 1, 77, 3), (P_{4}, P_{3}, I, 1, 71, 6), (P_{2}, P_{5}, D, 1, 76, 5), (P_{5}, P_{2}, D, 1, 72, 4)\}\). Here \((P_{1}, P_{2}, D, 1, 75, 4)\) describes that parameters \(P_{1}\) and \(P_{2}\) have a direct relationship with lag 1, support 75 % and TOR = 4, which indicates that \((P_{1}, P_{2})\) are causally related, i.e. \(P_{1}\) affects \(P_{2}\) after 1 year. Similarly, by comparing the support and the odds ratio between the parameters of each tuple, the other binary causal relationships can be extracted and interpreted.

Explanation

To describe this step, we consider the time series of two parameters, say \(P_{i}\) and \(P_{j}\), expressed as a positive (U) or negative (D) rate of change, for the time period 1991–1997.

Let
  • T = {1991, 1992, 1993, 1994, 1995, 1996, 1997}

  • P i  = {U, U, U, U, U, D, U}

  • P j  = {D, U, U, U, U, U, U}

Here we calculate the support value \(\alpha\) for lag value 1.

The support value for lag 1 is \(\alpha_{D} (P_{i} ,P_{j} ,1) = 83\,\%\) and the temporal odds ratio (TOR) is \(Oddratio_{D} (P_{i} ,P_{j} ,1) = 5\).

Since the calculated \(\alpha_{D} > \alpha_{1}\) and TOR > 3, the rule \((P_{i} , D, 1) \Rightarrow (P_{j} )\) holds for lag value 1 (i.e. \(l \ne 0\)).

The relationship strength of this rule [using Eq. (9)] is 70.13.
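The arithmetic behind these figures can be retraced as follows. For lag 1 the two series give six lagged pairs, five of which change in the same direction, and no pair involves Q, so the counts \(C_{N}\), \(C_{E}\) and \(C_{F}\) are each replaced by 1. The reported strength is reproduced approximately if the logarithm is taken to base 10 with \(n = 7\) and the support is entered as the rounded percentage 83; both are assumptions inferred from the reported number rather than stated in the text.

$$\alpha_{D} = \frac{5}{6} \approx 83\,\% ,\qquad OR_{D} = \frac{5 \times 1}{1 \times 1} = 5,\qquad \varTheta_{R} \approx 83 \times \log_{10} 7 \approx 83 \times 0.845 \approx 70.1$$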

If time series data are given for some parameters, we can calculate α D and TOR between parameters and rules can be extracted. So with the help of the above algorithm, we would be able to extract all two-variable causal relationships between parameters for a time series data set.
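Step 1 can be sketched as a search over ordered parameter pairs, lags and relationship types. The sketch below is a hedged illustration under stated assumptions, not the authors' algorithm: the names `support_and_tor` and `binary_rules` are hypothetical, thresholds default to the experimental values (70 % and 3), and the \(C_{N}\) factor is applied to the inverse odds ratio as well, mirroring Eq. (16), which is an assumption for the inverse case.

```python
from itertools import permutations

CHANGED = ("U", "D")


def support_and_tor(r_i, r_j, lag, kind):
    """Return (support percent, temporal odds ratio) for one pair, lag and type."""
    pairs = list(zip(r_i, r_j[lag:]))
    if kind == "D":
        s = sum(a == b and a in CHANGED for a, b in pairs)
    else:
        s = sum(a in CHANGED and b in CHANGED and a != b for a, b in pairs)
    # Zero counts are replaced by 1, as in the text.
    c_e = sum(a == "Q" and b in CHANGED for a, b in pairs) or 1
    c_f = sum(a in CHANGED and b == "Q" for a, b in pairs) or 1
    c_n = sum(a == "Q" and b == "Q" for a, b in pairs) or 1
    return 100.0 * s / len(pairs), s * c_n / (c_e * c_f)


def binary_rules(series, l_max, alpha_1=70.0, beta=3.0):
    """For each ordered pair keep the (type, lag) with maximum support that passes both thresholds."""
    rules = []
    for i, j in permutations(series, 2):
        best = None
        for lag in range(1, l_max + 1):
            for kind in ("D", "I"):
                support, tor = support_and_tor(series[i], series[j], lag, kind)
                if support >= alpha_1 and tor >= beta:
                    if best is None or support > best[4]:
                        best = (i, j, kind, lag, support, tor)
        if best is not None:
            rules.append(best)
    return rules


# The two series from the explanation above (1991-1997), restricted to lag 1.
series = {"Pi": list("UUUUUDU"), "Pj": list("DUUUUUU")}
for rule in binary_rules(series, l_max=1):
    print(rule)
```

With `l_max=1`, as in the worked explanation, only the direct rule from Pi to Pj passes both thresholds; the reverse direction reaches a support of about 67 % and is rejected.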

Step 2: Specific rules generation

In this step, we calculate the specific rules for the binary causal rules generated in the previous step.

Let \(\gamma_{i}\) and \(\gamma_{j}\) be the rates of change of parameters \(P_{i}\) and \(P_{j}\), and let the parameters have a direct relationship.

Let \(\delta_{i} \hbox{max}\) = maximum value of the rate of change of \(P_{i}\), \(\delta_{j} \hbox{max}\) = maximum value of the rate of change of \(P_{j}\), \(\delta_{i} \hbox{min}\) = minimum value of the rate of change of \(P_{i}\), \(\delta_{j} \hbox{min}\) = minimum value of the rate of change of \(P_{j}\).

Calculation of the interval value \(\eta_{{P_{i} }}\) (the increment value for a parameter \(P_{i}\)):
$$\eta_{{P_{i} }} = \frac{{\delta_{i} \hbox{max} - \delta_{i} \hbox{min} }}{n} \,\,\,and\,\,\,\eta_{{P_{j} }} = \frac{{\delta_{j} \hbox{max} - \delta_{j} \hbox{min} }}{n}$$
(18)
where \(\delta_{i} \hbox{max} \,\,or\,\,\delta_{j} \hbox{max} = \mu + 2\sigma \,\,and\,\,\delta_{i} \hbox{min} \,\,or\,\,\delta_{j} \hbox{min} = \mu - 2\sigma\)

Let \(\delta_{1} , \delta_{2}\) be the minimum rates of change of parameters \(P_{i}\) and \(P_{j}\). Then, using step 2, more specific causal rules \((P_{i} , D, \delta_{1} ,l) \Rightarrow (P_{j} ,\delta_{2} )\) can be generated. The rule indicates that \(P_{i}\) and \(P_{j}\) have a direct causal relationship with lag \(l\) and that a change of \(\delta_{1}\) in \(P_{i}\) leads to a change of \(\delta_{2}\) in \(P_{j}\). Based on the BRS results assumed in step 1, more specific rules can be generated as follows:

\(SRS = \{(P_{1}, P_{2}, D, 1\,\%, 2\,\%, 1), (P_{1}, P_{3}, D, 2\,\%, 1\,\%, 2), (P_{2}, P_{3}, D, 2\,\%, 1.5\,\%, 1), (P_{4}, P_{3}, I, 1.5\,\%, 2\,\%, 1), (P_{2}, P_{5}, D, 2\,\%, 3\,\%, 1), (P_{5}, P_{2}, I, 3\,\%, 2\,\%, 1)\}\).
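One possible reading of step 2 is a grid search: candidate thresholds are spaced by the interval of Eq. (18) between μ − 2σ and μ + 2σ, and the valid pair with the largest fscore is retained. The sketch below illustrates only that reading. The names `threshold_grid` and `most_specific` are hypothetical, the population standard deviation and the restriction to positive thresholds are assumptions, and `rule_holds` stands in for re-checking the binary rule at the given thresholds.

```python
import statistics


def threshold_grid(rates):
    """Candidate thresholds from mu - 2*sigma to mu + 2*sigma in n equal steps (Eq. 18)."""
    mu = statistics.mean(rates)
    sigma = statistics.pstdev(rates)  # population SD: an assumption
    lo, hi = mu - 2 * sigma, mu + 2 * sigma
    n = len(rates)
    eta = (hi - lo) / n
    return [lo + k * eta for k in range(n + 1)]


def most_specific(grid_i, grid_j, rule_holds):
    """Return (fscore, delta_1, delta_2) with maximum fscore among valid pairs, or None."""
    best = None
    for d1 in grid_i:
        for d2 in grid_j:
            # Only positive thresholds are meaningful as minimum rates of change.
            if d1 > 0 and d2 > 0 and rule_holds(d1, d2):
                score = d1 ** 2 + d2 ** 2
                if best is None or score > best[0]:
                    best = (score, d1, d2)
    return best


# Hypothetical rates of change; the lambda is a placeholder validity check.
rates = [0.01, 0.02, 0.03, 0.04, 0.05]
grid = threshold_grid(rates)
print(most_specific(grid, grid, lambda d1, d2: d1 <= 0.02 and d2 <= 0.03))
```

In a full pipeline, `rule_holds` would re-discretize both series with the candidate thresholds and test the support and odds ratio conditions of the binary rule.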

Step 3: Transitive rule generation

Based on the SRS results in step 2, the tuples \((P_{1}, P_{2}, D, 1\,\%, 2\,\%, 1)\), \((P_{2}, P_{3}, D, 2\,\%, 1.5\,\%, 1)\) and \((P_{1}, P_{3}, D, 2\,\%, 1\,\%, 2)\) satisfy all the conditions of a transitive relation and generate the transitive rule
$$(P_{1} , D, 1\% ,1) \Rightarrow (P_{2} ,D, 2\% ,1) \Rightarrow \left( {P_{3} , 1\% } \right)$$
If the same parameter has a different rate of change in different rules, the minimum of them is considered.
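The combination of the three tuples into one transitive rule can be sketched as follows. This is an illustrative sketch with hypothetical names, not the authors' implementation; it checks only the lag condition \(l_{3} \ge l_{1} + l_{2}\) and the minimum-rate convention, and leaves out the intersection condition of Definition 6.

```python
from itertools import permutations

# (cause, effect, type, cause rate %, effect rate %, lag), taken from the tuples above.
SRS = [
    ("P1", "P2", "D", 1.0, 2.0, 1),
    ("P1", "P3", "D", 2.0, 1.0, 2),
    ("P2", "P3", "D", 2.0, 1.5, 1),
]


def transitive_rules(rules):
    """Find chains i -> j -> k that are backed by a direct rule i -> k with l3 >= l1 + l2."""
    by_pair = {(c, e): (kind, dc, de, lag) for c, e, kind, dc, de, lag in rules}
    names = sorted({x for c, e, *_ in rules for x in (c, e)})
    found = []
    for i, j, k in permutations(names, 3):
        if (i, j) in by_pair and (j, k) in by_pair and (i, k) in by_pair:
            kind1, d1_a, d2_a, l1 = by_pair[(i, j)]
            kind2, d2_b, d3_a, l2 = by_pair[(j, k)]
            _, d1_b, d3_b, l3 = by_pair[(i, k)]
            if l3 >= l1 + l2:
                # When a parameter has different rates in different rules, keep the minimum.
                found.append((
                    (i, kind1, min(d1_a, d1_b), l1),
                    (j, kind2, min(d2_a, d2_b), l2),
                    (k, min(d3_a, d3_b)),
                ))
    return found


print(transitive_rules(SRS))
```

For the three tuples this yields a single chain from P1 through P2 to P3 with rates 1 %, 2 % and 1 %, in line with the rule stated above.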

Explanation

To understand this, we consider the time series of three parameters P i , P j , and P k as follows.

Let TOR > 3 and let \(\delta_{1} , \delta_{2} , \delta_{3}\) be the rates of change of parameters \(P_{i} , P_{j} , P_{k} .\) The support values calculated from Table 2 are:
  • \({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{i} \left( U \right)\,{\text{and}}\,P_{j} \left( D \right),\,\alpha_{ij} \left( {P_{i} , P_{j} ,1} \right) = 77.7.\)

  • \({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{j} \left( D \right)\,{\text{and }}\,P_{k} \left( D \right),\alpha_{jk} \left( {P_{j} , P_{k} ,1} \right) = 88.8,\)

  • \({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{i} \left( D \right)\,{\text{and }}\,P_{k} \left( D \right),\alpha_{ik} \left( {P_{i} , P_{k} ,2} \right) = 75,\)

Since \(\alpha_{ij} > \alpha_{1} , \alpha_{jk} > \alpha_{1} , \alpha_{ik} > \alpha_{1} ,\) generated binary causal rules are
$$(P_{i} , I, \delta_{1} ,1) \Rightarrow (P_{j} ,\delta_{2} ), (P_{j} , D, \delta_{2} ,1) \Rightarrow (P_{k} ,\delta_{3} ), (P_{i} , I, \delta_{1} ,2) \Rightarrow (P_{k} ,\delta_{3} ).$$
The condition \(l_{3} \ge l_{1} + l_{2}\) (here 2 ≥ 1 + 1) is also satisfied, and the generated transitive rule is \((P_{i} , I, \delta_{1} ,1) \Rightarrow (P_{j} ,D,\delta_{2} ,1) \Rightarrow (P_{k} , \delta_{3} )\).
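Table 2 Parameter time series

| Time | Pi | Pj | Pk |
|---|---|---|---|
| 1991 | U | D | U |
| 1992 | U | D | D |
| 1993 | U | D | D |
| 1994 | U | D | D |
| 1995 | U | D | D |
| 1996 | D | D | D |
| 1997 | D | D | D |
| 1998 | U | D | U |
| 1999 | D | D | D |
| 2000 | U | D | D |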

Step 4: Many to one (combined causal) rule generation

Based on the SRS results in step 2, the tuples \((P_{1}, P_{3}, D, 2\,\%, 1\,\%, 2)\) and \((P_{4}, P_{3}, I, 1.5\,\%, 2\,\%, 1)\) yield, using step 4, the combined causal rule \(((P_{1} , D, 2\% ,2),(P_{4} ,I,1.5\% ,1)) \Rightarrow (P_{3} ,1\% ).\)

Explanation

Suppose we have the following values for parameters \(P_{i} , P_{j} , {\text{and }}P_{k}\).

Let TOR > 3 and let \(\delta_{1} , \delta_{2} , \delta_{3}\) be the rates of change of parameters \(P_{i} , P_{j} , P_{k}\). The support values calculated from Table 3 are: \({\text{Support}}\,{\text{value}}\,{\text{of}}\,\alpha_{ik} \left( {P_{i} , P_{k} ,1} \right) = 77.7\,\% ,\, {\text{Support}}\,{\text{value }}\,{\text{of}}\, \alpha_{jk} \left( {P_{j} , P_{k} ,1} \right) = 88.8\%\).
Table 3

Parameter time series

| Time | Pi | Pj | Pk |
| --- | --- | --- | --- |
| 1991 | U | U | D |
| 1992 | U | U | D |
| 1993 | U | D | D |
| 1994 | D | U | D |
| 1995 | U | U | D |
| 1996 | U | U | D |
| 1997 | U | U | D |
| 1998 | U | U | D |
| 1999 | U | U | D |
| 2000 | U | U | D |

Italic letters indicate the temporal association between parameters for a given time. For example, Pi and Pj are associated at lag 0 in 1991, and (Pi, Pj) are associated with Pk at lag 1. So the Pi and Pj values are italic at 1991 and the Pk value at 1992.

The calculated support values \(\alpha_{ik} , \alpha_{jk}\) and \(\alpha_{ijk}\) all exceed \(\alpha_{1}\), which satisfies Definitions 4 and 7. In Table 3, the highlighted rows indicate the \(((P_{i} , P_{j} ), P_{k} )\) relationship. Since all the conditions are satisfied, the generated combined rule is \(((P_{i} ,I,\delta_{1} ,1),(P_{j} ,I,\delta_{2} ,1)) \Rightarrow (P_{k} , \delta_{3} )\).
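The combined check on Table 3 can be sketched as follows. The sketch assumes that the joint support \(\alpha_{ijk}\) is the percentage of time points t at which every cause shows its required state while the effect shows its state at t + lag, with one common lag for all causes. Both the definition and the helper name `joint_support` are assumptions made for illustration, not the paper's formal Definition 7.

```python
# Columns of Table 3 for 1991..2000 (U = up, D = down).
PI = list("UUUDUUUUUU")
PJ = list("UUDUUUUUUU")
PK = list("DDDDDDDDDD")


def joint_support(causes, effect, effect_state, lag):
    """Percentage of time points t where every (series, state) pair in
    `causes` holds at t and effect[t + lag] == effect_state.

    Assumed definition with a single shared lag, for illustration only.
    """
    usable = len(effect) - lag
    hits = 0
    for t in range(usable):
        all_causes_hold = all(series[t] == state for series, state in causes)
        if all_causes_hold and effect[t + lag] == effect_state:
            hits += 1
    return 100.0 * hits / usable


alpha_ijk = joint_support([(PI, "U"), (PJ, "U")], PK, "D", lag=1)
print(round(alpha_ijk, 1))  # 7 of the 9 aligned time points
```

Requiring all causes to hold at the same time point is what separates a combined rule from two independent binary rules that happen to share an effect.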

Step 5: Cyclic rule generation

Based on the SRS results of step 2, the tuples \((P_{2} , P_{5} , D, 2\% , 3\% , 1)\) and \((P_{5} , P_{2} , I, 3\% , 2\% , 1)\) lead this step to generate the cyclic rule \((P_{2} , D, 2\% ,1) \Leftrightarrow \left( {P_{5} , D,3\% , 1} \right)\).

Explanation

To understand this rule, we consider two parameters, say \(P_{i}\) and \(P_{j}\), for a time period 1998–2015. Let \(\delta_{1}\) and \(\delta_{2}\) be the rates of change of parameters \(P_{i} , P_{j}\), which have the following values.

We can see that the relationships \((P_{i} , I, \delta_{1} ,1) \Rightarrow (P_{j} ,\delta_{2} )\) and \((P_{j} , I, \delta_{2} ,1) \Rightarrow (P_{i} ,\delta_{1} )\) are satisfied in Table 4 according to Definition 4. In Table 4, the time periods that satisfy the cyclic relation between the parameters are {(1988–1991), (1990–1992), (1992–1994), (1993–1995), (1995–1997), (1996–1998)}. For example, (1988–1991) indicates that if \(P_{i}\) increases in 1988, \(P_{j}\) goes down in 1989, which in turn increases \(P_{i}\) in 1990.
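Table 4

Parameter time series

| Time | Pi | Pj |
| --- | --- | --- |
| 1988 | U | U |
| 1990 | U | D |
| 1991 | U | D |
| 1992 | U | D |
| 1993 | D | D |
| 1994 | U | U |
| 1995 | D | D |
| 1996 | D | U |
| 1997 | D | U |
| 1998 | D | U |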

The calculated support value \(\alpha_{ij}\) for parameters \(P_{i}\) and \(P_{j}\) is 75 %. Since \(\alpha_{ij} > \alpha_{1}\), the cyclic relation is satisfied and the generated cyclic causal rule is \((P_{i} , I, \delta_{1} ,1) \Leftrightarrow (P_{j} , I, \delta_{2} ,1)\).
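Structurally, a cyclic rule pairs two binary rules that point in opposite directions between the same two parameters. The sketch below shows only that pairing step; it assumes the binary rules have already passed the support and odds-ratio tests and leaves out the additional support check on the cyclic relation described above. `BinaryRule` and `find_cyclic_pairs` are hypothetical names, and the three example rules are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class BinaryRule(NamedTuple):
    cause: str
    effect: str
    direction: str  # "D" = direct, "I" = inverse
    lag: int


def find_cyclic_pairs(rules: List[BinaryRule]) -> List[Tuple[BinaryRule, BinaryRule]]:
    """Return (forward, backward) rule pairs where a => b and b => a both exist."""
    by_endpoints = {(r.cause, r.effect): r for r in rules}
    pairs = []
    for (a, b), forward in by_endpoints.items():
        backward = by_endpoints.get((b, a))
        # The a < b test reports each cycle once instead of twice.
        if backward is not None and a < b:
            pairs.append((forward, backward))
    return pairs


rules = [
    BinaryRule("P2", "P5", "D", 1),
    BinaryRule("P5", "P2", "I", 1),
    BinaryRule("P1", "P3", "D", 2),
]
cycles = find_cyclic_pairs(rules)
print(cycles)
```

Because the lookup uses a dictionary keyed by (cause, effect), the pairing step is linear in the number of binary rules.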

Experiments

We implemented our method in the Java programming language with NetBeans IDE 7.3. Checking the causal relationships between parameters takes a long time with a serial program, so we parallelized the program using Java threads on a machine configured with a Dual-Core CPU containing 12 cores, 8 GB RAM, and the 64-bit Windows 7 operating system. Our goal is to discover various causal relationships between the different economic parameters. First, we find all the binary causal rules (i.e. one cause and one effect parameter); the other causality rules are then discovered using the proposed method. For the experiments, the minimum support threshold \(\alpha_{1}\) is set to 70 % and \(\beta\) is set to 3.

Dataset

The approach is evaluated using 2 synthetic and 3 real-world datasets. Table 5 summarizes the datasets. The synthetic datasets are generated with R software based on a Bayesian network (BN): first we create random numbers, next we build a BN on them, and then we generate the data from the BN. The real-world economic datasets are obtained from the World Trade Organization (1995), the International Monetary Fund (1945) and World Bank data (1944). The WTO provides data on international trade in merchandise and commercial services. The IMF dataset contains time series data of 189 countries on economic parameters. The World Bank dataset contains time series data from 250 countries on a variety of topics such as agriculture, education, health and environment. For both the World Bank and IMF data, we tested our algorithm on the south-Asian countries (India, Pakistan, Sri Lanka, Bangladesh, Nepal, Bhutan, the Maldives and Afghanistan). For the WTO, we used the merchandise trade data: Network of world merchandise trade in Asia.
Table 5

Datasets

| Name | Length of time series (years) | No of indicators (parameters) |
| --- | --- | --- |
| Synthetic-1 | 40 | 6 |
| Synthetic-2 | 40 | 10 |
| WTO | 31 | 30 |
| IMF | 34 | 40 |
| World Bank | 52 | 1346 |

All the datasets are selected to test the effectiveness of the proposed method. In our experiments, we first preprocess the continuous data sets [Eq. (1)] and then represent the positive, negative and neutral (no) rates of change of the primitive data sets as the values U, D, and Q [Eq. (2)].
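The preprocessing step can be sketched as below. Eqs. (1) and (2) are defined earlier in the paper and are not reproduced in this section, so the sketch assumes the rate of change is the year-on-year percentage change and that its sign alone selects the symbol. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence


def growth_rates(values: Sequence[float]) -> List[float]:
    """Year-on-year percentage change (assumed form of Eq. (1)).

    Assumes every previous value is positive, so the sign of the
    rate follows the sign of the difference.
    """
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


def encode(rates: Sequence[float]) -> List[str]:
    """Map each rate to U (positive), D (negative) or Q (no change),
    an assumed form of Eq. (2)."""
    symbols = []
    for rate in rates:
        if rate > 0:
            symbols.append("U")
        elif rate < 0:
            symbols.append("D")
        else:
            symbols.append("Q")
    return symbols


indicator = [100.0, 103.0, 103.0, 99.0]  # made-up yearly values
rates = growth_rates(indicator)
print(encode(rates))
```

A series of n yearly values yields n − 1 symbols, because the first year has no previous value to compare against.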

Results

This section presents the various causal relationships extracted from the World Bank data sets. Results on the other datasets are shown in the “Comparison” section. To save space, we do not list all relationships; we consider only those that are present in multiple countries and display some of them. The causal rules discovered with our approach for the south-Asian countries are shown in Table 6, where each causal relationship between parameters is described with its support, strength and the rate of change of the indicators. For example, the rule (Cereal production, D, 3 %, 1) \(\Rightarrow\) (Crop production index, 1 %) indicates a direct relationship, i.e. an increase in cereal production by 3 % will increase the crop production index by 1 % after 1 year. This rule is discovered in four countries, Sri Lanka, Nepal, Pakistan and India, with different strength and support values. On the basis of the support and strength values, we can say that this rule is more valid for Nepal than for the other three countries. We can also identify the rule that is most valid for a given country. Among the binary causal rules in Table 6, three rules are present in India, and the rule discussed above is more valid than the other rules in India. The transitive causal rule (Rural population, D, 1 %, 1) \(\Rightarrow\) (Population density, D, 0.33 %, 1) \(\Rightarrow\) (Population, total, 0.68 %) can be described as follows: a 1 % increase in rural population increases population density by 0.33 % after 1 year, which tends to increase the total population by 0.68 % after a year. This rule is present in four countries, Afghanistan, India, Maldives, and Nepal, and has the most impact in India. Compared with binary and transitive causal rules, the algorithm extracts fewer many to one (combined causal) and cyclic rules.
The many to one causal rule {(Forest rents, I, 5 %, 2), (Foreign direct investment, D, 3 %, 1)} \(\Rightarrow\) (Crop production index, 7 %) indicates that a decrease in forest rents by 5 % and an increase in foreign direct investment by 3 % would tend to increase the crop production index by 7 %. The cyclic causal rule (Gross domestic savings, D, 1 %, 1) \(\Leftrightarrow\) (Cereal yield, D, 0.5 %, 2) can be described as follows: a 1 % increase in gross domestic savings increases cereal yield by 0.5 % after a year, and the increase in cereal yield would again increase gross domestic savings after 2 years. The other rules in all causal relationships can be analyzed similarly.
Table 6

Causality rules

| Rules | Countries | Support | Strength |
| --- | --- | --- | --- |
| Binary causal rules | | | |
| (Cereal production, D, 2 %, 2) \(\Rightarrow\) (agricultural raw materials exports, 3 %) | India | 74 | 120.8767 |
| | Pakistan | 76 | 124.1436 |
| (Air transport, D, 1 %, 2) \(\Rightarrow\) (GDP growth, 0.22 %) | India | 74 | 120.8767 |
| | Nepal | 79 | 129.0440 |
| (Cereal production, D, 3 %, 1) \(\Rightarrow\) (crop production index, 1 %) | Sri Lanka | 76 | 124.1436 |
| | Nepal | 81 | 132.3109 |
| | Afghanistan | 76 | 124.1436 |
| | India | 76 | 124.1436 |
| Transitive causal rules | | | |
| (Rural population, D, 1 %, 1) \(\Rightarrow\) (population density, D, 0.33 %, 1) \(\Rightarrow\) (population total, 0.68 %) | Afghanistan | 74 | 120.8767 |
| | India | 83 | 135.5779 |
| | Maldives | 77 | 125.7771 |
| | Nepal | 71 | 115.9763 |
| (Land under cereal production, D, 3 %, 1) \(\Rightarrow\) (food exports, D, 1 %, 2) \(\Rightarrow\) (GDP growth, 1.5 %) | India | 71 | 115.9763 |
| | Pakistan | 72 | 117.6097 |
| | Bangladesh | 71 | 115.9763 |
| (Arable land, D, 1 %, 1) \(\Rightarrow\) (agricultural land, D, 1 %, 3) \(\Rightarrow\) (CO2 emissions, 1.5 %) | India | 71 | 115.9763 |
| | Sri Lanka | 71 | 115.9763 |
| | India | 70 | 114.3428 |
| Many to one (combined causal) rules | | | |
| {(Rural population, D, 2.3 %, 1), (urban population, D, 0.5 %, 1)} \(\Rightarrow\) (population density, 1 %) | India | 79 | 129.0440 |
| | Afghanistan | 72 | 117.6097 |
| | Pakistan | 72 | 117.6097 |
| {(Forest rents, I, 5 %, 2), (Foreign direct investment, D, 3 %, 1)} \(\Rightarrow\) (crop production index, 7 %) | Sri Lanka | 72 | 117.6097 |
| {(Land under cereal production, D, 0.8 %, 1), (rural population, I, 1 %, 2)} \(\Rightarrow\) (cereal production, 2 %) | Afghanistan | 73 | 119.2432 |
| | India | 72 | 117.6097 |
| | Pakistan | 70 | 114.3428 |
| Cyclic causal rules | | | |
| (Land under cereal production, D, 2.5 %, 2) \(\Leftrightarrow\) (agricultural land, D, 4.5 %, 1) | India | 72 | 117.6097 |
| (Gross domestic savings, D, 1 %, 1) \(\Leftrightarrow\) (cereal yield, D, 0.5 %, 2) | Sri Lanka | 70 | 114.3428 |
| | India | 70 | 114.3428 |

Prediction effectiveness

The rules can be validated by calculating the mutual information (Meyer 2014) between indicators and the conditional entropy (Marsh 2013; Meyer 2014) change of the indicator before and after applying the rule. It is shown in Table 7 that the indicators are mutually related and the entropy of the indicator is decreased after applying the rule.
Table 7

Entropy of indicators

| Indicators | Target indicator entropy | Proposed method conditional entropy after applying rule | Mutual information between indicators |
| --- | --- | --- | --- |
| CP → ARME | 1.0973 | 0.51 | 0.837 |
| AG → AR → CO2 | 1.0986 | 0.58 | 0.585 |
| (FDI, FR) → CPI | 1.0972 | 0.035 | 0.595 |
| GDP ←→ CY | 1.0961 | 0.37 | 0.583 |

The results in Table 7 show that the target indicator entropy decreases after the rule is applied, which means that the indicator value is more uncertain when it is considered alone. For example, the large value of mutual information between CP and ARME indicates that the two indicators are related, and the entropy of ARME decreases after the rule CP → ARME is applied. So it can be concluded that the proposed method achieves high prediction effectiveness. We validated all the generated causal rules using the decrease in entropy and the mutual information to check their prediction effectiveness. The generated causal rules can also be validated using the time series graphs shown in the “Appendix”.
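The entropy-based check can be sketched for symbol sequences as follows. This is a minimal illustration with textbook definitions, not the authors' implementation: it uses natural logarithms, an assumption suggested by target entropies close to ln 3 ≈ 1.0986 for three symbols, and it takes mutual information as H(Y) − H(Y | X). The two sequences are made up and assumed to be already aligned by the rule's lag.

```python
from collections import Counter
from math import log
from typing import Sequence


def entropy(symbols: Sequence[str]) -> float:
    """Shannon entropy H(Y) in nats."""
    n = len(symbols)
    return -sum((c / n) * log(c / n) for c in Counter(symbols).values())


def conditional_entropy(effect: Sequence[str], cause: Sequence[str]) -> float:
    """H(effect | cause): entropy of the effect within each cause state,
    weighted by how often that cause state occurs."""
    n = len(cause)
    total = 0.0
    for state, count in Counter(cause).items():
        subset = [e for c, e in zip(cause, effect) if c == state]
        total += (count / n) * entropy(subset)
    return total


def mutual_information(cause: Sequence[str], effect: Sequence[str]) -> float:
    """I(cause; effect) = H(effect) - H(effect | cause)."""
    return entropy(effect) - conditional_entropy(effect, cause)


# Made-up, perfectly inverse pair of sequences.
cause = list("UUDDQQ")
effect = list("DDUUQQ")
h_target = entropy(effect)
h_after = conditional_entropy(effect, cause)
mi = mutual_information(cause, effect)
print(round(h_target, 4), round(h_after, 4), round(mi, 4))
```

In this toy case the cause determines the effect completely, so the conditional entropy drops to zero and the mutual information equals the full target entropy; real indicators would fall somewhere in between.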

Scalability

Further, we ran experiments to evaluate the scalability of the algorithm with respect to the number of years involved and the number of indicators. Figs. 2 and 3 show that the proposed cause–effect discovery method scales up with the number of indicators. We examine the performance degradation of the algorithm for the various kinds of causal rule discovery at nine different scales (number of indicators): 50, 75, 100, 125, 150, 175, 200, 225 and 250. The minimum support threshold is set to 70 and remains the same in all the experiments.
Fig. 2

Scale up of indicators for binary causal rules

Fig. 3

Scale up of indicators for other causal rules

As shown in Fig. 2, the extraction time increases quadratically with the number of indicators. More importantly, the curve is parabolic, which means that the running time of our algorithm is non-linearly related to the number of indicators for binary causal rules. Although the time to generate binary causal rules increases quadratically with the number of indicators, the time to generate the other rules does not grow non-linearly, because their generation reuses the result of binary rule generation (Fig. 3).

The proposed method is able to extract nonlinear relationships, because the extracted causal rules are expressed in terms of the rate of change of values, and this change can be linear or nonlinear.

Discussion

Comparison

To assess the efficiency of the proposed method, we compared it with both statistical and non-statistical methods. The comparison with statistical methods (Granger causality, Bayesian network) is performed using the R packages lmtest (Hothorn et al. 2015) for GC and bnlearn (Scutari 2016) for BN. For BN, we calculate the results using the constraint-based local discovery algorithm hiton.pc (Aliferis et al. 2003). For the non-statistical approaches, we implemented the methods (Silverstein et al. 2000; Jin et al. 2012; Li et al. 2013) in Java for causal rule discovery.

First, we compared the proposed method with GC and BN. GC is the base method for detecting lag relationships in stationary time series data sets. We run GC for different lag values with significance level α = 0.05. HITON-PC is an effective BN algorithm for extracting parent–child relationships. So we considered both statistical methods as benchmarks for the accuracy comparison. Tables 8 and 9 show that all the binary rules generated by the other methods in all the datasets are also generated by the proposed method. For example, in the synthetic-2 dataset, we described the rule relating indicators I7 and I8. For the statistical approaches, Table 8 shows that GC can discover only binary causal rules, while BN can discover transitive as well as binary rules between indicators. For example, in a BN a graph like I1 → I3 → I6 can be generated, but I1 and I6 are treated as independent, i.e. I1 and I6 may or may not be dependent. In the proposed method, I1 and I6 are conditionally dependent, or I1 is an indirect cause of I6.
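Table 8

Comparison of proposed method with statistical method

| Dataset | Indicators relationships | Extracted rules | Granger causality | Bayesian network | Proposed method |
| --- | --- | --- | --- | --- | --- |
| Synthetic-1 (I1–I6) | Binary | I1 → I3 | Yes | Yes | Yes |
| | Many to one | (I2, I4) → I5 | No | No | Yes |
| | Transitive | I1 → I3 → I6 | No | Yes | Yes |
| | Cyclic | I1 ←→ I3 | No | No | Yes |
| Synthetic-2 (I1–I10) | Binary | I1 → I7, I2 → I7, I7 → I2, I1 → I3, I7 → I8 | Yes | Yes | Yes |
| | Many to one | (I6, I9) → I7 | No | No | Yes |
| | Transitive | I1 → I7 → I8 | No | Yes | Yes |
| | Cyclic | I2 ←→ I7 | No | No | Yes |
| WTO | Binary | Chemicals → Textiles, Chemicals → OTE | Yes | Yes | Yes |
| | Many to one | (OTE, Textiles) → EDOE | No | No | Yes |
| | Transitive | IS → OM → ICEC | No | Yes | Yes |
| | Cyclic | OM ←→ IS | No | No | Yes |
| IMF | Binary | GGR → VEG | Yes | Yes | Yes |
| | Many to one | (GGR, GNS) → TI | No | No | Yes |
| | Transitive | GDP → VIG → TI | No | Yes | Yes |
| | Cyclic | CAB ←→ VEGS | No | No | Yes |
| World Bank data | Binary | CP → ARME | Yes | Yes | Yes |
| | Many to one | (FDI, FR) → CPI | No | No | Yes |
| | Transitive | AR → AG → CO2 | No | Yes | Yes |
| | Cyclic | GDP ←→ CY | No | No | Yes |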
Table 9

Comparison of proposed method with non statistical method

Dataset

Indicators relationships

Extracted rules

Non-statistical methods

Proposed method

Silverstein et al. (2000)

Jin et al. (2012)

Li et al. (2013)

Synthetic-1 (I1–I6)

Binary

I1 → I3

Many to one

(I2, I4) → I5

 

Transitive

I1 → I3 → I6

  

Cyclic

I1 ←→ I3

   

Synthetic-2 (I1–I10)

Binary

I1 → I7, I2 → I7, I7 → I2, I1 → I3, I7 → I8

Many to one

(I6, I9) → I7

 

Transitive

I1 → I7 → I8

  

Cyclic

I2 ←→ I7

   

WTO

Binary

Chemicals → Textiles

Chemicals → OTE

Many to one

(OTE, Textiles) → EDOE

 

Transitive

IS → OM → ICEC

  

Cyclic

OM ←→ IS

   

IMF

Binary

GGR → VEG

Many to one

(GGR, GNS) → TI

 

Transitive

GDP → VIG → TI

  

Cyclic

CAB ←→ VEGS

   

World Bank data

Binary

CP → ARME

Many to one

(FDI, FR) → CPI

  

Transitive

AR → AG → CO2

  

Cyclic

GDP ←→ CY

   

Second, we compared our method with non-statistical methods. Table 9 shows that binary and combined (many to one) causal relationships can be discovered by Jin et al. (2012) and Li et al. (2013) in all datasets. Silverstein et al. (2000) can also detect many to one rules, but only as independent causes. For example, the rule (I2, I4) → I5 in the synthetic-1 dataset would be reported as I2 → I5 ← I4, i.e. I2 and I4 affect I5 independently, so we have not counted the many to one rules generated by the method of Silverstein et al. (2000). A transitive relationship is extracted by Silverstein et al. (2000) and by the proposed method. The relationships extracted by the various methods are shown in Tables 8 and 9.

Based on the experimental results, it is reasonable to conclude that the proposed method is capable of extracting various causal relationships, and that causal rules such as cyclic and transitive rules cannot be extracted by the other methods. Although non-statistical methods can generate combined causal rules, they do not generate specific rules or the relationship strength. One more advantage of our method is that it generates more specific rules together with their strength between indicators. For example, when we run our algorithm on the synthetic-1 dataset, rules are extracted with properties such as the lag value (the time period after which one indicator affects another), the strength, and the rate of change of the indicators, i.e. the positive or negative percent change. In fact, the rule I1 → I3 is extracted as \((I_{1} , I, 2\% , 1) \Rightarrow \left( {I_{3} , 1\% } \right)\), 113.6, which indicates that a 2 % change in I1 inversely affects I3 by 1 % after 1 year, with a relationship strength of 113.6. The results of the proposed method are also demonstrated on real-world data sets, as described in the following.

To investigate various causal rules in real-world cases, we run the proposed algorithm on the three real-world data sets shown in Table 5 for performance evaluation. The proposed algorithm generates various binary, many to one, transitive and cyclic rules, and some of the causal rules, shown in Table 8, are reasonable as judged by common sense. For example, from the IMF data set, it is found that an increase in general government revenue would also increase the volume of exports of goods, that growth in general government revenue and gross national saving tends to increase total investment, and that a decrease in government revenue can lead to decreased exports of goods too. Some interesting causal relationships are also extracted from the WTO and World Bank datasets. For example, if the crop production of a country increases, this tends to increase the export of agricultural raw materials, which helps to improve the economic growth of the country.

Performance evaluation

This section presents measures for assessing how accurately our proposed method can generate causal rules. The accuracy measures used (Han et al. 2011) are Precision, Recall, Specificity, F-score, Accuracy (recognition rate) and Misclassification rate. We evaluated all measures for the proposed, statistical and non-statistical methods compared previously. Binary rules are used to assess accuracy because they can be generated by all of the compared methods. Initially we classify the results into two classes, causal rule (CR) and non-causal rule (NCR). Then, based on the CR and NCR results, a confusion matrix (TP, TN, FP, FN) is created to evaluate the measures, as shown in the “Appendix”. Finally, the accuracy measures are calculated from the TP, TN, FP and FN values. The performance of the various methods is evaluated on the real-world World Bank dataset at five different scales (numbers of indicators): 10, 20, 30, 40 and 50. The number of target indicators is set to 5 and remains the same for all scales. In Table 10, WBD-10 means that 10 indicators are considered for causal rule extraction; the other labels can be interpreted similarly. Some of the causal rules extracted by most of the compared methods are shown in the “Appendix”. To indicate the significance of the extracted causal rules, appropriate references from the previous literature and documents are given. Table 10 shows that the proposed method achieves higher accuracy and a lower error rate than all the other statistical and non-statistical methods at the different scales of the World Bank dataset.
Table 10

Prediction accuracy of proposed, statistical and non-statistical methods on different scales

| Accuracy parameters | Proposed method | Li et al. (2013) | Jin et al. (2012) | Silverstein et al. (2000) | Granger causality | Bayesian network |
| --- | --- | --- | --- | --- | --- | --- |
| WBD-10, Rules: 50, CR: 16, NCR: 34 | | | | | | |
| Sensitivity | 0.94 | 0.81 | 0.75 | 0.69 | 0.69 | 0.75 |
| Specificity | 0.91 | 0.82 | 0.74 | 0.65 | 0.68 | 0.79 |
| Precision | 0.83 | 0.68 | 0.57 | 0.48 | 0.50 | 0.63 |
| F-Score | 0.88 | 0.74 | 0.65 | 0.56 | 0.58 | 0.69 |
| Accuracy | 0.92 | 0.82 | 0.74 | 0.66 | 0.68 | 0.78 |
| Misclassification rate | 0.08 | 0.18 | 0.26 | 0.34 | 0.32 | 0.22 |
| WBD-20, Rules: 100, CR: 38, NCR: 62 | | | | | | |
| Sensitivity | 0.92 | 0.84 | 0.74 | 0.68 | 0.66 | 0.76 |
| Specificity | 0.90 | 0.82 | 0.74 | 0.66 | 0.68 | 0.77 |
| Precision | 0.85 | 0.74 | 0.64 | 0.55 | 0.56 | 0.67 |
| F-Score | 0.89 | 0.79 | 0.68 | 0.61 | 0.60 | 0.72 |
| Accuracy | 0.91 | 0.83 | 0.74 | 0.67 | 0.67 | 0.77 |
| Misclassification rate | 0.09 | 0.17 | 0.26 | 0.33 | 0.33 | 0.23 |
| WBD-30, Rules: 150, CR: 65, NCR: 85 | | | | | | |
| Sensitivity | 0.91 | 0.80 | 0.72 | 0.63 | 0.65 | 0.77 |
| Specificity | 0.88 | 0.81 | 0.73 | 0.65 | 0.66 | 0.78 |
| Precision | 0.86 | 0.76 | 0.67 | 0.58 | 0.59 | 0.72 |
| F-Score | 0.88 | 0.78 | 0.70 | 0.60 | 0.62 | 0.75 |
| Accuracy | 0.89 | 0.81 | 0.73 | 0.64 | 0.65 | 0.77 |
| Misclassification rate | 0.11 | 0.19 | 0.27 | 0.36 | 0.35 | 0.23 |
| WBD-40, Rules: 200, CR: 88, NCR: 112 | | | | | | |
| Sensitivity | 0.91 | 0.80 | 0.68 | 0.60 | 0.59 | 0.70 |
| Specificity | 0.89 | 0.81 | 0.71 | 0.63 | 0.61 | 0.72 |
| Precision | 0.87 | 0.77 | 0.65 | 0.56 | 0.55 | 0.65 |
| F-Score | 0.89 | 0.78 | 0.67 | 0.58 | 0.57 | 0.68 |
| Accuracy | 0.90 | 0.81 | 0.70 | 0.62 | 0.60 | 0.72 |
| Misclassification rate | 0.10 | 0.20 | 0.30 | 0.39 | 0.40 | 0.32 |
| WBD-50, Rules: 250, CR: 112, NCR: 138 | | | | | | |
| Sensitivity | 0.90 | 0.79 | 0.65 | 0.60 | 0.57 | 0.67 |
| Specificity | 0.88 | 0.79 | 0.67 | 0.61 | 0.59 | 0.68 |
| Precision | 0.90 | 0.75 | 0.62 | 0.55 | 0.53 | 0.63 |
| F-Score | 0.90 | 0.77 | 0.63 | 0.58 | 0.55 | 0.65 |
| Accuracy | 0.89 | 0.79 | 0.66 | 0.60 | 0.58 | 0.68 |
| Misclassification rate | 0.09 | 0.21 | 0.34 | 0.40 | 0.42 | 0.32 |
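The accuracy measures of Table 10 follow from a confusion matrix through the standard formulas, which can be sketched as below. The counts used here (TP = 15, FN = 1, TN = 31, FP = 3) are hypothetical: they are chosen only because they are consistent with the WBD-10 setting (CR: 16, NCR: 34) and reproduce the proposed-method column after rounding to two decimals. The actual confusion matrices are given in the paper's Appendix and are not reproduced here.

```python
from typing import Dict


def accuracy_measures(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard measures derived from a binary confusion matrix."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)  # recall: share of causal rules recovered
    specificity = tn / (tn + fp)  # share of non-causal rules rejected
    precision = tp / (tp + fp)    # share of reported rules that are causal
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
    }


# Hypothetical counts: 16 causal rules (15 found, 1 missed) and
# 34 non-causal rules (31 rejected, 3 wrongly reported).
measures = accuracy_measures(tp=15, tn=31, fp=3, fn=1)
rounded = {name: round(value, 2) for name, value in measures.items()}
print(rounded)
```

Accuracy and misclassification rate always sum to 1 under these formulas, which gives a quick consistency check on any column of the table.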

The accuracy curves for the proposed method and the compared methods are shown in Fig. 4. The proposed method extracts causal rules more accurately and performs best at all scales. We can also see that, as the dataset size increases, the performance of the statistical methods degrades more than that of the non-statistical methods. We consider that our proposed method has stable and good accuracy in comparison with the other methods.
Fig. 4

Accuracy curve of various methods on different scales

In summary, the comparison results show that the proposed method performs well on all accuracy measures compared with the other methods.

Complexity

The steps of the algorithm are designed to make a minimum number of passes over the data. In the first pass, we calculate the growth rate of each parameter and assign its positive, negative or neutral growth rate change value, U, D or Q, to the parameter for use in the next steps. In the second pass, we calculate the support value and the odds ratio of each individual parameter together with the other parameters for different lag values. Only the non-zero lag value associations identified by the tests are considered. Associations with insufficient support or odds ratio are eliminated directly. The cause–effect rules in the current pairs can be determined from the temporal associations and the temporal odds ratio for non-zero lag values. At the end, the causal pairs found previously are combined in the next steps to generate transitive, many to one and cyclic rules from the basic binary causal rules. To achieve efficiency, not all combinations are considered as conditions during the generation of the other causality rules. Instead, we only investigate the combinations appearing in the data that are related to a non-zero lag value. Since such combinations are very few compared with the total number of combinations, the cost of computation is reduced.

To analyze the performance of the algorithm with respect to time and space complexity and the number of passes over the data set, we denote the set of parameters by S, the number of parameters by n, the length of the time series by t, the number of extracted pairs by m and the lag value by l. The complexity of the method is discussed based on the extraction of binary causal rules of the form P 1 → P 2 for lag value l.

The single parameters are paired and the support is calculated with O(n) passes over the data set. Each pair combination needs to be tested for l lag values to determine the association and causality, which requires O(n * l) passes. In the process of extracting binary causal relationships, a causal association is examined on all combinations.

The total number of possible pair combinations P is:
$$P = \sum\limits_{n = 1}^{\left| s \right|} \sum\limits_{m = 1}^{\left| s \right|} \left( {}^{s}C_{m} - {}^{s - m}C_{n} \right)$$
(19)

So the data set needs to be scanned as many as O(Pnl) times. We can therefore conclude that the number of passes over the data set is O(Pnl) and that the time taken is O(Pnlt). The complexity is substantially reduced by first applying the pruning of step 1 (binary rule generation) before extracting the other relationships.
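The brute-force search behind the binary rule step can be sketched as below, to make the source of the cost concrete. The sketch enumerates every ordered pair of parameters and every lag from 1 to a maximum, and keeps state pairs whose support reaches the threshold. It assumes the support definition used for illustration (percentage of aligned time points) and omits the temporal odds ratio filter, whose definition is not reproduced in this section. The names `lagged_support` and `mine_binary_candidates` are hypothetical, and the data are the three series of Table 2.

```python
from itertools import permutations


def lagged_support(cause, cause_state, effect, effect_state, lag):
    """Percentage of aligned pairs (t, t + lag) matching both states
    (assumed definition, for illustration)."""
    pairs = list(zip(cause[:len(cause) - lag], effect[lag:]))
    hits = sum(1 for c, e in pairs if c == cause_state and e == effect_state)
    return 100.0 * hits / len(pairs)


def mine_binary_candidates(series, max_lag, min_support=70.0):
    """Test every ordered pair and lag; return surviving candidates and
    the number of (pair, lag) combinations examined."""
    candidates = set()
    tests = 0
    for cause, effect in permutations(series, 2):  # n * (n - 1) ordered pairs
        for lag in range(1, max_lag + 1):          # non-zero lags only
            tests += 1
            for cause_state in "UD":
                for effect_state in "UD":
                    support = lagged_support(
                        series[cause], cause_state,
                        series[effect], effect_state, lag)
                    if support >= min_support:
                        candidates.add(
                            (cause, cause_state, effect, effect_state, lag))
    return candidates, tests


SERIES = {"Pi": "UUUUUDDUDU", "Pj": "DDDDDDDDDD", "Pk": "UDDDDDDUDD"}
found, tests = mine_binary_candidates(SERIES, max_lag=2)
print(tests)  # 3 * 2 ordered pairs times 2 lags
```

With n parameters and L lags the loop performs n(n − 1)L support computations, each linear in the series length, which matches the quadratic growth reported for binary rules. Support alone also admits candidates that point at a constant series such as Pj, which is the kind of association a further filter like the temporal odds ratio is intended to exclude.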

Conclusion

This paper proposed a novel method to extract various types of causal relationships, namely binary, transitive, many to one and cyclic, from large time series databases. The proposed method generates more specific rules together with their strength, which are useful as strategic information. We also defined the concept of the temporal odds ratio to categorize a temporal association as a causal rule. Experiments have shown that the proposed algorithm can extract single, transitive, combined and cyclic causes from large time series data sets. Additionally, the extracted rules are validated to establish their accuracy, and the algorithm has been shown to scale up well with respect to the number of indicators in time series data.

In future, the efficiency of the method can be improved by using fast association rule mining algorithms. The concept of the algorithm can also be extended to other types of time series. The proposed method can be applied in various social, economic and agricultural domains to generate strategic rules for decision making. The method is also useful for detecting the exact cause of a fault in a large mechanical system that is monitored by various sensors generating time series data.

Declarations

Authors’ contributions

SH conceived the idea, designed the study, analyzed and interpreted the data, was involved in the system design and implementation, and wrote and drafted the manuscript. PSD supervised the research and was responsible for the algorithm and for revising the manuscript for important intellectual content. He gave valuable advice on conducting the study and helped edit the article. Both authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank the department of Computer Science and Engineering, VNIT, Nagpur, for making available required computing facilities.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology

References

  1. Abolhosseini S, Heshmati A, Altmann J (2014) The effect of renewable energy development on carbon emission reduction: an empirical analysis for the EU-15 countries. Institute for the Study of Labor, Germany. IZA DP no. 7989Google Scholar
  2. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216. doi:10.1145/170036.170072 View ArticleGoogle Scholar
  3. Aliferis CF, Tsamardinos I, Statnikov A (2003) HITON: a novel Markov Blanket algorithm for optimal variable selection. In: AMIA annual symposium proceedings, American Medical Informatics Association 2003, pp 21–25Google Scholar
  4. Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and markov blanket induction for causal discovery and feature selection for classification part I: algorithms and empirical evaluation. J Mach Learn Res 11:171–234MathSciNetMATHGoogle Scholar
  5. Arnold A, Liu Y, Abe N (2007) Temporal causal modeling with graphical granger methods. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 66–75Google Scholar
  6. Asafu-Adjaye J (2000) The relationship between energy consumption, energy prices and economic growth: time series evidence from Asian developing countries. Energyeconomics 22(6):615–625. doi:10.1016/S0140-9883(00)00050-5 Google Scholar
  7. BIS (2011) https://www.gov.uk. Analyses the sources of economic growth in relation to trade and investment. Trade and investment analytical papers. Ref: BIS/11/723
  8. Cai B, Wang J, He J, Geng Y (2016) Evaluating CO2 emission performance in China’s cement industry: an enterprise perspective. Appl Energy 166:191–200View ArticleGoogle Scholar
  9. Chen X, Hoffman MM, Bilmes JA, Hesselberth JR, Noble WS (2010) A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data. Bioinformatics 26(12):i334–i342. doi:10.1093/bioinformatics/btq175 View ArticlePubMedPubMed CentralGoogle Scholar
  10. Chickering DM (1996) Learning Bayesian networks is NP-complete. Learning from data. Springer, New York, pp 121–130View ArticleGoogle Scholar
  11. Chu T, Danks D, Glymour C (2005). Data driven methods for nonlinear granger causality. Clim Teleconnect Mech. doi:10.1.1.85.7974Google Scholar
  12. Cooper GF (1997) A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Min Knowl Discov 1(2):203–224. doi:10.1023/A:1009787925236 View ArticleGoogle Scholar
  13. Deng Y, Ebert-Uphoff I (2014) Weakening of atmospheric information flow in a warming climate in the Community Climate System Model. Geophys Res Lett 41(1):193–200. doi:10.1002/2013GL058646 ADSView ArticleGoogle Scholar
  14. Easterly W, Levine R (2003) Tropics, germs, and crops: how endowments influence economic development. J Monet Econ 50(1):3–39. doi:10.1016/S0304-3932(02)00200-3 View ArticleGoogle Scholar
  15. Ebeke C, Omgba LD (2011) Oil rents, governance quality, and the allocation of talents in developing countries. CERDI, Etudes et Documents, E 2011.23Google Scholar
  16. Ebert-Uphoff I, Deng Y (2014) Causal discovery from spatio-temporal data with applications to climate science. In: 13th international conference on machine learning and applications, pp 606–613. doi:10.1371/journal.pcbi.0030129
  17. Enyedi G, Volgyes I (2016) The effect of modern agriculture on rural development: comparative rural transformation series. Elsevier, Pergaman Press, USA. ISBN 978-0-08-027179-8Google Scholar
  18. EPA (1970) https://www3.epa.gov/. United States Environmental Protection Energy, Washington, DC. Accessed 2 December 1970
  19. Euser AM, Zoccali C, Jager KJ, Dekker FW (2009) Cohort studies: prospective versus retrospective. Nephron Clin Pract 113(3):c214–c217. doi:10.1159/000235241 View ArticlePubMedGoogle Scholar
  20. FAO (1945) http://www.fao.org/docrep/006/y4683e/y4683e06.htm#TopOfPage. Agriculture, food and water. chapter two: how the world is fed. Accessed 16 October 2016
  21. Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd edn. Wiley, London. ISBN 978-0-471-52629-2View ArticleMATHGoogle Scholar
  22. Friedman N, Linial M, Nachman I, Pe’er D (2007) Using Bayesian networks to analyze expression data. J Comput Biol 7(3–4):601–620. doi:10.1089/106652700750050961 Google Scholar
  23. Geweke J (1984) Inference and causality in economic time series models. Handb Econom 2:1101–1144View ArticleMATHGoogle Scholar
  24. Good IJ (1959) A theory of causality. Br J Philos Sci 9(36):307–310View ArticleGoogle Scholar
  25. Granger CW (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 3(37):424–438Google Scholar
  26. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Elsevier, USAMATHGoogle Scholar
  27. Heckerman D (1995) A Bayesian approach to learning causal networks. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, pp 285–295Google Scholar
  28. Heckerman D (1997) Bayesian networks for data mining. Data Min Knowl Disc 1(1):79–119. doi:10.1023/A:1009730122752 View ArticleGoogle Scholar
  29. Hothorn T, Zeileis A, Farebrother RW, Cummins C, Millo G, Mitchell D (2015) Package lmtest. In: Testing linear regression models. https://cran.r-project.org/web/packages/lmtest/lmtest.pdf. Accessed 6 June 2015
  30. International Monetary Fund (1945) US New Hampshire, Bretton Woods. http://www.imf.org. Accessed 1945
  31. Ji Y, Ying H, Dews P, Mansour A, Tran J, Miller RE, Massanari RM (2011) A potential causal association mining algorithm for screening adverse drug reactions in post marketing surveillance. IEEE Trans Inf Technol Biomed 15(3):428–437. doi:10.1109/TITB.2011.2131669
  32. Jin Z, Li J, Liu L, Le TD, Sun B, Wang R (2012) Discovery of causal rules using partial association. In: IEEE 12th international conference in data mining (ICDM), pp 309–318. doi:10.1109/ICDM.2012.36
  33. Li X (2005) Foreign direct investment and economic growth: an increasingly endogenous relationship. World Dev 33(3):393–407
  34. Li J, Le TD, Liu L, Liu J, Jin Z, Sun B (2013) Mining causal association rules. In: IEEE 13th international conference in data mining workshops (ICDMW), pp 114–123. doi:10.1109/ICDMW.2013.88
  35. Li J, Liu L, Le T (2015) Practical approaches to causal relationship exploration. Springer, Berlin. doi:10.1007/978-3-319-14433-7
  36. Lozano AC, Abe N, Liu Y, Rosset S (2009a) Grouped graphical Granger modeling methods for temporal causal modeling. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 577–586. doi:10.1145/1557019.1557085
  37. Lozano AC, Abe N, Liu Y, Rosset S (2009b) Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics 25(12):i110–i118
  38. Ma S, Li J, Liu L, Le TD (2016) Mining combined causes in large data sets. Knowl-Based Syst 92:104–111. doi:10.1016/j.knosys.2015.10.018
  39. Madsen H (2007) Time series analysis. Chapman and Hall/CRC Press, Taylor and Francis Group, Boca Raton. ISBN 9781420058670
  40. Mani S, Spirtes PL, Cooper GF (2012) A theoretical study of Y structures for causal discovery. arXiv:1206.6853
  41. Marsh C (2013) Introduction to Continuous Entropy. http://www.crmarsh.com/static/pdf/Charles_Marsh_Continuous_Entropy.pdf. Accessed 13 December 2013
  42. Mehmood S (2012) Effect of different factors on gross domestic products: a comparative study of Bangladesh and Pakistan. doi:10.1.1.403.5474
  43. Mellios G, Hausberger S, Keller M, Samaras C, Ntziachristos L, Dilara P, Fontaras G (2011) Parameterisation of fuel consumption and CO2 emissions of passenger cars and light commercial vehicles for modelling purposes. Publications Office of the European Union, EUR. 2011; 24927
  44. Meyer EP (2014) Package infotheo. In: Information-Theoretic Measures. https://cran.r-project.org/web/packages/infotheo/infotheo.pdf. Accessed 20 February 2015
  45. Nadkarni S, Shenoy PP (2001) A Bayesian network approach to making inferences in causal maps. Eur J Oper Res 128(3):479–498
  46. Neapolitan RE (2004) Learning Bayesian networks. Pearson Prentice Hall, Upper Saddle River. ISBN 9780130125347
  47. Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR (2007) A primer on learning in Bayesian networks for computational biology. PLoS Comput Biol 3(8):e129. doi:10.1371/journal.pcbi.0030129
  48. Ogawa K, Sterken E, Tokutsu I (2016) Public debt, economic growth and the real interest rate: a panel VAR approach to EU and OECD countries. doi:10.2139/ssrn.2726367
  49. Pang DL, Su HW (2010) A test of Granger causality between internal and external imbalances: the case of China, Japan and United States. In: International conference in management and service science (MASS), pp 1–4. doi:10.1109/ICMSS.2010.5577179
  50. Pearl J (2014) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Los Altos. ISBN 9780080514895
  51. Pearl J, Verma T (1991) A theory of inferred causation. Knowledge representation and reasoning. In: Proceedings of the seventh annual symposium on principles of programming languages, pp 441–452
  52. Pellet JP, Elisseeff A (2008) Using Markov blankets for causal structure learning. J Mach Learn Res 9:1295–1342. doi:10.1023/A:1012487302797
  53. Pinna A, Soranzo N, de la Fuente A (2010) From knockouts to networks: establishing direct cause–effect relationships through graph analysis. PLoS One 5(10):e12912. doi:10.1371/journal.pone.0012912
  54. Rasmidatta P (2011) The relationship between domestic saving and economic growth and convergence hypothesis: case study of Thailand. Department of Economics, Sodertorns University. URN: urn:nbn:se:sh:diva-9451
  55. Reichenbach H, Reichenbach M (1991) The direction of time. University of California Press, Berkeley. ISBN 9780520074149
  56. Reichenbach H (1978) The principle of causality and the possibility of its empirical confirmation. Springer, Netherlands, 1909–1953, pp 345–371. doi:10.1007/978-94-009-9855-1_14
  57. Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529. doi:10.1126/science.1109447
  58. Scutar M (2016) Package bnlearn. In: Bayesian network structure learning, parameter learning and inference. https://cran.r-project.org/web/packages/bnlearn/bnlearn.pdf. Accessed 16 May 2016
  59. Shipley B (2002) Cause and correlation in biology: a user’s guide to path analysis, structural equations and causal inference. Cambridge University Press, Cambridge
  60. Silverstein C, Brin S, Motwani R, Ullman J (2000) Scalable techniques for mining causal structures. Data Min Knowl Disc 4(2–3):163–192
  61. Spirtes P, Glymour CN, Scheines R (2000) Causation, prediction, and search. MIT Press, Cambridge. doi:10.1007/978-1-4612-2748-9
  62. StatsCan (1971) Statistics Canada: http://www.statcan.gc.ca/pub/16-201-x/2009000/part-partie1-eng.htm#wb-cont. Ottawa, ON
  63. Stewart A, Hope-Morley A, Mock P (2015) Quantifying the impact of real-world driving on total CO2 emissions from UK cars and vans for The Committee on Climate Change. Element Energy Limited, Terrington House, Cambridge
  64. Suppes P (1970) A probabilistic theory of causality. North-Holland, Amsterdam. doi:10.1086/288485
  65. Tian X, Geng Y, Dai H, Fujita T, Wu R, Liu Z, Masui T, Yang X (2016) The effects of household consumption pattern on regional development: a case study of Shanghai. Energy 103:49–60
  66. Veiga DFT, Vicente FFR, Grivet M, De la Fuente A, Vasconcelos ATR (2007) Genome-wide partial correlation analysis of Escherichia coli microarray data. Genet Mol Res 6:730–742
  67. Waldmann MR, Martignon L (1998) A Bayesian network model of causal learning. In: Proceedings of the twentieth annual conference of the Cognitive Science Society, pp 1102–1107
  68. World Bank Data (1944) USA Washington, DC. http://www.worldbank.org. Accessed 1944
  69. World Trade Organization (1995) Switzerland. http://www.wto.org. Accessed 1 January 1995
  70. Zhang NL, Poole D (1996) Exploiting causal independence in Bayesian network inference. J Artif Intell Res 5:301–328

Copyright

© The Author(s) 2016