As mentioned above, process mining techniques fall into three basic classes: (1) techniques for discovering new models, (2) techniques for auditing and conformance checking of a model against real data, and (3) techniques used for enhancement purposes. This study focused on the first and second classes (i.e., Discovery and Conformance Analysis). From the discovery class, the Alpha algorithm, Heuristic Mining, Fuzzy Mining, and Social Network Analysis techniques were applied in order to discover models and organizational structures related to the handling of the proceedings’ peer reviews for an international conference in Thailand. From the conformance analysis class, the LTL Checker and Performance Analysis techniques were used in order to compare, check, and audit the authentic dataset against a pre-defined model. Accordingly, the main goals of the study were: (1) to convert the collected event log into the appropriate format supported by process mining analysis tools, (2) to discover process models and to construct social networks based on the collected event log, and (3) to find deviations, discrepancies and bottlenecks between the collected event log and the master pre-defined model. The results of the applied approaches (considering the second and third goals of the study) are discussed in the following sections.
Alpha (α) algorithm
The main benefit of applying the α-algorithm in process mining is to reconstruct causality from a set of sequences of activities. The algorithm is capable of mining the control-flow perspective of a process in terms of Petri nets, which deal with Place and Transition nets (Burattin et al. 2014). The Alpha algorithm was first developed by Professor Wil van der Aalst from Technische Universiteit Eindhoven (TU/e) in The Netherlands (Devi and Sudhamani 2013). In this paper, we applied the Alpha algorithm as a technique to identify the routing constructs within the proceedings review system. Since our main emphasis was on the peer review process as a whole, we based our discovery on the “completed” process instances only. As a result, our log contained only two event types: Start and Complete. As shown in Fig. 4 (up), filtering the MXML event log allowed us to select only those types of events (i.e., tasks or audit trail entries) that we were interested in considering during the peer review process. In Fig. 4 (down), a general summary of the proceedings review process regarding the event log is illustrated. Since the peer review process in the international conference starts with inviting reviewers to review the manuscripts, and ends with an accept or reject decision by the board about the submitted manuscript, the event “Invite Reviewers” was chosen as the starting point while the events “Accept Paper” and “Reject Paper” were chosen as the ending points.
Figure 5 shows a screenshot of the resulting model created by the Alpha algorithm based on the international conference’s peer-review event log. By studying the model, we are able to better investigate the peer review process with respect to: (1) the tasks that came before/after other tasks, (2) the tasks that concurrently occurred with other tasks, or (3) the tasks that were duplicated (i.e., loop) in the model.
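To illustrate the kind of ordering relations the α-algorithm extracts from an event log before constructing a Petri net, the following is a minimal Python sketch of the relation-discovery step only (not the full net construction); the mini-log is hypothetical, merely echoing the paper’s task names:

```python
from itertools import product

def footprint(traces):
    """Derive the alpha-algorithm ordering relations from a set of traces."""
    succ = set()  # direct succession: a directly followed by b
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            succ.add((a, b))
    tasks = {t for trace in traces for t in trace}
    rel = {}
    for a, b in product(tasks, tasks):
        if (a, b) in succ and (b, a) not in succ:
            rel[(a, b)] = "->"   # causality: a before b
        elif (a, b) in succ and (b, a) in succ:
            rel[(a, b)] = "||"   # parallel: a and b in either order
        elif (a, b) not in succ and (b, a) not in succ:
            rel[(a, b)] = "#"    # choice: a and b never adjacent
        else:
            rel[(a, b)] = "<-"   # reverse causality
    return rel

# Hypothetical mini-log using the paper's activity names
log = [
    ["Invite Reviewers", "Receive The First Review", "Board Decide"],
    ["Invite Reviewers", "Receive The Third Review", "Board Decide"],
]
rels = footprint(log)
print(rels[("Invite Reviewers", "Receive The First Review")])  # ->
```

These relations are exactly what lets the algorithm answer the three questions above: `->` identifies tasks coming before/after others, `||` identifies concurrency, and repeated successions reveal loops.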
Heuristic mining
Although the resulting model created by the Alpha algorithm gave us a holistic view of the tasks executed during the peer review process, the produced model is not able to properly deal with noise in the log. Moreover, the Alpha algorithm does not take into account any frequency information about task dependencies. When dealing with noisy data, the Alpha algorithm does not necessarily produce reliable and robust models. Process maps and workflow nets may include several types of structures and constructs which the α-algorithm cannot rediscover (Aalst et al. 2004). If an event log contains short loops of length one or length two, the α-algorithm is not capable of rediscovering them; and if an event log contains non-local dependencies, the α-algorithm is not capable of properly dealing with the process constraints (Aalst 2011; Medeiros et al. 2004). Therefore, to avoid such constraints, we applied the Heuristic Miner algorithm in order to look for causal dependencies where one task follows another (i.e., the Heuristic Miner algorithm was more sophisticated and adequate than the α-algorithm). Our main objective was to create models that were less sensitive to logs that are incomplete or contain noise. In Fig. 6, the rectangular boxes represent the activities or tasks, the arrows indicate the dependency between activities, and the number in each event box shows how frequently the activity was performed. The number on an arrow shows the number of times the connection has been used. For example, the number 87 in the rectangular box of the activity “Invite Reviewers” shows that the organizing committee members of the conference invited a reviewer to review a manuscript a total of 87 times.
In the same manner, the number 19 on the arrow from the task “Invite Reviewers” to “Receive The First Review” indicates that the task “Invite Reviewers” was followed 19 times by the task “Receive The First Review”. Likewise, the number 14 on the arrow from “Invite Reviewers” to “Receive The Third Review” shows that the task “Invite Reviewers” was followed 14 times by the task “Receive The Third Review”. The second number adjacent to the frequency number is called the Dependency Measure, which indicates the dependency relation between two tasks. A maximum value of 1.0 represents a full (100%) dependency relation between the connected tasks, while a minimum value of 0 represents no dependency relation between them. However, although the resulting model created by the Heuristic Miner algorithm gave us a more sophisticated view (compared with the α-algorithm) of the tasks executed during the proceedings’ peer review process, the produced model is not able to properly deal with mixed and complex AND and XOR join/split situations. Moreover, if an event log contains complex spaghetti-like structures, the Heuristic Miner is not capable of rediscovering them. Similarly, if an event log contains dangling activities, missing connections, or missing activities, the results of the Heuristic Mining approach provide less meaningful information about the underlying processes. Therefore, a more sophisticated process discovery technique overcoming these limitations needed to be considered. In this paper, we found the Fuzzy Mining algorithm quite adequate (i.e., much more adequate than the Heuristic Miner algorithm), as it not only presents visual process models but also properly deals with the more complex structures which may exist within event logs (Saravanan et al. 2011).
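The Dependency Measure on the arrows is conventionally computed by the Heuristics Miner from the direct-succession counts in both directions. A minimal sketch follows; the assumption that the reverse connection never occurred (count 0) is illustrative, not taken from the figure:

```python
def dependency(count_ab, count_ba):
    """Heuristics Miner dependency measure between tasks a and b:
    (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the number
    of times a was directly followed by b."""
    return (count_ab - count_ba) / (count_ab + count_ba + 1)

# e.g. "Invite Reviewers" directly followed by "Receive The First Review"
# 19 times, assuming the reverse never happened:
print(round(dependency(19, 0), 2))  # 0.95
```

The `+ 1` in the denominator keeps the measure below 1.0 for low-frequency connections, which is what makes the miner robust to occasional noise: a dependency observed only once scores 0.5, not 1.0.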
Fuzzy mining
Process models should normally present understandable and meaningful constructs of operational processes. Applying different process mining plugins, we sometimes encounter models that look very complex and meaningless without highlighting what is important in the data. Fuzzy mining is one of the process mining techniques commonly used to deal with those types of complex models that are not easily comprehensible at first glance. Figure 7 (left) shows a fuzzy model corresponding to the event log that was used to construct the process model of the proceedings review. The thickness of the arrows and the colors of the connections represent the absolute frequency of occurrence of the tasks, or the strength of the relationship between the tasks (Premchaiswadi and Porouhan 2015). The produced fuzzy model shows all the activities as well as all the causal dependencies between them in a more sophisticated manner. When studying the resulting fuzzy model, we realized that the tasks “Board Decide”, “Invite Another Reviewer”, “Time-Out X”, and “Receive The X Review” were the most significant tasks during the peer review process of the international conference, accounting for 33.61%, 28.28%, 7.13%, and 7.01% of the performed tasks, respectively.
Using the Fuzzy Mining technique, we could also project all log traces onto a model simultaneously. Figure 7 (right) shows an animation based on the historic information taken from all log traces in the peer-review event log. The animation shows the actual execution of the cases (i.e., revision of manuscripts) based on the fuzzy model. The animation can be played multiple times in order to gain a better understanding of what really occurred during the peer review process in the international conference. Moreover, watching the animation helped the organizing committee members to differentiate individual process instances and observe the overall peer review process at a particular point in time. In addition, the technique significantly increases the awareness of the conference organizers about the parts that are more (or less) important during the proceedings review process.
Analysis of social network
The work presented in this paper investigated the social network (between the reviewers and the decision board members) with respect to three metrics: (a) the Handover of Task metric, which examined who passed a task to whom; (b) the Similarity of Tasks metric, which examined who executed the same types of tasks; and (c) the Working Together metric, which examined how frequently individuals worked on the same cases.
Handover of task
Within each process instance (i.e., case), there is a handover of task from individual X to individual Y if there are two subsequent tasks where the first is completed by individual X and the second by individual Y. The graph in Fig. 8 (up) illustrates the Handover of Task graph for the conference committee members and the organizers. The oval-shaped nodes in the graph represent the relationship between the in and out extent of the exchanged tasks (shown with arrows). The more tasks a node receives, the more vertically stretched the oval appears (i.e., more in-going arrows); conversely, the more tasks a node assigns to others, the more horizontally stretched it appears (i.e., more out-going arrows). Considering Fig. 8 (up), we realized that Mr. A and Ms. B were the most active (hardworking) members of the proceedings review process: many in-going arrows pointed to their nodes while few out-going arrows left them. In other words, Mr. A and Ms. B received a large burden of work handed over to them by other members. Similarly, as shown in Fig. 8 (down), the number of in-going arrows coming to Mr. A and Ms. B is much greater compared to the other conference committee members and organizers, who receive fewer in-going arrows.
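The counting behind the Handover of Task graph can be sketched in a few lines; the single case and its performers below are hypothetical, only the task names echo the paper:

```python
from collections import Counter

def handover_counts(cases):
    """Count handovers: pairs of consecutive events within one case that
    were performed by two different individuals."""
    counts = Counter()
    for events in cases:  # events: list of (task, performer) in order
        for (_, x), (_, y) in zip(events, events[1:]):
            if x != y:
                counts[(x, y)] += 1  # x handed work over to y
    return counts

# Hypothetical case: Mr. C invites, Ms. B collects, Mr. A decides
case = [("Invite Reviewers", "Mr. C"),
        ("Collect All Reviews", "Ms. B"),
        ("Board Decide", "Mr. A")]
print(handover_counts([case]))
```

Summing a person’s incoming counts gives the “vertical” extent of their node in the figure; summing outgoing counts gives the “horizontal” extent.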
Similarity of task
Unlike the Handover of Task metric, the Similarity of Task metric does not emphasize how activities were passed from one individual to another; it emphasizes the activities that individuals performed. In the Similarity of Task metric every individual has a profile based on the number of times he or she has performed specific tasks. There are many different approaches to determine the “distance” between two profiles. In this study we used a similarity coefficient for comparing the similarity of sample groups (Intarasema et al. 2012). As shown in Fig. 9, the graph clearly illustrates similar actions performed by Mr. A, Mr. C, Ms. B and Ms. G. Normally, individuals performing similar tasks have stronger relations than individuals performing completely different tasks.
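As an illustrative sketch of profile comparison (the paper does not name its exact coefficient; cosine similarity is one common choice, and the event list below is hypothetical):

```python
import math
from collections import Counter

def profile(events, person):
    """Task-frequency profile: how often `person` performed each task."""
    return Counter(task for task, who in events if who == person)

def cosine(p, q):
    """Cosine similarity between two task profiles: 1.0 for identical
    task mixes, 0.0 for completely disjoint ones."""
    tasks = set(p) | set(q)
    dot = sum(p[t] * q[t] for t in tasks)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Hypothetical events: two people did the same task, one did another
events = [("Invite Reviewers", "Mr. C"), ("Invite Reviewers", "Ms. B"),
          ("Board Decide", "Mr. A")]
print(cosine(profile(events, "Mr. C"), profile(events, "Ms. B")))  # 1.0
```

Edges in a graph like Fig. 9 are then drawn between pairs whose similarity exceeds some threshold.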
Working together
Likewise, we wanted to know whether people in the same community work together or not. To address this issue, we focused on the cases (instead of the activities) using the Social Network Miner technique. Figure 10 shows a social network obtained by the Working Together graph using ProM 5.2. The graph shows active participation and collectiveness among Mr. A, Mr. C, Ms. B and Ms. G. The Working Together graph is helpful when there are disjoint teams in the log.
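Because this metric is case-based rather than activity-based, the core computation reduces to counting, for every pair of individuals, the cases in which both appear. A sketch with hypothetical cases:

```python
from collections import Counter
from itertools import combinations

def working_together(cases):
    """For every pair of individuals, count the cases in which both appear."""
    counts = Counter()
    for events in cases:  # events: list of (task, performer) tuples
        people = sorted({who for _, who in events})
        for pair in combinations(people, 2):
            counts[pair] += 1
    return counts

# Hypothetical cases: Mr. A and Ms. B appear together in both cases
cases = [
    [("Invite Reviewers", "Mr. C"), ("Board Decide", "Mr. A"),
     ("Collect All Reviews", "Ms. B")],
    [("Board Decide", "Mr. A"), ("Collect All Reviews", "Ms. B")],
]
print(working_together(cases)[("Mr. A", "Ms. B")])  # 2
```

Disjoint teams then show up as disconnected components in the resulting graph, which is why the metric is useful for spotting them.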
Organizational mining
Although the three previous techniques (i.e., Handover of Task, Similarity of Task, and Working Together) gave us interesting insights into the relationships between individuals or tasks, these techniques are not able to provide information about the organizational structures within the event log. Therefore, we applied the Semantic Organizational Mining technique in order to investigate the peer review event log in terms of levels of structural organization. As shown on the left side of Fig. 11 (up), the semantic organizational miner technique enabled us to classify different groups of the conference committee members and organizers based on the similarity of the tasks they performed. Tasks were assumed to be similar whenever they were instances of the same concepts.
Role hierarchy mining
This approach is based on an Agglomerative Hierarchical Clustering technique which deals with joint activities in the peer review event log. The main idea is to create multiple clusters which are consistent with the activities that each individual performs. Figure 11 (down) shows a dendrogram derived from the Agglomerative Hierarchical Clustering technique. The technique enabled us to generate flat or disjoint organizational entities by cutting the dendrogram at a threshold (i.e., a certain value). By cutting the dendrogram at a threshold value of 0, we obtained 4 different clusters (meaning there are 4 different groups of individuals with dissimilar roles and duties) within the peer review event log. The first group, cluster 1, included Mr. A as the Editor-in-Chief and head of the organizing committee. The second group, cluster 2, consisted of Mr. E, Mr. F, Mrs. D, and Ms. G, who were in charge of receiving the reviews/feedback from the invited reviewers. The third group, cluster 3, consisted of Ms. B, who acted as an intermediary between cluster 2 and Mr. C; thus, Ms. B was constantly in touch with both. Finally, the fourth group, cluster 4, included Mr. C as the secretary of the conference, in charge of inviting reviewers, collecting all the reviews, and announcing the board decisions to the authors (i.e., whether a manuscript was accepted or rejected). Therefore, using the Organizational Miner technique we could investigate the data at a higher level of abstraction compared with the previously mentioned techniques. While the handover of task, similarity of task and working together metrics focused more on individuals and the transactions between them, the Organizational Miner and Role Hierarchy Miner techniques mainly focused on teams, groups and hierarchies as a whole.
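A simplified sketch of what cutting the dendrogram at threshold 0 means: at that level only individuals with identical joint-activity profiles land in the same cluster. The profiles below are hypothetical, loosely echoing the four roles described above:

```python
from collections import defaultdict

def clusters_at_zero(profiles):
    """Flat clusters obtained by cutting an agglomerative dendrogram at
    distance 0: group individuals whose task profiles are identical."""
    groups = defaultdict(list)
    for person, prof in profiles.items():
        groups[frozenset(prof.items())].append(person)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical profiles loosely echoing the paper's four clusters
profiles = {
    "Mr. E": {"Receive Review": 5}, "Mr. F": {"Receive Review": 5},
    "Mrs. D": {"Receive Review": 5}, "Ms. G": {"Receive Review": 5},
    "Mr. A": {"Board Decide": 9},
    "Ms. B": {"Collect All Reviews": 9},
    "Mr. C": {"Invite Reviewers": 9},
}
print(len(clusters_at_zero(profiles)))  # 4
```

Raising the threshold above 0 would progressively merge the nearest clusters, which is how the dendrogram encodes the role hierarchy.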
Semantic LTL checker
It is often the case that processes for reviewing manuscripts need to obey specific rules and regulations. For instance, in order to make a decision about a manuscript (i.e., accept or reject) for the international conference, feedback (reviews) from at least three reviewers needed to be received and collected. Similarly, any invitation of an additional reviewer was based on the board decision after careful investigation of all the initially collected reviews. One way to check whether these rules and regulations were actually obeyed is to audit the event log using certain techniques. We used the Semantic LTL Checker tool in ProM to verify the property: does the task “Invite Another Reviewer” always happen after the tasks “Board Decide” and “Collect All Reviews”? (i.e., formula: eventually_activity A_then B_then C). The resulting screen shown on the right side of Fig. 11 (up) indicates that the event log is divided into two main parts: (1) the cases that satisfy the property, and (2) the cases that do not. Studying the LTL results for the correct process instances, we realized that in 80 of the cases (out of the total of 87 process instances) the task “Invite Another Reviewer” eventually happened after “Board Decide” and after “Collect All Reviews”. However, in 7 of the cases the task “Invite Another Reviewer” did not happen after “Board Decide” and after “Collect All Reviews”. This is an obvious violation of the rules and regulations governing the peer review process.
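On a finite trace, the essence of the eventually_activity A_then B_then C formula reduces to checking that the three activities occur in order; a sketch follows (the traces shown are hypothetical):

```python
def eventually_then(trace, a, b, c):
    """True if a occurs, b occurs strictly later, and c occurs later still."""
    try:
        i = trace.index(a)
        j = trace.index(b, i + 1)
        trace.index(c, j + 1)
        return True
    except ValueError:
        return False

ok = ["Invite Reviewers", "Board Decide", "Collect All Reviews",
      "Invite Another Reviewer"]
bad = ["Invite Another Reviewer", "Board Decide", "Collect All Reviews"]
print(eventually_then(ok, "Board Decide", "Collect All Reviews",
                      "Invite Another Reviewer"))   # True
print(eventually_then(bad, "Board Decide", "Collect All Reviews",
                      "Invite Another Reviewer"))   # False
```

Partitioning the 87 process instances by this predicate is exactly the two-way split the LTL Checker reports: satisfying cases versus violating ones.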
Conformance checking
When implementing different process mining discovery algorithms or social network analysis metrics, all of the process instances in the input event log are replayed on the input Petri net. Therefore, it is possible that a process instance is not completely compatible (or does not fit) with the Petri net. In this section, we used the Conformance Checker technique in order to replay (i.e., audit) the peer review event log. The rationale behind this approach was to compare the level of fitness between the authentic event log and the pre-defined master model. Using this approach, when an activity is executed in a log trace, one of the matching transitions in the Petri net is paired with that activity, and then the necessary measurements are taken.
Model perspective
We used the “model perspective” feature of ProM’s Conformance Checker plugin in order to replay the peer-review dataset with respect to: (1) the number of missing activities (i.e., activities in the log that were not accounted for in the Petri net model), (2) the number of failed activities (i.e., activities that were not enabled), and (3) the number of remaining activities (i.e., activities that remained enabled). The results showed that 9 blocks of activity in the authentic peer review event log did not fit the activities in the pre-defined Petri net (master) model. Moreover, five activity traces were not correctly enabled in the authentic event log, and seven activities were missing from the authentic event log compared with the pre-defined master Petri net model. In other words, the Petri net model and the real peer review event log were not fully compatible with each other at a few points and nodes.
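Such replay counts are commonly condensed into a single fitness number via the token-based fitness measure used in ProM-style conformance checking. The sketch below uses illustrative totals: only the 7 missing and 5 remaining counts echo the text, while the consumed/produced totals of 100 are assumptions:

```python
def replay_fitness(missing, consumed, remaining, produced):
    """Token-replay fitness: penalizes tokens that had to be added
    artificially during replay (missing) and tokens left behind
    afterwards (remaining). 1.0 means a perfect fit."""
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Illustrative totals (only 7 missing / 5 remaining echo the text):
print(round(replay_fitness(7, 100, 5, 100), 3))  # 0.94
```

A value below 1.0 quantifies exactly the kind of partial incompatibility between log and model described above.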
Log perspective
We also used the “log perspective” feature of ProM’s Conformance Checker plugin in order to check how well the process instances in the peer review event log fit the Petri net model. Our goal was to highlight the possible discrepancies and bottlenecks between the real log and the model. As shown in Fig. 12, the log perspective feature of the Conformance Checker approach indicated 9 blocks of activity (highlighted in orange) that were not compatible with the pre-defined master Petri net model. Therefore, the violations and deviations during the peer review process for the international conference were identified as follows:
- Bottleneck 1: The board made a decision before receiving feedback/reviews from Reviewer X. This is an obvious violation of the rules, as the board can only make a decision when feedback from a reviewer is received, or when the legitimate time for a reviewer to review the manuscript has ended.
- Bottleneck 2: The board rejected a manuscript before receiving feedback from Reviewer X. This is an obvious violation of the rules, as the board can only reject a manuscript after receiving all of the feedback from all reviewers.
- Bottleneck 3: The board rejected a manuscript after receiving feedback from only two reviewers. This is an obvious violation of the rules, as the board can only reject a manuscript after receiving feedback from at least three of the reviewers.
- Bottleneck 4: The board accepted a manuscript before receiving feedback from Reviewer X. This is an obvious violation of the rules, as the board can only accept a manuscript after receiving all of the feedback from all reviewers.
- Bottleneck 5: The board rejected a manuscript before receiving feedback from Reviewer X. This is an obvious violation of the rules, as the board can only reject a manuscript after receiving all of the feedback from all reviewers.
Performance analysis with Petri net
We used the Performance Analysis with Petri Net feature of ProM in order to investigate the total waiting time during the peer review process as well. Figure 13 shows information about waiting times during the peer review process for the proceedings of an international conference in Thailand. We categorized the waiting times as High, Medium and Low (Performance Analysis with Petri Net 2009). If a reviewer sends his/her feedback in less than 1 week (7 days), it is considered a “Low” response time, shown in blue. If a reviewer sends his/her feedback within 7–21 days, it is considered a “Medium” response time, shown in yellow. And if a reviewer sends his/her feedback after more than 21 days, it is considered a “High” response time, shown in red (critical). By studying the results of the Performance Analysis technique with the peer review Petri net, we realized that in only 45% of the cases was feedback from the Second Reviewer received at all, while in 55% of the cases the Second Reviewers did not participate in the peer review process before the time-out was reached (i.e., their deadline to review the manuscript passed). Interestingly, even those 45% of the Second Reviewers’ feedback were received with long waiting times (i.e., high response times shown in red). Thus, the process of collecting feedback from the Second Reviewers became too long and ineffective, and the proceedings organizers need to fix this problem in forthcoming peer-review processes. On the other hand, although the feedback from the First Reviewers and the Third Reviewers was received within 7–21 days (i.e., including the reviewers’ comments or timeout), which was acceptable, the organizing committee members spent a long time (shown in red) collecting and reporting their feedback.
Therefore, in order to increase the efficiency and effectiveness of the peer review process, the conference’s organizers also need to speed up the process of attending to the feedback received from the reviewers in upcoming conferences.
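The three response-time bands used above can be captured in a small helper (thresholds taken directly from the text):

```python
def response_band(days):
    """Classify a reviewer's response time using the paper's thresholds."""
    if days < 7:
        return "Low"     # blue: feedback within a week
    elif days <= 21:
        return "Medium"  # yellow: feedback within 7-21 days
    return "High"        # red (critical): feedback after more than 21 days

print(response_band(3), response_band(14), response_band(30))  # Low Medium High
```

Applying such a classifier to the timestamp differences in the event log is what produces the colored waiting-time annotations of Fig. 13.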