From: A comprehensive review on privacy preserving data mining
References | Approach (general PPDM, data distortion, data mining algorithms, outsourced data mining, distributed, or anonymity methods) | Merits and demerits | Parameters |
---|---|---|---|
Matwin (2013) | Surveyed the existing privacy-preserving data mining methods | Analyzed the methods | PPDM |
Vatsalan et al. (2013) | Presented methods that permitted the linking of databases between organizations and preserved the privacy of these data | Presented taxonomy of PPRL techniques | PPDM |
Qi and Zong (2012) | Stated methods of data mining for privacy protection | Classified PPDM methods | PPDM |
Raju et al. (2009) | Applied homomorphic encryption to a multiply protocol | Possible influence in many applications | PPDM |
| | Analyzed current privacy-preserving solutions for cloud services and outlined their solution based on advanced cryptographic components | Reported the experimental results and compared the performance with related solutions | PPDM |
Mukkamala and Ashok (2011) | Compared a set of fuzzy-based mapping methods | Combined the multiple practical values of a data item into a single value | PPDM |
Kamakshi (2012) | Distortion method, a novel idea to identify the sensitive attributes dynamically | The data is modified while retaining the original properties of the data | Privacy |
Zhang et al. (2012a) | Distortion method, proposed HPNGS | Reduced the noise requests over | Privacy and utility |
Zhang et al. (2012b) | Distortion method, proposed a novel APNGS | Improved the effectiveness of privacy protection on noise obfuscation in terms of association probabilities. Demerit: extra cost in comparison to existing representative strategies | Privacy |
Li et al. (2009a) | Distortion method, proposed anonymous perturbation method | Low costs with a high strength | Privacy |
Kamakshi and Babu (2010) | Distortion method, proposed a model that includes three parts: data centers, clients, and database | The roles of customers and their site databases could be interchangeable | Privacy |
Islam and Brankovic (2011) | Distortion method, introduced a framework that incorporates several novel techniques to perturb all attributes of a data set | Effective in preserving original patterns in a perturbed data set | Privacy |
Wang and Lee (2008) | Distortion method, proposed an approach to avoid Forward-Inference Attacks, generated by the sanitization process | Restricted Forward-Inference Attacks | Privacy |
Shrivastava et al. (2011) | Data mining algorithms, Proposed an improved distortion technique for privacy preserving frequent item-set mining | Enhanced the performance of the algorithm by reducing the disk access time | Privacy and performance |
Vijayarani et al. (2010a) | Data mining algorithms, introduced various communities | Focused on importance of association rule | Privacy |
Aggarwal and Yu (2008) | Stated that support and confidence are considered the two significant measures within association rule mining | Explained the basic elements of association rule | PPDM |
Belwal et al. (2013) | Data mining algorithms, proposed a method based on reducing the support and confidence of sensitive rules | Hid any desired sensitive association rule without any side effect. Demerit: hides only rules that have a single sensitive item on the left side | PPDM |
Jain et al. (2011) | Data mining algorithms, proposed a new algorithm that increases and decreases the support of the left-side and right-side items of an association rule in order to hide it | Made minimum modification to the data entries to hide a set of rules with less CPU time than the previous work | Privacy |
Naeem et al. (2010) | Data mining algorithms, proposed an architecture which hides the restricted association rules with the complete removal of the known side effects like the generation of unwanted, non-genuine association rules while yielding no hiding failure | Used other standard statistical measures instead of conventional framework of support and confidence to generate association rules | Privacy |
Li and Liu (2009) | Data mining algorithms, proposed DDIL based on data disturbance and inquiry limitation | Effective, good privacy and accuracy. Demerit: restriction with random parameters | Privacy |
Weng et al. (2008) | Data mining algorithms, proposed FHSAR, a fast algorithm for hiding sensitive association rules (SAR) | Hid sensitive association rules with limited side effects | Privacy |
Dehkordi et al. (2009) | Data mining algorithms, proposed method for hiding sensitive association rules by depending on the concept of genetic algorithms | Offered security as well as keeping the utility | Security and Utility |
Gkoulalas-Divanis and Verykios (2009) | Data mining algorithms, proposed a novel approach that offers best solution to hide sensitive frequent item sets | Provided effective solution to hide sensitive frequent item sets | Privacy and efficiency |
Li et al. (2009b) | Data mining algorithms, introduced a new algorithm for sanitizing a transactional database | Demerit: selecting victim items without affecting the non-sensitive patterns | Privacy |
Kasthuri and Meyyappan (2013) | Data mining algorithms, proposed a new method to detect the sensitive items for hiding sensitive association rules | Found the frequent item sets and generates the association rules | Privacy |
Quoc et al. (2013) | Data mining algorithms, proposed a heuristic algorithm to hide a set of sensitive association rules using the distortion technique | Specified the victim item and minimum number of transactions | Privacy |
Domadiya and Rao (2013) | Data mining algorithms, proposed MDSRRC | Highly efficient and maintains database quality | Privacy, efficiency and quality |
Xiong et al. (2006) | Data mining algorithms, used the k-nearest neighbor classification technique based on SMC techniques | Balance in accuracy, performance, and privacy protection | Privacy and accuracy |
Singh et al. (2010) | Data mining algorithms, attempted providing a simple and efficient privacy preserving classification for cloud data | Facilitated computing local neighbors at each node in the cloud in a secure way and classifies the unseen records using weighted k-NN classification approach | Privacy |
Baotou (2010) | Data mining algorithms, proposed an effective algorithm depending on random perturbation matrix | Enhanced privacy protection and the accuracy | Privacy and accuracy |
Vaidya et al. (2008) | Data mining algorithms, developed an approach for mining vertically partitioned data | Modified and extended to a variety of data mining applications such as decision trees | Privacy and efficiency |
Kantarcıoglu and Vaidya (2003) | Data mining algorithms, discussed the use of secure logarithm and summation, where the distributed naive Bayes classifier can be determined securely | Supported the concept that few useful secure protocols facilitated the secure deployment of different types of distributed data mining algorithms | Privacy and accuracy |
Sathiyapriya and Sadasivam (2013) | Data mining algorithms, a classification of privacy preserving techniques | Demerit: optimal sanitization is proved to be NP-hard, and there is always a trade-off between privacy and accuracy | Privacy |
Yi and Zhang (2013) | Data mining algorithms, applied k-means clustering on vertically partitioned data | Demerit: did not apply any secure two-party computation algorithm | Privacy and security |
Raghuram and Gyani (2012) | Data mining algorithms, proposed an associative classification model | Accuracy is tested | Privacy |
Lin and Lo (2013) | Data mining algorithms, proposed a set of algorithms, containing EWS algorithm, ROD algorithm, SSWS algorithm and the PSWS algorithm | Delivered excellent performance with respect to scalability and execution time | Privacy, scalability and execution time |
Harnsamut and Natwichai (2008) | Data mining algorithms, proposed a novel heuristic algorithm to preserve the privacy and maintain the data quality | Efficient and highly effective | Privacy and efficiency |
Seisungsittisunti and Natwichai (2011) | Data mining algorithms, proposed an incremental polynomial-time algorithm to transform the data to meet a privacy standard | Efficient in every problem setting | Privacy and efficiency |
Giannotti et al. (2013) | Outsourced data mining, proposed a model based on background knowledge of the attack | Strong defense against an attack. Demerit: does not deal with other attacks | PPDM |
Worku et al. (2014) | Outsourced data mining, improved their method by minimizing bilinear mapping | Secure and efficient. Demerit: not wholly active | PPDM |
Arunadevi and Anuradha (2014) | Outsourced data mining, proposed an attack model based on the basic assumption | Improved the security of the system | PPDM |
Lai et al. (2014) | Outsourced data mining, proposed the first semantically secure solution for outsourcing association rule mining with data privacy | The demerit is it is non-deterministic and secure against an adversary at cloud servers | PPDM |
Kerschbaum and Julien (2008) | Outsourced data mining, proposed a searchable encryption scheme for outsourcing data analytics | Secured | PPDM |
Ying-hua et al. (2011) | Distributed, survey on the distributed privacy preserving data mining (DPPDM) | Surveyed on the DPPDM | PPDM |
Li (2013) | Distributed, designed and analyzed a symmetric-key based privacy-preserving scheme for mining support counts | Effective in detecting misbehaving nodes and increasing average throughput in the whole network | Privacy |
Dev et al. (2012) | Distributed, combined categorization, fragmentation and distribution to prevent data mining by maintaining privacy levels, splitting data into chunks and storing these chunks with appropriate cloud providers | Provided an effective way to protect privacy from mining-based attacks. Demerit: introduced performance overhead | Privacy |
Tassa (2014) | Distributed, proposed a protocol based on association rules in horizontally distributed databases | Devised an effective protocol for disparity verifications is disadvantageous | Privacy, accuracy and efficiency |
Chan and Keng (2013) | Distributed, proposed a distributed architecture for privacy preserving outsourcing of association rules mining | Computational and storage overheads are significantly reduced in such a scheme | Privacy |
Dong and Kresman (2009) | Distributed, focused on the linking between distributed data mining | It is simple to implement with least computing requirements | Privacy |
Aggarwal et al. (2005) | Distributed, have discussed the developed techniques such as services based on data encryption, causing a large overhead in query processing and proposed a new distributed framework to enable privacy-preservation for the outsourced storage of data | A new definition for privacy has been demonstrated based on hiding sets of attribute values and it also discussed how proposed decomposition approaches help to achieve privacy, and identify the best privacy-preserving decomposition technique | Privacy |
Xu and Yi (2011) | Distributed, proposed taxonomy to categorize those PPDDM protocols into important categories | High performance of these protocols | Privacy |
Inan and Saygin (2010) | Distributed, proposed a method which constructs different matrix in the horizontal distributed data mining | Provided different comparison function for either character or numerical data | Privacy |
Nanavati and Jinwala (2012) | Distributed, proposed techniques that protect privacy for global and partial cycles in a distributed data | Distinguished global cycles in a cooperative setup | Privacy |
Agrawal and Srikant (2000) | Distributed, have developed a uniform randomization method based association rule for the categorical datasets | The data reassembled is sanitized knowledge based | Privacy |
Wang et al. (2010) | Distributed, proposed an enhanced algorithm (PPFDM) | Effective and appropriate for practical application fields | Privacy |
Nguyen et al. (2012) | Distributed, proposed an enhanced scheme (EMHS) | Performance is better than MHS in specific databases | Privacy |
Om Kumar et al. (2013) | Distributed, used WEKA to predict the patterns in a single cloud and by using cloud data distributor with a secure distributed approach | An effective solution that prevents such mining attacks on cloud thus making the cloud a secure platform for service and storage | Privacy |
Mokeddem and Belbachir (2010) | Distributed, proposed model allowing the class association rules detection in a shared-nothing architecture | Created classification rules in a parallel setting | Privacy |
Ibrahim et al. (2012) | Distributed, presented a practical cryptographic method to compute the KNN classification problem | Demonstrated that accuracy of the proposed work is the same as that of a naive scheme without security | Privacy |
Patel et al. (2012) | Distributed, stated an effective algorithm to preserve privacy of distributed K-Means clustering | Faster than other algorithms and more appropriate for huge datasets in practical scenarios | Privacy |
Kumbhar and Kharat (2012) | Distributed, analyzed different methods for PPARM | Studied the methods that depended on association rules mining on distributed dataset | Privacy |
Nix et al. (2012) | Distributed, implemented two sketching protocols for the scalar (dot) product of two vectors which can be used as sub-protocols in larger data mining tasks | Accuracy and efficiency results through extensive experimentation | Privacy, accuracy and efficiency |
Keshavamurthy et al. (2013) | Distributed, showed that the Genetic Algorithm (GA) approach has two potential advantages in comparison with traditional frequent pattern mining algorithms | The fitness function of the GA plays an important role, and the convergence of the search space is directly proportional to the effectiveness of the fitness function. Demerit: the GA could produce duplicates in its successive generations | Privacy |
| | Anonymity, proposed a novel approach that fulfils utility of data requirements | Effective | Privacy and utility |
Wang et al. (2004) | Anonymity, studied data mining as an approach to data masking, known as data mining-based privacy protection | Merit: focused specifically on two key factors, quality and scalability | Privacy, quality, and scalability |
Friedman et al. (2008), Loukides and Gkoulalas-Divanis (2012) | Anonymity, presented definitions of k-anonymity | It could be used in many data mining algorithms | Privacy |
Ciriani et al. (2008) | Anonymity, presented the possible threats to K-anonymity and categorized two main approaches for merging K-anonymity in data mining | Discussed different methods that could be applied to detect K-anonymity violations | Privacy |
| | Anonymity, proposed a clustering-based algorithm to produce a utility-friendly anonymized version of micro data | Utility is improved by their approach | Privacy and utility |
| | Anonymity, analyzed existing K-anonymity model and its applications | Analyzed current K-anonymity model | Privacy |
| | Anonymity, studied K-anonymity model | Surveyed K-anonymity model | Privacy |
| | Anonymity, employed a version of the chase, called standard chase | Provided a stronger privacy model for the proposed method and can be valuable | Privacy |
| | Anonymity, proposed a numerical method to mine maximal frequent patterns with privacy preserving capability | An efficient data transformation technique, a novel encoded and compressed lattice structure, and MFPM algorithm | Privacy |
| | Anonymity, proposed a rule-based privacy model that allows data publishers to express fine-grained protection requirements for both identity and sensitive information disclosure | Outperformed the state-of-the-art in terms of retaining data utility, while achieving good protection | Privacy, utility and scalability |
| | Anonymity, studied K-anonymity as an approach to protect micro data related to public or semi-public sectors from linking attacks | Proposed a novel approach | Privacy |
| | Anonymity, proposed new clustering algorithms to achieve multi-relational anonymity | Provided utility of data and efficiency | Utility, effectiveness and efficiency |
| | Anonymity, proposed a Distributed k-support Noise Taxonomy tree algorithm, abbreviated as DKNT | Achieved good protection and better computation efficiency, as compared to the computation efficiency on single machine | Privacy and efficiency |
| | Anonymity, introduced a pseudo taxonomy tree and had the third party mine the generalized frequent item-sets instead | Achieved very good privacy protection with moderate storage overhead | Privacy |
| | Anonymity, analyzed and compared existing K-anonymity models and their applications | Enhanced and improved K-anonymity | Privacy |
| | Anonymity, proposed a novel method named kactus | Accuracy is better than other methods based on K-anonymity | Privacy and accuracy |
| | Anonymity, introduced a new definition of K-anonymity for personal sequential data that provides an effective privacy protection model | Results are extremely interesting in the case of dense datasets | Privacy |
| | Anonymity, hybrid generalizations with data relocation | Increased the utility of data | Privacy and utility |
| | Anonymity, proposed a hybrid approach combining Top-Down Specialization and Bottom-Up Generalization | Improved the scalability and efficiency of TDS | Privacy and scalability |
Zhang et al. (2014a) | Anonymity, proposed a highly scalable two-phase TDS approach using MapReduce on cloud | Scalability and efficiency of TDS are improved significantly over existing approaches | Privacy and scalability |
| | Anonymity, proposed a method that depends on an efficient quasi-identifier index | Protected privacy when new data is added | Privacy and efficiency |
Nergiz and Gök (2014) | Anonymity, Hybrid generalizations | Ensured the utility of data | Privacy and utility |
| | Anonymity, presented a distributed anonymization protocol for privacy-preserving data publishing from multiple data providers in a cloud system | Performed a personalized anonymization to satisfy every data provider’s requirements; the union forms a global anonymization to be published | Privacy |
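
Many of the anonymity rows above rely on K-anonymity, under which every record must be indistinguishable from at least k-1 others on its quasi-identifier attributes. As a minimal sketch of that check only (not the method of any work listed in the table), the following Python groups records by their quasi-identifier values and tests the smallest group size. The function names, the toy records, and the attribute names `age`, `zip`, and `diagnosis` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(count >= k for count in sizes.values())


# Hypothetical, already generalized toy records (illustrative assumption).
records = [
    {"age": "30-39", "zip": "476**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "476**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "479**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ["age", "zip"]

if __name__ == "__main__":
    for k in (2, 3):
        print(k, is_k_anonymous(records, QUASI_IDENTIFIERS, k))
```

In this toy data the two equivalence classes have sizes 2 and 3, so the records satisfy the check for k = 2 but not for k = 3. Generalization and suppression approaches such as those surveyed above aim to transform the data until a check of this kind passes while losing as little utility as possible.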