A comprehensive review on privacy preserving data mining

Aldeen, Yousra Abdul Alsahib S.; Salleh, Mazleena; Razzaque, Mohammad Abdur

doi:10.1186/s40064-015-1481-x

SpringerPlus

Table 2 Relevant literature on PPDM in terms of merits and demerits

References	Method category (PPDM, PPDM based on data distortion, data mining algorithms, outsourced data mining, distributed, and anonymity methods) and contribution	Merits and demerits	Parameters
Matwin (2013)	Surveyed the existing privacy-preserving data mining methods	Analyzed the methods	PPDM
Vatsalan et al. (2013)	Presented methods that permitted the linking of databases between organizations and preserved the privacy of these data	Presented taxonomy of PPRL techniques	PPDM
Qi and Zong (2012)	Stated methods of data mining for privacy protection	Classified PPDM methods	PPDM
Raju et al. (2009)	Applied homomorphic encryption to a multiplication protocol	Potential impact on many applications	PPDM
Malina and Hajny (2013), Sachan et al. (2013)	Analyzed current privacy-preserving solutions for cloud services and outlined their own solution based on advanced cryptographic components	Reported experimental results and compared performance with related solutions	PPDM
Mukkamala and Ashok (2011)	Compared a set of fuzzy-based mapping methods	Combined multiple values of a data item into a single value	PPDM
Kamakshi (2012)	Distortion method; a novel idea to identify sensitive attributes dynamically	The data is modified while retaining its original properties	Privacy
Zhang et al. (2012a)	Distortion method, proposed HPNGS	Reduced the noise requests over	Privacy and utility
Zhang et al. (2012b)	Distortion method, proposed a novel APNGS	Improved the effectiveness of privacy protection by noise obfuscation in terms of association probabilities. Demerit: extra cost compared with existing representative strategies	Privacy
Li et al. (2009a)	Distortion method, proposed anonymous perturbation method	Low costs with a high strength	Privacy
Kamakshi and Babu (2010)	Distortion method, proposed a model comprising three parts: data centers, clients, and a database	Customers and their site databases could interchange roles	Privacy
Islam and Brankovic (2011)	Distortion method, introduced a framework that incorporates several novel techniques to perturb all attributes of a data set	Effective in preserving original patterns in a perturbed data set	Privacy
Wang and Lee (2008)	Distortion method, proposed an approach to avoid Forward-Inference Attacks, generated by the sanitization process	Restricted Forward-Inference Attacks	Privacy
Shrivastava et al. (2011)	Data mining algorithms, Proposed an improved distortion technique for privacy preserving frequent item-set mining	Enhanced the performance of the algorithm by reducing the disk access time	Privacy and performance
Vijayarani et al. (2010a)	Data mining algorithms, introduced various communities	Focused on the importance of association rules	Privacy
Aggarwal and Yu (2008)	Stated that support and confidence are the two significant measures in association rule mining	Explained the basic elements of association rules	PPDM
Belwal et al. (2013)	Data mining algorithms, proposed an approach based on reducing the support and confidence of sensitive rules	Hid any desired sensitive association rule without side effects. Demerit: only rules with a single sensitive item on the left-hand side can be hidden	PPDM
Jain et al. (2011)	Data mining algorithms, proposed a new algorithm that increases the support of the left-hand-side item and decreases the support of the right-hand-side item of the rule to be hidden	Made minimal modifications to the data entries to hide a set of rules, with less CPU time than previous work	Privacy
Naeem et al. (2010)	Data mining algorithms, proposed an architecture which hides the restricted association rules with the complete removal of the known side effects like the generation of unwanted, non-genuine association rules while yielding no hiding failure	Used other standard statistical measures instead of conventional framework of support and confidence to generate association rules	Privacy
Li and Liu (2009)	Data mining algorithms, proposed DDIL based on data disturbance and inquiry limitation	Effective, with good privacy and accuracy. Demerit: restriction imposed by random parameters	Privacy
Weng et al. (2008)	Data mining algorithms, proposed the Fast Hiding Sensitive Association Rules (FHSAR) algorithm	Hid sensitive association rules with limited side effects	Privacy
Dehkordi et al. (2009)	Data mining algorithms, proposed a method for hiding sensitive association rules based on genetic algorithms	Offered security while preserving utility	Security and utility
Gkoulalas-Divanis and Verykios (2009)	Data mining algorithms, proposed a novel approach that offers the best solution for hiding sensitive frequent item sets	Provided an effective solution for hiding sensitive frequent item sets	Privacy and efficiency
Li et al. (2009b)	Data mining algorithms, introduced a new algorithm for sanitizing a transactional database	Demerit: selection of victim items with no effect on non-sensitive patterns	Privacy
Kasthuri and Meyyappan (2013)	Data mining algorithms, proposed a new method to detect sensitive items for hiding sensitive association rules	Found the frequent item sets and generated the association rules	Privacy
Quoc et al. (2013)	Data mining algorithms, proposed a heuristic algorithm to hide a set of sensitive association rules using the distortion technique	Specified the victim item and minimum number of transactions	Privacy
Domadiya and Rao (2013)	Data mining algorithms, proposed MDSRRC	Highly efficient and maintains database quality	Privacy, efficiency and quality
Xiong et al. (2006)	Data mining algorithms, used a k-nearest-neighbor classification technique based on secure multi-party computation (SMC) techniques	Balanced accuracy, performance, and privacy protection	Privacy and accuracy
Singh et al. (2010)	Data mining algorithms, provided a simple and efficient privacy-preserving classification method for cloud data	Computed local neighbors at each node in the cloud securely and classified unseen records using a weighted k-NN classification approach	Privacy
Baotou (2010)	Data mining algorithms, proposed an effective algorithm based on a random perturbation matrix	Enhanced privacy protection and accuracy	Privacy and accuracy
Vaidya et al. (2008)	Data mining algorithms, developed an approach for mining vertically partitioned data	Can be modified and extended to a variety of data mining applications, such as decision trees	Privacy and efficiency
Kantarcıoglu and Vaidya (2003)	Data mining algorithms, discussed the use of secure logarithm and secure summation, with which a distributed naive Bayes classifier can be computed securely	Supported the idea that a few useful secure protocols enable the secure deployment of different types of distributed data mining algorithms	Privacy and accuracy
Sathiyapriya and Sadasivam (2013)	Data mining algorithms, presented a classification of privacy-preserving techniques	Demerit: optimal sanitization is proved to be NP-hard, and there is always a trade-off between privacy and accuracy	Privacy
Yi and Zhang (2013)	Data mining algorithms, applied k-means clustering to vertically partitioned data	Demerit: did not apply any secure two-party computation algorithm	Privacy and security
Raghuram and Gyani (2012)	Data mining algorithms, proposed an associative classification model	Accuracy was tested	Privacy
Lin and Lo (2013)	Data mining algorithms, proposed a set of algorithms comprising the EWS, ROD, SSWS, and PSWS algorithms	Delivered excellent performance in terms of scalability and execution time	Privacy, scalability and execution time
Harnsamut and Natwichai (2008)	Data mining algorithms, proposed a novel heuristic algorithm to preserve privacy and maintain data quality	Efficient and highly effective	Privacy and efficiency
Seisungsittisunti and Natwichai (2011)	Data mining algorithms, proposed an incremental polynomial-time algorithm to transform the data to meet a privacy standard	Efficient in every problem setting	Privacy and efficiency
Giannotti et al. (2013)	Outsourced data mining, proposed a model based on the attacker's background knowledge	Strong defense against the modeled attack. Demerit: does not deal with other attacks	PPDM
Worku et al. (2014)	Outsourced data mining, improved their method by minimizing bilinear mapping	Secure and efficient. Demerit: not wholly active	PPDM
Arunadevi and Anuradha (2014)	Outsourced data mining, proposed an attack model based on the basic assumption	Improved the security of the system	PPDM
Lai et al. (2014)	Outsourced data mining, proposed the first semantically secure solution for outsourcing association rule mining with data privacy	Secure against an adversary at the cloud servers. Demerit: the scheme is non-deterministic	PPDM
Kerschbaum and Julien (2008)	Outsourced data mining, proposed a searchable encryption scheme for outsourcing data analytics	Secure	PPDM
Ying-hua et al. (2011)	Distributed, surveyed distributed privacy-preserving data mining (DPPDM)	Surveyed DPPDM	PPDM
Li (2013)	Distributed, designed and analyzed a symmetric-key-based privacy-preserving scheme for mining support counts	Effective in detecting misbehaving nodes and increasing average throughput across the whole network	Privacy
Dev et al. (2012)	Distributed, combined categorization, fragmentation, and distribution to prevent data mining while maintaining privacy levels, splitting data into chunks and storing the chunks with appropriate cloud providers	Provided an effective way to protect privacy from mining-based attacks. Demerit: introduced performance overhead	Privacy
Tassa (2014)	Distributed, proposed a protocol for mining association rules in horizontally distributed databases	Devised an effective protocol for disparity verifications is disadvantageous	Privacy, accuracy and efficiency
Chan and Keng (2013)	Distributed, proposed a distributed architecture for privacy preserving outsourcing of association rules mining	Computational and storage overheads are significantly reduced in such a scheme	Privacy
Dong and Kresman (2009)	Distributed, focused on linking in distributed data mining	Simple to implement, with minimal computing requirements	Privacy
Aggarwal et al. (2005)	Distributed, discussed existing techniques such as services based on data encryption, which cause a large overhead in query processing, and proposed a new distributed framework to enable privacy preservation for the outsourced storage of data	Demonstrated a new definition of privacy based on hiding sets of attribute values, discussed how the proposed decomposition approaches help to achieve privacy, and identified the best privacy-preserving decomposition technique	Privacy
Xu and Yi (2011)	Distributed, proposed a taxonomy that categorizes PPDDM protocols into important categories	High performance of these protocols	Privacy
Inan and Saygin (2010)	Distributed, proposed a method that constructs a dissimilarity matrix in horizontally distributed data mining	Provided different comparison functions for character or numerical data	Privacy
Nanavati and Jinwala (2012)	Distributed, proposed techniques that protect privacy for global and partial cycles in distributed data	Distinguished global cycles in a cooperative setup	Privacy
Agrawal and Srikant (2000)	Distributed, developed a uniform randomization method for association rule mining on categorical datasets	The data reassembled is sanitized knowledge based	Privacy
Wang et al. (2010)	Distributed, proposed an enhanced algorithm (PPFDM)	Effective and appropriate for practical application fields	Privacy
Nguyen et al. (2012)	Distributed, proposed an enhanced scheme (EMHS)	Performance is better than MHS on specific databases	Privacy
Om Kumar et al. (2013)	Distributed, used WEKA to predict the patterns in a single cloud and used a cloud data distributor with a secure distributed approach	An effective solution that prevents such mining attacks on the cloud, thus making the cloud a secure platform for service and storage	Privacy
Mokeddem and Belbachir (2010)	Distributed, proposed a model that allows class association rules to be detected in a shared-nothing architecture	Created classification rules in a parallel setting	Privacy
Ibrahim et al. (2012)	Distributed, presented a practical cryptographic method to solve the kNN classification problem	Demonstrated that the accuracy of the proposed work is the same as that of a naive scheme without security	Privacy
Patel et al. (2012)	Distributed, presented an effective algorithm to preserve the privacy of distributed K-Means clustering	Faster than other algorithms and more appropriate for huge datasets in practical scenarios	Privacy
Kumbhar and Kharat (2012)	Distributed, analyzed different methods for PPARM	Studied methods based on association rule mining on distributed datasets	Privacy
Nix et al. (2012)	Distributed, implemented two sketching protocols for the scalar (dot) product of two vectors that can be used as sub-protocols in larger data mining tasks	Demonstrated accuracy and efficiency through extensive experimentation	Privacy, accuracy and efficiency
Keshavamurthy et al. (2013)	Distributed, showed that a Genetic Algorithm (GA) approach has two potential advantages over traditional frequent pattern mining algorithms	The GA's fitness function plays an important role: convergence of the search space is directly proportional to the effectiveness of the fitness function. Demerit: the GA could produce duplicates in successive generations	Privacy
Loukides et al. (2012), Machanavajjhala et al. (2007)	Anonymity, proposed a novel approach that fulfils data utility requirements	Effective	Privacy and utility
Wang et al. (2004)	Anonymity, studied data mining as an approach to data masking, known as data mining-based privacy protection	Advantage: focuses specifically on two key factors, quality and scalability	Privacy, quality, and scalability
Friedman et al. (2008), Loukides and Gkoulalas-divanis (2012)	Anonymity, presented definitions of k-anonymity	It could be used in many data mining algorithms	Privacy
Ciriani et al. (2008)	Anonymity, presented the possible threats to K-anonymity and categorized two main approaches for combining K-anonymity with data mining	Discussed different methods that could be applied to detect K-anonymity violations	Privacy
He et al. (2011), Friedman et al. (2008)	Anonymity, proposed a clustering-based algorithm to produce a utility-friendly anonymized version of microdata	Utility is improved by their approach	Privacy and utility
Patil and Patankar (2013), He et al. (2011)	Anonymity, analyzed existing K-anonymity model and its applications	Analyzed current K-anonymity model	Privacy
Zhu and Chen (2012), Patil and Patankar (2013)	Anonymity, studied K-anonymity model	Surveyed K-anonymity model	Privacy
Soodejani et al. (2012), Zhu and Chen (2012)	Anonymity, employed a version of the chase, called the standard chase	Provided a stronger privacy model for the proposed method, which can be valuable	Privacy
Karim et al. (2012), Soodejani et al. (2012)	Anonymity, proposed a numerical method to mine maximal frequent patterns with privacy-preserving capability	Contributed an efficient data transformation technique, a novel encoded and compressed lattice structure, and the MFPM algorithm	Privacy
Loukides et al. (2012), Karim et al. (2012)	Anonymity, proposed a rule-based privacy model that allows data publishers to express fine-grained protection requirements for both identity and sensitive information disclosure	Outperformed the state-of-the-art in terms of retaining data utility, while achieving good protection	Privacy, utility and scalability
Vijayarani et al. (2010a, b), Loukides et al. (2012)	K-anonymity has been studied as an interesting approach to protect microdata related to public or semi-public sectors from linking attacks	Proposed a novel approach	Privacy
Nergiz et al. (2009), Xu and Yi (2011)	Anonymity, proposed new clustering algorithms to achieve multi-relational anonymity	Provided data utility and efficiency	Utility, effectiveness and efficiency
Tai et al. (2013), Vijayarani et al. (2010b)	Anonymity, proposed a Distributed k-support Noise Taxonomy tree algorithm, abbreviated as DKNT	Achieved good protection and better computation efficiency compared with computation on a single machine	Privacy and efficiency
Tai et al. (2010, 2013)	Anonymity, introduced a pseudo taxonomy tree and had the third party mine the generalized frequent item-sets instead	Achieved very good privacy protection with moderate storage overhead	Privacy
Pan et al. (2012), Tai et al. (2010)	Anonymity, analyzed and compared existing K-anonymity models and their applications	Enhanced and improved K-anonymity	Privacy
Deivanai et al. (2011), Pan et al. (2012)	Anonymity, proposed a novel method named kactus	Accuracy is better than other methods based on K-anonymity	Privacy and accuracy
Monreale et al. (2014), Deivanai et al. (2011)	Anonymity, a new definition of K-anonymity for personal sequential data which provides an effective privacy protection model is introduced	Results are extremely interesting in the case of dense datasets	Privacy
Nergiz et al. (2013), Monreale et al. (2014)	Anonymity, proposed hybrid generalizations with data relocation	Increased the utility of data	Privacy and utility
Zhang et al. (2013a, 2014a), Nergiz et al. (2013)	Anonymity, proposed hybrid approach by combining Top-Down Specialization and Bottom-Up Generalization	Improved the scalability and efficiency of TDS	Privacy and scalability
Zhang et al. (2014a)	Anonymity, proposed a highly scalable two-phase TDS approach using MapReduce on cloud	Scalability and efficiency of TDS are improved significantly over existing approaches	Privacy and scalability
Zhang et al. (2013a, b), Zhang et al. (2014a)	Anonymity, proposed a method based on an efficient quasi-identifier index	Protected privacy when new data is added	Privacy and efficiency
Nergiz and Gök (2014)	Anonymity, Hybrid generalizations	Ensured the utility of data	Privacy and utility
Ding et al. (2013), Zhang et al. (2013c)	Anonymity, have presented a distributed anonymization protocol for privacy-preserving data publishing from multiple data providers in a cloud system	Performed a personalized anonymization to satisfy every data provider’s requirements and the union forms a global anonymization to be published	Privacy
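
Several entries in the table (Aggarwal and Yu (2008), Belwal et al. (2013), Jain et al. (2011)) hinge on the support and confidence of an association rule X ⇒ Y: supp(X ⇒ Y) is the fraction of transactions containing X ∪ Y, and conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). The following is a minimal Python sketch of both measures. The helper `is_hidden`, the toy transactions, and the thresholds are illustrative assumptions, and the single-item deletion is a generic support-reduction step, not the algorithm of any cited paper.

```python
from typing import FrozenSet, List, Set

Transaction = FrozenSet[str]


def support(itemset: Set[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Transaction]) -> float:
    """conf(lhs => rhs) = supp(lhs | rhs) / supp(lhs); 0.0 if lhs never occurs."""
    base = support(lhs, transactions)
    if base == 0.0:
        return 0.0
    return support(lhs | rhs, transactions) / base


def is_hidden(lhs: Set[str], rhs: Set[str], transactions: List[Transaction],
              min_supp: float, min_conf: float) -> bool:
    """Hypothetical hiding criterion: the rule is no longer mined once its
    support or its confidence drops below the mining thresholds."""
    return (support(lhs | rhs, transactions) < min_supp
            or confidence(lhs, rhs, transactions) < min_conf)


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
    print(support({"a", "b"}, db))       # 0.5
    print(confidence({"a"}, {"b"}, db))  # about 0.667
    # Sanitize: delete the victim item "b" from the first transaction only.
    sanitized = [frozenset("ac")] + db[1:]
    print(is_hidden({"a"}, {"b"}, db, 0.4, 0.6))         # False
    print(is_hidden({"a"}, {"b"}, sanitized, 0.4, 0.6))  # True
```

Deleting one occurrence of "b" drops supp({a, b}) from 0.5 to 0.25, below the assumed 0.4 threshold, so the rule a ⇒ b is no longer mined; heuristics in this family differ mainly in how they pick the victim item and transaction to limit side effects on non-sensitive rules.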
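
The anonymity rows (for example Friedman et al. (2008), Ciriani et al. (2008), and the K-anonymity surveys) rest on one property: every combination of quasi-identifier values in the released table must be shared by at least k records. A minimal Python check of that property follows. The record layout, the choice of quasi-identifiers, and the ZIP-masking helper `generalize_zip` are hypothetical. The sketch does not implement the clustering, TDS, or hybrid-generalization algorithms listed in the table.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def equivalence_classes(records: List[Record], qis: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qis) for r in records)


def is_k_anonymous(records: List[Record], qis: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    return all(n >= k for n in equivalence_classes(records, qis).values())


def generalize_zip(record: Record, keep: int) -> Record:
    """Hypothetical generalization step: keep the first `keep` ZIP digits, mask the rest."""
    z = record["zip"]
    return {**record, "zip": z[:keep] + "*" * (len(z) - keep)}


if __name__ == "__main__":
    table = [
        {"zip": "47677", "age": "2*", "disease": "flu"},
        {"zip": "47602", "age": "2*", "disease": "cancer"},
        {"zip": "47678", "age": "2*", "disease": "flu"},
        {"zip": "47905", "age": "4*", "disease": "flu"},
        {"zip": "47909", "age": "4*", "disease": "hepatitis"},
        {"zip": "47906", "age": "4*", "disease": "cancer"},
    ]
    qis = ("zip", "age")  # "disease" is the sensitive attribute, not a quasi-identifier
    print(is_k_anonymous(table, qis, 2))     # False: every ZIP code is unique
    released = [generalize_zip(r, 3) for r in table]
    print(is_k_anonymous(released, qis, 3))  # True: two equivalence classes of size 3
```

Generalizing the ZIP code to its first three digits merges the six records into two equivalence classes of three, so the released table is 3-anonymous with respect to (zip, age).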
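
The distortion-based rows (Kamakshi (2012), Islam and Brankovic (2011), Baotou (2010), among others) release modified values instead of the originals. As a generic illustration of this family, not of HPNGS, APNGS, or the random perturbation matrix method, the Python sketch below adds independent zero-mean Gaussian noise with a known standard deviation `sigma` (an assumption) and shows that aggregates remain estimable: the mean of the released data is unbiased, and the original variance is recovered by subtracting sigma².

```python
import random
import statistics
from typing import List


def perturb(values: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Additive distortion: release x_i + r_i with r_i ~ N(0, sigma^2), i.i.d."""
    return [x + rng.gauss(0.0, sigma) for x in values]


def estimate_mean(released: List[float]) -> float:
    """The noise has zero mean, so the sample mean of the released data is unbiased."""
    return statistics.fmean(released)


def estimate_variance(released: List[float], sigma: float) -> float:
    """Var(X + R) = Var(X) + sigma^2 for independent noise; subtract the known sigma^2."""
    return max(statistics.variance(released) - sigma ** 2, 0.0)


if __name__ == "__main__":
    data_rng, noise_rng = random.Random(7), random.Random(8)
    ages = [data_rng.uniform(20, 60) for _ in range(20000)]
    released = perturb(ages, sigma=10.0, rng=noise_rng)
    # Compare true aggregates with those estimated from the distorted release.
    print(round(statistics.fmean(ages), 2), round(estimate_mean(released), 2))
    print(round(statistics.variance(ages), 1), round(estimate_variance(released, 10.0), 1))
```

Larger `sigma` hides individual values better but widens the error of the aggregate estimates, which is the privacy/accuracy trade-off that several rows in the table point to.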
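
Kantarcıoglu and Vaidya (2003) build a distributed naive Bayes classifier from secure primitives such as secure summation. The Python sketch below simulates the textbook ring-based secure sum under assumptions stated here and in the code: semi-honest, non-colluding sites, non-negative integer inputs, and a public modulus larger than any possible total. The first site masks its value with a random number, each site adds its own value modulo the modulus, and the first site removes the mask at the end, so no intermediate total reveals an individual input. The function name `secure_sum` and the sequential, single-process simulation are hypothetical, not the cited authors' code.

```python
import random
from typing import List


def secure_sum(local_values: List[int], modulus: int, rng: random.Random) -> int:
    """Simulate ring-based secure summation among len(local_values) sites.

    Assumes semi-honest, non-colluding sites, non-negative inputs, and a public
    modulus that exceeds any possible total.
    """
    if not local_values:
        return 0
    if any(v < 0 for v in local_values):
        raise ValueError("inputs must be non-negative")
    mask = rng.randrange(modulus)                 # random mask known only to site 0
    running = (mask + local_values[0]) % modulus  # site 0 sends this to site 1
    for value in local_values[1:]:
        running = (running + value) % modulus     # site i adds its value and passes it on
    return (running - mask) % modulus             # back at site 0: remove the mask


if __name__ == "__main__":
    # e.g. per-site counts of records with class = "yes" for a distributed naive Bayes model
    counts = [12, 7, 30]
    print(secure_sum(counts, modulus=10**6, rng=random.Random(42)))  # 49
```

Each site only ever sees a value that is uniformly distributed modulo `modulus`, which is why the scheme is considered safe for semi-honest parties; two colluding neighbors of a site could, however, recover that site's input.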
