
Privacy-preserving data publishing: A survey of recent developments

Published: 23 June 2010

Abstract

The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, various parties increasingly need to exchange and publish data. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data as-is would violate individual privacy. Current data-publishing practice relies mainly on policies and guidelines that restrict what types of data can be published, and on agreements governing the use of published data. This approach alone may lead either to excessive data distortion or to insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. PPDP has recently received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
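The abstract does not commit to a single technique, but the tension it describes (too much distortion versus too little protection) can be made concrete with k-anonymity, one privacy model widely studied in PPDP: a published table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The Python below is a minimal sketch under stated assumptions, not the survey's method: the toy table, its attributes (`age`, `zip`, `disease`), the choice of `age` and `zip` as quasi-identifiers, and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]

def generalize_age(age: int, width: int) -> str:
    """Replace an exact age with a range of the given width, e.g. 34 -> '30-39' for width 10."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_zip(zipcode: str, keep: int) -> str:
    """Keep the first `keep` digits of a ZIP code and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def anonymize(records: List[Record], age_width: int, zip_keep: int) -> List[Record]:
    """Generalize the (hypothetical) quasi-identifiers; the sensitive attribute is left intact."""
    return [
        {
            "age": generalize_age(r["age"], age_width),
            "zip": generalize_zip(r["zip"], zip_keep),
            "disease": r["disease"],
        }
        for r in records
    ]

def k_of(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy microdata; every (age, zip) pair is unique, so k = 1.
raw: List[Record] = [
    {"age": 34, "zip": "47677", "disease": "flu"},
    {"age": 36, "zip": "47602", "disease": "hepatitis"},
    {"age": 38, "zip": "47678", "disease": "flu"},
    {"age": 52, "zip": "47905", "disease": "cancer"},
    {"age": 57, "zip": "47909", "disease": "flu"},
    {"age": 55, "zip": "47906", "disease": "bronchitis"},
]
QI = ("age", "zip")

mild = anonymize(raw, age_width=5, zip_keep=4)     # little distortion, weak protection
coarse = anonymize(raw, age_width=10, zip_keep=3)  # more distortion, stronger protection

print(k_of(raw, QI), k_of(mild, QI), k_of(coarse, QI))  # 1 1 3
```

In this toy run, light generalization still leaves some records unique (k = 1), while coarser ranges and shorter ZIP prefixes reach k = 3 at the cost of detail, which is the utility/privacy trade-off the abstract refers to. Real PPDP algorithms search over such generalizations systematically rather than fixing them by hand.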

References

  1. Abul, O., Bonchi, F., and Nanni, M. 2008. Never walk alone: Uncertainty for anonymity in moving objects databases. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE). 376--385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Adam, N. R. and Wortman, J. C. 1989. Security control methods for statistical databases. ACM Comput. Surv. 21, 4, 515--556. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Aggarwal, C. C. and Yu, P. S. 2008a. A framework for condensation-based anonymization of string data. Data Min. Knowl. Discov. 13, 3, 251--275. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Aggarwal, C. C. and Yu, P. S. 2008b. On static and dynamic methods for condensation-based privacy-preserving data mining. ACM Trans. Datab. Syst. 33, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Aggarwal, C. C. and Yu, P. S. 2008c. Privacy-Preserving Data Mining: Models and Algorithms. Springer, Berlin. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Aggarwal, C. C. and Yu, P. S. 2007. On privacy-preservation of text and sparse binary data with sketches. In Proceedings of the SIAM International Conference on Data Mining (SDM).Google ScholarGoogle Scholar
  7. Aggarwal, C. C., Pei, J., and Zhang, B. 2006. On privacy preservation against adversarial data mining. In Proceedings of the 12th ACM SIGKDD. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Aggarwal, C. C. 2005. On k-anonymity and the curse of dimensionality. In Proceedings of the 31st Conference on Very Large Data Bases (VLDB). 901--909. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2006. Achieving anonymity via clustering. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART PODS Conference. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2005. Anonymizing tables. In Proceedings of the 10th International Conference on Database Theory (ICDT). 246--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Agrawal, D. and Aggarwal, C. C. 2001. On the design and quantification of privacy preserving data-mining algorithms. In Proceedings of the 20th ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 247--255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Agrawal, R. and Srikant, R. 2000. Privacy preserving data mining. In Proceedings of the ACM SIGMOD. ACM, New York, 439--450. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Agrawal, S. and Haritsa, J. R. 2005. A framework for high-accuracy privacy-preserving mining. In Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE). 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 137--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Atzori, M., Bonchi, F., Giannotti, F., and Pedreschi, D. 2008. Anonymity preserving pattern discovery. Int. J. Very Large Data Bases 17, 4, 703--727. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Atzori, M., Bonchi, F., Giannotti, F., Pedreschi, D., and Abul, O. 2007. Privacy-aware knowledge discovery from location data. In Proceedings of the International Workshop on Privacy-Aware Location-based Mobile Services (PALMS). 283--287. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Barak, B., Chaudhuri, K., Dwork, C., Kale, S., Mcsherry, F., and Talwar, K. 2007. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the 26th ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 273--282. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Barbaro, M. and Zeller, T. 2006. A face is exposed for AOL searcher no. 4417749. New York Times (Aug. 9).Google ScholarGoogle Scholar
  19. Bayardo, R. J. and Agrawal, R. 2005. Data privacy through optimal k-anonymization. In Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE). 217--228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Beinat, E. 2001. Privacy and location-based: Stating the policies clearly. GeoInformatics.Google ScholarGoogle Scholar
  21. Blum, A., Ligett, K., and Roth, A. 2008. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC). ACM, New York, 609--618. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The sulq framework. In Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 128--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Brand, R. 2002. Microdata protection through noise addition. In Inference Control in Statistical Databases, From Theory to Practice, London, 97--116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Bu, Y., Fu, A. W. C., Wong, R. C. W., Chen, L., and Li, J. 2008. Privacy preserving serial data publishing by role composition. Proc. VLDB Endowment 1, 1, 845--856. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Burnett, L., Barlow-Stewart, K., Pros, A., and Aizenberg, H. 2003. The gene trustee: A universal identification system that ensures privacy and confidentiality for human genetic databases. J. Law and Medicine 10, 506--513.Google ScholarGoogle Scholar
  26. Byun, J.-W., Sohn, Y., Bertino, E., and Li, N. 2006. Secure anonymization for incremental datasets. In Proceedings of the VLDB Workshop on Secure Data Management (SDM). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Carlisle, D. M., Rodrian, M. L., and Diamond, C. L. 2007. California inpatient data reporting manual, medical information reporting for California (5th Ed), Tech. rep., Office of Statewide Health Planning and Development.Google ScholarGoogle Scholar
  28. Chakaravarthy, V. T., Gupta, H., Roy, P., and Mohania, M. 2008. Efficient techniques for documents sanitization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Chaum, D. 1981. Untraceable electronic mail, return addresses, and digital pseudonyms. Comm. ACM 24, 2, 84--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Chawla, S., Dwork, C., McSherry, F., Smith, A., and Wee, H. 2005. Toward privacy in public databases. In Proceedings of the Theory of Cryptography Conference (TCC). 363--385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Chawla, S., Dwork, C., McSherry, F., and Talwar, K. 2005. On privacy-preserving histograms. In Proceedings of the Uncertainty in Artificial Intelligence Coference (UAI).Google ScholarGoogle Scholar
  32. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., and Zhu, M. Y. 2002. Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4, 2, 28--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Clifton, C. 2000. Using sample size to limit exposure to data mining. J. Comput. Security 8, 4, 281--307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Cox, L. H. 1980. Suppression methodology and statistical disclosure control. J. Am. Statistical Assoc. 75, 370, 377--385.Google ScholarGoogle ScholarCross RefCross Ref
  35. Dalenius, T. 1986. Finding a needle in a haystack - or identifying anonymous census record. J. Official Statistics 2, 3, 329--336.Google ScholarGoogle Scholar
  36. Dalenius, T. 1977. Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, 429--444.Google ScholarGoogle Scholar
  37. Denning, D. E. 1985. Commutative filters for reducing inference threats in multilevel database systems. In Proceedings of the IEEE Symposium on Security and Privacy.Google ScholarGoogle ScholarCross RefCross Ref
  38. Deutsch, A. and Papakonstantinou, Y. 2005. Privacy in database publishing. In Proceedings of the 10th International Conference on Database Theory (ICDT). 230--245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Dinur, I. and Nissim, K. 2003. Revealing information while preserving privacy. In Proceedings of the 22nd ACM Symposium on Principles of Database Systems (PODS). 202--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Domingo-Ferrer, J. 2008. Privacy-Preserving Data Mining: Models and Algorithms. Springer, Berlin, 53--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Domingo-Ferrer, J. and Torra, V. 2008. A critique of k-anonymity and some of its enhancements. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES). 990--993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Domingo-Ferrer, J. and Torra, V. 2002. Theory and Practical Applications for Statistical Agencies. North-Holland, Amsterdam, 113--134.Google ScholarGoogle Scholar
  43. Domingo-Ferrer, J. 2001. Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, 91--11.Google ScholarGoogle Scholar
  44. Du, W. and Zhan, Z. 2003. Using randomized response techniques for privacy-preserving data mining. In Proceedings of the 9th ACM SIGKDD. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Duncan, G. and Fienberg, S. 1998. Obtaining information while preserving privacy: A Markov perturbation method for tabular data. In Statistical Data Protection, 351--362.Google ScholarGoogle Scholar
  46. Dwork, C. 2008. Differential privacy: A survey of results. In Proceedings of the 5th International Conference on Theory and Applications of Models of Computation (TAMC). 1--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Dwork, C. 2007. Ask a better question, get a better answer: A new approach to private data analysis. In Proceedings of the International Conference on Database Theory (ICDT). 18--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Dwork, C. 2006. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP). 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Dwork, C., McSherry, F., Nissim, K., and Smith, A. 2006. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC). 265--284. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Dwork, C. and Nissim, K. 2004. Privacy-preserving data mining on vertically partitioned databases. In Proceedings of the 24th International Cryptology Conference (CRYPTO). 528--544.Google ScholarGoogle Scholar
  51. Emam, K. E. 2006. Data anonymization practices in clinical research: A descriptive study. Tech. rep. Access to Information and Privacy Division of Health in Canada.Google ScholarGoogle Scholar
  52. Evfimievski, A., Fagin, R., and Woodruff, D. P. 2008. Epistemic privacy. In Proceedings of the 27th ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 171--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Evfimievski, A., Srikant, R., Agrawal, R., and Gehrke, J. 2002. Privacy preserving mining of association rules. In Proceedings of the 8th ACM SIGKDD. ACM, New York, 217--228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Farkas, C. and Jajodia, S. 2003. The inference problem: A survey. ACM SIGKDD Explor. Newsl. 4, 2, 6--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Fuller, W. A. 1993. Masking procedures for microdata disclosure limitation. Official Statistics 9, 2, 383--406.Google ScholarGoogle Scholar
  56. Fung, B. C. M., Cao, M., Desai, B. C., and Xu, H. 2009. Privacy protection for RFID data. In Proceedings of the 24th ACM SIGAPP Symposium on Applied Computing (SAC). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Fung, B. C. M., Wang, K., Wang, L., and Hung, P. C. K. 2009. Privacy-preserving data publishing for cluster analysis. Data Knowl. Engin. 68, 6, 552--575. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Fung, B. C. M., Wang, K., Fu, A. W. C., and Pei, J. 2008. Anonymity for continuous data publishing. In Proceedings of the 11th International Conference on Extending Database Technology (EDBT). ACM, New York, 264--275. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Fung, B. C. M., Wang, K., Wang, L., and Debbabi, M. 2008. A framework for privacy-preserving cluster analysis. In Proceedings of the 2008 IEEE International Conference on Intelligence and Security Informatics (ISI). 46--51.Google ScholarGoogle Scholar
  60. Fung, B. C. M., Wang, K., and Yu, P. S. 2007. Anonymizing classification data for privacy preservation. IEEE Trans. Knowl. Data Engin. 19, 5, 711--725. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Fung, B. C. M., Wang, K., and Yu, P. S. 2005. Top-down specialization for information and privacy preservation. In Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE). 205--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Gehrke, J. 2006. Models and methods for privacy-preserving data publishing and analysis. Tutorial at the 12th ACM SIGKDD.Google ScholarGoogle Scholar
  63. Ghinita, G., Tao, Y., and Kalnis, P. 2008. On the anonymization of sparse high-dimensional data. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE). 715--724. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Goguen, J. and Meseguer, J. 1984. Unwinding and inference control. In Proceedings of the IEEE Symposium on Security and Privacy.Google ScholarGoogle Scholar
  65. Hegland, M., Mcintosh, I., and Turlach, B. A. 1999. A parallel solver for generalized additive models. Comput. Statistics Data Anal. 31, 4, 377--396. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Hengartner, U. 2007. Hiding location information from location-based services. In Proceedings of the International Workshop on Privacy-Aware Location-based Mobile Services (PALMS). 268--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Hinke, T. 1988. Inference aggregation detection in database management systems. In Proceedings of the IEEE Symposium on Security and Privacy. 96--107.Google ScholarGoogle ScholarCross RefCross Ref
  68. Hinke, T., Degulach, H., and Chandrasekhar, A. 1995. A fast algorithm for detecting second paths in database inference analysis. J. Comput. Security.Google ScholarGoogle ScholarCross RefCross Ref
  69. Huang, Z., Du, W., and Chen, B. 2005. Deriving private information from randomized data. In Proceedings of the ACM SIGMOD. ACM, New York, 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Hundepool, A. and Willenborg, L. 1996. 1- and ¿-argus: Software for statistical disclosure control. In Proceedings of the 3rd International Seminar on Statistical Confidentiality.Google ScholarGoogle Scholar
  71. Iyengar, V. S. 2002. Transforming data to satisfy privacy constraints. In Proceedings of the 8th ACM SIGKDD. ACM, New York, 279--288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Jajodia, S. and Meadows, C. 1995. Inference problems in multilevel database management systems. In IEEE Information Security: An Integrated Collection of Essays. 570--584.Google ScholarGoogle Scholar
  73. Jakobsson, M., Juels, A., and Rivest, R. L. 2002. Making mix nets robust for electronic voting by randomized partial checking. In Proceedings of the 11th USENIX Security Symposium. 339--353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Jiang, W. and Clifton, C. 2005. Privacy-preserving distributed k-anonymity. In Proceedings of the 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security. 166--177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Jiang, W. and Clifton, C. 2006. A secure distributed framework for achieving k-anonymity. Very Large Data Bases J. 15, 4, 316--333. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Kantarcioglu, M. 2008. Privacy-Preserving Data Mining: Models and Algorithms. Springer, Berlin, 313--335. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Kantarcioglu, M., Jin, J., and Clifton, C. 2004. When do data mining results violate privacy? In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, 599--604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Kargupta, H., Datta, S., Wang, Q., and Sivakumar, K. 2003. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM). 99--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Kenthapadi, K., Mishra, N., and Nissim, K. 2005. Simulatable auditing. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, New York, 118--127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Kifer, D. and Gehrke, J. 2006. Injecting utility into anonymized datasets. In Proceedings of ACM SIGMOD. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Kim, J. and Winkler, W. 1995. Masking microdata files. In Proceedings of the ASA Section on Survey Research Methods. 114--119.Google ScholarGoogle Scholar
  82. Kokkinakis, D. and Thurin, A. 2007. Anonymization of Swedish clinical data. In Proceedings of the 11th Conference on Artificial Intelligence in Medicine (AIME). 237--241. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Kumar, R., Novak, J., Pang, B., and Tomkins, A. 2007. On anonymizing query logs via token-based hashing. In Proceedings of the 16th World Wide Wed Conference. 628--638. Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Lefevre, K., Dewitt, D. J., and Ramakrishnan, R. 2006a. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd IEEE International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Lefevre, K., Dewitt, D. J., and Ramakrishnan, R. 2006b. Workload-aware anonymization. In Proceedings of the 12th ACM SIGKDD. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Lefevre, K., Dewitt, D. J., and Ramakrishnan, R. 2005. Incognito: Efficient full-domain k-anonymity. In Proceedings of ACM SIGMOD. ACM, New York, 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. Li, J., Tao, Y., and Xiao, X. 2008. Preservation of proximity privacy in publishing numerical sensitive data. In Proceedings of the ACM Conference on Management of Data (SIGMOD). 437--486. Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Li, N., Li, T., and Venkatasubramanian, S. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE).Google ScholarGoogle Scholar
  89. Machanavajjhala, A., Kifer, D., Abowd, J. M., Gehrke, J., and Vilhuber, L. 2008. Privacy: Theory meets practice on the map. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE). 277--286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. 2007. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. 2006. l-diversity: Privacy beyond k-anonymity. In Proceedings of the 22nd IEEE International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Malin, B. and Airoldi, E. 2006. The effects of location access behavior on re-identification risk in a distributed environment. In Proceedings of the 6th Workshop on Privacy Enhancing Technologies (PET). 413--429. Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Martin, D., Kifer, D., Machanavajjhala, A., Gehrke, J., and Halpern, J. 2007. Worst-case background knowledge in privacy-preserving data publishing. In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE).Google ScholarGoogle Scholar
  94. Matloff, N. S. 1988. Inference control via query restriction vs. data modification: A perspective. In Database Security: Status and Prospects. 159--166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. Meyerson, A. and Williams, R. 2004. On the complexity of optimal k-anonymity. In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART PODS. ACM, New York, 223--228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Miklau, G. and Suciu, D. 2004. A formal analysis of information disclosure in data exchange. In Proceedings of the ACM SIGMOD. ACM, New York, 575--586. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Mohammed, N., Fung, B. C. M., Wang, K., and Hung, P. C. K. 2009. Privacy-preserving data mashup. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT). Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Moore, R. A., Jr. 1996. Controlled data-swapping techniques for masking public use microdata sets. Statistical Research Division Report Series RR 96-04, U.S. Bureau of the Census, Washington, D.C.Google ScholarGoogle Scholar
  99. Motwani, R. and Xu, Y. 2007. Efficient algorithms for masking and finding quasi-identifiers. In Proceedings of the Conference on Very Large Data Bases (VLDB).Google ScholarGoogle Scholar
  100. Nergiz, M. E., Atzori, M., and Clifton, C. W. 2007. Hiding the presence of individuals from shared databases. In Proceedings of ACM SIGMOD Conference. ACM, New York, 665--676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Nergiz, M. E. and Clifton, C. 2007. Thoughts on k-anonymization. Data Knowl. Engin. 63, 3, 622--645. Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. Nergiz, M. E., Clifton, C., and Nergiz, A. E. 2007. Multirelational k-anonymity. In Proceedings of the 23rd International Conference on Data Engineering (ICDE). 1417--1421.Google ScholarGoogle Scholar
  103. Ohrn, A. and Ohno-Machado, L. 1999. Using Boolean reasoning to anonymize databases. Artif. Intell. Medicine 15, 235--254.Google ScholarGoogle ScholarCross RefCross Ref
  104. Ozsoyoglu, G. and Su, T. 1990. On inference control in semantic data models for statistical databases. J. Comput. Syst. Sci. 40, 3, 405--443. Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Papadimitriou, S., Li, F., Kollios, G., and Yu, P. S. 2007. Time series compressibility and privacy. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), 459--470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Pinkas, B. 2002. Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor. Newsl. 4, 2, 12--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. Pohlig, S. and Hellman, M. 1978. An improved algorithm for computing logarithms over gf(p) and its cryptographic significance. IEEE Trans. Inform. Theory IT-24, 106--110.Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. President Information Technology Advisory Committee. 2004. Revolutionizing health care through information technology. Tech. rep., Executive Office of the President of the United States.Google ScholarGoogle Scholar
  109. Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. Rastogi, V., Suciu, D., and Hong, S. 2007. The boundary between privacy and utility in data publishing. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). 531--542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. Reiss, S. P. 1984. Practical data-swapping: The first steps. ACM Trans. Datab. Syst. 9, 1, 20--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Reiss, S. P., Post, M. J., and Dalenius, T. 1982. Non-reversible privacy transformations. In Proceedings of the 1st ACM Symposium on Principles of Database Systems (PODS). 139--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. Rosen, B. E., Goodwin, J. M., and Vidal, J. J. 1992. Process control with adaptive range coding. Biological Cyber. 67, 419--428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Rubin, D. B. Discussion statistical disclosure limitation. J. Official Statistics 9, 2.Google ScholarGoogle Scholar
  115. Samarati, P. 2001. Protecting respondents' identities in microdata release. IEEE Trans. Knowl. Data Engin. 13, 6, 1010--1027. Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. Samarati, P. and Sweeney, L. 1998a. Generalizing data to provide anonymity when disclosing information. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART (PODS). ACM, New York, 188. Google ScholarGoogle ScholarDigital LibraryDigital Library
  117. Samarati, P. and Sweeney, L. 1998b. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Tech. rep., SRI International.Google ScholarGoogle Scholar
  118. Saygin, Y., Hakkani-Tur, D., and Tur, G. 2006. Web and Information Security. IRM Press, 133--148.Google ScholarGoogle Scholar
  119. Shannon, C. E. 1948. A mathematical theory of communication. The Bell Syst. Tech. J. 27, 379 and 623.Google ScholarGoogle ScholarCross RefCross Ref
  120. Skowron, A. and Rauszer, C. 1992. Intelligent Decision Support: Handbook of Applications and Advances of the Rough Set Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  121. Sweeney, L. 2002a. Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty, Fuzziness, Knowl.-Based Syst. 10, 5, 571--588. Google ScholarGoogle ScholarDigital LibraryDigital Library
  122. Sweeney, L. 2002b. k-Anonymity: A model for protecting privacy. Int. J. Uncertainty, Fuzziness, Knowl.-Based Syst. 10, 557--570. Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. Sweeney, L. 1998. Datafly: A system for providing anonymity in medical data. In Proceedings of the IFIP TC11 WG11.3 11th International Conference on Database Securty XI: Status and Prospects. 356--381. Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. Terrovitis, M. and Mamoulis, N. 2008. Privacy preservation in the publication of trajectories. In Proceedings of the 9th International Conference on Mobile Data Management (MDM). 65--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  125. Terrovitis, M., Mamoulis, N., and Kalnis, P. 2008. Privacy-preserving anonymization of set-valued data. Proc. VLDB Endowment 1, 1, 115--125. Google ScholarGoogle ScholarDigital LibraryDigital Library
  126. Thuraisingham, B. M. 1987. Security checking in relational database management systems augmented with inference engines. Comput. Security 6, 479--492. Google ScholarGoogle ScholarDigital LibraryDigital Library
  127. Truta, T. M. and Bindu, V. 2006. Privacy protection: p-sensitive k-anonymity property. In Proceedings of the Workshop on Privacy Data Management (PDM). 94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  128. Vaidya, J. 2008. Privacy-Preserving Data Mining: Models and Algorithms. Springer, Berlin, 337--358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. Verykios, V. S., Elmagarmid, A. K., Bertino, E., Saygin, Y., and Dasseni, E. 2004. Association rule hiding. IEEE Trans. Knowl. Data Engin. 16, 4, 434--447. Google ScholarGoogle ScholarDigital LibraryDigital Library
  130. Vinterbo, S. A. 2004. Privacy: A machine learning view. IEEE Trans. Knowl. Data Engin. 16, 8, 939--948. Google ScholarGoogle ScholarDigital LibraryDigital Library
  131. Wang, K., Xu, Y., Fu, A. W. C., and Wong, R. C. W. 2009. ff-anonymity: When quasi-identifiers are missing. In Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. Wang, K., Fung, B. C. M., And Yu, P. S. 2007. Handicapping attacker's confidence: An alternative to k-anonymization. Knowl. Inform. Syst. 11, 3, 345--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. Wang, K. and Fung, B. C. M. 2006. Anonymizing sequential releases. In Proceedings of the 12th ACM SIGKDD Conference. ACM, New York.
  134. Wang, K., Fung, B. C. M., and Dong, G. 2005. Integrating private databases for data analysis. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI). 171--182.
  135. Wang, K., Fung, B. C. M., and Yu, P. S. 2005. Template-based privacy preservation in classification problems. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM). 466--473.
  136. Wang, K., Yu, P. S., and Chakraborty, S. 2004. Bottom-up generalization: A data mining solution to privacy protection. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM).
  137. Wang, S.-W., Chen, W.-H., Ong, C.-S., Liu, L., and Chuang, Y. 2006. RFID applications in hospitals: A case study on a demonstration RFID project in a Taiwan hospital. In Proceedings of the 39th Hawaii International Conference on System Sciences.
  138. Warner, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Statistical Assoc. 60, 309, 63--69.
  139. Wong, R. C. W., Fu, A. W. C., Wang, K., and Pei, J. 2007. Minimality attack in privacy preserving data publishing. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). 543--554.
  140. Wong, R. C. W., Li, J., Fu, A. W. C., and Wang, K. 2006. (α,k)-anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In Proceedings of the 12th ACM SIGKDD Conference. ACM, New York, 754--759.
  141. Wright, R. N., Yang, Z., and Zhong, S. 2005. Distributed data mining protocols for privacy: A review of some recent results. In Proceedings of the Secure Mobile Ad-Hoc Networks and Sensors Workshop (MADNES).
  142. Xiao, X. and Tao, Y. 2007. m-invariance: Towards privacy preserving re-publication of dynamic datasets. In Proceedings of the ACM SIGMOD Conference. ACM, New York.
  143. Xiao, X. and Tao, Y. 2006a. Anatomy: Simple and effective privacy preservation. In Proceedings of the 32nd Conference on Very Large Data Bases (VLDB).
  144. Xiao, X. and Tao, Y. 2006b. Personalized privacy preservation. In Proceedings of the ACM SIGMOD Conference. ACM, New York.
  145. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., and Fu, A. W. C. 2006. Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD Conference. ACM, New York.
  146. Xu, Y., Fung, B. C. M., Wang, K., Fu, A. W. C., and Pei, J. 2008. Publishing sensitive transactions for itemset utility. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM).
  147. Xu, Y., Wang, K., Fu, A. W. C., and Yu, P. S. 2008. Anonymizing transaction databases for publication. In Proceedings of the 14th ACM SIGKDD Conference. ACM, New York.
  148. Yang, Z., Zhong, S., and Wright, R. N. 2005. Anonymity-preserving data collection. In Proceedings of the 11th ACM SIGKDD Conference. ACM, New York, 334--343.
  149. Yao, C., Wang, X. S., and Jajodia, S. 2005. Checking for k-anonymity violation by views. In Proceedings of the 31st Conference on Very Large Data Bases (VLDB). 910--921.
  150. You, T.-H., Peng, W.-C., and Lee, W.-C. 2007. Protect moving trajectories with dummies. In Proceedings of the International Workshop on Privacy-Aware Location-Based Mobile Services (PALMS). 278--282.
  151. Zayatz, L. 2007. Disclosure avoidance practices and research at the U.S. Census Bureau: An update. J. Official Statistics 23, 2, 253--265.
  152. Zhang, P., Tong, Y., Tang, S., and Yang, D. 2005. Privacy-preserving naive Bayes classifier. Lecture Notes in Computer Science, vol. 3584.
  153. Zhang, Q., Koudas, N., Srivastava, D., and Yu, T. 2007. Aggregate query answering on anonymized tables. In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE).

        • Published in

          ACM Computing Surveys  Volume 42, Issue 4
          June 2010
          175 pages
          ISSN:0360-0300
          EISSN:1557-7341
          DOI:10.1145/1749603

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 June 2010
          • Revised: 1 December 2008
          • Accepted: 1 December 2008
          • Received: 1 April 2008

          Qualifiers

          • research-article
          • Research
          • Refereed