Abstract
We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents and utility of the sanitized data. Unlike in the study of secure function evaluation, in which privacy is preserved to the extent possible given a specific functionality goal, in the census problem privacy is paramount; intuitively, things that cannot be learned “safely” should not be learned at all.
An important contribution of this work is a definition of privacy (and privacy compromise) for statistical databases, together with a method for describing and comparing the privacy offered by specific sanitization techniques. We obtain several privacy results using two different sanitization techniques, and then show how to combine them via cross training. We also obtain two utility results involving clustering.
A full version of this paper may be found on the World Wide Web at http://research.microsoft.com/research/sv/DatabasePrivacy/.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Agrawal, D., Aggarwal, C.: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the 20th Symposium on Principles of Database Systems (2001)
Adam, N.R., Wortmann, J.C.: Security-Control Methods for Statistical Databases. A Comparative Study, ACM Computing Surveys 21(4), 515–556 (1989)
Arora, S., Kannan, R.: Learning mixtures of arbitrary Gaussians. In: ACM STOC (2001)
Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proc. of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)
Beck, L.: A security machanism for statistical database. ACM Transactions on Database Systems (TODS) 5(3), 316–338 (1980)
Chawla, S., Dwork, C., McSherry, F., Talwar, K.: On the Utility of Privacy- Preserving Histograms (November 2004) (in preparation)
Cox, L.H.: New Results in Disclosure Avoidance for Tabulations. In: International Statistical Institute Proceedings of the 46th Session, Tokyo, pp. 83–84 (1987)
Dasgupta, S.: Learning mixtures of Gaussians. In: IEEE FOCS (1999)
Denning, D.: Secure statistical databases with random sample queries. ACM Transactions on Database Systems (TODS) 5(3), 291–315 (1980)
Diaconis, P., Sturmfels, B.: Algebraic Algorithms for Sampling from Conditional Distributions. Annals of Statistics 26(1), 363–397 (1998)
Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Symposium on Principles of Database Systems, pp. 202–210 (2003)
Dobra, A., Fienberg, S.E., Trottini, M.: Assessing the risk of disclosure of confidential categorical data, Bayesian Statistics 7, pp. 125–144. Oxford University Press, Oxford (2000)
Dwork, C.: A Cryptography-Flavored Approach to Privacy in Public Databases. In: lecture at Aladdin Workshop on Privacy in DATA (March 2003), http://www.aladdin.cs.cmu.edu/workshops/privacy/slides/pdf/dwork.pdf
Dwork, C., Naor, M., et al.: Impossibility Results for Privacy-Preserving Data Sanitization (2004) (in preparation)
Dwork, C., Nissim, K.: Privacy-preserving datamining on vertically partitioned databases. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004)
Evfimievski, A.V., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proceedings of the Symposium on Principles of Database Systems, pp. 211–222 (2003)
Füredi, Z., Komlós, J.: The eigenvalues of random symmetric matrices, Combinatorica. Combinatorica 1(3), 233–241 (1981)
Gasarch, W.: A Survey on Private Information Retrieval. BEATCS Computational Complexity Column 82, 72–107 (2004)
Gavison, R.: Privacy and the Limits of the Law. In: Johnson, D.G., Nissenbaum, H. (eds.) Computers, Ethics, and Social Values, pp. 332–351. Prentice Hall, Englewood Cliffs (1995)
Goldreich, O.: The Foundations of Cryptography, vol. 2. Cambridge University Press, Cambridge (2004)
Goldwasser, S., Micali, S.: Probabilistic Encryption. JCSS 28(2), 270–299 (1984)
Gusfield, D.: A Graph Theoretic Approach to Statistical Data Security. SIAM Journal on Computing 17(3), 552–571 (1988)
Indyk, P., Motwani, R.: Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (1998)
Kargupta, H., Datta, S., Wang, Q., Sivakumar, K.: On the Privacy Preserving Properties of Random Data Perturbation Techniques. In: Proceedings of the Third ICDM IEEE International Conference on Data Mining (2003)
Kleinberg, J.M., Papadimitriou, C.H., Raghavan, P.: Auditing Boolean Attributes. J. Comput. Syst. Sci. 66(1), 244–253 (2003)
McSherry, F.: Spectral Partitioning of Random Graphs. In: Proc. 42nd FOCS, pp. 529–537 (2001)
Raghunathank, T.E., Reiter, J.P., Rubin, D.B.: Multiple Imputation for Statistical Disclosure Limitation. J. Official Statistics 19(1), 1–16 (2003)
Roque, G.: Application and Analysis of the Mixture-of-Normals Approach to Masking Census Public-use Microdata (2003) (Manuscript)
Rubin, D.B.: Discussion: Statistical Disclosure Limitation. Journal of Official Statistics 9(2), 461–468 (1993)
Sibson, R.: SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1), 30–34 (1973)
Sweeney, L.: k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 571–588 (2002)
Vempala, S., Wang, G.: A spectral algorithm for learning mixtures of distributions. In: IEEE FOCS (2002)
Winkler, W.E.: Masking and re-identification methods for public-use microdata: Overview and research problems. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 231–246. Springer, Heidelberg (2004)
Winkler, W.E.: Re-identification methods for masked microdata. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 216–230. Springer, Heidelberg (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chawla, S., Dwork, C., McSherry, F., Smith, A., Wee, H. (2005). Toward Privacy in Public Databases. In: Kilian, J. (eds) Theory of Cryptography. TCC 2005. Lecture Notes in Computer Science, vol 3378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30576-7_20
Download citation
DOI: https://doi.org/10.1007/978-3-540-30576-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24573-5
Online ISBN: 978-3-540-30576-7
eBook Packages: Computer ScienceComputer Science (R0)