ABSTRACT
Perceptual hashing is widely used to search for or match similar images in digital forensics and cybercrime studies. Unfortunately, the robustness of perceptual hashing algorithms is not well understood in these contexts. In this paper, we examine the robustness of perceptual hashing and the security applications that depend on it, both experimentally and empirically. We first develop a series of attack algorithms to subvert perceptual-hashing-based image search. The algorithms generate attack images that enlarge the hash distance to the original image while introducing minimal visual changes. To make the attack practical, we design the algorithms under a black-box setting and augment them with novel designs (e.g., grayscale initialization) that improve attack efficiency and transferability. We then evaluate our attack against the standard pHash as well as its robust variant using three different datasets. After confirming the attack's effectiveness experimentally, we test it empirically against real-world reverse image search engines, including TinEye, Google, Microsoft Bing, and Yandex. We find that our attack is highly successful on TinEye and Bing and moderately successful on Google and Yandex. Based on our findings, we discuss possible countermeasures and recommendations.
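The two quantities the abstract refers to, hash distance and visual change, can be illustrated with a minimal sketch. The code below is not the paper's attack or the pHash library's implementation: it assumes a simplified DCT-based hash (unnormalized low-frequency DCT-II coefficients thresholded at their median), uses Hamming distance as the hash distance, and uses the largest per-pixel difference as a crude stand-in for visual change. The function names, the synthetic 32x32 image, and the noise range are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def dct_hash(pixels, hash_size=8):
    """Simplified DCT-based perceptual hash (an assumption, not the pHash library).

    pixels: square 2D list of grayscale values (n x n).
    Returns a list of hash_size * hash_size bits.
    """
    n = len(pixels)
    # Cosine basis of the (unnormalized) DCT-II for the lowest frequencies.
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
             for k in range(hash_size)]
    coeffs = []
    for u in range(hash_size):
        for v in range(hash_size):
            coeffs.append(sum(pixels[x][y] * basis[u][x] * basis[v][y]
                              for x in range(n) for y in range(n)))
    # One bit per coefficient: is it above the median of the block?
    med = statistics.median(coeffs)
    return [1 if c > med else 0 for c in coeffs]


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def max_pixel_change(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


random.seed(0)
# Hypothetical synthetic grayscale image and a lightly perturbed copy.
original = [[(7 * x + 3 * y) % 256 for y in range(32)] for x in range(32)]
noisy = [[min(255, max(0, p + random.randint(-8, 8))) for p in row]
         for row in original]

h_orig = dct_hash(original)
h_noisy = dct_hash(noisy)
print("hash distance:", hamming_distance(h_orig, h_noisy))
print("max pixel change:", max_pixel_change(original, noisy))
```

A search system built on such a hash would typically treat two images as matching when their hash distance falls below some threshold, which is why enlarging that distance under a small pixel budget is the quantity of interest.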