Abstract
Social media have attracted considerable attention because their open-ended nature allows users to create lightweight semantic scaffolding to organize and share content. To date, the interplay of the social and topical components of social media has been only partially explored. Here, we study the presence of homophily in three systems that combine tagging social media with online social networks. We find a substantial level of topical similarity among users who are close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local similarity between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. This analysis suggests that users with similar interests are more likely to be friends, and therefore topical similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this hypothesis on several datasets, confirming that social networks constructed from topical similarity capture actual friendship accurately. When combined with topological features, topical similarity achieves a link prediction accuracy of about 92%.
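As a rough illustration of the comparison described above, the following Python sketch measures topical similarity between linked users and builds one shuffled baseline. It is a minimal sketch under stated assumptions, not the authors' implementation: cosine similarity over tag counts stands in for the topical similarity measure, the null model is approximated by pooling all tag assignments and dealing them back at random while keeping each user's number of assignments fixed, and the user names, tags, and links are invented toy data.

```python
import math
import random
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two users' tag-count vectors (Counters)."""
    dot = sum(count * tags_b[tag] for tag, count in tags_a.items())
    norm_a = math.sqrt(sum(c * c for c in tags_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tags_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_link_similarity(profiles, links):
    """Average similarity over pairs of users joined by a social link."""
    total = sum(cosine_similarity(profiles[u], profiles[v]) for u, v in links)
    return total / len(links)


def shuffled_profiles(profiles, rng):
    """Assumed null model: pool every tag assignment, shuffle the pool, and
    hand each user back as many assignments as they originally made."""
    pool = [tag for profile in profiles.values() for tag in profile.elements()]
    rng.shuffle(pool)
    shuffled, start = {}, 0
    for user, profile in profiles.items():
        activity = sum(profile.values())
        shuffled[user] = Counter(pool[start:start + activity])
        start += activity
    return shuffled


# Hypothetical toy data: tag counts per user and an undirected friend list.
profiles = {
    "ann": Counter({"jazz": 3, "vinyl": 1}),
    "bob": Counter({"jazz": 2, "vinyl": 2}),
    "cat": Counter({"hiking": 4}),
    "dan": Counter({"hiking": 1, "maps": 2}),
}
links = [("ann", "bob"), ("cat", "dan")]

observed = mean_link_similarity(profiles, links)

# Average the same statistic over many shuffled versions of the data.
rng = random.Random(0)
runs = 200
baseline = sum(
    mean_link_similarity(shuffled_profiles(profiles, rng), links)
    for _ in range(runs)
) / runs

print(f"observed similarity on links: {observed:.3f}")
print(f"shuffled baseline:            {baseline:.3f}")
```

On real data, an observed value well above the shuffled baseline would point to genuine local similarity rather than an effect of user activity alone. With a toy example this small, the gap carries no statistical weight.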
Index Terms
- Friendship prediction and homophily in social media