Friendship prediction and homophily in social media

Authors:
Luca Maria Aiello, University of Turin, Italy
Alain Barrat, Aix-Marseille University and University Sud Toulon, France; ISI Foundation, Italy
Rossano Schifanella, University of Turin, Italy
Ciro Cattuto, ISI Foundation, Italy
Benjamin Markines, Indiana University, Bloomington
Filippo Menczer, Indiana University, Bloomington

ACM Transactions on the Web, Volume 6, Issue 2, Article No. 9, pp. 1–33. https://doi.org/10.1145/2180861.2180866

Published: 04 June 2012

Abstract

Social media have attracted considerable attention because their open-ended nature allows users to create lightweight semantic scaffolding to organize and share content. To date, the interplay of the social and topical components of social media has been only partially explored. Here, we study the presence of homophily in three systems that combine tagging social media with online social networks. We find a substantial level of topical similarity among users who are close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local similarity between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. This analysis suggests that users with similar interests are more likely to be friends, and therefore topical similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this hypothesis on several datasets, confirming that social networks constructed from topical similarity capture actual friendship accurately. When combined with topological features, topical similarity achieves a link prediction accuracy of about 92%.
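
As a rough illustration of the hypothesis that topical similarity computed from annotation metadata alone can suggest social links, the sketch below scores every pair of users by the cosine similarity of their tag-count vectors and ranks the pairs. Cosine similarity is an assumption made here because it is a common choice; it is not claimed to be the measure used in the article. The names `user_tags`, `cosine_similarity`, and `rank_candidate_links`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no tags is similar to nobody
    return dot / (norm_a * norm_b)


def rank_candidate_links(user_tags: dict) -> list:
    """Return (score, user, user) triples, most similar pair first."""
    scored = [
        (cosine_similarity(user_tags[u], user_tags[v]), u, v)
        for u, v in combinations(sorted(user_tags), 2)
    ]
    # Python's sort is stable, so ties keep their alphabetical pair order.
    return sorted(scored, key=lambda item: -item[0])


# Toy, made-up annotation data (assumption: tag counts per user).
user_tags = {
    "alice": Counter({"jazz": 3, "vinyl": 1}),
    "bob": Counter({"jazz": 2, "vinyl": 2}),
    "carol": Counter({"hiking": 4}),
}

ranking = rank_candidate_links(user_tags)
for score, u, v in ranking:
    print(f"{u} - {v}: {score:.3f}")
```

In this toy data the pair with overlapping tags ranks first, and the pairs with no shared tags score zero. A real evaluation would compare such a ranking against the observed friendship links.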



Published in

ACM Transactions on the Web Volume 6, Issue 2
May 2012
137 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/2180861

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 June 2012
- Accepted: 1 October 2011
- Revised: 1 January 2011
- Received: 1 July 2010
Author Tags
Social media
collaborative tagging
folksonomies
homophily
link prediction
maximum information path
social network
topical similarity
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 295
- Total Downloads: 4,729
- Downloads (Last 12 months): 331
- Downloads (Last 6 weeks): 58