Large Scale Personality Classification of Bloggers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6975)

Abstract

Personality is a fundamental component of an individual’s affective behavior. Previous work on personality classification has emerged from disparate sources: the variety of algorithms and feature-selection methods applied across spoken and written data has made comparison difficult. Here, we use a large corpus of blogs to compare feature-selection choices for classification; we also use these results to identify characteristic language information relating to personality. With Support Vector Machines, the best accuracies range from 84.36% (openness to experience) to 70.51% (neuroticism). The best performing feature set combined: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e. common words); and (3) boolean features that record the presence or absence of each item rather than its rate of use. We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well in this classification task, which we propose is due to its conceptual breadth. To get a better sense of how personality is expressed in the blogs, we explore the best performing features and discuss how these can provide a deeper understanding of personality language behavior online.
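The feature combination named in the abstract (stemmed bigrams, stopwords retained, boolean presence rather than rate of use) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the tokenizer is a simple regular expression, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's. The resulting 0/1 vectors are the kind of input a linear Support Vector Machine could be trained on.

```python
import re

# Assumption: a toy suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_bigrams(text):
    """Return the set of adjacent stem pairs in the text.

    Stopwords are deliberately kept, so pairs such as ("to", "the") survive.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = [naive_stem(t) for t in tokens]
    return set(zip(stems, stems[1:]))


def build_vocabulary(documents):
    """Map every bigram seen in the documents to a column index."""
    all_bigrams = set().union(*(stemmed_bigrams(d) for d in documents))
    return {bigram: i for i, bigram in enumerate(sorted(all_bigrams))}


def boolean_vector(text, vocabulary):
    """Encode presence (1) or absence (0) of each bigram, ignoring counts."""
    vector = [0] * len(vocabulary)
    for bigram in stemmed_bigrams(text):
        index = vocabulary.get(bigram)
        if index is not None:
            vector[index] = 1
    return vector


if __name__ == "__main__":
    posts = [
        "I was walking to the park",
        "I walked to the parks and I walked to the park",
    ]
    vocab = build_vocabulary(posts)
    for post in posts:
        print(boolean_vector(post, vocab))
```

In the second example post, the pair ("walk", "to") occurs twice but still contributes a single 1, which is the difference between a boolean encoding and a frequency-based one.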

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Iacobelli, F., Gill, A.J., Nowson, S., Oberlander, J. (2011). Large Scale Personality Classification of Bloggers. In: D’Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24571-8_71

  • DOI: https://doi.org/10.1007/978-3-642-24571-8_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24570-1

  • Online ISBN: 978-3-642-24571-8

  • eBook Packages: Computer Science, Computer Science (R0)
