DOI: 10.1145/2615569.2615681
research-article

It's all in the content: state of the art best answer prediction based on discretisation of shallow linguistic features

Published: 23 June 2014

ABSTRACT

This paper addresses the problem of determining the best answer in Community-based Question Answering websites by focussing on the content. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves rating of either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information that is not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
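To make the idea of rendering feature values "into a discretised form" concrete, the following is a minimal sketch only. The abstract does not say which shallow linguistic features or which discretisation method the authors use, so the two features (word count and average word length), the equal-width binning scheme, and the helper names `shallow_features` and `equal_width_bins` are all assumptions chosen for illustration.

```python
from typing import List


def shallow_features(answer: str) -> dict:
    """Compute two simple content features of an answer (an illustrative choice,
    not necessarily the features used in the paper)."""
    words = answer.split()
    word_count = len(words)
    avg_word_length = sum(len(w) for w in words) / word_count if words else 0.0
    return {"word_count": float(word_count), "avg_word_length": avg_word_length}


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals.
    Equal-width binning is an assumption; other discretisation schemes exist."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical answers to one question, from very short to fairly detailed.
answers = [
    "Yes.",
    "Use a dictionary keyed by user id.",
    "You can sort the list first and then apply binary search to find the item quickly.",
]
word_counts = [shallow_features(a)["word_count"] for a in answers]
binned = equal_width_bins(word_counts, n_bins=3)
print(binned)
```

A classifier would then be trained on the bin indices rather than on the raw feature values; which classifier and how many bins to use are left open here.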


        Published in

        WebSci '14: Proceedings of the 2014 ACM conference on Web science
        June 2014
        318 pages
        ISBN: 9781450326223
        DOI: 10.1145/2615569

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

        WebSci '14 Paper Acceptance Rate: 29 of 144 submissions, 20%
        Overall Acceptance Rate: 218 of 875 submissions, 25%
