ABSTRACT
This paper addresses the problem of determining the best answer in Community-based Question Answering (CQA) websites by focussing on the content of the answers. Previous research on this topic relies on community feedback on the answers, which involves rating either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and matches rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show that our technique delivers equally good results in real-time settings, because it does not rely on information that is not always readily available, such as user ratings and answer scores. We evaluated the technique on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
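The idea of discretising shallow textual features can be illustrated with a minimal sketch. The abstract does not say which features or which discretisation scheme are used, so everything below is an assumption made for illustration: the four features, the function names (`shallow_features`, `discretise_by_rank`, `discretise_thread`), and the choice to replace each raw value with its rank among the answers to the same question.

```python
from typing import Dict, List


def shallow_features(answer: str) -> Dict[str, float]:
    """Compute a few surface-level text features for one answer.

    The feature set is an illustrative assumption, not the paper's list.
    """
    words = answer.split()
    # Crude sentence split: treat '.', '!' and '?' as sentence terminators.
    normalised = answer.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalised.split(".") if s.strip()]
    return {
        "char_count": float(len(answer)),
        "word_count": float(len(words)),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "sentence_count": float(len(sentences)),
    }


def discretise_by_rank(values: List[float]) -> List[int]:
    """Map raw values to ranks (1 = largest); equal values share a rank.

    Rank-based discretisation is an assumed scheme chosen for this sketch.
    """
    ordered = sorted(set(values), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(ordered)}
    return [rank_of[value] for value in values]


def discretise_thread(answers: List[str]) -> List[Dict[str, int]]:
    """Discretise every feature relative to the other answers in one thread."""
    raw = [shallow_features(answer) for answer in answers]
    if not raw:
        return []
    names = list(raw[0].keys())
    ranks = {name: discretise_by_rank([row[name] for row in raw]) for name in names}
    return [{name: ranks[name][i] for name in names} for i in range(len(answers))]


if __name__ == "__main__":
    thread = [
        "Yes.",
        "Use a hash map. It gives constant-time lookup on average.",
        "Try sorting first",
    ]
    for answer, features in zip(thread, discretise_thread(thread)):
        print(features, "<-", answer)
```

One reason such a scheme might help is that a rank is comparable across questions, whereas a raw length is not: the longest answer in a thread gets rank 1 whether it has 20 words or 2,000. The resulting integer features could then be passed to any standard classifier.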
Index Terms
- It's all in the content: state of the art best answer prediction based on discretisation of shallow linguistic features