research-article

It's all in the content: state of the art best answer prediction based on discretisation of shallow linguistic features

Authors:
George Gkotsis, Open University, Milton Keynes, United Kingdom
Karen Stepanyan, Knowledge Media Institute, Milton Keynes, United Kingdom
Carlos Pedrinaci, Knowledge Media Institute, Milton Keynes, United Kingdom
John Domingue, Knowledge Media Institute, Milton Keynes, United Kingdom
Maria Liakata, University of Warwick, Coventry, United Kingdom
WebSci '14: Proceedings of the 2014 ACM conference on Web science, June 2014, Pages 202–210, https://doi.org/10.1145/2615569.2615681

Published: 23 June 2014


ABSTRACT

This paper addresses the problem of determining the best answer in Community-based Question Answering websites by focussing on the content. Previous research on this topic relies on exploiting community feedback on the answers, in the form of ratings of either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that leverages the content (textual) features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and matches the performance of rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features in a discretised form. We also show that our technique delivers equally good results in real-time settings, because it does not depend on information that is not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
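To make the idea of discretising a shallow feature concrete, the sketch below assumes one simple scheme: each answer's raw feature value (here a hypothetical `length` in words) is replaced by its rank among the answers to the same question, with rank 1 for the highest value and ties sharing a rank. This is an illustrative assumption, not necessarily the scheme used in the paper; the function `rank_within_question` and the dictionary keys `question_id`, `answer_id` and `length` are hypothetical names.

```python
from collections import defaultdict


def rank_within_question(answers, feature):
    """Map each answer_id to the dense rank of its feature value within its question.

    Illustrative assumption only: rank 1 = highest value, tied values share a rank.
    `answers` is a list of dicts with hypothetical keys 'question_id', 'answer_id'
    and the named feature.
    """
    # Group answers by the question they respond to.
    by_question = defaultdict(list)
    for a in answers:
        by_question[a["question_id"]].append(a)

    ranks = {}
    for group in by_question.values():
        # Distinct values sorted from highest to lowest define the dense ranks.
        distinct = sorted({a[feature] for a in group}, reverse=True)
        position = {value: i + 1 for i, value in enumerate(distinct)}
        for a in group:
            ranks[a["answer_id"]] = position[a[feature]]
    return ranks


# Hypothetical toy data: two questions with a few answers each.
answers = [
    {"question_id": 1, "answer_id": "a", "length": 120},
    {"question_id": 1, "answer_id": "b", "length": 45},
    {"question_id": 1, "answer_id": "c", "length": 120},
    {"question_id": 2, "answer_id": "d", "length": 30},
    {"question_id": 2, "answer_id": "e", "length": 300},
]
print(rank_within_question(answers, "length"))
# {'a': 1, 'b': 2, 'c': 1, 'd': 2, 'e': 1}
```

Because a scheme like this uses only the text of the answers already posted to a question, it needs no votes or reputation data, which is in keeping with the real-time setting the abstract describes. Ranking within a question also makes values comparable across questions whose answers differ widely in absolute length.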


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
WebSci '14: Proceedings of the 2014 ACM conference on Web science
June 2014
318 pages
ISBN: 9781450326223
DOI: 10.1145/2615569
General Chairs: Filippo Menczer (Indiana University, USA), Jim Hendler (Rensselaer Polytechnic Institute, USA), William Dutton (University of Oxford, UK)
Program Chairs: Markus Strohmaier (GESIS & University of Koblenz-Landau, Germany), Eric T. Meyer (University of Oxford, UK), Ciro Cattuto (ISI Foundation, Italy)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community question answering
social media

Acceptance Rates
WebSci '14 Paper Acceptance Rate: 29 of 144 submissions, 20%. Overall Acceptance Rate: 218 of 875 submissions, 25%.
Article Metrics
- Total Citations: 27
- Total Downloads: 353
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 1