Abstract
In document retrieval, or in any other pattern-matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one in which each entity lies as far away from the others as possible. In these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
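The abstract does not say how space density is computed, so the following is only a minimal sketch under stated assumptions: documents are term-weight vectors, similarity is the cosine measure, and density is taken as the mean pairwise similarity. The helper names (`space_density`, `density_change_without_term`) and the toy collection are hypothetical and are not taken from the paper. The sketch illustrates how a density computation might be used to judge a candidate index term: a term whose removal makes the space sparser is hurting separation, while a term whose removal makes the space denser is helping it.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: mean cosine similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def density_change_without_term(docs, k):
    """Density after dropping term k minus density before.

    Negative: dropping the term spreads documents apart (poor index term).
    Positive: dropping the term packs documents together (useful index term).
    """
    reduced = [d[:k] + d[k + 1:] for d in docs]
    return space_density(reduced) - space_density(docs)


# Hypothetical toy collection: term 0 occurs in every document,
# terms 1..3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

if __name__ == "__main__":
    print("density:", space_density(docs))
    for k in range(4):
        print("term", k, "change:", density_change_without_term(docs, k))
```

In this toy collection the term shared by every document pulls all vectors toward one another, so dropping it lowers the density, whereas each of the document-specific terms keeps the vectors apart. Under the abstract's hypothesis that retrieval performance correlates inversely with density, a vocabulary would favor terms of the second kind.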
Index Terms
- A vector space model for automatic indexing