A vector space model for automatic indexing

Published: 01 November 1975

Abstract

In a document retrieval or other pattern-matching environment, where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible. In these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
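
The idea of space density can be illustrated with a small sketch. The abstract does not give the density formula, so the code below makes two assumptions: documents are term-weight vectors, and density is the average pairwise cosine similarity between them. The names `cosine`, `space_density`, and `drop_term` are hypothetical helpers for illustration only. Under these assumptions, a term that occurs in every document pulls the vectors together, and removing it from the vocabulary spreads the documents apart.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: mean cosine similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def drop_term(docs, k):
    """Return the collection with term k removed from the indexing vocabulary."""
    return [d[:k] + d[k + 1:] for d in docs]


# Toy collection: term 0 occurs in every document, terms 1-3 in one each.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

dense = space_density(docs)                 # with the shared term
sparse = space_density(drop_term(docs, 0))  # without it
print(dense, sparse)
```

In this toy collection the density falls once the shared term is dropped, which is the direction the abstract associates with better retrieval performance. A real vocabulary selection procedure would evaluate each candidate term this way on an actual collection.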

      • Published in

        Communications of the ACM, Volume 18, Issue 11
        Nov. 1975
        54 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/361219

        Copyright © 1975 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 November 1975
