Tuesday, January 27, 2009

Document Retrieval Concepts

Document: A unit of retrieval. It might be a paragraph, a section, a chapter, a Web page, an article, a whole book, or an image or video.
Index: A data structure built on the text to speed up searching.
Index Term (or keyword): A pre-selected term that can be used to refer to the content of a document. Usually, index terms are nouns or noun groups. On the Web, however, some search engines use all the words in a document as index terms.
Information Retrieval (IR): The part of computer science that studies the retrieval of information (not data) from a collection of written documents. The retrieved documents aim to satisfy a user's information need, usually expressed in natural language.
Logical view of documents: The representation of documents and Web pages adopted by the system. The most common form is to represent the text of the document by a set of terms or keywords.
Precision: An information retrieval performance measure that quantifies the fraction of retrieved documents which are known to be relevant.
Query: The expression of the user's information need in the input language provided by the information system. The most common type of input language simply allows the specification of keywords and a few Boolean connectives.
Recall: An information retrieval performance measure that quantifies the fraction of known relevant documents that were actually retrieved.
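
To make these definitions concrete, here is a minimal Python sketch that ties several of them together. It assumes a toy collection where every word is an index term (as some Web search engines do), an inverted index as the index data structure, and a keyword query whose terms are joined by Boolean AND. The documents, the relevance judgments, and the function names (build_index, boolean_and, precision, recall) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "information retrieval studies searching document collections",
    2: "an index speeds up searching text",
    3: "precision and recall measure retrieval quality",
    4: "searching for a tomato soup recipe",
    5: "evaluation of retrieval systems",
}


def build_index(docs):
    """Build an inverted index: each word (index term) maps to the set of
    ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Answer a keyword query: return the documents containing every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


index = build_index(DOCS)
retrieved = boolean_and(index, ["searching"])

# Assumed relevance judgments for the user's information need
# ("how do retrieval systems work?"); in practice these come from people.
relevant = {1, 2, 3, 5}

print("retrieved:", sorted(retrieved))
print("precision:", precision(retrieved, relevant))
print("recall:", recall(retrieved, relevant))
```

In this sketch the query retrieves documents 1, 2 and 4. Two of those three are relevant, so precision is 2/3; two of the four relevant documents were found, so recall is 1/2. The soup recipe shows how a keyword match can hurt precision, while documents 3 and 5 show how relevant material that lacks the query term lowers recall.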
