MSCI 720 Monday Jan 9 Lec 3/36
Syllabus
TREC Agreement
Summary-Review
Architecture
TREC text format
Tokenization
Tutorials will be used to help students complete the assignments.
Outcome
Identify, explain, and implement the key components of a search engine
Explain the advantages and disadvantages of in-situ, online and offline evaluation methods
Implement and compute offline effectiveness measures using a custom or existing test collection
Make and justify decisions based on the outcome of experiments
Diagnose search quality problems and suggest areas for future engine improvement
Information Representation
Information retrieval (IR) system purpose: To help people satisfy their information needs.
The way we represent documents is the first step towards obtaining a high quality retrieval system. We begin by discussing issues in representing text items that pertain to both manual and automatic representation techniques.
The items in our collection vary in length and structure and may contain non-text items such as images. In all cases, we will refer to the text items in a collection as documents or items, but it is important to remember that there is a large variety of possible text items.
Our focus will be on representations that utilize words or tokens derived from words. The process of deciding which words to use to describe a document is called indexing and the chosen words are called index terms. Sometimes we want to represent documents with more than words and then it makes sense to talk about the use of features, which are more generic than index terms. An example of a non-word feature could be the number of words in a document.
When we automatically index, we write computer algorithms to process digital forms of the documents and make the decisions about index terms.
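A minimal sketch of automatic indexing follows, assuming a very simple rule: lowercase the text and treat every run of letters or digits as an index term. The function name `index_document` and the feature name `num_words` are hypothetical, chosen only for illustration. The word count stands in for the non-word feature mentioned above.

```python
import re
from collections import Counter

def index_document(text):
    """Return (index_terms, features) for one document.

    Assumption: index terms are lowercased runs of letters or digits.
    Real systems make many more decisions (stop words, stemming, etc.).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    index_terms = Counter(tokens)          # term -> number of occurrences
    features = {"num_words": len(tokens)}  # a non-word feature: document length
    return index_terms, features

terms, features = index_document(
    "Search engines index documents. Engines rank documents."
)
print(sorted(terms))
print(features)
```

Here the algorithm, not a human indexer, decides which words describe the document.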
We want to index in a manner that will help provide the best interactive retrieval experience for the user. Defining what makes one retrieval experience better than another is complex and is addressed elsewhere.
We simplify our notion of evaluation in this chapter to the act of a single retrieval for a single user query.
Our goal in indexing is two-fold: first, to assign features to a document that make the document easy to find given some similarity measure between the user's query and the document. Second, at the same time we want the features to have enough discriminatory power so as to not make all documents look similar to the query. A user's query is not restricted to the keyword queries used with web search engines. A query can be anything that the user formulates given a retrieval technology. However, at some level all queries need to be converted into a form that allows similarity to be measured between the query and the documents in the collection. Such similarity measures are discussed in Chapter 10.
Let's assume the user's query is a single index term and that the similarity measure is simple word matching. Given a single index term, simple matching will retrieve all documents that match the index term. For the set of documents retrieved, the user would like all of them to be relevant to the user's information need.
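Single-term simple matching can be sketched as follows. The three-document collection and the helper names `tokenize` and `match_single_term` are hypothetical, made up for illustration, and the tokenization rule (lowercased runs of letters or digits) is an assumption.

```python
import re

def tokenize(text):
    """Assumed tokenization: the set of lowercased runs of letters or digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_single_term(term, docs):
    """Return the ids of all documents whose token set contains the term."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]

# Hypothetical toy collection.
docs = {
    "d1": "Jaguar is a large cat native to the Americas.",
    "d2": "The Jaguar car brand was founded in England.",
    "d3": "Lions and tigers are large cats.",
}

retrieved = match_single_term("jaguar", docs)
print(retrieved)
```

Both d1 and d2 match the term, but a user interested in the animal would probably consider only d1 relevant. Matching the index term does not guarantee relevance to the information need.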
Precision and Recall
Ideally, the user would want perfect precision and recall.
Precision is the fraction of items found by the user that are relevant.
Recall is the fraction of relevant items that the user is able to find. It is well established that precision and recall are inversely related.
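The two definitions can be computed directly from a set of retrieved items and a set of relevant items. This is a minimal sketch: the function name `precision_recall` and the document ids are hypothetical, and returning 0.0 when a set is empty is an assumed convention, not something the definitions prescribe.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    Assumption: an empty denominator yields 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 items found, 3 relevant items exist, 2 overlap.
p, r = precision_recall(["d1", "d2", "d5", "d7"], ["d1", "d2", "d3"])
print(p, r)
```

In this example half of the retrieved items are relevant (precision 1/2), and two of the three relevant items were found (recall 2/3).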
The degree to which a term is broad or narrow is called its specificity.
Exhaustivity
In addition to specificity, there is the exhaustivity of indexing. The more exhaustive the indexing, the more index terms are used for each document.
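One way to picture exhaustivity is to cap the number of index terms assigned to each document. The sketch below keeps the `k` most frequent terms; choosing terms by frequency is an assumption made only for illustration, and the function name `select_index_terms` is hypothetical.

```python
import re
from collections import Counter

def select_index_terms(text, k):
    """Keep at most k index terms for a document.

    Assumption: terms are ranked by how often they occur in the document.
    A larger k means more exhaustive indexing.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [term for term, _ in Counter(tokens).most_common(k)]

text = "retrieval retrieval retrieval evaluation evaluation indexing"
shallow = select_index_terms(text, 1)     # less exhaustive
exhaustive = select_index_terms(text, 3)  # more exhaustive
print(shallow, exhaustive)
```

With `k = 1` the document can be found only through its most frequent term. With `k = 3` it is also reachable through the other two terms.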
Manual Indexing