MSCI 720

DevTalking / 3509 reads

MSCI 720 Monday Jan 9 Lec 3/36

Syllabus

TREC Agreement

Summary-Review

Architecture

TREC text format

Tokenization

Tutorials will be used to help students complete the assignments.

Outcome

Identify, explain, and implement the key components of a search engine

Explain the advantages and disadvantages of in-situ, online and offline evaluation methods

Implement and compute offline effectiveness measures using a custom or existing test collection

Make and justify decisions based on the outcome of experiments

Diagnose search quality problems and suggest areas for future engine improvement

Information Representation

Information retrieval (IR) system purpose: To help people satisfy their information needs.
The way we represent documents is the first step towards obtaining a high quality retrieval system. We begin by discussing issues in representing text items that pertain to both manual and automatic representation techniques.

Text Representation

The items in our collection vary in length and structure and may contain non-text items such as images. In all cases, we will refer to the text items in a collection as documents or items, but it is important to remember that there is a large variety of possible text items.

Our focus will be on representations that utilize words or tokens derived from words. The process of deciding which words to use to describe a document is called indexing and the chosen words are called index terms. Sometimes we want to represent documents with more than words and then it makes sense to talk about the use of features, which are more generic than index terms. An example of a non-word feature could be the number of words in a document.
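As a minimal sketch of the distinction between index terms and features, the Python below derives index terms from a document and adds one non-word feature, the word count. The tokenization rule (lowercased runs of letters and digits) and the names `tokenize` and `represent` are assumptions made for illustration, not something the lecture prescribes.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def represent(text):
    """Represent a document by its index terms plus one non-word feature."""
    tokens = tokenize(text)
    return {
        "index_terms": set(tokens),   # the words chosen to describe the document
        "num_words": len(tokens),     # a non-word feature: document length in words
    }


doc = "Information retrieval helps people satisfy information needs."
rep = represent(doc)
print(sorted(rep["index_terms"]))
print(rep["num_words"])
```

Here every distinct token becomes an index term; a real indexer would make more selective decisions about which words to keep.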

When we automatically index, we write computer algorithms to process digital forms of the documents and make the decisions about index terms.

We want to index in a manner that will help provide the best interactive retrieval experience for the user. Defining what makes one retrieval experience better than another is complex and is addressed elsewhere.

We simplify our notion of evaluation in this chapter to the act of a single retrieval for a single user query.

Our goal in indexing is two-fold. First, we want to assign features to a document that make the document easy to find given some similarity measure between the user's query and the document. Second, we want the features to have enough discriminatory power that not all documents look similar to the query. A user's query is not restricted to the keyword queries used with web search engines. A query can be anything that the user formulates given a retrieval technology. However, at some level all queries need to be converted into a form that allows similarity to be measured between the query and the documents in the collection. Such similarity measures are discussed in Chapter 10.

Let's assume the user's query is a single index term and that the similarity measure is simple word matching. Given a single index term, simple matching will retrieve all documents that contain the index term. The user would like every document in the retrieved set to be relevant to the user's information need.
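Single-term matching can be sketched as a lookup from index term to the documents that contain it. The toy collection, the tokenization rule, and the function names `build_index` and `retrieve` below are illustrative assumptions.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each index term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index.setdefault(term, set()).add(doc_id)
    return index


def retrieve(index, term):
    """Simple matching: return every document that contains the query term."""
    return index.get(term.lower(), set())


# Hypothetical three-document collection.
docs = {
    "d1": "Jaguar is a large cat found in the Americas.",
    "d2": "The Jaguar car maker unveiled a new model.",
    "d3": "Lions and tigers are large cats.",
}
index = build_index(docs)
matches = retrieve(index, "jaguar")
print(sorted(matches))
```

Both d1 and d2 match the term, but a user interested in the animal would consider only d1 relevant, which is why matching alone does not guarantee that every retrieved document is relevant.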

Precision and Recall

Ideally, the user would want perfect precision and recall.
Precision is the fraction of items found by the user that are relevant.
Recall is the fraction of relevant items that the user is able to find. It is well established that precision and recall tend to be inversely related: retrieving more items usually raises recall but lowers precision.
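The two definitions above can be computed directly from a set of retrieved items and a set of relevant items. In this sketch the document ids are made up, and returning 0.0 when a denominator is empty is an assumed convention rather than part of the definitions.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention when nothing is relevant
    return len(retrieved & relevant) / len(relevant)


# Hypothetical judgments: four items found, three items relevant overall.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7"}

p = precision(retrieved, relevant)   # 2 of the 4 retrieved items are relevant
r = recall(retrieved, relevant)      # 2 of the 3 relevant items were retrieved
print(p, r)
```

Retrieving the whole collection would push recall to 1.0 in this example while lowering precision, which illustrates the trade-off.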

Specificity

The degree to which a term is broad or narrow is called its specificity.

Exhaustivity

In addition to specificity, there is the exhaustivity of indexing. The more exhaustive the indexing, the more index terms are used for each document.
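A rough sketch of both ideas follows. Exhaustivity is modelled by capping how many index terms each document receives, and the number of documents a term appears in is used as a proxy for specificity (a term found in fewer documents is treated as narrower). Both the cap-by-frequency rule and the document-frequency proxy are assumptions made for illustration; the lecture does not define either quantity this way.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(text, max_terms=None):
    """Return index terms; max_terms=None keeps every term (most exhaustive)."""
    counts = Counter(tokenize(text))
    # Rank by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return set(ranked if max_terms is None else ranked[:max_terms])


def document_frequency(term, indexed_docs):
    """Count the documents whose index terms include the given term."""
    return sum(1 for terms in indexed_docs.values() if term in terms)


# Hypothetical three-document collection.
docs = {
    "d1": "search engines index documents so search is fast",
    "d2": "an inverted index maps terms to documents",
    "d3": "documents vary in length and structure",
}
exhaustive = {d: index_document(t) for d, t in docs.items()}
selective = {d: index_document(t, max_terms=2) for d, t in docs.items()}

print(len(exhaustive["d1"]), len(selective["d1"]))
print(document_frequency("documents", exhaustive))  # broad term
print(document_frequency("inverted", exhaustive))   # narrow term
```

With the cap in place each document can be found through fewer terms, so lowering exhaustivity reduces the number of queries that can match it.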

Manual Indexing
