Indexing Documents
- What is a markup language?
- What does HTML stand for?
- What does URL stand for?
- What is the purpose of a URL?
- What does a program called a 'spider' do?
- What does it mean to 'index' a document?
- What does a stemmer do?
- What is a stop list?
- What kinds of words make up most of the stop list?
- What is an inverted (or "inverse") file?
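The indexing steps these questions cover (tokenizing, dropping stop words, stemming, and building an inverted file) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `STOP_WORDS` is a tiny hypothetical stop list, and `stem` is a toy suffix-stripper, not the Porter stemmer or any real library's stemmer. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop list. Real stop lists are longer and consist mostly
# of function words: articles, prepositions, conjunctions, pronouns.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "on"}

# Toy suffix-stripping stemmer (NOT the Porter stemmer): it removes a few
# common English suffixes so that related word forms share one stem.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, split into alphabetic tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def build_inverted_file(docs):
    """Map each stem to the set of document ids whose text contains it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            inverted[term].add(doc_id)
    return dict(inverted)

docs = {
    1: "The spider crawls the web",
    2: "Spiders crawled pages and indexed them",
    3: "Indexing documents in a database",
}
inverted = build_inverted_file(docs)
print(sorted(inverted["crawl"]))  # prints [1, 2]
```

Because "crawls" and "crawled" both reduce to the stem "crawl", a lookup for that one stem finds both documents without scanning any text.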
Mapping Documents into a Vector Space
- The underlying space used to retrieve documents is a large ___________ where every __________________ in the language defines one _________________. Each document in the database can be given a location in the ________________ described by this ___________________. Each query can also be mapped onto this ______________________.
- In vector-based text retrieval, how does the overall frequency of a word in the language as a whole affect its importance within a query or document?
- In simple vector-based text retrieval, how do we measure the importance of a given word stem ('lemma') to the meaning of a document in which it occurs?
- If you're building a web page with HTML and want your web site to be easy to find, which meta tags should you use, and (informally) how?
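One common way to turn documents into vectors, with one dimension per word stem, is tf-idf weighting: a term's weight grows with how often it occurs in the document (term frequency) and shrinks the more documents it appears in (inverse document frequency). The sketch below assumes the plain variant weight = tf × log(N / df); many other variants exist, and `tf_idf_vectors` is a hypothetical helper working on already-stemmed term lists.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vectors, idf) for a list of pre-tokenized, stemmed documents.

    Each vector is a sparse dict {term: weight}, with the assumed weighting
    weight = (count of term in document) * log(N / df), where N is the number
    of documents and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    counts = [Counter(terms) for terms in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts once per distinct term
    idf = {term: math.log(n_docs / d) for term, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf

docs = [
    ["web", "spider", "web"],
    ["web", "index"],
    ["web", "database", "index"],
]
vectors, idf = tf_idf_vectors(docs)
print(vectors[0])  # 'web' gets weight 0.0 because it occurs in every document
```

In this example, "web" occurs twice in the first document yet contributes nothing to its vector, since a word that appears everywhere does not distinguish one document from another. "spider", which occurs in only one document, gets the largest weight.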
Retrieval
- In standard vector-based text retrieval, how are queries mapped to documents in the database?
- Name two factors that can influence the relevance ranking of a retrieved document.
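A standard way to match queries against documents in a vector space is to map the query into the same space and rank documents by the cosine of the angle between the query vector and each document vector. The sketch below assumes the same tf × log(N / df) weighting as a plain tf-idf scheme; `build_index`, `to_vector`, `cosine`, and `rank` are hypothetical names, and query terms absent from the collection are simply ignored.

```python
import math
from collections import Counter

def build_index(docs):
    """Compute idf and tf-idf document vectors (assumed weighting: tf * log(N/df))."""
    n = len(docs)
    df = Counter(t for terms in docs for t in set(terms))
    idf = {t: math.log(n / d) for t, d in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(terms).items()} for terms in docs]
    return idf, vecs

def to_vector(terms, idf):
    """Map a query into the document space; unseen terms have no dimension."""
    return {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_terms, docs):
    """Return (score, doc_index) pairs, best match first."""
    idf, vecs = build_index(docs)
    q = to_vector(query_terms, idf)
    scores = [(cosine(q, d), i) for i, d in enumerate(vecs)]
    return sorted(scores, key=lambda s: s[0], reverse=True)

docs = [
    ["web", "spider", "crawl"],
    ["web", "index", "database"],
    ["spider", "index"],
]
for score, i in rank(["spider", "crawl"], docs):
    print(i, round(score, 3))
```

Here the first document ranks highest because it shares both query terms, including the rare term "crawl". The third document shares only the commoner term "spider". Both how often a document uses the query terms and how rare those terms are in the collection affect the ranking.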
Evaluation
- How is information retrieval performance measured?
- What is precision?
- What is recall?
- How could I achieve 100% recall (but lousy precision)?
- How could I approach 100% precision (but get lousy recall)?
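Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below computes both for a single query against a hypothetical collection of ten document ids with hypothetical relevance judgments. The name `precision_recall` and the choice to return 0.0 when a denominator is empty are assumptions, because conventions for that case differ.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Assumed convention: an empty denominator yields 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

collection = range(10)  # hypothetical collection: document ids 0..9
relevant = {1, 3, 5}    # hypothetical relevance judgments for one query

print(precision_recall([1, 2, 3], relevant))   # 2 of 3 retrieved are relevant; 2 of 3 relevant found
print(precision_recall(collection, relevant))  # return everything: recall 1.0, precision 0.3
print(precision_recall([1], relevant))         # one sure hit: precision 1.0, recall 1/3
```

The second and third calls correspond to the two extremes in the questions above. Returning the whole collection guarantees full recall, while returning only the single most certain match can make precision perfect and leave most relevant documents unfound.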