cm3060 Topic 08: Information Retrieval

Lecture Summaries

Boolean Retrieval

Recaps the JM lectures summarized in Jurafsky Manning Lecture Summaries: Topic 17: Information Retrieval, covering 17.1 through 17.4 (phrase queries and positional indexes are not covered).
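For reference, here is a minimal sketch of the core structure these lectures build: an inverted index mapping each term to a sorted postings list, plus the linear-time merge used to intersect postings for an AND query. Tokenization is naive whitespace splitting and the example documents are made up. `build_index`, `intersect` and `and_query` are illustrative names, not taken from the lectures.

```python
from collections import defaultdict


def build_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each term to a sorted postings list of document IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive tokenization (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge two sorted postings lists in O(len(p1) + len(p2)) time."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def and_query(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """AND several terms together, starting from the shortest postings list."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = [
    "brutus killed caesar",
    "caesar was ambitious",
    "brutus and caesar in the capitol",
]
index = build_index(docs)
print(and_query(index, ["brutus", "caesar"]))  # [0, 2]
```

Processing the shortest postings list first keeps intermediate results small, since an intersection can never be longer than its shorter input.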

Ranked Retrieval

Boolean search offers control through precise queries, reproducible results, and transparency. However, most users don't want to learn complex Boolean query syntax, so in many search contexts ranked retrieval has won out.

Runs through the lectures in Jurafsky Manning Lecture Summaries: Topic 18: Ranked Retrieval, covering term frequency (tf), inverse document frequency (idf), and tf-idf weighting.
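The three weights can be sketched directly, assuming the base-10 formulation used in the JM lectures: log-frequency weight w = 1 + log10(tf) for tf > 0 (else 0), idf = log10(N / df), and tf-idf as their product. The function names are hypothetical.

```python
import math


def log_tf_weight(tf: int) -> float:
    """Log-frequency weight: 1 + log10(tf) if the term occurs, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency log10(N / df): rarer terms score higher."""
    return math.log10(n_docs / df)


def tf_idf(tf: int, n_docs: int, df: int) -> float:
    """tf-idf weight of a term in a document: log-tf times idf."""
    return log_tf_weight(tf) * idf(n_docs, df)


# A term occurring 10 times in a document and in 1,000 of 1,000,000 documents:
# log-tf = 1 + log10(10) = 2, idf = log10(1000) = 3, so tf-idf = 6.
print(tf_idf(10, 1_000_000, 1_000))
```

A term that appears in every document gets idf = log10(1) = 0, so it contributes nothing to ranking however often it occurs.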

Vector Space Model

Follows JM lecture 18.6 almost exactly; for a summary, see Jurafsky Manning Lecture Summaries: Topic 18: Ranked Retrieval.
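A minimal sketch of vector space ranking, assuming log-tf × idf weights for both the query and the documents, with cosine similarity doing the length normalization (roughly ltc.ltc in SMART notation; the lectures also discuss other combinations such as lnc.ltc). Sparse vectors are plain dicts, tokenization is whitespace splitting, and the documents are made up.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Sparse vector of (1 + log10 tf) * log10(N / df) for terms seen in df."""
    counts = Counter(tokens)
    return {
        t: (1 + math.log10(c)) * math.log10(n_docs / df[t])
        for t, c in counts.items()
        if t in df
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by cosine similarity to the query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    doc_vecs = [tfidf_vector(toks, df, n_docs) for toks in tokenized]
    q_vec = tfidf_vector(query.lower().split(), df, n_docs)
    scores = [(i, cosine(q_vec, dv)) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
for doc_id, score in rank("cat mat", docs):
    print(doc_id, round(score, 3))
```

Because cosine divides by both vector lengths, a long document is not rewarded simply for repeating terms; only the direction of its vector matters.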

Representing Documents

Walks through vectorizing documents with scikit-learn's CountVectorizer.
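To see what CountVectorizer produces without installing scikit-learn, here is a stdlib-only sketch that approximates its default behavior: lowercasing, the default token pattern `(?u)\b\w\w+\b` (so single-character tokens such as "A" are dropped), no stop-word removal, and columns in alphabetical order. With scikit-learn itself the equivalent is `CountVectorizer().fit_transform(docs)`, with column names from `get_feature_names_out()`. `count_vectorize` is a hypothetical helper.

```python
import re
from collections import Counter

# Approximates CountVectorizer's default token_pattern: runs of 2+ word chars.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    tokenized = [TOKEN_PATTERN.findall(d.lower()) for d in docs]
    # Columns in alphabetical order, as in CountVectorizer's vocabulary_.
    vocabulary = sorted({t for toks in tokenized for t in toks})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


docs = ["The cat sat on the mat.", "A dog chased the cat!"]
vocab, X = count_vectorize(docs)
print(vocab)  # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
for row in X:
    print(row)
# [1, 0, 0, 1, 1, 1, 2]
# [1, 1, 1, 0, 0, 0, 1]
```

scikit-learn returns a sparse matrix rather than lists of lists, which matters once the vocabulary grows to tens of thousands of columns that are mostly zero.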

Term Weighting with TF.IDF

Uses scikit-learn's TfidfVectorizer.
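TfidfVectorizer's defaults differ from the lecture formulas: it uses raw counts for tf, a smoothed natural-log idf, idf(t) = ln((1 + n) / (1 + df(t))) + 1 (`smooth_idf=True`), and L2-normalizes each document row (`norm='l2'`). Here is a stdlib sketch of that calculation under those stated defaults; `tfidf_vectorize` is a hypothetical name.

```python
import math
import re
from collections import Counter

# Default token pattern shared by CountVectorizer and TfidfVectorizer.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vectorize(docs):
    """Mimic TfidfVectorizer defaults: raw tf, smoothed idf, L2-normalized rows."""
    tokenized = [TOKEN_PATTERN.findall(d.lower()) for d in docs]
    vocabulary = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # smooth_idf=True: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocabulary}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(x * x for x in row))  # norm='l2'
        matrix.append([x / norm if norm else 0.0 for x in row])
    return vocabulary, matrix


docs = ["The cat sat on the mat.", "A dog chased the cat!"]
vocab, X = tfidf_vectorize(docs)
for term, weight in zip(vocab, X[0]):
    print(term, round(weight, 3))
```

On the same input this should agree with `TfidfVectorizer().fit_transform(docs).toarray()` up to floating-point error, but the absolute values will not match hand calculations done with the lectures' log10 formulas. The +1 terms keep idf positive, so words found in every document are down-weighted rather than zeroed out.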

Semantic Search

Discusses query expansion as a solution to vocabulary mismatch, where a query and a relevant document use different words for the same concept.

Follows the Whoosh documentation to build a search engine.

Gets the pretrained word embeddings from a model repository.

Uses gensim to find words similar to the query terms, then expands the original query by OR-ing it with those related terms before running the search.
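The expansion step can be sketched without loading any embeddings. Here a small hard-coded neighbor table, with made-up similarity scores, stands in for what gensim would return (for example `KeyedVectors.most_similar(term, topn=k)`, which yields `(word, similarity)` pairs). Each original term is OR-ed with its neighbors and the resulting groups are AND-ed; that grouping is an assumption, one reasonable way to build a query string for a parser that understands `AND`, `OR` and parentheses, such as Whoosh's default query language. `similar_terms`, `expand_query` and `min_score` are hypothetical.

```python
# Hypothetical stand-in for embedding neighbors (made-up scores); in the
# lecture these come from gensim's most_similar, not a hard-coded dict.
NEIGHBORS = {
    "car": [("vehicle", 0.78), ("automobile", 0.74), ("truck", 0.67)],
    "cheap": [("inexpensive", 0.81), ("affordable", 0.77), ("pricey", 0.52)],
}


def similar_terms(term, topn=2, min_score=0.6):
    """Return up to topn neighbors of term whose similarity is >= min_score."""
    return [w for w, s in NEIGHBORS.get(term, [])[:topn] if s >= min_score]


def expand_query(query, topn=2, min_score=0.6):
    """OR each query term with its neighbors, keeping the groups ANDed."""
    groups = []
    for term in query.lower().split():
        alternatives = [term] + similar_terms(term, topn, min_score)
        if len(alternatives) == 1:
            groups.append(term)
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


print(expand_query("cheap car"))
# (cheap OR inexpensive OR affordable) AND (car OR vehicle OR automobile)
```

Capping neighbors with `topn` and a similarity threshold limits query drift: embedding neighbors are not always synonyms, as "pricey" next to "cheap" shows.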

Lab Summaries

The lab has you experiment with TRR's '2D search engine': https://app.2dsearch.com/

Alex's Notes