CS100 Lecture Notes - Terra Incognita, Bow Tie, Robots Exclusion Standard


Document Summary

So far, we have assumed that an index of all the pages on the web exists, and we have relied on that index to answer simple and compound searches. In this module we examine how the index is created and maintained. To create an index for the web, we need to visit every existing webpage, fetching the pages one after the other, collecting all the terms used on each page, and generating the appropriate postings lists. It turns out that the hyperlinks pointing from one page to another form a web-like structure that we can crawl along, gathering pages as we go. If someone handed us webpages one after the other, we could form the postings for each page and merge them into one set of postings lists covering the whole collection of pages.
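The crawl-and-merge idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's actual implementation: the "web" is a small in-memory dictionary (`TOY_WEB`, with made-up `*.example` addresses) standing in for real page fetches, terms are taken to be lowercase alphanumeric runs, and the names `crawl`, `page_postings`, and `build_index` are hypothetical.

```python
from collections import deque
import re

# Hypothetical toy web: address -> (page text, outgoing links).
# A real crawler would fetch pages over the network instead.
TOY_WEB = {
    "a.example": ("web index crawl", ["b.example", "c.example"]),
    "b.example": ("crawl pages", ["a.example"]),
    "c.example": ("index terms", []),
}

def crawl(seed, web):
    """Yield (url, text) for each page reachable from seed, breadth-first."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # link to a page we cannot fetch
        text, links = web[url]
        yield url, text
        for link in links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)

def page_postings(url, text):
    """Form the postings for one page: each distinct term maps to [url]."""
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {term: [url] for term in terms}

def build_index(seed, web):
    """Merge per-page postings into one collection of postings lists."""
    index = {}
    for url, text in crawl(seed, web):
        for term, postings in page_postings(url, text).items():
            index.setdefault(term, []).extend(postings)
    # Keep each postings list sorted so lists can be compared and merged easily.
    return {term: sorted(postings) for term, postings in index.items()}

index = build_index("a.example", TOY_WEB)
```

In this sketch, `index["crawl"]` lists both `a.example` and `b.example`, because the term appears on both pages. The `seen` set is what keeps the crawl from looping forever when pages link back to each other, as `b.example` does here.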
