Item44: Crawl within a set of sites; once all links of each site in the set have been found, see whether the sites interlink.

Priority: Urgent
CurrentState: Being Worked On
AppliesTo: crawler
Component:
WaitingFor:

Details

Crawl within a set of sites; once all links of each site in the set have been found, see whether the sites interlink.

Basically, this is a snowball crawl WITHIN a set of sites, continued until no more links can be found WITHIN those sites. Then check whether the sites in the set interlink, and draw a cluster map from the result.
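A minimal Python sketch of the within-set snowball, assuming a site is identified by its host name and that `get_outlinks(url)` is a hypothetical callback standing in for fetching a page and extracting its absolute links. The crawl keeps following in-set links until the frontier is empty (no more links findable within the sites) and records each pair of distinct sites joined by at least one link; those pairs are the edges a cluster map would draw. This is an illustration, not IssueCrawler's actual code.

```python
from collections import deque
from urllib.parse import urlparse


def host_of(url):
    """Lower-cased host name of a URL (port stripped), used as the 'site' key."""
    return (urlparse(url).hostname or "").lower()


def snowball_within_set(start_urls, get_outlinks):
    """Follow only links whose host is one of the start sites, until no new
    in-set URLs turn up. Returns {(source_site, target_site)} for every link
    found from a page on one site in the set to a different site in the set.

    get_outlinks(url) -> iterable of absolute URLs (hypothetical fetch/parse hook).
    """
    sites = {host_of(u) for u in start_urls}
    seen = set(start_urls)
    frontier = deque(seen)
    interlinks = set()
    while frontier:  # empty frontier = no more links findable within the sites
        url = frontier.popleft()
        src = host_of(url)
        for link in get_outlinks(url):
            dst = host_of(link)
            if dst not in sites:
                continue  # links leaving the set are neither crawled nor mapped
            if dst != src:
                interlinks.add((src, dst))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return interlinks


if __name__ == "__main__":
    # Toy in-memory web standing in for real fetches.
    graph = {
        "http://a.example/": ["http://a.example/p1", "http://elsewhere.example/"],
        "http://a.example/p1": ["http://b.example/"],
        "http://b.example/": ["http://b.example/about"],
        "http://b.example/about": [],
        "http://c.example/": ["http://a.example/"],
    }
    starts = ["http://a.example/", "http://b.example/", "http://c.example/"]
    print(sorted(snowball_within_set(starts, lambda u: graph.get(u, []))))
    # → [('a.example', 'b.example'), ('c.example', 'a.example')]
```

Here b.example never links back, so the map would show a.example → b.example and c.example → a.example only; elsewhere.example is never fetched.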

Or: input URLs, find the URLs' outlinks (3 deep), and map the interlinkings between the input actors only.

Specification

Input URLs, find the URLs' outlinks (3 deep), and map the interlinkings between the input actors only.
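One possible reading of the Specification, sketched in Python: treat "3 deep" as three hops of outlinks from the input URLs, follow outlinks wherever they lead within that depth, and keep in the map only links whose source page and target page both sit on (different) input actors. The host-as-actor rule, the `max_depth` parameter, and the `get_outlinks` callback are assumptions for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def host_of(url):
    """Lower-cased host name of a URL, used here as the actor key."""
    return (urlparse(url).hostname or "").lower()


def actor_interlink_map(input_urls, get_outlinks, max_depth=3):
    """Breadth-first crawl of outlinks up to max_depth hops from the input URLs.

    Returns {(source_actor, target_actor): link_count}, counting only links
    between two different input actors. get_outlinks(url) is a hypothetical
    fetch-and-parse callback returning absolute URLs.
    """
    actors = {host_of(u) for u in input_urls}
    depth = {u: 0 for u in input_urls}  # hops from the nearest input URL
    frontier = deque(depth)  # unique input URLs, in input order
    counts = {}
    while frontier:
        url = frontier.popleft()
        if depth[url] >= max_depth:
            continue  # pages at the depth limit are reached but not expanded
        src = host_of(url)
        for link in get_outlinks(url):
            dst = host_of(link)
            if src in actors and dst in actors and src != dst:
                counts[(src, dst)] = counts.get((src, dst), 0) + 1
            if link not in depth:
                depth[link] = depth[url] + 1
                frontier.append(link)
    return counts
```

Counting links per ordered pair, rather than storing a yes/no flag, keeps both the direction and the strength of each interlinking; a map could use these as edge direction and weight. Links seen on non-input sites (e.g. an intermediate hub page) are crawled for depth but never drawn.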

Notes (gmc)

Basically, create IssueCrawler.shouldVisit() and have it reject any links that are outside the starting points (sp's).
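A minimal Python sketch of the filtering rule a shouldVisit() method could apply. The class name `StartingPointFilter`, the helper `site_key`, treating a leading `www.` as the same site, and following only http(s) links are all assumptions; IssueCrawler's real method and its signature are not shown in this item.

```python
from urllib.parse import urlparse


def site_key(url):
    """Lower-cased host without a leading 'www.' (assumed notion of 'same site')."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class StartingPointFilter:
    """Hypothetical analogue of a shouldVisit() hook. It is built from the
    starting points, so whatever calls it must have them available."""

    def __init__(self, starting_points):
        self.sites = {site_key(u) for u in starting_points}

    def should_visit(self, url):
        # Skip non-web links such as mailto: or javascript:.
        if urlparse(url).scheme not in ("http", "https"):
            return False
        # Accept only links that stay on one of the starting-point sites.
        return site_key(url) in self.sites


if __name__ == "__main__":
    # Starting points taken from the test run in The Plan.
    f = StartingPointFilter(["http://test1.issuecrawler.net/", "http://test2.issuecrawler.net/"])
    print(f.should_visit("http://test2.issuecrawler.net/links.html"))  # True
    print(f.should_visit("http://www.example.com/"))  # False
```

Because the filter needs the starting points at construction time, the crawler object that calls it has to be given them before the crawl begins.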

Do one iteration (= 0 iterations in frontend terms), and check what the final iteration does.

The Plan:

  1. Run a test on the devel crawler with test1.issuecrawler.net and test2.issuecrawler.net as input
  2. Add the shouldVisit() method
    • Does class IssueCrawler know the starting points?
  3. Run another test on the devel crawler
ItemTemplate
Summary: Crawl within a set of sites; once all links of each site in the set have been found, see whether the sites interlink.
ReportedBy: ErikBorra
AppliesTo: crawler
Priority: Urgent
CurrentState: Being Worked On
WaitingFor:

Topic revision: r3 - 29 Aug 2008 - 15:47:00 - KoenMartens
