Crawl within a set of sites; once all links of each site in the set have been found, see whether the sites interlink.
Basically, this is a snowball WITHIN a set of sites that runs until no more links can be found WITHIN the sites. Then check whether the sites in the set interlink and draw a cluster map from the result.
Or: input URLs, find the URLs' outlinks (3 deep), and map the interlinkings between the inputted actors only.
Specification
Input URLs, find the URLs' outlinks (3 deep), and map the interlinkings between the inputted actors only.
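The specification can be read as a depth-limited crawl followed by a filter on the edges. The sketch below is one possible reading, not the IssueCrawler implementation: the class InterlinkMapper, the method mapInterlinks and the in-memory outlinks map (a stand-in for actually fetching pages) are hypothetical names. It assumes that "actor" means the host of an inputted URL, that "3 deep" means three levels of outlinks from each starting point, and that only links staying on the starting point's own host are followed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the IssueCrawler code base.
final class InterlinkMapper {
    // Assumption: "3 deep" means three levels of outlinks per starting point.
    static final int MAX_DEPTH = 3;

    // Lower-cased host of a URL, or "" when the URL cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // outlinks maps a page URL to the URLs it links to (stand-in for fetching).
    // Returns, per inputted host, the other inputted hosts it links to.
    static Map<String, Set<String>> mapInterlinks(List<String> startUrls,
                                                  Map<String, List<String>> outlinks) {
        Map<String, Set<String>> interlinks = new LinkedHashMap<>();
        for (String start : startUrls) {
            String host = hostOf(start);
            if (!host.isEmpty()) {
                interlinks.put(host, new TreeSet<>());
            }
        }
        for (String start : startUrls) {
            String home = hostOf(start);
            if (home.isEmpty()) {
                continue; // unparseable starting point, nothing to crawl
            }
            Set<String> seen = new HashSet<>();
            seen.add(start);
            List<String> frontier = List.of(start);
            for (int depth = 0; depth < MAX_DEPTH && !frontier.isEmpty(); depth++) {
                List<String> next = new ArrayList<>();
                for (String page : frontier) {
                    for (String link : outlinks.getOrDefault(page, List.of())) {
                        String target = hostOf(link);
                        if (target.equals(home)) {
                            // Same site: keep snowballing within it.
                            if (seen.add(link)) {
                                next.add(link);
                            }
                        } else if (interlinks.containsKey(target)) {
                            // Link to another inputted actor: record the edge.
                            interlinks.get(home).add(target);
                        }
                        // Links to any other host are ignored.
                    }
                }
                frontier = next;
            }
        }
        return interlinks;
    }
}
```

The returned map is an adjacency list over the inputted hosts only, which is the shape a cluster map would be drawn from.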
Notes (gmc)
Basically, create IssueCrawler.shouldVisit() and have it reject any links that are outside the starting points (sp's).
Do one iteration (= 0 iterations in frontend language) and check what the final iteration does.
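A minimal sketch of the filtering idea from the note above, under stated assumptions: the real signature of shouldVisit() in IssueCrawler is not given here, so this uses a standalone, hypothetical class StartingPointFilter that takes the starting points in its constructor. Matching on host name is also an assumption; matching on a URL prefix would be an equally plausible reading of "outside the sp's".

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the filter that IssueCrawler.shouldVisit() would apply.
final class StartingPointFilter {
    private final Set<String> allowedHosts = new HashSet<>();

    // Remember the host of every starting point that can be parsed.
    StartingPointFilter(List<String> startingPoints) {
        for (String sp : startingPoints) {
            String host = hostOf(sp);
            if (!host.isEmpty()) {
                allowedHosts.add(host);
            }
        }
    }

    // Accept a link only if its host is one of the starting points' hosts.
    boolean shouldVisit(String url) {
        return allowedHosts.contains(hostOf(url));
    }

    // Lower-cased host of a URL, or "" when the URL cannot be parsed.
    private static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }
}
```

With http://test1.issuecrawler.net/ and http://test2.issuecrawler.net/ as starting points, such a filter would accept any page on those two hosts and reject everything else, including links that cannot be parsed as URLs.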
The Plan:
- Run a test on the devel crawler with test1.issuecrawler.net and test2.issuecrawler.net as input
- Add the shouldVisit() method
- Does class IssueCrawler know the starting points?
- Run another test on the devel crawler
Topic revision: r3 - 29 Aug 2008 - 15:47:00 - KoenMartens