Digital Methods: Tools and Utilities

The scrapers, crawlers, and other programs listed below may be useful for a range of digital methods. For convenience, the list is loosely divided into tools and utilities: tools perform more complex operations that are either method- or device-specific, while utilities perform relatively simple actions with multiple applications. The uses described for each item are suggestions; none is required.

Tools

Issue Crawler and Allied Tool Set

  • Issue Crawler
  • Extract URLs - Extracts URLs from an Issue Crawler .xml file.
  • Actor Profiler - Use an .svg file from the Issue Crawler to get Google's PageRank for the top 10 actors in a network.
  • Issue Geographer - Geo-locates organizations from an issue network and visualizes the results on a map.
  • Compare networks over time - Compare Issue Crawler networks over time. Displays ranked actor lists from a scheduled set of Issue Crawler results.
  • Google Network Cloud - Enter an Issue Crawler .xml ID. The script takes the URLs on the map, queries Google for certain keywords within those URLs, and builds a tag cloud from the results. Useful for issue analysis within a network.

Page Rank Tools

  • Pagerank - Discover a website's Google PageRank per issue/query.
  • Issue Dramaturg - Scheduled list of websites' PageRank per issue.

Google Scrapers

  • Google Scraper - Query Google for a particular keyword/issue, possibly in particular sites.
  • Google Scraper Frequency Tool - Use a results file from the Google Scraper and discover the frequency with which the original query appears in each individual host. (Tip: visualize these results in a tag cloud.)
  • Google Teaser Text Ripper - Use a results file from the Google Scraper to get unique phrases from the teaser (or lead) text of each Google search result.
  • YouTube video discovery - Use a results file from the Google Scraper to discover, count, and rank YouTube, Ikbis, and Google Video links in the descriptions. Obtain such a result file by querying a set of sites for e.g. 'youtube.com/watch', 'ikbis.com', 'videoplay?'.
  • Split Results - Split the results from Google Scraper into two lists (e.g. blocked and unblocked sites).
  • Google News Scraper - Query news.google.* with one or more keywords. Only articles from the last 30 days can be scraped.
  • Google Images Scraper - Query images.google.com with one or more keywords, and/or use images.google.com to query specific sites for images.
  • Scrape Google Blogsearch (Open Kapow) - Scrapes titles and URLs for a google.nl/blogsearch query.
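
As an illustration of the kind of counting the Google Scraper Frequency Tool describes, here is a minimal sketch in Python. It is not the tool's own code. The format of the Google Scraper results file is not documented on this page, so the sketch assumes the results have already been read into (URL, text) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def query_frequency_per_host(results, query):
    """Count case-insensitive occurrences of `query` in result text, per host.

    `results` is assumed to be an iterable of (url, text) pairs; the real
    results file format may differ.
    """
    if not query:
        raise ValueError("query must not be empty")
    needle = query.lower()
    counts = Counter()
    for url, text in results:
        host = urlsplit(url).hostname or ""
        counts[host] += text.lower().count(needle)
    return counts


# Hypothetical sample data standing in for a results file.
sample_results = [
    ("http://example.org/a", "Climate change policy and climate change law"),
    ("http://example.org/b", "Notes on climate change"),
    ("http://example.net/", "Unrelated page"),
]

frequencies = query_frequency_per_host(sample_results, "climate change")
for host, count in frequencies.most_common():
    print(host, count)
```

The resulting host/count pairs are the sort of input a tag cloud generator could take.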

Technorati Scrapers

Other Tools and Scrapers

  • Issue Discovery - Discovers the most relevant words and phrases among a set of websites, within a text or within an issue network (i.e. an Issue Crawler .xml file).
  • RSS discovery - Discovers RSS/ATOM/RDF feeds in websites.
  • Yahoo inlink Scraper - Gets all the inlinks to a site from Yahoo.
  • YouTube Response Networks - Enter a YouTube video and get all the responses to that video.
  • Surfer Issue Pathways - Building upon Alexa's related sites feature, this tool determines which sites are likely to be in the actual surfer paths of other sites related to the same issue.
  • Wikipedia Network Analysis - Find a hyperlink network around a Wikipedia topic (see here for more information).
  • Del.icio.us Related Tags Cloud Generator - Create a tag cloud showing URLs and tags related to a specific issue or keyword.
  • Delicious tags for url - Get Delicious tags for a specific URL (tagcloudable).
  • Webpage History Generator - Uses the Internet Archive's Wayback Machine to make screenshots of all different versions of a site and output a webpage history scroll.
  • WeScrape (Meta-Tool) - A howto guide for building your own scraper.
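
One common way to discover feeds, in the spirit of the RSS discovery tool, is to look for feed autodiscovery <link> elements in a page's HTML. The Python sketch below shows that approach only; whether the tool itself works this way is an assumption. The class name, function name, MIME type list, and sample HTML are all hypothetical, and the page is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds here (an assumption for this sketch).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> elements with a feed type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_values and is_feed and attr.get("href"):
            # Resolve relative feed URLs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


sample_html = """
<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="alternate" type="application/atom+xml" href="http://example.org/atom">
<link rel="stylesheet" type="text/css" href="/style.css">
</head><body></body></html>
"""

feeds = discover_feeds(sample_html, "http://example.org/blog/")
print(feeds)
```

Pages that link to their feeds only through ordinary anchors in the body would not be caught by this approach.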

Utilities

  • Language Detection - Detects the language from websites.
  • Link Ripper - Capture all internal and/or external links from a page.
  • Compare Lists - Compare two lists of URLs for commonalities and differences.
  • Image Ripper - Scrape images from a single page.
  • Robots.txt Ripper - Display a site's robot exclusion policy.
  • Censorship Explorer - Check accessibility of a URL through proxies located around the world.
  • Text Ripper - Rip all non-HTML (i.e. text) from a specified page.
  • Rip Sentences - Rip text from a specified page and force line breaks between sentences.
  • Timestamp - Rips and displays a web page's last modification date (using the page's HTML header).
  • Whois - Use a site such as http://www.whois.net/ to check who has registered a particular domain name.
  • Tag Cloud Generator - Input a text and visualize word counts in a tag cloud.
  • SVG Tag Cloud Generator - Input tags and values to produce an .svg file for further editing (e.g. in Illustrator).
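
To give a sense of how simple some of these utilities are, the Python sketch below combines two of them: ripping internal and external links from a page (as Link Ripper describes) and comparing two URL lists (as Compare Lists describes). It is a rough illustration, not the code behind the utilities. All names are hypothetical, "internal" is assumed to mean "same hostname as the page", and the page is supplied as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AnchorParser(HTMLParser):
    """Collect the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def rip_links(html, page_url):
    """Return (internal, external) lists of absolute http(s) links."""
    parser = AnchorParser()
    parser.feed(html)
    page_host = urlsplit(page_url).hostname
    internal, external = [], []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.hostname == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


def compare_lists(first, second):
    """Report URLs common to both lists and URLs unique to each."""
    a, b = set(first), set(second)
    return {
        "common": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


sample_html = (
    '<a href="/about">About</a> '
    '<a href="http://example.net/x">X</a> '
    '<a href="mailto:me@example.org">Mail</a>'
)
internal, external = rip_links(sample_html, "http://example.org/index.html")
comparison = compare_lists(
    internal + external,
    ["http://example.net/x", "http://example.com/"],
)
print(internal, external, comparison)
```

Comparing by exact string match means that URLs differing only in a trailing slash or in letter case count as different; normalizing URLs first is a design choice left open here.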

External Tools and Wishlist

Using the tools

  • On this page you can find some pre-DMI methods which use the tools. Of course you can also look at the MethodsByTheme.

Crystal Ball Tool (reveals future 'look' of all DMI tools):

googlescraper_with_stylesheets.gif


Topic attachments
Attachment: googlescraper_with_stylesheets.gif (42.3 K, 27 Aug 2007 - 15:44, BramNijhof)
Topic revision: r16 - 23 Nov 2007 - 17:13:29 - ErikBorra