PopTag

Introduction

Tagging is used by websites to manage and organize their data. Individual items such as posts, blogs, images, links and videos can be tagged with a number of named values that identify the item within one or several descriptive fields or spaces. When searching for a specific tag, the items related to this tag are shown together with the other tags these items have received. Several applications also provide the user with a list of the tags most often used together with the current tag: the so-called related tags. In addition to the accompanying tags on the item or the related tag list, most websites provide a tagcloud of the most popular tags.

Research

Besides helping to manage and organize data, tags also generate spaces. 'Most popular' tag clouds generate a space of topics or issues that are current or important according to the tagosphere. There are the obvious popular tags, which are probably used frequently throughout the tagosphere or within the tagspace of a specific site or community. But as issues or events arise in cyberspace, they affect the tags that are used, since the items related to an event adapt to current events. As important parts of the contemporary web, such as images and videos, are not easily searchable by their content, tags provide a good way of searching them and, as a result, provide an issuespace of tags about current events.

[Image: Flickr most popular tags (flickr_poptag.jpg)]

Most popular tags surface according to user input and therefore do not directly say anything about the tag itself. Because items receive several tags, and because specific websites generate a related tags list, individual tags seem to be located within an issuespace of their own. As related tags are more related to the tags than to the user, looking at these related tags and the space they occupy can provide information about the tag itself.

[Image: Technorati related tags (technorati_reltag.jpg)]

These two spaces of the tag, the most popular and the related tags, will be examined by creating tools that can extract the tags, tagclouds and related tags across several devices. The intentions of this research and the tools are:

  • Surface and explore the distinctiveness of the tag
  • Display what issues are present on the web according to the tagosphere
  • Define problems and possibilities of 'tag merging' across devices and spheres
  • Provide means to collect tag data and present it

Most popular

As most of the websites used today provide a tagcloud with the most popular tags, this is a good source for collecting information about current issues and about the possibilities and problems of tagmerging.

After analyzing popular websites, five have been selected for inclusion because they seem to be at the heart of either tagging or user contribution. They are:

  • flickr
  • del.icio.us
  • yahoo video
  • youtube
  • technorati

The tool poptag.rest has been created to extract the following information from these websites:

  • tagname, the actual tag
  • taglink, the url where the tag is linking to
  • weight, the specific weight of the tag within that device
  • channel, the specific device the information is gathered from

An example of the output is located here (attached as poptag.xml).
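The four fields above map naturally onto one record per scraped tag. The following is a minimal sketch in Python, not the poptag.rest tool itself: the XML layout it parses (a `<tags>` root holding `<tag>` elements with `tagname`, `taglink`, `weight` and `channel` children) is a hypothetical assumption, since the actual structure of poptag.xml is not described on this page, and the `PopTag` and `parse_poptags` names are likewise invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PopTag:
    """One scraped tag, mirroring the four fields poptag.rest extracts."""
    tagname: str   # the actual tag
    taglink: str   # URL the tag links to
    weight: float  # weight of the tag within its device
    channel: str   # device the tag was gathered from


def parse_poptags(xml_text: str) -> list[PopTag]:
    """Parse records from a HYPOTHETICAL layout: <tags><tag>...</tag></tags>."""
    root = ET.fromstring(xml_text)
    records = []
    for el in root.iter("tag"):  # only elements named exactly "tag"
        records.append(PopTag(
            tagname=el.findtext("tagname", "").strip(),
            taglink=el.findtext("taglink", "").strip(),
            weight=float(el.findtext("weight", "0")),
            channel=el.findtext("channel", "").strip(),
        ))
    return records


# Example input in the assumed layout (example.com URLs are placeholders).
SAMPLE = """<tags>
  <tag><tagname>music</tagname><taglink>http://example.com/tags/music</taglink><weight>7</weight><channel>flickr</channel></tag>
  <tag><tagname>travel</tagname><taglink>http://example.com/tags/travel</taglink><weight>3</weight><channel>youtube</channel></tag>
</tags>"""
```

Keeping the channel on every record, rather than nesting tags per device, makes it easy to regroup the data later by tag or by device.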

Using PHP and DOM, a script has been written that normalizes all weights to values from 1 to 5 and indicates, per tag, whether it also occurs in one of the other devices. This script is now in its final stage. Output of the script in array format can be seen here.
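The normalization step can be sketched as follows. This is a Python illustration, not the PHP/DOM script: it assumes a linear min-max scaling of the weights within each device onto the range 1 to 5 and case-folds tag names so that "Music" and "music" count as the same tag. Both choices are assumptions, since the script's exact mapping is not documented here. The function names are hypothetical.

```python
from collections import defaultdict


def normalize_weights(tags, low=1, high=5):
    """Map (tagname, weight, channel) triples to integer weights in [low, high].

    Assumption: linear min-max scaling within each channel; the original
    script's exact mapping is not documented.
    """
    by_channel = defaultdict(list)
    for name, weight, channel in tags:
        by_channel[channel].append((name, weight))
    result = {}
    for channel, items in by_channel.items():
        weights = [w for _, w in items]
        lo, hi = min(weights), max(weights)
        for name, w in items:
            if hi == lo:
                scaled = high  # every tag in this channel has the same weight
            else:
                scaled = round(low + (w - lo) * (high - low) / (hi - lo))
            # Assumption: tag names are case-folded so variants merge.
            result[(channel, name.lower())] = scaled
    return result


def channels_per_tag(normalized):
    """For each tag name, the set of channels (devices) it appears in."""
    presence = defaultdict(set)
    for channel, name in normalized:
        presence[name].add(channel)
    return dict(presence)
```

Scaling per device matters because raw weights are not comparable across devices: a count of 200 on one site may be small while 10 is large on another.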

The end result will be a tagcloud of most popular tags across five devices. The cloud needs to represent:

  • The tags and their weights per device
  • A weight according to their occurrence in devices (1-5)
  • Separate colors per device

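One possible way to encode the three requirements above in a single cloud is sketched below in Python. Font size encodes in how many devices a tag occurs (1 to 5), and each device's own normalized weight is shown as a colored superscript. The layout, the size formula and the `DEVICE_COLORS` palette are all hypothetical choices made for illustration, not an existing tagcloud generator. The input is a mapping from (device, tag) to a weight from 1 to 5.

```python
from html import escape

# Hypothetical palette: one color per device.
DEVICE_COLORS = {
    "flickr": "#ff0084",
    "del.icio.us": "#3274d0",
    "yahoo video": "#7b0099",
    "youtube": "#cc0000",
    "technorati": "#00a000",
}


def render_cloud(normalized):
    """Render an HTML tagcloud from {(channel, tagname): weight 1-5}.

    Font size encodes the number of devices a tag occurs in; every device's
    own weight is appended as a superscript in that device's color.
    """
    per_tag = {}
    for (channel, name), weight in normalized.items():
        per_tag.setdefault(name, {})[channel] = weight
    parts = []
    for name in sorted(per_tag):
        devices = per_tag[name]
        size_em = 0.8 + 0.4 * len(devices)  # 1 device -> 1.2em, 5 devices -> 2.8em
        marks = "".join(
            f'<sup style="color:{DEVICE_COLORS.get(ch, "#888")}">{w}</sup>'
            for ch, w in sorted(devices.items())
        )
        parts.append(f'<span style="font-size:{size_em:.1f}em">{escape(name)}{marks}</span>')
    return '<div class="tagcloud">\n' + "\n".join(parts) + "\n</div>"
```

Putting the cross-device occurrence in the font size and the per-device weights in color keeps both dimensions visible at once, which a standard one-dimensional cloud cannot do.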
With regard to using tagclouds as a way of representing data, our current tools seem insufficient for multidimensional research such as tagmerging. A new tagcloud generator needs to be created.

Related tags

To examine the related tags there are two approaches. The first is to scrape the device-generated 'related tags' list. The great advantage of this is that not every item carrying a specific tag has to be scraped for its other tags; the device does this itself. The WeScrape method can be used to scrape the necessary data. This raises the problem of which rules the device has applied to generate the list. Moreover, this approach only works for specific websites, as many websites do not provide such a list, and it is thus not very applicable in cross-spherical analysis.

The second approach is to search for a tag, scrape each and every result (or a chosen subset, such as the first 100 results), and generate the related tags list by hand. This approach provides much more freedom and insight into the gathered information, but requires quite a bit of programming and cannot really be done with the WeScrape method. For one site there is already such a tool, the delicious scrape tool. A tool needs to be created for each device from which the related tags need to be extracted.
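Once the results of a tag search have been scraped, generating the related tags list by hand comes down to counting co-occurrences. The Python sketch below assumes the scraping is already done and that each result is available as a plain list of its tags. The function name and parameters are hypothetical, and this is not how the delicious scrape tool works internally.

```python
from collections import Counter


def related_tags(query, results, top_n=10, max_results=100):
    """Derive a related-tags list for `query` from scraped search results.

    `results` holds one list of tags per scraped item (e.g. the first 100
    results of a tag search). Every other tag on an item carrying `query`
    is counted once per item; the `top_n` most frequent are returned as
    (tag, count) pairs.
    """
    q = query.lower()
    counts = Counter()
    for item_tags in results[:max_results]:
        tags = {t.lower() for t in item_tags}  # count each tag once per item
        if q in tags:
            tags.discard(q)  # the query tag is not related to itself
            counts.update(tags)
    return counts.most_common(top_n)
```

Because the counts are computed from the raw items, the rules behind the list are known, which is exactly what the device-generated lists of the first approach do not reveal.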

By gathering the related tags of a tag, one can view the space this tag is located in. When, for instance, searching for a name or issue, the issue space surrounding this person or event on the web can be made visible. By trying to tagmerge the related tags for a specific tag, the goal is to look at the tag and its space in a more generic or "global" way. This raises the question whether tags from different devices could ever be merged, as tagging seems in many ways to be device-specific. Although this seems to be the case in many respects, the assumption is that the more data is aggregated from different devices, the more this problem will be filtered out, as the occurrences in different devices will affect their priority.
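The assumption that occurrence across devices should drive priority can be made concrete as a ranking rule. The Python sketch below ranks merged tags first by the number of devices they occur in and then by the sum of their shares within each device's list. Using shares rather than raw counts, so that one large device does not dominate, is an assumption of this sketch, as is the function name.

```python
def merge_related(per_device, min_devices=1):
    """Merge related-tag lists from several devices into one ranking.

    `per_device` maps device name -> list of (tag, count). Tags are ranked
    by the number of devices they occur in, then by the summed share of
    each device's total count, then alphabetically. Returns
    (tag, device_count, score) triples.
    """
    devices_of = {}
    score = {}
    for device, pairs in per_device.items():
        total = sum(c for _, c in pairs) or 1  # avoid division by zero
        for tag, count in pairs:
            t = tag.lower()
            devices_of.setdefault(t, set()).add(device)
            score[t] = score.get(t, 0.0) + count / total  # share within device
    ranked = [
        (t, len(devices_of[t]), round(score[t], 3))
        for t in score
        if len(devices_of[t]) >= min_devices
    ]
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked
```

With `min_devices` set above 1, tags that appear on only one device are dropped entirely, which is one way to filter out device-specific tagging habits as more devices are aggregated.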

Some steps have been taken to use WeScrape, but using a tool to physically loop through the pages of a website proved to be too big a load.

An attempt has been made to use WeScrape to retrieve the 'related tag' information from Technorati and go two levels deep. By crawling through the Technorati related tags, the idea was to generate the tag space from them. Before pursuing this, however, it needs to be thought over again to determine its relevance and value.
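Going two levels deep through related tags is a depth-limited breadth-first crawl. The Python sketch below leaves the actual retrieval to a caller-supplied function, here called `fetch_related`, which is hypothetical; no Technorati interface is assumed. Each tag is fetched at most once, which also keeps the number of requests, and thus the load mentioned above, bounded.

```python
from collections import deque


def crawl_tag_space(seed, fetch_related, max_depth=2):
    """Breadth-first crawl of related tags, up to `max_depth` levels from `seed`.

    `fetch_related(tag)` is a caller-supplied (hypothetical) function that
    returns the related-tag list a device shows for `tag`. Returns the edges
    found as (tag, related_tag) pairs; each tag is fetched at most once.
    """
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        tag, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for rel in fetch_related(tag):
            edges.append((tag, rel))
            if rel not in seen:
                seen.add(rel)
                queue.append((rel, depth + 1))
    return edges
```

The returned edges form a small graph of the tag space, which could then be drawn or fed into the merging step for several devices.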


Topic attachments
Attachment | Size | Date | Who | Comment
flickr_poptag.jpg | 103.3 K | 30 Aug 2007 - 20:18 | MarijnDeVriesHoogerwerff |
poptag.xml | 122.8 K | 05 Aug 2007 - 12:37 | MarijnDeVriesHoogerwerff | Output XML file of poptag tool
technorati_reltag.jpg | 34.9 K | 30 Aug 2007 - 20:14 | MarijnDeVriesHoogerwerff |
Topic revision: r5 - 31 Aug 2007 - 10:08:43 - ErikBorra

This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.