<info @ infospaces.it>
paper presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005
Much work has been done by librarians and information scientists to create appropriate and powerful classification systems. Classification requires the design and consistent use of a scheme for a systematic organization of knowledge. See [1].
Traditionally, there are two different approaches to classification:
Hierarchical-enumerative schemes are basically trees of containers connected by parent-child relationships, with only one path from the "root" to each of the "leaves". We are very familiar with this kind of knowledge organization, which nevertheless has a number of drawbacks:
Unlike the above, analytico-synthetic schemes give up enumerating classes and instead describe items through a combination of aspects (facets). In a faceted classification scheme, the facets may be considered to be dimensions in a Cartesian n-dimensional space, and the value of a facet is the position of the object in that dimension. Instead of imposing a pre-determined hierarchy, items can be placed on the fly by evaluating their inherent characteristics, and can be retrieved by users through the same item properties, either one at a time or all together.
Faceted classification can be applied to large homogeneous datasets and suggests an explorative approach, whereby a large dataset is progressively filtered through the user's choices. Users can restrict the resulting dataset at each step, until they arrive at a group of items that meets their needs. See [2].
In this flexible and scalable approach, an item can be associated with, or better, described by more than one facet, and new facets can be introduced freely and fairly painlessly to express new concepts.
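The progressive filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue, the facet names (`color`, `region`, `price`) and the helper functions `refine` and `remaining_values` are invented here and do not come from any of the cited systems.

```python
from typing import Dict, List

# Hypothetical catalogue: each item has one value (a "position") on every facet.
ITEMS: List[Dict[str, str]] = [
    {"title": "Chianti Classico", "color": "red", "region": "Tuscany", "price": "medium"},
    {"title": "Barolo", "color": "red", "region": "Piedmont", "price": "high"},
    {"title": "Vernaccia", "color": "white", "region": "Tuscany", "price": "low"},
    {"title": "Gavi", "color": "white", "region": "Piedmont", "price": "medium"},
]


def refine(items: List[Dict[str, str]], facet: str, value: str) -> List[Dict[str, str]]:
    """Keep only the items whose position on `facet` equals `value`."""
    return [item for item in items if item.get(facet) == value]


def remaining_values(items: List[Dict[str, str]], facet: str) -> Dict[str, int]:
    """Count how many of the remaining items fall on each value of `facet`."""
    counts: Dict[str, int] = {}
    for item in items:
        if facet in item:
            counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


# Progressive filtering: each user choice narrows the previous result set.
step1 = refine(ITEMS, "color", "red")
step2 = refine(step1, "region", "Tuscany")
```

Because facets are independent, the same two choices could be made in the opposite order and lead to the same final group; `remaining_values` is the kind of count a faceted interface would show next to each option at every step.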
The Web publishing process has come to the masses thanks to lower technology and cost barriers.
Blogging and content management software provides anyone interested with extremely simple and accessible tools to update a website every day, almost effortlessly and at no cost. See [3].
Blogging is just one component of the emerging, more general concept of social software, a technology «which supports, extends, or derives added value from human social behavior -- message-boards, musical taste-sharing, photo-sharing, instant messaging, mailing lists, social networking». See [4].
The point here is that we have gone past a critical mass of connectivity between people that introduced a new revolutionary ability to communicate, collaborate and share goods online.
Besides blogs and wikis, other tools of social connection are emerging, such as photo sharing, social bookmarking and to-do-list sharing.
These tools are producing an incredible amount of distributed information that we need to link, aggregate, organize in order to extract knowledge. To achieve this goal, better aggregation and concept matching tools are required.
Traditional classification schemes also require:
Using a sound and complete classification scheme requires professionals to do the job, a common clear view of the domain and skilled users that understand the categories and the structure of the classification to use it without problems. See [5].
On the other hand, sprawling, heterogeneous information sources make up an enormous, ever-changing, time-sensitive, not-clearly defined corpus of items to classify without a central authority, targeted at a heterogeneous and increasing group of users. This situation requires new and different classification strategies.
The Web today fits neatly in this description. On the Web, the direction is scalability, flexibility, fluidity and simplicity to satisfy the demanding needs of millions of people with different cultural and social backgrounds all over the world. Under these circumstances, traditional precise classification schemes become expensive (to create and maintain) and probably lose the capability to match the user's way of thinking and organizing the world.
Folksonomies provide an approach to address Web-specific classification issues.
A folksonomy is a user-generated classification, emerging through bottom-up consensus (see [6]). A fusion of the words folks and taxonomy, the first use of the term folksonomy has been attributed to Thomas Vander Wal. Taxonomy comes from taxis and nomos (from Greek). Taxis means classification. Nomos (or nomia) means management. Folk is people.
The term was coined in the AIfIA mailing list to denote the spreading practice of collaborative categorization, in which a group of people cooperating spontaneously uses freely chosen keywords. See [7].
Folksonomies are not a theory or a top-down strategy: they were born out of a feature (folk classification tools) introduced by software like Del.icio.us <http://del.icio.us>, Flickr <http://www.flickr.com>, 43things <http://www.43things.com>, Furl <http://www.furl.net>, Technorati <http://www.technorati.com>, etc. and from people using these platforms to tag their contents (links, photos, etc).
Folksonomies require people to associate keywords with content. Using popular keywords rewards them with visibility: their own content gains prominence in the system (for example on the homepage).
In a bottom-up distributed and collaborative grassroots approach, tagging or folksonomy is a manifestation of people moving away from hierarchical authoritative schemes. Rather than learning yet another imposed external scheme to classify items and to restrict, to some extent, the user's thinking, people started to associate their own tags to the items they wanted to collect and share. In a social distributed environment, sharing one's own tags makes for innovative ways to map meaning and let relationships naturally emerge. See [8].
Folksonomies are not simply visitors tagging something for personal use: they are also an aggregation of the information that visitors provide. The power of folksonomy is connected to the act of aggregating, not simply to the creation of tags. Without a social distributed environment that suggests aggregation, tags are just flat keywords, only meaningful for the user that has chosen them. The power here is people. The term-significance relationship emerges by means of an implicit contract between the users.
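The difference between isolated tags and their aggregation can be made concrete with a minimal sketch. It assumes that tagging activity is recorded as (user, item, tag) triples; the users, URLs and tags below are invented, and the function `aggregate` is hypothetical rather than taken from any of the platforms mentioned.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (user, item, tag) triples; users, URLs and tags are invented for illustration.
ASSIGNMENTS = [
    ("anna", "http://example.org/a", "folksonomy"),
    ("bruno", "http://example.org/a", "folksonomy"),
    ("bruno", "http://example.org/a", "tagging"),
    ("carla", "http://example.org/a", "toread"),
    ("anna", "http://example.org/b", "tagging"),
]


def aggregate(assignments: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """For every item, count how many distinct users applied each tag."""
    seen = set()
    per_item: Dict[str, Counter] = defaultdict(Counter)
    for user, item, tag in assignments:
        if (user, item, tag) in seen:  # the same user repeating a tag counts once
            continue
        seen.add((user, item, tag))
        per_item[item][tag] += 1
    return per_item


tags_by_item = aggregate(ASSIGNMENTS)
```

Taken alone, `toread` is meaningful only to the user who chose it; after aggregation, the counts show which terms several people agree on for the same item.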
The concept on which folksonomies are based can be applied to everything that we can aggregate. The key point is in having an activity to observe that:
Though working on a different mechanism, an example of aggregation based on user activity and interest is the recommendations feature on Amazon.com: the aggregated activity here, instead of tagging, is users reading a product page. This activity is explicit, can be aggregated, is meaningful for users and, by transparently tracing user behavior, produces useful insights for the company. See [9].
As explained by Thomas Vander Wal (see [11]), we can distinguish two types of folksonomy, each associated with specific properties and suggested uses:
A broad folksonomy (such as that of Del.icio.us) is the result of many people tagging the same item. Every user can tag the object in a different way, following their own mental model, vocabulary and language. This approach tends to show a power law curve and a long tail effect.
(In nature, events deviating from the average are rare. They follow a bell curve, a curve with a marked peak (a Gaussian curve). Power law distributions are very different from Gaussian curves: they do not have a peak, a characteristic value, but they look like continuously decreasing curves in which a large amount of tiny events (the long tail) coexist with a few anomalously very large ones. See "Long tail" on Wikipedia.)
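For readers who prefer formulas, the two shapes can be written as follows. The symbols are assumptions introduced here and do not appear in the sources cited: \mu and \sigma are the mean and standard deviation of the Gaussian, \alpha is the exponent of the power law and x_{\min} the smallest value it applies to.

```latex
% Gaussian (bell) curve: values cluster around the characteristic value \mu
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% Power law: no characteristic value, a continuously decreasing curve with a long tail
p(x) \propto x^{-\alpha}, \qquad \alpha > 0,\quad x \ge x_{\min}
```

The Gaussian falls off exponentially away from \mu, while the power law decays only polynomially, which is why very large events remain possible alongside a great many tiny ones.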
In a broad folksonomy, the power law reveals that many people agree on using a few popular tags but also that smaller groups often prefer less known terms to describe their items of interest.
Therefore, a broad folksonomy provides a tool to investigate trends in large groups of people describing a corpus of items and can be used to select preferred terms or extract a controlled vocabulary.
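As a rough illustration of how preferred terms might be selected from a broad folksonomy, the sketch below ranks the tags that many users applied to one item and separates the popular head from the long tail. The tag data, the 10% threshold and the function names are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple

# Invented tags that many users applied to one bookmark (100 assignments in total).
TAGS = ["design"] * 50 + ["css"] * 30 + ["web"] * 12 + [
    "webdesign", "stylesheet", "inspiration", "layout", "todo", "cool", "ref", "work",
]


def rank_tags(tags: List[str]) -> List[Tuple[str, int]]:
    """Return (tag, count) pairs from most to least used."""
    return Counter(tags).most_common()


def preferred_terms(tags: List[str], min_share: float) -> List[str]:
    """Tags whose share of all assignments reaches `min_share` (an arbitrary threshold)."""
    total = len(tags)
    return [tag for tag, count in rank_tags(tags) if count / total >= min_share]


ranked = rank_tags(TAGS)
head = preferred_terms(TAGS, min_share=0.10)          # candidate controlled vocabulary
tail = [tag for tag, count in ranked if count == 1]   # terms used only once
```

With this invented data a few tags account for most assignments, while many others are used only once; whether the tail is noise or a smaller group's vocabulary is a judgment the threshold alone cannot make.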
The real power of broad folksonomies is in the richness of the mass, in people explicitly exposing their ways of defining and describing things, which leads to the long tail and the power law curve. These effects are simply absent in personomies, i.e. individuals tagging their own self-produced or uploaded content.
A narrow folksonomy (such as that of Flickr), on the other hand, is the result of a smaller number of individuals tagging items (with one or more tags) for later personal retrieval or for their own convenience.
Narrow folksonomies lose the richness of the mass, but provide benefits in tagging objects that are not easily findable with traditional tools (full-text search or other text-related tools) or that cannot be simply described in current text-based software on the Web.
A narrow folksonomy provides various target audiences (maybe with a rather specific shared vocabulary) with the instrument to add tags in their own language. This property makes later retrieval fast, efficient and enjoyable.
Detractors of folksonomies highlight the following drawbacks:
On a positive note, supporters of folksonomies underline that:
In brief, using the words of Timo Hannay (see [20]), a folksonomy is «liberating, not restrictive; bottom-up, not imposed; relational, not hierarchical. It also cleverly harnesses selfish acts and directs them towards the common good. But most of all, it just seems to fit the way our brains work».
Folksonomies are not limited to the geek world or to the blogosphere. Enterprises have also started blogging and experimenting with folksonomies. An example is IBM's Intranet, which serves 315,000 IBM employees worldwide in different languages and with multiple roles and information needs. While currently using a controlled taxonomy, IBM has announced that it will start experimenting with folksonomy to keep information updated and organized following its users' personal way of accessing the system. See [21].
To address the intrinsic precision loss of folksonomies, Jess McMullin proposes to complement social classification with other classification approaches: «automated keyword extraction, tag suggestions built into the tagging tool as the tag is typed [see Google Suggest and Ajax technology], mapping ad-hoc tags to structured facets, and top-down classification oversight by information professionals». See [19].
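One of these complements, tag suggestions offered while the tag is typed, might look like the sketch below. It is not McMullin's design or any product's implementation: the tag counts and the function `suggest` are hypothetical, and the ranking rule (most used first, then alphabetical) is an assumption.

```python
from collections import Counter
from typing import List

# Invented usage counts for tags already present in the system.
TAG_COUNTS = Counter({"folksonomy": 40, "folk": 5, "flickr": 25, "facets": 12, "findability": 8})


def suggest(prefix: str, counts: Counter, limit: int = 3) -> List[str]:
    """Suggest existing tags starting with `prefix`, most used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    # Sort by descending usage, breaking ties alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]
```

Proposing terms that others already use nudges users towards a shared vocabulary without forbidding new tags, since the typed text can still be submitted as it is.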
Large corporations are often made of independent silos that are unable to communicate with each other and do not share a common vocabulary. The same thing can have different names in different silos. A typical argument against the introduction of folksonomies in a corporate environment is that their use as a basis for retrieving documents from corporate archives would still require a common language, a shared vocabulary, spoken by the entire company, allowing the use of a well-defined label or set of labels for every article. This is not true: while the vocabulary is not the same, people are classifying the same real things underlying the terms used to name them. This knowledge allows the creation of a mapped folksonomy between the language of individuals and the corporate language, as a sort of synonym ring. Every user retrieves documents using the terms of their own vocabulary, which the system matches to the corporate vocabulary. See [22].
This analysis of people's behavior and perceptions can be accelerated by sharing folksonomies. A new XHTML microformat named xFolk has been proposed for this purpose by Bud Gibson. See [23].
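A mapped folksonomy of this kind can be pictured as a lookup from each local term to one corporate term. The sketch below is a simplified assumption, not the mechanism described in [22]: the rings, the archive and the functions `build_lookup` and `retrieve` are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical synonym rings: one corporate term and the local terms mapped to it.
RINGS: Dict[str, Set[str]] = {
    "customer": {"customer", "client", "account"},
    "invoice": {"invoice", "bill"},
}

# Hypothetical archive indexed by corporate term.
ARCHIVE: Dict[str, List[str]] = {
    "customer": ["doc-101", "doc-204"],
    "invoice": ["doc-305"],
}


def build_lookup(rings: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert the rings so that every local term points to its corporate term."""
    lookup: Dict[str, str] = {}
    for corporate, variants in rings.items():
        for variant in variants:
            lookup[variant.lower()] = corporate
    return lookup


def retrieve(term: str, lookup: Dict[str, str], archive: Dict[str, List[str]]) -> List[str]:
    """Find documents using the user's own term; unknown terms return nothing."""
    corporate = lookup.get(term.strip().lower())
    return archive.get(corporate, []) if corporate else []


LOOKUP = build_lookup(RINGS)
```

In this sketch someone who says "client" and someone who says "account" reach the same documents, without either having to learn the corporate label.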
As a side benefit, tagging enhances the creation of communities around classification. People using the same keywords have a common interest. Therefore, folksonomy can be a «ridiculously low-cost kind of community that's nothing more than a beneficial side effect of people tagging documents for their own future recall» as Gene Smith writes in his post after IA Summit 2005. See [24].
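The community side effect can be sketched by looking for users whose tags overlap. The users and tags below are invented, and treating any shared tag as a sign of common interest is a deliberately naive assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# Invented data: the set of tags each user has applied.
USER_TAGS: Dict[str, Set[str]] = {
    "anna": {"folksonomy", "ia", "tagging"},
    "bruno": {"tagging", "php", "folksonomy"},
    "carla": {"photography", "milan"},
}


def users_by_tag(user_tags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Group users under every tag they have used."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            groups[tag].add(user)
    return groups


def shared_interest(user_tags: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """For every pair of users, the tags they have in common (pairs with none are omitted)."""
    pairs: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(user_tags), 2):
        common = user_tags[a] & user_tags[b]
        if common:
            pairs[(a, b)] = common
    return pairs
```

Nobody in this example joined a group: the link between two users emerges only because both tagged documents for their own future recall.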
Some of the major differences between folksonomies and traditional classification are outlined here:
Using a sentence from David Weinberger, «Trees are neat; piles of leaves are messy».
For more information see [25].
Folksonomies are a new, rapidly evolving approach to classification of digital objects. Much has still to be discovered and tested. What we have not created yet is probably «a middle ground, somewhere between the pure democracy of bottom-up tagging and the empirical determinism of top-down controlled vocabularies». In this scenario, «users could freely create, adopt or reject terms stored in a distributed repository that gets administered by a representative authority that "owns" the vocabulary». See [6, 26].
All that we have to do is to merge and leverage emerging and traditional tools to improve findability. Somewhere at the intersection of those two models is a more powerful framework for identifying, sharing, and finding information.
The goal is a metadata ecology, where the best tools we have bend towards a real user-centred design. See [13].
The increasing interest in folksonomies is confirmed by new projects like Freetag. Freetag is an API written in PHP for setting up a folksonomy on a website. With such tools, in the near future, we should be able to leverage the power of folksonomies outside of the original environments that introduced them, such as Flickr. See [27].
Traditional hierarchies for organizing information (or reality) will not be replaced by tags, but through tagging we are finding new ways of thinking about classification and new applications for organizing and sharing knowledge. See [28].
1: Content classification -- <http://encyclozine.com/Reference/Library/Classification/>
2: Innovation in classification / Peter Merholz -- <http://www.peterme.com/archives/00000063.html> : September 23, 2001
3: (Weblogs and) The mass amateurisation of (nearly) everything... / Tom Coates -- <http://www.plasticbag.org/archives/2003/09/weblogs_and_the_mass_amateurisation_of_nearly_everything.shtml> : September 03, 2003
4: An addendum to a definition of social software / Tom Coates -- <http://www.plasticbag.org/archives/2005/01/an_addendum_to_a_definition_of_social_software.shtml> : January 5, 2005
5: Ontology is overrated: categories, links, tags / Clay Shirky -- <http://shirky.com/writings/ontology_overrated.html>
6: Folksonomy / Alex Wright -- <http://www.agwright.com/blog/archives/000900.html> : January 5, 2005
7: Folksonomy (Wikipedia) -- <http://en.wikipedia.org/wiki/Folksonomy>
8: Introduction: Jon Lebkowsky / Jon Lebkowsky -- <http://tagsonomy.com/index.php/introduction-jon-lebkowsky/> : January 5, 2005
9: I've heard of folksonomies. Now how do I apply them to my site? / Joshua Porter -- <http://www.bokardo.com/archives/applying_folksonomies/> : January 5, 2005
10: Collaborative knowledge gardening / Jon Udell -- <http://www.infoworld.com/article/04/08/20/34OPstrategic_1.html> : August 20, 2004
11: Explaining and showing broad and narrow folksonomies / Thomas Vander Wal -- <http://www.personalinfocloud.com/2005/02/explaining_and_.html> : February 21, 2005
12: Ethnoclassification and vernacular vocabularies / Peter Merholz -- <http://www.peterme.com/archives/000387.html> : August 30, 2004
13: Folksonomies? How about metadata ecologies? / Louis Rosenfeld -- <http://louisrosenfeld.com/home/bloug_archive/000330.html> : January 06, 2005
14: Folksonomy / Clay Shirky -- <http://www.corante.com/many/archives/2004/08/25/folksonomy.php> : August 25, 2004
15: Controlled vocabularies cut off the long tail / Joshua Porter -- <http://bokardo.com/archives/controlled_vocabularies_long_tail/> : March 09, 2005
16: Findability vs discoverability / Donna Maurer -- <http://www.maadmob.net/donna/blog/archives/000609.html> : March 08, 2005
17: Folksonomies are a forced move: a response to Liz / Clay Shirky -- <http://www.corante.com/many/archives/2005/01/22/folksonomies_are_a_forced_move_a_response_to_liz.php> : January 22, 2005
18: Folksonomies + controlled vocabularies / Clay Shirky -- <http://www.corante.com/many/archives/2005/01/07/folksonomies_controlled_vocabularies.php> : January 07, 2005
19: The cognitive cost of classification / Jess McMullin -- <http://www.interactionary.com/index.php?cat=7> : August 19, 2004
20: Introduction: Timo Hannay / Timo Hannay -- <http://tagsonomy.com/index.php/introduction-timo-hannay/> : August 19, 2004
21: IBM's Intranet and Folksonomy / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/03/ibms_intranet_a.html> : August 19, 2004
22: Using mapped folksonomy to break corporate silos / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/02/using_mapped_fo.html> : February 16, 2005
23: Folksonomy : practical application and xFolk / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/03/folksonomy_prac.html> : March 29, 2005
24: IA summit Folksonomies panel / Gene Smith -- <http://atomiq.org/archives/2005/03/ia_summit_folksonomies_panel.html> : March 08, 2005
25: Taxonomies and tags: from trees to piles of leaves / David Weinberger -- <http://www.hyperorg.com/blogger/misc/taxonomies_and_tags.html>
26: Bridging the gap: folksonomy and taxonomy / James Melzer -- <http://www.jamesmelzer.com/bearings/archives/2005/02/bridging_the_ga.html#more> : February 11, 2005
27: Freetag : an open source tagging : Folksonomy module for PHP/MySQL applications <http://getluky.net/freetag/>
28: Introduction: Jon Lebkowsky / Jon Lebkowsky -- <http://tagsonomy.com/index.php/introduction-jon-lebkowsky/> : May 3, 2005
1: Faceted classification of information (The Knowledge management connection) -- <http://kmconnection.com/DOC100100.htm>
2: Metacrap: putting the torch to seven straw-men of the meta-utopia / Cory Doctorow -- <http://www.well.com/~doctorow/metacrap.htm> : August 26, 2001
3: Social bookmarking tools / T Hammond, T Hannay, B Lund, J Scott -- <http://www.dlib.org/dlib/april05/hammond/04hammond.html/> : April 2005
4: Folksonomies : cooperative classification and communication through shared metadata / Adam Mathes -- <http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html> : December 2004
5: Bookmark, classify and share: a mini-ethnography of social practices in a distributed classification community -- <http://ideant.typepad.com/ideant/2004/12/a_delicious_stu.html>
Folksonomies: power to the people : presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005 / by Emanuele Quintarelli -- ISKO Italia : 2005.06.15 - 2005.11.14