[TagCommons-WG] mechanisms for sharing tag data

Fri Mar 2 05:38:28 PST 2007

Hi all,

<snip>
Here is a quick list of ways that people have shared data on the web:
1. point-to-point translation of static data files with proprietary
data formats
2. point-to-point integration using data derived by crawling and
screen-scraping sites 
3. point-to-point integration using an API (REST, Web Service, etc) that
assumes a particular data model and encapsulates the format in code
4. point-to-point integration accessing databases with documented
schemas
5. common content formats, such as microformats and I-tags
6. common database schemas using a standard schema definition language
7. common ontologies and RDF for interchange
</snip>

This looks like a really useful basis for our discussions, and pretty
comprehensive. The one change/addition I would make is to explicitly add
SPARQL to the list. It overlaps a little with point 3 but mostly
with point 7. It's worth noting that SPARQL itself doesn't assume any
particular data model (so different from 3 in that sense), only the
query writer needs to know about the underlying data model of the SPARQL
endpoint (and in some cases even that isn't required; but that's a
discussion for elsewhere).
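
To make the point about SPARQL not assuming a data model a bit more
concrete, here is a minimal sketch in Python. The endpoint URL and the
ex: vocabulary are made up for illustration; the only thing taken from
the SPARQL Protocol is that a query can be sent over HTTP GET in a
"query" parameter. Nothing in build_request_url() knows about tags; all
knowledge of the data model lives in the query text.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/tags#>
SELECT ?resource WHERE {
  ?resource ex:taggedWith ?tag .
  ?tag ex:label "blog" .
}
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as an HTTP GET URL using the 'query' parameter.

    The function is the same for any endpoint and any data model.
    """
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Swapping in a different endpoint or a different vocabulary only changes
the two constants, not the client code.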

Perhaps a good way to understand all these different approaches (for me
at least) is to imagine them according to the following dimensions (I'm
sure there are others):

A) processibility: how easy is it to process/parse the tag data? I see
scraping HTML at one end of this dimension, XML and RDF at the other.
B) queryibility: how easy is it to retrieve the data you *actually* want
from a particular source? Single 'static' documents would be at one end
of this spectrum, APIs and SPARQL at the other.
C) formality: how well specified/reusable (other features too?) is the
data model describing the tags? "AN Other Custom Format" at one end, Tag
ontologies at the other?
D) linkibility: how easy is it to link together and mashup tag data from
different sources? I can't think of a good example for the low end of
this dimension, as most tagging services have quite pretty tag URLs (if
not URIs per se), but at the high end of the dimension is RDF;
linkibility is one of the key things that distinguishes it from vanilla
XML.

I'd be interested to hear people's thoughts about these
dimensions/refinements/suggestions for others. They help me understand
the space of possibilities, so hope others find them useful too.

Perhaps another useful exercise would be to think about specific
examples of services and how they fit in with the points above, or where
specific services fit in a space defined by those dimensions. It's
always useful to have something tangible to look at; for example, the
del.icio.us API doesn't allow for real querying. The RSS1.0 output is
cool and quite flexible, but (AFAIK) you can't get directly from
del.icio.us (e.g.) all the URLs that I've tagged "blog" since an
arbitrary date. In that sense it scores well on processibility due to
the RSS1.0 output, but less well on queryibility. I hope that Revyu.com
via its SPARQL endpoint [1] scores well on all the dimensions.

[1] http://revyu.com/sparql/welcome (temporarily quite locked down -
contact me if you'd like access)
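
As a rough illustration of the del.icio.us example above: when a source
scores low on queryibility, the client ends up fetching everything and
filtering locally. The sketch below (Python, with made-up bookmark data
and a hypothetical tagged_since() helper; it does not call any real
del.icio.us API) shows the "tagged blog since an arbitrary date" filter
done on the client side.

```python
from datetime import date
from typing import NamedTuple


class Bookmark(NamedTuple):
    url: str
    tags: frozenset
    added: date


# Made-up sample data standing in for a downloaded feed or export.
bookmarks = [
    Bookmark("http://example.org/a", frozenset({"blog", "semweb"}), date(2007, 2, 20)),
    Bookmark("http://example.org/b", frozenset({"blog"}), date(2006, 11, 3)),
    Bookmark("http://example.org/c", frozenset({"rdf"}), date(2007, 3, 1)),
]


def tagged_since(items, tag, since):
    """Return URLs carrying `tag` that were added on or after `since`."""
    return [b.url for b in items if tag in b.tags and b.added >= since]


print(tagged_since(bookmarks, "blog", date(2007, 1, 1)))
```

With a queryable source the same filter would be expressed in the
request itself, and only the matching items would come back.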

> ACTION REQUESTED:
> This message summarizes some mechanisms for sharing, but I would
> like to ask members of the group to please pipe in with how you
> envision the interoperability could work if we proposed adapting
> and/or extending some formal mechanisms and coming to some kind of
> agreement on it.  For example, application and tool developers, what
> practically would you need from the agreement to enable your work?
> Content and research people, what would help you get the data you
> need?  Ontology / Semantic Web people, what else needs to happen to
> hook up the various levels to make a tagcommons work?

I'm hesitant about mandating anything (which probably isn't what you
actually mean anyway, Tom), but am all for arguing strongly in favour of
something...

> For example, here are some teaser topics for this thread:
> - How can a web service translate among or aggregate different tag 
> data sources?  How can we enable "semantic mashups"?

...as an application developer/researcher/Semantic Web person, I can't
advocate strongly enough an approach based on URIs+RDF+SPARQL. There
would no doubt be a lot of learning to do along the way, but we stand to
gain an awful lot. Moves such as the Linking Open Data project [2] are
working through many of the issues; tag sharing initiatives such as this
could easily piggyback on the project's findings.

[2] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

> - How can ontologies be layered on top of existing tag data sources
> that we don't own (for example, how does Annotea talk to
> del.icio.us)?

Exposing del.icio.us data (or at least the x latest items tagged y)
according to Richard's or Tom's tag ontology should be quite
straightforward using a SPARQL CONSTRUCT query. I'll have a look into
this.
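
To give a feel for the kind of mapping I mean, here is a sketch rather
than a worked solution. It assumes the source data carries one tag
label per dc:subject value, and it uses a made-up ex: vocabulary in
place of either tag ontology. The CONSTRUCT query is included as a
string for reference only and is not executed; the Python function
imitates what it would produce over a list of plain triples.

```python
from itertools import count

DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
EX = "http://example.org/tags#"  # hypothetical stand-in for a tag ontology

# For reference only (not executed here): the mapping as a CONSTRUCT query.
CONSTRUCT_QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ex: <http://example.org/tags#>
CONSTRUCT { ?item ex:taggedWith [ ex:label ?label ] }
WHERE     { ?item dc:subject ?label }
"""


def construct_tag_triples(triples):
    """Imitate the CONSTRUCT above over (subject, predicate, object) tuples.

    Each dc:subject triple yields a fresh blank node for the tag, linked
    to the item and labelled with the original tag string.
    """
    ids = count(1)
    out = []
    for s, p, o in triples:
        if p == DC_SUBJECT:
            node = f"_:tag{next(ids)}"
            out.append((s, EX + "taggedWith", node))
            out.append((node, EX + "label", o))
    return out


# Made-up input: one bookmarked item with two tags and a title.
source = [
    ("http://example.org/post", DC_SUBJECT, "blog"),
    ("http://example.org/post", DC_SUBJECT, "semweb"),
    ("http://example.org/post", "http://purl.org/dc/elements/1.1/title", "A post"),
]

for triple in construct_tag_triples(source):
    print(triple)
```

A real version would run the query against the RDF itself and target
the properties of whichever tag ontology we settle on.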

OK, this email is plenty long enough already - I'll stop now :)

Cheers,

Tom.

-- 
Tom Heath
PhD Student
Knowledge Media Institute
The Open University
Walton Hall
Milton Keynes
MK7 6AA
United Kingdom

Tel: +44 (0)1908 653565
Fax: +44 (0)1908 653169
Web: http://kmi.open.ac.uk/people/tom
Email: t.heath at open.ac.uk
Jabber: t.heath%open.ac.uk at buddyspace.org