[TagCommons-WG] SPARQL and query services over tag data

Fri Mar 2 14:49:07 PST 2007

Thanks, Tom Heath, for your point about SPARQL
<http://en.wikipedia.org/wiki/SPARQL>; I added it to the list of sharing
mechanisms on the blog post
<http://tagcommons.org/2007/03/02/ontologies-vs-formats-vs-schema-vs-apis/>.

(Group: Here is a good introductory article on SPARQL
<http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html>
that explains that it is both a query language and a data access protocol
layered on Web Services.  I would also point people to the tools and services
enabled by the Redland RDF Libraries <http://librdf.org/>, maintained by Dave
Beckett, who is on this list.)

Your example of revyu.com and del.icio.us is instructive.   As you and
others have pointed out, the various ontologies for tagging we have looked
at all have core concepts that could be mapped to the data that both
revyu.com and del.icio.us expose through APIs.  The main difference seems to
be in the completeness of the inferential/retrieval services exposed.  I
don't know whether being a compliant SPARQL endpoint implies returning
results for all legal (and computationally tractable) queries in the
language (does SQL?  I think so).  The del.icio.us API only answers some
kinds of queries, such as "return all the tag assertions by me".

If we specified a tag ontology, and showed how it mapped to various APIs
(possibly by just writing some of them as in the XSLT example from Marja),
then we would have the ability to write brokers and mediators that could
query across sources -- with one caveat.  The broker / mediator would have
to know the computational capabilities and limitations of each endpoint, and
reason about that.  For example, one service might be able to directly
return the set of tag labels for a given item by a given person, whereas
another could only return the whole row set of tag assertions by any person
using any tag label for the item.   The output of the latter would need to
be buffered and transformed in some way to return the requested query
results.  This resembles the query planning and execution that a
relational database engine does internally, except that it is distributed
over systems whose performance characteristics on various queries are
unknown.  I imagine some people are doing research in this area.
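
To make the caveat concrete, here is a minimal sketch of such a mediator in Python.  Everything in it is hypothetical (the class names, the `can_filter_by_person` flag, the sample rows); it does not describe the del.icio.us or revyu.com APIs.  It only shows a broker checking what each source can do and buffering and filtering locally when the source can only return the whole row set for an item.

```python
from typing import NamedTuple


class TagAssertion(NamedTuple):
    person: str
    item: str
    label: str


class FilteringSource:
    """Hypothetical source that can answer (item, person) queries directly."""
    can_filter_by_person = True

    def __init__(self, rows):
        self.rows = list(rows)

    def labels_for(self, item, person):
        return {r.label for r in self.rows
                if r.item == item and r.person == person}


class RowDumpSource:
    """Hypothetical source that only returns every assertion about an item."""
    can_filter_by_person = False

    def __init__(self, rows):
        self.rows = list(rows)

    def rows_for_item(self, item):
        return [r for r in self.rows if r.item == item]


def tag_labels(source, item, person):
    """Answer the same query against either kind of source."""
    if source.can_filter_by_person:
        # The source does the work; pass the query straight through.
        return set(source.labels_for(item, person))
    # Otherwise buffer the whole row set and filter it in the mediator.
    buffered = list(source.rows_for_item(item))
    return {r.label for r in buffered if r.person == person}


def mediate(sources, item, person):
    """Union the answers from all sources."""
    result = set()
    for source in sources:
        result |= tag_labels(source, item, person)
    return result
```

A real broker would also need some cost estimate for the buffering branch, since the row set for a popular item could be large; the sketch ignores that.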

Another approach would be to use RDF as a kind of canonical form for a data
warehouse, where tag data from various systems is crawled/synchronized and
put into a common pool, which in turn could be accessed through a
SPARQL query service.  Is this the sort of thing that the Linking Open Data
project
<http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData>
is doing?
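
As a rough illustration of the warehouse idea, the Python sketch below normalizes records from two made-up source formats into subject/predicate/object triples in one pool and then answers a query over the pool.  The `ex:` predicate names, both record formats, and the helper functions are assumptions for illustration only; they are not taken from any of the tag ontologies we have looked at, and the simple pattern matcher merely stands in for a real RDF store with a SPARQL service.

```python
from itertools import count

_ids = count(1)  # shared counter so tagging nodes stay unique across sources


def tagging_triples(person, item, label):
    """Describe one tag assertion as three triples about a fresh node."""
    node = f"_:tagging{next(_ids)}"
    return [
        (node, "ex:taggedBy", person),
        (node, "ex:taggedResource", item),
        (node, "ex:tagLabel", label),
    ]


def from_source_a(records):
    """Hypothetical format A: one dict per tag assertion."""
    for rec in records:
        yield from tagging_triples(rec["user"], rec["url"], rec["tag"])


def from_source_b(records):
    """Hypothetical format B: (person, resource, [labels]) tuples."""
    for person, resource, labels in records:
        for label in labels:
            yield from tagging_triples(person, resource, label)


def match(pool, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in pool
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def items_tagged(pool, person, label):
    """All resources that `person` tagged with `label`, from any source."""
    by_person = {s for s, _, _ in match(pool, p="ex:taggedBy", o=person)}
    with_label = {s for s, _, _ in match(pool, p="ex:tagLabel", o=label)}
    return {o
            for node in by_person & with_label
            for _, _, o in match(pool, s=node, p="ex:taggedResource")}
```

Once the data is in one pool, the query no longer depends on what each originating service could answer on its own, which is the main attraction of this approach over the mediator.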

--tom

<snip>

> -----Original Message-----
> From: wg-bounces at tagcommons.org [mailto:wg-bounces at tagcommons.org] On
> Behalf Of T.Heath
> Sent: Friday, March 02, 2007 5:38 AM
> To: Tag Commons Working Group
> Subject: Re: [TagCommons-WG] mechanisms for sharing tag data
>
> Hi all,
>
> This looks like a really useful basis for our discussions, and pretty
> comprehensive. The one change/addition I would make is to explicitly add
> SPARQL into the list. It goes a tiny bit along with point 3 but mostly
> with point 7. It's worth noting that SPARQL itself doesn't assume any
> particular data model (so different from 3 in that sense), only the
> query writer needs to know about the underlying data model of the SPARQL
> endpoint (and in some cases even that isn't required; but that's a
> discussion for elsewhere).

...

> Perhaps another useful exercise would be to think about specific
> examples of services and how they fit in with the points above, or where
> specific services fit in a space defined by those dimensions. It's
> always useful to have something tangible to look at; for example, the
> del.icio.us API doesn't allow for real querying. The RSS1.0 output is
> cool and quite flexible, but (AFAIK) you can't get directly from
> del.icio.us (e.g.) all the URLs that I've tagged "blog" since an
> arbitrary date. In that sense it scores well on processibility due to
> the RSS1.0 output, but less well on queryibility. I hope that Revyu.com
> via it's SPARQL endpoint [1] scores well on all the dimensions.
>
> [1] http://revyu.com/sparql/welcome (temporarily quite locked down -
> contact me if you'd like access)
>
> ...as an application developer/researcher/Semantic Web person, I can't
> advocate strongly enough an approach based on URIs+RDF+SPARQL. There
> would no doubt be a lot of learning to do along the way, but we stand to
> gain an awful lot. Moves such as the Linking Open Data project [2] are
> working through many of the issues; tag sharing initiatives such as this
> could easily piggyback on the project's findings.
>
> [2]
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>
> > - How can ontologies be layered on top of existing tag data sources
> > that we don't own (for example, how does Annotea talk to
> > del.icio.us?
>
> Exposing del.icio.us data (or at least the x latest items tagged y)
> according to Richard's or Tom's tag ontology should be quite
> straightforward using a SPARQL CONSTRUCT query. I'll have a look into
> this.
</snip>
