[TagCommons-WG] Rel-tag
Harry Halpin
hhalpin at ibiblio.org
Wed Mar 7 13:35:08 PST 2007
I'd like to draw focus onto rel-tag as a common starting place.
I might add, if we're interested in data formats like RDF, GRDDL gives one
a way to bootstrap rel-tag data automatically into RDF (provided we can get
rel-tag a profile URI).
Tom mentioned to me offline that there were a host of problems with
rel-tag. Could someone iterate through them for me?
The main problem seems to be that there's not much there: it's just a
relationship between (a) the page you are on and (b) the target of a link.
It's clearly missing, say, who tagged the page. However, the great thing
about RDF is that you can just underspecify things.
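To make "not much there" concrete, here is a minimal Python sketch of everything a rel-tag link yields. It assumes the rel-tag convention that the tag is the last path segment of the link target; the function names, the choice to ignore a trailing slash, and the predicate URI `http://example.org/vocab#tag` are hypothetical placeholders, not an agreed vocabulary. It is also not GRDDL itself, which would normally do this mapping through a transformation (typically XSLT) referenced from the page.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class RelTagParser(HTMLParser):
    """Collect the href of every <a> whose rel attribute contains "tag"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.tag_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel is a space-separated list, e.g. rel="tag nofollow".
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "tag" in rel_values and href:
            # Resolve relative links against the page being parsed.
            self.tag_links.append(urljoin(self.base_url, href))


def tag_label(tag_url):
    """rel-tag convention: the tag is the last path segment of the link target.

    Ignoring a trailing slash is an assumption of this sketch.
    """
    path = urlparse(tag_url).path.rstrip("/")
    return unquote(path.rsplit("/", 1)[-1])


def rel_tag_triples(page_url, html):
    """Return (page URL, tag URL, tag label) for each rel-tag link on the page."""
    parser = RelTagParser(page_url)
    parser.feed(html)
    parser.close()
    return [(page_url, url, tag_label(url)) for url in parser.tag_links]


def to_ntriples(triples, predicate="http://example.org/vocab#tag"):
    """Serialize page -> tag links as N-Triples; the predicate URI is a placeholder."""
    return "\n".join(
        f"<{page}> <{predicate}> <{tag_url}> ." for page, tag_url, _ in triples
    )
```

Each triple carries only a page and a tag URI; nothing in the markup says who did the tagging, when, or in what language.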
What I'd like to do is build a sort of "layer-cake" for tags, where at
the bottom somewhere is the very basic data relating a page, a tagging
relationship, and its tag. This could easily be expressed in an API, in
RDF, or in XML.
Then, we can look at common services and see what else they provide, like
identity of who tagged the page, time of tagging, language, etc. And then
add these in as "slots".
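One way to picture the layer-cake is a record with a required core and optional slots. This is a hypothetical Python sketch, not an agreed schema: the field names `resource`, `tag`, `tagger`, `tagged_at`, and `language` are assumptions matching the slots listed above, and values are meant to be URIs wherever one exists (so `tagger` can be a person's URI).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TagAssertion:
    """Hypothetical layer-cake record: a required core plus optional slots."""

    # Bottom layer: all that rel-tag itself carries.
    resource: str  # URI of the tagged page
    tag: str       # URI of the tag (the rel-tag link target)

    # Upper layers: slots filled in only when a service provides them.
    tagger: Optional[str] = None          # URI identifying who tagged
    tagged_at: Optional[datetime] = None  # when the tagging happened
    language: Optional[str] = None        # language code, e.g. "en"

    def core(self) -> Tuple[str, str]:
        """Project down to the bottom layer so records from any source compare."""
        return (self.resource, self.tag)

    def slots(self) -> Dict[str, object]:
        """Return only the optional slots that are actually filled in."""
        optional = {
            "tagger": self.tagger,
            "tagged_at": self.tagged_at,
            "language": self.language,
        }
        return {name: value for name, value in optional.items() if value is not None}


# A bare rel-tag record and a richer one from some service agree at the core.
from_rel_tag = TagAssertion("http://example.org/post/1", "http://example.org/tag/rdf")
from_service = TagAssertion(
    "http://example.org/post/1",
    "http://example.org/tag/rdf",
    tagger="http://example.org/people/alice#me",
    language="en",
)
print(from_rel_tag.core() == from_service.core())  # True
```

The design choice is that richer sources never break the bottom layer: anything that can produce a page URI and a tag URI participates, and extra slots are additive.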
But first things first! Can we support the relationship of a tag as one
between "one URI" and "another URI"?
Since in the SemWeb world "one URI" could mean a person, this mechanism
wouldn't necessarily exclude the idea of "people tagging pages", which I
think is much more intuitive than rel-tag's idea of "pages tagging pages".
Then we can move up to deal with more complex cases like xFolk, Annotea
annotations, and the many issues Tom brought up in his folksonomy paper.
On Wed, 7 Mar 2007, Tom Gruber wrote:
> In regards to the conversation among Nitin and Richard and Marja about the
> difference between database-level specifications and ontologies:
>
> An analogy might help. Ontologies are to database schemas as database
> logical designs are to physical designs (denormalization, precision choices,
> etc). In other words, ontologies are an abstraction away from the details
> toward the conceptual.
>
> Both ontologies and database modeling are formal, with standard languages and
> open source tools. Both are amenable to modeling methodologies and formal
> languages. Any model describable at the database level can be described at
> the ontology level, so there is no loss of power from specifying at the
> ontological level. For example, if you want to model the world in terms of
> a traditional Model Driven Architecture (MDA) and UML, you can easily do it
> because UML is expressively simpler than the languages used for ontology
> definitions.
> http://www.sfu.ca/~dgasevic/Tutorials/ISWC2005/
>
> But why bother with the ontology level?
>
> The point of going more abstract is exactly because you don't want to have
> to drink someone else's koolaid or store your data in someone else's format.
> (Committing to an ontology definitely does NOT require that data ever be
> *stored* in RDF tuples -- just as buying in to SQL doesn't require that you
> use a particular table management technique.)
>
> By describing data in a common ontology, one is not agreeing to share a
> common data model but rather to have a common language with which to capture
> commonalities and differences among data sources. For instance, at Tag Camp
> there was talk about having tags point to tags. Why not? It's
> computationally easy to do this and you can use the tag-to-tag relation in
> all kinds of ways (clustering, synonymy, etc). But just saying that the
> relation is many-to-many does not tell you what it means in a way that can
> say whether the tag-to-tag tuples are comparable across any two systems.
> That is because just describing the syntactic data integrity constraints
> does not tell you enough about the semantic commitment behind using that
> relation. On the other hand, you could agree that there are a few ways to
> talk about tag-to-tag relationships, such as an explicit relationship among
> tag labels such as "isSameTagLabel". Then you explicitly say that in system
> A, isSameTagLabel is case and space sensitive string matching, and in system
> B it is culture-specific, case-insensitive, phrase-canonicalizing matching.
> Then, for instance, if you were comparing the frequency of tagging with some
> string on the two systems, you would know that a match in system A implies a
> match in system B, but not vice versa. Or you could have a simple identity
> matcher that knows how to transform queries or results when talking to
> system A, so it would be consistent with system B.
>
> <soapbox>
> Folks, this is not a new thing, and the tags problem is really quite a
> trivial case that we ought to be able to come to some agreement on.
> Compare, for instance, the problem of data integration among all the world's
> geoscience data. There are thousands of data sources in as many formats and
> schemas, and data sets which are quite large scale and complex (Google Earth
> is a tiny subset). After decades of database-level standards -- even
> massive controlled vocabularies -- this community is turning to ontologies
> as an enabling technology for data integration across these disparate
> sources. For example, a bunch of work under the organization called GEON
> (http://www.geongrid.org/about.html) is using ontologies for *describing the
> data* from various sources so tools can reason about the relationships and
> how to do integrated query and compute services over them. Based on the
> *semantic* descriptions of the data (much more than cardinality and type),
> there are systems that can map from scientific hypotheses to operational
> queries from databases of geography, geology, climate, and remote sensing
> data on the biosphere (if you care, look at work by geoinformatics
> researchers Krishna Sinha, Boyan Brodaric, and Mark Gahegan). The data are
> not only different in type, but are different in the modeling assumptions,
> resolutions, and even notions of completeness across country and state
> borders. There are similar activities for ontology-based, intelligent data
> integration in fields such as biomedical data (NLH). So if they can do it
> for massively complex data sets, we can do it for tags data. Besides, we
> are us and not them. :-)
> </soapbox>
>
> To me, the Catch-22 for ontology-based sharing is the assumption that you
> need to get other people to do more work to commit to your ontology, which
> will then give it a network-effect of value to everyone. I think we can
> break free of this paradox in simple cases such as tag data, by
> bootstrapping multiple levels at once. In an ideal world, we can present a
> stack of ways to buy in, from the top down:
>
> - the abstract conceptualization (what is a tag assertion, etc)
> - its specification in a standard ontology modeling language (e.g., OWL)
> - its data access over something like SPARQL
> - reference implementations in high performance database schemas (Nitin?)
> - examples of wrappers for important tag sources (flickr, delicious, etc)
> - examples of natively compliant tag sources (revyu, etc)
> - reference implementations of applications that reason over multiple tag
> sources (identity matching, clustering & visualization, tag-based search)
>
> Within the ontology level, we can easily deliver the specification in
> multiple formats including UML and ER diagrams (I think these are done by
> tools based on OWL input) and even example database designs. The important
> thing is to get something that can be useful to lots of different stakeholders
> for the problems they currently face without assuming they share the same
> tools, data storage, or reasoning services.
>
> --tom
>
>
> _______________________________________________
> WG mailing list
> WG at tagcommons.org
> http://lists.tagcommons.org/listinfo.cgi/wg-tagcommons.org
>
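Tom's isSameTagLabel example can be made concrete with two matchers. In this hypothetical Python sketch, system A's rule follows his description directly (exact, case- and space-sensitive string equality), but the canonicalization standing in for system B (NFKC normalization, `str.casefold`, and collapsing runs of whitespace, hyphens, and underscores into one space) is an assumption; `casefold` is locale-independent, so it only approximates "culture-specific" matching. Because B compares a function of each string, equal strings always yield equal keys, which is why an A match implies a B match but not the reverse.

```python
import re
import unicodedata


def same_label_a(x: str, y: str) -> bool:
    """System A: case- and space-sensitive exact string match."""
    return x == y


def canonical_b(label: str) -> str:
    """Hypothetical stand-in for system B's canonical form of a tag label."""
    text = unicodedata.normalize("NFKC", label).casefold()
    # Assumed phrase canonicalization: treat space/hyphen/underscore runs as one space.
    return re.sub(r"[\s_\-]+", " ", text).strip()


def same_label_b(x: str, y: str) -> bool:
    """System B: labels match if their canonical forms are equal."""
    return canonical_b(x) == canonical_b(y)


def a_labels_matching_b(query: str, a_labels):
    """Identity matcher: expand a B-style query into the A labels it covers."""
    key = canonical_b(query)
    return sorted(label for label in a_labels if canonical_b(label) == key)
```

For example, `same_label_a("Semantic Web", "semantic-web")` is false while `same_label_b` on the same pair is true, and `a_labels_matching_b("semantic web", ["Semantic Web", "semantic_web", "semweb"])` returns the two variants but not `"semweb"`. Querying system A through such a matcher makes its tag frequencies comparable with system B's.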
--
--harry
Harry Halpin
Informatics, University of Edinburgh
http://www.ibiblio.org/hhalpin