[TagCommons-WG] mechanisms for sharing tag data
Harry Halpin
hhalpin at ibiblio.org
Wed Mar 7 13:25:41 PST 2007
First - I think we should have a shared conceptualization that can then
be *implemented* in both conformant APIs and data-interchange languages.
The question is - what's the "conceptualization"? Let's make it, as Tom
put it, with zero semantic commitment to any particular methodology
such as databases, APIs, RDF, etc. Let's just go through the use-cases and
get the list of properties and their values needed.
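For concreteness, here's a rough sketch of the kind of property list I
have in mind for a single tag assertion - the property names below are
just my placeholders, not anything the group has agreed on:

# Rough sketch of the properties a single "tag assertion" might carry.
# The names (tagger, tag_label, resource, date, source_system) are
# placeholders, not an agreed TagCommons vocabulary.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TagAssertion:
    tagger: str                      # who applied the tag (user id or URI)
    tag_label: str                   # the literal tag string as entered
    resource: str                    # URI of the thing being tagged
    date: Optional[datetime] = None  # when the tag was applied, if known
    source_system: Optional[str] = None  # originating system or site

example = TagAssertion(
    tagger="http://example.org/users/alice",
    tag_label="semanticweb",
    resource="http://example.org/photos/42",
    date=datetime(2007, 3, 7),
    source_system="example-tagging-site",
)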
Then I'd like to see a list of common APIs/RDF vocabs/etc. from people
like Marja, Nitin, and Richard - and then do a quick check-box "yes, no,
or maybe" to see if the implement each of these properties. I'd do this in
a simple HTML table on a wiki - Tom, do we have a TagCommons Wiki?
Then, once we have the list of properties down, it should be relatively
easy to move this down to some concrete API specs and RDF or XML formats.
I imagine that, to be honest, we'll probably end up with at least one API
spec and then three different data-serializations (JSON, XML, RDF) with
converters between the data-serializations. That's fine with me - as long
as the API is well-documented so it's guaranteed to give you the same data
that would be in the data format, and you can round-trip between
whatever data-serialization formats we use, we'll be doing great.
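To make the round-tripping point concrete, here's a minimal sketch
(reusing the placeholder field names from above) of pushing one
assertion through JSON and back; the XML and RDF converters would need
to pass the same kind of check:

# Minimal round-trip sketch: one tag assertion serialized to JSON and
# parsed back. Field names are placeholders, not a proposed format.
import json

assertion = {
    "tagger": "http://example.org/users/alice",
    "tag_label": "semanticweb",
    "resource": "http://example.org/photos/42",
    "date": "2007-03-07",
}

serialized = json.dumps(assertion, sort_keys=True)
round_tripped = json.loads(serialized)

# Round-tripping is lossless if we get back exactly what we put in.
assert round_tripped == assertion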
On Wed, 7 Mar 2007, Tom Gruber wrote:
> Regarding the conversation among Nitin, Richard, and Marja about the
> difference between database-level specifications and ontologies:
>
> An analogy might help. Ontologies are to database schemas as database
> logical designs are to physical designs (denormalization, precision choices,
> etc). In other words, ontologies are an abstraction away from the details
> toward the conceptual.
>
> Both ontologies and database modeling are formal, with standard languages and
> open source tools. Both are amenable to modeling methodologies and formal
> languages. Any model describable at the database level can be described at
> the ontology level, so there is no loss of power from specifying at the
> ontological level. For example, if you want to model the world in terms of
> a traditional Model Driven Architecture (MDA) and UML, you can easily do it
> because UML is expressively simpler than the languages used for ontology
> definitions.
> http://www.sfu.ca/~dgasevic/Tutorials/ISWC2005/
>
> But why bother with the ontology level?
>
> The point of going more abstract is exactly because you don't want to have
> to drink someone else's koolaid or store your data in someone else's format.
> (Committing to an ontology definitely does NOT require that data ever be
> *stored* in RDF tuples -- just as buying in to SQL doesn't require that you
> use a particular table management technique.)
>
> By describing data in a common ontology, one is not agreeing to share a
> common data model but rather to have a common language with which to capture
> commonalities and differences among data sources. For instance, at Tag Camp
> there was talk about having tags point to tags. Why not? It's
> computationally easy to do, and you can use the tag-to-tag relation in all
> kinds of ways (clustering, synonymy, etc). But just saying that that
> relation is many-to-many does not tell you what it means in a way that can
> say whether the tag-to-tag tuples are comparable across any two systems.
> That is because just describing the syntactic data integrity constraints
> does not tell you enough about the semantic commitment behind using that
> relation. On the other hand, you could agree that there are a few ways to
> talk about tag-to-tag relationships, such as an explicit relationship
> among tag labels like "isSameTagLabel". Then you explicitly say that in
> system A, isSameTagLabel is case- and space-sensitive string matching, and
> in system B it is culture-specific, case-insensitive, phrase-canonicalizing
> matching.
> Then, for instance, if you were comparing the frequency of tagging with some
> string on the two systems, you would know that a match in system A implies a
> match in system B, but not vice versa. Or you could have a simple identity
> matcher that knows how to transform queries or results when talking to
> system A, so it would be consistent with system B.
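To make that asymmetry concrete, here's a toy sketch - the two matcher
functions and their normalization rules are made-up stand-ins, not the
actual semantics of any real system:

# Toy sketch of the isSameTagLabel asymmetry described above. The
# matchers are illustrative stand-ins, not real systems' semantics.
import unicodedata

def same_label_system_a(x: str, y: str) -> bool:
    # System A: exact, case- and space-sensitive string match.
    return x == y

def same_label_system_b(x: str, y: str) -> bool:
    # System B: Unicode-normalized, case-insensitive, whitespace-collapsing.
    def canon(s: str) -> str:
        return " ".join(unicodedata.normalize("NFKC", s).casefold().split())
    return canon(x) == canon(y)

# A match under the stricter rule (A) implies a match under the looser
# rule (B)...
assert same_label_system_a("SemanticWeb", "SemanticWeb")
assert same_label_system_b("SemanticWeb", "SemanticWeb")

# ...but not vice versa: B treats these as the same label, A does not.
assert same_label_system_b("Semantic Web", "semantic  web")
assert not same_label_system_a("Semantic Web", "semantic  web")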
>
> <soapbox>
> Folks, this is not a new thing, and the tags problem is really quite a
> trivial case that we ought to be able to come to some agreement on.
> Compare, for instance, the problem of data integration among all the world's
> geoscience data. There are thousands of data sources in as many formats and
> schemas, and data sets which are quite large scale and complex (Google Earth
> is a tiny subset). After decades of database-level standards -- even
> massive controlled vocabularies -- this community is turning to ontologies
> as an enabling technology for data integration across these disparate
> sources. For example, a bunch of work under the organization called GEON
> (http://www.geongrid.org/about.html) is using ontologies for *describing the
> data* from various sources so tools can reason about the relationships and
> how to do integrated query and compute services over them. Based on the
> *semantic* descriptions of the data (much more than cardinality and type),
> there are systems that can map from scientific hypotheses to operational
> queries from databases of geography, geology, climate, and remote sensing
> data on the biosphere (if you care, look at work by geoinformatics
> researchers Krishna Sinha, Boyan Brodaric, and Mark Gahegan). The data are
> not only different in type, but are different in the modeling assumptions,
> resolutions, and even notions of completeness across country and state
> borders. There are similar activities for ontology-based, intelligent data
> integration in fields such as biomedical data (NIH). So if they can do it
> for massively complex data sets, we can do it for tag data. Besides, we
> are us and not them. :-)
> </soapbox>
>
> To me, the Catch 22 for ontology-based sharing is the assumption that you
> need to get other people to do more work to commit to your ontology, which
> will then give it a network-effect of value to everyone. I think we can
> break free of this paradox in simple cases such as tag data, by
> bootstrapping multiple levels at once. In an ideal world, we can present a
> stack of ways to buy in, from the top down:
>
> - the abstract conceptualization (what is a tag assertion, etc)
> - its specification in a standard ontology modeling language (eg,
> OWL)
> - its data access over something like SPARQL (a sketch follows this list)
> - reference implementations in high performance database schemas (Nitin?)
> - examples of wrappers for important tag sources (flickr, delicious, etc)
> - examples of natively compliant tag sources (revyu, etc)
> - reference implementations of applications that reason over multiple tag
> sources (identity matching, clustering & visualization, tag-based search)
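For the "data access over something like SPARQL" item, here's a minimal
sketch using Python and rdflib - the vocabulary URI and property names
are placeholders, not a proposed TagCommons ontology:

# Minimal sketch of SPARQL access to tag assertions, using rdflib.
# The tags: vocabulary and its property names are placeholders only.
from rdflib import Graph

turtle = """
@prefix tags: <http://example.org/tagcommons-sketch#> .
@prefix ex:   <http://example.org/> .

ex:assertion1 a tags:TagAssertion ;
    tags:tagger         ex:alice ;
    tags:tagLabel       "semanticweb" ;
    tags:taggedResource ex:photo42 .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# Pull out every (tagger, label, resource) combination in the graph.
query = """
PREFIX tags: <http://example.org/tagcommons-sketch#>
SELECT ?tagger ?label ?resource WHERE {
    ?assertion a tags:TagAssertion ;
        tags:tagger         ?tagger ;
        tags:tagLabel       ?label ;
        tags:taggedResource ?resource .
}
"""
for tagger, label, resource in g.query(query):
    print(tagger, label, resource)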
>
> Within the ontology level, we can easily deliver the specification in
> multiple formats including UML and ER diagrams (I think these are done by
> tools based on OWL input) and even example database designs. The important
> thing is to get something that can be useful to lots of different stakeholders
> for the problems they currently face without assuming they share the same
> tools, data storage, or reasoning services.
>
> --tom
>
>
--
--harry
Harry Halpin
Informatics, University of Edinburgh
http://www.ibiblio.org/hhalpin