Dec 02 2008

Introducing OpenVocab

Published by Ian Davis at 11:37 pm under Projects

OpenVocab is a project I have been working on in my spare time since spring. The recent VoCamps in Oxford and Galway gave me the opportunity to focus on getting it public and usable. The idea behind it is quite simple: it’s a collaborative space for building an RDF schema. It provides a simple editing interface for classes and properties plus some wiki-like characteristics that hopefully will allow many people to participate.

Often I find that people have ideas for a couple of useful properties or classes but they don’t want to go to the trouble of deciding on URIs, writing the schema, fixing problems and committing to making it available for the long term. They have better things to do. The goal of OpenVocab is to remove those barriers and make it incredibly simple to turn an idea into reality. I set up vocab.org a few years back to help with the problem of persisting schemas in a stable fashion for long periods. OpenVocab adds the other parts of the equation: naming, authoring and maintenance.

Importantly, all the data and content submitted to OpenVocab is in the public domain, free and unencumbered by any restrictions. The code that runs the site is open source, as are all of its dependencies and libraries. I want this project to persist beyond a single person or organisation.

The Mechanics

All the vocabulary terms share a common URI prefix: http://open.vocab.org/terms/

The term URIs are set up to perform 303 redirects to a document describing the term. That redirection uses the HTTP Accept header to pick a suitable format. For example, visiting http://open.vocab.org/terms/favouriteDrink with a web browser will normally redirect to the HTML description http://open.vocab.org/terms/favouriteDrink.html. An RDF crawler or explorer will be sent to the RDF version http://open.vocab.org/terms/favouriteDrink.rdf. There are also Turtle and JSON/RDF versions.
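To make the negotiation step concrete, here is a minimal Python sketch of the decision a term URI has to make. It is not the OpenVocab code (which is PHP); the media-type table, the .ttl and .json extensions and the function names are assumptions for illustration.

```python
BASE = "http://open.vocab.org/terms/"

# Hypothetical media-type to extension table. The post names HTML, RDF,
# Turtle and JSON variants but not the exact media types the site matches.
VARIANTS = {
    "text/html": ".html",
    "application/xhtml+xml": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "application/json": ".json",
}


def parse_accept(header):
    """Return (media_type, q) pairs, highest quality first, ties in header order."""
    prefs = []
    for index, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q, index))
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(media, q) for media, q, _ in prefs]


def redirect_for(term, accept):
    """Return (status, location) for a GET on a term URI."""
    for media, q in parse_accept(accept or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media in VARIANTS:
            return 303, BASE + term + VARIANTS[media]
        if media in ("*/*", "text/*"):
            return 303, BASE + term + ".html"  # default to the human-readable page
    return 406, ""
```

A browser's typical Accept header lands on favouriteDrink.html, while a client asking for application/rdf+xml gets favouriteDrink.rdf. The 303 is the usual convention for saying the target document describes the term rather than being the term itself.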

A full list of all the terms in the vocabulary can be obtained from http://open.vocab.org/terms which also uses content negotiation to pick a suitable format. The HTML version lists all the properties and classes in a simple alphabetical summary. The RDF version includes full descriptions of every term and of the vocabulary itself.

I enforce some strict naming conventions on term URIs, mainly because I like them to be consistent. For example, terms are constrained to be alphanumeric plus hyphens. Properties start with a lower case first letter, classes with an upper case one. These are generally accepted RDF conventions, but there’s no technical justification for enforcing them beyond my own aesthetic.
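The naming rules above are simple enough to check mechanically. A sketch, assuming "alphanumeric" means ASCII letters and digits and that a name must begin with a letter (the post doesn't say either way); check_term_name is a hypothetical helper, not part of OpenVocab.

```python
import re

# Assumption: ASCII letters/digits plus hyphens, and the first character is a letter.
TERM_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def check_term_name(name, kind):
    """Return a list of problems with a proposed term name ('property' or 'class')."""
    problems = []
    if not TERM_NAME.fullmatch(name):
        problems.append("use only letters, digits and hyphens, starting with a letter")
    elif kind == "property" and not name[0].islower():
        problems.append("property names start with a lower case letter")
    elif kind == "class" and not name[0].isupper():
        problems.append("class names start with an upper case letter")
    return problems
```

An empty list means the name passes; anything else can be shown to the editor as-is.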

I’m also recommending that labels be written in the role noun style. I encourage this with a little bit of JavaScript that dynamically plugs the label being typed into sample sentences to get the context of use right. For example, property labels get plugged into sentences like “foo is ‘author’ of thing” and “thing has ‘author’ foo”. I’m also encouraging the addition of plural labels, using the same kind of hints: “the ‘authors’ of thing are foo and bar” or “foo and bar are ‘authors’ of thing”. There’s obviously no validation on the values of these labels (I certainly don’t want to write the code that checks that grammar), but providing hints like these could help keep the labelling of terms in the schema consistent. Once again this is an aesthetic choice, but all software should be opinionated.
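The hint mechanism is just string substitution. The site does it in JavaScript as you type; this Python sketch shows the same idea using the sample sentences from the paragraph above, with label_hints as a hypothetical name.

```python
# Sample sentences from the post; "foo", "bar" and "thing" are placeholders.
SINGULAR_HINTS = [
    "foo is '{label}' of thing",
    "thing has '{label}' foo",
]
PLURAL_HINTS = [
    "the '{label}' of thing are foo and bar",
    "foo and bar are '{label}' of thing",
]


def label_hints(label, plural_label=""):
    """Plug the label (and optional plural label) into the sample sentences."""
    sentences = [t.format(label=label) for t in SINGULAR_HINTS]
    if plural_label:
        sentences += [t.format(label=plural_label) for t in PLURAL_HINTS]
    return sentences
```

A label that reads badly in these sentences (say, "written by" for a role noun property) is a hint that it isn't in the preferred style.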

OpenVocab supports a smattering of RDFS and OWL to help relate properties and classes to one another as well as to define their semantics. The subset of OWL supported is roughly what many people are calling RDFS++ or OWL Mini.
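As an illustration of what the RDFS part buys a consumer of the schema, here is a small sketch of two RDFS entailment rules: rdfs:subClassOf is transitive (rdfs11), and anything typed with a class is also an instance of its superclasses (rdfs9). The class names are made up and the dictionaries stand in for a real triple store; this illustrates the semantics, not how OpenVocab itself processes data.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via rdfs:subClassOf (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:  # also guards against subclass cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(resource_types, sub_class_of):
    """Add the rdf:type statements implied by rdfs:subClassOf."""
    inferred = {}
    for resource, types in resource_types.items():
        all_types = set(types)
        for t in types:
            all_types |= superclasses(t, sub_class_of)
        inferred[resource] = all_types
    return inferred
```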

Behind The Scenes

OpenVocab is written in PHP and uses my moriarty library and Benjamin Nowack’s ARC RDF library. The schema data is stored in a Talis Platform store (http://api.talis.com/stores/openvocab), which means it’s searchable and SPARQLable. Currently it uses the Konstrukt PHP framework to handle all the webby stuff, but I plan to port it to my paget framework. All the code for OpenVocab is open source and available from Google Code. I plan to write a more detailed account of how it works over on the blog.
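Because the data sits in a Platform store, anyone can query it over the SPARQL protocol. A sketch of building such a request, assuming the store's endpoint lives at /services/sparql under the store URI and that terms are typed as rdf:Property; check both against the Platform documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the Platform exposes SPARQL at <store URI>/services/sparql.
STORE = "http://api.talis.com/stores/openvocab"
SPARQL_ENDPOINT = STORE + "/services/sparql"

# Assumption: OpenVocab types its properties as rdf:Property.
QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label WHERE {
  ?term a rdf:Property ;
        rdfs:label ?label .
  FILTER(regex(str(?term), "^http://open.vocab.org/terms/"))
}
ORDER BY ?label"""


def sparql_get_url(query, endpoint=SPARQL_ENDPOINT):
    """Build a SPARQL protocol GET URL; the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Fetching sparql_get_url(QUERY) with any HTTP client would run the query.
```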

Future Plans

I’m working on some visualisations of the vocabulary. Some of these experiments are in the Google Code Subversion repository, but I’m still exploring ideas. I want to be able to see the relationships between classes and properties as in an entity-relationship diagram, and to view class and property hierarchies.
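For the hierarchy view, the basic step is turning rdfs:subClassOf pairs into a tree. A minimal text-only sketch with hypothetical class names; a real visualisation would draw this graphically, and here a class with several parents simply appears under each of them.

```python
def hierarchy_lines(sub_class_of, roots=None):
    """Render (child, parent) rdfs:subClassOf pairs as an indented tree."""
    children = {}
    for child, parent in sub_class_of:
        children.setdefault(parent, []).append(child)
    if roots is None:
        # Roots are parents that never appear as a child.
        all_children = {c for c, _ in sub_class_of}
        roots = sorted({p for _, p in sub_class_of} - all_children)
    lines = []

    def walk(cls, depth, path):
        lines.append("  " * depth + cls)
        if cls in path:  # stop on subclass cycles
            return
        for child in sorted(children.get(cls, [])):
            walk(child, depth + 1, path | {cls})

    for root in roots:
        walk(root, 0, frozenset())
    return lines


pairs = [("Cocktail", "Drink"), ("Beer", "Drink"), ("Drink", "Consumable")]
print("\n".join(hierarchy_lines(pairs)))
```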

My next big plan is to add OpenID support. I’m in two minds whether to force the use of OpenID before allowing edits or retain anonymous editing. Suggestions and comments welcome on this issue.

At the moment every term created in OpenVocab is deemed to be “unstable”, i.e. it is subject to change at any time by any person using the site. Clearly that makes it difficult to build applications that depend on the meanings of the terms even if you subscribe to a Wittgenstein view of the world where meaning is use. My plan is to introduce a way for terms to migrate to being stable. One idea is that once a term has survived for 6 months without any edits to its semantics then it could change status to “testing”. Editing a term with this status would be more difficult and would reset the status back to “unstable”. If a term survives the “testing” phase for 12 months then it could move to a “stable” status where it would be locked down and become extremely difficult to change. That’s just one possible sequence and I’m looking for some suggestions here.
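The lifecycle described above is essentially a small state machine. Here's a sketch of it, assuming months are approximated as 30 days, that `since` records when the current clock started (entering a status, or the last semantic edit), and that the extra friction for editing "testing" and "stable" terms is handled elsewhere; next_status is a hypothetical name.

```python
from datetime import date, timedelta

UNSTABLE, TESTING, STABLE = "unstable", "testing", "stable"

# Periods from the post; months approximated as 30 days (an assumption).
TESTING_AFTER = timedelta(days=6 * 30)
STABLE_AFTER = timedelta(days=12 * 30)


def next_status(status, since, today, semantic_edit=False):
    """Return (new_status, new_since) after checking a term on `today`."""
    if status == STABLE:
        if semantic_edit:
            raise ValueError("stable terms are locked down")
        return STABLE, since
    if semantic_edit:
        # Any change to the semantics sends the term back to the start.
        return UNSTABLE, today
    if status == UNSTABLE and today - since >= TESTING_AFTER:
        return TESTING, today
    if status == TESTING and today - since >= STABLE_AFTER:
        return STABLE, today
    return status, since
```

A nightly job could run this over every term; edits that only touch labels or comments would pass semantic_edit=False.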

I have ideas for many minor additions too, such as RSS feeds of all recent changes and of newly added terms only. It would be nice if, when you enter the URI of a non-existent term, you were presented with a page prompting you to create it rather than a 404. I have more ideas around better usability and prompting to reduce all that typing of URIs. These are all the sort of things that I’ll probably do as late night wind-downs.
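An RSS 2.0 feed of recent changes needs very little machinery. A sketch using Python's standard library, assuming each change is available as a (term URI, summary, timestamp) tuple; the feed title and description are placeholders and recent_changes_rss is a hypothetical name. A feed of just new terms would be the same function fed a filtered list.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def recent_changes_rss(changes, site_url):
    """Build an RSS 2.0 document from (term_uri, summary, when) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "OpenVocab recent changes"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Recent edits to OpenVocab terms"
    for term_uri, summary, when in changes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = summary
        ET.SubElement(item, "link").text = term_uri
        # RSS expects RFC 822 style dates; `when` should be timezone-aware.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")
```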

I also think there’s no reason why the codebase couldn’t be modified to manage multiple vocabularies. Currently it assumes a single global vocabulary but this could be switchable based on the URI used to access the site. If options were added to restrict edits to logged in users then the codebase could become a general purpose schema editing tool, one that has the advantage of keeping its data natively in RDF.
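Switching vocabularies on the URI used to access the site could be as simple as a lookup on the HTTP Host header. A sketch with a hypothetical configuration table (only open.vocab.org comes from the post; schemas.example.org is invented), including a per-vocabulary login restriction.

```python
# Hypothetical per-host configuration.
VOCABULARIES = {
    "open.vocab.org": {"namespace": "http://open.vocab.org/terms/", "login_required": False},
    "schemas.example.org": {"namespace": "http://schemas.example.org/terms/", "login_required": True},
}


def vocabulary_for(host):
    """Pick the vocabulary from the HTTP Host header, ignoring any port."""
    name = host.split(":", 1)[0].lower()
    if name not in VOCABULARIES:
        raise LookupError(f"no vocabulary configured for host {name!r}")
    return VOCABULARIES[name]


def can_edit(host, user=None):
    """user is None for anonymous visitors."""
    return user is not None or not vocabulary_for(host)["login_required"]
```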

Competition!

Finally, OpenVocab needs a logo. I exhausted my limited artistic talent a few years ago so I’m looking for help here. There must be someone out there who could create a nice logo for the website, something that would work in the top-left area of the header and as a favicon. Email me your ideas at nospam@iandavis.com and I’ll publish them on the OpenVocab site and work out some way for the community to vote on one.

If you have any suggestions or want to get involved in this project then leave a comment here, email me or post a message to the RDF Schema Dev mailing list.


11 Responses to “Introducing OpenVocab”

  1. Alexander Johannesen on 03 Dec 2008 at 10:38 am

    Wow, this bit of news has got me very psyched! This is great stuff. I’ll dig in more seriously by next week, but what I’ve seen and understood so far is really, really great work. I’ve been proposing this kind of stuff to the library world for years, so it’s wonderful to see this come from people the library world respects. Great stuff.

  2. Richard Cyganiak on 03 Dec 2008 at 11:12 am

    Feature wishlist:

    1. RSS feed for the recent changes to each term (I want to subscribe to the feeds for the terms I care about, so I can watch out for morons and vandals messing with them)
    2. Better versioning/history; most of all I want to see what has actually changed in the history (not just the comment); and I want to be able to view older versions so I could revert to them if necessary
    3. Ability to comment on terms. This is actually really important. Vocabulary development needs FEEDBACK.
    4. Along with comments, I should be able to give some sort of “vote”. Maybe just a “thumbs up” or “thumbs down”. This rewards contributors and helps high-quality, consensus-built terms emerge.
    5. Ability to attach my name to my terms, changes and comments. OpenID authentication would be a good way to do this. But mainly I want to see real names instead of just “Anonymous”.
    6. Some key statistics should be big and bold on top of each term page: How long has the term existed? How many revisions? How many comments? How many up/down votes? I want to judge in a glance if a term “works” for the community.
    7. Ability to cluster terms into vocabularies. This doesn’t necessarily mean a separate namespace, but rather just a way to show that some terms are supposed to (or can) be used together. A “vocabulary” could simply be a list of previously existing OpenVocab terms plus a wiki page for prose documentation and examples. If a vocabulary has been set up that contains term X, then the term page for X should have a link to the vocabulary.

  3. Peter Murray on 03 Dec 2008 at 2:06 pm

    Neat work; I’m going to play around with it a little more today. One initial observation, though, is a broken URL in the page footer for the rights link:

    All text and data are in the <a href="http://open.vocab.org.local/about/rights">Public Domain</a>.

    The current OCLC kerfuffle has me thinking more about licenses for data. Is there a common practice for RDF? Is Open Data Commons Public Domain Dedication and License appropriate?

  4. Ian Davis on 03 Dec 2008 at 2:46 pm

    Richard,

    All great requests and I agree totally with them all. Some specifics:

    Number 1 shouldn’t be very hard to implement. Number 2 is also possible: all the actual triple changes are held in the platform store as RDF, I’m just not exposing them at the moment. It should be possible to do proper diffs etc.

    3, 4 and 6 are valuable and I think could play into the transition between unstable/stable terms.

  5. Ian Davis on 03 Dec 2008 at 2:49 pm

    Peter,

    Yes, the PDDL is appropriate, but it’s only a formal way to declare what I’m saying: all the data is in the public domain. In reality the facts are already in the public domain, but it does no harm to declare it explicitly. The editing forms also have a note saying that by clicking save the user is agreeing to donate the supplied information to the PD.

    There is no common practice for RDF, but that was Talis’ intention when we embarked on the work that became the Open Data Commons. We do need to be explicit about the rights asserted over data so we can safely reuse it.

  6. Jonathan Rochkind on 03 Dec 2008 at 3:59 pm

    This project seems to have a lot of overlap with Jon Phipps’s fairly mature Metadata Registry app. It’s awfully nice.

    http://metadataregistry.org/

    I _think_ Jon’s software is open source. I’d encourage you to coordinate/cooperate with Jon on this.

    Karen Coyle and others have already done some work in Phipps’s registry on loading traditional library cataloging vocabularies and RDA-related vocabularies into it.

    Regarding the PDDL, I just posted something to the PDDL listserv about the need for an established standard for declaring PDDL in a machine-readable/recognizable way. There isn’t actually an acceptable identifier _for_ the PDDL, since the URL where it’s located asks you not to link to it. Hmm.

  7. Taylor on 04 Dec 2008 at 2:47 pm

    This is a good idea, Ian, really good. This is the kind of activity that’s been missing with the semantic web, a more community-driven shared vocabulary approach. None of this “sameAs” stuff works unless graphs eventually combine at some shared URIs.

    One thing I’d like to see more of is specific ranges for properties, particularly datatype properties and their XSD schema types. Many of the Dublin Core and ical examples seem to have been constructed and considered by folks who have never attempted parsing that data into a typed language. ical’s date types point to classes that have no concrete definition…you have to read the schema to know what to expect. That part of RDF/OWL really escapes me…if humans have to read documents to know how to parse a string, that just doesn’t seem like what we’re trying to go for.

    So it’d be interesting if your tool could coax people into being a little more specific on property ranges than “resource” or “Literal”.

  8. Stuart Sierra on 04 Dec 2008 at 5:57 pm

    Finally, the missing link! Happy to see this. I also second Taylor’s suggestion of encouraging specific types for datatype properties.

  9. Jakob on 05 Dec 2008 at 12:03 am

    That’s a nice toy and maybe a great tool! But for the primary step of modeling there is nothing better than a whiteboard or piece of paper ;-)

    Is there a similar tool for instances or can you somehow also add instances to openvocab?

  10. Ian Davis on 06 Dec 2008 at 8:06 pm

    Jonathan I’d not seen metadataregistry before. I’m investigating it in a bit more depth. Thanks for the link!

  11. Oz DiGennaro on 07 Dec 2008 at 4:47 pm

    A great idea. It’s important to coordinate with all of the other efforts on the web. Well, with some of them anyway. I would like to imagine how this effort could combine with others, both technically and philosophically.
    As soon as I think for a while about knowledge representation, I plunge into a morass filled with “really?”, “what does that mean?”, “is that possible?”.
    Exciting! What fun! And I hope and plan to make a living from this kind of stuff too.