Archive for October, 2005

Oct 19 2005

Introducing Embedded RDF

Published by Ian Davis under Uncategorized

This is a copy of a message I sent to the W3C semantic web discussion list today.

I’ve been working on an enhanced way to embed RDF into XHTML that doesn’t require any new markup. There are existing methods such as the recommendations for expressing Dublin Core in HTML[1] but these are limited in scope and expressivity.

I’ve also spent quite a lot of time studying the underlying principles espoused by the Microformats community[2] who are reusing the semantics of XHTML to express commonly used data structures such as contact details or event descriptions. Some of the most important principles are:

  • Visible Metadata – by making metadata visible, consumers can easily form an opinion on whether to trust the author. Metadata hidden away in meta tags is easily abused for search engine placement or other gain since most visitors don’t inspect the source code of the page. This principle also helps keep the metadata relevant. Hidden metadata is easily forgotten and can easily go stale, whereas if it were visible to humans inaccuracies would soon be discovered and fixed.
  • DRY (don’t repeat yourself) – very often we maintain separate RDF documents with HTML equivalents. Unless these are automatically generated it’s very easy for them to get out of sync. This principle suggests that the metadata should be expressed only once, whether it’s for humans or machines.
  • Reuse Not Reinvention – if we reuse existing formats then we immediately gain the benefit of being able to use existing tools to generate and consume the metadata. We also hook into the experience and knowledge of the thousands of people who have invested time and money getting to grips with existing technologies.

I’ve taken these principles on board in my design. Embedded RDF uses XHTML attributes such as ‘rel’, ‘rev’, ‘class’, ‘href’, ‘src’ and ‘id’ to embed RDF triples into an XHTML document. Triples can have a subject of the embedding document or of a fragment within that document. It’s possible to use the ‘rev’ attribute to embed triples about other resources but the object of the triple must be the embedding document or a fragment within it.

For example, the following XHTML:

<div><address class="dc-creator">Ian Davis</address> wrote this</div>

embeds the triple:

<> dc:creator "Ian Davis" .

Schema prefixes are declared using link elements in the head of the document, exactly as defined in the Dublin Core specification:

<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />

The ‘id’ attribute is used to denote a separate resource:

<p id="ian"><span class="foaf-name">Ian Davis</span> wrote this</p>

This generates a triple like this:

<#ian> foaf:name "Ian Davis" .

Anchor elements are used to refer to other resources:


<p id="ian">
  <a rel="foaf.homepage" href="http://example.org/home">my home page</a>
</p>

embeds the following triple:

<#ian> foaf:homepage <http://example.org/home> .
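
A minimal sketch of how an extractor might read these conventions, written in Python with only the standard library. It is not the extraction service mentioned in this post and covers only the cases shown above: schema links, ‘id’, ‘class’ and anchors with ‘rel’. The function name `extract_triples`, the sample document `DOC` and the `schema.foaf` declaration are assumptions added for illustration, as is the choice to accept both ‘-’ and ‘.’ between prefix and property name. It also assumes well-formed XHTML without a default XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page combining the snippets from this post.
# The schema.foaf link is an assumption; the post only shows schema.dc.
DOC = """<html><head>
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />
</head><body>
<div><address class="dc-creator">Ian Davis</address> wrote this</div>
<p id="ian"><span class="foaf-name">Ian Davis</span>
  <a rel="foaf.homepage" href="http://example.org/home">my home page</a></p>
</body></html>"""


def extract_triples(xhtml):
    """Return (subject, predicate, object) tuples found in an XHTML string."""
    root = ET.fromstring(xhtml)

    # Collect prefix declarations such as rel="schema.dc".
    prefixes = {}
    for link in root.iter("link"):
        rel = link.get("rel", "")
        if rel.startswith("schema."):
            prefixes[rel[len("schema."):]] = link.get("href")

    def expand(name):
        # Turn "dc-creator" (or "foaf.homepage") into a full property URI.
        for sep in ("-", "."):
            prefix, found, local = name.partition(sep)
            if found and prefix in prefixes:
                return prefixes[prefix] + local
        return None

    triples = []

    def walk(element, subject):
        # An id switches the subject to a fragment of the document.
        if element.get("id"):
            subject = "<#" + element.get("id") + ">"
        # class values carry literal-valued properties.
        for name in element.get("class", "").split():
            predicate = expand(name)
            if predicate:
                text = "".join(element.itertext()).strip()
                triples.append((subject, predicate, '"' + text + '"'))
        # rel values on anchors carry resource-valued properties.
        if element.tag == "a" and element.get("href"):
            for name in element.get("rel", "").split():
                predicate = expand(name)
                if predicate:
                    triples.append((subject, predicate, "<" + element.get("href") + ">"))
        for child in element:
            walk(child, subject)

    walk(root, "<>")  # "<>" stands for the embedding document itself
    return triples


for triple in extract_triples(DOC):
    print(" ".join(triple), ".")
```

The ‘rev’ and ‘src’ attributes are left out to keep the sketch short.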

I hope this has given you a flavour of how Embedded RDF works. There is comprehensive documentation on our wiki here:

http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml

I’m also working on some examples. An example FOAF in XHTML document is here:

http://research.talis.com/2005/erdf/foaf-in-html.html

And a sample DOAP in XHTML document:

http://research.talis.com/2005/erdf/doap-in-html.html

There is also an RDF extraction service that scans an XHTML document for Embedded RDF and generates RDF/XML from it:

http://research.talis.com/2005/erdf/extract

This work is by no means complete, but I’m soliciting early feedback on the approach and the utility of embedding RDF into XHTML. Please feel free to share your thoughts on the wiki or email me if you have specific questions you’d like to have answered.


Oct 18 2005

Ward Cunningham Moves On

Published by Ian Davis under Uncategorized

I did wonder whether Microsoft would really grok Ward Cunningham; now it seems they didn’t understand what they had:

Microsoft Corp. has lost one of its high-profile hires to an open-source consortium. Mike Milinkovich, executive director of the Eclipse Foundation, announced on Monday that Ward Cunningham is leaving Microsoft to join the staff of the open-source tool consortium.

Found via recently rediscovered Sam Gentile


Oct 17 2005

Refactoring Bio With Einstein Part 2: Conditions

Published by Ian Davis under Random Stuff

Genealogy is mostly a detective game dealing with partial knowledge. Given a few facts and dates, the researcher makes informed estimates for related events. For example, if you know that someone was born in 1905 you might start searching for their marriage around 1925-1935 because most people marry in their twenties. If you don’t find it, you might first look earlier, back to around 1920, and then onward from 1935. The genealogist mentally assigns a probability of success to the search and focusses on the highest probability ranges first. Lots of factors affect this estimate. Fashions of the period might affect average marriage ages, as can family traditions.

Working the other way is important too. Often the genealogist has a marriage certificate with an age. It’s natural to subtract the age from the date of marriage to look for the birth of the individual. However, this estimate is skewed once you allow for people adjusting their ages upwards because they’re under the legal age of marriage, or downwards to close up a scandalous age difference with their partner! It’s not unknown for people just to forget or not know how old they are, especially if they had little or no contact with their birth parents.
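
As a rough illustration of that reasoning, here is a small Python sketch that turns a marriage year and a stated age into an ordered list of birth years to search, most likely first. The function name `birth_search_order` and the `slack` parameter (how many years of misreporting to allow for) are assumptions made for this example, not part of any genealogy tool.

```python
def birth_search_order(marriage_year, stated_age, slack=3):
    """Return candidate birth years, the most plausible ones first.

    Someone aged N at marriage was born in (marriage_year - N) or the year
    before, depending on whether their birthday had passed. `slack` widens
    the window on both sides to allow for ages adjusted up or down.
    """
    naive = marriage_year - stated_age
    candidates = range(naive - 1 - slack, naive + slack + 1)

    def distance(year):
        # Distance from the nearest of the two naive birth years.
        return min(abs(year - naive), abs(year - (naive - 1)))

    return sorted(candidates, key=lambda year: (distance(year), year))


# Hypothetical certificate: married in 1876 with a stated age of 18.
print(birth_search_order(1876, 18, slack=2))
```

A real search would weight the two directions differently, since the reasons for overstating and understating an age are not symmetrical.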

It is often useful to know the whereabouts of an individual to narrow down a search for information. For example, if it was known that a person was in the army and stationed in a particular country for an extended period of time then that would restrict the range of a search for the person’s marriage certificate.

With these points in mind, I set about trying to understand how I can better model the time-related information used in genealogy. One of the outstanding problems I mentioned at the end of the previous part of this series was that I hadn’t represented Einstein’s mother’s maiden name. Currently I have this description of Pauline:

  <foaf:Person rdf:nodeID="pauline">
    <foaf:name>Pauline Einstein</foaf:name>
    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

I could simply add her maiden name:

  <foaf:Person rdf:nodeID="pauline">
    <foaf:name>Pauline Koch</foaf:name>
    <foaf:name>Pauline Einstein</foaf:name>
    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

But that isn’t very useful since there’s no way to tell which name to use at which time. To get around this I could introduce a maidenName property to explicitly note her former name:

  <foaf:Person rdf:nodeID="pauline">
    <ex:maidenName>Pauline Koch</ex:maidenName>
    <foaf:name>Pauline Einstein</foaf:name>
    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

This works for this simple example but what about people who remarried more than once or people who changed their legal names? It seems that name is an example of a mutable property that takes different values depending on when you ask. How could I represent it? Here’s one way, by modelling the condition of the person at a point in time:

  <foaf:Person rdf:nodeID="pauline">
    <bio:condition>
      <bio:Condition>
        <foaf:name>Pauline Koch</foaf:name>
      </bio:Condition>
    </bio:condition>

    <bio:condition>
      <bio:Condition>
        <foaf:name>Pauline Einstein</foaf:name>
      </bio:Condition>
    </bio:condition>

    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

Here I’ve introduced a new class Condition which I think of as the state of being for an individual at a particular period of time. I can ground those conditions in time by relating them to other events:

  <foaf:Person rdf:nodeID="pauline">
    <bio:condition>
      <bio:Condition rdf:nodeID="pauline-maiden-name">
        <time:intMetBy rdf:nodeID="pauline-birth" />
        <time:intMeets rdf:nodeID="pauline-married-name" />
        <foaf:name>Pauline Koch</foaf:name>
      </bio:Condition>
    </bio:condition>

    <bio:condition>
      <bio:Condition rdf:nodeID="pauline-married-name">
        <time:intMetBy rdf:nodeID="hermann-and-pauline-being-married" />
        <foaf:name>Pauline Einstein</foaf:name>
      </bio:Condition>
    </bio:condition>

    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

The condition of Pauline having her maiden name lasts from her birth (referred to here by a new blank node ‘pauline-birth’) right up until the condition of her being known as Pauline Einstein, which starts at the same time as her being married. There’s no end to this latter condition. As far as I know she kept her name until her death and that is how she is known today.

The concept of Condition may be quite useful. It looks to me as though most conditions will be bounded by events, e.g. marriage and divorce events bound the condition of being married. This suggests that a better definition of event is something that brings about a change in condition of an individual.

There is a domain issue around the usage of FOAF properties with Conditions. The domains of most of the relevant properties in FOAF are either foaf:Person or foaf:Agent. I’m not sure that I can reconcile the notion of a Condition being a Person or vice-versa. This might mean that I have to create parallel properties for some of the more interesting FOAF ones such as foaf:name and its ilk.

How far do we go with this? I think that most properties relating to a person are temporal in nature. Many of the properties used by FOAF to identify people are mutable over time: foaf:mbox, foaf:weblog, foaf:homepage. For FOAF that doesn’t cause a problem. FOAF is intended to be a general description of a person, not necessarily at a point in time. However, that approach isn’t appropriate for things like genealogy and biographical writing. In those activities the goal is to create a narrative of an individual’s life. This is what I’m aiming at with BIO. I’d like to be able to generate a timeline of a person’s life from an RDF description. It should also be possible to pick a point in time and produce a FOAF description of that person at that time.
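
To make the “pick a point in time” idea concrete, here is a minimal Python sketch that treats each condition as a half-open interval of years and looks up the name in force for a given year. The `Condition` class, the `name_at` function and the years used are all assumptions for illustration. The years are placeholders standing in for the ‘pauline-birth’ and marriage events rather than dates taken from the RDF above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """A state of being that holds from `start` up to, but not including, `end`."""
    name: str
    start: int                 # first year in which the condition holds
    end: Optional[int] = None  # None means no known end


def name_at(conditions, year):
    """Return the name in force in `year`, or None if no condition covers it."""
    for condition in conditions:
        if condition.start <= year and (condition.end is None or year < condition.end):
            return condition.name
    return None


# Placeholder years: the first condition ends exactly where the second begins,
# mirroring the time:intMeets relation between the two conditions.
pauline = [
    Condition("Pauline Koch", start=1858, end=1876),
    Condition("Pauline Einstein", start=1876),
]

for year in (1850, 1870, 1880):
    print(year, name_at(pauline, year))
```

Sorting the same list by `start` would give the raw material for a timeline.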

Although most attributes of a person do change, there are a couple of relationship-oriented ones that are immutable throughout the lifetime of a person: father and mother. Here’s how they could be used:


<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:father rdf:nodeID="hermann" />
  <bio:mother rdf:nodeID="pauline" />
  ...
</foaf:Person>

This looks to be a useful way to build up simplistic family trees.
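
A short Python sketch of how such father and mother links could be walked to list ancestors. The dictionary mirrors the node IDs used above. The ‘hermann-father’ entry is a made-up placeholder added only so that the recursion has a second generation to visit, and the function name `ancestors` is likewise an assumption.

```python
# Parent links keyed by node ID, as in the RDF description of Albert.
PARENTS = {
    "albert": {"father": "hermann", "mother": "pauline"},
    # Hypothetical placeholder node, not taken from the post.
    "hermann": {"father": "hermann-father"},
}


def ancestors(person, parents, generation=1):
    """Yield (generation, relation, ancestor) tuples, depth first."""
    for relation in ("father", "mother"):
        parent = parents.get(person, {}).get(relation)
        if parent is not None:
            yield generation, relation, parent
            # Recurse so that grandparents are reported as generation 2, etc.
            yield from ancestors(parent, parents, generation + 1)


for generation, relation, ancestor in ancestors("albert", PARENTS):
    print("  " * (generation - 1) + relation + ": " + ancestor)
```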

So, in conclusion I now have a way to represent the relationship of Einstein to his parents and a way of representing the fact that his mother was known by different names at different points in time. Not quite what I thought I’d be focussing on last time but still progress. What’s remaining? Looking at my list I have still to represent his father’s occupation; the family’s faith; the roles of the participants in events and annotation of events and conditions. More thought for next time…

See also: posts in the “Refactoring Bio” series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families


Oct 06 2005

Web 2.0 Day One – A Personal Glimpse

Published by Ian Davis under Uncategorized

Not sure this is appropriate for the silkworm blog but this is my site and so anything goes here!

I’ve been at the Web 2.0 conference all day (see the silkworm blog for my blow-by-blow accounts of the workshops). I stopped blogging at the full sessions this afternoon, leaving that task to Paul (here, here, and here) who was much more diligent than me!

I enjoyed Barry Diller’s session! For a man of his age (all of 63!) he seemed completely switched on and very engaging. He came across as very personable, which I found strange for a co-founder of Fox! The rest of the participants were out of place and inappropriate for the Web 2.0 conference. No disrespect meant to obviously very smart and influential people – but they just weren’t the kind of people I came to see. For example, Philip Rosedale, founder of the immersive environment Second Life, seemed to espouse the antithesis of Web 2.0. Rather than embrace the economics of abundance, Philip’s system actively impedes the free and open exchange of bits. To participate in Second Life you have to buy virtual real estate from his company – real estate that costs them almost nothing to provide. They are maintaining an artificial scarcity in land, objects and facilities.

Last of all was Omid Kordestani from Google. This was another disappointing session – Omid held the party line throughout and even ducked a pertinent question from Tim O’Reilly on the usage rights behind the Google Maps data. “I’ll be happy to discuss with you offline” was the unacceptable response. Can you believe this guy was here to participate in the new Web conversation?

After a short visit back to the hotel to drop off the laptop we returned for the MSN sponsored dinner. I was lucky enough to share a table with Ross Mayfield from SocialText and the guys from KnowNow who are relaunching themselves as a new startup (didn’t they do that in 2000? Ah it’s enterprise RSS this time!). We were joined by Rohit Khare, who originally founded KnowNow based on his thesis on pub/sub technologies.

Once Microsoft appeared on stage the fun began! We were watching Ray Ozzie (cool), Yusuf Mehdi (who?) and Gary Flake (cool when he was at Yahoo, but now…?) being interviewed by John Battelle and Tim O’Reilly. Of course, our table quickly started a game of buzzword bingo and we were being very loud and raucous whenever an interviewee said “long tail”, “synergy” or “AJAX” amongst others. We held out for the slam dunk of “social” but it never came! Can you imagine talking for an hour at the Web 2.0 conference and not mentioning social even once! Poor, poor, poor.

The best quote from Microsoft (Ray Ozzie, I believe): “We believe that in any system the enterprise owns the data first. Secondly the user owns some data too.” What kind of message is that? Who owns the data I create in Web 2.0? Me of course!!!

Our table gained a few hard stares from the panel, and some cheers of support from our neighbours – I blame Rohit entirely!

To top the evening off, we were cleared out of the dining room without a pudding or even a coffee. Oh dear, must try harder. At least the subsequent Google party had free drinks with electric ice cubes… pictures to follow at some point! But, now to bed to recharge the just-in-time blogging engine. Bye!
