Apr 05 2005

Refactoring Bio With Einstein Part 1: First Steps

Published by Ian Davis at 10:56 pm under Random Stuff

I’m going to try to describe the life of Albert Einstein using the BIO vocabulary. I’m expecting this to be quite difficult, but I hope it will help me understand where the vocabulary is deficient. I’m keen to examine how the ordering of events can be achieved using OWL-Time, and I’d like to enhance the BIO vocabulary so that common biographical information is easy to express.

I’m basing this micro-project on the Wikipedia biography of Einstein. That work is licensed under the GNU Free Documentation License and so I’m putting this article and associated RDF data under the same license.

Why Einstein? I chose Einstein because, being a former physicist, I admire him and his theories. He is a popular icon and the subject of dozens of biographies and, because he lived in the modern era, there are photographs and movies that could also be relevant.

This exercise will use a combination of the BIO, FOAF and OWL-Time vocabularies. I’ll use the namespace prefixes bio, foaf and time for these.

My approach is to follow the Wikipedia article and translate each distinct event into RDF.

According to the article’s introduction, Einstein was born on March 14, 1879. A few paragraphs further down is the following:

Einstein was born at Ulm in Württemberg, Germany; about 100 km east of Stuttgart. His parents were Hermann Einstein, a featherbed salesman who later ran an electrochemical works, and Pauline, whose maiden name was Koch. They were married in Stuttgart-Bad Cannstatt. The family was Jewish (and non-observant); Albert attended a Catholic elementary school and, at the insistence of his mother, was given violin lessons.

Here’s a skeleton document to start off:

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>

  <bio:event>

    <bio:Birth rdf:nodeID="albert-birth">
     <rdfs:label>The birth of Albert Einstein</rdfs:label>
     <bio:date>1879-03-14</bio:date>
     <bio:place>Ulm, Württemberg, Germany</bio:place>
    </bio:Birth>

  </bio:event>

</foaf:Person>
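The skeleton above is a fragment: it leaves out the enclosing rdf:RDF element and the declarations for the rdf, rdfs, foaf and bio prefixes, so it won't parse on its own. As a minimal sketch, here it is wrapped in a complete document and read back with Python's standard xml.etree.ElementTree. The namespace URIs are my assumption about the published vocabularies and `birth_details` is a hypothetical helper; a real application would use a proper RDF parser rather than walking the XML tree.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions about the published vocabularies.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bio": "http://purl.org/vocab/bio/0.1/",
}

# The skeleton document, wrapped in rdf:RDF with namespace declarations.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:bio="http://purl.org/vocab/bio/0.1/">
  <foaf:Person rdf:nodeID="albert">
    <foaf:name>Albert Einstein</foaf:name>
    <bio:event>
      <bio:Birth rdf:nodeID="albert-birth">
        <rdfs:label>The birth of Albert Einstein</rdfs:label>
        <bio:date>1879-03-14</bio:date>
        <bio:place>Ulm, Württemberg, Germany</bio:place>
      </bio:Birth>
    </bio:event>
  </foaf:Person>
</rdf:RDF>"""


def birth_details(xml_text):
    """Hypothetical helper: return (name, date, place) for the first
    foaf:Person that has a nested bio:Birth event."""
    root = ET.fromstring(xml_text)
    person = root.find("foaf:Person", NS)
    name = person.findtext("foaf:name", namespaces=NS)
    birth = person.find("bio:event/bio:Birth", NS)
    date = birth.findtext("bio:date", namespaces=NS)
    place = birth.findtext("bio:place", namespaces=NS)
    return name, date, place


print(birth_details(DOC))
```

Walking the XML like this only works because the nesting is predictable; the same triples could legally be written in other RDF/XML shapes.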

That’s a bit dry. I’m not expressing any of the relationship information from the original article. Here’s what it could look like if I used the Relationship vocabulary:

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>

  <rel:childOf>
    <foaf:Person rdf:nodeID="hermann">
      <foaf:name>Hermann Einstein</foaf:name>
      <rel:fatherOf rdf:nodeID="albert" />
      <rel:spouseOf rdf:nodeID="pauline" />
    </foaf:Person>
  </rel:childOf>

  <rel:childOf>
    <foaf:Person rdf:nodeID="pauline">
      <foaf:name>Pauline Einstein</foaf:name>
      <rel:motherOf rdf:nodeID="albert" />
      <rel:spouseOf rdf:nodeID="hermann" />
    </foaf:Person>
  </rel:childOf>

  <bio:event>
    <bio:Birth rdf:nodeID="albert-birth">
     <bio:date>1879-03-14</bio:date>
     <bio:place>Ulm, Württemberg, Germany</bio:place>
    </bio:Birth>
  </bio:event>

</foaf:Person>

However, the problem is that the relationship vocabulary assumes a single, fixed point in time, whereas the BIO vocabulary attempts to express different states across a period of time. Some relationships hold for all time, e.g. childOf, whereas others apply only for definite periods, e.g. spouseOf. One interpretation is to assume that each relationship holds for at least some period of time, but then it is not safe to use them to analyse time-sensitive data. In other words, you can ask “were these two people ever married?” but you cannot ask “were the parents of this person married when he was born?”
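To make the distinction concrete, here is a minimal Python sketch with two hypothetical in-memory stores. Plain triples in the style of the relationship vocabulary can answer “were these two people ever married?”, but they carry no time. A store that attaches an interval to each relationship can answer the question at a particular instant. The store layout, function names and integer time points are all hypothetical placeholders chosen for illustration, not dates from the biography.

```python
from typing import Optional

# Static triples in the style of the relationship vocabulary: no time at all.
STATIC = {
    ("albert", "childOf", "hermann"),
    ("albert", "childOf", "pauline"),
    ("hermann", "spouseOf", "pauline"),
}

# Hypothetical time-qualified store: each relationship carries a closed
# interval (start, end) on an abstract integer timeline; None = open-ended.
TIMED = {
    ("hermann", "spouseOf", "pauline"): (10, 40),
}
BIRTH_TIME = {"albert": 20}  # placeholder value, not a real date


def ever_married(a: str, b: str) -> bool:
    # Answerable from the static triples alone.
    return (a, "spouseOf", b) in STATIC or (b, "spouseOf", a) in STATIC


def married_at(a: str, b: str, t: int) -> Optional[bool]:
    # Needs an interval; returns None when there is no timing information.
    interval = TIMED.get((a, "spouseOf", b)) or TIMED.get((b, "spouseOf", a))
    if interval is None:
        return None
    start, end = interval
    return (start is None or start <= t) and (end is None or t <= end)


print(ever_married("hermann", "pauline"))
print(married_at("hermann", "pauline", BIRTH_TIME["albert"]))
```

With only the static triples there is simply nowhere to put the time point, which is why the second question can't be asked.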

It would be possible to use a modified relationship schema where the domain of the properties is some kind of “Person At Point In Time”, but that feels unnatural. A better way, in my opinion, is to represent the marriage explicitly as a time interval. I can’t use the bio:Marriage class because that represents the actual marriage ceremony; instead I need to use a general Event instance:

<bio:Event rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The event of Hermann and Pauline being married</rdfs:label>
</bio:Event>

In the BIO vocabulary, Event is defined as “A general event, i.e. something that the person participated in.” It can be any episode with a duration. I can also relate this event to Hermann and Pauline’s marriage:

  <foaf:Person rdf:nodeID="hermann">
    <foaf:name>Hermann Einstein</foaf:name>
    <bio:event>
      <bio:Marriage rdf:nodeID="hermann-and-pauline-marriage">
        <rdfs:label>The marriage of Hermann Einstein and Pauline Koch</rdfs:label>
        <bio:place>Stuttgart-Bad Cannstatt</bio:place>
        <time:intMeets rdf:nodeID="hermann-and-pauline-being-married" />
      </bio:Marriage>
    </bio:event>

    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

  <foaf:Person rdf:nodeID="pauline">
    <foaf:name>Pauline Einstein</foaf:name>
    <bio:event rdf:nodeID="hermann-and-pauline-marriage" />

    <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
  </foaf:Person>

The intMeets property says that the married interval starts directly after the marriage event. I’ve also associated the event of being married with each person so they are explicitly participating in the event.
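The relations used in this post (intMeets, intDuring, intContains and intAfter) come from the interval relations that OWL-Time takes over from James Allen's interval algebra. Here is a minimal Python sketch that checks them for intervals whose endpoints are known; the `Interval` class, the function names and the endpoint values are hypothetical, chosen to mirror the property names rather than to model real dates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A proper interval on an abstract timeline, with start < end."""
    start: float
    end: float


def meets(a: Interval, b: Interval) -> bool:
    # a ends exactly where b begins: no gap and no overlap.
    return a.end == b.start


def during(a: Interval, b: Interval) -> bool:
    # a lies strictly inside b.
    return b.start < a.start and a.end < b.end


def contains(a: Interval, b: Interval) -> bool:
    # The inverse of during.
    return during(b, a)


def after(a: Interval, b: Interval) -> bool:
    # a begins after b has ended, with a gap between them.
    return b.end < a.start


# Hypothetical endpoints, for illustration only.
marriage = Interval(0, 1)          # the ceremony
being_married = Interval(1, 100)   # the married state that follows it
birth = Interval(30, 31)

print(meets(marriage, being_married))   # True
print(contains(being_married, birth))   # True
print(during(birth, being_married))     # True
```

In the RDF these relations are asserted rather than computed, which is exactly what is needed when, as here, the dates of the marriage itself aren't known.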

I can enrich the structure of the biography by asserting that Albert was born during his parents’ marriage. It seems to me that it should be possible to derive this fact, but until I understand more about this I need to state it explicitly. I can do this on the married interval:

<bio:Event rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The time during which Hermann and Pauline were married</rdfs:label>
  <time:intContains rdf:nodeID="albert-birth" />
</bio:Event>

or, equivalently, in the Birth event itself:

<bio:event>
  <bio:Birth rdf:nodeID="albert-birth">
   <rdfs:label>The birth of Albert Einstein</rdfs:label>
   <bio:date>1879-03-14</bio:date>
   <bio:place>Ulm, Württemberg, Germany</bio:place>
   <time:intDuring rdf:nodeID="hermann-and-pauline-being-married" />
  </bio:Birth>
</bio:event>
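Because intContains and intDuring are inverses of each other, a reasoner that knows this needs only one of the two statements. Here is a minimal Python sketch of that inference over plain (subject, property, object) tuples, with prefixed names standing in for full URIs. The function name is hypothetical, and intBefore and intMetBy are my assumption about the names of the inverses of intAfter and intMeets, which this post doesn't use.

```python
# Hypothetical table of inverse pairs; intBefore and intMetBy are assumed
# names for the inverses of intAfter and intMeets.
INVERSES = {
    "time:intDuring": "time:intContains",
    "time:intAfter": "time:intBefore",
    "time:intMeets": "time:intMetBy",
}
# Make the table symmetric so either direction can be looked up.
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def with_inverses(triples):
    """Return the input triples plus the inverse of every temporal triple."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            closed.add((o, INVERSES[p], s))
    return closed


stated = {("hermann-and-pauline-being-married", "time:intContains", "albert-birth")}
inferred = with_inverses(stated)
print(("albert-birth", "time:intDuring", "hermann-and-pauline-being-married") in inferred)
```

Stating the relation on the married interval and stating it on the Birth event therefore carry the same information once the inverse is known.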

The Wikipedia article mentioned that Albert attended a Catholic elementary school and took violin lessons. I’m going to express those as events:

<bio:event>

  <bio:Event rdf:nodeID="albert-attending-elementary-school">
    <rdfs:label>The event of Albert attending elementary school</rdfs:label>
    <time:intAfter rdf:nodeID="albert-birth" />
  </bio:Event>
</bio:event>

<bio:event>

  <bio:Event rdf:nodeID="albert-taking-violin-lessons">
    <rdfs:label>The event of Albert taking violin lessons</rdfs:label>
    <time:intAfter rdf:nodeID="albert-birth" />
  </bio:Event>
</bio:event>

So, what have I been able to represent from that single paragraph of biography? I’ve represented Albert Einstein’s date and place of birth; his parents’ marriage and the fact that Albert was born during it; and Albert attending elementary school and taking violin lessons. Each of the events is related to another event to assist with automatic ordering.
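One way to use those relations for automatic ordering is to read each assertion as a constraint on which event starts first, then topologically sort the events. Here is a minimal Python sketch using the standard library's graphlib (Python 3.9+). The helper names are hypothetical, and ordering by start point is my own simplification rather than anything OWL-Time prescribes.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Temporal assertions from this post, as (subject, property, object).
ASSERTIONS = [
    ("hermann-and-pauline-marriage", "intMeets", "hermann-and-pauline-being-married"),
    ("hermann-and-pauline-being-married", "intContains", "albert-birth"),
    ("albert-birth", "intDuring", "hermann-and-pauline-being-married"),
    ("albert-attending-elementary-school", "intAfter", "albert-birth"),
    ("albert-taking-violin-lessons", "intAfter", "albert-birth"),
]


def starts_earlier(s, p, o):
    """Return (earlier, later) by start point, as implied by one assertion."""
    if p in ("intMeets", "intContains"):
        return s, o  # the subject starts first
    if p in ("intDuring", "intAfter"):
        return o, s  # the object starts first
    raise ValueError(f"unhandled property: {p}")


def order_events(assertions):
    # graphlib expects a mapping of node -> set of predecessors.
    graph = {}
    for s, p, o in assertions:
        earlier, later = starts_earlier(s, p, o)
        graph.setdefault(later, set()).add(earlier)
        graph.setdefault(earlier, set())
    return list(TopologicalSorter(graph).static_order())


for event in order_events(ASSERTIONS):
    print(event)
```

The constraints only form a partial order, so the sketch can't tell whether the school or the violin lessons came first; whatever order it prints for those two is arbitrary. A contradictory set of assertions would form a cycle, which graphlib reports by raising CycleError.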

What haven’t I represented? His father’s occupation, his mother’s maiden name and the family’s faith. I haven’t explicitly stated that Albert is the son of Hermann and Pauline, and their participation in the marriage event isn’t strong enough to state that they were actually the couple getting married (other people, such as a minister or witnesses, can also participate). Also, the events have no colour: I have dry labels describing the mechanics of each event, but nothing with personality.

I need to be able to annotate events and provide commentary. I also need to be able to resolve the roles of participants in events. I’ll be thinking about those issues for part two.

Here’s an RDF file that collates what I’ve done so far, and a graphical representation of the graph.

See also: posts in the “Refactoring Bio” series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families


11 Responses to “Refactoring Bio With Einstein Part 1: First Steps”

  1. Rich on 05 Apr 2005 at 11:29 pm

    That’s a fascinating example — thank you for blogging about it!

  2. Ed Davies on 06 Apr 2005 at 10:51 am

    Surely treating place and time as independent properties is inappropriate in this particular context. ;-)

  3. Ian Davis on 06 Apr 2005 at 11:07 am

    Hadn’t thought of that! Relativistic properties… :)

  4. Drew Perttula on 06 Apr 2005 at 4:40 pm

    This is a great start, and seems like it might have some good ideas to generalize resume/CV representations too (http://captsolo.net/semweb/resume/cv.rdfs feels too specific to me- e.g. how do I put dates on the projects I did at one job?)

  5. Drew Perttula on 06 Apr 2005 at 4:58 pm

    That’s a great start, and I think the use of events and OWL-Time could also be used to improve resume/CV descriptions. http://captsolo.net/semweb/resume/cv.rdfs feels too specific to me- e.g. how do I describe some important projects, with dates, that I did at a job?

  6. Leo Sauermann on 14 Apr 2005 at 2:17 pm

    Hi Ian,

    one thing I have to add first:

    Please do a review of all the rdfs:comments in the bio namespace. They are ambiguous and use the words “this” and “a person”. Please state exactly which role the subject and object of the triple play!!

  7. Leo Sauermann on 14 Apr 2005 at 2:18 pm

    oh, whoa. I meant the Relationship namespace. I was missing the exact stuff in the relationship ontology!

  8. joe on 25 May 2005 at 4:44 pm

    looks good. but the “Here’s an RDF file that collates what I’ve done so far” link is empty.

  9. Ian Davis on 25 May 2005 at 5:04 pm

    Ah, I guess it would be useful to have something in that file :)

    Hopefully should be fixed now.

  10. Refactoring Bio With Einstein Part 1: First Steps
    Refactoring Bio With E…

  11. joe on 28 Aug 2005 at 1:11 pm

    Ian

    how is this effort going? is there anything like this with the wiki guys?
    I think it is what we need to see in all the wiki values.