Mar 23 2007

BNodes Out!

Published by Ian Davis at 8:13 pm under Uncategorized

Joost joins the list of real world applications that are eschewing blank nodes in their RDF. At Talis we’re doing the same - our RDF versioning protocol doesn’t support bnodes either and we actually replace them with URIs in many places. Over the past year I’ve spoken with several companies making practical, commercial use of RDF and none of them are using bnodes. In fact they are actively avoiding them. Two of these companies claim to have stores in the multi-GigaTriple range.

So far I haven’t found a use case for bnodes that can’t be catered for by URIs - can anyone prove me wrong?

13 Responses to “BNodes Out!”

  1. Chris Bizer on 24 Mar 2007 at 8:18 am

    No, I cannot prove you wrong, as I think you are right on this point.

    bNodes are especially harmful in the context of Linked Data on the Web, where you want to navigate and crawl information.

    Think for example of all the unnecessarily complicated smushing stuff when it comes to FOAF on the Web.

    So, we also don’t use bNodes in our content publication projects like dbpedia, DBLP or the RDF Book Mashup. Our tools like D2R Server, the DISCO browser or the Semantic Web Client Library somehow support bNodes, but work much better without them.

    I also think the SIMILE people follow a similar approach.

    So just forget about bNodes, along with reification, and use Web-dereferenceable URIs for everything.

    Cheers

    Chris

  2. Danny on 24 Mar 2007 at 9:38 am

    I doubt there is a counter-case, IANAL but giving bnodes names wouldn’t seem to make a big difference in terms of capabilities. Replacing them with non-http: URIs wouldn’t produce any benefit as far as I can see, and as Chris suggests it’s desirable to be able to dereference.

    But I’m really not convinced bnodes should be thrown away. For a start, there is a cost to minting URIs - having to make them cool (and presumably useful) for eternity.

    Presumably each part of an infrastructure could be optimised to a bnode-free model (e.g. have the server framework offer automatically templated RDF/XML representations of locally-minted URIs, which would grow as a list of sameAs/equivalence statements). But at this point in time I’d worry that this might be premature optimisation looking globally.

    The smushing stuff is a bit of a red herring - if you have a person identified on two separate systems and you want to know if they’re the same person, you still have to smush against an IFP property or use heuristics (unless you use their mailbox or homepage URI as their personal URI, which messes up the modelling). You then have to figure out how best to manage two (or more) URIs for one person. Ok, maybe the cost in this particular case could be reduced by encouraging the use of personal URIs as portable IDs. But what of things like “talis:iand’s favourite colour”?
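
    A minimal sketch of smushing against an inverse functional property, in Python. It assumes triples are plain string tuples and treats foaf:mbox as the IFP; the two example.com-style subject URIs and the data are hypothetical. It shows that the merge step is needed whether the subjects are bnodes or URIs.

```python
# Triples are plain (subject, predicate, object) string tuples.
MBOX = "http://xmlns.com/foaf/0.1/mbox"  # treated here as inverse functional
NAME = "http://xmlns.com/foaf/0.1/name"
NICK = "http://xmlns.com/foaf/0.1/nick"

def smush(triples, ifp):
    """Merge subjects that share a value for the inverse functional property.

    Single pass: the first subject seen with a given IFP value becomes
    canonical. Chains through several IFP values are not followed.
    """
    canonical = {}  # IFP value -> canonical subject
    rename = {}     # subject -> canonical subject
    for s, p, o in triples:
        if p == ifp:
            rename[s] = canonical.setdefault(o, s)
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}

# Hypothetical data: the same person described on two separate systems.
triples = [
    ("http://a.example/people/1", MBOX, "mailto:joe@eg.com"),
    ("http://a.example/people/1", NAME, "Joe"),
    ("http://b.example/users/joe", MBOX, "mailto:joe@eg.com"),
    ("http://b.example/users/joe", NICK, "joe"),
]
merged = smush(triples, MBOX)
for triple in sorted(merged):
    print(triple)
```

    After the merge one URI carries all the statements, but the other URI still exists elsewhere and has to be managed as an alias.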

    The idea of authoritative URIs is more manageable without arbitrary URI creation, whether that idea is likely to be useful in the long term is another matter.

    So although it seems a reasonable rule of thumb to avoid bnodes, in the general case I think it’s just moving the cost elsewhere. Getting rid of them would reduce the opportunities for dealing with the cost in a way that best fits the local system. Having said all that, my main objection is just a feeling that existentials have a nice small-scale symmetry with the open world model at large.

    Let’s say you encounter an insect of a kind you’ve never seen before, but which presumably is somewhere in the Universal Creature Guide. Is it better to give it a new *global* name, which will effectively be pushed into the Universal Creature Guide as (another) alias, or just to use a local placeholder with associations like “the bug I found in my salad from M & S in Shirley on Thursday”..?

    Anyhow, what of literals? If bnodes are bad, surely literals are worse in the way they hide content in a non-dereferenceable form?

  3. leo on 26 Mar 2007 at 10:26 am

    hey, I know a bnode with a uri:
    http://www.bnode.org/

    seriously: bnodes suck and in the semantic desktop projects (gnowsis, nepomuk.semanticdesktop.org, aperture.sf.net) we try to avoid them or forbid them completely.

    there are enough problems connected with bnodes, but still they are good to say “hey, this uri must not be clicked and can be smushed at free will”.

    I would suggest something like a new uri scheme or urn scheme using UUIDs as bnodes:
    urn:bnode:
    bnode:

    UUIDs are here: http://en.wikipedia.org/wiki/UUID
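
    A minimal sketch of minting UUID-based identifiers, in Python. The urn:bnode: scheme suggested above is only a proposal, so this sketch swaps in the existing urn:uuid: namespace; the function name mint_node_id is hypothetical.

```python
import uuid

def mint_node_id():
    """Return a fresh urn:uuid: identifier to stand in for a blank node."""
    # uuid4() is random, so no naming authority or registry is involved.
    return uuid.uuid4().urn

node = mint_node_id()
print(node)
```

    Such an identifier is globally unique but not dereferenceable, which matches the “must not be clicked” intent.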

  4. iand on 26 Mar 2007 at 11:01 am

    Danny, minting new URIs instead of bnode IDs probably doesn’t cost anything at all if you use document fragments for the URIs. They’re even locally scoped.

    I did think of one thing that bnodes give you over using a URI: they are guaranteed to be safely handled when smushing documents, i.e. they can be used to prevent the accidental merging that could occur if the same URI were inadvertently used in two documents. But we smush documents containing a mix of URIs and bnodes all the time and this has never really presented itself as a problem, so I think it’s only a minor advantage.

    The other thing to consider is that in every RDBMS implementation of RDF I have seen bnodes are converted to URIs internally so they can be stored in the subject column of a triple table.

  5. Richard Cyganiak on 26 Mar 2007 at 2:35 pm

    Semantically, a bNode is the same as a throwaway URI. Both have their advantages. Throwaway URIs can be linked to, and you can simplify your data model by forbidding bNodes. On the other hand, URIs should be kept stable, and keeping URIs stable has a cost. Using bNodes avoids it in cases where you don’t want to incur this cost.

    I’m not entirely convinced that bNodes should be removed from RDF. But avoiding them for practical reasons is usually a good idea.

  6. iand on 26 Mar 2007 at 2:52 pm

    Richard, I don’t think we should change RDF, but it might be interesting to define a subset that still has the expressivity that we need to get things done.

  7. iand on 26 Mar 2007 at 3:30 pm

    I should have added …and makes our applications simpler to build and use

  8. Henry Story on 26 Mar 2007 at 8:35 pm

    Removing bnodes won’t remove the problem that bnodes are meant to solve, namely smushing, as you will later be forced to smush uris instead of bnodes, and just have a huge number of uris instead. So the problem remains.

    Having said that, bnodes should be avoided if good uris can be found. And often, with a little web site organisation, uris can be created for people, for example. It certainly helps a lot to have dereferenceable urls. Exchanging bnodes for URNs does not seem to give one much, apart from the trouble of having to mint urns.

    There is a good case for bnodes. Imagine you tag something with “bank”. You want to say something like

    </page.html> :tagging <http://tagger.com/bblfish/tag/10002>.
    <http://tagger.com/bblfish/tag/10002> :by <http://bblfish.net/people/card#me>;
    :tag [ a skos:Concept;
    skos:label "bank" ] .

    Here you say that you are tagging something with a concept, but you don’t yet know which concept it is.
    Perhaps later that tag can be nailed down as being a and then the blank node will be given a URI.

    This helps keep things indeterminate while they are.

  9. iand on 26 Mar 2007 at 11:14 pm

    Henry, the problem isn’t the large number of resources, bnode or otherwise, but that by their very nature bnodes can only be handled via indirection.

    From my work at Talis, and from conversations with many people putting RDF to work, being able to diff and patch RDF graphs is very very important. You just can’t do that sensibly with the endless indirection that bnodes require. With named nodes (URIs or literals) it’s trivial.
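
    A minimal sketch of why diff and patch are simple with named nodes, in Python. It assumes a graph is just a set of string triples and that every node has a stable name; the example.org URI and the data are hypothetical, and this is not the Talis versioning protocol.

```python
def diff(old, new):
    """Return (removed, added): the triples to delete from and add to `old`."""
    return old - new, new - old

def patch(graph, removed, added):
    """Apply a diff produced by diff() to a graph."""
    return (graph - removed) | added

old = {
    ("http://example.org/doc#iand", "foaf:name", "Ian"),
    ("http://example.org/doc#iand", "foaf:nick", "iand"),
}
new = {
    ("http://example.org/doc#iand", "foaf:name", "Ian Davis"),
    ("http://example.org/doc#iand", "foaf:nick", "iand"),
}

removed, added = diff(old, new)
print("removed:", removed)
print("added:", added)
```

    With bnodes this comparison is not reliable, because a bnode label is scoped to one document: two serialisations of the same graph can use different labels, and plain set difference then reports changes that are not there.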

    I also don’t think your example is any more difficult with a URI vs a bnode. True, it’s easier to write down in N3 using bnodes, but it’s almost the same markup for RDF/XML. Also, just because it’s written down as a blank node doesn’t mean it has to be parsed that way into a triple store - the parser could substitute generated URIs and nothing would break, no meaning would be lost.
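
    A minimal sketch of a loader that substitutes generated URIs for blank nodes, in Python. It assumes terms are plain strings and that bnode labels start with "_:"; the base URI, the function name and the sample triples are hypothetical.

```python
import uuid

def replace_bnodes(triples, base="http://example.org/genid/"):
    """Replace each blank node label with a generated URI.

    The same label always maps to the same URI within one call, so the
    shape of the graph is preserved.
    """
    mapping = {}  # bnode label -> generated URI

    def name(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = base + uuid.uuid4().hex
            return mapping[term]
        return term

    return [(name(s), p, name(o)) for s, p, o in triples]

# Hypothetical input: a tag whose concept is written as a blank node.
triples = [
    ("http://tagger.com/bblfish/tag/10002", ":tag", "_:c"),
    ("_:c", "rdf:type", "skos:Concept"),
    ("_:c", "skos:label", "bank"),
]
named = replace_bnodes(triples)
for triple in named:
    print(triple)
```

    The generated URI occupies exactly the positions the bnode did, so queries over the structure still match.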

  10. Henry Story on 27 Mar 2007 at 7:27 am

    perhaps we need a bnode urn :-)

    urn:bnode:sflsdjflskdjflksdjf

    Great podcast with Nova Spivack btw.

  11. Tony Hammond on 27 Mar 2007 at 10:00 am

    Have to pipe up here and mention the YADS data model that I had earlier proposed and still maintain over here:

    http://nurture.nature.com/tony/yads/

    Blurb reads as such: “YADS implements a simple, safe and predictable recursive data model for describing resource collections. The aim is to assist in programming complex resource descriptions across multiple applications and to foster interoperability between them.”

    So, the YADS model makes extensive use of bNodes to manage hierarchies of “fat” resources - i.e. resource islands, a resource decorated with properties. The bNodes are only used as a mechanism for managing containment. There is certainly no intention to globally reference the bNode “resource”. I guess one could say that the actual resources managed by YADS (those accorded a URI) are qualified (with properties) *in context*. That is in the context of the complete YADS description. The “fat” resource managed by the bNode does not have or need a permanent global identifier.

    Seems to me that bNodes perform a very useful function. Yes, I am aware of the “smushing” problem but I think this is a red herring. bNodes give us the possibility of creating local “clumpiness” within the general RDF graph. If everything is reduced to global resources then the RDF graph will remain flat and homogeneous and generally unspeakably uninteresting. IMHO. Like the primitive universe with no synthesis of elements and especially the heavier elements. Just a primordial soup.

    I think Danny is also “spot on” when he talks about the cost of minting URIs. As a publisher we are all too aware that URIs are expensive to maintain. This is why scholarly publishing in particular has invested considerable effort in developing the DOI (Digital Object Identifier - http://doi.org/) as a solution to maintaining persistent reference linking (see also CrossRef - http://crossref.org/). Even disposable URIs have an associated cost to mint. I guess top of my head the only no-cost solution to minting URIs would be data: URIs because there is no naming authority to contend with. (I’m not sure about the ethics of using someone else’s DNS name in a tag: URI.)

    In sum, bNodes are useful. Less is more.

  12. Henry Story on 27 Mar 2007 at 10:22 am

    Another thought that occurred to me. If you don’t want bnodes, then you should quickly ask for a SPARQL enhancement, so that they can incorporate the N3 :- sign. Otherwise writing SPARQL queries is going to be a little tedious.

    This is because you want to write things like

    <kkk> :rel [ :- <urn:bnode:xyz>;
    a foaf:Person;
    foaf:mbox <mailto:joe@eg.com> ] .

    instead of

    <kkk> :rel <urn:bnode:xyz> .
    <urn:bnode:xyz> a foaf:Person;
    foaf:mbox <mailto:joe@eg.com> .

    The :- keyword (timbl also proposed ‘is’) just gives a name to the blank node. It’s semantically equivalent to owl:sameAs on an inferring DB, but most DBs won’t be inferring. But it allows the human reader to see the structure of the graph a lot more easily, especially as the graph gets larger. Perhaps one can think of it as the equivalent of the rdf/xml ‘about’ attribute.

    Henry

  13. Chimezie on 28 Mar 2007 at 10:35 am

    I’ve reluctantly come around to the same conclusions around BNodes: they cause more harm than good. *Ahem* SPARQL