Archive for October, 2004

Oct 18 2004

Jena RDF Output

Published by Ian Davis under Uncategorized

Danny Ayers mentioned Jena’s RDF/XML output in a message to RSS-DEV. It has a concept similar to my idea of preferred namespaces, in that it allows you to specify “pretty types”, which are the types of the principal objects (which could be rss:channel and rss:item). These pretty types will be placed at the top level of the document if possible. It also has options to disable certain RDF/XML grammar rules such as reification, list expansion, various parse types, the rdf:ID attribute and property attributes. At a glance, it looks like I could map most of my rules onto Jena’s writer, which would help from an implementation point of view.

Oct 17 2004

Constraining RDF/XML

Published by Ian Davis under Uncategorized

These are a few notes about rules to constrain RDF/XML output in order to make the serialisation of any given graph deterministic. One strong motivator is RSS 1.0 which uses a restricted profile of RDF/XML designed for compatibility with RSS 0.90. Because of this restriction, I always end up writing a custom RDF writer for my aggregated RSS - just pumping it out as RDF/XML is a no-no. It got me thinking that perhaps there were some simple rules that I could build into my standard RDF/XML writer that would automatically produce RSS 1.0. If the rules produced a lossless representation of the graph then I could use them for any RDF graph I wanted to serialise.

My rough rules so far are:

  1. Output blank nodes that are not the object of any triples as a top-level element without a generated nodeID.
  2. Output any blank nodes that are the object of only one triple as a child of that triple’s property element, without a generated nodeID.
  3. Output blank nodes that are the object of two or more triples as a top-level element with a generated nodeID.
  4. Output all other subject nodes as top-level elements.
  5. Allow the specification of an ordered list of preferred namespace URIs.
  6. Always output typed nodes unless no rdf:type has been specified. If the node has multiple types, use the following algorithm to select the “preferred type”:
    1. split each URI into a namespace and local name
    2. group URIs by namespace
    3. order groups in the same order as the preferred namespace list
    4. sort within each group by the local name, ascending
    5. pick the first URI from the first group

    (This ensures that the main node is written as rss:channel rather than, say, foaf:Document in RSS 1.0)

  7. Top-level nodes should be ordered by “preferred type”. Use the algorithm above to determine the order in which the top-level nodes should appear. The result should be that typed nodes from preferred namespaces occur first in the output document. (This ensures that rss:channel appears before rss:item and all other elements appear at the end in RSS 1.0)
  8. RDF container elements: if a contiguous sequence of numeric list properties starting at rdf:_1 exists (i.e. rdf:_1, rdf:_2, rdf:_3 etc.) then these must be the first properties written out in the container and must be written as rdf:li elements. Only use the rdf:_n form when there is a gap in the sequence or when the parent element is not an RDF container element. (This produces the items/rdf:Seq/rdf:li construct in RSS 1.0)
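Rule 8 is the easiest one to get subtly wrong, so here is a minimal Python sketch of it. It assumes URIs are plain strings, that each rdf:_n property appears at most once on a node, and that a non-container node simply keeps its rdf:_n properties ahead of everything else; the name `order_container_properties` is hypothetical and not part of any existing writer. Writing the contiguous run as rdf:li is lossless because an RDF/XML parser numbers rdf:li elements from rdf:_1 within each node element, independently of any explicit rdf:_n elements.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTAINERS = {RDF + "Seq", RDF + "Bag", RDF + "Alt"}
# Matches the membership properties rdf:_1, rdf:_2, ... (no leading zeros).
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def order_container_properties(node_type, properties):
    """Order (predicate, object) pairs for one node element per rule 8.

    Returns (element_uri, object) pairs. Assumes each rdf:_n occurs at
    most once on the node.
    """
    members = {}
    others = []
    for predicate, obj in properties:
        found = MEMBER.fullmatch(predicate)
        if found:
            members[int(found.group(1))] = obj
        else:
            others.append((predicate, obj))

    if node_type not in CONTAINERS:
        # Not a container: never use rdf:li, keep every rdf:_n explicit.
        explicit = [(RDF + "_%d" % n, members[n]) for n in sorted(members)]
        return explicit + others

    ordered = []
    n = 1
    # The contiguous run rdf:_1, rdf:_2, ... becomes rdf:li elements; the
    # parser renumbers them from 1, so no information is lost.
    while n in members:
        ordered.append((RDF + "li", members.pop(n)))
        n += 1
    # Anything left follows a gap and must keep its explicit rdf:_n name.
    ordered += [(RDF + "_%d" % k, members[k]) for k in sorted(members)]
    return ordered + others
```

For an rdf:Seq holding rdf:_1, rdf:_2 and rdf:_4, this yields two rdf:li elements followed by an explicit rdf:_4.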

By top-level I mean child elements of the rdf:RDF element.
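Rules 1 to 4 boil down to counting how many triples each blank node is the object of. Here is a minimal Python sketch, assuming triples are (subject, predicate, object) string tuples with blank nodes written as "_:name"; `plan_placement` is a hypothetical helper. Like the rules as written, it does not handle two blank nodes that each reference the other exactly once (both would be nested, so neither would reach the top level).

```python
from collections import Counter


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting "_:".
    return term.startswith("_:")


def plan_placement(triples):
    """Decide where each subject goes, following rules 1-4.

    triples is an iterable of (subject, predicate, object) strings.
    Returns a dict mapping each subject to a placement label.
    """
    triples = list(triples)
    # How many triples each blank node is the object of.
    uses = Counter(o for _, _, o in triples if is_blank(o))
    plan = {}
    for subject, _, _ in triples:
        if subject in plan:
            continue
        if not is_blank(subject):
            plan[subject] = "top-level"               # rule 4
        elif uses[subject] == 0:
            plan[subject] = "top-level, no nodeID"    # rule 1
        elif uses[subject] == 1:
            plan[subject] = "nested, no nodeID"       # rule 2
        else:
            plan[subject] = "top-level, nodeID"       # rule 3
    return plan
```

In an RSS 1.0 graph the channel’s rdf:Seq is typically a blank node referenced once, from rss:items, so rule 2 nests it inside the channel element.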

More later as I think of them and try some rules out in my writer. Comments on this much appreciated.

Later… Richard Cyganiak suggested the following in the comments, which help make the output more deterministic:

  • Allow specification of a default namespace
  • Order property elements within a node element using the algorithm described above.
  • Don’t use rdf:parseType="Resource" or property attributes.
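Rule 6, rule 7 and the property-ordering suggestion all share one sort key: position in the preferred namespace list, then namespace, then local name. Here is a minimal Python sketch under a few assumptions that the rules leave open: a URI’s namespace ends at its last ‘#’ or ‘/’, namespaces missing from the preferred list sort after the preferred ones (alphabetically, to stay deterministic), untyped nodes go last, and ties between nodes are broken by subject URI. All function names are hypothetical.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def preference_key(uri, preferred):
    """Sort key: preferred-namespace rank, then namespace, then local name."""
    namespace, local = split_uri(uri)
    # Namespaces missing from the preferred list share the rank after it.
    rank = preferred.index(namespace) if namespace in preferred else len(preferred)
    return (rank, namespace, local)


def preferred_type(types, preferred):
    """Rule 6: the first type after grouping and sorting."""
    return min(types, key=lambda t: preference_key(t, preferred))


def order_top_level(nodes, preferred):
    """Rule 7: nodes maps subject -> set of rdf:type URIs (maybe empty)."""
    def key(subject):
        types = nodes[subject]
        if not types:
            return (1, (), subject)  # untyped nodes last
        best = preferred_type(types, preferred)
        return (0, preference_key(best, preferred), subject)
    return sorted(nodes, key=key)


def order_properties(predicates, preferred):
    """The same ordering applied to property elements within a node."""
    return sorted(predicates, key=lambda p: preference_key(p, preferred))
```

With http://purl.org/rss/1.0/ as the only preferred namespace, a node typed both rss:channel and foaf:Document is written as rss:channel, and it sorts ahead of rss:item nodes because “channel” precedes “item”.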

I’d also add:

  • Don’t use rdf:ID, use rdf:about/rdf:resource/rdf:nodeID instead.

Oct 15 2004

Google Desktop

Published by Ian Davis under Uncategorized

Well, it’s finally here, and it appears to work as advertised. More later when it’s finished indexing everything on my machine.

…later. I was… um… surprised to see search results from my local machine appearing when I searched via www.google.com. Machiavellian privacy thoughts ran through my mind, until I clicked through to the help text:

Desktop Search allows you to simultaneously send your query to two different programs and locations. One query goes to Google, which performs a standard Google web search. A duplicate query goes to the Desktop Search application running on your computer, which searches the information the application has indexed for you. Desktop Search intercepts Google’s results page before you see it, and adds your Desktop Search results just above your web results so you can see both at once.

These combined results can be seen only from your own computer; your computer’s content is never sent to Google (or anyone else).

Very cool.

Oct 13 2004

WikiFolders

Published by Ian Davis under Uncategorized

Here’s an interesting idea from Justin Chapweske: Wiki + WebDAV = WikiFolders:

The basic idea behind WikiFolders is to ditch hierarchical file organization in favor of the ad-hoc approach of Wikis. Instead of a hierarchical organization of folders and files, we change the file system semantics to that of linkages between wiki pages and optional file attachments.

All folders in the file system would be WikiWords and likewise, all WikiWords within a wiki page would show up as a folder in the file system.
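To make the folder mapping concrete, here is a minimal Python sketch of the directory listing such a file system might present for one page. The WikiWord pattern (two or more capitalised words run together), the `wiki` dictionary layout and both function names are assumptions for illustration, not part of Justin Chapweske’s proposal.

```python
import re

# Assumption: a WikiWord is two or more capitalised words run together.
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def folders_for_page(text):
    """WikiWords linked from a page, each shown as a sub-folder."""
    return sorted(set(WIKIWORD.findall(text)))


def list_directory(wiki, page):
    """What a file browser might show when opening the folder `page`.

    wiki maps page name -> (page text, list of attachment file names).
    """
    text, attachments = wiki[page]
    return {"folders": folders_for_page(text), "files": sorted(attachments)}
```

Attachments stay ordinary files, while every WikiWord in the page text turns into a sub-folder that is itself another page.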

Oct 13 2004

SPARQL First Draft

Published by Ian Davis under Uncategorized

The first draft of SPARQL Query Language for RDF has been issued. Understandably there are huge gaps in the text, but it looks very promising. The syntax is of the RDQL/SquishQL stable. However, it includes something that I’ve always felt was lacking from previous query languages - the capability to return graphs instead of lists of bound variables. There are four query forms:

SELECT
This form is the standard bound variable list, returned as XML or an as-yet-to-be-defined RDF/XML format.
CONSTRUCT
This form returns a graph of the results. You can use an asterisk to specify that the graph consists of all the matching triples, or provide a graph template.
DESCRIBE
This is a bit hazy, but the text suggests that something like Concise Bounded Descriptions (CBDs) will be returned for the results.
ASK
This is an efficient way of testing whether anything in the graph matches, without having to perform all the backtracking needed to return every result.
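To make the difference in result shapes concrete, here is a toy Python sketch over a single triple pattern. It is not SPARQL syntax or a conforming implementation: terms are plain strings, a term starting with “?” is a variable, and the function names are hypothetical. DESCRIBE is left out since the draft itself is hazy on it.

```python
def solutions(triples, pattern):
    """Yield a variable binding for each triple matching one pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select(triples, pattern):
    """SELECT-like: a list of variable bindings."""
    return list(solutions(triples, pattern))


def construct(triples, pattern, template):
    """CONSTRUCT-like: a set of new triples built from a template."""
    return {tuple(b.get(t, t) for t in template) for b in solutions(triples, pattern)}


def ask(triples, pattern):
    """ASK-like: stop at the first match instead of finding them all."""
    return next(solutions(triples, pattern), None) is not None
```

Because `solutions` is a generator, `ask` stops as soon as one binding turns up, which is the saving the ASK form is after.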

Of course, these can, and probably will, change, but I’m going to be watching developments with great interest.

Oct 12 2004

Exchange is King

Published by Ian Davis under Uncategorized

Ben Hyde hits it on the head:

Above I mentioned two ways that XML might get displaced. One on the supply side (data normal form) and one on the demand side (presentation). There is a third - data exchange. Exchange is where the powerful network effects are - always! RDF is better for data exchange than XML. It’s easier, it’s simpler, and it is far better for layering, mixing, and creating new standards.

This trio, and the standards around them, are key to making anything happen in the net.

  • Supply: writers, data, storage, normal forms, etc.
  • Demand: readers, presentations, ui, etc.
  • Exchange: messaging, subscriptions, polling, push, etc.

The RDF folks think that the demand side is where the leverage is. That’s wrong. Content is not king! Exchange is king. In the world of ends, the middle rules.

Oct 04 2004

RSS 1.0 Feed-a-Matic

Published by Ian Davis under Uncategorized

The decision to start work on a revision to the RSS 1.0 specification got me thinking about how best to approach it. The spec was written four years ago and hasn’t changed since. It’s good to have stable specifications because it means the investment of time that application writers put into implementing the spec isn’t wasted. With this in mind, we need to make sure that we don’t unduly affect existing applications that parse RSS 1.0. Since there isn’t a chart of how every application reads and handles its feeds we need to carry out some empirical testing on RSS 1.0 support.

Feed-a-Matic is a tool that I’ve written to help with this testing. It generates RSS 1.0 feeds based on settings that you choose in the web interface. To test a particular feed configuration, subscribe to the URL that the Feed-a-Matic produces. Then you can visit the survey page to fill in a report about how your application handled the feed. The survey page is linked from each RSS item too, so it should be possible to file a report from within the application itself.

I’ve chosen a fairly arbitrary set of configuration settings to start with. You can include or exclude all the major RSS elements, set the number of items and specify that various types of embedded markup can appear within titles and descriptions. I have a number of other settings that I’m considering adding, such as listing the items in the rdf:Seq in the opposite order to the item elements themselves. This should test whether applications use the rdf:Seq for item ordering. If you have any suggestions, add a comment here or send me an email.
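For anyone wanting to try the reversed-sequence test locally, here is a minimal Python sketch that generates a bare-bones RSS 1.0 feed with that option. It is not the Feed-a-Matic’s code: `make_feed`, its parameters and the placeholder titles are hypothetical, and it writes only the channel title, link, description and items.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS)  # RSS 1.0 elements in the default namespace


def make_feed(feed_url, item_count=3, reverse_seq=False):
    """Return a bare-bones RSS 1.0 document as a string.

    With reverse_seq=True the rdf:Seq lists the items in the opposite
    order to the item elements, to see which order an aggregator uses.
    """
    rdf_about = "{%s}about" % RDF
    rdf_resource = "{%s}resource" % RDF
    root = ET.Element("{%s}RDF" % RDF)
    channel = ET.SubElement(root, "{%s}channel" % RSS, {rdf_about: feed_url})
    ET.SubElement(channel, "{%s}title" % RSS).text = "Test feed"
    ET.SubElement(channel, "{%s}link" % RSS).text = feed_url
    ET.SubElement(channel, "{%s}description" % RSS).text = "Generated test feed"

    item_urls = ["%s#item%d" % (feed_url, n) for n in range(1, item_count + 1)]
    items = ET.SubElement(channel, "{%s}items" % RSS)
    seq = ET.SubElement(items, "{%s}Seq" % RDF)
    for url in (item_urls[::-1] if reverse_seq else item_urls):
        ET.SubElement(seq, "{%s}li" % RDF, {rdf_resource: url})

    # Item elements are always written in ascending order.
    for n, url in enumerate(item_urls, start=1):
        item = ET.SubElement(root, "{%s}item" % RSS, {rdf_about: url})
        ET.SubElement(item, "{%s}title" % RSS).text = "Item %d" % n
        ET.SubElement(item, "{%s}link" % RSS).text = url
    return ET.tostring(root, encoding="unicode")
```

With `reverse_seq=True`, an application that honours the rdf:Seq should list the last item first, while one that follows document order will not.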

The surveys allow you to fill in the application name and version and the operating system/platform you are running it on. There are some general questions that apply to all the feeds such as a subjective measure of whether the feed was usable in the application, how many items were displayed and in what order. I’ve also included some specific questions that only apply when particular settings are chosen, such as how an item title displays when it contains embedded or escaped markup. Again, I’m canvassing for more questions that you’d like to see asked in the surveys.

The results of the surveys will be available online soon too; only lack of time is stopping me from writing the code to display them. If you have access to one or more aggregators and have some spare time, please visit the Feed-a-Matic, try some feeds out and report the results. Every little helps and the more information we gather about existing applications, the better informed we will be when we start considering changes to the RSS 1.0 specification.

Updated 6 Oct 2004
Changed URL to Feed-a-Matic to make it slightly shorter.

Oct 01 2004

RSS 1.0 Revision

Published by Ian Davis under Uncategorized

I proposed the following motion a couple of days ago to the RSS-DEV interest group:

The RSS-DEV Interest Group should initiate work on a revision to the RSS 1.0 specification limited to editorial clarifications and informative usage guidelines. The syntax of RSS 1.0 will not be altered as part of this revision.

Chris Croome seconded it, so it looks like we’re actually going to do something!
