While tinkering with some ideas for describing small businesses in a
FOAF-like way it struck me that the most interesting formats are ones
that promote linkage. FOAF has the knows predicate, but how could you
link businesses together? One intriguing way is to describe what a
business has and what it needs, e.g. this company has ‘engines’
and needs ‘nuts and bolts’ and ‘lubricating oil’.
It might also be fun to use it with FOAF. I might need ‘books written
by Jack Vance’ and have ‘Eminem CDs’ that I’m willing to dispose of.
The vocabulary is simple. It has two predicates: wants and has. For a
contrived example, in my FOAF I might add:
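The linkage that has and wants statements make possible can be sketched in a few lines of Python. This is only an illustration of the matching idea, not the vocabulary itself: the Party class, the find_links function and the exact-string comparison of items are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """A person or business described by what it has and what it wants."""
    name: str
    has: set = field(default_factory=set)
    wants: set = field(default_factory=set)


def find_links(parties):
    """Return (supplier, consumer, item) triples where one party has what another wants."""
    links = []
    for supplier in parties:
        for consumer in parties:
            if supplier is consumer:
                continue
            # Assumption: items match only when the strings are identical.
            for item in sorted(supplier.has & consumer.wants):
                links.append((supplier.name, consumer.name, item))
    return links


# Hypothetical parties built from the examples in the text.
parties = [
    Party("engine maker", has={"engines"}, wants={"nuts and bolts", "lubricating oil"}),
    Party("fastener shop", has={"nuts and bolts"}),
    Party("me", has={"Eminem CDs"}, wants={"books written by Jack Vance"}),
]
links = find_links(parties)
```

Matching on free-text strings is the weakest part of the sketch; in practice the items would probably need identifiers of their own.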
Looking at patterns in press releases (a major source of this kind of information), there is often a trailer with the company’s URL…
…A rule for disambiguation could be that the companies in the press release are identified by any URL that is a top-level domain or subdomain root.
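As a rough sketch of that rule, assuming that "domain or subdomain root" means a URL with no path beyond "/", and no query or fragment, the filter might look like this in Python. The company_urls function, the URL regular expression and the sample press release are all made up for illustration.

```python
import re
from urllib.parse import urlparse

# Deliberately loose pattern: anything that starts with http:// or https://.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def company_urls(text):
    """Return URLs in text that point at a site root (no path, query or fragment)."""
    roots = []
    for raw in URL_PATTERN.findall(text):
        url = raw.rstrip(".,;:)")  # drop trailing sentence punctuation
        parts = urlparse(url)
        if parts.path in ("", "/") and not parts.query and not parts.fragment:
            if url not in roots:
                roots.append(url)
    return roots


# Hypothetical press release text.
release = (
    "Acme Widgets today announced a partnership with Example Corp. "
    "Details: http://news.example.com/2003/press/acme.html. "
    "About Acme: http://www.acme.example/ About Example Corp: http://example.com."
)
found = company_urls(release)
```

Here the deep link to the story is discarded and only the two site roots survive, which is the behaviour the rule is after.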
It’s a good idea, although not foolproof. It made me think of a related area that James and I used to endlessly debate at Calaba. James was very keen to devise an XML format to describe companies and businesses, including contact information and opening hours. Search engines could then use this information to answer questions like “find me a nearby dentist open on Sundays”. Current yellow pages directories can answer the former part of that question but not the latter, probably because gathering that data and keeping it accurate is too expensive to scale.
However, if the businesses themselves could maintain the data, say by editing some XML on their website, then the scalability problems would be solved. My position, being a scraper at heart, was that persuading companies to do it would be too difficult and we should try to generate it instead.
However, I’m coming to the opinion that the syndic8 evangelism model could be the way to go. That means informing and educating businesses about the hypothetical format, providing validation, search facilities and help guides.
What would such a format look like? It would make sense to make it RDF-aware in some way, but not in a way that impedes its uptake. FOAF is far and away the best example of this.
I was poking around this website and I noticed that I didn’t have the correct MIME types set up for my RSS or for the RDF that is scattered around. I have link tags in my HTML head section that point to my RSS file:
The type attribute on the link tag says that the content pointed to by this link should be of type application/rss+xml, which it wasn’t!
To help diagnose this kind of problem I wrote a short CGI script that fetches a page, parses it to detect <link>, <script> and <object> tags, fetches the content pointed to by each tag and compares its MIME type with the one specified in the type attribute.
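The original CGI script isn't shown here, but the check it performs can be sketched with the Python standard library. The names below (TypedLinkParser, fetch_content_type, check_page) are invented for this sketch, and it assumes the target URL lives in href for <link>, src for <script> and data for <object>.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Attribute holding the target URL for each tag of interest.
URL_ATTRS = {"link": "href", "script": "src", "object": "data"}


class TypedLinkParser(HTMLParser):
    """Collect (tag, absolute_url, declared_type) for tags carrying a type attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        url_attr = URL_ATTRS.get(tag)
        if url_attr is None:
            return
        attrs = dict(attrs)
        target, declared = attrs.get(url_attr), attrs.get("type")
        if target and declared:
            absolute = urljoin(self.base_url, target)
            self.found.append((tag, absolute, declared.strip().lower()))


def fetch_content_type(url):
    """Return the media type the server reports for url, without parameters."""
    with urlopen(url) as response:
        return response.headers.get_content_type()


def check_page(html, base_url, fetch=fetch_content_type):
    """Return (tag, url, declared, served) for every mismatch found in html."""
    parser = TypedLinkParser(base_url)
    parser.feed(html)
    mismatches = []
    for tag, url, declared in parser.found:
        served = fetch(url)
        if served != declared:
            mismatches.append((tag, url, declared, served))
    return mismatches
```

Passing the fetch function in as a parameter keeps the comparison logic testable without a network connection; by default it makes a real request for each linked resource.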
I’ve been looking at Dan Brickley’s FOAFCorp ideas which show a fun way to map relationships between board members of corporations (cf. They Rule). I’d like to extend it to describe relationships between the companies themselves, rather than their employees.
This could be done by describing the capital structure of a company, but this would give you only a snapshot in time and would soon go out of date. Plus, it would only work for public companies that have to publish their share structures. An alternative way is to capture deal events as they happen and build up a picture of inter-company relationships. Types of deals to capture could include investment, divestment, merger, acquisition and perhaps more ephemeral deals such as product licensing, technology partnerships etc.
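One way to picture the event-based approach is as a list of dated deal records that gets folded into a graph keyed by company pair. The sketch below is only a guess at a workable shape: the Deal fields, the relationships function and the company names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deal:
    when: date
    kind: str    # e.g. "investment", "acquisition", "licensing"
    source: str  # company initiating the deal
    target: str  # company on the other side


def relationships(deals):
    """Group deals by unordered company pair, oldest first."""
    graph = defaultdict(list)
    for deal in sorted(deals, key=lambda d: d.when):
        graph[frozenset((deal.source, deal.target))].append(deal)
    return graph


# Made-up deal events for illustration.
deals = [
    Deal(date(2003, 3, 1), "acquisition", "Acme", "Widgetco"),
    Deal(date(2002, 6, 15), "investment", "Acme", "Widgetco"),
    Deal(date(2003, 1, 10), "licensing", "Widgetco", "Gizmo Ltd"),
]
graph = relationships(deals)
history = [d.kind for d in graph[frozenset(("Acme", "Widgetco"))]]
```

Because each record carries its own date, the picture never goes stale in the way a capital-structure snapshot does; new deals are simply appended.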
Much of this information is gathered already by companies such as Hoovers and Corpfin (nice list of deal types at bottom of page) but it’s valuable information and they charge accordingly. I’m wondering whether there’s any way of scraping this information from news pages such as Reuters feeds or Yahoo’s press release archive.
Jim Ley, author of the awe-inspiring foafnaut, has animated my SVG FOAF icon. If your browser supports SVG, you should see it embedded here:
Made me chuckle!
Back in the early days of RSS (1999 or thereabouts) I used to run a web scraping tool called RSSMaker to demonstrate how RSS could be produced from HTML. I stopped running that tool almost two years ago, but I still get huge numbers of requests for the dozen or so channels that it used to produce. These channels have been returning 404 status for nearly two years, yet some news aggregators are still requesting them several times a day. If you’re running one of these aggregators then please remove any URL prefixed with http://internetalchemy.org/channels/rss/ from your database.
If you’re looking for a replacement then try looking on Syndic8, which has the largest directory of RSS channels. If you can’t find what you’re looking for there, then try myRSS, a direct ancestor of RSSMaker that lets you create your own RSS channels from virtually any news site.
Prevayler is a persistent, in-memory object storage mechanism. Instead of keeping your business objects in a database and writing serialization code to and from SQL, Prevayler simply serializes the whole lot to disk every night. Additionally, all transactions are logged as they happen.
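The snapshot-plus-log idea is easy to sketch. The following is not Prevayler's API (Prevayler is a Java library); it is a toy Python illustration in which TinyPrevalence, the TRANSACTIONS registry and the JSON file formats are all assumptions made for the example.

```python
import json
import os
import tempfile

# Registry of named transactions; each must be deterministic so replay is exact.
TRANSACTIONS = {}


def transaction(func):
    """Register func under its own name so it can be logged and replayed."""
    TRANSACTIONS[func.__name__] = func
    return func


@transaction
def add_customer(state, name):
    state.setdefault("customers", []).append(name)


class TinyPrevalence:
    """Keep state in memory; persist it as a snapshot plus a transaction log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "transactions.log")
        self.state = {}
        # Recover: start from the last snapshot, then replay the log.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    name, args = json.loads(line)
                    TRANSACTIONS[name](self.state, *args)

    def execute(self, name, *args):
        # Log the transaction first, then apply it to the in-memory state.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([name, args]) + "\n")
        TRANSACTIONS[name](self.state, *args)

    def snapshot(self):
        # Write the whole state, then discard the log it makes redundant.
        with open(self.snapshot_path, "w") as f:
            json.dump(self.state, f)
        if os.path.exists(self.log_path):
            os.remove(self.log_path)


with tempfile.TemporaryDirectory() as directory:
    store = TinyPrevalence(directory)
    store.execute("add_customer", "Jack")
    store.snapshot()                       # the "nightly" serialization
    store.execute("add_customer", "Jill")  # logged after the snapshot
    recovered = TinyPrevalence(directory).state  # simulate a restart
```

After the simulated restart, the state is rebuilt from the snapshot and the one transaction logged since, so nothing written after the nightly dump is lost.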