Serialization formats don’t matter

I mean, if working with RDF has taught me one thing, it’s that converting between two different forms of serialization is trivial—it’s the underlying model that matters.

Exactly! And still, many in the integration business think that XML schemas are the only product required to exchange data between multiple parties. The serialization format(s) should be based on the use cases of the information. And even in a small organization, use cases tend to pop up all the time demanding new formats. Most SOA people see multiple serialization formats as a problem, but I think it is almost insignificant these days if you have a well-defined model.
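To make the point concrete, here is a minimal sketch of one tiny model written out in two serializations. The example URI, the `s`/`p`/`o` array layout and the function names are my own assumptions for illustration, and the JSON layout is ad hoc rather than a standard RDF dialect.

```php
<?php
// Hypothetical model: a list of statements, each with subject, predicate, object.
$triples = [
    [
        's' => 'http://example.com/people/alice',
        'p' => 'http://xmlns.com/foaf/0.1/name',
        'o' => 'Alice',
    ],
];

// Serialize as N-Triples-style lines: <subject> <predicate> "literal" .
// Simplified: treats every object as a plain literal and escapes only
// backslashes and double quotes.
function to_ntriples(array $triples): string
{
    $lines = [];
    foreach ($triples as $t) {
        $literal = str_replace(['\\', '"'], ['\\\\', '\\"'], $t['o']);
        $lines[] = sprintf('<%s> <%s> "%s" .', $t['s'], $t['p'], $literal);
    }
    return implode("\n", $lines) . "\n";
}

// Serialize the very same statements as JSON (ad hoc layout).
function to_json(array $triples): string
{
    return json_encode($triples, JSON_UNESCAPED_SLASHES);
}

echo to_ntriples($triples);
echo to_json($triples), "\n";
```

Both functions read the same array, so adding a third format means adding one more small function while the model stays untouched.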

Feature requests for a vocabulary editor

I have been searching for quite a while now, and apparently there is a missing piece of software waiting to be made. If you work with RDF data in any way, you have probably created a vocabulary using OWL and/or RDF Schema at some point. This works well for all technologists out there, but in my world vocabularies should be created by domain experts rather than developers. Domain experts do not know OWL or RDF Schema.

Synchronizing RDF data from files with the ARC triple store

I have been playing with the excellent ARC framework for a small legal information project (more on that soon). I am beginning to think that many RDF usage scenarios involve data in files (stored in a file system) combined with a triple store that should preferably be kept in sync with the files. Inspired by Niklas Lindström’s Oort I wrote a small plugin to do that in ARC.

The plugin extends the ARC2_Store class with a sync_with_folder($path_to_folder) method. Here is the source code for the initial version if you want to play with it. Rename the file to ARC2_FilesystemSynchronizerPlugin.php and save it in the ARC plugin folder.

By popular request, here is a simple usage example:

[source:php]

<?php
// Include ARC and configuration
include_once('path/to/arc/ARC2.php');
include_once('path/to/arc/config.php');

// Load the plugin as an ARC component and sync the store with the folder
$store = ARC2::getComponent('FileSystemSynchronizerPlugin', $config);
$store->sync_with_folder("/root/path/to/file/store");

[/source]

Save the above code to a file called sync.php and run it from the command line with php sync.php

Come celebrate Niklas Lindström’s birthday

You may ask yourself “who is that?” or “wtf?!” but the fact is that in the near future he will have a much greater impact on your life than you may think. Here is why you should head over to his blog and post a random comment about Yak shaving and, if possible, create a link containing the words “Yak shaving” pointing to his blog. With a little bit of effort and luck Google will pick it up and Niklas will be the number one result for people from Inner Mongolia.

RDF vs Microformats and the Semantic Web

James Simmons writes about some of the pros and cons of Microformats and RDF (with an extended discussion at InfoQ). On the benefits of Microformats (by which he means Microformats.org-style microformats) he mentions:

  • Designed for humans first, machines second
  • Modularity / embeddability
  • Enables and encourages decentralized development, content, services
  • A design principle for formats
  • Adapted to current behaviors and usage patterns
  • Highly correlated with semantic XHTML

I am new to RDF and the semantic web (though I have used microformats in previous web projects), but to me the advantages of RDF and RDFa (the “sprinkling” framework) are clear. Microformats may work for a limited set of use cases, but I have not yet understood how to use them efficiently for the bulk of what I need. However, it is great that a lot of development is going on in the area of embedding machine-readable data in documents. Without microformats the pace would probably have been much slower.

Here are my thoughts on the items that James mentions:

Designed for humans first, machines second: For me the HTML document that carries the information is for humans. With it we apply styling and markup to allow humans (and their assistive devices) to understand the content. The embedding of data is for machines primarily. Although advanced editors may be great at editing HTML, the fact is that most users are not.

Modularity / embeddability: Embeddability is of course necessary. The problem is that the current versions of (X)HTML were not designed for embedding data. This means that Microformats have to rely on the attributes and elements available, none of which were primarily designed for holding machine-readable information. RDFa, on the other hand, is making rapid progress. You can use XHTML 1.1 with RDFa right now and validate it with the W3C validator.
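As a rough illustration of the difference, the sketch below renders the same name once in hCard style (reusing the existing class attribute) and once with RDFa attributes. The function names and the person URI are hypothetical, and the markup is a minimal fragment rather than a complete, validated page.

```php
<?php
// hCard-style microformat: meaning is carried by agreed-upon class names.
function hcard_name(string $name): string
{
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    return '<span class="vcard"><span class="fn">' . $safe . '</span></span>';
}

// RDFa: "about" names the subject by URI, "property" names the predicate
// through a prefix that is bound to a vocabulary URI.
function rdfa_name(string $personUri, string $name): string
{
    return sprintf(
        '<span xmlns:foaf="http://xmlns.com/foaf/0.1/" about="%s" property="foaf:name">%s</span>',
        htmlspecialchars($personUri, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
    );
}

echo hcard_name('Alice'), "\n";
echo rdfa_name('http://example.com/people/alice', 'Alice'), "\n";
```

In the RDFa version both the subject and the vocabulary are identified by URIs in the markup itself, while the hCard version relies on the reader knowing what the class names mean.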

Enables and encourages decentralized development, content, services: I am not sure I understand this one, at least not for the development of vocabularies. Microformats encourage a centralized way of storing vocabularies, on the Microformats.org website, in a format that isn’t machine readable. The power of RDF is that vocabularies can be stored anywhere in a machine-readable way. The world is big and the web has been built to support interaction in a decentralized way. Development of a vocabulary is a local thing for me.

A design principle for formats: See above. Why have one design principle for everyone? Everyone has different needs and resources, and I would prefer to adapt the vocabulary design process to each business case. The Microformats.org website lists design patterns to use when sprinkling a document with embedded data. Instead of calling them design patterns you could say “seeing how far we can go in interpreting the current HTML specification”.

Adapted to current behaviors and usage patterns: Sure, if you limit yourself to a few HTML-adept bloggers. I would venture to guess that more of the people publishing information on the web know little to nothing about markup than know it well. And they shouldn’t need to. People working with information need tools. Tools should help out with the actual markup and embedding of data.

Highly correlated with semantic XHTML: And this is good. But it contradicts the previous statement. Current behaviour is to not use semantic XHTML. It is only a limited number of websites that use valid markup. Both RDFa and Microformats will hopefully help in raising awareness of semantic markup.

What do you think?

RDF for beginners: Part 1: The URI

This will be the first in a series of posts on RDF for beginners. I hope it will be of use for people who are new to RDF but have some background in software development. One of the reasons I am putting this online is to get feedback on how RDF and the semantic web can be explained without sounding like an overenthusiastic preacher. Another reason is that most of the information I find about RDF is written by bearded researcher men. Sometimes they explain RDF in a way I find hard to understand since I’m not an expert on RDF myself. Also, my beard is much smaller.

Before looking at actual RDF, let’s start with a fundamental concept: the Uniform Resource Identifier (URI). URIs are used heavily in RDF and it is important to understand the basics. Wikipedia has a nice article on URIs that states:

A Uniform Resource Identifier (URI), is a compact string of characters used to identify or name a resource. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes defining a specific syntax and associated protocols.

A resource can be many things, e.g. a record of information such as a purchase order, a person, or the online representation of that person (in the form of a Wikipedia article). You decide!

One form of URI is the Uniform Resource Locator (URL). If you are looking at this article you have a URL in the address bar of your browser. If you know your way around a relational database you can compare URIs to primary keys or compound primary keys. A cool thing is that a URL also includes information on how to retrieve the resource (e.g. “http”).
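If you want to see the parts for yourself, PHP’s built-in parse_url() splits a URL into named pieces. This is just a small sketch, and the URL is a made-up placeholder.

```php
<?php
// Split a URL into its components with PHP's built-in parse_url().
// The URL is an illustrative placeholder, not one from a real system.
$parts = parse_url('http://www.example.com/items/1234');

echo $parts['scheme'], "\n"; // how to retrieve the resource
echo $parts['host'], "\n";   // where to ask for it
echo $parts['path'], "\n";   // which resource is meant
```

The scheme is the retrieval information mentioned above, while the host and path together play roughly the role of the primary key.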

Repent

If you have developed software for the web it is likely that you have abused URLs. Instead of identifying a resource (e.g. a record of information) you may have identified a specific script that acts on the record. By doing so you have made it harder for others to make use of the information you put online.

I have done this on numerous occasions. It typically looked like this:

http://wwww.example.com/viewitem.aspx?id=1234

I didn’t know better at the time. What it should have looked like is this:

http://wwww.example.com/items/1234

I guess I never really thought about the R (resource) in the URL. I was more occupied with getting the damn thing online in the first place.
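If you are stuck with old script-style addresses, one way out is to map them to resource-style ones (for example before issuing a redirect). The sketch below assumes a script URL that carries a numeric id query parameter and a hypothetical /items/ path for the resource; the function name is mine, and any port or extra query parameters are simply ignored.

```php
<?php
// Hypothetical helper: turn a script-style URL (…/viewitem.aspx?id=1234)
// into a resource-style URL (…/items/1234). Returns null when the input
// has no usable numeric id.
function resource_url(string $scriptUrl): ?string
{
    $parts = parse_url($scriptUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['query'])) {
        return null;
    }

    // Read the query string into an array, e.g. ['id' => '1234'].
    parse_str($parts['query'], $query);
    if (!isset($query['id']) || !is_string($query['id'])
        || preg_match('/^\d+\z/', $query['id']) !== 1) {
        return null;
    }

    return sprintf('%s://%s/items/%s', $parts['scheme'], $parts['host'], $query['id']);
}

var_dump(resource_url('http://www.example.com/viewitem.aspx?id=1234'));
```

The new address says nothing about which script or technology serves the record, so it can stay the same when the implementation changes.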

Next time we’ll take a peek at RDF itself.