On Monday 7 September, London Web Standards was pleased to host the BBC’s Michael Smethurst (Information Architect) and Yves Raimond (Software Engineer). See their presentation slides; what follows is a recap of the key points.
Some think that the web is simply the internet plus links. In fact, it’s the internet plus HTTP (Hypertext Transfer Protocol). It’s an important distinction, and one that often gets forgotten. For example, discussions on accessibility tend to focus on documents, while HTTP and URIs (Uniform Resource Identifiers) don’t get mentioned: they are the other, ignored side of the web. Where HTML has been through many revisions, HTTP was still at version 1.1 at the time of the talk, a standard that has stood the test of time.
The web is the Internet, HTTP and HTML (to a lesser extent)
Everything that’s good about the web comes from linking. People need and want to share content, so they need handles, or links, to point to it. In Search Engine Optimisation (SEO) terms, Google’s PageRank treats HTTP and the URI (the links pointing to a page) as more important than the HTML of the page itself. By definition, a page of HTML is not a web page until it’s linked to something else.
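To make the point about links concrete, here is a textbook, simplified PageRank computed by power iteration. It is a sketch, not Google’s production ranking: the page names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions. Note that the score is computed from the link graph alone; the HTML content of each page never enters the calculation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Rank flows along outgoing links, split evenly between them.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical pages: only the links between them matter.
    web = {
        "home": ["about", "news"],
        "about": ["home"],
        "news": ["home", "about"],
        "orphan": [],  # nothing links here
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

The page nobody links to ends up at the bottom however good its HTML might be, which is the sense in which links outweigh markup.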
Crucially, we need to get from a web of documents to a web of things. Linked Data allows for this further level of abstraction. Humans can make sense of the content of a web page; the web needs to progress to a stage where machines can understand it too.
Now for the talk
Some of the building blocks of Linked Data:
- RDF (Resource Description Framework). This was described more than ten years ago. It’s very simple, as it’s just data. It works in a Subject > Predicate > Object fashion, for example: <sky> <has colour> <blue>. A collection of these statements forms a graph.
- FOAF (Friend of a Friend). When it was first described, people focused on this because it was the only spec that had traction.
- REST (Representational State Transfer). This explains the proper use of HTTP: the separation of resources from their representations. For example, browsers such as Safari and Firefox have a list of preferences, such as your preferred language or context (e.g. mobile). When you request a document from a URI, you get a representation of the resource that matches the headers your browser sends. The browser and server work together to establish which version of the document you want: the browser states the MIME types and languages it prefers, and the server picks the closest match. The copy of the document you see is only the representation that best matches your requirements, so if the server can’t find a version that fits, it may return a “406 Not Acceptable” error. This concept is important when combating the ghettoisation of mobile content. The device you use and your language both matter, yet the same URI can serve any context. This is important if you don’t want to split your “Google Juice”, as Michael puts it (AKA your Google PageRank).
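The negotiation step described above can be sketched in Python. This is a simplified, hypothetical server-side helper (`negotiate` and the list of available types are made up for illustration, not taken from any framework): it reads the client’s Accept header, respects q-values, lets the most specific matching media range decide each type’s quality, and falls back to 406 Not Acceptable when nothing fits. It ignores media-type parameters other than q; language negotiation via Accept-Language would follow the same pattern with a separate header.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    # An empty Accept header means the client accepts anything.
    return ranges or [("*/*", 1.0)]


def specificity(media_range):
    """Exact types beat type/* ranges, which beat */*."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, media_type):
    """Does an Accept media range (e.g. text/*) cover a concrete media type?"""
    if media_range == "*/*":
        return True
    main, _, sub = media_range.partition("/")
    if sub == "*":
        return media_type.partition("/")[0] == main
    return media_range == media_type


def negotiate(accept_header, available):
    """Return (status, media_type): 200 with the best match, or 406 with None.

    `available` lists the server's representations in order of preference.
    """
    ranges = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not matching:
            continue
        # The most specific matching range decides this type's quality value.
        _, q = max(matching)
        if q > best_q:  # q=0 means "not acceptable", so it never wins
            best_type, best_q = media_type, q
    if best_type is None:
        return 406, None
    return 200, best_type


if __name__ == "__main__":
    # Hypothetical representations of one resource.
    available = ["text/html", "application/rdf+xml"]
    print(negotiate("application/rdf+xml, text/html;q=0.5", available))  # (200, 'application/rdf+xml')
    print(negotiate("text/*", available))                                # (200, 'text/html')
    print(negotiate("image/png", available))                             # (406, None)
```

Because the URI stays the same and only the representation changes, every link to the resource keeps pointing at one address, whichever device or format the visitor ends up with.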
Linked Data = Links + HTTP + RDF
Between the development of FOAF and about 2006, there were lots of new ideas for Linked Data, but they were all too complex. By extension, people thought the semantic web was too complex. Linked Data still uses RDF, but the difference is that now it’s about things, not documents. For example, the EastEnders website may have a resource that you’ll be able to ask for in several flavours. Linked Data is about giving information about things that can’t themselves be sent “down the wires”.
Using Linked Data, you can describe the relationships between resources. For example, you can describe the relationship between a website and its creator: moustaki.org was created by Yves, but not the other way around. You can use Linked Data to make claims about documents and about things.
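RDF statements like these can be modelled as (subject, predicate, object) triples, which together form a directed, labelled graph. In this minimal sketch the `ex:` names and the `ex:createdBy` predicate are hypothetical stand-ins for a real vocabulary; the point is that the direction of each statement matters.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# The ex: names are hypothetical placeholders for a real vocabulary.
triples = {
    ("ex:sky", "ex:hasColour", "ex:blue"),
    ("http://moustaki.org/", "ex:createdBy", "ex:Yves"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def as_adjacency(graph):
    """View the triples as a directed, labelled graph: subject -> [(predicate, object)]."""
    adjacency = defaultdict(list)
    for s, p, o in sorted(graph):
        adjacency[s].append((p, o))
    return dict(adjacency)


if __name__ == "__main__":
    # Following the edge forwards finds the creator...
    print(objects(triples, "http://moustaki.org/", "ex:createdBy"))  # {'ex:Yves'}
    # ...but the same predicate does not hold in the other direction.
    print(objects(triples, "ex:Yves", "ex:createdBy"))  # set()
```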
What’s the difference between Linked Data and Microformats?
Linked Data provides another level of abstraction. Instead of relating two documents (e.g. with a rel attribute), you relate two things. Also, with Linked Data there are several ways to publish these relationships:
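- Hashes. Resources are defined relative to a document, e.g. a thing identified as document#thing. The server never sees anything after the hash, because the client strips it before making the request, so a single GET retrieves the describing document. If you’re concerned about the number of GET requests, this is a good option because you don’t need a 303 (See Other) redirect.
- Slashes. A very popular option, but also the most complex to serve. Michael and Yves showed an example of content negotiation where, with this system, two GET requests are often needed: one for the thing’s URI, which the server answers with a 303 redirect, and one for the document that describes it.
- RDFa (RDF embedded in HTML attributes). The cheapest to set up.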
Find out more about hashes versus slashes.
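The difference between the two approaches can be simulated without a network. In this sketch the URIs, the in-memory `SERVER` table and the `http_get` helper are all hypothetical; what it does reflect is standard HTTP behaviour: the fragment after # is stripped by the client before the request is sent, and a 303 See Other response sends the client on to a document describing the thing, costing a second GET.

```python
from urllib.parse import urldefrag

# Hypothetical server responses, keyed by the exact URL requested.
# A 303 response carries the URL to redirect to in place of a body.
SERVER = {
    "http://example.org/about": (200, "document describing #band"),
    "http://example.org/id/band": (303, "http://example.org/doc/band"),
    "http://example.org/doc/band": (200, "document describing /id/band"),
}


def http_get(url):
    """Stand-in for an HTTP GET against the table above."""
    return SERVER.get(url, (404, None))


def resolve(uri, max_redirects=5):
    """Dereference a URI for a thing; return (status, body, number_of_gets)."""
    # The client removes the fragment: the server never sees what follows '#'.
    url, _fragment = urldefrag(uri)
    gets = 0
    for _ in range(max_redirects + 1):
        status, body = http_get(url)
        gets += 1
        if status == 303:
            # 303 See Other: fetch the document that describes the thing instead.
            url = body
            continue
        return status, body, gets
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(resolve("http://example.org/about#band"))  # hash URI: one GET
    print(resolve("http://example.org/id/band"))     # slash URI: two GETs
```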
What’s the point of Linked Data?
Separate individuals may host different content about the same thing. The English band The Fall was used as an example of the power of this abstraction: Linked Data can be used to aggregate information specifically about this band from many sources. In effect, Linked Data is a web-scale database. The owl:sameAs statement, which declares that two URIs identify the same thing, is a prime example of this kind of functionality.
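Aggregation via owl:sameAs is often done by “smushing”: treating every URI in a chain of owl:sameAs links as a single node. The following sketch assumes hypothetical URIs and `ex:` predicates and uses a small union-find structure; it is one possible approach, not the method presented in the talk.

```python
SAME_AS = "owl:sameAs"


def canonicaliser(triples):
    """Union-find over owl:sameAs links; returns a function mapping a URI to its canonical URI."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Arbitrary but deterministic: the alphabetically smaller URI wins.
                low, high = sorted((root_s, root_o))
                parent[high] = low
    return find


def smush(triples):
    """Merge descriptions of things linked by owl:sameAs into one set of triples."""
    canonical = canonicaliser(triples)
    return {
        (canonical(s), p, canonical(o))
        for s, p, o in triples
        if p != SAME_AS
    }


if __name__ == "__main__":
    # Two hypothetical sites describing the same band under different URIs.
    site_a = {("http://a.example/band/the-fall", "ex:genre", "post-punk")}
    site_b = {
        ("http://b.example/artists/fall", "ex:frontman", "Mark E. Smith"),
        ("http://b.example/artists/fall", SAME_AS, "http://a.example/band/the-fall"),
    }
    for triple in sorted(smush(site_a | site_b)):
        print(triple)
```

The same mechanism is why a mistaken owl:sameAs is costly: it merges everything said about two different things.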
But how do we handle conflicts? As with any web browsing, you must keep track of where each piece of data came from, not least because anyone can make a claim about a URI. Sources could be weighted by their PageRank. Spam is also a problem, and owl:sameAs can go wrong when names are ambiguous: for example, there are two bands called ‘U2’ (one Japanese, one Irish), and sixteen bands called ‘Arora’.
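One way to act on that advice is to store the source alongside every claim and score conflicting values by source weight. In this sketch the sources, their weights and the claims are all made up; in practice the weights might come from something like PageRank, as suggested above. It only resolves genuine disagreements about one thing: two different bands sharing a name still need two different URIs.

```python
from collections import defaultdict


def resolve_conflicts(claims, source_weight):
    """Pick the best-supported value for each (subject, predicate).

    claims: iterable of (source, subject, predicate, value) tuples.
    source_weight: how far each source is trusted; unknown sources count as 0.
    Returns {(subject, predicate): (value, sorted list of supporting sources)}.
    """
    scores = defaultdict(lambda: defaultdict(float))
    support = defaultdict(lambda: defaultdict(set))
    for source, subject, predicate, value in claims:
        key = (subject, predicate)
        scores[key][value] += source_weight.get(source, 0.0)
        # Keep provenance: remember which sources made each claim.
        support[key][value].add(source)
    result = {}
    for key, by_value in scores.items():
        best = max(by_value, key=by_value.get)
        result[key] = (best, sorted(support[key][best]))
    return result


if __name__ == "__main__":
    # Entirely made-up sources, weights and claims.
    weights = {"http://a.example/": 0.5, "http://b.example/": 0.7, "http://c.example/": 0.3}
    claims = [
        ("http://a.example/", "ex:band1", "ex:foundedIn", "1976"),
        ("http://b.example/", "ex:band1", "ex:foundedIn", "1977"),
        ("http://c.example/", "ex:band1", "ex:foundedIn", "1976"),
    ]
    # "1976" wins: 0.5 + 0.3 outweighs 0.7.
    print(resolve_conflicts(claims, weights))
```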
What are the pitfalls, and how is it being used and developed?
FRBR (Functional Requirements for Bibliographic Records) is still being developed, and lots of work is being done on its different classes. But music ontology is still an area that needs development: there is a need to distinguish between the composer, performances, signals and so on.
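To show why those distinctions matter, here is a hypothetical data model in Python. The class names are illustrative and are not taken from FRBR or from any published music ontology: one composition can have many performances, and each performance many signals (recordings, broadcasts), so collapsing them into a single record loses information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    title: str
    composer: str


@dataclass(frozen=True)
class Performance:
    composition: Composition
    performer: str
    date: str  # when this particular performance took place


@dataclass(frozen=True)
class Signal:
    performance: Performance
    description: str  # e.g. a specific recording or broadcast of the performance


def signals_of(composition, signals):
    """All signals that ultimately derive from the given composition."""
    return [sig for sig in signals if sig.performance.composition == composition]


if __name__ == "__main__":
    # Hypothetical data: one composition, two performances, three signals.
    piece = Composition("Example Piece", "Composer X")
    live = Performance(piece, "Orchestra A", "2001-05-01")
    studio = Performance(piece, "Quartet B", "2003-02-14")
    signals = [
        Signal(live, "radio broadcast"),
        Signal(live, "archive recording"),
        Signal(studio, "CD master"),
    ]
    print(len(signals_of(piece, signals)))  # 3
```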
A recent example of the problems with this web-scale database came when researchers at the BFI (British Film Institute) discovered that a famous actor had once acted in a porn film, and added this to his record. Although it was true, the actor didn’t want it to be known, and the BFI was deemed to be in the wrong.
Search engines are already using Linked Data. For example, Google is publishing content from Best Buy. Social media sites are also already using Linked Data in their activity feeds, and most of the RDF published so far is social data. US porn websites are obliged to use RDF. To find out where RDF is being published, go to: http://pingthesemanticweb.com/