Naked Metadata

Jonathan O'Donnell

"But in winter the tree stands cold and naked, nothing can be hidden from view. The true souls of both the tree and its artist are exposed to the world's scrutiny."
Colin Lewis, Bonsai: The Naked Truth

The problem

Metadata in Web pages often doesn't get updated when the pages get updated.

The solution

Tag data, and point to it from the appropriate metadata field. Ian Davis has developed RDF in HTML to provide a way of doing this.

Example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head profile="http://purl.org/NET/erdf/profile">

<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />

</head>

<body>

<h1 class="dc-title">Naked Metadata</h1>

<h2 class="dc-creator">Jonathan O'Donnell</h2>

<p class="dc-rights">http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html
    © Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>

</body>

</html>

Background

When I first learned to put Dublin Core into Web pages, I often found myself replicating data. I would place a DC.creator tag in the head, even though the name of the author was already on the Web page. This annoyed me, because I knew that it was bad practice to replicate data like that. When I mentioned this to a workmate at the time, he said that I could probably make a link from the metadata field to the data in XML. At that stage, I didn't know enough XML to even understand the concept, much less make it work.

Fast forward eight years to DC-ANZ 2005, where Eve Young and Baden Hughes made the point that people updating Web pages often don't update the metadata. One of the problems that they talked about was that metadata in the header is essentially invisible to people editing the page (for example, when they use some WYSIWYG editors).

In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.

Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the Web address, and date information.

If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.

Method

Ian Davis, of Talis (UK), has developed RDF in HTML, which allows us to point to the data from the metadata fields. The system uses 'class' attributes to delineate metadata information. Many Web developers already use 'class' attributes to style particular aspects of a Web site.

To use RDF in HTML, you should add three things:

This profile to your <head> tag.
<head profile="http://purl.org/NET/erdf/profile">
The profile tells a metadata harvester how to get the metadata out of the page.

A relationship link tag for each metadata schema that you are using.
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.terms" href="http://purl.org/dc/terms/" />
These relationship links point to the schema descriptions for Dublin Core and Dublin Core terms (like 'audience'). You should put them in the head of your Web page.

Class attributes for the relevant metadata in the body of your Web page.
<h1 class="dc-title">Naked Metadata</h1>
<address class="dc-creator">Jonathan O'Donnell</address>
<p class="dc-rights"><span class="dc-identifier">http://purl.nla.gov.au/net/jod/tutorials/metadata.html</span> © Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>
As an added advantage, these classes can be used in your CSS to style the information.

In his description of RDF in HTML, Ian Davis shows that it can be used for much more than this. Here, I have just shown how to embed basic Dublin Core metadata in the body of your Web page.

Harvesting

It is all well and good to put metadata into a document, but you have to be able to get it out again for it to be of any use.

RDF in HTML is designed to be harvested by Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL is a mechanism for "getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT".

Although the example in that document illustrates extraction of DC metadata from <meta> html elements, there would be no reason why the mechanism should not extract the metadata from arbitrary elements identified by id; it is just a different XSLT transformation.
Alan Cox, Post to DC-General mailing list, 2 November 2005

One example of an extractor that will parse RDF in HTML is the Embedded RDF Extractor. You can use this extractor to check that you have built your page correctly.
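
To make the idea of harvesting concrete, here is a minimal sketch in Python of what an extractor has to do. It is not GRDDL, and it is not the Embedded RDF Extractor; it is a simplified illustration that assumes every class of the form 'prefix-property' holds a plain text value, and that the prefix has been declared with a schema link in the head. The names NakedMetadataParser and extract_metadata are my own, invented for this example. The sketch does not check the profile attribute or handle the other features that Ian Davis describes.

```python
from html.parser import HTMLParser

# Elements that never contain text, so they are not tracked as open elements.
VOID_TAGS = {"link", "meta", "br", "hr", "img", "input"}


class NakedMetadataParser(HTMLParser):
    """Collect the text of elements whose class looks like 'prefix-property'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.schemas = {}  # prefix -> schema URI, from <link rel="schema.*">
        self.found = []    # finished (class name, text) pairs
        self._stack = []   # open elements: (tag, list of open records)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if tag == "link" and rel.startswith("schema."):
            self.schemas[rel[len("schema."):]] = attrs.get("href") or ""
        if tag in VOID_TAGS:
            return
        classes = (attrs.get("class") or "").split()
        records = [{"name": c, "text": []} for c in classes if "-" in c]
        self._stack.append((tag, records))

    def handle_data(self, data):
        # Text belongs to every open element, so nested values
        # (a date inside a rights statement) are captured by both.
        for _tag, records in self._stack:
            for record in records:
                record["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(t == tag for t, _ in self._stack):
            return  # ignore void or stray end tags
        while self._stack:
            open_tag, records = self._stack.pop()
            for record in records:
                text = " ".join("".join(record["text"]).split())
                self.found.append((record["name"], text))
            if open_tag == tag:
                break


def extract_metadata(html):
    """Return (property URI, value) pairs for classes with a declared prefix."""
    parser = NakedMetadataParser()
    parser.feed(html)
    parser.close()
    results = []
    for name, text in parser.found:
        prefix, _, prop = name.partition("-")
        if prefix in parser.schemas:
            results.append((parser.schemas[prefix] + prop, text))
    return results


if __name__ == "__main__":
    sample = """<html><head profile="http://purl.org/NET/erdf/profile">
    <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
    </head><body><h1 class="dc-title">Naked Metadata</h1></body></html>"""
    for uri, value in extract_metadata(sample):
        print(uri, "=", value)
```

Classes whose prefix has no matching schema link (an ordinary styling class such as 'nav-bar', for instance) are ignored, which is why the relationship links in the head matter.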

Future developments

Misha Wolf pointed out that XHTML2 tackles this problem well.

Linking: In HTML 3, only a elements could be the source and target of hyperlinks. In HTML 4 and XHTML 1, any element could be the target of a hyperlink, but still only a elements could be the source. In XHTML 2 any element can now also be the source of a hyperlink, since href and its associated attributes may now appear on any element. So for instance, instead of <li><a href="home.html">Home</a></li>, you can now write <li href="home.html">Home</li>. Even though this means that the a element is now strictly-speaking unnecessary, it has been retained.

Metadata: the meta and link elements have been generalized, and their relationship to RDF [RDF] described. Furthermore, the attributes on these two elements can be more generally applied across the language.

World Wide Web Consortium, "Introduction to XHTML2.0: Major differences with XHTML 1", http://www.w3.org/TR/xhtml2/introduction.html#s_intro_differences, accessed 2 November 2005

As far as I can see, this means that:

Meta elements can appear in the body of the document, not just the head.
Any element can link to them.

References

RDF in HTML: http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml
Embedded RDF Extractor: http://research.talis.com/2005/erdf/extract
Ian Davis' initial description of RDF in HTML: http://internetalchemy.org/2005/10/introducing-embedded-rdf
Gleaning Resource Descriptions from Dialects of Languages (GRDDL): http://www.w3.org/2004/01/rdxh/spec
eXtensible HyperText Markup Language 2 (XHTML2): http://www.w3.org/TR/xhtml2
Naked metadata thread on the Web Standards Group mailing list: http://www.mail-archive.com/wsg%40webstandardsgroup.org/index.html#22638
Naked metadata thread on DC-General mailing list: http://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0511&L=dc-general#4