"But in winter the tree stands cold and naked, nothing can be hidden from view. The true souls of both the tree and its artist are exposed to the world's scrutiny."
Colin Lewis, Bonsai: The Naked Truth
Metadata in Web pages often doesn't get updated when the pages get updated.
Tag the data where it appears in the page, and point to it from the appropriate metadata field. Ian Davis has developed RDF in HTML to provide a way of doing this.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://purl.org/NET/erdf/profile">
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
</head>
<body>
<h1 class="dc-title">Naked Metadata</h1>
<h2 class="dc-creator">Jonathan O'Donnell</h2>
<p class="dc-rights">http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html
© Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>
</body>
</html>
When I first learned to put Dublin Core into Web pages, I often found myself replicating data. I would place a DC.creator tag in the head, even though the name of the author was on the Web page. This annoyed me, because I knew that it is bad practice to replicate data like that. When I mentioned this to a workmate at the time, he said that I could probably make a link from the metadata field to the data in XML. At that stage, I didn't understand enough XML to even understand the concept, much less make it work.
Fast forward eight years to DC-ANZ 2005, where Eve Young and Baden Hughes made the point that people updating Web pages often don't update the metadata. One of the problems that they talked about was that metadata in the header is essentially invisible to people editing the page (for example, when they use some WYSIWYG editors).
In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.
Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the Web address, and date information.
If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.
Ian Davis, of Talis (UK), has developed RDF in HTML, which allows us to point to the data from the metadata fields. The system uses 'class' attributes to delineate metadata information. Many Web developers already use 'class' attributes to style particular aspects of a Web site.
To use RDF in HTML, you should add:
- This profile to your <head> tag.
<head profile="http://purl.org/NET/erdf/profile">
The profile tells a metadata harvester how to get the metadata out of the page.
- A relationship link tag for each metadata schema that you are using.
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.terms" href="http://purl.org/dc/terms/" />
These relationship links point to the schema descriptions for Dublin Core and Dublin Core terms (like 'audience'). You should put them in the head of your Web page.
- Class attributes for the relevant metadata in the body of your Web page.
<h1 class="dc-title">Naked Metadata</h1>
<address class="dc-creator">Jonathan O'Donnell</address>
<p class="dc-rights"><span class="dc-identifier">http://purl.nla.gov.au/net/jod/tutorials/metadata.html</span>
© Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>
As an added advantage, these classes can be used in your CSS to style the information.
In his description of RDF in HTML, Ian Davis shows that it can be used for much more than this. Here, I have just shown how to embed basic Dublin Core metadata in the body of your Web page.
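To make the link between the 'schema' links and the class names concrete, here is a minimal Python sketch of how a harvester might resolve a class such as 'dc-title' to a full property address. It assumes the convention used in the examples above, where the part of the class name before the hyphen matches a declared schema prefix. The names SchemaLinkParser and expand_class are my own, for illustration only; they are not part of RDF in HTML or of any harvester.

```python
from html.parser import HTMLParser


class SchemaLinkParser(HTMLParser):
    """Collect <link rel="schema.PREFIX" href="URI"> declarations."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> schema address

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        href = attr.get("href")
        if rel.startswith("schema.") and href:
            self.schemas[rel[len("schema."):]] = href


def expand_class(name, schemas):
    """Turn 'dc-title' into a full property address.

    Returns None when the class has no hyphen or its prefix was not
    declared, so ordinary styling classes are ignored.
    """
    prefix, sep, local = name.partition("-")
    if not sep or prefix not in schemas:
        return None
    return schemas[prefix] + local


HEAD = """
<head profile="http://purl.org/NET/erdf/profile">
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.terms" href="http://purl.org/dc/terms/" />
</head>
"""

parser = SchemaLinkParser()
parser.feed(HEAD)
parser.close()
print(expand_class("dc-title", parser.schemas))
```

A class with an undeclared prefix, such as a hypothetical 'nav-bar', resolves to nothing, which is why ordinary styling classes can sit alongside the metadata classes.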
It is all well and good to put metadata into a document, but you have to be able to get it out again for it to be of any use.
RDF in HTML is designed to be harvested by Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL is a mechanism for "getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT".
"Although the example in that document illustrates extraction of DC metadata from <meta> html elements, there would be no reason why the mechanism should not extract the metadata from arbitrary elements identified by id; it is just a different XSLT transformation."
Alan Cox, Post to DC-General mailing list, 2 November 2005
One example of an extractor that will parse RDF in HTML is the Embedded RDF Extractor. You can use this extractor to check that you have built your page correctly.
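If you just want a quick local check before trying a real extractor, something like the following Python sketch may help. It is not GRDDL, not XSLT and not the Embedded RDF Extractor: it simply walks the page with the standard library's html.parser and lists the text found inside any element whose class starts with 'dc-'. The class name ClassMetadataParser and the 'dc-' prefix default are my own assumptions, and the output is plain pairs rather than RDF.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassMetadataParser(HTMLParser):
    """Collect (class name, text) pairs for elements with 'dc-' classes."""

    def __init__(self, prefix="dc-"):
        super().__init__()
        self.prefix = prefix
        self.open = []   # stack of (tag, metadata classes, text pieces)
        self.found = []  # (class name, text) in the order elements close

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        props = [c for c in classes if c.startswith(self.prefix)]
        self.open.append((tag, props, []))

    def handle_data(self, data):
        # Text belongs to every open metadata element that contains it.
        for _tag, props, pieces in self.open:
            if props:
                pieces.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Find the nearest open element with this tag name.
        for i in range(len(self.open) - 1, -1, -1):
            if self.open[i][0] == tag:
                break
        else:
            return  # stray closing tag, ignore it
        # Close it, along with anything left unclosed inside it.
        while len(self.open) > i:
            _tag, props, pieces = self.open.pop()
            text = " ".join("".join(pieces).split())
            for prop in props:
                self.found.append((prop, text))


BODY = """
<h1 class="dc-title">Naked
Metadata</h1>
<address class="dc-creator">Jonathan
O'Donnell</address>
<p class="dc-rights"><span class="dc-identifier">http://purl.nla.gov.au/net/jod/tutorials/metadata.html</span>
© Jonathan O'Donnell <span class="dc-date">23
October 2005</span></p>
"""

harvester = ClassMetadataParser()
harvester.feed(BODY)
harvester.close()
for name, text in harvester.found:
    print(name, "=", text)
```

Nested elements are reported separately, so the rights paragraph includes the identifier and date text as well as yielding them under their own class names. An unclosed span, like the ones that are easy to type by accident, shows up as a field that swallows more text than you expected.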
Misha Wolf pointed out that XHTML2 tackles this problem well.
World Wide Web Consortium, "Introduction to XHTML2.0: Major differences with XHTML 1", http://www.w3.org/TR/xhtml2/introduction.html#s_intro_differences, accessed 2 November 2005
- Linking: In HTML 3, only a elements could be the source and target of hyperlinks. In HTML 4 and XHTML 1, any element could be the target of a hyperlink, but still only a elements could be the source. In XHTML 2 any element can now also be the source of a hyperlink, since href and its associated attributes may now appear on any element. So for instance, instead of <li><a href="home.html">Home</a></li>, you can now write <li href="home.html">Home</li>. Even though this means that the a element is now strictly-speaking unnecessary, it has been retained.
- Metadata: the meta and link elements have been generalized, and their relationship to RDF [RDF] described. Furthermore, the attributes on these two elements can be more generally applied across the language.
As far as I can see, this means that: