Saturday, June 30, 2007

GRDDL Use Cases: Scenarios of extracting RDF data from XML documents

"There are many dialects of XML in use by documents on the web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies. Some are formally defined and others allow for more freedom of interpretation. Recently, two progressive encoding techniques, RDFa and microformats, have emerged to overlay additional semantics onto valid XHTML documents. These techniques offer simple, open data formats built upon existing and widely adopted standards.

While this breadth of expression is quite liberating, inspiring new dialects to codify both common and customized meanings, it can prove to be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet, or an ontology? And how can software determine whether any two of these authors in fact refer to the same person?

Any number of the XML documents on the web may contain data whose value would increase dramatically if they were accessible to systems which might not directly support such a wide variety of dialects but which do support RDF."
