The DOI Handbook

4 DOI® Data Model

This chapter explains the basis for the second main technical component of the DOI® System, the DOI Data Model, and its ability to ensure interoperability of DOI name metadata assigned through existing metadata schemes. The chapter gives an overview of the system, and then separate sections discuss the aims of the DOI Data Model policy -- interoperability and good administration -- and the three tools of the DOI Data Model -- kernel metadata, the data dictionary and schemas for metadata interchange. Readers are advised to consult the Glossary of Terms at the start of the Handbook in conjunction with this chapter. For RAs and those wishing to explore this further, more extensive discussion of the issues and detailed specification of the components of the DOI Data Model are found in Appendices 4-6.

© International DOI Foundation 2006

 
4.1 Overview of the DOI® Data Model
4.2 Aims of DOI Data Model policy
      4.2.1 Interoperability
      4.2.2 Administrative capability
4.3 DOI Data Model tools
      4.3.1 DOI® Kernel Declaration
      4.3.2 indecs Data Dictionary (iDD)
      4.3.3 Resource Metadata Declaration (RMD) for metadata interchange

 

4.1 Overview of the DOI® Data Model

Without metadata, an identifier is of very little value. Metadata, which may be defined in this context as information about an identified Resource, provides human beings or machines with the data they need to enable them to make use of that identified Resource. Metadata may include names, identifiers, descriptions, types, classifications, locations, times, measurements, relationships and any other kind of information related to a Resource. For a fuller review of the relation of metadata to the DOI System in general see the factsheet "DOI® System and data dictionaries".

There are two ways in which every IDF Registration Agency is bound to deal with metadata. An RA will gather input metadata from Resource providers (typically, descriptions of the Resources and associated rights and policies); and an RA will need to provide some level of output or service metadata to support DOI System services. Input metadata will provide some, but not necessarily all, of the service metadata. In some cases, a metadata declaration will itself be a complete DOI System service (for example, "provide an ONIX Product message for this Resource"). These two flows of metadata declarations are illustrated in figure 1.

Figure 1: Flows of metadata in and out of an RA

DOI System policy places no restrictions on the form and content of an RA's input and service metadata declarations, except insofar as input metadata must support the minimum requirements implicit in the DOI Kernel (see below). RAs may specify their own metadata schemes and messages, or use any existing schemes in whole or part for their input and service metadata declarations.

DOI Data Model policy is concerned with the internal management and exchange of metadata between RAs within the "RA network", and is designed to achieve two aims:

  1. To promote interoperability within the network of DOI System users (see 4.2.1), and
  2. To ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI System as a whole (see 4.2.2).

The DOI Data Model has three tools to support its metadata policy:

  • The DOI® Kernel Metadata Declaration (see 4.3.1)
  • DOI® Resource Metadata Declaration schemas for data interchange between RAs (see 4.3.3)
  • The indecs Data Dictionary ("iDD") (see 4.3.2)

The responsibilities of RAs can be summarized in these three statements:

  1. An RA must be capable of producing a Kernel Metadata Declaration for each DOI name issued.
  2. Metadata exchanged between RAs supporting DOI System services should be exchanged using an agreed DOI Resource Metadata Declaration ("RMD") for the Resource or Service type.
  3. Proprietary terms (data elements and values) used by RAs in Kernel and Resource Metadata Declarations should be registered in the IDF's data dictionary ("iDD").

These responsibilities are not mandatory for all DOI names: exceptions are discussed in terms of the requirement for interoperability described in the next section.

4.2 Aims of DOI Data Model policy

4.2.1 Interoperability

The first aim of DOI Data Model policy is to promote interoperability within the network of DOI name users. It does this by providing the means, described in this chapter, of achieving "semantic compatibility" between different RAs.

Standardization of any kind is driven by a need for interoperability. If an RA is issuing DOI names for Resources for use within a private domain where that RA is able to command all aspects of metadata gathering and output, then it has no need for standardization or conformance with DOI Data Model obligations. The RA will lay out its schemas and declarations, and its providers and users will, hopefully, conform to them. Such a situation is described as restricted use of the DOI System, and applies typically where an organization becomes an RA for the specific purpose of issuing DOI names for use only within its own private organization. Restricted use is discussed more fully in section 6.5 of the Handbook.

However, such isolation is unusual. Normally, when a DOI name is issued to a Resource, one fundamental assumption may be made about interoperability: the RA or the Resource provider may wish (now or in the future) that the DOI name should be available for use in services provided by other RAs. For example, where several RAs are issuing DOI names to journal articles from different publishers, it is likely that some RAs and publishers will want their DOI names to be included in journal-related services supported by other RAs.

In a similar way, many RAs will want DOI names issued by other RAs to be available for inclusion in services they themselves are providing. Such interoperability is one of the principal benefits of the DOI System.

As the RA network grows, such requirements are emerging, and where specific opportunities do not yet exist they are anticipated. In such circumstances neither the RA nor the Resource provider wishes to issue a second DOI name for the Resource, nor to provide and capture the input metadata all over again from its source.

In addition, some DOI System services may not, in future, be the direct responsibility of RAs. Any service provider making use of DOI names issued by different RAs under different Application Profiles will be faced with the question of metadata interoperability.

Any DOI name which is intended for interoperability -- that is, which has the possibility of use in services outside of the direct control of the issuing RA -- is subject to DOI Data Model policy. The aim of metadata interoperability can therefore be expressed in these two objectives:

  1. To ensure that metadata held by different RAs is not fundamentally inconsistent, and
  2. To ensure that an efficient and extensible means of interchange exists for transporting metadata between RAs (and in future other service providers).

The first objective is dealt with by the DOI Kernel, and the second by the interchange provisions of the RMD and iDD.

The above provisions do not apply to DOI names registered under the legacy "Zero AP" described in Chapter 5.

4.2.2 Administrative capability

The second aim of DOI Data Model policy is "To ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI System as a whole". This aim may also be seen as supporting the first aim of interoperability, but it specifically addresses the need to ensure that a prospective RA is competent to issue DOI names responsibly and that ambiguous DOI names do not enter the network.

The Data Model policy provides a simple test for an RA's competence: the ability to make a DOI® Kernel Declaration, which ensures that the RA has an internal system which can support the unambiguous allocation of a DOI name and is fundamentally sound enough to support interoperability within the network. In addition, Data Model policy requires that an RA maintain records of the date of allocation of a DOI name and the identity of the registrant on whose behalf the DOI name was allocated.
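
The record-keeping requirement can be pictured with a minimal sketch. It is illustrative only: the class names AllocationRecord and AllocationRegister, the sample DOI name and registrant, and the choice to compare DOI names case-insensitively are all assumptions made for this example, not part of DOI System policy or any RA's actual system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class AllocationRecord:
    """What the policy asks an RA to keep for each DOI name (hypothetical layout)."""
    doi: str
    registrant: str      # party on whose behalf the DOI name was allocated
    allocated_on: date   # date of allocation


class AllocationRegister:
    """Hypothetical in-memory register that refuses to allocate a DOI name twice."""

    def __init__(self) -> None:
        self._records: Dict[str, AllocationRecord] = {}

    def allocate(self, doi: str, registrant: str, allocated_on: date) -> AllocationRecord:
        key = doi.lower()  # assumption of this sketch: case-insensitive comparison
        if key in self._records:
            raise ValueError(f"DOI name already allocated: {doi}")
        record = AllocationRecord(doi, registrant, allocated_on)
        self._records[key] = record
        return record

    def lookup(self, doi: str) -> AllocationRecord:
        return self._records[doi.lower()]


# Sample values are made up for illustration.
register = AllocationRegister()
register.allocate("10.99999/example.1", "Example Press", date(2006, 1, 31))
```

A second call to allocate with the same DOI name raises an error, which is one simple way of keeping ambiguous DOI names out of the register.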

The metadata policy also exists to support the future development of mechanisms for facilitating the administration of the DOI System as a whole. This might be done, for example, through the use of iDD-registered terms as types to classify DOI names, Services or Application Profiles.

4.3 DOI Data Model tools

4.3.1 DOI Kernel Declaration

The Kernel Declaration, which is formally specified in an XML schema, answers a number of basic questions about the identified Resource (see Table 1). The answers to these questions should all be known by the RA at the time the DOI name is issued: if they are not, it is questionable whether the DOI name has been allocated unambiguously.

Questions about the Resource | Kernel element(s)
What is the DOI being allocated? | DOI
Is it commonly referenced with another identifier (e.g. an ISBN)? | resourceIdentifier(s)
What is it usually called? | resourceName(s)
Who is principally responsible for its creation or publication? | principalAgent(s)
What role did they play? | agentRole(s)
Is it a physical fixation, a digital fixation, a performance or an abstract work? | structuralType
How is it perceived -- is it audio, visual, audiovisual or abstract? | mode(s)
What particular kind of Resource is it? (e.g. an audio file, scientific journal, musical composition, dataset, serial article, eBook, pdf, etc.) | resourceType

Table 1: Kernel elements

There may also be a few questions about the issuing of the DOI name and Kernel itself (Table 2):

Which RA issued this DOI name? | RegistrationAgency
When was this Kernel issued? | IssueDate
Which version is it? | IssueNumber

Table 2: Administrative Kernel elements
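
As an illustration only, the Kernel elements of Tables 1 and 2 can be pictured as a simple record. The sketch below is written in Python rather than in the XML schema that formally specifies the Kernel (see Appendix 6). The class name, the Python spellings of the element names, the missing_elements helper, the treatment of resourceIdentifier as optional and every sample value are assumptions made for this example.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class KernelDeclaration:
    """Illustrative record holding the Kernel elements of Tables 1 and 2."""
    doi: str                    # DOI
    resource_names: List[str]   # resourceName(s)
    principal_agents: List[str] # principalAgent(s)
    agent_roles: List[str]      # agentRole(s)
    structural_type: str        # structuralType
    modes: List[str]            # mode(s)
    resource_type: str          # resourceType
    registration_agency: str    # RegistrationAgency
    issue_date: str             # IssueDate
    issue_number: int           # IssueNumber
    # Assumed optional here: not every Resource has another identifier.
    resource_identifiers: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Names of required elements that were left empty."""
        optional = {"resource_identifiers"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and getattr(self, f.name) in (None, "", [])
        ]


# All values below are made up for illustration.
record = KernelDeclaration(
    doi="10.99999/example.1",
    resource_names=["An Example Article"],
    principal_agents=["Example Press"],
    agent_roles=["Publisher"],
    structural_type="DigitalFixation",
    modes=["Visual"],
    resource_type="JournalArticle",
    registration_agency="Example RA",
    issue_date="2006-01-31",
    issue_number=1,
)
```

An RA that cannot fill in every required element at the time of issue would see the gaps listed by missing_elements, which mirrors the point made above about unambiguous allocation.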

The Kernel has one major function: it ensures that a basic set of interoperable, descriptive metadata exists so that DOI names can be discovered and disambiguated across multiple services and Application Profiles in a coherent way. The "AP1" Application Profile for Kernel metadata is under development to enable access to Kernel metadata for any DOI name. It is not mandatory that all DOI names should be accessible through such a service: but no such cross-network tool, however limited, would be feasible without a standard such as the Kernel.

Values of some Kernel elements (names and identifiers) are simply data strings. The other elements are drawn from sets of allowed values: for example, an agentRole might be "Publisher", "Composer" or "Distributor". These values may be expressed in different ways in code lists or "pick-lists", and they may be more or less well defined, but what they share is that, for interoperability to succeed, the values used by different providers or RAs must be reconciled at some point through mapping.

Two Kernel elements (structuralType and mode) have a small, prescribed set of allowed values which all RAs must recognize. For the other elements and sub-elements, RAs may use their own choice of values, and add to them as and when required. These value sets must be registered in the data dictionary (iDD) for mapping purposes, so that any application using Kernel metadata from more than one source may be capable of presenting an integrated set of values to its users.
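
The distinction between prescribed and RA-chosen values can be sketched as a simple check. The two value sets below are paraphrased from the questions in Table 1 and are assumptions; the authoritative lists are in the Kernel specification (Appendix 6). The function name check_kernel_values and the registered_terms argument, which stands in for a lookup against the data dictionary, are hypothetical.

```python
from typing import Iterable, List, Set

# Assumed spellings of the prescribed values, paraphrased from Table 1.
STRUCTURAL_TYPES: Set[str] = {"PhysicalFixation", "DigitalFixation", "Performance", "Abstraction"}
MODES: Set[str] = {"Audio", "Visual", "Audiovisual", "Abstract"}


def check_kernel_values(
    structural_type: str,
    modes: Iterable[str],
    agent_roles: Iterable[str],
    registered_terms: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the values pass this sketch's checks."""
    problems: List[str] = []
    # Prescribed elements: the value must come from the fixed set.
    if structural_type not in STRUCTURAL_TYPES:
        problems.append(f"structuralType {structural_type!r} is not a prescribed value")
    for mode in modes:
        if mode not in MODES:
            problems.append(f"mode {mode!r} is not a prescribed value")
    # RA-chosen elements: any value is allowed, provided it has been registered.
    for role in agent_roles:
        if role not in registered_terms:
            problems.append(f"agentRole {role!r} is not registered in the dictionary")
    return problems


ok = check_kernel_values("DigitalFixation", ["Visual"], ["Publisher"], {"Publisher", "Composer"})
bad = check_kernel_values("Hologram", ["Visual"], ["Editor"], {"Publisher", "Composer"})
```

Here ok is an empty list, while bad reports the unknown structuralType and the unregistered agentRole.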

The use of certain standard values and the registration and mapping of other key values has another essential purpose: to ensure that metadata from RAs is not fundamentally inconsistent. For example, if one RA is issuing DOI names for digital fixations of journal articles, and another is issuing DOI names for abstractions of the same articles, the two cannot be used in the same way in the same service. Such distinctions are by no means self-evident, and unless they are made explicitly, using a common or mapped vocabulary, confusion is inevitable. As the RA network grows, such confusion would result in costly problems and constraints on commerce.

The Kernel Declaration described here applies to resources in the form of Creations (items of intellectual property which represent the scope of early DOI System implementation). However, other types of resource (such as Parties and Places) are also necessarily involved in intellectual property transactions and may in principle be identified by DOI names. As DOI names are applied to entities other than Creations, an appropriate Kernel will be defined.

Kernel metadata for all DOI names may be published under Application Profile AP1. Technical arrangements for the provision of Kernel metadata records through a generalized Kernel Metadata service are under development.

The detailed specification and XML schema for the Kernel Declaration are given in Appendix 6.

4.3.2 indecs Data Dictionary (iDD)

The indecs Data Dictionary (iDD) is under development as the repository for all data elements and allowed values used in Kernel Metadata declarations and Resource Metadata Declarations (RMDs).

The iDD enables the definition and ontology of all metadata elements to be available to all RAs, and provides the necessary mappings to support metadata integration and transformations required for data interchange between RAs. For example, if an RA wishes to consolidate metadata provided by several other RAs for a specific service, the iDD will provide the data mappings required to enable the RA to present the consolidated metadata as if from a single set.
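
The consolidation example can be pictured, in a much simplified form, as a lookup from each RA's local value to a shared dictionary term. The sketch below reduces this to a flat one-to-one table purely for illustration. The RA names, local values, DOI names and the consolidate function are all hypothetical.

```python
from typing import Dict, List, Tuple

# (RA, local value) -> shared dictionary term. Every entry is made up.
ROLE_MAPPINGS: Dict[Tuple[str, str], str] = {
    ("RA-A", "pub"): "Publisher",
    ("RA-A", "comp"): "Composer",
    ("RA-B", "Verlag"): "Publisher",
    ("RA-B", "Vertrieb"): "Distributor",
}


def consolidate(
    records: List[Dict[str, str]],
    mappings: Dict[Tuple[str, str], str],
) -> List[Dict[str, str]]:
    """Rewrite each record's agentRole into the shared term so the set reads as one."""
    consolidated = []
    for rec in records:
        key = (rec["ra"], rec["agentRole"])
        if key not in mappings:
            # An unregistered value cannot be integrated reliably.
            raise KeyError(f"no mapping registered for {key}")
        consolidated.append({"doi": rec["doi"], "agentRole": mappings[key]})
    return consolidated


records = [
    {"ra": "RA-A", "doi": "10.99991/a1", "agentRole": "pub"},
    {"ra": "RA-B", "doi": "10.99992/b7", "agentRole": "Verlag"},
]
merged = consolidate(records, ROLE_MAPPINGS)
```

After consolidation both records carry the same agentRole term, so a service can present them as if they came from a single source.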

iDD also contains mappings of "third party" schemes such as ONIX, the MPEG-21 Rights Data Dictionary and ISO Territory, Currency and Language codes.

The iDD is based on a contextual metadata framework developed under the <indecs> project to support interoperability of multiple metadata schemes (the IDF was a partner in the original indecs activity). The contextual structure of iDD supports mapping and transformation in a richer and more comprehensive way than conventional one-to-one "crosswalks". It is explicitly designed to enable metadata to be expressed in the simplest or most complex ways and transformed from one to the other.

iDD is a structured ontology compliant with logical axioms and constructors common to ontology languages such as W3C's OWL (Web Ontology Language). It can, for example, support the production of legal OWL ontologies.

All allowed values used by an RA in its Kernel Metadata, and all data elements used by an RA when mapping to an RMD, must be registered in the iDD. The iDD is administered on behalf of the IDF by an agency appointed for the purpose.

Each iDD-registered Term will have its own DOI name to support DOI System services accessing the dictionary.

A more detailed description of the iDD is given in Appendix 4. See also the factsheet "DOI® System and data dictionaries".

4.3.3 Resource Metadata Declaration (RMD) for metadata interchange

A DOI Resource Metadata Declaration (RMD) is a message designed specifically for metadata exchange between RAs. The format may also be used for input or service metadata, but it is not intended as a replacement for other domain or service specific schemes. An RMD is in the form of an XML document which conforms to an XML Schema (xsd). All its elements and allowed values are mapped into the iDD.

The first RMD ("Journal-RMD") was designed in the spring of 2004 for exchange of journal metadata used by several RAs to support different services. An RMD may be developed for any "domain", which may be defined in any way that a group of RAs requires. Typically these are expected to be for domains such as "eBooks", "sound recordings", "multimedia rights" or "educational coursepacks", which may be centred on a type of resource, or sector or function, supporting any group of Application Profiles or DOI System services.

Interoperability of RMDs will be ensured by a common structure and the underlying dictionary. The RMD uses a generic metadata structure of ten basic data element classes, developed from the <indecs> framework model and designed to incorporate all types of Resource metadata in a structured and flexible way. Table 3 shows the ten RMD basic elements, and to which class each of the more specialized Kernel elements belongs:

Questions about the Resource | RMD element class | Includes Kernel elements
By what unique names is it known? | identifier | DOI, resourceIdentifier
By what non-unique names is it known? | name | resourceName
How is it described? | annotation |
What are its measurements? | quantity |
What kind of Resource is it? | category | structuralType, resourceType, mode
What has happened to it? | context |
Who has done something to or with it? | agent | principalAgent, agentRole
When has something happened to it? | time |
Where has something happened to it? | place |
What other Resources are related to it? | resource |

Table 3: RMD basic element classes

Subtypes can be added to the ten RMD elements to any level of granularity: for example, an identifier might have a subtype of ISBN, or vehicle registration number, and a related resource might be a page or an edition. The elements can be nested in any way required: for example, a place may have a name which has an annotation, or an agent may have a category which has an identifier. Elements can be grouped together in any combination in composite elements.
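
The subtyping and nesting just described can be sketched as a small recursive structure. This is an illustration in Python, not the RMD XML Schema: the class RmdElement, its field names, the depth helper and the sample values are assumptions, and only the ten class names are taken from Table 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The ten basic element classes listed in Table 3.
ELEMENT_CLASSES = {
    "identifier", "name", "annotation", "quantity", "category",
    "context", "agent", "time", "place", "resource",
}


@dataclass
class RmdElement:
    """Hypothetical node: one of the ten classes, optionally subtyped and nested."""
    element_class: str
    value: str = ""
    subtype: Optional[str] = None
    children: List["RmdElement"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.element_class not in ELEMENT_CLASSES:
            raise ValueError(f"unknown element class: {self.element_class}")

    def depth(self) -> int:
        """Number of nesting levels, counting this element as level 1."""
        return 1 + max((child.depth() for child in self.children), default=0)


# An identifier with a subtype (sample value only).
isbn = RmdElement("identifier", "978-3-16-148410-0", subtype="ISBN")

# A place that has a name which has an annotation.
place = RmdElement(
    "place",
    children=[
        RmdElement("name", "London", children=[RmdElement("annotation", "city of publication")]),
    ],
)
```

Because every node is drawn from the same ten classes, arbitrarily deep structures remain interpretable by any party that understands those classes and the registered subtypes.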

RMDs may incorporate data elements, allowed values, codes and composites from any other standard or proprietary message or metadata schemes (for example ONIX, SCORM or MARC) and draw on standard ISO codes and formats for Languages, Territories, Currencies, Measures and Dates and Times.

All element types and allowed values for an RMD are registered in the iDD. Every RA wishing to make use of an RMD must register the corresponding data elements and values in its own database to ensure reliable mapping by other RAs. A set of standard element groups or "composites" will be developed to form a core XML schema so that these composites can be re-used in different RMDs. The generic RMD structure and early iDD vocabulary are particularly appropriate for multimedia resource and rights metadata, but are extensible to any Resource and domain.

RAs are free, of course, to use existing standards to communicate metadata between them where they are suitable. If, for example, two RAs are providing services requiring ONIX metadata, then one would be expected to provide an ONIX message to the other. Likewise, one RA may wish to make different metadata records available to its users: a MARC-based RA may provide users with ONIX metadata records supplied by another RA for the same DOI name. The RMD is not a replacement for these; it is designed to deal with a different issue: the integration of metadata from RAs and other sources using different standards (or none) where it is required. The RMD and iDD combine to provide a generic solution for this problem, ensuring that all such interchange schemas within the RA network are themselves compatible and maximise the opportunities to re-use data and formats.

An RMD is developed with contributions from two or more RAs. RMDs are available for use by any RA. Any RA making use of a specific RMD may contribute to the editorial development of the RMD. An RMD will include the metadata elements required for all nominated services by any participating RA. Specific data elements within an RMD may be required only for specific RAs or Application Profiles, enabling the same RMD to be used flexibly within a community. The metadata flows between RAs, using RMDs and the iDD, are illustrated in figure 2 below.

Figure 2: Flows of metadata within the RA network

More detailed discussion of the RMD data elements and structure can be found in Appendix 5.

 
