Introductory Overview
The DOI® System
A DOI® (Digital Object Identifier) is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.
Unique identifiers are essential for the management of information in any digital environment. Identifiers assigned in one context may be encountered, and may be re-used, in another place (or time) without consulting the assigner, who cannot guarantee that the assumptions made at assignment will be known to someone else. Such interoperability requires identifiers to be designed for use in services outside the direct control of the issuing assigner. The need for interoperability adds the requirement of persistence to an identifier: it implies interoperability with the future. Further, since the services outside the direct control of the issuing assigner are by definition arbitrary, interoperability implies the requirement of extensibility. Hence the DOI System is designed as a generic framework applicable to any digital object, providing a structured, extensible means of identification, description and resolution. The entity assigned a DOI® name can be a representation of any logical entity.
The DOI System is built using several existing standards-based components which have been brought together and further developed to provide a consistent system: the entire system has recently been accepted for standardisation within ISO (ISO TC46/SC9). The DOI System was developed as a cross-industry, cross-sector, not-for-profit effort managed by an open membership collaborative development body, the International DOI Foundation (IDF), founded in 1998. The DOI System is in widespread use, e.g. for scientific primary publishing, in government documents and in data. DOI names need not be explicitly declared (though this may be useful): e.g. in a web context a DOI name may be used in an HTTP form as a URL (through a proxy server), whilst retaining the advantages of managed persistence. The DOI System may be used to offer an interoperable common system for identification of data.
DOI System components
The DOI System provides a ready-to-use packaged system of several components:
  • a specified standard numbering syntax;
  • a resolution service (based on the existing Handle System);
  • a data model incorporating a data dictionary (based on the indecs Data Dictionary); and
  • an implementation mechanism through policies and procedures for the governance and application of DOI names.
DOI name syntax
The DOI name syntax is a standard for constructing an opaque string with naming authority and delegation (NISO Z39.84, DOI Syntax). It provides an identifier "container" which can accommodate any existing identifier: e.g.
10.1234/NP5678
10.5678/ISBN-0-7645-4889-4 and
10.2224/2004-10-ISO-DOI
are all syntactically valid DOI names. A DOI name has two components, the prefix and the suffix. The portion preceding the "/" character (the prefix) denotes a unique naming authority. The portion following the "/" character (the suffix) may be an existing identifier. There is no limitation on the length of a DOI name.
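The prefix/suffix structure described above can be illustrated with a minimal Python sketch. The names `DoiName` and `parse_doi_name` are hypothetical, and the sketch assumes that the split is made at the first "/" so that a suffix may itself contain further "/" characters; it does not check the full syntax rules.

```python
from typing import NamedTuple


class DoiName(NamedTuple):
    """Hypothetical container for the two components of a DOI name."""
    prefix: str  # naming authority: the part before the first "/"
    suffix: str  # registrant-chosen part: everything after the first "/"


def parse_doi_name(name: str) -> DoiName:
    """Split a DOI name at the first "/" into prefix and suffix.

    Assumption: only the first "/" separates prefix from suffix.
    """
    prefix, separator, suffix = name.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix DOI name: {name!r}")
    return DoiName(prefix, suffix)


for example in ("10.1234/NP5678",
                "10.5678/ISBN-0-7645-4889-4",
                "10.2224/2004-10-ISO-DOI"):
    parsed = parse_doi_name(example)
    print(parsed.prefix, parsed.suffix)
```

The suffix is treated as an opaque string throughout: the sketch never tries to interpret an embedded identifier such as an ISBN.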
A DOI name may be assigned to any item of intellectual property, which must be precisely defined by means of structured metadata. The DOI name itself remains persistent through ownership changes, and unaltered once assigned.
A prefix is assigned to an organization that wishes to register DOI names; any organization may choose to have multiple prefixes. Following the prefix (separated by a forward slash) is a suffix (unique to a given prefix) to identify the entity. The combination of a prefix for the Registrant and unique suffix provided by the Registrant avoids any necessity for the centralized allocation of DOI names.
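As a hedged illustration of why no central allocation is needed, the sketch below models two registrants that each only guarantee uniqueness of suffixes within their own prefix. The `Registrant` class and its `mint` method are hypothetical names introduced for this example, not part of any DOI System interface.

```python
class Registrant:
    """Hypothetical registrant that mints DOI names under its own prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._used_suffixes: set[str] = set()

    def mint(self, suffix: str) -> str:
        """Return a new DOI name, enforcing uniqueness only within this prefix."""
        if suffix in self._used_suffixes:
            raise ValueError(f"suffix {suffix!r} already used under {self.prefix}")
        self._used_suffixes.add(suffix)
        return f"{self.prefix}/{suffix}"


# Two registrants choose the same suffix independently; the prefixes keep
# the resulting DOI names distinct without any coordination between them.
first = Registrant("10.1234")
second = Registrant("10.5678")
print(first.mint("report-1"))
print(second.mint("report-1"))
```

Each registrant consults only its own record of suffixes, which is the sense in which allocation is decentralized.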
An existing standard identification system number such as ISBN may be incorporated into a DOI name, by using this as the suffix, if the registrant finds it convenient to do so (it is of course recommended that precisely the same entity be identified by the two systems). The DOI System is not alone in being a system that can incorporate existing identifiers: for example, physical bar codes can be used to express ISBNs.
DOI name resolution
Resolution is the process in which an identifier is the input (a request) to a network service to receive in return a specific output of one or more pieces of current information (state data) related to the identified entity: e.g. a location (such as a URL) where the object can be found. Resolution provides a level of managed indirection between an identifier and the output. The resolution component allows redirection on a TCP/IP network from a DOI name to associated data. Initial applications have been resolution to a single location (URL), providing a tool for persistence (since even if a URL is changed, the DOI name still functions and redirects to the new location). However, resolution may be more useful when it returns multiple associated data, such as multiple locations, metadata, common services, or extensible assigner-defined data. Applications of the DOI System using multiple resolution are now increasingly in use. The resolution tool used in the DOI System is the Handle System. This conforms to the functional requirements of the URI and URN concepts, and has many advantages over other mechanisms, including global scalability, full Unicode character support, and security.
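The managed indirection described above can be sketched with an in-memory table standing in for the resolution service. This is a toy model only, not the Handle System protocol: `ToyResolver`, `proxy_url`, the example.org locations and the default proxy base `https://doi.org/` are all assumptions made for illustration.

```python
from urllib.parse import quote


class ToyResolver:
    """In-memory stand-in for a resolution service (not the Handle System)."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, doi_name: str, url: str) -> None:
        """Record the current location for a DOI name."""
        self._locations[doi_name] = url

    def update_location(self, doi_name: str, url: str) -> None:
        """Change the state data; the DOI name itself is unchanged."""
        if doi_name not in self._locations:
            raise KeyError(doi_name)
        self._locations[doi_name] = url

    def resolve(self, doi_name: str) -> str:
        """Return the current location for a DOI name."""
        return self._locations[doi_name]


def proxy_url(doi_name: str, proxy: str = "https://doi.org/") -> str:
    """Express a DOI name as a URL through a proxy server.

    The default proxy base is an assumption made for this sketch.
    """
    return proxy + quote(doi_name, safe="/")


resolver = ToyResolver()
resolver.register("10.1234/NP5678", "https://example.org/old/NP5678")
resolver.update_location("10.1234/NP5678", "https://example.org/new/NP5678")
print(resolver.resolve("10.1234/NP5678"))
print(proxy_url("10.1234/NP5678"))
```

Anything that cites the DOI name, or its proxy form, keeps working after the move, because only the table entry changes.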
The Handle System implementation in the DOI System has been supplemented by expanded technical infrastructure and features specific to DOI System applications. Handle multiple resolution allows one entity to be resolved to multiple other entities; it can therefore be used to embody e.g. a parent-children relationship, or any other relationship, and is therefore suitable for describing relationships between objects (e.g. data sets). The Handle System per se deliberately has no pre-existing constraints to define a framework to express relationships (analogy: spreadsheet software): the DOI System is an application of the Handle System which adds this constraint for a specific purpose of content management (analogy: a spreadsheet application). In the DOI System the constraints are defined through metadata grouping the entities, using a semantically interoperable data dictionary.
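Multiple resolution can be pictured as one DOI name carrying several typed values rather than a single location. The following is a minimal sketch under stated assumptions: `MultiValueResolver` is a hypothetical class, and the value type names `URL` and `HAS_CHILD` are illustrative labels chosen for this example rather than types defined by the DOI System.

```python
from collections import defaultdict
from typing import Optional


class MultiValueResolver:
    """Toy resolver in which one DOI name maps to several typed values."""

    def __init__(self) -> None:
        self._values: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_value(self, doi_name: str, value_type: str, data: str) -> None:
        """Attach one (type, data) value to a DOI name."""
        self._values[doi_name].append((value_type, data))

    def resolve(self, doi_name: str,
                value_type: Optional[str] = None) -> list[tuple[str, str]]:
        """Return all values for a DOI name, optionally filtered by type."""
        values = self._values.get(doi_name, [])
        if value_type is None:
            return list(values)
        return [value for value in values if value[0] == value_type]


resolver = MultiValueResolver()
parent = "10.1234/dataset-1"
resolver.add_value(parent, "URL", "https://example.org/dataset-1")
resolver.add_value(parent, "URL", "https://mirror.example.org/dataset-1")
# A parent-children relationship expressed as values pointing at other DOI names.
resolver.add_value(parent, "HAS_CHILD", "10.1234/dataset-1.table-a")
resolver.add_value(parent, "HAS_CHILD", "10.1234/dataset-1.table-b")

children = [data for _, data in resolver.resolve(parent, "HAS_CHILD")]
print(children)
```

The resolver itself places no constraint on which types exist or what they mean; in the terms used above, that constraint is what an application layer with a shared data dictionary would add.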
DOI® data model
The DOI System data model consists of a data dictionary and a framework for applying it. Together these provide tools for defining what a DOI name specifies (through use of a data dictionary), and how DOI names relate to each other (through a grouping mechanism, Application Profiles, which associate DOI names with defined common properties). This provides semantic interoperability, enabling information that originates in one context to be used in another in ways that are as highly automated as possible.
The DOI System uses an interoperable data dictionary built from an underlying ontology. The data dictionary component is designed to ensure maximum interoperability with existing metadata element sets; the framework allows the terms to be grouped in meaningful ways (DOI System Application Profiles) so that certain types of DOI names all behave predictably in an application through association with specified Services. This provides a means of integrating the features of Handle resolution with a structured data approach. DOI names need not make use of this data model, but it is envisaged that many will: any DOI name intended to allow interoperability (i.e. which has the possibility of use in services outside of the direct control of the issuing Registration Agency) is subject to DOI System metadata policy, which is based on the registration of terms in the indecs Data Dictionary (iDD).
A data dictionary is a set of terms, with their definitions, used in a computerized system. Some data dictionaries are structured, with terms related through hierarchies and other relationships: structured data dictionaries are derived from ontologies. An ontology combines a data dictionary with a logical data model, providing a consistent and logical world view. It differs from the traditional taxonomic approach to knowledge representation in that it does not follow a rigid parent/child hierarchical structure (terms may inherit meaning from more than one parent) and a more complex relationship is maintained.
An interoperable data dictionary contains terms from different computerized systems or metadata schemes, and shows the relationships they have with one another in a formal way. Its purpose is to support the combined use of terms from different systems. The IDF is the Registration Authority for one such dictionary, the ISO/IEC MPEG-21 Rights Data Dictionary, and is the co-developer of a wider indecs Data Dictionary which includes this and is used by DOI names.
DOI System implementation
The DOI System is implemented through a federation of Registration Agencies (RAs) which use policies and tools developed through a parent body, the International DOI Foundation (IDF). The IDF is the governance body of the DOI System, which safeguards (owns or licenses on behalf of registrants) all intellectual property rights relating to the DOI System. It works with RAs and with the underlying technical standards of the DOI System components to ensure that any improvements made to the system (including creation, maintenance, registration, resolution and policymaking of DOI names) are available to any DOI name registrant, and that no third-party licences might reasonably be required to practice the DOI name standard. DOI name resolution is freely available to any user encountering a DOI name.
The DOI System has the flexibility to deliver identification and resolution services that fulfil the requirements of any application domain. However, these don't come "in a box" since someone needs to build the specific social and technical structures to support the particular requirements of a community (such as scientific data). The rules about what is identified, and whether two things being identified are (or are not) "the same thing", are made at a lower level: in a specific application of the DOI name. This is a role of Registration Agencies. This provides an identification system of enormous flexibility and power while hugely increasing the importance of an explicit structured metadata layer, since without this the identifier essentially can have no meaning at all outside a specific application.
The IDF provides implementation through agreed standards of governance, scope and policy, which define the "rules of the road". It also provides a technical infrastructure (resolution mechanism, proxy servers, mirrors, back-up, central dictionary) and a social infrastructure (persistence commitments, fall-back procedures, cost-recovery on a self-sustaining model, and shared use of the system). The IDF is not a standards body, but a central authority and maintenance agency. The IDF is already the appointed registration authority for the ISO/IEC MPEG-21 Rights Data Dictionary, and is proposed as the registration authority for the DOI System within ISO TC46/SC9. The IDF delegates and licenses authority to use the system through Registration Agencies, each of which can develop its own applications and use the DOI System in "own brand" ways appropriate for its community.
For more information or to arrange for a presentation of the DOI System, please contact Dr. Norman Paskin, care of the International DOI Foundation at n.paskin@doi.org.
 
Updated 18 July 2006

DOI® and DOI.ORG® are registered trademarks and the "doi>" logo is a trademark of the International DOI Foundation.