The Digital Object Identifier System:
Digital Technology Meets Content Management
Norman Paskin
Standard ways exist (or are in development) for managing Internet Resources,
such as Uniform Resource Locators (URLs), Uniform Resource Names (URNs), and metadata (Resource Description
Framework, RDF). These resource mechanisms provide an infrastructure for managing resource
discovery and distribution, but not a sufficient framework in which to manage intellectual
content and the rights which accompany that content, such as access rights and
copyright. Publishing -- in whatever medium -- is not only the distribution of
intellectual content, but also concomitant rights management (e.g., royalty payments to
authors and composers). E-commerce of intellectual content -- digital
publishing -- requires content management of all these forms, with a variety of associated
services to manage access and other rights digitally. Managed Web distribution is only one
component of the required architecture -- necessary, but not sufficient.
The DOI initiative [DOI], launched in October 1997 following a
prototyping phase [Rosenblatt],
aims to develop a common mechanism to enable intellectual content management to be
integrated with Internet technologies. The DOI activity brings together two communities:
the digital technology-oriented community, devising digital library architectures and
appropriate technical solutions; and the content-oriented community, which views
"being digital" as one of several possible mechanisms of publishing. The two
communities have differing perspectives.
1. The digital viewpoint
Information can be captured and manipulated digitally: intellectual
content can be embodied in coherent collections of bits, or Digital Objects. The DOI
builds on the Digital Object concept: "a data structure whose principal components
are digital material, or data, plus a unique identifier for this material" [Kahn/Wilensky];
"not merely a sequence of bits or symbols...it has a structure that allows it to be
identified and its content to be organized and protected..." [XIWT];
"a Document-like Object", according to the Dublin Core activity; a Knowledge
Object (KNOB) [Kelly].
A Digital Object is a meaningful piece of data (a precise definition is difficult
to reach). There has been substantial progress in this community in defining architectures
for digital object structures, e.g., digital libraries, repositories, URIs, and improved
mechanisms for Digital Object access, such as the Handle System [Handle],
which resolves an identifier to multiple data types (such as URLs). It is important to note that whilst
the DOI initiative makes use of the infrastructure offered by these architectures and
tools, it is not synonymous with those activities. In particular the "Digital Object
Identifier" should not be considered to be applicable to all "Digital
Objects" in the Kahn/Wilensky sense. Nor, importantly, is the DOI restricted only to
digital objects -- which makes it both ambitious and significant.
2. The content viewpoint
The digital technology community takes as its starting point all
digital mechanisms, and views intellectual content mechanisms as a sub-set. In contrast,
the intellectual content community takes as its starting point all creative works, and
views digital mechanisms as a sub-set. From the standpoint of creation or dissemination of
intellectual content, "digital" is one of many possible carrier mechanisms: an
increasingly important one, but not the only one, since users wish to have access to print
and digital content seamlessly. While the digital world has necessarily worked with
defined and well-structured concepts, the content world has not (until now) found it
necessary to be so rigid: standard numbering (of books, serials, and recordings) and
product bar codes have been useful but there is no widely accepted data model defining all
creative and publishing acts. A useful framework which offers systems analysis thinking
for intellectual content forms is the analysis implemented for the CIS (Common Information
System), originally devised in the context of music [Hill, Rust]; the DOI is
currently using this as a basis. At the heart of intellectual endeavour is creativity; the
analysis starts by defining Creations: "products of human imagination and/or
endeavour in which rights may exist", of four types:
Work: abstract: made of concepts and ideas (e.g. a composition)
Package: physical: made of atoms (e.g. a book)
Object: digital: made of bits (e.g. a file)
Performance: spatio-temporal: made of actions (e.g. a broadcast)
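The four Creation types listed above can be pictured as a small data model. The following is only a sketch under stated assumptions: the names `CreationType`, `Creation`, `label` and `describe` are hypothetical and are not part of the CIS analysis or of any DOI specification.

```python
from dataclasses import dataclass
from enum import Enum


class CreationType(Enum):
    """The four Creation types, each with its nature and what it is made of."""
    WORK = ("abstract", "concepts and ideas")
    PACKAGE = ("physical", "atoms")
    OBJECT = ("digital", "bits")
    PERFORMANCE = ("spatio-temporal", "actions")

    @property
    def nature(self) -> str:
        return self.value[0]

    @property
    def made_of(self) -> str:
        return self.value[1]


@dataclass(frozen=True)
class Creation:
    """A product of human imagination and/or endeavour in which rights may exist."""
    label: str            # hypothetical free-text description
    kind: CreationType


def describe(creation: Creation) -> str:
    """Render a Creation in the same shape as the list in the text."""
    kind = creation.kind
    return f"{creation.label}: {kind.name.title()} ({kind.nature}, made of {kind.made_of})"


# The examples given in the text, one per type.
examples = [
    Creation("a composition", CreationType.WORK),
    Creation("a book", CreationType.PACKAGE),
    Creation("a file", CreationType.OBJECT),
    Creation("a broadcast", CreationType.PERFORMANCE),
]

for creation in examples:
    print(describe(creation))
```

Only one of the four types (Object) is digital, which is why a scheme aimed at Creations in general cannot assume that the thing being identified is made of bits.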
In order to manage these various types of creation, we need to
specify each of them unambiguously, by means of identifiers (and we also need to
recognise that any one Creation may need multiple, related identifiers to
label all its manifestations). For some creation types, accepted international standards
for information identifiers [Paskin]
exist and are used in commercial transactions: e.g., ISBNs for books (Packages). The
rise of digital publishing, and the need for systematic descriptions of Objects, has
created a need for mechanisms to uniquely identify all Creation types, and so to
describe the equivalence of different formats of the same creation type (e.g., different
file formats of a Digital Object) or across creation types (e.g., the digitisation of a
printed article, or the performance of a work of music).
When specifying equivalence across creation types -- which is a new
concept for traditional book publishers, though not for music publishers -- it has become
clear that it could be useful (for example, in citation of a work) to identify the
underlying abstract creative work as the common factor, rather than explicitly naming
every related equivalent form. An early, incomplete attempt to do this for one community
(science publishing) was the Publisher Item Identifier [PII]. However, since
"being digital" dissolves barriers between hitherto separate fields of
intellectual content, a more universal scheme is needed, and the concept of an
International Standard Work Code (which arose from the CIS music activity) is being
developed by ISO [ISWC].
That development is outside the scope of the DOI but of great interest to us, and the
Foundation is part of the ISWC Working Group.
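The idea of using the underlying abstract work as the common factor can be sketched in a few lines. This is an illustration only: the `Manifestation` class, its fields, and every identifier string below are made up for the example and do not follow the syntax of ISBN, PII, ISWC or DOI.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Manifestation:
    identifier: str      # made-up label, not a real standard identifier
    creation_type: str   # "Package", "Object" or "Performance"
    work_id: str         # made-up identifier of the underlying abstract Work


def equivalent(a: Manifestation, b: Manifestation) -> bool:
    """Treat two manifestations as equivalent if they embody the same Work."""
    return a.work_id == b.work_id


def manifestations_of(work_id: str, catalogue: List[Manifestation]) -> List[Manifestation]:
    """Find every known form of a Work via the Work, not by naming each pair."""
    return [m for m in catalogue if m.work_id == work_id]


catalogue = [
    Manifestation("print-article-001", "Package", "work-A"),
    Manifestation("pdf-article-001", "Object", "work-A"),
    Manifestation("html-article-001", "Object", "work-A"),
    Manifestation("concert-recording-007", "Performance", "work-B"),
]

# Equivalence within a type (two file formats) and across types (print vs. file).
print(equivalent(catalogue[1], catalogue[2]))
print(equivalent(catalogue[0], catalogue[1]))
print([m.identifier for m in manifestations_of("work-A", catalogue)])
```

With a work-level identifier, adding a new format means recording one link to the Work, rather than declaring its equivalence to every existing form.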
3. Scope of DOI: Technology and Content viewpoints brought together
The focus of the DOI initiative has been on digital management of
Content (and not only digital content). In trying to establish guidelines for what should
be the scope of DOI (what should, and what should not, be persistently and reliably
identified by a DOI), a reasonable starting point is to distinguish between the primary
"Creations" which are persistent products issued by a publisher (e.g.,
articles), and related peripheral items (e.g., order form for the article) of the content
provider. The primary Creation should have a DOI; but the order form is an incidental
instantiation of a service associated with a primary entity, and it is that primary entity
which is the reason for the entire exercise. The DOI focuses on the primary product, and
is intended to be of use in enabling services to be offered for those primary products.
DOI focuses on Creations (in the sense of the CIS analysis); some of these are also
Digital Objects (in the sense of the technology analysis), but whether a Creation is
digital does not determine whether it is within DOI scope.
4. The Digital Object Identifier system: initial implementation
Although the resource mechanisms of the Internet conceptualise
Uniform Resource Identifiers as a means of accomplishing some elements of Object
management (both names of Objects, URNs, and their locations, URLs), only URLs have been
widely implemented; a syntax has been defined for URNs. One of the aims of the DOI
initiative was to put in place an identifier scheme for Creations (including Objects)
which could be readily used by the content industries and which could be used as a URN. In
addition to the actual "Digital Object Identifier" -- the unique name of a
Creation in a controlled namespace -- the DOI system includes a
resolution mechanism (relating the identifier to various services or actions, including
the action of location); a store of Object metadata; an administrative agency to manage
the real-world business process of identifier assignment and management; and finally, an
authority controlling the DOI namespace and defining policies. The relationship of these
components can be schematised as a core technology (a three-component system of DOI
identifier + DOI resolver + DOI metadata); a surrounding set of activities concerned with
DOI administration; and an outer layer of policies which govern all of the technical
operations and administration and control the overall namespace of DOIs.
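The three layers described above (core technology, administration, policy) can be sketched as three small classes. This is a minimal illustration, not the DOI system's actual design: the class names, the single policy rule, the prefix/suffix split on "/", and the placeholder identifier and location are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CoreSystem:
    """Core technology: identifier + resolver + metadata."""
    locations: Dict[str, str] = field(default_factory=dict)  # resolver store
    metadata: Dict[str, dict] = field(default_factory=dict)  # metadata store

    def resolve(self, identifier: str) -> str:
        """Relate an identifier to one action, here the action of location."""
        return self.locations[identifier]


@dataclass
class Policy:
    """Outer layer: the namespace authority's rules (one hypothetical rule)."""
    allowed_prefixes: Set[str]

    def permits(self, identifier: str) -> bool:
        prefix, sep, suffix = identifier.partition("/")
        return bool(sep) and bool(suffix) and prefix in self.allowed_prefixes


@dataclass
class Agency:
    """Middle layer: administers assignment, always subject to policy."""
    core: CoreSystem
    policy: Policy

    def assign(self, identifier: str, location: str, description: dict) -> None:
        if not self.policy.permits(identifier):
            raise ValueError(f"policy does not permit {identifier!r}")
        if identifier in self.core.locations:
            raise ValueError(f"{identifier!r} is already assigned")
        self.core.locations[identifier] = location
        self.core.metadata[identifier] = description


# Placeholder prefix, identifier and location, invented for this sketch.
agency = Agency(CoreSystem(), Policy(allowed_prefixes={"10.9999"}))
agency.assign("10.9999/article-1", "http://publisher.example/article-1",
              {"title": "An example article"})
print(agency.core.resolve("10.9999/article-1"))
```

Keeping policy and administration outside the core means the resolver and metadata stores stay simple lookups, while the rules about who may assign what can change without touching them.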
The current resolution mechanism used by DOI is a simplified version of
the Handle system devised by CNRI [Handle].
The simplified version is now being expanded to the full Handle mechanism, which allows
resolution to multiple data types, and the selected type can be determined
programmatically, so services associated with the DOI can be intelligently allocated. A
Handle can return multiple data types but does not directly support service requests
(e.g., "tell me the format of this entity"): service requests are dealt with as
separate data types, or as arguments built on data types. The DOI syntax conforms to the
URN syntax (we could create a URN namespace for DOIs, if/when direct URN resolution
becomes a common feature of browsers; currently resolution is via an HTTP proxy).
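Resolution to multiple data types, with the type selected programmatically, might look roughly like the sketch below. It does not use the Handle System's real protocol or API: `TypedValue`, `RECORDS`, `resolve` and `proxy_url` are hypothetical names, the "FORMAT" type is invented to show a service request modelled as a data type, and the proxy base address is a placeholder supplied by the caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class TypedValue:
    data_type: str   # e.g. "URL"; "FORMAT" is an invented type for this sketch
    data: str


# One identifier holding several typed values (all values are placeholders).
RECORDS: Dict[str, List[TypedValue]] = {
    "10.9999/article-1": [
        TypedValue("URL", "http://publisher.example/article-1"),
        TypedValue("FORMAT", "application/pdf"),
    ],
}


def resolve(identifier: str, data_type: Optional[str] = None,
            records: Dict[str, List[TypedValue]] = RECORDS) -> List[TypedValue]:
    """Return all values for an identifier, or only those of the requested type."""
    values = records.get(identifier, [])
    if data_type is None:
        return list(values)
    return [v for v in values if v.data_type == data_type]


def proxy_url(identifier: str, proxy_base: str) -> str:
    """Build the address a browser would request from an HTTP proxy resolver."""
    return proxy_base.rstrip("/") + "/" + quote(identifier, safe="/")


# "Where is it?" and "what format is it?" are both answered by type selection.
print(resolve("10.9999/article-1", "URL")[0].data)
print(resolve("10.9999/article-1", "FORMAT")[0].data)
print(proxy_url("10.9999/article-1", "http://proxy.example/"))
```

Because the caller names the type it wants, the same identifier can serve a browser that only needs a location and a service that needs some other piece of information about the entity.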
The authority controlling the DOI is a not-for-profit Foundation (the
International DOI Foundation); administrative and metadata systems are being put into
place under the guidance of the Foundation as part of an ongoing development. The Foundation
is funded and governed by organizations with an interest in creating such a system;
eventually, it is envisaged that the administrative agencies will operate DOI on a
cost-recovery basis. It is not necessary to be a member of the Foundation in order to
assign or use DOIs, and we are encouraging widespread use and experimentation.
5. Support for the DOI initiative
The International DOI Foundation began recruiting member
organizations in mid-March 1998. By 1 July 1998, 23 major organizations were Members,
and the number continues to grow. Significantly, there is wide international representation
and a broad spread of interests: technology companies (including Xerox and
Microsoft); professional publishers (currently the majority category); the music industry
(including CIS system members); and author and copyright agencies. Bringing such a wide
range of digital technology and intellectual content interests together in a practical
development activity is a unique achievement.
Although the W3C (World Wide Web Consortium) is dealing with many
issues of interest to the intellectual content world, it is broader in scope and has
relatively few intellectual content members. The DOI initiative takes a different -- but
entirely complementary -- stance from W3C's focus on the much larger universe of Internet
Resources. It also differs from -- and again is complementary to -- the Cross-Industry
Working Team's approach (again a body with few content providers) of managing access to
digital information using Digital Objects [XIWT].
6. Issues
The development of the DOI system is a practical initiative: the
initial system is operational and can be used now, and we intend to roll
out some prototypes of extended functionality later this year. DOI has implications for
standards development; we are maintaining links with relevant standards organisation
activities such as ISO TC46, NISO, and IETF, and DOI development will proceed in close
cooperation with them.
The DOI system is attempting to create an ambitious common mechanism
for managing intellectual content. In developing the system, we have uncovered a number of
fundamental questions on which we are now working: for example,
whether a DOI should indicate (in its syntax) that it references a digital or non-digital
work; and how to reconcile the differing needs of uses such as citation (which prefers all
versions to be interchangeable) and commerce (which requires all different versions to be
separately identifiable).
For those seeking further information, or wishing to join the debate
(which we welcome), the DOI web site provides current information and useful links. Key
issues are summarised in a DOI discussion paper available at the DOI site [DOI]. This is a much
extended version of the present paper; it is open for comment and criticism, and indicates the
priorities for the Foundation.
Norman Paskin, Director, International DOI Foundation
n.paskin@doi.org
August 1998
References/links
[DOI] Digital Object Identifier system home page
http://www.doi.org/
[Handle] The Handle System home page
http://www.handle.net
[Hill] The Common Information System: Keith Hill
http://www.doi.org/workshop/minutes/CISoverview/index.htm
[ISWC] ISO/TC 46/SC 9 Working Group 2, International Standard Work Code (ISWC)
http://www.nlc-bnc.ca/iso/tc46sc9/iswc.htm
[Kahn/Wilensky] A Framework for Distributed Digital Object Services
Robert Kahn & Robert Wilensky
http://www.cnri.reston.va.us/home/cstr/arch/k-w.html
[Kelly] The Role of A&I Services in Facilitating Access to the
E-Archive of Science: Maureen Kelly
http://www.icsti.nrc.ca/icsti/forum/fo9711.html#role
[Paskin] Information Identifiers
http://www.elsevier.nl/homepage/about/infoident/
[PII] Publisher Item Identifier
http://www.elsevier.nl/homepage/about/pii/
[Rosenblatt] The Digital Object Identifier: Bill Rosenblatt
http://www.press.umich.edu/jep/03-02/doi.html
[Rust] Metadata: The Right Approach - An Integrated Model for Descriptive
and Rights Metadata in E-commerce. Godfrey Rust
D-Lib Magazine, July/August 1998
http://www.dlib.org/dlib/july98/rust/07rust.html
[XIWT] Managing Access to Digital Information: An Approach Based on
Digital Objects and Stated Operations. Cross-Industry Working Team
http://www.xiwt.org/documents/ManagAccess/ManagAccessTOC.html