Norman Paskin

Standard ways exist (or are in development) for managing Internet resources: identifiers such as Uniform Resource Locators (URLs) and Uniform Resource Names (URNs), and metadata (Resource Description Framework, RDF). These resource mechanisms provide an infrastructure for managing resource discovery and distribution, but not a sufficient framework in which to manage intellectual content and the rights which accompany that content, such as access rights and copyright. Publishing -- in whatever medium -- is not only the distribution of intellectual content but also the concomitant rights management (e.g., royalty payments to authors and composers). E-commerce of intellectual content -- digital publishing -- requires content management in all these forms, with a variety of associated services to manage access and other rights digitally. Managed Web distribution is only one component of the required architecture -- necessary, but not sufficient.

The DOI initiative [DOI], launched in October 1997 following a prototyping phase [Rosenblatt], aims to develop a common mechanism to enable intellectual content management to be integrated with Internet technologies. The DOI activity brings together two communities with differing perspectives: the digital technology-oriented community, devising digital library architectures and appropriate technical solutions; and the content-oriented community, which views "being digital" as one of several possible mechanisms of publishing.

1. The digital viewpoint

Information can be captured and manipulated digitally: intellectual content can be embodied in coherent collections of bits, or Digital Objects. The DOI builds on the Digital Object concept: "a data structure whose principal components are digital material, or data, plus a unique identifier for this material" [Kahn/Wilensky]; "not merely a sequence of bits or has a structure that allows it to be identified and its content to be organized and protected..." [XIWT]; "a Document-like Object", according to the Dublin Core activity; a Knowledge Object (KNOB) [Kelly]. A Digital Object is a meaningful piece of data (a precise definition is difficult to reach). There has been substantial progress in this community in defining architectures for digital object structures (e.g., digital libraries, repositories, URIs) and improved mechanisms for Digital Object access, such as the Handle system [Handle], which resolves to multiple data types (such as URLs). It is important to note that whilst the DOI initiative makes use of the infrastructure offered by these architectures and tools, it is not synonymous with those activities. In particular, the "Digital Object Identifier" should not be considered to be applicable to all "Digital Objects" in the Kahn/Wilensky sense. Equally importantly, the DOI is not restricted to digital objects, which makes it both ambitious and significant.

2. The content viewpoint

The digital technology community takes as its starting point all digital mechanisms, and views intellectual content mechanisms as a sub-set. In contrast, the intellectual content community takes as its starting point all creative works, and views digital mechanisms as a sub-set. From the standpoint of creation or dissemination of intellectual content, "digital" is one of many possible carrier mechanisms: an increasingly important one, but not the only one, since users wish to have access to print and digital content seamlessly. While the digital world has necessarily worked with defined and well-structured concepts, the content world has not (until now) found it necessary to be so rigid: standard numbering (of books, serials, and recordings) and product bar codes have been useful but there is no widely accepted data model defining all creative and publishing acts. A useful framework which offers systems analysis thinking for intellectual content forms is the analysis implemented for the CIS (Common Information System), originally devised in the context of music [Hill, Rust]; the DOI is currently using this as a basis. At the heart of intellectual endeavour is creativity; the analysis starts by defining Creations: "products of human imagination and/or endeavour in which rights may exist", of four types:

Work: abstract: made of concepts and ideas (e.g. a composition)

Package: physical: made of atoms (e.g. a book)

Object: digital: made of bits (e.g. a file)

Performance: spatio-temporal: made of actions (e.g. a broadcast)

In order to manage these various types of creation, we need to specify them individually and unambiguously by means of identifiers (and we also need to recognise that any one Creation may need multiple, related identifiers to label all its manifestations). For some creation types, accepted international standards for information identifiers [Paskin] exist and are used in commercial transactions: e.g., ISBNs for books (Packages). With the rise of digital publishing, and the need for systematic descriptions of Objects, a need has appeared for mechanisms to uniquely identify all Creation types, and so to describe the equivalence of different formats of the same creation type (e.g., different file formats of a Digital Object) or across creation types (e.g., the digitisation of a printed article, or the performance of a work of music).
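The four Creation types and the idea of one Creation carrying several related identifiers can be pictured as a small data structure. The following is only an illustrative sketch: the class and field names (CreationType, Creation, identifiers) and the identifier values are assumptions made for this example and are not part of the CIS analysis or of any DOI specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class CreationType(Enum):
    """The four Creation types listed above (names chosen for this sketch)."""
    WORK = "abstract"                # made of concepts and ideas
    PACKAGE = "physical"             # made of atoms
    OBJECT = "digital"               # made of bits
    PERFORMANCE = "spatio-temporal"  # made of actions


@dataclass
class Creation:
    """A Creation labelled by one or more identifiers from different schemes."""
    title: str
    kind: CreationType
    # Maps a scheme name (e.g. "ISBN") to the identifier value in that scheme.
    identifiers: Dict[str, str] = field(default_factory=dict)

    def add_identifier(self, scheme: str, value: str) -> None:
        self.identifiers[scheme] = value


# A printed book is a Package; both identifier values are placeholders.
book = Creation("An Example Monograph", CreationType.PACKAGE)
book.add_identifier("ISBN", "0-00-000000-0")
book.add_identifier("DOI", "example-doi-string")
```

Keeping the identifiers in a mapping keyed by scheme reflects the point that a single Creation may need several related labels, without privileging any one scheme.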

When specifying equivalence across creation types -- which is a new concept for traditional book publishers, though not for music publishers -- it has become clear that it could be useful (for example, in citation of a work) to identify the underlying abstract creative work as the common factor, rather than explicitly naming every related equivalent form. An early, incomplete attempt to do this for one community (science publishing) was the Publisher Item Identifier [PII]. However, since "being digital" dissolves barriers between hitherto separate fields of intellectual content, a more universal scheme is needed, and the concept of an International Standard Work Code (which arose from the CIS music activity) is being developed by ISO [ISWC]. That development is outside the scope of the DOI but of great interest to us, and the Foundation is part of the ISWC Working Group.

3. Scope of DOI: Technology and Content viewpoints brought together

The focus of the DOI initiative has been on digital management of Content (and not only digital content). In trying to establish guidelines for what should be the scope of DOI (what should, and what should not, be persistently and reliably identified by a DOI), a reasonable starting point is to distinguish between the primary "Creations" which are persistent products issued by a publisher (e.g., articles), and related peripheral items (e.g., order form for the article) of the content provider. The primary Creation should have a DOI; but the order form is an incidental instantiation of a service associated with a primary entity, and it is that primary entity which is the reason for the entire exercise. The DOI focuses on the primary product, and is intended to be of use in enabling services to be offered for those primary products. DOI focuses on Creations (in the sense of the CIS analysis); some of these are also Digital Objects (in the sense of the technology analysis), and it is not whether or not a Creation is digital which determines DOI scope.

4. The Digital Object Identifier system initial implementation

Although the resource mechanisms of the Internet conceptualise Uniform Resource Identifiers as a means of accomplishing some elements of Object management (both names of Objects, URNs, and their locations, URLs), only URLs have been widely implemented, although a syntax has been defined for URNs. One of the aims of the DOI initiative was to put in place an identifier scheme for Creations (including Objects) which could be readily used by the content industries and which could be used as a URN. In addition to the actual "Digital Object Identifier" -- the unique name of a Creation in a controlled namespace -- the DOI initiative comprises a system including a resolution mechanism (relating the identifier to various services or actions, including the action of location); a store of Object metadata; an administrative agency to manage the real-world business process of identifier assignment and management; and finally, an authority controlling the DOI namespace and defining policies. The relationship of these components can be schematised as a core technology (a three-component system of DOI identifier + DOI resolver + DOI metadata); a surrounding set of activities concerned with DOI administration; and an outer layer of policies which govern all of the technical operations and administration and control the overall namespace of DOIs.

The current resolution mechanism used by DOI is a simplified version of the Handle system devised by CNRI [Handle]. The simplified version is now being expanded to the full Handle mechanism, which allows resolution to multiple data types; because the selected type can be determined programmatically, services associated with the DOI can be intelligently allocated. A Handle can return multiple data types but does not directly support service requests (e.g., "tell me the format of this entity"): service requests are dealt with as separate data types, or as arguments built on data types. The DOI syntax conforms to the URN syntax (we could create a URN namespace for DOIs, if/when direct URN resolution becomes a common feature of browsers; currently resolution is via an HTTP proxy).
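Resolution to multiple data types, with the type selected programmatically, can be sketched as a lookup over typed values. This is a toy in-memory model, not the Handle protocol or its API: the names RECORDS, PROXY_BASE, resolve and proxy_url, the identifier string, the type labels and the host names are all hypothetical and exist only for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple
from urllib.parse import quote

# Hypothetical stand-in for a resolver's store: identifier -> {data type: value}.
# A "service request" such as the entity's format is modelled as one more data type.
RECORDS: Dict[str, Dict[str, str]] = {
    "example/article-1": {
        "URL": "http://publisher.example/articles/1",
        "FORMAT": "application/pdf",
    },
}

# Placeholder address for an HTTP proxy in front of the resolver.
PROXY_BASE = "http://resolver.example/"


def resolve(identifier: str,
            preferred_types: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the first (type, value) pair matching the caller's preference order."""
    values = RECORDS.get(identifier, {})
    for data_type in preferred_types:
        if data_type in values:
            return data_type, values[data_type]
    return None


def proxy_url(identifier: str) -> str:
    """Build the URL a browser would request when resolving through the proxy."""
    return PROXY_BASE + quote(identifier, safe="/")


location = resolve("example/article-1", ["EMAIL", "URL"])
file_format = resolve("example/article-1", ["FORMAT"])
```

Here the caller states an ordered preference of data types and receives the first one the record holds, which is one simple way that a client could choose among several values returned for the same identifier.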

The authority controlling the DOI is a not-for-profit Foundation (the International DOI Foundation); administrative and metadata systems are being put into place under the guidance of the Foundation as part of an ongoing development. Funding for the Foundation is from organizations with an interest in creating such a system, who also govern the Foundation; eventually a cost-recovery operation of DOI for the administrative agencies is envisaged. It is not necessary to be a member of the Foundation in order to assign or use DOIs, and we are encouraging widespread use and experimentation.

5. Support for the DOI initiative

The International DOI Foundation began to recruit member organizations in mid-March 1998. By 1 July 1998, 23 major organizations were Members. The number continues to grow. Significantly, there is wide international representation and a broad spread of interests: technology companies (including Xerox and Microsoft); professional publishers (currently the majority category); the music industry (including CIS system members); and author and copyright agencies. Bringing such a wide range of digital technology and intellectual content interests together in a practical development activity is a unique achievement.

Although the W3C (World Wide Web Consortium) is dealing with many issues of interest to the intellectual content world, it is broader in scope and has relatively few intellectual content members. The DOI initiative takes a different -- but entirely complementary -- stance from W3C's focus on the much larger universe of Internet Resources. It also differs from -- and again is complementary to -- the approach of the Cross Industry Working Team (again a body with few content providers), which is concerned with Managing Access to Digital Information using Digital Objects.

6. Issues

The development of the DOI system is a practical initiative; the initial system is currently operational and can be used now, and it is intended to roll out some prototypes of extended functionality later this year. DOI has implications for standards development; we are maintaining links with relevant standards organisation activities such as ISO TC46, NISO, and IETF, and DOI development will proceed in close cooperation with them.

The DOI system is attempting to create an ambitious common mechanism for managing intellectual content. In the development of the system, a number of fundamental questions have been uncovered which we are now working on. Examples include whether a DOI should indicate (in its syntax) that it references a digital or non-digital work, and how we reconcile the differing needs of uses such as citation (which prefers all versions to be interchangeable) and commerce (which requires all different versions to be separately identifiable).
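The tension between citation and commerce can be made concrete with a small sketch. It assumes, purely for illustration, that each version carries its own identifier plus a shared work-level identifier. The Version type, the field names and all identifier strings are hypothetical, and nothing here represents a decision the Foundation has taken.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence


class Version(NamedTuple):
    identifier: str  # identifier of this specific version (hypothetical value)
    work_id: str     # identifier of the underlying abstract work (hypothetical value)
    form: str        # e.g. "print", "PDF"


VERSIONS = [
    Version("id-print", "work-1", "print"),
    Version("id-pdf", "work-1", "PDF"),
    Version("id-html", "work-1", "HTML"),
]


def citation_view(versions: Sequence[Version]) -> Dict[str, List[str]]:
    """Citation: versions of one work are interchangeable, so group them by work."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for version in versions:
        grouped[version.work_id].append(version.identifier)
    return dict(grouped)


def commerce_view(versions: Sequence[Version]) -> Dict[str, Version]:
    """Commerce: every version must be separately identifiable, so key by version."""
    return {version.identifier: version for version in versions}


by_work = citation_view(VERSIONS)
by_version = commerce_view(VERSIONS)
```

Under these assumptions the same records serve both uses: a citation needs only the work-level key, while a transaction needs the version-level key.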

For those seeking further information, or wishing to join the debate (which we welcome), the DOI web site provides current information and useful links. Key issues are summarised in a DOI discussion paper available at the DOI site [DOI], a much extended version of the present paper open for comment and criticism, which indicates the priorities for the Foundation.

August 1998


