This chapter provides a high-level overview of the Digital Object Identifier (DOI) for those who have no previous knowledge of it. Those who already have some understanding of the basic issues of identification and of the DOI itself should go straight to Chapter 3.
As commerce has become less dependent on the physical presence of both buyer and seller, means of identifying things uniquely and describing them unambiguously have become more and more important. The use of computers in mediating some aspects of the trading relationship has further accentuated this requirement. The near-universal adoption of "unique identifiers" such as the ISBN or the EAN barcode has been a direct consequence of (and a precondition for) the development of EDI (electronic data interchange) and electronic trading.
The Internet, as it becomes a medium for trading in intellectual property, drives us several steps further. The digital network linking trading partners has for the first time to embrace consumers rather than simply supporting business-to-business transactions. The identity of the things that can be traded becomes much less clearly delineated when they may be computer files rather than physical objects. Users no longer have to access "content" only in pre-packaged products -- it becomes possible to provide them with the precise customized package of content that they want (and which theoretically at least no one else may want).
The management of the myriad transactions implicit in such a complex network environment will only be possible if mediated by computer systems. This puts additional pressure on the requirement for unambiguous identification and description of the content -- the "metadata" that has become the buzzword of e-commerce in intellectual property. Persistent identification and description is a prerequisite for the management of intellectual property rights in the digital environment.
It became increasingly apparent during the 1990s that existing approaches to identification would prove inadequate to meet the need. Publishers, for example, could see the deficiencies of the ISBN as an identifier for electronic publishing, because of its limitation to identifying physical objects, and the difficulties with applying it to items smaller than "a book". At the same time, the only content identifier commonly in use on the Internet -- the Uniform Resource Locator (URL) used to find particular pages on the World Wide Web -- was clearly deficient, not least because it was not used to identify content but rather location. The location is transient, whereas what was necessary was a means of identifying content itself, persistently and without ambiguity.
What has now developed, from a research project begun by the Association of American Publishers in 1996, is the complete DOI System described in this Handbook. Since 1998, the DOI has been managed by the International DOI Foundation (see Chapter 12 Governance). The DOI can be described as a "persistent identifier of intellectual property entities". This requires some further explanation.
Firstly, a definition: "entity" is a term that we will use throughout this handbook, and it is important to understand what we mean by it: by our definition, it is simply something that is identified. (The underlying idea, borrowed from the <indecs> project, is that nothing exists in any useful sense until it is identified.)
There are many synonyms and near synonyms for this term "entity" as we use it. The equivalent term often used by the World Wide Web community is "resource".
So, what is "intellectual property"? We all know instinctively what we mean by "intellectual property" -- but do we always mean the same thing? One definition that has been advanced is "works of human intellect or imagination" -- which may take us a part of the way towards common understanding, but perhaps not all of the way.
Rather than attempt our own definition of what "intellectual property" may be, we depend instead on definitions agreed by the World Intellectual Property Organization and related international treaties like the Berne Convention.
So, the DOI can be used to identify any of the various physical objects that are "manifestations" of intellectual property: for example, printed books, CD recordings, videotapes, journal articles. A DOI can also be used to identify less tangible manifestations, the digital files that are the common form of all intellectual property in the network environment. But the use of a DOI can go beyond the identification only of "manifestations" -- it can also be used to identify performances of intellectual property or the "abstractions" that underlie the different manifestations, and other types of resources where they are involved in intellectual property transactions (see Chapter 9 Application).
Critically, the DOI is a persistent identifier: even if ownership of the entity or the rights in the entity change, the identification of that entity should not (and does not) change. The responsibility for managing the DOI changes, but not the DOI itself. For more information about persistence, see Chapter 7 Policies.
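Persistence through a change of ownership can be pictured with a toy registry. The following is a minimal sketch and not the DOI System's actual mechanism: the `ToyRegistry` class, the DOI `10.9999/example.1`, the publisher names and the example.org addresses are all hypothetical, chosen only to show that the record behind an identifier can change while the identifier does not.

```python
class ToyRegistry:
    """Hypothetical in-memory stand-in for a registry of identifier records."""

    def __init__(self):
        self._records = {}

    def register(self, doi, url, managed_by):
        if doi in self._records:
            raise ValueError(f"{doi} is already registered")
        self._records[doi] = {"url": url, "managed_by": managed_by}

    def transfer(self, doi, new_manager, new_url):
        # Only the record is updated; the key (the DOI) is never rewritten.
        record = self._records[doi]
        record["managed_by"] = new_manager
        record["url"] = new_url

    def resolve(self, doi):
        return self._records[doi]["url"]


registry = ToyRegistry()
doi = "10.9999/example.1"  # hypothetical DOI
registry.register(doi, "https://publisher-a.example.org/item/1", "Publisher A")
before = registry.resolve(doi)

# Rights pass to another organization: management and location change.
registry.transfer(doi, "Publisher B", "https://publisher-b.example.org/catalog/1")
after = registry.resolve(doi)

print(doi, before, after)
```

Anyone who recorded the DOI before the transfer can still use the same string afterwards; only the answer returned by `resolve` differs.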
A DOI is different from commonly used internet pointers to material such as the URL -- Uniform Resource Locator, the usual means of referring to World Wide Web material -- because it identifies an object as a first-class entity, not simply the place where the object is located. A first-class entity or object in the information infrastructure is stored on one or more servers and is accessible from these servers using a globally accessible identifier (URI). An entity is referred to as first-class when it represents an object, not some attribute of an object; e.g. an address is an attribute of a thing, whereas the thing itself is a first-class object.
The DOI goes beyond simply providing a scheme for the unique and persistent naming of intellectual property entities in the network environment. The identifier itself is simply one element of a more complex system; the system is described in Chapter 3 The components of the DOI System and the chapters that follow it. The purpose of the DOI System is to make the DOI an actionable identifier. A user can use a DOI to do something.
The simplest action that a user can perform using a DOI is to locate the entity that it identifies. In this respect, a DOI may look superficially like a URL. However, the technology that underlies the DOI facilitates much more complex applications than simple location, and the DOI identifies the intellectual property entity itself rather than its location.
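The "locate" action can be sketched as follows. This is an illustration under stated assumptions rather than a description of the system's internals: it assumes a public HTTP proxy at `https://doi.org/` that forwards a DOI to its currently registered location (this chapter does not name such a proxy), and the DOI used is hypothetical. The function only builds the address; it makes no network request.

```python
from urllib.parse import quote

# Assumption: an HTTP proxy at this address resolves DOIs.
# The chapter itself does not name a proxy.
PROXY = "https://doi.org/"


def doi_to_http_uri(doi):
    """Build an HTTP URI for a DOI, percent-encoding characters that are
    not safe in a URL path (the '/' separator is kept as-is)."""
    return PROXY + quote(doi.strip(), safe="/")


print(doi_to_http_uri("10.9999/example.1"))   # hypothetical DOI
print(doi_to_http_uri("10.9999/a<b>"))        # '<' and '>' are percent-encoded
```

The resulting string looks like a URL, but the part after the proxy address names the entity, not a server location.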
The DOI was first demonstrated (in a relatively simple form) in 1997. Since then, the International DOI Foundation has initiated a process of continuous development and improvement, in terms of technology, processes and policy. Some aspects of the DOI System have now progressed far enough to be formally standardized, e.g. the "syntax" of the number itself (see Chapter 4 Enumeration and Appendix 1). The DOI System makes use of other technologies (notably the Handle System and the <indecs> framework) which are themselves being further standardized (see Appendices 3 and 4). However, other aspects of the DOI are still subject to rapid change and development.
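As a rough illustration of what a standardized syntax makes possible, the sketch below splits a DOI into its two parts. It assumes the general form of a prefix beginning with `10.`, a `/` separator, and a suffix assigned by the registrant; Chapter 4 and Appendix 1 remain the authority on the actual rules, and the names `DoiParts` and `split_doi` are hypothetical.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    prefix: str   # e.g. "10.9999", identifies the registrant
    suffix: str   # chosen by the registrant; may itself contain "/"


def split_doi(doi: str) -> DoiParts:
    """Split a DOI at the first '/' and apply two basic sanity checks."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not suffix:
        raise ValueError("a DOI needs a prefix, a '/' and a non-empty suffix")
    if not prefix.startswith("10.") or len(prefix) == len("10."):
        raise ValueError("assumed form: prefix starts with '10.' plus a registrant code")
    return DoiParts(prefix, suffix)


print(split_doi("10.9999/abc/def"))  # hypothetical DOI; suffix keeps its own '/'
```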
Several million DOIs have already been registered, by several hundred registrant organizations. These examples come mostly from traditional print-publishing companies that have already established major online publishing programs. This reflects the origins of the DOI in the text sector.
However, the fundamental design of the system is applicable to any media or content. The IDF is working closely with many businesses in other sectors of the "content industries" to extend the application of the DOI to many other types of intellectual property.
2.8.1 External benefits (benefits in the distribution and sales life-cycle)
2.8.2 Internal benefits (benefits in the production life-cycle)
2.8.3 Quantified benefits: case studies
A white paper, "Enterprise Content Integration with the Digital Object Identifier: a business case for information publishers", quantifies the business benefits for information publishers of implementing the Digital Object Identifier (DOI) to facilitate internal content management and to enable faster, more scalable product development. It identifies four key advantages, in making it easier and cheaper to:
This is illustrated by four examples of cost savings, each of which is supported by a worked actual case study:
2.9 The use of the term "Identifier"; Number Schemes, Specification, and Identifier Systems
We need to make an important terminological distinction about the use of the word "identifier". As the use of numbering in digital networks has developed, the word has expanded in meaning to the point where it is now used to cover several different things. All of these are useful, but they carry different implications that need to be separated in a detailed understanding of practical Digital Rights Management (DRM) applications. It is important to understand the differences here, and to note that the categories are not mutually exclusive (one particular "identifier" may fit into one or all of them).
2.9.1 Identifiers as "Labels": The Output of Numbering Schemes
A numbering scheme is a formal standard, an industry convention, or an arbitrary internal system (such as an incremented production serial number) used to arrive at a consistent syntax for denoting and distinguishing separate members of a class of entities. The scheme is a specification for generating a number: the resulting "number" may include alphanumeric characters, but the accepted parlance is to speak of these as numbers (e.g., ISBN = International Standard Book Number). The intent is to establish a one-to-one correspondence between the members of a set of labels (numbers) and the members of the set counted and labeled. The product of the process is enumeration, a cardinality judgement, and assigned numbers for each cardinal member. An example would be the ISBN, where a separate ISBN is assigned to each book edition. The numbering scheme may or may not be accompanied by some apparatus -- for example, a registration agency and maintenance agency for the ISO TC 46 series of identifiers.
The important point here is that the resulting number is simply a label string (a "noun"). It does not of itself create a string that is actionable in a digital or physical environment (a "verb") without further steps being taken. It may be used (and probably will be used) in databases; or it may be incorporated into another mechanism later.
The most common standard numbering schemes of interest in DRM include those standardised by ISO:
Whilst these ISO TC 46 identifiers were originally simple numbering schemes, of late they have also begun to adopt the notion of associating some minimal structured descriptive metadata with the identifier. Also relevant are the ISO-affiliated NISO standards including:
ANSI/NISO Z39.84 The Digital Object Identifier
2.9.2 Identifiers as "Infrastructure Specifications": Making Labels Actionable
"Identifier" is also sometimes used to mean a mechanism or syntax by which any label (as defined above) can be expressed in a form suitable for use with a specific infrastructure tool. This is sometimes known as creating an "actionable identifier" -- meaning that in the context of that particular piece of infrastructure, the label can now be used to perform some action: e.g., in an internet Web browser, it can be "clicked on" and some action takes place.
Of particular relevance for DRM, the set of internet specifications known as Uniform Resource Identifiers (embracing URLs and URNs) provide mechanisms for taking labels and specifying them as actionable within the Internet. The same principles can apply in the physical as well as Internet environment -- for example by prefixing an ISBN with the EAN sequence 978 or 979, the ISBN becomes a UPC/EAN identifier expressible as a physical bar code symbol, or a radio-frequency tag, for use in the physical supply chain.
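The bar code case can be worked through in a few lines. The sketch below converts a 10-digit ISBN into its 13-digit EAN form using the 978 prefix: the old ISBN check character is dropped and a new check digit is computed with the alternating 1 and 3 weights used for EAN-13. The function names are hypothetical, and the 979 prefix is deliberately left out on the assumption that it has no 10-digit ISBN counterpart.

```python
def ean13_check_digit(first12: str) -> str:
    """Compute the EAN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... from the left; the check digit
    brings the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_ean13(isbn10: str) -> str:
    """Prefix a 10-digit ISBN with 978 and recompute the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError("expected a 10-character ISBN")
    body = "978" + digits[:9]          # the ISBN-10 check character is discarded
    return body + ean13_check_digit(body)


print(isbn10_to_ean13("0-306-40615-2"))  # 9780306406157
```

The label itself is unchanged in meaning; what changes is that it now follows the syntax an existing piece of supply-chain infrastructure expects.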
Importantly, note here that such "identifiers" do not mandate a way of creating labels; they merely accept any labels. Hence if one does not have an existing numbering scheme, it will be necessary to adopt or create one in order to form URIs. A URI specification merely ensures that a label follows the rules to become actionable in an Internet environment. A specification is not an implementation, with all the other aspects that a fully functioning identifier system (see below) may require: a URI specification may, for example, specify the syntax and a registration procedure, but it does not create a managed environment (e.g. one in which registrations are "policed"), nor does it carry any specifications of metadata or policy. Some identifier specifications of this form may have limited rules or requirements for implementation: so far this is limited to the URN specification, which includes a proposed (not implemented) mechanism for resolution. The acid test one should ask of such a specification is: what does specifying my label in this particular form get me, in practical terms, in a specific infrastructure?
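To make the distinction concrete, the sketch below expresses an ISBN label in URN syntax. It assumes the `urn:isbn:` namespace form, and the function name is hypothetical. Note what the code does not do: it registers nothing, resolves nothing, and attaches no metadata or policy, which is exactly the gap between a specification and an implemented system.

```python
def isbn_to_urn(isbn: str) -> str:
    """Express an existing ISBN label in URN syntax (assumed 'urn:isbn:' form).

    This is purely a change of syntax: the label must already exist, and the
    result carries no resolution, registration, metadata or policy.
    """
    label = isbn.replace("-", "").replace(" ", "").upper()
    if len(label) not in (10, 13):
        raise ValueError("expected a 10- or 13-character ISBN")
    return "urn:isbn:" + label


print(isbn_to_urn("0-306-40615-2"))  # urn:isbn:0306406152
```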
2.9.3 Identifiers as "Implemented Systems": Implementing Labels in an Infrastructure Environment
The UPC/EAN is an "identifier system" in the physical supply chain; a DOI is an "identifier system" in the digital supply chain. ISBNs for example become implemented in the physical supply chain through UPC/EAN bar codes or RFID tags. This sense of "Identifier" denotes a fully implemented identification mechanism that includes the ability to incorporate labels, conforms to an infrastructure specification, and adds to these practical tools for implementation such as registration processes, structured interoperable metadata, and a policy/governance mechanism. Such a system is necessary for practical DRM applications; since DRM deals with digital entities, structured metadata will be an essential component of such a system. The DOI is one of the better developed, with several million DOIs currently in use by several hundred organisations.
Both ISO TC 46 and URN have published suggested lists of requirements for their identifiers -- the first covering "labels", the second "infrastructure specifications". These suggest that a practical identifier system (which builds on both concepts) for digital use (DRM) should assume a combination:
The three uses of the word "identifier" (label, infrastructure specification, and implementation) can easily become confused, since one particular string can be in more than one category. But to see why we need to be precise, consider the following statement:
"For use on the Internet, an ISBN label can become a URN specification; an ISBN label can be incorporated into a DOI, which is an implemented identifier system following the URI specification."
Replacing the more precise terms in this statement by the loose unqualified synonym "identifier" results in confusion:
"an ISBN identifier can become a URN identifier; an ISBN identifier can be incorporated into a Digital Object identifier, which is an implemented URI identifier"
(true, but only on close textual analysis!).