Previous Chapter: 9 Operating Procedures Next Chapter: Appendix 2 The Handle System®
Appendix 1 ANSI/NISO Z39.84-2005 Syntax for the Digital Object Identifier
This excerpt from ANSI/NISO Z39.84-2005 Syntax for the Digital Object Identifier is
reprinted here with permission of NISO Press. Some text has been omitted from the
standard as originally published. Since the publication of this standard, some URLs have
been updated in this version for accuracy. The full standard can be downloaded as a PDF
file for no charge from the NISO website by clicking on the NISO Press icon. DOI® and DOI.ORG® are registered trademarks and the DOI> logo is a trademark of the International DOI Foundation.
© 2005 National Information Standards Organization.
1. Introduction
2. Standards and References
3. Definitions
4. Format and Characteristics of the DOI
5. Maintenance Agency
APPENDIX A DOI Specifications
APPENDIX B Designation of Maintenance Agency
APPENDIX C Examples of Digital Object Identifiers
APPENDIX D Related Standards and References
APPENDIX E Application Issues
1. Introduction
1.1 Purpose
This standard defines the syntax for a character string called the Digital Object Identifier (DOI).
1.2 Scope
This standard is limited to defining the syntax of the DOI character string. Policies governing the
assignment and use of DOIs are determined by the International DOI Foundation (IDF) and are outside
the scope of this document.
2. Standards and References
Referenced standards are those that need to be used to construct a DOI. Secondary standards and references
include citations to documents that can be of use in conjunction with the DOI. See Appendix D for related
standards and references.
2.1 Referenced Standard
The Unicode Consortium. The Unicode Standard, Version 4.0.1, defined by: The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), as amended by Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1/).
3. Definitions
Deposit. The act of entering into the Directory a DOI and associated information necessary for the DOI to be used.
Digital Object Identifier (DOI). A character string used in a System conforming to the rules of, and
deposited in the Directory administered by, the IDF.
Directory. A repository in which DOIs are deposited and attendant locations are maintained.
Directory Manager. The organization that manages the Directory on behalf of the IDF.
DOI prefix. The Directory and the Registrant codes issued by a Registration Agency to a Registrant for use in the DOIs allocated by that Registrant.
DOI suffix. The character string assigned by a Registrant. The suffix shall be unique within the set of DOIs
specified by the DOI prefix held by the Registrant.
International DOI Foundation (IDF). The body set up to support the needs of the intellectual property community in the digital environment by establishing and governing the DOI System, setting policies for the System, appointing service providers for the System, and overseeing the successful operation of the System.
Registrant. An organization or entity that has requested and been allocated one or more DOI prefixes by a Registration Agency.
Registration. The act of allocating the DOI prefix to a Registrant by the Registration Agency.
Registration Agency [DOI Registration Agency]. An organization appointed by the International DOI Foundation to register and allocate DOI prefixes to Registrants, and which subsequently accepts DOIs being deposited by Registrants.
4. Format and Characteristics of the DOI
The DOI is composed of the prefix and the suffix. Within the prefix are the Directory Code <DIR> and the Registrant Code <REG>. The suffix is made up of the DOI Suffix String <DSS>.
The syntax of the DOI string is: <DIR>.<REG>/<DSS>
There is no practical limit on the length of a DOI string or of any of its components. (The Handle System allows strings of up to 4 GB; under UTF-8 encoding each ASCII character takes one byte, so a DOI made up entirely of ASCII characters may be approximately 4 billion characters long.)
Characters 'a' - 'z' and 'A' - 'Z' in the DOI string are case insensitive (e.g. 10.123/ABC is identical to 10.123/AbC). These characters in the DOI string are converted to upper case upon registration and resolution. If a DOI were registered as 10.123/ABC, then 10.123/abc
would resolve it and a later attempt to register 10.123/AbC would be rejected with an error message stating that the DOI was already in existence. Comparison of two DOIs (to decide if they match or not) should be done by first converting all characters 'a' - 'z' in DOI strings to upper case, followed by octet-by-octet comparison of the entire DOI string.
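The comparison rule can be sketched in Python as follows. This is a minimal illustration, not part of the standard; the function names `normalize_doi` and `dois_match` are hypothetical. Only the characters 'a' - 'z' are folded, so `str.upper()` is deliberately avoided because it would also change non-ASCII letters.

```python
def normalize_doi(doi: str) -> bytes:
    """Upper-case only ASCII 'a'-'z', then return the UTF-8 octets."""
    folded = "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in doi
    )
    return folded.encode("utf-8")


def dois_match(a: str, b: str) -> bool:
    """Compare two DOIs octet by octet after ASCII-only case folding."""
    return normalize_doi(a) == normalize_doi(b)
```

Under this sketch, `dois_match("10.123/ABC", "10.123/AbC")` is true, while a non-ASCII letter such as "é" is left as it is before the octet comparison.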
4.1 DOI Character Set
Legal characters are the legal graphic characters of Unicode. This specifically excludes the control character ranges 0x00-0x1F and 0x80-0x9F, which are therefore not valid characters for DOI strings, and will never be present in DOI conformant systems. Reserved characters, if any, are listed in the following descriptions of the prefix and suffix.
4.2 Prefix
<DIR> Directory Code (required)
See Appendix A for all valid values for the Directory Code. The Maintenance Agency is responsible for updating the list of valid values. The Directory Code is numeric; currently the only valid value is <DIR>=10.
<REG> Registrant's Code (required)
Separated from <DIR> by ".". This is assigned to the Registrant by the International DOI Foundation.
DOI Prefix Character Set
Any character within the DOI Character Set as defined above.
<DIR> and <REG> are assigned by the International DOI Foundation.
4.3 Suffix
<DSS> DOI Suffix String (required)
This is assigned by the Registrant.
DOI Suffix Character Set
Any character within the DOI Character Set as defined above, with the exception that the Suffix cannot start with */ where * is any single character. This is reserved for future use. The DSS is case insensitive.
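The syntax rules in Sections 4.1 to 4.3 can be illustrated with a short Python sketch. The names `ParsedDOI` and `parse_doi` are hypothetical, and the sketch assumes that the first "/" ends the prefix and that the first "." separates <DIR> from <REG>; the standard gives the pattern <DIR>.<REG>/<DSS> but does not spell out a parsing procedure.

```python
from typing import NamedTuple


class ParsedDOI(NamedTuple):
    directory: str   # <DIR>
    registrant: str  # <REG>
    suffix: str      # <DSS>


def _has_control_chars(text: str) -> bool:
    # Section 4.1 excludes the ranges 0x00-0x1F and 0x80-0x9F.
    return any(ord(ch) <= 0x1F or 0x80 <= ord(ch) <= 0x9F for ch in text)


def parse_doi(doi: str) -> ParsedDOI:
    """Split a DOI into <DIR>, <REG>, <DSS> and apply the syntax checks."""
    if _has_control_chars(doi):
        raise ValueError("control characters are not legal in a DOI")
    # Assumption: the first "/" ends the prefix; the suffix may hold more.
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError("a DOI needs a prefix, a '/', and a suffix")
    # Assumption: the first "." separates <DIR> from <REG>.
    directory, dot, registrant = prefix.partition(".")
    if not dot or not registrant:
        raise ValueError("the prefix must have the form <DIR>.<REG>")
    if directory != "10":
        raise ValueError("the only valid Directory Code is currently 10")
    # A suffix starting with "*/" (any one character, then "/") is reserved.
    if len(suffix) >= 2 and suffix[1] == "/":
        raise ValueError("a suffix starting with '*/' is reserved")
    return ParsedDOI(directory, registrant, suffix)
```

For example, `parse_doi("10.1006/rwei.1999.0001")` yields the Directory Code "10", the Registrant Code "1006", and the suffix "rwei.1999.0001", while "10.123/a/b" is rejected because its suffix begins with a single character followed by "/".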
5. Maintenance Agency
The Maintenance Agency designated in Appendix B shall review suggestions for new data elements,
interpret the rules prescribed by this standard, and maintain a listing of inquiries
and responses that may be used for potential future enhancement of this standard. Questions concerning the implementation of this standard and requests for information should be sent to the Maintenance Agency.
APPENDIX A DOI Specifications
(This appendix is not part of the Syntax for the Digital Object Identifier,
ANSI/NISO Z39.84-2005. It is included for information only.)
This appendix provides information on aspects of the DOI system syntax implementation which are determined by the International DOI Foundation and which will not change the DOI syntax defined in this standard.
Valid values for Directory Code
<DIR> <REG> is assigned by the International DOI Foundation. The prefix is numeric.
Valid value for <DIR> = 10
DOIs are persistent, as defined in IETF RFC 1737, Functional Requirements for Uniform Resource Names (http://www.ietf.org/rfc/rfc1737.txt): "It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name."
UTF-8 encoding is mandated by the Handle System. Therefore, all Unicode characters must
be encoded using UTF-8.
The Handle System used as the basis for the DOI system allows an unlimited length for the DOI string.
However, it is recommended that the suffix (<DSS>) be kept as short as possible to allow for human readability
and ease of use in systems where size may be a consideration (e.g., watermarking).
This information is maintained by the DOI Maintenance Agency (see Appendix B).
APPENDIX B Designation of Maintenance Agency
(This appendix is not part of the Syntax for the Digital Object Identifier,
ANSI/NISO Z39.84-2005. It is included for information only.)
The functions assigned to the Maintenance Agency as specified in Section 5 will be administered by The
International DOI Foundation (http://www.doi.org/).
Questions concerning the implementation of this standard and requests for information should be sent to:
E-mail: n.paskin@doi.org
Dr Norman Paskin
Director
The International DOI Foundation
5, Linkside Avenue
Oxford
OX2 8HY
UK
Tel: (+44) 1865 559070
APPENDIX C Examples of Digital Object Identifiers
(This appendix is not part of the Syntax for the Digital Object Identifier,
ANSI/NISO Z39.84-2005. It is included for information only, and may include editorial updates and corrections.)
DOI registrants can use a variety of strings for the DSS including private identifiers and existing standards such as SICI (Serial Item and Contribution Identifier). The syntax of the identifier numbering scheme is such that any existing identifier syntax string can be expressed in a form suitable for use with the DOI system.
The following are examples of Digital Object Identifiers:
DOI (incorporating a SICI) from an article in the Journal of the American Society for Information Science, published by John Wiley & Sons:
10.1002/(SICI)1097-4571(199806)49:8<693::AID-ASI4>3.0.CO:2-0
DOI for an article from JAMA, the Journal of the American Medical Association:
10.1001/PUBS.JAMA(278)3,JOC7055-ABSY:
DOI for the article "ABO Blood Group System" from Encyclopedia of Immunology Online, 2nd edition, published by Academic Press:
10.1006/rwei.1999.0001
APPENDIX D Related Standards and References
(This appendix is not part of the Syntax for the Digital Object Identifier,
ANSI/NISO Z39.84-2005. It is included for information only, and may include editorial updates and corrections.)
The standard cited in Section 2 is required for the construction of the DOI syntax. This appendix includes references to other standards and citations that may be useful with DOIs or which provide additional information on the DOI.
When American National Standards cited below are superseded by a revision, the revision shall apply.
ANSI X3.4:1986 American National Standard for Information Systems - Coded Character Sets - 7-bit American National Standard Code for Information Interchange (7-bit ASCII). New York: ANSI, 1986.
DOI Handbook: DOI 10.1000/182, http://www.doi.org/hb.html
DOI factsheets (DOI and Handle; DOI and Numbering Schemes; DOI and Data Dictionaries; DOI and Internet Identifier Specifications; DOI Applications; Value added by the DOI System): http://www.doi.org/factsheets.html
Handle System: http://www.handle.net/
Sun, Sam; Lannom, Larry; Boesch, Brian. "Handle System Overview". RFC 3650, November 2003. http://www.handle.net/rfc/rfc3650.html
Sun, Sam; Reilly, Sean; Lannom, Larry. "Handle System Namespace and Service Definition". RFC 3651, November 2003. http://www.handle.net/rfc/rfc3651.html
Sun, Sam; Reilly, Sean; Lannom, Larry; Petrone, Jason. "Handle System Protocol (Ver 2.1) Specification". RFC 3652, November 2003. http://www.handle.net/rfc/rfc3652.html
Yergeau, Francois. "UTF-8, a transformation format of Unicode and ISO 10646". RFC 2044, October 1996. http://www.ietf.org/rfc/rfc2044.txt
APPENDIX E Application Issues
(This appendix is not part of the Standard Syntax for the Digital Object Identifier, ANSI/NISO Z39.84-2005.
It is included for information only.)
Except for the specific requirements imposed by this standard (such as use of Unicode and
reserved characters), no restrictions are imposed or assumptions made about the characters used in DOIs. Appendix E discusses some encoding issues that arise when using DOIs in specific application contexts like URLs and with the HTTP protocol. Other application contexts in which DOIs are used may have similar types of requirements or restrictions. However, such requirements for encoding or restrictions on the use of particular characters only apply when DOIs are used within those particular application contexts. They are not part of the DOI syntax itself as defined by this document.
UTF-8 Encoding
The Handle System specifies UTF-8 as the encoding for DOI strings. ASCII characters are preserved under UTF-8 encoding. No changes need to be made to ASCII characters to comply with UTF-8 encoding. The default encoding of Unicode is that each character consists of 16 bits (2 octets). UTF-8 is a variation of the Unicode encoding that allows characters to be encoded in terms of one to six octets. UTF-8 encoding plays a role when non-ASCII characters are used. For example, the Japanese word "nihongo" is written as: 日本語
The Unicode sequence representing the Han characters for "nihongo" is: 65E5 672C 8A9E. These may be encoded in UTF-8 as follows: E6 97 A5 E6 9C AC E8 AA 9E. For further information on UTF-8, see "UTF-8, a transformation format of Unicode and ISO 10646", RFC 2044, October 1996.
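The octets quoted above can be reproduced with Python's built-in UTF-8 codec. The following is an illustrative sketch only; the variable names are arbitrary.

```python
# Code points for the three Han characters, as given in the text.
code_points = [0x65E5, 0x672C, 0x8A9E]
word = "".join(chr(cp) for cp in code_points)

# Encode to UTF-8 and show each octet as two hexadecimal digits.
utf8_octets = word.encode("utf-8")
hex_octets = " ".join(f"{octet:02X}" for octet in utf8_octets)
print(hex_octets)  # E6 97 A5 E6 9C AC E8 AA 9E

# ASCII characters are preserved: one octet each, with the same value.
ascii_octets = "10.123/ABC".encode("utf-8")
print(ascii_octets == b"10.123/ABC")  # True
```

Each of the three Han characters takes three octets, whereas each ASCII character in a DOI takes one.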
Encoding Recommendations When Used in URLs
Current Web browsers need additional functionality to make full use of DOIs. It is anticipated that features supporting resolution will commonly be built into browsers in the future.
There is a freely available "resolver plug-in" that can be downloaded from http://www.handle.net/resolver/. For both Netscape and Microsoft IE browsers, the plug-in extends the browser's functionality so that it understands the Handle protocol.
Alternatively, without the need to extend the Web browsers' capability, DOIs may be structured to use the default public DOI proxy server (http://dx.doi.org). The resolution of the DOI in this case depends on the use of URL syntax. For example, "doi:10.123/456" would be written as http://dx.doi.org/10.123/456.
In practice, DOIs are used primarily in HTML pages. The DOI 10.1006/rwei.1999".0001 as a link in
an HTML page would be:
<A HREF="http://dx.doi.org/10.1006/rwei.1999%22.0001">10.1006/rwei.1999%22.0001</A>
Note that " has been encoded (see next section) to distinguish the DOI in the URL from the surrounding text.
The DOI is displayed in its encoded form since users may type the DOI directly into their browsers.
Encoding Issues
There are special encoding requirements when a DOI is used with HTML, URLs, and HTTP. The syntax for Uniform
Resource Identifiers (URIs) is much more restrictive than the syntax for the DOI. A URI can be a Uniform Resource
Locator (URL) or a Uniform Resource Name (URN).
Hexadecimal (%) encoding must be used for characters in a DOI that are not allowed, or have other meanings,
in URLs or URNs. Hex encoding consists of substituting for the given character its hexadecimal value preceded
by percent. Thus, # becomes %23 and http://dx.doi.org/10.1000/456#789 is encoded as http://dx.doi.org/10.1000/456%23789.
The browser does not now encounter the bare #, which it would normally treat as the end of the URL and the start
of a fragment, and so sends the entire string off to the DOI network of servers for resolution, instead of
stopping at the #. Note: The DOI itself does not change with encoding, merely its representation in a URL.
A DOI that has been encoded is decoded before being sent to the DOI Registry. At the moment the decoding
is handled by the proxy server http://dx.doi.org/. Only unencoded DOIs are stored in the DOI Registry database.
For example, the number above is in the DOI Registry as "10.1000/456#789" and not "10.1000/456%23789". The
percent character (%) must always be hex encoded (%25) in any URLs.
There are few character restrictions for DOI number strings per se. When DOIs are embedded in URLs,
they must follow the URL syntax conventions. The same DOI need not follow those conventions in other contexts.
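The decoding step described above, in which an encoded URL is turned back into the DOI as stored in the DOI Registry, can be sketched with the standard library function `urllib.parse.unquote`. The helper name `doi_from_proxy_url` is hypothetical, and the sketch does not claim to reflect how the proxy server itself is implemented.

```python
from urllib.parse import unquote

PROXY = "http://dx.doi.org/"  # default public proxy named in the text


def doi_from_proxy_url(url: str) -> str:
    """Recover the unencoded DOI from a hex-encoded proxy URL."""
    if not url.startswith(PROXY):
        raise ValueError("not a DOI proxy URL")
    # unquote() replaces each %XX escape with the character it stands for.
    return unquote(url[len(PROXY):])
```

Under this sketch, `doi_from_proxy_url("http://dx.doi.org/10.1000/456%23789")` gives "10.1000/456#789", which is the form held in the Registry, and a "%25" in the URL is turned back into a literal "%".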
Mandatory and Recommended Encoding for DOI Deposit and URLs
Tables 1 and 2 summarize the encoding guidelines for DOI. URLs have the most restricted set of characters.
Table 1 lists the characters that should always be hex encoded. Table 2 lists additional characters where
it is recommended that characters be replaced by hex-encoding. The distinction between the lists is between
practical experience with current web browsers and the more formal specification of URL syntax.
In the DOI Directory all characters represent themselves.
Table 1: Mandatory Encoding
| Character | Encoding |
| % | (%25) |
| " | (%22) |
| # | (%23) |
| SPACE | (%20) |
Table 2: Recommended Encoding
| Character | Encoding |
| < | (%3c) |
| > | (%3e) |
| { | (%7b) |
| } | (%7d) |
| ^ | (%5e) |
| [ | (%5b) |
| ] | (%5d) |
| ` | (%60) |
| | | (%7c) |
| \ | (%5c) |
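The two tables can be applied mechanically when building a proxy URL from a DOI. The following Python sketch is an illustration under stated assumptions: the names `encode_doi_for_url` and `proxy_url` are hypothetical, the value 0x60 is taken to be the grave accent, and non-ASCII characters (which would first need UTF-8 encoding) are not handled.

```python
# Table 1: characters that should always be hex encoded.
MANDATORY = {"%": "%25", '"': "%22", "#": "%23", " ": "%20"}

# Table 2: characters for which hex encoding is recommended.
# 0x60 is the grave accent (`).
RECOMMENDED = {
    "<": "%3c", ">": "%3e", "{": "%7b", "}": "%7d", "^": "%5e",
    "[": "%5b", "]": "%5d", "`": "%60", "|": "%7c", "\\": "%5c",
}


def encode_doi_for_url(doi: str, recommended: bool = True) -> str:
    """Hex encode a DOI for use in a URL, following Tables 1 and 2."""
    table = dict(MANDATORY)
    if recommended:
        table.update(RECOMMENDED)
    # Each character is looked up exactly once, so the "%" introduced by
    # an escape is never encoded a second time.
    return "".join(table.get(ch, ch) for ch in doi)


def proxy_url(doi: str) -> str:
    """Build a URL for the default public DOI proxy server."""
    return "http://dx.doi.org/" + encode_doi_for_url(doi)
```

For example, `proxy_url("10.1000/456#789")` gives http://dx.doi.org/10.1000/456%23789, matching the example in the Encoding Issues section. Passing `recommended=False` applies Table 1 only.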