DOI® System Proxy Server
The DOI System uses the Handle System® to manage digital objects (see the DOI Factsheet "DOI System and the Handle System"). At the infrastructure level, DOI names are handles.
The DOI System Proxy Server is a web server that knows how to talk to the Handle System, and at this writing, most DOI® names found on the web are embedded in URLs that use the proxy server for DOI name resolution. For any HTTP request that combines the proxy's domain name with a DOI name, the proxy will query the Handle System for the DOI name, take the URL in the handle record (if the handle record holds multiple URLs, the proxy selects one, in no particular order), and send an HTTP redirect to that URL to the user's web browser.
The proxy will return a specific URL if the query includes that URL's unique index value. A query that includes index value 3, for example, will redirect the user's web browser to the URL with index value 3.
Increasing numbers of DOI names include data in addition to the single default URL. This is sometimes referred to as multiple resolution. These added values are intended for use by more advanced applications that can take advantage of multiple pieces of data, e.g., the location of enhanced metadata or related documents. The proxy server, which is assumed to be talking to a plain web browser, ignores these values.
The proxy server is configured to display a "DOI Name Not Found" error page when queried for a DOI name that it cannot find.
The Handle System uses UTF-8, a Unicode implementation, and has no character set constraints. The DOI System Proxy Server, however, is a web server that sends redirects to web browsers using HTTP syntax, so characters in DOI names that may not be interpreted correctly by web browsers, for example '?', should be avoided or encoded. A non-ASCII character in a DOI name should be converted to UTF-8, and each UTF-8 byte that is not ASCII should be percent-encoded (%-encoded).
The "#" character is another example. The proxy will correctly resolve the DOI name 10.1000/res#test only if the "#" is sent to the DOI System Proxy Server in encoded form (%23). If it is not encoded, #test will be treated as a fragment and removed by the web browser before the request gets to the proxy, which will then attempt to resolve 10.1000/res instead.
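The encoding rules above can be illustrated with a minimal Python sketch. It assumes the proxy address http://dx.doi.org/ mentioned elsewhere in this document; the function name and the second DOI name are made up for illustration and are not part of the DOI System.

```python
from urllib.parse import quote

PROXY = "http://dx.doi.org/"  # proxy address as given in this document


def doi_to_proxy_url(doi_name: str) -> str:
    """Build a proxy URL with the DOI name percent-encoded (illustrative helper)."""
    # quote() converts the string to UTF-8 and percent-encodes every byte
    # outside its safe set. "/" is kept as-is so the separator between the
    # DOI prefix and suffix stays readable; "#" and "?" are encoded.
    return PROXY + quote(doi_name, safe="/")


print(doi_to_proxy_url("10.1000/res#test"))  # "#" becomes %23
print(doi_to_proxy_url("10.1000/é?x"))       # hypothetical DOI name with non-ASCII and "?"
```

Encoding is applied to the DOI name only, not to the whole URL, so that the scheme and host of the proxy address are left untouched.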
The DOI names 10.1000/demo_DOI and 10.1000/demo_DOI/ are both valid DOI names, but it is unlikely that a DOI name will be created with a trailing slash. If a resolution request for a DOI name with a trailing slash is received by the proxy server and that DOI name is not found, the proxy server will return an error report that includes a warning that the requested DOI name contained a trailing slash, and a link to click to resolve the same string without the slash.
The DOI System Proxy Server is really multiple servers running at multiple locations, with the load distributed evenly across all servers. To speed resolution, the proxy servers cache handle values, with the TTL set to 24 hours. This means that if a handle value is changed, it can take up to 24 hours before the new value is returned.
The IDF also runs a proxy server for the shortDOI Service that is not part of this DOI System Proxy Server specification.
For general information on the proxy server, see the DOI Handbook chapter on Resolution. Related information on Proxy Policies for Registration Agencies is available on the RAWG web site.
Some additional functionality has been built into the DOI System Proxy Server to provide further services for DOI System users who structure their DOI names to use the proxy server.
Local Content Servers (The "Appropriate Copy" Problem)
DOI names for articles in scholarly and technical journals generally resolve to the publishers' websites. Retrieving the articles from those websites typically requires a fee or a subscription. Libraries commonly purchase copies of journals to keep in their local collections for their users, and they often own or subscribe to multiple copies of journals. For those institutions' users, the address to which a DOI name should appropriately resolve depends on the location or affiliation of the user who is making the resolution request, and the appropriate choice is usually one of the institution's local copies.
The International DOI Foundation, CrossRef, libraries and library services providers came to refer to this as the "appropriate copy problem," and began collaborating on a solution and developing a prototype in 1999. (For details see D-Lib Magazine, September 2001, "Linking to the Appropriate Copy".)
The combination of the DOI name resolution system, the CrossRef Metadata Database, and OpenURL provides a practical solution to the appropriate copy problem for libraries. The key components of this architecture are (1) local content servers that can match a query for an item to a library's appropriate copy; (2) a DOI name resolution system that can redirect a query to a local content server; (3) a way for the originator of the query to identify the appropriate local content server to the resolution system; and (4) a source of metadata about the item sufficient for the local content server to match the query to the appropriate copy.
The solution offered by CrossRef to its library affiliate members is as follows: users in a member institution click on a DOI name, and the DOI name and a cookie (previously set in the user's web browser through a "CookiePusher" mechanism) are sent to the DOI System proxy server. The proxy server recognizes the local content server identified in the cookie, constructs an OpenURL containing the DOI name, and sends it to the user's local resolver by way of an HTTP "redirect" to the user's browser. The local resolver sends the DOI name in OpenURL format to CrossRef. CrossRef returns metadata for the item named by the DOI name. The local content server recognizes this as a locally-held article, constructs a URL pointing to the item, and sends it to the user's browser as an HTTP "redirect".
If the article is not locally-held, the local content server returns the request to the proxy server, with a flag set to indicate there is no local copy, and the proxy server resolves the DOI name as it ordinarily would, redirecting the user's browser to the publisher's site.
The current solution is specific to CrossRef which, at the time of this writing, remains the source of most DOI names and associated metadata relevant to the problem. This situation is, however, changing with the continued growth of the DOI System and it is generally acknowledged that multiple sources of metadata will have to be accommodated in the future. This will require adjustments both in the DOI name resolution data and in the behavior of local content servers. The groups that came together to solve the initial "Appropriate Copy Problem" recognize this, have held preliminary discussions on the topic, and anticipate the need for additional collaborative efforts as the situation evolves.
A "CookiePusher" mechanism (see D-Lib article for details) is used to set a cookie in the user's browser that is recognized by the proxy server and contains the URL to which the proxy server will redirect the DOI name resolution request. To prevent unauthorized users from setting cookies and redirecting traffic to their own personal resolvers, a BASE-URL list, containing the URLs of the authorized local content servers, is included in the CookiePusher. The BASE-URL can be the URL for a script or a directory, or even a top-level domain, but it must be an OpenURL-aware server. If the BASE-URL in the request is not in the list, the script will not set the cookie, but will return a "no cookie for you" message. BASE-URLs are collected from CrossRef affiliates when they join.
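The BASE-URL check described above amounts to an allow-list lookup. The following Python sketch shows the idea only: the list contents, the function name, and the exact-match comparison are assumptions made for illustration, since the actual CookiePusher script and its matching rules are not reproduced in this document.

```python
# Hypothetical allow list; real BASE-URLs are collected from CrossRef affiliates.
AUTHORIZED_BASE_URLS = {
    "http://resolver.example.edu/openurl",
    "http://library.example.org/",
}


def cookie_pusher(base_url: str) -> tuple[str, str]:
    """Return ("set", url) for an authorized BASE-URL, else a refusal message."""
    candidate = base_url.strip()
    # Assumption: an exact string match against the list decides the outcome.
    if candidate in AUTHORIZED_BASE_URLS:
        return ("set", candidate)
    return ("refused", "no cookie for you")


print(cookie_pusher("http://resolver.example.edu/openurl"))
print(cookie_pusher("http://personal-resolver.example.net/"))
```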
The CookiePusher script runs on the DOI System website (http://www.doi.org). The proxy server (http://dx.doi.org) is under the DOI.ORG® domain. The URL for the CookiePusher script is:
A sample request to the CookiePusher containing the URL prefix of the local content server is:
URI hexadecimal (%) encoding is recommended.
The request to add the cookie to the user's web browser's cookie file is usually hidden from view on an introductory or login web page, using a transparent GIF.
A sample cookie, with a TTL of 24 hours, is:
Server Secure: no
Expires: Wednesday, October 23, 2004 10:28:11 PM
After the cookie is set, the proxy server will recognize the local content server identified in the cookie, construct an OpenURL containing the local content server URL and the DOI name:
and send the request by way of an HTTP "redirect" to the local content server.
If there is no local copy of the content, the local server must return the request to the proxy server with a "nols=y" flag set. The proxy server will then resolve the DOI name and direct the user to the publisher's content. (The deprecated setting "nosfx=y" used in the prototype is still supported.) Correctly setting the "no local service" flag is critical to avoiding infinite loops.
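The role of the "no local service" flag in avoiding loops can be sketched as a small decision function. This is an illustration under stated assumptions, not the proxy's actual code: the flag is assumed to arrive as a query parameter on the proxy URL, the OpenURL sent to the local content server is assumed to carry the DOI name in an 'id' key, and all host names other than dx.doi.org are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def redirect_target(request_url, cookie_base_url, registered_url):
    """Decide where a proxy-like server would redirect the browser."""
    parts = urlsplit(request_url)
    params = parse_qs(parts.query)
    doi_name = parts.path.lstrip("/")

    # "nols=y" is the current flag; "nosfx=y" is the deprecated prototype flag.
    no_local = params.get("nols") == ["y"] or params.get("nosfx") == ["y"]

    if cookie_base_url and not no_local:
        # Assumed OpenURL shape: BASE-URL plus an identifier for the DOI name.
        return cookie_base_url + "?" + urlencode({"id": "doi:" + doi_name})

    # No cookie, or the local server reported no local copy: resolve normally.
    return registered_url


local = "http://resolver.example.edu/openurl"   # hypothetical BASE-URL from the cookie
publisher = "http://publisher.example.com/a1"   # hypothetical registered URL

print(redirect_target("http://dx.doi.org/10.1000/demo_DOI", local, publisher))
print(redirect_target("http://dx.doi.org/10.1000/demo_DOI?nols=y", local, publisher))
```

Without the flag check, the second request would be sent back to the local content server, which would return it again, and so on indefinitely.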
The OpenURL Framework is a syntax for transporting metadata and/or identifiers about an object, using an established set of parameter names, to enable context-sensitive linking for the development of user-specific services. It has been developed and approved as NISO standard ANSI/NISO Z39.88, The OpenURL Framework for Context-Sensitive Services. Additional information and KEV Implementation Guidelines are available from NISO Committee AX.
The OpenURL Framework includes DOI names as one of its registered Namespaces and DOI names are widely used in OpenURL implementations. This documentation references only part of the OpenURL Framework Registry. More references to OpenURL and the DOI System proxy server will be found in the documentation below on Parameter Passing.
In the OpenURL Format, descriptions of referenced resources, and descriptions of the associated resources that explain the context of the resource, are contained in ContextObjects that are transported using the HTTP protocol. ContextObjects use a Key/Encoded-Value format to create a string of ampersand-delimited pairs. The values must be URL-encoded.
Of the five ContextObject Entities, one of them, the Referent, is required. Within the scholarly information community, the Referent will likely be a journal or journal article, a conference proceeding, or a book. The Identifier for the Referent is its DOI name.
The DOI System proxy server is a web server that understands the Handle System protocol. It is not an OpenURL Resolver per se, and does not provide services to an end-user that pertain to the Referent within the ContextObject of the OpenURL. When it receives an OpenURL, it finds the DOI name in the string, resolves it, and redirects the end-user's browser to the resulting URL, ignoring all other ContextObject Entities.
The default syntax for a DOI name resolution request to the proxy server is:
The same DOI name resolution request using OpenURL would be:
The OpenURL Format standard approved by the NISO voting members includes significant changes made between Versions 0.1 and 1.0. Relevant to DOI names, in OpenURL Format Version 1.0, 'rft_id' replaced 'id' which was used in Version 0.1.
There were also changes to Namespaces. All Namespaces now follow URI schemes, and the 'uri:' prefix was dropped. ORI Namespaces are now 'info:'.
Also note that the initial implementation of OpenURL in the DOI System proxy server using the 'rft_id=doi:10.1000/demo_DOI_name' syntax will continue to be supported.
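Finding the DOI name in an OpenURL, as described above, can be sketched in Python as follows. The sketch accepts the 'rft_id' key of Version 1.0 and the 'id' key of Version 0.1. The 'info:doi/' prefix is an assumption about how the 'info:' Namespace is written for DOI names, and the function name is illustrative; the 'doi:' prefix and the '/openurl' path are taken from examples in this document.

```python
from urllib.parse import parse_qs, urlsplit

# Prefixes tried in order. "info:doi/" is assumed for OpenURL Version 1.0;
# "doi:" is the initial syntax that the proxy continues to support.
DOI_PREFIXES = ("info:doi/", "doi:")


def doi_from_openurl(url):
    """Return the DOI name carried in an OpenURL query string, or None."""
    params = parse_qs(urlsplit(url).query)  # also decodes %-encoded values
    for key in ("rft_id", "id"):            # Version 1.0 key first, then 0.1
        for value in params.get(key, []):
            for prefix in DOI_PREFIXES:
                if value.startswith(prefix):
                    return value[len(prefix):]
    return None


print(doi_from_openurl("http://dx.doi.org/openurl?rft_id=info:doi/10.1000/demo_DOI_name"))
print(doi_from_openurl("http://dx.doi.org/openurl?rft_id=doi:10.1000/demo_DOI_name"))
print(doi_from_openurl("http://dx.doi.org/openurl?id=doi:10.1000%2Fdemo_DOI_name"))
```

All other keys in the query string are ignored, which mirrors the statement that the proxy disregards the remaining ContextObject Entities.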
Before the DOI System and CrossRef came into existence, the scholarly publishing community implemented bilateral linking agreements that used parameters (name/value pairs) included in standard URLs to exchange data. This practice enabled them to gather information about requests coming to their sites, such as which other publisher's site a request came from, and from which journal and article. They could then implement special access rules, or establish pricing for their content based on who was requesting it.
At the time that the publishers began using DOI names, they also began thinking about how DOI names and the DOI System's proxy server could be used to facilitate the exchange of parameters, and remove the need for individual bilateral linking arrangements. A procedure that evolved over several years was agreed on by publishers who are now members of CrossRef and implemented in the proxy server. It has come to be called 'Parameter Passing'.
In Parameter Passing, there are two URLs involved, both of which may be query strings and/or include parameters: (1) the resolution request sent by the 'referrer' to http://dx.doi.org/ that has the DOI name, and (2) the URL associated with that DOI name, registered in the DOI System by the 'referent'. Parameter Passing requires joining the query strings of those two URLs together to form an 'out-bound' link. The names of the parameters used in both strings must be unique and defined for all parties. The OpenURL Format was chosen for the URLs because it specifies a set of parameter names that can be used to eliminate the chance of naming conflicts. (See "Parameter Passing Via the DOI System Proxy" for the OpenURL parameters applicable for use in Parameter Passing, and the specific Common CrossRef Parameter Set.)
The DOI System proxy server accepts a resolution request in the form of an OpenURL. For example:
http://dx.doi.org/openurl?url_ver=z39.88-2003&rfr_id=ori:rid:crossref.org&rft_id=doi:10.1256/003590&rfr_dat=cr_setver%3d01%26cr_pub%3dSource%20Publisher%26cr_work%3dSource%20Journal%20Title%26cr_src%3dSRC-NAME
would be recognized by the proxy server as a Parameter Passing request. It will resolve the DOI name, and then check the domain of the URL against an 'opt-in' list that identifies organizations participating in Parameter Passing.
If the URL is in the opt-in list, the proxy server will construct a new URL as follows:
- Replace the registered URL's domain name and/or port number with different values, if replacements are specified in the opt-in list.
- Move all the parameters from the in-bound link to the out-bound link, except for the rft_dat parameter.
- For the rft_dat parameter, if the registered URL is an OpenURL, move the rft_dat parameter to the out-bound link. If the registered URL is not already in OpenURL format, hex-encode (percent-encode) the entire query string in the URL and place it into the out-bound link as the value of the rft_dat parameter.
The referent is assumed to have implemented a service capable of using the nested parameters. The assumption is that by agreeing to participate in Parameter Passing, a publisher will accept any and all parameters identified in the Common CrossRef Parameter Set. Changes to parameters resulting from changes to the OpenURL format, or from changes in requirements of Parameter Passing participants, will be noted in subsequent versions of "Parameter Passing Via the DOI System Proxy".
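The construction of the out-bound link can be sketched in Python. This is one reading of the steps listed above, not the proxy's implementation. It assumes that a registered URL counts as an OpenURL when it carries a 'url_ver' parameter, and that in that case the rft_dat value is taken from the registered URL. The opt-in list lookup and the domain or port replacement are left out, and the publisher URL is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_outbound(inbound_url, registered_url):
    """Join in-bound parameters with the registered URL (illustrative sketch)."""
    inbound = parse_qsl(urlsplit(inbound_url).query, keep_blank_values=True)
    reg = urlsplit(registered_url)
    reg_params = parse_qsl(reg.query, keep_blank_values=True)

    # Move every in-bound parameter except rft_dat.
    out = [(k, v) for k, v in inbound if k != "rft_dat"]

    if any(k == "url_ver" for k, _ in reg_params):
        # Assumption: registered URL is an OpenURL, so carry over its rft_dat.
        out += [(k, v) for k, v in reg_params if k == "rft_dat"]
    elif reg.query:
        # Otherwise nest the whole registered query string inside rft_dat;
        # urlencode() percent-encodes it when the link is serialized.
        out.append(("rft_dat", reg.query))

    return urlunsplit((reg.scheme, reg.netloc, reg.path, urlencode(out), ""))


inbound = ("http://dx.doi.org/openurl?url_ver=z39.88-2003"
           "&rfr_id=ori:rid:crossref.org&rft_id=doi:10.1256/003590")
registered = "http://publisher.example.com/article?id=42&view=full"  # hypothetical
print(build_outbound(inbound, registered))
```

Nesting the registered query string as a single encoded value keeps its parameter names from colliding with the OpenURL keys moved over from the in-bound link.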
Resolution of Multiple URLs
Using the 10320/loc Handle Type
One of the primary uses of the DOI System Proxy Server, or a web browser plug-in, is to resolve a DOI name (handle) to get a URL for a resource. For DOI names with multiple URL values, the proxy servers (at http://dx.doi.org, and also the one at http://hdl.handle.net) simply select the first URL value in the list of values returned by the DOI name resolution. Because the order of that list is nondeterministic, there is no intelligent selection of a URL to which the client would be redirected. To improve the selection of specific resource URLs from handles and DOI names that contain multiple URLs, and to add features to the handle-to-URL resolution process, the 10320/loc handle value type was developed.
Every handle, and thus every DOI name, has a set of values assigned to it, and each of those values has a type that defines the syntax and semantics of the data. Some of the typed values are for administration: owner or creation date. The others are for client use: URL strings or email addresses, or complex data types such as binary data, XML code, or other handles.
To avoid conflicts for clients if users assign types that are not registered and recognized across the user community, types are being assigned their own handles so that they can be defined and registered in the Handle System, a process that is currently under development. The prefix '10320', an arbitrary five-digit string, has been set aside by the Handle System administrator for identifying handle types. For type 10320/loc, the suffix 'loc' is simply shorthand for location.
Type 10320/loc specifies an XML-formatted handle value that contains a list of locations. Each location has a set of associated attributes that help determine if or when that location is used. The overall list of locations can include hints for how the resolving client should select a location, including an ordered set of selection methods. The proxy servers (or any other resolution client) can apply each known selection method, in order, to choose a location based on the resolver's context (the HTTP request in the case of the Proxy Server) and the attributes of each location.
The attributes for the set of locations, as well as each location entry in the set, are open-ended to allow for future capabilities to be added in a backwards-compatible way. A small number of attributes have been defined as "standard" that all resolvers should understand.
At the top level of the XML structure are the following defined attributes:
The chooseby attribute identifies a comma-delimited list of selection methods. If no chooseby attribute is specified then the default (currently "locatt,country,weighted") is assumed.
For each location the following attributes are defined:
href: The URL for the location.
weight: The weight (from zero to one) that should apply to this location when performing a random selection. Setting the weight attribute to zero results in the location not being selected unless a) it is explicitly referenced by another attribute; b) there are no other suitable locations; or c) the location is selected based on one of the other selection methods, such as country or language. If a location has no weight attribute then it is assumed to have a weight of one.
The currently defined selection methods are:
locatt: Selects only locations from an attribute passed in the Proxy/DOI name-URI link. If someone constructs a link as doi:10.123/456?locatt=id:1 then the resolver will return the locations that have an "id" attribute of 1 (i.e., the second location in the resolution example below).
country: Selects only locations that have a 'country' attribute matching the country of the client. If no matching locations are found then this selects locations that have no country attribute (i.e., not a mismatch). The http://hdl.handle.net and http://dx.doi.org Proxies determine the country of the client using a GeoIP
weighted: Selects a single location based on a random choice. The Proxy will observe the 'weight' attribute for each location, which should be a floating point non-negative number. The weighting allows for a very basic load balancing, but is also a way to ensure that some locations can only be addressed directly (for example by country or locatt/attributes). If the weighted selection method is applied to locations that all have non-positive weights, then this selects one of the remaining locations randomly while disregarding location weights.
The Proxy will iterate over the known selection methods, in order, until a single location has been selected. After each iteration the Proxy will take one of four steps:
- if there is only one remaining location element, it is returned as a redirect;
- if there are no remaining location elements, the Proxy reverts to the location elements as they were before the last method was applied;
- if there are multiple location elements the Proxy will apply the remaining selection methods to those locations;
- if there are no more selection methods to try, the weighted random selection method is applied, which is guaranteed to return a single location. In a sense, the weighted random is always the "fallback".
For references to DOI name 10.123/456, with a value type 10320/loc that has this list of location attributes:
<location id="0" href="http://uk.example.com/" country="gb" weight="0" />
<location id="1" href="http://www1.example.com/" weight="1" />
<location id="2" href="http://www2.example.com/" weight="1" />
the following selections could be made:
Reference: 10.123/456 from a client located in the UK
Result: The "country" selection method selects the first location based on the 'country' attribute of the first location and the client's position.
Reference: 10.123/456 from a client located outside the UK
Result: The "country" selection method removes the first location from consideration based on its 'country' attribute and chooses one of the last two locations using the "weighted" random selection method.
Result: The second location is used based on the "locatt" selection method and the 'id' attribute.
Result: The first location is used based on the "locatt" selection method and the 'id' attribute. The resolver never gets to the "country" selection method as the "locatt" selection method resulted in only a single matching location.
Result: The first location is used based on the "locatt" selection method and the 'country' attribute.
Result: The "country" selection method removes the first location from consideration based on its 'country' attribute, finds no US-specific location, and chooses one of the last two locations using the "weighted" random selection method.
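The selection procedure and the results above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the Proxy's code: the wrapper element is assumed to be named 'locations', the client context is a plain dictionary standing in for the HTTP request, and the locatt values passed in the usage lines follow the doi:10.123/456?locatt=id:1 form described earlier.

```python
import random
import xml.etree.ElementTree as ET

# Sample locations from this document; the <locations> wrapper is assumed.
XML = """<locations>
<location id="0" href="http://uk.example.com/" country="gb" weight="0" />
<location id="1" href="http://www1.example.com/" weight="1" />
<location id="2" href="http://www2.example.com/" weight="1" />
</locations>"""


def by_locatt(locs, ctx):
    """Keep locations whose named attribute equals the value in 'name:value'."""
    if not ctx.get("locatt"):
        return locs
    name, _, value = ctx["locatt"].partition(":")
    return [loc for loc in locs if loc.get(name) == value]


def by_country(locs, ctx):
    """Prefer a country match, else locations with no country attribute."""
    matches = [loc for loc in locs if loc.get("country") == ctx.get("country")]
    return matches or [loc for loc in locs if loc.get("country") is None]


def by_weighted(locs, ctx):
    """Pick one location at random, honouring weights when any is positive."""
    weights = [max(0.0, float(loc.get("weight", "1"))) for loc in locs]
    if all(w == 0 for w in weights):
        return [random.choice(locs)]  # weights disregarded
    return random.choices(locs, weights=weights, k=1)


METHODS = {"locatt": by_locatt, "country": by_country, "weighted": by_weighted}


def select(xml_text, ctx):
    """Apply the chooseby methods in order until one location remains."""
    root = ET.fromstring(xml_text)
    remaining = root.findall("location")
    if not remaining:
        return None
    chooseby = root.get("chooseby", "locatt,country,weighted")
    for name in chooseby.split(","):
        method = METHODS.get(name.strip())
        if method is None:
            continue  # unknown selection methods are skipped
        narrowed = method(remaining, ctx)
        if narrowed:  # an empty result reverts to the previous set
            remaining = narrowed
        if len(remaining) == 1:
            return remaining[0].get("href")
    # Weighted random selection is always the fallback.
    return by_weighted(remaining, ctx)[0].get("href")


print(select(XML, {"country": "gb"}))                    # UK client
print(select(XML, {"country": "us"}))                    # one of www1 / www2
print(select(XML, {"country": "us", "locatt": "id:0"}))  # explicit attribute
```

Keeping each selection method as a separate function that narrows a list makes the "revert if empty" rule a single check in the loop, and leaves room for additional methods to be registered later.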
Specific Use Case: CrossRef
The DOI name 10.1177/1522162802239753 was assigned to an article in the journal Graft: Organ and Cell Transplantation, which has ceased publication. The DOI name was updated to point to two archiving services that offer the article.
A 10320/loc type containing the following information was added to the record, to be used by the Proxy for redirection:
<location id="1" cr_type="MR-LIST"
href="http://mr.crossref.org/iPage?doi=10.1177%2F1522162802239753" weight="1" />
<location id="2" cr_src="clockss_su" label="CLOCKSS_SU" cr_type="MR-LIST"
href="http://graft.edina.clockss.org/cgi/reprint/6/1/18" weight="0" />
<location id="3" cr_src="clockss_edina" label="CLOCKSS_Edina" cr_type="MR-LIST"
href="http://graft.edina.clockss.org/cgi/reprint/6/1/18" weight="0" />
The 'chooseby' methods (locatt,country,weighted) are the default set. In this example, the evaluation falls through the first two and the Proxy uses 'weighted' as the selection method. The first location (mr.crossref.org) wins with a weight of 1. The Proxy redirects to mr.crossref.org which in this example is a script on the CrossRef site that builds the page a user sees when resolving the DOI name in the form of:
The resulting page shows that two archive services offer the article for download. The 10320/loc data at id="2" and id="3" is used by the CrossRef script to display two sources from one of the services.
The general mechanism could be used in many different configurations, including building a link that specifies an attribute of one of the two locations as a parameter, in which case the user would simply be redirected there in the usual fashion, without being shown the CrossRef-built multi-resolution page. The original URL serves as a fallback for older proxies or plug-ins that do not understand 10320/loc.