Resolution of Multiple URLs
Using the 10320/loc Handle Type
One of the primary uses of the DOI® System Proxy Server
, or a web browser plug-in, is to resolve a DOI®
) to get a URL for a resource. For DOI names with multiple URL values, the proxy servers (at http://dx.doi.org, and also the one at http://hdl.handle.net) simply select the first URL value in the list of values returned by the DOI name resolution. Because the order of that list is nondeterministic, there is no intelligent selection of a URL to which the client would be redirected. To improve the selection of specific resource URLs from handles and DOI names that contain multiple URLs, and to add features to the handle-to-URL resolution process, the 10320/loc handle value type was developed.
Every handle, and thus every DOI name, has a set of values assigned to it, and each of those values has a type that defines the syntax and semantics of the data. Some of the typed values are for administration: owner or creation date. The others are for client use: URL strings or email addresses, or complex data types such as binary data, XML code, or other handles.
To avoid conflicts for clients if users assign types that are not registered and recognized across the user community, types are being assigned their own handles so that they can be defined and registered in the Handle System, a process that is currently under development. The prefix '10320', an arbitrary five digit string, has been set aside by the Handle System administrator for identifying handle types. For type 10320/loc, the suffix 'loc' is simply short hand for location.
Type 10320/loc specifies an XML-formatted handle value that contains a list of locations. Each location has a set of associated attributes that help determine if or when that location is used. The overall list of locations can include hints for how the resolving client should select a location, including an ordered set of selection methods. The proxy servers (or any other resolution client) can apply each known selection method, in order, to choose a location based on the resolver's context (the HTTP request in the case of the Proxy Server) and the attributes of each location.
The attributes for the set of locations, as well as each location entry in the set, are open-ended to allow for future capabilities to be added in a backwards-compatible way. A small number of attributes have been defined as "standard" that all resolvers should understand.
At the top level of the XML structure are the following defined attributes:
The chooseby attribute identifies a comma-delimited list of selection methods. If no chooseby attribute is specified then the default (currently "locatt,country,weighted") is assumed.
For each location the following attributes are defined:
The URL for the location.
The weight (from zero to one) that should apply to this location when performing a random selection. Setting the weight attribute to zero results in the location not being selected unless a) it is explicitly referenced by another attribute; b) there are no other suitable locations; or c) the location is selected based on one of the other selection methods, such as country or language. If a location has no weight attribute then it is assumed to have a weight of one.
The currently defined selection methods are:
Selects only locations from an attribute passed in the Proxy/DOI name-URI link. If someone constructs a link as doi:10.123/456?locatt=id:1 then the resolver will return the locations that have an "id" attribute of 1 (i.e., the second location in the resolution example below).
Selects only locations that have a 'country' attribute matching the country of the client. If no matching locations are found then this selects locations that have no country attribute (i.e., not a mismatch). The http://hdl.handle.net and http://dx.doi.org Proxies determine the country of the client using a GeoIP
Selects a single location based on a random choice. The Proxy will observe the 'weight' attribute for each location, which should be a floating point non-negative number. The weighting allows for a very basic load balancing, but is also a way to ensure that some locations can only be addressed directly (for example by country or locatt/attributes). If the weighted selection method is applied to locations that all have non-positive weights, then this selects one of the remaining locations randomly while disregarding location weights.
The Proxy will iterate over the known selection methods, in order, until a single location has been selected. After each iteration the Proxy will take one of four steps:
- if there is only one remaining location element, it is returned as a redirect;
- if there are no remaining location elements, the Proxy reverts to the location elements as they were before the last method was applied;
- if there are multiple location elements the Proxy will apply the remaining selection methods to those locations;
if there are no more selection methods to try, the weighted random selection method is applied, which is guaranteed to return a single location. In a sense, the weighted random is always the "fallback".
For references to DOI name 10.123/456, with a value type 10320/loc that has this list of location attributes:
<location id="0" href="http://uk.example.com/" country="gb" weight="0" />
<location id="1" href="http://www1.example.com/" weight="1" />
<location id="2" href="http://www2.example.com/" weight="1" />
the following selections could be made:
Reference: 10.123/456 from a client located in the UK
Result: The "country" selection method selects the first location based on the 'country' attribute of the first location and the client's position.
Reference: 10.123/456 from a client located outside the UK
Result: The "country" selection method removes the first location from consideration based on its 'country' attribute and chooses one of the last two locations using the "weighted" random selection method.
Result: The second location is used based on the "locatt" selection method and the 'id' attribute.
Result: The first location is used based on the "locatt" selection method and the 'id' attribute. The resolver never gets to the "country" selection method as the "locatt" selection method resulted in only a single matching location.
Result: The first location is used based on the "locatt" selection method and the 'country' attribute.
Result: The "country" selection method removes the first location from consideration based on its 'country' attribute, finds no US-specific location, and chooses one of the last two locations using the "weighted" random selection method.
Specific Use Case CrossRef
The DOI name 10.1177/1522162802239753, was assigned to an article in the journal Graft: Organ and Cell Transplantation
, which has ceased publication. The DOI name was updated to point to two archiving services that offer the article.
A 10320/loc type containing the following information was added to the record, to be used by the Proxy for redirection:
<location id="1" cr_type="MR-LIST" href="http://mr.crossref.org/iPage?doi=10.1177%2F1522162802239753" weight="1" />
<location id="2" cr_src="clockss_su" label="CLOCKSS_SU" cr_type="MR-LIST"
href="http://graft.edina.clockss.org/cgi/reprint/6/1/18" weight="0" />
<location id="3" cr_src="clockss_edina" label="CLOCKSS_Edina" cr_type="MR-LIST"
href="href="http://graft.edina.clockss.org/cgi/reprint/6/1/18" weight="0" />
The 'chooseby' attributes (locatt,country,weighted) are the default set. In this example, the evaluation falls through the first two and the Proxy uses 'weighted' as the selection criteria. The first location (mr.crossref.org) wins with a weight of 1. The Proxy redirects to mr.crossref.org which in this example is a script on the CrossRef site that builds the page a user sees when resolving the DOI name in the form of:
The resulting page shows that two archive services offer the article for download. The 10320/loc data at id="2" and id="3" is used by the CrossRef script to display two sources from one of the services.
The general mechanism could be used in many different configurations, including building a link that specified an attribute of one of the two locations as a parameter, in which case the user would simply be redirected there in the usual fashion, without being shown the CrossRef-built multi-resolution page. The original URL serves as a fall back for older proxies or plug-ins that don't understand 10320/loc.