
[Ref-Links] Re: DOIs used for reference linking



Greetings,

On 6 Apr 99, dsidman@wiley.com wrote:

I clipped the subject line a bit -- hope that doesn't mess with the  
archive, but it was getting unwieldy. I'd also like it if someone could tell  
me which forum, discuss-doi or ref-links, is most appropriate -- I have  
seen replies that only go to one or the other list and the discussion is  
somewhat fractured as a result. I suspect discuss-doi has a broader, more  
active audience, but perhaps this isn't of sufficiently broad interest?  
Anyway, this message is cc'ed to both lists.

> I'm chiming in here as one who has long wished for a DOI-based solution due
> to what I think are some inherent advantages. Let me assert some of these,
> then explain why I think that a central lookup database solves the
> reference linking problem in the simplest manner as well as the one most
> likely to succeed in practical terms:

I am glad that you chimed in - I think you have done a wonderful job of
summing up your viewpoint. I'll try to address your comments as best I can.

> The DOI obviates the need for URLs, formulas and algorithms (which are
> inherently publisher-specific and subject to very dynamic change over time)

This is a bit of a straw man. A publisher can just as well commit to
keeping its URLs, formulas, and algorithms stable, and publishers can even
agree on a common format for them. Because of the way articles are cited,
there is already convergence here, with many STM publishers introducing
link managers. Any interface for querying the DOI system by metadata
(which will be the same metadata that is used in the publisher links) will
have to have the same features as a publisher-based system.
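
To make that concrete, here is a rough Python sketch of the kind of
metadata-keyed query I have in mind. The host and parameter names are
invented for illustration; this is not our actual interface:

    from urllib.parse import urlencode

    def citation_link(journal, volume, firstpage):
        # The link manager behind this (invented) stable address
        # redirects to wherever the article currently lives, so the
        # query itself never needs to change. The parameters are
        # exactly the fields that appear in an ordinary citation.
        query = urlencode({"journal": journal,
                           "volume": volume,
                           "firstpage": firstpage})
        return "http://linkmanager.example.org/find?" + query

    # citation_link("Phys. Rev. D", "59", "072001") yields a URL; a
    # DOI metadata query would need the very same inputs to produce one.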

> because its very purpose is to provide one central level of indirection for
> same. I.e., the very creation and maintenance of DOIs in the DOI directory
> was designed to handle that problem once, in one place, under a neutral
> (non-profit) organization which publishers would commit to supporting, in a
> permanent manner, and based on an underlying technology which is as
> scalable as the Internet itself has proved to be (possibly because it was
> developed by the same people).

Right, but I think publishers need a second level of indirection to handle  
their DOI -> URL resolution anyway (see below).

> Because of this, the DOI is all that any person, system, or embedded
> reference needs to know to create permanent, reliable, accurate links.
> He/she/it will always be able to get from the DOI to the (current)
> publisher of the content, even if the original publisher has been bought by
> another, or has moved the content to a different server, or has otherwise
> changed the way they assign URLs. This alone simplifies reference linking
> tremendously, because it avoids the need for vast numbers of independent
> systems to be updated continually, or for even vaster numbers of embedded
> references (within other content itself) to have to be actively maintained.

> The only missing link so far in the scenario I've described is:  "How does 
> anyone FIND OUT the DOI for a given piece of content?" I.e., once they know
> the DOI they're home free, but how do they discover it? This is where the
> DOI metadata database comes in. If in the same step in which publishers
> register their DOIs, they also register a minimal amount of metadata about
> the objects, then a simple lookup database would exist which would allow
> people and systems to look up the DOIs based on a simple query.

The "only missing link" is an understatement from my point of view. The  
resolution of metadata -> URL is the whole shebang. What I tried to  
highlight was that the metadata is central to the way in which articles are  
cited. I do not beleive that researchers will adopt DOIs (maybe I am wrong)  
for use in citations and any interface for linking is necessarily tied quite  
tightly to the metadata that is typically cited. Once you have an interface  
based on the metadata, the DOI becomes superfluous. Even if a journal moves  
among publishers, the metadata used to cite a previously published article  
does not change.
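
To put the "superfluous" point in code (everything here is stubbed out;
the DOI and URLs are invented):

    # DOI route: two per-article tables, both maintained forever.
    metadata_to_doi = {("Phys. Rev. D", "59", "072001"):
                       "10.9999/prd.59.072001"}          # invented DOI
    doi_to_url = {"10.9999/prd.59.072001":
                  "http://publisher.example/prd/59/072001"}

    def via_doi(journal, volume, page):
        doi = metadata_to_doi[(journal, volume, page)]   # lookup 1
        return doi_to_url[doi]                           # lookup 2

    # Metadata route: one per-journal template, no per-article records.
    templates = {"Phys. Rev. D":
                 "http://publisher.example/prd/{volume}/{page}"}

    def via_metadata(journal, volume, page):
        return templates[journal].format(volume=volume, page=page)

    # Both calls return the same URL; the DOI is a middleman whose
    # table must nonetheless be kept accurate for every article.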

> One might argue that maintenance of a central database like this is
> unlikely to be supported faithfully, but if doing so is part and parcel of
> the registration/maintenance of the DOIs themselves, and can be executed in
> the same operational process, quality-assured in the same pass, etc., then
> I think it has an excellent chance of succeeding.

No, I don't agree. The registration process might impose some decent
quality control, but the problems that matter come after registration. The
most important are maintaining URLs as publishers move them around
(mitigated greatly if a publisher creates a link manager with indirection)
and keeping up as journals move from publisher to publisher. We are
talking about having to update hundreds of thousands of records and
maintain them faithfully, forever.

> Furthermore, publishers
> will have this metadata available for registration anyway, because
> internally they will need it to tag and manage their own DOIs, prior to
> registering the DOIs in the DOI directory.

This is circular. If I don't use DOIs, I don't need to maintain DOIs in my
database. But I do, as a matter of my normal course of business, maintain
the other metadata in my centralized database. Requiring publishers to
carry DOIs around is one of the burdens that hurts scalability: all of a
sudden every publisher has to maintain DOIs internally and maintain them
on a centralized server. This is a whole new business expense. Maintaining
an SLinkS-like database (updating it as journals move among publishers) is
much easier and doesn't require any real investment in resources.

>  How else could they control
> (internally) what their various DOIs stand for, who (internally) are the
> various content "owners" responsible for keeping the corresponding URLs up
> to date, etc.?

If I don't use DOIs, I don't have to control them :^). Our link manager,
with its own level of indirection, is tightly coupled to our manuscript
database, giving us enormous flexibility in handling linking. Such
flexibility can't be achieved in a centralized setting.

> In any case, all "link processor" type approaches still need maintenance in
> a central place, but in this case it's maintenance of a set of programs,
> rules and algorithms. As hard as it might be to maintain a central
> database, I think it's much harder to maintain a central application, or
> set of algorithms. This may seem simple at first on the assumption that
> citation data can be converted into URLs, but in the real world I think
> this will be much more fragile - especially over time, as all the
> individual publishers' rules change dynamically. For example, this might
> require that the central algorithm not only needs to keep up with the new
> rules, but also to keep track of considerations like: "Well, from March
> 2000 to July 2001, Publisher X used such-and-such a URL-construction
> algorithm, but then changed it to such-and-such other algorithm in July
> 2001 but only for its genetics journals, which in January 2002 it then sold
> to Publisher Y, so all references after that date can only be pointed to
> based on the new publisher's algorithm at that time..." To try and
> construct reliable references based on this kind of consideration would be
> extremely difficult.

This is a straw man too. If a publisher adopts a link manager with
indirection and a consistent URL scheme, none of these issues will really
crop up. Only moves to new publishers with new templates are problematic,
and it is easy to update the SLinkS metadata for something like that: just
change the owner and the template. Linkers would periodically check the
SLinkS metadata and update accordingly.
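
For example (the field names are my own shorthand, not necessarily the
actual SLinkS schema, and the journal and URLs are invented):

    # One SLinkS-style record per journal. A sale means editing two
    # fields here, not hundreds of thousands of per-article records.
    record = {
        "journal":  "Journal of Genetics X",
        "owner":    "Publisher X",
        "template": "http://pubx.example/jgx/{volume}/{page}",
    }

    # After the hypothetical January 2002 sale:
    record["owner"] = "Publisher Y"
    record["template"] = "http://puby.example/jgx/{volume}/{page}"

    # Any linker that refreshes this record on its next periodic
    # pass now builds correct links for all articles, old and new.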

To repeat, any publisher who forgoes indirection in their own URLs is
doomed to a maintenance nightmare. Here is an example: the APS, unlike
many publishers, uses a varied set of platforms to deliver our online
content. We use AIP's OJPS, we deliver some journals from here at Ridge,
we have RMP at HighWire, and we have PROLA (our 1985-1996 Phys. Rev.
Online Archive) in house as well. Recently we moved PRD from Ridge to
OJPS; no links had to be changed, only the link manager had to be updated
(a one-line change). Or we may move some current content in OJPS or
HighWire into PROLA in a sliding window. Again, no URLs change; it is a
one-entry change in a link-manager table. As we move PROLA back in time
(ultimately to 1893), the link manager will be updated and the URLs will
all stay the same. We don't have to create hundreds of thousands of DOIs
and we don't have to update central servers.

Another example: e-first publication. Articles are posted every day in
PRD. We don't have to update a centralized server before people can start
linking. Even with DOIs, we would still need a way to let people know that
there are now thousands of additional DOIs available for linking, which is
equivalent to updating SLinkS-type metadata.
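
The indirection in the link manager amounts to little more than this
(the platforms are ours; the table and base URLs are illustrative):

    # Journal -> platform table inside the link manager. Published
    # links always hit the link manager first, so a move is the one
    # entry changed at the bottom; no URL anyone holds ever changes.
    platform_for = {"PRD": "Ridge",
                    "RMP": "HighWire",
                    "PRL": "OJPS"}
    base_url = {"Ridge":    "http://ridge.example/",
                "OJPS":     "http://ojps.example/",
                "HighWire": "http://highwire.example/"}

    def resolve(journal, volume, page):
        base = base_url[platform_for[journal]]
        return "{0}{1}/{2}/{3}".format(base, journal.lower(),
                                       volume, page)

    # The PRD move from Ridge to OJPS, in its entirety:
    platform_for["PRD"] = "OJPS"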

I can't really emphasize enough how easy it is to link once a link manager
is in place. Library and A&I services such as CERN, SLAC's SPIRES, the
University of California's Melvyl, and ISI have all been able to trivially
create links to our articles without having to maintain any additional
metadata or do lookups.

> To me, then, a database approach, coupled with the DOI as a (permanent)
> identifier-plus-routing-mechanism, and operated under a publisher-neutral 
> framework such as the IDF, is the most elegantly-engineered and
> likely-to-succeed approach, with the fewest "moving parts," and the least 
> complicated business issues.  (The latter alone could thwart any solution 
> even if it did "work" in a technical sense.)  I think that the only
> challenges (not to underestimate them!) are:  1) for the IDF to make real, 
> concrete progress on actually building a database that delivers these
> benefits in a very near-term timeframe; 2) to keep everyone in the IDF
> community clearly focused on why these goals will benefit the entire
> community of users, libraries, A&I services, etc. as well as other
> publishers - so that parochial business interests don't undermine something
> that I truly believe would be a win/win for everyone.

Also needed are a well-defined (stable and robust) interface for mapping
cited metadata to DOIs and a way to query what content a publisher has
available (which changes constantly over time). Both of these requirements
mean implementing half of something like SLinkS. By going the extra step
of returning publisher-specific URL templates based on metadata, we would
avoid having to maintain, forever, tables of DOIs. Your "fewest moving
parts" isn't really a global viewpoint. For just the linking part it may
be true: you wouldn't have to consult an SLinkS server to decide how to
create a link. But you do have many more moving parts when every publisher
has to maintain the mapping of citation metadata to DOI, both for their
own content and for the content of other publishers.
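
The holdings half could be as simple as a feed that linkers poll
periodically; the structure below is invented for illustration:

    # A publisher-published holdings list: which journals, what
    # coverage, which template. Polling this is all a linker needs
    # to learn that thousands of new e-first articles are linkable.
    holdings = [
        {"journal": "PRD", "coverage": "1998-07/current",
         "template": "http://ojps.example/prd/{volume}/{page}"},
        {"journal": "RMP", "coverage": "1993-01/current",
         "template": "http://highwire.example/rmp/{volume}/{page}"},
    ]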

Best Regards,
Mark

Mark Doyle
Research and Development
The American Physical Society
doyle@aps.org


------------------------------------------------------
Ref-Links maillist  -  Ref-Links@doi.org
http://www.doi.org/mailman/listinfo/ref-links