
Re: [Ref-Links] Re: DOIs used for reference linking



This has been a very nice discussion - both interesting and useful. A
few assorted comments:

To say that DOIs are not likely to appear in citations, which I think is
what Mark Doyle said, and then to argue from that point that they would
be superfluous to reference linking seems a bit circular. You can
make different assumptions and reach different conclusions. Recall, as
an example, that the initial object of the DOI was not reference linking
but intellectual property. So assume, for the sake of argument, that the
DOI exists for reasons outside of reference linking. Does it then make
sense to also use it for reference linking?

I agree with Eric Hellman that the tough parts of this problem are
organizational and not technical. This is true, in my experience, across
most of the digital library problem set. It was for that reason that I
was cheered by the recent interchanges between OCLC and IDF and the
resultant membership of OCLC in the IDF.

I also think it would be useful, to the degree possible, to both push
the scenarios a few years into the future and pop the discussion up a
level or two. Anyone creating, maintaining, and making available on the
net large numbers of digital entities is going to have to have some kind
of coherent information management scheme in place. Given a decent set
of tools for accomplishing that, then I would say that the difference
between updating a few algorithms and updating a few hundred thousand
identifiers is zero, or at least transparent to the maintainer. The more
interesting technical issue at that point, I believe, will be to find
the best way to make that information accessible in an ad hoc fashion to
whoever needs it. Herbert Van de Sompel's paper on SFX, an updated
version of which will appear in D-Lib this month, provides a useful
perspective on all of this. But again, the organizational issues are
paramount.
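The point about indirection can be made concrete with a minimal sketch (all
identifiers and URLs below are invented for illustration): with one level of
indirection in place, moving content means updating one table entry, not the
thousands of links stored elsewhere.

```python
# Hypothetical sketch: documents store only stable identifiers; a small
# resolution table maps identifier prefixes to current locations.

# Thousands of documents would store only stable identifiers like this one.
stored_identifier = "10.1103/example.12345"

# The resolver table is the single point of maintenance.
resolver = {"10.1103": "http://publish.aps.org/abstract/"}

def resolve(identifier):
    """Turn a stable identifier into a current URL via the table."""
    prefix, _, suffix = identifier.partition("/")
    return resolver[prefix] + suffix

# When the content moves, one table entry changes; no stored link does.
resolver["10.1103"] = "http://ojps.aip.org/abstract/"
```

Whether the table holds a few prefixes or a few hundred thousand identifiers,
the maintenance burden falls on the resolver, not on the documents that cite.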

Finally, it may be a little misleading to always think in terms of large
publishers. We publish a single electronic journal and assign 4 or 5
identifiers a month. Given the vanishingly small barriers to entry for
electronic publishing, it seems to me that there will be a lot of this
in the future (although most people reading this list probably have a
better idea of this than I do).

Larry 

Mark Doyle wrote:
> 
> Greetings,
> 
> On 6 Apr 99, dsidman@wiley.com wrote:
> 
> I clipped the subject line a bit -- hope that doesn't mess with the
> archive, but it was getting unwieldy. I'd also like it if someone could tell
> me which forum, discuss-doi or ref-links, is most appropriate -- I have
> seen replies that only go to one or the other list and the discussion is
> somewhat fractured as a result. I suspect discuss-doi has a broader, more
> active audience, but perhaps this isn't of sufficiently broad interest?
> Anyway, this message is cc'ed to both lists.
> 
> > I'm chiming in here as one who has long wished for a DOI-based solution due
> > to what I think are some inherent advantages. Let me assert some of these,
> > then explain why I think that a central lookup database solves the
> > reference linking problem in the simplest manner as well as the one most
> > likely to succeed in practical terms:
> 
> I am glad that you chimed in - I think you have done a wonderful job of
> summing up your viewpoint. I'll try and address your comments as best I can.
> 
> > The DOI obviates the need for URLs, formulas and algorithms (which are
> > inherently publisher-specific and subject to very dynamic change over time)
> 
> This is a bit of a straw man. A publisher can just as well commit to
> keeping the URLs, formulas, and algorithms stable, and publishers can even
> agree to use a common format for them. Because of the way articles are
> cited, there is already convergence in this with the introduction of link
> managers by many STM publishers. Any interface to the DOI for querying by
> metadata (which will be the same metadata that is used in the publisher
> links) will have to have the same features as a publisher based system.
> 
> > because its very purpose is to provide one central level of indirection for
> > same. I.e., the very creation and maintenance of DOIs in the DOI directory
> > was designed to handle that problem once, in one place, under a neutral
> > (non-profit) organization which publishers would commit to supporting, in a
> > permanent manner, and based on an underlying technology which is as
> > scalable as the Internet itself has proved to be (possibly because it was
> > developed by the same people).
> 
> Right, but I think publishers need a second level of indirection to handle
> their DOI -> URL resolution anyway (see below).
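The two levels of indirection described here - a central directory that hands
off to a publisher's own link manager for the final redirect - might be
sketched as follows (the DOI and URLs are invented for illustration):

```python
# Hypothetical two-level resolution. Level 1 is central and rarely updated;
# level 2 is publisher-local and can be updated freely.

doi_directory = {  # level 1: central DOI directory
    "10.1103/PhysRevD.59.094001": "aps-link-manager",
}

link_managers = {  # level 2: each publisher's own resolver
    "aps-link-manager": {
        "10.1103/PhysRevD.59.094001":
            "http://ojps.aip.org/getabs/PRD/59/094001",
    },
}

def resolve(doi):
    """Central directory identifies the publisher's resolver,
    which produces the current URL."""
    manager = link_managers[doi_directory[doi]]
    return manager[doi]
```

When the publisher reorganizes its servers, only the level-2 table changes;
the central directory entry is untouched.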
> 
> > Because of this, the DOI is all that any person, system, or embedded
> > reference needs to know to create permanent, reliable, accurate links.
> > He/she/it will always be able to get from the DOI to the (current)
> > publisher of the content, even if the original publisher has been bought be
> > another, or has moved the content to a different server, or has otherwise
> > changed the way they assign URLs. This alone simplifies reference linking
> > tremendously, because it avoids the need for vast numbers of independent
> > systems to be updated continually, or for even vaster numbers of embedded
> > references (within other content itself) to have to be actively maintained.
> 
> > The only missing link so far in the scenario I've described is:  "How does
> > anyone FIND OUT the DOI for a given piece of content?" I.e., once they know
> > the DOI they're home free, but how do they discover it? This is where the
> > DOI metadata database comes in. If in the same step in which publishers
> > register their DOIs, they also register a minimal amount of metadata about
> > the objects, then a simple lookup database would exist which would allow
> > people and systems to look up the DOIs based on a simple query.
> 
> The "only missing link" is an understatement from my point of view. The
> resolution of metadata -> URL is the whole shebang. What I tried to
> highlight was that the metadata is central to the way in which articles are
> cited. I do not believe that researchers will adopt DOIs (maybe I am wrong)
> for use in citations and any interface for linking is necessarily tied quite
> tightly to the metadata that is typically cited. Once you have an interface
> based on the metadata, the DOI becomes superfluous. Even if a journal moves
> among publishers, the metadata used to cite a previously published article
> does not change.
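The argument that cited metadata alone suffices can be illustrated with a
small sketch (the journal template and URL are hypothetical): the fields
researchers already cite key the final URL directly, with no identifier
lookup step in between.

```python
# Hypothetical metadata-based linking: citation fields map straight to a URL
# through a per-journal template, so no separate identifier is consulted.

templates = {
    "Phys. Rev. D": "http://ojps.aip.org/PRD/v{volume}/p{page}",
}

def link_from_citation(journal, volume, page):
    """Build an article link from citation metadata alone."""
    return templates[journal].format(volume=volume, page=page)
```

Even if the journal later changes publishers, the citation metadata in old
references stays valid; only the template entry would need updating.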
> 
> > One might argue that maintenance of a central database like this is
> > unlikely to be supported faithfully, but if doing so is part and parcel of
> > the registration/maintenance of the DOIs themselves, and can be executed in
> > the same operational process, quality-assured in the same pass, etc., then
> > I think it has an excellent chance of succeeding.
> 
> No, I don't agree. The registration process might impose some decent
> quality control, but the important problems come after registration -
> most importantly, maintaining URLs as publishers move them around
> (mitigated greatly if a publisher creates a link manager with indirection)
> and as journals move from publisher to publisher. We are talking about
> having to update hundreds of thousands of records and maintain them
> faithfully, forever.
> 
> > Furthermore, publishers
> > will have this metadata available for registration anyway, because
> > internally they will need it to tag and manage their own DOIs, prior to
> > registering the DOIs in the DOI directory.
> 
> This is circular. If I don't use DOIs, I don't need to maintain the DOIs in
> my database. But I do, as a matter of my normal course of business,
> maintain the other metadata in my centralized database. Requiring publishers
> to carry around DOIs is one of the burdens that impacts scalability. All
> of a sudden every publisher has to maintain DOIs internally and maintain
> them on a centralized server. This is a whole new business expense.
> Maintaining an SLinkS-like database (updating it as journals move among
> publishers) is much easier and doesn't require any real investment in
> resources.
> 
> >  How else could they control
> > (internally) what their various DOIs stand for, who (internally) are the
> > various content "owners" responsible for keeping the corresponding URLs up
> > to date, etc.?
> 
> If I don't use DOIs, I don't have to control them :^). Our link manager,
> with its own level of indirection, is tightly coupled to our manuscript
> database giving us enormous flexibility in handling linking. Such
> flexibility can't be achieved in a centralized setting.
> 
> > In any case, all "link processor" type approaches still need maintenance in
> > a central place, but in this case it's maintenance of a set of programs,
> > rules and algorithms. As hard as it might be to maintain a central
> > database, I think it's much harder to maintain a central application, or
> > set of algorithms. This may seem simple at first on the assumption that
> > citation data can be converted into URLs, but in the real world I think
> > this will be much more fragile - especially over time, as all the
> > individual publishers' rules change dynamically. For example, this might
> > require that the central algorithm not only needs to keep up with the new
> > rules, but also to keep track of considerations like: "Well, from March
> > 2000 to July 2001, Publisher X used such-and-such a URL-construction
> > algorithm, but then changed it to such-and-such other algorithm in July
> > 2001 but only for its genetics journals, which in January 2002 it then sold
> > to Publisher Y, so all references after that date can only be pointed to
> > based on the new publisher's algorithm at that time..." To try and
> > construct reliable references based on this kind of consideration would be
> > extremely difficult.
> 
> This is a straw man too. If a publisher adopts a link manager with
> indirection and a consistent URL scheme, none of these issues will really
> crop up. Only moves to new publishers with new templates are problematic,
> but it is easy to update the SLinkS metadata for something like this. Just
> change the owner and the template. Linkers would periodically check the
> SLinkS metadata and update accordingly.
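The kind of SLinkS record update described above might look like this (field
names and URLs are hypothetical): a journal changing hands is a two-field
edit, and every linker picks it up on its next check.

```python
# Hypothetical SLinkS-style record: linkers build URLs from a template,
# so a journal moving to a new publisher is a two-field update.

record = {
    "journal": "J. Example Phys.",
    "owner": "Publisher X",
    "template": "http://x.example.com/{volume}/{page}",
}

def build_link(record, volume, page):
    """Construct an article URL from the journal's current template."""
    return record["template"].format(volume=volume, page=page)

# The journal is sold: change the owner and the template, nothing else.
record["owner"] = "Publisher Y"
record["template"] = "http://y.example.com/article/{volume}/{page}"
```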
> 
> To repeat, any publisher who forgoes indirection in their own URLs is
> doomed to a maintenance nightmare. Here is an example from our own
> experience: The APS, unlike
> many publishers, uses a varied set of platforms to deliver our online
> content. We use AIP's OJPS, we deliver some journals from here at Ridge, we
> have RMP at HighWire and we have PROLA (our 1985-1996 Phys. Rev. Online
> Archive) in house as well. Recently we moved PRD from Ridge to OJPS - no
> links had to be changed, only the link manager had to be updated (a one line
> change). Or, we may move some current content in OJPS or HighWire into
> PROLA in a sliding window. Again, no URLs change; it is a one-entry change
> in a link-manager table. As we move PROLA back in time (ultimately
> to 1893), the link manager will be updated and the URLs are all the same. We
> don't have to create hundreds of thousands of DOIs and we don't have to
> update central servers. Another example: e-first publication. Articles are
> posted every day in PRD. We don't have to update a centralized server so
> that people can start linking. Even with DOIs, we would still need a way
> to let people know that there are now thousands of additional DOIs
> available for linking, which is equivalent to updating SLinkS-type
> metadata.
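The one-entry change described for the PRD move might look like this in a
link-manager table (the table layout and URLs are hypothetical):

```python
# Hypothetical link-manager table: each journal maps to the platform that
# currently serves it. Moving a journal is a one-entry change; no URL that
# anyone has stored needs to be touched.

platforms = {
    "ojps":     "http://ojps.aip.org/",
    "ridge":    "http://publish.aps.org/",
    "highwire": "http://rmp.highwire.org/",
}

journal_platform = {"PRD": "ridge", "RMP": "highwire"}

def article_url(journal, article_id):
    """Redirect an incoming stable link to the journal's current platform."""
    return platforms[journal_platform[journal]] + journal + "/" + article_id

# The PRD move from Ridge to OJPS described above: one entry changes.
journal_platform["PRD"] = "ojps"
```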
> 
> I can't really emphasize enough how easy it is to link once a link manager
> is in place. Library and A&I services such as CERN, SLAC's SPIRES, U. of
> California's Melvyl, and ISI have all been able to trivially create links
> to our
> articles without having to maintain any additional metadata or do lookups.
> 
> > To me, then, a database approach, coupled with the DOI as a (permanent)
> > identifier-plus-routing-mechanism, and operated under a publisher-neutral
> > framework such as the IDF, is the most elegantly-engineered and
> > likely-to-succeed approach, with the fewest "moving parts," and the least
> > complicated business issues.  (The latter alone could thwart any solution
> > even if it did "work" in a technical sense.)  I think that the only
> > challenges (not to underestimate them!) are:  1) for the IDF to make real,
> > concrete progress on actually building a database that delivers these
> > benefits in a very near-term timeframe; 2) to keep everyone in the IDF
> > community clearly focused on why these goals will benefit the entire
> > community of users, libraries, A&I services, etc. as well as other
> > publishers - so that parochial business interests don't undermine something
> > that I truly believe would be a win/win for everyone.
> 
> Also needed are a well-defined (stable and robust) interface for mapping
> cited metadata to DOIs and a way to query what content a publisher has
> available (constantly changing with time). Both of these requirements mean
> implementing half of something like SLinkS. By going the extra step of
> returning publisher-specific URL templates based on metadata, we would avoid
> having to maintain, forever, tables of DOIs. Your "fewer moving parts"
> claim doesn't hold from a global viewpoint. For just the linking part it
> may be true -
> you wouldn't have to deal with an SLinkS server to decide how to create a
> link. But you do have many more moving parts with every publisher needing to
> maintain the mapping information of citation metadata to DOI, both for
> their own content and for the content of other publishers.
> 
> Best Regards,
> Mark
> 
> Mark Doyle
> Research and Development
> The American Physical Society
> doyle@aps.org
> 
> ------------------------------------------------------
> Ref-Links maillist  -  Ref-Links@doi.org
> http://www.doi.org/mailman/listinfo/ref-links
