[Metadata] Re: [Discuss-DOI] Reference Linking: A Note on Syntax




>Appendix 2 of the paper "DOIs used for reference linking" introduces a
>syntactic convention which purports to simplify the use of DOIs. Leaving aside
>the thornier issue of whether we really need to assign DOIs to Creations (or
>Works) - virtual things (ghosts?) which have no tangible existence and hence
>cannot be experienced/consumed/enjoyed or in any sense known, I would like to
>suggest that the proposed syntax is imperfectly conceived.
>
>What we are trying to implement with the DOI is a 21st century identifier (and
>beyond?). Instead we seem to be harking back to the baroque majesty of the SICI
>code. Why W/P/D/R as a type code? Why the presumption of English in a URN? Why
>capitalized? (I know that the Handle system may be indifferent to case but the
>DOI is surely not limited to the Handle technology. And anyway capitals
>themselves are an older technology superseded by the lowercase, cursive style -
>note also that the default case on most keyboards is lowercase.) Why the
>addition of a pair of parentheses? One (or none) is sufficient. We already have
>the slash as a delimiter between prefix and suffix.
>
>I would further suggest that if we need to inspect the identifier it will be at
>the machine level. No user is going to gaze at the DOI string to elicit
>semantic evidence. If we really do need to incorporate this inline intelligence
>then a single digit will suffice. (We have anyway always loosely talked about
>the DOI as a "number".) A digit would also be kinder on I18N. And a digit lends
>itself more readily to extension as more categories may be conceptualized later.
>(Of course, maintaining this intelligence in the associated metadata is the more
>obvious route. Two years ago we didn't know about Works and Manifestations. Two
>years hence, what other base types will we have discovered? Metadata can always
>be augmented, the persistent identifier - the DOI - never.)
>
>For background it may be useful to consider Academic Press's experience over 2
>years with the DOI, which has been to decisively reject any intelligence in the
>DOI suffix and to focus instead on metadata. In particular, the SICI string was
>discovered to be a non-viable identifier. For resolution discovery purposes it is
>flawed both semantically and syntactically.
>
>Semantically, the identifier carries its associated metadata inline and each
>and every piece of metadata must be known. A user cannot generate the string
>from standard bibliographic citations. To accommodate this shortcoming AP
>initially opted for minimal SICI codes where we retained only that minimal set
>of metadata that could be derived from a citation. The next (and final) step was
>to externalize the metadata. This allows us to make a citation match using only
>a subset of the associated metadata.
>
>Syntactically, the SICI is a disaster. While version 2 was standardized in 1996,
>it really belongs to an older time. It is based on ASCII. It is written in
>English. It is true that it can be transported via SMTP, but it requires hex
>encoding if used as a URI in HTTP, and it requires entifying if packaged within
>SGML/XML instances. The SICI is over-specialized.
>
>This has led AP to adopt their own production identifier as a viable suffix
>string, i.e.
>
>     10.1006/jmbi.1999.1234
>
>This is a robust identifier, primitive enough that it can survive in a wide
>range of environments without encoding. What it refers to will be evident from
>its usage context. We have accepted that resolution discovery must be metadata
>driven. The only intelligence in the "number" is that it is a URN (or will be
>when registered as a NID) and that should be sufficient.
>
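
To make the encoding point above concrete, here is a minimal Python sketch. It
assumes the standard library's urllib.parse.quote and xml.sax.saxutils.escape as
stand-ins for "hex encoding" in a URI and "entifying" in SGML/XML. The helper
name needs_encoding is hypothetical, and the SICI-style string is only an
illustrative example, not one taken from this thread. Python's quote is stricter
than the URI grammar for a few characters such as parentheses, but "<" and ">"
have to be escaped in any case.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def needs_encoding(identifier: str) -> dict:
    """Report whether an identifier changes when made URI-safe or XML-safe."""
    return {
        # Percent-encode everything except unreserved characters and "/".
        "uri": quote(identifier, safe="/") != identifier,
        # Replace &, < and > with XML entities.
        "xml": escape(identifier) != identifier,
    }


# The AP-style DOI quoted in the message above.
AP_DOI = "10.1006/jmbi.1999.1234"

# Illustrative SICI-style string (an assumption, used only for contrast).
SICI_STYLE = "0095-4403(199502/03)21:3<12:WATIIB>2.0.TX;2-J"

if __name__ == "__main__":
    for label, value in (("AP DOI", AP_DOI), ("SICI-style", SICI_STYLE)):
        print(label, needs_encoding(value))
```

Under these assumptions the AP-style string passes through both steps
unchanged, while the SICI-style string needs escaping in both.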

The form of Tony's suggestion seems fine to me: it's syntactic function that
counts, not the specific character, and I accept the 21st century logic.

I would disagree on one point: I think people will be looking intelligently at
DOIs on the screen. I think I recall Ed Pentz, in one argument for the current
syntax, saying just the opposite - that he couldn't imagine a situation where
people would not be looking at a screen and therefore be able to decide in
context what it is. I expect the eventual truth lies between the two.

Godfrey




>------------------------------------------------------
>Discuss-DOI maillist  -  Discuss-DOI@doi.org
>http://www.doi.org/mailman/listinfo/discuss-doi
>
>
Godfrey Rust
..........................................
Data Definitions
14 Gloucester Road
London W5 4JB
T (44) 181 567 1047
F (44) 181 579 0938
Mobile  07775 908398



------------------------------------------------------
Metadata maillist  -  Metadata@doi.org
http://www.doi.org/mailman/listinfo/metadata