
Re: [Metadata] Draft DOI Metadata Principles



Thanks for this Norman. I have a number of comments (all in my personal
capacity, and not to be taken as a statement of Wiley's position):

1) In general, I have to say that I think that these proposals go way over
the top and are more likely to frighten off prospective DOI registrants
rather than facilitate take-up. We seem to have gone from an agreement that
it would be useful to be able to associate information about an object with
that object's identifier to a quite prescriptive requirement about form and
content of such metadata. I assume that it should still be sufficient for a
DOI registrant to do nothing more than deposit a DOI with a URL that will
take the user to whatever response screen the registrant decides is
appropriate. *Requiring* a load of metadata (even if this is a soi-disant
core, key or kernel set) to be lodged at the same time seems to me to be
presumptuous. Imagine if the ISBN system had required a full set of
cataloguing data to be lodged at the same time as the number was assigned.
(Yes, I know that the DOI goes much further than the ISBN, but one of the
reasons for the ISBN's success was its simplicity, and I believe we are
heading for premature complexity with this proposal.) At a time when we're
already struggling to get widespread take-up within the publishing
community (let alone all the other content-provider communities), this
would appear to me to be raising the entry barrier considerably.

2) I don't quite get the "metadata is collected once" mantra (INDECS's
"principle of Original Authority"). The metadata is provided by the
registrant at the same time the DOI is deposited. (I have used "deposit"
here rather than "registration" or "allocation", in line with the latest
DOI syntax recommendations.) Authority data is also deposited, giving
information about the provenance of the metadata. The metadata can be
updated (see section 5.3.1: "later versions overwrite earlier versions"),
so how does this tie in with "do it once, do it right"? Couldn't there also
be services that add value to the registrant's metadata by expanding it,
refining it, or improving it (just like the secondary publishers do today)?
If these services do a better job than the registrants, many users will
actually get to the objects of their desire via this route rather than
relying on the registrants' metadata which across the board is frankly
likely to be of varying quality. This proposal seems to me to place too
great a burden on the registrant's metadata. (It may seem intuitive that the
registrant is in the best position to attach accurate and well-formed
metadata, but I have my doubts.)

3) I think the issue of what rights pertain to the metadata itself is an
interesting one. Some metadata may be "creative", e.g. the Description, but
other metadata surely can't be, e.g. Extent. In general, I'm not keen on
the idea that rights holders own not only the intellectual content but also
any descriptions of that content.

4) If each DOI is to be "accompanied by a declaration of descriptive
metadata" (section 5.1.1), won't this militate against granularity? It
certainly ups the ante.

5) I think there's potential confusion of the terms "key" and "kernel".
Section 5.3.1 is entitled "Key (mandatory) elements", and the text goes on
to say "Of these fourteen [key elements], seven form the kernel (mandatory)
metadata set". In the fuller descriptions of the set (section 5.2), the
seven kernel elements are described as "key". So, the initial paragraph
implies that there are 14 key and 7 kernel elements, but from this point
onwards, "key" is used to mean "kernel".

I'd also point out that the Event element (which is the 7th kernel element)
itself contains three mandatory subelements.

6) The whole scheme depends heavily upon the DOIGenre element. This is a
mandatory element which I think will be very powerful for giving some real
direction to which particular elements should be identified for which
particular genres. Until we have a set of DOIGenres established, we can't
really get going (since, apart from any guidance the genres will give us,
we won't even be able to indicate the DOIGenre until we have a list of
possibilities to choose from).

I realise that this is indicated in the proposal as work to be done, and
I'd like to know far more about the likely timetable and process for the
establishment of DOIGenres. (Remember that Dublin Core came in for a lot of
criticism for not having clearly established due process and ownership
issues for the development of DC Simple and, even more so, DC Qualified.)
How often might the DOIGenre attributes be updated? By what process? With
what sort of consensus or other authority? How will the registrant keep on
top of an evolving set of recommendations?

7) I am a great supporter of the work that INDECS is doing to come up with
an overarching data model that can be used to describe anything anyone
might want to say about anything, but I don't believe that this grand
concept is the most appropriate genitor of the DOI metadata set, which
would seem to me to benefit from a lower-level conceptual approach that can
feed into INDECS's Weltanschauung. For example, we might have a DOIGenre
for STM journals which specifies that an article's author(s) is a key piece
of metadata (although not mandatory since not all articles are by-lined).
Let's also say that this can be identified by using Dublin Core Simple's
(version 1.0) DC.Creator element tag. This fits into the overarching data
model of Events and Agents in the sense that DC.Creator = J. Smith can
cross-map to INDECS's Event.Creation, Agent = J. Smith (please ignore any
syntax issues regarding use of punctuation here), but in my view a set of
metadata that is explicitly written in a simpler form than INDECS, but
which is compliant with the INDECS data model, stands more chance of
acceptance.

If INDECS is the equivalent of Constitutional Law, the DOI metadata set
should be more like a bylaw (or, rather, each DOIGenre is a bylaw); a bylaw
has to fit in with the Constitution, but it isn't written in the same
high-level style.
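To make the DC.Creator example concrete, here is a minimal Python sketch of how a flat, Dublin Core Simple-style record (the "bylaw" form a registrant would actually deposit) could be cross-mapped mechanically onto an INDECS-style Event/Agent structure (the "constitutional" form). The `Event` class, the mapping table and the `crosswalk` function are hypothetical illustrations, not the draft's schema; in particular, mapping DC.Publisher to a Dissemination event is an assumption made here.

```python
# Hypothetical crosswalk from a flat, Dublin Core Simple-style record to an
# INDECS-style Event/Agent structure. Element names and the mapping table are
# illustrative assumptions, not the draft's actual schema.
from dataclasses import dataclass, field


@dataclass
class Event:
    event_type: str                                   # e.g. "Creation"
    agents: list[str] = field(default_factory=list)   # agents in this event


# Assumed mapping from simple element tags to INDECS-style event types.
# DC.Publisher -> "Dissemination" is an illustrative guess.
DC_TO_INDECS_EVENT = {
    "DC.Creator": "Creation",
    "DC.Publisher": "Dissemination",
}


def crosswalk(record: list[tuple[str, str]]) -> list[Event]:
    """Group (tag, value) pairs into one Event per mapped event type."""
    events: dict[str, Event] = {}
    for tag, value in record:
        event_type = DC_TO_INDECS_EVENT.get(tag)
        if event_type is None:
            continue  # e.g. DC.Title is not treated as an event in this sketch
        events.setdefault(event_type, Event(event_type)).agents.append(value)
    return list(events.values())


record = [
    ("DC.Creator", "J. Smith"),
    ("DC.Title", "An Example Article"),
]
print(crosswalk(record))
# [Event(event_type='Creation', agents=['J. Smith'])]
```

The point of the sketch is that the registrant only ever writes the simple form; compliance with the richer model is a property of the mapping, not of what the registrant has to supply.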

8) As far as the DOI/INDECS elements themselves go, I'm not convinced that
Context is a key element: it looks more like a nice-to-have than a
have-to-have to me.

The whole set reminds me a bit of those homunculi you see in medical
textbooks where the body is redrawn to show what it would look like if its
proportions were based on sensory areas of the brain, say. That is, it has
a sort of distorted feel about it, with all this meaty stuff down at the
bottom lumped into the Event element. That may be OK from a high-level
conceptual point of view, but again, I think this is the wrong conceptual
entry point for the actual form of the DOI metadata.

9) Under "Identifier", the "declarer's own internal reference" may be used
and "the namespace of the identifier must be declared". How is the
internal-reference namespace declared?

10) Section 5.2.4 Title: a key element, but surely it's possible for a
digital object to have an identifier but no title?  What if I wanted to
give a DOI to an unnumbered and uncaptioned figure in a book? (You may ask
how anyone would be able to find such a figure, but I may want to embed a
DOI in a button next to the figure in an electronic version of the book,
whereby clicking the DOI would take the user off to a response screen which
could give more information about it.)

11) Section 5.2.7 Origination: I can see the Origination/Creation Link
elements leading us into the same sort of discussions that we get with DC
Source/Relation.

Also, are all the references in the attribute definitions to "of its own
type" really correct? A Physical Manifestation is specifically a different
Type from a Digital Manifestation; if you OCR'd a book, you wouldn't have a
Replica because of the lack of 100% accuracy, but wouldn't you have a
digital Version of a physical manifestation (i.e. you will have transformed
the Type)? Am I confusing "type" with "Type"? Am I confusing "Origination"
with "Creation Link"?

12) Section 5.2.9 Form: OK to have "genre" here? Could it get confused with
DOIGenre? I know the examples aren't meant to be fully worked out, but I
find it a bit confusing that DOIGenre is sometimes used in the same way as
DC.Type (e.g. a DOIGenre value of "Textual Work") and sometimes not (e.g.
a DOIGenre value of "Book"). Presumably, it's conceivable that a specific
type of work such as "Novel" could be a DOIGenre in its own right rather
than being a value of a DOI/INDECS Form attribute?

13) Section 5.2.12 Subject: Why isn't this a Descriptor attribute instead
of an element in its own right?

The table of qualifiers looks like a real hostage to fortune, threatening to
turn poor old registrants into fully-fledged cataloguers! I can just hear
the debates about when something is a Topic/Concept subject qualifier and
when it is a Creation subject qualifier: I thought we were trying to keep
this simple? (Yes, yes, I know it's all supposed to be no more granular
than is judged to be necessary using the principle of functional
granularity, but this document sets off so many hares running that I think
it needs reining in.)

14) Section 5.2.13 Event: the fourth paragraph says that "only the Agent
and Role elements are mandatory", but Table 1 indicates that Event Type is
mandatory too.

For a document that generally tries to be non-prescriptive in terms of data
*content*, I was surprised to see such a forceful declaration that "all
primary creators must be declared, even if there are 40 of them". In the
context in which this statement was made (journal articles), this just
ain't going to happen. Various citation styles have various ways of dealing
with this (some abbreviate to first three authors then et al., some to
first six authors then et al., and some first three authors plus last
author) - whatever, the scientific community recognises that you don't have
to rigorously list every single author of a multi-author paper. (Does
anybody out there know what the record number of authors for an article is?
I'm sure I've heard, apocryphally, that it is over 100.) The rights
argument is misplaced in this context since it is highly unlikely that
*any*, let alone *all*, of these authors will have retained rights (and
indeed this example shows what the problem would be if they did). I believe
anyway that this would be an issue for the DOIGenre Working Group that
looks at (STM) journals to advise on. And no, it's not the same as saying
that only Lennon wrote the Beatles songs; it's more like the equivalent of
a group composition where the individuals are not specifically identified.
(I know I've got examples in my record collection, but none spring to mind
- maybe if the London Philharmonic were to write a jam together!) Then,
something like Smith et al. may be recorded as the Creator of an article,
in the same way as other group creations such as The Wilms' Tumor Study
Group.
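The citation-style point can be made concrete. The Python sketch below shows which authors each of the three conventions just mentioned would retain from a 40-author list. The style labels "first3", "first6" and "first3+last" are hypothetical names for those conventions, and the rule that a short list is kept whole is an assumption, since real styles set their own thresholds.

```python
# Which authors does each truncation convention keep? The style labels and the
# "keep everything if the list is short" threshold are assumptions.
STYLES = {
    "first3": (3, False),       # first three authors, then et al.
    "first6": (6, False),       # first six authors, then et al.
    "first3+last": (3, True),   # first three authors plus the last author
}


def retained_authors(authors: list[str], style: str) -> tuple[list[str], bool]:
    """Return (authors kept, whether any authors were omitted)."""
    head, keep_last = STYLES[style]
    limit = head + (1 if keep_last else 0)
    if len(authors) <= limit:
        return list(authors), False        # short list: nothing to drop
    kept = authors[:head] + ([authors[-1]] if keep_last else [])
    return kept, True


forty = [f"Author {i}" for i in range(1, 41)]
for style in STYLES:
    kept, omitted = retained_authors(forty, style)
    print(style, len(kept), omitted)
# first3 3 True
# first6 6 True
# first3+last 4 True
```

Even the most generous of the three keeps only six of forty names, which is consistent with the point that nobody expects every author of a multi-author paper to be listed.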

In the paragraph before the table of Event attributes, reference is made to
*Making*, *Dissemination* and *Use*: to be consistent with the table itself, shouldn't
this read *Creation* etc.?

15) Section 5.2.14 Creation Link: I'd like to see an example of a Reference
Creation Link. I can see that, in the metadata attached to an article, you
could put in its relationship to the journal issue, volume, etc. (i.e.
using the Component Creation Link), and I can see how you'd use the Version
Creation Link, but what would you put into the metadata *of the article
itself* regarding Reference Creation Links? Links to other resources that
referenced this article, or links to articles that are referenced *in* the
article? Or both? And if both, does this mean our DOI metadata sets come
with a whole load of links hither and thither?

16) Section 5.3.3: Mapping DOI/INDECS to Dublin Core. I know the
introductory paragraph states that the mapping doesn't take into account DC
Qualified, but it might be worth saying that Extent is likely to be covered
in DCQ Format.

17) Section 5.4: I don't see that the statement that there are "no
constraints on individual sector requirements" sits well with the general
tenor of control. Won't the degree of prescription be essentially a
DOIGenre issue?

18) Appendix: Examples 4 and 5 - Example 4 uses a DOIGenre of "Journal"
whereas Example 5 uses "Textual Work", with "Journal Article" as the
Form.Textual Genre. Again, I know these examples were not meant to be fully
worked out, but I think the likely level of DOIGenre is so important that
it would be better to work with similar levels right from the start.

Any reason why Example 5 uses "Affiliated Agent" in the unextended version
and "Agent Affiliate" in the extended version? (This also strikes me as
particularly off-putting anyway in terms of DOI metadata: before you know
where you are, you'll end up writing XML DTDs to express the relationships
between authors (sorry, Creation Agents) and affiliations, and anyone who's
been involved in SGML for journal headers knows the nightmares.)

19) In conclusion, sorry if my reactions are more negative than positive,
but we are in danger of derailing the DOI initiative if we go in with an
over-complicated minimal metadata set.

Cliff



------------------------------------------------------
Metadata maillist  -  Metadata@doi.org
http://www.doi.org/mailman/listinfo/metadata