
[Discuss-DOI] Re[2]: [Discuss-DOI] DOI Syntax Draft Standard



I think we're having trouble with the concept of encoding.

The DOI is a dumb string of characters, as far as I understand it. So if
"%20" appears in a DOI, it is the 3 characters %,2,0, not an encoding for
space, just as "dog" in a DOI is the 3 characters d,o,g, and not Man's Best
Friend.

It makes no sense to have "space" on the required encoding list, because it
may not appear AT ALL in the DOI string.  Nor may "%20" appear in the DOI
part of an encoded URL, because a standards-compliant browser will
interpret it as an encoded character and translate it to a space.
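
To make the decoding concern concrete, here is a minimal Python sketch. It
assumes the standard library's urllib.parse.unquote as a stand-in for the
percent-decoding a browser or proxy might apply; the helper name decode_once
is hypothetical, and the DOIs are the examples from my earlier message quoted
below.

```python
from urllib.parse import unquote


def decode_once(doi_in_url: str) -> str:
    """Apply one round of percent-decoding, as a URL consumer might."""
    return unquote(doi_in_url)


# A DOI whose characters literally include %, 2, 0.
literal = "10.1000/456%20789"

# One round of decoding turns the literal "%20" into a space.
decoded = decode_once(literal)

# The doubly encoded form survives one round but not two.
once = decode_once("10.1000/456%2520789")
twice = decode_once(once)

print(repr(decoded))
print(repr(once))
print(repr(twice))
```

The point is only that a consumer cannot tell a literal "%20" from an
encoded space unless the spec says which one it is.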

If I'm wrong, and the DOI allows encoded characters, then an escape character
needs to be defined.  The same issue is present if UTF-8 or UTF-7 encoding
is to be allowed.
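
As an illustration of what defining an escape character could look like, the
sketch below assumes "%" itself is the escape character, so a literal "%"
must be written as "%25". This convention is my assumption, not something
the draft states. It uses Python's urllib.parse.quote and unquote; the
function names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_doi_for_url(doi: str) -> str:
    """Percent-encode a raw DOI string for use inside a URL."""
    # safe="/" leaves the prefix/suffix separator as-is; "%" and space
    # are never in the safe set here, so both are always escaped.
    return quote(doi, safe="/")


def decode_doi_from_url(encoded: str) -> str:
    """Undo exactly one round of percent-encoding."""
    return unquote(encoded)


doi = "10.1000/456%20789"            # raw characters, including a literal %
encoded = encode_doi_for_url(doi)    # the literal % becomes %25
round_trip = decode_doi_from_url(encoded)

print(encoded)
print(round_trip == doi)
```

With a rule like this, exactly one encode and one decode round-trips any
string, but every party has to agree on how many rounds are applied.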

Also, can we have clarification in the spec that the DOI is case-sensitive?

Eric



>Eric,
>
>Thanks for the questions - they are all very good and need
>to be answered.
>
>1) The Standard does need more detail on character sets.
>The Internet Draft of the Handle specification ("Handle
>System: A Persistent Global Naming Service Overview and
>Syntax", http://hdl.handle.net/4263537/4006) says:
>
>"[a handle] may consist of any UTF-8 encoded characters
>defined in the Unicode 2.0. It does not impose any reserved
>or excluded characters."
>
>The Syntax Committee decided to limit this and used the SICI
>standard as a guideline. However, that standard goes into
>more detail and has a table of allowed characters that is a
>"subset of 7-bit ASCII".  I don't think the DOI syntax needs
>to be as restrictive, but we need to clarify tab, new line
>and space.
>
>2) The Committee understood that the proxy server that resolves
>DOIs reads a space (generic white space, I assume) as the end
>of the DOI string.  If there is a space in a DOI string, it
>should be encoded as %20, since it is on the mandatory encoding list.
>
>Regards,
>
>Ed
>______________________________ Reply Separator _________________________________
>Subject: Re: [Discuss-DOI] DOI Syntax Draft Standard
>Author:  Eric Hellman <eric@hellman.net> at ~internet
>Date:    1/20/99 12:21 AM
>
>
>1. Can the standard be a bit more explicit on allowed characters in DOI
>Suffix Strings? Usually an alphanumeric character is [a-zA-Z0-9]. However,
>the examples show that ":,';<>%&  etc are allowed characters. Are tab,
>return, new line, etc allowed? They are 7-bit ASCII characters, after all.
>I'm guessing that  "printable ASCII character" ($20-$7E) is what is meant.
>
>><DSS> DOI Suffix String (required, any alphanumeric character except
>>space, no limits)
>>Preceded by the end "/" of the DOI Prefix, this is the actual identifier
>>string assigned by the
>>Registrant.
>
>2. Does the standard mean to say that a DOI must end with a space (ASCII
>$20)? Is the space part of the <128 character DOI? Or does SP in this
>context mean generic white space?
>
>>Excluded character - Space (SP). Any 7-bit ASCII character can be used
>>except "space" as
>>this denotes the end of a DOI string.
>
>3. If a character such as tab is encountered in a DOI string, does that
>denote the end of a DOI string, too?
>
>4. In the discussion on encoding (Table I), it implies that SP is allowed
>in DOI's, but has to be encoded on input. This would make sense if SP is a
>required termination character, but otherwise SP should not be allowed,
>even when encoded.
>For example this:   10.1000/456%2520789 would be decoded as
>10.1000/456%20789 , which is legal. But if you decode it again you get
>10.1000/456 789 which should NOT be legal.
>
>Eric Hellman
>
>
>At 4:01 PM -0500 1/19/99, Ed Pentz wrote:
>>As Chair of the NISO DOI Syntax Committee I am sending the
>>Draft Standard (attached as PDF) of the DOI Syntax to the
>>Discuss-DOI list for comments and feedback.
>>
>>This document is in the standard format for NISO standards
>>and is not entirely complete.  However, the main decisions
>>about the structure of the DOI syntax have been made, and
>>the Committee is interested in getting feedback from the
>>wider DOI community.
>>
>>Please be aware that the document is a draft and subject to
>>further revision.  Also, the Standard must be voted upon by
>>NISO members according to standard voting procedures before
>>being approved as an official NISO standard.
>>
>>Regards,
>>
>>Ed Pentz
>>Academic Press
>Eric Hellman
>Openly Informatics, Inc.
>http://www.openly.com/           Tools for 21st Century Scholarly Publishing
>
>------------------------------------------------------
>Discuss-DOI maillist  -  Discuss-DOI@doi.org
>http://www.doi.org/mailman/listinfo/discuss-doi

Eric Hellman
Openly Informatics, Inc.
http://www.openly.com/           Tools for 21st Century Scholarly Publishing

------------------------------------------------------
Discuss-DOI maillist  -  Discuss-DOI@doi.org
http://www.doi.org/mailman/listinfo/discuss-doi