[yaco-idsubmit-tool] Testing Notes / Henrik / 17 March

Henrik Levkowetz <henrik@levkowetz.com> Sat, 19 March 2011 22:06 UTC

Hi,

Here are some additional testing notes.  I've already covered some of these
with Emilio (the developer) on jabber, but I'm also sending them to the list
for info and for the record.  This testing covers author extraction from 191
drafts, including all of the drafts which weren't accepted for automatic
posting by the current submission tool during the period leading up to the
Prague meeting posting cut-off.

Already covered:

   * The author extraction code supplied to Yaco handles obfuscated
     email addresses such as "joe (at) example.com", but does not handle
     the case "joe at example.com", which it probably should.

   * The upload form should also catch possible exceptions from the
     author extraction code, to let people know there's a problem in
     a better format than a 'Server 500' error.  Pointed out by Emilio.

   * All -00 submissions were treated the same way, requiring WG Chair
     permissions to post -- but this should not apply to individual
     submissions, only to new WG documents.
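
To illustrate the first point above, here is a minimal sketch of how both
obfuscation styles could be normalized.  It is not the code from draft.py;
the names deobfuscate_email and _OBFUSCATED_AT are made up for this example,
and the pattern is only one possible way to do it.

```python
import re

# Hypothetical pattern (not taken from draft.py).  It accepts "(at)" or
# "[at]" with optional surrounding whitespace, or a bare "at" that has
# whitespace on both sides.  The domain must contain at least one dot,
# so ordinary prose like "look at this" is left alone.
_OBFUSCATED_AT = re.compile(
    r"\b([\w.+-]+)(?:\s*[(\[]at[)\]]\s*|\s+at\s+)((?:[\w-]+\.)+[A-Za-z]{2,})\b",
    re.IGNORECASE,
)


def deobfuscate_email(text):
    """Rewrite 'joe (at) example.com' and 'joe at example.com' as 'joe@example.com'."""
    return _OBFUSCATED_AT.sub(r"\1@\2", text)


if __name__ == "__main__":
    for sample in ("joe (at) example.com", "joe at example.com", "look at this"):
        print(repr(sample), "->", repr(deobfuscate_email(sample)))
```

Note that the bare "at" form can produce false positives in running text
(for instance "look at example.com"), so a rule like this is probably best
applied only to lines inside the Authors' Addresses section.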

New:

   (For the issues below which affect the ietf/utils/draft.py module,
   a patch file is enclosed.)

   * Title extraction fails for drafts which don't have the draft name
     on a separate page.  See for instance this example:
     http://www.ietf.org/staging/draft-ma-cdni-publisher-use-cases-00.txt
     The regex should maybe be updated to permit but not require a newline
     before the draft filename:
     '(?:\n\s*\n\s*)((.+\n){1,2}(.+\n?))(\s+<?draft-\S+\s*\n)\s*\n'
     Fixed in patch.

   * If there are blank lines before the start of the author list on the
     first page, the author extraction will fail.  This sometimes happens
     when there's junk at the start of a draft, see for instance
     http://www.ietf.org/id/draft-ietf-mpls-tp-process-00.txt .  Fixed in
     patch.

   * Sometimes the Authors' Addresses section lists authors with the same
     workplace address on the same line: "Sam Spade and Joe Smith".  This
     needs a fix in the author extraction code.  Provided in the patch.

   * Sometimes the order of first name, surname is different on the first
     page and in the author list, and sometimes the surname is uppercase
     in one place, but not in the other.  This also needs a fix in the
     author extraction code.  Provided in the patch.

   * The header stripping code had a bug, where multiple blank lines could
     be replaced by a single blank line in the stripped text, which could
     mess up title extraction.  Fixed in the patch.

   * Title space normalization should be done also for titles from the
     'unusual title format' code branch of the title extraction code.
     Fix provided in the patch.

   * Company names on the first page are sometimes rendered with different
     case than in the Authors' Addresses section.  Fixed in the patch.

   * Some drafts list the draft filename _before_ the title, rather than
     after the title.  Permit this too. Covered in the patch.

   * Spanish names can be shown as either
        <given_name> <fathers_first_surname> <mothers_first_surname>
     or less formally as
        <given_name> <fathers_first_surname>
     If the first form is used in the Authors' Addresses section, but the
     second form (with the given name possibly abbreviated to its first
     letter) is used on the first page, the author extraction will fail.
     Fix provided in patch.

   * Drafts containing tabs will be caught by idnits during I-D submission,
     but in case the draft.py module is used independently of idnits, it
     should convert tabs to spaces so that the author extraction and other
     methods work as expected.  Example: recently submitted draft
     draft-bergeron-payload-rtpfec-rs-00.txt.  Fix provided in patch.

   * Found a draft with a previously unhandled header/footer format:
     draft-fang-mpls-tp-oam-toolset-01.txt.  Tweak needed for header/footer
     stripping.  Fix provided in patch.
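
Several of the author-related items above (shared address lines, differing
name order and case, and the longer Spanish name form) come down to matching
a first-page name against an Authors' Addresses name.  The following is a
rough sketch of one way to do that, not the code in the enclosed patch.  The
function names and the example name "Juan Garcia Lopez" are invented for
illustration, and the rule of tolerating one extra name part is an assumption.

```python
import re


def _tokens(name):
    """Split a name into lowercase parts, dropping dots, commas and spaces."""
    return [t.lower() for t in re.split(r"[\s.,]+", name) if t]


def split_shared_line(line):
    """Split 'Sam Spade and Joe Smith' into the individual author names."""
    return [n.strip() for n in re.split(r"\s+and\s+", line) if n.strip()]


def names_match(first_page_name, address_name):
    """Heuristic comparison, ignoring case and the order of name parts.

    Every part of the first-page name must match a distinct part of the
    address name, either exactly or as an initial.  At most one part of
    the address name may be left over (assumed here to cover a second
    surname that the first page omits).
    """
    remaining = _tokens(address_name)
    for tok in _tokens(first_page_name):
        hit = next(
            (part for part in remaining
             if part == tok or (len(tok) == 1 and part.startswith(tok))),
            None,
        )
        if hit is None:
            return False
        remaining.remove(hit)
    return len(remaining) <= 1


if __name__ == "__main__":
    print(split_shared_line("Sam Spade and Joe Smith"))
    print(names_match("SPADE Sam", "Sam Spade"))
    print(names_match("J. Garcia", "Juan Garcia Lopez"))
    print(names_match("J. Smith", "Sam Spade"))
```

A matcher this loose will also accept a lone surname against a two-part
name, so in practice it would likely need to be combined with other checks,
such as the position of the name in the author list.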

   

The patch also includes code to extract lists of references used in the
document.  This is not expected to be of use for the submission tool, but
may be useful later in other parts of the datatracker.


Best,

	Henrik