Re: [Tools-discuss] RFCmarkup v1.28

Henrik Levkowetz <henrik@levkowetz.com> Thu, 27 July 2006 13:08 UTC

Message-ID: <44C8BAA2.8000404@levkowetz.com>
Date: Thu, 27 Jul 2006 15:07:46 +0200
From: Henrik Levkowetz <henrik@levkowetz.com>
User-Agent: Thunderbird 1.5.0.4 (Macintosh/20060530)
MIME-Version: 1.0
To: Elwyn Davies <elwynd@dial.pipex.com>
Subject: Re: [Tools-discuss] RFCmarkup v1.28
References: <44C78E71.9050003@levkowetz.com> <44C7B93E.7020105@dial.pipex.com> <44C7C471.9020908@levkowetz.com> <44C7D035.9000209@dial.pipex.com> <44C7F662.3050803@levkowetz.com> <44C88FCF.2070801@dial.pipex.com> <44C8991B.6040509@levkowetz.com> <44C8AA9B.6060109@dial.pipex.com>
In-Reply-To: <44C8AA9B.6060109@dial.pipex.com>
Cc: Tools Team Discussion <tools-discuss@ietf.org>

Hi Elwyn,

on 2006-07-27 13:59 Elwyn Davies said the following:
> Hi Henrik.
> 
> A couple of thoughts:
> 
> Without doing major parsing you could use re capabilities to separate 
> off the header part and only apply certain rules to the parts.
> Thus:
> Use findall to find all the blank lines.
> Identify the start of the title as the first group of blank lines inside 
> the document (i.e., ignoring any blank lines at the beginning) - use 
> group and start to get positions.
> Chop the data up and apply re's as required, then resplice.

Yes, that could be a possibility.
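
For the archive, a minimal sketch of what that split-and-resplice idea
might look like in Python. This is not code from rfcmarkup; the names
BLANK_RUN, split_at_first_inner_blank_run and apply_to_body are made up
for illustration, and it uses finditer rather than findall so that the
match positions are available directly.

```python
import re

# One or more consecutive blank (whitespace-only) lines.
BLANK_RUN = re.compile(r"(?:^[ \t]*\n)+", re.MULTILINE)


def split_at_first_inner_blank_run(text):
    """Split text into (header, gap, rest) at the first run of blank
    lines that is not at the very start of the document."""
    for m in BLANK_RUN.finditer(text):
        if text[:m.start()].strip():   # ignore blank lines that lead the document
            return text[:m.start()], m.group(), text[m.end():]
    return text, "", ""                # no inner blank run: everything is header


def apply_to_body(text, rule):
    """Apply rule (a str -> str function) only to the part after the
    header, then splice the pieces back together."""
    header, gap, rest = split_at_first_inner_blank_run(text)
    return header + gap + rule(rest)
```

The three pieces always concatenate back to the original text, so rules
meant only for the body can be applied to the last piece without
touching the header.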

> Some more below...
> 
> /Elwyn
>  
> Henrik Levkowetz wrote:
>> Hi Elwyn,
>>
>> Thanks for more feedback;
>>
>> on 2006-07-27 12:05 Elwyn Davies said the following:
>>   
>>> IE now looks fine - printing on both Firefox and IE looks good. BTW I 
>>> realized that the 75% on IE is not how it scales the printing but is a 
>>> way of zooming the on screen display of the preview. Doh!
>>>
>>> The product of a paranoid's breakfast:
>>>
>>> 1. http://www1.tools.ietf.org/html/rfc3410 : The second item (2.2) that 
>>> claims to be on page 4 in the ToC doesn't get a link (something to do 
>>> with longish title and only one period in the leader?)
>>>     
>>
>> Right. Won't fix.
>>   
> One possibility if you did want to try would be to identify the leader 
> end/number/end of line pattern from the early part of the ToC and then 
> apply it throughout ToC.

Right.  As soon as you move from a straight overall re approach to something
more stateful, such as identifying and splitting off doc header, title, toc
etc, you can do better handling.

I'll consider this for a later version, but I think it should be a major
rewrite that changes the approach, rather than an attempt to tweak this
here and there.
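
To make the ToC suggestion quoted above a bit more concrete, here is a
rough sketch, assuming the ToC lines have already been split off from
the rest of the document. It is not how rfcmarkup works; CLEAR_ENTRY,
LOOSE_ENTRY, toc_entries and sample_size are hypothetical names. The
idea is to learn the right-hand edge from entries with an unambiguous
dot leader, then accept any ToC line that ends in a number at that same
edge, even if its leader is a single period.

```python
import re

# Entry with an unambiguous leader: three or more dots before the page number.
CLEAR_ENTRY = re.compile(r"^\s*\S.*?\s*\.{3,}\s*(\d+)\s*$")
# Any line of the form: title, some mix of spaces and dots, page number.
LOOSE_ENTRY = re.compile(r"^\s*(\S.*?)[ .]+(\d+)\s*$")


def toc_entries(toc_lines, sample_size=5):
    """Return (title, page) pairs for lines that end at the same column
    as the first few clearly formatted entries."""
    clear = [ln.rstrip() for ln in toc_lines if CLEAR_ENTRY.match(ln)]
    if not clear:
        return []
    # Right edge(s) used by the early, unambiguous entries.
    widths = {len(ln) for ln in clear[:sample_size]}
    entries = []
    for ln in toc_lines:
        ln = ln.rstrip()
        m = LOOSE_ENTRY.match(ln)
        if m and len(ln) in widths:
            entries.append((m.group(1), int(m.group(2))))
    return entries
```

A line of ordinary text that happens to end in a number at exactly the
same column would still be picked up, so this only makes sense once the
ToC has been isolated, which is the stateful part.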

> Not a big deal.
>>   
>>> 2. 
>>> http://www1.tools.ietf.org/html/draft-aoun-middlebox-token-authentication-00: 
>>> The section headers are now <h2> but the title is still body text. This 
>>> one has 'Expires on'
>>>     
>>
>> Mmm.  Right.  Won't fix now (not trivial), but maybe later.
>>   
> See above.
>>   
>>> 3. http://www1.tools.ietf.org/html/draft-ietf-ipngwg-icmp-v3-07: (no 
>>> 'Expires:' at all) - how about not looking for the title etc until after 
>>> the second group of totally blank lines (or the first group that isn't 
>>> at the start of the document)?
>>>     
>>
>> Can you put that in a regexp ;-) ?
>>
>> The boldfaced 11 July is taken to be a section - I tried to require the
>> section numbers to contain a period, but had to revert that as too many
>> documents have major section numbers without a period.
>>
>> Currently the title is the first group of lines which are preceded by
>> a line which begins with "Category:" or ends in a year, then a blank
>> line.  I'll look at changing that to the pattern you suggest, for the
>> next version.
>>   
> See above.

Agreed.  This might be an easier fix than the rest.
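
For comparison, a sketch of the two title rules as regexps. Neither is
copied from the rfcmarkup source: CURRENT_RULE is only my reading of the
rule as described in the quoted text (with "ends in a year" assumed to
mean a four-digit year starting with 19 or 20), and SUGGESTED_RULE and
find_title are hypothetical names for the blank-line approach.

```python
import re

# Rule as described: a line starting with "Category:" or ending in a year,
# then blank line(s), then the title block.
CURRENT_RULE = re.compile(
    r"^(?:Category:.*|.*\b(?:19|20)\d\d)[ \t]*\n"  # trigger line
    r"(?:[ \t]*\n)+"                               # blank line(s)
    r"((?:[ \t]*\S.*\n)+)",                        # title: run of non-blank lines
    re.MULTILINE,
)

# Suggested rule: the title is the block after the first group of blank
# lines that is not at the start of the document.
SUGGESTED_RULE = re.compile(
    r"\A(?:[ \t]*\n)*"        # blank lines at the very start, if any
    r"(?:[ \t]*\S.*\n)+"      # header block
    r"(?:[ \t]*\n)+"          # first inner group of blank lines
    r"((?:[ \t]*\S.*\n)+)"    # title block
)


def find_title(text, rule):
    """Return the title as one line of text, or None if the rule fails."""
    m = rule.search(text)
    if not m:
        return None
    return " ".join(line.strip() for line in m.group(1).splitlines())
```

On a draft whose header has neither a "Category:" line nor a line ending
in a year, the first pattern finds nothing, while the second still picks
up the block after the header. It does assume the header itself contains
no blank lines.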

>>> 4. http://www1.tools.ietf.org/html/draft-aoun-mgcp-nat-package-02: This 
>>> is a very badly formatted draft.. you fixed the link in the ToC problem 
>>> but it has the same problem as #2 above and thereafter the markup of 
>>> section headers is semi-random. Sections 1, 2 and 3 miss out; the first 
>>> three non-empty body text lines on p3 become a header.  Sections 3.x are 
>>> found but not s4 onwards.  s4.x you would have difficulty with as they 
>>> are indented. Horrible! I think I owe you a beer if you can canonicalize 
>>> this one!
>>>     
>>
>> Actually, it seems you're looking at an old cached copy - after refreshing
>> here, this one looks pretty good to me, on all 3 servers.
>>
>>   
> Right.. this is pretty much OK apart from the s4.x which are 
> additionally indented.  ... let's not bother too much.

Right.

Regards,

	Henrik
