Re: Complexity of :matches ?

Philip Guenther <guenther+mtafilters@sendmail.com> Tue, 12 July 2005 10:52 UTC

Received: from above.proper.com (localhost.vpnc.org [127.0.0.1]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6CAqCnu068100; Tue, 12 Jul 2005 03:52:12 -0700 (PDT) (envelope-from owner-ietf-mta-filters@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.11/8.12.9/Submit) id j6CAqCNo068099; Tue, 12 Jul 2005 03:52:12 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-mta-filters@mail.imc.org using -f
Received: from foon.sendmail.com (tls.sendmail.com [209.246.26.40]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6CAqCuv068091 for <ietf-mta-filters@imc.org>; Tue, 12 Jul 2005 03:52:12 -0700 (PDT) (envelope-from guenther@sendmail.com)
Received: from lab.smi.sendmail.com ([10.210.100.93]) by foon.sendmail.com (Switch-3.1.7/Switch-3.1.7) with ESMTP id j6CAq1t2013742 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Tue, 12 Jul 2005 03:52:02 -0700
X-DomainKeys: Sendmail DomainKeys Filter v0.2.7 foon.sendmail.com j6CAq1t2013742
DomainKey-Signature: a=rsa-sha1; s=tls; d=sendmail.com; c=nofws; q=dns; b=iPYLMJ4bKBPJhX5+6Dm8m3LAsvsBHiCkOg/ZbHLOIpCy2K1GGMa+LCgOtFcBPh6w5 btlnSmzlbP3acs+HqUeSGw2HsQB9CWAtM2TqDNRGPu7I1IC/8tKpQ1cPo6DZo61hkoq fzQzWgkI5zxHJBDg5UN8ZuUetnKdTQof82CFsYc=
Received: from lab.smi.sendmail.com (localhost [127.0.0.1]) by lab.smi.sendmail.com (8.12.11/8.12.11) with ESMTP id j6CAq1rH038682; Tue, 12 Jul 2005 03:52:01 -0700 (PDT) (envelope-from guenther@lab.smi.sendmail.com)
Message-Id: <200507121052.j6CAq1rH038682@lab.smi.sendmail.com>
To: Michael Haardt <michael@freenet-ag.de>
Cc: ietf-mta-filters@imc.org
From: Philip Guenther <guenther+mtafilters@sendmail.com>
Subject: Re: Complexity of :matches ?
In-reply-to: <E1DsGVb-0003gO-Fg@nostromo.freenet-ag.de>
References: <E1DsGVb-0003gO-Fg@nostromo.freenet-ag.de>
Date: Tue, 12 Jul 2005 03:52:01 -0700
Sender: owner-ietf-mta-filters@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-mta-filters/mail-archive/>
List-ID: <ietf-mta-filters.imc.org>
List-Unsubscribe: <mailto:ietf-mta-filters-request@imc.org?body=unsubscribe>

Michael Haardt <michael@freenet-ag.de> writes:
...
>So you suspect that there might be an algorithm that does not perform
>backtracking? I can not imagine O(N*M) as worst case unless we can kick
>backtracking out, although it might well be an average case.  But that's
>just a feeling and it may be wrong.

Does your :contains implementation (substr()?) perform backtracking?
What's its worst-case big-oh?


...
>If we can not come up with a proof for worst-case complexity of non-greedy
>algorithms,  <...>

Given that you can obviously locate a match for a string of 'n' non-star
characters in O(n*M), isn't the algorithm that Tim described a
worst-case of O(N*M) where N is the number of characters _other_ than
star in the pattern?  (Stars are the cheap part, as they can never force
a backtrack!)


...
><Ned wrote>
>> In any case, the computational complexity of glob-style matches is
>> clearly less than that of regexp, which is hardly surprising given
>> that regexps require a NFA. This is why we disable regexp support by
>> default in our implementation.
>
>That's what I wonder about.  The NFA can be converted to a DFA,
>backtracking being an alternative, but that same appears to be true for
>wildcard patterns to me, if multiple stars are allowed.

<pause to pull out the dragon book>

(To reiterate, N is the size of the pattern, M is the size of the text
being matched against.)

For an NFA, the setup is O(N) in time and space and matching is O(N*M).

For a non-lazy DFA, the matching is O(M), but the worst-case setup is
O(2^N).

A lazy DFA has setup O(N), but the dragon book glosses its matching
complexity: they claim it's "almost as fast" as a non-lazy DFA, but I
think it degenerates to a weird NFA if no states are reused, making it
O(N*M).  A precise statement of the complexity of calculating a new
state is needed to fully characterize that.


Philip Guenther