Re: Complexity of :matches ?
Philip Guenther <guenther+mtafilters@sendmail.com> Tue, 12 July 2005 10:52 UTC
Received: from above.proper.com (localhost.vpnc.org [127.0.0.1]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6CAqCnu068100; Tue, 12 Jul 2005 03:52:12 -0700 (PDT) (envelope-from owner-ietf-mta-filters@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.11/8.12.9/Submit) id j6CAqCNo068099; Tue, 12 Jul 2005 03:52:12 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-mta-filters@mail.imc.org using -f
Received: from foon.sendmail.com (tls.sendmail.com [209.246.26.40]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6CAqCuv068091 for <ietf-mta-filters@imc.org>; Tue, 12 Jul 2005 03:52:12 -0700 (PDT) (envelope-from guenther@sendmail.com)
Received: from lab.smi.sendmail.com ([10.210.100.93]) by foon.sendmail.com (Switch-3.1.7/Switch-3.1.7) with ESMTP id j6CAq1t2013742 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Tue, 12 Jul 2005 03:52:02 -0700
X-DomainKeys: Sendmail DomainKeys Filter v0.2.7 foon.sendmail.com j6CAq1t2013742
DomainKey-Signature: a=rsa-sha1; s=tls; d=sendmail.com; c=nofws; q=dns; b=iPYLMJ4bKBPJhX5+6Dm8m3LAsvsBHiCkOg/ZbHLOIpCy2K1GGMa+LCgOtFcBPh6w5 btlnSmzlbP3acs+HqUeSGw2HsQB9CWAtM2TqDNRGPu7I1IC/8tKpQ1cPo6DZo61hkoq fzQzWgkI5zxHJBDg5UN8ZuUetnKdTQof82CFsYc=
Received: from lab.smi.sendmail.com (localhost [127.0.0.1]) by lab.smi.sendmail.com (8.12.11/8.12.11) with ESMTP id j6CAq1rH038682; Tue, 12 Jul 2005 03:52:01 -0700 (PDT) (envelope-from guenther@lab.smi.sendmail.com)
Message-Id: <200507121052.j6CAq1rH038682@lab.smi.sendmail.com>
To: Michael Haardt <michael@freenet-ag.de>
Cc: ietf-mta-filters@imc.org
From: Philip Guenther <guenther+mtafilters@sendmail.com>
Subject: Re: Complexity of :matches ?
In-reply-to: <E1DsGVb-0003gO-Fg@nostromo.freenet-ag.de>
References: <E1DsGVb-0003gO-Fg@nostromo.freenet-ag.de>
Date: Tue, 12 Jul 2005 03:52:01 -0700
Sender: owner-ietf-mta-filters@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-mta-filters/mail-archive/>
List-ID: <ietf-mta-filters.imc.org>
List-Unsubscribe: <mailto:ietf-mta-filters-request@imc.org?body=unsubscribe>
Michael Haardt <michael@freenet-ag.de> writes: ... >So you suspect that there might be an algorithm that does not perform >backtracking? I can not imagine O(N*M) as worst case unless we can kick >backtracking out, although it might well be an average case. But that's >just a feeling and it may be wrong. Does your :contains implementation (substr()?) perform backtracking? What's its worst-case big-oh? ... >If we can not come up with a proof for worst-case complexity of non-greedy >algorithms, <...> Given that you can obviously locate a match for a string of 'n' non-star characters in O(n*M), isn't the algorithm that Tim described a worst-case of O(N*M) where N is the number of characters _other_ than star in the pattern? (Stars are the cheap part, as they can never force a backtrack!) ... ><Ned wrote> >> In any case, the computational complexity of glob-style matches is >> clearly less than that of regexp, which is hardly surprising given >> that regexps require a NFA. This is why we disable regexp support by >> default in our implementation. > >That's what I wonder about. The NFA can be converted to a DFA, >backtracking being an alternative, but that same appears to be true for >wildcard patterns to me, if multiple stars are allowed. <pause to pull out the dragon book> (To reiterate, N is the size of the pattern, M is the size of the text being matched against.) For an NFA, the setup is O(N) in time and space and matching is O(N*M). For a non-lazy DFA, the matching is O(M), but the worst-case setup is O(2^N). A lazy DFA has setup O(N), but the dragon book glosses its matching complexity: they claim it's "almost as fast" as a non-lazy DFA, but I think it degenerates to a weird NFA if no states are reused, making it O(N*M). A precise statement of the complexity of calculating a new state is needed to fully characterize that. Philip Guenther
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Ned Freed
- Re: Complexity of :matches ? Tim Showalter
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Michael Haardt