Re: Complexity of :matches ?
Michael Haardt <michael@freenet-ag.de> Tue, 12 July 2005 08:52 UTC
Received: from above.proper.com (localhost.vpnc.org [127.0.0.1]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6C8qwoh024147; Tue, 12 Jul 2005 01:52:58 -0700 (PDT) (envelope-from owner-ietf-mta-filters@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.11/8.12.9/Submit) id j6C8qwfB024146; Tue, 12 Jul 2005 01:52:58 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-mta-filters@mail.imc.org using -f
Received: from mout0.freenet.de (mout0.freenet.de [194.97.50.131]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6C8quOS024131 for <ietf-mta-filters@imc.org>; Tue, 12 Jul 2005 01:52:57 -0700 (PDT) (envelope-from michael@freenet-ag.de)
Received: from [194.97.55.192] (helo=mx8.freenet.de) by mout0.freenet.de with esmtpa (Exim 4.52) id 1DsGVb-0007rH-U0 for ietf-mta-filters@imc.org; Tue, 12 Jul 2005 10:52:55 +0200
Received: from nostromo.freenet-ag.de ([194.97.7.6]) by mx8.freenet.de with esmtps (TLSv1:AES256-SHA:256) (Exim 4.52 #3) id 1DsGVb-0005fU-Pw for ietf-mta-filters@imc.org; Tue, 12 Jul 2005 10:52:55 +0200
Received: from michael by nostromo.freenet-ag.de with local (ID michael) (Exim 4.52 #5) id 1DsGVb-0003gO-Fg for ietf-mta-filters@imc.org; Tue, 12 Jul 2005 10:52:55 +0200
To: ietf-mta-filters@imc.org
Subject: Re: Complexity of :matches ?
Message-Id: <E1DsGVb-0003gO-Fg@nostromo.freenet-ag.de>
From: Michael Haardt <michael@freenet-ag.de>
Date: Tue, 12 Jul 2005 10:52:55 +0200
Sender: owner-ietf-mta-filters@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-mta-filters/mail-archive/>
List-ID: <ietf-mta-filters.imc.org>
List-Unsubscribe: <mailto:ietf-mta-filters-request@imc.org?body=unsubscribe>
> The complexity of a naive implementation of greedy glob-matching is pretty > clearly O(M*N**K), where N is the length of the target string, M is the length > of the pattern, and K is the number of *'s present in the pattern, and it is > pretty easy to construct inputs that will exhibit worse-case behavior - all you > need is lots of *'s in a pattern that never matches. Right, and that's what happened. It wasn't a problem until recently, when users started to use multiple stars instead of one, like ********@domain. Don't ask me why. > However, I do not know if simply using an optimized non-greedy approach > actually reduces the computational complexity to something less than O(M*N**K) > for extremely pathological cases. If it doesn't, my guess is that there's some > sort of dynamic programming trick that can get the complexity down to O(N*M) or > something similar. I haven't tried to find such a trick, mostly because we've > never encountered a pathological case when using optimized non-greedy matching > that would make it relevant. So you suspect that there might be an algorithm that does not perform backtracking? I can not imagine O(N*M) as worst case unless we can kick backtracking out, although it might well be an average case. But that's just a feeling and it may be wrong. > In any case, the computational complexity of glob-style matches is clearly less > than that of regexp, which is hardly surprising given that regexps require > a NFA. This is why we disable regexp support by default in our implementation. That's what I wonder about. The NFA can be converted to a DFA, backtracking being an alternative, but that same appears to be true for wildcard patterns to me, if multiple stars are allowed. > Perhaps it makes sense to recommend a non-greedy approach in the security > considerations section of the base specification? Allowing implementations to > limit the number of *'s they allog in glob-style patterns might also make > sense, if only to catch cases where users used :matches when they intended to > use :contains. If we can not come up with a proof for worst-case complexity of non-greedy algorithms, then I think we should document that pattern matching may require time exponential to the number of stars in the pattern and indeed allow implementations to put a limit on pattern matching, generating an error if the limit is exceeded. The implementation should be free to express the limit in the number of stars, the number of performed backtracking operations, used CPU time or whatever else. Interestingly, the PCRE library already contains functions to do that for regular expressions, which is why the Exim filter language (quite similar to Sieve) can offer regular expressions at no risk. Michael
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Ned Freed
- Re: Complexity of :matches ? Tim Showalter
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Michael Haardt