Re: Complexity of :matches ?
Ned Freed <ned.freed@mrochek.com> Mon, 11 July 2005 16:10 UTC
Received: from above.proper.com (localhost.vpnc.org [127.0.0.1]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6BGAOJ4059957; Mon, 11 Jul 2005 09:10:24 -0700 (PDT) (envelope-from owner-ietf-mta-filters@mail.imc.org)
Received: (from majordom@localhost) by above.proper.com (8.12.11/8.12.9/Submit) id j6BGAORq059956; Mon, 11 Jul 2005 09:10:24 -0700 (PDT)
X-Authentication-Warning: above.proper.com: majordom set sender to owner-ietf-mta-filters@mail.imc.org using -f
Received: from mauve.mrochek.com (mauve.mrochek.com [209.55.107.55]) by above.proper.com (8.12.11/8.12.9) with ESMTP id j6BGAMoF059949 for <ietf-mta-filters@imc.org>; Mon, 11 Jul 2005 09:10:23 -0700 (PDT) (envelope-from ned.freed@mrochek.com)
Received: from mauve.mrochek.com by mauve.mrochek.com (PMDF V6.1-1 #35243) id <01LQH95TGZXS000092@mauve.mrochek.com> for ietf-mta-filters@imc.org; Mon, 11 Jul 2005 09:10:21 -0700 (PDT)
Cc: ietf-mta-filters@imc.org
To: Michael Haardt <michael@freenet-ag.de>
Message-id: <01LQHS7CFONU000092@mauve.mrochek.com>
Date: Mon, 11 Jul 2005 08:15:21 -0700
From: Ned Freed <ned.freed@mrochek.com>
Subject: Re: Complexity of :matches ?
In-reply-to: "Your message dated Mon, 11 Jul 2005 12:48:37 +0200" <E1Drvq1-00039y-Vw@nostromo.freenet-ag.de>
MIME-version: 1.0
References: <E1Drvq1-00039y-Vw@nostromo.freenet-ag.de>
Sender: owner-ietf-mta-filters@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-mta-filters/mail-archive/>
List-ID: <ietf-mta-filters.imc.org>
List-Unsubscribe: <mailto:ietf-mta-filters-request@imc.org?body=unsubscribe>
> I just had a user using a wildcard pattern that took a large amount of > CPU time. Now that this particular problem is solved, I wonder about > ":matches". > Does anybody know the complexity order of recognising wildcard patterns, > which are a subset of regular expressions? The complexity of a naive implementation of greedy glob-matching is pretty clearly O(M*N**K), where N is the length of the target string, M is the length of the pattern, and K is the number of *'s present in the pattern, and it is pretty easy to construct inputs that will exhibit worse-case behavior - all you need is lots of *'s in a pattern that never matches. Using a non-greedy approach makes it much harder to construct inputs that will take a long time to process, especially if you also do some simple optimizations like collapsing strings of repeated *'s. In practice this seems to avoid many of the more common problems, such as when users specify an argument to :matches that only makes sense to give to :contains, such as "******** ATTENTION ********". However, I do not know if simply using an optimized non-greedy approach actually reduces the computational complexity to something less than O(M*N**K) for extremely pathological cases. If it doesn't, my guess is that there's some sort of dynamic programming trick that can get the complexity down to O(N*M) or something similar. I haven't tried to find such a trick, mostly because we've never encountered a pathological case when using optimized non-greedy matching that would make it relevant. In any case, the computational complexity of glob-style matches is clearly less than that of regexp, which is hardly surprising given that regexps require a NFA. This is why we disable regexp support by default in our implementation. Perhaps it makes sense to recommend a non-greedy approach in the security considerations section of the base specification? Allowing implementations to limit the number of *'s they allog in glob-style patterns might also make sense, if only to catch cases where users used :matches when they intended to use :contains. Ned P.S. I'm curious what pattern you encountered that took a long time. Mind sharing it?
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Philip Guenther
- Re: Complexity of :matches ? Michael Haardt
- Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Ned Freed
- Re: Complexity of :matches ? Tim Showalter
- Re: Complexity of :matches ? Michael Haardt
- Re: Complexity of :matches ? Michael Haardt