Re: [Emailcore] Ticket #5: G.5. Remove or deprecate the work-around from code 552 to 452?

Viktor Dukhovni <ietf-dane@dukhovni.org> Fri, 12 March 2021 04:51 UTC

Date: Thu, 11 Mar 2021 23:51:51 -0500
From: Viktor Dukhovni <ietf-dane@dukhovni.org>
To: emailcore@ietf.org
Message-ID: <YErzZ83+R3uBFctt@straasha.imrryr.org>
Reply-To: emailcore@ietf.org
References: <ca851fda-63ac-8739-c3eb-bde725aa25f3@isode.com> <c188413b-9337-40d8-8062-9c0f58f6cd98@www.fastmail.com> <CAHej_8kHwEOmq5bf49=Tt6ZEVkuidMhy5s4XPu7JC+k22qraZg@mail.gmail.com> <CAHej_8ma-kDkVh3Oj11R5Fn6BbwJsWfFpx0Zqv61fPL35CJNUA@mail.gmail.com> <0F2370D0-D04C-45A1-A5A1-8FF1F174FFE5@dukhovni.org> <CAHej_8=deJU1CW2AzDBu5ji3Uir+_zF6Gp59Z-hHRmRipz8Osw@mail.gmail.com> <9A7BDB22F3A0396EF24BF91D@PSB> <5772045E-2EB3-44B7-BCA8-0135DF809C62@dukhovni.org> <1F8CB46BBE3C0D5C4EB49C96@PSB>
In-Reply-To: <1F8CB46BBE3C0D5C4EB49C96@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/emailcore/cZOQ8oF7chtbXuEu8u4IZIyMPrM>
Subject: Re: [Emailcore] Ticket #5: G.5. Remove or deprecate the work-around from code 552 to 452?

On Thu, Mar 11, 2021 at 10:08:50PM -0500, John C Klensin wrote:

> (1) Independent (at least for now) about what clients do when they get
> a 552, have enough servers switched to sending 452 rather than 552
> when "too many recipients" is intended that the text about
> interpretation of the 552 code as a 4yz one can actually be removed
> (or retained only in an historical note).

In two decades of monitoring the Postfix users list, I've seen no reports
that I can recall of users having problems delivering mail to some
recipients due to servers incorrectly replying 552 when an envelope size
limit is exceeded.

For any 5XX "RCPT" response, Postfix arranges to bounce the message for
the recipient in question (delayed until all currently scheduled
recipients are delivered, deferred or fail, and thus consolidated across
affected failed recipients), so any non-negligible population of problem
servers would have surfaced at least a few times in the last 20 years.

> (2) As I read Victor's note, there were many places where my response
> was "that is what the spec say to do already".  That implies that
> either Victor and I are reading the same text differently (in which
> case I think some clarification effort is in order) or that he and I
> are having problems understanding each other.

Likely some of both, but I think more on the communication side.  Absent
nods in acknowledgement or quizzical looks, it is much too hard to know
when to keep it brief and when to elaborate a point.  Async
communication is fundamentally limited here, especially when multicast,
and not every reader needs the same level of detail.

> * Victor has made the [800] pound Gorilla argument.  I think that
> argument is a serious challenge for the WG and its decisions about
> scope and goals.  I'll try to address that soon in a separate note.

> > <quote 2>
> >                               Clients SHOULD treat a
> > 552 code in    this case as a temporary, rather than
> > permanent, failure so the logic    below works.
> > </quote 2>
>  
> > The above is not followed by any of the major MTAs (Exim,
> > Postfix, or Sendmail, nor any others reported).  It MUST be
> > scrapped.  If the server responds with 552, the recipient in
> > question will almost always be immediately bounced by the
> > sending client.
> 
> For whatever it is worth, I don't know what it means to "bounce
> a recipient".

It means that the message will not be delivered for that recipient, and
if any bounce is to be sent (non-empty envelope sender and not DSN
NOTIFY=..., which excludes failure), then this recipient will be
reported in the bounce.  Redundantly, overelaborating the point,
the recipient WILL NOT be included in any attempts to retry the
message delivery.  This should be clear now I hope.
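
The bounce condition above can be sketched as a small predicate.  This is
a hypothetical Python illustration, not code from any MTA; the function
name and the representation of the DSN NOTIFY parameter as a set of
keywords are assumptions made for the example.

```python
from typing import Optional, Set


def should_report_failure(envelope_sender: str,
                          notify: Optional[Set[str]] = None) -> bool:
    """Return True if a rejected recipient should be listed in a bounce.

    `notify` models the per-recipient DSN NOTIFY keywords (for example
    {"SUCCESS", "FAILURE"} or {"NEVER"}); None means the parameter was
    not given at all.  Both names are illustrative assumptions.
    """
    if envelope_sender == "":
        # Null envelope sender: there is nowhere to send a bounce.
        return False
    if notify is None:
        # No NOTIFY parameter: failures are reported by default.
        return True
    # NOTIFY was given: report only if it includes FAILURE.
    return "FAILURE" in {keyword.upper() for keyword in notify}
```

Either way the recipient is dropped from further delivery attempts; the
predicate only decides whether it is reported to the sender.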

> More important, I would hope that the client would accumulate
> undeliverable addresses and include them in a single "bounce" message,
> not send one bounce message per undeliverable recipient for reasons I
> hope are obvious.

So obvious, I didn't bother to make it explicit. :-)

> > Note, that the 552 in reply to "RCPT" will not be taken to mean
> > anything about the message as a whole, earlier or subsequent
> > recipients. If any recipients are ultimately not rejected the
> > partially accepted message envelope will be relayed by the
> > client unless the server rejects the "DATA" command.
> 
> See my questions to Jeremy, but, if that is what they are doing,
> then they are conforming to the requirement to treat 552 as 442,

No, definitely NOT.  A 4XX (read my reply in lieu of Jeremy, though of
course he is still quite welcome to answer for himself) results in the
recipient being added to the list of recipients to retry, while a 5XX
causes the recipient to be dropped from the list of still pending
recipients for the message and to be added to the list of recipients
to be bounced.

Perhaps you're imagining some message-level scope for these replies,
but replies to "RCPT" don't have message-level scope, they're only
about that particular recipient and nothing else.  It is full
steam ahead to deliver the message to all the 2XX recipients if
at least one has been accepted (and DATA or BDAT is accepted)
regardless of any 4XX or 5XX to some proper subset of the recipients.
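
As a rough model of that per-recipient handling (a hypothetical Python
sketch, not Postfix, Exim, or Sendmail code; the `Outcome` type and the
function name are invented for illustration):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Outcome:
    accepted: List[str] = field(default_factory=list)  # 2XX: deliver now
    deferred: List[str] = field(default_factory=list)  # 4XX: retry later
    bounced: List[str] = field(default_factory=list)   # 5XX: never retried


def classify_rcpt_replies(replies: Dict[str, int]) -> Outcome:
    """Sort recipients by the class of their individual RCPT reply code."""
    outcome = Outcome()
    for recipient, code in replies.items():
        if 200 <= code < 300:
            outcome.accepted.append(recipient)
        elif 400 <= code < 500:
            outcome.deferred.append(recipient)
        elif 500 <= code < 600:
            outcome.bounced.append(recipient)
        else:
            raise ValueError(f"unexpected RCPT reply code: {code}")
    return outcome
```

In this model a 552 lands in `bounced` and a 452 in `deferred`, so the
two codes lead to different outcomes for the recipient, while the
`accepted` list proceeds to DATA or BDAT regardless.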


> because that is exactly the behavior one would expect 442 to do.
> Whether they (or you) think about that as conforming behavior
> doesn't make a lot of difference.  If there is a difference it
> is in your suggestion above about an immediate bounce.  

Again no.  5XX "erases" the recipient, "4XX" puts the recipient
on the "try later" list.  So they're manifestly quite different.

> And that immediate bounce, or even returning the 552 code to an
> earlier SMTP sender, raises another problem.  One of the most
> important principles in SMTP-land, going back to 821, is that
> one does not report a message as undeliverable and then deliver
> it anyway.  "Delayed" messages are an odd case for which 821->
> 2821-> 5321 do not have any provision, but, if the message (or,
> if you prefer, address) is bounced, that is the end of it -- the
> only way that message is going to be delivered to that recipient
> is if the originator creates and sends what, as far as SMTP is
> concerned, is a new message.

You appear to me to be thinking within a model where a message has an
atomic envelope with an indivisible group of recipients, but of course
each recipient's fate is an independent event, and MTAs routinely "split
the envelope" for any number of reasons.

When some of the recipients in an envelope that the sender had attempted
to deliver in a single transaction (had not intended to split) are
accepted, and some are temporarily deferred (and the rest rejected) the
envelope is dynamically split with the accepted batch delivered via
the first transaction, the deferred batch retried separately, and
the rejected batch included in a bounce (consolidated across all
the scheduled pre-split envelopes of the same message).

In any case you keep talking about *messages* in a context ("RCPT"
replies) which is per-recipient.  Again a 4XX or 5XX reply to "RCPT" is
NOT about the message, it signals the status of just *that* one
recipient.

> Good.  Again, you are describing behavior consistent with the
> client treating the 552 as equivalent to 452 (or some other 4yz
> code).

Absolutely not.  See above.  They are fundamentally incompatible
(for the affected "RCPT"), and don't convey anything about the
transaction as a whole (unless *all* recipients temp or hard
fail).

> > Instead clients typically limit the number of recipients per
> > envelope to somewhere south of 100 (perhaps hand-tuned for mail
> > destined to some of the 800lb gorilla providers), and just send
> > all the RCPT commands they intended to send, with perhaps the
> > tail of the list ending up deferred.
> 
> Again, consistent with treating the 552 as a 442, whether they
> precisely follow the rest of the description in that section or
> not.

Still, and emphatically, nope, the two are NOT the same.

> > So long as the server is willing to defer (4XX) 100-N
> > recipients or more without prejudicing any it already
> > accepted, it will interoperate with clients that expect to be
> > able to send up to 100, despite the lower limit.
> 
> But that is an issue with the current semi-hard limit of a
> minimum or 100, not with the 452/552 requirements.

Let's, at least for now, move on from any distraction that 452 is
functionally equivalent to 552, it simply isn't.  And switch to the issue
of the 100 "limit" (minimum the server is required to accept, and
hence maximum a sensible client should attempt by default, barring
prior arrangements, special local knowledge, ...).

> And, while the text is not as precise as we might like it, clients
> (and, for that matter servers) are allowed to pick lower limits under
> the "Resistance to Attack" / Operational necessity principle.  So
> there is nothing in the above that is non-conforming, even to the
> current text in 4.5.3.1.10.   And, fwiw, I note your explanation above
> uses "defer" and "4XX" together, not 552.

A server is free of course to respond however it likes, but if it wants
to be interoperable, it needs to take into account that compliant
clients aren't psychic, and may legitimately attempt up to 100
recipients.  If the server wants to accept fewer recipients *per
transaction*, say 20, it needs to be willing to soft-fail (4XX/defer) up
to at least 80 subsequent "RCPT" commands, without tarpitting or other
barriers to having at least the first 20 recipients delivered.
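
A minimal sketch of that server-side behaviour, assuming a hypothetical
per-transaction limit of 20 and ignoring every other reason a server
might reject a recipient (the function name and defaults are
illustrative, not taken from any implementation):

```python
from typing import List, Tuple


def rcpt_replies(recipients: List[str],
                 per_transaction_limit: int = 20) -> List[Tuple[str, int]]:
    """Accept the first N recipients and soft-fail the rest with 452.

    Recipients that were already accepted are never revoked, so the
    client can still deliver them in the current transaction.
    """
    replies = []
    accepted = 0
    for recipient in recipients:
        if accepted < per_transaction_limit:
            replies.append((recipient, 250))
            accepted += 1
        else:
            # Over the local limit: defer, do not permanently reject.
            replies.append((recipient, 452))
    return replies
```

With 100 recipients and a limit of 20, this yields 20 replies of 250
followed by 80 replies of 452, which the client retries in later
transactions.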

If a client chooses to initiate transactions for over 100 recipients with
"strangers", it may need to be prepared to slow down and perhaps treat
the first "452" it encounters as a strong hint to perhaps defer all the
rest, without even attempting to try and include them.

There appears to be some logic in Exim's source code along these lines,
to stop when at least one recipient got a 2XX or 5XX when subsequently a
452 is seen.  That logic is not always a good idea, because 452 can also
signal other per-recipient resource limits, and the decision to defer
everyone else can substantially delay the remaining recipients (perhaps
beyond the message expiration time).  But this is a plausible heuristic
if it is correct in a sufficient supermajority of cases.  I guess, since
it is in the code, it is working out mostly okay...

So this is one case in which Exim appears to be reading more into the
server's reply than just the 2XX, 4XX, 5XX trichotomy.  There is no such
logic in Postfix, and none likely any time soon.

> > Another reason for fewer than 100 is when servers want have the
> > envelope split by some server-specific classification of
> > recipients, that may affect recipient-specific downstream
> > content transformations. In such a case the server might 452
> > every recipient in a class that is different from that of the
> > first accepted recipient.  The remaining recipient classes
> > would then need to be tried in a separate transaction.
> 
> Again,  5321 now allows clients and/or servers to pick a limit
> lower than 100 and encourages / specifies orderly behavior when
> that is done.  Modulo the recommendation to use 452 for clarity
> rather than 552, that is what everyone you have described seems
> to be doing in practice.

[ I'm ignoring the 452/552 equivalence catnip. ]

Note that an Exim-like strategy of truncating the envelope at the first
452 works poorly in this case.  Because if the various recipient
classes (as is most likely) are sprinkled randomly through the envelope,
with high probability only one or two get through per transaction, and
as soon as there's a new class, the rest are deferred.  An envelope with
100 recipients and say 5 recipient classes, might then take ~80 retries
to deliver (and likely would never get to the tail of the list).
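
The arithmetic can be checked with a toy simulation.  This is a
hypothetical Python sketch under simplifying assumptions: the server
accepts only recipients in the same class as the first accepted one and
replies 452 to the rest, retry timing and message expiration are
ignored, and the function names are invented for the example.

```python
import random
from typing import List


def transactions_send_all(classes: List[int]) -> int:
    """Client sends every RCPT and retries only the 452'd recipients."""
    pending = list(classes)
    count = 0
    while pending:
        first = pending[0]
        # Everyone in the first recipient's class is delivered now.
        pending = [c for c in pending if c != first]
        count += 1
    return count


def transactions_truncate(classes: List[int]) -> int:
    """Client stops sending RCPTs at the first 452 and defers the rest."""
    pending = list(classes)
    count = 0
    while pending:
        first = pending[0]
        run = 0
        # Only the leading run of same-class recipients gets through.
        while run < len(pending) and pending[run] == first:
            run += 1
        pending = pending[run:]
        count += 1
    return count


rng = random.Random(2021)  # fixed seed, chosen arbitrarily
classes = [rng.randrange(5) for _ in range(100)]
print(transactions_send_all(classes), transactions_truncate(classes))
```

Under truncation the number of transactions equals the number of runs of
consecutive same-class recipients.  For 100 recipients drawn uniformly
and independently from 5 classes, the expected number of runs is
1 + 99 * 4/5 = 80.2, versus one transaction per class when all RCPT
commands are sent.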

So the key thing here is "orderly" deferral of the 100-N recipients,
when choosing to accept N-at-a-time, where orderly does not imperil
the delivery of the first N.

> If you think it is useful for either 5321bis or the A/S
> to explicitly point that out and discuss it, please suggest how
> and where to do that.  My personal guess (and so-far weak)
> preference is that the A/S is headed for a discussion of the
> practical issues and tradeoffs associated with the various cases
> and options.  In particular, from the standpoint of 5321 "more
> retries... before the message expires" is not 5231bis-level SMTP
> problem.

Perhaps, but lack of clear protocol guidance for servers can mean that
servers end up doing counter-productive things that harm the ecosystem,
by imposing capricious burdens on clients that the client has no way to
anticipate.  Where to explain what keeps the ecosystem in balance is a
judgement call.  I tend to favour over-communicating such messages, but
other approaches are also possible.

> > If, on the other hande, based on the sequence of presented
> > recipient addresses the server has somehow concluded that the
> > client is a spammer, it can of course with full knowledge of
> > the fact that the recipients will NOT be retried, send 552 or
> > better yet at that point, 554.
> 
> Agreed.  And I think that is what the text says.  If you think
> it needs clarification to distinguish between those two cases,
> please say so and encourage opening of another ticket.

Once the "treat 552 as 452" text goes away, and it is clear that all
"5XX" replies are permanent errors for the recipient in question, the
servers may mostly do as they see fit, but if it were up to me, I'd
include the "100-N" 4xx guidance for servers that wish to receive only
short envelopes, and are expecting clients to retry (and ultimately
typically succeed) the rest in some separate transaction.

> > This is problematic and generally pointless.  It serves no
> > purpose and breaks interoperability.  Clients can always send
> > multiple copies in separate envelopes, so such absolute limits
> > (below 100) damage interoperability and don't achieve the
> > desired goal of actually stopping the same message arriving to
> > more than a given number of recipients.
> 
> But that is precisely intended to allow for the 552/554 response
> you are suggesting above.    So, "pointless", perhaps not.  More
> confusing than it ought to be, maybe.  Suggestions welcome.

I am talking about 4XX after N < 100.  If a server determines that a
message is spam after it hits 3 honeypot addresses in the first 20 it
can start replying 5XX for the rest, though frankly I'd prefer to not
expose that side-channel as to which addresses are the honeypots and
which not.  Simpler to just reply "5XX" to DATA, or all BDAT commands.

> > The solution is quite simple delete the above text, the
> > overall picture is much simpler.  Respond with 4XX to defer
> > and 5XX to permanently reject a given RCPT.  There's no reason
> > for the specification to dwell on all the creative ways in
> > which a decision to reject might be made.
> 
> However, what I thought we heard Wednesday morning was that a
> number of servers continue to send 552 where 452 is now
> preferred ("preferred" may be a little weak). 

If there are such servers, that's their problem, as mentioned above, in
20 years of helping a community of users keep their MTAs humming, I've
seen no reports I can recall of such instances.

> As long as that
> is the case, then the purpose of that extended section is to try
> to explain ways to handle the situation, including both
> receiving 452 and receiving 552 in a situation where it is
> intended to be interpreted as 452.

Two MTAs with a combined market share of IIRC ~90% by server count
(though not email volume, given the concentration of email hosting by
the 800 pounders) have no 552 to 452 downgrade heuristics, that ship has
sailed over the horizon and is not coming back.

> If we are convinced that all
> of the SMTP servers on the Internet (or enough that others don't
> count -- and I'm counting gorillas who feel free to define their
> own standards as one each, not by their user or daily message
> counts) have followed the advice of 5321 so that no server is
> any longer sending 552 when 452 (and breaking up the mail
> transaction and resending) is intended, then the section should
> be significantly reworked, possibly including a historical note
> for servers that are still in existence but we cannot identify.  

Yes, that section has long outlived its usefulness.  The Postfix
changelog says:

    20010520
        RFC 2821 recommendation: treat a RCPT 552 reply as if the
        server sent 452. Files: smtp/smtp_proto.c, lmtp/lmtp_proto.c.

    20010607
        Safety: dropped the RFC 2821 compliant code that treats
        552 RCPT TO replies as 452. It created more problems than
        it solved. Files: smtp/smtp_proto.c, lmtp/lmtp_proto.c.

The feature lasted but a few weeks, before it was withdrawn almost 20
years ago, never to reappear.  I think that says everything that needs
to be said.

> > The key thing in this section is to specify that "452" MUST be
> > used when the client should retry the recipient separately
> > because too many were sent so far to process in a single batch
> > on the server.  The number of "452" replies the server is
> > willing to send without blocking message delivery or imposing
> > punitive delays that'll prevent transaction completion needs
> > to be at least 100 minus the number of initially accepted
> > recipients.
> 
> And, modulo the limit of 100, which I continue to believe is
> either a separate issue or no problem at all given the current
> text in 5321, I believe that is exactly what that current text
> says.

Still not going for the catnip (again).  Been there, done that... :-)

-- 
    Viktor.