Re: [Asrg] 3. Proof-of-work analysis

>>> We then
>>> carefully worked through all the calculations, using the best data
>>> that we could obtain -- and we did indeed come to the conclusion that
>>> proof-of-work is not a viable proposal :(
>>
>> That's a very interesting paper, thank you.  I wonder, however, what
>> the distribution curves are like when "regular correspondents" are
>> exempted from proof-of-work, not just mailing lists.  Would it be
>> possible to re-examine the MTA logs for this type of pattern?
>
> in principle yes ... however I doubt that the systems at the top of the
> curve (sending lots of email per day) would have regular correspondents.
> Besides the people running mailing lists, they will be e-commerce
> systems sending acknowledgements, hospitals confirming appointments, fax
> delivery systems relaying incoming messages etc.

Let's think about the statistical significance a bit.  The Net-wide 
average is about 3 people per host, but most hosts have only one or two 
people behind them.  The difference may well be accounted for by a small
number of hosts handling very large numbers of people.
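
To show how a few big hosts can pull the mean up while the median stays at one, here is a quick Python sketch.  The host sizes are invented purely for illustration; they are not measured data.

```python
from statistics import mean, median

# Invented host sizes (people behind each host), for illustration only:
# 900 single-user hosts, 98 two-person hosts and two big shared systems.
hosts = [1] * 900 + [2] * 98 + [1000] * 2

people = sum(hosts)
on_big_hosts = sum(h for h in hosts if h >= 100)

print(f"hosts={len(hosts)} people={people}")
print(f"mean={mean(hosts):.2f} median={median(hosts)}")
print(f"share of people on the two big hosts: {on_big_hosts / people:.0%}")
```

With these made-up sizes the mean comes out at about 3.10 people per host and the median at 1, with roughly 65% of the people sitting behind just two hosts.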

As an example, Lancaster University provides a central UNIX-shell 
cluster for all students, which (among other services) hosted the official
e-mail system.  Because it's a UNIX shell system, the mail is sent from 
the members of the cluster, not from the terminal used to log on.  So, 
in theory, the Internet sees 10,000 students sharing three 4-way Sun 
workstations (this, at least, was what the configuration used to be).

In practice, about 60% of those students are off-campus and typically 
use third-party ISPs for personal correspondence anyway, and an 
increasingly large proportion of the remainder use personal computers 
from their rooms - but you can see the principle.

And yes, I agree that this particular use-case is fairly pessimal in 
terms of proof-of-work scenarios.  However, intra-campus communications 
are typically quite well-ordered, so (with careful management around 
the beginning of the academic year) it could still be possible to use 
the same three workstations in a proof-of-work world.
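
A back-of-envelope check of that claim, as a Python sketch.  The 10,000 students, the three 4-way workstations and the 60% off-campus share come from the text above; the 10 CPU-seconds per token and the tokens-per-student figures are hypothetical placeholders, not measurements.

```python
STUDENTS = 10_000
CPUS = 3 * 4                      # three 4-way workstations
ON_CAMPUS = STUDENTS * 40 // 100  # about 60% are off-campus and use other ISPs
SECONDS_PER_TOKEN = 10            # hypothetical cost of one token on one CPU
DAY = 24 * 3600


def utilisation(tokens_per_student_per_day: int) -> float:
    """Fraction of the cluster's total daily CPU time spent minting tokens."""
    cpu_seconds_needed = ON_CAMPUS * tokens_per_student_per_day * SECONDS_PER_TOKEN
    return cpu_seconds_needed / (CPUS * DAY)


# Normal term: most mail goes to exempt regular correspondents
# (hypothetically 2 tokens per student per day).
print(f"typical day: {utilisation(2):.1%}")
# Start of the academic year: hypothetically all 10 messages/day need tokens.
print(f"start of year: {utilisation(10):.1%}")
```

These are whole-day averages (about 7.7% and 38.6% with the assumed inputs); mail is bursty, so peak-hour load would be higher, which is where the careful management at the start of the year comes in.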

>> By "regular correspondents" I mean people who know each other well
>> enough to send mail regularly, not necessarily frequently - even once a
>> week over a period of months.  I ask this because I expect that users
>> with slow machines - who would otherwise be the group most
>> inconvenienced by proof-of-work schemes - send mail that mostly falls
>> into this category.  I don't know, however, how much of the overall
>> picture is accounted for by these.
>
> I don't see why one should expect any correlation between machine speed
> and regularity of sending email. Many businesses will not splash out for
> admin staff machines, so it is they as well as aged parents who might be
> expected to have old kit :)

I'm afraid I don't have much insight into how business e-mail patterns 
go.  That's why I'm asking you for the statistics.  :)

Point taken, anyway - chalk this one up as another use-case to be 
considered.  A business might set up its own
proof-of-work server cluster for internal use, rather than upgrading
individual workstations, or else rent time as needed on a third party's
cluster.

FWIW, I treat mailing lists as a special case of "regular 
correspondent", and as such I don't think it's necessary to distinguish 
them per se.  You might like to consider this when compiling your 
statistics.
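
One way such an exemption could be expressed, sketched in Python.  The class name, the four-distinct-weeks threshold and the explicit whitelist are hypothetical choices, not part of any proposal discussed here; the point is only that regularity (distinct weeks), not frequency, decides, and that a mailing list is just another whitelisted correspondent.

```python
from datetime import date, timedelta


class CorrespondentPolicy:
    """Hypothetical recipient-side policy: a sender is exempt from
    proof-of-work if mail has been exchanged with them in at least
    `min_weeks` distinct calendar weeks, or if they are explicitly
    whitelisted (e.g. a mailing list the user has joined)."""

    def __init__(self, min_weeks: int = 4) -> None:
        self.min_weeks = min_weeks
        self.weeks_seen: dict[str, set[tuple[int, int]]] = {}
        self.whitelist: set[str] = set()

    def record_exchange(self, address: str, when: date) -> None:
        # Only regularity counts: many messages in one week add one week.
        # Lower-casing the whole address is a simplification.
        year, week, _ = when.isocalendar()
        self.weeks_seen.setdefault(address.lower(), set()).add((year, week))

    def whitelist_address(self, address: str) -> None:
        self.whitelist.add(address.lower())

    def requires_token(self, sender: str) -> bool:
        key = sender.lower()
        if key in self.whitelist:
            return False
        return len(self.weeks_seen.get(key, set())) < self.min_weeks


policy = CorrespondentPolicy(min_weeks=4)
policy.whitelist_address("asrg@ietf.org")  # a list is just a regular correspondent
start = date(2003, 5, 1)
for i in range(4):  # once a week for four weeks
    policy.record_exchange("friend@example.org", start + timedelta(weeks=i))

print(policy.requires_token("Friend@Example.org"))    # False
print(policy.requires_token("stranger@example.com"))  # True
```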

>> For future work, it might be instructive to identify various non-spam
>> use-cases which appear to have a high proof-of-work load - ie. on the
>> "long tail" of the distribution curves presented - and consider
>> practical ways of relieving or accommodating it.
>
> indeed so ... though you should note that there is not much difference
> between spam viability thresholds and the average case, let alone
> power-users.

For the brute-force proof-of-work scheme you assume in the paper, this 
is undoubtedly true.  I'm asking for more statistics to try to reveal
whether the ways we've thought of to make it less brute-force are
viable.
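
For reference, a brute-force scheme of this kind can be sketched along the lines of Hashcash: the sender searches for a counter that makes the SHA-1 hash of a stamp start with a given number of zero bits, and the recipient checks it with a single hash.  The stamp format below is a simplified, hypothetical one, not Hashcash's actual format, and not necessarily the exact scheme modelled in the paper.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Brute-force search: expected work is about 2**bits SHA-1 hashes."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int) -> bool:
    """Checking costs one hash, however long minting took."""
    return (stamp.startswith(resource + ":")
            and leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits)


stamp = mint("recipient@example.org", 16)
print(stamp, verify(stamp, "recipient@example.org", 16))
```

Each extra bit of difficulty doubles the expected minting work while verification stays at one hash; that single knob is all the brute-force approach offers, which is why exemptions matter.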

> For proof-of-work to look plausible (and not a high-risk strategy) I'd
> like to see factors of a thousand or more between plausible workloads
> for legitimate senders and any economically viable spamming activity :-(

For my own usage pattern, assuming proof-of-work is exempted for 
regular correspondents, this is approximately true.  I talk almost 
exclusively to mailing lists and people I know pretty well.

Occasionally I get a question from someone I don't know, but this is 
rare enough that I could, if necessary, give up 60 seconds of my 
PowerBook's CPU time to send a reply, without too much fuss - after 
all, it would have taken me at least that long to write it.  I'd still 
be concerned about the time taken on a slower machine, though.
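
Rough arithmetic on the 60-second figure, as a Python sketch.  The 60 seconds per token is from the text; the 8x slowdown for an older machine and the million-message spam run are hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600
TOKEN_SECONDS_FAST = 60  # from the text: about 60 s of PowerBook CPU per token


def token_seconds(slowdown: float = 1.0) -> float:
    """Seconds per token on a machine `slowdown` times slower than the PowerBook."""
    return TOKEN_SECONDS_FAST * slowdown


def tokens_per_machine_day(slowdown: float = 1.0) -> float:
    """Upper bound on tokens one machine can mint if it does nothing else."""
    return SECONDS_PER_DAY / token_seconds(slowdown)


# Hypothetical old office machine, 8x slower than the PowerBook.
print(f"old machine: {token_seconds(8) / 60:.0f} minutes per reply")
# Hypothetical spam run of 1,000,000 messages/day on PowerBook-class hardware.
print(f"spammer needs about {1_000_000 / tokens_per_machine_day():.0f} machines")
```

With these made-up inputs it reports 8 minutes per reply and roughly 694 machines for the spam run.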

However, looking instead at the mail I *receive*, I can see a number of 
remote systems which could, potentially, be heavily burdened by 
proof-of-work.  That said, such systems are less common than you might think.

Forum update notifications?  These come in frequently and from 
predictable sources, so I might as well whitelist them as a regular 
correspondent.  The same goes for news and status mailings from my ISP 
and various other organisations.

E-commerce transaction confirmations?  If it only costs a fractional 
cent per PoW token (because you're managing the hardware in bulk), it 
disappears next to the cost of the currency handling.  Remember, I'm 
imagining that you can centralise the effort and effectively rent space 
on someone else's cluster for this, so even small shops see the same 
kinds of low cost.  In practice, most e-commerce systems I've seen use 
a double-opt-in e-mail registration process, very similar to mailing 
lists, so similar mechanisms could apply.
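
To put a number on "a fractional cent", a Python sketch with entirely hypothetical prices: $10 an hour for a rented 64-core cluster, 60 CPU-seconds per token, and a $0.30 fixed payment-handling fee per transaction.

```python
def cost_per_token(cluster_dollars_per_hour: float, cores: int,
                   cpu_seconds_per_token: float) -> float:
    """Dollar cost of one token when CPU time is rented by the hour."""
    tokens_per_hour = cores * 3600 / cpu_seconds_per_token
    return cluster_dollars_per_hour / tokens_per_hour


# Hypothetical figures only: $10/hour, 64 cores, 60 CPU-seconds per token,
# compared against an assumed $0.30 fixed fee for handling the payment.
token = cost_per_token(10.0, 64, 60)
print(f"{token * 100:.2f} cents per token")
print(f"{token / 0.30:.1%} of the payment-handling fee")
```

With those inputs a token comes to about 0.26 cents, under 1% of the assumed payment fee.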

As for tech support and sales enquiries, you're paying the staff 
sitting at the workstation at least national minimum wage (several 
dollars an hour), and they have a physical limit to how fast they can 
type and send mails.  The cost of attaching tokens to those mails is 
minuscule in comparison.  Whether it happens on the workstation or
centrally is a matter of logistics - but if the workstation is already 
fast enough, the costs become essentially nil.
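
The same comparison for a support desk, again as a Python sketch with hypothetical figures: $5 an hour, five minutes per reply, 60 CPU-seconds per token if minted locally, or 0.26 cents per token if bought in from a cluster.

```python
WAGE_PER_HOUR = 5.00     # hypothetical: "several dollars an hour"
MINUTES_PER_REPLY = 5    # hypothetical time to read, type and send one reply
TOKEN_CPU_SECONDS = 60   # hypothetical local minting time per token
TOKEN_COST = 0.0026      # hypothetical rented-cluster cost per token, in dollars

labour_per_reply = WAGE_PER_HOUR * MINUTES_PER_REPLY / 60
local_cpu_share = TOKEN_CPU_SECONDS / (MINUTES_PER_REPLY * 60)

print(f"labour: ${labour_per_reply:.2f} per reply")
print(f"central token adds {TOKEN_COST / labour_per_reply:.2%} to that")
print(f"minting locally keeps the CPU busy {local_cpu_share:.0%} of the time")
```

With these inputs it prints $0.42 of labour per reply, a 0.62% surcharge for a centrally minted token, and 20% CPU use for minting locally, which an otherwise idle workstation can absorb while the next reply is being typed.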

Registration confirmations?  Seriously, these are *supposed* to be 
rare.  I'd like to know what kind of system processes more 
registrations than actual service, before I consider this to be a 
problem.

I used to get e-cards from a few people.  These are typically sent from 
the e-card vendor's system at present - a bad practice, but oh well.  
If the cost of generating proof-of-work is too high for the e-card 
vendor, they can get the sender to download the e-card and send it 
themselves.  I'm not too worried about that.

That leaves one big category:  Web Mail.  The likes of Hotmail and 
Yahoo don't charge for sending e-mail from their systems, except 
perhaps in terms of banner ads.  They also handle ginormous amounts of 
said mail, which could make a proof-of-work switch-on relatively 
difficult for them.  However, most of their clients are low-end home 
users, who, on average, may have relatively favourable contact 
patterns.  For this, we could do with more statistics.

Any more?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@chromatix.demon.co.uk
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg