Recently, my proposal to implement VERP technology for MediaWiki was accepted into the 2014 edition of Google Summer of Code. I thank my mentors, Jeff Green and Legoktm, and the various other WMF members who helped me complete the write-up.
It’s likely that many Wikipedia accounts have a validated email address that once worked but is now out of date. Wikipedia does not currently unsubscribe users who trigger multiple non-transient failures, and some addresses might be 10+ years old. The wiki should not keep sending email that is just going to bounce: it’s a waste of resources and might trigger spam heuristics. Two API calls need to be implemented:
One to generate a VERP address to use when sending mail from MediaWiki.
One that records a non-transient failure. That API call would record the current incident and, if some threshold had been met (e.g. at least 3 bounces with the oldest at least 7 days ago), un-confirm the user’s address so that mail stops going to it.
For the second call, authentication will be needed so that fake bounces are not a DoS vector or a mechanism for hiding password reset requests. The reason for the threshold is that some failure scenarios resolve themselves (e.g. a mailbox over quota), so we don’t want to react to a single bounce. We want a history of consecutive mails bouncing. The task has a MediaWiki development component, which is to build the API and to add VERP request calls wherever email is sent, and an Ops component, which is to route VERP bounces to a script (taking the mail on stdin, and optionally e.g. the e-mail address as arguments) that can then call the authenticated MediaWiki API method to remove the mail address. Since the MediaWiki mail infrastructure is currently being moved to a new data center, this is the right time to implement VERP.
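The threshold rule described above can be sketched roughly as follows. This is only an illustration in Python (MediaWiki itself is written in PHP); the function name `should_unconfirm`, the constants, and the idea that the caller passes in the timestamps of the consecutive bounces recorded so far are all assumptions made for the example, not part of the proposal.

```python
from datetime import datetime, timedelta

# Example values taken from the text: "at least 3 bounces with the
# oldest at least 7 days ago". Both names are hypothetical.
MIN_BOUNCES = 3
MIN_AGE = timedelta(days=7)


def should_unconfirm(bounce_times, now):
    """Return True when the recorded bounces meet the threshold.

    bounce_times: datetimes of the consecutive non-transient bounces
    recorded for one address (assumed to be reset elsewhere whenever
    a delivery succeeds).
    now: the current time, passed in so the function is easy to test.
    """
    if len(bounce_times) < MIN_BOUNCES:
        return False
    # The oldest recorded bounce must be at least MIN_AGE old.
    return now - min(bounce_times) >= MIN_AGE
```

With this rule, a single over-quota bounce, or a burst of three bounces within one afternoon, would not un-confirm the address; only failures that persist for a week would.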
VERP stands for Variable Envelope Return Path; when implemented, it alters the default envelope sender. For example, if an email needs to be sent to firstname.lastname@example.org, VERP changes the envelope sender from email@example.com to a prefix/delim/hash form, [bob][-][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, so that the bounce can be processed more effectively. The API would record the return address of the bounce and deduce that a mail to bob has failed. On consecutive failures, say at least 3 bounces with the oldest at least 7 days ago, the second API call un-confirms the user’s address.
The return-path address needs to be in prefix/delim/hash form so that fake bounces cannot be used to DoS a user. The VERP address will generally look like this:
The prefix /^bounce-/ is used by the incoming MTA as a hook to route messages to the bounce processor, and $key is used by the bounce processor to figure out which wiki user is having delivery issues. An attacker must be prevented from spoofing bounce messages and causing mass unsubscribes. This can be accomplished by making $key secret, rather than a simple hash that can be reversed or guessed. According to security experts in MediaWiki, the best option is to generate an HMAC, with a secret key, over a string containing the user’s email address, a timestamp, and the list name. The HMAC can be generated with one of PHP’s built-in functions.
When an email is sent, a message is injected into the local MTA on the wiki web server through a shell call made by the user that the MediaWiki web server daemon runs as. MediaWiki uses the config variable $wgPasswordSender to set the envelope sender, and all messages are sent from that address (for example ‘firstname.lastname@example.org’). In WMF’s environment, the web server’s MTA is configured to route all messages through the organization’s main mail server, which relays them to the destination/remote server as determined from DNS MX records. There are many points where the delivery can fail, for example:
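To make the HMAC idea concrete, here is a minimal sketch in Python (the real implementation would be in PHP). Everything specific in it is an assumption for illustration: the function names, the `|` separator, the placeholder secret, the choice of SHA-256, and the truncation to 32 hex characters, which keeps the local part under the 64-octet SMTP limit. How the bounce processor maps a key back to a user (for instance, by storing the key at send time) is not shown.

```python
import hashlib
import hmac

# Hypothetical values; a real deployment would load the secret from
# private configuration and never hard-code it.
SECRET_KEY = b"replace-with-a-real-secret"
BOUNCE_DOMAIN = "wikimedia.org"
KEY_LENGTH = 32  # hex characters kept from the HMAC digest


def make_verp_address(email, list_name, timestamp, secret=SECRET_KEY):
    """Build a bounce-$key@domain return path for one outgoing mail."""
    message = f"{email}|{list_name}|{timestamp}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"bounce-{digest[:KEY_LENGTH]}@{BOUNCE_DOMAIN}"


def is_valid_bounce(address, email, list_name, timestamp, secret=SECRET_KEY):
    """Check that a bounce address matches what we would have generated."""
    expected = make_verp_address(email, list_name, timestamp, secret)
    # Constant-time comparison, so timing does not leak how much matched.
    return hmac.compare_digest(address.encode("utf-8"),
                               expected.encode("utf-8"))
```

Because the key depends on a secret that only the wiki knows, an attacker who knows a user's email address still cannot construct a bounce address that passes `is_valid_bounce`.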
- DNS lookup failure (Permanent failure)
- Network failure (Temporary failure)
- Remote server could be overloaded (Temporary failure)
- Remote server might have blacklisted wikimedia.org or email@example.com (Temporary failure)
- Remote server could say firstname.lastname@example.org is a bad address (Permanent failure)
- Remote server could say email@example.com is over quota (Temporary failure)
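The permanent/temporary split matters because only non-transient failures should count towards un-confirming an address. One way a bounce processor could make that distinction is to look at the class digit of the enhanced status code (RFC 3463) that bounce reports usually carry, where 4.x.x means a transient failure and 5.x.x a permanent one. The sketch below is a Python illustration only; the function name `classify_status` is hypothetical, it assumes the status code has already been extracted from the bounce, and which code a remote server returns for each scenario above varies between servers.

```python
def classify_status(status):
    """Classify an enhanced mail status code such as "5.1.1".

    Only the first field (the class) is inspected:
    2 = success, 4 = transient failure, 5 = permanent failure.
    """
    status_class = status.strip().split(".", 1)[0]
    if status_class == "5":
        return "permanent"
    if status_class == "4":
        return "temporary"
    if status_class == "2":
        return "delivered"
    # Anything else is malformed or missing; let the caller decide.
    return "unknown"
```

For example, a "5.1.1" (bad destination mailbox) report would be classified as permanent and a "4.2.2" (mailbox full) report as temporary.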