Implementing VERP for MediaWiki [GSoC Proposal]

Recently, my proposal to implement VERP for MediaWiki was accepted into Google Summer of Code 2014. I thank my mentors Jeff Green and Legoktm, and the various other WMF members who helped me complete the write-up.

Project Summary
It’s likely that many Wikipedia accounts have a validated email address that once worked but is now out of date. Wikipedia does not currently unsubscribe users whose addresses trigger multiple non-transient failures, and some addresses might be more than 10 years old. The wiki should not keep sending email that is just going to bounce: it wastes resources and might trigger spam heuristics. Two API calls need to be implemented:
  • One to generate a VERP address to use when sending mail from MediaWiki.
  • One to record a non-transient failure. That API call would record the current incident and, if a threshold has been met (e.g. at least 3 bounces with the oldest at least 7 days ago), un-confirm the user’s address so that mail stops going to it.
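The threshold check in the second call can be sketched as a small pure function. This is a minimal sketch assuming the bounce timestamps for one address have already been loaded; the function name is hypothetical, the defaults are the example numbers above, and it only counts recorded bounces (it does not try to prove the bounces were consecutive).

```python
from datetime import datetime, timedelta

# Example thresholds from the proposal: at least 3 bounces, oldest >= 7 days ago.
MIN_BOUNCES = 3
MIN_SPAN = timedelta(days=7)


def should_unconfirm(bounce_times, now, min_bounces=MIN_BOUNCES, min_span=MIN_SPAN):
    """Return True if recorded non-transient bounces meet the un-confirm threshold.

    bounce_times: datetimes of the recorded failures for one email address.
    """
    if len(bounce_times) < min_bounces:
        return False  # too few bounces: the problem may still resolve itself
    oldest = min(bounce_times)
    # Require the failures to be spread over time, not one burst.
    return now - oldest >= min_span
```

Three bounces within a few days do not trigger it; three bounces spread over more than a week do.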
For the second call, authentication will be needed so that fake bounces cannot be used as a DoS vector or as a way of hiding password reset requests. The threshold exists because some failure scenarios resolve themselves (e.g. a mailbox over quota), so we don’t want to react to a single bounce; we want a history of consecutive mails bouncing. The task has a MediaWiki development component, to build the API and add VERP request calls wherever email is sent, and an Ops component, to route VERP bounces to a script (taking the mail on stdin, and optionally e.g. the email address as arguments) which can then call the authenticated MediaWiki API method to remove the mail address. Since the MediaWiki mail infrastructure is being moved to a new data center, now is the right time to implement VERP.
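The Ops-side script described above could look roughly like this. It is a minimal sketch assuming the bounce-{$key}@wikimedia.org address form given below. It also assumes the MTA either passes the original recipient as the first argument or records it in a header such as X-Original-To or Delivered-To, which depends on the MTA configuration. The names and allowed key characters are hypothetical, and the authenticated API call is left as a placeholder because its name and parameters are not specified here.

```python
import re
import sys
from email import message_from_string

# Assumed VERP format: bounce-{$key}@wikimedia.org.
# The allowed key characters are an assumption of this sketch.
VERP_RE = re.compile(r"^bounce-(?P<key>[A-Za-z0-9._-]+)@wikimedia\.org$", re.IGNORECASE)

# Headers in which an MTA may record the original envelope recipient;
# which one (if any) is present depends on the MTA configuration.
RECIPIENT_HEADERS = ("X-Original-To", "Delivered-To", "To")


def find_verp_key(raw_mail, address=None):
    """Return the VERP key of a bounce, or None if it is not a VERP bounce."""
    candidates = [address] if address else []
    msg = message_from_string(raw_mail)
    for header in RECIPIENT_HEADERS:
        value = msg.get(header)
        if value:
            candidates.append(value.strip().strip("<>"))
    for candidate in candidates:
        match = VERP_RE.match(candidate)
        if match:
            return match.group("key")
    return None


def main(stdin=sys.stdin, argv=sys.argv):
    """MTA pipe entry point: mail on stdin, optional recipient address in argv[1]."""
    address = argv[1] if len(argv) > 1 else None
    key = find_verp_key(stdin.read(), address)
    if key is None:
        return 1  # not a VERP bounce: nothing to report
    # Placeholder for the authenticated MediaWiki API call, which would
    # receive the key (and possibly the raw bounce); here we only print it.
    print(key)
    return 0
```

A pipe transport would run it through a one-line wrapper such as `sys.exit(main())`.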
VERP stands for Variable Envelope Return Path. When implemented, it alters the default envelope sender. For example, if an email needs to be sent to bob@example.com, VERP changes the envelope sender from the default wiki@wikimedia.org to a prefix/delimiter/hash address, [bob][-][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, so that each bounce can be attributed to the recipient it was meant for. The API would record the return address of the bounce and deduce that a mail to bob has failed. On consecutive failures, say at least 3 bounces with the oldest at least 7 days ago, the second API call un-confirms the user’s address.
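The prefix/delimiter/hash layout can be split apart again on the bounce side. This is a minimal sketch assuming "-" as the delimiter and a hash that never contains it, so the address is split at the last delimiter. The function names are hypothetical. The recovered prefix is only a hint and should not be trusted until the hash has been verified.

```python
DELIM = "-"
DOMAIN = "wikimedia.org"


def build_return_path(prefix, digest, delim=DELIM, domain=DOMAIN):
    """Assemble a prefix/delimiter/hash envelope sender."""
    return f"{prefix}{delim}{digest}@{domain}"


def parse_return_path(address, delim=DELIM, domain=DOMAIN):
    """Split a bounce's return address into (prefix, hash), or return None."""
    local, at, addr_domain = address.rpartition("@")
    if not at or addr_domain.lower() != domain:
        return None  # not one of our addresses
    # Split at the LAST delimiter: assumes the hash never contains it.
    prefix, sep, digest = local.rpartition(delim)
    if not sep or not prefix or not digest:
        return None  # e.g. the plain wiki@wikimedia.org sender
    return prefix, digest
```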
The return-path address needs to be of the form prefix/delimiter/hash so that fake bounces cannot be used to DoS a user. The VERP address will generally look like this:
bounce-{$key}@wikimedia.org
The prefix /^bounce-/ is used by the incoming MTA as a hook to route messages to the bounce processor, and $key is used by the bounce processor to figure out which wiki user is having delivery issues. An attacker must be prevented from spoofing bounce messages and causing mass unsubscribes. This can be accomplished by making $key secret, rather than a simple hash that can be reversed or guessed. According to security experts in the MediaWiki community, the best option is an HMAC computed with a secret key over a string containing the user’s email address, a timestamp, and the list name. PHP can generate such an HMAC with its built-in hash_hmac() function.
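The HMAC scheme above can be sketched with Python’s standard hmac module (the PHP equivalent would use hash_hmac()). Several details here are assumptions of this sketch, not part of the proposal:

  • SHA-256 as the hash.
  • A hypothetical numeric wiki user ID and a Unix timestamp carried in clear inside $key, so the bounce processor can recompute the MAC.
  • The digest truncated to 16 bytes.
  • "." as the field separator.
  • "enwiki" standing in for the list name.

Because the user’s current address is looked up at verification time, a bounce for an address the user has since changed fails verification.

```python
import base64
import hashlib
import hmac


def _mac(secret, user_id, email, timestamp, list_name):
    """HMAC-SHA256 over user ID, email, timestamp and list name (assumed layout)."""
    # Lowercasing the whole address is a simplification for this sketch.
    message = f"{user_id}|{email.lower()}|{timestamp}|{list_name}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # URL-safe base64 without padding keeps the local part to [A-Za-z0-9_-];
    # truncating to 16 bytes (22 characters) is an arbitrary choice here.
    return base64.urlsafe_b64encode(digest[:16]).decode("ascii").rstrip("=")


def make_verp_address(secret, user_id, email, timestamp, list_name, domain="wikimedia.org"):
    """Build bounce-{$key}@domain, where $key = user_id.timestamp.mac."""
    mac = _mac(secret, user_id, email, timestamp, list_name)
    return f"bounce-{user_id}.{timestamp}.{mac}@{domain}"


def verify_verp_key(secret, key, lookup_email, list_name):
    """Return the user ID if $key carries a valid MAC, else None.

    lookup_email: callable mapping a user ID to that user's current address.
    """
    try:
        user_id_s, timestamp_s, mac = key.split(".")
        user_id, timestamp = int(user_id_s), int(timestamp_s)
    except ValueError:
        return None  # malformed key
    email = lookup_email(user_id)
    if email is None:
        return None  # unknown user
    expected = _mac(secret, user_id, email, timestamp, list_name)
    # Constant-time comparison so the MAC cannot be guessed byte by byte.
    if not hmac.compare_digest(expected.encode("ascii"), mac.encode("utf-8")):
        return None
    return user_id
```

A forged or tampered key, or one made with a different secret, yields None, so spoofed bounces cannot un-confirm anyone.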

Problem Background
When an email is sent, the wiki web server injects the message into the local MTA through a shell call, running as the user the MediaWiki web server daemon runs under. MediaWiki uses the configuration variable $wgPasswordSender to set the envelope sender, so all messages are sent from the same address (for example ‘wiki@wikimedia.org’). In WMF’s environment, the web server’s MTA is configured to route all messages through the organization’s main mail server, which relays them to the destination server as determined from DNS MX records. Delivery can fail at many points, for example:

  • DNS lookup failure (Permanent failure)
  • Network failure (Temporary failure)
  • Remote server could be overloaded (Temporary failure)
  • Remote server might have blacklisted wikimedia.org or wiki@wikimedia.org (Temporary failure)
  • Remote server could say example@gmail.com is a bad address (Permanent failure)
  • Remote server could say example@gmail.com is over quota (Temporary failure)
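Whether a failure is permanent or temporary is usually signalled by the enhanced status code (RFC 3463) in the bounce’s Status field. Class 5 means a permanent failure, and class 4 means a persistent transient one. The sketch below classifies such codes. The remote server chooses the class, so it will not always match the list above: mailbox-full (X.2.2), for instance, is sometimes reported with class 5.

```python
import re

# RFC 3463 enhanced status code: class.subject.detail, e.g. "5.1.1".
STATUS_RE = re.compile(r"^([245])\.(\d{1,3})\.(\d{1,3})$")
CLASSES = {"2": "success", "4": "transient", "5": "permanent"}


def classify_status(status):
    """Map an enhanced status code to success/transient/permanent/unknown."""
    match = STATUS_RE.match(status.strip())
    if not match:
        return "unknown"
    return CLASSES[match.group(1)]
```

Only "permanent" results would count toward the un-confirm threshold.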

In each case, the mail server currently handling the transaction may originate a bounce message, so a bounce can originate within the local system (i.e. the WMF environment) or the remote system (the recipient’s environment). Bounce messages generally go back to the envelope sender. Currently, in WMF’s system, bounces coming back to wiki@wikimedia.org are sent to /dev/null.
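Bounces that follow the delivery status notification format of RFC 3464 carry a multipart/report body whose message/delivery-status part lists each failed recipient with its Action and Status. Python’s standard email parser exposes each header block of that part as a separate Message object, which makes these fields easy to extract. The sketch below handles only compliant DSNs; not every server sends one, so a real bounce processor would need a fallback. The function name is hypothetical.

```python
from email import message_from_string


def extract_failures(raw_bounce):
    """Return (final_recipient, action, status) tuples from an RFC 3464 DSN."""
    msg = message_from_string(raw_bounce)
    failures = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # Each header block of this part is parsed as its own Message object.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # per-message block (Reporting-MTA, etc.)
            # Final-Recipient is "address-type; address", e.g. "rfc822; bob@example.com".
            address = recipient.split(";", 1)[-1].strip()
            action = block.get("Action", "").strip()
            status = block.get("Status", "").strip()
            failures.append((address, action, status))
    return failures
```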

Deliverables
Since the WMF is moving its servers to a new data center and the mail infrastructure is being rebuilt, now is the right time to implement this functionality. The final results should be:
  • All emails for users of WMF-hosted wikis should have their envelope sender changed from the default wiki@wikimedia.org to a VERP-generated envelope sender of the form prefix/delimiter/hash, e.g. [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org.
  • If delivery fails because of any of the problems discussed above, a bounce should reach the WMF mail servers addressed to [bob][_][mdfkdjw6R4xGdiflfdfkQ]@wikimedia.org, and an API running there should record the failure, check this user’s past bounce history in a database, and un-confirm the user’s address if the threshold is met.
  • The hash part of the VERP-generated address will be the output of an HMAC with a secret key over a string containing the user’s email address, a timestamp, and the list name.
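On the sending side, the first deliverable comes down to choosing the envelope sender when the message is injected into the local MTA. The sketch below assumes a sendmail-compatible binary at /usr/sbin/sendmail (Postfix and Exim both provide one) and uses its standard -f option, which sets the envelope sender, and -i. In PHP the same effect is commonly achieved by passing "-f" in mail()’s additional parameters. The function names are hypothetical.

```python
import subprocess

SENDMAIL = "/usr/sbin/sendmail"  # assumed path; varies by system


def build_sendmail_command(envelope_sender, recipient, sendmail=SENDMAIL):
    """Build a sendmail command line that uses a VERP envelope sender."""
    for value in (envelope_sender, recipient):
        # Refuse values that sendmail could parse as extra options.
        if value.startswith("-"):
            raise ValueError(f"refusing option-like address: {value!r}")
    # -i: a line containing a single "." does not end the message.
    # -f: set the envelope sender, which is where bounces are returned.
    return [sendmail, "-i", "-f", envelope_sender, recipient]


def send_with_verp(raw_message, envelope_sender, recipient):
    """Pipe a complete RFC 5322 message into the local MTA."""
    cmd = build_sendmail_command(envelope_sender, recipient)
    subprocess.run(cmd, input=raw_message.encode("utf-8"), check=True)
```

Passing the command as a list (no shell) keeps addresses from being interpreted by a shell.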
Additional Deliverables
Replacing PHP MailUser with SwiftMailer: Swift_Mailer seems to be a lot more robust than PHP MailUser. Parent5446 writes: “PHPMailer has everything packed into a few classes, whereas Swift_Mailer actually has a separation of concerns, with classes for attachments, transport types, etc. A result of this is that PHPMailer has two different functions for embedding multimedia: addEmbeddedImage() for files and addStringEmbeddedImage() for strings. Another example is that PHPMailer supports only two bodies for multipart messages, whereas Swift_Mailer will add in as many bodies as you tell it to since a body is wrapped in its own object. In addition, PHPMailer only really supports SMTP, whereas Swift_Mailer has an extensible transport architecture, and multiple transport providers. (And there’s also plugins, and monolog integration, etc.)” This is tracked at BZ#63483.

(Skipping the Workflow, Participation and About Me sections!) 🙂
