Using puppet realm switch to select between beta/prod ( Wikimedia clusters )

Since the BounceHandler extension is currently installed only in the beta clusters ( Official testing servers of Wikimedia- ), writing a custom router in the exim configs of operations/puppet ( configuration repo managed by puppet ) to collect in all the bounce emails and HTTP POST to the extension API seemed risky. This was due to the fact that operations/puppet is meant for production and something beta specific in it, that too continuous API requests dont look sane ( we cant estimate the traffic yet ).
Marius Hoch ( WMF ) came with the idea of an exim-realm switch, which seemed to solve the issue. You can switch variables between beta and production by:
We edited

$ sudo <editor> templates/exim/manifests/mail.pp

and added this before the inclusion of exim4.conf.SMTP_IMAP_MM.erb.

# config - labs vs. production 
case $::realm {
    'labs' : {
	$verp_post_connect_server = ''
	$verp_bounce_post_url = ''
    'production': {
	$verp_post_connect_server = 'appservers.svc."${::mw_primary}".wmnet'
	$verp_bounce_post_url = ''
    default: {
	fail('unknown realm, should be labs or production')

The internal networks selection was done to make sure the bounce gets POSTed to the right wiki. Now, this configuration can be used in exim4.conf.SMTP_IMAP_MM.erb by editing the bouncehandler router by:

command = /usr/bin/curl -H 'Host: <%= @verp_post_connect_server %>' <%= @verp_bounce_post_url %> -d "action=bouncehandler" --data-urlencode "email@-"

Hope it helps someone in beta 🙂

Using Plancake::MailParse library to strip email out of its headers

Due to some issues with composer loading we had to shift our mail parse library from pear::mimeDecode to a more good looking email parse library by Plancake. Just quoting down how we will employ the library to extract headers from an email:
Add the dependancy to your composer.json

        "floriansemm/official-library-php-email-parser": "dev-master"

Now in the mail decode class, add this to perform the extraction:

  * Extract headers from the received bounce email using Plancake mail parser
  * @param string $email
  * @return array $emailHeaders.
public function extractHeaders( $email ) {
	$emailHeaders = array();
	$decoder = new PlancakeEmailParser( $email );

	$emailHeaders[ 'to' ] = $decoder->getHeader( 'To' );
	$emailHeaders[ 'subject' ] = $decoder->getSubject();
	$emailHeaders[ 'date' ] = $decoder->getHeader( 'Date' );
	$emailHeaders[ 'x-failed-recipients' ] = $decoder->getHeader( 'X-Failed-Recipients' );

	return $emailHeaders;

Now you can have the headers ready in $emailHeaders 🙂 yay!
* The library has no other dependencies and looks neat as a whole. The code can be found here : ! Enjoy ! Happy Hacking!

Using PEAR::mimeDecode to strip email bounce out of its headers

Writing indigenous email header stripping functions involve tedious work and a lot of regex, as the bounce headers email can be encoded. up/down cased, and saving it, we planned to include the mailMimeDecode class – a pretty straightforward approach. Header stripping is quite easy, and it can be installed via composer too.
* Prepare your composer.json

        "pear/mail_mime-decode": "1.5.5",
        "pear/pear_exception": "1.0.x-dev"

* Now, in the mail decode class, add the following to initialize the mimeDecode object

	// mimeDecode configurations
	$params['include_bodies'] = true;
	$params['decode_bodies'] = true;
	$params['decode_headers'] = true;

	$decoder = new Mail_mimeDecode( $email );
	$structure = $decoder->decode( $params );

	$emailHeaders = $structure->headers;
	$to = $emailHeaders[ 'to' ];
	$subject = $emailHeaders[ 'subject' ];
	$emailDate = $emailHeaders[ 'date' ];
	$permanentFailure = $emailHeaders[ 'x-failed-recipients' ];

Now, you have the To, Subject, Date and X-Failed Recipients headers ready. Please note that, you need to just put the lower-cased default email header-name as ‘key’ in the $emailHeaders array, to fetch it.
Yay! Happy Hacking

Writing a Job queue to deal with load when POST-ing from exim to MediaWiki API

Last day, Tim Landscheidt from Wikimedia scribbled on my earlier post that I should use a job queue to handle load of the bounce handling API. I talked with Legoktm on this, and he said it was a great idea, as there can be a chance of multiple email bounces reaching the API simultaneously. I will jot down how we made that happen.
*Firstly, register and load the Job handler class

//Register and Load Jobs
$wgAutoloadClasses['BounceHandlerJob'] = $dir. '/includes/job/BounceHandlerJob.php';
$wgJobClasses['BounceHandlerJob'] = 'BounceHandlerJob';

* Now, create the file BounceHandlerJob.php extending class BounceHandlerJob from Job.
I wanted to get the $email, which will be passed from ApiBounceHandler::exectue();

class BounceHandlerJob extends Job {
	public function __construct( Title $title, array $params ) {
		parent::__construct( 'BounceHandlerJob', $title, $params );

	 * Queue Some more jobs
	 * @return bool
	public function run() {
		$email = $this->params[ 'email' ];

		if ( $email ) {
			// The function in the API where the header 
			// stripping and other stuff happen
			ApiBounceHandler::processEmail( $email );

		return true;

* Now, we need to make the APIBounceHandler class to receive the POST request, and create a Job queue object so that our objective is accomplished.

$email = $this->getMain()->getVal( 'email' );
$params = array ( 'email' => $email );
$title = Title::newFromText( 'BounceHandler Job' );
$job = new BounceHandlerJob( $title, $params );
JobQueueGroup::singleton()->push( $job );

Yay ! done. now the public static processEMAIL function needs to be defined with the necessary actions and you are good to go.
PS: Please add the following to LocalSettings.php to see the results.

$wgRunJobRate = 0;

To ensure things are working well, run the script

php runJobs.php

. Correct errors if any, and the perfect run will output something like.

2014-07-13 15:02:38 BounceHandlerJob BounceHandler_Job email=string(1862) STARTING
2014-07-13 15:02:38 BounceHandlerJob BounceHandler_Job email=string(1862) t=100 good

. Thanks