
Regexes to filter mail receipts

Filtering automated e-mail responses such as return receipts, out-of-office notices, status reports, and reading confirmations is quite difficult

  • due to the many encodings and character sets in use,
  • due to the many different languages out there,
  • due to the fact that mail agents (and some servers) word and format automated responses in very different ways,
  • simply due to bugs (e.g. one often finds extra spaces such as in "Subject: Read : ..."),
  • and finally due to missing standards, or at least programmers' ignorance of them.

Below you will find a list of regular expressions (to be used with egrep) which I use on several computers to filter a large share of automated responses.

Usage

I use the list from within a procmailrc recipe such as

:0
* ? egrep --quiet -f $HOME/.mail_receipts
receipts

Assuming you stored the regular expressions in $HOME/.mail_receipts, this moves matching messages to the receipts folder. Note that if your grep does not support --quiet (GNU grep does), you have to use

egrep -f $HOME/.mail_receipts >/dev/null
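Before enabling the recipe, it can help to dry-run the pattern file against a saved message. The following is a minimal sketch under stated assumptions: the scratch files and the sample header are made up for illustration, mktemp is assumed to be available, and only two patterns from the list at the end of this page are used. It relies on grep's exit status in the same way the procmail condition does.

```shell
#!/bin/sh
# Hypothetical scratch files; the real list would live in $HOME/.mail_receipts.
patterns=$(mktemp)
message=$(mktemp)

# Two patterns copied from the list at the end of this page.
cat > "$patterns" <<'EOF'
^Subject: +Successful Mail Delivery Report$
^Subject: +Delivery [Rr]eport:
EOF

# A made-up message header used only for this test.
cat > "$message" <<'EOF'
From: MAILER-DAEMON@example.org
Subject: Successful Mail Delivery Report
EOF

# grep -E is the standard spelling of egrep; -q is the short form of --quiet.
# Exit status 0 means at least one pattern matched.
if grep -E -q -f "$patterns" < "$message"; then
    verdict="receipt"
else
    verdict="normal mail"
fi
echo "$verdict"

rm -f "$patterns" "$message"
```

With the sample header above, the script prints receipt. If your grep lacks -q as well, redirecting the output to /dev/null gives the same exit status.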

Typical lines caught

Here is a small list of Subject lines which are typical for automated responses and which are, among others, caught by my list of regular expressions:

Subject: Delivery Status Notification
Subject: Delivery Status Notification (Success)
Subject: Empfangsbestätigung (angezeigt) - ...
Subject: Gelesen: ...
Subject: Lesebestätigung: ...
Subject: Mail System Delivery Report
Subject: Read: ...
Subject: Return Receipt (displayed) ...
Subject: Return receipt
Subject: Successful Mail Delivery Report
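To check which Subject lines a given set of patterns catches, one can feed them to grep one at a time. This is only a sketch: it uses three patterns copied from the list at the end of this page, a handful of the ASCII Subject lines above, and one made-up ordinary Subject line (Lunch on Friday?) as a control. The temporary file and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical scratch file holding a subset of the pattern list.
patterns=$(mktemp)
cat > "$patterns" <<'EOF'
^Subject: +(Confirmation of|[rR]eturn) [rR]eceipt *(\(displayed\))?$
^Subject: +(Mail .*)?Delivery (([Cc]onfirm|(Status )?[Nn]otific)ation|Receipt|[rR]eport)( \([Ss]uccess\)|.*has been successful|.*successfully forwarded)?$
^Subject: +Successful Mail Delivery Report$
EOF

caught=0
missed=0
# Read one Subject line at a time and test it against all patterns.
while IFS= read -r subject; do
    if printf '%s\n' "$subject" | grep -E -q -f "$patterns"; then
        caught=$((caught + 1))
    else
        missed=$((missed + 1))
        echo "not caught: $subject"
    fi
done <<'EOF'
Subject: Delivery Status Notification
Subject: Delivery Status Notification (Success)
Subject: Mail System Delivery Report
Subject: Return receipt
Subject: Successful Mail Delivery Report
Subject: Lunch on Friday?
EOF

echo "caught=$caught missed=$missed"
rm -f "$patterns"
```

Only the control line should be reported as not caught. The same loop can be pointed at the full pattern file to find gaps after editing it.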

False positives

In some situations, the list of regular expressions can produce false positives, e.g. when someone replies to a reading confirmation or forwards an "Out of office" notice. One can add exceptions for Subject lines which start with Re: or Fwd:. However, one cannot exclude (fwd) at the end of Subject lines, as this is often part of return receipts. The following procmailrc snippet shows an example of how to exclude Subject: Re: etc.

:0
* ! ^Subject: *((Antw(ort)?|FWd?|AW|WG|RE|ReRe):|Re([([]|:?( ?Re:|\^?[234])))
* ? egrep --quiet -f $HOME/.mail_receipts
receipts

Moreover, it should be noted that some mailers don't speak English very well. It is not uncommon for a message entitled Mail Delivery Notification to actually report that the message was not delivered. How best to exclude these depends largely on your setup. One quick and dirty way is to add another condition

* ! B ?? ^The following recipients are unknown:
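The interplay of the Re:/Fwd exclusion and the receipt patterns can be tried out in the shell without touching procmail. The sketch below is an approximation, not a replacement for the recipe: classify is a hypothetical helper, the sample Subject lines are made up, and only one out-of-office pattern from the list at the end of this page is used. Since procmail matches its own conditions case-insensitively unless the D flag is set, the exclusion is tested with grep -i, while the receipt pattern is tested case-sensitively as egrep would do.

```shell
#!/bin/sh
# Exclusion regex copied from the recipe above.
reply_re='^Subject: *((Antw(ort)?|FWd?|AW|WG|RE|ReRe):|Re([([]|:?( ?Re:|\^?[234])))'
# One out-of-office pattern copied from the list at the end of this page.
receipt_re='^Subject: .* is (not in|out of) (the )?office\.?$'

# classify is a hypothetical helper that prints the folder a Subject line would end up in.
classify() {
    if printf '%s\n' "$1" | grep -E -i -q -e "$reply_re"; then
        # Replies and forwards are never treated as receipts.
        echo "inbox"
    elif printf '%s\n' "$1" | grep -E -q -e "$receipt_re"; then
        echo "receipts"
    else
        echo "inbox"
    fi
}

classify 'Subject: Jane Doe is out of the office'
classify 'Subject: Fwd: Jane Doe is out of the office'
classify 'Subject: Re: Jane Doe is out of the office'
```

The first line is classified as receipts, the forwarded and replied-to variants as inbox. Without the exclusion, all three would match the out-of-office pattern.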

List of regular expressions

Below you'll find a list of regular expressions that catch the most common patterns I have observed. It can also be downloaded as regex_mail_receipts.txt, which is the current, up-to-date list we are using. As always, please use it at your own risk. Also note that you will probably have to adjust it: it is currently tailored for use in Central European countries, with a focus on German nasties.

^content-class: urn:content-classes:(dsn|mdn)
^Content-Type: .*(report-type="?|message/)disposition-notification
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?((Not[ _]r|R)ead)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?(Lu|((Nicht[ _]g|G)el|L)e[sz]en)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?(Letto|Lidas)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028])\?Q\?)?(Le=EDdo|L=E6st|Olvas=E1s|Okundu|(Nep|P)=F8e=E8teno|Pre=E8=EDtan=E9)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +=\?(utf|UTF)-8\?Q\?Le=C3=ADdo:
^Subject: +=\?[Ww]indows-125[028]\?B\?THWgOiA=\?=
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028])\?B\?)?TGVzZW46
^Subject: +=\?(utf|UTF)-8\?B\?QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkg
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?Lesebest[^ ]+tigung(((=A0|_)?(:|-|=3A)[ _])|(\?=)?$)
^Subject: +(Confirmation of|[rR]eturn) [rR]eceipt *(\(displayed\))?$
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?(Zugestellt)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +(Mail .*)?Delivery (([Cc]onfirm|(Status )?[Nn]otific)ation|Receipt|[rR]eport)( \([Ss]uccess\)|.*has been successful|.*successfully forwarded)?$
^Subject: +(Return [rR]eceipt.*|Delivered|.+bermittelt)(=A0|[ _])?(:|-|=3A)[ _]
^Subject: +Delivery [Rr]eport:
^Subject: +Reading Confirmation Receipt$
^Subject: +Successful Mail Delivery Report$
^Subject: +(Note: |DSN).*Return[- ]Receipt *$
^Subject: +Benachrichtigung.*Zustellstatus.*\(Erfolg\)
^Subject: +=\?utf-[78]\?Q\?Benachrichtigung__\+APw-ber__Zustells
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?Empfangsbest[^ ]+tigung[ _][^ ]*angezeigt
^Subject: +(RCPT|NDN|Ack|ASCR)(:| -|=3A)[ _]
^Subject: .*: mail delivery status.? *$
^Subject: +Abwesend:
^Subject: +=\?(utf|UTF)-8\?B\?QWJ3ZXNlbmQ6
^Subject: +(=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?)?(Out[ _]of[ _]Office|Abwesenheitsnotiz)
^Subject: +Response automatique d'absence du bureau
^Subject: +=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028])\?Q\?R=E9ponse_automatique_d=27absence_du_bureau
^Subject: +=\?(utf|UTF)-8\?B\?UsOpcG9uc2UgYXV0b21hdGlxdWUgZCdhYnNlbmNlIGR1
^Subject: +=\?(ISO|iso)-8859-1\?B\?RW1wZmFuZ3NiZXN05HRpZ3VuZzp
^Subject: .* is (not in|out of) (the )?office\.?$
^Subject: +=\?((ISO|iso)-8859-[0-9]*|[Ww]indows-125[028]|(utf|UTF)-8)\?Q\?.+_ist_(au=DFer_Haus|nicht_im_B=FCro).*=2E
^Subject: .*ist ?=\?ISO-8859-1\?B\?YXXfZXIgSGF1cy4=\?=$
^Subject: .*ist *=\?ISO-8859-[12]\?Q\?au=DFer_Haus=2E(\?=$|_Betreff=)
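Some of the patterns above contain Base64 fragments such as TGVzZW46 or QWJ3ZXNlbmQ6. When adjusting the list for another language, a fragment of this kind can be derived from a plain keyword, as sketched below. The sketch assumes a base64 utility is installed (GNU coreutils and macOS ship one, but it is not part of POSIX) and that the keyword is pure ASCII. For keywords with non-ASCII characters, the bytes depend on the character set named in the header.

```shell
#!/bin/sh
# Base64 encodes 3 bytes into 4 characters. A keyword whose length is a
# multiple of 3 bytes therefore yields a fragment that does not depend on
# the text following it, provided the keyword starts the encoded Subject.

# "Lesen:" is 6 bytes long.
prefix=$(printf '%s' 'Lesen:' | base64)
echo "$prefix"

# "Abwesend:" is 9 bytes long.
prefix2=$(printf '%s' 'Abwesend:' | base64)
echo "$prefix2"
```

This prints TGVzZW46 and QWJ3ZXNlbmQ6, the fragments used in the list. A keyword whose length is not a multiple of 3 bytes would need to be padded with the characters that usually follow it, or trimmed, before encoding.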
 
regex_mail_receipts.txt · Last modified: 2014-06-30 13:38 by andreas