Hi
I have been bugged a lot by embedded-image spam recently. Some of these messages were trapped by the URI checks, but others got through because the URL wasn't yet listed in the SURBLs.
I think I have found something that I wanted to share with you all, to see whether we can use it to trap more of this spam. I have classified the embedded-image spam into two classes. Class 1 is an image containing a full list of Viagra and other meds. Class 2 is an image containing a one-line pitch for cheap software or Viagra. If we can identify a common pattern and build a ruleset on top of it, we won't have to wait for updates at the URIBLs, which would be really good. These image-only spams apparently have a weakness that we can trap on :). The loophole is that in most cases the Message-ID of the mail and the Content-ID (cid) of the embedded image are exactly the same.
For example:
Message-ID: 1066724820.2422@boschkitchencentre.com
Content-ID: 1066724820.2422@boschkitchencentre.com
Some variations looked like this instead:
Message-ID: 1064962549.5961@cal.cybersurf.net
Content-ID: <sivjxu_onzvh_dzdohvo>
But that applies to Class 1. Class 2, the images that contain just a one-liner, shows some variations. In some cases the Content-ID has been cleverly altered, but there is still a loophole. Here is an example:
Message-ID: 525F074E3524$72BF31B3$02605c3b@comcast.net
Content-ID: e102605c3b@comcast.net
Here the Message-ID and the Content-ID both contain the domain name of the sending server. By comparison, a legitimate mail that had an embedded image in it, sent from Outlook, had headers like this:
Message-ID: 002101c55c2f$b3f540a0$bdc809c0@cg
Content-ID: image001.jpg@01C55C5D.CB204210
Frankly, I haven't seen what the Content-ID looks like when images are embedded by other legitimate mail clients such as Netscape or Thunderbird. But if we compare the patterns above, this is what appears. When an image is embedded by a client like Outlook, an "@" appears in the Content-ID, the attachment name sits before the "@", the part after the "@" is not the domain name, and the Message-ID is different from the Content-ID. In the spammers' case, the Content-ID is either exactly the same as the Message-ID, or has no "@" at all, or has the domain name of the server after the "@".
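To make the comparison concrete, here is a rough Python sketch of the three checks (this is not a SpamAssassin rule, just an illustration). The function names and the returned labels are my own invention, and I am assuming the IDs may or may not be wrapped in angle brackets.

```python
def strip_brackets(value: str) -> str:
    """Trim whitespace and an optional surrounding <...> from an ID."""
    value = value.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value


def domain_part(value: str) -> str:
    """Return the lowercased text after the last '@', or '' if there is none."""
    _, at, right = value.rpartition("@")
    return right.lower() if at else ""


def classify_content_id(message_id: str, content_id: str) -> str:
    """Label a Content-ID using the three patterns seen in the spam samples.

    The labels are hypothetical names, not anything SpamAssassin defines.
    """
    mid = strip_brackets(message_id)
    cid = strip_brackets(content_id)
    if cid.lower() == mid.lower():
        return "same_as_message_id"
    if "@" not in cid:
        return "no_at_sign"
    cid_domain = domain_part(cid)
    if cid_domain and cid_domain == domain_part(mid):
        return "shares_message_id_domain"
    return "no_pattern_matched"
```

Run against the samples above, the three spam examples should each fall into one of the first three labels, and the Outlook sample should come back as "no_pattern_matched". With so few samples this only shows the idea; it says nothing about false positives on other clients.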
So my question is: can we have rules in SpamAssassin that compare the sending host's domain with the part of the Content-ID after the "@", or that check whether the Content-ID contains an "@" at all?
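Outside SpamAssassin, the check I have in mind could look something like the Python sketch below, using the standard email module to walk the MIME parts. The function name, the dictionary keys, and the idea of passing the sending domain in as a parameter are all assumptions of mine; how that domain would actually be obtained (Received headers, envelope sender, etc.) is left open.

```python
from email import message_from_string


def content_id_findings(raw_message: str, sending_domain: str) -> list:
    """Report, for each MIME part with a Content-ID, whether the ID has an
    '@' and whether the text after it equals the given sending domain.

    `sending_domain` is supplied by the caller; this sketch does not try
    to work it out from the message.
    """
    msg = message_from_string(raw_message)
    sending_domain = sending_domain.lower()
    findings = []
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid is None:
            continue  # this part carries no Content-ID header
        cid = cid.strip().strip("<>")
        _, at, right = cid.rpartition("@")
        findings.append({
            "content_id": cid,
            "has_at": bool(at),
            "domain_matches_sender": bool(at) and right.lower() == sending_domain,
        })
    return findings
```

A rule built on this would presumably fire when "has_at" is false or "domain_matches_sender" is true, but that needs testing against a lot more legitimate mail than the single Outlook sample I have.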
Any suggestions or comments?