From jeffc@surbl.org Fri Sep 24 13:40:40 2004 From: Jeff Chan To: discuss@lists.surbl.org Subject: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 04:40:20 -0700 Message-ID: <1362350365.20040924044020@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7355042789106235920==" --===============7355042789106235920== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit In order to assist people hand-classifying spam URI domains and IPs for inclusion or non-inclusion in SURBLs, I've made a draft policy document: http://www.surbl.org/policy.html Please read it and post your comments. Jeff C. -- "If it appears in hams, then don't list it." --===============7355042789106235920==-- From nobody@xyzzy.claranet.de Fri Sep 24 22:39:49 2004 From: Frank Ellermann To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 22:32:09 +0200 Message-ID: <41548449.28C3@xyzzy.claranet.de> In-Reply-To: <1362350365.20040924044020@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============9121277059584468908==" --===============9121277059584468908== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote: > Please read it and post your comments. | Don't add domains or IPs that have legitimate, non-spam uses. NAK (known issue, JFTR). | For IP addresses look them up in reverse octet order against | iddb.isipp.com . s/iddb/iadb2/ | check them against iadb.isipp.com or wadb.isipp.com s/wadb/iadb2/. WA is "withdrawn accreditation" (= bulk mailer decided to break the IADB rules), it's a kind of blacklist. You could mention WA elsewhere. e.g. together with SpamHaus. | Visit the site or at least Better remove this, it's too dangerous for the kids, and it can be misleading without JavaScript. 
If you need more interesting sources, you could add whois.sc (and maybe A9.com (?)) | 13.Apply common sense ACK, much better than 2. | but which other people might consider legitimate. This can | include sites like topica, yahoogroups, joke-of-the-day, and | similar things that people actually subscribe to. Do not list | them, even if they get abused for spam. NAK. Nobody knows what "other people might consider". Let alone to agree with it blindly. That clause makes no sense, and it devaluates the important first part before the "but". Anything else is fine, but a bit long. "When in doubt don't list" could be added to the <title> and / or <h1> header. Bye, Frank --===============9121277059584468908==-- From ryan@sasknow.com Sat Sep 25 02:49:11 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 18:49:08 -0600 Message-ID: <20040924183313.O3793@drizzle.sasknow.net> In-Reply-To: <1362350365.20040924044020@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8702922583551315116==" --===============8702922583551315116== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discuss: > In order to assist people hand-classifying spam URI domains and IPs > for inclusion or non-inclusion in SURBLs, I've made a draft policy > document: Good, although I think there are a few redundant points, and it would read better in a top-down priority format, "firing your biggest guns first". To me, the main points are (using roman numerals so as not to confuse your numbering system): i) Add domains that appear *only* in spam. Do not add any domains that appear in ham. ii) Beware of poisoning/joe job attempts; not every domain that appears in spam belongs to a spammer! 
iii) Use these important sources of information as additional input: (List the IADB2, whois, etc., in decreasing order of usefulness) The "not your personal blocklist" point (14), and "common sense" (13) are good points that I think are deserving of discussion in paragraph form beneath the "main points". They're not "criteria", per se, but should definitely be mentioned. After the list of main points are first, clearly defined, and, second, *lightly* expanded upon (remember, we want to make sure people get the main points!), you can include the more general discussion from some of your points further down the page in paragraph format. Seeing a numbered list of more than 5-6 items raises some questions for me, indicating that perhaps the big picture could be lost on some people (especially those newcomers just learning of the SURBL policies). So, in brief, what I'm suggesting is just a bit of restructuring to make the main points clearer, while still providing the detailed information you already have in the document. IMO, you've done a fine job with the information. Hope this helps, - Ryan > > http://www.surbl.org/policy.html > > Please read it and post your comments. > > Jeff C. > -- > "If it appears in hams, then don't list it." 
> > _______________________________________________ > Discuss mailing list > Discuss(a)lists.surbl.org > http://lists.surbl.org/mailman/listinfo/discuss > -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============8702922583551315116==-- From jeffc@surbl.org Sat Sep 25 03:25:13 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 18:24:54 -0700 Message-ID: <596413979.20040924182454@supranet.net> In-Reply-To: <41548449.28C3@xyzzy.claranet.de> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============4763644519133795149==" --===============4763644519133795149== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Friday, September 24, 2004, 1:32:09 PM, Frank Ellermann wrote: > | For IP addresses look them up in reverse octet order against > | iddb.isipp.com . > s/iddb/iadb2/ NAK, iddb is a domain list. Domains are resolved against this list to turn them into IP addresses, which can then be checked against the main lists (iadb, iadb2, wadb). > | check them against iadb.isipp.com or wadb.isipp.com > s/wadb/iadb2/. WA is "withdrawn accreditation" (= bulk mailer > decided to break the IADB rules), it's a kind of blacklist. > You could mention WA elsewhere. e.g. together with SpamHaus. I've added iadb2 as an alternative to iadb. wadb is still useful, with the caveat you mentioned, so I copied the description of WADB. > | Visit the site or at least > Better remove this, it's too dangerous for the kids, and it can > be misleading without JavaScript. 
If you need more interesting > sources, you could add whois.sc (and maybe A9.com (?)) True, visiting sites can sometimes be dangerous, so I added: (I usually use Google's cache of the site, or a text browser like lynx. This is somewhat safer than using a full browser to go to a site, which could contain malicious code. Viewing Google summaries is often good enough.) > | 13.Apply common sense > ACK, much better than 2. > | but which other people might consider legitimate. This can > | include sites like topica, yahoogroups, joke-of-the-day, and > | similar things that people actually subscribe to. Do not list > | them, even if they get abused for spam. > NAK. Nobody knows what "other people might consider". Let > alone to agree with it blindly. That clause makes no sense, > and it devaluates the important first part before the "but". In this case we need to try to consider what other people may use. It can be difficult but not impossible. Anyone who works at an ISP, works in an IT department, visits chatrooms, knows novice Internet users, friends, relatives, etc. probably is aware of at least some of these kinds of sites. Strictly speaking these may not always be personally knowable, but it's more of an external social or cultural awareness. > Anything else is fine, but a bit long. "When in doubt don't > list" could be added to the <title> and / or <h1> header. > Bye, Frank Thanks as always Frank, Jeff C. -- "If it appears in hams, then don't list it." 
--===============4763644519133795149==-- From jeffc@surbl.org Sat Sep 25 03:28:17 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 18:27:58 -0700 Message-ID: <1821953187.20040924182758@supranet.net> In-Reply-To: <20040924183313.O3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0185550906230097910==" --===============0185550906230097910== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Friday, September 24, 2004, 5:49:08 PM, Ryan Thompson wrote: > To me, the main points are (using roman numerals so as not to confuse > your numbering system): > i) Add domains that appear *only* in spam. Do not add any domains that > appear in ham. > ii) Beware of poisoning/joe job attempts; not every domain that appears > in spam belongs to a spammer! > iii) Use these important sources of information as additional input: > (List the IADB2, whois, etc., in decreasing order of usefulness) > The "not your personal blocklist" point (14), and "common sense" (13) > are good points that I think are deserving of discussion in paragraph > form beneath the "main points". They're not "criteria", per se, but > should definitely be mentioned. > After the list of main points are first, clearly defined, and, second, > *lightly* expanded upon (remember, we want to make sure people get the > main points!), you can include the more general discussion from some of > your points further down the page in paragraph format. Seeing a numbered > list of more than 5-6 items raises some questions for me, indicating > that perhaps the big picture could be lost on some people (especially > those newcomers just learning of the SURBL policies). > So, in brief, what I'm suggesting is just a bit of restructuring to make > the main points clearer, while still providing the detailed information > you already have in the document. 
IMO, you've done a fine job with the > information. All good points. Let me re-organize.... Jeff C. -- "If it appears in hams, then don't list it." --===============0185550906230097910==-- From jeffc@surbl.org Sat Sep 25 04:41:47 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 19:41:25 -0700 Message-ID: <1064190932.20040924194125@supranet.net> In-Reply-To: <596413979.20040924182454@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============3351299143314074545==" --===============3351299143314074545== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Friday, September 24, 2004, 6:24:54 PM, Jeff Chan wrote: > On Friday, September 24, 2004, 1:32:09 PM, Frank Ellermann wrote: >> | For IP addresses look them up in reverse octet order against >> | iddb.isipp.com . >> s/iddb/iadb2/ > NAK, iddb is a domain list. Domains are resolved against this list > to turn them into IP addresses, which can then be checked against > the main lists (iadb, iadb2, wadb). Oops, I see you were referring to the first mention of iddb which indeed is on IP addresses and should be checked against iadb. You're right. :-) Jeff C. -- "If it appears in hams, then don't list it." 
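To make the "reverse octet order" lookup concrete: by the usual DNSBL convention, the octets of the IPv4 address are reversed and the zone name is appended to form the name that gets queried. Below is a minimal sketch in Python, assuming the zone iadb.isipp.com named in this thread. The helper name dnsbl_query_name and the sample address 192.0.2.10 (a documentation-only address) are made up for illustration and are not part of any SURBL or ISIPP tool.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL-style zone.

    Hypothetical helper: only the reverse-octet convention itself comes
    from the thread; the function name and signature are illustrative.
    """
    # Validate the input; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    octets = str(addr).split(".")
    # 192.0.2.10 becomes 10.2.0.192, followed by the zone name.
    return ".".join(reversed(octets)) + "." + zone.strip(".")


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.10", "iadb.isipp.com"))
    # prints 10.2.0.192.iadb.isipp.com
```

Building the name is only the first half. The actual check is resolving that name (for example with socket.gethostbyname), and how to read the answer is described on the ISIPP codes page linked from the policy.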
--===============3351299143314074545==-- From jeffc@surbl.org Sat Sep 25 06:35:41 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Fri, 24 Sep 2004 21:35:14 -0700 Message-ID: <1895833571.20040924213514@supranet.net> In-Reply-To: <1821953187.20040924182758@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8965178868891782497==" --===============8965178868891782497== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit OK I Updated the policy page, taking Ryan's top rules and general organizational comments: http://www.surbl.org/policy.html Please let me/us know what you think of it now. Jeff C. -- "If it appears in hams, then don't list it." --===============8965178868891782497==-- From ryan@sasknow.com Sat Sep 25 08:25:04 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 00:24:57 -0600 Message-ID: <20040925001344.K3793@drizzle.sasknow.net> In-Reply-To: <1895833571.20040924213514@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2907720358459217207==" --===============2907720358459217207== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to Jeff Chan: > OK I Updated the policy page, taking Ryan's top rules and general > organizational comments: > > http://www.surbl.org/policy.html > > Please let me/us know what you think of it now. Hi Jeff, Aha! I like it very much. I suspect it will still evolve a bit--as most good things do--but it gets the point across, and also provides a lot of good, useful information that will assist human classifiers in listing (only) the spammiest domains. On a related note, do we want to say anything in this document (or possibly another document) about whitelisting criteria? There are really three main categories: 1. 
Blacklist material (that's what your policy addresses very well) 1.5. "Almost" blacklist material (the grey ones); ala the "UC" list, are the domains that are almost totally spammers, but may have a few borderline uses 2. Domains that should not be listed, but are not necessarily of "whitelist" merit. These are mostly the domains where insufficient data (or effort) exists to make a determination, which, for good or for ill, is where the bulk of our human efforts are currently focused. 3. Domains that are white; i.e., have definite legitimate uses OK, that's four. If we really want to reduce FPs, we need to carefully consider *all* of these categories when analysing potential domains. I spend just as much time pulling domains out of ham as I do pulling domains out of spam. The distinction between 2 and 3 is almost as difficult as the distinction between 1 and 2 sometimes. - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============2907720358459217207==-- From jeffc@surbl.org Sat Sep 25 10:45:48 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 01:45:27 -0700 Message-ID: <517883770.20040925014527@supranet.net> In-Reply-To: <20040925001344.K3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1317740884075511735==" --===============1317740884075511735== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Friday, September 24, 2004, 11:24:57 PM, Ryan Thompson wrote: > do we want to say anything in this document (or > possibly another document) about whitelisting criteria? There are really > three main categories: > 1. Blacklist material (that's what your policy addresses very well) > 1.5. 
"Almost" blacklist material (the grey ones); ala the "UC" list, are > the domains that are almost totally spammers, but may have a few > borderline uses > 2. Domains that should not be listed, but are not necessarily of > "whitelist" merit. These are mostly the domains where insufficient > data (or effort) exists to make a determination, which, for good > or for ill, is where the bulk of our human efforts are currently > focused. > 3. Domains that are white; i.e., have definite legitimate uses > OK, that's four. If we really want to reduce FPs, we need to carefully > consider *all* of these categories when analysing potential domains. I > spend just as much time pulling domains out of ham as I do pulling > domains out of spam. > The distinction between 2 and 3 is almost as difficult as the > distinction between 1 and 2 sometimes. > - Ryan I agree with 1 and 3, but another way to look at the undecided middle ground might be to say that if a domain or IP has not proven to be blacklist material and has not been falsely listed and therefore in need of whitelisting, then it perhaps can be ignored until it gets into category 1 or 3. I know that goes against the feelings of people who want to catch every spam, and I understand that feeling myself, but in *practical terms* it may be a *useful* solution. Yes, that misses some marginal and probable spammers, but it lets us focus on the first category which are probably the most important to find in terms of the volume of spam they produce. The others can consume a lot of time and effort without producing the level of performance that catching the *major* spammers in the first category can. I realize you guys are trying to sort out some of the stuff in the middle and I understand some of the reasons for wanting to do it, but I think working on the more clear cases gets us the most results for our efforts. Jeff C. -- "If it appears in hams, then don't list it." 
--===============1317740884075511735==--

From joewein@pobox.com Sat Sep 25 16:52:33 2004
From: Joe Wein <joewein@pobox.com>
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy
Date: Sat, 25 Sep 2004 23:52:13 +0900
Message-ID: <001701c4a30f$3fef49a0$c801a8c0@sumiyoshidai.org>
In-Reply-To: <1895833571.20040924213514@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4615804614780144396=="

--===============4615804614780144396==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

> http://www.surbl.org/policy.html

"The older a domain is the less likely it should be listed. Most spam domains are used for 3 days then abandoned. Domains older than 90 days probably should not be added. A domain more than a few years old usually should not be added."

I would say, domains older than 90 days probably should not be added *unless* they use a blacklisted nameserver.

You really have to look at both the name servers and the date, in that order.

I want to give you some data on domain age for my recent blacklistings (last two weeks):

year     count
2004      4165
2003       582
2002        30
2001         6
2000         3
<=1999      12
total:    4830

There is a significant percentage of domains registered in 2003, but most of these still fall within one year of the listing. There are extremely few blacklistings for domains registered before 2003, about 1% of the total.

Most of the 1999 ones are porn sites using a NS by wildrhino.com, plus one each by vendaregroup.com, webfinity.net, allproactive.com, rackhosters.com, all notorious spamhouses with SBL listings. These domains are exceptions to the rule that old domains usually don't merit listing.

About 11% of blacklisted domains were registered within 3 days of detection, 18% within 7 days, 34% within 2 weeks.

Then it gets interesting: I have no records in the set for 13-24 days, then a whole bunch of pill spam domains registered at least 25 days ago.
These guys seem to wait a little before they strike.

50% of all blacklisted domains are registered no more than 35 days before listing, 60% within two months, 66% within three months, 70% within four months. As you see, the incremental gain per extra month gets smaller and smaller. Six months cover 80%, 12 months 90%, 24 months 97%.

A few comments in addition to those numbers:

1) There's a very small set of hardcore spammer NSs for which I list *all* domains that use them, regardless of age.

2) For other domains with SBL-listed NS, I routinely list them *if* they are recently registered.

3) For domains with SBL-listed NS older than a few months, I list them if they fit a pattern. Most of these will be porn and gambling sites from usual suspects, i.e. I'll see lots and lots of domains all sharing the same NS, advertised in similar spam mails. These guys stick around, so it doesn't matter much if you don't list them immediately, before you see a pattern. You can still get them later.

4) I also list sites without SBL records on the NS if they are very recently registered (usually < 6 weeks) and they fit a pattern with regard to naming or what kind of spam subject lines / sender names are used. That takes care of discardable spam domains registered with Joker.com such as these:

californiapassword.info
coloradopassword.info
coloradovodka.info
dc-user.info
dcpassword.info
floridaadmin.info
georgiapass.info
georgiauser.info
hawaii-vodka.info
idahouser.info
iowavodka.info
kentucky-password.info

5) Recently registered domains with a name server from the same domain are more suspicious than those using a different server, because it means the name server has no track record to check.
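
Taken together, comments 1) to 4) above amount to a small decision procedure, and it may help to see them written out as one. What follows is a minimal sketch in Python, not the actual logic of any tool mentioned in this thread. The Candidate class, its field names, and the 90-day cutoff for "recently registered" are assumptions made for illustration. The 42-day cutoff stands in for "usually < 6 weeks", and whether a domain "fits a pattern" remains a human judgement passed in as a flag.

```python
from dataclasses import dataclass
from datetime import date

RECENT_DAYS = 90        # assumption: cutoff for "recently registered" in 2)
VERY_RECENT_DAYS = 42   # "usually < 6 weeks" from 4)


@dataclass
class Candidate:
    """Hypothetical record for a domain seen in spam (illustrative only)."""
    domain: str
    registered: date
    ns_hardcore: bool    # NS is in the small "list everything" set, see 1)
    ns_in_sbl: bool      # NS has an SBL record, see 2) and 3)
    fits_pattern: bool   # human judgement: naming / subject / sender pattern


def should_list(c: Candidate, today: date) -> bool:
    """Apply comments 1) to 4) in order; returns True if the domain is listed."""
    age_days = (today - c.registered).days
    if c.ns_hardcore:
        return True                                       # 1) regardless of age
    if c.ns_in_sbl:
        return age_days <= RECENT_DAYS or c.fits_pattern  # 2) and 3)
    return age_days < VERY_RECENT_DAYS and c.fits_pattern  # 4)


if __name__ == "__main__":
    today = date(2004, 9, 25)
    fresh = Candidate("a.example", date(2004, 9, 1), False, False, True)
    old = Candidate("b.example", date(1999, 1, 1), False, True, False)
    print(should_list(fresh, today), should_list(old, today))
```

Comment 5) is left out on purpose: it raises suspicion rather than deciding anything, so in a sketch like this it would be one more input to the human judgement behind fits_pattern.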
Joe -- http://www.joewein.de/sw/jwSpamSpy/ --===============4615804614780144396==-- From ryan@sasknow.com Sat Sep 25 18:42:39 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 10:42:35 -0600 Message-ID: <20040925103422.U3793@drizzle.sasknow.net> In-Reply-To: <517883770.20040925014527@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2419948922254153528==" --===============2419948922254153528== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discussion list: > On Friday, September 24, 2004, 11:24:57 PM, Ryan Thompson wrote: >> do we want to say anything in this document (or >> possibly another document) about whitelisting criteria? There are really >> three main categories: > >> 1. Blacklist material (that's what your policy addresses very well) >> 1.5. "Almost" blacklist material (the grey ones); ala the "UC" list, are >> 2. Domains that should not be listed, but are not necessarily of >> 3. Domains that are white; i.e., have definite legitimate uses > >> OK, that's four. If we really want to reduce FPs, we need to carefully >> consider *all* of these categories when analysing potential domains. I >> spend just as much time pulling domains out of ham as I do pulling >> domains out of spam. Hi Jeff, > I agree with 1 and 3, but another way to look at the undecided middle > ground might be to say that if a domain or IP has not proven to be > blacklist material and has not been falsely listed and therefore in > need of whitelisting, then it perhaps can be ignored until it gets > into category 1 or 3. > > I know that goes against the feelings of people who want to catch > every spam, and I understand that feeling myself, but in *practical > terms* it may be a *useful* solution. 
> > Yes, that misses some marginal and probable spammers, but it lets us > focus on the first category which are probably the most important to > find in terms of the volume of spam they produce. The others can > consume a lot of time and effort without producing the level of > performance that catching the *major* spammers in the first category > can. > > I realize you guys are trying to sort out some of the stuff in the > middle and I understand some of the reasons for wanting to do it, but > I think working on the more clear cases gets us the most results for > our efforts. Well, suffice to say, I don't want to open up the "grey" can of worms again! I just wanted to identify the major categories which, in real life, we submitters are actually dealing with on a daily basis. :-) I wrote: >> The distinction between 2 and 3 is almost as difficult as the >> distinction between 1 and 2 sometimes. Meaning, whitelisting is usually just about as difficult as blacklisting. - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============2419948922254153528==-- From nobody@xyzzy.claranet.de Sat Sep 25 20:38:28 2004 From: Frank Ellermann <nobody@xyzzy.claranet.de> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 13:26:54 +0200 Message-ID: <415555FE.1732@xyzzy.claranet.de> In-Reply-To: <1895833571.20040924213514@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8397763732149248133==" --===============8397763732149248133== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote: > Please let me/us know what you think of it now. 13: ... 
</em</strong> That should be </em></strong>, I got the whole page as <em> ;-) 69 - 75: | <a href="http://www.isipp.com/iadbcodes.php">iadb.isipp.com | or iadb2.isipp.com and wadb.isipp.com</a>. [...] <a href="http://www.isipp.com/iadbcodes.php">iadb.isipp.com</a> or <a href="http://www.isipp.com/iadb2codes.php">iadb2.isipp.com</a>. Don't mention WADB here, remove the explanation, it's only confusing (the linked ISIPP page does it, you don't need it). Or do you know any interesting WADB entries at the moment ? 155: wierd news Google has more hits for "weird news", and LEO's dictionary is down, please ignore me if "wierd" is correct or a joke ;-) Bye, Frank --===============8397763732149248133==-- From jeffc@surbl.org Sun Sep 26 00:46:49 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 15:46:30 -0700 Message-ID: <49683680.20040925154630@supranet.net> In-Reply-To: <415555FE.1732@xyzzy.claranet.de> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6458521221840344664==" --===============6458521221840344664== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Saturday, September 25, 2004, 4:26:54 AM, Frank Ellermann wrote: > Jeff Chan wrote: >> Please let me/us know what you think of it now. > 13: ... </em</strong> > That should be </em></strong>, I got the whole page as <em> ;-) > 69 - 75: > | <a href="http://www.isipp.com/iadbcodes.php">iadb.isipp.com > | or iadb2.isipp.com and wadb.isipp.com</a>. > [...] > <a href="http://www.isipp.com/iadbcodes.php">iadb.isipp.com</a> or > <a href="http://www.isipp.com/iadb2codes.php">iadb2.isipp.com</a>. > Don't mention WADB here, remove the explanation, it's only > confusing (the linked ISIPP page does it, you don't need it). > Or do you know any interesting WADB entries at the moment ? > 155: wierd news Fixed. Thanks! Jeff C. -- "If it appears in hams, then don't list it." 
--===============6458521221840344664==-- From jeffc@surbl.org Sun Sep 26 01:46:36 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 16:46:15 -0700 Message-ID: <71912187.20040925164615@supranet.net> In-Reply-To: <20040925103422.U3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8016949910258845721==" --===============8016949910258845721== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Saturday, September 25, 2004, 9:42:35 AM, Ryan Thompson wrote: > Meaning, whitelisting is usually just about as difficult as > blacklisting. Whitelisting is sometimes harder than blocklisting. Most pure spams are extremely obvious. We've all seen the many nearly identical pill, mortgage, and warez spams, right? Those ones are clearly spams and easy to blocklist. There are some spammy-mentioned legitimate sites that are harder to identify as legitimate, like those that appear in stock newsletters, joke-of-the-day type, mailing lists, newsletters, etc. Those require more research to find out if the reporter forgot they were subscribed, whether the domain belongs to spam gangs, whether there is a Joe Job going on, or any number of other factors. But the decision needs to be made if we are to prevent or fix false positives. The decision to whitelist is often difficult and usually requires at least some research. Fortunately some of our research tools like GetURI and others help quite a bit, but classification still requires human judgement and effort. Jeff C. -- "If it appears in hams, then don't list it." 
--===============8016949910258845721==-- From jeffc@surbl.org Sun Sep 26 05:00:40 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 20:00:18 -0700 Message-ID: <641320150.20040925200018@supranet.net> In-Reply-To: <001701c4a30f$3fef49a0$c801a8c0@sumiyoshidai.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8247522215501162881==" --===============8247522215501162881== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Saturday, September 25, 2004, 7:52:13 AM, Joe Wein wrote: >> http://www.surbl.org/policy.html > I would say, domains older than 90 days probably should not be added > *unless* they use a blacklisted nameserver. > You really have to look at both the name servers and the date, in that > order. > I want to give you some data on domain age for my recent blacklistings (last > two weeks): > year count > 2004 4165 > 2003 582 > 2002 30 > 2001 6 > 2000 3 > <=1999 12 > total: 4830 > There is a significant percentage of domains registered in 2003, but most of > these still fall within one year of the listing. There are extremely few > blacklistings for domains registered before 2003, about 1% of the total. [...] > About 11% of blacklisted domains were registered within 3 days of detection, > 18% within 7 days, 34% within 2 weeks. > Then it gets interesting: I have no records in the set for 13-24 days, then > a whole bunch of pill spam domains registered at least 25 days ago. These > guys seem to wait a little before they strike. > 50% of all blacklisted domains are registered no more than 35 days before > listing, 60% within two months, 66% within three months, 70% with four > months. As you see, the incremental gain per extra month gets smaller and > smaller. Six months cover 80%, 12 months 90%, 24 months 97%. 
> A few comments in addition to those numbers: > 1) There's a very small set of hardcore spammer NSs for which I list *all* > domains that use them, regardless of age. > 2) For other domains with SBL-listed NS, I routinely list them *if* they are > recently registered. > 3) For domains with SBL-listed NS older than a few months, I list them if > they fit a pattern. Most of these will be porn and gambling sites from usual > suspects, i.e. I'll see lots and lots of domains all sharing the same NS, > advertised in similar spam mails. [...] > 4) I also list sites without SBL records on the NS if they are very recently > registered (usually < 6 weeks) and they fit a pattern with regard to naming > or what kind of spam subject lines / sender names are used. That takes care > of discardable spam domains registered with Joker.com such as these: Hi Joe, All your observations and policies seem quite reasonable to me. :-) There can be some lag in SBL detecting new domains and new spam gang name servers, so it's definitely true that non-inclusion in SBL should not give new domains a "free pass". New domains not matching SBL can be real spammers. Thanks also for sharing your research into the age of spam domains! It's very useful data, though it might also be interesting to know how long a domain is used after it appears in the first spams we detect. Many are only used for a few days according to a well-placed spam statistician I spoke with before. It's also interesting that some domains don't get used immediately after registration. (Note that I said many spam domains only get used for a few days, not that they only get used for a few days after registration.) I've updated the domain age guidelines, taking into account your research: "The older a domain is the less likely it should be listed. Most spam domains are used for 3 days then abandoned. Domains older than 90 days probably should not be added. Domains more than 1 year old usually should not be added. 
However, domains that use name servers listed in SBL as belonging to known spam operators can be included, regardless of age. (See below.)" How does that sound? Jeff C. -- "If it appears in hams, then don't list it." --===============8247522215501162881==-- From ryan@sasknow.com Sun Sep 26 05:58:28 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 21:58:24 -0600 Message-ID: <20040925212135.E3793@drizzle.sasknow.net> In-Reply-To: <71912187.20040925164615@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============3637304968477634200==" --===============3637304968477634200== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discussion list: > The decision to whitelist is often difficult and usually requires at > least some research. Fortunately some of our research tools like > GetURI and others help quite a bit, Speaking of which, I've worked hard to convince GetURI (new version pending release!) to follow the SURBL inclusion criteria pretty closely; I've added SBL lookups on the forward IP(s) and nameservers, as well as IADB2 and WADB checks on the IP(s), although the IADB2/WADB checks rarely hit. The SBL lookups are extremely useful. And, of course, GetURI has had the --age option for a while now. Here's what the output looks like now (this took 57s for ~900 messages, even with the great number of DNS queries needed to process the 116 domains not found in SURBL): http://ry.ca/geturi/public/criteriatest.html (62K) Feedback welcome! These features are in the current development version (not available to the public, yet, sorry), but, once testing is complete, there'll be a new release. > but classification still requires human judgement and effort. Agreed! Hopefully GetURI can reduce the human effort, so humans have more energy for judgement. 
:-) - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============3637304968477634200==-- From jeffc@surbl.org Sun Sep 26 06:38:22 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 21:38:00 -0700 Message-ID: <101945732.20040925213800@supranet.net> In-Reply-To: <20040925212135.E3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2079611156408283851==" --===============2079611156408283851== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Saturday, September 25, 2004, 8:58:24 PM, Ryan Thompson wrote: > Jeff Chan wrote to SURBL Discussion list: >> The decision to whitelist is often difficult and usually requires at >> least some research. Fortunately some of our research tools like >> GetURI and others help quite a bit, > Speaking of which, I've worked hard to convince GetURI (new version > pending release!) to follow the SURBL inclusion criteria pretty closely; > I've added SBL lookups on the forward IP(s) and nameservers, as well as > IADB2 and WADB checks on the IP(s), although the IADB2/WADB checks > rarely hit. The SBL lookups are extremely useful. And, of course, GetURI > has had the --age option for a while now. > Here's what the output looks like now (this took 57s for ~900 messages, > even with the great number of DNS queries needed to process the 116 domains > not found in SURBL): > http://ry.ca/geturi/public/criteriatest.html (62K) > Feedback welcome! OK It might help to have a legend, especially for people not familiar with the output. I assume the domains in white are the grey (uncertain) ones, and the ones in grey are the whitelisted ones. (A little ironic, eh?) Jeff C. 
-- "If it appears in hams, then don't list it." --===============2079611156408283851==-- From jeffc@surbl.org Sun Sep 26 06:52:29 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 21:52:06 -0700 Message-ID: <171069734.20040925215206@supranet.net> In-Reply-To: <101945732.20040925213800@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5557411165802166176==" --===============5557411165802166176== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Saturday, September 25, 2004, 9:38:00 PM, Jeff Chan wrote: > OK It might help to have a legend, especially for people not > familiar with the output. I assume the domains in white are > the grey (uncertain) ones, and the ones in grey are the > whitelisted ones. (A little ironic, eh?) Or for that matter, why not make the whitelisted ones white and the uncertain ones grey..... Hmmm.... Jeff C. -- "If it appears in hams, then don't list it." --===============5557411165802166176==-- From ryan@sasknow.com Sun Sep 26 07:01:11 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sat, 25 Sep 2004 23:01:06 -0600 Message-ID: <20040925224342.Q3793@drizzle.sasknow.net> In-Reply-To: <101945732.20040925213800@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2504572597707898114==" --===============2504572597707898114== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discuss: >> http://ry.ca/geturi/public/criteriatest.html (62K) > >> Feedback welcome! > > OK It might help to have a legend, especially for people not > familiar with the output. I assume the domains in white are > the grey (uncertain) ones, and the ones in grey are the > whitelisted ones. (A little ironic, eh?) Heh. Yeah, good point.
In my mind, I never really associated the colours with the "white/grey/black" states of domains. Maybe I should give some thought to flipping those colours around, eh? :-) And, yes, before the next official release, there will be a "legend" of sorts; more on-line documentation, in other words, and some improvements to the output format itself to make it more readable. Thanks for the feedback, Jeff! - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============2504572597707898114==-- From ryan@sasknow.com Sun Sep 26 10:31:45 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 02:31:41 -0600 Message-ID: <20040926022326.E3793@drizzle.sasknow.net> In-Reply-To: <20040925224342.Q3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5990524669138964459==" --===============5990524669138964459== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Ryan Thompson wrote to Jeff Chan and SURBL Discussion list: > Thanks for the feedback, Jeff! Hey everybody, Does this look better? http://ry.ca/geturi/results.html There are many improvements to the output. Even *I'm* impressed. It's getting close to feature freeze/release time again, methinks, to put these improvements into a stable release. So, if anyone has anything they'd like to see right away, please speak now. 
:-) - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============5990524669138964459==-- From surbl@alexb.ch Sun Sep 26 10:46:42 2004 From: Alex Broens <surbl@alexb.ch> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 10:46:03 +0200 Message-ID: <415681CB.9040509@alexb.ch> In-Reply-To: <20040926022326.E3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1898492271016080103==" --===============1898492271016080103== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Ryan Thompson wrote: > > Ryan Thompson wrote to Jeff Chan and SURBL Discussion list: > >> Thanks for the feedback, Jeff! > > > Hey everybody, > > Does this look better? http://ry.ca/geturi/results.html > > There are many improvements to the output. Even *I'm* impressed. > > It's getting close to feature freeze/release time again, methinks, to > put these improvements into a stable release. So, if anyone has anything > they'd like to see right away, please speak now. :-) Great! Why not add Spamhaus' XBL zone lookups? 
Alex --===============1898492271016080103==-- From ryan@sasknow.com Sun Sep 26 11:12:42 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 03:12:34 -0600 Message-ID: <20040926030459.V3793@drizzle.sasknow.net> In-Reply-To: <415681CB.9040509@alexb.ch> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============4614700854391947348==" --===============4614700854391947348== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Alex Broens wrote to SURBL Discussion list: > Ryan Thompson wrote: >> >> Ryan Thompson wrote to Jeff Chan and SURBL Discussion list: >> >>> Thanks for the feedback, Jeff! >> >> >> Hey everybody, >> >> Does this look better? http://ry.ca/geturi/results.html >> >> There are many improvements to the output. Even *I'm* impressed. >> >> It's getting close to feature freeze/release time again, methinks, to >> put these improvements into a stable release. So, if anyone has anything >> they'd like to see right away, please speak now. :-) > > Great! Thanks! > Why not add Spamhaus' XBL zone lookups? The thought had only briefly crossed my mind. Is XBL really a good resource for SURBL classification? I thought XBL just listed exploited systems and open HTTP proxies. Hmm. I suppose I could just code it up and run it on a bunch of mail to see what happens... Or I could just use the combined sbl-xbl.spamhaus.org list, I suppose. 
- Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============4614700854391947348==-- From jeffc@surbl.org Sun Sep 26 11:15:07 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 02:14:45 -0700 Message-ID: <1685213262.20040926021445@supranet.net> In-Reply-To: <415681CB.9040509@alexb.ch> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0705439994039780115==" --===============0705439994039780115== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Sunday, September 26, 2004, 1:46:03 AM, Alex Broens wrote: > Ryan Thompson wrote: >> Hey everybody, >> >> Does this look better? http://ry.ca/geturi/results.html >> >> There are many improvements to the output. Even *I'm* impressed. >> >> It's getting close to feature freeze/release time again, methinks, to >> put these improvements into a stable release. So, if anyone has anything >> they'd like to see right away, please speak now. :-) > Great! > Why not add Spamhaus' XBL zone lookups? XBL is about mail senders, open relays, open proxies, etc. While it may be interesting to check header addresses for a given message against XBL, strictly speaking it's the URI domain servers and name servers that are most relevant to SURBLs. Those are in SBL. Jeff C. -- "If it appears in hams, then don't list it." 
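As a rough illustration of the kind of IP lookup discussed here, the following Python sketch builds a DNSBL query name (address octets reversed, then the list zone) and checks whether it resolves. The function names are hypothetical, the zone is passed in by the caller, and this is not how GetURI itself does it; it only shows the general DNSBL convention that a listed address resolves to a 127.0.0.x answer and an unlisted one does not resolve.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: octets in reverse order, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the IP
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone has an A record for this address.

    Any lookup failure (including timeouts) is treated as "not listed",
    which is a simplification made for this sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For example, `dnsbl_query_name("192.0.2.1", "sbl-xbl.spamhaus.org")` gives `1.2.0.192.sbl-xbl.spamhaus.org`. For SURBL classification the addresses to feed in would be those of the URI domain's web server and name servers, not the mail sender.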
--===============0705439994039780115==-- From surbl@alexb.ch Sun Sep 26 11:17:22 2004 From: Alex Broens <surbl@alexb.ch> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 11:16:45 +0200 Message-ID: <415688FD.8020706@alexb.ch> In-Reply-To: <20040926030459.V3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0218808437282402406==" --===============0218808437282402406== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Ryan Thompson wrote: > Alex Broens wrote to SURBL Discussion list: > >> Ryan Thompson wrote: >> >>> >>> Ryan Thompson wrote to Jeff Chan and SURBL Discussion list: >>> >>>> Thanks for the feedback, Jeff! >>> >>> >>> >>> Hey everybody, >>> >>> Does this look better? http://ry.ca/geturi/results.html >>> >>> There are many improvements to the output. Even *I'm* impressed. >>> >>> It's getting close to feature freeze/release time again, methinks, to >>> put these improvements into a stable release. So, if anyone has anything >>> they'd like to see right away, please speak now. :-) >> >> >> Great! > > > Thanks! > >> Why not add Spamhaus' XBL zone lookups? > > > The thought had only briefly crossed my mind. Is XBL really a good > resource for SURBL classification? I thought XBL just listed exploited > systems and open HTTP proxies. Hmm. I suppose I could just code it up > and run it on a bunch of mail to see what happens... Or I could just use > the combined sbl-xbl.spamhaus.org list, I suppose. If a spammy looking msg comes thru an exploited system IMO it would qualify even more to be a SURBL inclusion as a genuine "marketer" would not be expected to use exploited machines, right? (silently waiting for Jeff to bark at me :-) Keeping the lookups separate would give us a bit more detail to evaluate. 
Alex --===============0218808437282402406==-- From jeffc@surbl.org Sun Sep 26 11:35:22 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 02:35:00 -0700 Message-ID: <11710227982.20040926023500@supranet.net> In-Reply-To: <415688FD.8020706@alexb.ch> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2066398229280084595==" --===============2066398229280084595== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Sunday, September 26, 2004, 2:16:45 AM, Alex Broens wrote: > If a spammy looking msg comes thru an exploited system IMO it would > qualify even more to be a SURBL inclusion as a genuine "marketer" would > not be expected to use exploited machines, right? That's definitely true, and one of the things I usually look for in SURBL listing candidates. (I thought you were referring to checking URI domains against XBL, which probably would not catch much.) XBL is an excellent list of spam senders, by far the biggest catcher of spam senders in my regular RBLs, so it probably would be good as a header check for GetURI also. Ryan can we make this a feature request? As we mentioned earlier, zombies are a major reason for SURBLs to exist. If someone uses fixed mail senders, those are easily blocked using regular RBLs. SURBLs are largely a response to zombies: without consistent mail senders to look for, message content, specifically spam-advertised web sites, was the next logical thing to check, IMO. Jeff C. -- "If it appears in hams, then don't list it."
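To illustrate the content-based check described above, here is a minimal Python sketch that pulls host names out of a message body, builds the name to query in the combined multi.surbl.org zone, and decodes which bits are set in a 127.0.0.x answer. The helper names are hypothetical, the "last two labels" reduction is a deliberate simplification (it is wrong for domains such as example.co.uk), and the thread does not say which bit corresponds to which list, so the decoder only reports raw bit values.

```python
import re

SURBL_ZONE = "multi.surbl.org"  # combined zone mentioned elsewhere in this thread
_URI_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def uri_hosts(body: str) -> list:
    """Extract unique, lower-cased host names from http(s) URIs in a body."""
    seen = []
    for host in _URI_HOST.findall(body):
        host = host.lower().strip(".")
        if host and host not in seen:
            seen.append(host)
    return seen


def surbl_query_name(host: str, zone: str = SURBL_ZONE) -> str:
    """Naively keep the last two labels of the host and append the zone."""
    labels = host.split(".")
    return ".".join(labels[-2:]) + "." + zone


def set_bits(answer: str) -> list:
    """Decode the last octet of a 127.0.0.x answer into its set bit values."""
    last = int(answer.rsplit(".", 1)[1])
    return [1 << i for i in range(8) if last & (1 << i)]
```

With the answer Frank quotes later in the thread, `set_bits("127.0.0.118")` yields the bit values 2, 4, 16, 32 and 64, meaning the domain was on several of the component lists at once.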
--===============2066398229280084595==-- From raymond@prolocation.net Sun Sep 26 12:39:29 2004 From: Raymond Dijkxhoorn <raymond@prolocation.net> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 12:39:29 +0200 Message-ID: <Pine.LNX.4.61.0409261238190.1126@mailbox.prolocation.net> In-Reply-To: <20040926030459.V3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1819595887885873558==" --===============1819595887885873558== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Hi! >> Why not add Spamhaus' XBL zone lookups? > The thought had only briefly crossed my mind. Is XBL really a good > resource for SURBL classification? I thought XBL just listed exploited > systems and open HTTP proxies. Hmm. I suppose I could just code it up > and run it on a bunch of mail to see what happens... Or I could just use > the combined sbl-xbl.spamhaus.org list, I suppose. I would really only use SBL, if you check agains XBL you could also test on DSBL, but we want to get the hardcore non-proxy spammers. The zombies are stopped with DSBL/XBL and alike anyway. Any thoughts? Bye, Raymond. --===============1819595887885873558==-- From jeffc@surbl.org Sun Sep 26 13:51:19 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 04:50:57 -0700 Message-ID: <299603772.20040926045057@supranet.net> In-Reply-To: <Pine.LNX.4.61.0409261238190.1126@mailbox.prolocation.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6674285412249861489==" --===============6674285412249861489== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Sunday, September 26, 2004, 3:39:29 AM, Raymond Dijkxhoorn wrote: (Alex wrote:) >>> Why not add Spamhaus' XBL zone lookups? (Ryan replied:) >> The thought had only briefly crossed my mind. 
Is XBL really a good >> resource for SURBL classification? I thought XBL just listed exploited >> systems and open HTTP proxies. Hmm. I suppose I could just code it up >> and run it on a bunch of mail to see what happens... Or I could just use >> the combined sbl-xbl.spamhaus.org list, I suppose. > I would really only use SBL, if you check agains XBL you could also test > on DSBL, but we want to get the hardcore non-proxy spammers. The zombies > are stopped with DSBL/XBL and alike anyway. Any thoughts? Yes, there was perhaps some confusion about what Alex meant in suggesting XBL. If he meant use it to check headers then I agree it's a useful way to spot zombie and open server usage. If he meant to try XBL against spam URIs, then I agree it probably won't do much. Jeff C. -- "If it appears in hams, then don't list it." --===============6674285412249861489==-- From surbl@alexb.ch Sun Sep 26 14:00:47 2004 From: Alex Broens <surbl@alexb.ch> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 14:00:11 +0200 Message-ID: <4156AF4B.6000002@alexb.ch> In-Reply-To: <299603772.20040926045057@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7988304781391245195==" --===============7988304781391245195== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote: > On Sunday, September 26, 2004, 3:39:29 AM, Raymond Dijkxhoorn wrote: > (Alex wrote:) > >>>>Why not add Spamhaus' XBL zone lookups? > > > (Ryan replied:) > >>>The thought had only briefly crossed my mind. Is XBL really a good >>>resource for SURBL classification? I thought XBL just listed exploited >>>systems and open HTTP proxies. Hmm. I suppose I could just code it up >>>and run it on a bunch of mail to see what happens... Or I could just use >>>the combined sbl-xbl.spamhaus.org list, I suppose. 
> > >>I would really only use SBL, if you check agains XBL you could also test >>on DSBL, but we want to get the hardcore non-proxy spammers. The zombies >>are stopped with DSBL/XBL and alike anyway. Any thoughts? > > > Yes, there was perhaps some confusion about what Alex meant > in suggesting XBL. If he meant use it to check headers then > I agree it's a useful way to spot zombie and open server usage. yep that was the idea..... > If he meant to try XBL against spam URIs, then I agree it > probably won't do much. naaaaaaa... did I say that? :) Alex --===============7988304781391245195==-- From ryan@sasknow.com Sun Sep 26 19:36:28 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 11:36:26 -0600 Message-ID: <20040926112634.I3793@drizzle.sasknow.net> In-Reply-To: <11710227982.20040926023500@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2975635766943181948==" --===============2975635766943181948== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discuss: > On Sunday, September 26, 2004, 2:16:45 AM, Alex Broens wrote: >> If a spammy looking msg comes thru an exploited system IMO it would >> qualify even more to be a SURBL inclusion as a genuine "marketer" would >> not be expected to use exploited machines, right? > > That's definitely true, and one of the things I usually look > for in SURBL listing candidates. (I thought you were referring > to checking URI domains against XBL, which probably would not > catch much.) > > XBL is an excellent list of spam senders, by far the biggest > catcher of spam senders in my regular RBLs, so it probably > would be good as a header check for GetURI also. Ryan can > we make this a feature request? Sure. Now it's making sense. :-) Fortunately, adding header checks will be easy, because I'm already using the SpamAssassin engine. 
- Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============2975635766943181948==-- From ryan@sasknow.com Sun Sep 26 23:22:04 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 15:22:02 -0600 Message-ID: <20040926132657.S3793@drizzle.sasknow.net> In-Reply-To: <20040926112634.I3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6921644168099256147==" --===============6921644168099256147== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Ryan Thompson wrote to SURBL Discussion list: >> XBL is an excellent list of spam senders, by far the biggest catcher >> of spam senders in my regular RBLs, so it probably would be good as a >> header check for GetURI also. Ryan can we make this a feature >> request? > > Sure. Now it's making sense. :-) Fortunately, adding header checks > will be easy, because I'm already using the SpamAssassin engine. OK, I've tried this, but it slows down the runs considerably. My 2K test corpus had 54 RCVD_IN_XBL hits, but for some reason, *none* of those messages contained domains that were not already listed in SURBL. The run took 26 minutes, instead of the usual 2-3m for the 2K corpus. Then, I used the new --surbl=hostname option to check against WS only (instead of the default multi), and found only 2/381 (0.5%) domains spamvertised by an XBL listed host. Hmm. Then I fed the --surbl option a local "dummy" SURBL list containing only test entries, effectively disabling the SURBL filter in GetURI, and got 52/3130 (1.6%) domains whose messages were RCVD_IN_XBL.
So, given the low hit rate (especially in the usual case of only looking for new SURBL domains) and the tremendous amount of extra time required to do the XBL header/net test (the last run took 48 minutes, compared to ~16 minutes without the header tests), I'm going to make GetURI default to *not* doing the header checks, and let people enable them with the new --header option. With all of these new DNS tests, network delays are now definitely the bottleneck in GetURI. Soon (not for 1.6, maybe 1.7), I think I'm going to have to go to a forked or threaded model. - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============6921644168099256147==-- From jeffc@surbl.org Sun Sep 26 23:55:49 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Sun, 26 Sep 2004 14:55:28 -0700 Message-ID: <190678039.20040926145528@supranet.net> In-Reply-To: <20040926132657.S3793@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8438892328158434464==" --===============8438892328158434464== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Sunday, September 26, 2004, 2:22:02 PM, Ryan Thompson wrote: > So, given the low hit rate (especially in the usual case of > only looking for new SURBL domains) and the tremendous amount of extra > time required to do the XBL header/net test (the last run took 48 > minutes, compared to ~16 minutes without the header tests), I'm going > to make GetURI default to *not* doing the header checks, and let people > enable them with the new --header option. Sounds reasonable to me. :-) Jeff C. -- "If it appears in hams, then don't list it."
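Since Ryan identifies network delay on DNS tests as the bottleneck and mentions moving to a forked or threaded model, here is a minimal Python sketch of the threaded variant: many blocking lookups run concurrently in a thread pool. It is not GetURI's implementation; the function names, the worker count, and the injectable resolver are all assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(name):
    """Return the A record for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def resolve_all(names, resolver=resolve, max_workers=20):
    """Run the resolver over all names concurrently; return {name: result}."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs names with results.
        return dict(zip(names, pool.map(resolver, names)))
```

Because each lookup spends almost all of its time waiting on the network, threads overlap that waiting, and total run time tends toward the slowest batch of lookups instead of the sum of all of them. Passing a different `resolver` makes the function easy to test without touching DNS.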
--===============8438892328158434464==-- From jeffc@surbl.org Mon Sep 27 10:36:54 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Mon, 27 Sep 2004 01:36:34 -0700 Message-ID: <439871890.20040927013634@supranet.net> In-Reply-To: <1895833571.20040924213514@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0163714519818375835==" --===============0163714519818375835== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Friday, September 24, 2004, 9:35:14 PM, Jeff Chan wrote: > OK I Updated the policy page, taking Ryan's top rules and general > organizational comments: > http://www.surbl.org/policy.html > Please let me/us know what you think of it now. Does anyone else have any comments on the updated policy page for adding new records to manual SURBL lists? It includes changes thanks to comments from Frank, Ryan, Joe and others. Please reply if you have anything to add or change. Jeff C. -- "If it appears in hams, then don't list it." --===============0163714519818375835==-- From nobody@xyzzy.claranet.de Tue Sep 28 01:52:08 2004 From: Frank Ellermann <nobody@xyzzy.claranet.de> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Tue, 28 Sep 2004 01:51:27 +0200 Message-ID: <4158A77F.B5F@xyzzy.claranet.de> In-Reply-To: <641320150.20040925200018@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5321642844030390349==" --===============5321642844030390349== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote: > How does that sound? Good. A link to UC somewhere (not necessarily on this page) would be also nice. 
About the "age" problem: Sometimes spamvertized domains are used for weeks, more than 2 months in the case of aktion2004.net.multi.surbl.org = 127.0.0.118 Some days ago I got a feedback mail from ICANN's WDPRS, it was about a complaint in May. So attacking domains on the "whois data problem" track takes some time, certainly more than 90 days. At the moment Joe's idea "don't add anything older than 90 days" probably works, but the spammers will as always try to bypass any strict rules. Therefore your wording ("should") is IMHO fine, bye, Frank --===============5321642844030390349==-- From jeffc@surbl.org Tue Sep 28 02:15:23 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Mon, 27 Sep 2004 17:15:03 -0700 Message-ID: <952169805.20040927171503@supranet.net> In-Reply-To: <4158A77F.B5F@xyzzy.claranet.de> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1611217294379527512==" --===============1611217294379527512== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Monday, September 27, 2004, 4:51:27 PM, Frank Ellermann wrote: > Some days ago I got a feedback mail from ICANN's WDPRS, it > was about a complaint in May. So attacking domains on the > "whois data problem" track takes some time, certainly more > than 90 days. At the moment Joe's idea "don't add anything > older than 90 days" probably works, but the spammers will > as always try to bypass any strict rules. Actually it's Outblaze that tries to cut off domains at 90 days. Joe is more flexible, suggesting that domains older than 90 days can be included if, for example, they use name servers or hosting addresses in SBL. Joe's statistics did show a large drop-off in spam domains registered more than 1 year before listing, however: > 50% of all blacklisted domains are registered no more than 35 days before > listing, 60% within two months, 66% within three months, 70% with four > months.
As you see, the incremental gain per extra month gets smaller and > smaller. Six months cover 80%, 12 months 90%, 24 months 97%. So there is a point of diminishing returns in going with the older domains. There is also perhaps an increasing chance of FPs with older domains. (I didn't graph the above, but the numbers look like a nice exponential decay....) Jeff C. -- "If it appears in hams, then don't list it." --===============1611217294379527512==-- From ryan@sasknow.com Tue Sep 28 02:50:40 2004 From: Ryan Thompson <ryan@sasknow.com> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Mon, 27 Sep 2004 18:50:39 -0600 Message-ID: <20040927184437.F49599@drizzle.sasknow.net> In-Reply-To: <952169805.20040927171503@supranet.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============4046924289359105238==" --===============4046924289359105238== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Jeff Chan wrote to SURBL Discuss: > So there is a point of diminishing returns in going with > the older domains. There is also perhaps an increasing > chance of FPs with older domains. > > (I didn't graph the above, but the numbers look like a > nice exponential decay....) I have graphed similar numbers, but I don't have the results handy. It's more like a normal distribution ("bell curve"), with the mean at 0 days (actually slightly greater than zero, but that's a relatively constant skew due to lag between registration time and spam delivery/processing). GetURI uses a modified version of the normal distribution as part of its heuristic. The other parts of GetURI's heuristic are pretty much all additive, but I found that, statistically, domain age is good enough to be multiplicative, and it'll *reduce* rankings for domains that have been registered for a long time. It's so nice when math actually works. 
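The idea of combining additive evidence with a multiplicative, bell-curve-shaped age factor can be sketched as follows. This is purely illustrative: the thread does not give GetURI's actual function or constants, so the Gaussian form, the 90-day width, and both function names are assumptions.

```python
import math


def age_multiplier(age_days: float, sigma_days: float = 90.0) -> float:
    """Gaussian-shaped weight: 1.0 for a brand-new domain, falling with age.

    sigma_days is an arbitrary illustrative width, not a value from GetURI.
    """
    return math.exp(-(age_days ** 2) / (2.0 * sigma_days ** 2))


def rank(additive_score: float, age_days: float, sigma_days: float = 90.0) -> float:
    """Scale the summed (additive) evidence by the multiplicative age weight."""
    return additive_score * age_multiplier(age_days, sigma_days)
```

With these assumed parameters, a domain with the same additive evidence ranks much lower at one year old than at 30 days old, which matches the described effect of reducing rankings for long-registered domains.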
:-) - Ryan -- Ryan Thompson <ryan(a)sasknow.com> SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America --===============4046924289359105238==-- From jeffc@surbl.org Tue Sep 28 03:03:58 2004 From: Jeff Chan <jeffc@surbl.org> To: discuss@lists.surbl.org Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy Date: Mon, 27 Sep 2004 18:03:38 -0700 Message-ID: <1469158440.20040927180338@supranet.net> In-Reply-To: <20040927184437.F49599@drizzle.sasknow.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5669109258728119396==" --===============5669109258728119396== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit On Monday, September 27, 2004, 5:50:39 PM, Ryan Thompson wrote: > Jeff Chan wrote to SURBL Discuss: >> So there is a point of diminishing returns in going with >> the older domains. There is also perhaps an increasing >> chance of FPs with older domains. >> >> (I didn't graph the above, but the numbers look like a >> nice exponential decay....) > I have graphed similar numbers, but I don't have the results handy. It's > more like a normal distribution ("bell curve"), with the mean at 0 days > (actually slightly greater than zero, but that's a relatively constant > skew due to lag between registration time and spam delivery/processing). > GetURI uses a modified version of the normal distribution as part of its > heuristic. The other parts of GetURI's heuristic are pretty much all > additive, but I found that, statistically, domain age is good enough to > be multiplicative, and it'll *reduce* rankings for domains that have > been registered for a long time. It's so nice when math actually works. > :-) > - Ryan Heh, when I said "normal", statisticians jumped all over that. Turns out the distributions may be more like Zipfian. 
Zipf curves have most of the data concentrated in a small amount
of the curve (e.g., young domains) and a small amount of the data
in a larger part of the curve (e.g., old domains).  I hope I'm
explaining that correctly.

That said, if you found some numerical heuristics that fit
the data well, that's great!

Jeff C.
--
"If it appears in hams, then don't list it."

--===============5669109258728119396==--

From nobody@xyzzy.claranet.de Tue Sep 28 03:25:32 2004
From: Frank Ellermann <nobody@xyzzy.claranet.de>
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy
Date: Tue, 28 Sep 2004 03:25:00 +0200
Message-ID: <4158BD6C.60CB@xyzzy.claranet.de>
In-Reply-To: <952169805.20040927171503@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4371625765492396149=="

--===============4371625765492396149==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Jeff Chan wrote:

> Joe's statistics did show a large drop off in spam
> domain registrations older 1 year however:

Makes sense, if one year is the shortest period offered by
registrars.  Some spammers could try to use their registered
domains as long as possible without renewing the registration.

But I'm notoriously bad at guessing what spammers "think".

                         Bye, Frank

--===============4371625765492396149==--

From ryan@sasknow.com Tue Sep 28 06:18:46 2004
From: Ryan Thompson <ryan@sasknow.com>
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy
Date: Mon, 27 Sep 2004 22:18:42 -0600
Message-ID: <20040927220645.Y49599@drizzle.sasknow.net>
In-Reply-To: <1469158440.20040927180338@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4516280176794509256=="

--===============4516280176794509256==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Jeff Chan wrote to SURBL Discuss:

> Heh, when I said "normal", statisticians jumped all over that.
:-)

> Turns out the distributions may be more like Zipfian. Zipf curves
> have most of the data concentrated in a small amount of the curve
> (e.g., young domains) and a small amount of the data in a larger part
> of the curve (e.g., old domains). I hope I'm explaining that
> correctly.
>
> That said, if you found some numerical heuristics that fit
> the data well, that's great!

Yup, my function seems to fit quite nicely to the data I had at the
time. However, I do plan to work on the scoring in more detail. GetURI
is currently in a huge growth spurt with the advent of different
relevant tests, and finally getting up to speed with what people are
already doing to classify domains. Once that settles down a bit, I'll
probably look more closely at scoring. Right now, though, it is
definitely quite a useful metric at the extremes (top/bottom of output).
It's weak in the middle ground, but, then again, we all know the middle
ground is damned hard enough for humans. :-)

- Ryan

--
  Ryan Thompson <ryan(a)sasknow.com>
  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4
  Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669 (877-SASKNOW)   North America

--===============4516280176794509256==--

From jeffc@surbl.org Tue Sep 28 06:25:21 2004
From: Jeff Chan <jeffc@surbl.org>
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] RFC: SURBL inclusion policy
Date: Mon, 27 Sep 2004 21:24:58 -0700
Message-ID: <11510708351.20040927212458@supranet.net>
In-Reply-To: <20040927220645.Y49599@drizzle.sasknow.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8919292387842137229=="

--===============8919292387842137229==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On Monday, September 27, 2004, 9:18:42 PM, Ryan Thompson wrote:
> Once that settles down a bit, I'll
> probably look more closely at scoring.
> Right now, though, it is
> definitely quite a useful metric at the extremes (top/bottom of output).
> It's weak in the middle ground, but, then again, we all know the middle
> ground is damned hard enough for humans. :-)

Indeed.  It's probably the extremes, and the FPs in the middle,
that are the most important, and GetURI is a nice tool for
spotting those.

Jeff C.
--
"If it appears in hams, then don't list it."

--===============8919292387842137229==--
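
[Archive note] The two numerical ideas in this thread can be sketched
roughly as follows: the shrinking gain per extra month of domain age
(using the 80% / 90% / 97% figures quoted above), and a multiplicative
age weight of the kind Ryan describes. This is only an illustration
under stated assumptions, not GetURI's actual heuristic, which the
thread does not show. The function names, the plain half-normal shape,
and the sigma_days value of 180 are all hypothetical.

```python
import math


def gain_per_month(coverage):
    """Average coverage gained per month within each age bracket.

    coverage: list of (months, cumulative_percent), sorted by months.
    """
    gains = []
    prev_months, prev_percent = 0, 0.0
    for months, percent in coverage:
        gains.append((percent - prev_percent) / (months - prev_months))
        prev_months, prev_percent = months, percent
    return gains


def age_weight(age_days, sigma_days=180.0):
    """Half-normal-shaped weight: 1.0 for a brand-new domain, falling
    toward 0 as the registration gets older.

    sigma_days is an assumed tuning constant, not a value from the thread.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return math.exp(-0.5 * (age_days / sigma_days) ** 2)


def rank(additive_score, age_days, sigma_days=180.0):
    """Scale an additive score by the age weight, so long-registered
    domains end up ranked lower (hypothetical combination rule)."""
    return additive_score * age_weight(age_days, sigma_days)


if __name__ == "__main__":
    # Figures quoted in the thread: 6 months 80%, 12 months 90%, 24 months 97%.
    for g in gain_per_month([(6, 80.0), (12, 90.0), (24, 97.0)]):
        print(f"{g:.2f} percentage points per month")
    # Same additive score, different registration ages.
    for days in (7, 180, 720):
        print(days, round(rank(10.0, days), 3))
```

With the quoted figures, the average gain falls from about 13 percentage
points per month in the first six months to well under one point per
month in the second year, which is the diminishing return Jeff points
out. A multiplicative weight leaves the ranking of young domains almost
unchanged and pushes old domains toward the bottom of the output.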