Better spam filter for postfix [Postfix]


From: "Steve" on 15 Jul 2010 20:17

-------- Original Message --------
> Date: Fri, 16 Jul 2010 02:09:43 +0300
> From: Henrik K <hege(a)hege.li>
> To: postfix-users(a)postfix.org
> Subject: Re: Better spam filter for postfix

> On Thu, Jul 15, 2010 at 11:16:43PM +0200, Steve wrote:
> > > >
> > > > If you are looking for something that is beyond just being better, then I
> > > > recommend CRM114 or DSPAM or OSBF-Lua. If you insist on having the AV
> > > > included in the Anti-Spam tool, then use something like DSPAM.
> > >
> > > I'd consider those as "engines". You can run one or all of them if you
> > > really want. MailScanner, Amavisd-new, Mimedefang and even SA (as a
> > > framework) are some of the "glues" that might utilize them.
> > >
> >
> > Well.... those so called "engines" can run on their own. They don't need
> > to be wrapped inside any of the "glues" you mention. Especially not when
> > those "glues" are memory hogs.
>
> Can you be more specific? Maybe you are addressing SA memory usage, which
> might only matter in some cases. Servers have lots of memory these days, and
> good MTA checks might reduce scanning needs greatly.
>
Yes. Servers have a lot of memory these days, but not enough to waste it. Memory is not my only point, though. My biggest problem with tools such as SA is that they are very slow compared to other solutions out there. With filter XYZ I can generally say that I classify x messages per second, while with SpamAssassin I would generally say that it needs x seconds per message. All the tests I have done with SpamAssassin in the past confirm that statement. And for me system resources are important, be it memory, CPU cycles, throughput, etc.
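The comparison above mixes two units: messages per second (throughput) and seconds per message (latency). For a single-threaded filter these are reciprocals of each other, so both can be read off one timed run. A minimal Python sketch of such a measurement, assuming classify is a hypothetical stand-in for whichever filter is under test (it is not an API of SA, DSPAM or any other tool mentioned here):

```python
import time
from typing import Callable, Iterable, Tuple


def measure(classify: Callable[[str], object],
            messages: Iterable[str]) -> Tuple[float, float]:
    """Time one pass of `classify` over `messages`.

    Returns (messages_per_second, seconds_per_message).
    """
    msgs = list(messages)
    if not msgs:
        raise ValueError("need at least one message")
    start = time.perf_counter()
    for msg in msgs:
        classify(msg)  # verdict ignored; only the elapsed time matters here
    # Guard against a zero reading when the classifier is extremely fast.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(msgs) / elapsed, elapsed / len(msgs)


# Usage with a trivial stand-in classifier (hypothetical):
rate, per_msg = measure(lambda m: "viagra" in m.lower(), ["Hello there"] * 1000)
print(f"{rate:.0f} msg/s, {per_msg * 1e6:.2f} us/msg")
```

For a fair comparison, the same message set and the same hardware would be used for every filter, as the post suggests.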

> > > Generally DSPAM etc require user interaction/learning.
> > >
> > So do CRM114 and OSBF-Lua. But you are wrong in thinking that they need
> > an insane amount of training/learning.
>
> That's what I meant with "etc". I did use DSPAM exclusively for a few months
> in the past, but for my personal use I saw no benefits from it.
>
Okay.

> > > SA does not, since
> > > it's a framework of rules and plugins and can autolearn Bayes if you want
> > > to - or even do the same for DSPAM etc if you use them as SA plugins. Let's
> > > not forget that DSPAM etc also require a database backend,
> > >
> >
> > You are WRONG. DSPAM does NOT require a database backend. I don't know
> > where you got that from. DSPAM MIGHT use a database backend but can run
> > well without one (using the Hash driver).
>
> So you don't consider the CSS Hash driver a "database backend"? It requires
> disk, memory and CPU to store and retrieve tokens. Whatever..
>
Well... it has a structure, but I would not consider it a database in the classical sense. If the CSS file is a database, then an XML file is a database too, and I personally don't consider an XML file to be a database.

> > > which might require
> > > lots of memory and/or disk, so it's not exactly "free" either. Accuracy
> > > depends heavily on configuration of all the components and other voodoo.
> > >
> >
> > What? Voodoo? Yeah right. There is less voodoo in CRM114, OSBF-Lua and
> > DSPAM than in SA. I explain the following to a user:
> > * you get mail and if it is wrongly classified by the Anti-Spam filter
> > then you correct it and the filter will learn.
> > * a wrong classification is based on the prior classifications YOU have
> > fed to the Anti-Spam filter.
> > * if you feed wrong data to the Anti-Spam filter then the filter will
> > make errors.
> > * the more you correct, the higher the accuracy gets and the less you
> > need to correct errors.
> >
> > That's easy to understand.
> >
> >
> > IMHO it is easier to explain than telling the user:
> > * there is an army of rule writers out there writing rules for SA, where
> > THEY decide what is spam and what is ham.
> >
> > And if the user asks me: what rules are those?
> > Then I would need to say that there are a gazillion rules that I cannot
> > explain in detail without taking up much of his time to go through all the
> > rules one by one.
> >
> > Anyway...
>
> So you have made your point. You prefer (or are required) to have the user in
> control.
>
Yes. The big problem is that no solution out there is 100% accurate for all users. So the only way to make the user happy is to delegate the control to him.
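The correct-and-learn loop Steve explains to his users is usually called train-on-error: the filter only updates its statistics when a user reports a misclassification. A toy Python sketch of that loop, using a simplified naive-Bayes-style token score. This is not how DSPAM, CRM114 or OSBF-Lua compute their scores, and the class and method names are hypothetical:

```python
import math
import re
from collections import Counter


class TrainOnErrorFilter:
    """Toy token-count classifier that learns only from user corrections."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}  # token -> docs seen
        self.totals = {"spam": 0, "ham": 0}                  # docs per label

    @staticmethod
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9$]+", text.lower()))

    def score(self, text: str, label: str) -> float:
        # Sum of log P(token | label) with add-one smoothing; absent tokens
        # and class priors are ignored to keep the sketch short.
        total = self.totals[label]
        return sum(math.log((self.counts[label][t] + 1) / (total + 2))
                   for t in self.tokens(text))

    def classify(self, text: str) -> str:
        return "spam" if self.score(text, "spam") > self.score(text, "ham") else "ham"

    def learn(self, text: str, label: str) -> None:
        self.counts[label].update(self.tokens(text))
        self.totals[label] += 1

    def correct(self, text: str, true_label: str) -> bool:
        """User feedback: train only if the filter was wrong."""
        if self.classify(text) != true_label:
            self.learn(text, true_label)
            return True
        return False


f = TrainOnErrorFilter()
f.correct("cheap pills buy now", "spam")  # misclassified as ham, so it learns
print(f.classify("cheap pills"))          # now classified as spam
```

It also shows the third bullet: if a user "corrects" with the wrong label, that wrong data goes straight into the counts.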

> I guess you don't use ANY other methods (blacklists etc) than the users' own
> statistical input, since you might have to tell your users that "THEY"
> thought your mail was spam?
>
No. I use other methods. A lot of them. I even developed my own stuff based on research papers from Anti-Spam researchers/companies. My setup has many defense rings around Postfix. Each ring has its own techniques, and the farther a ring is from Postfix, the fewer resources it uses. However... each domain owner and/or user has control over the rings. He/she can turn them on/off, depending on their needs. I preset which are on and which are off, but in the end each one of them is controllable by the end user (or the domain owner, whose rules precede user rules). Some stuff, however, is not controllable by the end user or domain owner: things like SPF checks and DKIM checks/signing. Those cannot be turned off.

I know this sounds very complicated, but the problem is that when you offer mail services to others, you can't possibly make all of them happy with a simple setup. Each individual has his/her own viewpoint on how mail should work, and so on and so on... and sooner or later you stop arguing: you implement what you think is good, set it as the default, and allow the domain owner or his/her users to control whatever they think is okay for them.
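The per-domain/per-user control over the defense rings can be pictured as layered settings: site defaults, then user overrides, then domain overrides, with a few checks forced on. A minimal Python sketch, assuming "domain rules precede user rules" means the domain owner's setting wins on conflict; the ring names and the effective_settings function are hypothetical:

```python
from typing import Dict, Optional

# Hypothetical site-wide defaults preset by the administrator.
DEFAULTS: Dict[str, bool] = {
    "rbl": True,
    "greylisting": True,
    "content_filter": False,
    "spf": True,
    "dkim": True,
}

# Checks that neither domain owners nor users may switch off.
MANDATORY = {"spf", "dkim"}


def effective_settings(domain: Optional[Dict[str, bool]] = None,
                       user: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Merge defaults, user overrides and domain overrides (domain wins)."""
    result = dict(DEFAULTS)
    result.update(user or {})
    result.update(domain or {})  # assumption: domain settings take precedence
    for check in MANDATORY:
        result[check] = True     # SPF/DKIM are always enforced
    return result


s = effective_settings(domain={"greylisting": False},
                       user={"greylisting": True, "content_filter": True, "spf": False})
print(s)
```

Here the user's attempt to disable SPF is ignored and the domain owner's greylisting choice overrides the user's.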

> > For me the three mentioned products are all better than SA because they
> > have a smaller memory footprint than SA, are way faster than SA, require
> > less maintenance when properly set up, and are way more accurate than SA.
>
> Good for you. Naturally resource usage is lower, the less stuff you do. One
> has to balance needs against that.
>
I perfectly understand that.

> But let's forget the accuracy bs, there are too many variables for such
> generic claims to be made. You can achieve "happy users" with pretty much
> any tool out there if used right.
>
That is right (I mean the part about "happy users"). However, you cannot deny that some tools are known to be better than others. Just look at OSBF-Lua, for example. That beast won at TREC 2006 and was number 1 in the CEAS 2008 Spam Filter Live Challenge.

> I'm in a happy position to be able to reject/quarantine spam for 1000+ users
> without ever bothering them with it, and very rarely get any questions about
> mail. If I had to do it the ISP way, I might consider DSPAM, then again I
> see nothing against using SA (or any other tool out there).
>
By default I would not see anything against SA either. I know setups that filter millions of messages per day with SA without any issue. Their hardware requirements are huge compared to other solutions, but in the end it has to be okay for them, and if high hardware requirements are not an issue for them, then so be it.

> > And regarding the training: DSPAM and CRM114 offer features where you can
> > pre-learn, so that your users already have a high accuracy from day one
> > (generally above 95%), and if they re-classify the first bunch of errors,
> > their accuracy easily jumps over 98.x%/99.x%. In DSPAM that kind of setup
> > is accomplished with merged groups, classification groups or shared
> > groups. In CRM114 you can, at run time, allocate and merge as many CSS
> > files as you like (one pre-trained file should be enough).
>
> You make it sound like statistical filters are invincible against different
> mail flows and pure user stupidity.
>
No, no. User stupidity is unbeatable. No machine learning can compensate for user stupidity.
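The pre-training Steve describes (merged/shared groups in DSPAM, merging CSS files in CRM114) boils down to combining a shared, pre-trained token table with each user's own statistics at classification time. A minimal Python sketch, assuming plain per-label token counts; the real DSPAM group and CRM114 CSS formats differ, and merge_stats and personal_weight are hypothetical names:

```python
from collections import Counter
from typing import Dict

Stats = Dict[str, Counter]  # label ("spam"/"ham") -> token counts


def merge_stats(shared: Stats, personal: Stats, personal_weight: int = 1) -> Stats:
    """Combine a pre-trained shared corpus with one user's own training.

    The inputs are not modified; a weight > 1 lets the user's corrections
    outweigh the shared corpus sooner.
    """
    merged: Stats = {}
    for label in set(shared) | set(personal):
        combined = Counter(shared.get(label, Counter()))  # copy shared counts
        for token, n in personal.get(label, Counter()).items():
            combined[token] += n * personal_weight
        merged[label] = combined
    return merged


shared = {"spam": Counter({"pills": 50}), "ham": Counter({"invoice": 40})}
personal = {"ham": Counter({"pills": 3})}  # this user actually wants pharmacy mail
print(merge_stats(shared, personal, personal_weight=2))
```

A new user starts with only the shared counts (the "day one" accuracy); each correction then shifts that user's merged view without touching the shared table.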

> > > There are no easy answers.
> > >
> >
> > And this is generally the field where Anti-Spam tools that do not depend
> > on pre-made rules shine, because they are very adaptive.
>
> Right, like SA for example only depends on "pre-made" rules and doesn't have
> any statistical or realtime capabilities..
>
It has both. Still, my main concern regarding SA is resource usage. If I set up SA on one system, test how fast it is, how much memory it uses and how accurate it is, and then compare those metrics with one of the other mentioned tools on the same hardware, then SA looks pretty bad.

> I think continuing this is pointless and a bit off-topic.
>
Yes. It is off-topic. This shall be my last response to you on this topic.

btw: SA is a good tool. I absolutely see a need for something like SA. I was just speaking for myself: in the 10+ years I have been doing filtering, SA has always proven to be one of the slower solutions for me. And I would rather invest some time at the beginning implementing a complex solution than constantly babysit a filtering solution. I have no problem keeping a filter solution up to date (blocklists/whitelists need that kind of attention), but for a content filter I have no time to fiddle around with >10'000 individual user configuration rules. So something like DSPAM is the better solution for me. It gives me greater control and allows me to quickly update and make changes without the need to update individual user configuration files, or at least it allows me to update settings for all users at once with a single command.

From: Stan Hoeppner on 16 Jul 2010 00:06

Steve put forth on 7/15/2010 4:16 PM:

> * if you feed wrong data to the Anti-Spam filter then the filter will make errors.

Content (header/body) filters have always been error prone and always will be.
The key to success is whether the error rate is acceptable. For users to train
them, they have to be run in post-queue mode. For performance reasons, most
OPs run them in post-queue mode anyway. And by doing this you're
unnecessarily eating bandwidth on your internet link(s).

There are plenty of good methods available to drop spam connections at SMTP
time without ever having to accept the spam for content analysis. I use many
such methods, and I don't use content filters. Never have. I probably spend
more time fighting spam than other OPs do. Using content filters such as SA
can definitely cut down on mail OP time spent fighting spam. Which method is
more effective depends on one's priorities, and thus this subject can be
debated ad infinitum.

I will say generically that for an OP who has the time, avoiding content
filters and using SMTP time blocking methods is probably more effective in the
long run and makes more efficient use of network and server resources.
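One of the cheapest SMTP-time methods is a DNS blocklist (DNSBL) lookup on the connecting client's IP address: the IPv4 octets are reversed, the list's zone is appended, and an A-record answer means "listed". Postfix does this natively (e.g. with the reject_rbl_client restriction); the minimal Python sketch below only illustrates the mechanics. The zone dnsbl.example.org is a placeholder, and treating any lookup failure as "not listed" (fail open) is a design assumption:

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(client_ip: str, zone: str) -> bool:
    """True if the DNSBL answers with an A record for this client."""
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN means "not listed"; resolver errors also land here,
        # so this sketch fails open rather than rejecting on DNS trouble.
        return False


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# -> 1.2.0.192.dnsbl.example.org
```

Because the decision is made before DATA, a listed client is rejected without the message ever crossing the link, which is Stan's bandwidth argument.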

YMMV, etc.

--
Stan

From: Henrik K on 16 Jul 2010 01:28

On Thu, Jul 15, 2010 at 11:06:44PM -0500, Stan Hoeppner wrote:
>
> I will say generically that for an OP who has the time, avoiding content
> filters and using SMTP time blocking methods is probably more effective in the
> long run and makes more efficient use of network and server resources.

You always have time to advertise content filters being "bad", so I just
have to make a pointless rebuttal..

Can you tell me any big public service (not a one man server) that doesn't
use content filtering at all? By public I don't mean a site that has the
ability to block freemailers, universities, etc hacked accounts..

I'm sure any serious site uses lots of SMTP time rejects, but you _need_
some sort of content filtering for the rest, unless you put the burden on
the client's MUA.

PS. I think I've spent maybe an hour or two maintaining our mail server in
the last few months, and it's still running fine.. how is that not
efficient? My work time costs much more than the imaginary network and
server resources.

From: Robert Schetterer on 16 Jul 2010 03:49

On 16.07.2010 09:27, lst_hoe02(a)kwsoft.de wrote:
> Quoting Henrik K <hege(a)hege.li>:
>
>> On Thu, Jul 15, 2010 at 11:06:44PM -0500, Stan Hoeppner wrote:
>>>
>>> I will say generically that for an OP who has the time, avoiding content
>>> filters and using SMTP time blocking methods is probably more effective in the
>>> long run and makes more efficient use of network and server resources.
>>
>> You always have time to advertise content filters being "bad", so I just
>> have to make a pointless rebuttal..
>>
>> Can you tell me any big public service (not a one man server) that doesn't
>> use content filtering at all? By public I don't mean a site that has the
>> ability to block freemailers, universities, etc hacked accounts..
>
> In Germany many companies have given up on content filtering because it
> is not allowed to drop mail after accepting it if there is a chance that
> private mail *could* be involved. So with a content filter your only
> choice would be to tag spam and let the user sort it out, which leaves no
> advantage in using a content filter at all.
> So content filters are mostly a selling point and not a favorable
> "solution".
>
> Regards
>
> Andreas
>
>
Why not use spamass-milter? It drops spam during the SMTP stage, which is
allowed anyway. clamav-milter with Sanesecurity signatures also works nicely
this way. Bouncing mail after receiving it, for whatever reason, may produce
backscatter, so it isn't a good idea in every case or country. Normally you
only flag spam and pass it on and/or hold it (for inspection by a human
postmaster). If you use amavis as an after-queue filter, mail always needs
daily support. And companies in Germany that stopped filtering (I don't know
of any) mostly have a problem with helpless admins and ignorant
managers/users, not with the law or existing antispam solutions.
So it's mostly a human problem.
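Robert's point hinges on where the decision is made: during the SMTP session (before-queue, e.g. via a milter) a reject is a protocol reply and the sending server deals with it, while after acceptance (after-queue) bouncing risks backscatter, so the message is tagged or held instead. A minimal Python sketch of that decision, assuming a numeric spam score such as SpamAssassin produces; the thresholds, action names and the decide function are hypothetical and not spamass-milter or amavis options:

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    stage: str            # "before-queue" or "after-queue"
    action: str           # what the MTA should do with the message
    smtp_reply: str = ""  # only set for SMTP-time rejects


def decide(score: float, stage: str,
           reject_at: float = 10.0, tag_at: float = 5.0) -> Verdict:
    """Pick an action for a scored message without generating backscatter."""
    if stage == "before-queue":
        # Still inside the SMTP session: a 5xx reply leaves notification to
        # the sending server, so this MTA never emits a bounce.
        if score >= reject_at:
            return Verdict(stage, "reject", "550 5.7.1 Message rejected as spam")
        if score >= tag_at:
            return Verdict(stage, "tag")
        return Verdict(stage, "accept")
    if stage == "after-queue":
        # Already accepted: never bounce; hold for a postmaster or tag instead.
        if score >= reject_at:
            return Verdict(stage, "hold")
        if score >= tag_at:
            return Verdict(stage, "tag")
        return Verdict(stage, "deliver")
    raise ValueError(f"unknown stage: {stage!r}")


print(decide(12.0, "before-queue"))
print(decide(12.0, "after-queue"))
```

Whether dropping or holding accepted mail is legally acceptable (the German case raised above) is a policy question this sketch does not model.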
--
Best Regards

Robert Schetterer

Germany/Munich/Bavaria

From: Robert Schetterer on 16 Jul 2010 05:03

On 16.07.2010 10:15, lst_hoe02(a)kwsoft.de wrote:
> The point is
>
> - Before-Queue content filtering is expensive and must be combined with
> "cheap" reject technologies anyway

Sorry, please explain "cheap".

if you have non-negligible load
> - Tagging spam is nearly useless because no user likes to poke through
> the dustbin to search for potentially lost mail

I don't understand. You always need to provide support for mail anyway;
answering user questions is no problem, as long as the rate of questions
can be handled by the corresponding number of postmasters and/or support
staff.

> - Spam-Bouncing is no option at all

Why? A bounce is not evil in itself; there will always be bounces for
various reasons.

> - In general the false positive rate is higher and more difficult to
> detect with content filters compared to a sane set of reputation based
> filters

I have a false positive rate under 0.1 per mille (0.01%),
no problem here.
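"Under 0.1 per mille" means fewer than 0.1 false positives per 1000 messages, i.e. fewer than 1 in 10,000 (0.01%). A small Python helper for this arithmetic; the function names and the 50,000-message example volume are hypothetical:

```python
def per_mille(errors: int, total: int) -> float:
    """Error rate in parts per thousand."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors * 1000 / total


def max_false_positives(ham_volume: int, rate_per_mille: float) -> int:
    """Upper bound on misfiled ham messages at a given per-mille rate."""
    return int(ham_volume * rate_per_mille // 1000)


print(per_mille(1, 10_000))             # 0.1 per mille, i.e. 0.01 %
print(max_false_positives(50_000, 0.1))  # at most 5 of 50,000 ham messages
```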

>
> So the most reasonable approach is to ditch content filtering altogether and use
> a sane set of reputation based decisions and maybe greylisting to reject
> spam at the earliest possible stage.

You should always use every useful antispam technique that makes sense
anyway (especially the ones that are native to Postfix). Greylisting is
one of them, but in a few cases on my site it simply no longer works for
defending against bots. So antispam is always a filter chain, and the real
content filter, such as SpamAssassin, should always be one of the last
links.
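The filter chain described here, cheap native Postfix checks first and an expensive content filter such as SpamAssassin near the end, can be modeled as an ordered list of checks that stops at the first rejection, so costly stages only see what the cheap ones let through. A minimal Python sketch; the stage names, relative costs and the msg dict are hypothetical, and in a real deployment Postfix's own ordered restriction lists and milters provide this behavior:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    cost: int                      # relative cost; lower runs earlier
    check: Callable[[dict], bool]  # returns True to reject the message


def run_chain(stages: List[Stage], msg: dict) -> Optional[str]:
    """Run checks cheapest-first; return the rejecting stage's name or None."""
    for stage in sorted(stages, key=lambda s: s.cost):
        if stage.check(msg):
            return stage.name  # short-circuit: costlier stages never run
    return None


chain = [
    Stage("content_filter", 100, lambda m: m.get("score", 0) >= 5),
    Stage("rbl", 5, lambda m: m.get("listed", False)),
    Stage("helo_syntax", 1, lambda m: not m.get("helo", "").strip()),
]
print(run_chain(chain, {"helo": "mail.example.org", "listed": True}))  # rbl
print(run_chain(chain, {"helo": "mail.example.org", "score": 1}))      # None
```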
>
> I'm not saying, let alone recommending, that you skip spam filtering, but
> content filtering is sometimes a bigger problem than a few spams slipping
> through.

Maybe. That's individual, like spam always is; a competent postmaster
should choose the right approach for each case.

>
> Regards
>
> Andreas

No need to flame. I have no problem supporting ca. 10 mail servers with
antispam enabled, serving up to 10000 mail addresses. Some spam always
slips through and there are always some false positives; that's the
nature of the beast, and the goal is to keep those rates low.
In my case spam filtering is no such problem, as mail servers with buggy
DNS setups end up in RBLs etc.
After all, one of the biggest problems is false tagging by the antispam
filters in mail clients, e.g. Outlook, which produces more questions than
server-side filters, as most users don't understand their mail client
settings.

--
Best Regards

Robert Schetterer

Germany/Munich/Bavaria
