Author/Date | Bayesian filter query |
---|---|
Mike Green 09/09/2004 4:28pm | Hi, I'd like a bit of clarification on the way that the Bayesian filtering works in InScribe. I've read the help and understand that the SPAM list is updated when initiating re-scanning, and that this requires that all spam be left in the spam folder. Fine. It sounds, however, as if the same applies to HAM, i.e. the HAM wdb is built up from scratch each time the word list update function is run. Is this correct? If this IS the case, and it seems like it from a couple of experiments I've done, then this seems to be a bit of an issue, in that I personally tend to delete most mail pretty much straight away (maybe that's abnormal or aberrant behaviour?!). This isn't to say it's spam; it just means that I don't want to retain it for any reason. The net result is that the HAM file is very small, and rather prone to frequent, ongoing change. I don't claim to understand Bayesian filtering all that well, but doesn't it rely on checking for positive words as well as recognising 'negative', spam-like patterns? If so, then my failure to keep much HAM will weaken the filter, but is that a significant problem? Just seeking clarification here really! Thanks, Mike |
fReT 09/09/2004 9:55pm | You are right: the ham must remain for the filter to work correctly. I'm working on an incremental version of the algorithm so that you can delete mail and still retain the word counts. |
Mike Green 09/09/2004 10:08pm | So does the ham have a significant effect on the efficiency of the filter? I.e. if there isn't much of it, will the filter, once trained with lots of spam, skew towards false positives rather more than it would if I retained my mail instead of deleting just about everything? (My ham db is mostly meaningless strings anyway, since most of my mail is encrypted and that's what the ham db is picking up.) Excellent program, by the way. Far more elegant than most mail programs, and the fact that I can install it on a USB drive along with encryption software is a very major plus point. This query is largely out of interest, not something I'd regard as important. Now the other one, regarding attributed quoting, would be a really nice-to-have; it's the one thing 'The Bat!' does which could, but probably won't, lure me towards its use. I'll keep my fingers crossed that it'll be easy to program :-) Thanks for the response. Mike |
Justin Heiner 17/09/2004 7:54am | I basically use the ham folder as a whitelist. I move all the e-mails that are wrongly marked as spam into ham, and all e-mails from that sender are correctly placed from then on. Probably ham's intended usage, I'm guessing :) |
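
The incremental approach fReT mentions, where word counts survive after the mail itself is deleted, could look roughly like the sketch below. This is only an illustration under stated assumptions, not InScribe's actual code: the `IncrementalBayes` class, its method names, the tokeniser, and the add-one smoothing are all hypothetical choices made for the example, and saving the counts to disk is left out.

```python
import math
import re
from collections import Counter


class IncrementalBayes:
    """Hypothetical word-count store updated per message, never rebuilt."""

    def __init__(self):
        # Number of messages containing each word, per class.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of messages trained, per class.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Unique lowercase words; a real filter would tokenise more carefully.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        # Fold one message into the running totals. Once this has run,
        # the message can be deleted without losing its contribution.
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Naive Bayes in log space with add-one smoothing (an assumption).
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in self.tokens(text):
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that math.exp cannot overflow on extreme scores.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


filt = IncrementalBayes()
filt.train("cheap pills buy now", "spam")
filt.train("meeting agenda for monday", "ham")
# Both training messages could be deleted at this point; the counts remain.
print(filt.spam_probability("buy cheap pills"))
print(filt.spam_probability("monday meeting"))
```

Because both classes feed the same score, a ham table built from very few messages (or from encrypted text that never recurs) gives the filter little evidence to weigh against the spam counts, which is the concern raised in the thread.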