Author/Date Bayesian Filter
16/09/2003 8:54am
I'm new to i.Scribe so be nice to me :)
I have been using POPFile for Bayesian filtering for some while now and am very satisfied with the result. My question to you i.Scribe users is: "Can i.Scribe's Bayesian filter compete with POPFile?" If it can, that would be great. I hate using more programs than I have to, and i.Scribe seems so far to be a great and, just as important, neat email client.
16/09/2003 9:06am
Scribe's implementation is very new and will not compete with POPFile, which is a mature product with many people working on it.

My implementation has just one (very busy) person working on it and it's likely to always lag behind whatever else is out there.

However, in my own experience my implementation catches over 80% of the spam that I get, which is most of it. That cuts down my spam problem enormously... it's all good, not perfect, just good.

Some days it's great, other days it lets some through. It's better than nothing IMHO. And it will get better over time as I improve it.
16/09/2003 9:13am
Thank you for your reply.
I will keep both i.Scribe and POPFile running, but I will be sure to check your and other users' comments regarding Bayesian filters on your webpage.
Once again thanks for a great email client!
17/09/2003 6:18am
Hello again,
One small question. When I'm using POPFile I let it add [Spam] to the message subject. I have set up i.Scribe to filter out these messages and put them into the Spam folder. It works perfectly, but the mail icon in the taskbar flashes when a new Spam mail has arrived as well as when a new "proper" mail has arrived. This is not a big problem, but since everything else is working so well I want to know if it is possible to change this. I read an earlier post that said that if you use the built-in Bayesian filter, the icon will not flash when a mail considered as spam arrives. Can I achieve this without using the built-in filter?
Best regards
17/09/2003 8:45am
From what I remember, you can do this by setting the mail to 'read' and then moving it. This will then not trigger the new mail event.

I can't remember exactly how you do it since my home computer is in a broken state at the moment but I do remember that I got it to work as above!

Hope this helps,
- Flex
17/09/2003 9:09am
Perfect! Just what I needed.
17/09/2003 6:23pm
I have a big problem with this filtering: it crashes my InScribe, either when new spam arrives or when I select a mail in my spam folder and choose "analyze..." from the filter menu. Any idea why?
17/09/2003 6:35pm
Markman: Like Flex says, you can add another action to the filter to set the spam to "read", which stops the tray icon flashing.

Flintz: I've emailed you about that.
Rostislav Hučka
18/10/2003 2:03pm
I have some experience with POPFile, but I have no idea how Bayesian filtering works in InScribe. How do I train the filter? I thought I just put spam e-mails into the Spam folder by pushing the Spam button, but it works weirdly. If I use "Analyze selected mail" from the Filters menu, it displays word statistics, or the message "From address was in the white list", or does nothing. And how does "Rebuild Bayesian word lists" work?
I have the Bayesian filter switched to training mode, but it never puts any spam into the selected folder. I have the Test42 release for win32.
19/10/2003 8:21pm
Rostislav: Have you read this?

It explains how to use the Bayesian filtering in Scribe.
Rostislav Hučka
21/10/2003 1:46am
I have read it, but it doesn't completely solve my problem. I have to run Rebuild Bayesian Word Lists every time I open InScribe. If I don't, Bayesian filtering doesn't work at all. So I have to keep a large amount of spam in the Spam folder as "feed" for the Bayesian filter.
Where is the word list stored? How could I fix it?
21/10/2003 1:53am
When it's working properly you don't have to run the "Rebuild Wordlists" every time you run Scribe. I often leave it a week between running the word list rebuild.

The word database files are:

  • hamwords.wdb
  • spamwords.wdb
  • whitelist.wdb

If these files are small (1KB) or missing then you're not getting valid word data and the rest of the filtering system won't work.
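As a quick way to run that check, here is a minimal Python sketch (not part of Scribe). The function name `check_word_dbs` and the 1024-byte threshold are assumptions made for illustration; the three file names are the ones listed above, and the directory to inspect is whatever folder holds your word lists.

```python
from pathlib import Path

# File names as listed in the post above.
WORD_DB_FILES = ("hamwords.wdb", "spamwords.wdb", "whitelist.wdb")

# Assumption: "small (1KB)" is read as "1024 bytes or less".
MIN_USEFUL_BYTES = 1024


def check_word_dbs(directory):
    """Return a dict mapping each file name to 'missing', 'too small' or 'ok'."""
    status = {}
    for name in WORD_DB_FILES:
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif path.stat().st_size <= MIN_USEFUL_BYTES:
            status[name] = "too small"
        else:
            status[name] = "ok"
    return status
```

Calling `check_word_dbs(".")` from the folder that holds the word lists reports which of the three files look suspect.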

What test version are you using?

There was a bug in some of the earlier tests that had Bayesian filtering that caused the files not to be created. But that [should] be fixed in the most recent test build.

I should add that all this has been working flawlessly for me for months now. And I would love to help any way I can to get other people using it too. If there are teething issues then let me know, keep giving feedback and I'll do my best to resolve all the problems!
21/10/2003 1:54am
I just want to add that the word lists should be in the same directory as the options file or, if that's missing, the executable file.
Rostislav Hučka
21/10/2003 2:29am
I fixed it by creating those files manually. After a restart and a word list rebuild it worked correctly. But there is another minor problem. If I delete the damn spam from the Spam folder and then rebuild the word lists again, the content of spamwords.wdb vanishes. Is the Bayesian filtering system designed to keep a spam archive forever as future reference? From my experience with POPFile (it's a resource hog; your filtering system suits me much better), I think I was able to delete spam there after the filter had consumed its words.
21/10/2003 2:52am
At the moment my implementation needs the spam email to stay in the /Spam folder (forever).

But I do intend to change it so that you can delete the spam if you want and the word database will remember its counts regardless. The implementation is quite simple at the moment and needs a bit of hand-holding to work, but it will mature into a less interactive system.

The big issue holding that back is that there's no simple way to know when to add a ham's words to the word database. There is a clear event, the user clicking on "Delete As Spam", to signify adding the words of a spam to the spam word list, but ham has no such event to hang off.

What I'll eventually do is add all mail to the ham db until such time as the user clicks "Delete As Spam", and then decrement the words off the ham db and increment them into the spam db. However, this has the unwanted side effect of temporarily having spammy words in the ham database, even if the counts are very low. This will reduce the effectiveness of the system.
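A rough Python sketch of that count-moving idea, for illustration only. This is not Scribe's code: the class name `WordCounts`, the whitespace tokenizer and the in-memory `Counter` tables are all assumptions standing in for the real word databases.

```python
from collections import Counter


def tokenize(text):
    # Assumption: a naive lowercase whitespace split stands in for real tokenizing.
    return text.lower().split()


class WordCounts:
    """Hypothetical in-memory stand-in for the ham and spam word databases."""

    def __init__(self):
        self.ham = Counter()
        self.spam = Counter()

    def on_mail_received(self, text):
        # Every incoming mail is counted as ham until the user says otherwise.
        self.ham.update(tokenize(text))

    def on_delete_as_spam(self, text):
        words = Counter(tokenize(text))
        # Take the message's words back out of the ham counts...
        self.ham.subtract(words)
        for word in words:
            if self.ham[word] <= 0:
                del self.ham[word]  # drop entries that fell to zero
        # ...and add them to the spam counts instead.
        self.spam.update(words)
```

Between `on_mail_received` and `on_delete_as_spam` the spam's words sit in the ham counts, which is exactly the temporary pollution described above.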

Maybe a better idea is to track the mail becoming read. If the user leaves the mail in the ham pile several minutes after reading it, then I guess I can assume it's not a spam and add it to the ham db. This gets tricky if the user doesn't follow a procedure, which means the product on the whole becomes less intuitive. For instance, if the user receives a spam, reads it and then exits the program, they still know it's a spam but it might get added to the ham db.
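The "wait a few minutes after reading" idea could look roughly like the Python sketch below. Everything here is hypothetical: the class name `PendingHam`, the five-minute default for "several minutes", and the use of caller-supplied timestamps in seconds instead of a real timer. It does not address the exit-the-program case mentioned above.

```python
HAM_DELAY_SECONDS = 300  # assumption: "several minutes" taken as five


class PendingHam:
    """Tracks read mail that has not yet been accepted as ham."""

    def __init__(self, delay=HAM_DELAY_SECONDS):
        self.delay = delay
        self.read_at = {}  # message id -> time (seconds) it was first read

    def mark_read(self, msg_id, now):
        # Only the first read starts the clock.
        self.read_at.setdefault(msg_id, now)

    def mark_spam(self, msg_id):
        # "Delete As Spam" cancels the pending ham decision.
        self.read_at.pop(msg_id, None)

    def due(self, now):
        """Return ids read at least `delay` seconds ago and stop tracking them."""
        ready = [m for m, t in self.read_at.items() if now - t >= self.delay]
        for m in ready:
            del self.read_at[m]
        return ready
```

The caller would feed the ids returned by `due()` into the ham word database, for example from a periodic tick in the mail client.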

Still looking for a foolproof and intuitive system.
04/11/2003 12:49pm
How about using an ordinary delete as an indication of ham? Moving the email from the inbox to another folder (not spam) would also be an indication of ham. (This would only leave out the messages that stay in the inbox forever, which I think is acceptable.)

To ensure a foolproof system a dialog box could be used during learning that informs the user of the result of the action. The dialog could give you the possibility to change that result. E.g. "delete" => "delete as spam" and vice versa. I.e. helping the user to take the correct action.

This is going to be irritating but if you give the user the possibility to turn it off I think they will get the point and the goal is achieved.

This is just my opinion. I don't know if it is possible to implement. Take it for what it is, thoughts.

Best regards