A statistical filter is an automated system that evaluates documents in machine-readable form. Statistical filtering as a concept was introduced by M. Sahami et al. in 1998 (Dwork & Naor, 2002). Although Paul Graham did not invent it, his well-known essay ‘A Plan for Spam’ drew researchers' attention to the technique by advocating Bayesian probability models of spam and non-spam words. The essay emphasized software that computes, for each word, an indicator of the probability that a message containing it is spam. It presented such software as automatic, implementable in a small amount of code, adjustable to specific needs, and effective at its intended function (Zdziarski, 2005).
Spam filtering originated in text classification research, and its ultimate goal is to recognize spam accurately. Methodologies differ in the classification algorithm they use. Statistical filters share this origin, and although they have evolved they still rest on the same basic principles: they are all multinomial or multivariate models (Okin, 2003). A statistical filter distills a document into a set of preset features, which depending on the filter type may be specific words, numbers, or whole phrases. These features are then coded as a vector of Boolean values (multivariate) or real values (multinomial), on which filtering is based. The filter can be further adjusted with rule-based methods, which may be generated automatically or designed by hand. The resulting machine learning algorithms are driven mainly by the overall frequencies or statistics of the features being distilled (Bergin, 1996).
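As a minimal sketch of the feature coding described above, the following Python turns one message into a Boolean vector and a count vector. The vocabulary, the tokenizer and the function names are hypothetical choices made for illustration and do not come from the cited sources.

```python
import re
from collections import Counter

# Hypothetical preset feature list; a real filter would learn or configure its own.
VOCABULARY = ["free", "money", "meeting", "offer"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def boolean_vector(text, vocabulary=VOCABULARY):
    """Multivariate coding: 1 if the feature occurs at all, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

def count_vector(text, vocabulary=VOCABULARY):
    """Multinomial coding: how many times each feature occurs."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

message = "Free money! Claim your FREE offer now."
print(boolean_vector(message))  # [1, 1, 0, 1]
print(count_vector(message))    # [2, 1, 0, 1]
```

The two vectors differ only for "free", which occurs twice: the Boolean coding records presence, while the count coding keeps the frequency.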
Content filters vary according to the features they are designed to detect. In general, spam is repetitive and tends to contain certain characteristic terms. Word-based spam filters are the simplest form: they look for particular words in an email and block any message that contains them. Spammers can evade these filters by replacing or misspelling the words most common in spam.
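A word-based filter and its weakness can be illustrated in a few lines of Python. This is only a sketch: the blocked-word list and the function name are assumptions made for the example.

```python
import re

# Hypothetical blocked-word list, for illustration only.
BLOCKED_WORDS = {"viagra", "lottery"}

def contains_blocked_word(text, blocked=BLOCKED_WORDS):
    """Return True if any blocked word appears as a whole token in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not words.isdisjoint(blocked)

print(contains_blocked_word("You won the lottery"))  # True
print(contains_blocked_word("You won the l0ttery"))  # False
```

The second message swaps one letter for a digit and passes the filter, which is the evasion by misspelling described above.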
Heuristic filters, also called rule filters, are more complex than word filters. They scan the contents of incoming email for multiple terms and assign points to words or phrases. Keywords commonly found in spam receive higher points, and a total score is computed for the entire email. The user determines the cut-off score used to classify email as spam or legitimate. The filter blocks or deletes messages that reach that score and keeps those that score lower. Heuristic filters are relatively easy to operate and quite fast. A major disadvantage is that they can also filter out legitimate mail that happens to use certain words or phrases heavily. Conversely, spammers may evade them by avoiding those words in their messages.
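The scoring scheme described above might be sketched as follows. The rule table, the point values and the default cut-off of 5.0 are hypothetical and are not taken from any real filter.

```python
import re

# Hypothetical rule table: phrase -> points. Values are illustrative only.
RULES = {
    "free": 2.0,
    "winner": 3.0,
    "click here": 2.5,
    "unsubscribe": 1.0,
}

def heuristic_score(text, rules=RULES):
    """Sum the points of every rule phrase, once per occurrence."""
    lowered = text.lower()
    score = 0.0
    for phrase, points in rules.items():
        # Word boundaries stop "free" from matching inside "freedom".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        score += points * len(re.findall(pattern, lowered))
    return score

def is_spam(text, threshold=5.0, rules=RULES):
    """Flag the message when its total score reaches the user-chosen cut-off."""
    return heuristic_score(text, rules) >= threshold

print(heuristic_score("You are a WINNER! Click here for your free prize."))  # 7.5
print(is_spam("You are a WINNER! Click here for your free prize."))          # True
print(is_spam("Lunch is free on Friday."))                                   # False
```

Lowering the threshold catches more spam but raises the risk of blocking legitimate mail that happens to use the same words, which is the trade-off the user controls.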
Bayesian filters are by far the most effective form of content-based filters. They compute the mathematical probability that a message is spam based on features observed in other spam mail. The first system to use Bayesian classification for mail was ifile, written by Jason Rennie and released in 1996. Although much research was under way and many variants of this software were developed, it was Paul Graham’s essay on spam that popularized the algorithm with a wider audience. Bayesian filters are effective because they are adaptive, which makes them hard for spammers to evade. Every site that installs a Bayesian filter ends up with its own unique set of words, each assigned a specific statistical value in the database (Domingos & Pazzani, 2002). These words are also referred to as tokens. This gives the filter an advantage over the spammer, who cannot easily write a message that will deliberately evade a particular filter. Bayesian filters are updated frequently, and because of this they can track changes in spam mail and continue to block it. Over the last few years Bayesian spam filtering has become a popular mechanism that the majority of mail clients use to sort mail into spam and legitimate folders. Individual users can also install dedicated email filtering software. These filters are highly effective but require considerable patience, because the user must initially mark spam manually. The filter collects the words found in legitimate and spam mail and compiles them into a list that it uses to block spam. The Bayesian algorithm does, however, lose efficiency over time as the word list grows. A major advantage of a Bayesian filter is that it is sensitive to the needs of the client (Drucker, 2006): it can be tailored so that it does not filter out legitimate mail as spam.
Certain tokens, when they appear in large numbers in an email, lead to the email being regarded as spam.
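To make the token-probability idea concrete, here is a minimal naive Bayes sketch in Python. It is not Graham's exact formula or the algorithm of any named product. The class name, the labels "spam" and "ham", the tokenizer and the add-one smoothing are all assumptions made for this illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class NaiveBayesFilter:
    """Tiny multinomial naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.token_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | tokens) under the naive independence assumption."""
        if 0 in self.message_counts.values():
            raise ValueError("train at least one spam and one ham message first")
        vocabulary = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Log prior from the share of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            for token in tokenize(text):
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((counts[token] + 1) / (total_tokens + len(vocabulary)))
            log_scores[label] = score
        # Convert the two log scores back into a normalized probability.
        highest = max(log_scores.values())
        weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])

spam_filter = NaiveBayesFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")
print(spam_filter.spam_probability("free money") > 0.5)      # True
print(spam_filter.spam_probability("monday meeting") < 0.5)  # True
```

Because the token counts come entirely from the messages a particular user has marked, two installations trained on different mail end up with different token statistics, which is the per-site adaptability noted above.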
The majority of servers run mail filters based on Bayesian fundamentals, while some server-level spam filters are heuristic based. Widely used examples include versions of SpamAssassin, ASSP and SpamBayes. The functionality of these programs can be installed within the main server software, which handles a greater volume of mail. Once installed and configured, spam filtering software requires no further maintenance. The user only needs to mark messages as spam or non-spam, and the software then operates on guidelines drawn from the user’s judgments about the content of his messages (Mulligan, 1999).
A statistical filter adjusts to changes in the content of spam emails as soon as it detects them (Wyman, 1998). Such filters are also programmed to monitor distinctive differences in how a message was transported by examining message headers. Headers can be a better basis than content for discriminating against spam mail (Tipton & Krause, 2006).
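As a rough sketch of header inspection, the following Python uses the standard library `email` package to pull a few transport-related features from a raw message. Which headers to examine, and the feature names, are assumptions made for illustration; the sources cited here do not specify them.

```python
from email import message_from_string

def header_features(raw_message):
    """Extract a few simple header-based features from a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    return {
        # Each relay normally adds one Received header.
        "received_hops": len(msg.get_all("Received", [])),
        "has_message_id": msg["Message-ID"] is not None,
        "has_date": msg["Date"] is not None,
    }

raw = (
    "Received: from mail.example.org by mx.example.com\n"
    "Received: from localhost by mail.example.org\n"
    "From: sender@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)
print(header_features(raw))
# {'received_hops': 2, 'has_message_id': False, 'has_date': False}
```

Features like these could be fed to a statistical filter alongside content tokens.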
There is an ongoing contest between spammers and anti-spam services and experts. Spammers attempt to evade statistical filtering software by inserting random data into their messages while concealing it from the visible text, which makes it more probable that the message will be classified as neutral. This randomly placed data is invalid and is mainly concealed by setting it in a small font or in the same color as the background of the text (Goodman, 2004).
Software programs that use Bayesian classification include Bogofilter, Mozilla, Mozilla Thunderbird and MailWasher. CRM114 is a more recent tool that detects spam by applying Bayesian classification to phrases within messages. POPFile is a freely available email filter that individual users can apply to sort mail into manageable folders using the principles of Bayesian filtering. Later versions of SpamAssassin also sort mail and detect spam using Bayesian principles, whereas older versions use rule ranking. The operating speed of these tools is determined by their data structures, and the technique used for lexing words determines their ability to discard spurious random strings (Gregory & Simon, 2005).
Bayesian filters built to detect words work more efficiently than the others. A major disadvantage of these algorithms is that the number of distinct words an email can contain is unlimited. Even after most English words have been included, the number of character sequences that can be generated is boundless, and the scope is wider still when new text is used. The same applies to messages that incorporate random strings in Message-IDs or in UU and base64 encodings. Infrequent terms can be removed from the spam filter to increase operational speed (Goodman, 2004). To classify spam, machine learning researchers mainly use naive Bayes, which is, however, sensitive to how its small feature set is selected and does not perform optimally where errors carry heavy penalties. Other methods, such as AdaBoost and maximum entropy models, can be used instead. These are preferred because they are not sensitive to the selection strategy. They can also be tuned for high performance across different datasets and very high feature dimensions.
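Removing infrequent terms can be sketched in Python as below. The minimum count of 2 and the sample tokens are hypothetical; a real filter would choose its own threshold.

```python
from collections import Counter

def prune_rare_tokens(token_counts, min_count=2):
    """Return a new Counter keeping only tokens seen at least min_count times."""
    return Counter({t: c for t, c in token_counts.items() if c >= min_count})

# Random strings such as encoded fragments tend to appear only once.
counts = Counter({"free": 40, "money": 12, "x9f3kq": 1, "aGVsbG8": 1})
pruned = prune_rare_tokens(counts)
print(sorted(pruned))  # ['free', 'money']
```

Pruning keeps the token table small, at the cost of forgetting rare tokens that might later become informative.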
Other algorithms are also currently applied in the fight against spam. Support Vector Machine algorithms show the best accuracy and speed when set to use binary features (Dumais & Horvitz, 2005). Boosted decision trees are also a practical approach to spam filtering, with good speed and accuracy.
In conclusion, spam mail is a major problem for all internet users (Goodman, 2004). It wastes a great deal of time and may lead email account holders to ignore or delete legitimate mail as they try to avoid spam. Spam can also carry viruses and damage important information. The war between spammers and the experts behind spam filters has intensified considerably over time (Graham, 2007). It is impossible to prevent spammers from sending spam, but spam filters can prevent it from reaching us. Spam filters have evolved from simple non-statistical procedures such as blacklisting and whitelisting to more advanced statistical software that blocks up to 96% of spam from the inbox. Most of these methods run several checks to keep spam out of the inbox. The best spam filter algorithms are content-based ones that evaluate messages for specific words or phrases and calculate the probability that a message is spam. They are very efficient and widely applied (Pogue, 2005). Other filtering methods include the challenge-response system, which presents the sender with a challenge before the message is delivered. Because spammers send very large numbers of messages, and these are usually automated, it is not practical for them to respond to such challenges. Collaborative filters have also been applied. These take a community-based approach: once individuals in a community tag a sender's messages as spam, that spammer can no longer reach the community. DNS lookup systems use several anti-spam techniques to identify and block spammers (Schwartz, 1999).
None of these methods ensures the total eradication of spam, but using several approaches together gives a higher chance of separating spam from legitimate mail (OECD, 2006).
References
Jonathan A. Zdziarski, (2005), Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification, San Francisco, No Starch Press, Inc.
Thomas J. Bergin, Richard G. Gibson, (1996), History of Programming Languages II, Boston, Addison-Wesley.
Organization for Economic Co-operation and Development (2006), OECD Anti-spam Toolkit of Recommended Policies and Measures, Boston, OECD Publishing.
Carolyn Wyman, (1998), Spam: A Biography, New York, Harcourt Brace.
Alan Schwartz, S. Garfinkel, (1999), Stopping Spam, Florida, O’Reilly.
Steve H. Graham, (2007), The Good the Spam and the Ugly: Shooting It Out with Internet Bad Guys, New York, Citadel Press Inc.
Geoff Mulligan, (1999), Removing the Spam: Email Processing and Filtering, Michigan, Addison-Wesley.
David Pogue, (2005), Mac OS X: The Missing Manual, Tiger Edition, Florida, O’Reilly.
Marcia S. Smith, B. G. Kutais, (2007), Spam and Internet Privacy, Michigan, Nova Publishers.
Peter H. Gregory, Mike Simon, Michael A. Simon, (2005), Blocking Spam & Spyware for Dummies, New York, For Dummies.
Harold F. Tipton, Micki Krause, (2006), Information Security Management Handbook, Boston, CRC Press.
Dumais M. S. & Horvitz E., (2005), A Bayesian Approach to Filtering Junk E-mail, Texas, Citadel Inc.
Dwork C. & Naor M., (2002), Pricing through Processing Junk Mail, New York, Prentice Hall.
Goodman J., (2004), “Spam Technologies and Policies”, http://www.research.microsoft.com/~joshuago/spamtech.pdf, Retrieved on 2008-09-16.
Domingos P. & Pazzani M., (2002), Bayesian classifier, California, University Press.
Drucker H. D., (2006), Machines for spam categorization, New York, Addison-Wesley.