Spam Filtering at the Server
By Joshua Erdman
No little intro paragraph here. We all know what spam is and hate it.
Early on, we just had to install an application on the mail server that approved or denied each incoming message using various methods. These first applications were quite simplistic in nature:
Simple Spam Filtering
The first spam messages were very predictable; many sold amazing creams and pills promising penis enlargement or breast enhancement. These were easy to counter by scanning each incoming message for the phrase 'penis enlargement' or 'breast enhancement'.
Unfortunately, spammers were quick to adapt and started distorting the most common keywords. Letters were exchanged with similar-looking numbers, words were broken up with periods or spaces, or words were spelled slightly wrong. All of these methods left the messages readable but made them very difficult to filter with such a simple mechanism.
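To make the cat-and-mouse game concrete, here is a minimal Python sketch of a verbatim phrase filter and of a slightly smarter one that undoes digit swaps and inserted punctuation. The phrase list, the look-alike table, and both function names are assumptions made up for illustration, not taken from any real filter.

```python
import re

# Phrases named in the article; a real list would be far longer.
BLOCKED_PHRASES = ["penis enlargement", "breast enhancement"]

# Hypothetical digit-for-letter swaps a spammer might use.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s"})


def naive_filter(message):
    """Return True if a blocked phrase appears verbatim (case-insensitive)."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def normalized_filter(message):
    """Undo simple distortions, then look for the blocked phrases."""
    text = message.lower().translate(LOOKALIKES)
    # Keep letters only, so "b.r.e.a.s.t" and "b r e a s t" both collapse.
    letters = re.sub(r"[^a-z]", "", text)
    return any(phrase.replace(" ", "") in letters for phrase in BLOCKED_PHRASES)
```

The normalized version catches number swaps and inserted periods or spaces, but not deliberate misspellings, and collapsing all spacing can produce false positives across word boundaries. That is roughly why this approach ran out of steam.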
WhiteList/BlackList by E-mail Address or Domain
Again, in the infancy of spam, Gmail, Yahoo, and Hotmail did not exist, so e-mail addresses were not nearly as easy to come by. A spammer's e-mail address did not change often, so a mail administrator just had to compile a list of offending e-mail addresses and filter those at the server. Soon these lists were distributed, but the persistent spammers kept adapting. The next logical step was for a spammer to put their own mail server on the Internet.
Eventually spam became an epidemic, affecting employee productivity, bogging down servers and Internet connections, and intimidating new Internet users. Finally, programmers saw an opportunity to create more advanced spam filtering. These newer filtering methods are quite creative, and when used in conjunction with each other they make a very effective spam filter.
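A list lookup of this kind can be sketched in a few lines of Python. The function name, the return values, and the rule that the whitelist wins over the blacklist are all assumptions for illustration; an actual mail server will have its own conventions.

```python
def classify_sender(address, whitelist, blacklist):
    """Check a sender against address and domain lists.

    Both lists may hold full addresses or bare domains, in lowercase.
    Assumption: a whitelist match takes priority over a blacklist match.
    """
    address = address.strip().lower()
    domain = address.rpartition("@")[2]
    if address in whitelist or domain in whitelist:
        return "accept"
    if address in blacklist or domain in blacklist:
        return "reject"
    return "unknown"


# Example lists using reserved example domains.
whitelist = {"partner.example"}
blacklist = {"offers@bulk.example", "spamhaus-not.example"}
print(classify_sender("Offers@bulk.example", whitelist, blacklist))
```

Anything that comes back as "unknown" would be passed on to whatever other checks the server runs.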
RBL (Realtime Black List)
These lists vary by type and are maintained by organizations that track spam activities. List types include: Open Mail Relays, Known Spam Servers, Malicious Content Servers, Adult Content, or Dynamic IP Addresses. Read our article on Using RBLs for more information on how they work and who provides these services.
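For a rough idea of what a lookup involves: DNS-based lists are conventionally queried by reversing the octets of the connecting IP address and appending the list's zone name, and any address answer means the IP is listed. The Python sketch below assumes IPv4 only, and the zone `rbl.example.org` is a placeholder, not a real provider.

```python
import ipaddress
import socket


def rbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the list answers for this IP.

    Note: a failed lookup (including a network problem) is treated as
    "not listed" in this sketch.
    """
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# 192.0.2.99 is a documentation address; the zone is hypothetical.
print(rbl_query_name("192.0.2.99", "rbl.example.org"))
```

The mail server would call something like `is_listed` once per incoming connection, usually against several lists.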
Bayesian Filtering
An ever-evolving statistical probability database. All outgoing messages from your server are parsed and statistically analyzed, on the assumption that the type of content and the context in which each word is used reflect acceptable incoming e-mail.
The statistical database can be further modified and updated if users provide examples of spam messages for Bayesian analysis.
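As a rough illustration, the Python sketch below keeps word counts for acceptable mail (trained from outgoing messages) and for spam (trained from user-submitted examples), then scores a new message. It is a bare-bones naive Bayes with add-one smoothing that looks at single words only and ignores context; the class and method names are invented for this example.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Tiny word-count classifier; 'ham' means acceptable mail."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Add one message to the 'ham' or 'spam' statistics."""
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        # Vocabulary size, plus one slot for words never seen before.
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"])) + 1
        ham_total = sum(self.counts["ham"].values())
        spam_total = sum(self.counts["spam"].values())
        # Start from the smoothed ratio of spam to ham messages seen.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so very long messages cannot overflow exp().
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

In use, every outgoing message would be fed to `train(text, "ham")`, every user-reported message to `train(text, "spam")`, and incoming mail scoring above some threshold would be held or tagged.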
Sender Policy Framework
Like RBLs, this is another great use of DNS to reduce spam. This type of filtering is intended to prevent a mail server from delivering messages for a domain that the server is not authoritative for. For example, an AOL mail server has no business delivering messages that are marked FROM hotmail, yet this is exactly what spammers are doing. More specifically, it is the FROM field provided within the SMTP protocol that is checked. Read our article on SPF & Sender ID for more information.
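A heavily simplified Python sketch of the check follows. It assumes the sending domain's SPF record has already been fetched from DNS and is passed in as a string, and it only understands the `ip4`, `ip6`, and `all` mechanisms; `include`, `a`, `mx`, `redirect`, and macros are skipped. The function name and the sample record are illustrative only.

```python
import ipaddress

# Meaning of the qualifier character in front of a mechanism.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a simplified SPF record against the connecting server's IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default when no qualifier is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Any other mechanism is ignored in this sketch.
    return "neutral"


record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"
print(check_spf(record, "203.0.113.9"))
```

Here the domain has said that only two address ranges may send its mail, so a connection from anywhere else ends at `-all` and fails.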
Sender ID Filtering
Very much like Sender Policy Framework, Sender ID also compares the address in the FROM field with the server that is sending the message. However, it looks not at the FROM field in the SMTP protocol but at the one within the MIME content, which is generated by the e-mail client, not the delivering server. Read our article on SPF & Sender ID for more information.
Header Checking
Header checking looks for consistency within the header. As mentioned previously under SPF and Sender ID, a spammer could spoof the FROM address and use a server that is not authoritative for the specified domain. Header checking verifies that the addresses in the FROM fields are valid and compares the FROM address in the SMTP protocol with the FROM address specified in the MIME data.
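The comparison can be sketched in Python with the standard `email` package. The function name and the wording of the problem messages are assumptions for this example, and "valid" here only means the address contains an `@`.

```python
from email import message_from_string
from email.utils import parseaddr


def check_headers(envelope_from, raw_message):
    """Compare the SMTP envelope sender with the From header of a message.

    Returns a list of problems found; an empty list means consistent.
    """
    problems = []
    message = message_from_string(raw_message)
    header_addr = parseaddr(message.get("From", ""))[1]
    envelope_addr = parseaddr(envelope_from)[1]
    if "@" not in header_addr:
        problems.append("missing or malformed From header")
    if "@" not in envelope_addr:
        problems.append("malformed envelope sender")
    if not problems:
        header_domain = header_addr.rpartition("@")[2].lower()
        envelope_domain = envelope_addr.rpartition("@")[2].lower()
        if header_domain != envelope_domain:
            problems.append("envelope and header From domains differ")
    return problems


raw = "From: Alice <alice@example.com>\nSubject: Hi\n\nBody\n"
print(check_headers("bulk@mailer.example.net", raw))
```

Legitimate mail can trip this check too (mailing lists and bounce messages, for instance), so a mismatch is better treated as one signal among several than as a verdict on its own.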
Article last reviewed: 10/10/2006