Spam topics
Colin Fahey
1. Introduction
A "spam message", in the context of electronic communication, typically has the following definition:
spam message : an uninvited message sent to a large number of recipients
It is commonly believed that a spam message for a given product or service sent to thousands or millions of people will generate very few actual customers for the sender of the message. Even if the ratio of attracted customers to total spam message recipients is only 1/10000, or even much lower, the overhead cost of spamming is so low that a net profit is possible. In fact, for many of the spam products, selling a single product, or scamming a single victim, might be the break-even point for the business model.
Several factors contribute to the perception of the spam message phenomenon as an important social and technological problem:
(1) Spam messages waste millions of hours of humanity's time each day with the task of differentiating between spam messages and "legitimate" messages;
(2) Spam messages consume a significant fraction of total Internet bandwidth, which slows other traffic and possibly raises overall bandwidth cost;
(3) Spam messages consume a large amount of storage space on mail servers, sometimes actually making it temporarily impossible for "legitimate" messages to be received;
(4) Spam messages can be used for campaigns to attempt "identity theft" or other types of fraud. Spam messages can also be used to propagate computer viruses;
This document describes the flaws of many of the futile and harmful methods proposed or employed in past attempts to reduce the problems listed above.
This document offers an alternative solution to the problems of spam messages. The solution is simple and reliable, and it avoids censorship, elimination of anonymity, restrictions, payments, and centralized services.
2. The original "spam"
"Spam Classic is a conveniently packaged canned meat product made of 100 percent pure pork and ham. Spam Classic contains 180 calories per two-ounce serving. Spam luncheon meat first was produced in 1937. It was one of the first convenient, moderately priced and great tasting meat products on the market."
From the article:
Ultimately, we are trying to avoid the day when the consuming public asks, "Why would Hormel Foods name its product after junk e-mail?"
The use of the word "spam" to mean "making insane with relentless, monotonous bombardment" is directly attributed to a "Monty Python's Flying Circus" (humorous BBC television series) skit that celebrates Spam. A restaurant patron discovers, with chagrin, that everything on the menu contains Spam. For example, "[...] spam spam spam egg and spam; spam spam spam spam spam spam baked beans spam spam spam...". The mention of Spam rouses Viking restaurant patrons to begin singing: "Spam, spam, spam, spam. Lovely Spam, wonderful Spam!" The whole experience causes the frustrated restaurant patron to become insane.

"Spam" skit by "Monty Python's Flying Circus"
3. The reason for this document
I am very concerned by various "solutions" to the spam phenomenon that involve the following:
(1) Invasion of privacy;
(2) Censorship;
(3) Payment and cooperation with a commercial entity;
(4) Making certain types of Internet activity illegal;
The day I started writing this article (2004.03.29) I heard a report on the BBC World Service (rebroadcast on a local public radio station), featuring an interview with a person affiliated with a company that was offering a "new kind of spam filtering" on a paid membership basis.
The method relied on monitoring Internet traffic, searching for "identical" e-mail messages coming from a common source. Suspected spam e-mail is analyzed further to discover any links to Internet sites previously associated with spam efforts.
This service, and similar mechanisms, will fail due to various scenarios described in this document.
However, my concern when hearing proposals similar to the one mentioned in the news broadcast is that the public will embrace the proposed solution without fully considering the consequences, which might involve: invasion of privacy, censorship, corporate interests, or making certain kinds of Internet activity illegal.
Clients of various Internet services are subject to the contracts of the service providers. I have no complaint with that because I can choose to avoid service providers with terms I do not like. My concern is that the current conversations about solutions to the spam phenomenon will lead to a wide acceptance of terms that go against principles I consider important. I believe that a significant fraction of the people who would willingly accept such terms might not be so accepting if the impact of such terms on privacy, freedom of communication, and freedom from the influence of corporate interests were described in a way that makes the issues very relevant and personal.
4. Gallery of spam messages
This section presents contemporary examples of spam, with some analysis and related information. Although this section is based on spam I have personally received, I believe my experience is typical of users of e-mail.
This section is intended to sketch the basic principles of spam. An attempt at a formal definition of the term "spam" will be postponed until the next section. Presenting examples in this section will make subsequent formal discussion less abstract.
4.1 Spam messages which I have received
Over the past several months I have received an average of approximately 100 uninvited messages each day, and I generally receive several computer viruses as e-mail message attachments each day.
Earlier this year, from 2004.01.15 through 2004.02.08, a period of 25 days, I received 2872 spam messages, of which 207 were computer viruses. This corresponds to an average of approximately 115 spam messages each day, and an average of approximately 8 computer virus attachments each day.

A portion of my e-mail "Inbox" on 2004.03.29 as displayed by the "Microsoft Outlook Express 5" computer program. On this date I received 9 "legitimate" messages, 77 spam messages, and 2 computer virus attachments.
4.2 The sender name and the message subject of a spam message
One of the striking features of most spam messages is that the disingenuousness starts almost immediately with the alleged sender's name. The fact that almost every spam message has a fake sender name cheapens the whole concept of the sender name. Of course that is merely the beginning of the erosion of trust, but I nonetheless pause and consider the bizarre act of a spammer producing a fake sender name. Spam messages promoting "male sexual performance" drugs or pornography often have sender names that are female.
Interestingly, the subject associated with a spam message often really does contain an accurate summary of the spam message. But, as one can see in the small set of subject items above, some spammers believe that sensible descriptions of e-mail messages are not necessary.
Eventually both the sender name and subject line will be recognized by the public at large as totally meaningless claims associated with the messages, which is a reflection of the actual technical fact: these fields are totally unreliable for determining the origin and content of e-mail messages.
4.3 What is the spam all about?
The following table indicates the number of spam messages I received on three recent dates:
(1) 2004.03.29 : 77 spam messages total;
(2) 2004.03.30 : 98 spam messages total;
(3) 2004.03.31 : 121 spam messages total;
The following is an approximate classification of the spam messages I received on those three dates:
MEDICATION:
-------------------------------------------------------------------
PENIS-ENLARGEMENT:
Viagra, Cialis, NaturalGain,
"Weekend Pill", Viagra Patch: 18/77, 17/98, 16/121
ALTERNATIVE-SOURCE PRESCRIPTION
MEDICATIONS/PSYCHOTROPIC DRUGS:
Levitra, Phentermine, Vicodin,
Valium, Ambien, Xanax, Tramadol,
Lipitor, Propecia, Zocor: 14/77, 18/98, 19/121
Marijuana-like product/
Mood Enhancers/Herbal Meds: 1/77, 0/98, 0/121
DIET/NUTRITION:
Diet Pills/Patch: 3/77, 3/98, 3/121
Anti-Aging/HGH: 1/77, 0/98, 1/121
SMOKING:
Cigarettes: 1/77, 1/98, 3/121
HEALTH AID:
Snoring Control: 1/77, 0/98, 0/121
-------------------------------------------------------------------
TOTAL: 39/77(50%), 39/98(40%), 42/121(35%)
FINANCIAL:
-------------------------------------------------------------------
LOANS/CREDIT:
Refinance Mortgage/Equity Loan: 13/77, 12/98, 11/121
"Cancel Debt" (somehow): 0/77, 1/98, 8/121
Car Loans: 0/77, 2/98, 1/121
Payday Cash Advance: 1/77, 1/98, 0/121
Unsecured MasterCard/Credit: 1/77, 0/98, 1/121
INVESTING:
Investor/Stock Alert: 5/77, 5/98, 3/121
INSURANCE:
Life Insurance: 1/77, 1/98, 2/121
Healthcare: 1/77, 0/98, 0/121
Auto/Warranties: 1/77, 0/98, 0/121
BUSINESS OPPORTUNITIES:
"Work" on eBay: 1/77, 6/98, 4/121
Own Resort: 1/77, 0/98, 0/121
"Network Marketing": 0/77, 0/98, 1/121
Real-Estate Auctions: 0/77, 0/98, 1/121
GAMBLING:
Poker/"Earn Money Playing Lotto!": 0/77, 1/98, 2/121
SPAMMING:
Spam 27 million people: 0/77, 1/98, 0/121
-------------------------------------------------------------------
TOTAL: 25/77(32%), 30/98(31%), 34/121(28%)
SOFTWARE/CONTENT:
-------------------------------------------------------------------
PORNOGRAPHY:
Porn (farm sex, schoolgirls,
girls gushing, web cam,
monster cocks): 1/77, 1/98, 6/121
PARANOIA/SNOOPING:
Software to Learn about People: 1/77, 0/98, 0/121
Scan PC: 1/77, 0/98, 0/121
Keyboard Logger: 0/77, 1/98, 0/121
PIRACY:
Cheap software/OS: 2/77, 8/98, 5/121
DVD copying: 0/77, 2/98, 0/121
Cable Descrambling/
Free "Pay-Per-View"(!): 0/77, 2/98, 0/121
-------------------------------------------------------------------
TOTAL: 5/77(6%), 14/98(14%), 11/121(9%)
MALICIOUS/FRAUD:
-------------------------------------------------------------------
VIRUS:
Virus (Mail "Delivery Failed" type
with attachment): 2/77, 0/98, 1/121
IDENTITY THEFT:
Web-based "verification"
(PayPal,eBay,Fleet Bank): 2/77, 2/98, 0/121
-------------------------------------------------------------------
TOTAL: 4/77(5%), 2/98(2%), 1/121(1%)
MISCELLANEOUS:
-------------------------------------------------------------------
Unknown: 2/77, 6/98, 18/121
Blind date/dating: 0/77, 0/98, 5/121
Earn Degree/Degree without Tests: 0/77, 1/98, 3/121
"Colin, Grow 2 Cup Sizes -- FREE!",
Bigger Breast From Pill: 0/77, 1/98, 2/121
Vacation Deals: 1/77, 1/98, 0/121
Your Opinions might make you 1000: 0/77, 1/98, 1/121
Hair Transplants: 0/77, 1/98, 1/121
Misc. Deals: 1/77, 0/98, 0/121
Luxury Sheets: 0/77, 1/98, 0/121
Free Samsung Mobile Phone: 0/77, 1/98, 0/121
Hypnotic MP3 for Depression,
Self-Esteem, Motivation: 0/77, 0/98, 1/121
Wristwatches (Rolex,etc): 0/77, 0/98, 1/121
Print Own Postage: 0/77, 0/98, 1/121
-------------------------------------------------------------------
TOTAL: 4/77(5%), 13/98(13%), 33/121(27%)
SUMMARY:
-----------------------------------------------------------------------
MEDICATION TOTAL: 39/77( 50% ), 39/98( 40% ), 42/121( 35% )
FINANCIAL TOTAL: 25/77( 32% ), 30/98( 31% ), 34/121( 28% )
SOFTWARE/CONTENT TOTAL: 5/77( 6% ), 14/98( 14% ), 11/121( 9% )
MALICIOUS/FRAUD TOTAL: 4/77( 5% ), 2/98( 2% ), 1/121( 1% )
MISCELLANEOUS TOTAL: 4/77( 5% ), 13/98( 13% ), 33/121( 27% )
-----------------------------------------------------------------------
TOTAL: 77/77(100%*), 98/98(100%*), 121/121(100%*)
(*...Percentages in this table are rounded and do not
add to 100% with shown precision.)
Analysis
Medication is the most frequent topic of spam messages during this three-day sample. Two types of medication supply services dominate in this category of spam messages: (1) Penis enlargement; (2) General pharmacy "needs" (often drugs that are expensive in the domestic US market, and drugs which reputable doctors might be hesitant to prescribe due to lack of medical justification and potential for abuse). Spam messages promoting penis-enlarging drugs are typically very informal, using phrases like: "Haha, U Have A Real Small Pe-nis", "Is Your Me.mber too Teeny?", "Screw ur lover like never before", etc.
Financial topics were very common among the spam messages during this three-day sample. Home mortgage loans and refinancing offers dominate this category of spam messages. Investor "stock alerts" are also common. During this period, the "making a fortune on eBay" plan was significantly promoted. My personal favorite scam concept in this category arrived with the subject: "Earn Money Playing Lotto!"
Software and media content are popular spam topics. Offers of inexpensive software dominate this category; there is no doubt that this software is pirated, despite explanations of how, for example, one can buy Windows XP for $32 USD instead of paying $286 USD. Spam promoting pornographic web sites is also common in this category. My personal favorite offer is for a product that will give a person "Free [Pay-Per-View]" -- an oxymoron if one doesn't consider the fact that the product itself actually costs money. Another really interesting sub-category in spam regarding software products is software designed to address a person's paranoia -- such as software to scan a person's personal computer (PC) for "spyware", or software to spy on children and spouses using the family computer, or software to learn about public records on others (or oneself!). The irony is that installing such software will lead to the very things the target spam recipients fear most.
Of the miscellaneous topics of other spam messages, alleged "blind dates" are frequent, along with offers to earn various college degrees (often by only paying a small fee; no testing or qualifications necessary!). My personal favorite is an offer with the subject: "Colin, Grow 2 Cup Sizes -- FREE!". I don't think breast enlargement is a good idea for me!
4.4 Notable spam from the years 2001-2003
The following images are from spam messages I received during the years 2001-2003.

I received this spam message 10 days after the World Trade Center buildings were destroyed, on 2001.09.11, by fires after airplanes were intentionally crashed into the buildings by terrorists.
This spam message, offering, among other things, a bumper sticker that advocates a plan to "Nuke Afghanistan", demonstrates that spam can be very political. Following the US initiation of the war on Iraq in 2003, spam offering "Terrorist 'Most-Wanted'" playing cards, depicting 52 people targeted by the US anti-terrorism effort, arrived almost daily in my e-mail "Inbox" for many months.
It is important to be aware that some spam is motivated by social or political interests. Such spam benefits an idea or a social agenda, rather than an easily identified business or person.

This creepy spam message, like most spam messages, addresses some type of personal insecurity. This same product, in a funny coincidence, was also promoted elsewhere as a way to spy on naked women, which implies that this device is a method of violating personal security!

This spam message, which I received in the year 2002, makes an indirect reference to the film "The Matrix":
[Morpheus offers Neo a choice between two pills: a blue pill and a red pill.] "You take the blue pill, the story ends, you wake up in your bed and believe whatever you want to believe." "You take the red pill...and I show you how deep the rabbit hole goes."
Although in the film it is the red pill (and not the blue pill) that will result in being shown "how deep the rabbit hole goes", the humor of this spam message for Viagra is not diminished.
Compared to other penis-enlargement product spam messages of 2003 and 2004, this spam message is fine art!

This spam message, which I received some time in the year 2002, is the most outrageous invitation for irony I have ever seen.
The idea of promoting a sense of security by installing an application that is completely invisible and secretly records all instant messages, all chat, all e-mail, all web sites visited, etc, is perverse. Naming the software "IamBigBrother" (I am big brother) is hilarious!

The product which a person might receive after responding to this offer is likely to actually contain computer viruses rather than prevent them!
However, even more hilarious is the hypocrisy of a spam message that implies that the sender of the message wants to help the recipient of the message reduce unwanted e-mail messages!
4.5 Examples of spam from the year 2004
The following images are from spam messages I received during the year 2004.

This spam message is an invitation to start a career in sending spam messages! (Obviously, visiting the specified Internet site address could invite computer viruses, such as spyware or a trojan spam mailing program.)
I love the bravado of the author of this message! This message seems very "human" to me, with its boldness and its desperation. I resonate with the emotional content of the message, even though I am not interested in the idea of the message.

This spam message promotes a service to enable a person to send spam messages to "27 million people". The fact that I received this message is itself evidence (at least 1/27000000 of a full proof) that the sender of the message can do what is promised.

I like the domain name: "YetAnotherDomainName.com" (created 2004.01.29, and resolving to 216.177.88.181 at the time of this writing, 2004.04.03)
This example of a domain name reflects the spirit in the spammer realm, where purchasing disposable domain names to launch the next spam campaign is a small price to pay to avoid spam obstacles. Creating new domain names for each short-term spam campaign helps avoid "IP address blacklisting", or cancellation of service by the Internet hosting provider (who discovers, too late, that a host was rented for use in a spam effort).
Making Internet domain name allocations more difficult does not solve anything, and instead makes anonymity and free speech more difficult to maintain on the Internet. The solution to spam has nothing to do with restricting traffic that flows on the Internet, but instead has to do with detecting human senders and approved senders.

I like this one. Simply enroll in the program and start making money -- while doing "Absolutely Nothing!". Scams are based on greed, and this example is one of the purest appeals to greed that I have ever seen.

This spam message promotes a book which might not actually exist.
I dislike the items "How to get fake identity documents" and "How to hack into other [people's] computers remotely". However, I believe that there should be no restrictions on the possession and distribution of information that is not associated with individual persons.
I also believe that being able to do something privately or anonymously is an important part of human justice and progress. A democracy would be severely compromised if people could not vote in private, because only with privacy can a person vote entirely in accordance with the person's own beliefs. Similarly, only with an assurance of privacy can a person explore ideas without fear. Therefore, when technology enables governments and corporations to monitor the thoughts or actions of individual persons, I believe that humans are entirely justified in pursuing methods to avoid being monitored.

This spam message teases me with one of the central mysteries of spam: How could anyone buy medication (which enters and affects a person's own body) from people who think it is acceptable to use a fake sender name, and use a humiliating taunt as a subject, and make numerous spelling errors in the promotion, and conclude with a collection of random words?!?
However, I try to consider another perspective. Suppose an honest person doesn't believe that certain medications should be restricted by laws. What method other than spam messages could be used to access a market that is otherwise closed by the government? This line of reasoning makes me think that spam might be one of the ultimate examples of freedom.
4.6 Examples of attempted fraud and "identity theft" from the year 2004
The following images are from spam messages I received during the year 2004, showing examples of attempted fraud and "identity theft".
The basic idea is to convince the message recipient that it is necessary to gather personal information, often to "prevent an account from expiring", or for the recipient's "security" and "protection". This is pure irony, because providing the requested information would eliminate security and cause account trouble.

This spam message is admirable in its professional appearance and for its outrageous inclusion of phone numbers and Internet site addresses to help the victim gather information to be robbed more efficiently and completely.

The PayPal scam e-mail message in the previous image includes the JavaScript code shown in the image directly above. This JavaScript code repeatedly writes the text "http://www.paypal.com" into the browser status bar (lower-left border of Internet Explorer, for example). Thus, when the user hovers the mouse cursor over the critical links in this spam message, the actual link (which would be a hint that this is a scam) is quickly clobbered by the text "http://www.paypal.com". Only someone watching the status bar carefully while moving the mouse cursor would see the brief flash of the real Internet site address. Future browsers will probably eliminate this obvious kind of abuse.

This "identity theft" scam, masquerading as a message from Citibank, which I received on 2004.04.04, makes a direct request for a debit-card personal identification number (PIN), which can be used to withdraw cash.
This is so unprofessional that it is absolutely hilarious! However, preparing this message and Internet site did indeed require some skill and effort. So, I am confused. Why not try to spell words correctly in the message? Was this message secretly sent by banks, to their customers, to determine how gullible each customer might be? Maybe clicking on the Internet link automatically reduces a person's "credit score".

The "Citibank" scam spam message in the previous image has the HTML code shown directly above.

This eBay scam is not as elaborate as the PayPal scam above, but it probably looks sufficiently professional to be effective.
4.7 Examples of computer virus message attachments from the year 2004
The following images are from spam messages I received during the year 2004, showing examples of messages with computer virus message attachments. If I had a spare computer I would be tempted to download as many computer viruses as possible and have all the computer viruses fight for control of my computer's resources. "Ready...FIGHT!"
There was once a computer virus that included a popular anti-virus program as part of its code, so that it could eliminate competing viruses on the computer, and would thus be able to more efficiently do its job of sending spam messages! Hilarious! The fact that the virus includes the stolen anti-virus software seems to validate in some small way the effectiveness of anti-virus software, but, at the same time, the fact that the anti-virus software is used merely as a part of a virus is really perverse. (There must be examples of this strategy in biological organisms, such as bacteria that exude chemicals that we might generally regard as "antibacterial" with the result that there are no other bacteria competing for the available resources.)

I must say, this message with a computer virus attachment is a contemporary classic. I've never actually tried getting infected by the computer virus, to determine if it suited my lifestyle, but, hey, "500 000" people can't be wrong!

Wow! This computer virus message attachment has quite a background story!
4.8 Examples of simple obfuscation from the year 2004
The following images are from spam messages I received during the year 2004, showing examples of messages with simple obfuscation.

This is a trivial form of obfuscating the content of HTML, to thwart message filters based upon text analysis. The fake HTML tags divide the text that will ultimately appear in the HTML document, making it difficult to determine the text that will actually be seen on the computer screen. One countermeasure is to eliminate HTML tags, and another countermeasure is to somehow consider the visual effect of HTML tags before scanning for spam-indicating words. However, such countermeasures only solve one of the many basic ways that spam can defeat any attempts at automated filtering based on message text.
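The first countermeasure, eliminating HTML tags before scanning, can be sketched in C as follows. This is only a minimal illustration, not the method of any real filter: the function name strip_html_tags is hypothetical, and the code simply discards everything between '<' and '>' without interpreting comments, character entities, or the visual effect of tags.

```c
#include <stddef.h>

/* Copy `html` into `out`, omitting everything between '<' and '>'.
 * `out` must have room for strlen(html) + 1 bytes.
 * Returns the number of characters written (excluding the terminator). */
size_t strip_html_tags(const char *html, char *out)
{
    size_t written = 0;
    int inside_tag = 0;

    for (const char *p = html; *p != '\0'; p++) {
        if (*p == '<') {
            inside_tag = 1;              /* a tag (real or fake) begins */
        } else if (*p == '>' && inside_tag) {
            inside_tag = 0;              /* the tag ends */
        } else if (!inside_tag) {
            out[written++] = *p;         /* text the reader would see */
        }
    }
    out[written] = '\0';
    return written;
}
```

For example, given the input "Vi<xyz>ag<!-- q -->ra" (the tag name xyz is invented for illustration), the function writes "Viagra" to the output buffer, which a word scanner could then examine.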
4.9 Examples of Unicode character abuse from the year 2004
The following images are from spam messages I received during the year 2004, showing examples of Unicode character abuse.

"Unicode" characters allow the characters of major world languages to be encoded in files and data streams, such as HTML documents encoded with UTF-8. The spam message shown above, which I received in 2004.03, shows a conventional use of Unicode characters -- in this example, to represent letters of the Russian (Cyrillic) alphabet.

People who send spam messages have found another use for Unicode characters: displaying characters that look like English letters, but in fact are letters and symbols from other world languages. Thus, human readers of English have no trouble reading the text visually, but automated text scanners will fail to detect the presence of "spam-indicating" words.
One solution is to build up a table of how Unicode characters visually relate to English letter and number characters. But, given the large number of Unicode characters that are "visually compatible" with various English letter and number characters, this effort is likely to be impractical. Combine this with strategic misspellings and random interjection of punctuation, and the text filters are doomed to fail.
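The table-based approach can be sketched in C as follows. This is a minimal illustration under stated assumptions: the input is assumed to be already decoded from UTF-8 into Unicode code points, the names lookalike, fold_to_ascii, and fold_text are hypothetical, and the table holds only a handful of Cyrillic letters. A complete table would need a very large number of entries, which is exactly the practical difficulty.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: a Unicode code point and the ASCII character it resembles. */
struct lookalike {
    uint32_t code_point;
    char ascii;
};

/* A few Cyrillic letters that resemble Latin letters in most fonts. */
static const struct lookalike LOOKALIKES[] = {
    { 0x0410, 'A' }, { 0x0415, 'E' }, { 0x041E, 'O' }, { 0x0420, 'P' },
    { 0x0421, 'C' }, { 0x0425, 'X' },
    { 0x0430, 'a' }, { 0x0435, 'e' }, { 0x043E, 'o' }, { 0x0440, 'p' },
    { 0x0441, 'c' }, { 0x0443, 'y' }, { 0x0445, 'x' }, { 0x0456, 'i' },
};

/* Return `cp` itself if it is ASCII, its ASCII lookalike if it is in the
 * table, or '?' if the code point is unknown to the table. */
char fold_to_ascii(uint32_t cp)
{
    if (cp < 0x80) {
        return (char)cp;
    }
    for (size_t i = 0; i < sizeof LOOKALIKES / sizeof LOOKALIKES[0]; i++) {
        if (LOOKALIKES[i].code_point == cp) {
            return LOOKALIKES[i].ascii;
        }
    }
    return '?';
}

/* Fold `count` code points into the NUL-terminated ASCII string `out`,
 * which must have room for count + 1 bytes. */
void fold_text(const uint32_t *cps, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i] = fold_to_ascii(cps[i]);
    }
    out[count] = '\0';
}
```

With this table, the sequence 'V', U+0456, U+0430, 'g', 'r', U+0430 folds to "Viagra". Any lookalike character missing from the table becomes '?', so a sender only needs to find one unlisted character to defeat the scan.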
I suppose an isolationist American could block all e-mail containing Unicode, but even plain English characters can be used in creative ways that humans have no trouble reading but that create an intractable problem for text scanners. A filter which rejected ungrammatical text, or which rejected text with many misspellings, would likely block a large fraction of "legitimate" messages! Spelling and grammar have been going out of style ever since they were invented!
4.10 Examples of messages with text intended to thwart filters that are based on statistical text analysis, from the year 2004
The following images are from spam messages I received during the year 2004, showing examples of messages with text specifically intended to thwart filters that are based on statistical text analysis.

This spam message includes a paragraph from a formal text. This example includes text from a Travel Warning issued from the United States Department of State on 2004.03.23 : http://travel.state.gov/israel_warning.html (an Internet search for "curfew should remain indoors" revealed the source of the text).
The meaning of the added text is not as important as the fact that the text is: grammatical, potentially interesting or important to the recipient, and has enough words to greatly "outweigh" any spam indicators that might be detected elsewhere in the message.
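The "outweighing" effect can be illustrated with a small C sketch. It assumes a simplified statistical filter that assigns each word a spam probability and combines the words as if they were independent. The function name combined_spam_probability and the probabilities used in the example are invented for illustration, and real filters differ in how they select and combine words.

```c
#include <stddef.h>

/* Combine per-word spam probabilities under a naive independence assumption:
 *   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
 * Each probability is assumed to lie strictly between 0 and 1. */
double combined_spam_probability(const double *word_probs, size_t count)
{
    double spam_product = 1.0;   /* evidence that the message is spam */
    double ham_product = 1.0;    /* evidence that the message is legitimate */

    for (size_t i = 0; i < count; i++) {
        spam_product *= word_probs[i];
        ham_product *= 1.0 - word_probs[i];
    }
    return spam_product / (spam_product + ham_product);
}
```

With two strongly spam-indicating words (probability 0.99 each), the combined value is above 0.99. Appending ten innocuous words (probability 0.2 each, as might come from a pasted paragraph of formal text) drops the combined value below 0.05, so the same two spam words no longer trigger the filter.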
4.11 Examples of messages with base-64 encoding, from the year 2004
"Base-64 encoding" is a method of representing sequences of byte values by a sequence of ASCII characters within the following set of 64 ASCII characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
Thus, the ASCII character 'A' corresponds to an integer value 0 (zero), and the ASCII character '/' corresponds to an integer value 63 (111111 in binary). Groups of three bytes from an input sequence are regarded as a sequence of 24 bits. Four 6-bit values are extracted and converted to the corresponding characters in the set above. If the total number of input bytes is not a multiple of three, the final group is padded with bytes of value 0 (zero), and each output character that consists entirely of padding bits is replaced by an '=' character. Thus, a final group of two bytes yields three characters followed by '=', and a final group of one byte yields two characters followed by '=='.
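The encoding direction can be sketched in C as follows. This is a minimal illustration of the three-bytes-to-four-characters grouping, not production code. The function name base64_encode is hypothetical, and the caller is assumed to supply an output buffer that is large enough.

```c
#include <stddef.h>

static const char BASE64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode `length` bytes from `in` as base-64 text in `out`.
 * `out` must have room for 4 * ((length + 2) / 3) + 1 bytes.
 * Returns the number of characters written (excluding the terminator). */
size_t base64_encode(const unsigned char *in, size_t length, char *out)
{
    size_t written = 0;

    for (size_t i = 0; i < length; i += 3) {
        size_t remaining = length - i;   /* input bytes left, at least 1 */

        /* Gather up to three bytes into 24 bits; missing bytes stay zero. */
        unsigned long bits = (unsigned long)in[i] << 16;
        if (remaining > 1) bits |= (unsigned long)in[i + 1] << 8;
        if (remaining > 2) bits |= (unsigned long)in[i + 2];

        /* Emit four 6-bit values; '=' replaces values made only of padding. */
        out[written++] = BASE64_TABLE[(bits >> 18) & 0x3f];
        out[written++] = BASE64_TABLE[(bits >> 12) & 0x3f];
        out[written++] = (remaining > 1) ? BASE64_TABLE[(bits >> 6) & 0x3f] : '=';
        out[written++] = (remaining > 2) ? BASE64_TABLE[bits & 0x3f] : '=';
    }
    out[written] = '\0';
    return written;
}
```

For example, the three bytes "Man" encode as "TWFu", the two bytes "Ma" encode as "TWE=", and the single byte "M" encodes as "TQ==".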
Base-64 encoding is typically used to enable binary data, such as binary file message attachments (with file types ZIP, JPG, MP3, DOC, EXE, etc), to be contained in the plain-text body of an ordinary e-mail. Thus, text-based operations can be conducted on mail archives without worrying about encountering non-ASCII characters, or problematic ASCII "control characters" such as 0 (Null, NUL, 0x00, ^@), and 4 (End of Transmission, EOT, 0x04, ^D).
However, people who send spam messages have used base-64 encoding as a simple method to obfuscate their HTML content. Thus, very simple text filters, or human readers, cannot easily examine the content of such spam messages. It would be simple to add a base-64 decoding stage to a spam filter so that the filter could analyze messages with base-64 encoding, but this is yet another example of the unlimited complexity of automated spam detection. Filtering spam using message analysis is futile.
The following C code compiles to a very simple base64-to-text conversion program. A person must manually place a base-64 block of text, by itself, in a text file, and then use this utility to generate text output. The output can be directed to an output file by operators on the command line. I wrote this code as a simple demonstration.
// Convert base-64 to plain text (Usage: base64decoder.exe [file name])
//
// The specified file must only contain a block of base-64 data
// and optional whitespace (space, carriage return, newline, tab).
#include <stdio.h>  // printf(), putchar(), fopen(), fseek(), ftell(), fread(), fclose()
#include <stdlib.h> // malloc(), free()
int main ( int argc, char * argv[] )
{
    const char * base64Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    if (argc!=2){printf("USAGE: %s [filename]\n",argv[0]); return(-1);}
    FILE * fp = fopen( argv[1], "rb" );
    if (NULL==fp){printf("ERROR: Failed to open:%s\n",argv[1]);return(-2);}
    // Determine the file size by seeking to the end of the file.
    fseek( fp, 0, SEEK_END );
    int fileSizeInBytes = (int)( ftell( fp ) );
    fseek( fp, 0, SEEK_SET );
    if (fileSizeInBytes <= 0)
        {printf("ERROR: Seek failed in:%s\n",argv[1]);fclose(fp);return(-3);}
    char * fileData = (char *) malloc( (size_t)(fileSizeInBytes) );
    if (NULL == fileData)
    {
        printf( "ERROR: Allocate %d bytes failed.\n", fileSizeInBytes );
        fclose( fp ); return(-4);
    }
    // Only process the bytes that were actually read.
    int bytesRead = (int) fread( fileData, 1, (size_t)(fileSizeInBytes), fp );
    fclose( fp );
    int count = 0;   // Number of 6-bit values collected so far (0..4)
    int padding = 0; // Number of '=' characters in the current group
    int indices[ 4 ];
    char out[ 3 ];
    for (int dataIndex = 0; dataIndex < bytesRead; dataIndex++)
    {
        char in = fileData[ dataIndex ];
        // Look up the 6-bit value of this character; whitespace is not found.
        int found = (-1);
        for ( int trial = 0; ((trial < 64) && ((-1)==found)); trial++ )
            { if (base64Table[trial] == in) found=trial; }
        // '=' is padding: it contributes zero bits, not output data.
        if ('=' == in ) { indices[count] = 0; count++; padding++; }
        if ((-1) != found ) { indices[count] = found; count++; }
        if (4 == count)
        {
            // Repack four 6-bit values as three 8-bit bytes.
            out[0] = (char)(((indices[0]<<2) | (indices[1]>>4)) & 0xff);
            out[1] = (char)(((indices[1]<<4) | (indices[2]>>2)) & 0xff);
            out[2] = (char)(((indices[2]<<6) | (indices[3]   )) & 0xff);
            // Do not print the bytes that exist only because of padding.
            for (int k = 0; k < (3 - padding); k++) { putchar( out[k] ); }
            count = 0;
            padding = 0;
        }
    }
    free( fileData );
    return( 0 );
}
Download a project for Microsoft Visual C++ 2005 that includes the source code:
base64decoder.zip
C code for base-64 decoder, for Microsoft Visual C++ 2005
78138 bytes
MD5: 8401dde35a54fbaa1b7db0a6e2c3147f
The following is a C# code version of the C program.
// Convert base-64 to plain text (Usage: base64decodercs.exe [file name])
//
// The specified file must only contain a block of base-64 data
// and optional whitespace (space, carriage return, newline, tab).
namespace base64decodercs
{
    class Program
    {
        static void Main( string[] args )
        {
            if (args.Length != 1)
            { System.Console.WriteLine( "Specify file name" ); return; }
            try
            {
                System.String fileText = System.IO.File.ReadAllText( args[0] );
                byte[] b = System.Convert.FromBase64String( fileText );
                System.String outputText = System.Text.Encoding.ASCII.GetString( b );
                System.Console.WriteLine( outputText );
            }
            catch (System.Exception exception)
            {
                System.Console.WriteLine( exception.ToString() );
            }
        }
    }
}
Download a project for Microsoft Visual C# 2005 that includes the source code:
base64decodercs.zip
C# code for base-64 decoder, for Microsoft Visual C# 2005
6697 bytes
MD5: e3c424906bf95c7f6b28e23b5fc4e324
The following images are from spam messages I received during the year 2004, showing examples of messages with base-64 encoding.

This is the plain-text appearance of a spam message containing HTML encoded as base-64.
The following shows the decoded version of the base-64 data.

This is the decoded version of the base-64 data that was contained in a spam message.
The decoded base-64 data reveals the web site address promoted by the spam message. The decoding also reveals random words designed to "dazzle" Bayesian spam filters. It is important to notice that this spam filter countermeasure was placed within a base-64 encoded block -- which means that the person who sent the spam message assumes that base-64 blocks might be decoded and analyzed.
However, the most important thing to observe about this spam example is that the spam message is totally contained in the image specified by the HTML tag. Thus, unless a content-based filter notices the "tabs" and "pills" parts of the Internet site address and file paths, this message appears totally benign to such a filter. It is trivial to eliminate any evidence that the message is spam. A person could block messages that only contain HTML image tags, but the risk of blocking "legitimate" messages that only contain images is too significant.
Also, if a client program downloads the image from the server, then the person who sent the spam message knows that a specific recipient of the message exists and has actually viewed the message, at a known time, from a known IP address (and, therefore, an approximate geographical location). This information is revealing, even though the recipient of the message did nothing more than look at the message. More secure message programs have options to disable previewing of images contained in messages, thus avoiding giving away information to other people.
4.12 Examples of determining the origin of a spam message, from the year 2004
In some situations it might be possible to determine the origin of a spam message. However, the origin of a spam message might not be useful information. For example, some spam messages are sent by computer viruses located on thousands of random infected machines around the world. In such a situation, a spam message might have originated from a computer whose owner is totally unaware that the computer is involved in sending spam messages.
Also, the method of determining the origin of a spam message described in this section is only useful when the spam message includes an Internet address that is actually associated with the person or company that sent the message. In some situations, the spam message might not be authorized by the owners of any of the Internet addresses mentioned in the message. In other situations, the spam message might be made to appear to be sent on behalf of a particular person or company, but was in fact sent by an unaffiliated party -- sometimes with the intention of making the apparent sender seem unethical, or sometimes with the intention of distorting the opinions and goals of the apparent sender. There are many possible reasons why the Internet addresses appearing in a spam message might not have any relation to the actual sender of the spam message.
The method of determining the origin of a spam message described here is not likely to be reliable. However, the method described here is very easy to do, and the resulting information might be useful.
As an example, consider the following spam message, which I received in the year 2004.

Example spam message, with HTML code having links to "buye-soft.biz"
Looking at the HTML code for this spam message reveals links to the Internet domain "buye-soft.biz".
A person can learn about a registered Internet domain, such as "buye-soft.biz", by doing a "domain name registration" query. This was historically called a "whois" query. The InterNIC Internet site is one of many sites offering a "whois" query service:
http://www.internic.net/whois.html
The following image shows the results of a "whois" query for information about the domain "buye-soft.biz", performed by the InterNIC Internet site.

Results of a "whois" query for information about the domain "buye-soft.biz", performed by the InterNIC Internet site.
I assume all of the information returned by this "whois" query is bogus -- except for the trivial details: domain name, domain ID, domain status, registrant ID, name server, created by registrar, last updated by registrar, and the dates.
The only information that I think is interesting is the "domain registration date", which in this case indicates that the domain was created less than one week prior to my receipt of the spam message that contained links to an Internet site within this domain.
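The comparison between the domain registration date and the date a message was received can be sketched in C. This is only an illustration, and the function names are hypothetical. The dates passed in would be read manually from the "whois" result and from the message headers; no actual "whois" query is performed here.

```c
#include <assert.h>
#include <time.h>

/* Return the number of days from the first date to the second date.
   Both dates are interpreted at local noon, so that a one-hour
   daylight-saving shift cannot change the day count. */
int daysBetween( int y1, int m1, int d1, int y2, int m2, int d2 )
{
    struct tm a = { 0 };
    struct tm b = { 0 };
    a.tm_year = y1 - 1900; a.tm_mon = m1 - 1; a.tm_mday = d1;
    a.tm_hour = 12;        a.tm_isdst = -1;
    b.tm_year = y2 - 1900; b.tm_mon = m2 - 1; b.tm_mday = d2;
    b.tm_hour = 12;        b.tm_isdst = -1;
    time_t ta = mktime( &a );
    time_t tb = mktime( &b );
    assert( (ta != (time_t)(-1)) && (tb != (time_t)(-1)) );
    double days = difftime( tb, ta ) / 86400.0;
    /* Round to the nearest whole day. */
    return (int)( (days >= 0.0) ? (days + 0.5) : (days - 0.5) );
}

/* Return 1 if the domain was registered fewer than thresholdDays days
   before the message was received (and not after it), else return 0. */
int isRecentlyRegistered( int ageInDays, int thresholdDays )
{
    return ( (ageInDays >= 0) && (ageInDays < thresholdDays) ) ? 1 : 0;
}
```

For example, a domain created on 2004-03-01 and mentioned in a message received on 2004-03-06 (made-up dates) has an age of 5 days, which is within a threshold of 7 days.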
This is a common pattern: (1) Register a new Internet domain name ($5 USD); (2) Start a web server, and have the new domain information refer to the new server ($10 USD); (3) Wait 48 hours to ensure that the new domain information has enough time to propagate to various domain name service (DNS) servers around the world; (4) Send thousands or millions of spam messages, such that each message has links to the new domain name.
Thus, for a very small price, a person can establish a new Internet domain name, and can start a web server, and can send thousands or millions of spam messages -- all for an isolated attempt to spread a message, or to collect money, or to collect information, etc. The person does not need to worry about the future of the domain name or of the web server. Simply spreading the spam message might already, by itself, easily justify the small cost. However, even if the spam campaign relies on the web server remaining active for at least several days -- so that money or information can be collected from people -- there is only a small chance of being stopped by complaints to the web site hosting provider or being stopped by law enforcers. If the web server is stopped, then the person who sent the spam message can simply pay for more domain names and more server hosting contracts. In many cases, if only a single person in the whole world pays money to the person who sent the spam message, then the whole cost of the spam process might be worthwhile!
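The break-even reasoning can be written as simple arithmetic. The following C sketch uses the $5 and $10 costs mentioned above; the revenue per customer is an assumed number chosen only for illustration, and the function names are hypothetical.

```c
#include <assert.h>

/* Return the smallest number of paying customers whose combined
   payments cover the cost of the campaign (integer ceiling). */
int breakEvenCustomers( int costInCents, int revenuePerCustomerInCents )
{
    assert( costInCents >= 0 );
    assert( revenuePerCustomerInCents > 0 );
    return ( costInCents + revenuePerCustomerInCents - 1 )
           / revenuePerCustomerInCents;
}

/* Return the expected number of customers when one customer is
   attracted per recipientsPerCustomer messages sent. */
long expectedCustomers( long messagesSent, long recipientsPerCustomer )
{
    assert( messagesSent >= 0 );
    assert( recipientsPerCustomer > 0 );
    return messagesSent / recipientsPerCustomer;
}
```

With a total cost of 1500 cents ($5 + $10) and an assumed payment of $20 per customer, breakEvenCustomers( 1500, 2000 ) is 1. With one customer per 10000 recipients, expectedCustomers( 1000000, 10000 ) is 100.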
The obvious responses to this abuse of the Internet are to attempt to make the process of registering domain names more difficult, and to attempt to make the process of buying web hosting contracts more difficult. However, those responses would be futile, and would hurt more people than they would help.
5. Definition of "uninvited message"
Consider the following simple definition of a "spam message":
spam message : an uninvited message sent to a large number of recipients
That definition depends on the definition of an "uninvited message".
Messages from family, friends, and acquaintances, are implicitly "invited".
If a person broadcasts a message inviting feedback from the public, such as inviting people to add messages to an Internet forum (e.g., a blog), then there is an explicit invitation for messages from the public. However, typically there is also an implicit expectation that the messages will not be advertisements, and will not be extremely irrelevant, and will not interfere with the ability of other people to enjoy using the Internet forum (i.e., the messages will not be enormous, and will not contain computer viruses, and will not be disgusting to the sensibilities of the forum community, and will not be inflammatory or hateful). Some Internet forums explicitly specify the expectations or rules of using the forum.
If a person wishes to use an Internet service (such as being allowed to submit messages to an Internet discussion forum, etc.), the service provider often requires the person to submit personal information, such as the person's e-mail address. The process of submitting personal information to get access to a service is often called "registration". Sometimes the service provider will verify the validity of a person's e-mail address by sending a message to the specified e-mail address and requiring an indication that the person received the message (by clicking a link with unique characteristics within the message, or by submitting unique information contained within the message).
If an Internet service provider requires a person to submit a personal e-mail address as a requirement of using a service, then the service provider might eventually send messages to the person. Some service providers clearly indicate how they will use any information that the service provider collects from a person. Sometimes the service provider will allow the person to choose whether or not the service provider is permitted to send messages to the person. However, in some cases, the service provider's description of how they will use personal information is vague or ambiguous. Also, unfortunately, sending e-mail "notifications" and advertisements (or "offers") from "partners and affiliates" is often part of online service contracts. Therefore, some people create temporary e-mail accounts specifically for the purpose of registering for Internet services, and thus avoid any abuse of the trust between the person and the service provider.
Abuse of the interpretation of "opting in"
During the years 2000-2002, many spam messages contained text similar to the following disclaimer:
"This message is not spam. You are receiving this message because you requested this message from this service, or opted-in to mailings from one of our affiliates."
It is not difficult to imagine that a giant corporation might have a slightly less monitored affiliate with slightly lower standards of business ethics. And it is easy to imagine that through the corporate equivalent of "Six Degrees of Separation" (i.e., the theory that any person on the planet is connected to any other person on the planet by way of, at most, six personal relationships) that eventually personal data submitted to, say, any giant corporation might actually, through a chain of "affiliation", be accessible to any arbitrary business, or outright criminals, on our planet.
6. Spam messages in other media types
It is interesting to consider that billboards on the sides of freeways, buses, and taxi cabs, might qualify as a kind of government approved and socially approved visual spam. However, I believe that if a proposition to eliminate all billboards were placed on a state ballot, the overwhelming majority of people would vote in favor of the proposition. The fact that billboards clutter the visual spaces in many cities proves that there is an ideological gap between the local governments and their constituents.
Billboards radiate data via photons in all directions without regard for the wishes of potential recipients. Audio loudspeakers radiate data via sound waves in many directions without regard for the wishes of potential recipients. Postal mail bulk advertisements can be regarded as a physical form of spam messages. Some spam messages have been transmitted to facsimile machines. Automatic telephone dialers with recorded messages have been used to send audio spam messages directly into individual homes.
Some of these "spam message" variations rely on the proximity of potential recipients, and thus the "technology" to avoid such spam messages is simply to move away from the emitter. But other variations of spam messages essentially bring the message very close to the target recipients, such as spam messages sent by postal mail or by a telemarketing telephone call. Here, the "technology" to avoid being distracted with the task of differentiating between spam messages and desired messages is limited to "requests to block bulk postal mail" and registering with the "national 'do-not-call' list" (which relies on vigilant consumers, and laws to act as a deterrent for would-be violators).
7. Definition of "message"
Obviously, when defining "message" for the purposes of defining "spam message", the definition of "message" must be based upon the idea of the "intention" of the data received by each recipient. Otherwise, simply permuting the sentences of a message might be regarded as producing a new, distinct "message".
Almost all spam messages today that rely on plain text to convey the message (instead of using images to convey the message) contain procedurally generated text that is unique per recipient.
Some of this procedural text is comprised of words randomly selected from a dictionary (to defeat word-frequency filtering, or word-pair Bayesian filtering), or is comprised of random grammatical sentences (to defeat filters that check for basic grammar). Some of this procedural text is comprised of paragraphs of text selected randomly from various sources, including from Internet news sites, and reference articles (e.g., from Wikipedia), and classic texts (e.g., from Project Gutenberg or the Bible); i.e., text which will outweigh any "spam indicators" found elsewhere in the spam message.
Also, the core text of a spam message (i.e., the intended communication of the spam message), can be procedurally modified to be unique per intended recipient. This can occur at the character level, and word level, and sentence level, and paragraph level. Misspellings can be introduced, especially using look-alike characters (e.g., '0' versus 'O'; or using the many similar-looking characters in the Unicode character set). Transposing adjacent letters within a word ("Viagra" versus "Vaigra") will not interfere with human understanding, but will increase the work required to detect "spam indications". Sentences can be permuted to further complicate any attempt to recognize similar messages.
Obviously, defining "message" as a literal sequence of characters (or bytes) cannot be used to define "spam message" because every single spam transmission intended for each individual recipient can contain unique text, despite the fact that all of the spam transmissions are intended to convey the same "message" or "idea".
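As a minimal sketch of why a literal comparison fails, the following C functions apply two of the modifications described above (transposing adjacent letters, and substituting the look-alike character '0' for 'O') and then check for a keyword using a literal search. The function names are hypothetical, and are chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Swap the characters at positions index and index+1
   (e.g., "Viagra" with index 1 becomes "Vaigra"). */
void transposeAdjacent( char * word, size_t index )
{
    assert( word != NULL );
    assert( index + 1 < strlen( word ) );
    char temporary    = word[ index ];
    word[ index ]     = word[ index + 1 ];
    word[ index + 1 ] = temporary;
}

/* Replace each letter 'O' with the look-alike digit '0'.
   Return the number of replacements made. */
int substituteLookAlike( char * text )
{
    assert( text != NULL );
    int replaced = 0;
    for ( size_t i = 0; text[ i ] != '\0'; i++ )
    {
        if ( 'O' == text[ i ] ) { text[ i ] = '0'; replaced++; }
    }
    return replaced;
}

/* Return 1 if the message contains the exact keyword, else return 0. */
int containsLiteral( const char * message, const char * keyword )
{
    assert( (message != NULL) && (keyword != NULL) );
    return ( strstr( message, keyword ) != NULL ) ? 1 : 0;
}
```

A literal search for "Viagra" succeeds on "Buy Viagra now" but fails on "Buy Vaigra now", even though a human reader understands both messages the same way.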
8. Methods which fail to significantly reduce the amount of spam messages
8.1 Laws
The creation of laws to indirectly cause the reduction of the amount of spam messages is probably based on the following assumptions:
(1) The existence of tough laws against sending spam messages will serve as a sufficient deterrent for potential senders of spam messages;
(2) The person responsible for sending the spam messages can be identified;
Reasons why laws cannot significantly reduce spam messages include:
(1) Spam messages can originate in countries which do not have laws against spam messages. Or, spam messages can originate in countries which do not have sufficient resources to enforce laws against spam messages.
(2) Spam messages can originate from any of the billions of people on our planet with access to the Internet. Although laws might be a sufficient psychological deterrent for the vast majority of the world population, only a few courageous people are required to generate billions of spam messages per day.
(3) The connection between businesses and spam message campaigns will be increasingly difficult to make, especially if there are a few cases in which competitors or hackers seek to implicate a company as a sender of spam messages (by secretly initiating a spam message campaign on that company's "behalf"). Such a scenario, among others, would introduce doubt regarding the simplistic argument that the person or company which benefits from a spam message must have sent the spam message. There is, rightly, a large amount of plausible deniability in this context.
In particular, laws cannot prevent spam message campaigns started spontaneously by a person on his or her own personal initiative on behalf of a political agenda, or on behalf of a publicly traded stock, or on behalf of a social agenda, etc. The person can initiate the spam message campaign totally anonymously, possibly by distributing computer viruses which will eventually transmit spam messages. A person with a modest amount of programming ability can compromise an e-mail system and use it to spread a message that was not endorsed by the organization that might ultimately benefit. A person who initiates such a spam message campaign can promote an agenda, which, if successful, will somehow indirectly benefit the person who initiated the spam message campaign. For example, if the spam message campaign promotes a publicly traded stock in which the person who initiated the campaign (while remaining totally anonymous) is invested, then the person will benefit from any increase in the stock price, as will thousands of other unrelated people invested in the same stock. Such spam message campaigns achieve a kind of "advocacy laundering", relying on the large number of potential advocates and the large number of potential beneficiaries.
(4) Although this is more of an observation than an explanation, it is interesting to consider that many spam messages are far more illegal than simply wasting people's time with unwanted advertisements. Spam messages can be used to distribute viruses (for mere destruction, or for surveillance and spying, or for propagating more spam messages, or for using computing resources to solve difficult computing problems). Other spam messages are used as part of an "identity theft" campaign (such that the spam message directs a person to a fake clone of a banking Internet site or commerce site (e.g., eBay) which requests and collects confidential information, often, ironically, with the claim of "increasing security" through "verification"). The people who send such spam messages are already aware that they are disobeying laws! Indeed, such people hope to commit crimes which are far worse than the crime of sending spam messages. Therefore, laws against sending spam messages will not serve as a deterrent to such people.
8.2 IP address blacklists, or sender e-mail address blacklists
The following describes the idea of an Internet e-mail blacklist service:
An Internet e-mail blacklist service is a service that manages a list of IP addresses of servers or Internet domains from which alleged spam messages have been sent recently. The service offers the list for anyone to download at any time. A person can download this list from the service daily. When a person receives an e-mail message, the person can check the IP address from which the message apparently originated, and if the IP address is one of the IP addresses in the list, then the message is classified as a possible spam message. Also, if a person receives a message and decides that the message qualifies as a spam message, then the person can submit the IP address to the blacklist service, so that the IP address can be added to the current list.
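The basic mechanism of such a list can be sketched in C as follows. This is only a toy, in-memory illustration of the idea, not the design of any actual blacklist service; the type name, function names, and capacity limits are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLACKLIST_CAPACITY 256
#define ADDRESS_LENGTH      46   /* room for a textual IPv4 or IPv6 address */

typedef struct
{
    char   addresses[ BLACKLIST_CAPACITY ][ ADDRESS_LENGTH ];
    size_t count;
} Blacklist;

/* Return 1 if the address is in the list, else return 0. */
int blacklistContains( const Blacklist * list, const char * address )
{
    assert( (list != NULL) && (address != NULL) );
    for ( size_t i = 0; i < list->count; i++ )
    {
        if ( 0 == strcmp( list->addresses[ i ], address ) ) return 1;
    }
    return 0;
}

/* Add a submitted address to the list.
   Return 1 if the address was added; return 0 if the list is full,
   the address is too long, or the address is already listed. */
int blacklistAdd( Blacklist * list, const char * address )
{
    assert( (list != NULL) && (address != NULL) );
    if ( list->count >= BLACKLIST_CAPACITY )   return 0;
    if ( strlen( address ) >= ADDRESS_LENGTH ) return 0;
    if ( blacklistContains( list, address ) )  return 0;
    strcpy( list->addresses[ list->count ], address );
    list->count++;
    return 1;
}
```

A message whose apparent origin satisfies blacklistContains would be classified as a possible spam message. Notice that blacklistAdd accepts any address that any caller submits; nothing in this sketch verifies that the address actually sent a spam message.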
There are many reasons why any attempt to reduce the number of spam messages using an Internet e-mail blacklist service will fail, and will instead cause many new, extremely bad problems:
(1) Increasingly, spam messages for a specific spam campaign are not sent by a single Internet server (with a readily identified IP address), but are instead sent by thousands or millions of random personal computers (PC) infected with computer viruses. The computer viruses are controlled and coordinated to send spam messages for a specific spam message campaign. In such a situation, a blacklist service cannot possibly do anything good. A blacklist which included the Internet domains or the dynamically allocated IP addresses of individual people using cable modems at home or at the office would cause widespread, and seemingly random, communication difficulties.
A computer virus which sends spam messages can be embedded into any computer program. Millions of people, and perhaps even billions of people, have computer software which has been downloaded from various Internet sites. There are thousands of sources for each of the many popular software files on the Internet, and determining the reputation of a site on the Internet is often difficult. Even Internet sites with good reputations might unknowingly allow some software with computer viruses to be offered for download for some time. Any of the thousands of popular computer programs offered on the Internet can contain computer viruses that will be used to send spam messages from any infected personal computer. Also, some people download software that would ordinarily cost money but which has been modified by computer hackers to enable people to use the software without paying -- and, obviously, the modified software can contain a mechanism to send spam messages. In all of the situations mentioned here, the computer viruses did not need to break through any security defenses that a person might have on their personal computer; instead, the computer viruses are contained in software that was willingly invited into the computer (and such computer viruses are therefore called "Trojan" viruses). Relying on "anti-virus" software to detect and eliminate such computer viruses is futile, because new computer viruses are created every hour, and even if an "anti-virus" program consulted an Internet database of viruses every hour, there would still be some time for infections to occur, and viruses could therefore have some opportunity to send thousands or millions of spam messages.
(2) The previous item (i.e., (1)) makes sending spam messages from a single Internet server (with a readily identified IP address) obsolete. But even attempts to blacklist individual Internet servers which have been observed sending spam messages are futile and dangerous. By the time a blacklist is updated to include each new origin of spam messages, the spam campaign will already have finished, and the IP address will be discarded. Meanwhile, an innocent person can soon inherit the discarded IP address, and it would thus be unfair to continue to include the IP address in the blacklist. Although there might be a mechanism to protest the inclusion of an IP address in a blacklist, this might be impractical if there are many independent blacklists around the world, each with its own complaint resolution process. (Nobody would be able to trust a single, centralized blacklist! The temptation to accept bribes to temporarily blacklist domains would be huge! News stories could be suppressed, and hence stock prices or government votes could be influenced, by brief blacklist campaigns.) Domain name registration can cost as little as $10, and renting the use of an Internet web server in a data center can be very inexpensive and can be without any significant commitment. Do a "whois" lookup on the domain name associated with a spam message (if any such links exist in the body of the message), and you might discover that the domain was registered only a few days prior to your receipt of the spam message -- a delay only long enough for the domain name registration to propagate to DNS servers around the world. Even if the source of a spam message is added to a blacklist on the same day as a spam message campaign, the spam message has already reached most of the intended recipients. Even if the messaging mechanism relies on the messages being stored on the sender side of the communication, there will still be some time during which people can access the spam message.
(3) Malicious people can cause the blacklist to include arbitrary Internet addresses, and can therefore cause arbitrary, innocent Internet sites or services to be blocked. By studying how entries are added to a blacklist, a malicious person can invent a mechanism to add arbitrary entries. A blacklist is a very dangerous idea, because it has the power to block things, without any chance for appeal or review, and often without any evidence that anything was blocked. Power is always valuable, and there will always be people who would be willing to pay money or trade favors to access power. There are many people who would benefit financially or politically by being able to block the flow of information, even if the blocking occurs only for several minutes or hours, because after stocks have been traded, and votes cast, the benefit of blocking has already been realized. Even computer hackers who only want a thrill would attempt to add anything and everything to any blacklist they could find!
An automated blacklist system which attempted to determine the beneficiary of a spam message (by identifying all Internet addresses mentioned in a spam message; e.g., by each link URL in the HTML) could be exploited by malicious people by simply sending many spam messages such that each spam message contains links to innocent Internet addresses. Thus, a malicious person can include links to reputable Internet sites within a spam message, even though the spam message is not actually affiliated with any of the mentioned reputable Internet sites.
(4) Creating a blacklist creates the possibility that legitimate messages will be blocked. If a blacklist contains invalid entries, or if the entries in a blacklist lack precision, then the effect might be a broad interference with overall communication or a reduction of freedom to communicate. This would be a disaster scenario.
Some countries (e.g., China) use Internet blacklists to prevent their citizens from learning the truth about history, and to prevent their citizens from discussing certain ideas.
Some corporate Internet web sites refuse traffic referred by other competing sites.
Internet search services, which many people rely on as an "unbiased representation of Internet content", must actually provide a biased view of the whole Internet -- a bias that prioritizes information in accordance with the specific search being performed. However, such search services might prioritize search results according to more than mere relevance; for example, a person can pay the search service to increase the priority of specific search results. Also, a search service might be compelled, by domestic and international laws, to block certain results from appearing among the search results.
Blacklists hurt democracy, because a democracy depends on being able to gather or receive information without any bias in the mechanism of gathering or receiving. When major "news" sources fail to deliver information without bias, the