Base64 and emails
#1
About a day ago, someone asked the question "why are emails encoded into base64, they can easily be deciphered?"

I've always known that base64 was the default encoding for emails, and never really looked into the "why is it like that". So this morning I've spent some time (before starting work) on figuring out why emails are encoded into base64. The following is what I have come up with:


The earliest iterations of NCP as used by ARPANET were more like bit streams than byte streams, with hosts negotiating a byte size (the 8-bit byte did not become standard until later on). There were several attempts at creating a file transfer protocol, and mail was initially an FTP function, carried by the MAIL and MLFL commands, before it was split off into MTP and later SMTP.

These machines used different character encodings (ASCII vs. EBCDIC) and even different byte sizes (6-bit vs. 8-bit vs. ...). So, as you might expect, mail transfer was originally defined to send short messages in plain text, specifically encoded as NVT-ASCII.

According to RFC 772:

Code:
MAIL REPRESENTATION AND STORAGE

Mail is transferred from a storage device in the sending host to a storage device in the receiving host. It may be necessary to perform certain transformations on the mail because data storage representations in the two systems are different. For example, NVT-ASCII has different data storage representations in different systems. PDP-10's generally store NVT-ASCII as five 7-bit ASCII characters, left-justified in a 36-bit word. 360's store NVT-ASCII as four 8-bit EBCDIC codes in a 32-bit word. Multics stores NVT-ASCII as four 9-bit characters in a 36-bit word.
For the sake of simplicity, all data must be represented in MTP as NVT-ASCII. This means that characters must be converted into the standard NVT-ASCII representation when transmitting text, regardless of whether the sending and receiving hosts are dissimilar. The sender converts the data from its internal character representation to the standard 8-bit NVT-ASCII representation (see the TELNET specification). The receiver converts the data from the standard form to its own internal form. In accordance with this standard, the <CRLF> sequence should be used to denote the end of a line of text.

While 8 bits were being transferred, the 8th bit would usually get mangled or discarded, since there was no requirement to preserve it. In fact, some protocols required the 8th bit to be set to zero, such as the original SMTP RFC (RFC 821, quoted below), which was not 8-bit clean:

Code:
Data Transfer

The TCP connection supports the transmission of 8-bit bytes. The SMTP data is 7-bit ASCII characters. Each character is transmitted as an 8-bit byte with the high-order bit cleared to zero.

This continued for a long time, even after the 8-bit ISO-8859-# character encodings reared their ugly heads. Some servers were already 8-bit clean, but many weren't, so blindly sending 8-bit messages all over the place would obviously cause issues.
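To make the 8th-bit problem concrete, here is a minimal sketch. The strip_high_bit helper is hypothetical: it only simulates a relay that clears the high-order bit of every byte, which is an assumption about how such a relay behaves, not code from any real mail server.

```python
import base64


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a relay that is not 8-bit clean by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


# "café" in ISO-8859-1: the é is the single byte 0xE9, which has the high bit set.
original = "café".encode("latin-1")

# Sent raw, the é is silently turned into another character (0xE9 & 0x7F == 0x69, "i").
mangled = strip_high_bit(original)

# Sent as base64, every byte on the wire is 7-bit ASCII, so the relay changes nothing.
encoded = base64.b64encode(original)
survived = base64.b64decode(strip_high_bit(encoded))

print(mangled)    # b'cafi'
print(survived == original)
```

The price is size: base64 turns every 3 bytes into 4 characters, so the message grows by roughly a third.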

Later on, Extended SMTP (ESMTP) was published, which allowed mail servers to declare the SMTP extensions they support. One of these was 8BITMIME, which indicates that the receiving server can accept 8-bit data safely. MIME messages, or individual parts of them, can then carry a "Content-Transfer-Encoding: 8bit" header, indicating that the content is not encoded in any way, shape, or form.
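As a rough sketch of how a client could act on that declaration: the two helpers and the sample reply below are made up for illustration. Python's smtplib does this parsing itself (SMTP.ehlo() followed by SMTP.has_extn("8bitmime")), so the code only shows the idea.

```python
def advertised_extensions(ehlo_reply: str) -> set:
    """Collect the extension keywords from a multi-line EHLO reply."""
    extensions = set()
    # The first line is the server greeting; each later line is "250-KEYWORD [params]"
    # (or "250 KEYWORD [params]" on the final line).
    for line in ehlo_reply.splitlines()[1:]:
        if not line.startswith("250") or len(line) < 5:
            continue
        keyword = line[4:].split(" ", 1)[0]
        extensions.add(keyword.upper())
    return extensions


def pick_transfer_encoding(extensions: set) -> str:
    """Send 8-bit data as-is only if the server says it can take it."""
    return "8bit" if "8BITMIME" in extensions else "base64"


# Hypothetical reply from a server that supports 8BITMIME but not CHUNKING.
SAMPLE_REPLY = (
    "250-mail.example.org\r\n"
    "250-SIZE 10240000\r\n"
    "250-8BITMIME\r\n"
    "250 PIPELINING\r\n"
)

extensions = advertised_extensions(SAMPLE_REPLY)
print(sorted(extensions))
print(pick_transfer_encoding(extensions))
```

Keep in mind this only covers the first hop. Every relay further down the path makes the same decision again.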

However, SMTP kept its line-based framing: lines are limited to 998 octets, and a line containing only "." (0D 0A 2E 0D 0A) is the "end of message" indicator. Now imagine sending binary files unaltered over that (CAN YOU FUCKING IMAGINE?). Theoretically, ANY file that happened to contain that sequence would be cut off there: the sequence would be taken as the end of the message, and whatever followed would be interpreted as SMTP commands. Similarly, a line longer than 998 octets might be cut by the receiving server.

Now that you have a little background on what was happening, let's talk about what they did about it:
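A quick way to see why base64 sidesteps both problems: its output alphabet has no "." in it, and the encoder wraps lines at 76 characters. The is_smtp_safe helper below is a hypothetical check written for this post, and the payload is made up.

```python
import base64

END_OF_DATA = b"\r\n.\r\n"   # CRLF "." CRLF, the SMTP end-of-message marker
MAX_LINE = 998               # line length limit in octets, excluding the CRLF


def is_smtp_safe(body: bytes) -> bool:
    """True if the body has no lone-dot line and no line over 998 octets."""
    if END_OF_DATA in body or body.startswith(b".\r\n"):
        return False
    return all(len(line) <= MAX_LINE for line in body.split(b"\r\n"))


# A made-up binary payload: one 1200-byte "line", the end-of-message marker,
# then something that looks like an SMTP command.
raw = b"\x00\xff" * 600 + END_OF_DATA + b"QUIT\r\n"

# base64.encodebytes wraps at 76 characters; convert its "\n" to CRLF for the wire.
wrapped = base64.encodebytes(raw).replace(b"\n", b"\r\n")

print(is_smtp_safe(raw))      # False
print(is_smtp_safe(wrapped))  # True
print(base64.decodebytes(wrapped) == raw)
```

Real SMTP clients also "dot-stuff" lines that start with a dot, which handles the marker for text, but that does nothing for the line length limit or for relays that are not 8-bit clean.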

Around 2000, the BINARYMIME ESMTP extension showed up on the scene, published together with CHUNKING as RFC 3030. This allows raw binary data to be transferred over SMTP. The message is sent in chunks with the BDAT command, each chunk announcing its length up front; the final chunk (which may be zero length) is flagged with LAST and acts as the terminator (kind of like how in C you have to use the \0 at the end of strings). No end-of-message sequence is needed, so this makes base64 unnecessary. Unfortunately, not many SMTP servers support this. For example, neither Postfix nor Exim4 advertises CHUNKING in reply to EHLO. To take advantage of BINARYMIME, it has to be supported by every server in the message path, which can be more than just one or two.

TL;DR: We use base64 because we are too lazy to move to the latest and greatest way of sending mail over the internet.
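Here is a small sketch of that framing. The bdat_frames generator and the 100-byte chunk size are my own picks for illustration; only the "BDAT <size> [LAST]" command shape comes from RFC 3030.

```python
def bdat_frames(payload: bytes, chunk_size: int = 4096):
    """Yield one BDAT command per chunk, each followed by the raw chunk bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty payload still needs one (empty) chunk to carry the LAST flag.
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    for index, chunk in enumerate(chunks):
        last = b" LAST" if index == len(chunks) - 1 else b""
        # The receiver reads exactly len(chunk) octets after the command line,
        # so the chunk can contain anything, including CRLF "." CRLF.
        yield b"BDAT %d%s\r\n" % (len(chunk), last) + chunk


# 261 bytes of binary data that starts with the classic end-of-message marker.
payload = b"\r\n.\r\n" + bytes(range(256))
frames = list(bdat_frames(payload, chunk_size=100))

for frame in frames:
    print(frame.split(b"\r\n", 1)[0])
```

Because the length is declared before the data, the receiver never has to scan the content for a terminator, which is exactly what the dot line forces it to do.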

Hope you enjoyed your history lesson for the day!
#2
That's pretty interesting, thanks for the information. With regards to the thread where the question was originally posted: if you look at the code I posted, you can see that the author imports the SMTP lib. As far as I am aware, the SMTP lib handles all the encoding needed to send mail, yet the author decided to base64 encode the attachment as well. So my question is: if the SMTP lib handles all the encoding related to sending the email, why would you need to encode the attachment, if not to obfuscate what the attachment is in order to bypass filters designed to keep out malicious email containing certain suspicious-looking strings? Kind of like MailAdmin, which you designed for your mail server?
#3
(05-30-2018, 02:05 PM)Vector Wrote: That's pretty interesting, thanks for the information. With regards to the thread where the question was originally posted: if you look at the code I posted, you can see that the author imports the SMTP lib. As far as I am aware, the SMTP lib handles all the encoding needed to send mail, yet the author decided to base64 encode the attachment as well. So my question is: if the SMTP lib handles all the encoding related to sending the email, why would you need to encode the attachment, if not to obfuscate what the attachment is in order to bypass filters designed to keep out malicious email containing certain suspicious-looking strings? Kind of like MailAdmin, which you designed for your mail server?

Well, the SMTP lib does support the encoding, but if you encode the attachment into base64 yourself and it then gets re-encoded into base64 a second time, then, as you said, it would possibly bypass the filtering, but it would show up for the end user as a jumbled mess of base64-encoded strings.

@Null look at this: https://docs.python.org/3/library/email.header.html. It seems the encoding isn't handled by smtplib; rather, it's handled by Python's own email package.
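A small sketch of what I mean, using only the standard library email package. The payload, subject, and file name are placeholders, and this is not the script from the other thread.

```python
import base64
from email.message import EmailMessage

payload = bytes(range(256))  # stand-in for a binary attachment

# Case 1: hand the raw bytes to the email package and let it choose the encoding.
msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("See attachment.")
msg.add_attachment(payload, maintype="application",
                   subtype="octet-stream", filename="data.bin")
part = next(msg.iter_attachments())
auto_cte = part["Content-Transfer-Encoding"]   # chosen by the library
auto_roundtrip = part.get_content()            # what the recipient gets back

# Case 2: base64 encode by hand first, then attach. The library encodes again.
pre = EmailMessage()
pre.set_content("See attachment.")
pre.add_attachment(base64.b64encode(payload), maintype="application",
                   subtype="octet-stream", filename="data.bin")
double = next(pre.iter_attachments()).get_content()

print(auto_cte)                     # base64
print(auto_roundtrip == payload)    # True
print(double == payload)            # False, the recipient gets base64 text
```

One possible innocent explanation for the script: with the older MIMEBase API you have to call email.encoders.encode_base64(part) yourself, so an explicit base64 step in the code does not by itself mean the author was trying to hide anything.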
#4
(05-30-2018, 03:32 PM)ekultek Wrote:
Well, the SMTP lib does support the encoding, but if you encode the attachment into base64 yourself and it then gets re-encoded into base64 a second time, then, as you said, it would possibly bypass the filtering, but it would show up for the end user as a jumbled mess of base64-encoded strings.

@Null look at this: https://docs.python.org/3/library/email.header.html. It seems the encoding isn't handled by smtplib; rather, it's handled by Python's own email package.

Hmm, weird. Then why would the author of the script I posted base64 encode the attachment?