Why we need authenticated cryptography
Genesis
I was asked the question "Can someone modify an encrypted document without ever decrypting it?". The answer is yes, although it is quite counter intuitive.
For that reason I recommend using the GCM mode which includes the messages authentication directly. If you are using another mode, for example CBC (which is what many libraries default to) you need to include a HMAC (Hash-based Message Authentication Code) after having encrypted your data. For asymmetric cryptography it is the same: cryptographically sign your messages to detect tampering.
If you don't know what the mode is, it simply describes how to apply the cipher when you have more than a block (generally 16 bytes) of data. It is generally written next to the cipher name. For example AES-256-GCM is using the AES cipher with a 256 bits key and the GCM mode.
The simulation
In order to demonstrate how an attacker could modify an encrypted message I wrote a little bank simulation in python. You can find the full code here.
If you know python you may find some design choices weird. My primary goal was to produce a code that even a beginner can understand and modify to experiment other scenarios.
Let's launch it to see how it goes. To halt it use Ctl+C.
$ python demo_nomac_tamper.py
[0] Requesting a transfer of $6230 from Alice to Bob
5844c2b7c2a31ec2bf47641e23c295c28fc28f54c38dc29e
[0] Transfered $6230 from Alice to Bob
[1] Requesting a transfer of $4731 from Denis to Alice
5b47c2bcc2bc3ac2af6a621536c390c381c28853c282c29fc3bf0f
[1] Transfered $4731 from Denis to Alice
^C
-----------
Alice $8501
Bob $16230
Charles $10000
Denis $5269
Eve $0
We can see some money transfer requests between different persons. The hexadecimal string in-between the request and confirmation is the encrypted message as seen by someone eavesdropping on the network. That person is Eve, the malicious user that would very much like stealing that money as she has nothing left on her account.
For now she is only looking at the packets, but let's turn the BE_EVIL flag at the top of the file and relaunch the demo:
$ python demo_nomac_tamper.py
[0] Requesting a transfer of $3115 from Alice to Bob
5844c2b7c2a31ec2bf47641e23c290c28cc28d51c38dc29e
[0] Transfered $3115 from Alice to Eve
HACKED!
[1] Requesting a transfer of $1734 from Bob to Charles
5943c2b4c2ad33c2b65d7b3930c381c381c28d53c282c29ac3bf0f
[1] User not found Dqfrles
^C
-----------
Alice $6885
Bob $10000
Charles $10000
Denis $10000
Eve $3115
We halted the program at two messages but you can continue. We see that a transfer request from Alice to Bob was confirmed as being a request from Alice to Eve! And we also see that a transfer from Bob to Charles gave a strange error about an user Dqfrles... What is happening there?
We just modified the encrypted message, betting on the fact that it would be for Bob, so that "Bob" actually decrypts to "Eve". As all messages aren't for Bob the others don't decrypt well now, but as the bank will just ignore them that's no concern for us. So we aren't taking Bob's money but we hijack any transfer to his account (and block any other transfer in this simple case).
Why was it possible?
There are lots of useful comments in the code to explain what we did and how we did it. I'll focus here on the more mathematical part.
I used the RC4 cipher which is a stream cipher. This means that I don't encrypt data a block at a time: for each plaintext byte P I will compute a pseudo-random byte R and XOR the two giving me an ciphered byte C.
Of course the random byte isn't really random, it is part of a random-looking sequence generated by the key so that it can be deciphered. But we aren't interested in that part, what is interesting is the XOR and the fact that the first byte of the ciphertext will be the first byte of the plaintext.
XOR-ing two identical bytes gives 0, and this operation can be done in any order. That's how the cipher works, to decipher the message just generate the same random bytes and XOR them with the ciphertext:
This also means that if we XOR some other M byte with the ciphertext it will be reflected in the plaintext:
We can use this to our advantage. Let's suppose that the plaintext byte is a b, then we can do the following.
If P is indeed a b this will simplify into:
We will have changed our b into 0! But that only works if we know what byte to change and what value we expect it to be. All that's left to do is to also XOR the letter we want instead, so an e in our case:
And that's how we were able to change three bytes. Of course when the original bytes aren't what we expected them to be it won't decrypt to anything useful. But as most communications use very structured formats, emails for example, it is often possible to find something to modify to our advantage without ever decrypting the message.