Introduction
In cryptography publishing the algorithm used is an important principle (called Kerckhoffs' law). Many systems have attempted to achieve "security through obscurity" meaning that the algorithm was secret, but this is flawed, because the algorithm is eventually reverse engineered and weaknesses are found -- for example, this was the case with GSM A5/1 digital cell phone encryption. Thus, we must assume that an attacker will know everything about the system except for the encryption key. We strongly support this principle and thus describe our algorithms in detail below, which are based on internationally accepted cryptography standards.
Adherence to Standards
Generation of encryption key from pass phrase: Follows the PBKDF2 standard as described in RFC 2898.
Encryption algorithm: AES-256 in CTR mode (with random 16-byte nonce per block). AES is the new international standard for encryption, and the AES-256 algorithm has been approved by the US government to encrypt TOP SECRET documents. (See Wikipedia articleon AES.) If you have chosen a strong pass phrase you can be assured that your data is encrypted with the most secure algorithm available.
Data integrity: HMAC-SHA-256 [SHA-2] as described in RFC 2104 (SHA-256 is stronger than SHA-1 by 96 bits). (See Wikipedia article on SHA-256.) Use of this mechanism ensures that the data that is restored is exactly the data that was backed up.
Discussion on Implementation
The AES algorithm implementation adheres to the FIPS-197 specification. Our CTR implementation adheres to NIST Special Publication 800-38A. AES-256/CTR is also discussed in the context of IP-SEC in RFC 3686. Note that our algorithm below is even more secure, because even the IV (initialization vector/counter) is encrypted, thus making CTR mode suiteable for static keys. HMAC-SHA-256 is discussed in the context of IP-SEC in RFC 2404.
The comments about AES-256/CTR's suitability for permanent storage relate to the principle that you must never re-use the same keystream, which depends on the IV. We use a 16-byte cryptorandom IV (nonce), which causes it to randomly choose one of any of the 2^128 keystreams. According to the birthday paradox, the chances of two blocks selecting the same keystream are 1 in 2^64 (18 quintillion -- i.e. 18 million trillion). A new cryptorandom IV is used for each 2KB block. Thus, you will have to backup about 37,000 exabytes (billion gigabytes) of information before this aspect of the algorithm becomes a problem.
Currently it is estimated that the world produces about 5 exabytes of new information each year (see Berkeley study). Thus, even if you backed up all of the new information produced by the world for the next 1000 years it would still be extremely unlikely that two data blocks used the same IV (nonce) and thus the same keystream. (Note: theoretically speaking, if two data blocks did use the same IV it would only leak information if an attacker knew the exact plain text (unencrypted data) of the data in one of the blocks, which would allow the attacker to easily calculate the key stream for the other block. Also, the attacker would have to identify which 2 blocks used the same IV out of the 18 quintillion blocks.)
Thus, our implementation of the AES encryption algorithm follows internationally accepted standards and uses the very best cryptography available to protect your data.
Psuedocode
Notes:
- pwd_to_key implements the PBKDF2 standard to generate encryption keys from a textual pass phrase as described in RFC 2898.
- E[k](data) is the encryption of data with the key k. Likewise, D[k](data) is the decryption of data with the key k. In this case we're using AES-256 for the block cipher.
- ^ means XOR and || means concatenation (standard notation in cryptography psuedocode)
Encryption algorithm:
keyA || keyHMAC = pwd_to_key(pwd=(text of pass phrase),salt=(deterministic function of pass phrase),iterations=50000, outLen=64 bytes)
NOTE: This means the encryption key is the first 32 bytes and the HMAC key is the last 32 bytes of the output of PBKDF2.
NOTE: Salt cannot be random because encryption depends solely on pass phrase. Salt serves only to prevent dictionary attacks.
nonce = cryptorandom 16 bytes
keyN = pwd_to_key(pwd=keyA,salt=keyHMAC,iterations=1000, outLen=32 bytes)
enc_N_nonce = E[keyN](nonce)
enc_datastream = enc_N_nonce || (datastream[0..15] ^ E[keyA](nonce+1)) || (datastream[16..31] ^ E[keyA](nonce+2) || ...)
enc_datastream = enc_N_nonce || (datastream[0..15] ^ enc_A_nonce_1) || (datastream[0..15] ^ enc_A_nonce_2) || ...
stored_datablock = enc_datastream || hmac(key=keyHMAC, data=enc_datastream)
Decryption and integrity verification algorithm:
keyA || keyHMAC = pwd_to_key(pwd=(text of pass phrase),salt=(deterministic function of pass phrase),iterations=50000, outLen=64 bytes)
keyN = pwd_to_key(pwd=keyA,salt=keyHMAC,iterations=1000, outLen=32 bytes)
enc_N_nonce || enc_datastream || stored_hmac = stored_datablock
dec_N_nonce = D[keyN](enc_N_nonce)
dec_datastream[0..15] = enc_datastream[0..15] ^ E[keyA](dec_N_nonce+1)
dec_datastream[16..31] = enc_datastream[16..31] ^ E[keyA](dec_N_nonce+1) ...
calc_hmac = hmac(key=keyHMAC, data=enc_datastream)
fail if calc_hmac != stored_hmac
Conclusion
We have built a system based on the best international cryptographic standards available and provide the highest grade of security possible. Your data's protection does not rely on some secret, custom algorithm that we alone have written, but rather relies on an algorithm peer reviewed and selected over a period of years by the world's best cryptographers and the U.S. government. Thus, the strength of the security protecting your data relies solely upon the strength of the pass phrase that you create.
Posted by Kevin Hoffman on May 14, 2006 04:55 AM
