Hash and MAC

- Generates from undetermined size message M, a fixed size hash value h.
- Properties
- It is easy to calculate h from M
- It is practicly impossible to calculate M from h.
- Collision resistance
- It is hard to find M' so that
- It is hard to find such M and M' that

- How many persons is needed in the room in order to have over 50% chance that someone in room has same birthday than you do? A: 253
- How many persons has to be in the room so that it is over 50% chance that two persons have same birthday? 23
- 23 persons form 253 different kind of pairs (1+2+3+…+22 = 253)

- for m-bit hash generated messages is required in order to find out message that has same hash than message M.
- Computer that can calculate 1 million hashes in second, will take average of 600 000 years to find a message which has same 64-bit hash than M.

- For m-nbit hash generated messages are required in order to find two different messages that produce same hash value.
- Computer that can calculate 1 million hashes in second, will take average of one hour in order to find two messages that has same 64-bit hash.
- When birthday attack is feared, hash size should be doubled. (usually is)

- A creates two contract documents. One is beneficial to B and the other forces B to bankruptcy.
- A makes small changes two the contracts e.g. adds spaces before line feed and then generates huge amount of different variations of these two documents.
- A calculates hash values of these documents until she finds two documents that has same hash. A stores these two documents.

- From these documents A sends the document beneficial to B, for signing. (the protocol where hash value is signed is used.)
- Later A switches the documents and claims in court that B signed the document, which drives B to bankrupt.
- Attack can be prevented by making small change (e.g. adding space) in the document before calculating hash and signing it.

- Divide message in blocks
- Feed the block to hash function along with the hash value of previous block

- The result from last block is the hash.

- If hash function provides too short hash, it can be enlarged and thus made safer
- Generate hash value with hash function.
- Concatenate hash value in the begining of the message
- Generate new hash value from the new messages
- Concatenate the hash values
- Run through until you have the required size hash value.

- Developed by Ron Rivest 1992 and is based on MD4. (RFC 1321)
- Returns 128-bit hash value from message.
- The data has to be divided in 512 bit blocks which are run through the hash function one after another.
- The message is padded so that it's length n is
- padding is done by adding one and after that required amount of zeroes.

- At the end, 64-bit value is added that states the length of message. (448+64 = 512)
- data to be hashed is thus k*512 bits, where k is integer.

- Initialisation.
- Initialise four 32.bit chaining variables (CV) A,B,C,D

- MD5 operation
- Compression function (512 bits in, 128 bits out)
- A,B,C,D are copied in variables a,b,c,d
- The incoming 512-bit block is divided in 16 sub blocks that 32-bits long
- The main loop consists of four rounds(F,G,H,I).

- For every round following function is run 16 times.

- a,b,c,d are variables a,b,c,d which switch places in every round (4 values, 16 different combination, 16 rounds)
- i is current round
- is integer part from function , where i is in radian
- is the i:th 32-bit subblock of message

- s is the amount of shifts that is dependant from current round (1-16) and mainloop rounds (1-4)

- Function g is round dependant.

- Analysis of functions F,G and H:
- F is if function (if b, then c else d) (b?c:d)
- G is also if function (d?b:c)
- H provides parity bit

- After 4 rounds a,b,c,d are added to A,B,C and D individually using modulus
- , , ,

- Take next 512-bit block for handling.
- The final hash is A,B,C,D concatenated i.e. 4*32bit = 128-bit hash

- Each step has their own constant
- Each step affects to next step in order to cause more efficient avalanche effect i.e. the effect of one bit is higher to the resulting hash.
- Change of one bit in the original text will change the whole hash, not just one bit.

- Due, going through the basic operations several times, it is highly unlikely that two random messages will produce same hash value.
- The use of multiple basic operations makes it unlikely that two random messages will provide same hash value.

- Some weakness have been found from MD5 compression algorithm which can cause different types of collisions.
- Attacks against single 512-bit inputs in order to find two blocks that causes the same 128-bit result
- md5 collision demo: http://www.mscs.dal.ca/~selinger/md5collision/

- On crypto 2004 conference, a research was published which showed some weaknesses in MD5. (http://www.schneier.com/essay-074.html)
- Same problem applies to SHA-0 and RIPEMD160

- SHA-1 is and algorithm developed by NSA for DSS (Digital Signature Standard) and is thus the algorithm of SHS (Secure Hash Standard)
- SHA-1 is based on MD4 algorithm (as was MD5) developed by Ron Rivest at 1990.
- Maximum message length for SHA-1 is bits from where SHA-1 provides 160-bit hash value
- SHA-1 vs MD5
- SHA-1 uses five vectors where MD5 used 4 (128+32 = 160)
- SHA-1 has 4*20= 80 rounds while MD5 had 4*16=64 rounds
- SHA-1 bit functions are slightly different

- Compared to MD5
- The hash size of SHA-1 is 32 bits longer and is thus more resistant for brute-force attacks
- SHA-1 is slightly slower and requires bit more memory(more rounds, longer vectors to get 160 bit hash)

- SHA-1 uses input bits more often in during the hash function than MD5

- On February 15th, 2005 Bruce Schneier stated on his blog:

“SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing. The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu (mostly from Shandong University in China) have been quietly circulating a paper announcing their results:- collisions in the the full SHA-1 in hash operations, much less than the brute-force attack of operations based on the hash length.
- collisions in SHA-0 in operations.
- collisions in 58-round SHA-1 in operations.”

- Currently all MD5 style Hash functions are susceptible.
- Competition for generating new hash standard will be opened by NIST

- Block cipher based hash function (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
- Uses block cipher W, which is based on AES.
- W is specifically designed to be used for hash function

- W is specifically designed for use with hash functions

- Follows The Miyaguchi-Preneel one-way compression function principle
- One way compression function is a function which takes two fixed length inputs to provide single output which size is same as one of the inputs. Process is irreversible like for hash functions.
- The next block is encrypted by using the result of encryption of previous block XORed with key of previous block and plain message

- Provides 512 bit hash value
- Block size of W is 512 bits (AES has 128 bit block size)

- Has been shown to be resistant on known vulnerabilities of block cipher based hash functions
- Block ciphers do not provide proper randomness as default (They are designed to be invertible), which may result in weaknesses
- Block ciphers are slow compared to hash functions
- W has been designed to be fast

- block cipher based hash function provide hash that is size of the block
- Normal block size 64 or 128 bits, W block size 512 bits

- Endorsed by NESSIE (New European Schemes for Signatures, Integrity and Encryption)

- Message Authentication code, cryptographic checksum.
- Hash value that is dependant on secret key.

- Use of MAC
- The integrity of the data can be confirmed with MAC
- Only those who know what the password is can count MAC. therefore MAC can be saved along data at the same public place.

- Communication packet integrity can be verified by calculating MAC from the packet and it's sequence number (IPSec)

- MAC is not signature.
- Anyone who knows the key can calculate MAC

- When used key in MAC is published then the MAC value is equivalent to hash

- Both can be used to confirm the integrity of data.
- With MAC the information can be read without decrypting it or without knowing password.
- Authenticity of data can be validated on need basis

- MAC can confirm the data authenticity longer than symmetric encryption
- In symmetric encryption the key has to be known by those who are allowed to see the data
- MAC can be saved even when the data is used

- MAC key is stronger against brute force, when key k > than the size of MAC n
- For each message there exists keys that provide same MAC value.

- Easiest way to produce MAC is encrypting hash value with symmetric algorithm
- A message is encrypted by using block cipher in CBC mode.
- After the message is encrypted, the result coming out from the encryption of final block is the MAC value.

- The algorithm can be strengthened by encrypting the final value with another key

- Creating MAC by adding cryptographic feature in hash function
- Hash functions are faster than block ciphers

- design principles
- Uses existing hash functions
- Hash function should be easily replaceable to another one, when necessary
- hash function performance should be preserved
- simple use and handle of key
- Provable security, independent from hash function properties.

- Key K is expanded by adding zeroes on left to get , which size is b bits. b is the size of the hash block (e.g. the amount of bits the hash takes in.)
- XOR with ipad (ipad is repeated times). The result is b bit long block
- Concatenate M with ( in the beginning)
- Run the result from step three through hash function H.
- XOR with opad (opad is repeated times). The result is b-bit long
- pad the result from step 4 to be b bits and concatenate with ( in the beginning)
- Run the result from step 6 through hash function to get MAC value

- use of ipad and opad generates two different keys from original key K.
- Due the hash function both keys affects randomly to the result
- ipad and opad both flip half of the bits of K, thus the generated keys are pseudorandomly generated.

- With long messages the algorithm is just slightly slower than the used hash function
- Three extra executions of hash function is needed (for on the step 4, on step 7 and for the hash itself on step 7)

- and are precomputed and run through hash compression function.
- the compressed is used as an IV for the Hash function
- The result from hash is padded to b bits and run through the compression function of hash using compressed as an IV
- Only one more instance of compression function is executed compared to the normal hash.
- and compressions have to only be calculated when key is changed.

- It can be proved that security is directly dependant on the security of hash function.