Cryptographic hash functions play a fundamental role in modern cryptography. They are used to compress messages of arbitrary length to fixed length hash values which are also called hash codes, hash results, message digests or digital fingerprints. A primary motivation for cryptographic hash functions is that they serve as compact representative images of input messages, which they can uniquely identify. For example, the MD5 hash code "72c067b50c90837c95186d019f686700" can be looked on as the "fingerprint" to represent this thesis title uniquely. Changing the title, even by a single letter, will change most of the digits in the hash code.Hash functions occur as components in various cryptographic applications (e.g. protection of passphrases, protocols for payment, broadcast authentication etc.) where usually their property as a computational oneway function is used. Hash functions are frequently used in digital signature schemes to compress large messages for processing by publickey cryptosystems such as RSA. So the study of the hash functions is of great significance in the cryptanalysis field.In addition to these the two basic properties  compression and ease of computation  cryptographic hash functions needs to satisfy the following three security properties:Preimage resistance: for any prespecified output y, it is computationally infeasible to find an input x such that h(x) = y.Secondpreimage resistance: for any input x, it is computationally infeasible to find another input x' such that h(x) = h(x') Collision resistance: it is computationally infeasible to find any two distinct inputs x, x' with the same output, i.e., h(x) = h(x').Hash functions can be partitioned as: hash functions based on block ciphers and dedicated hash functions; In 1990, Rivest introduced the the first dedicated hash function MD4[39], designed using basic arithmetic and boolean operations. After MD4, several dedicated hash functions were proposed, including, in rough chronological order, MD5[40], HAVAL[56], RIPEMD[38], RIPEMD160[13], SHAl[20], SHA2[21], etc. Although these hash functions are more complex than MD4, they all follow the same design philosophy and have a similar structure. So they are called the MD4family. MD5 and SHA1 are currently the crucial technology that electronic signatures and many other password securities use throughout the international community. They are widely used in banking, securities, and ecommerce. SHA1 has been recognized as the cornerstone for modern Internet security.The first analysis of MD4 and MD5 were made by Merkle (unpublished), den Boer and Bosselaers [11, 12], and by Vaudenay [42]. Then Dobbertin developed a new general technique to attack MD4family hash functions. In 1996, Dobbertin presented a successful attack on MD4 which find a collision with complexity 2^{22}[14]. In 1998, Doberrtin[18] showed that the first two (out of the total three) rounds of MD4 are not oneway. In the rump session of Eurocrypt'96, Dobbertin [15]presented a semi freestart collision of MD5 which consists of two different messages with a chosen initial value.In 1997, Xiaoyun Wang first used an innovative method of bit tracing to show how SHA0 could be broken in theory[45]. In 1998, Wang improved this result in conjunction with the message modification technique. In the same year, Chahaud and Joux[24] also published the theoretical analysis result of SHA0 that used the ordinary differential analysis method. The bit tracing method was public in 2005 and it laid the theoretical foundation for the breaking of most MD4family hash functions, including MD5 and SHA1.In past three years, there has been an extensive focus upon design and analysis of hash functions within the cryptographic community. This has been largely catalyzed by some novel work included within this dissertation, first published in [52, 55, 54]. The attacks used to find the "breakthrough" collisions presented at the rump session of Crypto'04 by Wang et. al[49]. In that conference, Wang announced the presence of collisions within for MD4, MD5, HAVAL128 and RIPEMD that was regarded as the most significant cryptanalytic developments of the year. At the same time, Biham gave an attack on 40step SHA1[6] and Joux improved upon the results of Biham and Chen[5] by presenting practical collisions in SHA0 [24].The most important result is the break of MD5 by Xiaoyun Wang and Hongbo Yu. This break highlights serious potential risks in the fields of information and network security and lead to many practical consequences. In 2005, Lenstra et. al used our MD5 hash collisions to construct two X.509 certificates that contain identical signatures and that differ only in the public keys[28, 27]. Later, they improved this result and presented two MD5 based X.509 certificates with identical signatures but different public keys and different Distinguished Name fields[33]. Lucks and Magnus constructed the MD5 "Target Collision" for special file formats such as Postcript[29]. Gebhardt et. al extended the file format to PDF, TIFF and Word 97[22]. The work "Finding collisions by SHA1" by Wang and Yu is another breakthrough in the hash functions history [51]. Recently Wang et. al improved this attack to find a hash collision with a estimated work of 263 operations. The result is plainly within the realm of feasibility for an attacker well equipped with computer resources. In response to the SHA1 vulnerability by Wang et. al, NIST plans to host a public hash competition that is similar to the successful Advanced Encryption Standard (AES) development and selection process[34].In this thesis, we provide new attacks on MD4, 4 and 5Pass HAVAL, and HMAC/NMAC based on HAVAL. It involves six chapters.In Chapter 1, we introduce some definitions and properties for hash functions and describe the structure of MD4family hash functions. We give descriptions of our main targets of analysis in this thesis, being MD4 and HAVAL.In Chapter 2, we describe modular differential attack on hash functions in detail. This method, also called bit tracing method in Chinese, was presented early in 1997 [45] by Xiaoyun Wang, and formalized at Eurocrypt'05. The modular differential is a precise differential that unlike most differential attacks [9], uses integer modular subtraction in conjunction with exclusiveor as a measure of difference. There are four steps in attacking a hash function using the modular differential. The first step is to select an appropriate message difference, which determines the success probability of the attack. The key step of the modular differential attack is to select a feasible differential path according to the selected message difference. This difficult step requires needs intelligent analysis, sophisticated technique, lots of patience and good luck. The third phase is to derive the sufficient conditions that guarantee the feasibility of the differential path. In the process of searching for differential paths, the chaining variable conditions can be determined. A feasible differential path implies that all the chaining variable conditions deduced from the path do not contradict each other. The last step is the message modification which forces the modified messages to satisfy additional sufficient conditions.In Chapter 3, we present the collision attack for the full 4 and 5Pass HAVAL. Previous results on HAVAL suggested only practical collision attacks for 3pass HAVAL. For 4Pass HAVAL, we describe a twoblock collision with complexity 2^{36}. For 4pass HAVAL, we describe a practical attack for finding twoblock collisions with 2^{36} computations. Using the more complex message modification technique, the complexity can be improved to 2^{30}. In addition, we show that collisions for 5pass HAVAL can be found with about 2^{123} computations, which is the first attack more efficient than the birthday attack. This is the best results on the analysis of 4 and 5Pass HAVAL. A 4Pass HAVAL collision example is given in Table 1.In Chapter 4, we give further analysis on the secondpreimage attack on MD4. Ideally, it needs about 2^{128} operations to find a secondpreimage corresponding a given message. This is equivalent in complexity to the bruteforce approach. However for some messages, there exists a more efficient attack than the brute force attack to find their secondpreimages, and these messages are called "weak messages". In Eurocrypt'05[43], Wang et. al gave a fast attack for MD4 using the modular difference method. The differential path used in their attack also can be used to find the secondpreimage of MD4 with a little advantage than the bruteforce attack, and it is the first theoretical attack to the secondprimage on the full MD4. At the same time, Wang pointed out that it is possible to find a better differential path for the secondpreimage attack of MD4. Motivated by Wang's suggestion, in this chapter we find another new collision differential path. Using it, we have proved that any random message is a weak message with probability 2^{56} which has achieved a practical range of break. Furthermore, we can convert any random message into a weak message with only 227 MD4 computations, such that the original message is very close to the resulting weak message with a hamming distance of about 44. In addition, we give a result that shows weak messages can be used for distinguishing, forgery and key recovery attacks on MD4based MACs. The attacks on MD4based HMAC and NMAC developed by Contini et. al[8] and Kim et. al[26] respectively both used the differential path given in this chapter, published by us previously.In Chapter 5, we present a new attack for building multicollisions and multinearcollisions for hash functions directly instead of combining multicollisions from the pairwise collisions. Finding a K (multi)collision means to find K different messages that map to the same hash value. The multinearcollisions is a generalization for the pairwise nearcollisions presented by Biham and Chen[4]. The basic idea of the new attack method is to use t different differential paths to cause a 2^{t}collision of the compression function, and each different path can be found by the modular differential method. For an ideal hash function, finding a Kcollision requires about O(2^{((k1)/k)n} hash computations where n is the length of the hash value. However, using our new attack method, we can find a 4collision for MD4 compression function with complexity 2^{19}. For the 3Pass HAVAL, we can find a 4collision with complexity 2^{28} and a 8nearcollision with complexity 2^{9}. Our multicollision examples are given in Tables 2 and 3.In chapter 6, we analyze the HMAC based on full 3Pass and 4Pass HAVAL. A message authentication code (MAC) can be viewed as a hash functions that takes two functionally distinct inputs, a message and a secret key, and produces a fixedsize output. HMAC is a standardized hashbased MAC algorithm that is widely used as a MAC algorithm and as a pseudorandom function generator [1]. We use two type of distinguishers to analyze the HMAC: differential distinguishers and rectangle distinguishers. In [26], Kim et. al use the rectangle distinguisher to distinguish the HMAC based on 3Pass and reduced 4Pass HAVAL. In our analysis it turns out that differential distinguishers are better suited than rectangle distinguishers to HMAC based on HAVAL. Nevertheless, to facilitate comparison, we give our results for both types. For the differential distinguisher, we introduce two new collision differential paths for 3 and 4Pass HAVAL, with probabilities is 2^{96.26} and 2^{121.27} respectively. Using our new differential paths, we can execute the distinguishing, forgery and key recovery attack on HMAC based on 3 and 4Pass HAVAL. In the case that the output lengths both for inner and outer functions are 256 bits, our forgery attack is of greater advantage than the birthday attack. For the rectangle distinguisher, we also give two nearcollision differential paths for 3 and 4Pass HAVAL with probabilities 2^{64.5}, and 2^{95.5} respectively, these being better than that of Kim et. al's. Because NMAC is a generalized version of HMAC and they have similar structure, all of our attacks on HMAC are applicable to NMAC. The main results of our attack are shown in Table 4.In conclusion, this thesis summarizes the large body of new work that we have performed on breaking hash functions such as MD5 and SHA1. Previous to this work, it was not possible to get real collisions on these hash functions. Consequently we have shown that these hash functions are no longer safe to use. This work has been acknowledged in the cryptographic community, and precipitated NIST to launch a new hash function competition [34].
