— CH. 1 · DEFINING CORE PROPERTIES —

Cryptographic hash function

A cryptographic hash function takes an input of arbitrary length and produces a fixed-length output, called a digest. To be considered secure, it must withstand all known types of cryptanalytic attack; in theoretical cryptography, its security level is defined by three properties. The first is pre-image resistance: given a hash value, it should be difficult to find any input that hashes to it. The second is second pre-image resistance: given one message, it should be difficult to find a different message with the same hash. The third is collision resistance: it should be infeasible to find any two distinct messages with identical hashes. Together, these properties ensure that a malicious adversary cannot replace or modify data without changing its digest, so if two strings share the same digest, one can be very confident that they are identical. Collision resistance implies second pre-image resistance, but it does not imply pre-image resistance.
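
The fixed output length, and why collision resistance depends on it, can be illustrated with Python's standard `hashlib`. This is a minimal sketch: `truncated_hash` and `find_collision` are hypothetical helpers that keep only the first 3 bytes (24 bits) of SHA-256, a deliberately weakened toy, so a birthday-style search finds a collision after a few thousand tries. For an n-bit output, generic collision search takes on the order of 2^(n/2) attempts, which is why the same search against the full 256-bit digest is far out of reach.

```python
import hashlib
import itertools


def sha256_hex(data: bytes) -> str:
    """Full SHA-256 digest as hex: always 64 hex characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()


def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """Toy hash: the first n_bytes of SHA-256 (24 bits by default). Deliberately weak."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes]:
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen: dict[bytes, bytes] = {}
    for i in itertools.count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    raise RuntimeError("unreachable")


# Inputs of very different lengths still give fixed-length digests.
for msg in [b"", b"a", b"a" * 10_000]:
    print(len(msg), len(sha256_hex(msg)))  # second number is always 64

a, b = find_collision()
print(a, b, truncated_hash(a).hex())
```

The two colliding messages share a 24-bit prefix but still have completely different full SHA-256 digests.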
Ronald Rivest designed MD5 in 1991 to replace an earlier hash function, MD4; it was specified in 1992 as RFC 1321 and produces a 128-bit digest. SHA-0, which originated in the U.S. Government's Capstone project, was published in 1993 under FIPS PUB 180 and withdrawn by the NSA shortly afterwards. Its revised version, SHA-1, followed in 1995 with a 160-bit output. In 2001 NIST published SHA-2, a family that includes SHA-256 and SHA-512 and is built on the Merkle-Damgård structure. On the 5th of August 2015, NIST published SHA-3, which is based on the Keccak algorithm developed by Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche. BLAKE2 had been announced earlier, on the 21st of December 2012, by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein. Jack O'Connor and his team announced BLAKE3 on the 9th of January 2020; it reduces the number of rounds from ten to seven and uses an internal tree structure for higher parallelism.
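
The digest sizes quoted above can be checked against Python's `hashlib`, which ships MD5, SHA-1, SHA-2, SHA-3, and BLAKE2 (BLAKE3 is not in the standard library, so it is omitted). This is a minimal sketch; `digest_bits` is a hypothetical helper, `blake2b` and `blake2s` report their default (maximum) sizes because BLAKE2 also allows shorter outputs, and some FIPS-restricted Python builds may refuse to construct MD5.

```python
import hashlib

# Standard-library names for algorithms mentioned in the history above.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b", "blake2s"]


def digest_bits(name: str) -> int:
    """Output size in bits of a hashlib algorithm (digest_size is in bytes)."""
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:10} {digest_bits(name)} bits")
```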
Collisions against MD5 can now be computed within seconds, which makes it unsuitable for most cryptographic uses. In August 2004, Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu reported collisions in several popular hash functions, including MD5 and RIPEMD. That same month, on the 12th of August 2004, Joux et al. announced a collision for the full SHA-0, found in about 80,000 CPU hours on a supercomputer with 256 Itanium 2 processors. These results called into question the security of stronger algorithms derived from the broken functions, such as SHA-1 (a strengthened SHA-0) and RIPEMD-128 and RIPEMD-160 (strengthened versions of RIPEMD). An attack reported in February 2005 could find SHA-1 collisions in roughly 2^69 operations instead of the 2^80 expected for a 160-bit hash. In 2008, a practical attack broke MD5 as used in certificates for Transport Layer Security (TLS), and in February 2017 Google announced a practical SHA-1 collision, known as SHAttered.
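
The "expected 2^80" figure for SHA-1 comes from the generic birthday bound: for an n-bit output, a collision is expected after about 2^(n/2) evaluations. A small sketch of that arithmetic in Python; the helper names are hypothetical, and the probability formula is the standard approximation for random outputs rather than an exact count.

```python
import math


def generic_collision_log2(output_bits: int) -> float:
    """log2 of the generic (birthday-bound) collision cost: about 2**(n/2) evaluations."""
    return output_bits / 2


def collision_probability(k: int, output_bits: int) -> float:
    """Approximate chance that k random n-bit digests contain a collision:
    p ~= 1 - exp(-k*(k-1) / 2**(n+1))."""
    return -math.expm1(-k * (k - 1) / 2 ** (output_bits + 1))


sha1_generic = generic_collision_log2(160)  # 80.0, the "expected 2^80"
attack_2005 = 69                            # exponent reported for the February 2005 attack
print(f"SHA-1 generic cost: 2^{sha1_generic:.0f}; 2005 attack: 2^{attack_2005}")
print(f"work reduction: {2 ** (sha1_generic - attack_2005):.0f}x")  # 2^11 = 2048x
# Hashing 2^80 random messages gives roughly a 39% chance of a SHA-1 collision.
print(f"{collision_probability(2 ** 80, 160):.2f}")  # 0.39
```

In other words, the 2005 result cut the generic work by a factor of about 2,048, without yet making a collision practical.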
Message integrity can be verified by comparing hash digests computed before and after transmission, since any change to the message changes its digest. For this reason, websites often publish MD5 or SHA-1 digests so that users can verify downloaded files, including files retrieved through file sharing systems. Almost all digital signature schemes compute a cryptographic hash over the message, so the signature operation works on a small, fixed-size digest rather than on the full message. Password verification relies on storing only a digest of each password rather than the cleartext, so that a compromised password file does not expose every user's password; key derivation functions such as PBKDF2, scrypt, or Argon2 deliberately slow down brute-force attacks on these stored hashes. Proof-of-work systems use partial hash inversions to deter denial-of-service attacks and spam. Bitcoin mining applies the same mechanism: miners earn rewards by finding messages whose hash begins with a required number of zero bits. Source code management tools such as Git and Mercurial use SHA-1 digests to uniquely identify file content and directory trees.
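
Several of these uses can be sketched with Python's standard library (`hashlib`, `hmac`, `os`). This is a minimal, illustrative sketch: every function name is hypothetical, SHA-256 stands in for the MD5 or SHA-1 digests mentioned for downloads, the PBKDF2 iteration count is an assumed example value rather than a recommendation from the sources, and the proof-of-work loop is a simplified Hashcash-style search, not Bitcoin's actual block-header format or difficulty target. The Git helper follows Git's blob object format (`blob <size>\0<content>` hashed with SHA-1).

```python
import hashlib
import hmac
import os
from typing import Optional


def file_digest_matches(data: bytes, published_hex: str) -> bool:
    """Integrity check: recompute SHA-256 and compare it with a published digest."""
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), published_hex.lower())


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Salted PBKDF2-HMAC-SHA256. The iteration count is an assumed example value."""
    salt = os.urandom(16) if salt is None else salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, stored: bytes,
                    iterations: int = 600_000) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, stored)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def proof_of_work(header: bytes, difficulty_bits: int) -> int:
    """Hashcash-style search for a nonce whose SHA-256 starts with difficulty_bits zeros."""
    nonce = 0
    while leading_zero_bits(
            hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()) < difficulty_bits:
        nonce += 1
    return nonce


def git_blob_id(content: bytes) -> str:
    """Git object ID of a blob: SHA-1 over b'blob <size>' + NUL byte + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

`hmac.compare_digest` is used for both comparisons so that the check does not leak timing information about how many leading characters matched. Each extra bit of proof-of-work difficulty doubles the expected number of nonces the search must try, while verifying a submitted nonce still takes a single hash.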
Most classical hash functions, including SHA-1 and MD5, use the Merkle-Damgård construction to process inputs of arbitrary length. The input is split into equal-sized blocks, which are processed in sequence by a one-way compression function, each step combining the next block with the result of the previous one. The last block must be unambiguously padded with the message length; this padding is crucial to the security of the construction. Narrow-pipe designs, in which the output size equals the internal state size, suffer from inherent weaknesses such as length-extension and multicollision attacks. Many newer functions therefore use wide-pipe designs with a larger internal state, ranging from tweaks of the Merkle-Damgård construction to entirely new designs such as the sponge construction. Keccak, selected as SHA-3, is built on a cryptographic sponge rather than on the block-cipher-like components used in earlier algorithms. BLAKE3 is internally a Merkle tree, which allows a higher degree of parallelism than its predecessor BLAKE2. Hash functions can also be built from block ciphers, using methods that resemble the block-cipher modes of operation used for encryption; block ciphers designed for this purpose typically use large keys and must resist related-key attacks.
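
The Merkle-Damgård chaining, its length padding, and the length-extension weakness of a narrow-pipe design can be seen in a toy Python sketch. Everything here is an assumption for illustration: `toy_compress` is a hypothetical compression function made by truncating SHA-256 (not a real design), the 8-byte state and all-zero IV are arbitrary, and only the padding layout (0x80, zero bytes, 64-bit big-endian bit length, 64-byte blocks) mirrors what SHA-256 does.

```python
import hashlib

BLOCK_SIZE = 64         # bytes per block (SHA-256 also uses 64-byte blocks)
STATE_SIZE = 8          # toy 64-bit chaining value; output = full state (narrow pipe)
IV = bytes(STATE_SIZE)  # hypothetical all-zero initialization vector


def md_pad(message: bytes) -> bytes:
    """Length padding: 0x80, zero bytes, then the 64-bit big-endian bit length,
    so the padded length is a multiple of BLOCK_SIZE."""
    bit_len = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += b"\x00" * ((BLOCK_SIZE - 8 - len(padded)) % BLOCK_SIZE)
    return padded + bit_len


def toy_compress(state: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in compression function, for illustration only."""
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def toy_md_hash(message: bytes) -> bytes:
    """Process the padded message block by block, chaining the state."""
    state = IV
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])
    return state


def extend(known_digest: bytes, known_len: int, suffix: bytes) -> bytes:
    """Length extension: hash of (m + padding(m) + suffix) from toy_md_hash(m) and len(m) only."""
    glue_len = len(md_pad(b"\x00" * known_len))  # length of m plus its padding
    # The extended message's own padding depends only on its length, so dummy bytes work.
    tail = md_pad(b"\x00" * glue_len + suffix)[glue_len:]
    state = known_digest
    for i in range(0, len(tail), BLOCK_SIZE):
        state = toy_compress(state, tail[i:i + BLOCK_SIZE])
    return state
```

Because the output is the entire final chaining value, anyone who knows `toy_md_hash(m)` and the length of `m` can keep chaining from it: `extend` produces the hash of `m`, its padding, and an attacker-chosen suffix without ever seeing `m`. A wide-pipe design that outputs only part of a larger internal state does not hand the attacker the full state in this way.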

Common questions

What are the three main security requirements of a cryptographic hash function?

A cryptographic hash function must provide pre-image resistance, second pre-image resistance, and collision resistance. These properties ensure that malicious adversaries cannot replace or modify data without changing its digest.

When did Ronald Rivest design MD5, and what output size does it produce?

Ronald Rivest designed MD5 in 1991 to replace an earlier function called MD4. It produces a 128-bit digest and was specified as RFC 1321 in 1992.

Who developed SHA-3, and when did NIST publish it?

NIST published SHA-3 on the 5th of August 2015. It is based on the Keccak algorithm developed by Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche.

Why is MD5 considered unsuitable for most cryptographic uses after 2004?

Collisions against MD5 can be calculated within seconds, rendering it unsuitable for most cryptographic uses. In August 2004, researchers found collisions in several popular hash functions, including MD5 and RIPEMD.

How do Bitcoin mining systems use hash functions to unlock rewards?

Bitcoin mining is a proof-of-work system built on partial hash inversions, the same mechanism used to deter denial-of-service attacks and spam on networks. Miners search for messages whose hash begins with a required number of zero bits in order to unlock rewards.

All sources

25 references cited across the entry

1. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/key-usage-in-integrated-firmware-images.html
2. https://www.codecademy.com/resources/blog/what-is-hashing
3. https://elections.ny.gov/testing-and-review-process
4. NIST. "message digest".
5. Bruce Schneier. "Cryptanalysis of MD5 and SHA: Time for a New Standard".
6. Thai Duong et al. "Flickr's API Signature Forgery Vulnerability".
7. Chad Perrin. "Use MD5 hashes to verify software downloads". December 5, 2007.
8. "File Hashing".
9. Stefan Lucks. "Design Principles for Iterated Hash Functions". 2004.
10. Eli Biham et al. "A Framework for Iterative Hash Functions – HAIFA". August 24, 2006.
11. Christoph Dobraunig et al. "Security Evaluation of SHA-224, SHA-512/224, and SHA-512/256". February 2015.
12. Hal Finney. "More Problems with Hash Functions". August 20, 2004.
13. Andrew Regenscheid, Ray Perlner, Shu-Jen Chang, John Kelsey, Mridul Nandi, Souradyuti Paul. "Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition".
14. Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu. "Collisions for Hash Functions MD4, MD5, HAVAL-128, and RIPEMD".
15. Imad Fakhri Alshaikhli et al. "Cryptographic Hash Function". IGI Global, 2015.
16. Xiaoyun Wang, Yiqun Lisa Yin, Hongbo Yu. "Finding Collisions in the Full SHA-1".
17. Bruce Schneier. "Cryptanalysis of SHA-1". February 18, 2005.
18. Thomas Brewster. "Google Just 'Shattered' An Old Crypto Algorithm – Here's Why That's Big For Web Security". February 23, 2017.
19. Shai Halevi. "Randomized Hashing and Digital Signatures".
20. A. Sotirov. "MD5 considered harmful today: Creating a rogue CA certificate". Department of Mathematics and Computer Science of Eindhoven University of Technology, December 30, 2008.
21. Dan Swinhoe. "The 15 biggest data breaches of the 21st century". CSO Magazine, April 17, 2020.
22. Dan Goodin. "25-GPU cluster cracks every standard Windows password in <6 hours". Ars Technica, December 10, 2012.
23. Thomas Claburn. "Use an 8-char Windows NTLM password? Don't. Every single one can be cracked in under 2.5hrs". February 14, 2019.
24. Improsec. "Mind-blowing development in GPU performance". January 3, 2020.
25. Paul A. Grassi. "SP 800-63B-3 – Digital Identity Guidelines, Authentication and Lifecycle Management". NIST, June 2017.