Hash function
A hash function maps data of arbitrary size to fixed-size values. The values it returns are called hash values, hash codes, digests, or simply hashes. These values are typically used to index a fixed-size table called a hash table. Hashing lets data storage and retrieval applications reach a record in nearly constant time per retrieval, using only fractionally more storage than the records themselves require. It avoids the non-constant access time of ordered and unordered lists and of structured trees, and it avoids the very large storage cost of directly addressing a table that has one entry for every possible value of a large or variable-length key.
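As a minimal sketch of this mapping, the Python below hashes a byte string of any length to a 32-bit value and then reduces that value to a slot number. FNV-1a is used here only as a convenient, well-known non-cryptographic function; the article does not prescribe it, and the names fnv1a_32 and slot_for are illustrative assumptions.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data (an illustrative choice)."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                         # mix in one input byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # multiply, keep the low 32 bits
    return h


def slot_for(key: str, table_size: int) -> int:
    """Reduce the fixed-size hash to an index into a table of table_size slots."""
    return fnv1a_32(key.encode("utf-8")) % table_size


if __name__ == "__main__":
    for key in ("a", "hash", "a much longer key of arbitrary size"):
        print(key, "->", slot_for(key, 16))
```

Whatever the length of the key, the result is always an integer in the range 0 to table_size - 1, and the same key always lands in the same slot.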
When two or more keys map to the same index, a collision occurs and must be resolved. In chained hashing, each slot is the head of a linked list, and colliding items are added to that slot's chain. Chains may be kept in random order and searched linearly, or kept in sorted order to speed up lookups. In open address hashing, the table is probed in a specified sequence, starting from the occupied slot, until an open slot is found. Linear probing, quadratic probing, and double hashing are the common probe sequences. If the entire table has been probed without finding an open slot, the table has overflowed and the item cannot be added.
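The two collision strategies can be sketched as follows. This is a simplified illustration, not a production table: Python lists stand in for linked lists, keys are small integers hashed with key % size, and the names ChainedTable and linear_probe_insert are hypothetical.

```python
from typing import Any, List, Optional, Tuple


class ChainedTable:
    """Chained hashing: every slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int = 8) -> None:
        self.slots: List[List[Tuple[int, Any]]] = [[] for _ in range(size)]

    def put(self, key: int, value: Any) -> None:
        chain = self.slots[key % len(self.slots)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:           # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))        # collision or empty slot: extend chain

    def get(self, key: int) -> Optional[Any]:
        chain = self.slots[key % len(self.slots)]
        for existing, value in chain:     # linear search along the chain
            if existing == key:
                return value
        return None


def linear_probe_insert(table: List[Optional[int]], key: int) -> Optional[int]:
    """Open addressing with linear probing.

    Returns the slot used, or None if every slot was probed and none was open.
    """
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size      # next slot in the probe sequence
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    return None                           # overflow: the whole table was probed


if __name__ == "__main__":
    chained = ChainedTable(size=4)
    chained.put(1, "one")
    chained.put(5, "five")                # 5 % 4 == 1, so it joins the same chain
    print(chained.get(5))

    open_table: List[Optional[int]] = [None] * 4
    print([linear_probe_insert(open_table, k) for k in (0, 4, 8, 3, 12)])
```

In the second demonstration the keys 0, 4, and 8 all hash to slot 0, so they settle in slots 0, 1, and 2; key 3 takes its home slot, and key 12 finds the table full and is not added.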
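Hash functions used for table lookup are designed for speed and uniform distribution, not for resistance to deliberate attack. Cryptographic hash functions are designed so that it is computationally infeasible to find an input that produces a given hash value, or two inputs that produce the same one, which makes them suitable for protecting sensitive data such as passwords stored on servers. In integrity checking, identical files produce identical hash values, so a changed value reveals that a file has been modified. In key derivation, a minor change to the input produces a random-looking change in the output, a property known as diffusion or the avalanche effect. Message authentication codes combine a secret key with the input data so that a recipient can verify that a message is genuine. In password storage, the server keeps only the hash value, which does not reveal the original password.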
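A short sketch of three of these uses, built on Python's standard hashlib and hmac modules. SHA-256 is an assumed choice (the article names no particular algorithm), and the helper names digest, differing_bits, make_tag, and verify_tag are illustrative.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Integrity check: identical inputs always give the same SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")


def make_tag(key: bytes, message: bytes) -> bytes:
    """Message authentication code: HMAC mixes a secret key into the hash."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    print(digest(b"abc"))
    # A one-character change alters a large share of the output bits.
    print(differing_bits(b"hash function", b"hash functioN"))
    tag = make_tag(b"secret key", b"message")
    print(verify_tag(b"secret key", b"message", tag))
    print(verify_tag(b"secret key", b"tampered", tag))
```

For real password storage a plain digest is generally not considered sufficient on its own; dedicated salted key-derivation functions are the usual choice, and this sketch does not cover them.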
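Division hashing takes the key modulo a divisor, usually a prime number close to the table size. Standard multiplicative hashing multiplies the key by a constant that is relatively prime to the modulus (the machine word size, a power of two) and keeps the high-order bits of the product. Fibonacci hashing is multiplicative hashing whose multiplier is approximately 2^w / φ, where w is the word length in bits and φ ≈ 1.618 is the golden ratio; this choice tends to spread consecutive keys uniformly across the table. Zobrist hashing, developed for game-playing programs, assigns a random number to each combination of piece and board square and combines the numbers for the pieces on the board with exclusive or. Algebraic coding treats the key's bits as the coefficients of a polynomial and divides by a fixed polynomial using arithmetic modulo 2 rather than ordinary integer arithmetic, keeping the remainder as the hash. These techniques range from simple bitwise folding to schemes that are sometimes implemented in hardware or microcode.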
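Three of these techniques can be sketched in a few lines each. The code assumes a 32-bit word for Fibonacci hashing (so the multiplier is 2^32 / φ rounded to the odd integer 2654435769), a caller-supplied divisor for division hashing, and a board described as a mapping from square number to piece name for Zobrist hashing. All function names are illustrative.

```python
import random
from typing import Dict, Iterable, Tuple

WORD_BITS = 32
FIB_MULTIPLIER = 2654435769  # 2**32 / golden ratio, rounded; odd, so coprime to 2**32


def division_hash(key: int, divisor: int) -> int:
    """Division hashing: the remainder modulo a (preferably prime) divisor."""
    return key % divisor


def fibonacci_hash(key: int, table_bits: int) -> int:
    """Fibonacci hashing into a table of 2**table_bits slots.

    Multiply, keep the low 32 bits, then take the top table_bits of that word.
    """
    product = (key * FIB_MULTIPLIER) & ((1 << WORD_BITS) - 1)
    return product >> (WORD_BITS - table_bits)


ZobristTable = Dict[Tuple[str, int], int]


def make_zobrist_table(pieces: Iterable[str], squares: int, seed: int = 0) -> ZobristTable:
    """Assign one random 64-bit number to every (piece, square) combination."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return {(p, s): rng.getrandbits(64) for p in pieces for s in range(squares)}


def zobrist_hash(table: ZobristTable, position: Dict[int, str]) -> int:
    """XOR together the numbers for every piece currently on the board."""
    h = 0
    for square, piece in position.items():
        h ^= table[(piece, square)]
    return h


if __name__ == "__main__":
    print(division_hash(1234, 97))
    print([fibonacci_hash(k, 8) for k in (1, 2, 3)])

    table = make_zobrist_table(["K", "Q"], squares=64, seed=1)
    before = zobrist_hash(table, {0: "K", 5: "Q"})
    # Moving the queen from square 5 to 6 needs only two XORs, not a full rehash.
    after = before ^ table[("Q", 5)] ^ table[("Q", 6)]
    print(after == zobrist_hash(table, {0: "K", 6: "Q"}))
```

The last lines show why XOR is a natural combining step for board games: a move can be applied to the existing hash incrementally, by XORing out the old piece-square number and XORing in the new one.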
According to Donald Knuth, who traced the origin of the terminology, Hans Peter Luhn of IBM appears to have been the first to use the concept of a hash function, in a memo dated January 1953. The term itself did not appear in published literature until the late 1960s, in Herbert Hellerman's Digital Computer System Principles. The word offers a natural analogy with its non-technical meaning, to chop up or make a mess of something, which reflects how these functions scramble their input to derive their output.