HASH FUNCTIONS: THEORY, ATTACKS AND APPLICATIONS.

6 min readApr 13, 2020

What is Hashing?

A hash is a value computed from a base input number using a hashing function.

Shortly, the hash value is a summary of the original data. For instance, think of a paper document that you keep crumpling to a point where you aren’t even able to read its content anymore. It’s almost (in theory) impossible to restore the original input without knowing what the starting data was.

Let’s look at an example of a hashing algorithm:

Hashing Algorithm

A hashing algorithm is a cryptographic hash function. It is a mathematical algorithm that maps data of arbitrary size to a hash of a fixed size. It’s designed to be a one-way function, infeasible to invert. However, in recent years several hashing algorithms have been compromised. This happened to MD5, for example – a widely known hash function designed to be a cryptographic hash function, which is now so easy to reverse – that we could only use for verifying data against unintentional corruption.

It’s easy to figure out what the ideal cryptographic hash function should be like:

It should be fast to compute the hash value for any kind of data.

2. It should be impossible to regenerate a message from its hash value (brute force attack as the only option).

3. It should avoid hash collisions, each message has its own hash.

4. Every change to a message, even the smallest one, should change the hash value. It should be completely different. It’s called the avalanche effect.

What do we use it for?

Cryptographic hash functions are used notably in IT. We can use them for digital signatures, message authentication codes (MACs), and other forms of authentication. We can also use them for indexing data in hash tables, for fingerprinting, identifying files, detecting duplicates or as checksums (we can detect if a sent file didn’t suffer accidental or intentional data corruption).

Popular Hashing Algorithms

1. MD5

It’s one of the most widely known. It was used for many years and is still widely used – but despite initially being designed to be used as a cryptographic algorithm function, due to extensive vulnerabilities, it has been compromised.

We already know that the secure hashing algorithm can’t allow collisions, and in MD5 it’s fairly easy to manipulate a document by injecting malicious code while still getting the same hash! One of the things that killed MD5 was its popularity. It was used so much, that the best tool for cracking MD5 hashes is now… Google. Typing the hash in the search box, you’ll receive its before-state within milliseconds!

Now let’s look at this example:

You could think you are secure if your passwords are stored as MD5 hashes, but if somebody gets access to your database, he/she can just type the hash to Google and get its real value!

2. SHA-family

SHA-1 (1995) produces a 160-bit (20-byte) hash value. It’s typically rendered as a 40 digits long hexadecimal number. SHA-1 was built on principles similar to those used in the design of the MD4 and MD5.

Safer, for now, is SHA-2. SHA-2 includes several important changes. Its family has six hash functions with digests: SHA-224, SHA-256 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.

As a bottom line, SHA-2 is a lot more complicated and is still considered safe. However, SHA-2 shares the same structure and mathematical operations as its predecessor (SHA-1) – so it’s likely that it will be compromised in the near future. As so, a new option for the future is SHA-3.

One of SHA-3's requirements was to be resilient to potential attacks that could compromise SHA-2.

Generic Attacks

A generic attack is an attack that applies to all hash functions, no matter how good they are, as opposed to specific attacks that exploit flaws of a particular design. The running time of generic attacks is measured in the number of calls to the hash function, which is treated as a black box.

It is not difficult to see that in the black box model the best strategy for inverting the hash function and for finding a second preimage is the exhaustive search. In the black-box model the problem of finding a second preimage is just as hard as inverting the hash function.

Finding collisions is a different story, the one that goes under the name of the “birthday paradox.” The chances that among 23 randomly chosen people there are two who share the same birthday are almost 51%. The better than even odds appear to be much higher than the intuition would suggest, hence the name. Of course, there is nothing paradoxical (as in being absurd or self-contradictory) about the fact – between 23 people there are 253 pairs, each of which has a one-to-365 odds of being a hit. The reason why the result appears counter-intuitive is because your chances of finding among 22 people somebody with the same birthday as yours (not just someone else’s) to split the cost of the birthday party are strongly against you. This explains why inverting a hash function is much more difficult than finding a collision.

Applications of Hashing

In this article we will be discussing of applications of hashing.

Hashing provides constant time search, insert and delete operations on average. This is why hashing is one of the most used data structure, example problems are, distinct elements, counting frequencies of items, finding duplicates, etc.

There are many other applications of hashing, including modern day cryptography hash functions. Some of these applications are listed below:

• Message Digest

• Password Verification

• Data Structures(Programming Languages)

• Compiler Operation

• Rabin-Karp Algorithm

Linking File name and path together

Message Digest:

This is an application of cryptographic Hash Functions. Cryptographic hash functions are the functions which produce an output from which reaching the input is close to impossible. This property of hash functions is called irreversibility.

2. Password Verification:

Cryptographic hash functions are very commonly used in password verification.

Let’s understand this using an Example:

When you use any online website which requires a user login, you enter your E-mail and password to authenticate that the account you are trying to use belongs to you. When the password is entered, a hash of the password is computed which is then sent to the server for verification of the password. The passwords stored on the server are actually computed hash values of the original passwords. This is done to ensure that when the password is sent from client to server, no sniffing is there.

3. Data Structures(Programming Languages):

Various programming languages have hash table based Data Structures. The basic idea is to create a key-value pair where key is supposed to be a unique value, whereas value can be same for different keys. This implementation is seen in unordered_set & unordered_map in C++, HashSet & HashMap in java, dict in python etc.

4. Compiler Operation:

The keywords of a programming language are processed differently than other identifiers. To differentiate between the keywords of a programming language(if, else, for, return etc.) and other identifiers and to successfully compile the program, the compiler stores all these keywords in a set which is implemented using a hash table.

5. Rabin-Karp Algorithm:

One of the most famous applications of hashing is the Rabin-Karp algorithm. This is basically a string-searching algorithm which uses hashing to find any one set of patterns in a string. A practical application of this algorithm is detecting plagiarism.

6. Linking File name and path together:

When moving through files on our local system, we observe two very crucial components of a file i.e. file_name and file_path. In order to store the correspondence between file_name and file_path the system uses a map(file_name, file_path)which is implemented using a hash table.

HASH FUNCTIONS: THEORY, ATTACKS AND APPLICATIONS.

Written by Viraj Mutha