SHA-3

SHA-3 (Keccak)
General
Designers	Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche
First published	2016
Series	(SHA-0), SHA-1, SHA-2, SHA-3
Certification	FIPS PUB 202
Detail
Digest sizes	arbitrary
Structure	sponge construction
Speed	12.6 cpb on a typical x86-64-based machine for Keccak-f[1600] plus XORing 1024 bits, which roughly corresponds to SHA2-256.
Best public cryptanalysis
	Preimage attack on Keccak-512 reduced to 8 rounds, requiring 2^511.5 time and 2^508 memory. Zero-sum distinguishers exist for the full 24-round Keccak-f[1600], though they cannot be used to attack the hash function itself.


SHA-3 (Secure Hash Algorithm 3) is the latest^[4] member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015.^[5]^[6]^[7] Although part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2.

SHA-3 is a subset of the broader cryptographic primitive family Keccak (/ˈkɛtʃæk/ or /ˈkɛtʃɑːk/),^[8]^[9] designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche, building upon RadioGatún. Keccak's authors have proposed additional uses for the function, not (yet) standardized by NIST, including a stream cipher, an authenticated encryption system, a "tree" hashing scheme for faster hashing on certain architectures,^[10]^[11] and AEAD ciphers Keyak and Ketje.^[12]^[13]

Keccak is based on a novel approach called sponge construction.^[14] Sponge construction is based on a wide random function or random permutation, and allows inputting ("absorbing" in sponge terminology) any amount of data, and outputting ("squeezing") any amount of data, while acting as a pseudorandom function with regard to all previous inputs. This leads to great flexibility.

As of 2007, NIST did not plan to withdraw SHA-2 or remove it from the revised Secure Hash Standard. SHA-3 is intended to be directly substitutable for SHA-2 in current applications if necessary, and to significantly improve the robustness of NIST's overall hash algorithm toolkit.^[15]

For small message sizes, the creators of the Keccak algorithms and the SHA-3 functions suggest using the faster function KangarooTwelve with adjusted parameters and a new tree hashing mode without extra overhead.

History

The Keccak algorithm is the work of Guido Bertoni, Joan Daemen (who also co-designed the Rijndael cipher with Vincent Rijmen), Michaël Peeters, and Gilles Van Assche. It is based on the earlier hash function designs PANAMA and RadioGatún. PANAMA was designed by Daemen and Craig Clapp in 1998. RadioGatún, a successor of PANAMA, was designed by Daemen, Peeters, and Van Assche, and was presented at the NIST Hash Workshop in 2006.^[16] The reference implementation source code was dedicated to the public domain via a CC0 waiver.^[17]

In 2006, NIST started to organize the NIST hash function competition to create a new hash standard, SHA-3. SHA-3 is not meant to replace SHA-2, as no significant attack on SHA-2 has been publicly demonstrated. Because of the successful attacks on MD5, SHA-0 and SHA-1,^[18]^[19] NIST perceived a need for an alternative, dissimilar cryptographic hash, which became SHA-3.

After a setup period, submissions were due by the end of 2008. Keccak was accepted as one of the 51 candidates. In July 2009, 14 algorithms were selected for the second round. Keccak advanced to the last round in December 2010.^[20]

During the competition, entrants were permitted to "tweak" their algorithms to address issues that were discovered. Changes that have been made to Keccak are:^[21]^[22]

The number of rounds was increased from 12 + ℓ to 12 + 2ℓ to be more conservative about security.
The message padding was changed from a more complex scheme to the simple 10^*1 pattern described below.
The rate r was increased to the security limit, rather than rounding down to the nearest power of 2.

On October 2, 2012, Keccak was selected as the winner of the competition.^[8]

In 2014, NIST published a draft of FIPS 202, "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions".^[23] FIPS 202 was approved on August 5, 2015.^[24]

On August 5, 2015, NIST announced that SHA-3 had become a hashing standard.^[25]

Weakening controversy

In early 2013 NIST announced they would select different values for the "capacity", the overall strength vs. speed parameter, for the SHA-3 standard, compared to the submission.^[26]^[27] The changes caused some turmoil.

The hash function competition called for hash functions at least as secure as the SHA-2 instances. This means that a d-bit output should have d/2-bit resistance to collision attacks and d-bit resistance to preimage attacks, the maximum achievable for d bits of output. Keccak's security proof allows an adjustable level of security based on a "capacity" c, providing c/2-bit resistance to both collision and preimage attacks. To meet the original competition rules, Keccak's authors proposed c = 2d. The announced change was to accept the same d/2-bit security for all forms of attack and standardize c = d. This would have sped up Keccak by allowing an additional d bits of input to be hashed each iteration. However, the hash functions would no longer have been drop-in replacements with the same preimage resistance as SHA-2; preimage resistance would have been cut in half, leaving it vulnerable to advances in quantum computing, which would effectively cut it in half once more.^[28]
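The trade-off described above can be checked with a little arithmetic. The following is a minimal sketch, assuming the 1600-bit state used by SHA-3 and taking the generic security levels as min(d/2, c/2) for collisions and min(d, c/2) for preimages; the function name `sponge_params` is hypothetical and chosen only for illustration.

```python
B = 1600  # state width in bits (Keccak-f[1600], as used by SHA-3)

def sponge_params(d, c):
    """Return (rate, collision_bits, preimage_bits) for digest size d and capacity c.

    Assumption: generic security is capped both by the output length
    (d/2 for collisions, d for preimages) and by the sponge bound c/2.
    """
    rate = B - c
    collision = min(d // 2, c // 2)
    preimage = min(d, c // 2)
    return rate, collision, preimage

for d in (224, 256, 384, 512):
    original = sponge_params(d, 2 * d)  # c = 2d, as in the submission
    reduced = sponge_params(d, d)       # c = d, the announced change
    print(d, original, reduced)
```

For d = 256, for example, c = 2d gives a rate of 1088 bits with 256-bit preimage resistance, while c = d gives a rate of 1344 bits (d more bits per iteration) but only 128-bit preimage resistance.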

In September 2013, Daniel J. Bernstein suggested on the NIST hash-forum mailing list^[29] strengthening the security to the 576-bit capacity originally proposed as the default Keccak, which had been offered in addition to, and was not part of, the SHA-3 specifications.^[30] This would have provided at least a SHA3-224 and SHA3-256 with the same preimage resistance as their SHA-2 predecessors, but SHA3-384 and SHA3-512 would have had significantly less preimage resistance than their SHA-2 predecessors. In late September, the Keccak team responded by stating that they had proposed 128-bit security by setting c = 256 as an option already in their SHA-3 proposal.^[31] Although the reduced capacity was justifiable in their opinion, in light of the negative response they proposed raising the capacity to c = 512 bits for all instances. This would be as much as any previous standard up to the 256-bit security level, while providing reasonable efficiency,^[32] but not the 384-/512-bit preimage resistance offered by SHA2-384 and SHA2-512. The authors stated that "claiming or relying on security strength levels above 256 bits is meaningless".

In early October 2013, Bruce Schneier criticized NIST's decision on the basis of its possible detrimental effects on the acceptance of the algorithm, saying:

There is too much mistrust in the air. NIST risks publishing an algorithm that no one will trust and no one (except those forced) will use.^[33]

He later retracted his earlier statement, saying:

I misspoke when I wrote that NIST made "internal changes" to the algorithm. That was sloppy of me. The Keccak permutation remains unchanged. What NIST proposed was reducing the hash function's capacity in the name of performance. One of Keccak's nice features is that it's highly tunable.^[33]

Paul Crowley, a cryptographer and senior developer at an independent software development company, expressed his support of the decision, saying that Keccak is supposed to be tunable and there is no reason for different security levels within one primitive. He also added:

Yes, it's a bit of a shame for the competition that they demanded a certain security level for entrants, then went to publish a standard with a different one. But there's nothing that can be done to fix that now, except re-opening the competition. Demanding that they stick to their mistake doesn't improve things for anyone.^[34]

There was some confusion over whether internal changes had been made to Keccak. The original team cleared this up, stating that NIST's proposal for SHA-3 is a subset of the Keccak family, for which one can generate test vectors using the reference code submitted to the contest, and that the proposal was the result of a series of discussions between them and the NIST hash team.^[35]

In response to the controversy, in November 2013 John Kelsey of NIST proposed to go back to the original c = 2d proposal for all SHA-2 drop-in replacement instances.^[36] The reversion was confirmed in subsequent drafts^[37] and in the final release.^[5]

Design

Figure: The sponge construction for hash functions. *P_i* are input blocks, *Z_i* are hashed output blocks. The unused "capacity" c should be twice the desired resistance to collision or preimage attacks.

SHA-3 uses the sponge construction,^[14] in which data is "absorbed" into the sponge, then the result is "squeezed" out. In the absorbing phase, message blocks are XORed into a subset of the state, which is then transformed as a whole using a permutation function $f$. (Calling $f$ a permutation may be confusing. It is technically a permutation of the state space, thus a permutation of a set with $2^{1600}\approx 4.4\cdot 10^{481}$ elements, but it does more than merely permute the bits of the state vector.) In the "squeeze" phase, output blocks are read from the same subset of the state, alternated with the state transformation function $f$. The size of the part of the state that is written and read is called the "rate" (denoted $r$), and the size of the part that is untouched by input/output is called the "capacity" (denoted $c$). The capacity determines the security of the scheme. The maximum security level is half the capacity.

Given an input bit string $N$, a padding function $pad$, a permutation function $f$ that operates on bit blocks of width $b$, a rate $r$ and an output length $d$, we have capacity $c=b-r$ and the sponge construction $Z={\text{sponge}}[f,pad,r](N,d)$, yielding a bit string $Z$ of length $d$, works as follows:^[6]^: 18

pad the input N using the pad function, yielding a padded bit string P with a length divisible by $r$ (such that $n={\text{len}}(P)/r$ is an integer)
break P into n consecutive r-bit pieces P_0, ..., P_{n-1}
initialize the state S to a string of b zero bits
absorb the input into the state: for each block P_i:
- extend P_i at the end by a string of c zero bits, yielding one of length b
- XOR that with S
- apply the block permutation f to the result, yielding a new state S
initialize Z to be the empty string
while the length of Z is less than d:
- append the first r bits of S to Z
- if Z is still less than d bits long, apply f to S, yielding a new state S
truncate Z to d bits
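
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: bit strings are modelled as lists of 0/1 integers, and `toy_f` is a hypothetical stand-in permutation that is not Keccak-f and offers no security. The parameter `b` is passed explicitly here because the toy permutation has no fixed width.

```python
def pad10star1(r, m):
    """Padding bits for an m-bit message at rate r: a 1, j zeros, a final 1."""
    j = (-m - 2) % r
    return [1] + [0] * j + [1]

def sponge(f, pad, r, b, message_bits, d):
    """Generic sponge over lists of bits; f maps b bits to b bits."""
    P = message_bits + pad(r, len(message_bits))   # padded input
    assert len(P) % r == 0
    c = b - r                                      # capacity
    S = [0] * b                                    # all-zero initial state
    # Absorb: extend each r-bit block with c zeros, XOR into S, apply f.
    for start in range(0, len(P), r):
        block = P[start:start + r] + [0] * c
        S = f([s ^ p for s, p in zip(S, block)])
    # Squeeze: emit the first r bits of S, applying f between output blocks.
    Z = []
    while len(Z) < d:
        Z += S[:r]
        if len(Z) < d:
            S = f(S)
    return Z[:d]                                   # truncate to d bits

def toy_f(S):
    """Hypothetical toy permutation (NOT Keccak-f): rotate by 7, flip one bit."""
    rotated = S[7:] + S[:7]
    rotated[0] ^= 1
    return rotated

digest_bits = sponge(toy_f, pad10star1, r=8, b=20, message_bits=[1, 0, 1], d=30)
print(len(digest_bits))
```

Because squeezing only ever appends to Z, a shorter output from the same input is a prefix of a longer one, which is what makes arbitrary output lengths possible.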

The fact that the internal state S contains c additional bits of information in addition to what is output to Z prevents the length extension attacks that SHA-2, SHA-1, MD5 and other hashes based on the Merkle–Damgård construction are susceptible to.

In SHA-3, the state S consists of a 5 × 5 array of w-bit words (with w = 64), b = 5 × 5 × w = 5 × 5 × 64 = 1600 bits total. Keccak is also defined for smaller power-of-2 word sizes w down to 1 bit (total state of 25 bits). Small state sizes can be used to test cryptanalytic attacks, and intermediate state sizes (from w = 8, 200 bits, to w = 32, 800 bits) can be used in practical, lightweight applications.^[12]^[13]

For SHA3-224, SHA3-256, SHA3-384, and SHA3-512 instances, r is greater than d, so there is no need for additional block permutations in the squeezing phase; the leading d bits of the state are the desired hash. However, SHAKE128 and SHAKE256 allow an arbitrary output length, which is useful in applications such as optimal asymmetric encryption padding.
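
As a quick check of the claim that r exceeds d for the fixed-length instances, the sketch below computes r = 1600 − 2d (assuming c = 2d, as in the final standard) and uses Python's standard `hashlib` module to show the difference between fixed and arbitrary output lengths. The helper name `fixed_rates` is hypothetical.

```python
import hashlib

B = 1600  # state width in bits

def fixed_rates():
    """Rate r = b - c with c = 2d for each fixed-length SHA-3 instance."""
    return {d: B - 2 * d for d in (224, 256, 384, 512)}

constructors = {
    224: hashlib.sha3_224,
    256: hashlib.sha3_256,
    384: hashlib.sha3_384,
    512: hashlib.sha3_512,
}

for d, r in fixed_rates().items():
    digest = constructors[d](b"abc").digest()
    # A single squeeze suffices because r > d.
    print(f"SHA3-{d}: r = {r} bits, digest = {len(digest) * 8} bits")

# SHAKE takes the output length (in bytes) as an argument to digest().
short = hashlib.shake_128(b"abc").digest(16)
longer = hashlib.shake_128(b"abc").digest(64)
print(longer[:16] == short)  # shorter output is a prefix of the longer one
```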

Padding

To ensure the message can be evenly divided into r-bit blocks, padding is required. SHA-3 uses the pattern 10^*1 in its padding function: a 1 bit, followed by zero or more 0 bits (maximum $r - 1$) and a final 1 bit.

The maximum of $r - 1$ zero bits occurs when the last message block is $r - 1$ bits long. Then another block is added after the initial 1 bit, containing $r - 1$ zero bits before the final 1 bit.

The two 1 bits will be added even if the length of the message is already divisible by r.^[6]^: 5.1 In this case, another block is added to the message, containing a 1 bit, followed by a block of $r - 2$ zero bits and another 1 bit. This is necessary so that a message with length divisible by r ending in something that looks like padding does not produce the same hash as the message with those bits removed.

The initial 1 bit is required so messages differing only in a few additional 0 bits at the end do not produce the same hash.

The position of the final 1 bit indicates which rate r was used (multi-rate padding), which is required for the security proof to work for different hash variants. Without it, different hash variants of the same short message would be the same up to truncation.
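
The padding cases described above can be reproduced with a short sketch. It models only the 10^*1 pattern itself, returned as a string of '0' and '1' characters for readability; the function name `pad10star1` and the small rate r = 8 in the examples are assumptions made for illustration.

```python
def pad10star1(r, m):
    """Return the 10*1 padding for an m-bit message at rate r.

    The number of zero bits j is the smallest non-negative integer
    such that m + 1 + j + 1 is a multiple of r.
    """
    j = (-m - 2) % r
    return "1" + "0" * j + "1"

r = 8
print(pad10star1(r, 6))  # two bits fit exactly: no zero bits
print(pad10star1(r, 7))  # r - 1 zero bits, spilling into an extra block
print(pad10star1(r, 8))  # length already divisible by r: a full extra block
```

With r = 8, a 7-bit final block receives the maximum of r − 1 = 7 zero bits, and a message whose length is already a multiple of r receives a whole block containing r − 2 = 6 zero bits.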

The block permutation

The block transformation f, which is Keccak-f[1600] for SHA-3, is a permutation that uses XOR, AND and NOT operations, and is designed for easy implementation in both software and hardware.

It is defined for any power-of-two word size, $w = 2^{ℓ}$ bits. The main SHA-3 submission uses 64-bit words, $ℓ = 6$.
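
To make the relationship between ℓ, the word size, and the state width concrete, the sketch below tabulates w = 2^ℓ and b = 25w for ℓ = 0 to 6, together with the 12 + 2ℓ round count mentioned in the list of competition tweaks. The function name `keccak_f_variants` is hypothetical.

```python
def keccak_f_variants():
    """Return (l, w, b, rounds) tuples for l = 0..6."""
    rows = []
    for l in range(7):
        w = 2 ** l             # word size in bits
        b = 25 * w             # state width: 5 x 5 words of w bits
        rounds = 12 + 2 * l    # round count after the competition tweak
        rows.append((l, w, b, rounds))
    return rows

for l, w, b, rounds in keccak_f_variants():
    print(f"l = {l}: w = {w:2d}, Keccak-f[{b}], {rounds} rounds")
```

For ℓ = 6 this gives Keccak-f[1600] with 24 rounds, the permutation used by SHA-3.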

The state can be considered to be a $5 \times 5 \times w$ array of bits. Let $a [i][j][k]$ be bit $(5 i + j) \times w + k$ of the input, using a little-endian bit numbering convention and row-major indexing. I.e. $i$ selects the row, $j$ the column, and $k$ the bit.

Index arithmetic is performed modulo 5 for the first two dimensions and modulo w for the third.
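
The indexing convention can be illustrated with a small helper. This is a sketch under one additional assumption not stated above: the state is held as a `bytes`-like object in which bit n is bit n mod 8 (least significant first) of byte n // 8. The names `bit_index` and `get_bit` are hypothetical.

```python
W = 64  # word size for Keccak-f[1600]

def bit_index(i, j, k, w=W):
    """Flat position of a[i][j][k]; indices wrap modulo 5, 5 and w."""
    return (5 * (i % 5) + (j % 5)) * w + (k % w)

def get_bit(state, i, j, k, w=W):
    """Read a[i][j][k] from a bytes-like state (assumed little-endian bit order)."""
    n = bit_index(i, j, k, w)
    return (state[n // 8] >> (n % 8)) & 1

state = bytearray(200)   # 1600 bits, all zero
state[8] = 0x01          # set flat bit 64, i.e. a[0][1][0]
print(bit_index(0, 1, 0), get_bit(state, 0, 1, 0))
print(bit_index(0, -1, 0))  # column index -1 wraps around to column 4
```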

The basic block permutation function consists of $12 + 2 ℓ$ rounds of five steps:

$θ$ (theta)

Compute the parity of each of the $5w$ (320, when $w = 64$) 5-bit columns, and exclusive-or that into two nearby columns in a regular pattern. To be precise, $a[i][j][k] \leftarrow a[i][j][k] \oplus \operatorname{parity}(a[0 \ldots 4][j-1][k]) \oplus \operatorname{parity}(a[0 \ldots 4][j+1][k-1])$
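
A word-oriented sketch of this step follows. It assumes the state is stored as a 5 × 5 list of 64-bit integers `A[i][j]` in which bit k of the integer is a[i][j][k]; under that assumption, shifting the bit index by k − 1 corresponds to rotating the word left by one position. The names `rotl` and `theta` are chosen for illustration.

```python
W = 64
MASK = (1 << W) - 1

def rotl(x, n):
    """Rotate a W-bit word left by n positions (bit k moves to bit k + n)."""
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK

def theta(A):
    """Apply theta to a 5x5 list of W-bit words; returns a new state."""
    # C[j] holds, for every bit position k at once, the parity of column j.
    C = [A[0][j] ^ A[1][j] ^ A[2][j] ^ A[3][j] ^ A[4][j] for j in range(5)]
    # D[j] combines parity(column j-1, bit k) with parity(column j+1, bit k-1).
    D = [C[(j - 1) % 5] ^ rotl(C[(j + 1) % 5], 1) for j in range(5)]
    return [[A[i][j] ^ D[j] for j in range(5)] for i in range(5)]

A = [[0] * 5 for _ in range(5)]
A[0][0] = 1                  # a single set bit: a[0][0][0]
for row in theta(A):
    print(row)
```

A single set bit in column 0 spreads to every word in column 1 (same bit position) and column 4 (next bit position), which shows how quickly this step diffuses changes across the state.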

$ρ$ (rho)

Bitwise rotate each of the 25 words by a different triangular number 0, 1, 3, 6, 10, 15, .... To be precise, $a[0][0]$ is not rotated, and for all $0 \leq t < 24$, $a[i][j][k] \leftarrow a[i][j][k - (t+1)(t+2)/2]$, where ${\begin{pmatrix}i\\j\end{pmatrix}}={\begin{pmatrix}3&2\\1&0\end{pmatrix}}^{t}{\begin{pmatrix}0\\1\end{pmatrix}}$.
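
The rotation amounts can be generated directly from this recurrence. The sketch below assumes the same 5 × 5 list-of-words layout `A[i][j]` with bit k of each integer equal to a[i][j][k], so that replacing bit k by bit k − n is a left rotation by n; the names `rho_offsets` and `rho` are hypothetical.

```python
def rho_offsets(w=64):
    """Rotation amount for each word, reduced modulo the word size w."""
    offsets = [[0] * 5 for _ in range(5)]        # a[0][0] is not rotated
    i, j = 0, 1                                  # (i, j) for t = 0
    for t in range(24):
        offsets[i][j] = ((t + 1) * (t + 2) // 2) % w
        i, j = (3 * i + 2 * j) % 5, i            # multiply by [[3, 2], [1, 0]]
    return offsets

def rho(A, w=64):
    """Apply rho to a 5x5 list of w-bit words; returns a new state."""
    mask = (1 << w) - 1
    off = rho_offsets(w)

    def rotl(x, n):
        return x if n == 0 else ((x << n) | (x >> (w - n))) & mask

    return [[rotl(A[i][j], off[i][j]) for j in range(5)] for i in range(5)]

for row in rho_offsets():
    print(row)
```

The 24 iterations visit every position other than (0, 0) exactly once, so each of the 25 words receives its own rotation amount.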
