In several cryptosystems, there are P-boxes and S-boxes. P-boxes are permutations of inputs to outputs. What are S-boxes? I knew that they are static maps used for "confusion" of input to output, that they have something to do with Galois fields, and that small changes in them affect the security properties of a cryptosystem, so they are analyzed by cryptographers in great detail. Their design is a bit of a black box. Even in the Cryptol documentation, they are described as a kind of gift from above.
S-box stands for substitution box. The goal of the S-box is that a small change in the input should produce a large change in the output: an avalanche. This property clearly does not hold for pure transposition, substitution, or Vigenère ciphers, where changing one input letter changes exactly one output letter. Because the property does not hold, these simple cryptosystems can be broken easily by analyzing input/output pairs.
A P-box can be described as a multiplication of the input (treated as a vector of symbols) with a permutation matrix, which is the identity matrix with its rows reordered according to the permutation. So the P-box transform is an invertible linear transform. Can the S-box be another linear transform? Suppose it were linear. One could feed it unit input vectors and read off the matrix of the transform, or use Gaussian elimination on known input/output pairs.
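The unit-vector attack on a linear box can be sketched in a few lines. This is a minimal illustration, not taken from any reference: the names `apply_pbox`, `recover_pbox` and the 8-position `secret` permutation are hypothetical, chosen only for this example.

```python
def apply_pbox(perm, bits):
    """Output position i takes the input bit found at position perm[i]."""
    return [bits[p] for p in perm]


def recover_pbox(oracle, n):
    """Recover a hidden permutation by querying the box with unit vectors."""
    recovered = [None] * n
    for j in range(n):
        unit = [0] * n
        unit[j] = 1                # only input position j is set
        out = oracle(unit)
        i = out.index(1)           # the single 1 shows where input j landed
        recovered[i] = j
    return recovered


# Hypothetical secret permutation, hidden behind an oracle.
secret = [2, 0, 3, 1, 7, 5, 4, 6]
oracle = lambda bits: apply_pbox(secret, bits)

recovered = recover_pbox(oracle, len(secret))
print(recovered == secret)         # n queries are enough to learn the box
```

Because the box is linear over XOR, n queries fully determine it; this is exactly the weakness a non-linear S-box is meant to remove.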
So one would want this to be non-linear. In fact, the AES S-box is a sophisticated permutation of byte values: each input byte maps to a unique output byte in a lookup table. Isn't a permutation linear? A permutation of bit positions (a P-box) is, but a permutation of the 256 byte values need not be. The key is that the bits of an output byte are not linearly related to the bits of the input byte. Looking at the Substitution-Permutation Network design makes it clear that this property will break Gaussian elimination techniques on input/output pairs.
With that, here's a reference which discusses the finite field math and the transform f_affine( g_finite_field_inverse( input_x ) ) which comprises the S-box design of AES: http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf . The design goal is explicitly to maximize the non-linearity and to resist differential attacks, which analyze how differences between inputs propagate to differences between outputs.
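As a rough sketch of f_affine( g_finite_field_inverse( input_x ) ), the following builds the whole S-box table from the field arithmetic. The helper names (`gf_mul`, `gf_inverse`, `affine`, `SBOX`) are my own, the inverse is found by brute-force search for clarity rather than speed, and the affine step is written as XORs of byte rotations with the constant 0x63, which is one common way of expressing the AES affine map.

```python
def gf_mul(a, b):
    """Multiply two bytes as polynomials over GF(2), reduced modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree reached 8: subtract (XOR) the modulus
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse by exhaustive search; 0 is mapped to 0."""
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def rotl8(b, k):
    """Rotate a byte left by k bits."""
    return ((b << k) | (b >> (8 - k))) & 0xFF


def affine(b):
    """Affine map over GF(2): XOR of four rotations of b, plus constant 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [affine(gf_inverse(x)) for x in range(256)]

print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))  # 0x63 0x7c 0xed

# An affine map S would satisfy S(a^b) ^ S(a) ^ S(b) ^ S(0) == 0 for all a, b.
affine_defect = SBOX[1 ^ 2] ^ SBOX[1] ^ SBOX[2] ^ SBOX[0]
print(affine_defect != 0)      # True: the table is not affine in its input bits
```

The non-zero defect at the end is a quick check that the table, although a permutation of byte values, is not an affine function of the input bits.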
The design considers bits as coefficients of a polynomial (1111 = x^3+x^2+x^1+x^0), bytes as polynomials of degree at most 7, and defines an invertible multiplication of two bytes as a map to another byte. An irreducible polynomial over a field is one that cannot be factored over that field. A polynomial presupposes multiplication and addition of elements (multiplication is distinct from addition, and specifically is not repeated addition, since adding an element to itself gives zero). For the finite field of integers modulo a prime, multiplication can be defined using the same modulo operator that defines addition (in GF(7), 3*5 = 15%7 = 1). A polynomial representation of a byte is a polynomial of (max) degree 7 with coefficients 0 or 1. The multiplication that maps (byte * byte -> byte) can be defined by taking the product polynomial (a polynomial of max degree 14) and then reducing it modulo an irreducible polynomial of degree 8 (in analogy with the modulo-prime case above), which gives back a polynomial of degree at most 7, i.e. a byte. Note the modulus used in AES is a specific polynomial of degree 8, m(x), which is not itself a byte. The reduction is done with polynomial long division. For more on GF(2^8) see http://www.cs.utsa.edu/~wagner/laws/FFM.html.
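The byte multiplication described above can be sketched directly as "carry-less product, then polynomial long division". The function names (`clmul`, `poly_mod`, `gf_mul`) are illustrative; the modulus 0x11B is the AES polynomial x^8+x^4+x^3+x+1, and bytes are stored as integers whose bits are the polynomial coefficients.

```python
AES_MODULUS = 0x11B   # x^8 + x^4 + x^3 + x + 1


def clmul(a, b):
    """Carry-less product of two polynomials over GF(2) (XOR instead of add)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def poly_mod(p, m=AES_MODULUS):
    """Remainder of p divided by m, via polynomial long division over GF(2)."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        shift = (p.bit_length() - 1) - deg_m
        p ^= m << shift        # cancel the current leading term of p
    return p


def gf_mul(a, b):
    """byte * byte -> byte in GF(2^8)."""
    return poly_mod(clmul(a, b))


print(0x57 ^ 0x57)               # 0: adding an element to itself gives zero
print(hex(gf_mul(0x57, 0x83)))   # 0xc1
print(hex(gf_mul(0x57, 0x13)))   # 0xfe
```

Real implementations usually interleave the reduction with the shifts instead of building the degree-14 product first; the two-step form here is chosen only because it mirrors the description.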
The irreducibility of m(x) implies that for any two non-zero bytes a(x) and b(x), the GCD of the product p(x)=a(x).b(x) and m(x) is 1, so m(x) does not divide p(x) and the division p(x)/m(x) always leaves a non-zero remainder, hence a non-zero product under the above definition of multiplication. If m(x) were of degree 7, it would itself be a byte, and that byte would multiply every other byte to zero; the remainders would also have degree at most 6, so they could not cover all 8 bits, which is not good. The GCD=1 also implies that each non-zero element has a multiplicative inverse. The m(x) chosen is a pentanomial with non-zero terms (8,4,3,1,0) or 100011011 or 0x11B. This design yields modulus math on bytes. So you can raise a byte to the power of n and get back another (well-scrambled) byte.
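These claims are small enough to check exhaustively. The sketch below assumes the usual shift-and-reduce multiplication, computes the inverse as the 254th power (since a^255 = 1 for every non-zero a in a field of 256 elements), and for contrast tries the reducible degree-8 modulus x^8 (0x100), which is my own choice of counterexample.

```python
def gf_mul(a, b, modulus=0x11B):
    """Shift-and-reduce multiplication of bytes modulo a degree-8 polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= modulus
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a byte to the n-th power by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def gf_inv(a):
    """a^-1 computed as a^254; 0 is mapped to 0."""
    return gf_pow(a, 254)


# Every non-zero byte has an inverse, and no two non-zero bytes multiply to 0.
all_invertible = all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 256))
zero_divisors = [(a, b) for a in range(1, 256) for b in range(1, 256)
                 if gf_mul(a, b) == 0]
print(all_invertible, len(zero_divisors))    # True 0

# With the reducible modulus x^8, x^4 * x^4 collapses to zero.
reducible_product = gf_mul(0x10, 0x10, modulus=0x100)
print(reducible_product)                     # 0
```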
What a cool secret language. Raising to a power is, in general, non-linear. But why choose n=-1?
The choice of n=-1 is described in the Rijndael paper, with a reference to Nyberg's 1994 paper, which carries the line "The author's attention to the mapping x -> x^-1 was drawn by C. Carlet. He observed that the high nonlinearity property (i) was actually proven in the work of Carlitz and Uchiyama." The non-linearity is aimed at defeating linear cryptanalysis, where linear approximations that hold across known input-output pairs are used to guess key bits. The design of the scheme in the Nyberg paper is also aimed at defeating differential cryptanalysis, by flattening out the distribution of output differences for a given input difference, so that it is close to noise.
Yet, if the letter “s” is common, the corresponding S-box letter (from the lookup table) will also be common. This S-box must be only part of an encryption scheme.
The Nyberg paper has the following line: "However, the high nonlinear order of the inversion mapping and .. comes into effect if these mappings are combined with appropriately chosen linear or affine permutations which may vary from round to round and depend on the secret key." A counter paper is On Exact Algebraic [Non-]Immunity of S-boxes Based on Power Function.
A clear explanation of the nonlinearity of a function F, as the minimum Hamming distance between the output vector (truth table) of F and the output vectors of all linear (affine) functions, is found in The Design Of S-Boxes by Jennifer Miuling Cheung (2010) p.13. For two input bits, there are 8 linear functions (0, 1, x1, x2, x1+1, x2+1, x1+x2, x1+x2+1) and 4 input sequences 00, 01, 10, 11; evaluating each of the 8 functions on the 4 input sequences gives 8 output vectors of length 4. But there are 16 vectors of length 4; the linear functions reach only 8 of these, and the remaining 8, unreachable by linear functions, are termed nonlinear.
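The two-bit count is easy to reproduce by enumeration. This is a small sketch with names of my own choosing (`truth_table`, `affine_tables`, `nonlinearity`); it represents each function by its output vector on the inputs 00, 01, 10, 11 and uses AND (x1*x2) as a sample nonlinear function.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))      # 00, 01, 10, 11


def truth_table(f):
    """Output vector of a 2-bit Boolean function over all four inputs."""
    return tuple(f(x1, x2) for x1, x2 in INPUTS)


# All functions of the form a0 + a1*x1 + a2*x2 over GF(2).
affine_tables = {
    truth_table(lambda x1, x2, a0=a0, a1=a1, a2=a2: a0 ^ (a1 & x1) ^ (a2 & x2))
    for a0, a1, a2 in product([0, 1], repeat=3)
}

all_tables = set(product([0, 1], repeat=4))   # every length-4 output vector
nonlinear_tables = all_tables - affine_tables


def hamming(u, v):
    """Number of positions where two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def nonlinearity(table):
    """Minimum Hamming distance from a truth table to any affine function."""
    return min(hamming(table, t) for t in affine_tables)


and_table = truth_table(lambda x1, x2: x1 & x2)
print(len(affine_tables), len(nonlinear_tables))   # 8 8
print(nonlinearity(and_table))                     # 1
```

With only two input bits the best achievable nonlinearity is 1; the same definition applied to each output bit of an 8-bit S-box is what the design tries to maximize.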
This paper discusses constant-time implementations of AES S-boxes. Instead of a lookup table, they rely on doing the actual multiplication in the finite field: "During the offline phase, we precompute values H, X·H, X^2·H, …, X^127·H. Based on this precomputation, multiplication of an element D with H can be computed using a series of xors conditioned on the bits of D"
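The quoted idea can be illustrated on a smaller scale. The quote precomputes 128 multiples, which suggests 128-bit field elements; the sketch below is only an 8-bit analog in the AES byte field, with hypothetical names (`xtime`, `precompute_multiples`, `mul_by_h`). Python gives no real timing guarantees, so this shows the structure of the computation, not an actual constant-time implementation.

```python
def xtime(a):
    """Multiply a byte by X (i.e. by 0x02) in GF(2^8) modulo 0x11B."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def precompute_multiples(h):
    """Offline phase: the table H, X*H, X^2*H, ..., X^7*H."""
    table = []
    for _ in range(8):
        table.append(h)
        h = xtime(h)
    return table


def mul_by_h(table, d):
    """Online phase: XOR together the multiples selected by the bits of d.

    Each bit of d is turned into an all-zeros or all-ones mask, so the same
    sequence of operations runs whatever the value of d is.
    """
    result = 0
    for i, multiple in enumerate(table):
        mask = -((d >> i) & 1) & 0xFF     # 0x00 if bit i is clear, else 0xFF
        result ^= multiple & mask
    return result


table = precompute_multiples(0x57)
print(hex(mul_by_h(table, 0x13)))   # 0xfe
print(hex(mul_by_h(table, 0x83)))   # 0xc1
```

The online phase has no data-dependent branches or table indices; the branch inside `xtime` only runs during the offline precomputation.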