From: Maaartin on
On Jul 27, 4:22 pm, Tran Ngoc Duong <tranngocdu...(a)gmail.com> wrote:
> A TeX version will come later. When we write in TeX, we must surely
> take the opportunity to re-typeset it with standard math symbols
> (rather than C language ones). For now I can only offer a  pure-ASCII
> version, for which we've decided because we thought there aren't many
> math stuffs in it. Thanks.

Whatever.... I refuse to read your first post because of the
adacrypted line breaks. Just put somewhere a simple ASCII file without
mangled lines, it would be a great progress.
From: Francois Grieu on
[reposted with line wrap hopefully fixed - no endorsement implied]

THE BLOCK CIPHER NSABC
(Public domain)

Alice Nguyenova-Stepanikova

Tran Ngoc Duong <tranngocduong at gmail dot com> (corresponding author)

27th July, 2010


Abstract. We introduce NSABC/w (Near-Skipjack Algebraic Block Cipher
operating on w-bit word arithmetics), a 4w-bit analogous of Skipjack
[NSA98] with 5w-bit key. The Skipjack's inner 4-round Feistel structure is
replaced with a w-bit, 2-round cascade of the binary operation
(x,z) |-> (2*x*z + (1-2*e)*(x-z+e)) <<< (w/2), partially encrypting a data
word x under control of a key word z and a round-dependent constant word e,
allowing efficient decryption. Furthermore we introduce an optional secret
4w-bit "tweak", i.e. a randomization parameter, to facilitate applications
such as disk encryption and hashing, that is in turn derrived from the
block index and a 4w-bit [additional] key. With w=64, on a modern 64-bit
processor, encryption takes about 5 clock cycles per byte and, when tweaking
is unused, only 4.4 clock cycles per byte.




1. Introduction

In the today's world full of cryptographic algorithms, one may cast question:
"What should makes a block cipher attractive to cryptographers?"

It is the authors' view that the answer is one word: elegance. If something
looks elegant, there is a good chance that it is also good.

An elegant specification makes it easier to memorize. Memorizability makes it
easier to analyze, thus allows for fruitful cryptanalytic results. Results make
greater confidence in the algorithm.

The elegance comprises the following features:

-- Few algebraic operations. Using of many operations such as tables of
constants, S-boxes, data-dependent transpositions, data-dependent operator
selections etc results in "wild", hardly-tractable and possibly undesirable
interactions between operations.

-- Simple and regular key schedule. A complex key schedule, such as bit
transpositions, linear pseudo-random number generators, non-linear operations
etc, which effectively adds another, complicated and un-related, function to the
cipher, results in hardly-tractable and possibly undesirable interactions
between the functions.

IDEA, a secure block cipher designed by Xuejia Lai and James L. Massey [LM91],
besides being elegant, is outstanding for other features:

-- The use of "incompatible" operations. The incompatibility helps to exclude
any [undesired] algebraic property and makes it impossible to solve the cipher
algebraically.

-- The use of multiplication. Multiplication is more powerful than addition and
thus greatly contributes to the security of the algorithm.

However, IDEA uses multiplication modulo p, where p is a prime of the
form 2**w
+ 1 which does not exist for w=32 or w=64, making it not extendable to larger
machine word lengths. Furthermore, the key schedule uses a rotation on the
[primary] key, making it irregular as some key bits are re-used much more slowly
than others.

Skipjack, a secure block cipher designed by cryptographers of the U.S. National
Security Agency [NSA98], is another example of elegant design. Furthermore
Skipjack is outstanding for the following feature:

-- The use of two ciphers, an *outer* (wrapper) cipher, and an *inner* (core)
cipher. As explained by the authors of MARS [IBM98], a block cipher with similar
design principle, this two-layer structure breaks any iterative property, makes
any iterative characteristic impossible, and complements eventual weakness(s) in
either layer by the other one, thus making attacks more difficult. The wrapper
is designed for fast diffusion and the core for strong confusion. Here
*diffusion* refers to the process of making one data or key bit to affect many
data bits in very different ways, *confusion* refers to the process of making
one data bit to be affected by many data or key bits in very different ways. The
terms are first used by Claude E. Shannon in his pioneer work [Shannon49].

However, Skipjack uses S-boxes, making it slow, hard to implement in a secure
manner, and not extendable to larger machine word lengths, as such.

This article describes an atempt to combine the ideas learned from IDEA and
Skipjack, where Skipjack is used as a structural framework, into a scalable and
tweakable block cipher called NSABC (Near-Skipjack Algebraic Block Cipher).

In cipher design there is always a trade-off between security and efficiency,
and designers always have to ask "What the users want, a very strong and
relatively fast cipher, or relatively strong but very fast?"

If Skipjack is regarded as very secure but not very fast, then NSABC may be
regarded as an atempt to re-use Skipjack with stress on the second aspect: make
it very fast, but probably not very secure. For 64-bit word length, NSABC is
designed to use a 320-bit key and an additional 256-bit "tweak" key, but the
true level of security is yet to be determined. On the other hand, it takes only
5 clock cycles per byte (and only 4.4 clock cycles per byte with disabled
tweaking) on a modern 64-bit processor.

NSABC is not patented and is available for public use. As it re-uses the
[overall] structure of Skipjack, excepting some [cosmetical] word re-numbering,
eventual users should be aware of Skipjack patent(s) which may be held by the
U.S. Government. We are not aware of eventual patent(s) related to other parts
of the design.

The rest of the article is organized as follows. Section 2 defines operations
over machine words, mainly a multiplication-based group operation over w-bit
numbers. Section 3 specifies the algorithm. Section 4 suggests an efficient
implementation. Section 5 concludes the article.




2. Operation over w-bit words

Let +, -, *, ~, ^ denote two-complement arithmetic/logic operations with the
conventional (C programming languague) meanings.

Let's define binary operations "." and "o" as follows.

x . y = 2*x*y + x + y

x o y = 2*x*y + x - y

Note that the bivariate polynomials x.y and x o y are permutation polynomials on
either variable for every fixed value of the other variable [Rivest99]. In other
words, "." and "o" are quasi-group operations.

Furthermore, it is easy to see that "." is an abelian group operation over the
set of w-bit numbers. (Here and later, uncited facts are from the authors'
unpublished work, but they are trivial so the proofs are omitted.) The group
is isomorphic to the multiplicative group of odd integers modulo 2**(w +1). The
isomorphism is

x |-> 2*x + 1 (mod 2**(w+1))

The unit of the group is 0.

The inverse element of x, denoted x', is

x' = - x/(2*x + 1)

where "/" denotes the inverse operation of "*", i.e. modular multiplication
by the inverse of the denominator, which indeed exists for every x.

The following relations hold.

(~x) . y = ~(x . y)

(1-x) o y = 1 - (x o y)

x . y = -((-x) o y)

x o y = -((-x) . y)

x . a = b if and only if x = b . a'

x o a = b if and only if x = b o a'

Let e is an arbitrary fixed w-bit integer. Let's define trivariate polynomials
p and q as follows.

p(x,y,e) = (x-e).(y-e) + e

= 2*x*y + (1 - 2*e)*(x + y - e)

q(x,y,e) = -p(-x,y,e)

= 2*x*y + (1 - 2*e)*(x - y + e)

Then p and q are permutation polynomials on either variable while keeping the
others fixed [Rivest99]. Thus the binary operations "(.)" and "(o)" defined by

x (.) y = p(x,y,e)

x (o) y = q(x,y,e)

are quasigroup operations. Note that the previously defined operations "." and
"o" are instances of "(.)" and "(o)", respectively, with e = 0:

x . y = p(x,y,0)

x o y = q(x,y,0)

Furthermore, "(.)" is an abelian group operation over the set of w-bit numbers
that is isomorphic to the group "." defined above. The isomorphism is

x |-> x - e

The unit of this group is e. The inverse element of x, denoted x('), is

x(') = (x - e)' + e

= ((2*e - 1)*x - 2*e*(e-1)) / (2*(x-e) + 1)

The following relations hold.

x (.) y = -((-x) (o) y)

x (o) y = -((-x) (.) y)

x (.) y = (x-e) . (y-e) + e

x (o) y = (x+e) o (y-e) - e

x (.) a = b if and only if x = b (.) a(')

x (o) a = b if and only if x = b (o) a(')

The operation "(o)" will be used for both encryption and decryption thanks to
the following relation.

(x (o) z) (o) z(') = x




3. Specification

This section provides details of NSABC/w, where w is even.

Throughout this section, X denotes a 4w-bit plaintext block, Y a 4w-bit
ciphertext block, Z a 5w-bit key, T a 4w-bit secret *tweak*, i.e., a value that
is used to encrypt only one block under the key, and E a 64w-bit public *bias*,
i.e. a constant value that will be added to / subtracted from the intermediate
data words and key words before and after every multiplication.

NSABC is specified in terms of two functions:

ENCRYPT(X,Z,T,E), which encrypts X under control of Z, T and E,

DECRYPT(Y,Z,T,E), which decrypts Y under control of Z, T and E.

Relation:
DECRYPT( ENCRYPT(X,Z,T,E),Z,T,E ) = X

Tweaking is optional and can be disabled by letting T constantly zero. With
disabled tweaking, NSABC becomes a conventional, non-tweakable, block cipher.

E is declared as a parameter, abeit a constant with well-defined value in
this specification, to indicate the possibility of e.g. personalization, i.e.
making other, "non-standard", variants of the algorithm.




3.1. Word and byte order

A multi-component vector is written as a string in the usual comma-separated
notation, i.e. the leftmost component being the first one. For example,
(x0,x1,x2).

A multi-digit number is written as a string in the usual notation, i.e. the
rightmost digit being the least significant one. For example, x2 x1 x0.

The same data entity is taken sometimes as a number and sometimes as a vector.
To switch between these meanings, memory addressing is a common terminology
framework. Vectors are stored in low-first order, i.e. the first component is
stored at lowest address. Numbers are stored in memory in low-first order, i.e.
the least significant digit is stored at lowest address. Thus for example,
if V = (x0,x1,x2) and N = x2 x1 x0 then V = N.

The terms "component" and "digit" above both refer to [machine] words. For
multi-byte words we assume little-endian byte order. Thus, a "standard"
representative of NSABC is fully determined by a single parameter: the word
length w.




3.2. Operations on word strings

Let R denote the transposition that reverses the word order of a word string.
For example, for 8-bit words, 0x0123AB R = 0xAB2301.

Let S denote the transposition that swaps the high and low halves of every word
of a word string. For example, for 8-bit words, 0x0123AB S = 0x1032BA.

Let ' denote the word-wise application of the inverse operator "'" on a word
string. For example,

(a,b,c)' = (a',b',c').

The following relations hold.

xRS = xSR

xR' = x'R

Unless otherwise specified, operators "+" and "-" applied on multi-word strings
denote word-wise modular addition and subtraction, respectively, i.e. operands
are regarded as vectors of words. For example,

(a0,a1,a2) + (b0,b1,b2) = (a0+b0, a1+b1, a2+b2)

The following relations hold.

(x + y)R = xR + yR

(x - y)R = xR - yR




3.3. Functions ENCRYPT

This function is specified in terms of three functions: DE, a data encryption
function; KE, a key expansion function; and TE, a tweak expansion function.

Input:
X...4w-bit plaintext block
Z...5w-bit key
T...4w-bit tweak
E...64w-bit bias

Output:
Y...4w-bit ciphertext block

Relation:
ENCRYPT(X,Z,T,E) = DE(X, KE(Z), TE(T), E)




3.4. Functions DE, KE and TE

These functions operate on a 4-word data register (x0,x1,x2,x3) and use a
5-word key register (z0,z1,z2,z3,z4) and a 4-word tweak register (t0,t1,t2,t3).
The key register is initially loaded with the key Z, in low-first order. The
tweak register is initially loaded with the tweak T, in low-first order. The
data register is initially loaded with the plaintext block X in low-first order,
and after encryption it contains the ciphertext block Y in that order.

These functions consist of 32 rounds of operations. In each round, a key-
dependent permutation G applies on the word x0, an exclusive-or (XOR) operation
"mixes" x0 and t0 into another word which is either x1 or x3 (see further).
There are two types of rounds: A and B. In an A-typed round, the mixing takes
place after G and targets x1; in a B-typed round, mixing takes place before G
and targets x3. In each round, the pair of key words (z0,z3) enter G-box in that
order.

At the end of every round there is a rotation on the data register that
circularly exchanges value of the four data words, and other rotations that do
similarly for the key register and the tweak register.

Encryption runs in four passes: 8 rounds of type A, then 8 rounds of type B,
then 8 rounds of type A again, then 8 rounds of type B again.



KE (key expansion)

Input:
Z...5w-bit key

Output:
K...64w-bit expanded key

Pseudo-code:
(z0,z1,z2,z3,z4) := Z
for k := 0, 1, 2,..., 31 loop
K(2*k) := z0
K(2*k+1) := z3
(z0,z1,z2,z3,z4) := (z1,z2,z3,z4,z0)
end loop


TE (tweak expansion)

Input:
T...4w-bit tweak

Output:
L...32w-bit expanded tweak

Pseudo-code:
(t0,t1,t2,t3) := T
for k := 0, 1, 2,..., 31 loop
L(k) := t0
(t0,t1,t2,t3) := (t1,t2,t3,t0)
end loop


DE (data encryption)

Input:
X...4w-bit plaintext block
K...64w-bit expanded key
L...32w-bit expanded tweak

Output:
Y...4w-bit ciphertext block

Pseudo-code:
(x0,x1,x2,x3) := X
for k := 0, 1, 2,..., 31 loop
if 0 <= k <= 7 or 16 <= k <= 23 then
x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) )
x1 := x1 ^ x0 ^ L(k)
elsif 8 <= k <= 15 or 24 <= k <= 31 then
x3 := x3 ^ x0 ^ L(k)
x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) )
end if
(x0,x1,x2,x3) := (x1,x2,x3,x0)
end loop
Y := (x0,x1,x2,x3)

Relations:
(x0(0),x1(0),x2(0),x3(0)) = X
(z0(0),z1(0),z2(0),z3(0),z4(0)) = Z
(t0(0),t1(0),t2(0),t3(0)) = T
(E(0),E(1),...,E(63)) = E
Y = (x0(32),x1(32),x2(32),x3(32))
for 0 <= k <= 7 or 0 <= k <= 23:
x0(k+1) = x1(k) ^ g(k) ^ L(k)
x1(k+1) = x2(k)
X2(k+1) = x3(k)
x3(k+1) = g(k)
for 8 <= k <= 15 or 24 <= k <= 31:
x0(k+1) = x1(k)
x1(k+1) = x2(k)
x2(k+1) = x3(k) ^ x0(k) ^ L(k)
x3(k+1) = g(k)
for 0 <= k <= 31:
g(k) = G(x0(k), (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)))
K(2*k) = z0(k)
K(2*k+1) = z3(k)
z0(k+1) = z1(k)
z1(k+1) = z2(k)
z2(k+1) = z3(k)
z3(k+1) = z4(k)
z4(k+1) = z0(k)
L(k) = t0(k)
t0(k+1) = t1(k)
t1(k+1) = t2(k)
t2(k+1) = t3(k)
t3(k+1) = t0(k)




3.5. Function G

Function G, which takes as arguments a w-bit intermediate plaintext word x and a
pair of w-bit subkey words (k0,k1) to return a w-bit intermediate ciphertext
word, is a 2-round cascade of the quasi-group operation "(o)", each followed
by a swap of half-words. The first "(o)" uses unit e0, the second "(o)" uses
unit e1.

In the relations below, the quasi-group operation is written in terms of the
trivariate polynomial q, rather than "(o)", to specify the unit element.

Relations:
G(x,(k0,k1),(e0,e1)) = q( q(x,k0,e0)S, k1, e1 )S




3.6. Function DECRYPT

DECRYPT can be easily derrived from the (implicit) relation with ENCRYPT. Here
it is given explitly in terms of functions DE, KE and TE to emphasize similarity
between encryption and decryption. Unlike Skipjack, DECRYPT cannot be expressed
explicitly in terms of ENCRYPT. That's why the cipher is termed "near Skipjack".

Relations:
DE( DE(X, KE(Z), TE(T), E)RS, (KE(ZR) - ER)' + ER, TE(TRS), ER )RS = X
DECRYPT(Y,Z,T,E) = DE( YRS, (KE(ZR) - ER)' + ER, TE(TRS), ER )RS




3.7. The bias E

The 64w-bit constant E, which consist of units of the group under the operation
"(.)" in function G, and which is used to bias the data words and key words
from zero, may be used to personalize the algorithm. Every choice of E makes
a unique tweakable block cipher. As a "standard" choice, we propose the first
64 numbers (modulo 2**w) of the Fibonacci sequence.

Relations:
E(0) = 1
E(1) = 2
E(k) = E(k-1) + E(k-2) for k = 2,3,4,...,63.




3.8. Tweak counter mode

The 4w-bit secret tweak T is used to encrypt only one block [under a fixed key
Z]. To encrypt multiple blocks the tweak is derrived from a 4w-bit [additional]
key, called *tweak key*, and the block index i, as follows.

Let T(i) denote the tweak used to encrypt i-th block. For zero-th block, the
tweak key is used as the tweak directly:

T(0) = tweak key

Successive tweak is computed from the last-used tweak by the recurrent relation:

T(i) = T(i-1) + 2*T(0) + 1

or, explicitly,

T(i) = T(0) . i

where all operators, including ".", are defined on 4w-bit modular arithmetics
and all operands, including 1 and i, are regarded as 4w-bit numbers.




4. Notes on implementation

(1) The polynomial q in function G can be evaluated using only one
multiplication and one addition, with a modified key schedule. Indeed,

q(x,z,e) = M*x + N

where x is a data word, z key word, e the unit of the [quasi-]group in
use, and

M = 2*(z - e) + 1

N = (2*e - 1)*(z - e)

is a pair of subkeys precomputed in the [modified] key schedule, which consists
of 64 such pairs.

(2) The cipher is partially parallelizable. The following procedure executes
all 32 rounds in 20 steps, of which half performing two or three parallel
evaluations of the function G. Recall that g(k) is the result of G in round k.

1. Compute g(0)
2. Compute g(1)
3. Compute g(2)
4. Compute g(3)
5. Compute g(4)
6. Compute g(5), g(11) in parallel
7. Compute g(6), g(9) in parallel
8. Compute g(7), g(10), g(13) in parallel
9. Compute g(8), g(14) in parallel
10. Compute g(12), g(15) in parallel
11. Compute g(16)
12. Compute g(17)
13. Compute g(18)
14. Compute g(19)
15. Compute g(20)
16. Compute g(21), g(27) in parallel
17. Compute g(22), g(25) in parallel
18. Compute g(23), g(26), g(29) in parallel
19. Compute g(24), g(30) in parallel
20. Compute g(28), g(31) in parallel

On a modern 64-bit processor, each step should take about 8 clock cycles on
average. So, for w=64 i.e. 32-byte blocks it achieves 20*8/32 = 5 clock cycles
per byte encrypted. If tweak counter mode is not used we may disable tweaking
completely to save about 1 clock cycle per byte per step, resulting in 20*7/32
= 4.4 clock cycles per byte.




5. Conclusion

We introduced NSABC, a Skipjack-like block cipher that is defined for every
even word length, allowing scalability in block length and key length.

NSABC is meant to be simple. It uses no hard-to-memorize S-boxes. It makes use
only two word-oriented algebraic operations that are mutually incompatible: "^"
and "(o)". It re-uses the overall structure of Skipjack that has been made public
for over a decade -- sufficient to become truly understood. The simplicity makes
it easier to analyze, which in turn makes greater confidence in the algorithm.

NSABC uses multiplication which is much more powerful than addition and should
greatly contribute to the security of the algorithm.

NSABC bases on some valuable design of a well-reputed agency in the branch. We
therefore believe that it is worth analysis and it can withstand rigorous
analysis. If this happens to be true, then we may have a strong cipher with
256-bit blocks, allowing to encrypt enormous amount of data under the same key,
and with 320-bit keys, allowing to keep data secret over any imaginable time.

NSABC is fast enough to be comparable with any modern block cipher.




6. References

[IBM98] Burwick C., Coppersmith D., D'Avignon E., Gennaro G., Halevi S.,
Jutla C., Matyas S. M. Jr., O'Connor L., Peyravian M., Safford D.,
Zunic N.: MARS - a candidate cipher for AES. 1998.

[LM91] Lai X., Massey J. L., Murphy S.: Markov Ciphers and Differential
Cryptanalysis. 1991.

[NSA98] National Security Agency: Skipjack and KEA specification. 1998.

[Rivest99] Rivest, R. L.: Permutation polynomials modulo 2**w. 1999.

[Shannon49] Shannon C. E.: Communication Theory of Secrecy Systems. 1949.