Menezes, van Oorschot, Vanstone: Handbook of Applied Cryptography (CRC Press, 1997)

Chapter 1 Overview of Cryptography

Contents in Brief

1.1 Introduction
1.2 Information security and cryptography
1.3 Background on functions
1.4 Basic terminology and concepts
1.5 Symmetric-key encryption
1.6 Digital signatures
1.7 Authentication and identification
1.8 Public-key cryptography
1.9 Hash functions
1.10 Protocols and mechanisms
1.11 Key establishment, management, and certification
1.12 Pseudorandom numbers and sequences
1.13 Classes of attacks and security models
1.14 Notes and further references

1.1 Introduction

Cryptography has a long and fascinating history. The most complete non-technical account of the subject is Kahn's The Codebreakers. This book traces cryptography from its initial and limited use by the Egyptians some 4000 years ago, to the twentieth century where it played a crucial role in the outcome of both world wars. Completed in 1963, Kahn's book covers those aspects of the history which were most significant (up to that time) to the development of the subject. The predominant practitioners of the art were those associated with the military, the diplomatic service and government in general. Cryptography was used as a tool to protect national secrets and strategies.

The proliferation of computers and communications systems in the 1960s brought with it a demand from the private sector for means to protect information in digital form and to provide security services. Beginning with the work of Feistel at IBM in the early 1970s and culminating in 1977 with its adoption as a U.S. Federal Information Processing Standard for encrypting unclassified information, DES, the Data Encryption Standard, is the most well-known cryptographic mechanism in history. It remains the standard means for securing electronic commerce for many financial institutions around the world.

The most striking development in the history of cryptography came in 1976 when Diffie and Hellman published New Directions in Cryptography. This paper introduced the revolutionary concept of public-key cryptography and also provided a new and ingenious method for key exchange, the security of which is based on the intractability of the discrete logarithm problem. Although the authors had no practical realization of a public-key encryption scheme at the time, the idea was clear and it generated extensive interest and activity in the cryptographic community. In 1978 Rivest, Shamir, and Adleman discovered the first practical public-key encryption and signature scheme, now referred to as RSA. The RSA scheme is based on another hard mathematical problem, the intractability of factoring large integers. This application of a hard mathematical problem to cryptography revitalized efforts to find more efficient methods to factor. The 1980s saw major advances in this area but none which rendered the RSA system insecure. Another class of powerful and practical public-key schemes was found by ElGamal in 1985. These are also based on the discrete logarithm problem.

One of the most significant contributions provided by public-key cryptography is the digital signature. In 1991 the first international standard for digital signatures (ISO/IEC 9796) was adopted. It is based on the RSA public-key scheme. In 1994 the U.S. Government adopted the Digital Signature Standard, a mechanism based on the ElGamal public-key scheme. The search for new public-key schemes, improvements to existing cryptographic mechanisms, and proofs of security continues at a rapid pace.

Various standards and infrastructures involving cryptography are being put in place. Security products are being developed to address the security needs of an information-intensive society.

The purpose of this book is to give an up-to-date treatise of the principles, techniques, and algorithms of interest in cryptographic practice. Emphasis has been placed on those aspects which are most practical and applied. The reader will be made aware of the basic issues and pointed to specific related research in the literature where more in-depth discussions can be found. Due to the volume of material which is covered, most results will be stated without proofs. This also serves the purpose of not obscuring the very applied nature of the subject. This book is intended for both implementers and researchers. It describes algorithms, systems, and their interactions.

Chapter 1 is a tutorial on the many and various aspects of cryptography. It does not attempt to convey all of the details and subtleties inherent to the subject. Its purpose is to introduce the basic issues and principles and to point the reader to appropriate chapters in the book for more comprehensive treatments. Specific techniques are avoided in this chapter.

1.2 Information security and cryptography

The concept of information will be taken to be an understood quantity. To introduce cryptography, an understanding of issues related to information security in general is necessary. Information security manifests itself in many ways according to the situation and requirement. Regardless of who is involved, to one degree or another, all parties to a transaction must have confidence that certain objectives associated with information security have been met. Some of these objectives are listed in Table 1.1.

Table 1.1: Some information security objectives
privacy or confidentiality: keeping information secret from all but those who are authorized to see it.
data integrity: ensuring information has not been altered by unauthorized or unknown means.
entity authentication or identification: corroboration of the identity of an entity (e.g., a person, a computer terminal, a credit card, etc.).
message authentication: corroborating the source of information; also known as data origin authentication.
signature: a means to bind information to an entity.
authorization: conveyance, to another entity, of official sanction to do or be something.
validation: a means to provide timeliness of authorization to use or manipulate information or resources.
access control: restricting access to resources to privileged entities.
certification: endorsement of information by a trusted entity.
timestamping: recording the time of creation or existence of information.
witnessing: verifying the creation or existence of information by an entity other than the creator.
receipt: acknowledgement that information has been received.
confirmation: acknowledgement that services have been provided.
ownership: a means to provide an entity with the legal right to use or transfer a resource to others.
anonymity: concealing the identity of an entity involved in some process.
non-repudiation: preventing the denial of previous commitments or actions.
revocation: retraction of certification or authorization.

Over the centuries, an elaborate set of protocols and mechanisms has been created to deal with information security issues when the information is conveyed by physical documents. Often the objectives of information security cannot be achieved through mathematical algorithms and protocols alone, but require procedural techniques and abidance of laws to achieve the desired result. For example, privacy of letters is provided by sealed envelopes delivered by an accepted mail service. The physical security of the envelope is, for practical necessity, limited, and so laws are enacted which make it a criminal offense to open mail for which one is not authorized.

It is sometimes the case that security is achieved not through the information itself but through the physical document recording it. For example, paper currency requires special inks and material to prevent counterfeiting.

Conceptually, the way information is recorded has not changed dramatically over time. Whereas information was typically stored and transmitted on paper, much of it now resides on magnetic media and is transmitted via telecommunications systems, some wireless. What has changed dramatically is the ability to copy and alter information. One can make thousands of identical copies of a piece of information stored electronically and each is indistinguishable from the original. With information on paper, this is much more difficult. What is needed then for a society where information is mostly stored and transmitted in electronic form is a means to ensure information security which is independent of the physical medium recording or conveying it and such that the objectives of information security rely solely on digital information itself.

One of the fundamental tools used in information security is the signature. It is a building block for many other services such as non-repudiation, data origin authentication, identification, and witnessing, to mention a few. Having learned the basics in writing, an individual is taught how to produce a handwritten signature for the purpose of identification. At contract age the signature evolves to become an integral part of the person's identity. This signature is intended to be unique to the individual and serve as a means to identify, authorize, and validate. With electronic information the concept of a signature needs to be redressed; it cannot simply be something unique to the signer and independent of the information signed. Electronic replication of it is so simple that appending a signature to a document not signed by the originator of the signature is almost a triviality.

Analogues of the "paper protocols" currently in use are required. Hopefully these new electronic-based protocols are at least as good as those they replace. There is a unique opportunity for society to introduce new and more efficient ways of ensuring information security. Much can be learned from the evolution of the paper-based system, mimicking those aspects which have served us well and removing the inefficiencies. Achieving information security in an electronic society requires a vast array of technical and legal skills. There is, however, no guarantee that all of the information security objectives deemed necessary can be adequately met. The technical means is provided through cryptography.

1.1 Definition Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication, and data origin authentication.

Cryptography is not the only means of providing information security, but rather one set of techniques.

Cryptographic goals

Of all the information security objectives listed in Table 1.1, the following four form a framework upon which the others will be derived: (1) privacy or confidentiality (§1.5, §1.8); (2) data integrity (§1.9); (3) authentication (§1.7); and (4) non-repudiation (§1.6).

1. Confidentiality is a service used to keep the content of information from all but those authorized to have it. Secrecy is a term synonymous with confidentiality and privacy. There are numerous approaches to providing confidentiality, ranging from physical protection to mathematical algorithms which render data unintelligible.

2. Data integrity is a service which addresses the unauthorized alteration of data. To assure data integrity, one must have the ability to detect data manipulation by unauthorized parties. Data manipulation includes such things as insertion, deletion, and substitution.

3. Authentication is a service related to identification. This function applies to both entities and information itself. Two parties entering into a communication should identify each other. Information delivered over a channel should be authenticated as to origin, date of origin, data content, time sent, etc. For these reasons this aspect of cryptography is usually subdivided into two major classes: entity authentication and data origin authentication. Data origin authentication implicitly provides data integrity (for if a message is modified, the source has changed).

4. Non-repudiation is a service which prevents an entity from denying previous commitments or actions. When disputes arise due to an entity denying that certain actions were taken, a means to resolve the situation is necessary. For example, one entity may authorize the purchase of property by another entity and later deny such authorization was granted. A procedure involving a trusted third party is needed to resolve the dispute.

A fundamental goal of cryptography is to adequately address these four areas in both theory and practice. Cryptography is about the prevention and detection of cheating and other malicious activities.

This book describes a number of basic cryptographic tools (primitives) used to provide information security. Examples of primitives include encryption schemes (§1.5 and §1.8), hash functions (§1.9), and digital signature schemes (§1.6). Figure 1.1 provides a schematic listing of the primitives considered and how they relate. Many of these will be briefly introduced in this chapter, with detailed discussion left to later chapters.

Figure 1.1: A taxonomy of cryptographic primitives.
- Unkeyed primitives: arbitrary length hash functions; one-way permutations; random sequences.
- Symmetric-key primitives: symmetric-key ciphers (block ciphers, stream ciphers); arbitrary length hash functions (MACs); signatures; pseudorandom sequences; identification primitives.
- Public-key primitives: public-key ciphers; signatures; identification primitives.

These primitives should be evaluated with respect to various criteria such as:

1. level of security. This is usually difficult to quantify. Often it is given in terms of the number of operations required (using the best methods currently known) to defeat the intended objective. Typically the level of security is defined by an upper bound on the amount of work necessary to defeat the objective. This is sometimes called the work factor (see §1.13.4).

2. functionality. Primitives will need to be combined to meet various information security objectives. Which primitives are most effective for a given objective will be determined by the basic properties of the primitives.

3. methods of operation. Primitives, when applied in various ways and with various inputs, will typically exhibit different characteristics; thus, one primitive could provide very different functionality depending on its mode of operation or usage.

4. performance. This refers to the efficiency of a primitive in a particular mode of operation. (For example, an encryption algorithm may be rated by the number of bits per second which it can encrypt.)

5. ease of implementation. This refers to the difficulty of realizing the primitive in a practical instantiation. This might include the complexity of implementing the primitive in either a software or hardware environment.

The relative importance of various criteria is very much dependent on the application and resources available. For example, in an environment where computing power is limited one may have to trade off a very high level of security for better performance of the system as a whole.

Cryptography, over the ages, has been an art practised by many who have devised ad hoc techniques to meet some of the information security requirements. The last twenty years have been a period of transition as the discipline moved from an art to a science. There are now several international scientific conferences devoted exclusively to cryptography and also an international scientific organization, the International Association for Cryptologic Research (IACR), aimed at fostering research in the area.

This book is about cryptography: the theory, the practice, and the standards.

1.3 Background on functions

While this book is not a treatise on abstract mathematics, a familiarity with basic mathematical concepts will prove to be useful. One concept which is absolutely fundamental to cryptography is that of a function in the mathematical sense. A function is alternately referred to as a mapping or a transformation.

1.3.1 Functions (1-1, one-way, trapdoor one-way)

A set consists of distinct objects which are called elements of the set. For example, a set X might consist of the elements a, b, c, and this is denoted X = {a, b, c}.

1.2 Definition A function is defined by two sets X and Y and a rule f which assigns to each element in X precisely one element in Y. The set X is called the domain of the function and Y the codomain. If x is an element of X (usually written x ∈ X) the image of x is the element in Y which the rule f associates with x; the image y of x is denoted by y = f(x). Standard notation for a function f from set X to set Y is f : X → Y. If y ∈ Y, then a preimage of y is an element x ∈ X for which f(x) = y. The set of all elements in Y which have at least one preimage is called the image of f, denoted Im(f).

1.3 Example (function) Consider the sets X = {a, b, c}, Y = {1, 2, 3, 4}, and the rule f from X to Y defined as f(a) = 2, f(b) = 4, f(c) = 1. Figure 1.2 shows a schematic of the sets X, Y and the function f. The preimage of the element 2 is a. The image of f is {1, 2, 4}. □

Thinking of a function in terms of the schematic (sometimes called a functional diagram) given in Figure 1.2, each element in the domain X has precisely one arrowed line originating from it. Each element in the codomain Y can have any number of arrowed lines incident to it (including zero lines).

Figure 1.2: A function f from a set X of three elements to a set Y of four elements.

Often only the domain X and the rule f are given and the codomain is assumed to be the image of f. This point is illustrated with two examples.

1.4 Example (function) Take X = {1, 2, 3, ..., 10} and let f be the rule that for each x ∈ X, f(x) = r_x, where r_x is the remainder when x^2 is divided by 11. Explicitly then f(1) = 1, f(2) = 4, f(3) = 9, f(4) = 5, f(5) = 3, f(6) = 3, f(7) = 5, f(8) = 9, f(9) = 4, f(10) = 1. The image of f is the set Y = {1, 3, 4, 5, 9}. □

1.5 Example (function) Take X = {1, 2, 3, ..., 10^50} and let f be the rule f(x) = r_x, where r_x is the remainder when x^2 is divided by 10^50 + 1 for all x ∈ X. Here it is not feasible to write down f explicitly as in Example 1.4, but nonetheless the function is completely specified by the domain and the mathematical description of the rule f. □
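Both examples can be checked mechanically. A minimal Python sketch (the names are illustrative, not from the text) tabulates Example 1.4 and evaluates the rule of Example 1.5 at a single point, which stays easy even though tabulating the whole function is infeasible:

```python
# Example 1.4: f(x) = x^2 mod 11 on X = {1, ..., 10}.
f = {x: pow(x, 2, 11) for x in range(1, 11)}
print(f)                        # {1: 1, 2: 4, 3: 9, 4: 5, 5: 3, 6: 3, 7: 5, 8: 9, 9: 4, 10: 1}
print(sorted(set(f.values())))  # the image Im(f): [1, 3, 4, 5, 9]

# Example 1.5: same rule with modulus 10^50 + 1; writing f down is infeasible,
# but evaluating it at any single point is immediate.
n = 10**50 + 1
print(pow(123456789, 2, n))
```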

(i) 1-1 functions

1.6 Definition A function (or transformation) is 1-1 (one-to-one) if each element in the codomain Y is the image of at most one element in the domain X.

1.7 Definition A function (or transformation) is onto if each element in the codomain Y is the image of at least one element in the domain. Equivalently, a function f : X → Y is onto if Im(f) = Y.

1.8 Definition If a function f : X → Y is 1-1 and Im(f) = Y, then f is called a bijection.

1.9 Fact If f : X → Y is 1-1 then f : X → Im(f) is a bijection. In particular, if f : X → Y is 1-1, and X and Y are finite sets of the same size, then f is a bijection.

In terms of the schematic representation, if f is a bijection, then each element in Y has exactly one arrowed line incident with it. The functions described in Examples 1.3 and 1.4 are not bijections. In Example 1.3 the element 3 is not the image of any element in the domain. In Example 1.4 each element in the codomain has two preimages.

1.10 Definition If f is a bijection from X to Y then it is a simple matter to define a bijection g from Y to X as follows: for each y ∈ Y define g(y) = x where x ∈ X and f(x) = y. This function g obtained from f is called the inverse function of f and is denoted by g = f⁻¹.

Figure 1.3: A bijection f and its inverse g = f⁻¹.

1.11 Example (inverse function) Let X = {a, b, c, d, e} and Y = {1, 2, 3, 4, 5}, and consider the rule f given by the arrowed edges in Figure 1.3. f is a bijection and its inverse g is formed simply by reversing the arrows on the edges. The domain of g is Y and the codomain is X. □

Note that if f is a bijection, then so is f⁻¹. In cryptography bijections are used as the tool for encrypting messages and the inverse transformations are used to decrypt. This will be made clearer in §1.4 when some basic terminology is introduced. Notice that if the transformations were not bijections then it would not be possible to always decrypt to a unique message.

(ii) One-way functions

There are certain types of functions which play significant roles in cryptography. At the expense of rigor, an intuitive definition of a one-way function is given.

1.12 Definition A function f from a set X to a set Y is called a one-way function if f(x) is "easy" to compute for all x ∈ X but for "essentially all" elements y ∈ Im(f) it is "computationally infeasible" to find any x ∈ X such that f(x) = y.

1.13 Note (clarification of terms in Definition 1.12)
(i) A rigorous definition of the terms "easy" and "computationally infeasible" is necessary but would detract from the simple idea that is being conveyed. For the purpose of this chapter, the intuitive meaning will suffice.
(ii) The phrase "for essentially all elements in Y" refers to the fact that there are a few values y ∈ Y for which it is easy to find an x ∈ X such that y = f(x). For example, one may compute y = f(x) for a small number of x values and then for these, the inverse is known by table look-up. An alternate way to describe this property of a one-way function is the following: for a random y ∈ Im(f) it is computationally infeasible to find any x ∈ X such that f(x) = y.

The concept of a one-way function is illustrated through the following examples.

1.14 Example (one-way function) Take X = {1, 2, 3, ..., 16} and define f(x) = r_x for all x ∈ X, where r_x is the remainder when 3^x is divided by 17. Explicitly:

    x    :  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    f(x) :  3  9 10 13  5 15 11 16 14  8  7  4 12  2  6  1

Given a number between 1 and 16, it is relatively easy to find the image of it under f. However, given a number such as 7, without having the table in front of you, it is harder to find x given that f(x) = 7. Of course, if the number you are given is 3 then it is clear that x = 1 is what you need; but for most of the elements in the codomain it is not that easy. □

One must keep in mind that this is an example which uses very small numbers; the important point here is that there is a difference in the amount of work to compute f(x) and the amount of work to find x given f(x). Even for very large numbers, f(x) can be computed efficiently using the repeated square-and-multiply algorithm (Algorithm 2.143), whereas the process of finding x from f(x) is much harder.
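A sketch of Example 1.14 in Python (the function name invert is this sketch's own): the forward direction uses square-and-multiply via the built-in pow, while inverting f here falls back on trying every x, which is exactly what becomes infeasible once the small modulus 17 is replaced by a large one.

```python
# Forward direction: f(x) = 3^x mod 17, cheap even for very large exponents.
table = {x: pow(3, x, 17) for x in range(1, 17)}
print(table[7])   # 11

# Inverting without the table: brute-force search for x with 3^x mod 17 == y.
# For a modulus of hundreds of digits this loop is hopeless; this is the
# discrete logarithm problem.
def invert(y, base=3, mod=17):
    for x in range(1, mod):
        if pow(base, x, mod) == y:
            return x
    return None

print(invert(7))  # 11, since 3^11 mod 17 = 7
```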

1.15 Example (one-way function) A prime number is a positive integer greater than 1 whose only positive integer divisors are 1 and itself. Select primes p = 48611, q = 53993, form n = pq = 2624653723, and let X = {1, 2, 3, ..., n − 1}. Define a function f on X by f(x) = r_x for each x ∈ X, where r_x is the remainder when x^3 is divided by n. For instance, f(2489991) = 1981394214 since 2489991^3 = 5881949859 · n + 1981394214. Computing f(x) is a relatively simple thing to do, but to reverse the procedure is much more difficult; that is, given a remainder, to find the value x which was originally cubed (raised to the third power). This procedure is referred to as the computation of a modular cube root with modulus n. If the factors of n are unknown and large, this is a difficult problem; however, if the factors p and q of n are known then there is an efficient algorithm for computing modular cube roots. (See §8.2.2(i) for details.) □

Example 1.15 leads one to consider another type of function which will prove to be fundamental in later developments.

(iii) Trapdoor one-way functions

1.16 Definition A trapdoor one-way function is a one-way function f : X → Y with the additional property that given some extra information (called the trapdoor information) it becomes feasible to find, for any given y ∈ Im(f), an x ∈ X such that f(x) = y.

Example 1.15 illustrates the concept of a trapdoor one-way function. With the additional information of the factors of n = 2624653723 (namely, p = 48611 and q = 53993, each of which is five decimal digits long) it becomes much easier to invert the function. The factors of 2624653723 are large enough that finding them by hand computation would be difficult. Of course, any reasonable computer program could find the factors relatively quickly. If, on the other hand, one selects p and q to be very large distinct prime numbers (each having about 100 decimal digits) then, by today's standards, it is a difficult problem, even with the most powerful computers, to deduce p and q simply from n. This is the well-known integer factorization problem (see §3.2) and a source of many trapdoor one-way functions.
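A sketch of the trapdoor in Example 1.15 (assumes Python 3.9+ for math.lcm and pow(·, -1, ·); the variable names are illustrative): knowing p and q lets one compute a cube-root exponent d, after which inverting f is a single modular exponentiation.

```python
from math import lcm  # Python 3.9+

p, q = 48611, 53993
n = p * q                   # 2624653723, as in Example 1.15
y = pow(2489991, 3, n)      # forward direction: y = 1981394214

# Trapdoor: since neither p-1 nor q-1 is divisible by 3, cubing permutes the
# residues, and the inverse exponent is d = 3^(-1) mod lcm(p-1, q-1).
d = pow(3, -1, lcm(p - 1, q - 1))
x = pow(y, d, n)            # recover the cube root
print(y, x)                 # 1981394214 2489991
```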

It remains to be rigorously established whether there actually are any (true) one-way functions. That is to say, no one has yet definitively proved the existence of such functions under reasonable (and rigorous) definitions of "easy" and "computationally infeasible". Since the existence of one-way functions is still unknown, the existence of trapdoor one-way functions is also unknown. However, there are a number of good candidates for one-way and trapdoor one-way functions. Many of these are discussed in this book, with emphasis given to those which are practical.

One-way and trapdoor one-way functions are the basis for public-key cryptography (discussed in §1.8). The importance of these concepts will become clearer when their application to cryptographic techniques is considered. It will be worthwhile to keep the abstract concepts of this section in mind as concrete methods are presented.

1.3.2 Permutations

Permutations are functions which are often used in various cryptographic constructs.

1.17 Definition Let S be a finite set of elements. A permutation p on S is a bijection (Definition 1.8) from S to itself (i.e., p : S → S).

1.18 Example (permutation) Let S = {1, 2, 3, 4, 5}. A permutation p : S → S is defined as follows: p(1) = 3, p(2) = 5, p(3) = 4, p(4) = 2, p(5) = 1. A permutation can be described in various ways. It can be displayed as above or as an array:

    p = ( 1 2 3 4 5 )
        ( 3 5 4 2 1 )          (1.1)

where the top row in the array is the domain and the bottom row is the image under the mapping p. Of course, other representations are possible. □

Since permutations are bijections, they have inverses. If a permutation is written as an array (see 1.1), its inverse is easily found by interchanging the rows in the array and reordering the elements in the new top row if desired (the bottom row would have to be reordered correspondingly). The inverse of p in Example 1.18 is

    p⁻¹ = ( 1 2 3 4 5 )
          ( 5 4 1 3 2 )

1.19 Example (permutation) Let X be the set of integers {0, 1, 2, ..., pq − 1} where p and q are distinct large primes (for example, p and q are each about 100 decimal digits long), and suppose that neither p − 1 nor q − 1 is divisible by 3. Then the function p(x) = r_x, where r_x is the remainder when x^3 is divided by pq, can be shown to be a permutation. Determining the inverse permutation is computationally infeasible by today's standards unless p and q are known (cf. Example 1.15). □
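The array notation translates directly into code. A small dict-based sketch of Example 1.18 (illustrative only): inverting the permutation is just swapping the roles of the two rows.

```python
# The permutation p of Example 1.18: top row 1..5, bottom row 3 5 4 2 1.
p = {1: 3, 2: 5, 3: 4, 4: 2, 5: 1}

# Interchanging the rows gives the inverse permutation.
p_inv = {image: x for x, image in p.items()}
print(p_inv)                             # {3: 1, 5: 2, 4: 3, 2: 4, 1: 5}
assert all(p_inv[p[x]] == x for x in p)  # p_inv composed with p is the identity
```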

1.3.3 Involutions

Another type of function which will be referred to in §1.5.3 is an involution. Involutions have the property that they are their own inverses.

1.20 Definition Let S be a finite set and let f be a bijection from S to S (i.e., f : S → S). The function f is called an involution if f = f⁻¹. An equivalent way of stating this is f(f(x)) = x for all x ∈ S.

1.21 Example (involution) Figure 1.4 is an example of an involution. In the diagram of an involution, note that if j is the image of i then i is the image of j. □

Figure 1.4: An involution on a set S of 5 elements.

1.4 Basic terminology and concepts

The scientific study of any discipline must be built upon rigorous definitions arising from fundamental concepts. What follows is a list of terms and basic concepts used throughout this book. Where appropriate, rigor has been sacrificed (here in Chapter 1) for the sake of clarity.

Encryption domains and codomains

• A denotes a finite set called the alphabet of definition. For example, A = {0, 1}, the binary alphabet, is a frequently used alphabet of definition. Note that any alphabet can be encoded in terms of the binary alphabet. For example, since there are 32 binary strings of length five, each letter of the English alphabet can be assigned a unique binary string of length five.

• M denotes a set called the message space. M consists of strings of symbols from an alphabet of definition. An element of M is called a plaintext message or simply a plaintext. For example, M may consist of binary strings, English text, computer code, etc.

• C denotes a set called the ciphertext space. C consists of strings of symbols from an alphabet of definition, which may differ from the alphabet of definition for M. An element of C is called a ciphertext.

Encryption and decryption transformations

• K denotes a set called the key space. An element of K is called a key.

• Each element e ∈ K uniquely determines a bijection from M to C, denoted by Ee. Ee is called an encryption function or an encryption transformation. Note that Ee must be a bijection if the process is to be reversed and a unique plaintext message recovered for each distinct ciphertext. (More generality is obtained if Ee is simply defined as a 1-1 transformation from M to C; that is, Ee is a bijection from M to Im(Ee) where Im(Ee) is a subset of C.)

• For each d ∈ K, Dd denotes a bijection from C to M (i.e., Dd : C → M). Dd is called a decryption function or decryption transformation.

• The process of applying the transformation Ee to a message m ∈ M is usually referred to as encrypting m or the encryption of m.

• The process of applying the transformation Dd to a ciphertext c is usually referred to as decrypting c or the decryption of c.

• An encryption scheme consists of a set {Ee : e ∈ K} of encryption transformations and a corresponding set {Dd : d ∈ K} of decryption transformations with the property that for each e ∈ K there is a unique key d ∈ K such that Dd = Ee⁻¹; that is, Dd(Ee(m)) = m for all m ∈ M. An encryption scheme is sometimes referred to as a cipher.

• The keys e and d in the preceding definition are referred to as a key pair and sometimes denoted by (e, d). Note that e and d could be the same.

• To construct an encryption scheme requires one to select a message space M, a ciphertext space C, a key space K, a set of encryption transformations {Ee : e ∈ K}, and a corresponding set of decryption transformations {Dd : d ∈ K}.

Achieving confidentiality

An encryption scheme may be used as follows for the purpose of achieving confidentiality. Two parties Alice and Bob first secretly choose or secretly exchange a key pair (e, d). At a subsequent point in time, if Alice wishes to send a message m ∈ M to Bob, she computes c = Ee(m) and transmits this to Bob. Upon receiving c, Bob computes Dd(c) = m and hence recovers the original message m.

The question arises as to why keys are necessary. (Why not just choose one encryption function and its corresponding decryption function?) Having transformations which are very similar but characterized by keys means that if some particular encryption/decryption transformation is revealed then one does not have to redesign the entire scheme but simply change the key. It is sound cryptographic practice to change the key (encryption/decryption transformation) frequently. As a physical analogue, consider an ordinary resettable combination lock. The structure of the lock is available to anyone who wishes to purchase one but the combination is chosen and set by the owner. If the owner suspects that the combination has been revealed he can easily reset it without replacing the physical mechanism.

1.22 Example (encryption scheme) Let M = {m1, m2, m3} and C = {c1, c2, c3}. There are precisely 3! = 6 bijections from M to C. The key space K = {1, 2, 3, 4, 5, 6} has six elements in it, each specifying one of the transformations. Figure 1.5 illustrates the six encryption functions, denoted by Ei, 1 ≤ i ≤ 6. Alice and Bob agree on a transformation, say E1. To encrypt the message m1, Alice computes E1(m1) = c3 and sends c3 to Bob. Bob decrypts c3 by reversing the arrows on the diagram for E1 and observing that c3 points to m1. □

Figure 1.5: Schematic of a simple encryption scheme.

When M is a small set, the functional diagram is a simple visual means to describe the mapping. In cryptography, the set M is typically of astronomical proportions and, as such, the visual description is infeasible. What is required, in these cases, is some other simple means to describe the encryption and decryption transformations, such as mathematical algorithms.
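Example 1.22 can be made concrete in a few lines of Python. This is a sketch only: Figure 1.5 fixes the six bijections pictorially, so the key numbering below (Python's enumeration order) is an assumption, not necessarily the figure's.

```python
from itertools import permutations

M = ["m1", "m2", "m3"]
C = ["c1", "c2", "c3"]

# The key space: each key e in {1,...,6} selects one of the 3! = 6 bijections
# from M to C.
E = {e: dict(zip(M, img)) for e, img in enumerate(permutations(C), start=1)}
D = {e: {c: m for m, c in E[e].items()} for e in E}  # the inverse bijections

for e in E:
    assert all(D[e][E[e][m]] == m for m in M)        # D_d(E_e(m)) = m for all m

print(E[2]["m1"])   # one of the six transformations applied to m1
```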

Figure 1.6 provides a simple model of a two-party communication using encryption.

Figure 1.6: Schematic of a two-party communication using encryption. (Alice encrypts a plaintext m from the plaintext source as c = Ee(m); c travels over an unsecured channel observed by an adversary; Bob, the destination, recovers m = Dd(c).)

Communication participants

Referring to Figure 1.6, the following terminology is defined.

• An entity or party is someone or something which sends, receives, or manipulates information. Alice and Bob are entities in Example 1.22. An entity may be a person, a computer terminal, etc.

• A sender is an entity in a two-party communication which is the legitimate transmitter of information. In Figure 1.6, the sender is Alice.

• A receiver is an entity in a two-party communication which is the intended recipient of information. In Figure 1.6, the receiver is Bob.

• An adversary is an entity in a two-party communication which is neither the sender nor receiver, and which tries to defeat the information security service being provided between the sender and receiver. Various other names are synonymous with adversary, such as enemy, attacker, opponent, tapper, eavesdropper, intruder, and interloper. An adversary will often attempt to play the role of either the legitimate sender or the legitimate receiver.

Channels

• A channel is a means of conveying information from one entity to another.

• A physically secure channel or secure channel is one which is not physically accessible to the adversary.

• An unsecured channel is one from which parties other than those for which the information is intended can reorder, delete, insert, or read.

• A secured channel is one from which an adversary does not have the ability to reorder, delete, insert, or read.

One should note the subtle difference between a physically secure channel and a secured channel: a secured channel may be secured by physical or cryptographic techniques, the latter being the topic of this book. Certain channels are assumed to be physically secure. These include trusted couriers, personal contact between communicating parties, and a dedicated communication link, to name a few.

Security

A fundamental premise in cryptography is that the sets M, C, K, {Ee : e ∈ K}, {Dd : d ∈ K} are public knowledge. When two parties wish to communicate securely using an encryption scheme, the only thing that they keep secret is the particular key pair (e, d) which they are using, and which they must select. One can gain additional security by keeping the class of encryption and decryption transformations secret but one should not base the security of the entire scheme on this approach. History has shown that maintaining the secrecy of the transformations is very difficult indeed.

1.23 Definition An encryption scheme is said to be breakable if a third party, without prior knowledge of the key pair (e, d), can systematically recover plaintext from corresponding ciphertext within some appropriate time frame.

An appropriate time frame will be a function of the useful lifespan of the data being protected. For example, an instruction to buy a certain stock may only need to be kept secret for a few minutes, whereas state secrets may need to remain confidential indefinitely.

An encryption scheme can be broken by trying all possible keys to see which one the communicating parties are using (assuming that the class of encryption functions is public knowledge). This is called an exhaustive search of the key space. It follows then that the number of keys (i.e., the size of the key space) should be large enough to make this approach computationally infeasible. It is the objective of a designer of an encryption scheme that this be the best approach to break the system.
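A toy illustration of exhaustive key search on the scheme of Example 1.22 (the scenario, an adversary holding one plaintext-ciphertext pair, and the key numbering are this sketch's own assumptions): the attack simply tries every key, which is why real key spaces must be astronomically large.

```python
from itertools import permutations

M = ["m1", "m2", "m3"]
C = ["c1", "c2", "c3"]
E = {e: dict(zip(M, img)) for e, img in enumerate(permutations(C), start=1)}

# The adversary knows one (plaintext, ciphertext) pair and tries all six keys.
known_m, known_c = "m1", "c3"
candidates = [e for e in E if E[e][known_m] == known_c]
print(candidates)   # the keys consistent with the pair; more pairs narrow it further
```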

Frequently cited in the literature are Kerckhoffs' desiderata, a set of requirements for cipher systems. They are given here essentially as Kerckhoffs originally stated them:
1. the system should be, if not theoretically unbreakable, unbreakable in practice;
2. compromise of the system details should not inconvenience the correspondents;
3. the key should be rememberable without notes and easily changed;
4. the cryptogram should be transmissible by telegraph;
5. the encryption apparatus should be portable and operable by a single person; and
6. the system should be easy, requiring neither the knowledge of a long list of rules nor mental strain.

This list of requirements was articulated in 1883 and, for the most part, remains useful today. Point 2 allows that the class of encryption transformations being used be publicly known and that the security of the system should reside only in the key chosen.

Information security in general

So far the terminology has been restricted to encryption and decryption with the goal of privacy in mind. Information security is much broader, encompassing such things as authentication and data integrity. A few more general definitions, pertinent to discussions later in the book, are given next.

• An information security service is a method to provide some specific aspect of security. For example, integrity of transmitted data is a security objective, and a method to ensure this aspect is an information security service.

• Breaking an information security service (which often involves more than simply encryption) implies defeating the objective of the intended service.

• A passive adversary is an adversary who is capable only of reading information from an unsecured channel.

• An active adversary is an adversary who may also transmit, alter, or delete information on an unsecured channel.

Cryptology

• Cryptanalysis is the study of mathematical techniques for attempting to defeat cryptographic techniques and, more generally, information security services.

• A cryptanalyst is someone who engages in cryptanalysis.

• Cryptology is the study of cryptography (Definition 1.1) and cryptanalysis.

• A cryptosystem is a general term referring to a set of cryptographic primitives used to provide information security services. Most often the term is used in conjunction with primitives providing confidentiality, i.e., encryption.

Cryptographic techniques are typically divided into two generic types: symmetric-key and public-key. Encryption methods of these types will be discussed separately in §1.5 and §1.8. Other definitions and terminology will be introduced as required.

1.5 Symmetric-key encryption

§1.5 considers symmetric-key encryption. Public-key encryption is the topic of §1.8.

1.5.1 Overview of block ciphers and stream ciphers

1.24 Definition Consider an encryption scheme consisting of the sets of encryption and decryption transformations {Ee : e ∈ K} and {Dd : d ∈ K}, respectively, where K is the key space. The encryption scheme is said to be symmetric-key if for each associated encryption/decryption key pair (e, d), it is computationally "easy" to determine d knowing only e, and to determine e from d.

Since e = d in most practical symmetric-key encryption schemes, the term symmetric-key becomes appropriate. Other terms used in the literature are single-key, one-key, private-key, and conventional encryption. (Private key is a term also used in quite a different context (see §1.8); the term will be reserved for the latter usage in this book.) Example 1.25 illustrates the idea of symmetric-key encryption.

1.25 Example (symmetric-key encryption) Let A = {A, B, C, ..., X, Y, Z} be the English alphabet. Let M and C be the set of all strings of length five over A. The key e is chosen to be a permutation on A. To encrypt, an English message is broken up into groups each having five letters (with appropriate padding if the length of the message is not a multiple of five) and a permutation e is applied to each letter one at a time. To decrypt, the inverse permutation d = e⁻¹ is applied to each letter of the ciphertext. For instance, suppose that the key e is chosen to be the permutation which maps each letter to the one which is three positions to its right, as shown below:

    e = ( A B C D E F G H I J K L M N O P Q R S T U V W X Y Z )
        ( D E F G H I J K L M N O P Q R S T U V W X Y Z A B C )

A message m = THISC IPHER ISCER TAINL YNOTS ECURE is encrypted to c = Ee(m) = WKLVF LSKHU LVFHU WDLQO BQRWV HFXUH. □
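A sketch of Example 1.25 in Python (the helper shift is this sketch's own; the general simple substitution cipher takes an arbitrary permutation of the alphabet, of which the shift-by-three key is the particular choice made in the example):

```python
import string

ALPHA = string.ascii_uppercase

def shift(msg, k):
    """Apply the permutation 'move each letter k positions right' letterwise."""
    return "".join(ALPHA[(ALPHA.index(ch) + k) % 26] if ch in ALPHA else ch
                   for ch in msg)

m = "THISC IPHER ISCER TAINL YNOTS ECURE"
c = shift(m, 3)           # encrypt with e: three positions to the right
print(c)                  # WKLVF LSKHU LVFHU WDLQO BQRWV HFXUH
print(shift(c, -3) == m)  # decrypt with d = e^-1: True
```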

A two-party communication using symmetric-key encryption can be described by the block diagram of Figure 1.7, which is Figure 1.6 with the addition of the secure (both confidential and authentic) channel.

Figure 1.7: Two-party communication using encryption, with a secure channel for key exchange. The decryption key d can be efficiently computed from the encryption key e.

One of the major issues with symmetric-key systems is to find an efficient method to agree upon and exchange keys securely. This problem is referred to as the key distribution problem (see Chapters 12 and 13).

It is assumed that all parties know the set of encryption/decryption transformations (i.e., they all know the encryption scheme). As has been emphasized several times, the only information which should be required to be kept secret is the key d. However, in symmetric-key encryption, this means that the key e must also be kept secret, as d can be deduced from e. In Figure 1.7 the encryption key e is transported from one entity to the other with the understanding that both can construct the decryption key d.

There are two classes of symmetric-key encryption schemes which are commonly distinguished: block ciphers and stream ciphers.

1.26 Definition A block cipher is an encryption scheme which breaks up the plaintext messages to be transmitted into strings (called blocks) of a fixed length t over an alphabet A, and encrypts one block at a time.

Most well-known symmetric-key encryption techniques are block ciphers. A number of examples of these are given in Chapter 7. Two important classes of block ciphers are substitution ciphers and transposition ciphers (§1.5.2). Product ciphers (§1.5.3) combine these. Stream ciphers are considered in §1.5.4, while comments on the key space follow in §1.5.5.

1.5.2 Substitution ciphers and transposition ciphers

Substitution ciphers are block ciphers which replace symbols (or groups of symbols) by other symbols or groups of symbols.

Simple substitution ciphers

1.27 Definition Let A be an alphabet of q symbols and M be the set of all strings of length t over A. Let K be the set of all permutations on the set A. Define for each e ∈ K an encryption transformation Ee as Ee(m) = (e(m1) e(m2) ··· e(mt)) = (c1 c2 ··· ct) = c, where m = (m1 m2 ··· mt) ∈ M. In other words, for each symbol in a t-tuple, replace (substitute) it by another symbol from A according to some fixed permutation e. To decrypt c = (c1 c2 ··· ct) compute the inverse permutation d = e⁻¹ and Dd(c) = (d(c1) d(c2) ··· d(ct)) = (m1 m2 ··· mt) = m. Ee is called a simple substitution cipher or a mono-alphabetic substitution cipher.

The number of distinct substitution ciphers is q! and is independent of the block size in the cipher. Example 1.25 is an example of a simple substitution cipher of block length five.

Simple substitution ciphers over small block sizes provide inadequate security even when the key space is extremely large. If the alphabet is the English alphabet as in Example 1.25, then the size of the key space is 26! ≈ 4 × 10^26, yet the key being used can be determined quite easily by examining a modest amount of ciphertext. This follows from the simple observation that the distribution of letter frequencies is preserved in the ciphertext. For example, the letter E occurs more frequently than the other letters in ordinary English text. Hence the letter occurring most frequently in a sequence of ciphertext blocks is most likely to correspond to the letter E in the plaintext. By observing a modest quantity of ciphertext blocks, a cryptanalyst can determine the key.
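The frequency observation is easy to sketch (illustrative only; a real attack would also use digram statistics and trial decryption over much more ciphertext):

```python
from collections import Counter

# Ciphertext from Example 1.25; under a simple substitution cipher the letter
# frequencies of the English plaintext survive intact.
ciphertext = "WKLVFLSKHULVFHUWDLQOBQRWVHFXUH"
freq = Counter(ciphertext)
print(freq.most_common(3))
# A most-frequent ciphertext letter is a good guess for plaintext E (or another
# high-frequency English letter), and each confirmed guess fixes part of the key.
```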

Homophonic substitution ciphers

1.28 Definition To each symbol a ∈ A, associate a set H(a) of strings of t symbols, with the restriction that the sets H(a), a ∈ A, be pairwise disjoint. A homophonic substitution cipher replaces each symbol a in a plaintext message block with a randomly chosen string from H(a). To decrypt a string c of t symbols, one must determine an a ∈ A such that c ∈ H(a). The key for the cipher consists of the sets H(a).

1.29 Example (homophonic substitution cipher) Consider A = {a, b}, H(a) = {00, 10}, and H(b) = {01, 11}. The plaintext message block ab encrypts to one of the following: 0001, 0011, 1001, 1011. Observe that the codomain of the encryption function (for messages of length two) consists of the following pairwise disjoint sets of four-element bitstrings:

    aa : {0000, 0010, 1000, 1010}
    ab : {0001, 0011, 1001, 1011}
    ba : {0100, 0110, 1100, 1110}
    bb : {0101, 0111, 1101, 1111}

Any 4-bitstring uniquely identifies a codomain element, and hence a plaintext message. □
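A sketch of Example 1.29 (randomized, so repeated runs give different but equally valid ciphertexts; the helper names are illustrative):

```python
import random

# Key: the pairwise-disjoint homophone sets H(a), H(b) of Example 1.29.
H = {"a": ["00", "10"], "b": ["01", "11"]}

def encrypt(msg):
    # Each symbol is replaced by a randomly chosen homophone.
    return "".join(random.choice(H[sym]) for sym in msg)

def decrypt(ct):
    # Each 2-bit block lies in exactly one H(a), since the sets are disjoint.
    lookup = {s: sym for sym, homophones in H.items() for s in homophones}
    return "".join(lookup[ct[i:i + 2]] for i in range(0, len(ct), 2))

c = encrypt("ab")
print(c, decrypt(c))   # one of 0001, 0011, 1001, 1011 -> always decrypts to "ab"
```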

Often the symbols do not occur with equal frequency in plaintext messages. With a simple substitution cipher this non-uniform frequency property is reflected in the ciphertext as illustrated in Example 1.25. A homophonic cipher can be used to make the frequency of occurrence of ciphertext symbols more uniform, at the expense of data expansion. Decryption is not as easily performed as it is for simple substitution ciphers.

Polyalphabetic substitution ciphers

1.30 Definition A polyalphabetic substitution cipher is a block cipher with block length t over an alphabet A having the following properties: (i) the key space K consists of all ordered sets of t permutations (p1, p2, ..., pt), where each permutation pi is defined on the set A; (ii) encryption of the message m = (m1 m2 ··· mt) under the key e = (p1, p2, ..., pt) is given by Ee(m) = (p1(m1) p2(m2) ··· pt(mt)); and (iii) the decryption key associated with e = (p1, p2, ..., pt) is d = (p1⁻¹, p2⁻¹, ..., pt⁻¹).

1.31 Example (Vigenère cipher) Let A = {A, B, C, ..., X, Y, Z} and t = 3. Choose e = (p1, p2, p3), where p1 maps each letter to the letter three positions to its right in the alphabet, p2 to the one seven positions to its right, and p3 ten positions to its right. If m = THI SCI PHE RIS CER TAI NLY NOT SEC URE then c = Ee(m) = WOS VJS SOO UPC FLB WHS QSI QVD VLM XYO. □

Polyalphabetic ciphers have the advantage over simple substitution ciphers that symbol frequencies are not preserved. In the example above, the letter E is encrypted to both O and L. However, polyalphabetic ciphers are not significantly more difficult to cryptanalyze, the approach being similar to the simple substitution cipher. In fact, once the block length t is determined, the ciphertext letters can be divided into t groups (where group i, 1 ≤ i ≤ t, consists of those ciphertext letters derived using permutation pi), and a frequency analysis can be done on each group.
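A sketch of the Vigenère cipher of Example 1.31 (the function name vigenere is this sketch's own; the shifts (3, 7, 10) play the roles of p1, p2, p3):

```python
import string

ALPHA = string.ascii_uppercase

def vigenere(msg, shifts):
    out, i = [], 0
    for ch in msg:
        if ch in ALPHA:
            out.append(ALPHA[(ALPHA.index(ch) + shifts[i % len(shifts)]) % 26])
            i += 1           # only letters consume a key position
        else:
            out.append(ch)   # keep the block-separating spaces
    return "".join(out)

m = "THI SCI PHE RIS CER TAI NLY NOT SEC URE"
print(vigenere(m, (3, 7, 10)))  # WOS VJS SOO UPC FLB WHS QSI QVD VLM XYO
print(vigenere(vigenere(m, (3, 7, 10)), (-3, -7, -10)) == m)  # decryption: True
```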

Transposition ciphers

Another class of symmetric-key ciphers is the simple transposition cipher, which simply permutes the symbols in a block.

1.32 Definition Consider a symmetric-key block encryption scheme with block length t. Let K be the set of all permutations on the set {1, 2, ..., t}. For each e ∈ K define the encryption function Ee(m) = (m_e(1) m_e(2) ··· m_e(t)) where m = (m1 m2 ··· mt) ∈ M, the message space. The set of all such transformations is called a simple transposition cipher. The decryption key corresponding to e is the inverse permutation d = e⁻¹. To decrypt c = (c1 c2 ··· ct), compute Dd(c) = (c_d(1) c_d(2) ··· c_d(t)).

A simple transposition cipher preserves the number of symbols of a given type within a block, and thus is easily cryptanalyzed.
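A sketch of Definition 1.32 (illustrative names; the position permutation e is represented 0-indexed):

```python
# Simple transposition cipher with block length t = 5.
# The cipher emits m[e(1)] ... m[e(t)]; here e maps position i to e[i].
e = [2, 0, 4, 1, 3]
d = [e.index(i) for i in range(len(e))]   # inverse permutation d = e^-1

def apply_perm(block, perm):
    return "".join(block[i] for i in perm)

block = "HELLO"
c = apply_perm(block, e)
print(c, apply_perm(c, d))   # scrambled block "LHOEL", then "HELLO" again
```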

1.5.3 Composition of ciphers

In order to describe product ciphers, the concept of composition of functions is introduced. Compositions are a convenient way of constructing more complicated functions from simpler ones.

Composition of functions

1.33 Definition Let S, T, and U be finite sets and let f : S → T and g : T → U be functions. The composition of g with f, denoted g ◦ f (or simply gf), is a function from S to U as illustrated in Figure 1.8 and defined by (g ◦ f)(x) = g(f(x)) for all x ∈ S.

Figure 1.8: The composition g ◦ f of functions g and f.

Composition can be easily extended to more than two functions. For functions f1, f2, ..., ft, one can define ft ◦ ··· ◦ f2 ◦ f1, provided that the domain of ft equals the codomain of ft−1, and so on.

Compositions and involutions

Involutions were introduced in §1.3.3 as a simple class of functions with an interesting property: Ek(Ek(x)) = x for all x in the domain of Ek; that is, Ek ◦ Ek is the identity function.

1.34 Remark (composition of involutions) The composition of two involutions is not necessarily an involution, as illustrated in Figure 1.9. However, involutions may be composed to get somewhat more complicated functions whose inverses are easy to find. This is an important feature for decryption. For example, if Ek1, Ek2, ..., Ekt are involutions then the inverse of Ek = Ek1 Ek2 ··· Ekt is Ek⁻¹ = Ekt Ekt−1 ··· Ek1, the composition of the involutions in the reverse order.

Figure 1.9: The composition g ◦ f of involutions g and f is not an involution.

Product ciphers

Simple substitution and transposition ciphers individually do not provide a very high level of security. However, by combining these transformations it is possible to obtain strong ciphers. As will be seen in Chapter 7 some of the most practical and effective symmetric-key systems are product ciphers. One example of a product cipher is a composition of t ≥ 2 transformations Ek1 Ek2 ··· Ekt where each Eki, 1 ≤ i ≤ t, is either a substitution or a transposition cipher. For the purpose of this introduction, let the composition of a substitution and a transposition be called a round.

1.35 Example (product cipher) Let M = C = K be the set of all binary strings of length six. The number of elements in M is 2^6 = 64. Let m = (m1 m2 ··· m6) and define E_k^(1)(m) = m ⊕ k, where k ∈ K, and E^(2)(m) = (m4 m5 m6 m1 m2 m3). Here, ⊕ is the exclusive-OR (XOR) operation defined as follows: 0 ⊕ 0 = 0, 0 ⊕ 1 = 1, 1 ⊕ 0 = 1, 1 ⊕ 1 = 0. E_k^(1) is a polyalphabetic substitution cipher and E^(2) is a transposition cipher (not involving the key). The product E_k^(1) E^(2) is a round. While here the transposition cipher is very simple and is not determined by the key, this need not be the case. □
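A sketch of the round in Example 1.35, operating on 6-bit strings (the helper names xor6, rotate3, and round_encrypt are this sketch's own):

```python
def xor6(m, k):                 # E^(1)_k: bitwise XOR with the key
    return "".join("1" if a != b else "0" for a, b in zip(m, k))

def rotate3(m):                 # E^(2): (m1..m6) -> (m4 m5 m6 m1 m2 m3)
    return m[3:] + m[:3]

def round_encrypt(m, k):        # one round: substitution, then transposition
    return rotate3(xor6(m, k))

m, k = "101100", "010111"
c = round_encrypt(m, k)
print(c)                                 # 011111
# Decryption undoes the round in reverse order; rotate3 is its own inverse.
print(xor6(rotate3(c), k) == m)          # True
```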

1.36 Remark (confusion and diffusion) A substitution in a round is said to add confusion to the encryption process whereas a transposition is said to add diffusion. Confusion is intended to make the relationship between the key and ciphertext as complex as possible. Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext. A round then can be said to add both confusion and diffusion to the encryption. Most modern block cipher systems apply a number of rounds in succession to encrypt plaintext.

1.5.4 Stream ciphers

Stream ciphers form an important class of symmetric-key encryption schemes. They are, in one sense, very simple block ciphers having block length equal to one. What makes them useful is the fact that the encryption transformation can change for each symbol of plaintext being encrypted. In situations where transmission errors are highly probable, stream ciphers are advantageous because they have no error propagation. They can also be used when the data must be processed one symbol at a time (e.g., if the equipment has no memory or buffering of data is limited).

1.37 Definition Let K be the key space for a set of encryption transformations. A sequence of symbols e1 e2 e3 ···, ei ∈ K, is called a keystream.

1.38 Definition Let A be an alphabet of q symbols and let Ee be a simple substitution cipher with block length 1 where e ∈ K. Let m1 m2 m3 ··· be a plaintext string and let e1 e2 e3 ··· be a keystream from K. A stream cipher takes the plaintext string and produces a ciphertext string c1 c2 c3 ··· where ci = E_ei(mi). If di denotes the inverse of ei, then D_di(ci) = mi decrypts the ciphertext string.

A stream cipher applies simple encryption transformations according to the keystream being used. The keystream could be generated at random, or by an algorithm which generates the keystream from an initial small keystream (called a seed), or from a seed and previous ciphertext symbols. Such an algorithm is called a keystream generator.

of implementation. 1.39 Definition The Vernam Cipher is a stream cipher defined on the alphabet A = {0, 1} A binary message m1 m2 · · · mt is operated on by a binary key string k1 k2 · · · kt of the same length to produce a ciphertext string c1 c2 · · · ct where ci = mi ⊕ ki , 1 ≤ i ≤ t. If the key string is randomly chosen and never used again, the Vernam cipher is called a one-time system or a one-time pad. To see how the Vernam cipher corresponds to Definition 1.38, observe that there are precisely two substitution ciphers on the set A. One is simply the identity map E0 which sends 0 to 0 and 1 to 1; the other E1 sends 0 to 1 and 1 to 0. When the keystream contains a 0, apply E0 to the corresponding plaintext symbol; otherwise, apply E1 . If the key string is reused there are ways to attack the system. For example, if c1 c2 · · · ct and c01 c02 · · · c0t are two ciphertext strings produced by the same keystream k1 k2 · · · kt then ci = mi ⊕ ki , c0i = m0i

If the key string is reused there are ways to attack the system. For example, if c1 c2 · · · ct and c′1 c′2 · · · c′t are two ciphertext strings produced by the same keystream k1 k2 · · · kt, then ci = mi ⊕ ki and c′i = m′i ⊕ ki, so that ci ⊕ c′i = mi ⊕ m′i. The redundancy in the latter may permit cryptanalysis.
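A short demonstration of the equation above (illustrative code): XORing two ciphertexts produced under the same keystream cancels the key and exposes m ⊕ m′.

    import secrets

    def vernam(bits, key):
        return [b ^ k for b, k in zip(bits, key)]

    m1 = [1, 0, 1, 1, 0, 1, 0, 0]
    m2 = [0, 0, 1, 0, 1, 1, 0, 1]
    k = [secrets.randbits(1) for _ in m1]   # the same keystream, reused

    c1, c2 = vernam(m1, k), vernam(m2, k)
    leak = [a ^ b for a, b in zip(c1, c2)]
    assert leak == [a ^ b for a, b in zip(m1, m2)]   # c XOR c' = m XOR m'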

The one-time pad can be shown to be theoretically unbreakable. That is, if a cryptanalyst has a ciphertext string c1 c2 · · · ct encrypted using a random key string which has been used only once, the cryptanalyst can do no better than guess at the plaintext being any binary string of length t (i.e., t-bit binary strings are equally likely as plaintext). It has been proven that to realize an unbreakable system requires a random key of the same length as the message. This reduces the practicality of the system in all but a few specialized situations. Reportedly until very recently the communication line between Moscow and Washington was secured by a one-time pad. Transport of the key was done by trusted courier.

1.5.5 The key space

The size of the key space is the number of encryption/decryption key pairs that are available in the cipher system. A key is typically a compact way to specify the encryption transformation (from the set of all encryption transformations) to be used. For example, a transposition cipher of block length t has t! encryption functions from which to select. Each can be simply described by a permutation, which is called the key. It is a great temptation to relate the security of the encryption scheme to the size of the key space. The following statement is important to remember.

1.40 Fact A necessary, but usually not sufficient, condition for an encryption scheme to be secure is that the key space be large enough to preclude exhaustive search.

For instance, the simple substitution cipher in Example 1.25 has a key space of size 26! ≈ 4 × 10^26. The polyalphabetic substitution cipher of Example 1.31 has a key space of size (26!)^3 ≈ 7 × 10^79. Exhaustive search of either key space is completely infeasible, yet both ciphers are relatively weak and provide little security.

1.6 Digital signatures

A cryptographic primitive which is fundamental in authentication, authorization, and non-repudiation is the digital signature. The purpose of a digital signature is to provide a means for an entity to bind its identity to a piece of information. The process of signing entails transforming the message and some secret information held by the entity into a tag called a signature. A generic description follows.

Nomenclature and set-up
• M is the set of messages which can be signed.
• S is a set of elements called signatures, possibly binary strings of a fixed length.
• SA is a transformation from the message set M to the signature set S, and is called a signing transformation for entity A. (The names Alice and Bob are usually abbreviated to A and B, respectively.) The transformation SA is kept secret by A, and will be used to create signatures for messages from M.
• VA is a transformation from the set M × S to the set {true, false}, where M × S, the Cartesian product of M and S, consists of all pairs (m, s) with m ∈ M and s ∈ S. VA is called a verification transformation for A's signatures, is publicly known, and is used by other entities to verify signatures created by A.

1.41 Definition The transformations SA and VA provide a digital signature scheme for A. Occasionally the term digital signature mechanism is used.

1.42 Example (digital signature scheme) M = {m1, m2, m3} and S = {s1, s2, s3}. The left side of Figure 1.10 displays a signing function SA from the set M (here SA maps m1 → s3, m2 → s1, m3 → s2) and, the right side, the corresponding verification function VA.

[Figure 1.10: A signing and verification function for a digital signature scheme.]

Signing procedure
Entity A (the signer) creates a signature for a message m ∈ M by doing the following:
1. Compute s = SA(m).
2. Transmit the pair (m, s); s is called the signature for message m.

Verification procedure
To verify that a signature s on a message m was created by A, an entity B (the verifier) performs the following steps:
1. Obtain the verification function VA of A.
2. Compute u = VA(m, s).
3. Accept the signature as having been created by A if u = true, and reject the signature if u = false.
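A toy rendering of Example 1.42 together with the two procedures (hypothetical tables for illustration; note that publishing VA as a plain table trivially reveals SA, which is why real schemes need property (b) discussed below):

    # A's secret signing transformation S_A, as in Figure 1.10.
    S_A = {"m1": "s3", "m2": "s1", "m3": "s2"}

    # Public verification transformation V_A on M x S.
    V_A = {(m, s): True for m, s in S_A.items()}

    def sign(m):
        return S_A[m]                      # signing procedure, step 1

    def verify(m, s):
        return V_A.get((m, s), False)      # verification procedure, step 2

    s = sign("m1")
    assert verify("m1", s)                 # accept
    assert not verify("m2", s)             # reject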

1.43 Remark (concise representation) The transformations SA and VA are typically characterized more compactly by a key; that is, there is a class of signing and verification algorithms publicly known, and each algorithm is identified by a key. Thus the signing algorithm SA of A is determined by a key kA, and A is only required to keep kA secret. Similarly, the verification algorithm VA of A is determined by a key lA which is made public.

1.44 Remark (handwritten signatures) Handwritten signatures could be interpreted as a special class of digital signatures. To see this, take the set of signatures S to contain only one element which is the handwritten signature of A, denoted by sA. The verification function simply checks if the signature on a message purportedly signed by A is sA.

An undesirable feature in Remark 1.44 is that the signature is not message-dependent. Hence, further constraints are imposed on digital signature mechanisms as next discussed.

Properties required for signing and verification functions
There are several properties which the signing and verification transformations must satisfy.
(a) s is a valid signature of A on message m if and only if VA(m, s) = true.
(b) It is computationally infeasible for any entity other than A to find, for any m ∈ M, an s ∈ S such that VA(m, s) = true.

Figure 1.10 graphically displays property (a). There is an arrowed line in the diagram for VA from (mi, sj) to true provided there is an arrowed line from mi to sj in the diagram for SA.

Property (b) provides the security for the method – the signature uniquely binds A to the message which is signed. No one has yet formally proved that digital signature schemes satisfying (b) exist (although existence is widely believed to be true); however, there are some very good candidates. §1.8.3 introduces a particular class of digital signatures which arise from public-key encryption techniques. Chapter 11 describes a number of digital signature mechanisms which are believed to satisfy the two properties cited above. Although the description of a digital signature given in this section is quite general, it can be broadened further, as presented in §11.2.

1.7 Authentication and identification

Authentication is a term which is used (and often abused) in a very broad sense. By itself it has little meaning other than to convey the idea that some means has been provided to guarantee that entities are who they claim to be, or that information has not been manipulated by unauthorized parties.

Authentication is specific to the security objective which one is trying to achieve. Examples of specific objectives include access control, entity authentication, message authentication, data integrity, non-repudiation, and key authentication. These instances of authentication are dealt with at length in Chapters 9 through 13. For the purposes of this chapter, it suffices to give a brief introduction to authentication by describing several of the most obvious applications.

Authentication is one of the most important of all information security objectives. Until the mid 1970s it was generally believed that secrecy and authentication were intrinsically connected. With the discovery of hash functions (§1.9) and digital signatures (§1.6), it was realized that secrecy and authentication were truly separate and independent information security objectives.

It may at first not seem important to separate the two, but there are situations where it is not only useful but essential. For example, if a two-party communication between Alice and Bob is to take place where Alice is in one country and Bob in another, the host countries might not permit secrecy on the channel; one or both countries might want the ability to monitor all communications. Alice and Bob, however, would like to be assured of the identity of each other, and of the integrity and origin of the information they send and receive.

The preceding scenario illustrates several independent aspects of authentication. If Alice and Bob desire assurance of each other's identity, there are two possibilities to consider:
1. Alice and Bob could be communicating with no appreciable time delay. That is, they are both active in the communication in “real time”.
2. Alice or Bob could be exchanging messages with some delay. That is, messages might be routed through various networks, stored, and forwarded at some later time.

In the first instance Alice and Bob would want to verify identities in real time. This might be accomplished by Alice sending Bob some challenge to which Bob is the only entity which can respond correctly. Bob could perform a similar action to identify Alice. This type of authentication is commonly referred to as entity authentication or, more simply, identification. For the second possibility, it is not convenient to challenge and await response, and moreover the communication path may be only in one direction. Different techniques are now required to authenticate the originator of the message. This form of authentication is called data origin authentication.

1.7.1 Identification

1.45 Definition An identification or entity authentication technique assures one party (through acquisition of corroborative evidence) of both the identity of a second party involved, and that the second was active at the time the evidence was created or acquired.

Typically the only data transmitted is that necessary to identify the communicating parties. The entities are both active in the communication, giving a timeliness guarantee.

1.46 Example (identification) A calls B on the telephone. If A and B know each other, then entity authentication is provided through voice recognition. Although not foolproof, this works effectively in practice.

1.47 Example (identification) Person A provides to a banking machine a personal identification number (PIN) along with a magnetic stripe card containing information about A. The banking machine uses the information on the card and the PIN to verify the identity of the card holder. If verification succeeds, A is given access to various services offered by the machine.

Example 1.46 is an instance of mutual authentication, whereas Example 1.47 only provides unilateral authentication. Numerous mechanisms and protocols devised to provide mutual or unilateral authentication are discussed in Chapter 10.

1.7.2 Data origin authentication

1.48 Definition Data origin authentication or message authentication techniques provide to one party which receives a message assurance (through corroborative evidence) of the identity of the party which originated the message.

Often a message is provided to B along with additional information so that B can determine the identity of the entity who originated the message. This form of authentication typically provides no guarantee of timeliness, but is useful in situations where one of the parties is not active in the communication.

1.49 Example (need for data origin authentication) A sends to B an electronic mail message (e-mail). The message may travel through various network communications systems and be stored for B to retrieve at some later time. A and B are usually not in direct communication. B would like some means to verify that the message received and purportedly created by A did indeed originate from A.

Data origin authentication implicitly provides data integrity since, if the message was modified during transmission, A would no longer be the originator.

1.8 Public-key cryptography

The concept of public-key encryption is simple and elegant, but has far-reaching consequences.

1.8.1 Public-key encryption

Let {Ee : e ∈ K} be a set of encryption transformations, and let {Dd : d ∈ K} be the set of corresponding decryption transformations, where K is the key space. Consider any pair of associated encryption/decryption transformations (Ee, Dd) and suppose that each pair has the property that knowing Ee it is computationally infeasible, given a random ciphertext c ∈ C, to find the message m ∈ M such that Ee(m) = c. This property implies that given e it is infeasible to determine the corresponding decryption key d. (Of course e and d are simply means to describe the encryption and decryption functions, respectively.)

Ee is being viewed here as a trapdoor one-way function (Definition 1.16) with d being the trapdoor information necessary to compute the inverse function and hence allow decryption. This is unlike symmetric-key ciphers where e and d are essentially the same.

Under these assumptions, consider the two-party communication between Alice and Bob illustrated in Figure 1.11. Bob selects the key pair (e, d). Bob sends the encryption key e (called the public key) to Alice over any channel, but keeps the decryption key d (called the private key) secure and secret. Alice may subsequently send a message m to Bob by applying the encryption transformation determined by Bob's public key to get c = Ee(m). Bob decrypts the ciphertext c by applying the inverse transformation Dd uniquely determined by d.

[Figure 1.11: Encryption using public-key techniques.]

Notice how Figure 1.11 differs from Figure 1.7 for a symmetric-key cipher. Here the encryption key is transmitted to Alice over an unsecured channel. This unsecured channel may be the same channel on which the ciphertext is being transmitted (but see §1.8.2). Since the encryption key e need not be kept secret, it may be made public. Any entity can subsequently send encrypted messages to Bob which only Bob can decrypt. Figure 1.12 illustrates this idea, where A1, A2, and A3 are distinct entities. Note that if A1 destroys message m1 after encrypting it to c1, then even A1 cannot recover m1 from c1.

As a physical analogue, consider a metal box with the lid secured by a combination lock. The combination is known only to Bob. If the lock is left open and made publicly available, then anyone can place a message inside and lock the lid. Only Bob can retrieve the

message. Even the entity which placed the message into the box is unable to retrieve it.

Public-key encryption, as described here, assumes that knowledge of the public key e does not allow computation of the private key d. In other words, this assumes the existence of trapdoor one-way functions (§1.3.1(iii)).

[Figure 1.12: Schematic use of public-key encryption.]

1.50 Definition Consider an encryption scheme consisting of the sets of encryption and decryption transformations {Ee : e ∈ K} and {Dd : d ∈ K}, respectively. The encryption method is said to be a public-key encryption scheme if for each associated encryption/decryption pair (e, d), one key e (the public key) is made publicly available, while the other d (the private key) is kept secret. For the scheme to be secure, it must be computationally infeasible to compute d from e.
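As a concrete (and deliberately insecure) illustration of Definition 1.50, here is a toy RSA-style key pair in Python; RSA is one realization of such a scheme, and the tiny primes below are purely for demonstration (Python 3.8+ is assumed for the modular inverse):

    p, q = 61, 53              # secret primes; factoring n is the hard problem
    n = p * q                  # public modulus
    phi = (p - 1) * (q - 1)
    e = 17                     # public key exponent; gcd(e, phi) = 1
    d = pow(e, -1, phi)        # private key: computable only with the trapdoor

    def E(m):                  # E_e: anyone can encrypt
        return pow(m, e, n)

    def D(c):                  # D_d: only the private-key holder can decrypt
        return pow(c, d, n)

    m = 65
    assert D(E(m)) == m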

1.51 Remark (private key vs. secret key) To avoid ambiguity, a common convention is to use the term private key in association with public-key cryptosystems, and secret key in association with symmetric-key cryptosystems. This may be motivated by the following line of thought: it takes two or more parties to share a secret, but a key is truly private only when one party alone knows it.

There are many schemes known which are widely believed to be secure public-key encryption methods, but none have been mathematically proven to be secure independent of qualifying assumptions. This is not unlike the symmetric-key case, where the only system which has been proven secure is the one-time pad (§1.5.4).

1.8.2 The necessity of authentication in public-key systems

It would appear that public-key cryptography is an ideal system, not requiring a secure channel to pass the encryption key. This would imply that two entities could communicate over an unsecured channel without ever having met to exchange keys. Unfortunately, this is not the case.

Figure 1.13 illustrates how an active adversary can defeat the system (decrypt messages intended for a second entity) without breaking the encryption system. This is a type of impersonation and is an example of protocol failure (see §1.10). In this scenario the adversary impersonates entity B by sending entity A a public key e′ which A assumes (incorrectly) to be the public key of B. The adversary intercepts encrypted messages from A to B, decrypts with its own private key d′, re-encrypts the message under B's public key e, and sends it on to B. This highlights the necessity to authenticate public keys to achieve data origin authentication of the public keys themselves. A must be convinced that she is encrypting under the legitimate public key of B.

Fortunately, public-key techniques also allow an elegant solution to this problem (see §1.11).

[Figure 1.13: An impersonation attack on a two-party communication.]

1.8.3 Digital signatures from reversible public-key encryption

This section considers a class of digital signature schemes which is based on public-key encryption systems of a particular type. Suppose Ee is a public-key encryption transformation with message space M and ciphertext space C. Suppose further that M = C. If Dd is the decryption transformation corresponding to Ee, then since Ee and Dd are both permutations, one has

Dd(Ee(m)) = Ee(Dd(m)) = m, for all m ∈ M.

A public-key encryption scheme of this type is called reversible. (There is a broader class of digital signatures which can be informally described as arising from irreversible cryptographic algorithms; these are described in §11.2.) Note that it is essential that M = C for this to be a valid equality for all m ∈ M; otherwise, Dd(m) will be meaningless for m ∉ C.

Construction for a digital signature scheme
1. Let M be the message space for the signature scheme.
2. Let C = M be the signature space S.
3. Let (e, d) be a key pair for the public-key encryption scheme.
4. Define the signing function SA to be Dd. That is, the signature for a message m ∈ M is s = Dd(m).
5. Define the verification function VA by VA(m, s) = true if Ee(s) = m, and false otherwise.

The signature scheme can be simplified further if A only signs messages having a special structure, and this structure is publicly known. Let M′ be a subset of M where elements of M′ have a well-defined special structure, such that M′ contains only a negligible fraction of messages from the set. For example, suppose that M consists of all binary strings of length 2t for some positive integer t. Let M′ be the subset of M consisting of all strings where the first t bits are replicated in the last t positions (e.g., 101101 would be in M′ for t = 3). If A only signs messages within the subset M′, these are easily recognized by a verifier. Redefine the verification function VA as VA(s) = true if Ee(s) ∈ M′, and false otherwise.

Under this new scenario A only needs to transmit the signature s, since the message m = Ee(s) can be recovered by applying the verification function. Such a scheme is called a digital signature scheme with message recovery. Figure 1.14 illustrates how this signature function is used. The feature of selecting messages of special structure is referred to as selecting messages with redundancy.

[Figure 1.14: A digital signature scheme with message recovery.]
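A sketch of this construction and its message-recovery variant, reusing the toy RSA parameters from the earlier sketch (illustration only; here t = 5, so M′ consists of 10-bit strings whose two halves agree):

    p, q = 61, 53
    n, e = p * q, 17
    d = pow(e, -1, (p - 1) * (q - 1))
    t = 5

    def S_A(m):
        return pow(m, d, n)            # signing: apply D_d

    def V_A(m, s):
        return pow(s, e, n) == m       # basic verification: E_e(s) = m

    def sign_with_recovery(x):
        m = (x << t) | x               # redundancy: first t bits replicated
        return pow(m, d, n)            # only s is transmitted

    def verify_and_recover(s):
        m = pow(s, e, n)               # recover m = E_e(s)
        if (m >> t) == (m & (2 ** t - 1)):
            return m >> t              # accept: m lies in M'
        return None                    # reject: a random s almost never lands in M'

    assert V_A(1234, S_A(1234))
    assert verify_and_recover(sign_with_recovery(13)) == 13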

The modification presented above is more than a simplification; it is absolutely crucial if one hopes to meet the requirement of property (b) of signing and verification functions (see §1.6). To see why this is the case, note that any entity B can select a random element s ∈ S as a signature and apply Ee to get u = Ee(s), since S = M and Ee is public knowledge. B may then take the message m = u and the signature on m to be s, and transmit (m, s). It is easy to check that s will verify as a signature created by A for m, but in which A has had no part. In this case B has forged a signature of A. This is an example of what is called existential forgery. (B has produced A's signature on some message likely not of B's choosing.) If M′ contains only a negligible fraction of messages from M, then the probability of some entity forging a signature of A in this manner is negligibly small.

1.52 Remark (digital signatures vs. confidentiality) Although digital signature schemes based on reversible public-key encryption are attractive, they require an encryption method as a primitive. There are situations where a digital signature mechanism is required but encryption is forbidden. In such cases these digital signature schemes are inappropriate.

Digital signatures in practice
For digital signatures to be useful in practice, concrete realizations of the preceding concepts should have certain additional properties. A digital signature must
1. be easy to compute by the signer (the signing function should be easy to apply);
2. be easy to verify by anyone (the verification function should be easy to apply); and
3. have an appropriate lifespan, i.e., be computationally secure from forgery until the signature is no longer necessary for its original purpose.

Resolution of disputes
The purpose of a digital signature (or any signature method) is to permit the resolution of disputes.

For example, an entity A could at some point deny having signed a message, or some other entity B could falsely claim that a signature on a message was produced by A. In order to overcome such problems a trusted third party (TTP) or judge is required. The TTP must be some entity which all parties involved agree upon in advance. If A denies that a message m held by B was signed by A, then B should be able to present the signature sA for m to the TTP along with m. The TTP rules in favor of B if VA(m, sA) = true, and in favor of A otherwise. B will accept the decision if B is confident that the TTP has the same verifying transformation VA as A does. A will accept the decision if A is confident that the TTP used VA and that SA has not been compromised. Therefore, fair resolution of disputes requires that the following criteria are met.

Requirements for resolution of disputed signatures
1. SA and VA have properties (a) and (b) of §1.6.
2. The TTP has an authentic copy of VA.
3. The signing transformation SA has been kept secret and remains secure.

These properties are necessary, but in practice it might not be possible to guarantee them. For example, the assumption that SA and VA have the desired characteristics given in property 1 might turn out to be false for a particular signature scheme. Another possibility is that A claims falsely that SA was compromised. To overcome these problems requires an agreed method to validate the time period for which A will accept responsibility for the verification transformation. An analogue of this situation can be made with credit card revocation. The holder of a card is responsible until the holder notifies the card issuing company that the card has been lost or stolen. §13.8.2 gives a more in-depth discussion of these problems and possible solutions.

1.8.4 Symmetric-key vs. public-key cryptography

Symmetric-key and public-key encryption schemes have various advantages and disadvantages, some of which are common to both. This section highlights a number of these and summarizes features pointed out in previous sections.

(i) Advantages of symmetric-key cryptography
1. Symmetric-key ciphers can be designed to have high rates of data throughput. Some hardware implementations achieve encrypt rates of hundreds of megabytes per second, while software implementations may attain throughput rates in the megabytes per second range.
2. Keys for symmetric-key ciphers are relatively short.
3. Symmetric-key ciphers can be employed as primitives to construct various cryptographic mechanisms including pseudorandom number generators (see Chapter 5), hash functions (see Chapter 9), and computationally efficient digital signature schemes (see Chapter 11), to name just a few.
4. Symmetric-key ciphers can be composed to produce stronger ciphers. Simple transformations which are easy to analyze, but on their own weak, can be used to construct strong product ciphers.

5. Symmetric-key encryption is perceived to have an extensive history, although it must be acknowledged that, notwithstanding the invention of rotor machines earlier, much of the knowledge in this area has been acquired subsequent to the invention of the digital computer, and, in particular, the design of the Data Encryption Standard (see Chapter 7) in the early 1970s.

(ii) Disadvantages of symmetric-key cryptography
1. In a two-party communication, the key must remain secret at both ends.
2. In a large network, there are many key pairs to be managed. Consequently, effective key management requires the use of an unconditionally trusted TTP (Definition 1.65).
3. In a two-party communication between entities A and B, sound cryptographic practice dictates that the key be changed frequently, and perhaps for each communication session.
4. Digital signature mechanisms arising from symmetric-key encryption typically require either large keys for the public verification function or the use of a TTP (see Chapter 11).

(iii) Advantages of public-key cryptography
1. Only the private key must be kept secret (authenticity of public keys must, however, be guaranteed).
2. The administration of keys on a network requires the presence of only a functionally trusted TTP (Definition 1.66) as opposed to an unconditionally trusted TTP. Depending on the mode of usage, the TTP might only be required in an “off-line” manner, as opposed to in real time.
3. Depending on the mode of usage, a private key/public key pair may remain unchanged for considerable periods of time, e.g., many sessions (even several years).
4. Many public-key schemes yield relatively efficient digital signature mechanisms. The key used to describe the public verification function is typically much smaller than for the symmetric-key counterpart.

5. In a large network, the number of keys necessary may be considerably smaller than in the symmetric-key scenario.

(iv) Disadvantages of public-key encryption
1. Throughput rates for the most popular public-key encryption methods are several orders of magnitude slower than the best known symmetric-key schemes.
2. Key sizes are typically much larger than those required for symmetric-key encryption (see Remark 1.53), and the size of public-key signatures is larger than that of tags providing data origin authentication from symmetric-key techniques.
3. No public-key scheme has been proven to be secure (the same can be said for block ciphers). The most effective public-key encryption schemes found to date have their security based on the presumed difficulty of a small set of number-theoretic problems.
4. Public-key cryptography does not have as extensive a history as symmetric-key encryption, being discovered only in the mid 1970s. (It is, of course, arguable that some public-key schemes which are based on hard mathematical problems have a long history, since these problems have been studied for many years. Although this may be true, one must be wary that the mathematics was not studied with this application in mind.)

Summary of comparison

Symmetric-key and public-key encryption have a number of complementary advantages. Current cryptographic systems exploit the strengths of each. An example will serve to illustrate. Public-key encryption techniques may be used to establish a key for a symmetric-key system being used by communicating entities A and B. In this scenario A and B can take advantage of the long term nature of the public/private keys of the public-key scheme and the performance efficiencies of the symmetric-key scheme. Since data encryption is frequently the most time consuming part of the encryption process, the public-key scheme for key establishment is a small fraction of the total encryption process between A and B. To date, the computational performance of public-key encryption is inferior to that of symmetric-key encryption. There is, however, no proof that this must be the case. The important points in practice are:

1. public-key cryptography facilitates efficient signatures (particularly non-repudiation) and key management; and
2. symmetric-key cryptography is efficient for encryption and some data integrity applications.

1.53 Remark (key sizes: symmetric key vs. private key) Private keys in public-key systems must be larger (e.g., 1024 bits for RSA) than secret keys in symmetric-key systems (e.g., 64 or 128 bits) because, whereas (for secure algorithms) the most efficient attack on symmetric-key systems is an exhaustive key search, all known public-key systems are subject to “shortcut” attacks (e.g., factoring) more efficient than exhaustive search. Consequently, for equivalent security, symmetric keys have bitlengths considerably smaller than those of private keys in public-key systems, e.g., by a factor of 10 or more.

1.9 Hash functions

One of the fundamental primitives in modern cryptography is the cryptographic hash function, often informally called a one-way hash function. A simplified definition for the present discussion follows.

1.54 Definition A hash function is a computationally efficient function mapping binary strings of arbitrary length to binary strings of some fixed length, called hash-values.

For a hash function which outputs n-bit hash-values (e.g., n = 128 or 160) and has desirable properties, the probability that a randomly chosen string gets mapped to a particular n-bit hash-value (image) is 2^−n. The basic idea is that a hash-value serves as a compact representative of an input string. To be of cryptographic use, a hash function h is typically chosen such that it is computationally infeasible to find two distinct inputs which hash to a common value (i.e., two colliding inputs x and y such that h(x) = h(y)), and that given a specific hash-value y, it is computationally infeasible to find an input (pre-image) x such that h(x) = y.

The most common cryptographic uses of hash functions are with digital signatures and for data integrity. With digital signatures, a long message is usually hashed (using a publicly available hash function) and only the hash-value is signed. The party receiving the message then hashes the received message, and verifies that the received signature is correct for this hash-value. This saves both time and space compared to signing the message directly, which would typically involve splitting the message into appropriate-sized blocks and signing each block individually. Note here that the inability to find two messages with the same hash-value is a security requirement, since otherwise the signature on one message hash-value would be the same as that on another, allowing a signer to sign one message and at a later point in time claim to have signed another.
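A hash-then-sign sketch using the standard-library SHA-256 together with the toy RSA signer from earlier (the reduction of the digest modulo the tiny n is an artifact of the toy parameters; a real scheme signs a full-size hash-value):

    import hashlib

    p, q = 61, 53
    n, e = p * q, 17
    d = pow(e, -1, (p - 1) * (q - 1))

    def h(message: bytes) -> int:
        # Fixed-length hash-value of an arbitrary-length message.
        return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

    def sign(message: bytes) -> int:
        return pow(h(message), d, n)   # only the hash-value is signed

    def verify(message: bytes, s: int) -> bool:
        return pow(s, e, n) == h(message)

    msg = b"a long message; only its short hash-value is signed"
    assert verify(msg, sign(msg))      # an altered message fails (with high probability)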

Hash functions may be used for data integrity as follows. The hash-value corresponding to a particular input is computed at some point in time. The integrity of this hash-value is protected in some manner. At a subsequent point in time, to verify that the input data has not been altered, the hash-value is recomputed using the input at hand, and compared for equality with the original hash-value. Specific applications include virus protection and software distribution.

A third application of hash functions is their use in protocols involving a priori commitments, including some digital signature schemes and identification protocols (e.g., see Chapter 10).

Hash functions as discussed above are typically publicly known and involve no secret keys. When used to detect whether the message input has been altered, they are called modification detection codes (MDCs). Related to these are hash functions which involve a secret key, and provide data origin authentication (§9.7.6) as well as data integrity; these are called message authentication codes (MACs).

1.10 Protocols and mechanisms

1.55 Definition A cryptographic protocol (protocol) is a distributed algorithm defined by a sequence of steps precisely specifying the actions required of two or more entities to achieve a specific security objective.

1.56 Remark (protocol vs. mechanism) As opposed to a protocol, a mechanism is a more general term encompassing protocols, algorithms (specifying the steps followed by a single entity), and non-cryptographic techniques (e.g., hardware protection and procedural controls) to achieve specific security objectives.

Protocols play a major role in cryptography and are essential in meeting cryptographic goals as discussed in §1.2. Encryption schemes, digital signatures, hash functions, and random number generation are among the primitives which may be utilized to build a protocol.

1.57 Example (a simple key agreement protocol) Alice and Bob have chosen a symmetric-key encryption scheme to use in communicating over an unsecured channel. To encrypt information they require a key. The communication protocol is the following:
1. Bob constructs a public-key encryption scheme and sends his public key to Alice over the channel.
2. Alice generates a key for the symmetric-key encryption scheme.
3. Alice encrypts the key using Bob's public key and sends the encrypted key to Bob.
4. Bob decrypts using his private key and recovers the symmetric (secret) key.
5. Alice and Bob begin communicating with privacy by using the symmetric-key system and the common secret key.
This protocol uses basic functions to attempt to realize private communications on an unsecured channel. The basic primitives are the symmetric-key and the public-key encryption schemes. The protocol has shortcomings including the impersonation attack of §1.8.2, but it does convey the idea of a protocol.
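The five steps of Example 1.57 can be traced in code; in the sketch below (illustrative), the toy RSA scheme stands in for the public-key primitive and a bare XOR stands in for the symmetric cipher:

    import secrets

    # 1. Bob constructs a public-key scheme; (n, e) goes to Alice, d stays secret.
    p, q = 61, 53
    n, e = p * q, 17
    d = pow(e, -1, (p - 1) * (q - 1))

    # 2. Alice generates a key for the symmetric-key scheme.
    sym_key = secrets.randbelow(n)

    # 3. Alice encrypts the key under Bob's public key and sends it.
    encrypted_key = pow(sym_key, e, n)

    # 4. Bob decrypts with his private key and recovers the symmetric key.
    assert pow(encrypted_key, d, n) == sym_key

    # 5. Both parties now communicate under the shared key (XOR of one block
    #    stands in for a real symmetric cipher).
    block = 0b10110011010
    assert (block ^ sym_key) ^ sym_key == block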

Often the role of public-key encryption in privacy communications is exactly the one suggested by this protocol – public-key encryption is used as a means to exchange keys for subsequent use in symmetric-key encryption, motivated by performance differences between symmetric-key and public-key encryption.

Protocol and mechanism failure

1.58 Definition A protocol failure or mechanism failure occurs when a mechanism fails to meet the goals for which it was intended, in a manner whereby an adversary gains advantage not by breaking an underlying primitive such as an encryption algorithm directly, but by manipulating the protocol or mechanism itself.

1.59 Example (mechanism failure) Alice and Bob are communicating using a stream cipher. Messages which they encrypt are known to have a special form: the first twenty bits carry information which represents a monetary amount. An active adversary can simply XOR an appropriate bitstring into the first twenty bits of ciphertext and change the amount. While the adversary has not been able to read the underlying message, she has been able to alter the transmission. The encryption has not been compromised but the protocol has failed to perform adequately; the inherent assumption that encryption provides data integrity is incorrect.
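A demonstration of this failure (illustrative): with an XOR-based stream cipher, the adversary needs only the position and encoding of the amount field, not the key.

    import secrets

    amount = 100                            # plaintext: 20-bit monetary amount
    keystream = secrets.randbits(20)
    ciphertext = amount ^ keystream         # stream-cipher encryption

    # Adversary XORs a chosen difference into the ciphertext, key unknown.
    tampered = ciphertext ^ (100 ^ 10000)

    assert tampered ^ keystream == 10000    # the receiver decrypts a forged amount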

1.60 Example (forward search attack) Suppose that in an electronic bank transaction the 32-bit field which records the value of the transaction is to be encrypted using a public-key scheme. This simple protocol is intended to provide privacy of the value field – but does it? An adversary could easily take all 2^32 possible entries that could be plaintext in this field and encrypt them using the public encryption function. (Remember that by the very nature of public-key encryption this function must be available to the adversary.) By comparing each of the 2^32 ciphertexts with the one which is actually encrypted in the transaction, the adversary can determine the plaintext. Here the public-key encryption function is not compromised, but rather the way it is used. A closely related attack which applies directly to authentication for access control purposes is the dictionary attack (see §10.2.2).
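A forward-search sketch (an 8-bit field instead of 32 bits so it runs instantly, with the toy RSA function as the deterministic public-key encryption):

    p, q = 61, 53
    n, e = p * q, 17

    def E(m):
        # The public encryption function, available to the adversary too.
        return pow(m, e, n)

    intercepted = E(137)          # ciphertext of an unknown small value

    # Enumerate the whole plaintext space and compare ciphertexts.
    table = {E(m): m for m in range(2 ** 8)}
    assert table[intercepted] == 137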

1.61 Remark (causes of protocol failure) Protocols and mechanisms may fail for a number of reasons, including:
1. weaknesses in a particular cryptographic primitive which may be amplified by the protocol or mechanism;
2. claimed or assumed security guarantees which are overstated or not clearly understood; and
3. the oversight of some principle applicable to a broad class of primitives such as encryption.
Example 1.59 illustrates item 2 if the stream cipher is the one-time pad, and also item 1. Example 1.60 illustrates item 3. See also §1.8.2.

1.62 Remark (protocol design) When designing cryptographic protocols and mechanisms, the following two steps are essential:
1. identify all assumptions in the protocol or mechanism design; and
2. for each assumption, determine the effect on the security objective if that assumption is violated.

1.11 Key establishment, management, and certification

This section gives a brief introduction to methodology for ensuring the secure distribution of keys for cryptographic purposes.

1.63 Definition Key establishment is any process whereby a shared secret key becomes available to two or more parties, for subsequent cryptographic use.

1.64 Definition Key management is the set of processes and mechanisms which support key establishment and the maintenance of ongoing keying relationships between parties, including replacing older keys with new keys as necessary.

Key establishment can be broadly subdivided into key agreement and key transport. Many and various protocols have been proposed to provide key establishment. Chapter 12 describes a number of these in detail. For the purpose of this chapter only a brief overview of issues related to key management will be given. Simple architectures based on symmetric-key and public-key cryptography along with the concept of certification will be addressed.

As noted in §1.5, a major issue when using symmetric-key techniques is the establishment of pairwise secret keys. This becomes more evident when considering a network of entities, any two of which may wish to communicate. Figure 1.15 illustrates a network consisting of 6 entities. The arrowed edges indicate the 15 possible two-party communications which could take place. Since each pair of entities wish to communicate, this small network requires the secure exchange of C(6, 2) = 15 key pairs. In a network with n entities, the number of secure key exchanges required is C(n, 2) = n(n−1)/2.

[Figure 1.15: Keying relationships in a simple 6-party network.]
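The quadratic growth is easy to check (illustrative):

    from math import comb

    assert comb(6, 2) == 15       # the 6-party network of Figure 1.15
    print(comb(1000, 2))          # 499500 keys for a 1000-entity network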

The network diagram depicted in Figure 1.15 is simply the amalgamation of 15 two-party communications as depicted in Figure 1.7. In practice, networks are very large and the key management problem is a crucial issue. There are a number of ways to handle this problem. Two simplistic methods are discussed: one based on symmetric-key and the other on public-key techniques.

1.11.1 Key management through symmetric-key techniques

One solution which employs symmetric-key techniques involves an entity in the network which is trusted by all other entities. As in §1.8.3, this entity is referred to as a trusted third party (TTP). Each entity Ai shares a distinct symmetric key ki with the TTP. These keys are assumed to have been distributed over a secured channel. If two entities subsequently wish to communicate, the TTP generates a key k (sometimes called a session key) and sends it encrypted under each of the fixed keys, as depicted in Figure 1.16 for entities A1 and A5.

[Figure 1.16: Key management using a trusted third party (TTP).]
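A sketch of this flow (illustrative; XOR under the long-term keys stands in for the symmetric encryption):

    import secrets

    BITS = 32
    # Long-term keys k_1, ..., k_6, each shared between one entity and the TTP.
    long_term = {i: secrets.randbits(BITS) for i in range(1, 7)}

    def ttp_issue(i, j):
        # The TTP generates a session key k and encrypts it for entities i and j.
        k = secrets.randbits(BITS)
        return k ^ long_term[i], k ^ long_term[j]

    for_a1, for_a5 = ttp_issue(1, 5)
    # Each entity strips its own long-term key; both recover the same session key.
    assert for_a1 ^ long_term[1] == for_a5 ^ long_term[5]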

Advantages of this approach include:
1. It is easy to add and remove entities from the network.
2. Each entity needs to store only one long-term secret key.
Disadvantages include:
1. All communications require initial interaction with the TTP.
2. The TTP must store n long-term secret keys.
3. The TTP has the ability to read all messages.
4. If the TTP is compromised, all communications are insecure.

1.11.2 Key management through public-key techniques

There are a number of ways to address the key management problem through public-key techniques. Chapter 13 describes many of these in detail. For the purpose of this chapter a very simple model is considered. Each entity in the network has a public/private encryption key pair. The public key along with the identity of the entity is stored in a central repository called a public file.

If an entity A1 wishes to send encrypted messages to entity A6, A1 retrieves the public key e6 of A6 from the public file, encrypts the message using this key, and sends the ciphertext to A6. Figure 1.17 depicts such a network.

[Figure 1.17: Key management using public-key techniques.]

Advantages of this approach include:
1. No trusted third party is required.
2. The public file could reside with each entity.
3. Only n public keys need to be stored to allow secure communications between any pair of entities, assuming the only attack is that by a passive adversary.

The key management problem becomes more difficult when one must take into account an adversary who is active (i.e., an adversary who can alter the public file containing public keys).

Figure 1.18 illustrates how an active adversary could compromise the key management scheme given above. (This is directly analogous to the attack in §1.8.2.) In the figure, the adversary alters the public file by replacing the public key e6 of entity A6 by the adversary's public key e∗. Any message encrypted for A6 using the public key from the public file can be decrypted by only the adversary. Having decrypted and read the message, the adversary can now encrypt it using the public key of A6 and forward the ciphertext to A6. A1 however believes that only A6 can decrypt the ciphertext c.

[Figure 1.18: An impersonation of A6 by an active adversary with public key e∗.]

To prevent this type of attack, the entities may use a TTP to certify the public key of each entity. The TTP has a private signing algorithm ST and a verification algorithm VT (see §1.6) assumed to be known by all entities. The TTP carefully verifies the identity of each entity, and signs a message consisting of an identifier and the entity's authentic public key. This is a simple example of a certificate, binding the identity of an entity to its public key (see §1.11.3). Figure 1.19 illustrates the network under these conditions. A1 uses the public key of A6 only if the certificate signature verifies successfully.

[Figure 1.19: Authentication of public keys by a TTP; ‖ denotes concatenation.]

Advantages of using a TTP to maintain the integrity of the public file include:
1. It prevents an active adversary from impersonation on the network.
2. The TTP cannot monitor communications. Entities need trust the TTP only to bind identities to public keys properly.
3. Per-communication interaction with the public file can be eliminated if entities store certificates locally.
Even with a TTP, some concerns still remain:
1. If the signing key of the TTP is compromised, all communications become insecure.
2. All trust is placed with one entity.

1.11.3 Trusted third parties and public-key certificates

A trusted third party has been used in §1.8.3 and again here in §1.11. The trust placed on this entity varies with the way it is used, and hence motivates the following classification.

1.65 Definition A TTP is said to be unconditionally trusted if it is trusted on all matters. For example, it may have access to the secret and private keys of users, as well as be charged with the association of public keys to identifiers.

1.66 Definition A TTP is said to be functionally trusted if the entity is assumed to be honest and fair but it does not have access to the secret or private keys of users.

§1.11.1 provides a scenario which employs an unconditionally trusted TTP. §1.11.2 uses a functionally trusted TTP to maintain the integrity of the public file. A functionally trusted TTP could be used to register or certify users and contents of documents or, as in §1.8.3, as a judge.

Public-key certificates

The distribution of public keys is generally easier than that of symmetric keys, since secrecy is not required. However, the integrity (authenticity) of public keys is critical (recall §1.8.2). A public-key certificate consists of a data part and a signature part. The data part consists of the name of an entity, the public key corresponding to that entity, and possibly additional relevant information (e.g., the entity's street or network address, a validity period for the public key, and various other attributes). The signature part consists of the signature of a TTP over the data part.

In order for an entity B to verify the authenticity of the public key of an entity A, B must have an authentic copy of the public signature verification function of the TTP. For simplicity, assume that the authenticity of this verification function is provided to B by non-cryptographic means, for example by B obtaining it from the TTP in person. B can then carry out the following steps:
1. Acquire the public-key certificate of A over some unsecured channel, either from a central database of certificates, from A directly, or otherwise.
2. Use the TTP's verification function to verify the TTP's signature on A's certificate.
3. If this signature verifies correctly, accept the public key in the certificate as A's authentic public key; otherwise, assume the public key is invalid.
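A certificate sketch along these lines (reusing the toy hash-then-sign RSA signer; the identity and key strings are hypothetical):

    import hashlib

    p, q = 61, 53
    n, e = p * q, 17
    d = pow(e, -1, (p - 1) * (q - 1))      # the TTP's private signing key

    def h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

    def ttp_certify(identity: str, pubkey: str) -> int:
        # Signature part of the certificate, over the data part (identity || key).
        return pow(h(f"{identity}||{pubkey}".encode()), d, n)

    def verify_cert(identity: str, pubkey: str, sig: int) -> bool:
        # Step 2: uses only the TTP's public verification key (n, e).
        return pow(sig, e, n) == h(f"{identity}||{pubkey}".encode())

    sig = ttp_certify("A6", "e6")
    assert verify_cert("A6", "e6", sig)            # step 3: accept
    assert not verify_cert("A6", "e-star", sig)    # a substituted key is rejected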

Before creating a public-key certificate for A, the TTP must take appropriate measures to verify the identity of A and the fact that the public key to be certified actually belongs to A. One method is to require that A appear before the TTP with a conventional passport as proof of identity, and obtain A's public key from A in person along with evidence that A knows the corresponding private key. Once the TTP creates a certificate for a party, the trust that all other entities have in the authenticity of the TTP's public key can be used transitively to gain trust in the authenticity of that party's public key, through acquisition and verification of the certificate.

1.12 Pseudorandom numbers and sequences

Random number generation is an important primitive in many cryptographic mechanisms. For example, keys for encryption transformations need to be generated in a manner which is unpredictable to an adversary. Generating a random key typically involves the selection of random numbers or bit sequences. Random number generation presents challenging issues. A brief introduction is given here with details left to Chapter 5.

Often in cryptographic applications, one of the following steps must be performed:
(i) From a finite set of n elements (e.g., {1, 2, . . . , n}), select an element at random.
(ii) From the set of all sequences (strings) of length m over some finite alphabet A of n symbols, select a sequence at random.
(iii) Generate a random sequence (string) of symbols of length m over a set of n symbols.
It is not clear what exactly it means to select at random or generate at random. Calling a number random without a context makes little sense. Is the number 23 a random number? No, but if 49 identical balls labeled with a number from 1 to 49 are in a container, and this container mixes the balls uniformly, drops one ball out, and this ball happens to be labeled with the number 23, then one would say that 23 was generated randomly from a uniform distribution. The probability that 23 drops out is 1 in 49, i.e., 1/49.

If the number on the ball which was dropped from the container is recorded and the ball is placed back in the container and the process repeated 6 times, then a random sequence of length 6 defined on the alphabet A = {1, 2, . . . , 49} will have been generated. What is the chance that the sequence 17, 45, 1, 7, 23, 35 occurs? Since each element in the sequence has probability 1/49 of occurring, the probability of the sequence 17, 45, 1, 7, 23, 35 occurring is

(1/49) × (1/49) × (1/49) × (1/49) × (1/49) × (1/49) = (1/49)^6 = 1/13841287201.

There are precisely 13841287201 sequences of length 6 over the alphabet A. If each of these sequences is written on one of 13841287201 balls and they are placed in the container (first removing the original 49 balls), then the chance that the sequence given above drops out is the same as if it were generated one ball at a time. Hence, (ii) and (iii) above are essentially the same statements.
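In code, steps (i)–(iii) reduce to uniform selection from a strong randomness source, e.g. Python's secrets module (illustrative):

    import secrets

    ball = 1 + secrets.randbelow(49)                      # (i): one of {1, ..., 49}
    seq = [1 + secrets.randbelow(49) for _ in range(6)]   # (ii)/(iii): length-6 sequence

    assert 49 ** 6 == 13841287201                         # count of length-6 sequences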

Finding good methods to generate random sequences is difficult.

1.67 Example (random sequence generator) To generate a random sequence of 0's and 1's, a coin could be tossed with a head landing up recorded as a 1 and a tail as a 0. It is assumed that the coin is unbiased, which means that the probability of a 1 on a given toss is exactly 1/2. This will depend on how well the coin is made and how the toss is performed. This method would be of little value in a system where random sequences must be generated quickly and often. It has no practical value other than to serve as an example of the idea of random number generation.

1.68 Example (random sequence generator) A noise diode may be used to produce random binary sequences. This is reasonable if one has some way to be convinced that the probability that a 1 will be produced on any given trial is 1/2. Should this assumption be false, the sequence generated would not have been selected from a uniform distribution and so not all sequences of a given length would be equally likely. The only way to get some feeling for the reliability of this type of random source is to carry out statistical tests on its output. These are considered in Chapter 5. If the diode is a source of a uniform distribution on the set of all binary sequences of a given length, it provides an effective way to generate random sequences.

Since most true sources of random sequences (if there is such a thing) come from physical means, they tend to be either costly or slow in their generation. To overcome these problems, methods have been devised to construct pseudorandom sequences in a deterministic manner from a shorter random sequence called a seed. The pseudorandom sequences appear to be generated by a truly random source to anyone not knowing the method of generation. Often the generation algorithm is known to all, but the seed is unknown except by the entity generating the sequence. A plethora of algorithms has been developed to generate pseudorandom bit sequences of various types. Many of these are completely unsuitable for cryptographic purposes, and one must be cautious of claims by creators of such algorithms as to the random nature of the output.
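As a cautionary illustration, the linear congruential generator below (with glibc-style constants) expands a seed into a random-looking bit sequence deterministically; generators of this kind may pass simple statistical tests yet are unsuitable for cryptography, since their internal state can be recovered from a few outputs:

    def lcg_bits(seed, count):
        # Linear congruential generator; NOT cryptographically secure.
        state = seed
        out = []
        for _ in range(count):
            state = (1103515245 * state + 12345) % 2 ** 31
            out.append(state >> 30)        # emit the top bit of the state
        return out

    assert lcg_bits(42, 16) == lcg_bits(42, 16)   # same seed, same sequence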

1.13 Classes of attacks and security models

Over the years, many different types of attacks on cryptographic primitives and protocols have been identified. The discussion here limits consideration to attacks on encryption and protocols. Attacks on other cryptographic primitives will be given in appropriate chapters. In §1.11 the roles of an active and a passive adversary were discussed. The attacks these adversaries can mount may be classified as follows:
1. A passive attack is one where the adversary only monitors the communication channel. A passive attacker only threatens confidentiality of data.
2. An active attack is one where the adversary attempts to delete, add, or in some other way alter the transmission on the channel. An active attacker threatens data integrity and authentication as well as confidentiality.
A passive attack can be further subdivided into more specialized attacks for deducing plaintext from ciphertext, as outlined in §1.13.1.

1.13.1 Attacks on encryption schemes

The objective of the following attacks is to systematically recover plaintext from ciphertext, or even more drastically, to deduce the decryption key.
1. A ciphertext-only attack is one where the adversary (or cryptanalyst) tries to deduce the decryption key or plaintext by only observing ciphertext. Any encryption scheme vulnerable to this type of attack is considered to be completely insecure.
2. A known-plaintext attack is one where the adversary has a quantity of plaintext and corresponding ciphertext. This type of attack is typically only marginally more difficult to mount.

3. A chosen-plaintext attack is one where the adversary chooses plaintext and is then given corresponding ciphertext. Subsequently, the adversary uses any information deduced in order to recover plaintext corresponding to previously unseen ciphertext.
4. An adaptive chosen-plaintext attack is a chosen-plaintext attack wherein the choice of plaintext may depend on the ciphertext received from previous requests.
5. A chosen-ciphertext attack is one where the adversary selects the ciphertext and is then given the corresponding plaintext. One way to mount such an attack is for the adversary to gain access to the equipment used for decryption (but not the decryption key, which may be securely embedded in the equipment). The objective is then to be able, without access to such equipment, to deduce the plaintext from (different) ciphertext.
6. An adaptive chosen-ciphertext attack is a chosen-ciphertext attack where the choice of ciphertext may depend on the plaintext received from previous requests.
Most of these attacks also apply to digital signature schemes and message authentication codes. In this case, the objective of the attacker is to forge messages or MACs, as discussed in Chapters 11 and 9, respectively.

1.13.2 Attacks on protocols

The following is a partial list of attacks which might be mounted on various protocols. Until a protocol is proven to provide the service intended, the list of possible attacks can never be said to be complete.
1. known-key attack. In this attack an adversary obtains some keys used previously and then uses this information to determine new keys.
2. replay. In this attack an adversary records a communication session and replays the entire session, or a portion thereof, at some later point in time.
3. impersonation. Here an adversary assumes the identity of one of the legitimate parties in a network.

4. dictionary. This is usually an attack against passwords. Typically, a password is stored in a computer file as the image of an unkeyed hash function. When a user logs on and enters a password, it is hashed and the image is compared to the stored value. An adversary can take a list of probable passwords, hash all entries in this list, and then compare this to the list of true encrypted passwords with the hope of finding matches (a minimal sketch of such an attack follows this list).
5. forward search. This attack is similar in spirit to the dictionary attack and is used to decrypt messages. An example of this method was cited in Example 1.60.
6. interleaving attack. This type of attack usually involves some form of impersonation in an authentication protocol (see §12.9.1).
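The dictionary attack of item 4 is simple enough to illustrate concretely. The following is a minimal sketch; the choice of SHA-256 as the unkeyed hash, the stored value, and the tiny wordlist are illustrative assumptions, not details fixed by the text.

# Minimal dictionary-attack sketch (illustrative only). The wordlist,
# the choice of SHA-256 as the unkeyed hash, and the stored image are
# assumptions for demonstration purposes.
import hashlib

def hash_password(pw: str) -> str:
    # The password file stores the image h(pw) of an unkeyed hash function.
    return hashlib.sha256(pw.encode()).hexdigest()

stored_image = hash_password("letmein")   # hypothetical password-file entry

wordlist = ["password", "123456", "letmein", "qwerty"]  # probable passwords
for candidate in wordlist:
    if hash_password(candidate) == stored_image:
        print("match found:", candidate)
        break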

1.13.3 Models for evaluating security

The security of cryptographic primitives and protocols can be evaluated under several different models. The most practical security metrics are computational, provable, and ad hoc methodology, although the latter is often dangerous. The confidence level in the amount of security provided by a primitive or protocol based on computational or ad hoc security increases with time and investigation of the scheme. However, time is not enough if few people have given the method careful analysis.

(i) Unconditional security

The most stringent measure is an information-theoretic measure – whether or not a system has unconditional security. An adversary is assumed to have unlimited computational resources, and the question is whether or not there is enough information available to defeat the system. Unconditional security for encryption systems is called perfect secrecy. For perfect secrecy, the uncertainty in the plaintext, after observing the ciphertext, must be equal to the a priori uncertainty about the plaintext – observation of the ciphertext provides no information whatsoever to an adversary. A necessary condition for a symmetric-key encryption scheme to be unconditionally secure is that the key be at least as long as the message.

The one-time pad (§1.5.4) is an example of an unconditionally secure encryption algorithm. In general, encryption schemes do not offer perfect secrecy, and each ciphertext character observed decreases the theoretical uncertainty in the plaintext and the encryption key. Public-key encryption schemes cannot be unconditionally secure since, given a ciphertext c, the plaintext can in principle be recovered by encrypting all possible plaintexts until c is obtained.
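The one-time pad itself is simple enough to sketch directly. A minimal illustration follows; it assumes the key bytes are truly random, as long as the message, and never reused, conditions the sketch does not enforce (os.urandom stands in for a true random source).

# One-time pad sketch: XOR a message with a random key of equal length,
# illustrating the necessary condition above (key at least as long as
# the message). os.urandom is a stand-in for a true random source.
import os

message = b"attack at dawn"
key = os.urandom(len(message))                      # key as long as the message
ciphertext = bytes(m ^ k for m, k in zip(message, key))
recovered = bytes(c ^ k for c, k in zip(ciphertext, key))
assert recovered == message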

(ii) Complexity-theoretic security

An appropriate model of computation is defined and adversaries are modeled as having polynomial computational power (they mount attacks involving time and space polynomial in the size of appropriate security parameters). A proof of security relative to the model is then constructed. An objective is to design a cryptographic method based on the weakest assumptions possible, anticipating a powerful adversary. Asymptotic analysis, and usually also worst-case analysis, is used, and so care must be exercised to determine when proofs have practical significance. In contrast, polynomial attacks which are feasible under the model might, in practice, still be computationally infeasible. Security analysis of this type, although not of practical value in all cases, may nonetheless pave the way to a better overall understanding of security. Complexity-theoretic analysis is invaluable for formulating fundamental principles and confirming intuition. In this, cryptography is like many other sciences, whose practical techniques are discovered early in the development, well before a theoretical basis and understanding is attained.

(iii) Provable security

A cryptographic method is said to be provably secure if the difficulty of defeating it can be shown to be essentially as difficult as solving a well-known and supposedly difficult (typically number-theoretic) problem, such as integer factorization or the computation of discrete logarithms.

Thus, “provable” here means provable subject to assumptions. This approach is considered by some to be as good a practical analysis technique as exists. Provable security may be considered part of a special sub-class of the larger class of computational security considered next.

(iv) Computational security

This measures the amount of computational effort required, by the best currently-known methods, to defeat a system; it must be assumed here that the system has been well-studied to determine which attacks are relevant. A proposed technique is said to be computationally secure if the perceived level of computation required to defeat it (using the best attack known) exceeds, by a comfortable margin, the computational resources of the hypothesized adversary. Often methods in this class are related to hard problems but, unlike for provable security, no proof of equivalence is known. Most of the best-known public-key and symmetric-key schemes in current use are in this class. This class is sometimes also called practical security.

(v) Ad hoc security

This approach consists of any variety of convincing arguments that every successful attack requires a resource level (e.g., time and space) greater than the fixed resources of a perceived adversary. Cryptographic primitives and protocols which survive such analysis are said to have heuristic security, with security here typically in the computational sense. Primitives and protocols are usually designed to counter standard attacks such as those given in §1.13. While perhaps the most commonly used approach (especially for protocols), it is, in some ways, the least satisfying. Claims of security generally remain questionable and unforeseen attacks remain a threat.

1.13.4 Perspective for computational security

To evaluate the security of cryptographic schemes, certain quantities are often considered.

1.69 Definition The work factor Wd is the minimum amount of work (measured in appropriate units such as elementary operations or clock cycles) required to compute the private key d given the public key e, or, in the case of symmetric-key schemes, to determine the secret key k.

More specifically, one may consider the work required under a ciphertext-only attack given n ciphertexts, denoted Wd(n). If Wd is t years, then for sufficiently large t the cryptographic scheme is, for all practical purposes, a secure system. To date no public-key system has been found where one can prove a sufficiently large lower bound on the work factor Wd. The best that is possible to date is to rely on the following as a basis for security.

1.70 Definition The historical work factor W̄d is the minimum amount of work required to compute the private key d from the public key e using the best known algorithms at a given point in time.

The historical work factor W̄d varies with time as algorithms and technology improve. It corresponds to computational security, whereas Wd corresponds to the true security level, although this typically cannot be determined.

How large is large?

§1.4 described how the designer of an encryption system tries to create a scheme for which the best approach to breaking it is through exhaustive search of the key space. The key space must then be large enough to make an exhaustive search completely infeasible. An important question then is “How large is large?”. In order to gain some perspective on the magnitude of numbers, Table 1.2 lists various items along with an associated magnitude.

Reference                                   Magnitude
Seconds in a year                           ≈ 3 × 10^7
Age of our solar system (years)             ≈ 6 × 10^9
Seconds since creation of solar system      ≈ 2 × 10^17
Clock cycles per year, 50 MHz computer      ≈ 1.6 × 10^15
Binary strings of length 64                 2^64 ≈ 1.8 × 10^19
Binary strings of length 128                2^128 ≈ 3.4 × 10^38
Binary strings of length 256                2^256 ≈ 1.2 × 10^77
Number of 75-digit prime numbers            ≈ 5.2 × 10^72
Electrons in the universe                   ≈ 8.37 × 10^77

Table 1.2: Reference numbers comparing relative magnitudes.

Some powers of 10 are referred to by prefixes. For example, high-speed modern computers are now being rated in terms of teraflops, where a teraflop is 10^12 floating point operations per second. Table 1.3 provides a list of commonly used prefixes.

Prefix   Symbol   Magnitude      Prefix   Symbol   Magnitude
exa      E        10^18          deci     d        10^-1
peta     P        10^15          centi    c        10^-2
tera     T        10^12          milli    m        10^-3
giga     G        10^9           micro    µ        10^-6
mega     M        10^6           nano     n        10^-9
kilo     k        10^3           pico     p        10^-12
hecto    h        10^2           femto    f        10^-15
deca     da       10^1           atto     a        10^-18

Table 1.3: Prefixes used for various powers of 10.
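The entries of Table 1.2 are easy to put to work. A minimal sketch estimating exhaustive-search times for the key-space sizes in the table; the assumed testing rate of one million keys per second is an arbitrary illustrative figure.

# Rough exhaustive-search estimates for Table 1.2's key-space sizes.
# The keys-per-second rate is an assumption chosen purely for illustration.
SECONDS_PER_YEAR = 3 * 10**7          # from Table 1.2
rate = 10**6                          # assumed: keys tested per second

for bits in (64, 128, 256):
    keyspace = 2**bits
    years = keyspace / rate / SECONDS_PER_YEAR
    print(f"{bits}-bit keys: 2^{bits} ≈ {keyspace:.1e}, "
          f"exhaustive search ≈ {years:.1e} years at {rate:.0e} keys/s")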

1.14 Notes and further references

§1.1
Kahn [648] gives a thorough, comprehensive, and non-technical history of cryptography, published in 1967. Feistel [387] provides an early exposition of block cipher ideas. The original specification of DES is the 1977 U.S. Federal Information Processing Standards Publication 46 [396]. Public-key cryptography was introduced by Diffie and Hellman [345]. The first concrete realization of a public-key encryption scheme was the knapsack scheme by Merkle and Hellman [857]. The RSA public-key encryption and signature scheme is due to Rivest, Shamir, and Adleman [1060], while the ElGamal public-key encryption and signature schemes are due to ElGamal [368]. The two digital signature standards, ISO/IEC 9796 [596] and the Digital Signature Standard [406], are discussed extensively in Chapter 11.

Cryptography has used specialized areas of mathematics such as number theory to realize very practical mechanisms such as public-key encryption and digital signatures. Such usage was not conceived as possible a mere twenty years ago.

The famous mathematician Hardy [539] went as far as to boast about its lack of utility: “. . . both Gauss and lesser mathematicians may be justified in rejoicing that there is one science at any rate, and that their own, whose very remoteness from ordinary human activities should keep it gentle and clean.”

§1.2
This section was inspired by the foreword to the book Contemporary Cryptology, The Science of Information Integrity, edited by Simmons [1143]. The handwritten signature came into the British legal system in the seventeenth century as a means to provide various functions associated with information security. See Chapter 9 of Meyer and Matyas [859] for details.

This book only considers cryptography as it applies to information in digital form. Chapter 9 of Beker and Piper [84] provides an introduction to the encryption of analogue signals, in particular, speech. Although in many cases physical means are employed to facilitate privacy, cryptography plays the major role.

Physical means of providing privacy include fiber optic communication links, spread spectrum technology, TEMPEST techniques, and tamper-resistant hardware. Steganography is that branch of information privacy which attempts to obscure the existence of data through such devices as invisible inks, secret compartments, the use of subliminal channels, and the like. Kahn [648] provides an historical account of various steganographic techniques.

Excellent introductions to cryptography can be found in the articles by Diffie and Hellman [347], Massey [786], and Rivest [1054]. A concise and elegant way to describe cryptography was given by Rivest [1054]: Cryptography is about communication in the presence of adversaries. The taxonomy of cryptographic primitives (Figure 1.1) was derived from the classification given by Bosselaers, Govaerts, and Vandewalle [175].

§1.3
The theory of functions is fundamental in modern mathematics. The term range is often used in place of image of a function. The latter, being more descriptive, is preferred. An alternate term for one-to-one is injective; an alternate term for onto is surjective.

One-way functions were introduced by Diffie and Hellman [345]. A more extensive history is given on page 377. Trapdoor one-way functions were first postulated by Diffie and Hellman [345] and independently by Merkle [850] as a means to obtain public-key encryption schemes; several candidates are given in Chapter 8.

§1.4
The basic concepts of cryptography are treated quite differently by various authors, some being more technical than others. Brassard [192] provides a concise, lucid, and technically accurate account. Schneier [1094] gives a less technical but very accessible introduction. Salomaa [1089], Stinson [1178], and Rivest [1054] present more mathematical approaches. Davies and Price [308] provide a very readable presentation suitable for the practitioner.

The comparison of an encryption scheme to a resettable combination lock is from Diffie and Hellman [347]. Kerckhoffs’ desiderata [668] were originally stated in French. The translation stated here is given in Kahn [648]. Shannon [1121] also gives desiderata for encryption schemes.

§1.5
Symmetric-key encryption has a very long history, as recorded by Kahn [648]. Most systems invented prior to the 1970s are now of historical interest only. Chapter 2 of Denning [326] is also a good source for many of the more well-known schemes such as the Caesar cipher, Vigenère and Beaufort ciphers, rotor machines (Enigma and Hagelin), running key ciphers, and so on; see also Davies and Price [308] and Konheim [705]. Beker and Piper [84] give an in-depth treatment, including cryptanalysis of several of the classical systems used in World War II. Shannon’s paper [1121] is considered the seminal work on secure communications. It is also an excellent source for descriptions of various well-known historical symmetric-key ciphers.

Simple substitution and transposition ciphers are the focus of §1.5. Hill ciphers [557], a class of substitution ciphers which substitute blocks using matrix methods, are covered in Example 7.52. The idea of confusion and diffusion (Remark 1.36) was introduced by Shannon [1121].

Kahn [648] gives 1917 as the date when Vernam discovered the cipher which bears Vernam’s name; however, Vernam did not publish the result until 1926 [1222]; see page 274 for further discussion. Massey [786] states that reliable sources have suggested that the Moscow-Washington hot-line (channel for very high level communications) is no longer secured with a one-time pad, which has been replaced by a symmetric-key cipher requiring a much shorter key. This change would indicate that confidence and understanding in the ability to construct very strong symmetric-key encryption schemes exists.

The one-time pad seems to have been used extensively by Russian agents operating in foreign countries. The highest-ranking Russian agent ever captured in the United States was Rudolph Abel. When apprehended in 1957 he had in his possession a booklet the size of a postage stamp (1 7/8 × 7/8 × 7/8 inches) containing a one-time key; see Kahn [648, p.664].

§1.6
The concept of a digital signature was introduced by Diffie and Hellman [345] and independently by Merkle [850]. The first practical realization of a digital signature scheme appeared in the paper by Rivest, Shamir, and Adleman [1060]. Rabin [1022] (see also [1023]) also claims to have independently discovered RSA but did not publish the result.

Most introductory sources for digital signatures stress digital signatures with message recovery coming from a public-key encryption system. Mitchell, Piper, and Wild [882] give a good general treatment of the subject.

Stinson [1178] provides a similar elementary but general introduction. Chapter 11 generalizes the definition of a digital signature by allowing randomization. The scheme described in §1.8 is referred to as deterministic. Many other types of digital signatures with specific properties have been created, such as blind signatures, undeniable signatures, and fail-stop signatures (see Chapter 11).

§1.7
Much effort has been devoted to developing a theory of authentication. At the forefront of this is Simmons [1144], whose contributions are nicely summarized by Massey [786]. For a more concrete example of the necessity for authentication without secrecy, see the article by Simmons [1146].

§1.8
1976 marked a major turning point in the history of cryptography. In several papers that year, Diffie and Hellman introduced the idea of public-key cryptography and gave concrete examples of how such a scheme might be realized. The first paper on public-key cryptography was “Multiuser cryptographic techniques” by Diffie and Hellman [344], presented at the National Computer Conference in June of 1976.

Although the authors were not satisfied with the examples they cited, the concept was made clear. In their landmark paper, Diffie and Hellman [345] provided a more comprehensive account of public-key cryptography and described the first viable method to realize this elegant concept. Another good source for the early history and development of the subject is Diffie [343]. Nechvatal [922] also provides a broad survey of public-key cryptography.

Merkle [849, 850] independently discovered public-key cryptography, illustrating how this concept could be realized by giving an elegant and ingenious example now commonly referred to as the Merkle puzzle scheme. Simmons [1144, p.412] notes the first reported application of public-key cryptography was fielded by Sandia National Laboratories (U.S.) in 1978.

§1.9
Much of the early work on cryptographic hash functions was done by Merkle [850]. The most comprehensive current treatment of the subject is by Preneel [1004].

§1.10
A large number of successful cryptanalytic attacks on systems claiming security are due to protocol failure. An overview of this area is given by Moore [899], including classifications of protocol failures and design principles.

§1.11
One approach to distributing public keys is the so-called Merkle channel (see Simmons [1144, p.387]). Merkle proposed that public keys be distributed over so many independent public channels (newspaper, radio, television, etc.) that it would be improbable for an adversary to compromise all of them. In 1979 Kohnfelder [702] suggested the idea of using public-key certificates to facilitate the distribution of public keys over unsecured channels, such that their authenticity can be verified. Essentially the same idea, but by on-line requests, was proposed by Needham and Schroeder (see Wilkes [1244]).

A provably secure key agreement protocol has been proposed whose security is based on the Heisenberg uncertainty principle of quantum physics. The security of so-called quantum cryptography does not rely upon any complexity-theoretic assumptions. For further details on quantum cryptography, consult Chapter 6 of Brassard [192], and Bennett, Brassard, and Ekert [115].

§1.12
For an introduction and detailed treatment of many pseudorandom sequence generators, see Knuth [692]. Knuth cites an example of a complex scheme to generate random numbers which on closer analysis is shown to produce numbers which are far from random, and concludes: . . . random numbers should not be generated with a method chosen at random.

§1.13
The seminal work of Shannon [1121] on secure communications, published in 1949, remains as one of the best introductions to both practice and theory, clearly presenting many of the fundamental ideas including redundancy, entropy, and unicity distance. Various models under which security may be examined are considered by Rueppel [1081], Simmons [1144], and Preneel [1003], among others; see also Goldwasser [476].

Chapter 2
Mathematical Background

Contents in Brief
2.1 Probability theory . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2 Information theory . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3 Complexity theory . . . . . . . . . . . . . . . . . . . . . . . . 57
2.4 Number theory . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5 Abstract algebra . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.6 Finite fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.7 Notes and further references . . . . . . . . . . . . . . . . . . . 85

This chapter is a collection of basic material on probability theory, information theory, complexity theory, number theory, abstract algebra, and finite fields that will be used throughout this book.

Further background and proofs of the facts presented here can be found in the references given in §2.7. The following standard notation will be used throughout:
1. Z denotes the set of integers; that is, the set {. . . , −2, −1, 0, 1, 2, . . .}.
2. Q denotes the set of rational numbers; that is, the set {a/b | a, b ∈ Z, b ≠ 0}.
3. R denotes the set of real numbers.
4. π is the mathematical constant; π ≈ 3.14159.
5. e is the base of the natural logarithm; e ≈ 2.71828.
6. [a, b] denotes the integers x satisfying a ≤ x ≤ b.
7. ⌊x⌋ is the largest integer less than or equal to x. For example, ⌊5.2⌋ = 5 and ⌊−5.2⌋ = −6.
8. ⌈x⌉ is the smallest integer greater than or equal to x. For example, ⌈5.2⌉ = 6 and ⌈−5.2⌉ = −5.
9. If A is a finite set, then |A| denotes the number of elements in A, called the cardinality of A.
10. a ∈ A means that element a is a member of the set A.
11. A ⊆ B means that A is a subset of B.
12. A ⊂ B means that A is a proper subset of B; that is, A ⊆ B and A ≠ B.

13. The intersection of sets A and B is the set A ∩ B = {x | x ∈ A and x ∈ B}.
14. The union of sets A and B is the set A ∪ B = {x | x ∈ A or x ∈ B}.
15. The difference of sets A and B is the set A − B = {x | x ∈ A and x ∉ B}.
16. The Cartesian product of sets A and B is the set A × B = {(a, b) | a ∈ A and b ∈ B}. For example, {a1, a2} × {b1, b2, b3} = {(a1, b1), (a1, b2), (a1, b3), (a2, b1), (a2, b2), (a2, b3)}.
17. A function or mapping f : A → B is a rule which assigns to each element a in A precisely one element b in B. If a ∈ A is mapped to b ∈ B then b is called the image of a, a is called a preimage of b, and this is written f(a) = b. The set A is called the domain of f, and the set B is called the codomain of f.
18. A function f : A → B is 1−1 (one-to-one) or injective if each element in B is the image of at most one element in A. Hence f(a1) = f(a2) implies a1 = a2.

19. A function f : A → B is onto or surjective if each b ∈ B is the image of at least one a ∈ A.
20. A function f : A → B is a bijection if it is both one-to-one and onto. If f is a bijection between finite sets A and B, then |A| = |B|. If f is a bijection between a set A and itself, then f is called a permutation on A.
21. ln x is the natural logarithm of x; that is, the logarithm of x to the base e.
22. lg x is the logarithm of x to the base 2.
23. exp(x) is the exponential function e^x.
24. \sum_{i=1}^{n} a_i denotes the sum a1 + a2 + · · · + an.
25. \prod_{i=1}^{n} a_i denotes the product a1 · a2 · · · · · an.
26. For a positive integer n, the factorial function is n! = n(n − 1)(n − 2) · · · 1. By convention, 0! = 1.

2.1 Probability theory

2.1.1 Basic definitions

2.1 Definition An experiment is a procedure that yields one of a given set of outcomes. The individual possible outcomes are called simple events. The set of all possible outcomes is called the sample space.

This chapter only considers discrete sample spaces; that is, sample spaces with only finitely many possible outcomes. Let the simple events of a sample space S be labeled s1, s2, . . . , sn.

2.2 Definition A probability distribution P on S is a sequence of numbers p1, p2, . . . , pn that are all non-negative and sum to 1. The number pi is interpreted as the probability of si being the outcome of the experiment.

2.3 Definition An event E is a subset of the sample space S. The probability that event E occurs, denoted P(E), is the sum of the probabilities pi of all simple events si which belong to E. If si ∈ S, P({si}) is simply denoted by P(si).

2.4 Definition If E is an event, the complementary event is the set of simple events not belonging to E, denoted Ē.

2.5 Fact Let E ⊆ S be an event.
(i) 0 ≤ P(E) ≤ 1. Furthermore, P(S) = 1 and P(∅) = 0 (∅ is the empty set).
(ii) P(Ē) = 1 − P(E).

(iii) If the outcomes in S are equally likely, then P(E) = |E|/|S|.

2.6 Definition Two events E1 and E2 are called mutually exclusive if P(E1 ∩ E2) = 0. That is, the occurrence of one of the two events excludes the possibility that the other occurs.

2.7 Fact Let E1 and E2 be two events.
(i) If E1 ⊆ E2, then P(E1) ≤ P(E2).
(ii) P(E1 ∪ E2) + P(E1 ∩ E2) = P(E1) + P(E2). Hence, if E1 and E2 are mutually exclusive, then P(E1 ∪ E2) = P(E1) + P(E2).

2.1.2 Conditional probability

2.8 Definition Let E1 and E2 be two events with P(E2) > 0. The conditional probability of E1 given E2, denoted P(E1|E2), is

P(E1|E2) = P(E1 ∩ E2) / P(E2).

P(E1|E2) measures the probability of event E1 occurring, given that E2 has occurred.

2.9 Definition Events E1 and E2 are said to be independent if P(E1 ∩ E2) = P(E1)P(E2).

Observe that if E1 and E2 are independent, then P(E1|E2) = P(E1) and P(E2|E1) = P(E2). That is, the occurrence of one event does not influence the likelihood of occurrence of the other.

2.10 Fact (Bayes’ theorem) If E1 and E2 are events with P(E2) > 0, then

P(E1|E2) = P(E1)P(E2|E1) / P(E2).

2.1.3 Random variables

Let S be a sample space with probability distribution P.

2.11 Definition A random variable X is a function from the sample space S to the set of real numbers; to each simple event si ∈ S, X assigns a real number X(si).

Since S is assumed to be finite, X can only take on a finite number of values.

2.12 Definition Let X be a random variable on S. The expected value or mean of X is E(X) = \sum_{s_i \in S} X(s_i) P(s_i).

2.13 Fact Let X be a random variable on S. Then E(X) = \sum_{x \in \mathbb{R}} x \cdot P(X = x).

2.14 Fact If X1, X2, . . . , Xm are random variables on S, and a1, a2, . . . , am are real numbers, then E(\sum_{i=1}^{m} a_i X_i) = \sum_{i=1}^{m} a_i E(X_i).

2.15 Definition The variance of a random variable X of mean µ is a non-negative number defined by Var(X) = E((X − µ)²). The standard deviation of X is the non-negative square root of Var(X).

If a random variable has small variance then large deviations from the mean are unlikely to be observed. This statement is made more precise below.

2.16 Fact (Chebyshev’s inequality) Let X be a random variable with mean µ = E(X) and variance σ² = Var(X). Then for any t > 0,

P(|X − µ| ≥ t) ≤ σ²/t².

2.1.4 Binomial distribution

2.17 Definition Let n and k be non-negative integers. The binomial coefficient \binom{n}{k} is the number of different ways of choosing k distinct objects from a set of n distinct objects, where the order of choice is not important.

2.18 Fact (properties of binomial coefficients) Let n and k be non-negative integers.
(i) \binom{n}{k} = \frac{n!}{k!(n-k)!}.
(ii) \binom{n}{k} = \binom{n}{n-k}.
(iii) \binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.

2.19 Fact (binomial theorem) For any real numbers a, b, and non-negative integer n,

(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}.

2.20 Definition A Bernoulli trial is an experiment with exactly two possible outcomes, called success and failure.

2.21 Fact Suppose that the probability of success on a particular Bernoulli trial is p. Then the probability of exactly k successes in a sequence of n such independent trials is

\binom{n}{k} p^k (1-p)^{n-k}, for each 0 ≤ k ≤ n.    (2.1)

2.22 Definition The probability distribution (2.1) is called the binomial distribution.

2.23 Fact The expected number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial, is np. The variance of the number of successes is np(1 − p).

2.24 Fact (law of large numbers) Let X be the random variable denoting the fraction of successes in n independent Bernoulli trials, with probability p of success in each trial. Then for any ε > 0,

P(|X − p| > ε) → 0, as n → ∞.

In other words, as n gets larger, the proportion of successes should be close to p, the probability of success in each trial.
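Facts 2.23 and 2.24 are easy to check empirically. A minimal simulation sketch; the values p = 0.3 and n = 100000 are arbitrary illustrative choices.

# Simulate n independent Bernoulli trials and compare the observed
# fraction of successes with p (Fact 2.24) and the observed count with
# the mean np (Fact 2.23). p and n are arbitrary illustrative choices.
import random

p, n = 0.3, 100_000
successes = sum(random.random() < p for _ in range(n))
print("observed fraction:", successes / n)       # close to p = 0.3
print("expected count np:", n * p, " observed:", successes)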

2.1.5 Birthday problems

2.25 Definition
(i) For positive integers m, n with m ≥ n, the number m^(n) is defined as follows: m^(n) = m(m − 1)(m − 2) · · · (m − n + 1).
(ii) Let m, n be non-negative integers with m ≥ n. The Stirling number of the second kind, denoted \left\{ {m \atop n} \right\}, is

\left\{ {m \atop n} \right\} = \frac{1}{n!} \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} k^m,

with the exception that \left\{ {0 \atop 0} \right\} = 1.

The symbol \left\{ {m \atop n} \right\} counts the number of ways of partitioning a set of m objects into n non-empty subsets.

2.26 Fact (classical occupancy problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed. The probability that exactly t different balls have been drawn is

P_1(m, n, t) = \left\{ {n \atop t} \right\} \frac{m^{(t)}}{m^n}, \quad 1 \le t \le n.

The birthday problem is a special case of the classical occupancy problem.

2.27 Fact (birthday problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed.
(i) The probability of at least one coincidence (i.e., a ball drawn at least twice) is

P_2(m, n) = 1 - P_1(m, n, n) = 1 - \frac{m^{(n)}}{m^n}, \quad 1 \le n \le m.    (2.2)

If n = O(\sqrt{m}) (see Definition 2.55) and m → ∞, then

P_2(m, n) \to 1 - \exp\left(-\frac{n(n-1)}{2m} + O\left(\frac{1}{\sqrt{m}}\right)\right) \approx 1 - \exp\left(-\frac{n^2}{2m}\right).

(ii) As m → ∞, the expected number of draws before a coincidence is \sqrt{\pi m/2}.

The following explains why probability distribution (2.2) is referred to as the birthday surprise or birthday paradox. The probability that at least 2 people in a room of 23 people have the same birthday is P2(365, 23) ≈ 0.507, which is surprisingly large. The quantity P2(365, n) also increases rapidly as n increases; for example, P2(365, 30) ≈ 0.706.
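Equation (2.2) is straightforward to evaluate directly; the following sketch reproduces the two values just quoted.

# Evaluate P2(m, n) = 1 - m^(n)/m^n from equation (2.2) exactly, using
# arbitrary-precision integers and Fraction to avoid floating-point loss.
from fractions import Fraction

def falling_factorial(m: int, n: int) -> int:
    # m^(n) = m(m-1)...(m-n+1), as in Definition 2.25(i)
    result = 1
    for i in range(n):
        result *= m - i
    return result

def p2(m: int, n: int) -> float:
    return float(1 - Fraction(falling_factorial(m, n), m**n))

print(round(p2(365, 23), 3))   # 0.507
print(round(p2(365, 30), 3))   # 0.706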

A different kind of problem is considered in Facts 2.28, 2.29, and 2.30 below. Suppose that there are two urns, one containing m white balls numbered 1 to m, and the other containing m red balls numbered 1 to m. First, n1 balls are selected from the first urn and their numbers listed. Then n2 balls are selected from the second urn and their numbers listed. Finally, the number of coincidences between the two lists is counted.

2.28 Fact (model A) If the balls from both urns are drawn one at a time, with replacement, then the probability of at least one coincidence is

P_3(m, n_1, n_2) = 1 - \frac{1}{m^{n_1+n_2}} \sum_{t_1, t_2} m^{(t_1+t_2)} \left\{ {n_1 \atop t_1} \right\} \left\{ {n_2 \atop t_2} \right\},

where the summation is over all 0 ≤ t1 ≤ n1, 0 ≤ t2 ≤ n2. If n = n1 = n2, n = O(\sqrt{m}) and m → ∞, then

P_3(m, n_1, n_2) \to 1 - \exp\left(-\frac{n^2}{m}\left(1 + O\left(\frac{1}{\sqrt{m}}\right)\right)\right) \approx 1 - \exp\left(-\frac{n^2}{m}\right).

2.29 Fact (model B) If the balls from both urns are drawn without replacement, then the probability of at least one coincidence is

P_4(m, n_1, n_2) = 1 - \frac{m^{(n_1+n_2)}}{m^{(n_1)} m^{(n_2)}}.

If n1 = O(\sqrt{m}), n2 = O(\sqrt{m}), and m → ∞, then

P_4(m, n_1, n_2) \to 1 - \exp\left(-\frac{n_1 n_2}{m}\left(1 + \frac{n_1 + n_2 - 1}{2m} + O\left(\frac{1}{m}\right)\right)\right).

2.30 Fact (model C) If the n1 white balls are drawn one at a time, with replacement, and the n2 red balls are drawn without replacement, then the probability of at least one coincidence is

P_5(m, n_1, n_2) = 1 - \left(1 - \frac{n_2}{m}\right)^{n_1}.

If n1 = O(\sqrt{m}), n2 = O(\sqrt{m}), and m → ∞, then

P_5(m, n_1, n_2) \to 1 - \exp\left(-\frac{n_1 n_2}{m}\left(1 + O\left(\frac{1}{\sqrt{m}}\right)\right)\right) \approx 1 - \exp\left(-\frac{n_1 n_2}{m}\right).

2.1.6 Random mappings

2.31 Definition Let Fn denote the collection of all functions (mappings) from a finite domain of size n to a finite codomain of size n.

Models where random elements of Fn are considered are called random mappings models. In this section the only random mappings model considered is where every function from Fn is equally likely to be chosen; such models arise frequently in cryptography and algorithmic number theory. Note that |Fn| = n^n, whence the probability that a particular function from Fn is chosen is 1/n^n.

2.32 Definition Let f be a function in Fn with domain and codomain equal to {1, 2, . . . , n}. The functional graph of f is a directed graph whose points (or vertices) are the elements {1, 2, . . . , n} and whose edges are the ordered pairs (x, f(x)) for all x ∈ {1, 2, . . . , n}.

2.33 Example (functional graph) Consider the function f : {1, 2, . . . , 13} → {1, 2, . . . , 13} defined by f(1) = 4, f(2) = 11, f(3) = 1, f(4) = 6, f(5) = 3, f(6) = 9, f(7) = 3, f(8) = 11, f(9) = 1, f(10) = 2, f(11) = 10, f(12) = 4, f(13) = 7. The functional graph of f is shown in Figure 2.1.

Figure 2.1: A functional graph (see Example 2.33).

As Figure 2.1 illustrates, a functional graph may have several components (maximal connected subgraphs), each component consisting of a directed cycle and some directed trees attached to the cycle.

2.34 Fact As n tends to infinity, the following statements regarding the functional digraph of a random function f from Fn are true:
(i) The expected number of components is (1/2) ln n.
(ii) The expected number of points which are on the cycles is \sqrt{\pi n/2}.
(iii) The expected number of terminal points (points which have no preimages) is n/e.
(iv) The expected number of k-th iterate image points (x is a k-th iterate image point if x = f(f(· · · f(y) · · ·)), with f applied k times, for some y) is (1 − τk)n, where the τk satisfy the recurrence τ0 = 0, τ_{k+1} = e^{-1+τ_k} for k ≥ 0.

2.35 Definition Let f be a random function from {1, 2, . . . , n} to {1, 2, . . . , n} and let u ∈ {1, 2, . . . , n}. Consider the sequence of points u0, u1, u2, . . . defined by u0 = u, ui = f(ui−1) for i ≥ 1. In terms of the functional graph of f, this sequence describes a path that connects to a cycle.

(i) The number of edges in the path is called the tail length of u, denoted λ(u).
(ii) The number of edges in the cycle is called the cycle length of u, denoted µ(u).
(iii) The rho-length of u is the quantity ρ(u) = λ(u) + µ(u).
(iv) The tree size of u is the number of edges in the maximal tree rooted on a cycle in the component that contains u.
(v) The component size of u is the number of edges in the component that contains u.
(vi) The predecessors size of u is the number of iterated preimages of u.

2.36 Example The functional graph in Figure 2.1 has 2 components and 4 terminal points. The point u = 3 has parameters λ(u) = 1, µ(u) = 4, ρ(u) = 5. The tree, component, and predecessors sizes of u = 3 are 4, 9, and 3, respectively.
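The parameters of Definition 2.35 can be computed by simply iterating f from u and recording when a point repeats. A small sketch, applied to the function of Example 2.33:

# Compute tail length λ(u), cycle length µ(u), and rho-length ρ(u)
# (Definition 2.35) by iterating f from u and recording first visits.
# f is the function of Example 2.33, written as a dictionary.
f = {1: 4, 2: 11, 3: 1, 4: 6, 5: 3, 6: 9, 7: 3, 8: 11,
     9: 1, 10: 2, 11: 10, 12: 4, 13: 7}

def rho_parameters(f, u):
    seen = {}            # point -> index at which it was first visited
    i, x = 0, u
    while x not in seen:
        seen[x] = i
        x, i = f[x], i + 1
    tail = seen[x]       # λ(u): steps before entering the cycle
    cycle = i - seen[x]  # µ(u): length of the cycle
    return tail, cycle, tail + cycle

print(rho_parameters(f, 3))   # (1, 4, 5), matching Example 2.36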

2.37 Fact As n tends to infinity, the following are the expectations of some parameters associated with a random point in {1, 2, . . . , n} and a random function from Fn: (i) tail length: \sqrt{\pi n/8}; (ii) cycle length: \sqrt{\pi n/8}; (iii) rho-length: \sqrt{\pi n/2}; (iv) tree size: n/3; (v) component size: 2n/3; (vi) predecessors size: \sqrt{\pi n/8}.

2.38 Fact As n tends to infinity, the expectations of the maximum tail, cycle, and rho lengths in a random function from Fn are c_1\sqrt{n}, c_2\sqrt{n}, and c_3\sqrt{n}, respectively, where c1 ≈ 0.78248, c2 ≈ 1.73746, and c3 ≈ 2.4149.

Facts 2.37 and 2.38 indicate that in the functional graph of a random function, most points are grouped together in one giant component, and there is a small number of large trees. Also, almost unavoidably, a cycle of length about \sqrt{n} arises after following a path of length \sqrt{n} edges.

2.2 Information theory

2.2.1 Entropy

Let X be a random variable which takes on a finite set of values x1, x2, . . . , xn, with probability P(X = xi) = pi, where 0 ≤ pi ≤ 1 for each i, 1 ≤ i ≤ n, and where \sum_{i=1}^{n} p_i = 1.

Also, let Y and Z be random variables which take on finite sets of values.

The entropy of X is a mathematical measure of the amount of information provided by an observation of X. Equivalently, it is the uncertainty about the outcome before an observation of X. Entropy is also useful for approximating the average number of bits required to encode the elements of X.

2.39 Definition The entropy or uncertainty of X is defined to be

H(X) = -\sum_{i=1}^{n} p_i \lg p_i = \sum_{i=1}^{n} p_i \lg\left(\frac{1}{p_i}\right),

where, by convention, p_i \cdot \lg p_i = p_i \cdot \lg(1/p_i) = 0 if p_i = 0.

2.40 Fact (properties of entropy) Let X be a random variable which takes on n values.
(i) 0 ≤ H(X) ≤ lg n.
(ii) H(X) = 0 if and only if pi = 1 for some i, and pj = 0 for all j ≠ i (that is, there is no uncertainty of the outcome).
(iii) H(X) = lg n if and only if pi = 1/n for each i, 1 ≤ i ≤ n (that is, all outcomes are equally likely).
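Definition 2.39 translates directly into code. A minimal sketch, checking properties (ii) and (iii) of Fact 2.40 on small example distributions:

# Entropy H(X) = -sum p_i lg p_i (Definition 2.39), skipping p_i = 0
# terms per the stated convention.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.25] * 4))        # 2.0 = lg 4, the maximum (Fact 2.40(iii))
print(entropy([1.0, 0.0, 0.0]))   # 0.0, no uncertainty (Fact 2.40(ii))
print(entropy([0.5, 0.25, 0.25])) # 1.5 bits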

2.41 Definition The joint entropy of X and Y is defined to be

H(X, Y) = -\sum_{x,y} P(X = x, Y = y) \lg(P(X = x, Y = y)),

where the summation indices x and y range over all values of X and Y, respectively. The definition can be extended to any number of random variables.

2.42 Fact If X and Y are random variables, then H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

2.43 Definition If X, Y are random variables, the conditional entropy of X given Y = y is

H(X|Y = y) = -\sum_{x} P(X = x|Y = y) \lg(P(X = x|Y = y)),

where the summation index x ranges over all values of X. The conditional entropy of X given Y, also called the equivocation of Y about X, is

H(X|Y) = \sum_{y} P(Y = y) H(X|Y = y),

where the summation index y ranges over all values of Y.

2.44 Fact (properties of conditional entropy) Let X and Y be random variables.
(i) The quantity H(X|Y) measures the amount of uncertainty remaining about X after Y has been observed.
(ii) H(X|Y) ≥ 0 and H(X|X) = 0.
(iii) H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
(iv) H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.

2.2.2 Mutual information

2.45 Definition The mutual information or transinformation of random variables X and Y is I(X; Y) = H(X) − H(X|Y). Similarly, the transinformation of X and the pair Y, Z is defined to be I(X; Y, Z) = H(X) − H(X|Y, Z).

2.46 Fact (properties of mutual transinformation)
(i) The quantity I(X; Y) can be thought of as the amount of information that Y reveals about X. Similarly, the quantity I(X; Y, Z) can be thought of as the amount of information that Y and Z together reveal about X.
(ii) I(X; Y) ≥ 0.
(iii) I(X; Y) = 0 if and only if X and Y are independent (that is, Y contributes no information about X).
(iv) I(X; Y) = I(Y; X).

2.47 Definition The conditional transinformation of the pair X, Y given Z is defined to be I_Z(X; Y) = H(X|Z) − H(X|Y, Z).

2.48 Fact (properties of conditional transinformation)
(i) The quantity I_Z(X; Y) can be interpreted as the amount of information that Y provides about X, given that Z has already been observed.
(ii) I(X; Y, Z) = I(X; Y) + I_Y(X; Z).
(iii) I_Z(X; Y) = I_Z(Y; X).
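Mutual information can be computed from a joint distribution using Definitions 2.39, 2.43, and 2.45. A small sketch over a two-by-two joint table; the table entries are an arbitrary example, not values from the text.

# I(X;Y) = H(X) - H(X|Y) (Definition 2.45), computed from a joint
# distribution given as a dict {(x, y): probability}. The joint table
# below is an arbitrary example.
from math import log2

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p   # marginal distribution of X
    py[y] = py.get(y, 0) + p   # marginal distribution of Y

# H(X|Y) = sum_y P(Y=y) H(X|Y=y)  (Definition 2.43)
h_x_given_y = sum(
    py[y] * H([joint.get((x, y), 0) / py[y] for x in px])
    for y in py
)
print("I(X;Y) =", H(px.values()) - h_x_given_y)   # ≈ 0.278 bits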

2.3 Complexity theory

2.3.1 Basic definitions

The main goal of complexity theory is to provide mechanisms for classifying computational problems according to the resources needed to solve them. The classification should not depend on a particular computational model, but rather should measure the intrinsic difficulty of the problem. The resources measured may include time, storage space, random bits, number of processors, etc., but typically the main focus is time, and sometimes space.

2.49 Definition An algorithm is a well-defined computational procedure that takes a variable input and halts with an output.

Of course, the term “well-defined computational procedure” is not mathematically precise. It can be made so by using formal computational models such as Turing machines, random-access machines, or boolean circuits. Rather than get involved with the technical intricacies of these models, it is simpler to think of an algorithm as a computer program written in some specific programming language for a specific computer that takes a variable input and halts with an output.

It is usually of interest to find the most efficient (i.e., fastest) algorithm for solving a given computational problem. The time that an algorithm takes to halt depends on the “size” of the problem instance. Also, the unit of time used should be made precise, especially when comparing the performance of two algorithms.

2.50 Definition The size of the input is the total number of bits needed to represent the input in ordinary binary notation using an appropriate encoding scheme. Occasionally, the size of the input will be the number of items in the input.

2.51 Example (sizes of some objects)
(i) The number of bits in the binary representation of a positive integer n is 1 + ⌊lg n⌋. For simplicity, the size of n will be approximated by lg n.
(ii) If f is a polynomial of degree at most k, each coefficient being a non-negative integer at most n, then the size of f is (k + 1) lg n bits.
(iii) If A is a matrix with r rows, s columns, and with non-negative integer entries each at most n, then the size of A is rs lg n bits.

2.52 Definition The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed.

Often a step is taken to mean a bit operation. For some algorithms it will be more convenient to take step to mean something else such as a comparison, a machine instruction, a machine clock cycle, a modular multiplication, etc.

2.53 Definition The worst-case running time of an algorithm is an upper bound on the running time for any input, expressed as a function of the input size.

2.54 Definition The average-case running time of an algorithm is the average running time over all inputs of a fixed size, expressed as a function of the input size.

2.3.2 Asymptotic notation

It is often difficult to derive the exact running time of an algorithm. In such situations one is forced to settle for approximations of the running time, and usually may only derive the asymptotic running time. That is, one studies how the running time of the algorithm increases as the size of the input increases without bound.

In what follows, the only functions considered are those which are defined on the positive integers and take on real values that are always positive from some point onwards. Let f and g be two such functions.

2.55 Definition (order notation)
(i) (asymptotic upper bound) f(n) = O(g(n)) if there exists a positive constant c and a positive integer n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.

(ii) (asymptotic lower bound) f(n) = Ω(g(n)) if there exists a positive constant c and a positive integer n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0.
(iii) (asymptotic tight bound) f(n) = Θ(g(n)) if there exist positive constants c1 and c2, and a positive integer n0 such that c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0.
(iv) (o-notation) f(n) = o(g(n)) if for any positive constant c > 0 there exists a constant n0 > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n0.

Intuitively, f(n) = O(g(n)) means that f grows no faster asymptotically than g(n) to within a constant multiple, while f(n) = Ω(g(n)) means that f(n) grows at least as fast asymptotically as g(n) to within a constant multiple. f(n) = o(g(n)) means that g(n) is an upper bound for f(n) that is not asymptotically tight, or in other words, the function f(n) becomes insignificant relative to g(n) as n gets larger. The expression o(1) is often used to signify a function f(n) whose limit as n approaches ∞ is 0.

2.56 Fact (properties of order notation) For any functions f(n), g(n), h(n), and l(n), the following are true.
(i) f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).
(ii) f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
(iii) If f(n) = O(h(n)) and g(n) = O(h(n)), then (f + g)(n) = O(h(n)).
(iv) If f(n) = O(h(n)) and g(n) = O(l(n)), then (f · g)(n) = O(h(n)l(n)).
(v) (reflexivity) f(n) = O(f(n)).
(vi) (transitivity) If f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).

2.57 Fact (approximations of some commonly occurring functions)
(i) (polynomial function) If f(n) is a polynomial of degree k with positive leading term, then f(n) = Θ(n^k).
(ii) For any constant c > 0, \log_c n = Θ(\lg n).
(iii) (Stirling’s formula) For all integers n ≥ 1,

\sqrt{2\pi n}\left(\frac{n}{e}\right)^n \le n! \le \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n + 1/(12n)}.

Thus n! = \sqrt{2\pi n}\,(n/e)^n (1 + \Theta(1/n)). Also, n! = o(n^n) and n! = Ω(2^n).
(iv) lg(n!) = Θ(n lg n).

2.58 Example (comparative growth rates of some functions) Let ε and c be arbitrary constants with 0 < ε < 1 < c. The following functions are listed in increasing order of their asymptotic growth rates:

1 < \ln\ln n < \ln n < \exp(\sqrt{\ln n \ln\ln n}) < n^{\varepsilon} < n^{c} < n^{\ln n} < c^{n} < n^{n} < c^{c^{n}}.
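One can get a feel for Example 2.58 by evaluating a few of these functions at a single large argument. The sketch below uses n = 2^64 and a small selection of the functions (with ε = 1/2 and c = 2); all of these choices are illustrative.

# Evaluate some of the functions of Example 2.58 at n = 2^64 to see
# the growth-rate ordering numerically (printed as lg of the value to
# keep the numbers readable). The choices of n, epsilon, c are arbitrary.
from math import exp, log, log2, sqrt

n = 2.0**64
funcs = [
    ("ln ln n", log(log(n))),
    ("ln n", log(n)),
    ("exp(sqrt(ln n ln ln n))", exp(sqrt(log(n) * log(log(n))))),
    ("n^(1/2)", n**0.5),
    ("n^2", n**2),
]
for name, value in funcs:
    print(f"{name:25s} lg = {log2(value):10.2f}")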

2.3.3 Complexity classes

2.59 Definition A polynomial-time algorithm is an algorithm whose worst-case running time function is of the form O(n^k), where n is the input size and k is a constant. Any algorithm whose running time cannot be so bounded is called an exponential-time algorithm.

Roughly speaking, polynomial-time algorithms can be equated with good or efficient algorithms, while exponential-time algorithms are considered inefficient. There are, however, some practical situations when this distinction is not appropriate. When considering polynomial-time complexity, the degree of the polynomial is significant. For example, even though an algorithm with a running time of O(n^{ln ln n}), n being the input size, is asymptotically slower than an algorithm with a running time of O(n^100), the former algorithm may be faster in practice for smaller values of n, especially if the constants hidden by the big-O notation are smaller. Furthermore, in cryptography, average-case complexity is more important than worst-case complexity: a necessary condition for an encryption scheme to be considered secure is that the corresponding cryptanalysis problem is difficult on average (or more precisely, almost always difficult), and not just for some isolated cases.

2.60 Definition A subexponential-time algorithm is an algorithm whose worst-case running time function is of the form e^{o(n)}, where n is the input size.

A subexponential-time algorithm is asymptotically faster than an algorithm whose running time is fully exponential in the input size, while it is asymptotically slower than a polynomial-time algorithm.

2.61 Example (subexponential running time) Let A be an algorithm whose inputs are either elements of a finite field Fq (see §2.6), or an integer q. If the expected running time of A is of the form

L_q[\alpha, c] = O\left(\exp\left((c + o(1))(\ln q)^{\alpha}(\ln\ln q)^{1-\alpha}\right)\right),    (2.3)

where c is a positive constant, and α is a constant satisfying 0 < α < 1, then A is a subexponential-time algorithm. Observe that for α = 0, Lq[0, c] is a polynomial in ln q, while for α = 1, Lq[1, c] is a polynomial in q, and thus fully exponential in ln q.
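The dominant term of expression (2.3) is easy to tabulate once the o(1) term is dropped. A sketch with α = 1/2 and c = 1 as sample parameters (these values are illustrative, not fixed by the text):

# Evaluate the dominant term exp(c (ln q)^alpha (ln ln q)^(1-alpha)) of
# L_q[alpha, c] in (2.3), ignoring the o(1) term. alpha = 1/2 and c = 1
# are sample parameters chosen for illustration.
from math import exp, log, log2

def L(q: float, alpha: float, c: float) -> float:
    return exp(c * log(q)**alpha * log(log(q))**(1 - alpha))

for bits in (256, 512, 1024):
    q = 2.0**bits
    print(f"q = 2^{bits}: lg L_q[1/2, 1] ≈ {log2(L(q, 0.5, 1.0)):.1f}")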

For simplicity, the theory of computational complexity restricts its attention to decision problems, i.e., problems which have either YES or NO as an answer. This is not too restrictive in practice, as all the computational problems that will be encountered here can be phrased as decision problems in such a way that an efficient algorithm for the decision problem yields an efficient algorithm for the computational problem, and vice versa.

2.62 Definition The complexity class P is the set of all decision problems that are solvable in polynomial time.

2.63 Definition The complexity class NP is the set of all decision problems for which a YES answer can be verified in polynomial time given some extra information, called a certificate.

2.64 Definition The complexity class co-NP is the set of all decision problems for which a NO answer can be verified in polynomial time using an appropriate certificate.

It must be emphasized that if a decision problem is in NP, it may not be the case that the certificate of a YES answer can be easily obtained; what is asserted is that such a certificate does exist, and, if known, can be used to efficiently verify the YES answer. The same is true of the NO answers for problems in co-NP.

2.65 Example (problem in NP) Consider the following decision problem:
COMPOSITES
INSTANCE: A positive integer n.
QUESTION: Is n composite? That is, are there integers a, b > 1 such that n = ab?

COMPOSITES belongs to NP because if an integer n is composite, then this fact can be verified in polynomial time if one is given a divisor a of n, where 1 < a < n (the certificate in this case consists of the divisor a). It is in fact also the case that COMPOSITES belongs to co-NP. It is still unknown whether or not COMPOSITES belongs to P.

2.66 Fact P ⊆ NP and P ⊆ co-NP.

The following are among the outstanding unresolved questions in the subject of complexity theory:
1. Is P = NP?
2. Is NP = co-NP?
3. Is P = NP ∩ co-NP?

Most experts are of the opinion that the answer to each of the three questions is NO, although nothing along these lines has been proven.

The notion of reducibility is useful when comparing the relative difficulties of problems.

2.67 Definition Let L1 and L2 be two decision problems. L1 is said to polytime reduce to L2, written L1 ≤P L2, if there is an algorithm that solves L1 which uses, as a subroutine, an algorithm for solving L2, and which runs in polynomial time if the algorithm for L2 does.

Informally, if L1 ≤P L2, then L2 is at least as difficult as L1, or, equivalently, L1 is no harder than L2.

2.68 Definition Let L1 and L2 be two decision problems. If L1 ≤P L2 and L2 ≤P L1, then L1 and L2 are said to be computationally equivalent.

2.69 Fact Let L1, L2, and L3 be three decision problems.
(i) (transitivity) If L1 ≤P L2 and L2 ≤P L3, then L1 ≤P L3.
(ii) If L1 ≤P L2 and L2 ∈ P, then L1 ∈ P.

2.70 Definition A decision problem L is said to be NP-complete if
(i) L ∈ NP, and
(ii) L1 ≤P L for every L1 ∈ NP.
The class of all NP-complete problems is denoted by NPC.

NP-complete problems are the hardest problems in NP in the sense that they are at least as difficult as every other problem in NP. There are thousands of problems drawn from diverse fields such as combinatorics, number theory, and logic, that are known to be NP-complete.

2.71 Example (subset sum problem) The subset sum problem is the following: given a set of positive integers {a1, a2, . . . , an} and a positive integer s, determine whether or not there is a subset of the ai that sum to s. The subset sum problem is NP-complete.

2.72 Fact Let L1 and L2 be two decision problems.
(i) If L1 is NP-complete and L1 ∈ P, then P = NP.
(ii) If L1 ∈ NP, L2 is NP-complete, and L2 ≤P L1, then L1 is also NP-complete.
(iii) If L1 is NP-complete and L1 ∈ co-NP, then NP = co-NP.

By Fact 2.72(i), if a polynomial-time algorithm is found for any single NP-complete problem, then it is the case that P = NP, a result that would be extremely surprising. Hence, a proof that a problem is NP-complete provides strong evidence for its intractability. Figure 2.2 illustrates what is widely believed to be the relationship between the complexity classes P, NP, co-NP, and NPC.

Figure 2.2: Conjectured relationship between the complexity classes P, NP, co-NP, and NPC.

Fact 2.72(ii) suggests the following procedure for proving that a decision problem L1 is NP-complete:
1. Prove that L1 ∈ NP.
2. Select a problem L2 that is known to be NP-complete.
3. Prove that L2 ≤P L1.

2.73 Definition A problem is NP-hard if there exists some NP-complete problem that polytime reduces to it.

Note that the NP-hard classification is not restricted to only decision problems. Observe also that an NP-complete problem is also NP-hard.

2.74 Example (NP-hard problem) Given positive integers a1, a2, . . . , an and a positive integer s, the computational version of the subset sum problem would ask to actually find a subset of the ai which sums to s, provided that such a subset exists. This problem is NP-hard.
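Both versions of the subset sum problem (Examples 2.71 and 2.74) can be solved by exhaustive search over all 2^n subsets, which illustrates why large instances are considered intractable. A minimal brute-force sketch (the instance values are arbitrary):

# Brute-force search for the subset sum problem (Examples 2.71/2.74):
# try all 2^n subsets. The running time is exponential in n, which is
# exactly why large instances are considered hard.
from itertools import combinations

def subset_sum(a, s):
    for r in range(len(a) + 1):
        for subset in combinations(a, r):
            if sum(subset) == s:
                return subset        # decision answer YES, with a witness
    return None                      # decision answer NO

print(subset_sum([3, 7, 12, 19, 28], 31))   # (3, 28)
print(subset_sum([3, 7, 12, 19, 28], 11))   # None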

2.3.4 Randomized algorithms

The algorithms studied so far in this section have been deterministic; such algorithms follow the same execution path (sequence of operations) each time they execute with the same input. By contrast, a randomized algorithm makes random decisions at certain points in the execution; hence its execution paths may differ each time it is invoked with the same input. The random decisions are based upon the outcome of a random number generator. Remarkably, there are many problems for which randomized algorithms are known that are more efficient, both in terms of time and space, than the best known deterministic algorithms.

Randomized algorithms for decision problems can be classified according to the probability that they return the correct answer.

2.75 Definition Let A be a randomized algorithm for a decision problem L, and let I denote an arbitrary instance of L.
(i) A has 0-sided error if P(A outputs YES | I’s answer is YES) = 1, and P(A outputs YES | I’s answer is NO) = 0.
(ii) A has 1-sided error if P(A outputs YES | I’s answer is YES) ≥ 1/2, and P(A outputs YES | I’s answer is NO) = 0.

(iii) A has 2-sided error if P(A outputs YES | I’s answer is YES) ≥ 2/3, and P(A outputs YES | I’s answer is NO) ≤ 1/3.

The number 1/2 in the definition of 1-sided error is somewhat arbitrary and can be replaced by any positive constant. Similarly, the numbers 2/3 and 1/3 in the definition of 2-sided error can be replaced by 1/2 + ε and 1/2 − ε, respectively, for any constant ε, 0 < ε < 1/2.

2.76 Definition The expected running time of a randomized algorithm is an upper bound on the expected running time for each input (the expectation being over all outputs of the random number generator used by the algorithm), expressed as a function of the input size.

The important randomized complexity classes are defined next.

2.77 Definition (randomized complexity classes)
(i) The complexity class ZPP (“zero-sided probabilistic polynomial time”) is the set of all decision problems for which there is a randomized algorithm with 0-sided error which runs in expected polynomial time.
(ii) The complexity class RP (“randomized polynomial time”) is the set of all decision problems for which there is a randomized algorithm with 1-sided error which runs in (worst-case) polynomial time.
(iii) The complexity class BPP (“bounded error probabilistic polynomial time”) is the set of all decision problems for which there is a randomized algorithm with 2-sided error which runs in (worst-case) polynomial time.

2.78 Fact P ⊆ ZPP ⊆ RP ⊆ BPP and RP ⊆ NP.

2.4 Number theory

2.4.1 The integers

The set of integers {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} is denoted by the symbol Z.

2.79 Definition Let a, b be integers. Then a divides b (equivalently: a is a divisor of b, or a is a factor of b) if there exists an integer c such that b = ac. If a divides b, then this is denoted by a|b.

2.80 Example (i) −3|18, since 18 = (−3)(−6). (ii) 173|0, since 0 = (173)(0).

The following are some elementary properties of divisibility.

2.81 Fact (properties of divisibility) For all a, b, c ∈ Z, the following are true:
(i) a|a.
(ii) If a|b and b|c, then a|c.
(iii) If a|b and a|c, then a|(bx + cy) for all x, y ∈ Z.
(iv) If a|b and b|a, then a = ±b.

2.82 Definition (division algorithm for integers) If a and b are integers with b ≥ 1, then ordinary long division of a by b yields integers q (the quotient) and r (the remainder) such that

a = qb + r, where 0 ≤ r < b.

Moreover, q and r are unique. The remainder of the division is denoted a mod b, and the quotient is denoted a div b.

2.83 Fact Let a, b ∈ Z with b ≠ 0. Then a div b = ⌊a/b⌋ and a mod b = a − b⌊a/b⌋.

2.84 Example If a = 73, b = 17, then q = 4 and r = 5. Hence 73 mod 17 = 5 and 73 div 17 = 4.
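Fact 2.83 matches the semantics of Python’s // and % operators, which also use the floor convention for negative operands. A quick check:

# Check Fact 2.83: a div b = floor(a/b) and a mod b = a - b*floor(a/b).
# Python's // and % implement exactly this floor convention.
from math import floor

for a, b in [(73, 17), (-73, 17)]:
    q, r = a // b, a % b
    assert q == floor(a / b) and r == a - b * floor(a / b)
    print(f"{a} div {b} = {q}, {a} mod {b} = {r}")
# 73 div 17 = 4, 73 mod 17 = 5   (Example 2.84)
# -73 div 17 = -5, -73 mod 17 = 12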

2.85 Definition An integer c is a common divisor of a and b if c|a and c|b.

2.86 Definition A non-negative integer d is the greatest common divisor of integers a and b, denoted d = gcd(a, b), if
(i) d is a common divisor of a and b; and
(ii) whenever c|a and c|b, then c|d.
Equivalently, gcd(a, b) is the largest positive integer that divides both a and b, with the exception that gcd(0, 0) = 0.

2.87 Example The common divisors of 12 and 18 are {±1, ±2, ±3, ±6}, and gcd(12, 18) = 6.

2.88 Definition A non-negative integer d is the least common multiple of integers a and b, denoted d = lcm(a, b), if
(i) a|d and b|d; and
(ii) whenever a|c and b|c, then d|c.
Equivalently, lcm(a, b) is the smallest non-negative integer divisible by both a and b.

2.89 Fact If a and b are positive integers, then lcm(a, b) = a · b / gcd(a, b).

2.90 Example Since gcd(12, 18) = 6, it follows that lcm(12, 18) = 12 · 18/6 = 36.

2.91 Definition Two integers a and b are said to be relatively prime or coprime if gcd(a, b) = 1.

2.91 Definition Two integers a and b are said to be relatively prime, or coprime, if gcd(a, b) = 1.

2.92 Definition An integer p ≥ 2 is said to be prime if its only positive divisors are 1 and p. Otherwise, p is called composite.

The following are some well-known facts about prime numbers.

2.93 Fact If p is prime and p|ab, then either p|a or p|b (or both).

2.94 Fact There are an infinite number of prime numbers.

2.95 Fact (prime number theorem) Let π(x) denote the number of prime numbers ≤ x. Then
lim_{x→∞} π(x) / (x/ln x) = 1.

This means that for large values of x, π(x) is closely approximated by the expression x/ln x. For instance, when x = 10^10, π(x) = 455,052,511, whereas ⌊x/ln x⌋ = 434,294,481. A more explicit estimate for π(x) is given below.

2.96 Fact Let π(x) denote the number of primes ≤ x. Then
π(x) > x/ln x for x ≥ 17, and
π(x) < 1.25506 · x/ln x for x > 1.

2.97 Fact (fundamental theorem of arithmetic) Every integer n ≥ 2 has a factorization as a product of prime powers:
n = p1^e1 · p2^e2 · · · pk^ek,
where the pi are distinct primes and the ei are positive integers. Furthermore, the factorization is unique up to rearrangement of factors.

2.98 Fact If a = p1^e1 · p2^e2 · · · pk^ek and b = p1^f1 · p2^f2 · · · pk^fk, where each ei ≥ 0 and fi ≥ 0, then
gcd(a, b) = p1^min(e1,f1) · p2^min(e2,f2) · · · pk^min(ek,fk) and
lcm(a, b) = p1^max(e1,f1) · p2^max(e2,f2) · · · pk^max(ek,fk).

2.99 Example Let a = 4864 = 2^8 · 19 and b = 3458 = 2 · 7 · 13 · 19. Then gcd(4864, 3458) = 2 · 19 = 38 and lcm(4864, 3458) = 2^8 · 7 · 13 · 19 = 442624.

2.100 Definition For n ≥ 1, let φ(n) denote the number of integers in the interval [1, n] which are relatively prime to n. The function φ is called the Euler phi function (or the Euler totient function).

2.101 Fact (properties of Euler phi function)
(i) If p is a prime, then φ(p) = p − 1.
(ii) The Euler phi function is multiplicative. That is, if gcd(m, n) = 1, then φ(mn) = φ(m) · φ(n).

(iii) If n = p1^e1 · p2^e2 · · · pk^ek is the prime factorization of n, then
φ(n) = n (1 − 1/p1)(1 − 1/p2) · · · (1 − 1/pk).

Fact 2.102 gives an explicit lower bound for φ(n).

2.102 Fact For all integers n ≥ 5, φ(n) > n / (6 ln ln n).

2.4.2 Algorithms in Z

Let a and b be non-negative integers, each less than or equal to n. Recall (Example 2.51) that the number of bits in the binary representation of n is ⌊lg n⌋ + 1, and this number is approximated by lg n. The number of bit operations for the four basic integer operations of addition, subtraction, multiplication, and division using the classical algorithms is summarized in Table 2.1. These algorithms are studied in more detail in §14.2. More sophisticated techniques for multiplication and division have smaller complexities.

Operation      |            | Bit complexity
Addition       | a + b      | O(lg a + lg b) = O(lg n)
Subtraction    | a − b      | O(lg a + lg b) = O(lg n)
Multiplication | a · b      | O((lg a)(lg b)) = O((lg n)^2)
Division       | a = qb + r | O((lg q)(lg b)) = O((lg n)^2)

Table 2.1: Bit complexity of basic operations in Z.

The greatest common divisor of two integers a and b can be computed via Fact 2.98. However, computing a gcd by first obtaining prime-power factorizations does not result in an efficient algorithm, as the problem of factoring integers appears to be relatively difficult. The Euclidean algorithm (Algorithm 2.104) is an efficient algorithm for computing the greatest common divisor of two integers that does not require the factorization of the integers. It is based on the following simple fact.

2.103 Fact If a and b are positive integers with a > b, then gcd(a, b) = gcd(b, a mod b).

2.104 Algorithm Euclidean algorithm for computing the greatest common divisor of two integers
INPUT: two non-negative integers a and b with a ≥ b.
OUTPUT: the greatest common divisor of a and b.
1. While b ≠ 0 do the following: 1.1 Set r←a mod b, a←b, b←r.
2. Return(a).

2.105 Fact Algorithm 2.104 has a running time of O((lg n)^2) bit operations.

2.106 Example (Euclidean algorithm) The following are the division steps of Algorithm 2.104 for computing gcd(4864, 3458) = 38:
4864 = 1 · 3458 + 1406
3458 = 2 · 1406 + 646
1406 = 2 · 646 + 114
646 = 5 · 114 + 76
114 = 1 · 76 + 38
76 = 2 · 38 + 0.

The Euclidean algorithm can be extended so that it not only yields the greatest common divisor d of two integers a and b, but also integers x and y satisfying ax + by = d.

2.107 Algorithm Extended Euclidean algorithm
INPUT: two non-negative integers a and b with a ≥ b.
OUTPUT: d = gcd(a, b) and integers x, y satisfying ax + by = d.
1. If b = 0 then set d←a, x←1, y←0, and return(d, x, y).
2. Set x2←1, x1←0, y2←0, y1←1.
3. While b > 0 do the following:
   3.1 q←⌊a/b⌋, r←a − qb, x←x2 − qx1, y←y2 − qy1.
   3.2 a←b, b←r, x2←x1, x1←x, y2←y1, and y1←y.
4. Set d←a, x←x2, y←y2, and return(d, x, y).

2.108 Fact Algorithm 2.107 has a running time of O((lg n)^2) bit operations.

2.109 Example (extended Euclidean algorithm) Table 2.2 shows the steps of Algorithm 2.107 with inputs a = 4864 and b = 3458. Hence gcd(4864, 3458) = 38 and (4864)(32) + (3458)(−45) = 38.

 q | r    | x   | y   | a    | b    | x2  | x1  | y2  | y1
 − | −    | −   | −   | 4864 | 3458 | 1   | 0   | 0   | 1
 1 | 1406 | 1   | −1  | 3458 | 1406 | 0   | 1   | 1   | −1
 2 | 646  | −2  | 3   | 1406 | 646  | 1   | −2  | −1  | 3
 2 | 114  | 5   | −7  | 646  | 114  | −2  | 5   | 3   | −7
 5 | 76   | −27 | 38  | 114  | 76   | 5   | −27 | −7  | 38
 1 | 38   | 32  | −45 | 76   | 38   | −27 | 32  | 38  | −45
 2 | 0    | −91 | 128 | 38   | 0    | 32  | −91 | −45 | 128

Table 2.2: Extended Euclidean algorithm (Algorithm 2.107) with inputs a = 4864, b = 3458.

Efficient algorithms for gcd and extended gcd computations are further studied in §14.4.
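The following short Python sketch transcribes Algorithm 2.107 directly (Python is used here purely for illustration and is not part of the handbook; the variable names mirror x1, x2, y1, y2 in the algorithm statement). It reproduces the result of Example 2.109.

def extended_gcd(a, b):
    # Algorithm 2.107: returns (d, x, y) with d = gcd(a, b) and ax + by = d,
    # for non-negative integers a >= b.
    if b == 0:
        return a, 1, 0
    x2, x1, y2, y1 = 1, 0, 0, 1
    while b > 0:
        q, r = divmod(a, b)        # q = floor(a/b), r = a - q*b
        x, y = x2 - q * x1, y2 - q * y1
        a, b = b, r
        x2, x1, y2, y1 = x1, x, y1, y
    return a, x2, y2

# Example 2.109: gcd(4864, 3458) = 38 = (4864)(32) + (3458)(-45).
assert extended_gcd(4864, 3458) == (38, 32, -45)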

2.4.3 The integers modulo n

Let n be a positive integer.

2.110 Definition If a and b are integers, then a is said to be congruent to b modulo n, written a ≡ b (mod n), if n divides (a − b). The integer n is called the modulus of the congruence.

2.111 Example (i) 24 ≡ 9 (mod 5) since 24 − 9 = 3 · 5. (ii) −11 ≡ 17 (mod 7) since −11 − 17 = −4 · 7.

2.112 Fact (properties of congruences) For all a, a1, b, b1, c ∈ Z, the following are true.
(i) a ≡ b (mod n) if and only if a and b leave the same remainder when divided by n.
(ii) (reflexivity) a ≡ a (mod n).
(iii) (symmetry) If a ≡ b (mod n) then b ≡ a (mod n).
(iv) (transitivity) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n).
(v) If a ≡ a1 (mod n) and b ≡ b1 (mod n), then a + b ≡ a1 + b1 (mod n) and ab ≡ a1·b1 (mod n).

The equivalence class of an integer a is the set of all integers congruent to a modulo n. From properties (ii), (iii), and (iv) above, it can be seen that for a fixed n the relation of congruence modulo n partitions Z into equivalence classes. Now, if a = qn + r, where 0 ≤ r < n, then a ≡ r (mod n). Hence each integer a is congruent modulo n to a unique integer between 0 and n − 1, called the least residue of a modulo n. Thus a and r are in the same equivalence class, and so r may simply be used to represent this equivalence class.

2.113 Definition The integers modulo n, denoted Zn, is the set of (equivalence classes of) integers {0, 1, 2, . . . , n − 1}. Addition, subtraction, and multiplication in Zn are performed modulo n.

2.114 Example Z25 = {0, 1, 2, . . . , 24}. In Z25, 13 + 16 = 4, since 13 + 16 = 29 ≡ 4 (mod 25). Similarly, 13 · 16 = 8 in Z25.

2.115 Definition Let a ∈ Zn. The multiplicative inverse of a modulo n is an integer x ∈ Zn such that ax ≡ 1 (mod n). If such an x exists, then it is unique, and a is said to be invertible, or a unit; the inverse of a is denoted by a^−1.

2.116 Definition Let a, b ∈ Zn. Division of a by b modulo n is the product of a and b^−1 modulo n, and is only defined if b is invertible modulo n.

2.117 Fact Let a ∈ Zn. Then a is invertible if and only if gcd(a, n) = 1.

2.118 Example The invertible elements in Z9 are 1, 2, 4, 5, 7, and 8. For example, 4^−1 = 7 because 4 · 7 ≡ 1 (mod 9).

The following is a generalization of Fact 2.117.

2.119 Fact Let d = gcd(a, n). The congruence equation ax ≡ b (mod n) has a solution x if and only if d divides b, in which case there are exactly d solutions between 0 and n − 1; these solutions are all congruent modulo n/d.

2.120 Fact (Chinese remainder theorem, CRT) If the integers n1, n2, . . . , nk are pairwise relatively prime, then the system of simultaneous congruences
x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
. . .
x ≡ ak (mod nk)
has a unique solution modulo n = n1 n2 · · · nk.

2.121 Algorithm (Gauss’s algorithm) The solution x to the simultaneous congruences in the Chinese remainder theorem (Fact 2.120) may be computed as
x = Σ_{i=1}^{k} ai Ni Mi mod n,
where Ni = n/ni and Mi = Ni^−1 mod ni.
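Gauss’s algorithm fits in a few lines of Python (an illustrative sketch; the helper name crt_gauss is ours, and the modular-inverse form of pow requires Python 3.8 or later). It reproduces Example 2.122.

from math import prod

def crt_gauss(residues, moduli):
    # Algorithm 2.121: solve x = a_i (mod n_i) for pairwise relatively
    # prime moduli n_i; the answer is unique modulo n = n_1 n_2 ... n_k.
    n = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = n // n_i
        M_i = pow(N_i, -1, n_i)    # M_i = N_i^(-1) mod n_i
        x = (x + a_i * N_i * M_i) % n
    return x

# Example 2.122: x = 3 (mod 7) and x = 7 (mod 13) give x = 59 (mod 91).
assert crt_gauss([3, 7], [7, 13]) == 59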

These computations can be performed in O((lg n)^2) bit operations. Another efficient practical algorithm for solving simultaneous congruences in the Chinese remainder theorem is presented in §14.5.

2.122 Example The pair of congruences x ≡ 3 (mod 7), x ≡ 7 (mod 13) has a unique solution x ≡ 59 (mod 91).

2.123 Fact If gcd(n1, n2) = 1, then the pair of congruences x ≡ a (mod n1), x ≡ a (mod n2) has a unique solution x ≡ a (mod n1 n2).

2.124 Definition The multiplicative group of Zn is Z∗n = {a ∈ Zn | gcd(a, n) = 1}. In particular, if n is a prime, then Z∗n = {a | 1 ≤ a ≤ n − 1}.

2.125 Definition The order of Z∗n is defined to be the number of elements in Z∗n, namely |Z∗n|. It follows from the definition of the Euler phi function (Definition 2.100) that |Z∗n| = φ(n).
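A two-line enumeration makes Definitions 2.124 and 2.125 concrete (a Python sketch; the choice n = 21 anticipates Example 2.130 below):

from math import gcd

n = 21
Zn_star = [a for a in range(1, n) if gcd(a, n) == 1]   # Definition 2.124
print(Zn_star)        # [1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20]
print(len(Zn_star))   # 12 = phi(21) = phi(3) * phi(7), per Definition 2.125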

Note also that if a ∈ Z∗n and b ∈ Z∗n, then a · b ∈ Z∗n, and so Z∗n is closed under multiplication.

2.126 Fact Let n ≥ 2 be an integer.
(i) (Euler’s theorem) If a ∈ Z∗n, then a^φ(n) ≡ 1 (mod n).
(ii) If n is a product of distinct primes, and if r ≡ s (mod φ(n)), then a^r ≡ a^s (mod n) for all integers a. In other words, when working modulo such an n, exponents can be reduced modulo φ(n).

A special case of Euler’s theorem is Fermat’s (little) theorem.

2.127 Fact Let p be a prime.
(i) (Fermat’s theorem) If gcd(a, p) = 1, then a^(p−1) ≡ 1 (mod p).
(ii) If r ≡ s (mod p − 1), then a^r ≡ a^s (mod p) for all integers a. In other words, when working modulo a prime p, exponents can be reduced modulo p − 1.
(iii) In particular, a^p ≡ a (mod p) for all integers a.

2.128 Definition Let a ∈ Z∗n. The order of a, denoted ord(a), is the least positive integer t such that a^t ≡ 1 (mod n).

2.129 Fact If the order of a ∈ Z∗n is t, and a^s ≡ 1 (mod n), then t divides s. In particular, t|φ(n).

2.130 Example Let n = 21. Then Z∗21 = {1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20}. Note that φ(21) = φ(7)φ(3) = 12 = |Z∗21|. The orders of elements in Z∗21 are listed in Table 2.3.

a ∈ Z∗21   | 1 | 2 | 4 | 5 | 8 | 10 | 11 | 13 | 16 | 17 | 19 | 20
order of a | 1 | 6 | 3 | 6 | 2 | 6  | 6  | 2  | 3  | 6  | 6  | 2

Table 2.3: Orders of elements in Z∗21.

2.131 Definition Let α ∈ Z∗n. If the order of α is φ(n), then α is said to be a generator or a primitive element of Z∗n. If Z∗n has a generator, then Z∗n is said to be cyclic.

2.132 Fact (properties of generators of Z∗n)
(i) Z∗n has a generator if and only if n = 2, 4, p^k or 2p^k, where p is an odd prime and k ≥ 1. In particular, if p is a prime, then Z∗p has a generator.
(ii) If α is a generator of Z∗n, then Z∗n = {α^i mod n | 0 ≤ i ≤ φ(n) − 1}.
(iii) Suppose that α is a generator of Z∗n. Then b = α^i mod n is also a generator of Z∗n if and only if gcd(i, φ(n)) = 1. It follows that if Z∗n is cyclic, then the number of generators is φ(φ(n)).
(iv) α ∈ Z∗n is a generator of Z∗n if and only if α^(φ(n)/p) ≢ 1 (mod n) for each prime divisor p of φ(n).

2.133 Example Z∗21 is not cyclic since it does not contain an element of order φ(21) = 12 (see Table 2.3); note that 21 does not satisfy the condition of Fact 2.132(i). On the other hand, Z∗25 is cyclic, and has a generator α = 2.
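Fact 2.132(iv) gives a practical generator test once the factorization of φ(n) is known. The Python sketch below (the helper prime_factors is our own, written with naive trial division and adequate only for small numbers) verifies the claim of Example 2.133 that 2 generates Z∗25.

def prime_factors(m):
    # distinct prime factors of m by trial division
    ps, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            ps.add(d)
            m //= d
        d += 1
    if m > 1:
        ps.add(m)
    return ps

def is_generator(alpha, n, phi_n):
    # Fact 2.132(iv): alpha generates Z*_n iff alpha^(phi(n)/p) != 1 (mod n)
    # for every prime p dividing phi(n)
    return all(pow(alpha, phi_n // p, n) != 1 for p in prime_factors(phi_n))

assert is_generator(2, 25, 20)      # Example 2.133: phi(25) = 20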

2.134 Definition Let a ∈ Z∗n. Then a is said to be a quadratic residue modulo n, or a square modulo n, if there exists an x ∈ Z∗n such that x^2 ≡ a (mod n). If no such x exists, then a is called a quadratic non-residue modulo n. The set of all quadratic residues modulo n is denoted by Qn and the set of all quadratic non-residues is denoted by Q̄n. Note that by definition 0 ∉ Z∗n, whence 0 ∉ Qn and 0 ∉ Q̄n.

2.135 Fact Let p be an odd prime and let α be a generator of Z∗p. Then a ∈ Z∗p is a quadratic residue modulo p if and only if a = α^i mod p, where i is an even integer. It follows that |Qp| = (p − 1)/2 and |Q̄p| = (p − 1)/2; that is, half of the elements in Z∗p are quadratic residues and the other half are quadratic non-residues.

2.136 Example α = 6 is a generator of Z∗13. The powers of α are listed in the following table.

i          | 0 | 1 | 2  | 3 | 4 | 5 | 6  | 7 | 8 | 9 | 10 | 11
α^i mod 13 | 1 | 6 | 10 | 8 | 9 | 2 | 12 | 7 | 3 | 5 | 4  | 11

Hence Q13 = {1, 3, 4, 9, 10, 12} and Q̄13 = {2, 5, 6, 7, 8, 11}.

2.137 Fact Let n be a product of two distinct odd primes p and q, n = pq. Then a ∈ Z∗n is a quadratic residue modulo n if and only if a ∈ Qp and a ∈ Qq. It follows that |Qn| = |Qp| · |Qq| = (p − 1)(q − 1)/4 and |Q̄n| = 3(p − 1)(q − 1)/4.

2.138 Example Let n = 21. Then Q21 = {1, 4, 16} and Q̄21 = {2, 5, 8, 10, 11, 13, 17, 19, 20}.

2.139 Definition Let a ∈ Qn. If x ∈ Z∗n satisfies x^2 ≡ a (mod n), then x is called a square root of a modulo n.

2.140 Fact (number of square roots)
(i) If p is an odd prime and a ∈ Qp, then a has exactly two square roots modulo p.
(ii) More generally, let n = p1^e1 · p2^e2 · · · pk^ek where the pi are distinct odd primes and ei ≥ 1. If a ∈ Qn, then a has precisely 2^k distinct square roots modulo n.

2.141 Example The square roots of 12 modulo 37 are 7 and 30. The square roots of 121 modulo 315 are 11, 74, 101, 151, 164, 214, 241, and 304.

2.4.4 Algorithms in Zn

Let n be a positive integer. As before, the elements of Zn will be represented by the integers {0, 1, 2, . . . , n − 1}. Observe that if a, b ∈ Zn, then
(a + b) mod n = a + b, if a + b < n, and a + b − n, if a + b ≥ n.
Hence modular addition (and subtraction) can be performed without the need of a long division. Modular multiplication of a and b may be accomplished by simply multiplying a and b as integers, and then taking the remainder of the result after division by n. Inverses in Zn can be computed using the extended Euclidean algorithm as next described.

2.142 Algorithm Computing multiplicative inverses in Zn
INPUT: a ∈ Zn.
OUTPUT: a^−1 mod n, provided that it exists.
1. Use the extended Euclidean algorithm (Algorithm 2.107) to find integers x and y such that ax + ny = d, where d = gcd(a, n).
2. If d > 1, then a^−1 mod n does not exist. Otherwise, return(x).
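A compact Python sketch of Algorithm 2.142 follows; it keeps only the x-coefficients of the extended Euclidean algorithm, since the y-coefficients are not needed for the inverse. The invariant maintained is r_i ≡ x_i · a (mod n).

def mod_inverse(a, n):
    # Algorithm 2.142: return a^(-1) mod n, or None when gcd(a, n) > 1
    # (by Fact 2.117 no inverse exists in that case)
    r0, r1 = a % n, n
    x0, x1 = 1, 0
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
    if r0 != 1:
        return None
    return x0 % n

# Example 2.118: the inverse of 4 in Z_9 is 7.
assert mod_inverse(4, 9) == 7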

Modular exponentiation can be performed efficiently with the repeated square-and-multiply algorithm (Algorithm 2.143), which is crucial for many cryptographic protocols. One version of this algorithm is based on the following observation. Let the binary representation of k be Σ_{i=0}^{t} ki 2^i, where each ki ∈ {0, 1}. Then
a^k = Π_{i=0}^{t} a^(ki 2^i) = (a^(2^0))^k0 · (a^(2^1))^k1 · · · (a^(2^t))^kt.

2.143 Algorithm Repeated square-and-multiply algorithm for exponentiation in Zn
INPUT: a ∈ Zn, and integer 0 ≤ k < n whose binary representation is k = Σ_{i=0}^{t} ki 2^i.
OUTPUT: a^k mod n.
1. Set b←1. If k = 0 then return(b).
2. Set A←a.
3. If k0 = 1 then set b←a.
4. For i from 1 to t do the following:
   4.1 Set A←A^2 mod n.
   4.2 If ki = 1 then set b←A · b mod n.
5. Return(b).

2.144 Example (modular exponentiation) Table 2.4 shows the steps involved in the computation of 5^596 mod 1234 = 1013.

i  | 0 | 1  | 2   | 3   | 4    | 5   | 6    | 7    | 8    | 9
ki | 0 | 0  | 1   | 0   | 1    | 0   | 1    | 0    | 0    | 1
A  | 5 | 25 | 625 | 681 | 1011 | 369 | 421  | 779  | 947  | 925
b  | 1 | 1  | 625 | 625 | 67   | 67  | 1059 | 1059 | 1059 | 1013

Table 2.4: Computation of 5^596 mod 1234.

The number of bit operations for the basic operations in Zn is summarized in Table 2.5. Efficient algorithms for performing modular multiplication and exponentiation are further examined in §14.3 and §14.6.

Operation              |                  | Bit complexity
Modular addition       | (a + b) mod n    | O(lg n)
Modular subtraction    | (a − b) mod n    | O(lg n)
Modular multiplication | (a · b) mod n    | O((lg n)^2)
Modular inversion      | a^−1 mod n       | O((lg n)^2)
Modular exponentiation | a^k mod n, k < n | O((lg n)^3)

Table 2.5: Bit complexity of basic operations in Zn.
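The following Python sketch implements Algorithm 2.143 by scanning the bits of k from least significant to most significant; the variable A holds a^(2^i) mod n at step i, exactly as in Table 2.4.

def square_and_multiply(a, k, n):
    # Algorithm 2.143: compute a^k mod n with about lg k squarings
    b, A = 1, a % n
    while k > 0:
        if k & 1:              # bit k_i = 1: multiply into the accumulator
            b = (A * b) % n
        A = (A * A) % n        # square for the next bit
        k >>= 1
    return b

# Example 2.144: 5^596 mod 1234 = 1013.
assert square_and_multiply(5, 596, 1234) == 1013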

2.4.5 The Legendre and Jacobi symbols

The Legendre symbol is a useful tool for keeping track of whether or not an integer a is a quadratic residue modulo a prime p.

2.145 Definition Let p be an odd prime and a an integer. The Legendre symbol, written (a/p), is defined to be
(a/p) = 0 if p|a; 1 if a ∈ Qp; −1 if a ∈ Q̄p.

2.146 Fact (properties of Legendre symbol) Let p be an odd prime and a, b ∈ Z. Then the Legendre symbol has the following properties:
(i) (a/p) ≡ a^((p−1)/2) (mod p). In particular, (1/p) = 1 and (−1/p) = (−1)^((p−1)/2). Hence −1 ∈ Qp if p ≡ 1 (mod 4), and −1 ∈ Q̄p if p ≡ 3 (mod 4).
(ii) (ab/p) = (a/p)(b/p). Hence if a ∈ Z∗p, then (a^2/p) = 1.
(iii) If a ≡ b (mod p), then (a/p) = (b/p).
(iv) (2/p) = (−1)^((p^2−1)/8). Hence (2/p) = 1 if p ≡ 1 or 7 (mod 8), and (2/p) = −1 if p ≡ 3 or 5 (mod 8).
(v) (law of quadratic reciprocity) If q is an odd prime distinct from p, then
(p/q) = (q/p) · (−1)^((p−1)(q−1)/4).
In other words, (p/q) = (q/p) unless both p and q are congruent to 3 modulo 4, in which case (p/q) = −(q/p).

The Jacobi symbol is a generalization of the Legendre symbol to integers n which are odd but not necessarily prime.

2.147 Definition Let n ≥ 3 be odd with prime factorization n = p1^e1 · p2^e2 · · · pk^ek. Then the Jacobi symbol (a/n) is defined to be
(a/n) = (a/p1)^e1 · (a/p2)^e2 · · · (a/pk)^ek.
Observe that if n is prime, then the Jacobi symbol is just the Legendre symbol.

2.148 Fact (properties of Jacobi symbol) Let m ≥ 3, n ≥ 3 be odd integers, and a, b ∈ Z. Then the Jacobi symbol has the following properties:
(i) (a/n) = 0, 1, or −1. Moreover, (a/n) = 0 if and only if gcd(a, n) ≠ 1.
(ii) (ab/n) = (a/n)(b/n). Hence if a ∈ Z∗n, then (a^2/n) = 1.
(iii) (a/mn) = (a/m)(a/n).
(iv) If a ≡ b (mod n), then (a/n) = (b/n).
(v) (1/n) = 1.
(vi) (−1/n) = (−1)^((n−1)/2). Hence (−1/n) = 1 if n ≡ 1 (mod 4), and (−1/n) = −1 if n ≡ 3 (mod 4).
(vii) (2/n) = (−1)^((n^2−1)/8). Hence (2/n) = 1 if n ≡ 1 or 7 (mod 8), and (2/n) = −1 if n ≡ 3 or 5 (mod 8).
(viii) (m/n) = (n/m) · (−1)^((m−1)(n−1)/4). In other words, (m/n) = (n/m) unless both m and n are congruent to 3 modulo 4, in which case (m/n) = −(n/m).

By properties of the Jacobi symbol it follows that if n is odd and a = 2^e · a1 where a1 is odd, then
(a/n) = (2/n)^e (a1/n) = (2/n)^e (n mod a1 / a1) · (−1)^((a1−1)(n−1)/4).
This observation yields the following recursive algorithm for computing (a/n), which does not require the prime factorization of n.

2.149 Algorithm Jacobi symbol (and Legendre symbol) computation
JACOBI(a, n)
INPUT: an odd integer n ≥ 3, and an integer a, 0 ≤ a < n.
OUTPUT: the Jacobi symbol (a/n) (and hence the Legendre symbol when n is prime).
1. If a = 0 then return(0).
2. If a = 1 then return(1).
3. Write a = 2^e · a1, where a1 is odd.
4. If e is even then set s←1. Otherwise set s←1 if n ≡ 1 or 7 (mod 8), or set s←−1 if n ≡ 3 or 5 (mod 8).
5. If n ≡ 3 (mod 4) and a1 ≡ 3 (mod 4) then set s←−s.
6. Set n1←n mod a1.
7. If a1 = 1 then return(s); otherwise return(s · JACOBI(n1, a1)).

2.150 Fact Algorithm 2.149 has a running time of O((lg n)^2) bit operations.
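A direct Python transcription of Algorithm 2.149 is given below (the function name jacobi is our own); it reproduces Example 2.152, which follows.

def jacobi(a, n):
    # Algorithm 2.149: Jacobi symbol (a/n) for odd n >= 3 and 0 <= a < n;
    # equals the Legendre symbol when n is prime
    if a == 0:
        return 0
    if a == 1:
        return 1
    e, a1 = 0, a
    while a1 % 2 == 0:                  # step 3: write a = 2^e * a1, a1 odd
        a1 //= 2
        e += 1
    if e % 2 == 0 or n % 8 in (1, 7):   # step 4, using Fact 2.148(vii)
        s = 1
    else:
        s = -1
    if n % 4 == 3 and a1 % 4 == 3:      # step 5: reciprocity sign
        s = -s
    if a1 == 1:
        return s
    return s * jacobi(n % a1, a1)       # steps 6-7

assert jacobi(158, 235) == -1           # Example 2.152: (158/235) = -1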

2.151 Remark (finding quadratic non-residues modulo a prime p) Let p denote an odd prime. Even though it is known that half of the elements in Z∗p are quadratic non-residues modulo p (see Fact 2.135), there is no deterministic polynomial-time algorithm known for finding one. A randomized algorithm for finding a quadratic non-residue is to simply select random integers a ∈ Z∗p until one is found satisfying (a/p) = −1. The expected number of iterations before a non-residue is found is 2, and hence the procedure takes expected polynomial-time.

2.152 Example (Jacobi symbol computation) For a = 158 and n = 235, Algorithm 2.149 computes the Jacobi symbol (158/235) as follows:
(158/235) = (2/235)(79/235) = (2/235)(235/79)(−1)^(78·234/4) = (−1)(77/79)(−1) = (77/79) = (79/77)(−1)^(76·78/4) = (2/77) = −1.

Unlike the Legendre symbol, the Jacobi symbol (a/n) does not reveal whether or not a is a quadratic residue modulo n. It is indeed true that if a ∈ Qn, then (a/n) = 1. However, (a/n) = 1 does not imply that a ∈ Qn.

2.153 Example (quadratic residues and non-residues) Table 2.6 lists the elements in Z∗21 and their Jacobi symbols. Recall from Example 2.138 that Q21 = {1, 4, 16}. Observe that (5/21) = 1 but 5 ∉ Q21.

a ∈ Z∗21  | 1 | 2  | 4  | 5  | 8  | 10 | 11 | 13 | 16 | 17 | 19 | 20
a^2 mod n | 1 | 4  | 16 | 4  | 1  | 16 | 16 | 1  | 4  | 16 | 4  | 1
(a/3)     | 1 | −1 | 1  | −1 | −1 | 1  | −1 | 1  | 1  | −1 | 1  | −1
(a/7)     | 1 | 1  | 1  | −1 | 1  | −1 | 1  | −1 | 1  | −1 | −1 | −1
(a/21)    | 1 | −1 | 1  | 1  | −1 | −1 | −1 | −1 | 1  | 1  | −1 | 1

Table 2.6: Jacobi symbols of elements in Z∗21.

2.154 Definition Let n ≥ 3 be an odd integer, and let Jn = {a ∈ Z∗n | (a/n) = 1}. The set of pseudosquares modulo n, denoted Q̃n, is defined to be the set Jn − Qn.

2.155 Fact Let n = pq be a product of two distinct odd primes. Then |Qn| = |Q̃n| = (p − 1)(q − 1)/4; that is, half of the elements in Jn are quadratic residues and the other half are pseudosquares.

2.4.6 Blum integers

2.156 Definition A Blum integer is a composite integer of the form n = pq, where p and q are distinct primes each congruent to 3 modulo 4.

2.157 Fact Let n = pq be a Blum integer, and let a ∈ Qn. Then a has precisely four square roots modulo n, exactly one of which is also in Qn.

2.158 Definition Let n be a Blum integer and let a ∈ Qn. The unique square root of a in Qn is called the principal square root of a modulo n.

2.159 Example (Blum integer) For the Blum integer n = 21, Jn = {1, 4, 5, 16, 17, 20} and Q̃n = {5, 17, 20}. The four square roots of a = 4 are 2, 5, 16, and 19, of which only 16 is also in Q21. Thus 16 is the principal square root of 4 modulo 21.

2.160 Fact If n = pq is a Blum integer, then the function f : Qn → Qn defined by f(x) = x^2 mod n is a permutation. The inverse function of f is
f^−1(x) = x^(((p−1)(q−1)+4)/8) mod n.
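Fact 2.160 is easy to check numerically. The short Python sketch below applies the stated inverse formula to recover a principal square root; the numbers are those of Example 2.159 above.

def principal_sqrt(a, p, q):
    # Fact 2.160: for a Blum integer n = pq and a in Q_n, the inverse of
    # the squaring permutation is f^(-1)(x) = x^(((p-1)(q-1)+4)/8) mod n
    n = p * q
    return pow(a, ((p - 1) * (q - 1) + 4) // 8, n)

# Example 2.159: n = 21 = 3 * 7, and the principal square root of 4 is 16.
assert principal_sqrt(4, 3, 7) == 16
assert pow(16, 2, 21) == 4      # and indeed 16^2 = 4 (mod 21)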

2.5 Abstract algebra

This section provides an overview of basic algebraic objects and their properties, for reference in the remainder of this handbook. Several of the definitions in §2.5.1 and §2.5.2 were presented earlier in §2.4.3 in the more concrete setting of the algebraic structure Z∗n.

2.161 Definition A binary operation ∗ on a set S is a mapping from S × S to S. That is, ∗ is a rule which assigns to each ordered pair of elements from S an element of S.

2.5.1 Groups

2.162 Definition A group (G, ∗) consists of a set G with a binary operation ∗ on G satisfying the following three axioms.
(i) The group operation is associative. That is, a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, c ∈ G.
(ii) There is an element 1 ∈ G, called the identity element, such that a ∗ 1 = 1 ∗ a = a for all a ∈ G.
(iii) For each a ∈ G there exists an element a^−1 ∈ G, called the inverse of a, such that a ∗ a^−1 = a^−1 ∗ a = 1.
A group G is abelian (or commutative) if, furthermore,
(iv) a ∗ b = b ∗ a for all a, b ∈ G.

Note that multiplicative group notation has been used for the group operation. If the group operation is addition, then the group is said to be an additive group, the identity element is denoted by 0, and the inverse of a is denoted −a. Henceforth, unless otherwise stated, the symbol ∗ will be omitted and the group operation will simply be denoted by juxtaposition.

2.163 Definition A group G is finite if |G| is finite. The number of elements in a finite group is called its order.

2.164 Example The set of integers Z with the operation of addition forms a group. The identity element is 0 and the inverse of an integer a is the integer −a.

2.165 Example The set Zn, with the operation of addition modulo n, forms a group of order n. The set Zn with the operation of multiplication modulo n is not a group, since not all elements have multiplicative inverses. However, the set Z∗n (see Definition 2.124) is a group of order φ(n) under the operation of multiplication modulo n, with identity element 1.

2.166 Definition A non-empty subset H of a group G is a subgroup of G if H is itself a group with respect to the operation of G. If H is a subgroup of G and H ≠ G, then H is called a proper subgroup of G.

2.167 Definition A group G is cyclic if

there is an element α ∈ G such that for each b ∈ G there is an integer i with b = α^i. Such an element α is called a generator of G.

2.168 Fact If G is a group and a ∈ G, then the set of all powers of a forms a cyclic subgroup of G, called the subgroup generated by a, and denoted by ⟨a⟩.

2.169 Definition Let G be a group and a ∈ G. The order of a is defined to be the least positive integer t such that a^t = 1, provided that such an integer exists. If such a t does not exist, then the order of a is defined to be ∞.

2.170 Fact Let G be a group, and let a ∈ G be an element of finite order t. Then |⟨a⟩|, the size of the subgroup generated by a, is equal to t.

2.171 Fact (Lagrange’s theorem) If G is a finite group and H is a subgroup of G, then |H| divides |G|. Hence, if a ∈ G, the order of a divides |G|.

2.172 Fact Every subgroup of a cyclic group G is also cyclic. In fact, if G is a cyclic group of order n, then for each positive divisor d of n, G contains exactly one subgroup of order d.

2.173 Fact Let G be a group.
(i) If the order of a ∈ G is t, then the order of a^k is t/gcd(t, k).
(ii) If G is a cyclic group of order n and d|n, then G has exactly φ(d) elements of order d. In particular, G has φ(n) generators.

2.174 Example Consider the multiplicative group Z∗19 = {1, 2, . . . , 18} of order 18. The group is cyclic (Fact 2.132(i)), and a generator is α = 2. The subgroups of Z∗19, and their generators, are listed in Table 2.7.

Subgroup                        | Generators           | Order
{1}                             | 1                    | 1
{1, 18}                         | 18                   | 2
{1, 7, 11}                      | 7, 11                | 3
{1, 7, 8, 11, 12, 18}           | 8, 12                | 6
{1, 4, 5, 6, 7, 9, 11, 16, 17}  | 4, 5, 6, 9, 16, 17   | 9
{1, 2, 3, . . . , 18}           | 2, 3, 10, 13, 14, 15 | 18

Table 2.7: The subgroups of Z∗19.

2.5.2 Rings

2.175 Definition A ring (R, +, ×) consists of a set R with two binary operations arbitrarily denoted + (addition) and × (multiplication) on R, satisfying the following axioms.
(i) (R, +) is an abelian group with identity denoted 0.
(ii) The operation × is associative. That is, a × (b × c) = (a × b) × c for all a, b, c ∈ R.
(iii) There is a multiplicative identity denoted 1, with 1 ≠ 0, such that 1 × a = a × 1 = a for all a ∈ R.
(iv) The operation × is distributive over +. That is, a × (b + c) = (a × b) + (a × c) and (b + c) × a = (b × a) + (c × a) for all a, b, c ∈ R.
The ring is a commutative ring if a × b = b × a for all a, b ∈ R.

2.176 Example The set of integers Z with the usual operations of addition and multiplication is a commutative ring.

2.177 Example The set Zn with addition and multiplication performed modulo n is a commutative ring.

2.178 Definition An element a of a ring R is called a unit or an invertible element if there is an element b ∈ R such that a × b = 1.

2.179 Fact The set of units in a ring R forms a group under multiplication, called the group of units of R.

2.180 Example The group of units of the ring Zn is

Z∗n (see Definition 2.124).

2.5.3 Fields

2.181 Definition A field is a commutative ring in which all non-zero elements have multiplicative inverses.

2.182 Definition The characteristic of a field is 0 if the m-fold sum 1 + 1 + · · · + 1 is never equal to 0 for any m ≥ 1. Otherwise, the characteristic of the field is the least positive integer m such that Σ_{i=1}^{m} 1 equals 0.

2.183 Example The set of integers under the usual operations of addition and multiplication is not a field, since the only non-zero integers with multiplicative inverses are 1 and −1. However, the rational numbers Q, the real numbers R, and the complex numbers C form fields of characteristic 0 under the usual operations.

2.184 Fact Zn is a field (under the usual operations of addition and multiplication modulo n) if and only if n is a prime number. If n is prime, then Zn has characteristic n.

2.185 Fact If the characteristic m of a field is not 0, then m is a prime number.

2.186 Definition A subset F of a

field E is a subfield of E if F is itself a field with respect to the operations of E. If this is the case, E is said to be an extension field of F.

2.5.4 Polynomial rings

2.187 Definition If R is a commutative ring, then a polynomial in the indeterminate x over the ring R is an expression of the form
f(x) = an x^n + · · · + a2 x^2 + a1 x + a0,
where each ai ∈ R and n ≥ 0. The element ai is called the coefficient of x^i in f(x). The largest integer m for which am ≠ 0 is called the degree of f(x), denoted deg f(x); am is called the leading coefficient of f(x). If f(x) = a0 (a constant polynomial) and a0 ≠ 0, then f(x) has degree 0. If all the coefficients of f(x) are 0, then f(x) is called the zero polynomial and its degree, for mathematical convenience, is defined to be −∞. The polynomial f(x) is said to be monic if its leading coefficient is equal to 1.

2.188 Definition If R is a commutative ring, the polynomial ring R[x] is the ring formed by the set of all polynomials in the indeterminate x having coefficients from R. The two operations are the standard polynomial addition and multiplication, with coefficient arithmetic performed in the ring R.

2.189 Example (polynomial ring) Let f(x) = x^3 + x + 1 and g(x) = x^2 + x be elements of the polynomial ring Z2[x]. Working in Z2[x], f(x) + g(x) = x^3 + x^2 + 1 and f(x) · g(x) = x^5 + x^4 + x^3 + x.

For the remainder of this section, F will denote an arbitrary field. The polynomial ring F[x] has many properties in common with the integers (more precisely, F[x] and Z are both Euclidean domains; however, this generalization will not be pursued here). These similarities are investigated further below.

2.190 Definition Let f(x) ∈ F[x] be a polynomial of degree at least 1. Then f(x) is said to be irreducible over F if it cannot be written as the product of two polynomials in F[x], each of positive degree.

2.191 Definition (division algorithm for polynomials) If g(x), h(x) ∈ F[x], with h(x) ≠ 0, then ordinary polynomial long division of g(x) by h(x) yields polynomials q(x) and r(x) ∈ F[x] such that g(x) = q(x)h(x) + r(x), where deg r(x) < deg h(x). Moreover, q(x) and r(x) are unique. The polynomial q(x) is called the quotient, while r(x) is called the remainder. The remainder of the division is sometimes denoted g(x) mod h(x), and the quotient is sometimes denoted g(x) div h(x) (cf. Definition 2.82).

2.192 Example (polynomial division) Consider the polynomials g(x) = x^6 + x^5 + x^3 + x^2 + x + 1 and h(x) = x^4 + x^3 + 1 in Z2[x]. Polynomial long division of g(x) by h(x) yields g(x) = x^2 h(x) + (x^3 + x + 1). Hence g(x) mod h(x) = x^3 + x + 1 and g(x) div h(x) = x^2.

2.193 Definition If g(x), h(x) ∈ F[x] then h(x) divides g(x), written h(x)|g(x), if g(x) mod h(x) = 0.
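The division algorithm of Definition 2.191 is easy to exercise in Z2[x], where a polynomial can be encoded as an integer bitmask (bit i holding the coefficient of x^i) and coefficient subtraction is XOR. The following Python sketch (the bitmask encoding is our illustrative choice, not a convention of the handbook) reproduces Example 2.192.

def poly_divmod_gf2(g, h):
    # long division in Z2[x]; polynomials are integer bitmasks,
    # bit i = coefficient of x^i; returns (quotient, remainder)
    assert h != 0
    q = 0
    while g.bit_length() >= h.bit_length():
        shift = g.bit_length() - h.bit_length()
        q ^= 1 << shift        # add x^shift to the quotient
        g ^= h << shift        # subtract the shifted h (XOR in Z2[x])
    return q, g

g = 0b1101111   # x^6 + x^5 + x^3 + x^2 + x + 1
h = 0b11001     # x^4 + x^3 + 1
q, r = poly_divmod_gf2(g, h)
assert q == 0b100 and r == 0b1011   # q(x) = x^2, r(x) = x^3 + x + 1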

Let f(x) be a fixed polynomial in F[x]. As with the integers (Definition 2.110), one can define congruences of polynomials in F[x] based on division by f(x).

2.194 Definition If g(x), h(x) ∈ F[x], then g(x) is said to be congruent to h(x) modulo f(x) if f(x) divides g(x) − h(x). This is denoted by g(x) ≡ h(x) (mod f(x)).

2.195 Fact (properties of congruences) For all g(x), h(x), g1(x), h1(x), s(x) ∈ F[x], the following are true.
(i) g(x) ≡ h(x) (mod f(x)) if and only if g(x) and h(x) leave the same remainder upon division by f(x).
(ii) (reflexivity) g(x) ≡ g(x) (mod f(x)).
(iii) (symmetry) If g(x) ≡ h(x) (mod f(x)), then h(x) ≡ g(x) (mod f(x)).
(iv) (transitivity) If g(x) ≡ h(x) (mod f(x)) and h(x) ≡ s(x) (mod f(x)), then g(x) ≡ s(x) (mod f(x)).
(v) If g(x) ≡ g1(x) (mod f(x)) and h(x) ≡ h1(x) (mod f(x)), then g(x) + h(x) ≡ g1(x) + h1(x) (mod f(x)) and g(x)h(x) ≡ g1(x)h1(x) (mod f(x)).

Let f(x) be a fixed polynomial in F[x]. The equivalence class of a polynomial g(x) ∈ F[x] is the set of all polynomials in F[x] congruent to g(x) modulo f(x). From properties (ii), (iii), and (iv) above, it can be seen that the relation of congruence modulo f(x) partitions F[x] into equivalence classes. If g(x) ∈ F[x], then long division by f(x) yields unique polynomials q(x), r(x) ∈ F[x] such that g(x) = q(x)f(x) + r(x), where deg r(x) < deg f(x). Hence every polynomial g(x) is congruent modulo f(x) to a unique polynomial of degree less than deg f(x). The polynomial r(x) will be used as representative of the equivalence class of polynomials containing g(x).

2.196 Definition F[x]/(f(x)) denotes the set of (equivalence classes of) polynomials in F[x] of degree less than n = deg f(x). Addition and multiplication are performed modulo f(x).

2.197 Fact F[x]/(f(x)) is a commutative ring.

2.198 Fact If f(x) is irreducible over F, then F[x]/(f(x)) is a field.

2.5.5 Vector spaces

2.199 Definition A vector space V over a field F is an abelian group (V, +), together with a multiplication operation • : F × V → V (usually denoted by juxtaposition) such that for all a, b ∈ F and v, w ∈ V, the following axioms are satisfied.
(i) a(v + w) = av + aw.
(ii) (a + b)v = av + bv.
(iii) (ab)v = a(bv).
(iv) 1v = v.
The elements of V are called vectors, while the elements of F are called scalars. The group operation + is called vector addition, while the multiplication operation is called scalar multiplication.

2.200 Definition Let V be a vector space over a field F. A subspace of V is an additive subgroup U of V which is closed under scalar multiplication, i.e., av ∈ U for all a ∈ F and v ∈ U.

2.201 Fact A subspace of a vector space is also a vector space.

2.202 Definition Let S = {v1, v2, . . . , vn} be a finite subset of a vector space V over a field F. (i) A linear

combination of S is an expression of the form a1v1 + a2v2 + · · · + anvn, where each ai ∈ F.
(ii) The span of S, denoted ⟨S⟩, is the set of all linear combinations of S. The span of S is a subspace of V.
(iii) If U is a subspace of V, then S is said to span U if ⟨S⟩ = U.
(iv) The set S is linearly dependent over F if there exist scalars a1, a2, . . . , an, not all zero, such that a1v1 + a2v2 + · · · + anvn = 0. If no such scalars exist, then S is linearly independent over F.
(v) A linearly independent set of vectors that spans V is called a basis for V.

2.203 Fact Let V be a vector space.
(i) If V has a finite spanning set, then it has a basis.
(ii) If V has a basis, then in fact all bases have the same number of elements.

2.204 Definition If a vector space V has a basis, then the number of elements in a basis is called the dimension of V, denoted dim V.

2.205 Example If F is any field, then the n-fold Cartesian product V = F × F × · · · × F is a vector space over F of dimension n. The standard basis for V is {e1, e2, . . . , en}, where ei is a vector with a 1 in the ith coordinate and 0’s elsewhere.

2.206 Definition Let E be an extension field of F. Then E can be viewed as a vector space over the subfield F, where vector addition and scalar multiplication are simply the field operations of addition and multiplication in E. The dimension of this vector space is called the degree of E over F, and denoted by [E : F]. If this degree is finite, then E is called a finite extension of F.

2.207 Fact Let F, E, and L be fields. If L is a finite extension of E and E is a finite extension of F, then L is also a finite extension of F and [L : F] = [L : E][E : F].

2.6 Finite fields

2.6.1 Basic properties

2.208 Definition A finite field is a field F which contains a finite number of elements. The order of F is the number of elements in F.

2.209 Fact (existence and uniqueness of finite fields)
(i) If F is a finite field, then F contains p^m elements for some prime p and integer m ≥ 1.
(ii) For every prime power order p^m, there is a unique (up to isomorphism) finite field of order p^m. This field is denoted by F_{p^m}, or sometimes by GF(p^m).

Informally speaking, two fields are isomorphic if they are structurally the same, although the representation of their field elements may be different. Note that if p is a prime then Zp is a field, and hence every field of order p is isomorphic to Zp. Unless otherwise stated, the finite field Fp will henceforth be identified with Zp.

2.210 Fact If Fq is a finite field of order q = p^m, p a prime, then the characteristic of Fq is p. Moreover, Fq contains a copy of Zp as a subfield. Hence Fq can be viewed as an extension field of Zp of degree m.

2.211 Fact (subfields of a finite field) Let Fq be a finite field of order q = p^m. Then every subfield of Fq has order p^n, for some n that is a positive divisor of m. Conversely, if n is a positive divisor of m, then there is exactly one subfield of Fq of order p^n; an element a ∈ Fq is in the subfield F_{p^n} if and only if a^(p^n) = a.

2.212 Definition The non-zero elements of Fq form a group under multiplication called the multiplicative group of Fq, denoted by F∗q.

2.213 Fact F∗q is a cyclic group of order q − 1. Hence a^q = a for all a ∈ Fq.

2.214 Definition A generator of the cyclic group F∗q is called a primitive element or generator of Fq.

2.215 Fact If a, b ∈ Fq, a finite field of characteristic p, then
(a + b)^(p^t) = a^(p^t) + b^(p^t) for all t ≥ 0.

2.6.2 The Euclidean algorithm for polynomials

Let Zp be the finite field of order p. The theory of greatest common divisors and the Euclidean algorithm for integers carries over in a straightforward manner to the polynomial ring Zp[x] (and more generally to the polynomial ring F[x], where F is any field).

2.216 Definition Let g(x), h(x) ∈ Zp[x], where not both are 0. Then the greatest common divisor of g(x) and h(x), denoted gcd(g(x), h(x)), is the monic polynomial of greatest degree in Zp[x] which divides both g(x) and h(x). By definition, gcd(0, 0) = 0.

2.217 Fact Zp[x] is a unique factorization domain. That is, every non-zero polynomial f(x) ∈ Zp[x] has a factorization
f(x) = a f1(x)^e1 f2(x)^e2 · · · fk(x)^ek,
where the fi(x) are distinct monic irreducible polynomials in Zp[x], the ei are positive integers, and a ∈ Zp. Furthermore, the factorization is unique up to rearrangement of factors.

The following is the polynomial version of the Euclidean algorithm (cf. Algorithm 2.104).

2.218 Algorithm Euclidean algorithm for Zp[x]
INPUT: two polynomials g(x), h(x) ∈ Zp[x].
OUTPUT: the greatest common divisor of g(x) and h(x).
1. While h(x) ≠ 0 do the following: 1.1 Set r(x)←g(x) mod h(x), g(x)←h(x), h(x)←r(x).
2. Return(g(x)).

2.219 Definition A Zp-operation means either an addition, subtraction, multiplication, inversion, or division in Zp.

2.220 Fact Suppose that deg g(x) ≤ m and deg h(x) ≤ m. Then Algorithm 2.218 has a running time of O(m^2) Zp-operations, or equivalently, O(m^2 (lg p)^2) bit operations.

As with the case of the integers (cf. Algorithm 2.107), the Euclidean algorithm can be extended so that it also yields two polynomials s(x) and t(x) satisfying s(x)g(x) + t(x)h(x) = gcd(g(x), h(x)).

2.221 Algorithm Extended Euclidean algorithm for Zp[x]
INPUT: two polynomials g(x), h(x) ∈ Zp[x].
OUTPUT: d(x) = gcd(g(x), h(x)) and polynomials s(x), t(x) ∈ Zp[x] which satisfy s(x)g(x) + t(x)h(x) = d(x).
1. If h(x) = 0 then set d(x)←g(x), s(x)←1, t(x)←0, and return(d(x), s(x), t(x)).
2. Set s2(x)←1, s1(x)←0, t2(x)←0, t1(x)←1.
3. While h(x) ≠ 0 do the following:

   3.1 q(x)←g(x) div h(x), r(x)←g(x) − h(x)q(x).
   3.2 s(x)←s2(x) − q(x)s1(x), t(x)←t2(x) − q(x)t1(x).
   3.3 g(x)←h(x), h(x)←r(x).
   3.4 s2(x)←s1(x), s1(x)←s(x), t2(x)←t1(x), and t1(x)←t(x).
4. Set d(x)←g(x), s(x)←s2(x), t(x)←t2(x).
5. Return(d(x), s(x), t(x)).

2.222 Fact (running time of Algorithm 2.221)
(i) The polynomials s(x) and t(x) given by Algorithm 2.221 have small degree; that is, they satisfy deg s(x) < deg h(x) and deg t(x) < deg g(x).
(ii) Suppose that deg g(x) ≤ m and deg h(x) ≤ m. Then Algorithm 2.221 has a running time of O(m^2) Zp-operations, or equivalently, O(m^2 (lg p)^2) bit operations.

2.223 Example (extended Euclidean algorithm for polynomials) The following are the steps of Algorithm 2.221 with inputs g(x) = x^10 + x^9 + x^8 + x^6 + x^5 + x^4 + 1 and h(x) = x^9 + x^6 + x^5 + x^3 + x^2 + 1 in Z2[x].
Initialization: s2(x)←1, s1(x)←0, t2(x)←0, t1(x)←1.
Iteration 1: q(x)←x + 1, r(x)←x^8 + x^7 + x^6 + x^2 + x, s(x)←1, t(x)←x + 1, g(x)←x^9 + x^6 + x^5 + x^3 + x^2 + 1, h(x)←x^8 + x^7 + x^6 + x^2 + x, s2(x)←0, s1(x)←1, t2(x)←1, t1(x)←x + 1.
Iteration 2: q(x)←x + 1, r(x)←x^5 + x^2 + x + 1, s(x)←x + 1, t(x)←x^2, g(x)←x^8 + x^7 + x^6 + x^2 + x, h(x)←x^5 + x^2 + x + 1, s2(x)←1, s1(x)←x + 1, t2(x)←x + 1, t1(x)←x^2.
Iteration 3: q(x)←x^3 + x^2 + x + 1, r(x)←x^3 + x + 1, s(x)←x^4, t(x)←x^5 + x^4 + x^3 + x^2 + x + 1, g(x)←x^5 + x^2 + x + 1, h(x)←x^3 + x + 1, s2(x)←x + 1, s1(x)←x^4, t2(x)←x^2, t1(x)←x^5 + x^4 + x^3 + x^2 + x + 1.
Iteration 4: q(x)←x^2 + 1, r(x)←0, s(x)←x^6 + x^4 + x + 1, t(x)←x^7 + x^6 + x^2 + x + 1, g(x)←x^3 + x + 1, h(x)←0, s2(x)←x^4, s1(x)←x^6 + x^4 + x + 1, t2(x)←x^5 + x^4 + x^3 + x^2 + x + 1, t1(x)←x^7 + x^6 + x^2 + x + 1.
Hence gcd(g(x), h(x)) = x^3 + x + 1 and
(x^4) g(x) + (x^5 + x^4 + x^3 + x^2 + x + 1) h(x) = x^3 + x + 1.

2.6.3 Arithmetic of polynomials

A commonly used representation for the elements of a finite field Fq, where q = p^m and p is a

prime, is a polynomial basis representation. If m = 1, then Fq is just Zp and arithmetic is performed modulo p. Since these operations have already been studied in Section 2.4.2, it is henceforth assumed that m ≥ 2. The representation is based on Fact 2.198.

2.224 Fact Let f(x) ∈ Zp[x] be an irreducible polynomial of degree m. Then Zp[x]/(f(x)) is a finite field of order p^m. Addition and multiplication of polynomials is performed modulo f(x).

The following fact assures that all finite fields can be represented in this manner.

2.225 Fact For each m ≥ 1, there exists a monic irreducible polynomial of degree m over Zp. Hence, every finite field has a polynomial basis representation.

An efficient algorithm for finding irreducible polynomials over finite fields is presented in §4.5.1. Tables 4.6 and 4.7 list some irreducible polynomials over the finite field Z2.

Henceforth, the elements of the finite field F_{p^m} will be represented by polynomials in Zp[x] of degree < m. If g(x), h(x) ∈ F_{p^m}, then addition is the usual addition of polynomials in Zp[x]. The product g(x)h(x) can be formed by first multiplying g(x) and h(x) as polynomials by the ordinary method, and then taking the remainder after polynomial division by f(x). Multiplicative inverses in F_{p^m} can be computed by using the extended Euclidean algorithm for the polynomial ring Zp[x].

2.226 Algorithm Computing multiplicative inverses in F_{p^m}
INPUT: a non-zero polynomial g(x) ∈ F_{p^m}. (The elements of the field F_{p^m} are represented as Zp[x]/(f(x)), where f(x) ∈ Zp[x] is an irreducible polynomial of degree m over Zp.)
OUTPUT: g(x)^−1 ∈ F_{p^m}.
1. Use the extended Euclidean algorithm for polynomials (Algorithm 2.221) to find two polynomials s(x) and t(x) ∈ Zp[x] such that s(x)g(x) + t(x)f(x) = 1.
2. Return(s(x)).

Exponentiation in F_{p^m} can be done efficiently by the repeated square-and-multiply algorithm (cf. Algorithm 2.143).

2.227 Algorithm Repeated square-and-multiply algorithm for exponentiation in F_{p^m}
INPUT: g(x) ∈ F_{p^m} and an integer 0 ≤ k < p^m − 1 whose binary representation is k = Σ_{i=0}^{t} ki 2^i. (The field F_{p^m} is represented as Zp[x]/(f(x)), where f(x) ∈ Zp[x] is an irreducible polynomial of degree m over Zp.)
OUTPUT: g(x)^k mod f(x).
1. Set s(x)←1. If k = 0 then return(s(x)).
2. Set G(x)←g(x).
3. If k0 = 1 then set s(x)←g(x).
4. For i from 1 to t do the following:
   4.1 Set G(x)←G(x)^2 mod f(x).
   4.2 If ki = 1 then set s(x)←G(x) · s(x) mod f(x).
5. Return(s(x)).

The number of Zp-operations for the basic operations in F_{p^m} is summarized in Table 2.8.

Operation      |                   | Number of Zp-operations
Addition       | g(x) + h(x)       | O(m)
Subtraction    | g(x) − h(x)       | O(m)
Multiplication | g(x) · h(x)       | O(m^2)
Inversion      | g(x)^−1           | O(m^2)
Exponentiation | g(x)^k, k < p^m   | O((lg p)m^3)

Table 2.8: Complexity of basic operations in F_{p^m}.

In some applications (cf. §4.5.3),

it may be preferable to use a primitive polynomial to define a finite field.

2.228 Definition An irreducible polynomial f(x) ∈ Zp[x] of degree m is called a primitive polynomial if x is a generator of F∗_{p^m}, the multiplicative group of all the non-zero elements in F_{p^m} = Zp[x]/(f(x)).

2.229 Fact The irreducible polynomial f(x) ∈ Zp[x] of degree m is a primitive polynomial if and only if f(x) divides x^k − 1 for k = p^m − 1 and for no smaller positive integer k.

2.230 Fact For each m ≥ 1, there exists a monic primitive polynomial of degree m over Zp. In fact, there are precisely φ(p^m − 1)/m such polynomials.

2.231 Example (the finite field F_{2^4} of order 16) It can be verified (Algorithm 4.69) that the polynomial f(x) = x^4 + x + 1 is irreducible over Z2. Hence the finite field F_{2^4} can be represented as the set of all polynomials over F2 of degree less than 4. That is,
F_{2^4} = {a3 x^3 + a2 x^2 + a1 x + a0 | ai ∈ {0, 1}}.
For convenience, the polynomial a3 x^3 + a2 x^2 + a1 x + a0 is represented by the vector (a3 a2 a1 a0) of length 4, and F_{2^4} = {(a3 a2 a1 a0) | ai ∈ {0, 1}}. The following are some examples of field arithmetic.
(i) Field elements are simply added componentwise: for example, (1011) + (1001) = (0010).
(ii) To multiply the field elements (1101) and (1001), multiply them as polynomials and then take the remainder when this product is divided by f(x):
(x^3 + x^2 + 1) · (x^3 + 1) = x^6 + x^5 + x^2 + 1 ≡ x^3 + x^2 + x + 1 (mod f(x)).
Hence (1101) · (1001) = (1111).
(iii) The multiplicative identity of F_{2^4} is (0001).
(iv) The inverse of (1011) is (0101). To verify this, observe that
(x^3 + x + 1) · (x^2 + 1) = x^5 + x^2 + x + 1 ≡ 1 (mod f(x)),
whence (1011) · (0101) = (0001).
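The arithmetic of this example can be replayed with the same bitmask encoding used earlier for Z2[x] (a Python sketch; the vector (a3 a2 a1 a0) is simply the 4-bit integer a3a2a1a0):

def gf16_mul(a, b, f=0b10011):      # f = x^4 + x + 1
    # multiply in F_16 = Z2[x]/(x^4 + x + 1): carry-less product,
    # then reduction modulo f(x)
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() > 4:
        prod ^= f << (prod.bit_length() - 5)
    return prod

assert 0b1011 ^ 0b1001 == 0b0010           # (i)  (1011) + (1001) = (0010)
assert gf16_mul(0b1101, 0b1001) == 0b1111  # (ii) (1101) * (1001) = (1111)
assert gf16_mul(0b1011, 0b0101) == 0b0001  # (iv) (1011) * (0101) = (0001)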

f(x) is a primitive polynomial or, equivalently, the field element x = (0010) is a generator of F∗_{2^4}. This may be checked by verifying that all the non-zero elements in F_{2^4} can be obtained as powers of x. The computations are summarized in Table 2.9.

i  | x^i mod x^4 + x + 1   | vector notation
0  | 1                     | (0001)
1  | x                     | (0010)
2  | x^2                   | (0100)
3  | x^3                   | (1000)
4  | x + 1                 | (0011)
5  | x^2 + x               | (0110)
6  | x^3 + x^2             | (1100)
7  | x^3 + x + 1           | (1011)
8  | x^2 + 1               | (0101)
9  | x^3 + x               | (1010)
10 | x^2 + x + 1           | (0111)
11 | x^3 + x^2 + x         | (1110)
12 | x^3 + x^2 + x + 1     | (1111)
13 | x^3 + x^2 + 1         | (1101)
14 | x^3 + 1               | (1001)

Table 2.9: The powers of x modulo f(x) = x^4 + x + 1.

A list of some primitive polynomials over finite fields of characteristic two is given in Table 4.8.

2.7 Notes and further references

§2.1
A classic introduction to probability theory is the first volume of the book by Feller [392]. The material on the birthday problem (§2.1.5) is summarized from Nishimura and Sibuya [931]. See also Girault, Cohen, and Campana [460]. The material on random mappings (§2.1.6) is summarized from the excellent article by Flajolet and Odlyzko [413].

§2.2
The concept of entropy was introduced in the seminal paper of Shannon [1120]. These ideas were then applied to develop a mathematical theory of secrecy systems by Shannon [1121]. Hellman [548] extended the Shannon theory approach to cryptography, and this work was further generalized by Beauchemin and Brassard [80]. For an introduction to information theory see the books by Welsh [1235] and Goldie and

Pinch [464]. For more complete treatments, consult Blahut [144] and McEliece [829].

§2.3
Among the many introductory-level books on algorithms are those of Cormen, Leiserson, and Rivest [282], Rawlins [1030], and Sedgewick [1105]. A recent book on complexity theory is Papadimitriou [963]. Example 2.58 is from Graham, Knuth, and Patashnik [520, p.441]. For an extensive list of NP-complete problems, see Garey and Johnson [441].

§2.4
Two introductory-level books in number theory are Giblin [449] and Rosen [1069]. Good number theory books

at a more advanced level include Koblitz [697], Hardy and Wright [540], Ireland and Rosen [572], and Niven and Zuckerman [932]. The most comprehensive works on the design and analysis of algorithms, including number theoretic algorithms, are the first two volumes of Knuth [691, 692]. Two more recent books exclusively devoted to this subject are Bach and Shallit [70] and Cohen [263]. Facts 2.96 and 2.102 are due to Rosser and Schoenfeld [1070]. Shallit [1108] describes and analyzes three algorithms for computing the Jacobi symbol.

§2.5
Among standard references in abstract algebra are the books by Herstein [556] and Hungerford [565].

§2.6
An excellent introduction to finite fields is provided in McEliece [830]. An encyclopedic treatment of the theory and applications of finite fields is given by Lidl and Niederreiter [764]. Two books which discuss various methods of representing the elements of a finite field are those of Jungnickel [646] and Menezes et al. [841].

Chapter 3
Number-Theoretic Reference Problems

Contents in Brief
3.1 Introduction and overview . . . . . . . . . . . . . . . . . . . 87
3.2 The integer factorization problem . . . . . . . . . . . . . . . 89
3.3 The RSA problem . . . . . . . . . . . . . . . . . . . . . . . 98
3.4 The quadratic residuosity problem . . . . . . . . . . . . . . . 99
3.5 Computing square roots in Zn . . . . . . . . . . . . . . . . . 99
3.6 The discrete logarithm problem . . . . . . . . . . . . . . . . 103
3.7 The Diffie-Hellman problem . . . . . . . . . . . . . . . . . . 113
3.8 Composite moduli . . . . . . . . . . . . . . . . . . . . . . . 114
3.9 Computing individual bits . . . . . . . . . . . . . . . . . . . 114
3.10 The subset sum problem . . . . . . . . . . . . . . . . . . . . 117
3.11 Factoring polynomials over finite fields . . . . . . . . . . . . 122
3.12 Notes and further references . . . . . . . . . . . . . . . . . . 125

3.1 Introduction and overview

The security of many public-key cryptosystems relies on the apparent intractability of the computational problems studied in this chapter. In a cryptographic setting, it is prudent to make the assumption that the adversary is very powerful. Thus, informally speaking, a computational problem is said to be easy or tractable if it can be solved in (expected)¹ polynomial time, at least for a non-negligible fraction of all possible inputs. In other words, if there is an algorithm which can solve a non-negligible fraction of all instances of a problem in polynomial time, then any cryptosystem whose security is based on that problem must be considered insecure.

The computational problems studied in this chapter are summarized in Table 3.1. The true computational complexities of these problems are not known. That is to say, they are widely believed to be intractable,² although no proof of this is known. Generally, the only lower bounds known on the resources required to solve these problems are the trivial linear bounds, which do not provide any evidence of their intractability. It is, therefore, of interest to study their relative difficulties. For this reason, various techniques of reducing one computational problem to another have been devised and studied in the literature. These reductions provide a means for converting any algorithm that solves the second problem into an algorithm for solving the first problem. The following intuitive notion of reducibility (cf. §2.3.3) is used in this chapter.

¹ For simplicity, the remainder of the chapter shall generally not distinguish between deterministic polynomial-time algorithms and randomized algorithms (see §2.3.4) whose expected running time is polynomial.
² More precisely, these problems are intractable if the problem parameters are carefully chosen.

FACTORING | Integer factorization problem: given a positive integer n, find its prime factorization; that is, write n = p1^e1 · p2^e2 · · · pk^ek where the pi are pairwise distinct primes and each ei ≥ 1.
RSAP | RSA problem (also known as RSA inversion): given a positive integer n that is a product of two distinct odd primes p and q, a positive integer e such that gcd(e, (p − 1)(q − 1)) = 1, and an integer c, find an integer m such that m^e ≡ c (mod n).
QRP | Quadratic residuosity problem: given an odd composite integer n and an integer a having Jacobi symbol (a/n) = 1, decide whether or not a is a quadratic residue modulo n.
SQROOT | Square roots modulo n: given a composite integer n and a ∈ Qn (the set of quadratic residues modulo n), find a square root of a modulo n; that is, an integer x such that x^2 ≡ a (mod n).
DLP | Discrete logarithm problem: given a prime p, a generator α of Z∗p, and an element β ∈ Z∗p, find the integer x, 0 ≤ x ≤ p − 2, such that α^x ≡ β (mod p).
GDLP | Generalized discrete logarithm problem: given a finite cyclic group G of order n, a generator α of G, and an element β ∈ G, find the integer x, 0 ≤ x ≤ n − 1, such that α^x = β.
DHP | Diffie-Hellman problem: given a prime p, a generator α of Z∗p, and elements α^a mod p and α^b mod p, find α^(ab) mod p.
GDHP | Generalized Diffie-Hellman problem: given a finite cyclic group G, a generator α of G, and group elements α^a and α^b, find α^(ab).
SUBSET-SUM | Subset sum problem: given a set of positive integers {a1, a2, . . . , an} and a positive integer s, determine whether or not there is a subset of the aj that sums to s.

Table 3.1: Some computational problems of cryptographic relevance.

3.1 Definition Let A and B be two computational problems. A is said to polytime reduce to B, written A ≤P B, if there is an algorithm that solves A which uses, as a subroutine, a hypothetical algorithm for solving B, and which runs in polynomial time if the algorithm for B does.³

Informally speaking, if A polytime reduces to B, then B is at least

as difficult as A; equivalently, A is no harder than B. Consequently, if A is a well-studied computational problem that is widely believed to be intractable, then proving that A ≤P B provides strong evidence of the intractability of problem B.

3.2 Definition Let A and B be two computational problems. If A ≤P B and B ≤P A, then A and B are said to be computationally equivalent, written A ≡P B.

³ In the literature, the hypothetical polynomial-time subroutine for B is sometimes called an oracle for B.

Informally speaking, if A ≡P B then A and B are either both tractable or both intractable, as the case may be.

Chapter outline

The remainder of the chapter is organized as follows. Algorithms for the integer factorization problem are studied in §3.2. Two problems related to factoring, the RSA problem and the quadratic residuosity problem, are briefly considered in §3.3 and §3.4. Efficient algorithms for computing square roots in Zp, p a prime, are presented in §3.5, and the equivalence of the problems of finding square roots modulo a composite integer n and factoring n is established. Algorithms for the discrete logarithm problem are studied in §3.6, and the related Diffie-Hellman problem is briefly considered in §3.7. The relation between the problems of factoring a composite integer n and computing discrete logarithms in (cyclic subgroups of) the group Z∗n is investigated in §3.8. The tasks of finding partial solutions to the discrete logarithm problem, the RSA problem, and the problem of computing square roots modulo a composite integer n are the topics of §3.9. The L³-lattice basis reduction algorithm is presented in §3.10, along with algorithms for the subset sum problem and for simultaneous diophantine approximation. Berlekamp’s Q-matrix algorithm for factoring polynomials is presented in §3.11. Finally, §3.12 provides references and further chapter notes.

3.2 The integer factorization problem

The security of many cryptographic techniques depends upon the intractability of the integer factorization problem. A partial list of such protocols includes the RSA public-key encryption scheme (§8.2), the RSA signature scheme (§11.3.1), and the Rabin public-key encryption scheme (§8.3). This section summarizes the current knowledge on algorithms for the integer factorization problem.

3.3 Definition The integer factorization problem (FACTORING) is the following: given a positive integer n, find its prime factorization; that is, write n = p1^e1 · p2^e2 · · · pk^ek where the pi are pairwise distinct primes and each ei ≥ 1.

3.4 Remark (primality testing vs. factoring) The problem of deciding whether an integer is composite or prime seems to be, in general, much easier than the factoring problem. Hence, before attempting to factor an integer, the integer should be tested to make sure that it is indeed composite. Primality

Hence, before attempting to factor an integer, the integer should be tested to make sure that it is indeed composite. Primality tests are a main topic of Chapter 4.

3.5 Remark (splitting vs. factoring) A non-trivial factorization of n is a factorization of the form n = ab where 1 < a < n and 1 < b < n; a and b are said to be non-trivial factors of n. Here a and b are not necessarily prime. To solve the integer factorization problem, it suffices to study algorithms that split n, that is, find a non-trivial factorization n = ab. Once found, the factors a and b can be tested for primality. The algorithm for splitting integers can then be recursively applied to a and/or b, if either is found to be composite. In this manner, the prime factorization of n can be obtained.

3.6 Note (testing for perfect powers) If n ≥ 2, it can be efficiently checked as follows whether or not n is a perfect power, i.e., n = x^k for some integers x ≥ 2, k ≥ 2.

For each prime p ≤ lg n, an integer approximation x of n^(1/p) is computed. This can be done by performing a binary search for x satisfying n = x^p in the interval [2, 2^(⌊lg n/p⌋+1)]. The entire procedure takes O((lg³ n) lg lg lg n) bit operations. For the remainder of this section, it will always be assumed that n is not a perfect power. It follows that if n is composite, then n has at least two distinct prime factors.
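A minimal Python sketch of this perfect-power test follows (our own illustration, not from the text). For simplicity it binary-searches every exponent k ≥ 2 with 2^k ≤ n rather than only the prime exponents, which does not affect correctness:

    def is_perfect_power(n):
        """Return (x, k) with n == x**k, k >= 2, if n is a perfect power; else None."""
        k = 2
        while (1 << k) <= n:                      # a base x >= 2 can exist only if 2^k <= n
            lo, hi = 2, 1 << (n.bit_length() // k + 1)
            while lo <= hi:                       # binary search for x with x^k == n
                mid = (lo + hi) // 2
                v = mid ** k
                if v == n:
                    return (mid, k)
                lo, hi = (mid + 1, hi) if v < n else (lo, mid - 1)
            k += 1
        return None

    print(is_perfect_power(3125))                 # (5, 5), since 5^5 = 3125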

Some factoring algorithms are tailored to perform better when the integer n being factored is of a special form; these are called special-purpose factoring algorithms. The running times of such algorithms typically depend on certain properties of the factors of n. Examples of special-purpose factoring algorithms include trial division (§3.2.1), Pollard's rho algorithm (§3.2.2), Pollard's p − 1 algorithm (§3.2.3), the elliptic curve algorithm (§3.2.4), and the special number field sieve (§3.2.7). In contrast, the running times of the so-called general-purpose factoring algorithms depend solely on the size of n. Examples of general-purpose factoring algorithms include the quadratic sieve (§3.2.6) and the general number field sieve (§3.2.7).

Whenever applicable, special-purpose algorithms should be employed as they will generally be more efficient. A reasonable overall strategy is to attempt to find small factors first, capitalize on any particular special forms an integer may have, and then, if all else fails, bring out the general-purpose algorithms. As an example of a general strategy, one might consider the following.
1. Apply trial division by small primes less than some bound b1.
2. Next, apply Pollard's rho algorithm, hoping to find any small prime factors smaller than some bound b2, where b2 > b1.
3. Apply the elliptic curve factoring algorithm, hoping to find any small factors smaller than some bound b3, where b3 > b2.
4. Finally, apply one of the more powerful general-purpose algorithms (quadratic sieve or general number field sieve).

3.2.1 Trial division

Once it is established that an integer n is composite, before expending vast amounts of time with more powerful techniques, the first thing that should be attempted is trial division by all "small" primes. Here, "small" is determined as a function of the size of n. As an extreme case, trial division can be attempted by all primes up to √n. If this is done, trial division will completely factor n, but the procedure will take roughly √n divisions in the worst case when n is a product of two primes of the same size. In general, if the factors found at each stage are tested for primality, then trial division to factor n completely takes O(p + lg n) divisions, where p is the second-largest prime factor of n.

Fact 3.7 indicates that if trial division is used to factor a randomly chosen large integer n, then the algorithm can be expected to find some small factors of n relatively quickly, and expend a large amount of time to find the second-largest prime factor of n.

3.7 Fact Let n be chosen uniformly at random from the interval [1, x].
(i) If 1/2 ≤ α ≤ 1, then the probability that the largest prime factor of n is ≤ x^α is approximately 1 + ln α. Thus, for example, the probability that n has a prime factor > √x is ln 2 ≈ 0.69.
(ii) The probability that the second-largest prime factor of n is ≤ x^0.2117 is about 1/2.
(iii) The expected total number of prime factors of n is ln ln x + O(1). (If n = ∏ pi^ei, the total number of prime factors of n is ∑ ei.)

3.2.2 Pollard's rho factoring algorithm

Pollard's rho algorithm is a special-purpose factoring algorithm for finding small factors of a composite integer.

Let f : S → S be a random function, where S is a finite set of cardinality n. Let x0 be a random element of S, and consider the sequence x0, x1, x2, ... defined by xi+1 = f(xi) for i ≥ 0.

Since S is finite, the sequence must eventually cycle, and consists of a tail of expected length √(πn/8) followed by an endlessly repeating cycle of expected length √(πn/8) (see Fact 2.37). A problem that arises in some cryptanalytic tasks, including integer factorization (Algorithm 3.9) and the discrete logarithm problem (Algorithm 3.60), is that of finding distinct indices i and j such that xi = xj (a collision is then said to have occurred).

An obvious method for finding a collision is to compute and store xi for i = 0, 1, 2, ... and look for duplicates. The expected number of inputs that must be tried before a duplicate is detected is √(πn/2) (Fact 2.27). This method requires O(√n) memory and O(√n) time, assuming the xi are stored in a hash table so that new entries can be added in constant time.

3.8 Note (Floyd's cycle-finding algorithm) The large storage requirements in the above technique for finding a collision can be eliminated by using Floyd's cycle-finding algorithm.

In this method, one starts with the pair (x1, x2), and iteratively computes (xi, x2i) from the previous pair (xi−1, x2i−2), until xm = x2m for some m. If the tail of the sequence has length λ and the cycle has length µ, then the first time that xm = x2m is when m = µ(1 + ⌊λ/µ⌋). Note that λ < m ≤ λ + µ, and consequently the expected running time of this method is O(√n).

Now, let p be a prime factor of a composite integer n. Pollard's rho algorithm for factoring n attempts to find duplicates in the sequence of integers x0, x1, x2, ... defined by x0 = 2, xi+1 = f(xi) = (xi² + 1) mod p for i ≥ 0. Floyd's cycle-finding algorithm is utilized to find xm and x2m such that xm ≡ x2m (mod p). Since p divides n but is unknown, this is done by computing the terms xi modulo n and testing if gcd(xm − x2m, n) > 1. If also gcd(xm − x2m, n) < n, then a non-trivial factor of n is obtained. (The situation gcd(xm − x2m, n) = n occurs with negligible probability.)

3.9 Algorithm Pollard's rho algorithm for factoring integers

INPUT: a composite integer n that is not a prime power.
OUTPUT: a non-trivial factor d of n.
1. Set a←2, b←2.
2. For i = 1, 2, ... do the following:
   2.1 Compute a←a² + 1 mod n, b←b² + 1 mod n, b←b² + 1 mod n.
   2.2 Compute d = gcd(a − b, n).
   2.3 If 1 < d < n then return(d) and terminate with success.
   2.4 If d = n then terminate the algorithm with failure (see Note 3.12).

3.10 Example (Pollard's rho algorithm for finding a non-trivial factor of n = 455459) The following table lists the values of the variables a, b, and d at the end of each iteration of step 2 of Algorithm 3.9:

         a        b      d
         5       26      1
        26     2871      1
       677   179685      1
      2871   155260      1
     44380   416250      1
    179685    43670      1
    121634   164403      1
    155260   247944      1
     44567    68343    743

Hence two non-trivial factors of 455459 are 743 and 455459/743 = 613.
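For experimentation, Algorithm 3.9 can be rendered in a few lines of Python (a sketch of ours; math.gcd stands in for the gcd computation of step 2.2):

    from math import gcd

    def pollard_rho(n):
        """Return a non-trivial factor of the composite n, or None on failure
        (in which case one may retry with a different polynomial; see Note 3.12)."""
        a, b = 2, 2
        while True:
            a = (a * a + 1) % n          # a advances one step
            b = (b * b + 1) % n          # b advances two steps per iteration
            b = (b * b + 1) % n
            d = gcd(a - b, n)
            if 1 < d < n:
                return d                 # success: non-trivial factor
            if d == n:
                return None              # failure (Note 3.12)

    print(pollard_rho(455459))           # 743, as in Example 3.10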

3.11 Fact Assuming that the function f(x) = (x² + 1) mod p behaves like a random function, the expected time for Pollard's rho algorithm to find a factor p of n is O(√p) modular multiplications. This implies that the expected time to find a non-trivial factor of n is O(n^(1/4)) modular multiplications.

3.12 Note (options upon termination with failure) If Pollard's rho algorithm terminates with failure, one option is to try again with a different polynomial f having integer coefficients instead of f(x) = x² + 1. For example, the polynomial f(x) = x² + c may be used as long as c ≠ 0, −2.

3.2.3 Pollard's p − 1 factoring algorithm

Pollard's p − 1 factoring algorithm is a special-purpose factoring algorithm that can be used to efficiently find any prime factors p of a composite integer n for which p − 1 is smooth (see Definition 3.13) with respect to some relatively small bound B.

3.13 Definition Let B be a positive integer. An integer n is said to be B-smooth, or smooth with respect to a bound B, if all its prime factors are ≤ B.

The idea behind Pollard's p − 1 algorithm is the following. Let B be a smoothness bound. Let Q be the least common multiple of all powers of primes ≤ B that are ≤ n. If q^l ≤ n, then l ln q ≤ ln n, and so l ≤ ⌊ln n / ln q⌋. Thus

    Q = ∏_{q≤B} q^⌊ln n / ln q⌋,

where the product is over all distinct primes q ≤ B. If p is a prime factor of n such that p − 1 is B-smooth, then p − 1 | Q, and consequently for any a satisfying gcd(a, p) = 1, Fermat's theorem (Fact 2.127) implies that a^Q ≡ 1 (mod p). Hence if d = gcd(a^Q − 1, n), then p | d. It is possible that d = n, in which case the algorithm fails; however, this is unlikely to occur if n has at least two large distinct prime factors.

3.14 Algorithm Pollard's p − 1 algorithm for factoring integers

INPUT: a composite integer n that is not a prime power.
OUTPUT: a non-trivial factor d of n.
1. Select a smoothness bound B.
2. Select a random integer a, 2 ≤ a ≤ n − 1, and compute d = gcd(a, n). If d ≥ 2 then return(d).
3. For each prime q ≤ B do the following:
   3.1 Compute l = ⌊ln n / ln q⌋.
   3.2 Compute a←a^(q^l) mod n (using Algorithm 2.143).
4. Compute d = gcd(a − 1, n).
5. If d = 1 or d = n, then terminate the algorithm with failure. Otherwise, return(d).

3.15 Example (Pollard's p − 1 algorithm for finding a non-trivial factor of n = 19048567)
1. Select the smoothness bound B = 19.
2. Select the integer a = 3 and compute gcd(3, n) = 1.
3. The following table lists the intermediate values of the variables q, l, and a after each iteration of step 3 in Algorithm 3.14:

     q    l          a
     2   24    2293244
     3   15   13555889
     5   10   16937223
     7    8   15214586
    11    6    9685355
    13    6   13271154
    17    5   11406961
    19    5     554506

4. Compute d = gcd(554506 − 1, n) = 5281.
5. Two non-trivial factors of n are p = 5281 and q = n/p = 3607 (these factors are in fact prime). Notice that p − 1 = 5280 = 2⁵ × 3 × 5 × 11, and q − 1 = 3606 = 2 × 3 × 601. That is, p − 1 is 19-smooth, while q − 1 is not 19-smooth.

3.16 Fact Let n be an integer having a prime factor p such that p − 1 is B-smooth. The running time of Pollard's p − 1 algorithm for finding the factor p is O(B ln n / ln B) modular multiplications.

3.17 Note (improvements) The smoothness bound B in Algorithm 3.14 is selected based on the amount of time one is willing to spend on Pollard's p − 1 algorithm before moving on to more general techniques. In practice, B may be between 10⁵ and 10⁶. If the algorithm terminates with d = 1, then one might try searching over prime numbers q1, q2, ..., ql larger than B by first computing a←a^qi mod n for 1 ≤ i ≤ l, and then computing d = gcd(a − 1, n). Another variant is to start with a large bound B, and repeatedly execute step 3 for a few primes q followed by the gcd computation in step 4. There are numerous other practical improvements of the algorithm (see page 125).
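The following Python sketch of Algorithm 3.14 is our own illustration; the sieve used to enumerate the primes q ≤ B is an implementation convenience, not part of the algorithm:

    import random
    from math import gcd, log

    def primes_up_to(B):
        """The primes <= B, by a simple sieve of Eratosthenes."""
        sieve = bytearray([1]) * (B + 1)
        sieve[0:2] = b"\x00\x00"
        for i in range(2, int(B ** 0.5) + 1):
            if sieve[i]:
                sieve[i * i::i] = bytes(len(range(i * i, B + 1, i)))
        return [i for i in range(2, B + 1) if sieve[i]]

    def pollard_p_minus_1(n, B):
        """Algorithm 3.14: return a non-trivial factor of n, or None on failure."""
        a = random.randrange(2, n)
        d = gcd(a, n)
        if d >= 2:
            return d                         # step 2: a already shares a factor with n
        for q in primes_up_to(B):
            l = int(log(n) // log(q))        # l = floor(ln n / ln q), up to rounding
            a = pow(a, q ** l, n)            # step 3.2: a <- a^(q^l) mod n
        d = gcd(a - 1, n)
        return d if 1 < d < n else None      # step 5

    print(pollard_p_minus_1(19048567, 19))   # typically 5281, as in Example 3.15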

3.2.4 Elliptic curve factoring

The details of the elliptic curve factoring algorithm are beyond the scope of this book; nevertheless, a rough outline follows. The success of Pollard's p − 1 algorithm hinges on p − 1 being smooth for some prime divisor p of n; if no such p exists, then the algorithm fails. Observe that p − 1 is the order of the group Z∗p. The elliptic curve factoring algorithm is a generalization of Pollard's p − 1 algorithm in the sense that the group Z∗p is replaced by a random elliptic curve group over Zp. The order of such a group is roughly uniformly distributed in the interval [p + 1 − 2√p, p + 1 + 2√p]. If the order of the group chosen is smooth with respect to some pre-selected bound, the elliptic curve algorithm will, with high probability, find a non-trivial factor of n.

If the group order is not smooth, then the algorithm will likely fail, but can be repeated with a different choice of elliptic curve group. The elliptic curve algorithm has an expected running time of Lp[1/2, √2] (see Example 2.61 for the definition of Lp) to find a factor p of n. Since this running time depends on the size of the prime factors of n, the algorithm tends to find small such factors first. The elliptic curve algorithm is, therefore, classified as a special-purpose factoring algorithm. It is currently the algorithm of choice for finding t-decimal-digit prime factors, for t ≤ 40, of very large composite integers. In the hardest case, when n is a product of two primes of roughly the same size, the expected running time of the elliptic curve algorithm is Ln[1/2, 1], which is the same as that of the quadratic sieve (§3.2.6). However, the elliptic curve algorithm is not as efficient as the quadratic sieve in practice for such integers.

3.2.5 Random square factoring methods

The basic idea behind the random square family of methods is the following. Suppose x and y are integers such that x² ≡ y² (mod n) but x ≢ ±y (mod n). Then n divides x² − y² = (x − y)(x + y) but n does not divide either (x − y) or (x + y). Hence, gcd(x − y, n) must be a non-trivial factor of n. This result is summarized next.

3.18 Fact Let x, y, and n be integers. If x² ≡ y² (mod n) but x ≢ ±y (mod n), then gcd(x − y, n) is a non-trivial factor of n.

The random square methods attempt to find integers x and y at random so that x² ≡ y² (mod n). Then, as shown in Fact 3.19, with probability at least 1/2 it is the case that x ≢ ±y (mod n), whence gcd(x − y, n) will yield a non-trivial factor of n.

3.19 Fact Let n be an odd composite integer that is divisible by k distinct odd primes. If a ∈ Z∗n, then the congruence x² ≡ a² (mod n) has exactly 2^k solutions modulo n, two of which are x = a and x = −a.

3.20 Example Let n = 35. Then there are four solutions to the congruence x² ≡ 4 (mod 35), namely x = 2, 12, 23, and 33.

A common strategy employed by the random square algorithms for finding x and y at random satisfying x² ≡ y² (mod n) is the following. A set consisting of the first t primes S = {p1, p2, ..., pt} is chosen; S is called the factor base. Proceed to find pairs of integers (ai, bi) satisfying
(i) ai² ≡ bi (mod n); and
(ii) bi = ∏_{j=1}^{t} pj^eij, eij ≥ 0; that is, bi is pt-smooth.
Next find a subset of the bi's whose product is a perfect square. Knowing the factorizations of the bi's, this is possible by selecting a subset of the bi's such that the power of each prime pj appearing in their product is even. For this purpose, only the parity of the non-negative integer exponents eij needs to be considered.

Thus, to simplify matters, for each i, associate the binary vector vi = (vi1, vi2, ..., vit) with the integer exponent vector (ei1, ei2, ..., eit) such that vij = eij mod 2. If t + 1 pairs (ai, bi) are obtained, then the t-dimensional vectors v1, v2, ..., vt+1 must be linearly dependent over Z2. That is, there must exist a non-empty subset T ⊆ {1, 2, ..., t + 1} such that ∑_{i∈T} vi = 0 over Z2, and hence ∏_{i∈T} bi is a perfect square. The set T can be found using ordinary linear algebra over Z2. Clearly, ∏_{i∈T} ai² is also a perfect square. Thus setting x = ∏_{i∈T} ai and y to be the integer square root of ∏_{i∈T} bi yields a pair of integers (x, y) satisfying x² ≡ y² (mod n). If this pair also satisfies x ≢ ±y (mod n), then gcd(x − y, n) yields a non-trivial factor of n. Otherwise, some of the (ai, bi) pairs may be replaced by some new such pairs, and the process is repeated.

In practice, there will be several dependencies among the vectors v1, v2, ..., vt+1, and with high probability at least one will yield an (x, y) pair satisfying x ≢ ±y (mod n); hence, this last step of generating new (ai, bi) pairs does not usually occur.

This description of the random square methods is incomplete for two reasons. Firstly, the optimal choice of t, the size of the factor base, is not specified; this is addressed in Note 3.24. Secondly, a method for efficiently generating the pairs (ai, bi) is not specified. Several techniques have been proposed. In the simplest of these, called Dixon's algorithm, ai is chosen at random, and bi = ai² mod n is computed. Next, trial division by elements in the factor base is used to test whether bi is pt-smooth. If not, then another integer ai is chosen at random, and the procedure is repeated. The more efficient techniques strategically select an ai such that bi is relatively small. Since the proportion of pt-smooth integers in the interval [2, x] becomes larger as x decreases, the probability of such bi being pt-smooth is higher. The most efficient of such techniques is the quadratic sieve algorithm, which is described next.
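As a concrete instance of the random square framework, here is a toy Python sketch of Dixon's algorithm (our own code; the brute-force subset search stands in for the linear algebra over Z2 and is only viable for very small factor bases):

    import random
    from itertools import combinations
    from math import gcd

    def dixon(n, primes):
        """Dixon's random squares method; a toy sketch for small composite n."""
        t = len(primes)
        relations = []                           # entries (a_i, exponents e_i, parity mask)
        while len(relations) < t + 1:            # collect t+1 smooth relations
            a = random.randrange(2, n)
            r = b = a * a % n
            if b == 0:
                continue
            e, mask = [0] * t, 0
            for j, p in enumerate(primes):       # trial division over the factor base
                while r % p == 0:
                    r //= p
                    e[j] += 1
                mask |= (e[j] & 1) << j
            if r == 1:                           # b_i = a_i^2 mod n is p_t-smooth
                relations.append((a, e, mask))
        # find T with sum of parity vectors = 0 (brute force; fine for toy sizes)
        for size in range(1, t + 2):
            for T in combinations(range(t + 1), size):
                xor = 0
                for i in T:
                    xor ^= relations[i][2]
                if xor:
                    continue
                x, exps = 1, [0] * t             # x = product of a_i; sum the exponents
                for i in T:
                    x = x * relations[i][0] % n
                    exps = [u + v for u, v in zip(exps, relations[i][1])]
                y = 1                            # y = integer square root of prod(b_i)
                for p, l in zip(primes, exps):
                    y = y * pow(p, l // 2, n) % n
                d = gcd(x - y, n)
                if 1 < d < n:
                    return d
        return None                              # unlucky: retry with new relations

    print(dixon(84923, [2, 3, 5, 7]))            # may print 163 or 521 (or None; retry)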

3.2.6 Quadratic sieve factoring

Suppose an integer n is to be factored. Let m = ⌊√n⌋, and consider the polynomial q(x) = (x + m)² − n. Note that

    q(x) = x² + 2mx + m² − n ≈ x² + 2mx,        (3.1)

which is small (relative to n) if x is small in absolute value. The quadratic sieve algorithm selects ai = (x + m) and tests whether bi = (x + m)² − n is pt-smooth. Note that ai² = (x + m)² ≡ bi (mod n). Note also that if a prime p divides bi then (x + m)² ≡ n (mod p), and hence n is a quadratic residue modulo p. Thus the factor base need only contain those primes p for which the Legendre symbol (n/p) is 1 (Definition 2.145). Furthermore, since bi may be negative, −1 is included in the factor base. The steps of the quadratic sieve algorithm are summarized in Algorithm 3.21.

3.21 Algorithm Quadratic sieve algorithm for factoring integers

INPUT: a composite integer n that is not a prime power.
OUTPUT: a non-trivial factor d of n.
1. Select the factor base S = {p1, p2, ..., pt}, where p1 = −1 and pj (j ≥ 2) is the (j − 1)th prime p for which n is a quadratic residue modulo p.
2. Compute m = ⌊√n⌋.
3. (Collect t + 1 pairs (ai, bi). The x values are chosen in the order 0, ±1, ±2, ....) Set i←1. While i ≤ t + 1 do the following:
   3.1 Compute b = q(x) = (x + m)² − n, and test using trial division (cf. Note 3.23) by elements in S whether b is pt-smooth. If not, pick a new x and repeat step 3.1.
   3.2 If b is pt-smooth, say b = ∏_{j=1}^{t} pj^eij, then set ai←(x + m), bi←b, and vi = (vi1, vi2, ..., vit), where vij = eij mod 2 for 1 ≤ j ≤ t.
   3.3 Set i←i + 1.
4. Use linear algebra over Z2 to find a non-empty subset T ⊆ {1, 2, ..., t + 1} such that ∑_{i∈T} vi = 0.
5. Compute x = ∏_{i∈T} ai mod n.
6. For each j, 1 ≤ j ≤ t, compute lj = (∑_{i∈T} eij)/2.
7. Compute y = ∏_{j=1}^{t} pj^lj mod n.
8. If x ≡ ±y (mod n), then find another non-empty subset T ⊆ {1, 2, ..., t + 1} such that ∑_{i∈T} vi = 0, and go to step 5. (In the unlikely case such a subset T does not exist, replace a few of the (ai, bi) pairs with new pairs (step 3), and go to step 4.)
9. Compute d = gcd(x − y, n) and return(d).

3.22 Example (quadratic sieve algorithm for finding a non-trivial factor of n = 24961)
1. Select the factor base S = {−1, 2, 3, 5, 13, 23} of size t = 6. (7, 11, 17, and 19 are omitted from S since (n/p) = −1 for these primes.)
2. Compute m = ⌊√24961⌋ = 157.
3. Following is the data collected for the first t + 1 values of x for which q(x) is 23-smooth:

    i    x     q(x)   factorization of q(x)   ai    vi
    1    0     −312   −2³ · 3 · 13            157   (1, 1, 1, 0, 1, 0)
    2    1        3   3                       158   (0, 0, 1, 0, 0, 0)
    3   −1     −625   −5⁴                     156   (1, 0, 0, 0, 0, 0)
    4    2      320   2⁶ · 5                  159   (0, 0, 0, 1, 0, 0)
    5   −2     −936   −2³ · 3² · 13           155   (1, 1, 0, 0, 1, 0)
    6    4      960   2⁶ · 3 · 5              161   (0, 0, 1, 1, 0, 0)
    7   −6    −2160   −2⁴ · 3³ · 5            151   (1, 0, 1, 1, 0, 0)

4. By inspection, v1 + v2 + v5 = 0. (In the notation of Algorithm 3.21, T = {1, 2, 5}.)
5. Compute x = (a1 a2 a5 mod n) = 936.
6. Compute l1 = 1, l2 = 3, l3 = 2, l4 = 0, l5 = 1, l6 = 0.
7. Compute y = −2³ · 3² · 13 mod n = 24025.
8. Since 936 ≡ −24025 (mod n), another linear dependency must be found.
9. By inspection, v3 + v6 + v7 = 0; thus T = {3, 6, 7}.
10. Compute x = (a3 a6 a7 mod n) = 23405.
11. Compute l1 = 1, l2 = 5, l3 = 2, l4 = 3, l5 = 0, l6 = 0.
12. Compute y = (−2⁵ · 3² · 5³ mod n) = 13922.
13. Now, 23405 ≢ ±13922 (mod n), so compute gcd(x − y, n) = gcd(9483, 24961) = 109. Hence, two non-trivial factors of 24961 are 109 and 229.

3.23 Note (sieving) Instead of testing smoothness by trial division in step 3.1 of Algorithm 3.21, a more efficient technique known as sieving is employed in practice.

Observe first that if p is an odd prime in the factor base and p divides q(x), then p also divides q(x + lp) for every integer l. Thus by solving the equation q(x) ≡ 0 (mod p) for x (for example, using the algorithms in §3.5.1), one knows either one or two (depending on the number of solutions to the quadratic equation) entire sequences of other values y for which p divides q(y).

The sieving process is the following. An array Q[ ] indexed by x, −M ≤ x ≤ M, is created and the xth entry is initialized to ⌊lg |q(x)|⌋. Let x1, x2 be the solutions to q(x) ≡ 0 (mod p), where p is an odd prime in the factor base. Then the value ⌊lg p⌋ is subtracted from those entries Q[x] in the array for which x ≡ x1 or x2 (mod p) and −M ≤ x ≤ M. This is repeated for each odd prime p in the factor base. (The case of p = 2 and prime powers can be handled in a similar manner.)

After the sieving, the array entries Q[x] with values near 0 are most likely to be pt-smooth (roundoff errors must be taken into account), and this can be verified by factoring q(x) by trial division.

3.24 Note (running time of the quadratic sieve) To optimize the running time of the quadratic sieve, the size of the factor base should be judiciously chosen. The optimal selection of t ≈ Ln[1/2, 1/2] (see Example 2.61) is derived from knowledge concerning the distribution of smooth integers close to √n. With this choice, Algorithm 3.21 with sieving (Note 3.23) has an expected running time of Ln[1/2, 1], independent of the size of the factors of n.

3.25 Note (multiple polynomial variant) In order to collect a sufficient number of (ai, bi) pairs, the sieving interval must be quite large. From equation (3.1) it can be seen that |q(x)| increases linearly with |x|, and consequently the probability of smoothness decreases.

To overcome this problem, a variant (the multiple polynomial quadratic sieve) was proposed whereby many appropriately-chosen quadratic polynomials can be used instead of just q(x), each polynomial being sieved over an interval of much smaller length. This variant also has an expected running time of Ln[1/2, 1], and is the method of choice in practice.

3.26 Note (parallelizing the quadratic sieve) The multiple polynomial variant of the quadratic sieve is well suited for parallelization. Each node of a parallel computer, or each computer in a network of computers, simply sieves through different collections of polynomials. Any (ai, bi) pair found is reported to a central processor. Once sufficient pairs have been collected, the corresponding system of linear equations is solved on a single (possibly parallel) computer.

3.27 Note (quadratic sieve vs. elliptic curve factoring) The elliptic curve factoring algorithm (§3.2.4) has the same⁴ expected (asymptotic) running time as the quadratic sieve factoring algorithm in the special case when n is the product of two primes of equal size.

However, for such numbers, the quadratic sieve is superior in practice because the main steps in the algorithm are single-precision operations, compared to the much more computationally intensive multi-precision elliptic curve operations required in the elliptic curve algorithm.

⁴ This does not take into account the different o(1) terms in the two expressions Ln[1/2, 1].

3.2.7 Number field sieve factoring

For several years it was believed by some people that a running time of Ln[1/2, 1] was, in fact, the best achievable by any integer factorization algorithm. This barrier was broken in 1990 with the discovery of the number field sieve. Like the quadratic sieve, the number field sieve is an algorithm in the random square family of methods (§3.2.5). That is, it attempts to find integers x and y such that x² ≡ y² (mod n) and x ≢ ±y (mod n). To achieve this goal, two factor bases are used, one consisting of all prime numbers less than some bound, and the other consisting of all prime ideals of norm less than some bound in the ring of integers of a suitably-chosen algebraic number field.

The details of the algorithm are quite complicated, and are beyond the scope of this book. A special version of the algorithm (the special number field sieve) applies to integers of the form n = r^e − s for small r and |s|, and has an expected running time of Ln[1/3, c], where c = (32/9)^(1/3) ≈ 1.526. The general version of the algorithm, sometimes called the general number field sieve, applies to all integers and has an expected running time of Ln[1/3, c], where c = (64/9)^(1/3) ≈ 1.923. This is, asymptotically, the fastest algorithm known for integer factorization. The primary reason why the running time of the number field sieve is smaller than that of the quadratic sieve is that the candidate smooth numbers in the former are much smaller than those in the latter.

The general number field sieve was at first believed to be slower than the quadratic sieve for factoring integers having fewer than 150 decimal digits. However, experiments in 1994–1996 have indicated that the general number field sieve is substantially faster than the quadratic sieve even for numbers in the 115-digit range. This implies that the crossover point between the effectiveness of the quadratic sieve vs. the general number field sieve may be 110–120 digits. For this reason, the general number field sieve is considered the current champion of all general-purpose factoring algorithms.

3.3 The RSA problem

The intractability of the RSA problem forms the basis for the security of the RSA public-key encryption scheme (§8.2) and the RSA signature scheme (§11.3.1).

3.28 Definition The RSA problem (RSAP) is the following: given a positive integer n that is a product of two distinct odd primes p and q, a positive integer e such that gcd(e, (p − 1)(q − 1)) = 1, and an integer c, find an integer m such that m^e ≡ c (mod n).

In other words, the RSA problem is that of finding eth roots modulo a composite integer n. The conditions imposed on the problem parameters n and e ensure that for each integer c ∈ {0, 1, ..., n − 1} there is exactly one m ∈ {0, 1, ..., n − 1} such that m^e ≡ c (mod n). Equivalently, the function f : Zn → Zn defined as f(m) = m^e mod n is a permutation.

3.29 Remark (SQROOT vs. RSA problems) Since p − 1 is even, it follows that e is odd. In particular, e ≠ 2, and hence the SQROOT problem (Definition 3.43) is not a special case of the RSA problem.

As is shown in §8.2.2(i), if the factors of n are known then the RSA problem can be easily solved. This fact is stated next.

3.30 Fact RSAP ≤P FACTORING. That is, the RSA problem polytime reduces to the integer factorization problem.

It is widely believed that the RSA and the integer factorization problems are computationally equivalent, although no proof of this is known.
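To make the reduction behind Fact 3.30 concrete, the following Python sketch (ours; the toy parameters are for illustration only) solves an RSAP instance when p and q are known, by inverting e modulo (p − 1)(q − 1):

    def rsa_root(c, e, p, q):
        """Given the factors p, q of n = p*q, find m with m^e ≡ c (mod n)."""
        n, phi = p * q, (p - 1) * (q - 1)
        d = pow(e, -1, phi)          # e^(-1) mod phi(n); exists since gcd(e, phi) = 1
        return pow(c, d, n)          # m = c^d mod n, since ed ≡ 1 (mod phi(n))

    # toy parameters: p = 61, q = 53, e = 17
    n = 61 * 53
    c = pow(65, 17, n)               # an RSAP instance whose (unknown) answer is m = 65
    print(rsa_root(c, 17, 61, 53))   # 65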

3.4 The quadratic residuosity problem

The security of the Goldwasser-Micali probabilistic public-key encryption scheme (§8.7) and the Blum-Blum-Shub pseudorandom bit generator (§5.5.2) are both based on the apparent intractability of the quadratic residuosity problem.

Recall from §2.4.5 that if n ≥ 3 is an odd integer, then Jn is the set of all a ∈ Z∗n having Jacobi symbol 1. Recall also that Qn is the set of quadratic residues modulo n and that the set of pseudosquares modulo n is defined by Q̃n = Jn − Qn.

3.31 Definition The quadratic residuosity problem (QRP) is the following: given an odd composite integer n and a ∈ Jn, decide whether or not a is a quadratic residue modulo n.

3.32 Remark (QRP with a prime modulus) If n is a prime, then it is easy to decide whether a ∈ Z∗n is a quadratic residue modulo n since, by definition, a ∈ Qn if and only if (a/n) = 1, and the Legendre symbol (a/n) can be efficiently calculated by Algorithm 2.149.

Assume now that n is a product of two distinct odd primes p and q. It follows from Fact 2.137 that if a ∈ Jn, then a ∈ Qn if and only if (a/p) = 1. Thus, if the factorization of n is known, then QRP can be solved simply by computing the Legendre symbol (a/p). This observation can be generalized to all integers n and leads to the following fact.

3.33 Fact QRP ≤P FACTORING. That is, the QRP polytime reduces to the FACTORING problem.

On the other hand, if the factorization of n is unknown, then there is no efficient procedure known for solving QRP, other than by guessing the answer. If n = pq, then the probability of a correct guess is 1/2 since |Qn| = |Q̃n| (Fact 2.155). It is believed that the QRP is as difficult as the problem of factoring integers, although no proof of this is known.
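A small sketch of the observation behind Fact 3.33 (our own code): given the factorization n = pq, quadratic residuosity of a ∈ Jn is decided by one Legendre symbol, evaluated here via Euler's criterion (a/p) ≡ a^((p−1)/2) (mod p):

    def legendre(a, p):
        """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
        s = pow(a, (p - 1) // 2, p)
        return -1 if s == p - 1 else s          # s is 0, 1, or p-1

    def is_qr_given_factors(a, p, q):
        """Decide whether a ∈ Jn is a quadratic residue modulo n = p*q."""
        return legendre(a, p) == 1              # Fact 2.137: for a ∈ Jn, a ∈ Qn iff (a/p) = 1

    # n = 21 = 3 * 7: 4 is a square modulo 21, while 5 ∈ J21 is a pseudosquare
    print(is_qr_given_factors(4, 3, 7), is_qr_given_factors(5, 3, 7))   # True False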

3.5 Computing square roots in Zn

The operations of squaring modulo an integer n and extracting square roots modulo an integer n are frequently used in cryptographic functions. The operation of computing square roots modulo n can be performed efficiently when n is a prime, but is difficult when n is a composite integer whose prime factors are unknown.

3.5.1 Case (i): n prime

Recall from Remark 3.32 that if p is a prime, then it is easy to decide if a ∈ Z∗p is a quadratic residue modulo p. If a is, in fact, a quadratic residue modulo p, then the two square roots of a can be efficiently computed, as demonstrated by Algorithm 3.34.

3.34 Algorithm Finding square roots modulo a prime p

INPUT: an odd prime p and an integer a, 1 ≤ a ≤ p − 1.
OUTPUT: the two square roots of a modulo p, provided a is a quadratic residue modulo p.
1. Compute the Legendre symbol (a/p) using Algorithm 2.149. If (a/p) = −1 then return(a does not have a square root modulo p) and terminate.
2. Select integers b, 1 ≤ b ≤ p − 1, at random until one is found with (b/p) = −1. (b is a quadratic non-residue modulo p.)
3. By repeated division by 2, write p − 1 = 2^s · t, where t is odd.
4. Compute a^(−1) mod p by the extended Euclidean algorithm (Algorithm 2.142).
5. Set c←b^t mod p and r←a^((t+1)/2) mod p (Algorithm 2.143).
6. For i from 1 to s − 1 do the following:
   6.1 Compute d = (r² · a^(−1))^(2^(s−i−1)) mod p.
   6.2 If d ≡ −1 (mod p) then set r←r · c mod p.
   6.3 Set c←c² mod p.
7. Return(r, −r).

Algorithm 3.34 is a randomized algorithm because of the manner in which the quadratic non-residue b is selected in step 2. No deterministic polynomial-time algorithm for finding a quadratic non-residue modulo a prime p is known (see Remark 2.151).

3.35 Fact Algorithm 3.34 has an expected running time of O((lg p)⁴) bit operations.

This running time is obtained by observing that the dominant step (step 6) is executed s − 1 times, each iteration involving a modular exponentiation and thus taking O((lg p)³) bit operations (Table 2.5).

Since in the worst case s = O(lg p), the running time of O((lg p)⁴) follows. When s is small, the loop in step 6 is executed only a small number of times, and the running time of Algorithm 3.34 is O((lg p)³) bit operations. This point is demonstrated next for the special cases s = 1 and s = 2.

Specializing Algorithm 3.34 to the case s = 1 yields the following simple deterministic algorithm for finding square roots when p ≡ 3 (mod 4).

3.36 Algorithm Finding square roots modulo a prime p where p ≡ 3 (mod 4)

INPUT: an odd prime p where p ≡ 3 (mod 4), and a square a ∈ Qp.
OUTPUT: the two square roots of a modulo p.
1. Compute r = a^((p+1)/4) mod p (Algorithm 2.143).
2. Return(r, −r).
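In Python, Algorithm 3.36 is essentially one line (a sketch of ours; the caller must guarantee p ≡ 3 (mod 4) and a ∈ Qp):

    def sqrt_mod_p3(a, p):
        """Square roots of a ∈ Qp modulo a prime p ≡ 3 (mod 4) (Algorithm 3.36)."""
        assert p % 4 == 3
        r = pow(a, (p + 1) // 4, p)   # r^2 = a^((p+1)/2) = a * a^((p-1)/2) ≡ a (mod p)
        return r, p - r

    print(sqrt_mod_p3(4, 7))          # (2, 5): both square to 4 modulo 7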

Specializing Algorithm 3.34 to the case s = 2, and using the fact that 2 is a quadratic non-residue modulo p when p ≡ 5 (mod 8), yields the following simple deterministic algorithm for finding square roots when p ≡ 5 (mod 8).

3.37 Algorithm Finding square roots modulo a prime p where p ≡ 5 (mod 8)

INPUT: an odd prime p where p ≡ 5 (mod 8), and a square a ∈ Qp.
OUTPUT: the two square roots of a modulo p.
1. Compute d = a^((p−1)/4) mod p (Algorithm 2.143).
2. If d = 1 then compute r = a^((p+3)/8) mod p.
3. If d = p − 1 then compute r = 2a · (4a)^((p−5)/8) mod p.
4. Return(r, −r).

3.38 Fact Algorithms 3.36 and 3.37 have running times of O((lg p)³) bit operations.

Algorithm 3.39 for finding square roots modulo p is preferable to Algorithm 3.34 when p − 1 = 2^s · t with s large.

3.39 Algorithm Finding square roots modulo a prime p

INPUT: an odd prime p and a square a ∈ Qp.
OUTPUT: the two square roots of a modulo p.
1. Choose random b ∈ Zp until b² − 4a is a quadratic non-residue modulo p, i.e., ((b² − 4a)/p) = −1.
2. Let f be the polynomial x² − bx + a in Zp[x].
3. Compute r = x^((p+1)/2) mod f using Algorithm 2.227. (Note: r will be an integer.)
4. Return(r, −r).

3.40 Fact Algorithm 3.39 has an expected running time of O((lg p)³) bit operations.

3.41 Note (computing square roots in a finite field) Algorithms 3.34, 3.36, 3.37, and 3.39 can be extended in a straightforward manner to find square roots in any finite field Fq of odd order q = p^m, p prime, m ≥ 1. Square roots in finite fields of even order can also be computed efficiently via Fact 3.42.

3.42 Fact Each element a ∈ F2^m has exactly one square root, namely a^(2^(m−1)).

3.5.2 Case (ii): n composite

The discussion in this subsection is restricted to the case of computing square roots modulo n, where n is a product of two distinct odd primes p and q. However, all facts presented here generalize to the case where n is an arbitrary composite integer.

Unlike the case where n is a prime, the problem of deciding whether a given a ∈ Z∗n is a quadratic residue modulo a composite integer n is believed to be a difficult problem. Certainly, if the Jacobi symbol (a/n) = −1, then a is a quadratic non-residue.

On the other hand, if (a/n) = 1, then deciding whether or not a is a quadratic residue is precisely the quadratic residuosity problem, considered in §3.4.

3.43 Definition The square root modulo n problem (SQROOT) is the following: given a composite integer n and a quadratic residue a modulo n (i.e., a ∈ Qn), find a square root of a modulo n.

If the factors p and q of n are known, then the SQROOT problem can be solved efficiently by first finding square roots of a modulo p and modulo q, and then combining them using the Chinese remainder theorem (Fact 2.120) to obtain the square roots of a modulo n. The steps are summarized in Algorithm 3.44, which, in fact, finds all of the four square roots of a modulo n.

3.44 Algorithm Finding square roots modulo n given its prime factors p and q

INPUT: an integer n, its prime factors p and q, and a ∈ Qn.
OUTPUT: the four square roots of a modulo n.
1. Use Algorithm 3.39 (or Algorithm 3.36 or 3.37, if applicable) to find the two square roots r and −r of a modulo p.
2. Use Algorithm 3.39 (or Algorithm 3.36 or 3.37, if applicable) to find the two square roots s and −s of a modulo q.
3. Use the extended Euclidean algorithm (Algorithm 2.107) to find integers c and d such that cp + dq = 1.
4. Set x←(rdq + scp) mod n and y←(rdq − scp) mod n.
5. Return(±x mod n, ±y mod n).

3.45 Fact Algorithm 3.44 has an expected running time of O((lg p)³) bit operations.

Algorithm 3.44 shows that if one can factor n, then the SQROOT problem is easy. More precisely, SQROOT ≤P FACTORING. The converse of this statement is also true, as stated in Fact 3.46.

3.46 Fact FACTORING ≤P SQROOT. That is, the FACTORING problem polytime reduces to the SQROOT problem. Hence, since SQROOT ≤P FACTORING, the FACTORING and SQROOT problems are computationally equivalent.

Justification. Suppose that one has a polynomial-time algorithm A for solving the SQROOT problem. This algorithm can then be used to factor a given composite integer n as follows. Select an integer x at random with gcd(x, n) = 1, and compute a = x² mod n. Next, algorithm A is run with inputs a and n, and a square root y of a modulo n is returned. If y ≡ ±x (mod n), then the trial fails, and the above procedure is repeated with a new x chosen at random. Otherwise, if y ≢ ±x (mod n), then gcd(x − y, n) is guaranteed to be a non-trivial factor of n (Fact 3.18), namely, p or q. Since a has four square roots modulo n (±x and ±z with ±z ≢ ±x (mod n)), the probability of success for each attempt is 1/2. Hence, the expected number of attempts before a factor of n is obtained is two, and consequently the procedure runs in expected polynomial time.

3.47 Note (strengthening of Fact 3.46) The proof of Fact 3.46 can be easily modified to establish the following stronger result. Let c ≥ 1 be any constant.

If there is an algorithm A which, given n, can find a square root modulo n in polynomial time for a 1/(lg n)^c fraction of all quadratic residues a ∈ Qn, then the algorithm A can be used to factor n in expected polynomial time. The implication of this statement is that if the problem of factoring n is difficult, then for almost all a ∈ Qn it is difficult to find square roots modulo n.

The computational equivalence of the SQROOT and FACTORING problems was the basis of the first "provably secure" public-key encryption and signature schemes, presented in §8.3.

3.6 The discrete logarithm problem

The security of many cryptographic techniques depends on the intractability of the discrete logarithm problem. A partial list of these includes Diffie-Hellman key agreement and its derivatives (§12.6), ElGamal encryption (§8.4), and the ElGamal signature scheme and its variants (§11.5).

This section summarizes the current knowledge regarding algorithms for solving the discrete logarithm problem.

Unless otherwise specified, algorithms in this section are described in the general setting of a (multiplicatively written) finite cyclic group G of order n with generator α (see Definition 2.167). For a more concrete approach, the reader may find it convenient to think of G as the multiplicative group Z∗p of order p − 1, where the group operation is simply multiplication modulo p.

3.48 Definition Let G be a finite cyclic group of order n. Let α be a generator of G, and let β ∈ G. The discrete logarithm of β to the base α, denoted logα β, is the unique integer x, 0 ≤ x ≤ n − 1, such that β = α^x.

3.49 Example Let p = 97. Then Z∗97 is a cyclic group of order n = 96. A generator of Z∗97 is α = 5. Since 5³² ≡ 35 (mod 97), log5 35 = 32 in Z∗97.

The following are some elementary facts about logarithms.

3.50 Fact Let α be a generator of a cyclic group G of order n, and let β, γ ∈ G. Let s be an integer. Then logα(βγ) = (logα β + logα γ) mod n and logα(β^s) = s · logα β mod n.

The groups of most interest in cryptography are the multiplicative group F∗q of the finite field Fq (§2.6), including the particular cases of the multiplicative group Z∗p of the integers modulo a prime p, and the multiplicative group F∗2^m of the finite field F2^m of characteristic two. Also of interest are the group of units Z∗n where n is a composite integer, the group of points on an elliptic curve defined over a finite field, and the jacobian of a hyperelliptic curve defined over a finite field.

3.51 Definition The discrete logarithm problem (DLP) is the following: given a prime p, a generator α of Z∗p, and an element β ∈ Z∗p, find the integer x, 0 ≤ x ≤ p − 2, such that α^x ≡ β (mod p).

3.52 Definition The generalized discrete logarithm problem (GDLP) is the following: given a finite cyclic group G of order n, a generator α of G, and an element β ∈ G, find the integer x, 0 ≤ x ≤ n − 1, such that α^x = β.

The discrete logarithm problem in elliptic curve groups and in the jacobians of hyperelliptic curves is not explicitly considered in this section. The discrete logarithm problem in Z∗n is discussed further in §3.8.

3.53 Note (difficulty of the GDLP is independent of generator) Let α and γ be two generators of a cyclic group G of order n, and let β ∈ G. Let x = logα β, y = logγ β, and z = logα γ. Then α^x = β = γ^y = (α^z)^y. Consequently x = zy mod n, and logγ β = (logα β) · (logα γ)^(−1) mod n. This means that any algorithm which computes logarithms to the base α can be used to compute logarithms to any other base γ that is also a generator of G.

3.54 Note (generalization of GDLP) A more general formulation of the GDLP is the following: given a finite group G and elements α, β ∈ G, find an integer x such that α^x = β, provided that such an integer exists. In this formulation, it is not required that G be a cyclic group, and, even if it is, it is not required that α be a generator of G. This problem may be harder to solve, in general, than GDLP. However, in the case where G is a cyclic group (for example if G is the multiplicative group of a finite field) and the order of α is known, it can be easily recognized whether an integer x satisfying α^x = β exists. This is because of the following fact: if G is a cyclic group, α is an element of order n in G, and β ∈ G, then there exists an integer x such that α^x = β if and only if β^n = 1.

3.55 Note (solving the DLP in a cyclic group G of order n is in essence computing an isomorphism between G and Zn) Even though any two cyclic groups of the same order are isomorphic (that is, they have the same structure although the elements may be written in different representations), an efficient algorithm for computing logarithms in one group does not necessarily imply an efficient algorithm for the other group. To see this, consider that every cyclic group of order n is isomorphic to the additive cyclic group Zn, i.e., the set of integers {0, 1, 2, ..., n − 1} where the group operation is addition modulo n. Moreover, the discrete logarithm problem in the latter group, namely, the problem of finding an integer x such that ax ≡ b (mod n) given a, b ∈ Zn, is easy, as shown in the following. First note that there does not exist a solution x if d = gcd(a, n) does not divide b (Fact 2.119). Otherwise, if d divides b, the extended Euclidean algorithm (Algorithm 2.107) can be used to find integers s and t such that as + nt = d. Multiplying both sides of this equation by the integer b/d gives a(sb/d) + n(tb/d) = b. Reducing this equation modulo n yields a(sb/d) ≡ b (mod n), and hence x = (sb/d) mod n is the desired (and easily obtainable) solution.
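The computation just described takes only a few lines of Python (our sketch, with a small recursive extended-Euclidean helper):

    def egcd(a, b):
        """Extended Euclid: return (d, s, t) with a*s + b*t = d = gcd(a, b)."""
        if b == 0:
            return a, 1, 0
        d, s, t = egcd(b, a % b)
        return d, t, s - (a // b) * t

    def additive_dlog(a, b, n):
        """Solve a*x ≡ b (mod n) for x, or return None if no solution exists."""
        d, s, _ = egcd(a % n, n)          # a*s + n*t = d
        if b % d != 0:
            return None                   # no solution (Fact 2.119)
        return (s * (b // d)) % n

    print(additive_dlog(6, 4, 10))        # 4, since 6*4 = 24 ≡ 4 (mod 10)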

The known algorithms for the DLP can be categorized as follows:
1. algorithms which work in arbitrary groups, e.g., exhaustive search (§3.6.1), the baby-step giant-step algorithm (§3.6.2), Pollard's rho algorithm (§3.6.3);
2. algorithms which work in arbitrary groups but are especially efficient if the order of the group has only small prime factors, e.g., the Pohlig-Hellman algorithm (§3.6.4); and
3. the index-calculus algorithms (§3.6.5), which are efficient only in certain groups.

3.6.1 Exhaustive search

The most obvious algorithm for GDLP (Definition 3.52) is to successively compute α⁰, α¹, α², ... until β is obtained. This method takes O(n) multiplications, where n is the order of α, and is therefore inefficient if n is large (i.e., in cases of cryptographic interest).

3.6.2 Baby-step giant-step algorithm

Let m = ⌈√n⌉, where n is the order of α. The baby-step giant-step algorithm is a time-memory trade-off of the method of exhaustive search and is based on the following observation.

If β = α^x, then one can write x = im + j, where 0 ≤ i, j < m. Hence, α^x = α^(im) · α^j, which implies β(α^(−m))^i = α^j. This suggests the following algorithm for computing x.

3.56 Algorithm Baby-step giant-step algorithm for computing discrete logarithms

INPUT: a generator α of a cyclic group G of order n, and an element β ∈ G.
OUTPUT: the discrete logarithm x = logα β.
1. Set m←⌈√n⌉.
2. Construct a table with entries (j, α^j) for 0 ≤ j < m. Sort this table by second component. (Alternatively, use conventional hashing on the second component to store the entries in a hash table; placing an entry, and searching for an entry in the table, takes constant time.)
3. Compute α^(−m) and set γ←β.
4. For i from 0 to m − 1 do the following:
   4.1 Check if γ is the second component of some entry in the table.
   4.2 If γ = α^j then return(x = im + j).
   4.3 Set γ←γ · α^(−m).

Algorithm 3.56 requires storage for O(√n) group elements. The table takes O(√n) multiplications to construct, and O(√n lg n) comparisons to sort. Having constructed this table, step 4 takes O(√n) multiplications and O(√n) table look-ups. Under the assumption that a group multiplication takes more time than lg n comparisons, the running time of Algorithm 3.56 can be stated more concisely as follows.

3.57 Fact The running time of the baby-step giant-step algorithm (Algorithm 3.56) is O(√n) group multiplications.

3.58 Example (baby-step giant-step algorithm for logarithms in Z∗113) Let p = 113. The element α = 3 is a generator of Z∗113 of order n = 112. Consider β = 57. Then log3 57 is computed as follows.
1. Set m←⌈√112⌉ = 11.
2. Construct a table whose entries are (j, α^j mod p) for 0 ≤ j < 11:

    j             0   1   2    3    4    5    6    7   8    9   10
    3^j mod 113   1   3   9   27   81   17   51   40   7   21   63

   and sort the table by second component:

    j             0   1   8   2    5    9    3    7    6   10    4
    3^j mod 113   1   3   7   9   17   21   27   40   51   63   81

3. Using Algorithm 2.142, compute α^(−1) = 3^(−1) mod 113 = 38, and then compute α^(−m) = 38^11 mod 113 = 58.
4. Next, γ = β · (α^(−m))^i mod 113 for i = 0, 1, 2, ... is computed until a value in the second row of the table is obtained. This yields:

    i                       0    1     2    3     4    5    6    7   8   9
    γ = 57 · 58^i mod 113   57   29   100   37   112   55   26   39   2   3

Finally, since β · α^(−9m) = 3 = α¹, β = α^100 and, therefore, log3 57 = 100.

3.59 Note (restricted exponents) In order to improve performance, some cryptographic protocols which use exponentiation in Z∗p select exponents of a special form, e.g., having small Hamming weight. (The Hamming weight of an integer is the number of ones in its binary representation.) Suppose that p is a k-bit prime, and only exponents of Hamming weight t are used. The number of such exponents is (k choose t). Algorithm 3.56 can be modified to search the exponent space in roughly (k choose t/2) steps. The algorithm also applies to exponents that are restricted in certain other ways, and extends to all finite groups.
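A Python sketch of Algorithm 3.56 for the group Z∗p follows (our code; a dictionary plays the role of the hash table mentioned in step 2):

    from math import isqrt

    def bsgs(alpha, beta, p, n):
        """Baby-step giant-step in Z*_p: return x with alpha^x ≡ beta (mod p),
        where n is the order of alpha, or None if no such x exists."""
        m = isqrt(n - 1) + 1                   # m = ceil(sqrt(n))
        table = {pow(alpha, j, p): j for j in range(m)}   # baby steps (j, alpha^j)
        alpha_minus_m = pow(alpha, -m, p)      # alpha^(-m) mod p
        gamma = beta % p
        for i in range(m):                     # giant steps
            if gamma in table:
                return i * m + table[gamma]
            gamma = gamma * alpha_minus_m % p
        return None

    print(bsgs(3, 57, 113, 112))               # 100, as in Example 3.58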

3.6.3 Pollard's rho algorithm for logarithms

Pollard's rho algorithm (Algorithm 3.60) for computing discrete logarithms is a randomized algorithm with the same expected running time as the baby-step giant-step algorithm (Algorithm 3.56), but which requires a negligible amount of storage. For this reason, it is far preferable to Algorithm 3.56 for problems of practical interest.

For simplicity, it is assumed in this subsection that G is a cyclic group whose order n is prime. The group G is partitioned into three sets S1, S2, and S3 of roughly equal size based on some easily testable property. Some care must be exercised in selecting the partition; for example, 1 ∉ S2. Define a sequence of group elements x0, x1, x2, ... by x0 = 1 and

    xi+1 = f(xi) =  β · xi,   if xi ∈ S1,
                    xi²,      if xi ∈ S2,        (3.2)
                    α · xi,   if xi ∈ S3,

for i ≥ 0. This sequence of group elements in turn defines two sequences of integers a0, a1, a2, ... and b0, b1, b2, ... satisfying xi = α^ai · β^bi for i ≥ 0: a0 = 0, b0 = 0, and for i ≥ 0,

    ai+1 =  ai,             if xi ∈ S1,
            2ai mod n,      if xi ∈ S2,          (3.3)
            ai + 1 mod n,   if xi ∈ S3,

and

    bi+1 =  bi + 1 mod n,   if xi ∈ S1,
            2bi mod n,      if xi ∈ S2,          (3.4)
            bi,             if xi ∈ S3.

Floyd's cycle-finding algorithm (Note 3.8) can then be utilized to find two group elements xi and x2i such that xi = x2i. Hence α^ai · β^bi = α^a2i · β^b2i, and so β^(bi − b2i) = α^(a2i − ai). Taking logarithms to the base α of both sides of this last equation yields

    (bi − b2i) · logα β ≡ (a2i − ai) (mod n).

Provided bi ≢ b2i (mod n) (note: bi ≡ b2i occurs with negligible probability), this equation can then be efficiently solved to determine logα β.

3.60 Algorithm Pollard's rho algorithm for computing discrete logarithms

INPUT: a generator α of a cyclic group G of prime order n, and an element β ∈ G.
OUTPUT: the discrete logarithm x = logα β.
1. Set x0←1, a0←0, b0←0.
2. For i = 1, 2, ... do the following:
   2.1 Using the quantities xi−1, ai−1, bi−1, and x2i−2, a2i−2, b2i−2 computed previously, compute xi, ai, bi and x2i, a2i, b2i using equations (3.2), (3.3), and (3.4).
   2.2 If xi = x2i, then do the following:
       Set r←bi − b2i mod n. If r = 0 then terminate the algorithm with failure; otherwise, compute x = r^(−1)(a2i − ai) mod n and return(x).

In the rare case that Algorithm 3.60 terminates with failure, the procedure can be repeated by selecting random integers a0, b0 in the interval [1, n − 1], and starting with x0 = α^a0 · β^b0. Example 3.61 with artificially small parameters illustrates Pollard's rho algorithm.

3.61 Example (Pollard's rho algorithm for logarithms in a subgroup of Z∗383) The element α = 2 is a generator of the subgroup G of Z∗383 of order n = 191. Suppose β = 228. Partition the elements of G into three subsets according to the rule x ∈ S1 if x ≡ 1 (mod 3), x ∈ S2 if x ≡ 0 (mod 3), and x ∈ S3 if x ≡ 2 (mod 3). Table 3.2 shows the values of xi, ai, bi, x2i, a2i, and b2i at the end of each iteration of step 2 of Algorithm 3.60. Note that x14 = x28 = 144. Finally, compute r = b14 − b28 mod 191 = 125, r^(−1) = 125^(−1) mod 191 = 136, and r^(−1)(a28 − a14) mod 191 = 110. Hence, log2 228 = 110.

     i    xi   ai   bi   x2i  a2i  b2i
     1   228    0    1   279    0    2
     2   279    0    2   184    1    4
     3    92    0    4    14    1    6
     4   184    1    4   256    2    7
     5   205    1    5   304    3    8
     6    14    1    6   121    6   18
     7    28    2    6   144   12   38
     8   256    2    7   235   48  152
     9   152    2    8    72   48  154
    10   304    3    8    14   96  118
    11   372    3    9   256   97  119
    12   121    6   18   304   98  120
    13    12    6   19   121    5   51
    14   144   12   38   144   10  104

Table 3.2: Intermediate steps of Pollard's rho algorithm in Example 3.61.
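The following Python sketch implements Algorithm 3.60 with the partition rule of Example 3.61 (our code, specialized to subgroups of Z∗p; the two tracked triples correspond to (xi, ai, bi) and (x2i, a2i, b2i)):

    def pollard_rho_log(alpha, beta, p, n):
        """Pollard's rho for logarithms in the order-n subgroup of Z*_p generated
        by alpha (n prime): return x with alpha^x ≡ beta (mod p), or None."""
        def step(x, a, b):
            # partition by residue modulo 3, as in Example 3.61
            if x % 3 == 1:
                return beta * x % p, a, (b + 1) % n          # x ∈ S1
            if x % 3 == 0:
                return x * x % p, 2 * a % n, 2 * b % n       # x ∈ S2
            return alpha * x % p, (a + 1) % n, b             # x ∈ S3

        x, a, b = 1, 0, 0           # the "tortoise" (x_i, a_i, b_i)
        X, A, B = 1, 0, 0           # the "hare" (x_2i, a_2i, b_2i)
        while True:
            x, a, b = step(x, a, b)
            X, A, B = step(*step(X, A, B))
            if x == X:
                r = (b - B) % n
                if r == 0:
                    return None     # failure: retry with random a0, b0
                return pow(r, -1, n) * (A - a) % n

    print(pollard_rho_log(2, 228, 383, 191))   # 110, as in Example 3.61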

3.62 Fact Let G be a group of order n, a prime. Assume that the function f : G → G defined by equation (3.2) behaves like a random function. Then the expected running time of Pollard's rho algorithm for discrete logarithms in G is O(√n) group operations. Moreover, the algorithm requires negligible storage.

3.6.4 Pohlig-Hellman algorithm

Algorithm 3.63 for computing logarithms takes advantage of the factorization of the order n of the group G. Let n = p1^e1 · p2^e2 ··· pr^er be the prime factorization of n. If x = logα β, then the approach is to determine xi = x mod pi^ei for 1 ≤ i ≤ r, and then use Gauss's algorithm (Algorithm 2.121) to recover x mod n. Each integer xi is determined by computing the digits l0, l1, ..., l_{ei−1} in turn of its pi-ary representation: xi = l0 + l1·pi + ··· + l_{ei−1}·pi^(ei−1), where 0 ≤ lj ≤ pi − 1.

To see that the output of Algorithm 3.63 is correct, observe first that in step 2.3 the order of ᾱ is q. Next, at iteration j of step 2.4, γ = α^(l0 + l1·q + ··· + l_{j−1}·q^(j−1)). Hence,

    β̄ = (β/γ)^(n/q^(j+1)) = (α^(x − l0 − l1·q − ··· − l_{j−1}·q^(j−1)))^(n/q^(j+1))
       = (α^(n/q^(j+1)))^(xi − l0 − l1·q − ··· − l_{j−1}·q^(j−1))
       = (α^(n/q^(j+1)))^(lj·q^j + ··· + l_{e−1}·q^(e−1))
       = (α^(n/q))^(lj + ··· + l_{e−1}·q^(e−1−j))
       = (ᾱ)^lj,

the last equality being true because ᾱ has order q. Hence, log_ᾱ β̄ is indeed equal to lj.

3.63 Algorithm Pohlig-Hellman algorithm for computing discrete logarithms

INPUT: a generator α of a cyclic group G of order n, and an element β ∈ G.
OUTPUT: the discrete logarithm x = logα β.
1. Find the prime factorization of n: n = p1^e1 · p2^e2 ··· pr^er, where ei ≥ 1.
2. For i from 1 to r do the following:
   (Compute xi = l0 + l1·pi + ··· + l_{ei−1}·pi^(ei−1), where xi = x mod pi^ei.)
   2.1 (Simplify the notation) Set q←pi and e←ei.
   2.2 Set γ←1 and l−1←0.
   2.3 Compute ᾱ←α^(n/q).
   2.4 (Compute the lj) For j from 0 to e − 1 do the following:
       Compute γ←γ·α^(l_{j−1}·q^(j−1)) and β̄←(βγ^(−1))^(n/q^(j+1)).
       Compute lj←log_ᾱ β̄ (e.g., using Algorithm 3.56; see Note 3.67(iii)).
   2.5 Set xi←l0 + l1·q + ··· + l_{e−1}·q^(e−1).
3. Use Gauss's algorithm (Algorithm 2.121) to compute the integer x, 0 ≤ x ≤ n − 1, such that x ≡ xi (mod pi^ei) for 1 ≤ i ≤ r.
4. Return(x).

Example 3.64 illustrates Algorithm 3.63 with artificially small parameters.

3.64 Example (Pohlig-Hellman algorithm for logarithms in Z∗251) Let p = 251. The element α = 71 is a generator of Z∗251 of order n = 250. Consider β = 210. Then x = log71 210 is computed as follows.
1. The prime factorization of n is 250 = 2 · 5³.
2. (a) (Compute x1 = x mod 2) Compute ᾱ = α^(n/2) mod p = 250 and β̄ = β^(n/2) mod p = 250. Then x1 = log250 250 = 1.
   (b) (Compute x2 = x mod 5³ = l0 + l1·5 + l2·5²)
       i. Compute ᾱ = α^(n/5) mod p = 20.
       ii. Compute γ = 1 and β̄ = (βγ^(−1))^(n/5) mod p = 149. Using exhaustive search,⁵ compute l0 = log20 149 = 2.
       iii. Compute γ = γ·α² mod p = 21 and β̄ = (βγ^(−1))^(n/25) mod p = 113. Using exhaustive search, compute l1 = log20 113 = 4.
       iv. Compute γ = γ·α^(4·5) mod p = 115 and β̄ = (βγ^(−1))^((p−1)/125) mod p = 149. Using exhaustive search, compute l2 = log20 149 = 2.
   Hence, x2 = 2 + 4 · 5 + 2 · 5² = 72.
3. Finally, solve the pair of congruences x ≡ 1 (mod 2), x ≡ 72 (mod 125) to get x = log71 210 = 197.

⁵ Exhaustive search is preferable to Algorithm 3.56 when the group is very small (here the order of ᾱ is 5).

3.65 Fact Given the factorization of n, the running time of the Pohlig-Hellman algorithm (Algorithm 3.63) is O(∑_{i=1}^{r} ei(lg n + √pi)) group multiplications.
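A compact Python sketch of Algorithm 3.63 for subgroups of Z∗p (ours; the inner logarithms of step 2.4 are found by exhaustive search, which is adequate when each prime q is small, cf. footnote 5, and a short CRT loop stands in for Gauss's algorithm):

    def pohlig_hellman(alpha, beta, p, n, factors):
        """Algorithm 3.63 in Z*_p, where factors = [(q, e), ...] with n = Π q^e."""
        residues, moduli = [], []
        for q, e in factors:
            a_bar = pow(alpha, n // q, p)        # element of order q (step 2.3)
            x_i = 0                              # x mod q^e, built digit by digit
            for j in range(e):
                gamma = pow(alpha, x_i, p)       # alpha^(l0 + l1*q + ... + l_{j-1}*q^(j-1))
                b_bar = pow(beta * pow(gamma, -1, p), n // q ** (j + 1), p)
                l_j = next(l for l in range(q) if pow(a_bar, l, p) == b_bar)
                x_i += l_j * q ** j
            residues.append(x_i)
            moduli.append(q ** e)
        x = 0                                    # combine x ≡ x_i (mod q_i^e_i) by CRT
        for r, m in zip(residues, moduli):
            N = n // m
            x = (x + r * N * pow(N, -1, m)) % n
        return x

    # Example 3.64: log_71(210) in Z*_251 with n = 250 = 2 * 5^3
    print(pohlig_hellman(71, 210, 251, 250, [(2, 1), (5, 3)]))   # 197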

3.66 Note (effectiveness of Pohlig-Hellman) Fact 3.65 implies that the Pohlig-Hellman algorithm is efficient only if each prime divisor pi of n is relatively small; that is, if n is a smooth integer (Definition 3.13). An example of a group in which the Pohlig-Hellman algorithm is effective follows. Consider the multiplicative group Z∗p where p is the 107-digit prime:

    p = 227088231986781039743145181950291021585250524967592855
        96453269189798311427475159776411276642277139650833937.

The order of Z∗p is n = p − 1 = 2⁴ · 104729⁸ · 224737⁸ · 350377⁴. Since the largest prime divisor of p − 1 is only 350377, it is relatively easy to compute logarithms in this group using the Pohlig-Hellman algorithm.

3.67 Note (miscellaneous)
(i) If n is a prime, then Algorithm 3.63 (Pohlig-Hellman) is the same as baby-step giant-step (Algorithm 3.56).
(ii) In step 1 of Algorithm 3.63, a factoring algorithm which finds small factors first (e.g., Algorithm 3.9) should be employed; if the order n is not a smooth integer, then Algorithm 3.63 is inefficient anyway.
(iii) The storage required for Algorithm 3.56 in step 2.4 can be eliminated by using instead Pollard's rho algorithm (Algorithm 3.60).

Pollard’s rho algorithm (Algorithm 3.60) 3.65 Index-calculus algorithm The index-calculus algorithm is the most powerful method known for computing discrete logarithms. The technique employed does not apply to all groups, but when it does, it often gives a subexponential-time algorithm The algorithm is first described in the general setting of a cyclic group G (Algorithm 3.68) Two examples are then presented to illustrate how the index-calculus algorithm works in two kinds of groups that are used in practical applications, namely Z∗p (Example 3.69) and F∗2m (Example 370) The index-calculus algorithm requires the selection of a relatively small subset S of elements of G, called the factor base, in such a way that a significant fraction of elements of G can be efficiently expressed as products of elements from S. Algorithm 368 proceeds to precompute a database containing the logarithms of all the elements in S, and then reuses this database each time the logarithm of a particular

group element is required. The description of Algorithm 3.68 is incomplete for two reasons Firstly, a technique for selecting the factor base S is not specified. Secondly, a method for efficiently generating relations of the form (3.5) and (37) is not specified The factor base S must be a subset of G that is small (so that the system of equations to be solved in step 3 is not too large), but not too small (so that the expected number of trials to generate a relation (3.5) or (37) is not too large). Suitable factor bases and techniques for generating relations are known for some cyclic groups including Z∗p (see §3.65(i)) and F∗2m (see §365(ii)), and, moreover, the multiplicative group F∗q of a general finite field Fq . 3.68 Algorithm Index-calculus algorithm for discrete logarithms in cyclic groups INPUT: a generator α of a cyclic group G of order n, and an element β ∈ G. OUTPUT: the discrete logarithm y = logα β. 1. (Select a factor base S) Choose a subset S = {p1 , p2 ,

, pt } of G such that a “significant proportion” of all elements in G can be efficiently expressed as a product of elements from S. 2. (Collect linear relations involving logarithms of elements in S) Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 110 Ch. 3 Number-Theoretic Reference Problems 2.1 Select a random integer k, 0 ≤ k ≤ n − 1, and compute αk 2.2 Try to write αk as a product of elements in S: α = k t Y pci i , ci ≥ 0. (3.5) i=1 If successful, take logarithms of both sides of equation (3.5) to obtain a linear relation k≡ t X ci logα pi (mod n). (3.6) i=1 2.3 Repeat steps 21 and 22 until t + c relations of the form (36) are obtained (c is a small positive integer, e.g c = 10, such that the system of equations given by the t + c relations has a unique solution with high probability). 3. (Find the logarithms of elements in S) Working modulo n, solve the linear system of t + c equations (in t unknowns) of the form

(3.6) collected in step 2 to obtain the values of logα pi , 1 ≤ i ≤ t. 4. (Compute y) 4.1 Select a random integer k, 0 ≤ k ≤ n − 1, and compute β · αk 4.2 Try to write β · αk as a product of elements in S: β · αk = t Y pdi i , di ≥ 0. (3.7) i=1 If the attempt is unsuccessful then repeat step 4.1P Otherwise, taking logarithms of both sides of equation (3.7) yields logα β = ( ti=1 di logα pi − k) mod n; Pt thus, compute y = ( i=1 di logα pi − k) mod n and return(y). (i) Index-calculus algorithm in Z∗p For the field Zp , p a prime, the factor base S can be chosen as the first t prime numbers. A relation (3.5) is generated by computing αk mod p and then using trial division to check whether this integer is a product of primes in S. Example 369 illustrates Algorithm 368 in Z∗p on a problem with artificially small parameters. 3.69 Example (Algorithm 368 for logarithms in Z∗229 ) Let p = 229 The element α = 6 is a generator of Z∗229 of order n =
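The trial-division test of step 2.2 is straightforward to code. A minimal Python sketch (the function name is ours), checked against the first relation of Example 3.69 below:

def relation(p, alpha, k, S):
    """Step 2.2 of Algorithm 3.68 in Z_p^*: try to write alpha^k mod p as a
    product over the factor base S by trial division.  Returns the exponent
    vector (c_1, ..., c_t) of relation (3.5), or None if alpha^k mod p is
    not S-smooth."""
    v = pow(alpha, k, p)
    c = []
    for q in S:
        e = 0
        while v % q == 0:
            v //= q
            e += 1
        c.append(e)
    return c if v == 1 else None

# first relation of Example 3.69: 6^100 mod 229 = 180 = 2^2 * 3^2 * 5
assert relation(229, 6, 100, [2, 3, 5, 7, 11]) == [2, 2, 1, 0, 0]
# an unsuccessful attempt: 6^5 mod 229 = 219 = 3 * 73 is not S-smooth
assert relation(229, 6, 5, [2, 3, 5, 7, 11]) is None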

3.69 Example (Algorithm 3.68 for logarithms in Z_229^*) Let p = 229. The element α = 6 is a generator of Z_229^* of order n = 228. Consider β = 13. Then log_6 13 is computed as follows, using the index-calculus technique.
1. The factor base is chosen to be the first 5 primes: S = {2, 3, 5, 7, 11}.
2. The following six relations involving elements of the factor base are obtained (unsuccessful attempts are not shown):
6^100 mod 229 = 180 = 2^2 · 3^2 · 5
6^18 mod 229 = 176 = 2^4 · 11
6^12 mod 229 = 165 = 3 · 5 · 11
6^62 mod 229 = 154 = 2 · 7 · 11
6^143 mod 229 = 198 = 2 · 3^2 · 11
6^206 mod 229 = 210 = 2 · 3 · 5 · 7.
These relations yield the following six equations involving the logarithms of elements in the factor base:
100 ≡ 2 log_6 2 + 2 log_6 3 + log_6 5 (mod 228)
18 ≡ 4 log_6 2 + log_6 11 (mod 228)
12 ≡ log_6 3 + log_6 5 + log_6 11 (mod 228)
62 ≡ log_6 2 + log_6 7 + log_6 11 (mod 228)
143 ≡ log_6 2 + 2 log_6 3 + log_6 11 (mod 228)
206 ≡ log_6 2 + log_6 3 + log_6 5 + log_6 7 (mod 228).
3. Solving the linear system of six equations in five unknowns (the logarithms x_i = log_6 p_i) yields the solutions log_6 2 = 21, log_6 3 = 208, log_6 5 = 98, log_6 7 = 107, and log_6 11 = 162.
4. Suppose that the integer k = 77 is selected. Since β · α^k = 13 · 6^77 mod 229 = 147 = 3 · 7^2, it follows that log_6 13 = (log_6 3 + 2 log_6 7 − 77) mod 228 = 117. □
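The database of step 3 and the final computation of step 4 can be verified directly; a short Python check of the numbers in Example 3.69:

p, n, alpha, beta = 229, 228, 6, 13
logs = {2: 21, 3: 208, 5: 98, 7: 107, 11: 162}   # database from step 3
assert all(pow(alpha, x, p) == b for b, x in logs.items())
# step 4 with k = 77: beta * alpha^77 mod p = 147 = 3 * 7^2
assert beta * pow(alpha, 77, p) % p == 147
y = (logs[3] + 2 * logs[7] - 77) % n
assert y == 117 and pow(alpha, y, p) == beta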

(ii) Index-calculus algorithm in F_{2^m}^*
The elements of the finite field F_{2^m} are represented as polynomials in Z_2[x] of degree at most m−1, where multiplication is performed modulo a fixed irreducible polynomial f(x) of degree m in Z_2[x] (see §2.6). The factor base S can be chosen as the set of all irreducible polynomials in Z_2[x] of degree at most some prescribed bound b. A relation (3.5) is generated by computing α^k mod f(x) and then using trial division to check whether this polynomial is a product of polynomials in S. Example 3.70 illustrates Algorithm 3.68 in F_{2^m}^* on a problem with artificially small parameters.

3.70 Example (Algorithm 3.68 for logarithms in F_{2^7}^*) The polynomial f(x) = x^7 + x + 1 is irreducible over Z_2. Hence, the elements of the finite field F_{2^7} of order 128 can be represented as the set of all polynomials in Z_2[x] of degree at most 6, where multiplication is performed modulo f(x). The order of F_{2^7}^* is n = 2^7 − 1 = 127, and α = x is a generator of F_{2^7}^*. Suppose β = x^4 + x^3 + x^2 + x + 1. Then y = log_x β can be computed as follows, using the index-calculus technique.
1. The factor base is chosen to be the set of all irreducible polynomials in Z_2[x] of degree at most 3: S = {x, x+1, x^2+x+1, x^3+x+1, x^3+x^2+1}.
2. The following five relations involving elements of the factor base are obtained (unsuccessful attempts are not shown):
x^18 mod f(x) = x^6 + x^4 = x^4(x+1)^2
x^105 mod f(x) = x^6 + x^5 + x^4 + x = x(x+1)^2(x^3+x^2+1)
x^72 mod f(x) = x^6 + x^5 + x^3 + x^2 = x^2(x+1)^2(x^2+x+1)
x^45 mod f(x) = x^5 + x^2 + x + 1 = (x+1)^2(x^3+x+1)
x^121 mod f(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1 = (x^3+x+1)(x^3+x^2+1).
These relations yield the following five equations involving the logarithms of elements in the factor base (for convenience of notation, let p_1 = log_x x, p_2 = log_x(x+1), p_3 = log_x(x^2+x+1), p_4 = log_x(x^3+x+1), and p_5 = log_x(x^3+x^2+1)):
18 ≡ 4p_1 + 2p_2 (mod 127)
105 ≡ p_1 + 2p_2 + p_5 (mod 127)
72 ≡ 2p_1 + 2p_2 + p_3 (mod 127)
45 ≡ 2p_2 + p_4 (mod 127)
121 ≡ p_4 + p_5 (mod 127).
3. Solving the linear system of five equations in five unknowns yields the values p_1 = 1, p_2 = 7, p_3 = 56, p_4 = 31, and p_5 = 90.
4. Suppose k = 66 is selected. Since βα^k = (x^4+x^3+x^2+x+1)x^66 mod f(x) = x^5 + x^3 + x = x(x^2+x+1)^2, it follows that log_x(x^4+x^3+x^2+x+1) = (p_1 + 2p_3 − 66) mod 127 = 47. □
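The arithmetic of Example 3.70 can be checked with integer bit operations, representing an element of F_{2^7} as an int whose bit i is the coefficient of x^i. A small sketch (names are ours, not the book's):

def gf2m_mul(a, b, f, m):
    """Multiply two elements of F_{2^m} and reduce modulo the irreducible
    polynomial f of degree m (all encoded as bit-vector ints)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return r

def gf2m_pow(a, e, f, m):
    """Square-and-multiply exponentiation in F_{2^m}."""
    r = 1
    while e:
        if e & 1:
            r = gf2m_mul(r, a, f, m)
        a = gf2m_mul(a, a, f, m)
        e >>= 1
    return r

f, m, x = 0b10000011, 7, 0b10            # f(x) = x^7 + x + 1, alpha = x
assert gf2m_pow(x, 18, f, m) == 0b1010000    # x^18 mod f(x) = x^6 + x^4
beta = 0b11111                           # x^4 + x^3 + x^2 + x + 1
assert gf2m_pow(x, 47, f, m) == beta     # confirms log_x(beta) = 47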

3.71 Note (running time of Algorithm 3.68) To optimize the running time of the index-calculus algorithm, the size t of the factor base should be judiciously chosen. The optimal selection relies on knowledge concerning the distribution of smooth integers in the interval [1, p−1] for the case of Z_p^*, and for the case of F_{2^m}^* on the distribution of smooth polynomials (that is, polynomials all of whose irreducible factors have relatively small degrees) among polynomials in F_2[x] of degree less than m. With an optimal choice of t, the index-calculus algorithm as described above for Z_p^* and F_{2^m}^* has an expected running time of L_q[1/2, c] where q = p or q = 2^m, and c > 0 is a constant.

3.72 Note (fastest algorithms known for discrete logarithms in Z_p^* and F_{2^m}^*) Currently, the best algorithm known for computing logarithms in F_{2^m}^* is a variation of the index-calculus algorithm called Coppersmith's algorithm, with an expected running time of L_{2^m}[1/3, c] for some constant c < 1.587. The best algorithm known for computing logarithms in Z_p^* is a variation of the index-calculus algorithm called the number field sieve, with an expected running time of L_p[1/3, 1.923]. The latest efforts in these directions are surveyed in the Notes section (§3.12).

3.73 Note (parallelization of the index-calculus algorithm)
(i) For the optimal choice of parameters, the most time-consuming phase of the index-calculus algorithm is usually the generation of relations involving factor base logarithms (step 2 of Algorithm 3.68). The work for this stage can be easily distributed among a network of processors by simply having the processors search for relations independently of each other. The relations generated are collected by a central processor. When enough relations have been generated, the corresponding system of linear equations can be solved (step 3 of Algorithm 3.68) on a single (possibly parallel) computer.
(ii) The database of factor base logarithms need only be computed once for a given finite field. Relative to this, the computation of individual logarithms (step 4 of Algorithm 3.68) is considerably faster.

3.6.6 Discrete logarithm problem in subgroups of Z_p^*

The discrete logarithm problem in subgroups of Z_p^* has special interest because its presumed intractability is the basis for the security of the U.S. Government NIST Digital Signature Algorithm (§11.5.1), among other cryptographic techniques. Let p be a prime and q a prime divisor of p−1. Let G be the unique cyclic subgroup of Z_p^* of order q, and let α be a generator of G. Then the discrete logarithm problem in G is the following: given p, q, α, and β ∈ G, find the unique integer x, 0 ≤ x ≤ q−1, such that α^x ≡ β (mod p). The powerful index-calculus algorithms do not appear to apply directly in G. That is, one needs to apply the index-calculus algorithm in the group Z_p^* itself in order to compute logarithms in the smaller group G.

Consequently, there are two approaches one could take to computing logarithms in G:
1. Use a “square-root” algorithm directly in G, such as Pollard's rho algorithm (Algorithm 3.60). The running time of this approach is O(√q).
2. Let γ be a generator of Z_p^*, and let l = (p−1)/q. Use an index-calculus algorithm in Z_p^* to find integers y and z such that α = γ^y and β = γ^z. Then x = log_α β = (z/l)(y/l)^{−1} mod q. (Since y and z are both divisible by l, y/l and z/l are indeed integers.) The running time of this approach is L_p[1/3, c] if the number field sieve is used.
Which of the two approaches is faster depends on the relative size of √q and L_p[1/3, c].

3.7 The Diffie-Hellman problem

The Diffie-Hellman problem is closely related to the well-studied discrete logarithm problem (DLP) of §3.6. It is of significance to public-key cryptography because its apparent intractability forms the basis for the security of many cryptographic schemes including Diffie-Hellman key agreement and its derivatives (§12.6), and ElGamal public-key encryption (§8.4).

3.74 Definition The Diffie-Hellman problem (DHP) is the following: given a prime p, a generator α of Z_p^*, and elements α^a mod p and α^b mod p, find α^{ab} mod p.

3.75 Definition The generalized Diffie-Hellman problem (GDHP) is the following: given a finite cyclic group G, a generator α of G, and group elements α^a and α^b, find α^{ab}.

Suppose that the discrete logarithm problem in Z_p^* could be efficiently solved. Then given α, p, α^a mod p and α^b mod p, one could first find a from α, p, and α^a mod p by solving a discrete logarithm problem, and then compute (α^b)^a = α^{ab} mod p. This establishes the following relation between the Diffie-Hellman problem and the discrete logarithm problem (the reduction is sketched in code below).

3.76 Fact DHP ≤_P DLP. That is, DHP polytime reduces to the DLP. More generally, GDHP ≤_P GDLP.
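A minimal sketch of the reduction of Fact 3.76, assuming a discrete-logarithm oracle dlog is available (the oracle and all names here are ours; the toy exhaustive-search oracle is for illustration only):

def dhp_via_dlp(p, alpha, A, B, dlog):
    """Solve the DHP given an oracle dlog for the DLP in Z_p^*.
    A = alpha^a mod p and B = alpha^b mod p; returns alpha^(ab) mod p."""
    a = dlog(p, alpha, A)          # recover a with one discrete-log query
    return pow(B, a, p)            # (alpha^b)^a = alpha^(ab) mod p

brute = lambda p, g, h: next(k for k in range(p - 1) if pow(g, k, p) == h)
p, alpha = 251, 71
assert dhp_via_dlp(p, alpha, pow(alpha, 5, p), pow(alpha, 7, p), brute) \
       == pow(alpha, 35, p)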

The question then remains whether the GDLP and GDHP are computationally equivalent. This remains unknown; however, some recent progress in this regard is summarized in Fact 3.77. Recall that φ is the Euler phi function (Definition 2.100), and an integer is B-smooth if all its prime factors are ≤ B (Definition 3.13).

3.77 Fact (known equivalences between GDHP and GDLP)
(i) Let p be a prime where the factorization of p−1 is known. Suppose also that φ(p−1) is B-smooth, where B = O((ln p)^c) for some constant c. Then the DHP and DLP in Z_p^* are computationally equivalent.
(ii) More generally, let G be a finite cyclic group of order n where the factorization of n is known. Suppose also that φ(n) is B-smooth, where B = O((ln n)^c) for some constant c. Then the GDHP and GDLP in G are computationally equivalent.
(iii) Let G be a finite cyclic group of order n where the factorization of n is known. If for each prime divisor p of n either p−1 or p+1 is B-smooth, where B = O((ln n)^c) for some constant c, then the GDHP and GDLP in G are computationally equivalent.

3.8 Composite moduli

The group of units of Z_n, namely Z_n^*, has been proposed for use in several cryptographic mechanisms, including the key agreement protocols of Yacobi and McCurley (see §12.6 notes on page 538) and the identification scheme of Girault (see §10.4 notes on page 423). There are connections of cryptographic interest between the discrete logarithm and Diffie-Hellman problems in (cyclic subgroups of) Z_n^*, and the problem of factoring n. This section summarizes the results known along these lines.

3.78 Fact Let n be a composite integer. If the discrete logarithm problem in Z_n^* can be solved in polynomial time, then n can be factored in expected polynomial time.

In other words, the discrete logarithm problem in Z_n^* is at least as difficult as the problem of factoring n. Fact 3.79 is a partial converse to Fact 3.78 and states that the discrete logarithm in Z_n^* is no harder than the combination of the problems of factoring n and computing discrete logarithms in Z_p^* for each prime factor p of n.

3.79 Fact Let n be a composite integer. The discrete logarithm problem in Z_n^* polytime reduces to the combination of the integer factorization problem and the discrete logarithm problem in Z_p^* for each prime factor p of n.

Fact 3.80 states that the Diffie-Hellman problem in Z_n^* is at least as difficult as the problem of factoring n.

3.80 Fact Let n = pq where p and q are odd primes. If the Diffie-Hellman problem in Z_n^* can be solved in polynomial time for a non-negligible proportion of all bases α ∈ Z_n^*, then n can be factored in expected polynomial time.

3.9 Computing individual bits

While the discrete logarithm problem in Z_p^* (§3.6), the RSA problem (§3.3), and the problem of computing square roots modulo a composite integer n (§3.5.2) appear to be intractable, when the problem parameters are carefully selected, it remains possible that it is much easier to compute some partial information about the solution, for example, its least significant bit. It turns out that while some bits of the solution to these problems are indeed easy to compute, other bits are equally difficult to compute as the entire solution. This section summarizes the results known along these lines. The results have applications to the construction of probabilistic public-key encryption schemes (§8.7) and pseudorandom bit generation (§5.5).

Recall (Definition 1.12) that a function f is called a one-way function if f(x) is easy to compute for all x in its domain, but for essentially all y in the range of f, it is computationally infeasible to find any x such that f(x) = y.

Three (candidate) one-way functions
Although no proof is known for the existence of a one-way function, it is widely believed that one-way functions do exist (cf. Remark 9.12). The following are candidate one-way functions (in fact, one-way permutations) since they are easy to compute, but their inversion requires the solution of the discrete logarithm problem in Z_p^*, the RSA problem, or the problem of computing square roots modulo n, respectively (they are restated in code after this list):
1. exponentiation modulo p. Let p be a prime and let α be a generator of Z_p^*. The function is f : Z_p^* → Z_p^* defined as f(x) = α^x mod p.
2. RSA function. Let p and q be distinct odd primes, n = pq, and let e be an integer such that gcd(e, (p−1)(q−1)) = 1. The function is f : Z_n → Z_n defined as f(x) = x^e mod n.
3. Rabin function. Let n = pq, where p and q are distinct primes each congruent to 3 modulo 4. The function is f : Q_n → Q_n defined as f(x) = x^2 mod n. (Recall from Fact 2.160 that f is a permutation, and from Fact 3.46 that inverting f, i.e., computing principal square roots, is difficult assuming integer factorization is intractable.)
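For concreteness, the three candidates as one-line Python functions (the names are ours); each is easy to evaluate, while inversion is believed hard for properly chosen parameters:

def f_exp(x, p, alpha):
    """Candidate 1: exponentiation modulo a prime p (alpha a generator)."""
    return pow(alpha, x, p)

def f_rsa(x, n, e):
    """Candidate 2: the RSA function, n = pq with gcd(e, (p-1)(q-1)) = 1."""
    return pow(x, e, n)

def f_rabin(x, n):
    """Candidate 3: the Rabin function, n = pq with p, q = 3 (mod 4);
    a permutation of the set Q_n of quadratic residues modulo n."""
    return pow(x, 2, n)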

The following definitions are used in §3.9.1, §3.9.2, and §3.9.3.

3.81 Definition Let f : S → S be a one-way function, where S is a finite set. A Boolean predicate B : S → {0, 1} is said to be a hard predicate for f if:
(i) B(x) is easy to compute given x ∈ S; and
(ii) an oracle which computes B(x) correctly with non-negligible advantage given only f(x) (where x ∈ S) can be used to invert f easily. (In Definitions 3.81 and 3.82, the probability is taken over all choices of x ∈ S and random coin tosses of the oracle.)
Informally, B is a hard predicate for the one-way function f if determining the single bit B(x) of information about x, given only f(x), is as difficult as inverting f itself.

3.82 Definition Let f : S → S be a one-way function, where S is a finite set. A k-bit predicate B^(k) : S → {0, 1}^k is said to be a hard k-bit predicate for f if:
(i) B^(k)(x) is easy to compute given x ∈ S; and
(ii) for every Boolean predicate B : {0, 1}^k → {0, 1}, an oracle which computes B(B^(k)(x)) correctly with non-negligible advantage given only f(x) (where x ∈ S) can be used to invert f easily.
If such a B^(k) exists, then f is said to hide k bits, or the k bits are said to be simultaneously secure.
Informally, B^(k) is a hard k-bit predicate for the one-way function f if determining any partial information whatsoever about B^(k)(x), given only f(x), is as difficult as inverting f itself.

3.9.1 The discrete logarithm problem in Z_p^*: individual bits

Let p be an odd prime and α a generator of Z_p^*. Assume that the discrete logarithm problem in Z_p^* is intractable. Let β ∈ Z_p^*, and let x = log_α β. Recall from Fact 2.135 that β is a quadratic residue modulo p if and only if x is even. Hence, the least significant bit of x is equal to (1 − (β/p))/2, where the Legendre symbol (β/p) can be efficiently computed (Algorithm 2.149). More generally, the following is true.

3.83 Fact Let p be an odd prime, and let α be a generator of Z_p^*. Suppose that p − 1 = 2^s t, where t is odd. Then there is an efficient algorithm which, given β ∈ Z_p^*, computes the s least significant bits of x = log_α β.

3.84 Fact Let p be a prime and α a generator of Z_p^*. Define the predicate B : Z_p^* → {0, 1} by
B(x) = 0 if 1 ≤ x ≤ (p−1)/2, and B(x) = 1 if (p−1)/2 < x ≤ p−1.
Then B is a hard predicate for the function of exponentiation modulo p. In other words, given p, α, and β, computing the single bit B(x) of the discrete logarithm x = log_α β is as difficult as computing the entire discrete logarithm.

3.85 Fact Let p be a prime and α a generator of Z_p^*. Let k = O(lg lg p) be an integer. Let the interval [1, p−1] be partitioned into 2^k intervals I_0, I_1, ..., I_{2^k−1} of roughly equal lengths. Define the k-bit predicate B^(k) : Z_p^* → {0, 1}^k by B^(k)(x) = j if x ∈ I_j. Then B^(k) is a hard k-bit predicate for the function of exponentiation modulo p.
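The easy bit noted at the start of this subsection (the least significant bit of x leaks through the Legendre symbol) is a one-liner via Euler's criterion. A minimal sketch (function name is ours):

def lsb_of_dlog(beta, p):
    """Least significant bit of x = log_alpha(beta), for any generator alpha
    of Z_p^*: x is even iff beta is a quadratic residue mod p (Fact 2.135).
    The Legendre symbol (beta/p) is computed by Euler's criterion."""
    return 0 if pow(beta, (p - 1) // 2, p) == 1 else 1

# Example 3.64 again: log_71(210) = 197 in Z_251^*, an odd logarithm
assert lsb_of_dlog(210, 251) == 197 % 2 == 1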

3.9.2 The RSA problem: individual bits

Let n be a product of two distinct odd primes p and q, and let e be an integer such that gcd(e, (p−1)(q−1)) = 1. Given n, e, and c = x^e mod n (for some x ∈ Z_n), some information about x is easily obtainable. For example, since e is an odd integer,
(c/n) = (x^e/n) = (x/n)^e = (x/n),
and hence the single bit of information (x/n) can be obtained simply by computing the Jacobi symbol (c/n) (Algorithm 2.149); a short sketch appears below. There are, however, other bits of information about x that are difficult to compute, as the next two results show.

3.86 Fact Define the predicate B : Z_n → {0, 1} by B(x) = x mod 2; that is, B(x) is the least significant bit of x. Then B is a hard predicate for the RSA function (see page 115).

3.87 Fact Let k = O(lg lg n) be an integer. Define the k-bit predicate B^(k) : Z_n → {0, 1}^k by B^(k)(x) = x mod 2^k. That is, B^(k)(x) consists of the k least significant bits of x. Then B^(k) is a hard k-bit predicate for the RSA function. Thus the RSA function has lg lg n simultaneously secure bits.
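A sketch of the leaked bit: the standard binary Jacobi-symbol algorithm, plus a check that an RSA ciphertext has the same Jacobi symbol as its plaintext (the toy modulus 221 = 13 · 17 is our own illustrative choice):

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0 (binary algorithm)."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

# since e is odd, the ciphertext leaks the Jacobi symbol of the plaintext
n, e, x = 221, 5, 19
c = pow(x, e, n)
assert jacobi(c, n) == jacobi(x, n)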

3.9.3 The Rabin problem: individual bits

Let n = pq, where p and q are distinct primes each congruent to 3 modulo 4.

3.88 Fact Define the predicate B : Q_n → {0, 1} by B(x) = x mod 2; that is, B(x) is the least significant bit of the quadratic residue x. Then B is a hard predicate for the Rabin function (see page 115).

3.89 Fact Let k = O(lg lg n) be an integer. Define the k-bit predicate B^(k) : Q_n → {0, 1}^k by B^(k)(x) = x mod 2^k. That is, B^(k)(x) consists of the k least significant bits of the quadratic residue x. Then B^(k) is a hard k-bit predicate for the Rabin function. Thus the Rabin function has lg lg n simultaneously secure bits.

3.10 The subset sum problem

The difficulty of the subset sum problem was the basis for the (presumed) security of the first public-key encryption scheme, called the Merkle-Hellman knapsack scheme (§8.6.1).

3.90 Definition The subset sum problem (SUBSET-SUM) is the following: given a set {a_1, a_2, ..., a_n} of positive integers, called a knapsack set, and a positive integer s, determine whether or not there is a subset of the a_j that sum to s. Equivalently, determine whether or not there exist x_i ∈ {0, 1}, 1 ≤ i ≤ n, such that Σ_{i=1}^n a_i x_i = s.

The subset sum problem above is stated as a decision problem. It can be shown that the problem is computationally equivalent to its computational version, which is to actually determine the x_i such that Σ_{i=1}^n a_i x_i = s, provided that such x_i exist. Fact 3.91 provides evidence of the intractability of the subset sum problem.

3.91 Fact The subset sum problem is NP-complete. The computational version of the subset sum problem is NP-hard (see Example 2.74).

Algorithms 3.92 and 3.94 give two methods for solving the computational version of the subset sum problem; both are exponential-time algorithms. Algorithm 3.94 is the fastest method known for the general subset sum problem (a Python sketch of it follows Fact 3.95).

3.92 Algorithm Naive algorithm for subset sum problem
INPUT: a set of positive integers {a_1, a_2, ..., a_n} and a positive integer s.
OUTPUT: x_i ∈ {0, 1}, 1 ≤ i ≤ n, such that Σ_{i=1}^n a_i x_i = s, provided such x_i exist.
1. For each possible vector (x_1, x_2, ..., x_n) ∈ (Z_2)^n do the following:
1.1 Compute l = Σ_{i=1}^n a_i x_i.
1.2 If l = s then return(a solution is (x_1, x_2, ..., x_n)).
2. Return(no solution exists).

3.93 Fact Algorithm 3.92 takes O(2^n) steps and, hence, is inefficient.

3.94 Algorithm Meet-in-the-middle algorithm for subset sum problem
INPUT: a set of positive integers {a_1, a_2, ..., a_n} and a positive integer s.
OUTPUT: x_i ∈ {0, 1}, 1 ≤ i ≤ n, such that Σ_{i=1}^n a_i x_i = s, provided such x_i exist.
1. Set t ← ⌊n/2⌋.
2. Construct a table with entries (Σ_{i=1}^t a_i x_i, (x_1, x_2, ..., x_t)) for (x_1, x_2, ..., x_t) ∈ (Z_2)^t. Sort this table by first component.
3. For each (x_{t+1}, x_{t+2}, ..., x_n) ∈ (Z_2)^{n−t}, do the following:
3.1 Compute l = s − Σ_{i=t+1}^n a_i x_i and check, using a binary search, whether l is the first component of some entry in the table.
3.2 If l = Σ_{i=1}^t a_i x_i then return(a solution is (x_1, x_2, ..., x_n)).
4. Return(no solution exists).

3.95 Fact Algorithm 3.94 takes O(n·2^{n/2}) steps and, hence, is inefficient.
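A compact Python sketch of Algorithm 3.94. A hash table stands in for the sorted table plus binary search of steps 2 and 3.1 (an implementation convenience of ours, with the same asymptotics up to expected-time hashing):

def mitm_subset_sum(a, s):
    """Meet-in-the-middle search: O(n * 2^(n/2)) time and O(2^(n/2)) space.
    Returns a 0/1 vector x with sum(a_i * x_i) = s, or None."""
    n = len(a)
    t = n // 2
    table = {}                      # subset sum over a[:t] -> choice bits
    for bits in range(1 << t):
        total = sum(a[i] for i in range(t) if bits >> i & 1)
        table.setdefault(total, bits)
    for bits in range(1 << (n - t)):
        rest = sum(a[t + i] for i in range(n - t) if bits >> i & 1)
        if s - rest in table:       # step 3.1: look for a matching half
            left = table[s - rest]
            return [left >> i & 1 for i in range(t)] + \
                   [bits >> i & 1 for i in range(n - t)]
    return None

x = mitm_subset_sum([3, 5, 9, 14], 17)
assert x is not None and sum(a * b for a, b in zip([3, 5, 9, 14], x)) == 17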

3.10.1 The L^3-lattice basis reduction algorithm

The L^3-lattice basis reduction algorithm is a crucial component in many number-theoretic algorithms. It is useful for solving certain subset sum problems, and has been used for cryptanalyzing public-key encryption schemes which are based on the subset sum problem.

3.96 Definition Let x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) be two vectors in R^n. The inner product of x and y is the real number ⟨x, y⟩ = x_1y_1 + x_2y_2 + · · · + x_ny_n.

3.97 Definition Let y = (y_1, y_2, ..., y_n) be a vector in R^n. The length of y is the real number ‖y‖ = √⟨y, y⟩ = √(y_1^2 + y_2^2 + · · · + y_n^2).

3.98 Definition Let B = {b_1, b_2, ..., b_m} be a set of linearly independent vectors in R^n (so that m ≤ n). The set L of all integer linear combinations of b_1, b_2, ..., b_m is called a lattice of dimension m; that is, L = Zb_1 + Zb_2 + · · · + Zb_m. The set B is called a basis for the lattice L.

A lattice can have many different bases. A basis consisting of vectors of relatively small lengths is called reduced. The following definition provides a useful notion of a reduced basis, and is based on the Gram-Schmidt orthogonalization process.

3.99 Definition Let B = {b_1, b_2, ..., b_n} be a basis for a lattice L ⊂ R^n. Define the vectors b_i^* (1 ≤ i ≤ n) and the real numbers µ_{i,j} (1 ≤ j < i ≤ n) inductively by
µ_{i,j} = ⟨b_i, b_j^*⟩ / ⟨b_j^*, b_j^*⟩, 1 ≤ j < i ≤ n,   (3.8)
b_i^* = b_i − Σ_{j=1}^{i−1} µ_{i,j} b_j^*, 1 ≤ i ≤ n.   (3.9)
The basis B is said to be reduced (more precisely, Lovász-reduced) if
|µ_{i,j}| ≤ 1/2, for 1 ≤ j < i ≤ n
(where |µ_{i,j}| denotes the absolute value of µ_{i,j}), and
‖b_i^*‖^2 ≥ (3/4 − µ_{i,i−1}^2) ‖b_{i−1}^*‖^2, for 1 < i ≤ n.   (3.10)

Fact 3.100 explains the sense in which the vectors in a reduced basis are relatively short.

3.100 Fact Let L ⊂ R^n be a lattice with a reduced basis {b_1, b_2, ..., b_n}.
(i) For every non-zero x ∈ L, ‖b_1‖ ≤ 2^{(n−1)/2} ‖x‖.
(ii) More generally, for any set {a_1, a_2, ..., a_t} of linearly independent vectors in L,
‖b_j‖ ≤ 2^{(n−1)/2} max(‖a_1‖, ‖a_2‖, ..., ‖a_t‖), for 1 ≤ j ≤ t.

The L^3-lattice basis reduction algorithm (Algorithm 3.101) is a polynomial-time algorithm (Fact 3.103) for finding a reduced basis, given a basis for a lattice.

3.101 Algorithm L^3-lattice basis reduction algorithm
INPUT: a basis (b_1, b_2, ..., b_n) for a lattice L in R^m, m ≥ n.
OUTPUT: a reduced basis for L.
1. b_1^* ← b_1, B_1 ← ⟨b_1^*, b_1^*⟩.
2. For i from 2 to n do the following:
2.1 b_i^* ← b_i.
2.2 For j from 1 to i−1, set µ_{i,j} ← ⟨b_i, b_j^*⟩/B_j and b_i^* ← b_i^* − µ_{i,j} b_j^*.
2.3 B_i ← ⟨b_i^*, b_i^*⟩.
3. k ← 2.
4. Execute subroutine RED(k, k−1) to possibly update some µ_{i,j}.
5. If B_k < (3/4 − µ_{k,k−1}^2) B_{k−1} then do the following:
5.1 Set µ ← µ_{k,k−1}, B ← B_k + µ^2 B_{k−1}, µ_{k,k−1} ← µB_{k−1}/B, B_k ← B_{k−1}B_k/B, and B_{k−1} ← B.
5.2 Exchange b_k and b_{k−1}.
5.3 If k > 2 then exchange µ_{k,j} and µ_{k−1,j} for j = 1, 2, ..., k−2.
5.4 For i = k+1, k+2, ..., n:
Set t ← µ_{i,k}, µ_{i,k} ← µ_{i,k−1} − µt, and µ_{i,k−1} ← t + µ_{k,k−1}µ_{i,k}.
5.5 k ← max(2, k−1).
5.6 Go to step 4.
Otherwise, for l = k−2, k−3, ..., 1, execute RED(k, l), and finally set k ← k+1.
6. If k ≤ n then go to step 4. Otherwise, return(b_1, b_2, ..., b_n).

RED(k, l) If |µ_{k,l}| > 1/2 then do the following:
1. r ← ⌊0.5 + µ_{k,l}⌋, b_k ← b_k − rb_l.
2. For j from 1 to l−1, set µ_{k,j} ← µ_{k,j} − rµ_{l,j}.
3. µ_{k,l} ← µ_{k,l} − r.

3.102 Note (explanation of selected steps of Algorithm 3.101)
(i) Steps 1 and 2 initialize the algorithm by computing b_i^* (1 ≤ i ≤ n) and µ_{i,j} (1 ≤ j < i ≤ n) as defined in equations (3.9) and (3.8), and also B_i = ⟨b_i^*, b_i^*⟩ (1 ≤ i ≤ n).
(ii) k is a variable such that the vectors b_1, b_2, ..., b_{k−1} are reduced (initially k = 2 in step 3). The algorithm then attempts to modify b_k, so that b_1, b_2, ..., b_k are reduced.
(iii) In step 4, the vector b_k is modified appropriately so that |µ_{k,k−1}| ≤ 1/2, and the µ_{k,j} are updated for 1 ≤ j < k−1.
(iv) In step 5, if the condition of equation (3.10) is violated for i = k, then vectors b_k and b_{k−1} are exchanged and their corresponding parameters are updated. Also, k is decremented by 1 since then it is only guaranteed that b_1, b_2, ..., b_{k−2} are reduced. Otherwise, b_k is modified appropriately so that |µ_{k,j}| ≤ 1/2 for j = 1, 2, ..., k−2, while keeping (3.10) satisfied. k is then incremented because now b_1, b_2, ..., b_k are reduced.

It can be proven that the L^3-algorithm terminates after a finite number of iterations. Note that if L is an integer lattice, i.e., L ⊂ Z^n, then the L^3-algorithm only operates on rational numbers. The precise running time is given next.

3.103 Fact Let L ⊂ Z^n be a lattice with basis {b_1, b_2, ..., b_n}, and let C ∈ R, C ≥ 2, be such that ‖b_i‖^2 ≤ C for i = 1, 2, ..., n. Then the number of arithmetic operations needed by Algorithm 3.101 is O(n^4 log C), on integers of size O(n log C) bits.
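The following is an educational Python sketch of L^3 reduction. It does not reproduce the incremental bookkeeping of steps 5.1-5.4; instead it simply recomputes the Gram-Schmidt data after every change, which is much slower but enforces the same reduced-basis conditions (3.8)-(3.10). Exact rational arithmetic keeps the computation on rationals, as noted above for integer lattices.

from fractions import Fraction

def lll(basis, delta=Fraction(3, 4)):
    """Simplified L^3 reduction (cf. Algorithm 3.101); our own sketch."""
    b = [[Fraction(v) for v in row] for row in basis]
    n = len(b)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def gram_schmidt():
        mu = [[Fraction(0)] * n for _ in range(n)]
        B, bstar = [], []
        for i in range(n):
            w = b[i][:]
            for j in range(i):
                mu[i][j] = dot(b[i], bstar[j]) / B[j]
                w = [x - mu[i][j] * y for x, y in zip(w, bstar[j])]
            bstar.append(w)
            B.append(dot(w, w))
        return mu, B

    k = 1                                   # 0-based: b[0..k-1] are reduced
    while k < n:
        mu, B = gram_schmidt()
        for j in range(k - 1, -1, -1):      # RED(k, j): size-reduce b_k
            r = round(mu[k][j])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[j])]
                mu, B = gram_schmidt()
        if B[k] >= (delta - mu[k][k - 1] ** 2) * B[k - 1]:  # condition (3.10)
            k += 1
        else:                               # swap and step back
            b[k], b[k - 1] = b[k - 1], b[k]
            k = max(k - 1, 1)
    return [[int(v) if v.denominator == 1 else v for v in row] for row in b]

# e.g., lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]) returns a basis of short vectors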

3.10.2 Solving subset sum problems of low density

The density of a knapsack set, as defined below, provides a measure of the size of the knapsack elements.

3.104 Definition Let S = {a_1, a_2, ..., a_n} be a knapsack set. The density of S is defined to be
d = n / max{lg a_i | 1 ≤ i ≤ n}.

Algorithm 3.105 reduces the subset sum problem to one of finding a particular short vector in a lattice. By Fact 3.100, the reduced basis produced by the L^3-algorithm includes a vector of length which is guaranteed to be within a factor of 2^{(n−1)/2} of the shortest non-zero vector of the lattice. In practice, however, the L^3-algorithm usually finds a vector which is much shorter than what is guaranteed by Fact 3.100. Hence, the L^3-algorithm can be expected to find the short vector which yields a solution to the subset sum problem, provided that this vector is shorter than most of the non-zero vectors in the lattice.

3.105 Algorithm Solving subset sum problems using L^3-algorithm
INPUT: a set of positive integers {a_1, a_2, ..., a_n} and an integer s.
OUTPUT: x_i ∈ {0, 1}, 1 ≤ i ≤ n, such that Σ_{i=1}^n a_i x_i = s, provided such x_i exist.
1. Let m = ⌈(1/2)√n⌉.
2. Form an (n+1)-dimensional lattice L with basis consisting of the rows of the matrix

A = [ 1    0    0   ···  0    ma_1 ]
    [ 0    1    0   ···  0    ma_2 ]
    [ 0    0    1   ···  0    ma_3 ]
    [ ·    ·    ·        ·     ·   ]
    [ 0    0    0   ···  1    ma_n ]
    [ 1/2  1/2  1/2 ···  1/2  ms   ]

3. Find a reduced basis B of L (use Algorithm 3.101).
4. For each vector y = (y_1, y_2, ..., y_{n+1}) in B, do the following:
4.1 If y_{n+1} = 0 and y_i ∈ {−1/2, 1/2} for all i = 1, 2, ..., n, then do the following:
For i = 1, 2, ..., n, set x_i ← y_i + 1/2. If Σ_{i=1}^n a_i x_i = s, then return(a solution is (x_1, x_2, ..., x_n)).
For i = 1, 2, ..., n, set x_i ← −y_i + 1/2. If Σ_{i=1}^n a_i x_i = s, then return(a solution is (x_1, x_2, ..., x_n)).
5. Return(FAILURE). (Either no solution exists, or the algorithm has failed to find one.)

Justification. Let the rows of the matrix A be b_1, b_2, ..., b_{n+1}, and let L be the (n+1)-dimensional lattice generated by these vectors. If (x_1, x_2, ..., x_n) is a solution to the subset sum problem, the vector y = Σ_{i=1}^n x_i b_i − b_{n+1} is in L. Note that y_i ∈ {−1/2, 1/2} for i = 1, 2, ..., n and y_{n+1} = 0. Since ‖y‖ = √(y_1^2 + y_2^2 + · · · + y_{n+1}^2), the vector y is a vector of short length in L. If the density of the knapsack set is small, i.e. the a_i are large, then most vectors in L will have relatively large lengths, and hence y may be the unique shortest non-zero vector in L. If this is indeed the case, then there is good possibility of the L^3-algorithm finding a basis which includes this vector.

Algorithm 3.105 is not guaranteed to succeed. Assuming that the L^3-algorithm always produces a basis which includes the shortest non-zero lattice vector, Algorithm 3.105 succeeds with high probability if the density of the knapsack set is less than 0.9408.
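A sketch of Algorithm 3.105 reusing the lll() sketch above. To keep all entries integral, the lattice is scaled by 2 (rows 2·e_i and a last row of all 1s): a solution vector then has entries ±1 and a zero last coordinate, an equivalent formulation of our own. The weight m and the example instance are likewise ours.

import math

def solve_subset_sum(a, s):
    """Low-density subset sum via lattice reduction (cf. Algorithm 3.105)."""
    n = len(a)
    m = math.isqrt(n) + 1              # any integer >= ceil(sqrt(n)/2) works
    rows = [[2 * (i == j) for j in range(n)] + [2 * m * a[i]]
            for i in range(n)]
    rows.append([1] * n + [2 * m * s])
    for y in lll(rows):
        if y[n] == 0 and all(abs(v) == 1 for v in y[:n]):
            for sign in (1, -1):       # try both y and -y (cf. step 4.1)
                x = [(sign * v + 1) // 2 for v in y[:n]]
                if sum(ai * xi for ai, xi in zip(a, x)) == s:
                    return x
    return None                        # no solution found (cf. step 5)

# a small low-density instance; may recover x = [1, 0, 1, 1, 0]:
# print(solve_subset_sum([213, 421, 467, 529, 611], 1209))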

3.10.3 Simultaneous diophantine approximation

Simultaneous diophantine approximation is concerned with approximating a vector (q_1/q, q_2/q, ..., q_n/q) of rational numbers (more generally, a vector (α_1, α_2, ..., α_n) of real numbers) by a vector (p_1/p, p_2/p, ..., p_n/p) of rational numbers with a smaller denominator p. Algorithms for finding simultaneous diophantine approximation have been used to break some knapsack public-key encryption schemes (§8.6).

3.106 Definition Let δ be a real number. The vector (p_1/p, p_2/p, ..., p_n/p) of rational numbers is said to be a simultaneous diophantine approximation of δ-quality to the vector (q_1/q, q_2/q, ..., q_n/q) of rational numbers if p < q and
|p·(q_i/q) − p_i| ≤ q^{−δ} for i = 1, 2, ..., n.
(The larger δ is, the better is the approximation.) Furthermore, it is an unusually good simultaneous diophantine approximation (UGSDA) if δ > 1/n.

Fact 3.107 shows that a UGSDA is indeed unusual.

3.107 Fact For n ≥ 2, the set
S_n(q) = {(q_1/q, q_2/q, ..., q_n/q) | 0 ≤ q_i < q, gcd(q_1, q_2, ..., q_n, q) = 1}
has at least (1/2)q^n members. Of these, at most O(q^{n(1−δ)+1}) members have at least one δ-quality simultaneous diophantine approximation. Hence, for any fixed δ > 1/n, the fraction of members of S_n(q) having at least one UGSDA approaches 0 as q → ∞.

Algorithm 3.108 reduces the problem of finding a δ-quality simultaneous diophantine approximation, and hence also a UGSDA, to the problem of finding a short vector in a lattice. The latter problem can (usually) be solved using the L^3-lattice basis reduction algorithm.

3.108 Algorithm Finding a δ-quality simultaneous diophantine approximation
INPUT: a vector w = (q_1/q, q_2/q, ..., q_n/q) of rational numbers, and a rational number δ > 0.
OUTPUT: a δ-quality simultaneous diophantine approximation (p_1/p, p_2/p, ..., p_n/p) of w.
1. Choose an integer λ ≈ q^δ.
2. Use Algorithm 3.101 to find a reduced basis B for the (n+1)-dimensional lattice L which is generated by the rows of the matrix

A = [  λq    0    0   ···   0    0 ]
    [  0     λq   0   ···   0    0 ]
    [  0     0    λq  ···   0    0 ]
    [  ·     ·    ·         ·    · ]
    [  0     0    0   ···   λq   0 ]
    [ −λq_1 −λq_2 −λq_3 ··· −λq_n 1 ]

3. For each v = (v_1, v_2, ..., v_n, v_{n+1}) in B such that v_{n+1} ≠ q, do the following:
3.1 p ← v_{n+1}.
3.2 For i from 1 to n, set p_i ← (1/q)(v_i/λ + pq_i).
3.3 If |p·(q_i/q) − p_i| ≤ q^{−δ} for each i, 1 ≤ i ≤ n, then return((p_1/p, p_2/p, ..., p_n/p)).
4. Return(FAILURE). (Either no δ-quality simultaneous diophantine approximation exists, or the algorithm has failed to find one.)

Justification. Let the rows of the matrix A be denoted by b_1, b_2, ..., b_{n+1}. Suppose that (q_1/q, q_2/q, ..., q_n/q) has a δ-quality approximation (p_1/p, p_2/p, ..., p_n/p). Then the vector
x = p_1b_1 + p_2b_2 + · · · + p_nb_n + pb_{n+1} = (λ(p_1q − pq_1), λ(p_2q − pq_2), ..., λ(p_nq − pq_n), p)
is in L and has length less than approximately (√(n+1))q. Thus x is short compared to the original basis vectors, which are of length roughly q^{1+δ}. Also, if v = (v_1, v_2, ..., v_{n+1}) is a vector in L of length less than q, then the vector (p_1/p, p_2/p, ..., p_n/p) defined in step 3 is a δ-quality approximation. Hence there is a good possibility that the L^3-algorithm will produce a reduced basis which includes a vector v that corresponds to a δ-quality approximation.

3.11 Factoring polynomials over finite fields

The problem considered in this section is the following: given a polynomial f(x) ∈ F_q[x], with q = p^m, find its factorization f(x) = f_1(x)^{e_1} f_2(x)^{e_2} · · · f_t(x)^{e_t}, where each f_i(x) is an irreducible polynomial in F_q[x] and each e_i ≥ 1. (e_i is called the multiplicity of the factor f_i(x).) Several situations call for the factoring of polynomials over finite fields, such as index-calculus algorithms in F_{2^m}^* (Example 3.70) and Chor-Rivest public-key encryption (§8.6.2). This section presents an algorithm for square-free factorization, and Berlekamp's classical deterministic algorithm for factoring polynomials which is efficient if the underlying field is small. Efficient randomized algorithms are known for the case of large q; references are provided on page 132.

3.11.1 Square-free factorization

Observe first that f(x) may be divided by its leading coefficient. Thus, it may be assumed that f(x) is monic (see Definition 2.187). This section shows how the problem of factoring a monic polynomial f(x) may then be reduced to the problem of factoring one or more monic square-free polynomials.

3.109 Definition Let f(x) ∈ F_q[x]. Then f(x) is square-free if it has no repeated factors, i.e., there is no polynomial g(x) with deg g(x) ≥ 1 such that g(x)^2 divides f(x). The square-free factorization of f(x) is f(x) = Π_{i=1}^k f_i(x)^i, where each f_i(x) is a square-free polynomial and gcd(f_i(x), f_j(x)) = 1 for i ≠ j. (Some of the f_i(x) in the square-free factorization of f(x) may be 1.)

Let f(x) = Σ_{i=0}^n a_i x^i be a polynomial of degree n ≥ 1. The (formal) derivative of f(x) is the polynomial f′(x) = Σ_{i=0}^{n−1} a_{i+1}(i+1) x^i. If f′(x) = 0, then, because p is the characteristic of F_q, in each term a_i x^i of f(x) for which a_i ≠ 0, the exponent of x must be a multiple of p. Hence, f(x) has the form f(x) = a(x)^p, where a(x) = Σ_{i=0}^{n/p} a_{ip}^{q/p} x^i, and the problem of finding the square-free factorization of f(x) is reduced to finding that of a(x). Now, it is possible that a′(x) = 0, but repeating this process as necessary, it may be assumed that f′(x) ≠ 0.

Next, let g(x) = gcd(f(x), f′(x)). Noting that an irreducible factor of multiplicity k in f(x) will have multiplicity k−1 in f′(x) if gcd(k, p) = 1, and will retain multiplicity k in f′(x) otherwise, the following conclusions may be drawn. If g(x) = 1, then f(x) has no repeated factors; and if g(x) has positive degree, then g(x) is a non-trivial factor of f(x), and f(x)/g(x) has no repeated factors. Note, however, the possibility of g(x) having repeated factors, and, indeed, the possibility that g′(x) = 0. Nonetheless, g(x) can be refined further as above. The steps are summarized in Algorithm 3.110. In the algorithm, F denotes the square-free factorization of a factor of f(x) in factored form.

3.110 Algorithm Square-free factorization
SQUARE-FREE(f(x))
INPUT: a monic polynomial f(x) ∈ F_q[x] of degree ≥ 1, where F_q has characteristic p.
OUTPUT: the square-free factorization of f(x).
1. Set i ← 1, F ← 1, and compute f′(x).
2. If f′(x) = 0 then set f(x) ← f(x)^{1/p} and F ← (SQUARE-FREE(f(x)))^p.
Otherwise (i.e., f′(x) ≠ 0) do the following:
2.1 Compute g(x) ← gcd(f(x), f′(x)) and h(x) ← f(x)/g(x).
2.2 While h(x) ≠ 1 do the following:
Compute h̄(x) ← gcd(h(x), g(x)) and l(x) ← h(x)/h̄(x).
Set F ← F · l(x)^i, i ← i+1, h(x) ← h̄(x), and g(x) ← g(x)/h̄(x).
2.3 If g(x) ≠ 1 then set g(x) ← g(x)^{1/p} and F ← F · (SQUARE-FREE(g(x)))^p.
3. Return(F).

Once the square-free factorization f(x) = Π_{i=1}^k f_i(x)^i is found, the square-free polynomials f_1(x), f_2(x), ..., f_k(x) need to be factored in order to obtain the complete factorization of f(x).
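Over F_2 (p = q = 2), Algorithm 3.110 can be sketched with int-encoded polynomials; the p-th root of step 2 just takes the even-position bits. All helper names are ours:

# Polynomials over F_2 encoded as ints: bit i is the coefficient of x^i.
def deg(f):
    return f.bit_length() - 1

def polydiv(f, g):
    """Quotient and remainder of f by g over F_2."""
    q = 0
    while f and deg(f) >= deg(g):
        sh = deg(f) - deg(g)
        q ^= 1 << sh
        f ^= g << sh
    return q, f

def polygcd(f, g):
    while g:
        f, g = g, polydiv(f, g)[1]
    return f

def polymul(f, g):
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

def deriv(f):
    """Formal derivative over F_2: odd-degree terms shifted down."""
    d, i = 0, 1
    while f >> i:
        d |= (f >> i & 1) << (i - 1)
        i += 2
    return d

def sqrt2(f):
    """Square root when f'(x) = 0, i.e. f(x) = a(x)^2 over F_2."""
    a, i = 0, 0
    while f >> (2 * i):
        a |= (f >> (2 * i) & 1) << i
        i += 1
    return a

def square_free(f):
    """SQUARE-FREE over F_2: returns {i: f_i} with f = prod of f_i(x)^i."""
    F = {}
    if deriv(f) == 0:                        # step 2: f = a(x)^2
        return {2 * i: g for i, g in square_free(sqrt2(f)).items()}
    g = polygcd(f, deriv(f))                 # step 2.1
    h = polydiv(f, g)[0]
    i = 1
    while h != 1:                            # step 2.2
        hbar = polygcd(h, g)
        l = polydiv(h, hbar)[0]
        if l != 1:
            F[i] = polymul(F.get(i, 1), l)
        i += 1
        h, g = hbar, polydiv(g, hbar)[0]
    if g != 1:                               # step 2.3
        for j, gg in square_free(sqrt2(g)).items():
            F[2 * j] = polymul(F.get(2 * j, 1), gg)
    return F

# x^2 (x + 1) = x^3 + x^2 = 0b1100  ->  {1: x + 1, 2: x}
assert square_free(0b1100) == {1: 0b11, 2: 0b10}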

3.11.2 Berlekamp's Q-matrix algorithm

Let f(x) = Π_{i=1}^t f_i(x) be a monic polynomial in F_q[x] of degree n having distinct irreducible factors f_i(x), 1 ≤ i ≤ t. Berlekamp's Q-matrix algorithm (Algorithm 3.111) for factoring f(x) is based on the following facts. The set of polynomials
B = {b(x) ∈ F_q[x]/(f(x)) | b(x)^q ≡ b(x) (mod f(x))}
is a vector space of dimension t over F_q. B consists of precisely those vectors in the null space of the matrix Q − I_n, where Q is the n × n matrix with (i, j)-entry q_{ij} specified by
x^{iq} mod f(x) = Σ_{j=0}^{n−1} q_{ij} x^j, 0 ≤ i ≤ n−1,
and where I_n is the n × n identity matrix. A basis B = {v_1(x), v_2(x), ..., v_t(x)} for B can thus be found by standard techniques from linear algebra. Finally, for each pair of distinct factors f_i(x) and f_j(x) of f(x) there exists some v_k(x) ∈ B and some α ∈ F_q such that f_i(x) divides v_k(x) − α but f_j(x) does not divide v_k(x) − α; these two factors can thus be split by computing gcd(f(x), v_k(x) − α). In Algorithm 3.111, a vector w = (w_0, w_1, ..., w_{n−1}) is identified with the polynomial w(x) = Σ_{i=0}^{n−1} w_i x^i.

3.111 Algorithm Berlekamp's Q-matrix algorithm for factoring polynomials over finite fields
INPUT: a square-free monic polynomial f(x) of degree n in F_q[x].
OUTPUT: the factorization of f(x) into monic irreducible polynomials.
1. For each i, 0 ≤ i ≤ n−1, compute the polynomial
x^{iq} mod f(x) = Σ_{j=0}^{n−1} q_{ij} x^j.
Note that each q_{ij} is an element of F_q.
2. Form the n × n matrix Q whose (i, j)-entry is q_{ij}.
3. Determine a basis v_1, v_2, ..., v_t for the null space of the matrix (Q − I_n), where I_n is the n × n identity matrix. The number of irreducible factors of f(x) is precisely t.
4. Set F ← {f(x)}. (F is the set of factors of f(x) found so far; their product is equal to f(x).)
5. For i from 1 to t do the following:
5.1 For each polynomial h(x) ∈ F such that deg h(x) > 1 do the following: compute gcd(h(x), v_i(x) − α) for each α ∈ F_q, and replace h(x) in F by all those polynomials in the gcd computations whose degrees are ≥ 1.
6. Return(the polynomials in F are the irreducible factors of f(x)).

3.112 Fact The running time of Algorithm 3.111 for factoring a square-free polynomial of degree n over F_q is O(n^3 + tqn^2) F_q-operations, where t is the number of irreducible factors of f(x). The method is efficient only when q is small.

3.12 Notes and further references

§3.1
Many of the topics discussed in this chapter lie in the realm of algorithmic number theory. Excellent references on this subject include the books by Bach and Shallit [70], Cohen [263], and Pomerance [993]. Adleman and McCurley [15] give an extensive survey of the important open problems in algorithmic number theory. Two other recommended surveys are by Bach [65] and Lenstra and Lenstra [748]. Woll [1253] gives an overview of the reductions among thirteen of these problems.

§3.2
A survey of the integer factorization problem is given by Pomerance [994]. See also Chapters 8 and 10 of Cohen [263], and the books by Bressoud [198] and Koblitz [697]. Brillhart et al. [211] provide extensive listings of factorizations of integers of the form b^n ± 1 for “small” n and b = 2, 3, 5, 6, 7, 10, 11, 12.

Bach and Sorenson [71] presented some algorithms for recognizing perfect powers (cf. Note 3.6), one having a worst-case running time of O(lg^3 n) bit operations, and a second having an average-case running time of O(lg^2 n) bit operations. A more recent algorithm of Bernstein [121] runs in essentially linear time O((lg n)^{1+o(1)}). Fact 3.7 is from Knuth [692]. Pages 367-369 of this reference contain explicit formulas regarding the expected sizes of the largest and second largest prime factors, and the expected total number of prime factors, of a randomly chosen positive integer. For further results, see Knuth and Trabb Pardo [694], who prove that the average number of bits in the k-th largest prime factor of a random m-bit number is asymptotically equivalent to the average length of the k-th longest cycle in a permutation on m objects.

Floyd's cycle-finding algorithm (Note 3.8) is described by Knuth [692, p.7]. Sedgewick, Szymanski, and Yao [1106] showed that by saving a small number of values from the x_i sequence, a collision can be found by doing roughly one-third the work as in Floyd's cycle-finding algorithm. Pollard's rho algorithm for factoring (Algorithm 3.9) is due to Pollard [985]. Regarding Note 3.12, Cohen [263, p.422] provides an explanation for the restriction c ≠ 0, −2. Brent [196] presented a cycle-finding algorithm which is better on average than Floyd's cycle-finding algorithm, and applied it to yield a factorization algorithm which is similar to Pollard's but about 24 percent faster. Brent and Pollard [197] later modified this algorithm to factor the eighth Fermat number F_8 = 2^{2^8} + 1. Using techniques from algebraic geometry, Bach [67] obtained the first rigorously proven result concerning the expected running time of Pollard's rho algorithm: for fixed k, the probability that a prime factor p is discovered before step k is at least (k choose 2)/p + O(p^{−3/2}) as p → ∞.

The p−1 algorithm (Algorithm 3.14) is due to Pollard [984]. Several practical improvements have been proposed for the p−1 algorithm, including those by Montgomery [894] and Montgomery and Silverman [895], the latter using fast Fourier transform techniques. Williams [1247] presented an algorithm for factoring n which is efficient if n has a prime factor p such that p+1 is smooth. These methods were generalized by Bach and Shallit [69] to techniques that factor n efficiently provided n has a prime factor p such that the k-th cyclotomic polynomial Φ_k(p) is smooth. The first few cyclotomic polynomials are Φ_1(p) = p−1, Φ_2(p) = p+1, Φ_3(p) = p^2+p+1, Φ_4(p) = p^2+1, Φ_5(p) = p^4+p^3+p^2+p+1, and Φ_6(p) = p^2−p+1.

The elliptic curve factoring algorithm (ECA) of §3.2.4 was invented by Lenstra [756]. Montgomery [894] gave several practical improvements to the ECA. Silverman and Wagstaff [1136] gave a practical analysis of the complexity of the ECA, and suggested optimal parameter selection and running-time guidelines. Lenstra and Manasse [753] implemented the ECA on a network of MicroVAX computers, and were successful in finding 35-decimal digit prime factors of large (at least 85 digit) composite integers. Later, Dixon and Lenstra [350] implemented the ECA on a 16K MasPar (massively parallel) SIMD (single instruction, multiple data) machine. The largest factor they found was a 40-decimal digit prime factor of an 89-digit composite integer. On November 26, 1995, Peter Montgomery reported finding a 47-decimal digit prime factor of the 99-digit composite integer 5^{256} + 1 with the ECA.

Hafner and McCurley [536] estimated the number of integers n ≤ x that can be factored with probability at least 1/2 using at most t arithmetic operations, by trial division and the elliptic curve algorithm. Pomerance and Sorenson [997] provided the analogous estimates for Pollard's p−1 algorithm and Williams' p+1 algorithm.

They conclude that for a given running time bound, both Pollard's p−1 and Williams' p+1 algorithms factor more integers than trial division, but fewer than the elliptic curve algorithm.

Pomerance [994] credits the idea of multiplying congruences to produce a solution to x^2 ≡ y^2 (mod n) for the purpose of factoring n (§3.2.5) to some old work of Kraitchik circa 1926-1929. The continued fraction factoring algorithm, first introduced by Lehmer and Powers [744] in 1931, and refined more than 40 years later by Morrison and Brillhart [908], was the first realization of a random square method to result in a subexponential-time algorithm. The algorithm was later analyzed by Pomerance [989] and conjectured to have an expected running time of L_n[1/2, √2]. If the smoothness testing in the algorithm is done with the elliptic curve method, then the expected running time drops to L_n[1/2, 1]. Morrison and Brillhart were also the first to use the idea of a factor base to test for good (a_i, b_i) pairs. The continued fraction algorithm was the champion of factoring algorithms from the mid 1970s until the early 1980s, when it was surpassed by the quadratic sieve algorithm.

The quadratic sieve (QS) (§3.2.6) was discovered by Pomerance [989, 990]. The multiple polynomial variant of the quadratic sieve (Note 3.25) is due to P. Montgomery, and is described by Pomerance [990]; see also Silverman [1135]. A detailed practical analysis of the QS is given by van Oorschot [1203]. Several practical improvements to the original algorithms have subsequently been proposed and successfully implemented. The first serious implementation of the QS was by Gerver [448], who factored a 47-decimal digit number. In 1984, Davis, Holdridge, and Simmons [311] factored a 71-decimal digit number with the QS. In 1988, Lenstra and Manasse [753] used the QS to factor a 106-decimal digit number by distributing the computations to hundreds of computers by electronic mail; see also Lenstra and Manasse [754].

In 1993, the QS was used by Denny et al. [333] to factor a 120-decimal digit number. In 1994, the 129-decimal digit (425 bit) RSA-129 challenge number (see Gardner [440]) was factored by Atkins et al. [59] by enlisting the help of about 1600 computers around the world. The factorization was carried out in 8 months. Table 3.3 shows the estimated time taken, in mips years, for the above factorizations. A mips year is equivalent to the computational power of a computer that is rated at 1 mips (million instructions per second) and utilized for one year, or, equivalently, about 3 · 10^13 instructions.

Year   Number of digits   mips years
1984          71               0.1
1988         106               140
1993         120               825
1994         129              5000

Table 3.3: Running time estimates for numbers factored with QS.

The number field sieve was first proposed by Pollard [987] and refined by others. Lenstra et al. [752] described the special number field sieve (SNFS) for factoring integers of the form r^e − s for small positive r and |s|. A readable introduction to the algorithm is provided by Pomerance [995]. A detailed report of an SNFS implementation is given by Lenstra et al. [751]. This implementation was used to factor the ninth Fermat number F_9 = 2^{512} + 1, which is the product of three prime factors having 7, 49, and 99 decimal digits. The general number field sieve (GNFS) was introduced by Buhler, Lenstra, and Pomerance [219].

Coppersmith [269] proposed modifications to the GNFS which improve its running time to L_n[1/3, 1.902]; however, the method is not practical. Another modification (also impractical) allows a precomputation taking L_n[1/3, 2.007] time and L_n[1/3, 1.639] storage, following which all integers in a large range of values can be factored in L_n[1/3, 1.639] time. A detailed report of a GNFS implementation on a massively parallel computer with 16384 processors is given by Bernstein and Lenstra [122]. See also Buchmann, Loho, and Zayer [217], and Golliver, Lenstra, and McCurley [493]. More recently, Dodson and Lenstra [356] reported on their GNFS implementation which was successful in factoring a 119-decimal digit number using about 250 mips years of computing power. They estimated that this factorization completed about 2.5 times faster than it would with the quadratic sieve. Most recently, Lenstra [746] announced the factorization of the 130-decimal digit RSA-130 challenge number using the GNFS. This number is the product of two 65-decimal digit primes. The factorization was estimated to have taken about 500 mips years of computing power (compare with Table 3.3). The book edited by Lenstra and Lenstra [749] contains several other articles related to the number field sieve.

The ECA, continued fraction algorithm, quadratic sieve, special number field sieve, and general number field sieve have heuristic (or conjectured) rather than proven running times because the analyses make (reasonable) assumptions about the proportion of integers generated that are smooth. See Canfield, Erdös, and Pomerance [231] for bounds on the proportion of y-smooth integers in the interval [2, x]. Dixon's algorithm [351] was the first rigorously analyzed subexponential-time algorithm for factoring integers. The fastest rigorously analyzed algorithm currently known is due to Lenstra and Pomerance [759] with an expected running time of L_n[1/2, 1]. These algorithms are of theoretical interest only, as they do not appear to be practical.

§3.3
The RSA problem was introduced in the landmark 1977 paper by Rivest, Shamir, and Adleman [1060].

§3.4
The quadratic residuosity problem is of much historical interest, and was one of the main algorithmic problems discussed by Gauss [444].

§3.5
An extensive treatment of the problem of finding square roots modulo a prime p, or more generally, the problem of finding d-th roots in a finite field, can be found in Bach and Shallit [70]. The presentation of Algorithm 3.34 for finding square roots modulo a prime is derived from Koblitz [697, pp.48-49]; a proof of correctness can be found there. Bach and Shallit attribute the essential ideas of Algorithm 3.34 to an 1891 paper by A. Tonelli. Algorithm 3.39 is from Bach and Shallit [70], who attribute it to a 1903 paper of M. Cipolla. The computational equivalence of computing square roots modulo a composite n and factoring n (Fact 3.46 and Note 3.47) was first discovered by Rabin [1023].

§3.6
A survey of the discrete logarithm problem is given by McCurley [827]. See also Odlyzko [942] for a survey of recent advances.

Knuth [693] attributes the baby-step giant-step algorithm (Algorithm 3.56) to D. Shanks. The baby-step giant-step algorithms for searching restricted exponent spaces (cf. Note 3.59) are described by Heiman [546].

Suppose that p is a k-bit prime, and that only exponents of Hamming weight t are used. Coppersmith (personal communication, July 1995) observed that this exponent space can be searched in k · (k/2 choose t/2) steps by dividing the exponent into two equal pieces so that the Hamming weight of each piece is t/2; if k is much smaller than 2^{t/2}, this is an improvement over Note 3.59.

Pollard's rho algorithm for logarithms (Algorithm 3.60) is due to Pollard [986]. Pollard also presented a lambda method for computing discrete logarithms which is applicable when x, the logarithm sought, is known to lie in a certain interval. More specifically, if the interval is of width w, the method is expected to take O(√w) group operations and requires storage for only O(lg w) group elements. Van Oorschot and Wiener [1207] showed how Pollard's rho algorithm can be parallelized so that using m processors results in a speedup by a factor of m. This has particular significance to cyclic groups such as elliptic curve groups, for which no subexponential-time discrete logarithm algorithm is known.

The Pohlig-Hellman algorithm (Algorithm 3.63) was discovered by Pohlig and Hellman [982]. A variation which represents the logarithm in a mixed-radix notation and does not use the Chinese remainder theorem was given by Thiong Ly [1190].

According to McCurley [827], the basic ideas behind the index-calculus algorithm (Algorithm 3.68) first appeared in the work of Kraitchik (circa 1922-1924) and of Cunningham (see Western and Miller [1236]), and were rediscovered by several authors. Adleman [8] described the method for the group Z_p^* and analyzed the complexity of the algorithm. Hellman and Reyneri [555] gave the first description of an index-calculus algorithm for extension fields F_{p^m} with p fixed.

Coppersmith, Odlyzko, and Schroeppel [280] presented three variants of the index-calculus method for computing logarithms in Z_p^*: the linear sieve, the residue list sieve, and the Gaussian integer method. Each has a heuristic expected running time of L_p[1/2, 1] (cf. Note 3.71). The Gaussian integer method, which is related to the method of ElGamal [369], was implemented in 1990 by LaMacchia and Odlyzko [736] and was successful in computing logarithms in Z_p^* with p a 192-bit prime. The paper concludes that it should be feasible to compute discrete logarithms modulo primes of about 332 bits (100 decimal digits) using the Gaussian integer method. Gordon [510] adapted the number field sieve for factoring integers to the problem of computing logarithms in Z_p^*; his algorithm has a heuristic expected running time of L_p[1/3, c], where c = 3^{2/3} ≈ 2.080. Schirokauer [1092] subsequently presented a modification of Gordon's algorithm that has a heuristic expected running time of L_p[1/3, c], where c = (64/9)^{1/3} ≈ 1.923 (Note 3.72). This is the same running time as conjectured for the number field sieve for factoring integers (see §3.2.7). Recently, Weber [1232] implemented the algorithms of Gordon and Schirokauer and was successful in computing logarithms in Z_p^*, where p is a 40-decimal digit prime such that p−1 is divisible by a 38-decimal digit (127-bit) prime. More recently, Weber, Denny, and Zayer (personal communication, April 1996) announced the solution of a discrete logarithm problem modulo a 75-decimal digit (248-bit) prime p with (p−1)/2 prime.

Blake et al. [145] made improvements to the index-calculus technique for F_{2^m}^* and computed logarithms in F_{2^127}^*. Coppersmith [266] dramatically improved the algorithm and showed that under reasonable assumptions the expected running time of his improved algorithm is L_{2^m}[1/3, c] for some constant c < 1.587 (Note 3.72).

logarithm problem in F*_{2^m}. A similar practical analysis was also given by van Oorschot [1203]. Most recently in 1992, Gordon and McCurley [511] reported on their massively parallel implementation of Coppersmith’s algorithm, combined with their own improvements. Using primarily a 1024-processor nCUBE-2 machine with 4 megabytes of memory per processor, they completed the precomputation of logarithms of factor base elements (which is the dominant step of the algorithm) required to compute logarithms in F*_{2^227}, F*_{2^313}, and F*_{2^401}. The calculations for F*_{2^401} were estimated to take 5 days. Gordon and McCurley also completed most of the precomputations required for computing logarithms in F*_{2^503}; the amount of time to complete this task on the 1024-processor nCUBE-2 was estimated to be 44 days. They concluded that computing logarithms in the multiplicative groups of fields as large as F_{2^593} still seems to be out of their reach, but might be possible in the near future with

a concerted effort.

It was not until 1992 that a subexponential-time algorithm for computing discrete logarithms over all finite fields F_q was discovered by Adleman and DeMarrais [11]. The expected running time of the algorithm is conjectured to be L_q[1/2, c] for some constant c. Adleman [9] generalized the number field sieve from algebraic number fields to algebraic function fields, which resulted in an algorithm, called the function field sieve, for computing discrete logarithms in F*_{p^m}; the algorithm has a heuristic expected running time of L_{p^m}[1/3, c] for some constant c > 0 when log p ≤ m^{g(m)}, where g is any function such that 0 < g(m) < 0.98 and lim_{m→∞} g(m) = 0. The practicality of the function field sieve has not yet been determined. It remains an open problem to find an algorithm with a heuristic expected running time of L_q[1/3, c] for all finite fields F_q.

The algorithms mentioned in the previous three paragraphs have heuristic (or conjectured) rather

than proven running times because the analyses make some (reasonable) assumptions about the proportion of integers or polynomials generated that are smooth, and also because it is not clear when the system of linear equations generated has full rank, i.e., yields a unique solution. The best rigorously analyzed algorithms known for the discrete logarithm problem in Z*_p and F*_{2^m} are due to Pomerance [991], with expected running times of L_p[1/2, √2] and L_{2^m}[1/2, √2], respectively. Lovorn [773] obtained rigorously analyzed algorithms for the fields F_{p^2} and F_{p^m} with log p < m^{0.98}, having expected running times of L_{p^2}[1/2, 3/2] and L_{p^m}[1/2, √2], respectively.

The linear systems of equations collected in the quadratic sieve and number field sieve factoring algorithms, and the index-calculus algorithms for computing discrete logarithms in Z*_p and F*_{2^m}, are very large. For the problem sizes currently under consideration, these systems cannot be solved using ordinary

linear algebra techniques, due to both time and space constraints. However, the equations generated are extremely sparse, typically with at most 50 non-zero coefficients per equation. The technique of structured or so-called intelligent Gaussian elimination (see Odlyzko [940]) can be used to reduce the original sparse system to a much smaller system that is still fairly sparse. The resulting system can be solved using either ordinary Gaussian elimination, or one of the conjugate gradient, Lanczos (Coppersmith, Odlyzko, and Schroeppel [280]), or Wiedemann algorithms [1239], which were also designed to handle sparse systems. LaMacchia and Odlyzko [737] have implemented some of these algorithms and concluded that the linear algebra stages arising in both integer factorization and the discrete logarithm problem are not running-time bottlenecks in practice. Recently, Coppersmith [272] proposed a modification of the Wiedemann algorithm which allows parallelization of the algorithm; for an

analysis of Coppersmith’s algorithm, see Kaltofen [657]. Coppersmith [270] (see also Montgomery [896]) presented a modification of the Lanczos algorithm for solving sparse linear equations over F_2; this variant appears to be the most efficient in practice. As an example of the numbers involved, Gordon and McCurley’s [511] implementation for computing logarithms in F*_{2^401} produced a total of 117164 equations from a factor base consisting of the 58636 irreducible polynomials in F_2[x] of degree at most 19. The system of equations had 2068707 non-zero entries. Structured Gaussian elimination was then applied to this system, the result being a 16139 × 16139 system of equations having 1203414 non-zero entries, which was then solved using the conjugate gradient method. Another example is from the recent factorization of the RSA-129 number (see Atkins et al.

[59]). The sieving step produced a sparse matrix of 569466 rows and 524339 columns. Structured Gaussian elimination was used to reduce this to a dense 188614 × 188160 system, which was then solved using ordinary Gaussian elimination.

There are many ways of representing a finite field, although any two finite fields of the same order are isomorphic (see also Note 3.55). Lenstra [757] showed how to compute an isomorphism between any two explicitly given representations of a finite field in deterministic polynomial time. Thus, it is sufficient to find an algorithm for computing discrete logarithms in one representation of a given field; this algorithm can then be used, together with the isomorphism obtained by Lenstra’s algorithm, to compute logarithms in any other representation of the same field.

Menezes, Okamoto, and Vanstone [843] showed how the discrete logarithm problem for an elliptic curve over a finite field F_q can be reduced to the discrete logarithm problem in some extension

field F_{q^k}. For the special class of supersingular curves, k is at most 6, thus providing a subexponential-time algorithm for the former problem. This work was extended by Frey and Rück [422]. No subexponential-time algorithm is known for the discrete logarithm problem in the more general class of non-supersingular elliptic curves.

Adleman, DeMarrais, and Huang [12] presented a subexponential-time algorithm for finding logarithms in the jacobian of large genus hyperelliptic curves over finite fields. More precisely, there exists a number c, 0 < c ≤ 2.181, such that for all sufficiently large g ≥ 1 and all odd primes p with log p ≤ (2g + 1)^{0.98}, the expected running time of the algorithm for computing logarithms in the jacobian of a genus g hyperelliptic curve over Z_p is conjectured to be L_{p^{2g+1}}[1/2, c].

McCurley [826] invented a subexponential-time algorithm for the discrete logarithm problem in the class group of an imaginary quadratic number field. See also Hafner and

McCurley [537] for further details, and Buchmann and Düllmann [216] for an implementation report.

In 1994, Shor [1128] conceived randomized polynomial-time algorithms for computing discrete logarithms and factoring integers on a quantum computer, a computational device based on quantum mechanical principles; presently it is not known how to build a quantum computer, nor if this is even possible. Also recently, Adleman [10] demonstrated the feasibility of using tools from molecular biology to solve an instance of the directed Hamiltonian path problem, which is NP-complete. The problem instance was encoded in molecules of DNA, and the steps of the computation were performed with standard protocols and enzymes. Adleman notes that while the fastest supercomputers currently available can execute approximately 10^12 operations per second, it is plausible for a DNA computer to execute 10^20 or more operations per second. Moreover, such a DNA computer would be far more energy-efficient than

existing supercomputers. It is not clear at present whether it is feasible to build a DNA computer with such performance. However, should either quantum computers or DNA computers ever become practical, they would have a very significant impact on public-key cryptography.

§3.7
Fact 3.77(i) is due to den Boer [323]. Fact 3.77(iii) was proven by Maurer [817], who also proved more generally that the GDHP and GDLP in a group G of order n are computationally equivalent when certain extra information of length O(lg n) bits is given. The extra information depends only on n and not on the definition of G, and consists of parameters that define cyclic elliptic curves of smooth order over the fields Z_{p_i}, where the p_i are the prime divisors of n. Waldvogel and Massey [1228] proved that if a and b are chosen uniformly and randomly from the interval {0, 1, . . . , p − 1}, the values α^{ab}

mod p are roughly uniformly distributed (see page 537).

§3.8
Facts 3.78 and 3.79 are due to Bach [62]. Fact 3.80 is due to Shmuely [1127]. McCurley [825] refined this result to prove that for specially chosen composite n, the ability to solve the Diffie-Hellman problem in Z*_n for the fixed base α = 16 implies the ability to factor n.

§3.9
The notion of a hard Boolean predicate (Definition 3.81) was introduced by Blum and Micali [166], who also proved Fact 3.84. The notion of a hard k-bit predicate (Definition 3.82) was introduced by Long and Wigderson [772], who also proved Fact 3.85; see also Peralta [968]. Fact 3.83 is due to Peralta [968]. The results on hard predicates and k-bit predicates for the RSA functions (Facts 3.86 and 3.87) are due to Alexi et al. [23]. Facts 3.88 and 3.89 are due to Vazirani and Vazirani [1218]. Yao [1258] showed how any one-way length-preserving permutation can be transformed into a more complicated one-way length-preserving permutation which has a hard

predicate. Subsequently, Goldreich and Levin [471] showed how any one-way function f can be transformed into a one-way function g which has a hard predicate. Their construction is as follows. Define the function g by g(p, x) = (p, f(x)), where p is a binary string of the same length as x, say n. Then g is also a one-way function, and B(p, x) = Σ_{i=1}^{n} p_i x_i mod 2 is a hard predicate for g. Håstad, Schrift, and Shamir [543] considered the one-way function f(x) = α^x mod n, where n is a Blum integer and α ∈ Z*_n. Under the assumption that factoring Blum integers is intractable, they proved that all the bits of this function are individually hard. Moreover, the lower half as well as the upper half of the bits are simultaneously secure.
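The Goldreich-Levin construction is concrete enough to state in a few lines of code. The following Python sketch is ours and purely illustrative: the stand-in f (a hash of the input bits) merely plays the role of an arbitrary one-way function, and all names are hypothetical.

```python
import hashlib

def inner_product_predicate(p_bits, x_bits):
    # B(p, x) = sum of p_i * x_i mod 2, the hard predicate for g
    return sum(pi & xi for pi, xi in zip(p_bits, x_bits)) & 1

def g(f, p_bits, x_bits):
    # g(p, x) = (p, f(x)); g is one-way whenever f is
    return (tuple(p_bits), f(x_bits))

# toy stand-in for a one-way function, for illustration only
f = lambda bits: hashlib.sha256(bytes(bits)).hexdigest()

p = [1, 0, 1, 1, 0, 1, 0, 0]
x = [0, 1, 1, 0, 1, 0, 0, 1]
print(g(f, p, x), inner_product_predicate(p, x))
```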

§3.10
The subset sum problem (Definition 3.90) is sometimes confused with the knapsack problem, which is the following: given two sets {a_1, a_2, . . . , a_n} and {b_1, b_2, . . . , b_n} of positive integers, and given two positive integers s and t, determine whether or not there is a subset S of {1, 2, . . . , n} such that Σ_{i∈S} a_i ≤ s and Σ_{i∈S} b_i ≥ t. The subset sum problem is actually a special case of the knapsack problem when a_i = b_i for i = 1, 2, . . . , n and s = t. Algorithm 3.94 is described by Odlyzko [941].

The L³-lattice basis reduction algorithm (Algorithm 3.101) and Fact 3.103 are both due to Lenstra, Lenstra, and Lovász [750]. Improved algorithms have been given for lattice basis reduction, for example, by Schnorr and Euchner [1099]; consult also Section 2.6 of Cohen [263]. Algorithm 3.105 for solving the subset sum problem involving knapsack sets of low density is from Coster et al. [283]. Unusually good simultaneous diophantine approximations were first introduced and studied by Lagarias [723]; Fact 3.107 and Algorithm 3.108 are from this paper.

§3.11
A readable introduction to polynomial

factorization algorithms is given by Lidl and Niederreiter [764, Chapter 4]. Algorithm 3.110 for square-free factorization is from Geddes, Czapor, and Labahn [445]. Yun [1261] presented an algorithm that is more efficient than Algorithm 3.110 for finding the square-free factorization of a polynomial. The running time of the algorithm is only O(n^2) Z_p-operations when f(x) is a polynomial of degree n in Z_p[x]. A lucid presentation of Yun’s algorithm is provided by Bach and Shallit [70].

Berlekamp’s Q-matrix algorithm (Algorithm 3.111) was first discovered by Prange [999] for the purpose of factoring polynomials of the form x^n − 1 over finite fields. The algorithm was later and independently discovered by Berlekamp [117], who improved it for factoring general polynomials over finite fields. There is no deterministic polynomial-time algorithm known for the problem of factoring polynomials over finite fields. There are, however, many efficient randomized algorithms that work well even

when the underlying field is very large, such as the algorithms given by Ben-Or [109], Berlekamp [119], Cantor and Zassenhaus [232], and Rabin [1025]. For recent work along these lines, see von zur Gathen and Shoup [1224], as well as Kaltofen and Shoup [658].

Chapter 4
Public-Key Parameters

Contents in Brief
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.2 Probabilistic primality tests . . . . . . . . . . . . . . . . . . . 135
4.3 (True) Primality tests . . . . . . . . . . . . . . . . . . . . . . 142
4.4 Prime number generation . . . . . . . . . . . . . . . . . . . . 145
4.5 Irreducible polynomials over Z_p . . . . . . . . . . . . . . . . 154
4.6 Generators and elements of high order . . . . . . . . . . . . . 160
4.7 Notes and further references . . . . . . . . . . . . . . . . . . . 165

4.1 Introduction

The efficient generation of public-key parameters is a prerequisite in

public-key systems. A specific example is the requirement of a prime number p to define a finite field Z_p for use in the Diffie-Hellman key agreement protocol and its derivatives (§12.6); in this case, an element of high order in Z*_p is also required. Another example is the requirement of primes p and q for an RSA modulus n = pq (§8.2). In this case, the primes must be of sufficient size, and be “random” in the sense that the probability of any particular prime being selected must be sufficiently small to preclude an adversary from gaining advantage through optimizing a search strategy based on such probability. Prime numbers may be required to have certain additional properties, in order that they do not make the associated cryptosystems susceptible to specialized attacks. A third example is the requirement of an irreducible polynomial f(x) of degree m over the finite field Z_p for constructing the finite field F_{p^m}. In this case, an element of high order in F*_{p^m} is also

required.

Chapter outline

The remainder of §4.1 introduces basic concepts relevant to prime number generation and summarizes some results on the distribution of prime numbers. Probabilistic primality tests, the most important of which is the Miller-Rabin test, are presented in §4.2. True primality tests by which arbitrary integers can be proven to be prime are the topic of §4.3; since these tests are generally more computationally intensive than probabilistic primality tests, they are not described in detail. §4.4 presents four algorithms for generating prime numbers, strong primes, and provable primes. §4.5 describes techniques for constructing irreducible and primitive polynomials, while §4.6 considers the production of generators and elements of high order in groups. §4.7 concludes with chapter notes and references.

4.1.1 Approaches to generating large prime numbers

To motivate the organization of this chapter and introduce many of the

relevant concepts, the problem of generating large prime numbers is first considered. The most natural method is to generate a random number n of appropriate size, and check if it is prime. This can be done by checking whether n is divisible by any of the prime numbers ≤ √n. While more efficient methods are required in practice, to motivate further discussion consider the following approach (a sketch of which follows below):
1. Generate as candidate a random odd number n of appropriate size.
2. Test n for primality.
3. If n is composite, return to the first step.
A slight modification is to consider candidates restricted to some search sequence starting from n; a trivial search sequence which may be used is n, n + 2, n + 4, n + 6, . . . . Using specific search sequences may allow one to increase the expectation that a candidate is prime, and to find primes possessing certain additional desirable properties a priori.
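The following minimal Python sketch, ours and purely illustrative, implements this generate-and-test loop with the naive divisibility check; it is only workable for small bitlengths, which is precisely why the more efficient tests of §4.2 are needed in practice.

```python
import random

def is_prime_trial_division(n):
    # check divisibility by every odd d <= sqrt(n) (subsumes all primes <= sqrt(n))
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def naive_prime_search(k):
    # steps 1-3: pick random odd k-bit candidates until one is prime
    while True:
        n = random.getrandbits(k) | (1 << (k - 1)) | 1   # force k bits, force odd
        if is_prime_trial_division(n):
            return n

print(naive_prime_search(32))   # feasible for 32 bits, hopeless for 512
```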

In step 2, the test for primality might be either a test which proves that the candidate is prime (in which case the outcome of the generator is called a provable prime), or a test which establishes a weaker result, such as that n is “probably prime” (in which case the outcome of the generator is called a probable prime). In the latter case, careful consideration must be given to the exact meaning of this expression. Most so-called probabilistic primality tests are absolutely correct when they declare candidates n to be composite, but do not provide a mathematical proof that n is prime in the case when such a number is declared to be “probably” so. In the latter case, however, when used properly one may often be able to draw conclusions more than adequate for the purpose at hand. For this reason, such tests are more properly called compositeness tests than probabilistic primality tests. True primality tests, which allow one to conclude with mathematical certainty that a number is prime, also exist, but generally require considerably greater computational resources. While (true)

primality tests can determine (with mathematical certainty) whether a typically random candidate number is prime, other techniques exist whereby candidates n are specially constructed such that it can be established by mathematical reasoning whether a candidate actually is prime. These are called constructive prime generation techniques.

A final distinction between different techniques for prime number generation is the use of randomness. Candidates are typically generated as a function of a random input. The technique used to judge the primality of the candidate, however, may or may not itself use random numbers. If it does not, the technique is deterministic, and the result is reproducible; if it does, the technique is said to be randomized. Both deterministic and randomized probabilistic primality tests exist.

In some cases, prime numbers are required which have additional properties. For example, to make the extraction of discrete logarithms in Z*_p resistant to an algorithm due to

Pohlig and Hellman (§3.6.4), it is a requirement that p − 1 have a large prime divisor. Thus techniques for generating public-key parameters, such as prime numbers, of special form need to be considered.

4.1.2 Distribution of prime numbers

Let π(x) denote the number of primes in the interval [2, x]. The prime number theorem (Fact 2.95) states that π(x) ∼ x/ln x. (If f(x) and g(x) are two functions, then f(x) ∼ g(x) means that lim_{x→∞} f(x)/g(x) = 1.) In other words, the number of primes in the interval [2, x] is approximately equal to x/ln x. The prime numbers are quite uniformly distributed, as the following three results illustrate.

4.1 Fact (Dirichlet theorem) If gcd(a, n) = 1, then there are infinitely many primes congruent to a modulo n.

A more explicit version of Dirichlet’s theorem is the following.

4.2 Fact Let π(x, n, a) denote the number of primes in the interval [2, x] which are congruent to a modulo n, where gcd(a, n) = 1. Then π(x, n, a) ∼ x/(φ(n) ln x). In other words, the prime numbers are roughly uniformly distributed among the φ(n) congruence classes in Z*_n, for any value of n.

4.3 Fact (approximation for the nth prime number) Let p_n denote the nth prime number. Then p_n ∼ n ln n. More explicitly, n ln n < p_n < n(ln n + ln ln n) for n ≥ 6.
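The quality of the x/ln x approximation is easy to check numerically. The short Python sketch below is our own illustration: it counts primes with a simple sieve and compares the count against the prime number theorem estimate.

```python
from math import log

def prime_pi(x):
    # count the primes <= x with a Sieve of Eratosthenes
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

x = 10 ** 6
print(prime_pi(x), round(x / log(x)))   # 78498 versus 72382: pi(x) ~ x/ln x
```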

4.2 Probabilistic primality tests

The algorithms in this section are methods by which arbitrary positive integers are tested to provide partial information regarding their primality. More specifically, probabilistic primality tests have the following framework. For each odd positive integer n, a set W(n) ⊂ Z_n is defined such that the following properties hold:
(i) given a ∈ Z_n, it can be checked in deterministic polynomial time whether a ∈ W(n);
(ii) if n is prime, then W(n) = ∅ (the empty set); and
(iii) if n is composite, then #W(n) ≥ n/2.

4.4 Definition If n is composite, the elements of W(n) are called witnesses to the compositeness of n, and the elements of the complementary set L(n) = Z_n − W(n) are called liars.

A probabilistic primality test utilizes these properties of the sets W(n) in the following manner. Suppose that n is an integer whose primality is to be determined. An integer a ∈ Z_n is chosen at random, and it is checked whether a ∈ W(n). The test outputs “composite” if a ∈ W(n), and outputs “prime” if a ∉ W(n). If indeed a ∈ W(n), then n is said to fail the primality test for the base a; in this case, n is surely composite. If a ∉ W(n), then n is said to pass the primality test for the base a; in this case, no conclusion with absolute certainty can be drawn about the primality of n, and the declaration “prime” may be incorrect. (This discussion illustrates why a probabilistic primality test is more properly called a compositeness test.) Any single execution of this test which declares “composite” establishes this with certainty. On the other hand, successive independent runs of the test all of

which return the answer “prime” allow the confidence that the input is indeed prime to be increased to whatever level is desired; the cumulative probability of error is multiplicative over independent trials. If the test is run t times independently on the composite number n, the probability that n is declared “prime” all t times (i.e., the probability of error) is at most (1/2)^t.

4.5 Definition An integer n which is believed to be prime on the basis of a probabilistic primality test is called a probable prime.

Two probabilistic primality tests are covered in this section: the Solovay-Strassen test (§4.2.2) and the Miller-Rabin test (§4.2.3). For historical reasons, the Fermat test is first discussed in §4.2.1; this test is not truly a probabilistic

primality test since it usually fails to distinguish between prime numbers and special composite integers called Carmichael numbers.

4.2.1 Fermat’s test

Fermat’s theorem (Fact 2.127) asserts that if n is a prime and a is any integer, 1 ≤ a ≤ n − 1, then a^{n−1} ≡ 1 (mod n). Therefore, given an integer n whose primality is under question, finding any integer a in this interval such that this equivalence is not true suffices to prove that n is composite.

4.6 Definition Let n be an odd composite integer. An integer a, 1 ≤ a ≤ n − 1, such that a^{n−1} ≢ 1 (mod n) is called a Fermat witness (to compositeness) for n.

Conversely, finding an integer a between 1 and n − 1 such that a^{n−1} ≡ 1 (mod n) makes n appear to be a prime in the sense that it satisfies Fermat’s theorem for the base a. This motivates the following definition and Algorithm 4.9.

4.7 Definition Let n be an odd composite integer and let a be an integer, 1 ≤ a ≤ n − 1. Then n is said to be a

pseudoprime to the base a if a^{n−1} ≡ 1 (mod n). The integer a is called a Fermat liar (to primality) for n.

4.8 Example (pseudoprime) The composite integer n = 341 (= 11 × 31) is a pseudoprime to the base 2 since 2^340 ≡ 1 (mod 341).

4.9 Algorithm Fermat primality test

FERMAT(n,t)
INPUT: an odd integer n ≥ 3 and security parameter t ≥ 1.
OUTPUT: an answer “prime” or “composite” to the question: “Is n prime?”
1. For i from 1 to t do the following:
   1.1 Choose a random integer a, 2 ≤ a ≤ n − 2.
   1.2 Compute r = a^{n−1} mod n using Algorithm 2.143.
   1.3 If r ≠ 1 then return(“composite”).
2. Return(“prime”).
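A direct Python rendering of Algorithm 4.9 follows; it is a sketch of ours (with the random base drawn from [2, n − 2], so odd n ≥ 5 is assumed) rather than a production implementation.

```python
import random

def fermat(n, t):
    # Algorithm 4.9: "composite" is certain, "prime" only means probably prime
    for _ in range(t):
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) != 1:    # a is a Fermat witness for n
            return "composite"
    return "prime"

print(pow(2, 340, 341))   # 1, so 341 = 11*31 is a pseudoprime to the base 2
print(fermat(341, 5))     # likely "composite": most bases expose 341
```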

If Algorithm 4.9 declares “composite”, then n is certainly composite. On the other hand, if the algorithm declares “prime”, then no proof is provided that n is indeed prime. Nonetheless, since pseudoprimes for a given base a are known to be rare, Fermat’s test provides a correct answer on most inputs; this, however, is quite distinct from providing a correct answer most of the time (e.g., if run with different bases) on every input. In fact, it does not do the latter because there are (even rarer) composite numbers which are pseudoprimes to every base a for which gcd(a, n) = 1.

4.10 Definition A Carmichael number n is a composite integer such that a^{n−1} ≡ 1 (mod n) for all integers a which satisfy gcd(a, n) = 1.

If n is a Carmichael number, then the only Fermat witnesses for n are those integers a, 1 ≤ a ≤ n − 1, for which gcd(a, n) > 1. Thus, if the prime factors of n are all large, then with high probability the Fermat test declares that n is “prime”, even if the number of iterations t is large. This deficiency in the Fermat test is removed in the Solovay-Strassen and Miller-Rabin probabilistic primality tests by relying on criteria which are stronger than Fermat’s theorem. This subsection is

concluded with some facts about Carmichael numbers. If the prime factorization of n is known, then Fact 4.11 can be used to easily determine whether n is a Carmichael number; a small sketch follows Fact 4.13 below.

4.11 Fact (necessary and sufficient conditions for Carmichael numbers) A composite integer n is a Carmichael number if and only if the following two conditions are satisfied:
(i) n is square-free, i.e., n is not divisible by the square of any prime; and
(ii) p − 1 divides n − 1 for every prime divisor p of n.

A consequence of Fact 4.11 is the following.

4.12 Fact Every Carmichael number is the product of at least three distinct primes.

4.13 Fact (bounds for the number of Carmichael numbers)
(i) There are an infinite number of Carmichael numbers. In fact, there are more than n^{2/7} Carmichael numbers in the interval [2, n], once n is sufficiently large.
(ii) The best upper bound known for C(n), the number of Carmichael numbers ≤ n, is C(n) ≤ n^{1−{1+o(1)} ln ln ln n / ln ln n} for n → ∞.
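As a small illustration of Fact 4.11 (the helper below is our own, not from the text), the following Python function takes n together with its distinct prime factors and applies the two conditions directly.

```python
def is_carmichael(n, prime_factors):
    # Fact 4.11: n must be composite and square-free, and p - 1 must divide
    # n - 1 for every prime p | n; prime_factors lists the distinct primes of n
    product = 1
    for p in prime_factors:
        product *= p
    if product != n or len(prime_factors) < 2:   # not square-free, or not composite
        return False
    return all((n - 1) % (p - 1) == 0 for p in prime_factors)

print(is_carmichael(561, [3, 11, 17]))   # True: the smallest Carmichael number
print(is_carmichael(341, [11, 31]))      # False: 30 does not divide 340
```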

The smallest Carmichael number is n = 561 = 3 × 11 × 17. Carmichael numbers are relatively scarce; there are only 105212 Carmichael numbers ≤ 10^15.

4.2.2 Solovay-Strassen test

The Solovay-Strassen probabilistic primality test was the first such test popularized by the advent of public-key cryptography, in particular the RSA cryptosystem. There is no longer any reason to use this test, because an alternative is available (the Miller-Rabin test) which is both more efficient and always at least as correct (see Note 4.33). Discussion is nonetheless included for historical completeness and to clarify this exact point, since many people continue to reference this test.

Recall (§2.4.5) that (a/n) denotes the Jacobi symbol, and is equivalent to the Legendre symbol if n is prime. The Solovay-Strassen test is based on the following fact.

4.14 Fact (Euler’s criterion) Let n be an odd prime. Then a^{(n−1)/2} ≡ (a/n) (mod n) for all integers a which satisfy gcd(a, n) = 1.

Fact 4.14 motivates the following

definitions.

4.15 Definition Let n be an odd composite integer and let a be an integer, 1 ≤ a ≤ n − 1.
(i) If either gcd(a, n) > 1 or a^{(n−1)/2} ≢ (a/n) (mod n), then a is called an Euler witness (to compositeness) for n.
(ii) Otherwise, i.e., if gcd(a, n) = 1 and a^{(n−1)/2} ≡ (a/n) (mod n), then n is said to be an Euler pseudoprime to the base a. (That is, n acts like a prime in that it satisfies Euler’s criterion for the particular base a.) The integer a is called an Euler liar (to primality) for n.

4.16 Example (Euler pseudoprime) The composite integer 91 (= 7 × 13) is an Euler pseudoprime to the base 9 since 9^45 ≡ 1 (mod 91) and (9/91) = 1.

Euler’s criterion (Fact 4.14) can be used as a basis for a probabilistic primality test because of the following result.

4.17 Fact Let n be an odd composite integer. Then at most φ(n)/2 of all the numbers a, 1 ≤

a ≤ n − 1, are Euler liars for n (Definition 4.15). Here, φ is the Euler phi function (Definition 2.100).

4.18 Algorithm Solovay-Strassen probabilistic primality test

SOLOVAY-STRASSEN(n,t)
INPUT: an odd integer n ≥ 3 and security parameter t ≥ 1.
OUTPUT: an answer “prime” or “composite” to the question: “Is n prime?”
1. For i from 1 to t do the following:
   1.1 Choose a random integer a, 2 ≤ a ≤ n − 2.
   1.2 Compute r = a^{(n−1)/2} mod n using Algorithm 2.143.
   1.3 If r ≠ 1 and r ≠ n − 1 then return(“composite”).
   1.4 Compute the Jacobi symbol s = (a/n) using Algorithm 2.149.
   1.5 If r ≢ s (mod n) then return(“composite”).
2. Return(“prime”).

If gcd(a, n) = d, then d is a divisor of r = a^{(n−1)/2} mod n. Hence, testing whether r ≠ 1 in step 1.3 eliminates the necessity of testing whether gcd(a, n) ≠ 1. If Algorithm 4.18 declares “composite”, then n is certainly composite because prime numbers do not violate Euler’s criterion (Fact 4.14).
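A Python sketch of Algorithm 4.18 follows; it is ours and illustrative only. Since the standard library has no Jacobi symbol, a textbook reciprocity-based routine of our own stands in for Algorithm 2.149.

```python
import random

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n >= 3, computed via quadratic reciprocity
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:            # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0   # 0 signals gcd(a, n) > 1

def solovay_strassen(n, t):
    # Algorithm 4.18 (odd n >= 5 assumed, so that [2, n - 2] is non-empty)
    for _ in range(t):
        a = random.randint(2, n - 2)
        r = pow(a, (n - 1) // 2, n)
        if r != 1 and r != n - 1:
            return "composite"
        if r != jacobi(a, n) % n:    # compare r with (a/n) modulo n
            return "composite"
    return "prime"

print(solovay_strassen(91, 5))   # almost certainly "composite" (91 = 7 * 13)
```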

Equivalently, if n is actually prime, then the algorithm always declares “prime”. On the other hand, if n is actually composite, then since the bases a in step 1.1 are chosen independently during each iteration of step 1, Fact 4.17 can be used to deduce the following probability of the algorithm erroneously declaring “prime”.

4.19 Fact (Solovay-Strassen error-probability bound) Let n be an odd composite integer. The probability that SOLOVAY-STRASSEN(n,t) declares n to be “prime” is less than (1/2)^t.

4.2.3 Miller-Rabin test

The probabilistic primality test used most in practice is the Miller-Rabin test, also known as the strong pseudoprime test. The test is based on the following fact.

4.20 Fact Let n be an odd prime, and let n − 1 = 2^s r where r is odd. Let a be any integer such that gcd(a, n) = 1. Then either a^r ≡ 1 (mod n) or a^{2^j r} ≡ −1 (mod n) for some j, 0 ≤ j ≤ s − 1.

Fact 4.20 motivates the following definitions.

4.21 Definition Let n be an odd composite integer and let n − 1 = 2^s r where r is odd. Let a be an integer in the interval [1, n − 1].
(i) If a^r ≢ 1 (mod n) and if a^{2^j r} ≢ −1 (mod n) for all j, 0 ≤ j ≤ s − 1, then a is called a strong witness (to compositeness) for n.
(ii) Otherwise, i.e., if either a^r ≡ 1 (mod n) or a^{2^j r} ≡ −1 (mod n) for some j, 0 ≤ j ≤ s − 1, then n is said to be a strong pseudoprime to the base a. (That is, n acts like a prime in that it satisfies Fact 4.20 for the particular base a.) The integer a is called a strong liar (to primality) for n.

4.22 Example (strong pseudoprime) Consider the composite integer n = 91 (= 7 × 13). Since 91 − 1 = 90 = 2 × 45, s = 1 and r = 45. Since 9^r = 9^45 ≡ 1 (mod 91), 91 is a strong pseudoprime to the base 9. The set of all strong liars for 91 is
{1, 9, 10, 12, 16, 17, 22, 29, 38, 53, 62, 69, 74, 75, 79, 81, 82, 90}.
Notice that the

number of strong liars for 91 is 18 = φ(91)/4, where φ is the Euler phi function (cf. Fact 4.23).

Fact 4.20 can be used as a basis for a probabilistic primality test due to the following result.

4.23 Fact If n is an odd composite integer, then at most 1/4 of all the numbers a, 1 ≤ a ≤ n − 1, are strong liars for n. In fact, if n ≠ 9, the number of strong liars for n is at most φ(n)/4, where φ is the Euler phi function (Definition 2.100).

4.24 Algorithm Miller-Rabin probabilistic primality test

MILLER-RABIN(n,t)
INPUT: an odd integer n ≥ 3 and security parameter t ≥ 1.
OUTPUT: an answer “prime” or “composite” to the question: “Is n prime?”
1. Write n − 1 = 2^s r such that r is odd.
2. For i from 1 to t do the following:
   2.1 Choose a random integer a, 2 ≤ a ≤ n − 2.
   2.2 Compute y = a^r mod n using Algorithm 2.143.
   2.3 If y ≠ 1 and y ≠ n − 1 then do the following:
       j←1.
       While j ≤ s − 1 and y ≠ n − 1 do the following:
           Compute y←y^2 mod n.
           If y = 1 then return(“composite”).

           j←j + 1.
       If y ≠ n − 1 then return(“composite”).
3. Return(“prime”).

Algorithm 4.24 tests whether each base a satisfies the conditions of Definition 4.21(i). In the fifth line of step 2.3, if y = 1, then a^{2^j r} ≡ 1 (mod n). Since it is also the case that a^{2^{j−1} r} ≢ ±1 (mod n), it follows from Fact 3.18 that n is composite (in fact gcd(a^{2^{j−1} r} − 1, n) is a non-trivial factor of n). In the seventh line of step 2.3, if y ≠ n − 1, then a is a strong witness for n. If Algorithm 4.24 declares “composite”, then n is certainly composite because prime numbers do not violate Fact 4.20. Equivalently, if n is actually prime, then the algorithm always declares “prime”. On the other hand, if n is actually composite, then Fact 4.23 can be used to deduce the following probability of the algorithm erroneously declaring “prime”.
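The following Python sketch of Algorithm 4.24 (ours; odd n ≥ 5 assumed) exploits Python’s built-in modular exponentiation. It declares “composite” whenever the squaring chain fails to reach n − 1, which yields the same verdict as the book’s early exit on y = 1.

```python
import random

def miller_rabin(n, t):
    # write n - 1 = 2^s * r with r odd (step 1)
    r, s = n - 1, 0
    while r % 2 == 0:
        r //= 2
        s += 1
    for _ in range(t):                      # step 2
        a = random.randint(2, n - 2)
        y = pow(a, r, n)
        if y != 1 and y != n - 1:
            for _ in range(s - 1):
                y = y * y % n
                if y == n - 1:
                    break                   # n is a strong pseudoprime to base a
            else:
                return "composite"          # a is a strong witness for n
    return "prime"

print(miller_rabin(91, 5))    # "composite" with probability > 1 - (1/4)^5
print(miller_rabin(97, 5))    # "prime" (97 is indeed prime)
```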

4.25 Fact (Miller-Rabin error-probability bound) For any odd composite integer n, the probability that MILLER-RABIN(n,t) declares n to be “prime” is less than (1/4)^t.

4.26 Remark (number of strong liars) For most composite integers n, the number of strong liars for n is actually much smaller than the upper bound of φ(n)/4 given in Fact 4.23. Consequently, the Miller-Rabin error-probability bound is much smaller than (1/4)^t for most positive integers n.

4.27 Example (some composite integers have very few strong liars) The only strong liars for the composite integer n = 105 (= 3 × 5 × 7) are 1 and 104. More generally, if k ≥ 2 and n is the product of the first k odd primes, there are only 2 strong liars for n, namely 1 and n − 1.

4.28 Remark (fixed bases in Miller-Rabin) If a_1 and a_2 are strong liars for n, their product a_1 a_2 is very likely, but not certain, to also be a strong liar for n. A strategy that is sometimes employed is to fix the bases a in the Miller-Rabin

algorithm to be the first few primes (composite bases are ignored because of the preceding statement), instead of choosing them at random.

4.29 Definition Let p_1, p_2, . . . , p_t denote the first t primes. Then ψ_t is defined to be the smallest positive composite integer which is a strong pseudoprime to all the bases p_1, p_2, . . . , p_t.

The numbers ψ_t can be interpreted as follows: to determine the primality of any integer n < ψ_t, it is sufficient to apply the Miller-Rabin algorithm to n with the bases a being the first t prime numbers. With this choice of bases, the answer returned by Miller-Rabin is always correct. Table 4.1 gives the value of ψ_t for 1 ≤ t ≤ 8.

t   ψ_t
1   2047
2   1373653
3   25326001
4   3215031751
5   2152302898747
6   3474749660383
7   341550071728321
8   341550071728321

Table 4.1: Smallest strong pseudoprimes. The table lists values of ψ_t, the smallest positive composite integer that is a strong pseudoprime to each of the first t prime bases, for 1 ≤ t ≤ 8.
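Table 4.1 turns Miller-Rabin into a deterministic test below ψ_8. The sketch below is our own illustration of this use of Definition 4.29; the per-base check mirrors Fact 4.20.

```python
PSI_8 = 341550071728321   # psi_8 (= psi_7), from Table 4.1

def strong_probable_prime(n, a):
    # does n satisfy the condition of Fact 4.20 for the base a?
    r, s = n - 1, 0
    while r % 2 == 0:
        r //= 2
        s += 1
    y = pow(a, r, n)
    if y == 1 or y == n - 1:
        return True
    for _ in range(s - 1):
        y = y * y % n
        if y == n - 1:
            return True
    return False

def is_prime_below_psi8(n):
    # deterministic for odd 19 < n < PSI_8: no composite in that range is a
    # strong pseudoprime to all eight bases (Definition 4.29, Table 4.1)
    assert n % 2 == 1 and 19 < n < PSI_8
    return all(strong_probable_prime(n, a) for a in (2, 3, 5, 7, 11, 13, 17, 19))

print(is_prime_below_psi8(3215031751))   # False: psi_4 = 151 * 751 * 28351
```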

4.2.4 Comparison: Fermat, Solovay-Strassen, and Miller-Rabin

Fact 4.30 describes the relationships between Fermat liars, Euler liars, and strong liars (see Definitions 4.7, 4.15, and 4.21).

4.30 Fact Let n be an odd composite integer.
(i) If a is an Euler liar for n, then it is also a Fermat liar for n.
(ii) If a is a strong liar for n, then it is also an Euler liar for n.

4.31 Example (Fermat, Euler, strong liars) Consider the composite integer n = 65 (= 5 × 13). The Fermat liars for 65 are {1, 8, 12, 14, 18, 21, 27, 31, 34, 38, 44, 47, 51, 53, 57, 64}. The Euler liars for 65 are {1, 8, 14, 18, 47, 51, 57, 64}, while the strong liars for 65 are {1, 8, 18, 47, 57, 64}.

For a fixed composite candidate n, the situation is depicted in Figure 4.1.

[Figure 4.1: Relationships between Fermat, Euler, and strong liars for a composite integer n: the strong liars are contained in the Euler liars, which are contained in the Fermat liars.]

This settles the question of the relative accuracy of the Fermat, Solovay-Strassen, and Miller-Rabin tests, not only in the sense of the relative correctness of each test on a fixed candidate n, but also in the sense that given n, the specified containments hold for each randomly chosen base a. Thus, from a correctness point of view, the Miller-Rabin test is never worse than the Solovay-Strassen test, which in turn is never worse than the Fermat test. As the following result shows, there are, however, some composite integers n for which the Solovay-Strassen and Miller-Rabin tests are equally good.

4.32 Fact If n ≡ 3 (mod 4), then a is an Euler liar for n if and only if it is a strong liar for n.

What remains is a comparison of the computational costs. While the Miller-Rabin test may appear more complex, it actually requires, at worst, the same amount of computation as Fermat’s test in terms of modular multiplications; thus the Miller-Rabin test is better than Fermat’s

test in all regards. At worst, the sequence of computations defined in MILLER-RABIN(n,1) requires the equivalent of computing a^{(n−1)/2} mod n. It is also the case that MILLER-RABIN(n,1) requires less computation than SOLOVAY-STRASSEN(n,1), the latter requiring the computation of a^{(n−1)/2} mod n and possibly a further Jacobi symbol computation. For this reason, the Solovay-Strassen test is both computationally and conceptually more complex.

4.33 Note (Miller-Rabin is better than Solovay-Strassen) In summary, both the Miller-Rabin and Solovay-Strassen tests are correct in the event that either their input is actually prime, or that they declare their input composite. There is, however, no reason to use the Solovay-Strassen test (nor the Fermat test) over the Miller-Rabin test. The reasons for this are summarized below.
(i) The Solovay-Strassen test is computationally more expensive.
(ii) The Solovay-Strassen test is harder to implement since it also involves Jacobi symbol computations.

(iii) The error probability for Solovay-Strassen is bounded above by (1/2)^t, while the error probability for Miller-Rabin is bounded above by (1/4)^t.
(iv) Any strong liar for n is also an Euler liar for n. Hence, from a correctness point of view, the Miller-Rabin test is never worse than the Solovay-Strassen test.

4.3 (True) Primality tests

The primality tests in this section are methods by which positive integers can be proven to be prime, and are often referred to as primality proving algorithms. These primality tests are generally more computationally intensive than the probabilistic primality tests of §4.2. Consequently, before applying one of these tests to a candidate prime n, the candidate should be subjected to a probabilistic primality test such as Miller-Rabin (Algorithm 4.24).

4.34 Definition An integer n which is determined to be prime on the basis of a

primality proving algorithm is called a provable prime.

4.3.1 Testing Mersenne numbers

Efficient algorithms are known for testing primality of some special classes of numbers, such as Mersenne numbers and Fermat numbers. Mersenne primes n are useful because the arithmetic in the field Z_n for such n can be implemented very efficiently (see §14.3.4). The Lucas-Lehmer test for Mersenne numbers (Algorithm 4.37) is such an algorithm.

4.35 Definition Let s ≥ 2 be an integer. A Mersenne number is an integer of the form 2^s − 1. If 2^s − 1 is prime, then it is called a Mersenne prime.

The following are necessary and sufficient conditions for a Mersenne number to be prime.

4.36 Fact Let s ≥ 3. The Mersenne number n = 2^s − 1 is prime if and only if the following two conditions are satisfied:
(i) s is prime; and
(ii) the sequence of integers defined by u_0 = 4 and u_{k+1} = (u_k^2 − 2) mod n for k ≥ 0 satisfies u_{s−2} = 0.

Fact 4.36 leads to the following deterministic polynomial-time algorithm

for determining (with certainty) whether a Mersenne number is prime.

4.37 Algorithm Lucas-Lehmer primality test for Mersenne numbers

INPUT: a Mersenne number n = 2^s − 1 with s ≥ 3.
OUTPUT: an answer “prime” or “composite” to the question: “Is n prime?”
1. Use trial division to check if s has any factors between 2 and ⌊√s⌋. If it does, then return(“composite”).
2. Set u←4.
3. For k from 1 to s − 2 do the following: compute u←(u^2 − 2) mod n.
4. If u = 0 then return(“prime”). Otherwise, return(“composite”).

It is unknown whether there are infinitely many Mersenne primes. Table 4.2 lists the 33 known Mersenne primes.
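Algorithm 4.37 is only a few lines in Python. The sketch below is ours; it recovers the small Mersenne prime exponents of Table 4.2 directly.

```python
def lucas_lehmer(s):
    # Algorithm 4.37: decide primality of the Mersenne number n = 2^s - 1, s >= 3
    if any(s % d == 0 for d in range(2, int(s ** 0.5) + 1)):
        return "composite"            # step 1: s itself must be prime
    n = (1 << s) - 1
    u = 4                             # step 2
    for _ in range(s - 2):            # step 3
        u = (u * u - 2) % n
    return "prime" if u == 0 else "composite"   # step 4

print([s for s in range(3, 130) if lucas_lehmer(s) == "prime"])
# [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127], matching Table 4.2
```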

index j   M_j      decimal digits
1         2        1
2         3        1
3         5        2
4         7        3
5         13       4
6         17       6
7         19       6
8         31       10
9         61       19
10        89       27
11        107      33
12        127      39
13        521      157
14        607      183
15        1279     386
16        2203     664
17        2281     687
18        3217     969
19        4253     1281
20        4423     1332
21        9689     2917
22        9941     2993
23        11213    3376
24        19937    6002
25        21701    6533
26        23209    6987
27        44497    13395
28        86243    25962
29        110503   33265
30        132049   39751
31        216091   65050
32?       756839   227832
33?       859433   258716

Table 4.2: Known Mersenne primes. The table shows the 33 known exponents M_j, 1 ≤ j ≤ 33, for which 2^{M_j} − 1 is a Mersenne prime, and also the number of decimal digits in 2^{M_j} − 1. The question marks after j = 32 and j = 33 indicate that it is not known whether there are any other exponents s between M_31 and these numbers for which 2^s − 1 is prime.

4.3.2 Primality testing using the factorization of n − 1

This section presents results which can be used to prove that an integer n is prime, provided that the factorization or a partial factorization of n − 1 is known. It may seem odd to consider a technique which requires the factorization of n − 1 as a subproblem: if integers of this size can be factored, the primality of n itself could be determined by factoring n. However,

the factorization of n − 1 may be easier to compute if n has a special form, such as a Fermat number n = 2^{2^k} + 1. Another situation where the factorization of n − 1 may be easy to compute is when the candidate n is “constructed” by specific methods (see §4.4.4).

4.38 Fact Let n ≥ 3 be an integer. Then n is prime if and only if there exists an integer a satisfying:
(i) a^{n−1} ≡ 1 (mod n); and
(ii) a^{(n−1)/q} ≢ 1 (mod n) for each prime divisor q of n − 1.

This result follows from the fact that Z*_n has an element of order n − 1 (Definition 2.128) if and only if n is prime; an element a satisfying conditions (i) and (ii) has order n − 1.

4.39 Note (primality test based on Fact 4.38) If n is a prime, the number of elements of order n − 1 is precisely φ(n − 1). Hence, to prove a candidate n prime, one may simply choose an integer a ∈ Z_n at random and use Fact 4.38 to check whether a has order n − 1. If this is the case, then n is certainly prime. Otherwise, another a

∈ Z_n is selected and the test is repeated. If n is indeed prime, the expected number of iterations before an element a of order n − 1 is selected is O(ln ln n); this follows since (n − 1)/φ(n − 1) < 6 ln ln n for n ≥ 5 (Fact 2.102). Thus, if such an a is not found after a “reasonable” number (for example, 12 ln ln n) of iterations, then n is probably composite and should again be subjected to a probabilistic primality test such as Miller-Rabin (Algorithm 4.24). (Another approach is to run both algorithms in parallel, with an unlimited number of iterations, until one of them stops with a definite conclusion “prime” or “composite”.) This method is, in effect, a probabilistic compositeness test.

The next result gives a method for proving primality which requires knowledge of only a partial factorization of n − 1.

4.40 Fact (Pocklington’s theorem) Let n ≥ 3 be an integer, and let n = RF + 1 (i.e., F divides n − 1), where the prime factorization of F is F = ∏_{j=1}^{t} q_j^{e_j}. If there exists an integer a satisfying:

(i) a^{n−1} ≡ 1 (mod n); and
(ii) gcd(a^{(n−1)/q_j} − 1, n) = 1 for each j, 1 ≤ j ≤ t,
then every prime divisor p of n is congruent to 1 modulo F. It follows that if F > √n − 1, then n is prime.

If n is indeed prime, then the following result establishes that most integers a satisfy conditions (i) and (ii) of Fact 4.40, provided that the prime divisors of F > √n − 1 are sufficiently large.

4.41 Fact Let n = RF + 1 be an odd prime with F > √n − 1 and gcd(R, F) = 1. Let the distinct prime factors of F be q_1, q_2, . . . , q_t. Then the probability that a randomly selected base a, 1 ≤ a ≤ n − 1, satisfies both (i) a^{n−1} ≡ 1 (mod n) and (ii) gcd(a^{(n−1)/q_j} − 1, n) = 1 for each j, 1 ≤ j ≤ t, is ∏_{j=1}^{t} (1 − 1/q_j) ≥ 1 − Σ_{j=1}^{t} 1/q_j.

Thus, if the factorization of a divisor F > √n − 1 of n − 1 is known, then to test n for primality, one may simply choose random integers a in the interval [2, n − 2] until one is found satisfying conditions (i) and (ii) of Fact 4.40, implying that n is prime. If such an a is not found after a “reasonable” number of iterations (the number of iterations may be taken to be T where P^T ≤ (1/2)^100, and where P = 1 − ∏_{j=1}^{t} (1 − 1/q_j)), then n is probably composite, and this could be established by subjecting it to a probabilistic primality test (the parallel-run approach noted in Note 4.39 also applies here). This method is, in effect, a probabilistic compositeness test; a small sketch of it appears below.
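The following Python sketch (our own, under the assumption that the caller supplies F and its distinct prime factors) applies Fact 4.40 in exactly this way.

```python
import random
from math import gcd, isqrt

def pocklington_prove(n, F, F_primes, max_tries=100):
    # Fact 4.40: F divides n - 1, F > sqrt(n) - 1, F_primes lists the distinct
    # primes dividing F; True means n is proven prime, False means either a
    # Fermat witness was found or no certifying base turned up (probably composite)
    assert (n - 1) % F == 0 and F > isqrt(n) - 1
    for _ in range(max_tries):
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) != 1:
            return False        # condition (i) fails: n is certainly composite
        if all(gcd(pow(a, (n - 1) // q, n) - 1, n) == 1 for q in F_primes):
            return True         # conditions (i) and (ii) hold: n is prime
    return False

# example: n = 4889 is prime, n - 1 = 4888 = 2^3 * 13 * 47, and
# F = 13 * 47 = 611 > sqrt(4889) - 1, with R = 8
print(pocklington_prove(4889, 611, [13, 47]))
```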

The next result gives a method for proving primality which only requires the factorization of a divisor F of n − 1 that is greater than ∛n. For an example of the use of Fact 4.42, see Note 4.63.

4.42 Fact Let n ≥ 3 be an odd integer. Let n = 2RF + 1, and suppose that there exists an integer a satisfying both: (i) a^{n−1} ≡ 1 (mod n); and (ii) gcd(a^{(n−1)/q} − 1, n) = 1 for each prime divisor q of F. Let x ≥ 0 and y be defined by 2R = xF + y and 0 ≤ y < F. If F ≥ ∛n and if y^2 − 4x is neither 0 nor a perfect square, then n is prime.

4.3.3 Jacobi sum test

The Jacobi sum test is another true primality test. The basic idea is to test a set of congruences which are analogues of Fermat’s theorem (Fact 2.127(i)) in certain cyclotomic rings. The running time of the Jacobi sum test for determining the primality of an integer n is O((ln n)^{c ln ln ln n}) bit operations for some constant c. This is “almost” a polynomial-time algorithm since the exponent ln ln ln n acts like a constant for the range of values for n of interest.

probability at least 1 − ( 12 )k for every k ≥ 1, and always gives a correct answer. One drawback of the algorithm is that it does not produce a “certificate” which would enable the answer to be verified in much shorter time than running the algorithm itself. The Jacobi sum test is, indeed, practical in the sense that the primality of numbers that are several hundred decimal digits long can be handled in just a few minutes on a computer. However, the test is not as easy to program as the probabilistic Miller-Rabin test (Algorithm 4.24), and the resulting code is not as compact The details of the algorithm are complicated and are not given here; pointers to the literature are given in the chapter notes on page 166. 4.34 Tests using elliptic curves Elliptic curve primality proving algorithms are based on an elliptic curve analogue of Pocklington’s theorem (Fact 4.40) The version of the algorithm used in practice is usually referred to as Atkin’s test or the Elliptic Curve

Primality Proving algorithm (ECPP) Under heuristic arguments, the expected running time of this algorithm for proving the primality of an integer n has been shown to be O((ln n)6+ ) bit operations for any  > 0. Atkin’s test has the advantage over the Jacobi sum test (§4.33) that it produces a short certificate of primality which can be used to efficiently verify the primality of the number. Atkin’s test has been used to prove the primality of numbers more than 1000 decimal digits long. The details of the algorithm are complicated and are not presented here; pointers to the literature are given in the chapter notes on page 166. 4.4 Prime number generation This section considers algorithms for the generation of prime numbers for cryptographic purposes. Four algorithms are presented: Algorithm 444 for generating probable primes (see Definition 4.5), Algorithm 453 for generating strong primes (see Definition 452), Algorithm 456 for generating probable primes p and q suitable for

use in the Digital Signature Algorithm (DSA), and Algorithm 4.62 for generating provable primes (see Definition 4.34).

4.43 Note (prime generation vs. primality testing) Prime number generation differs from primality testing as described in §4.2 and §4.3, but may, and typically does, involve the latter. The former allows the construction of candidates of a fixed form which may lead to more efficient testing than possible for random candidates.

4.4.1 Random search for probable primes

By the prime number theorem (Fact 2.95), the proportion of (positive) integers ≤ x that are prime is approximately 1/ln x. Since half of all integers ≤ x are even, the proportion of odd integers ≤ x that are prime is approximately 2/ln x. For instance, the proportion of all odd integers ≤ 2^512 that are prime is approximately 2/(512 · ln(2)) ≈ 1/177. This suggests that a reasonable strategy for selecting a random k-bit (probable) prime is to repeatedly pick random k-bit odd integers n until one is found that is declared to be “prime”

found that is declared to be “prime” Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 146 Ch. 4 Public-Key Parameters by MILLER-RABIN(n,t) (Algorithm 4.24) for an appropriate value of the security parameter t (discussed below) If a random k-bit odd integer n is divisible by a small prime, it is less computationally expensive to rule out the candidate n by trial division than by using the Miller-Rabin test. Since the probability that a random integer n has a small prime divisor is relatively large, before applying the Miller-Rabin test, the candidate n should be tested for small divisors below a pre-determined bound B. This can be done by dividing n by all the primes below B, or by computing greatest common divisors of n and (pre-computed) products of several of the primes Q ≤ B. The proportion of candidate odd integers n not ruled out by this trial division is 3≤p≤B (1− p1 ) which, by Mertens’s theorem, is approximately 1.12/ ln B (here

p ranges over prime values). For example, if B = 256, then only 20% of candidate odd integers n pass the trial division stage, i.e., 80% are discarded before the more costly Miller-Rabin test is performed.

4.44 Algorithm Random search for a prime using the Miller-Rabin test

RANDOM-SEARCH(k,t)
INPUT: an integer k, and a security parameter t (cf. Note 4.49).
OUTPUT: a random k-bit probable prime.
1. Generate an odd k-bit integer n at random.
2. Use trial division to determine whether n is divisible by any odd prime ≤ B (see Note 4.45 for guidance on selecting B). If it is then go to step 1.
3. If MILLER-RABIN(n,t) (Algorithm 4.24) outputs “prime” then return(n). Otherwise, go to step 1.
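Here is a self-contained Python sketch of RANDOM-SEARCH (ours, with an arbitrary default bound B = 256 and a naive routine standing in for a precomputed prime table); the miller_rabin function repeats the sketch given after Algorithm 4.24.

```python
import random

def miller_rabin(n, t):
    # probabilistic primality test (Algorithm 4.24)
    r, s = n - 1, 0
    while r % 2 == 0:
        r //= 2
        s += 1
    for _ in range(t):
        a = random.randint(2, n - 2)
        y = pow(a, r, n)
        if y != 1 and y != n - 1:
            for _ in range(s - 1):
                y = y * y % n
                if y == n - 1:
                    break
            else:
                return "composite"
    return "prime"

def odd_primes_up_to(B):
    # the odd primes <= B (in practice, precompute and store these in a table)
    primes = []
    for p in range(3, B + 1, 2):
        if all(p % q for q in primes if q * q <= p):
            primes.append(p)
    return primes

def random_search(k, t, B=256):
    primes = odd_primes_up_to(B)
    while True:
        n = random.getrandbits(k) | (1 << (k - 1)) | 1   # step 1: random odd k-bit n
        if any(n % p == 0 for p in primes):
            continue                                      # step 2: trial division
        if miller_rabin(n, t) == "prime":                 # step 3
            return n

print(random_search(512, 6))   # t = 6 gives p_{512,6} <= (1/2)^88 (Fact 4.48(ii))
```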

4.45 Note (optimal trial division bound B) Let E denote the time for a full k-bit modular exponentiation, and let D denote the time required for ruling out one small prime as divisor of a k-bit integer. (The values E and D depend on the particular implementation of long-integer arithmetic.) Then the trial division bound B that minimizes the expected running time of Algorithm 4.44 for generating a k-bit prime is roughly B = E/D. A more accurate estimate of the optimum choice for B can be obtained experimentally. The odd primes up to B can be precomputed and stored in a table. If memory is scarce, a value of B that is smaller than the optimum value may be used.

Since the Miller-Rabin test does not provide a mathematical proof that a number is indeed prime, the number n returned by Algorithm 4.44 is a probable prime (Definition 4.5). It is important, therefore, to have an estimate of the probability that n is in fact composite.

4.46 Definition The probability that RANDOM-SEARCH(k,t) (Algorithm 4.44) returns a composite number is denoted by p_{k,t}.

4.47 Note (remarks on estimating p_{k,t}) It is tempting to conclude directly from Fact 4.25 that p_{k,t} ≤ (1/4)^t. This reasoning is flawed (although typically the conclusion will be correct in practice) since it does not take into account the

distribution of the primes. (For example, if all candidates n were chosen from a set S of composite numbers, the probability of error is 1.) The following discussion elaborates on this point. Let X represent the event that n is composite, and let Y_t denote the event that MILLER-RABIN(n,t) declares n to be prime. Then Fact 4.25 states that P(Y_t|X) ≤ (1/4)^t. What is relevant, however, to the estimation of p_{k,t} is the quantity P(X|Y_t). Suppose that candidates n are drawn uniformly and randomly

error-probability of Miller-Rabin is usually far smaller than ( 14 )t (see Remark 4.26) Using better estimates for P (Yt |X) and estimates on the number of k-bit prime numbers, it has been shown that pk,t is, in fact, smaller than ( 14 )t for all sufficiently large k. A more concrete result is the following: if candidates n are chosen at random from the set of odd numbers in the interval [3, x], then P (X|Yt ) ≤ ( 14 )t for all x ≥ 1060 . Further refinements for P (Yt |X) allow the following explicit upper bounds on pk,t for various values of k and t. 5 4.48 Fact (some upper bounds on pk,t in Algorithm 444) √ (i) (ii) (iii) (iv) pk,1 < k 2 42− k for k ≥ 2. √ pk,t < k 3/2 2t t−1/2 42− tk for (t = 2, k ≥ 88) or (3 ≤ t ≤ k/9, k ≥ 21). 7 pk,t < 20 k2−5t + 17 k 15/4 2−k/2−2t + 12k2−k/4−3t for k/9 ≤ t ≤ k/4, k ≥ 21. pk,t < 17 k 15/4 2−k/2−2t for t ≥ k/4, k ≥ 21. For example, if k = 512 and t = 6, then Fact 4.48(ii) gives p512,6

≤ (1/2)^88. In other words, the probability that RANDOM-SEARCH(512,6) returns a 512-bit composite integer is less than (1/2)^88. Using more advanced techniques, the upper bounds on p_{k,t} given by Fact 4.48 have been improved. These upper bounds arise from complicated formulae which are not given here. Table 4.3 lists some improved upper bounds on p_{k,t} for some sample values of k and t. As an example, the probability that RANDOM-SEARCH(500,6) returns a composite number is ≤ (1/2)^92. Notice that the values of p_{k,t} implied by the table are considerably smaller than (1/4)^t = (1/2)^{2t}.

k \ t    1    2    3    4    5    6    7    8    9   10
100      5   14   20   25   29   33   36   39   41   44
150      8   20   28   34   39   43   47   51   54   57
200     11   25   34   41   47   52   57   61   65   69
250     14   29   39   47   54   60   65   70   75   79
300     19   33   44   53   60   67   73   78   83   88
350     28   38   48   58   66   73   80   86   91   97
400     37   46   55   63   72   80   87   93   99  105
450     46   54   62   70   78   85   93  100  106  112
500     56   63   70   78   85   92   99  106  113  119
550     65   72   79   86   93  100  107  113  119  126
600     75   82   88   95  102  108  115  121  127  133

Table 4.3: Upper bounds on p_{k,t} for sample values of k and t. An entry j corresponding to k and t implies p_{k,t} ≤ (1/2)^j.

1000-bit probable primes, MillerRabin with t = 3 repetitions suffices Algorithm 444 rules out most candidates n either by trial division (in step 2) or by performing just one iteration of the Miller-Rabin test (in step 3). For this reason, the only effect of selecting a larger security parameter t on the running time of the algorithm will likely be to increase the time required in the final stage when the (probable) prime is chosen. k 100 150 200 250 300 350 400 450 t 27 18 15 12 9 8 7 6 k 500 550 600 650 700 750 800 850 t 6 5 5 4 4 4 4 3 k 900 950 1000 1050 1100 1150 1200 1250 t 3 3 3 3 3 3 3 3 k 1300 1350 1400 1450 1500 1550 1600 1650 t 2 2 2 2 2 2 2 2 k 1700 1750 1800 1850 1900 1950 2000 2050 t 2 2 2 2 2 2 2 2 Table 4.4: For sample k, the smallest t from Fact 448 is given for which pk,t ≤ ( 12 )80 4.50 Remark (Miller-Rabin test with base a = 2) The Miller-Rabin test involves exponentiating the base a; this may be performed using the repeated square-and-multiply

4.51 Note (incremental search)
(i) An alternative technique to generating candidates n at random in step 1 of Algorithm 4.44 is to first select a random k-bit odd number n0, and then test the s numbers n = n0, n0 + 2, n0 + 4, . . . , n0 + 2(s − 1) for primality. If all these s candidates are found to be composite, the algorithm is said to have failed. If s = c · ln 2^k where c is a constant, the probability q_{k,t,s} that this incremental search variant of Algorithm 4.44 returns a composite number has been shown to be less than δ k^3 2^{−√k} for some constant δ. Table 4.5 gives some explicit bounds on this error probability for k = 500 and t ≤ 10. Under reasonable number-theoretic assumptions, the probability of the algorithm failing has been shown to be less than 2e^{−2c} for large k (here, e ≈ 2.71828).
(ii) Incremental search has the advantage that fewer random bits are required. Furthermore, the trial division by small primes in step 2 of Algorithm 4.44 can be accomplished very efficiently as follows. First the values R[p] = n0 mod p are computed for each odd prime p ≤ B. Each time 2 is added to the current candidate, the values in the table R are updated as R[p]←(R[p] + 2) mod p. The candidate passes the trial division stage if and only if none of the R[p] values equal 0.
(iii) If B is large, an alternative method for doing the trial division is to initialize a table S[i]←0 for 0 ≤ i ≤ (s − 1); the entry S[i] corresponds to the candidate n0 + 2i. For each odd prime p ≤ B, n0 mod p is computed. Let j be the smallest index for which (n0 + 2j) ≡ 0 (mod p). Then S[j] and each pth entry after it are set to 1. A candidate n0 + 2i then passes the trial division stage if and only if S[i] = 0. Note that the estimate for the optimal trial division bound B given in Note 4.45 does not apply here (nor in (ii)) since the cost of division is amortized over all candidates.

          t=1  t=2  t=3  t=4  t=5  t=6  t=7  t=8  t=9  t=10
  c=1      17   37   51   63   72   81   89   96  103   110
  c=5      13   32   46   58   68   77   85   92   99   105
  c=10     11   30   44   56   66   75   83   90   97   103

Table 4.5: Upper bounds on the error probability of incremental search (Note 4.51) for k = 500 and sample values of c and t. An entry j corresponding to c and t implies q_{500,t,s} ≤ (1/2)^j, where s = c · ln 2^500.
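Note 4.51(ii) translates directly into code. Below is a minimal Python sketch of incremental search with an updatable residue table, reusing miller_rabin from the previous sketch; the sieve helper and the default values of c and B are illustrative.

import math
import random

def odd_primes_up_to(B):
    """The odd primes <= B, by a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (B + 1)
    for i in range(2, int(B ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, B + 1, i)))
    return [p for p in range(3, B + 1) if sieve[p]]

def incremental_search(k, t, c=10, B=1000):
    """Test n0, n0+2, ..., n0+2(s-1) for primality, with s = c * ln(2^k);
    returns None if the search fails (cf. Note 4.51(i))."""
    s = int(c * k * math.log(2))
    n = random.getrandbits(k) | (1 << (k - 1)) | 1    # random odd k-bit n0
    primes = odd_primes_up_to(B)
    R = {p: n % p for p in primes}                    # R[p] = n0 mod p
    for _ in range(s):
        # the candidate passes trial division iff no residue is 0
        if all(R[p] for p in primes) and miller_rabin(n, t):
            return n
        for p in primes:                              # update residues for n + 2
            R[p] = (R[p] + 2) % p
        n += 2
    return None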

4.4.2 Strong primes

The RSA cryptosystem (§8.2) uses a modulus of the form n = pq, where p and q are distinct odd primes. The primes p and q must be of sufficient size that factorization of their product is beyond computational reach. Moreover, they should be random primes in the sense that they be chosen as a function of a random input through a process defining a pool of candidates of sufficient cardinality that an exhaustive attack is infeasible. In practice, the resulting primes must also be of a pre-determined bitlength, to meet system specifications.

The discovery of the RSA cryptosystem led to the consideration of several additional constraints on the choice of p and q which are necessary to ensure that the resulting RSA system is safe from cryptanalytic attack, and the notion of a strong prime (Definition 4.52) was defined. These attacks are described at length in Note 8.8(iii); as noted there, it is now believed that strong primes offer little protection beyond that offered by random primes, since randomly selected primes of the sizes typically used in RSA moduli today will satisfy the constraints with high probability. On the other hand, they are no less secure, and require only minimal additional running time to compute; thus, there is little real additional cost in using them.

4.52 Definition A prime number p is said to be a strong prime if integers r, s, and t exist such that the following three conditions are satisfied:
(i) p − 1 has a large prime factor, denoted r;
(ii) p + 1 has a large prime factor, denoted s; and
(iii) r − 1 has a large prime factor, denoted t.
In Definition 4.52, a precise qualification of "large" depends on specific attacks that should be guarded against; for further details, see Note 8.8(iii).

4.53 Algorithm Gordon's algorithm for generating a strong prime

SUMMARY: a strong prime p is generated.
1. Generate two large random primes s and t of roughly equal bitlength (see Note 4.54).
2. Select an integer i0. Find the first prime in the sequence 2it + 1, for i = i0, i0 + 1, i0 + 2, . . . (see Note 4.54). Denote this prime by r = 2it + 1.
3. Compute p0 = 2(s^{r−2} mod r)s − 1.
4. Select an integer j0. Find the first prime in the sequence p0 + 2jrs, for j = j0, j0 + 1, j0 + 2, . . . (see Note 4.54). Denote this prime by p = p0 + 2jrs.
5. Return(p)

Justification. To see that the prime p returned by Gordon's algorithm is indeed a strong prime, observe first (assuming r ≠ s) that s^{r−1} ≡ 1 (mod r); this follows from Fermat's theorem (Fact 2.127). Hence, p0 ≡ 1 (mod r) and p0 ≡ −1 (mod s). Finally (cf. Definition 4.52),
(i) p − 1 = p0 + 2jrs − 1 ≡ 0 (mod r), and hence p − 1 has the prime factor r;
(ii) p + 1 = p0 + 2jrs + 1 ≡ 0 (mod s), and hence p + 1 has the prime factor s; and
(iii) r − 1 = 2it ≡ 0 (mod t), and hence r − 1 has the prime factor t.

4.54 Note (implementing Gordon's algorithm)
(i) The primes s and t required in step 1 can be probable primes generated by Algorithm 4.44. The Miller-Rabin test (Algorithm 4.24) can be used to test each candidate for primality in steps 2 and 4, after ruling out candidates that are divisible by a small prime less than some bound B. See Note 4.45 for guidance on selecting B. Since the Miller-Rabin test is a probabilistic primality test, the output of this implementation of Gordon's algorithm is a probable prime.
(ii) By carefully choosing the sizes of primes s, t and parameters i0, j0, one can control the exact bitlength of the resulting prime p. Note that the bitlengths of r and s will be about half that of p, while the bitlength of t will be slightly less than that of r.

4.55 Fact (running time of Gordon's algorithm) If the Miller-Rabin test is the primality test used in steps 1, 2, and 4, the expected time Gordon's algorithm takes to find a strong prime is only about 19% more than the expected time Algorithm 4.44 takes to find a random prime.
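The structure of Gordon's algorithm is captured by the following minimal Python sketch, which reuses miller_rabin and random_search from the sketches above. The bitlength parameter b, the fixed round count, and the starting indices i = 1 and j = 0 are illustrative choices.

def gordon(b, rounds=25):
    """Gordon's algorithm (Algorithm 4.53): a strong (probable) prime p built
    from ~b-bit primes s and t0 (t0 plays the role of t in the algorithm)."""
    s = random_search(b, rounds)           # step 1
    t0 = random_search(b, rounds)
    i = 1                                  # step 2: first prime r = 2*i*t0 + 1
    while not miller_rabin(2 * i * t0 + 1, rounds):
        i += 1
    r = 2 * i * t0 + 1
    p0 = 2 * pow(s, r - 2, r) * s - 1      # step 3: p0 = 1 (mod r), p0 = -1 (mod s)
    p = p0                                 # step 4: first prime p = p0 + 2*j*r*s
    while not miller_rabin(p, rounds):
        p += 2 * r * s
    return p

A call such as gordon(250) yields a strong probable prime of roughly 500 bits, with p − 1 divisible by r, p + 1 divisible by s, and r − 1 divisible by t0, matching the three conditions of Definition 4.52.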

4.4.3 NIST method for generating DSA primes

Some public-key schemes require primes satisfying various specific conditions. For example, the NIST Digital Signature Algorithm (DSA of §11.5.1) requires two primes p and q satisfying the following three conditions:
(i) 2^159 < q < 2^160; that is, q is a 160-bit prime;
(ii) 2^{L−1} < p < 2^L for a specified L, where L = 512 + 64l for some 0 ≤ l ≤ 8; and
(iii) q divides p − 1.
This section presents an algorithm for generating such primes p and q. In the following, H denotes the SHA-1 hash function (Algorithm 9.53) which maps bitstrings of bitlength < 2^64 to 160-bit hash-codes. Where required, an integer x in the range 0 ≤ x < 2^g whose binary representation is x = x_{g−1}·2^{g−1} + x_{g−2}·2^{g−2} + · · · + x_2·2^2 + x_1·2 + x_0 should be converted to the g-bit sequence (x_{g−1} x_{g−2} · · · x_2 x_1 x_0), and vice versa.

4.56 Algorithm NIST method for generating DSA primes

INPUT: an integer l, 0 ≤ l ≤ 8.
OUTPUT: a 160-bit prime q and an L-bit prime p, where L = 512 + 64l and q|(p − 1).
1. Compute L = 512 + 64l. Using long division of (L − 1) by 160, find n, b such that L − 1 = 160n + b, where 0 ≤ b < 160.
2. Repeat the following:
   2.1 Choose a random seed s (not necessarily secret) of bitlength g ≥ 160.
   2.2 Compute U = H(s) ⊕ H((s + 1) mod 2^g).
   2.3 Form q from U by setting to 1 the most significant and least significant bits of U. (Note that q is a 160-bit odd integer.)
   2.4 Test q for primality using MILLER-RABIN(q,t) for t ≥ 18 (see Note 4.57).
   Until q is found to be a (probable) prime.
3. Set i←0, j←2.
4. While i < 4096 do the following:
   4.1 For k from 0 to n do the following: set V_k←H((s + j + k) mod 2^g).
   4.2 For the integer W defined below, let X = W + 2^{L−1}. (X is an L-bit integer.)
       W = V_0 + V_1·2^160 + V_2·2^320 + · · · + V_{n−1}·2^{160(n−1)} + (V_n mod 2^b)·2^{160n}
   4.3 Compute c = X mod 2q and set p = X − (c − 1). (Note that p ≡ 1 (mod 2q).)
   4.4 If p ≥ 2^{L−1} then do the following: Test p for primality using MILLER-RABIN(p,t) for t ≥ 5 (see Note 4.57). If p is a (probable) prime then return(q,p).
   4.5 Set i←i + 1, j←j + n + 1.
5. Go to step 2.
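For concreteness, here is a minimal Python sketch of Algorithm 4.56 using SHA-1 from the standard hashlib module and the miller_rabin sketch from earlier; the seed bitlength g = 160 and the round counts 18 and 5 (cf. Note 4.57) are illustrative parameter choices.

import hashlib
import random

def H(x, g):
    """SHA-1 of the g-bit big-endian encoding of the integer x, as an integer."""
    return int.from_bytes(
        hashlib.sha1(x.to_bytes((g + 7) // 8, "big")).digest(), "big")

def nist_dsa_primes(l, g=160):
    L = 512 + 64 * l
    n, b = divmod(L - 1, 160)                      # step 1
    while True:
        while True:                                # step 2: derive q from a seed
            s = random.getrandbits(g)
            U = H(s, g) ^ H((s + 1) % 2**g, g)     # step 2.2
            q = U | (1 << 159) | 1                 # step 2.3: 160-bit and odd
            if miller_rabin(q, 18):                # step 2.4
                break
        i, j = 0, 2                                # step 3
        while i < 4096:                            # step 4
            V = [H((s + j + k) % 2**g, g) for k in range(n + 1)]
            W = sum(V[k] << (160 * k) for k in range(n))
            W += (V[n] % (1 << b)) << (160 * n)
            X = W + (1 << (L - 1))                 # step 4.2: X is an L-bit integer
            c = X % (2 * q)
            p = X - (c - 1)                        # step 4.3: p = 1 (mod 2q)
            if p >= 1 << (L - 1) and miller_rabin(p, 5):
                return q, p                        # step 4.4
            i, j = i + 1, j + n + 1                # step 4.5
        # step 5: no suitable p found, go back to step 2 with a new seed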

4.57 Note (choice of primality test in Algorithm 4.56)
(i) The FIPS 186 document where Algorithm 4.56 was originally described only specifies that a robust primality test be used in steps 2.4 and 4.4, i.e., a primality test where the probability of a composite integer being declared prime is at most (1/2)^80. If the heuristic assumption is made that q is a randomly chosen 160-bit integer then, by Table 4.4, MILLER-RABIN(q,18) is a robust test for the primality of q. If p is assumed to be a randomly chosen L-bit integer, then by Table 4.4, MILLER-RABIN(p,5) is a robust test for the primality of p. Since the Miller-Rabin test is a probabilistic primality test, the output of Algorithm 4.56 is a probable prime.
(ii) To improve performance, candidate primes q and p should be subjected to trial division by all odd primes less than some bound B before invoking the Miller-Rabin test. See Note 4.45 for guidance on selecting B.

4.58 Note ("weak" primes cannot be intentionally constructed) Algorithm 4.56 has the feature that the random seed s is not input to the prime number generation portion of the algorithm itself, but rather to an unpredictable and uncontrollable randomization process (steps 2.2 and 4.1), the output of which is used as the actual random seed. This precludes manipulation of the input seed to the prime number generation. If the seed s and counter i are made public, then anyone can verify that q and p were generated using the approved method. This feature prevents a central authority who generates p and q as system-wide parameters for use in the DSA from intentionally constructing "weak" primes q and p which it could subsequently exploit to recover other entities' private keys.

4.4.4 Constructive techniques for provable primes

Maurer's algorithm (Algorithm 4.62) generates random provable primes that are almost uniformly distributed over the set of all primes of a specified size. The expected time for generating a prime is only slightly greater than that for generating a probable prime of equal size using Algorithm 4.44 with security parameter t = 1. (In practice, one may wish to choose t > 1 in Algorithm 4.44; cf. Note 4.49.)

The main idea behind Algorithm 4.62 is Fact 4.59, which is a slight modification of Pocklington's theorem (Fact 4.40) and Fact 4.41.

4.59 Fact Let n ≥ 3 be an odd integer, and suppose that n = 1 + 2Rq where q is an odd prime. Suppose further that q > R.
(i) If there exists an integer a satisfying a^{n−1} ≡ 1 (mod n) and gcd(a^{2R} − 1, n) = 1, then n is prime.
(ii) If n is prime, the probability that a randomly selected base a, 1 ≤ a ≤ n − 1, satisfies a^{n−1} ≡ 1 (mod n) and gcd(a^{2R} − 1, n) = 1 is (1 − 1/q).

Algorithm 4.62 recursively generates an odd prime q, and then chooses random integers R, R < q, until n = 2Rq + 1 can be proven prime using Fact 4.59(i) for some base a. By Fact 4.59(ii) the proportion of such bases is 1 − 1/q for prime n. On the other hand, if n is composite, then most bases a will fail to satisfy the condition a^{n−1} ≡ 1 (mod n).

4.60 Note (description of constants c and m in Algorithm 4.62)
(i) The optimal value of the constant c defining the trial division bound B = ck^2 in step 2 depends on the implementation of long-integer arithmetic, and is best determined experimentally (cf. Note 4.45).
(ii) The constant m = 20 ensures that I is at least 20 bits long and hence the interval from which R is selected, namely [I + 1, 2I], is sufficiently large (for the values of k of practical interest) that it most likely contains at least one value R for which n = 2Rq + 1 is prime.

4.61 Note (relative size r of q with respect to n in Algorithm 4.62) The relative size r of q with respect to n is defined to be r = lg q / lg n. In order to assure that the generated prime n is chosen randomly with essentially uniform distribution from the set of all k-bit primes, the size of the prime factor q of n − 1 must be chosen according to the probability distribution of the largest prime factor of a randomly selected k-bit integer. Since q must be greater than R in order for Fact 4.59 to apply, the relative size r of q is restricted to being in the interval [1/2, 1]. It can be deduced from Fact 3.7(i) that the cumulative probability distribution of the relative size r of the largest prime factor of a large random integer, given that r is at least 1/2, is (1 + lg r) for 1/2 ≤ r ≤ 1. In step 4 of Algorithm 4.62, the relative size r is generated according to this distribution by selecting a random number s ∈ [0, 1] and then setting r = 2^{s−1}. If k ≤ 2m then r is chosen to be the smallest permissible value, namely 1/2, in order to ensure that the interval from which R is selected is sufficiently large (cf. Note 4.60(ii)).

4.62 Algorithm Maurer's algorithm for generating provable primes

PROVABLE PRIME(k)
INPUT: a positive integer k.
OUTPUT: a k-bit prime number n.
1. (If k is small, then test random integers by trial division. A table of small primes may be precomputed for this purpose.) If k ≤ 20 then repeatedly do the following:
   1.1 Select a random k-bit odd integer n.
   1.2 Use trial division by all primes less than √n to determine whether n is prime.
   1.3 If n is prime then return(n).
2. Set c←0.1 and m←20 (see Note 4.60).
3. (Trial division bound) Set B←c · k^2 (see Note 4.60).
4. (Generate r, the size of q relative to n; see Note 4.61) If k > 2m then repeatedly do the following: select a random number s in the interval [0, 1], set r←2^{s−1}, until (k − rk) > m. Otherwise (i.e., k ≤ 2m), set r←0.5.
5. Compute q←PROVABLE PRIME(⌊r · k⌋ + 1).
6. Set I←⌊2^{k−1}/(2q)⌋.
7. success←0.
8. While (success = 0) do the following:
   8.1 (select a candidate integer n) Select a random integer R in the interval [I + 1, 2I] and set n←2Rq + 1.
   8.2 Use trial division to determine whether n is divisible by any prime number < B. If it is not then do the following:
       Select a random integer a in the interval [2, n − 2].
       Compute b←a^{n−1} mod n.
       If b = 1 then do the following:
           Compute b←a^{2R} mod n and d←gcd(b − 1, n).
           If d = 1 then success←1.
9. Return(n)
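A minimal Python sketch of PROVABLE PRIME follows, with the constants c = 0.1 and m = 20 of Note 4.60. The trial-division loops (by all odd integers below the bound, rather than only by primes) and the assumption k ≥ 2 are simplifications made for brevity.

import math
import random

def provable_prime(k):
    """Maurer's algorithm (Algorithm 4.62): a provably prime k-bit integer (k >= 2)."""
    if k <= 20:                                    # step 1: small case, trial division
        while True:
            n = random.getrandbits(k) | (1 << (k - 1)) | 1
            if n > 2 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
                return n
    c, m = 0.1, 20                                 # step 2 (see Note 4.60)
    B = int(c * k * k)                             # step 3: trial division bound
    if k > 2 * m:                                  # step 4: relative size r of q
        while True:
            s = random.random()
            r = 2 ** (s - 1)
            if k - r * k > m:
                break
    else:
        r = 0.5
    q = provable_prime(int(r * k) + 1)             # step 5 (recursive call)
    I = (1 << (k - 1)) // (2 * q)                  # step 6
    while True:                                    # steps 7-8
        R = random.randint(I + 1, 2 * I)           # step 8.1
        n = 2 * R * q + 1
        if any(n % d == 0 for d in range(3, B, 2)):
            continue                               # step 8.2: fails trial division
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) == 1:                  # Fact 4.59(i), first condition
            if math.gcd(pow(a, 2 * R, n) - 1, n) == 1:
                return n                           # n is provably prime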

4.63 Note (improvements to Algorithm 4.62)
(i) A speedup can be achieved by using Fact 4.42 instead of Fact 4.59(i) for proving n = 2Rq + 1 prime in step 8.2 of Maurer's algorithm. Fact 4.42 only requires that q be greater than ∛n.
(ii) If a candidate n passes the trial division (in step 8.2), then a Miller-Rabin test (Algorithm 4.24) with the single base a = 2 should be performed on n; only if n passes this test should the attempt to prove its primality (the remainder of step 8.2) be undertaken. This leads to a faster implementation due to the efficiency of the Miller-Rabin test with a single base a = 2 (cf. Remark 4.50).
(iii) Step 4 requires the use of real number arithmetic when computing 2^{s−1}. To avoid these computations, one can precompute and store a list of such values for a selection of random numbers s ∈ [0, 1].

4.64 Note (provable primes vs. probable primes) Probable primes are advantageous over provable primes in that Algorithm 4.44 for generating probable primes with t = 1 is slightly faster than Maurer's algorithm. Moreover, the latter requires more run-time memory due to its recursive nature. Provable primes are preferable to probable primes in the sense that the former have zero error probability. In any cryptographic application, however, there is always a non-zero probability of some catastrophic failure, such as the adversary guessing a secret key or hardware failure. Since the error probability of probable primes can be efficiently brought down to acceptably low levels (see Note 4.49 but note the dependence on t), there appears to be no reason for mandating the use of provable primes over probable primes.

4.5 Irreducible polynomials over Zp

Recall (Definition 2.190) that a polynomial f(x) ∈ Zp[x] of degree m ≥ 1 is said to be irreducible over Zp if it cannot be written as a product of two polynomials in Zp[x] each having degree less than m. Such a polynomial f(x) can be used to represent the elements of the finite field F_{p^m} as F_{p^m} = Zp[x]/(f(x)), the set of all polynomials in Zp[x] of degree less than m where the addition and multiplication of polynomials is performed modulo f(x) (see §2.6.3).

This section presents techniques for constructing irreducible polynomials over Zp, where p is a prime. The characteristic two finite fields F_{2^m} are of particular interest for cryptographic applications because the arithmetic in these fields can be efficiently performed both in software and in hardware. For this reason, additional attention is given to the special case of irreducible polynomials over Z2.

The arithmetic in finite fields can usually be implemented more efficiently if the irreducible polynomial chosen has few non-zero terms. Irreducible trinomials, i.e., irreducible polynomials having exactly three non-zero terms, are considered in §4.5.2. Primitive polynomials, i.e., irreducible polynomials f(x) of degree m in Zp[x] for which x is a generator of F*_{p^m}, the multiplicative group of the finite field F_{p^m} = Zp[x]/(f(x)) (Definition 2.228), are the topic of §4.5.3. Primitive polynomials are also used in the generation of linear feedback shift register sequences having the maximum possible period (Fact 6.12).

4.5.1 Irreducible polynomials

If f(x) ∈ Zp[x] is irreducible over Zp and a is a non-zero element in Zp, then a·f(x) is also irreducible over Zp. Hence it suffices to restrict attention to monic polynomials in Zp[x], i.e., polynomials whose leading coefficient is 1. Observe also that if f(x) is an irreducible polynomial, then its constant term must be non-zero. In particular, if f(x) ∈ Z2[x], then its constant term must be 1.

There is a formula for computing exactly the number of monic irreducible polynomials in Zp[x] of a fixed degree. The Möbius function, which is defined next, is used in this formula.

4.65 Definition Let m be a positive integer. The Möbius function µ is defined by
    µ(m) = 1, if m = 1;
    µ(m) = 0, if m is divisible by the square of a prime;
    µ(m) = (−1)^k, if m is the product of k distinct primes.

4.66 Example (Möbius function) The following table gives the values of the Möbius function µ(m) for the first 10 values of m:

    m      1    2    3    4    5    6    7    8    9   10
    µ(m)   1   −1   −1    0   −1    1   −1    0    0    1

4.67 Fact (number of monic irreducible polynomials) Let p be a prime and m a positive integer.
(i) The number N_p(m) of monic irreducible polynomials of degree m in Zp[x] is given by the following formula:
        N_p(m) = (1/m) Σ_{d|m} µ(d) p^{m/d},
    where the summation ranges over all positive divisors d of m.
(ii) The probability of a random monic polynomial of degree m in Zp[x] being irreducible over Zp is roughly 1/m. More specifically, the number N_p(m) satisfies
        1/(2m) ≤ N_p(m)/p^m ≈ 1/m.

Testing irreducibility of polynomials in Zp[x] is significantly simpler than testing primality of integers. A polynomial can be tested for irreducibility by verifying that it has no irreducible factors of degree ≤ ⌊m/2⌋. The following result leads to an efficient method (Algorithm 4.69) for accomplishing this.

4.68 Fact Let p be a prime and let k be a positive integer.
(i) The product of all monic irreducible polynomials in Zp[x] of degree dividing k is equal to x^{p^k} − x.
(ii) Let f(x) be a polynomial of degree m in Zp[x]. Then f(x) is irreducible over Zp if and only if gcd(f(x), x^{p^i} − x) = 1 for each i, 1 ≤ i ≤ ⌊m/2⌋.

4.69 Algorithm Testing a polynomial for irreducibility

INPUT: a prime p and a monic polynomial f(x) of degree m in Zp[x].
OUTPUT: an answer to the question: "Is f(x) irreducible over Zp?"
1. Set u(x)←x.
2. For i from 1 to ⌊m/2⌋ do the following:
   2.1 Compute u(x)←u(x)^p mod f(x) (using Algorithm 2.227). (Note that u(x) is a polynomial in Zp[x] of degree less than m.)
   2.2 Compute d(x) = gcd(f(x), u(x) − x) (using Algorithm 2.218).
   2.3 If d(x) ≠ 1 then return("reducible").
3. Return("irreducible")

Fact 4.67 suggests that one method for finding an irreducible polynomial of degree m in Zp[x] is to generate a random monic polynomial of degree m in Zp[x], test it for irreducibility, and continue until an irreducible one is found (Algorithm 4.70). The expected number of polynomials to be tried before an irreducible one is found is approximately m.
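Because the characteristic-two case receives special attention below, the following minimal Python sketch specializes Algorithm 4.69 to p = 2, encoding a polynomial in Z2[x] as a Python integer whose bit i is the coefficient of x^i; the helper names are illustrative.

def poly_mod(a, f):
    """Remainder of the polynomial a modulo f, over Z2."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

def poly_mulmod(a, b, f):
    """Product a*b mod f over Z2 (carry-less multiplication)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a = poly_mod(a << 1, f)
    return res

def is_irreducible(f):
    """Algorithm 4.69 for p = 2: gcd(f, x^(2^i) - x) = 1 for 1 <= i <= m/2."""
    m = f.bit_length() - 1
    u = 2                             # the polynomial x
    for _ in range(m // 2):
        u = poly_mulmod(u, u, f)      # u <- u^2 mod f (Frobenius squaring)
        g, h = f, u ^ 2               # gcd(f, u - x); subtraction is XOR over Z2
        while h:
            g, h = h, poly_mod(g, h)
        if g != 1:
            return False
    return True

For example, is_irreducible((1 << 31) | (1 << 3) | 1) returns True for the entry x^31 + x^3 + 1 of Table 4.6, while a degree-8 trinomial such as x^8 + x^3 + 1 is reported reducible, consistent with Fact 4.75(ii).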

4.70 Algorithm Generating a random monic irreducible polynomial over Zp

INPUT: a prime p and a positive integer m.
OUTPUT: a monic irreducible polynomial f(x) of degree m in Zp[x].
1. Repeat the following:
   1.1 (Generate a random monic polynomial of degree m in Zp[x]) Randomly select integers a0, a1, a2, . . . , a_{m−1} between 0 and p − 1 with a0 ≠ 0. Let f(x) be the polynomial f(x) = x^m + a_{m−1}x^{m−1} + · · · + a2·x^2 + a1·x + a0.
   1.2 Use Algorithm 4.69 to test whether f(x) is irreducible over Zp.
   Until f(x) is irreducible.
2. Return(f(x))

It is known that the expected degree of the irreducible factor of least degree of a random polynomial of degree m in Zp[x] is O(lg m). Hence for each choice of f(x), the expected number of times steps 2.1–2.3 of Algorithm 4.69 are iterated is O(lg m). Each iteration takes O((lg p)m^2) Zp-operations. These observations, together with Fact 4.67(ii), determine the running time for Algorithm 4.70.

4.71 Fact Algorithm 4.70 has an expected running time of O(m^3(lg m)(lg p)) Zp-operations.

Given one irreducible polynomial of degree m over Zp, Note 4.74 describes a method, which is more efficient than Algorithm 4.70, for randomly generating additional such polynomials.

4.72 Definition Let Fq be a finite field of characteristic p, and let α ∈ Fq. A minimum polynomial of α over Zp is a monic polynomial of least degree in Zp[x] having α as a root.

4.73 Fact Let Fq be a finite field of order q = p^m, and let α ∈ Fq.
(i) The minimum polynomial of α over Zp, denoted m_α(x), is unique.
(ii) m_α(x) is irreducible over Zp.
(iii) The degree of m_α(x) is a divisor of m.
(iv) Let t be the smallest positive integer such that α^{p^t} = α. (Note that such a t exists since, by Fact 2.213, α^{p^m} = α.) Then
        m_α(x) = ∏_{i=0}^{t−1} (x − α^{p^i}).        (4.1)

4.74 Note (generating new irreducible polynomials from a given one) Suppose that f(y) is a given irreducible polynomial of degree m over Zp. The finite field F_{p^m} can then be represented as F_{p^m} = Zp[y]/(f(y)). A random monic irreducible polynomial of degree m over Zp can be efficiently generated as follows. First generate a random element α ∈ F_{p^m} and then, by repeated exponentiation by p, determine the smallest positive integer t for which α^{p^t} = α. If t < m, then generate a new random element α ∈ F_{p^m} and repeat; the probability that t < m is known to be at most (lg m)/q^{m/2}. If indeed t = m, then compute m_α(x) using the formula (4.1). Then m_α(x) is a random monic irreducible polynomial of degree m in Zp[x]. This method has an expected running time of O(m^3(lg p)) Zp-operations (compare with Fact 4.71).

4.5.2 Irreducible trinomials

If a polynomial f(x) in Z2[x] has an even number of non-zero terms, then f(1) = 0, whence (x + 1) is a factor of f(x). Hence, the smallest number of non-zero terms an irreducible polynomial of degree ≥ 2 in Z2[x] can have is three. An irreducible trinomial of degree m in Z2[x] must be of the form x^m + x^k + 1, where 1 ≤ k ≤ m − 1. Choosing an irreducible trinomial f(x) ∈ Z2[x] of degree m to represent the elements of the finite field F_{2^m} = Z2[x]/(f(x)) can lead to a faster implementation of the field arithmetic. The following facts are sometimes of use when searching for irreducible trinomials.

4.75 Fact Let m be a positive integer, and let k denote an integer in the interval [1, m − 1].
(i) If the trinomial x^m + x^k + 1 is irreducible over Z2 then so is x^m + x^{m−k} + 1.
(ii) If m ≡ 0 (mod 8), there is no irreducible trinomial of degree m in Z2[x].
(iii) Suppose that either m ≡ 3 (mod 8) or m ≡ 5 (mod 8). Then a necessary condition for x^m + x^k + 1 to be irreducible over Z2 is that either k or m − k must be of the form 2d for some positive divisor d of m.

Tables 4.6 and 4.7 list an irreducible trinomial of degree m over Z2 for each m ≤ 1478 for which such a trinomial exists.

4.5.3 Primitive polynomials

Primitive polynomials were introduced at the beginning of §4.5. Let f(x) ∈ Zp[x] be an irreducible polynomial of degree m. If the factorization of the integer p^m − 1 is known, then Fact 4.76 yields an efficient algorithm (Algorithm 4.77) for testing whether or not f(x) is a primitive polynomial. If the factorization of p^m − 1 is unknown, there is no efficient algorithm known for performing this test.

4.76 Fact Let p be a prime and let the distinct prime factors of p^m − 1 be r1, r2, . . . , rt. Then an irreducible polynomial f(x) ∈ Zp[x] is primitive if and only if for each i, 1 ≤ i ≤ t:
        x^{(p^m−1)/r_i} ≢ 1 (mod f(x)).
(That is, x is an element of order p^m − 1 in the field Zp[x]/(f(x)).)

4.77 Algorithm Testing whether an irreducible polynomial is primitive

INPUT: a prime p, a positive integer m, the distinct prime factors r1, r2, . . . , rt of p^m − 1, and a monic irreducible polynomial f(x) of degree m in Zp[x].
OUTPUT: an answer to the question: "Is f(x) a primitive polynomial?"
1. For i from 1 to t do the following:
   1.1 Compute l(x) = x^{(p^m−1)/r_i} mod f(x) (using Algorithm 2.227).
   1.2 If l(x) = 1 then return("not primitive").
2. Return("primitive")

m 2 3 4 5 6 7 9 10 11 12 14 15 17 18 20 21 22 23 25 28 29 30 31 33 34 35 36 39 41 42 44 46 47 49 52 54 55 57 58 60 62 63 65 66 68 71 73 74 76 79 81 84 86 87 89 90 92 k 1 1 1 2 1 1 1

3 2 3 5 1 3 3 3 2 1 5 3 1 2 1 3 10 7 2 9 4 3 7 5 1 5 9 3 9 7 4 19 1 29 1 18 3 9 6 25 35 21 9 4 5 21 13 38 27 21 m 93 94 95 97 98 100 102 103 105 106 108 110 111 113 118 119 121 123 124 126 127 129 130 132 134 135 137 140 142 145 146 147 148 150 151 153 154 155 156 159 161 162 166 167 169 170 172 174 175 177 178 180 182 183 185 186 191 k 2 21 11 6 11 15 29 9 4 15 17 33 10 9 33 8 18 2 19 21 1 5 3 17 57 11 21 15 21 52 71 14 27 53 3 1 15 62 9 31 18 27 37 6 34 11 1 13 6 8 31 3 81 56 24 11 9 m 193 194 196 198 199 201 202 204 207 209 210 212 214 215 217 218 220 223 225 228 231 233 234 236 238 239 241 242 244 247 249 250 252 253 255 257 258 260 263 265 266 268 270 271 273 274 276 278 279 281 282 284 286 287 289 292 294 k 15 87 3 9 34 14 55 27 43 6 7 105 73 23 45 11 7 33 32 113 26 74 31 5 73 36 70 95 111 82 35 103 15 46 52 12 71 15 93 42 47 25 53 58 23 67 63 5 5 93 35 53 69 71 21 37 33 m 295 297 300 302 303 305 308 310 313 314 316 318 319 321 322 324 327 329 330 332 333 337 340 342 343 345

346 348 350 351 353 354 358 359 362 364 366 367 369 370 372 375 377 378 380 382 383 385 386 388 390 391 393 394 396 399 401 k 48 5 5 41 1 102 15 93 79 15 63 45 36 31 67 51 34 50 99 89 2 55 45 125 75 22 63 103 53 34 69 99 57 68 63 9 29 21 91 139 111 16 41 43 47 81 90 6 83 159 9 28 7 135 25 26 152 m 402 404 406 407 409 412 414 415 417 418 420 422 423 425 426 428 431 433 436 438 439 441 444 446 447 449 450 455 457 458 460 462 463 465 468 470 471 473 474 476 478 479 481 484 486 487 489 490 492 494 495 497 498 500 503 505 506 k 171 65 141 71 87 147 13 102 107 199 7 149 25 12 63 105 120 33 165 65 49 7 81 105 73 134 47 38 16 203 19 73 93 31 27 9 1 200 191 9 121 104 138 105 81 94 83 219 7 17 76 78 155 27 3 156 23 m 508 510 511 513 514 516 518 519 521 522 524 526 527 529 532 534 537 538 540 543 545 550 551 553 556 558 559 561 564 566 567 569 570 574 575 577 580 582 583 585 588 590 593 594 596 599 601 602 604 606 607 609 610 612 614 615 617 k 9 69 10 26 67 21 33 79 32 39 167 97 47 42 1 161

94 195 9 16 122 193 135 39 153 73 34 71 163 153 28 77 67 13 146 25 237 85 130 88 35 93 86 19 273 30 201 215 105 165 105 31 127 81 45 211 200 m 618 620 622 623 625 626 628 631 633 634 636 639 641 642 646 647 649 650 651 652 654 655 657 658 660 662 663 665 668 670 671 673 676 679 682 684 686 687 689 690 692 694 695 697 698 700 702 705 708 711 713 714 716 718 719 721 722 k 295 9 297 68 133 251 223 307 101 39 217 16 11 119 249 5 37 3 14 93 33 88 38 55 11 21 107 33 147 153 15 28 31 66 171 209 197 13 14 79 299 169 177 267 215 75 37 17 15 92 41 23 183 165 150 9 231

Table 4.6: Irreducible trinomials x^m + x^k + 1 over Z2. For each m, 1 ≤ m ≤ 722, for which an irreducible trinomial of degree m in Z2[x] exists, the table lists the smallest k for which x^m + x^k + 1 is irreducible over Z2.

m 724 726 727 729 730 732 735 737 738 740 742 743 745 746 748 750 751 753 754 756 758 759

761 762 767 769 772 774 775 777 778 780 782 783 785 791 793 794 798 799 801 804 806 807 809 810 812 814 815 817 818 820 822 823 825 826 828 k 207 5 180 58 147 343 44 5 347 135 85 90 258 351 19 309 18 158 19 45 233 98 3 83 168 120 7 185 93 29 375 13 329 68 92 30 253 143 53 25 217 75 21 7 15 159 29 21 333 52 119 123 17 9 38 255 189 m 831 833 834 838 839 841 842 844 845 846 847 849 850 852 855 857 858 860 861 862 865 866 868 870 871 873 876 879 881 882 884 887 889 890 892 894 895 897 898 900 902 903 905 906 908 911 913 916 918 919 921 924 926 927 930 932 935 k 49 149 15 61 54 144 47 105 2 105 136 253 111 159 29 119 207 35 14 349 1 75 145 301 378 352 149 11 78 99 173 147 127 183 31 173 12 113 207 1 21 35 117 123 143 204 91 183 77 36 221 31 365 403 31 177 417 m 937 938 942 943 945 948 951 953 954 956 959 961 964 966 967 969 972 975 977 979 982 983 985 986 988 990 991 993 994 996 998 999 1001 1007 1009 1010 1012 1014 1015 1020 1022 1023 1025 1026 1028 1029 1030 1031 1033 1034 1036 1039

1041 1042 1044 1047 1049 159 k 217 207 45 24 77 189 260 168 131 305 143 18 103 201 36 31 7 19 15 178 177 230 222 3 121 161 39 62 223 65 101 59 17 75 55 99 115 385 186 135 317 7 294 35 119 98 93 68 108 75 411 21 412 439 41 10 141 m 1050 1052 1054 1055 1057 1058 1060 1062 1063 1065 1071 1078 1079 1081 1082 1084 1085 1086 1087 1089 1090 1092 1094 1095 1097 1098 1100 1102 1103 1105 1106 1108 1110 1111 1113 1116 1119 1121 1122 1126 1127 1129 1130 1134 1135 1137 1138 1140 1142 1145 1146 1148 1151 1153 1154 1156 1158 k 159 291 105 24 198 27 439 49 168 463 7 361 230 24 407 189 62 189 112 91 79 23 57 139 14 83 35 117 65 21 195 327 417 13 107 59 283 62 427 105 27 103 551 129 9 277 31 141 357 227 131 23 90 241 75 307 245 m 1159 1161 1164 1166 1167 1169 1170 1174 1175 1177 1178 1180 1182 1183 1185 1186 1188 1190 1191 1193 1196 1198 1199 1201 1202 1204 1206 1207 1209 1210 1212 1214 1215 1217 1218 1220 1223 1225 1226 1228 1230 1231 1233 1234 1236 1238 1239 1241 1242 1246 1247 1249 1252 1255

1257 1260 1263 k 66 365 19 189 133 114 27 133 476 16 375 25 77 87 134 171 75 233 196 173 281 405 114 171 287 43 513 273 118 243 203 257 302 393 91 413 255 234 167 27 433 105 151 427 49 153 4 54 203 25 14 187 97 589 289 21 77 m 1265 1266 1268 1270 1271 1273 1276 1278 1279 1281 1282 1284 1286 1287 1289 1294 1295 1297 1298 1300 1302 1305 1306 1308 1310 1311 1313 1314 1319 1321 1324 1326 1327 1329 1332 1334 1335 1337 1338 1340 1343 1345 1348 1350 1351 1353 1354 1356 1358 1359 1361 1362 1364 1366 1367 1369 1372 k 119 7 345 333 17 168 217 189 216 229 231 223 153 470 99 201 38 198 399 75 77 326 39 495 333 476 164 19 129 52 337 397 277 73 95 617 392 75 315 125 348 553 553 237 39 371 255 131 117 98 56 655 239 1 134 88 181 m 1374 1375 1377 1380 1383 1385 1386 1388 1390 1391 1393 1396 1398 1399 1401 1402 1404 1407 1409 1410 1412 1414 1415 1417 1420 1422 1423 1425 1426 1428 1430 1431 1433 1434 1436 1438 1441 1442 1444 1446 1447 1449 1452 1454 1455 1457 1458 1460 1463 1465 1466 1468 1470 1471

1473 1476 1478 k 609 52 100 183 130 12 219 11 129 3 300 97 601 55 92 127 81 47 194 383 125 429 282 342 33 49 15 28 103 27 33 17 387 363 83 357 322 395 595 421 195 13 315 297 52 314 243 185 575 39 311 181 49 25 77 21 69

Table 4.7: Irreducible trinomials x^m + x^k + 1 over Z2. For each m, 723 ≤ m ≤ 1478, for which an irreducible trinomial of degree m in Z2[x] exists, the table gives the smallest k for which x^m + x^k + 1 is irreducible over Z2.

There are precisely φ(p^m − 1)/m monic primitive polynomials of degree m in Zp[x] (Fact 2.230), where φ is the Euler phi function (Definition 2.100). Since the number of monic irreducible polynomials of degree m in Zp[x] is roughly p^m/m (Fact 4.67(ii)), it follows that the probability of a random monic irreducible polynomial of degree m in Zp[x] being primitive is approximately φ(p^m − 1)/p^m. Using the lower bound for the Euler phi function (Fact 2.102), this probability can be seen to be at least 1/(6 ln ln p^m). This suggests the following algorithm for generating primitive polynomials.

4.78 Algorithm Generating a random monic primitive polynomial over Zp

INPUT: a prime p, integer m ≥ 1, and the distinct prime factors r1, r2, . . . , rt of p^m − 1.
OUTPUT: a monic primitive polynomial f(x) of degree m in Zp[x].

1. Repeat the following:
   1.1 Use Algorithm 4.70 to generate a random monic irreducible polynomial f(x) of degree m in Zp[x].
   1.2 Use Algorithm 4.77 to test whether f(x) is primitive.
   Until f(x) is primitive.
2. Return(f(x))

For each m, 1 ≤ m ≤ 229, Table 4.8 lists a polynomial of degree m that is primitive over Z2. If there exists a primitive trinomial f(x) = x^m + x^k + 1, then the trinomial with the smallest k is listed. If no primitive trinomial exists, then a primitive pentanomial of the form f(x) = x^m + x^{k1} + x^{k2} + x^{k3} + 1 is listed.

If p^m − 1 is prime, then Fact 4.76 implies that every irreducible polynomial of degree m in Zp[x] is also primitive. Table 4.9 gives either a primitive trinomial or a primitive pentanomial of degree m over Z2 where m is an exponent of one of the first 27 Mersenne primes (Definition 4.35).
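To make the primitivity test of Algorithm 4.77 concrete in the characteristic-two case, the sketch below reuses poly_mulmod and the integer encoding from the sketch in §4.5.1; the square-and-multiply helper and the interface taking the factor list of 2^m − 1 are illustrative.

def poly_powmod(a, e, f):
    """a^e mod f over Z2, by square-and-multiply on the exponent e."""
    res = 1
    while e:
        if e & 1:
            res = poly_mulmod(res, a, f)
        a = poly_mulmod(a, a, f)
        e >>= 1
    return res

def is_primitive(f, factors):
    """Algorithm 4.77 for p = 2: f must be irreducible of degree m, and
    factors must be the distinct prime factors of 2^m - 1."""
    m = f.bit_length() - 1
    order = (1 << m) - 1
    # f is primitive iff x^((2^m - 1)/r) != 1 mod f for every prime factor r
    return all(poly_powmod(2, order // r, f) != 1 for r in factors)

For example, with m = 4 and 2^4 − 1 = 15 = 3 · 5, is_primitive(0b10011, [3, 5]) returns True for x^4 + x + 1, whereas is_primitive(0b11111, [3, 5]) returns False: x^4 + x^3 + x^2 + x + 1 is irreducible, but x has order only 5 in the corresponding field.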

4.6 Generators and elements of high order

Recall (Definition 2.169) that if G is a (multiplicative) finite group, the order of an element a ∈ G is the least positive integer t such that a^t = 1. If there are n elements in G, and if a ∈ G is an element of order n, then G is said to be cyclic and a is called a generator or a primitive element of G (Definition 2.167). Of special interest for cryptographic applications are the multiplicative group Z*_p of the integers modulo a prime p, and the multiplicative group F*_{2^m} of the finite field F_{2^m} of characteristic two; these groups are cyclic (Fact 2.213). Also of interest is the group Z*_n (Definition 2.124), where n is the product of two distinct odd primes. This section deals with the problem of finding generators and other elements of high order in Z*_p, F*_{2^m}, and Z*_n. See §2.5.1 for background in group theory and §2.6 for background in finite fields.

Algorithm 4.79 is an efficient method for determining the order of a group element, given the prime factorization of the group order n. The correctness of the algorithm follows from the fact that the order of an element must divide n (Fact 2.171).

m 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 k or (k1 , k2 , k3 ) 1 1 1 2 1 1 6, 5, 1 4 3 2 7, 4, 3 4, 3, 1 12, 11, 1 1 5, 3, 2 3 7 6, 5, 1 3 2 1 5 4, 3, 1 3 8, 7, 1 8, 7, 1 3 2 16, 15, 1 3 28, 27, 1 13 15, 14, 1 2 11 12, 10, 2 6, 5, 1 4 21, 19, 2 3 23, 22, 1 6, 5, 1 27, 26, 1 4, 3, 1 21, 20, 1 5 28, 27, 1 9 27, 26, 1 16, 15, 1 3 16, 15, 1 37, 36, 1 24 22, 21, 1 7 19 m 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 k or (k1 , k2 , k3 ) 22, 21, 1 1 16, 15, 1 57, 56, 1 1 4, 3, 1 18 10, 9, 1 10, 9, 1 9 29, 27, 2 16, 15, 1 6 53, 47, 6 25 16, 15, 1

11, 10, 1 36, 35, 1 31, 30, 1 20, 19, 1 9 38, 37, 1 4 38, 35, 3 46, 45, 1 13 28, 27, 1 13, 12, 1 13 72, 71, 1 38 19, 18, 1 84, 83, 1 13, 12, 1 2 21 11 49, 47, 2 6 11 47, 45, 2 37 7, 6, 1 77, 76, 1 9 11, 10, 1 16 15 65, 63, 2 31 7, 6, 1 13, 12, 1 10 45, 43, 2 9 82, 81, 1 15, 14, 1 161 m 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 k or (k1 , k2 , k3 ) 71, 70, 1 20, 18, 2 33 8 118, 111, 7 18 60, 59, 1 2 37 108, 107, 1 37, 36, 1 1 29, 27, 2 5 3 48, 47, 1 29 52, 51, 1 57 11 126, 125, 1 21 8, 7, 1 8, 5, 3 29 32, 31, 1 21 21, 20, 1 70, 69, 1 52 60, 59, 1 38, 37, 1 27 110, 109, 1 53 3 66, 65, 1 1 129, 127, 2 32, 31, 1 116, 115, 1 27, 26, 1 27, 26, 1 31 19, 18, 1 18 88, 87, 1 60, 59, 1 14, 13, 1 31, 30, 1 39, 38, 1 6 17, 15, 2 34 23 19, 18, 1 7 m 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188

189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 k or (k1 , k2 , k3 ) 100, 99, 1 13 6 119, 118, 1 8 87 34, 33, 1 37, 36, 1 7, 6, 1 128, 127, 1 56 102, 101, 1 24 23, 22, 1 58, 57, 1 74, 73, 1 127, 126, 1 18, 17, 1 9 28, 27, 1 15 87 10, 9, 1 66, 65, 1 62, 61, 1 65 34 42, 41, 1 14 55 8, 7, 1 74, 73, 1 30, 29, 1 29, 28, 1 43 62, 59, 3 6 35, 32, 3 46, 45, 1 105 8, 7, 1 49, 48, 1 23 196, 195, 1 45 11 19, 18, 1 15, 14, 1 35, 34, 1 92, 91, 1 33 31, 30, 1 32 58, 57, 1 46, 45, 1 148, 147, 1 64, 63, 1

Table 4.8: Primitive polynomials over Z2. For each m, 1 ≤ m ≤ 229, an exponent k is given for which the trinomial x^m + x^k + 1 is primitive over Z2. If no such trinomial exists, a triple of exponents (k1, k2, k3) is given for which the pentanomial x^m + x^{k1} + x^{k2} + x^{k3} + 1 is primitive over Z2.

  j    m       k, or (k1, k2, k3) if no primitive trinomial exists
  1    2       1
  2    3       1
  3    5       2
  4    7       1, 3
  5    13      none (4,3,1)
  6    17      3, 5, 6
  7    19      none (5,2,1)
  8    31      3, 6, 7, 13
  9    61      none (43,26,14)
 10    89      38
 11    107     none (82,57,31)
 12    127     1, 7, 15, 30, 63
 13    521     32, 48, 158, 168
 14    607     105, 147, 273
 15    1279    216, 418
 16    2203    none (1656,1197,585)
 17    2281    715, 915, 1029
 18    3217    67, 576
 19    4253    none (3297,2254,1093)
 20    4423    271, 369, 370, 649, 1393, 1419, 2098
 21    9689    84, 471, 1836, 2444, 4187
 22    9941    none (7449,4964,2475)
 23    11213   none (8218,6181,2304)
 24    19937   881, 7083, 9842
 25    21701   none (15986,11393,5073)
 26    23209   1530, 6619, 9739
 27    44497   8575, 21034

Table 4.9: Primitive polynomials of degree m over Z2, 2^m − 1 a Mersenne prime. For each exponent m = M_j of the first 27 Mersenne primes, the table lists all values of k, 1 ≤ k ≤ m/2, for which the trinomial x^m + x^k + 1 is irreducible over Z2. If no such trinomial exists, a triple of exponents (k1, k2, k3) is listed such that the pentanomial x^m + x^{k1} + x^{k2} + x^{k3} + 1 is irreducible over Z2.

4.79 Algorithm Determining the order of a group element

INPUT: a (multiplicative) finite group G of order n, an element a ∈ G, and the prime factorization n = p1^{e1} p2^{e2} · · · pk^{ek}.
OUTPUT: the order t of a.
1. Set t←n.
2. For i from 1 to k do the following:
   2.1 Set t←t/p_i^{e_i}.
   2.2 Compute a1←a^t.
   2.3 While a1 ≠ 1 do the following: compute a1←a1^{p_i} and set t←t · p_i.
3. Return(t)
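A minimal Python sketch of Algorithm 4.79 for the concrete case G = Z*_n follows; passing the factorization as a dict mapping each prime p_i to its exponent e_i is an illustrative interface.

def element_order(a, modulus, group_order, factorization):
    """Order of a in Z*_modulus, given |G| = group_order = prod(p**e)."""
    t = group_order
    for p, e in factorization.items():
        t //= p ** e                      # step 2.1
        a1 = pow(a, t, modulus)           # step 2.2
        while a1 != 1:                    # step 2.3
            a1 = pow(a1, p, modulus)
            t *= p
    return t

For instance, element_order(2, 11, 10, {2: 1, 5: 1}) returns 10, confirming that 2 is a generator of Z*_11.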

Suppose now that G is a cyclic group of order n. Then for any divisor d of n the number of elements of order d in G is exactly φ(d) (Fact 2.173(ii)), where φ is the Euler phi function (Definition 2.100). In particular, G has exactly φ(n) generators, and hence the probability of a random element in G being a generator is φ(n)/n. Using the lower bound for the Euler phi function (Fact 2.102), this probability can be seen to be at least 1/(6 ln ln n). This suggests the following efficient randomized algorithm for finding a generator of a cyclic group.

4.80 Algorithm Finding a generator of a cyclic group

INPUT: a cyclic group G of order n, and the prime factorization n = p1^{e1} p2^{e2} · · · pk^{ek}.
OUTPUT: a generator α of G.
1. Choose a random element α in G.
2. For i from 1 to k do the following:
   2.1 Compute b←α^{n/p_i}.
   2.2 If b = 1 then go to step 1.
3. Return(α)

4.81 Note (group elements of high order) In some situations it may be desirable to have an element of high order, and not a generator. Given a generator α in a cyclic group G of order n, and given a divisor d of n, an element β of order d in G can be efficiently obtained as follows: β = α^{n/d}. If q is a prime divisor of the order n of a cyclic group G, then the following method finds an element β ∈ G of order q without first having to find a generator of G: select a random element g ∈ G and compute β = g^{n/q}; repeat until β ≠ 1.
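Specialized to G = Z*_p for a prime p (so n = p − 1), Algorithm 4.80 is only a few lines of Python; only the distinct prime factors of p − 1 are needed, and the function name is illustrative.

import random

def find_generator(p, prime_factors):
    """A generator of Z*_p, given the distinct prime factors of n = p - 1
    (Algorithm 4.80)."""
    n = p - 1
    while True:
        alpha = random.randint(2, p - 1)          # step 1
        # alpha is a generator iff alpha^(n/q) != 1 for every prime q | n
        if all(pow(alpha, n // q, p) != 1 for q in prime_factors):
            return alpha

For example, find_generator(11, [2, 5]) returns one of the φ(10) = 4 generators of Z*_11.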

4.82 Note (generators of F*_{2^m}) There are two basic approaches to finding a generator of F*_{2^m}. Both techniques require the factorization of the order of F*_{2^m}, namely 2^m − 1.
(i) Generate a monic primitive polynomial f(x) of degree m over Z2 (Algorithm 4.78). The finite field F_{2^m} can then be represented as Z2[x]/(f(x)), the set of all polynomials over Z2 modulo f(x), and the element α = x is a generator.
(ii) Select the method for representing elements of F_{2^m} first. Then use Algorithm 4.80 with G = F*_{2^m} and n = 2^m − 1 to find a generator α of F*_{2^m}.

If n = pq, where p and q are distinct odd primes, then Z*_n is a non-cyclic group of order φ(n) = (p − 1)(q − 1). The maximum order of an element in Z*_n is lcm(p − 1, q − 1). Algorithm 4.83 is a method for generating such an element which requires the factorizations of p − 1 and q − 1.

4.83 Algorithm Selecting an element of maximum order in Z*_n, where n = pq

INPUT: two distinct odd primes, p, q, and the factorizations of p − 1 and q − 1.
OUTPUT: an element α of maximum order lcm(p − 1, q − 1) in Z*_n, where n = pq.
1. Use Algorithm 4.80 with G = Z*_p and n = p − 1 to find a generator a of Z*_p.
2. Use Algorithm 4.80 with G = Z*_q and n = q − 1 to find a generator b of Z*_q.
3. Use Gauss's algorithm (Algorithm 2.121) to find an integer α, 1 ≤ α ≤ n − 1, satisfying α ≡ a (mod p) and α ≡ b (mod q).
4. Return(α)
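A sketch of Algorithm 4.83 in Python, substituting the built-in modular inverse pow(p, -1, q) (available from Python 3.8) for Gauss's algorithm in the CRT step, and reusing find_generator from the sketch above.

def max_order_element(p, q, factors_p1, factors_q1):
    """An element of maximum order lcm(p-1, q-1) in Z*_n, n = p*q
    (Algorithm 4.83); factors_* are the distinct prime factors of p-1, q-1."""
    a = find_generator(p, factors_p1)      # step 1
    b = find_generator(q, factors_q1)      # step 2
    # step 3 (CRT): alpha = a (mod p) and alpha = b (mod q)
    alpha = (a + p * (((b - a) * pow(p, -1, q)) % q)) % (p * q)
    return alpha

For example, max_order_element(11, 7, [2, 5], [2, 3]) returns an element of order lcm(10, 6) = 30 in Z*_77.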

4.6.1 Selecting a prime p and generator of Z*_p

In cryptographic applications for which a generator of Z*_p is required, one usually has the flexibility of selecting the prime p. To guard against the Pohlig-Hellman algorithm for computing discrete logarithms (Algorithm 3.63), a security requirement is that p − 1 should contain a "large" prime factor q. In this context, "large" means that the quantity √q represents an infeasible amount of computation; for example, q ≥ 2^160. This suggests the following algorithm for selecting appropriate parameters (p, α).

4.84 Algorithm Selecting a k-bit prime p and a generator α of Z*_p

INPUT: the required bitlength k of the prime and a security parameter t.
OUTPUT: a k-bit prime p such that p − 1 has a prime factor ≥ t, and a generator α of Z*_p.
1. Repeat the following:
   1.1 Select a random k-bit prime p (for example, using Algorithm 4.44).
   1.2 Factor p − 1.
   Until p − 1 has a prime factor ≥ t.
2. Use Algorithm 4.80 with G = Z*_p and n = p − 1 to find a generator α of Z*_p.
3. Return(p,α)

Algorithm 4.84 is relatively inefficient as it requires the use of an integer factorization algorithm in step 1.2. An alternative approach is to generate the prime p by first choosing a large prime q and then selecting relatively small integers R at random until p = 2Rq + 1 is prime. Since p − 1 = 2Rq, the factorization of p − 1 can be obtained by factoring R. A particularly convenient situation occurs by imposing the condition R = 1. In this case the factorization of p − 1 is simply 2q. Furthermore, since φ(p − 1) = φ(2q) = φ(2)φ(q) = q − 1, the probability that a randomly selected element α ∈ Z*_p is a generator is (q − 1)/(2q) ≈ 1/2.

4.85 Definition A safe prime p is a prime of the form p = 2q + 1 where q is prime.

Algorithm 4.86 generates a safe (probable) prime p and a generator of Z*_p.

4.86 Algorithm Selecting a k-bit safe prime p and a generator α of Z*_p

INPUT: the required bitlength k of the prime.
OUTPUT: a k-bit safe prime p and a generator α of Z*_p.
1. Do the following:
   1.1 Select a random (k − 1)-bit prime q (for example, using Algorithm 4.44).
   1.2 Compute p←2q + 1, and test whether p is prime (for example, using trial division by small primes and Algorithm 4.24).
   Until p is prime.
2. Use Algorithm 4.80 to find a generator α of Z*_p.
3. Return(p,α)
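Finally, a minimal sketch of Algorithm 4.86, reusing random_search, miller_rabin, and find_generator from the earlier sketches; note that for a safe prime p = 2q + 1 the distinct prime factors of p − 1 are simply 2 and q.

def safe_prime_and_generator(k, rounds=25):
    """A k-bit safe (probable) prime p = 2q + 1 and a generator of Z*_p."""
    while True:
        q = random_search(k - 1, rounds)       # step 1.1
        p = 2 * q + 1                          # step 1.2
        if miller_rabin(p, rounds):
            return p, find_generator(p, [2, q])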

4.7 Notes and further references

§4.1
Several books provide extensive treatments of primality testing including those by Bressoud [198], Bach and Shallit [70], and Koblitz [697]. The book by Kranakis [710] offers a more theoretical approach. Cohen [263] gives a comprehensive treatment of modern primality tests. See also the survey articles by A. Lenstra [747] and A. Lenstra and H. Lenstra [748]. Facts 4.1 and 4.2 were proven in 1837 by Dirichlet. For proofs of these results, see Chapter 16 of Ireland and Rosen [572]. Fact 4.3 is due to Rosser and Schoenfeld [1070]. Bach and Shallit [70] have further results on the distribution of prime numbers.

§4.2
Fact 4.13(i) was proven by Alford, Granville, and Pomerance [24]; see also Granville [521]. Fact 4.13(ii) is due to Pomerance, Selfridge, and Wagstaff [996]. Pinch [974] showed that there are 105212 Carmichael numbers up to 10^15.

The Solovay-Strassen probabilistic primality test (Algorithm 4.18) is due to Solovay and Strassen [1163], as modified by Atkin and Larson [57].

Fact 4.23 was proven independently by Monier [892] and Rabin [1024]. The Miller-Rabin test (Algorithm 4.24) originated in the work of Miller [876] who presented it as a non-probabilistic polynomial-time algorithm assuming the correctness of the Extended Riemann Hypothesis (ERH). Rabin [1021, 1024] rephrased Miller's algorithm as a probabilistic primality test. Rabin's algorithm required a small number of gcd computations. The Miller-Rabin test (Algorithm 4.24) is a simplification of Rabin's algorithm which does not require any gcd computations, and is due to Knuth [692, p.379]. Arazi [55], making use of Montgomery modular multiplication (§14.3.2), showed how the Miller-Rabin test can be implemented by "divisionless modular exponentiations" only, yielding a probabilistic primality test which does not use any division operations.

Miller [876], appealing to the work of Ankeny [32], proved under assumption of the Extended Riemann Hypothesis that, if n is an odd composite integer, then its least strong witness is less than c(ln n)^2, where c is some constant. Bach [63] proved that this constant may be taken to be c = 2; see also Bach [64]. As a consequence, one can test n for primality in O((lg n)^5) bit operations by executing the Miller-Rabin algorithm for all bases a ≤ 2(ln n)^2. This gives a deterministic polynomial-time algorithm for primality testing, under the assumption that the ERH is true.

Table 4.1 is from Jaeschke [630], building on earlier work of Pomerance, Selfridge, and Wagstaff [996]. Arnault [56] found the following 46-digit composite integer

n = 1195068768795265792518361315725116351898245581

that is a strong pseudoprime to all the 11 prime bases up to 31. Arnault also found a 337-digit composite integer which is a strong pseudoprime to all 46 prime bases up to 199.

The Miller-Rabin test (Algorithm 4.24) randomly generates t independent bases a and tests to see if each is a strong witness for n. Let n be an odd composite integer and let t = ⌈(1/2) lg n⌉. In situations where random bits are scarce, one may choose instead to generate a single random base a and use the bases a, a + 1, . . . , a + t − 1. Bach [66] proved that for a randomly chosen integer a, the probability that a, a + 1, . . . , a + t − 1 are all strong liars for n is bounded above by n^{−1/4+o(1)}; in other words, the probability that the Miller-Rabin algorithm using these bases mistakenly declares an odd composite integer "prime" is at most n^{−1/4+o(1)}. Peralta and Shoup [969] later improved this bound to n^{−1/2+o(1)}.

Monier [892] gave exact formulas for the number of Fermat liars, Euler liars, and strong liars for composite integers. One consequence of Monier's formulas is the following improvement (in the case where n is not a prime power) of Fact 4.17 (see Kranakis [710, p.68]): if n ≥ 3 is an odd composite integer having r distinct prime factors, and if n ≡ 3 (mod 4), then there are at most φ(n)/2^{r−1} Euler liars for n. Another consequence is the following improvement (in the case where n has at least three distinct prime factors) of Fact 4.23: if n ≥ 3 is an odd composite integer having r distinct prime factors, then there are at most φ(n)/2^{r−1} strong liars for n. Erdös and Pomerance [373] estimated the average number of Fermat liars, Euler liars, and strong liars for composite integers.

Fact 4.30(ii) was proven independently by Atkin and Larson [57], Monier [892], and Pomerance, Selfridge, and Wagstaff [996].

Pinch [975] reviewed the probabilistic primality tests used in the Mathematica, Maple V, Axiom, and Pari/GP computer algebra systems. Some of these systems use a probabilistic primality test known as the Lucas test; a description of this test is provided by Pomerance, Selfridge, and Wagstaff [996].

§4.3
If a number n is composite, providing a non-trivial divisor of n is evidence of its compositeness that can be verified in polynomial time (by long division). In other words, the decision problem "is n composite?" belongs to the complexity class NP (cf. Example 2.65). Pratt [1000] used Fact 4.38 to show that this decision problem is also in co-NP. That is, if n is prime there exists some evidence of this (called a certificate of primality) that can be verified in polynomial time. Note that the issue here is not in finding such evidence, but rather in determining whether such evidence exists which, if found, allows efficient verification. Pomerance [992] improved Pratt's results and showed that every prime n has a certificate of primality which requires O(ln n) multiplications modulo n for its verification.

Primality of the Fermat number F_k = 2^{2^k} + 1 can be determined in deterministic polynomial time by Pepin's test: for k ≥ 2, F_k is prime if and only if 5^{(F_k−1)/2} ≡ −1 (mod F_k). For the history behind Pepin's test and the Lucas-Lehmer test (Algorithm 4.37), see Bach and Shallit [70].

In Fact 4.38, the integer a does not have to be the same for all q. More precisely, Brillhart and Selfridge [212] showed that Fact 4.38 can be refined as follows: an integer n ≥ 3 is prime if and only if for each prime divisor q of n − 1, there exists an integer a_q such that a_q^{n−1} ≡ 1 (mod n) and a_q^{(n−1)/q} ≢ 1 (mod n). The same is true of Fact 4.40, which is due to Pocklington [981]. For a proof of Fact 4.41, see Maurer [818]. Fact 4.42 is due to Brillhart, Lehmer, and Selfridge [210]; a simplified proof is given by Maurer [818].

The original Jacobi sum test was discovered by Adleman, Pomerance, and Rumely [16]. The algorithm was simplified, both theoretically and algorithmically, by Cohen and H. Lenstra [265]. Cohen and A. Lenstra [264] give an implementation report of the Cohen-Lenstra Jacobi sum test; see also Chapter 9 of Cohen [263].

Further improvements of the Jacobi sum test are reported by Bosma and van der Hulst [174].

Elliptic curves were first used for primality proving by Goldwasser and Kilian [477], who presented a randomized algorithm which has an expected running time of O((ln n)^11) bit operations for most inputs n. Subsequently, Adleman and Huang [13] designed a primality proving algorithm using hyperelliptic curves of genus two whose expected running time is polynomial for all inputs n. This established that the decision problem "is n prime?" is in the complexity class RP (Definition 2.77(ii)). The Goldwasser-Kilian and Adleman-Huang algorithms are inefficient in practice. Atkin's test, and an implementation of it, is extensively described by Atkin and Morain [58]; see also Chapter 9 of Cohen [263]. The largest number proven prime as of 1996 by a general purpose primality proving algorithm is a 1505-decimal digit number, accomplished by Morain [903] using Atkin's test. The total time for the computation was estimated to be 4 years of CPU time distributed among 21 SUN 3/60 workstations. See also Morain [902] for an implementation report on Atkin's test which was used to prove the primality of the 1065-decimal digit number (2^3539 + 1)/3.

§4.4
A proof of Mertens's theorem can be found in Hardy and Wright [540]. The optimal trial division bound (Note 4.45) was derived by Maurer [818]. The discussion (Note 4.47) on the probability P(X | Y_t) is from Beauchemin et al. [81]; the result mentioned in the last sentence of this note is due to Kim and Pomerance [673]. Fact 4.48 was derived by Damgård, Landrock, and Pomerance [300], building on earlier work of Erdös and Pomerance [373], Kim and Pomerance [673], and Damgård and Landrock [299]. Table 4.3 is Table 2 of Damgård, Landrock, and Pomerance [300].

The suggestions to first do a Miller-Rabin test with base a = 2 (Remark 4.50) and to do an incremental search (Note 4.51) in Algorithm 4.44 were made by Brandt, Damgård, and Landrock [187]. The error and failure probabilities for incremental search (Note 4.51(i)) were obtained by Brandt and Damgård [186]; consult this paper for more concrete estimates of these probabilities.

Algorithm 4.53 for generating strong primes is due to Gordon [514, 513]. Gordon originally proposed computing p0 = (s^{r−1} − r^{s−1}) mod rs in step 3. Kaliski (personal communication, April 1996) proposed the modified formula p0 = (2s^{r−2} mod r)s − 1 which can be computed more efficiently. Williams and Schmid [1249] proposed an algorithm for generating strong primes p with the additional constraint that p − 1 = 2q where q is prime; this algorithm is not as efficient as Gordon's algorithm. Hellman and Bach [550] recommended an additional constraint on strong primes, specifying that s − 1 (where s is a large prime factor of p + 1) must have a large prime factor (see §15.2.3(v)); this thwarts cycling attacks based on Lucas sequences.

The NIST method for prime generation (Algorithm 4.56) is that recommended by the NIST Federal Information Processing Standards Publication (FIPS) 186 [406].

Fact 4.59 and Algorithm 4.62 for provable prime generation are derived from Maurer [818]. Algorithm 4.62 is based on that of Shawe-Taylor [1123]. Maurer notes that the total diversity of reachable primes using the original version of his algorithm is roughly 10% of all primes. Maurer also presents a more complicated algorithm for generating provable primes with a better diversity than Algorithm 4.62, and provides extensive implementation details and analysis of the expected running time. Maurer [812] provides heuristic justification that Algorithm 4.62 generates primes with virtually uniform distribution. Mihailescu [870] observed that Maurer's algorithm can be improved by using the Eratosthenes sieve method for trial division (in step 8.2 of Algorithm 4.62) and by searching for a prime n in an appropriate interval of the arithmetic progression 2q + 1, 4q + 1, 6q + 1, . . . , instead of generating R's at random until n = 2Rq + 1 is prime.

an appropriate interval of the arithmetic progression 2q + 1, 4q + 1, 6q + 1, instead of generating R’s at random until n = 2Rq + 1 is prime. The second improvement comes at the expense of a reduction of the set of primes which may be produced by the algorithm. Mihailescu’s paper includes extensive analysis and an implementation report. §4.5 Lidl and Niederreiter [764] provide a comprehensive treatment of irreducible polynomials; proofs of Facts 4.67 and 468 can be found there Algorithm 4.69 for testing a polynomial for irreducibility is due to Ben-Or [109] The fastest algorithm known for generating irreducible polynomials is due to Shoup [1131] and has an expected running time of O(m3 lg m + m2 lg p) Zp -operations. There is no deterministic polynomial-time algorithm known for finding an irreducible polynomial of a specified Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 168 Ch. 4 Public-Key Parameters degree m in Zp [x]. Adleman and

The best deterministic algorithm known is due to Shoup [1129] and takes O(m^4 √p) Zp-operations, ignoring powers of log m and log p. Gordon [512] presents an improved method for computing minimum polynomials of elements in F_(2^m).

Zierler and Brillhart [1271] provide a table of all irreducible trinomials of degree ≤ 1000 in Z2[x]. Blake, Gao, and Lambert [146] extended this list to all irreducible trinomials of degree ≤ 2000 in Z2[x]. Fact 4.75 is from their paper.

Table 4.8 extends a similar table by Stahnke [1168]. The primitive pentanomials x^m + x^(k1) + x^(k2) + x^(k3) + 1 listed in Table 4.8 have the following properties: (i) k1 = k2 + k3; (ii) k2 > k3; and (iii) k3 is as small as possible, and for this particular value of k3, k2 is as small as possible. The rationale behind this form is explained in Stahnke’s paper.

For each m < 5000 for which the factorization of 2^m − 1 is known, Živković [1275, 1276] gives a primitive trinomial in Z2[x], one primitive polynomial in Z2[x] having five nonzero terms, and one primitive polynomial in Z2[x] having seven non-zero terms, provided that such polynomials exist. The factorizations of 2^m − 1 are known for all m ≤ 510 and for some additional m ≤ 5000. A list of such factorizations can be found in Brillhart et al. [211], and updates of the list are available by anonymous ftp from sable.ox.ac.uk in the /pub/math/cunningham/ directory. Hansen and Mullen [538] describe some improvements to Algorithm 4.78 for generating primitive polynomials. They also give tables of primitive polynomials of degree m in Zp[x] for each prime power p^m ≤ 10^50 with p ≤ 97. Moreover, for each such p and m, the primitive polynomial of degree m over Zp listed has the smallest number of non-zero coefficients among all such polynomials.

The entries of Table 4.9 were obtained from Zierler [1270] for Mersenne exponents Mj, 1 ≤ j ≤ 23, and from Kurita and Matsumoto [719] for Mersenne exponents Mj, 24 ≤ j ≤ 27.

Let f(x) ∈ Zp[x] be an irreducible polynomial of degree m, and consider the finite field F_(p^m) = Zp[x]/(f(x)). Then f(x) is called a normal polynomial if the set {x, x^p, x^(p^2), . . . , x^(p^(m−1))} forms a basis for F_(p^m) over Zp; such a basis is called a normal basis. Mullin et al. [911] introduced the concept of an optimal normal basis in order to reduce the hardware complexity of multiplying field elements in the finite field F_(2^m). A VLSI implementation of the arithmetic in F_(2^m) which uses optimal normal bases is described by Agnew et al. [18]. A normal polynomial which is also primitive is called a primitive normal polynomial. Davenport [301] proved that for any prime p and positive integer m there exists a primitive normal polynomial of degree m in Zp[x]. See also Lenstra and Schoof [760] who generalized this result from prime fields Zp to prime power fields Fq.

Morgan and Mullen [905] give a primitive normal polynomial of degree m over Zp for each prime power p^m ≤ 10^50 with p ≤ 97. Moreover, each polynomial has the smallest number of non-zero coefficients among all primitive normal polynomials of degree m over Zp; in fact, each polynomial has at most five non-zero terms.

§4.6
No polynomial-time algorithm is known for finding generators, or even for testing whether an element is a generator, of a finite field Fq if the factorization of q − 1 is unknown. Shoup [1130] considered the problem of deterministically generating in polynomial time a subset of Fq that contains a generator, and presented a solution to the problem for the case where the characteristic p of Fq is small (e.g., p = 2). Maurer [818] discusses how his algorithm (Algorithm 4.62) can be used to generate the parameters (p, α), where p is a provable prime and α is a generator of Z*p.

Chapter 5
Pseudorandom Bits and Sequences

Contents in Brief
5.1 Introduction
5.2 Random bit generation
5.3 Pseudorandom bit generation
5.4 Statistical tests
5.5 Cryptographically secure pseudorandom bit generation
5.6 Notes and further references

5.1 Introduction
The security of many cryptographic systems depends upon the generation of unpredictable quantities. Examples include the keystream in the one-time pad (§1.5.4), the secret key in the DES encryption algorithm (§7.4.2), the primes p, q in the RSA encryption (§8.2) and digital signature (§11.3.1) schemes, the private key a in the DSA (§11.5.1), and the challenges used in challenge-response identification systems (§10.3). In all these cases, the quantities generated must be of sufficient size and be “random” in the sense that the probability of any particular value being selected must be sufficiently small to preclude an adversary from gaining advantage through optimizing a search strategy based on such probability.

For example, the key space for DES has size 2^56. If a secret key k were selected using a true random generator, an adversary would on average have to try 2^55 possible keys before guessing the correct key k. If, on the other hand, a key k were selected by first choosing a 16-bit random secret s, and then expanding it into a 56-bit key k using a complicated but publicly known function f, the adversary would on average only need to try 2^15 possible keys (obtained by running every possible value for s through the function f).

This chapter considers techniques for the generation of random and pseudorandom bits and numbers. Related techniques for pseudorandom bit generation that are generally discussed in the literature in the context of stream ciphers, including linear and nonlinear feedback shift registers (Chapter 6) and the output feedback mode (OFB) of block ciphers (Chapter 7), are addressed elsewhere in this book.

Chapter outline
The remainder of §5.1 introduces basic concepts relevant to random and pseudorandom bit generation. §5.2 considers techniques for random bit generation, while §5.3 considers some techniques for pseudorandom bit generation. §5.4 describes statistical tests designed to measure the quality of a random bit generator. Cryptographically secure pseudorandom bit generators are the topic of §5.5. §5.6 concludes with references and further chapter notes.

5.1.1 Background and Classification

5.1 Definition A random bit generator is a device or algorithm which outputs a sequence of statistically independent and unbiased binary digits.

5.2 Remark (random bits vs random numbers) A random bit generator can be used to generate (uniformly distributed) random numbers.

For example, a random integer in the interval [0, n] can be obtained by generating a random bit sequence of length ⌊lg n⌋ + 1, and converting it to an integer; if the resulting integer exceeds n, one option is to discard it and generate a new random bit sequence.
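The method of Remark 5.2 translates directly into code. The following minimal Python sketch (the function name and the use of the secrets module are illustrative choices, not from the text) generates a uniform random integer in [0, n] by rejection:

    import secrets

    def random_int(n: int) -> int:
        # n.bit_length() equals floor(lg n) + 1 for n >= 1
        k = n.bit_length()
        while True:
            r = secrets.randbits(k)   # k statistically independent unbiased bits
            if r <= n:                # discard values exceeding n and retry
                return r

Since 2^k < 2(n + 1), on average fewer than two iterations are needed.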

§5.2 outlines some physical sources of random bits that are used in practice. Ideally, secrets required in cryptographic algorithms and protocols should be generated with a (true) random bit generator. However, the generation of random bits is an inefficient procedure in most practical environments. Moreover, it may be impractical to securely store and transmit a large number of random bits if these are required in applications such as the one-time pad (§6.1.1). In such situations, the problem can be ameliorated by substituting a random bit generator with a pseudorandom bit generator.

5.3 Definition A pseudorandom bit generator (PRBG) is a deterministic¹ algorithm which, given a truly random binary sequence of length k, outputs a binary sequence of length l ≫ k which “appears” to be random. The input to the PRBG is called the seed, while the output of the PRBG is called a pseudorandom bit sequence.

The output of a PRBG is not random; in fact, the number of possible output sequences is at most a small fraction, namely 2^k/2^l, of all possible binary sequences of length l. The intent is to take a small truly random sequence and expand it to a sequence of much larger length, in such a way that an adversary cannot efficiently distinguish between output sequences of the PRBG and truly random sequences of length l. §5.3 discusses ad-hoc techniques for pseudorandom bit generation. In order to gain confidence that such generators are secure, they should be subjected to a variety of statistical tests designed to detect the specific characteristics expected of random sequences. A collection of such tests is given in §5.4. As the following example demonstrates, passing these statistical tests is a necessary but not sufficient condition for a generator to be secure.

5.4 Example (linear congruential generators) A linear congruential generator produces a pseudorandom sequence of numbers x1, x2, x3, . . . according to the linear recurrence
xn = a·x_(n−1) + b mod m, n ≥ 1;
integers a, b, and m are parameters which characterize the generator, while x0 is the (secret) seed. While such generators are commonly used for simulation purposes and probabilistic algorithms, and pass the statistical tests of §5.4, they are predictable and hence entirely insecure for cryptographic purposes: given a partial output sequence, the remainder of the sequence can be reconstructed even if the parameters a, b, and m are unknown. □

¹ Deterministic here means that given the same initial seed, the generator will always produce the same output sequence.
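To make the predictability concrete, here is a small Python sketch under the simplifying assumption that the modulus m is known (the chapter notes discuss methods that need none of the parameters); the parameter values are arbitrary illustrations:

    # Why linear congruential generators are cryptographically weak:
    # with m public, three consecutive outputs determine a and b.
    def lcg(x, a=1103515245, b=12345, m=2**31):
        while True:
            x = (a * x + b) % m
            yield x

    m = 2**31
    gen = lcg(42)
    x0, x1, x2 = next(gen), next(gen), next(gen)

    # Solve x1 = a*x0 + b (mod m) and x2 = a*x1 + b (mod m) for a, b.
    a = ((x2 - x1) * pow(x1 - x0, -1, m)) % m   # requires gcd(x1 - x0, m) = 1
    b = (x1 - a * x0) % m

    x3_predicted = (a * x2 + b) % m
    assert x3_predicted == next(gen)            # the rest of the stream follows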

A minimum security requirement for a pseudorandom bit generator is that the length k of the random seed should be sufficiently large so that a search over 2^k elements (the total number of possible seeds) is infeasible for the adversary. Two general requirements are that the output sequences of a PRBG should be statistically indistinguishable from truly random sequences, and the output bits should be unpredictable to an adversary with limited computational resources; these requirements are captured in Definitions 5.5 and 5.6.

5.5 Definition A pseudorandom bit generator is said to pass all polynomial-time² statistical tests if no polynomial-time algorithm can correctly distinguish between an output sequence of the generator and a truly random sequence of the same length with probability significantly greater than 1/2.

5.6 Definition A pseudorandom bit generator is said to pass the next-bit test if there is no polynomial-time algorithm which, on input of the first l bits of an output sequence s, can predict the (l + 1)st bit of s with probability significantly greater than 1/2.

Although Definition 5.5 appears to impose a more stringent security requirement on pseudorandom bit generators than Definition 5.6 does, the next result asserts that they are, in fact, equivalent.

5.7 Fact (universality of the next-bit test) A pseudorandom bit generator passes the next-bit test if and only if it passes all polynomial-time statistical tests.

5.8 Definition A PRBG that passes the next-bit test (possibly under some plausible but unproved mathematical assumption such as the intractability of factoring integers) is called a cryptographically secure pseudorandom bit generator (CSPRBG).

5.9 Remark (asymptotic nature of Definitions 5.5, 5.6, and 5.8) Each of the three definitions above is given in complexity-theoretic terms and is asymptotic in nature because the notion of “polynomial-time” is meaningful for asymptotically large inputs only; the resulting notions of security are relative in the same sense. To be more precise in Definitions 5.5, 5.6, 5.8, and Fact 5.7, a pseudorandom bit generator is actually a family of such PRBGs. Thus the theoretical security results for a family of PRBGs are only an indirect indication about the security of individual members.

Two cryptographically secure pseudorandom bit generators are presented in §5.5.

5.2 Random bit generation
A (true) random bit generator requires a naturally occurring source of randomness. Designing a hardware device or software program to exploit this randomness and produce a bit sequence that is free of biases and correlations is a difficult task. Additionally, for most cryptographic applications, the generator must not be subject to observation or manipulation by an adversary. This section surveys some potential sources of random bits.

Random bit generators based on natural sources of randomness are subject to influence by external factors, and also to malfunction. It is imperative that such devices be tested periodically, for example by using the statistical tests of §5.4.

² The running time of the test is bounded by a polynomial in the length l of the output sequence.

(i) Hardware-based generators
Hardware-based random bit generators exploit the randomness which occurs in some physical phenomena. Such physical processes may produce bits that are biased or correlated, in which case they should be subjected to de-skewing techniques mentioned in (iii) below. Examples of such physical phenomena include:
1. elapsed time between emission of particles during radioactive decay;
2. thermal noise from a semiconductor diode or resistor;
3. the frequency instability of a free running oscillator;
4. the amount a metal insulator semiconductor capacitor is charged during a fixed period of time;
5. air turbulence within a sealed disk drive which causes random fluctuations in disk drive sector read latency times; and
6. sound from a microphone or video input from a camera.

Generators based on the first two phenomena would, in general, have to be built externally to the device using the random bits, and hence may be subject to observation or manipulation by an adversary. Generators based on oscillators and capacitors can be built on VLSI devices; they can be enclosed in tamper-resistant hardware, and hence shielded from active adversaries.

(ii) Software-based generators
Designing a random bit generator in software is even more difficult than doing so in hardware. Processes upon which software random bit generators may be based include:
1. the system clock;
2. elapsed time between keystrokes or mouse movement;
3. content of input/output buffers;
4. user input; and
5. operating system values such as system load and network statistics.
The behavior of such processes can vary considerably depending on various factors, such as the computer platform. It may also be difficult to prevent an adversary from observing or manipulating these processes.

For instance, if the adversary has a rough idea of when a random sequence was generated, she can guess the content of the system clock at that time with a high degree of accuracy. A well-designed software random bit generator should utilize as many good sources of randomness as are available. Using many sources guards against the possibility of a few of the sources failing, or being observed or manipulated by an adversary. Each source should be sampled, and the sampled sequences should be combined using a complex mixing function; one recommended technique for accomplishing this is to apply a cryptographic hash function such as SHA-1 (Algorithm 9.53) or MD5 (Algorithm 9.51) to a concatenation of the sampled sequences. The purpose of the mixing function is to distill the (true) random bits from the sampled sequences.
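As an illustration only, the following Python sketch hashes a concatenation of a few sampled sources; the particular sources shown (OS entropy, a high-resolution clock reading, the process id) are stand-ins chosen for this sketch, and a production generator would sample many more sources, continuously:

    import hashlib, os, time

    def mixed_seed() -> bytes:
        samples = [
            os.urandom(16),                              # OS-provided entropy
            time.perf_counter_ns().to_bytes(8, "big"),   # high-resolution clock
            os.getpid().to_bytes(4, "big"),              # process-specific value
        ]
        # distill the samples into a fixed-size seed with a hash, per the text
        return hashlib.sha1(b"".join(samples)).digest()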

(iii) De-skewing
A natural source of random bits may be defective in that the output bits may be biased (the probability of the source emitting a 1 is not equal to 1/2) or correlated (the probability of the source emitting a 1 depends on previous bits emitted). There are various techniques for generating truly random bit sequences from the output bits of such a defective generator; such techniques are called de-skewing techniques.

5.10 Example (removing biases in output bits) Suppose that a generator produces biased but uncorrelated bits. Suppose that the probability of a 1 is p, and the probability of a 0 is 1 − p, where p is unknown but fixed, 0 < p < 1. If the output sequence of such a generator is grouped into pairs of bits, with a 10 pair transformed to a 1, a 01 pair transformed to a 0, and 00 and 11 pairs discarded, then the resulting sequence is both unbiased and uncorrelated. □

A practical (although not provable) de-skewing technique is to pass sequences whose bits are biased or correlated through a cryptographic hash function such as SHA-1 or MD5.
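A few lines of Python make the pairing rule of Example 5.10 concrete (a sketch; the function name is ours):

    def deskew(bits):
        out = []
        for b1, b2 in zip(bits[0::2], bits[1::2]):
            if b1 != b2:          # keep only 10 and 01 pairs
                out.append(b1)    # 10 -> 1, 01 -> 0
        return out

    print(deskew([1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0]))   # -> [1, 0, 1]

Note the price paid: at best half the input bits survive, and for a heavily biased source far fewer.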

5.3 Pseudorandom bit generation
A one-way function f (Definition 1.12) can be utilized to generate pseudorandom bit sequences (Definition 5.3) by first selecting a random seed s, and then applying the function to the sequence of values s, s+1, s+2, . . . ; the output sequence is f(s), f(s+1), f(s+2), . . . . Depending on the properties of the one-way function used, it may be necessary to only keep a few bits of the output values f(s + i) in order to remove possible correlations between successive values. Examples of suitable one-way functions f include a cryptographic hash function such as SHA-1 (Algorithm 9.53), or a block cipher such as DES (§7.4) with secret key k.

Although such ad-hoc methods have not been proven to be cryptographically secure, they appear sufficient for most applications. Two such methods for pseudorandom bit and number generation which have been standardized are presented in §5.3.1 and §5.3.2. Techniques for the cryptographically secure generation of pseudorandom bits are given in §5.5.
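A minimal Python sketch of the counter method just described, with SHA-1 playing the role of f and only the first few output bytes kept (all parameter choices here are illustrative, not prescribed by the text):

    import hashlib

    def prbg(seed: int, nvalues: int, keep: int = 8):
        for i in range(nvalues):
            digest = hashlib.sha1(str(seed + i).encode()).digest()
            yield digest[:keep]            # f(s), f(s+1), f(s+2), ...

    stream = b"".join(prbg(seed=123456789, nvalues=4))   # 32 pseudorandom bytes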

5.3.1 ANSI X9.17 generator
Algorithm 5.11 is a U.S. Federal Information Processing Standard (FIPS) approved method from the ANSI X9.17 standard for the purpose of pseudorandomly generating keys and initialization vectors for use with DES. Ek denotes DES E-D-E two-key triple-encryption (Definition 7.32) under a key k; the key k should be reserved exclusively for use in this algorithm.

5.11 Algorithm ANSI X9.17 pseudorandom bit generator
INPUT: a random (and secret) 64-bit seed s, integer m, and DES E-D-E encryption key k.
OUTPUT: m pseudorandom 64-bit strings x1, x2, . . . , xm.
1. Compute the intermediate value I = Ek(D), where D is a 64-bit representation of the date/time to as fine a resolution as is available.
2. For i from 1 to m do the following:
2.1 xi ← Ek(I ⊕ s).
2.2 s ← Ek(xi ⊕ I).
3. Return(x1, x2, . . . , xm).

Each output bitstring xi may be used as an initialization vector (IV) for one of the DES modes of operation (§7.2.2). To obtain a DES key from xi, every eighth bit of xi should be reset to odd parity (cf. §7.4.2).
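A Python sketch of Algorithm 5.11, assuming the third-party pycryptodome package for triple DES; key, seed, and D are caller-supplied placeholders, not test vectors from the standard:

    from Crypto.Cipher import DES3

    def ansi_x917(key: bytes, seed: bytes, D: bytes, m: int):
        # key: a valid 16-byte (two-key) triple-DES key; seed, D: 8 bytes each
        E = DES3.new(key, DES3.MODE_ECB).encrypt
        xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
        I = E(D)                      # step 1: encrypt a date/time block
        out, s = [], seed
        for _ in range(m):            # step 2
            x = E(xor(I, s))          # step 2.1
            s = E(xor(x, I))          # step 2.2
            out.append(x)
        return out                    # step 3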

5.3.2 FIPS 186 generator
The algorithms presented in this subsection are FIPS-approved methods for pseudorandomly generating the secret parameters for the DSA (§11.5.1). Algorithm 5.12 generates DSA private keys a, while Algorithm 5.14 generates the per-message secrets k to be used in signing messages. Both algorithms use a secret seed s which should be randomly generated, and utilize a one-way function constructed by using either SHA-1 (Algorithm 9.53) or DES (Algorithm 7.82), respectively described in Algorithms 5.15 and 5.16.

5.12 Algorithm FIPS 186 pseudorandom number generator for DSA private keys
INPUT: an integer m and a 160-bit prime number q.
OUTPUT: m pseudorandom numbers a1, a2, . . . , am in the interval [0, q − 1] which may be used as DSA private keys.

1. If Algorithm 5.15 is to be used in step 4.3 then select an arbitrary integer b, 160 ≤ b ≤ 512; if Algorithm 5.16 is to be used then set b ← 160.
2. Generate a random (and secret) b-bit seed s.
3. Define the 160-bit string t = 67452301 efcdab89 98badcfe 10325476 c3d2e1f0 (in hexadecimal).
4. For i from 1 to m do the following:
4.1 (optional user input) Either select a b-bit string yi, or set yi ← 0.
4.2 zi ← (s + yi) mod 2^b.
4.3 ai ← G(t, zi) mod q. (G is either that defined in Algorithm 5.15 or 5.16.)
4.4 s ← (1 + s + ai) mod 2^b.
5. Return(a1, a2, . . . , am).

5.13 Note (optional user input) Algorithm 5.12 permits a user to augment the seed s with random or pseudorandom strings derived from alternate sources. The user may desire to do this if she does not trust the quality or integrity of the random bit generator which may be built into a cryptographic module implementing the algorithm.

5.14 Algorithm FIPS 186 pseudorandom number generator for DSA per-message secrets
INPUT: an integer m and a 160-bit prime number q.

OUTPUT: m pseudorandom numbers k1, k2, . . . , km in the interval [0, q − 1] which may be used as the per-message secret numbers k in the DSA.
1. If Algorithm 5.15 is to be used in step 4.1 then select an integer b, 160 ≤ b ≤ 512; if Algorithm 5.16 is to be used then set b ← 160.
2. Generate a random (and secret) b-bit seed s.
3. Define the 160-bit string t = efcdab89 98badcfe 10325476 c3d2e1f0 67452301 (in hexadecimal).
4. For i from 1 to m do the following:
4.1 ki ← G(t, s) mod q. (G is either that defined in Algorithm 5.15 or 5.16.)
4.2 s ← (1 + s + ki) mod 2^b.
5. Return(k1, k2, . . . , km).

5.15 Algorithm FIPS 186 one-way function using SHA-1
INPUT: a 160-bit string t and a b-bit string c, 160 ≤ b ≤ 512.
OUTPUT: a 160-bit string denoted G(t, c).
1. Break up t into five 32-bit blocks: t = H1∥H2∥H3∥H4∥H5.
2. Pad c with 0’s to obtain a 512-bit message block: X ← c∥0^(512−b).

3. Divide X into 16 32-bit words: x0 x1 . . . x15, and set m ← 1.
4. Execute step 4 of SHA-1 (Algorithm 9.53). (This alters the Hi’s.)
5. The output is the concatenation: G(t, c) = H1∥H2∥H3∥H4∥H5.

5.16 Algorithm FIPS 186 one-way function using DES
INPUT: two 160-bit strings t and c.
OUTPUT: a 160-bit string denoted G(t, c).
1. Break up t into five 32-bit blocks: t = t0∥t1∥t2∥t3∥t4.
2. Break up c into five 32-bit blocks: c = c0∥c1∥c2∥c3∥c4.
3. For i from 0 to 4 do the following: xi ← ti ⊕ ci.
4. For i from 0 to 4 do the following:
4.1 b1 ← c_((i+4) mod 5), b2 ← c_((i+3) mod 5).
4.2 a1 ← xi, a2 ← x_((i+1) mod 5) ⊕ x_((i+4) mod 5).
4.3 A ← a1∥a2, B ← b1′∥b2, where b1′ denotes the 24 least significant bits of b1.
4.4 Use DES with key B to encrypt A: yi ← DES_B(A).
4.5 Break up yi into two 32-bit blocks: yi = Li∥Ri.
5. For i from 0 to 4 do the following: zi ← Li ⊕ R_((i+2) mod 5) ⊕ L_((i+3) mod 5).
6. The output is the concatenation: G(t, c) = z0∥z1∥z2∥z3∥z4.
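The update structure of Algorithm 5.12 is easy to see in code. In the Python sketch below, G is deliberately simplified to a plain SHA-1 of the input (the standard’s G of Algorithm 5.15 runs only the SHA-1 compression step with t as the chaining value), so this illustrates the loop, not a conforming implementation:

    import hashlib

    def fips186_private_keys(seed: int, b: int, q: int, m: int):
        s, keys = seed, []
        for _ in range(m):
            z = s % 2**b                               # step 4.2 (with yi = 0)
            digest = hashlib.sha1(z.to_bytes((b + 7) // 8, "big")).digest()
            a = int.from_bytes(digest, "big") % q      # step 4.3, simplified G
            s = (1 + s + a) % 2**b                     # step 4.4
            keys.append(a)
        return keys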

5.4 Statistical tests
This section presents some tests designed to measure the quality of a generator purported to be a random bit generator (Definition 5.1). While it is impossible to give a mathematical proof that a generator is indeed a random bit generator, the tests described here help detect certain kinds of weaknesses the generator may have. This is accomplished by taking a sample output sequence of the generator and subjecting it to various statistical tests. Each statistical test determines whether the sequence possesses a certain attribute that a truly random sequence would be likely to exhibit; the conclusion of each test is not definite, but rather probabilistic. An example of such an attribute is that the sequence should have roughly the same number of 0’s as 1’s. If the sequence is deemed to have failed any one of the statistical tests, the generator may be rejected as being non-random; alternatively, the generator may be subjected to further testing.

On the other hand, if the sequence passes all of the statistical tests, the generator is accepted as being random. More precisely, the term “accepted” should be replaced by “not rejected”, since passing the tests merely provides probabilistic evidence that the generator produces sequences which have certain characteristics of random sequences.

§5.4.1 and §5.4.2 provide some relevant background in statistics. §5.4.3 establishes some notation and lists Golomb’s randomness postulates. Specific statistical tests for randomness are described in §5.4.4 and §5.4.5.

5.4.1 The normal and chi-square distributions
The normal and χ2 distributions are widely used in statistical applications.

5.17 Definition If the result X of an experiment can be any real number, then X is said to be a continuous random variable.

5.18 Definition A probability density function of a continuous random variable X is a function f(x) which can be integrated and satisfies:
(i) f(x) ≥ 0 for all x ∈ R;
(ii) ∫_{−∞}^{∞} f(x) dx = 1; and
(iii) for all a, b ∈ R, P(a < X ≤ b) = ∫_a^b f(x) dx.

(i) The normal distribution
The normal distribution arises in practice when a large number of independent random variables having the same mean and variance are summed.

5.19 Definition A (continuous) random variable X has a normal distribution with mean µ and variance σ^2 if its probability density function is defined by
f(x) = (1/(σ√(2π))) exp(−(x − µ)^2 / (2σ^2)), −∞ < x < ∞.
Notation: X is said to be N(µ, σ^2). If X is N(0, 1), then X is said to have a standard normal distribution.

A graph of the N(0, 1) distribution is given in Figure 5.1. [Figure 5.1: The normal distribution N(0, 1).] The graph is symmetric about the vertical axis, and hence P(X > x) = P(X < −x) for any x.

Table 5.1 gives some percentiles for the standard normal distribution. For example, the entry (α = 0.05, x = 1.6449) means that if X is N(0, 1), then X exceeds 1.6449 about 5% of the time. Fact 5.20 can be used to reduce questions about a normal distribution to questions about the standard normal distribution.

α | 0.1    | 0.05   | 0.025  | 0.01   | 0.005  | 0.0025 | 0.001  | 0.0005
x | 1.2816 | 1.6449 | 1.9600 | 2.3263 | 2.5758 | 2.8070 | 3.0902 | 3.2905

Table 5.1: Selected percentiles of the standard normal distribution. If X is a random variable having a standard normal distribution, then P(X > x) = α.

5.20 Fact If the random variable X is N(µ, σ^2), then the random variable Z = (X − µ)/σ is N(0, 1).

(ii) The χ2 distribution
The χ2 distribution can be used to compare the goodness-of-fit of the observed frequencies of events to their expected frequencies under a hypothesized distribution.

The χ2 distribution with v degrees of freedom arises in practice when the squares of v independent random variables having standard normal distributions are summed.

5.21 Definition Let v ≥ 1 be an integer. A (continuous) random variable X has a χ2 (chi-square) distribution with v degrees of freedom if its probability density function is defined by
f(x) = (1/(Γ(v/2)·2^(v/2))) x^((v/2)−1) e^(−x/2) for 0 ≤ x < ∞, and f(x) = 0 for x < 0,
where Γ is the gamma function.³ The mean and variance of this distribution are µ = v and σ^2 = 2v.

A graph of the χ2 distribution with v = 7 degrees of freedom is given in Figure 5.2. [Figure 5.2: The χ2 (chi-square) distribution with v = 7 degrees of freedom.] Table 5.2 gives some percentiles of the χ2 distribution for various degrees of freedom. For example, the entry in row v = 5 and column α = 0.05 is x = 11.0705; this means that if X has a χ2 distribution with 5 degrees of freedom, then X exceeds 11.0705 about 5% of the time.

³ The gamma function is defined by Γ(t) = ∫_0^∞ x^(t−1) e^(−x) dx, for t > 0.

v \ α   0.100      0.050      0.025      0.010      0.005      0.001
1       2.7055     3.8415     5.0239     6.6349     7.8794    10.8276
2       4.6052     5.9915     7.3778     9.2103    10.5966    13.8155
3       6.2514     7.8147     9.3484    11.3449    12.8382    16.2662
4       7.7794     9.4877    11.1433    13.2767    14.8603    18.4668
5       9.2364    11.0705    12.8325    15.0863    16.7496    20.5150
6      10.6446    12.5916    14.4494    16.8119    18.5476    22.4577
7      12.0170    14.0671    16.0128    18.4753    20.2777    24.3219
8      13.3616    15.5073    17.5345    20.0902    21.9550    26.1245
9      14.6837    16.9190    19.0228    21.6660    23.5894    27.8772
10     15.9872    18.3070    20.4832    23.2093    25.1882    29.5883
11     17.2750    19.6751    21.9200    24.7250    26.7568    31.2641
12     18.5493    21.0261    23.3367    26.2170    28.2995    32.9095
13     19.8119    22.3620    24.7356    27.6882    29.8195    34.5282
14     21.0641    23.6848    26.1189    29.1412    31.3193    36.1233
15     22.3071    24.9958    27.4884    30.5779    32.8013    37.6973
16     23.5418    26.2962    28.8454    31.9999    34.2672    39.2524
17     24.7690    27.5871    30.1910    33.4087    35.7185    40.7902
18     25.9894    28.8693    31.5264    34.8053    37.1565    42.3124
19     27.2036    30.1435    32.8523    36.1909    38.5823    43.8202
20     28.4120    31.4104    34.1696    37.5662    39.9968    45.3147
21     29.6151    32.6706    35.4789    38.9322    41.4011    46.7970
22     30.8133    33.9244    36.7807    40.2894    42.7957    48.2679
23     32.0069    35.1725    38.0756    41.6384    44.1813    49.7282
24     33.1962    36.4150    39.3641    42.9798    45.5585    51.1786
25     34.3816    37.6525    40.6465    44.3141    46.9279    52.6197
26     35.5632    38.8851    41.9232    45.6417    48.2899    54.0520
27     36.7412    40.1133    43.1945    46.9629    49.6449    55.4760
28     37.9159    41.3371    44.4608    48.2782    50.9934    56.8923
29     39.0875    42.5570    45.7223    49.5879    52.3356    58.3012
30     40.2560    43.7730    46.9792    50.8922    53.6720    59.7031
31     41.4217    44.9853    48.2319    52.1914    55.0027    61.0983
63     77.7454    82.5287    86.8296    92.0100    95.6493   103.4424
127   147.8048   154.3015   160.0858   166.9874   171.7961   181.9930
255   284.3359   293.2478   301.1250   310.4574   316.9194   330.5197
511   552.3739   564.6961   575.5298   588.2978   597.0978   615.5149
1023 1081.3794  1098.5208  1113.5334  1131.1587  1143.2653  1168.4972

Table 5.2: Selected percentiles of the χ2 (chi-square) distribution. A (v, α)-entry of x in the table has the following meaning: if X is a random variable having a χ2 distribution with v degrees of freedom, then P(X > x) = α.

Fact 5.22 relates the normal distribution to the χ2 distribution.

5.22 Fact If the random variable X is N(µ, σ^2), σ^2 > 0, then the random variable Z = (X − µ)^2/σ^2 has a χ2 distribution with 1 degree of freedom. In particular, if X is N(0, 1), then Z = X^2 has a χ2 distribution with 1 degree of freedom.

5.4.2 Hypothesis testing
A statistical hypothesis, denoted H0, is an assertion about a distribution of one or more random variables. A test of a statistical hypothesis is a procedure, based upon observed values of the random variables, that leads to the acceptance or rejection of the hypothesis H0.

The test only provides a measure of the strength of the evidence provided by the data against the hypothesis; hence, the conclusion of the test is not definite, but rather probabilistic.

5.23 Definition The significance level α of the test of a statistical hypothesis H0 is the probability of rejecting H0 when it is true.

In this section, H0 will be the hypothesis that a given binary sequence was produced by a random bit generator. If the significance level α of a test of H0 is too high, then the test may reject sequences that were, in fact, produced by a random bit generator (such an error is called a Type I error). On the other hand, if the significance level of a test of H0 is too low, then there is the danger that the test may accept sequences even though they were not produced by a random bit generator (such an error is called a Type II error).⁴

It is, therefore, important that the test be carefully designed to have a significance level that is appropriate for the purpose at hand; a significance level α between 0.001 and 0.05 might be employed in practice.

A statistical test is implemented by specifying a statistic on the random sample.⁵ Statistics are generally chosen so that they can be efficiently computed, and so that they (approximately) follow an N(0, 1) or a χ2 distribution (see §5.4.1). The value of the statistic for the sample output sequence is computed and compared with the value expected for a random sequence as described below.
1. Suppose that a statistic X for a random sequence follows a χ2 distribution with v degrees of freedom, and suppose that the statistic can be expected to take on larger values for nonrandom sequences. To achieve a significance level of α, a threshold value xα is chosen (using Table 5.2) so that P(X > xα) = α. If the value Xs of the statistic for the sample output sequence satisfies Xs > xα, then the sequence fails the test; otherwise, it passes the test. Such a test is called a one-sided test. For example, if v = 5 and α = 0.025, then xα = 12.8325, and one expects a random sequence to fail the test only 2.5% of the time.

2. Suppose that a statistic X for a random sequence follows an N(0, 1) distribution, and suppose that the statistic can be expected to take on both larger and smaller values for nonrandom sequences. To achieve a significance level of α, a threshold value xα is chosen (using Table 5.1) so that P(X > xα) = P(X < −xα) = α/2. If the value Xs of the statistic for the sample output sequence satisfies Xs > xα or Xs < −xα, then the sequence fails the test; otherwise, it passes the test. Such a test is called a two-sided test. For example, if α = 0.05, then xα = 1.96, and one expects a random sequence to fail the test only 5% of the time.

⁴ Actually, the probability β of a Type II error may be completely independent of α. If the generator is not a random bit generator, the probability β depends on the nature of the defects of the generator, and is usually difficult to determine in practice. For this reason, assuming that the probability of a Type II error is proportional to α is a useful intuitive guide when selecting an appropriate significance level for a test.
⁵ A statistic is a function of the elements of a random sample; for example, the number of 0’s in a binary sequence is a statistic.

5.4.3 Golomb’s randomness postulates
Golomb’s randomness postulates (Definition 5.28) are presented here for historical reasons – they were one of the first attempts to establish some necessary conditions for a periodic pseudorandom sequence to look random. It is emphasized that these conditions are far from being sufficient for such sequences to be considered random. Unless otherwise stated, all sequences are binary sequences.

5.24 Definition Let s = s0, s1, s2, . . . be an infinite sequence. The subsequence consisting of the first n terms of s is denoted by s^n = s0, s1, . . . , s_(n−1).

5.25 Definition The sequence s = s0, s1, s2, . . . is said to be N-periodic if si = s_(i+N) for all i ≥ 0. The sequence s is periodic if it is N-periodic for some positive integer N. The period of a periodic sequence s is the smallest positive integer N for which s is N-periodic. If s is a periodic sequence of period N, then the cycle of s is the subsequence s^N.

5.26 Definition Let s be a sequence. A run of s is a subsequence of s consisting of consecutive 0’s or consecutive 1’s which is neither preceded nor succeeded by the same symbol. A run of 0’s is called a gap, while a run of 1’s is called a block.

5.27 Definition Let s = s0, s1, s2, . . . be a periodic sequence of period N. The autocorrelation function of s is the integer-valued function C(t) defined as
C(t) = (1/N) Σ_{i=0}^{N−1} (2si − 1)·(2s_(i+t) − 1), for 0 ≤ t ≤ N − 1.

The autocorrelation function C(t) measures the amount of similarity between the sequence s and a shift of s by t positions. If s is a random periodic sequence of period N, then |N·C(t)| can be expected to be quite small for all values of t, 0 < t < N.

5.28 Definition Let s be a periodic sequence of period N. Golomb’s randomness postulates are the following.
R1: In the cycle s^N of s, the number of 1’s differs from the number of 0’s by at most 1.
R2: In the cycle s^N, at least half the runs have length 1, at least one-fourth have length 2, at least one-eighth have length 3, etc., as long as the number of runs so indicated exceeds 1. Moreover, for each of these lengths, there are (almost) equally many gaps and blocks.⁶
R3: The autocorrelation function C(t) is two-valued. That is, for some integer K,
N·C(t) = Σ_{i=0}^{N−1} (2si − 1)·(2s_(i+t) − 1) = N if t = 0, and = K if 1 ≤ t ≤ N − 1.

⁶ Postulate R2 implies postulate R1.

5.29 Definition A binary sequence which satisfies Golomb’s randomness postulates is called a pseudo-noise sequence or a pn-sequence.

Pseudo-noise sequences arise in practice as output sequences of maximum-length linear feedback shift registers (cf. Fact 6.14).

5.30 Example (pn-sequence) Consider the periodic sequence s of period N = 15 with cycle
s^15 = 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1.
The following shows that the sequence s satisfies Golomb’s randomness postulates.
R1: The number of 0’s in s^15 is 7, while the number of 1’s is 8.
R2: s^15 has 8 runs. There are 4 runs of length 1 (2 gaps and 2 blocks), 2 runs of length 2 (1 gap and 1 block), 1 run of length 3 (1 gap), and 1 run of length 4 (1 block).
R3: The autocorrelation function C(t) takes on two values: C(0) = 1 and C(t) = −1/15 for 1 ≤ t ≤ 14.
Hence, s is a pn-sequence. □

5.4.4 Five basic tests

Let s = s0, s1, s2, . . . , s_(n−1) be a binary sequence of length n. This subsection presents five statistical tests that are commonly used for determining whether the binary sequence s possesses some specific characteristics that a truly random sequence would be likely to exhibit. It is emphasized again that the outcome of each test is not definite, but rather probabilistic. If a sequence passes all five tests, there is no guarantee that it was indeed produced by a random bit generator (cf. Example 5.4).

(i) Frequency test (monobit test)
The purpose of this test is to determine whether the number of 0’s and 1’s in s are approximately the same, as would be expected for a random sequence. Let n0, n1 denote the number of 0’s and 1’s in s, respectively. The statistic used is
X1 = (n0 − n1)^2 / n     (5.1)
which approximately follows a χ2 distribution with 1 degree of freedom if n ≥ 10.⁷

(ii) Serial test (two-bit test)
The purpose of this test is to determine whether the number of occurrences of 00, 01, 10, and 11 as subsequences of s are approximately the same, as would be expected for a random sequence. Let n0, n1 denote the number of 0’s and 1’s in s, respectively, and let n00, n01, n10, n11 denote the number of occurrences of 00, 01, 10, 11 in s, respectively. Note that n00 + n01 + n10 + n11 = (n − 1) since the subsequences are allowed to overlap. The statistic used is
X2 = (4/(n−1))·(n00^2 + n01^2 + n10^2 + n11^2) − (2/n)·(n0^2 + n1^2) + 1     (5.2)
which approximately follows a χ2 distribution with 2 degrees of freedom if n ≥ 21.

⁷ In practice, it is recommended that the length n of the sample output sequence be much larger (for example, n ≫ 10000) than the minimum specified for each test in this subsection.

(iii) Poker test
Let m be a positive integer such that ⌊n/m⌋ ≥ 5·2^m, and let k = ⌊n/m⌋.

Divide the sequence s into k non-overlapping parts each of length m, and let ni be the number of occurrences of the ith type of sequence of length m, 1 ≤ i ≤ 2^m. The poker test determines whether the sequences of length m each appear approximately the same number of times in s, as would be expected for a random sequence. The statistic used is
X3 = (2^m/k)·(Σ_{i=1}^{2^m} ni^2) − k     (5.3)
which approximately follows a χ2 distribution with 2^m − 1 degrees of freedom. Note that the poker test is a generalization of the frequency test: setting m = 1 in the poker test yields the frequency test.

(iv) Runs test
The purpose of the runs test is to determine whether the number of runs (of either zeros or ones; see Definition 5.26) of various lengths in the sequence s is as expected for a random sequence. The expected number of gaps (or blocks) of length i in a random sequence of length n is ei = (n − i + 3)/2^(i+2). Let k be equal to the largest integer i for which ei ≥ 5. Let Bi, Gi be the number of blocks and gaps, respectively, of length i in s for each i, 1 ≤ i ≤ k.

The statistic used is
X4 = Σ_{i=1}^{k} (Bi − ei)^2/ei + Σ_{i=1}^{k} (Gi − ei)^2/ei     (5.4)
which approximately follows a χ2 distribution with 2k − 2 degrees of freedom.

(v) Autocorrelation test
The purpose of this test is to check for correlations between the sequence s and (non-cyclic) shifted versions of it. Let d be a fixed integer, 1 ≤ d ≤ ⌊n/2⌋. The number of bits in s not equal to their d-shifts is A(d) = Σ_{i=0}^{n−d−1} si ⊕ s_(i+d), where ⊕ denotes the XOR operator. The statistic used is
X5 = 2·(A(d) − (n − d)/2) / √(n − d)     (5.5)
which approximately follows an N(0, 1) distribution if n − d ≥ 10. Since small values of A(d) are as unexpected as large values of A(d), a two-sided test should be used.

5.31 Example (basic statistical tests) Consider the (non-random) sequence s of length n = 160 obtained by replicating the following sequence four times:
11100 01100 01000 10100 11101 11100 10010 01001.

(i) (frequency test) n0 = 84, n1 = 76, and the value of the statistic X1 is 0.4.
(ii) (serial test) n00 = 44, n01 = 40, n10 = 40, n11 = 35, and the value of the statistic X2 is 0.6252.
(iii) (poker test) Here m = 3 and k = 53. The blocks 000, 001, 010, 011, 100, 101, 110, 111 appear 5, 10, 6, 4, 12, 3, 6, and 7 times, respectively, and the value of the statistic X3 is 9.6415.
(iv) (runs test) Here e1 = 20.25, e2 = 10.0625, e3 = 5, and k = 3. There are 25, 4, 5 blocks of lengths 1, 2, 3, respectively, and 8, 20, 12 gaps of lengths 1, 2, 3, respectively. The value of the statistic X4 is 31.7913.
(v) (autocorrelation test) If d = 8, then A(8) = 100. The value of the statistic X5 is 3.8933.
For a significance level of α = 0.05, the threshold values for X1, X2, X3, X4, and X5 are 3.8415, 5.9915, 14.0671, 9.4877, and 1.96, respectively (see Tables 5.1 and 5.2). Hence, the given sequence s passes the frequency, serial, and poker tests, but fails the runs and autocorrelation tests. □
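Three of the statistics in Example 5.31 can be recomputed with a few lines of Python (a sketch; variable names are ours):

    from math import sqrt

    pattern = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "")
    s = pattern * 4                 # the n = 160 sequence of Example 5.31
    n = len(s)

    n0, n1 = s.count("0"), s.count("1")
    X1 = (n0 - n1) ** 2 / n                            # frequency test (5.1)

    pairs = [s[i:i+2] for i in range(n - 1)]           # overlapping 2-bit blocks
    c = {p: pairs.count(p) for p in ("00", "01", "10", "11")}
    X2 = (4 / (n - 1)) * sum(v * v for v in c.values()) \
         - (2 / n) * (n0**2 + n1**2) + 1               # serial test (5.2)

    d = 8
    A = sum(s[i] != s[i + d] for i in range(n - d))    # A(d) of the text
    X5 = 2 * (A - (n - d) / 2) / sqrt(n - d)           # autocorrelation (5.5)

    print(round(X1, 4), round(X2, 4), round(X5, 4))    # 0.4 0.6252 3.8933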

5.32 Note (FIPS 140-1 statistical tests for randomness) FIPS 140-1 specifies four statistical tests for randomness. Instead of making the user select appropriate significance levels for these tests, explicit bounds are provided that the computed value of a statistic must satisfy. A single bitstring s of length 20000 bits, output from a generator, is subjected to each of the following tests. If any of the tests fail, then the generator fails the test.
(i) monobit test. The number n1 of 1’s in s should satisfy 9654 < n1 < 10346.
(ii) poker test. The statistic X3 defined by equation (5.3) is computed for m = 4. The poker test is passed if 1.03 < X3 < 57.4.
(iii) runs test. The numbers Bi and Gi of blocks and gaps, respectively, of length i in s are counted for each i, 1 ≤ i ≤ 6. (For the purpose of this test, runs of length greater than 6 are considered to be of length 6.)

The runs test is passed if the 12 counts Bi, Gi, 1 ≤ i ≤ 6, are each within the corresponding interval specified by the following table.

Length of run | Required interval
1 | 2267–2733
2 | 1079–1421
3 | 502–748
4 | 223–402
5 | 90–223
6 | 90–223

(iv) long run test. The long run test is passed if there are no runs of length 34 or more.
For high security applications, FIPS 140-1 mandates that the four tests be performed each time the random bit generator is powered up. FIPS 140-1 allows these tests to be substituted by alternative tests which provide equivalent or superior randomness checking.
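Two of the four FIPS 140-1 checks are one-liners in Python. A sketch over a string s of '0'/'1' characters of length 20000 (function names are ours; bounds are those quoted above):

    def fips_monobit(s: str) -> bool:
        n1 = s.count("1")
        return 9654 < n1 < 10346

    def fips_long_run(s: str) -> bool:
        # a run of length >= 34 exists iff 34 equal symbols occur consecutively
        return "0" * 34 not in s and "1" * 34 not in s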

5.4.5 Maurer’s universal statistical test
The basic idea behind Maurer’s universal statistical test is that it should not be possible to significantly compress (without loss of information) the output sequence of a random bit generator. Thus, if a sample output sequence s of a bit generator can be significantly compressed, the generator should be rejected as being defective. Instead of actually compressing the sequence s, the universal statistical test computes a quantity that is related to the length of the compressed sequence.

The universality of Maurer’s universal statistical test arises because it is able to detect any one of a very general class of possible defects a bit generator might have. This class includes the five defects that are detectable by the basic tests of §5.4.4. A drawback of the universal statistical test over the five basic tests is that it requires a much longer sample output sequence in order to be effective. Provided that the required output sequence can be efficiently generated, this drawback is not a practical concern since the universal statistical test itself is very efficient.

Algorithm 5.33 computes the statistic Xu for a sample output sequence s = s0, s1, . . . , s_(n−1) to be used in the universal statistical test.

The parameter L is first chosen from the interval [6, 16]. The sequence s is then partitioned into non-overlapping L-bit blocks, with any leftover bits discarded; the total number of blocks is Q+K, where Q and K are defined below. For each i, 1 ≤ i ≤ Q+K, let bi be the integer whose binary representation is the ith block. The blocks are scanned in order.

A table T is maintained so that at each stage T[j] is the position of the last occurrence of the block corresponding to integer j, 0 ≤ j ≤ 2^L − 1. The first Q blocks of s are used to initialize table T; Q should be chosen to be at least 10·2^L in order to have a high likelihood that each of the 2^L L-bit blocks occurs at least once in the first Q blocks. The remaining K blocks are used to define the statistic Xu as follows. For each i, Q + 1 ≤ i ≤ Q + K, let Ai = i − T[bi]; Ai is the number of positions since the last occurrence of block bi. Then
Xu = (1/K) Σ_{i=Q+1}^{Q+K} lg Ai.     (5.6)
K should be at least 1000·2^L (and, hence, the sample sequence s should be at least (1010·2^L·L) bits in length). Table 5.3 lists the mean µ and variance σ^2 of Xu for random sequences for some sample choices of L as Q → ∞.

L | µ         | σ1^2        L  | µ          | σ1^2
1 | 0.7326495 | 0.690       9  | 8.1764248  | 3.311
2 | 1.5374383 | 1.338       10 | 9.1723243  | 3.356
3 | 2.4016068 | 1.901       11 | 10.170032  | 3.384
4 | 3.3112247 | 2.358       12 | 11.168765  | 3.401
5 | 4.2534266 | 2.705       13 | 12.168070  | 3.410
6 | 5.2177052 | 2.954       14 | 13.167693  | 3.416
7 | 6.1962507 | 3.125       15 | 14.167488  | 3.419
8 | 7.1836656 | 3.238       16 | 15.167379  | 3.421

Table 5.3: Mean µ and variance σ^2 of the statistic Xu for random sequences, with parameters L, K as Q → ∞. The variance of Xu is σ^2 = c(L, K)^2 · σ1^2/K, where c(L, K) ≈ 0.7 − (0.8/L) + (1.6 + (12.8/L))·K^(−4/L) for K ≥ 2^L.

5.33 Algorithm Computing the statistic Xu for Maurer’s universal statistical test
INPUT: a binary sequence s = s0, s1, . . . , s_(n−1) of length n, and parameters L, Q, K.
OUTPUT: the value of the statistic Xu for the sequence s.

1. Zero the table T. For j from 0 to 2^L − 1 do the following: T[j] ← 0.
2. Initialize the table T. For i from 1 to Q do the following: T[bi] ← i.
3. sum ← 0.
4. For i from Q + 1 to Q + K do the following:
4.1 sum ← sum + lg(i − T[bi]).
4.2 T[bi] ← i.
5. Xu ← sum/K.
6. Return(Xu).

Maurer’s universal statistical test uses the computed value of Xu for the sample output sequence s in the manner prescribed by Fact 5.34. To test the sequence s, a two-sided test should be used with a significance level α between 0.001 and 0.01 (see §5.4.2).

5.34 Fact Let Xu be the statistic defined in (5.6) having mean µ and variance σ^2 as given in Table 5.3. Then, for random sequences, the statistic Zu = (Xu − µ)/σ approximately follows an N(0, 1) distribution.
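A direct Python transcription of Algorithm 5.33 (a sketch; bits is a list of 0/1 values assumed to supply at least (Q+K)·L bits):

    from math import log2

    def maurer_Xu(bits, L, Q, K):
        T = [0] * (2 ** L)                                 # step 1
        # b_1, ..., b_{Q+K}: the non-overlapping L-bit blocks as integers
        blocks = [int("".join(map(str, bits[i*L:(i+1)*L])), 2)
                  for i in range(Q + K)]
        for i in range(1, Q + 1):                          # step 2
            T[blocks[i - 1]] = i
        total = 0.0                                        # step 3
        for i in range(Q + 1, Q + K + 1):                  # step 4
            b = blocks[i - 1]
            total += log2(i - T[b])                        # step 4.1
            T[b] = i                                       # step 4.2
        return total / K                                   # steps 5 and 6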

5.5 Cryptographically secure pseudorandom bit generation
Two cryptographically secure pseudorandom bit generators (CSPRBG – see Definition 5.8) are presented in this section. The security of each generator relies on the presumed intractability of an underlying number-theoretic problem. The modular multiplications that these generators use make them relatively slow compared to the (ad-hoc) pseudorandom bit generators of §5.3. Nevertheless they may be useful in some circumstances, for example, generating pseudorandom bits on hardware devices which already have the circuitry for performing modular multiplications. Efficient techniques for implementing modular multiplication are presented in §14.3.

5.5.1 RSA pseudorandom bit generator
The RSA pseudorandom bit generator is a CSPRBG under the assumption that the RSA problem is intractable (§3.3; see also §3.9.2).

5.35 Algorithm RSA pseudorandom bit generator
SUMMARY: a pseudorandom bit sequence z1, z2, . . . , zl of length l is generated.

1. Setup. Generate two secret RSA-like primes p and q (cf. Note 8.8), and compute n = pq and φ = (p − 1)(q − 1). Select a random integer e, 1 < e < φ, such that gcd(e, φ) = 1.
2. Select a random integer x0 (the seed) in the interval [1, n − 1].
3. For i from 1 to l do the following:
3.1 xi ← x_(i−1)^e mod n.
3.2 zi ← the least significant bit of xi.
4. The output sequence is z1, z2, . . . , zl.

5.36 Note (efficiency of the RSA PRBG) If e = 3 is chosen (cf. Note 8.9(ii)), then generating each pseudorandom bit zi requires one modular multiplication and one modular squaring. The efficiency of the generator can be improved by extracting the j least significant bits of xi in step 3.2, where j = c lg lg n and c is a constant. Provided that n is sufficiently large, this modified generator is also cryptographically secure (cf. Fact 3.87). For a modulus n of a fixed bitlength (e.g., 1024 bits), an explicit range of values of c for which the resulting generator remains cryptographically secure (cf. Remark 5.9) under the intractability assumption of the RSA problem has not been determined.
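A toy-parameter Python sketch of Algorithm 5.35 (the primes below are absurdly small and chosen only for illustration; a real instance requires RSA-size primes):

    from math import gcd
    from random import randrange

    p, q = 2003, 2087                  # toy primes; far too small for security
    n, phi = p * q, (p - 1) * (q - 1)
    e = 3
    assert gcd(e, phi) == 1            # setup condition from step 1

    x = randrange(1, n)                # step 2: the secret seed x0
    bits = []
    for _ in range(16):                # step 3
        x = pow(x, e, n)               # step 3.1
        bits.append(x & 1)             # step 3.2: least significant bit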

The following modification improves the efficiency of the RSA PRBG.

5.37 Algorithm Micali-Schnorr pseudorandom bit generator
SUMMARY: a pseudorandom bit sequence is generated.
1. Setup. Generate two secret RSA-like primes p and q (cf. Note 8.8), and compute n = pq and φ = (p − 1)(q − 1). Let N = ⌊lg n⌋ + 1 (the bitlength of n). Select an integer e, 1 < e < φ, such that gcd(e, φ) = 1 and 80e ≤ N. Let k = ⌊N(1 − 2/e)⌋ and r = N − k.
2. Select a random sequence x0 (the seed) of bitlength r.
3. Generate a pseudorandom sequence of length k·l. For i from 1 to l do the following:
3.1 yi ← x_(i−1)^e mod n.
3.2 xi ← the r most significant bits of yi.
3.3 zi ← the k least significant bits of yi.
4. The output sequence is z1 ∥ z2 ∥ · · · ∥ zl, where ∥ denotes concatenation.

5.38 Note (efficiency of the Micali-Schnorr PRBG) Algorithm 5.37 is more efficient than the RSA PRBG since ⌊N(1 − 2/e)⌋ bits are

generated per exponentiation by e. For example, if e = 3 and N = 1024, then k = 341 bits are generated per exponentiation. Moreover, each exponentiation requires only one modular squaring of an r = 683-bit number, and one modular multiplication.

5.39 Note (security of the Micali-Schnorr PRBG) Algorithm 5.37 is cryptographically secure under the assumption that the following is true: the distribution x^e mod n for random r-bit sequences x is indistinguishable by all polynomial-time statistical tests from the uniform distribution of integers in the interval [0, n − 1]. This assumption is stronger than requiring that the RSA problem be intractable.

5.5.2 Blum-Blum-Shub pseudorandom bit generator
The Blum-Blum-Shub pseudorandom bit generator (also known as the x^2 mod n generator or the BBS generator) is a CSPRBG under the assumption that integer factorization is intractable (§3.2). It forms the basis for the Blum-Goldwasser probabilistic public-key encryption scheme (Algorithm 8.56).

5.40 Algorithm Blum-Blum-Shub pseudorandom bit generator
SUMMARY: a pseudorandom bit sequence z1, z2, . . . , zl of length l is generated.
1. Setup. Generate two large secret random (and distinct) primes p and q (cf. Note 8.8), each congruent to 3 modulo 4, and compute n = pq.
2. Select a random integer s (the seed) in the interval [1, n − 1] such that gcd(s, n) = 1, and compute x0 ← s^2 mod n.
3. For i from 1 to l do the following:
3.1 xi ← x_(i−1)^2 mod n.
3.2 zi ← the least significant bit of xi.
4. The output sequence is z1, z2, . . . , zl.
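A toy-parameter Python sketch of Algorithm 5.40 (both primes are congruent to 3 modulo 4, but are far too small for real use, which requires large primes per the setup step):

    from math import gcd
    from random import randrange

    p, q = 2063, 2099           # toy primes, each ≡ 3 (mod 4)
    n = p * q

    while True:                 # step 2: choose a seed coprime to n
        s = randrange(1, n)
        if gcd(s, n) == 1:
            break
    x = pow(s, 2, n)            # x0 = s^2 mod n

    bits = []
    for _ in range(16):         # step 3
        x = pow(x, 2, n)        # step 3.1: one modular squaring per bit
        bits.append(x & 1)      # step 3.2: least significant bit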

5.41 Note (efficiency of the Blum-Blum-Shub PRBG) Generating each pseudorandom bit zi requires one modular squaring. The efficiency of the generator can be improved by extracting the j least significant bits of xi in step 3.2, where j = c lg lg n and c is a constant. Provided that n is sufficiently large, this modified generator is also cryptographically secure. For a modulus n of a fixed bitlength (e.g., 1024 bits), an explicit range of values of c for which the resulting generator is cryptographically secure (cf. Remark 5.9) under the intractability assumption of the integer factorization problem has not been determined.

5.6 Notes and further references
§5.1
Chapter 3 of Knuth [692] is the definitive reference for the classic (non-cryptographic) generation of pseudorandom numbers. Knuth [692, pp. 142–166] contains an extensive discussion of what it means for a sequence to be random. Lagarias [724] gives a survey of theoretical results on pseudorandom number generators. Luby [774] provides a comprehensive and rigorous overview of pseudorandom generators.

For a study of linear congruential generators (Example 5.4), see Knuth [692, pp. 9–25]. Plumstead/Boyar [979, 980] showed how to predict the output of a linear congruential generator given only a few elements of the output sequence, and when the parameters a, b, and m of the generator are unknown.

Boyar [180] extended her method and showed that linear multivariate congruential generators (having recurrence equation xn = a1·x_(n−1) + a2·x_(n−2) + · · · + al·x_(n−l) + b mod m), and quadratic congruential generators (having recurrence equation xn = a·x_(n−1)^2 + b·x_(n−1) + c mod m) are cryptographically insecure. Finally, Krawczyk [713] generalized these results and showed how the output of any multivariate polynomial congruential generator can be efficiently predicted. A truncated linear congruential generator is one where a fraction of the least significant bits of the xi are discarded. Frieze et al. [427] showed that these generators can be efficiently predicted if the generator parameters a, b, and m are known. Stern [1173] extended this method to the case where only m is known. Boyar [179] presented an efficient algorithm for predicting linear congruential generators when O(log log m) bits are discarded, and when the parameters a, b, and m are unknown.

No efficient prediction algorithms are known for truncated multivariate polynomial congruential generators. For a summary of cryptanalytic attacks on congruential generators, see Brickell and Odlyzko [209, pp. 523–526].

For a formal definition of a statistical test (Definition 5.5), see Yao [1258]. Fact 5.7 on the universality of the next-bit test is due to Yao [1258]. For a proof of Yao’s result, see Kranakis [710] and §12.2 of Stinson [1178]. A proof of a generalization of Yao’s result is given by Goldreich, Goldwasser, and Micali [468]. The notion of a cryptographically secure pseudorandom bit generator (Definition 5.8) was introduced by Blum and Micali [166]. Blum and Micali also gave a formal description of the next-bit test (Definition 5.6), and presented the first cryptographically secure pseudorandom bit generator whose security is based on the discrete logarithm problem (see page 189).

Universal tests were presented by Schrift and Shamir [1103] for verifying the assumed properties of a pseudorandom generator whose output sequences are not necessarily uniformly distributed.

The first provably secure pseudorandom number generator was proposed by Shamir [1112]. Shamir proved that predicting the next number of an output sequence of this generator is equivalent to inverting the RSA function. However, even though the numbers as a whole may be unpredictable, certain parts of the number (for example, its least significant bit) may be biased or predictable. Hence, Shamir’s generator is not cryptographically secure in the sense of Definition 5.8.

§5.2
Agnew [17] proposed a VLSI implementation of a random bit generator consisting of two identical metal insulator semiconductor capacitors close to each other. The cells are charged over the same period of time, and then a 1 or 0 is assigned depending on which cell has a greater charge.

Fairfield, Mortenson, and Coulthart [382] described an LSI random bit generator based on the frequency instability of a free running oscillator. Davis, Ihaka, and Fenstermacher [309] used the unpredictability of air turbulence occurring in a sealed disk drive as a random bit generator. The bits are extracted by measuring the variations in the time to access disk blocks. Fast Fourier Transform (FFT) techniques are then used to remove possible biases and correlations. A sample implementation generated 100 random bits per minute. For further guidance on hardware and software-based techniques for generating random bits, see RFC 1750 [1043].

The de-skewing technique of Example 5.10 is due to von Neumann [1223]. Elias [370] generalized von Neumann’s technique to a more efficient scheme (one where fewer bits are discarded). Fast Fourier Transform techniques for removing biases and correlations are described by Brillinger [213].

§5.3
The idea of using a one-way function f for generating pseudorandom bit sequences is due to Shamir [1112]. Shamir illustrated why it is difficult to prove that such ad hoc generators are cryptographically secure without imposing some further assumptions on f. Algorithm 5.11 is from Appendix C of the ANSI X9.17 standard [37]; it is one of the approved methods for pseudorandom bit generation listed in FIPS 186 [406]. Meyer and Matyas [859, pp. 316–317] describe another DES-based pseudorandom bit generator whose output is intended for use as data-encrypting keys. The four algorithms of §5.3.2 for generating DSA parameters are from FIPS 186.
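The following sketch mirrors only the outline of the X9.17-style data flow: a block cipher E under a reserved key combines an encrypted timestamp with a secret seed to produce output blocks, and the seed is updated after each block. The hash-based E below is a stand-in for DES, not the standard's cipher, and all helper names are ours:

    import hashlib, os, time

    def E(key, block):
        # toy 8-byte keyed "block cipher": NOT DES, just a stand-in
        return hashlib.sha256(key + block).digest()[:8]

    def x917_style_blocks(K, seed, n):
        out = []
        for _ in range(n):
            I = E(K, time.time_ns().to_bytes(8, 'big'))        # encrypted timestamp
            X = E(K, bytes(a ^ b for a, b in zip(I, seed)))    # output block
            seed = E(K, bytes(a ^ b for a, b in zip(X, I)))    # new seed
            out.append(X)
        return out

    print(x917_style_blocks(os.urandom(16), os.urandom(8), 2))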

§5.4
Standard references on statistics include Hogg and Tanis [559] and Wackerly, Mendenhall, and Scheaffer [1226]. Tables 5.1 and 5.2 were generated using the Maple symbolic algebra system [240]. Golomb's randomness postulates (§5.4.3) were proposed by Golomb [498]. The five statistical tests for local randomness outlined in §5.4.4 are from Beker and Piper [84]. The serial test (§5.4.4(ii)) is due to Good [508]. It was generalized to subsequences of length greater than 2 by Marsaglia [782], who called it the overlapping m-tuple test, and later by Kimberley [674], who called it the generalized serial test. The underlying distribution theories of the serial test and the runs test (§5.4.4(iv)) were analyzed by Good [507] and Mood [897], respectively. Gustafson [531] considered alternative statistics for the runs test and the autocorrelation test (§5.4.4(v)). There are numerous other statistical tests of local randomness. Many of these tests, including the gap test, coupon collector's test, permutation test, run test, maximum-of-t test, collision test, serial test, correlation test, and spectral test, are described by Knuth [692]. The poker test as formulated by Knuth [692, p. 62] is quite different from that of §5.4.4(iii). In the former, a sample sequence is divided into m-bit blocks, each of which is further subdivided into l-bit sub-blocks (for some divisor l of m). The number of m-bit blocks having r distinct l-bit sub-blocks (1 ≤ r ≤ m/l) is counted and compared to the corresponding expected numbers for random sequences.

Erdmann [372] gives a detailed exposition of many of these tests, and applies them to sample output sequences of six pseudorandom bit generators. Gustafson et al. [533] describe a computer package which implements various statistical tests for assessing the strength of a pseudorandom bit generator. Gustafson, Dawson, and Golić [532] proposed a new repetition test which measures the number of repetitions of l-bit blocks. The test requires a count of the number of patterns repeated, but does not require the frequency of each pattern. For this reason, it is feasible to apply this test for larger values of l (e.g., l = 64) than would be permissible by the poker test or Maurer's universal statistical test (Algorithm 5.33).

Two spectral tests have been developed, one based on the discrete Fourier transform by Gait [437], and one based on the Walsh transform by Yuen [1260]. For extensions of these spectral tests, see Erdmann [372] and Feldman [389]. FIPS 140-1 [401] specifies security requirements for the design and implementation of cryptographic modules, including random and pseudorandom bit generators, for protecting (U.S. government) unclassified information. The universal statistical test (Algorithm 5.33) is due to Maurer [813] and was motivated by source coding algorithms of Elias [371] and Willems [1245]. The class of defects that the test is able to detect consists of those that can be modeled by an ergodic stationary source with limited memory; Maurer argues that this class includes the possible defects that could occur in a practical implementation of a random bit generator. Table 5.3 is due to Maurer [813], who provides derivations of formulae for the mean and variance of the statistic Xu.

§5.5
Blum and Micali [166] presented the following general construction for CSPRBGs. Let D be a finite set, and let f : D → D be a permutation that can be efficiently computed. Let B : D → {0, 1} be a Boolean predicate with the property that B(x) is hard to compute given only x ∈ D; however, B(x) can be efficiently computed given y = f⁻¹(x). The output sequence z1, z2, . . . , zl corresponding to a seed x0 ∈ D is obtained by computing xi = f(xi−1), zi = B(xi), for 1 ≤ i ≤ l. This generator can be shown to pass the next-bit test (Definition 5.6). Blum and Micali [166] proposed the first concrete instance of a CSPRBG, called the Blum-Micali generator. Using the notation introduced above, their method can be described as follows. Let p be a large prime, and α a generator of Z*p. Define D = Z*p = {1, 2, . . . , p − 1}. The function f : D → D is defined by f(x) = α^x mod p. The function B : D → {0, 1} is defined by B(x) = 1 if 0 ≤ logα x ≤ (p − 1)/2, and B(x) = 0 if logα x > (p − 1)/2. Assuming the intractability of the discrete logarithm problem in Z*p (§3.6; see also §3.9.1), the Blum-Micali generator was proven to satisfy the next-bit test.
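A minimal Python sketch of the Blum-Micali generator with toy parameters follows. Since logα xi = xi−1, the predicate B(xi) can be evaluated from the previous state without computing any discrete logarithm; p = 2579 and α = 2 are illustrative values far too small for security:

    def blum_micali(p, alpha, x0, l):
        bits, x = [], x0
        for _ in range(l):
            x_next = pow(alpha, x, p)            # x_i = alpha^(x_{i-1}) mod p
            # B(x_i) = 1 iff log_alpha(x_i) = x_{i-1} lies in [0, (p-1)/2]
            bits.append(1 if x <= (p - 1) // 2 else 0)
            x = x_next
        return bits

    # p = 2579 is prime and alpha = 2 generates Z*_2579 (toy values)
    print(blum_micali(2579, 2, 1234, 16))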

Long and Wigderson [772] improved the efficiency of the Blum-Micali generator by simultaneously extracting O(lg lg p) bits (cf. §3.9.1) from each xi. Kaliski [650, 651] modified the Blum-Micali generator so that the security depends on the discrete logarithm problem in the group of points on an elliptic curve defined over a finite field. The RSA pseudorandom bit generator (Algorithm 5.35) and the improvement mentioned in Note 5.36 are due to Alexi et al. [23]. The Micali-Schnorr improvement of the RSA PRBG (Algorithm 5.37) is due to Micali and Schnorr [867], who also described a method that transforms any CSPRBG into one that can be accelerated by parallel evaluation. The method of parallelization is perfect: m parallel processors speed the generation of pseudorandom bits by a factor of m.

Algorithm 5.40 is due to Blum, Blum, and Shub [160], who showed that their pseudorandom bit generator is cryptographically secure assuming the intractability of the quadratic residuosity problem (§3.4). Vazirani and Vazirani [1218] established a stronger result regarding the security of this generator by proving it cryptographically secure under the weaker assumption that integer factorization is intractable. The improvement mentioned in Note 5.41 is due to Vazirani and Vazirani. Alexi et al. [23] proved analogous results for the modified-Rabin generator, which differs as follows from the Blum-Blum-Shub generator: in step 3.1 of Algorithm 5.40, let x = (xi−1)² mod n; if x < n/2, then xi = x; otherwise, xi = n − x.
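A minimal Python sketch of the Blum-Blum-Shub generator (Algorithm 5.40) with toy parameters (p and q are primes congruent to 3 mod 4; real use requires a modulus n of cryptographic size):

    def bbs(n, seed, l):
        x = pow(seed, 2, n)            # x_0 = s^2 mod n
        bits = []
        for _ in range(l):
            x = pow(x, 2, n)           # x_i = (x_{i-1})^2 mod n
            bits.append(x & 1)         # output the least significant bit of x_i
        return bits

    # p = 499 and q = 547 are primes congruent to 3 mod 4 (toy values)
    print(bbs(499 * 547, 159201, 20))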

Impagliazzo and Naor [569] devised efficient constructions for a CSPRBG and for a universal one-way hash function which are provably as secure as the subset sum problem. Fischer and Stern [411] presented a simple and efficient CSPRBG which is provably as secure as the syndrome decoding problem. Yao [1258] showed how to obtain a CSPRBG using any one-way permutation. Levin [761] generalized this result and showed how to obtain a CSPRBG using any one-way function. For further refinements, see Goldreich, Krawczyk, and Luby [470], Impagliazzo, Levin, and Luby [568], and Håstad [545]. A random function f : {0, 1}^n → {0, 1}^n is a function which assigns independent and random values f(x) ∈ {0, 1}^n to all arguments x ∈ {0, 1}^n. Goldreich, Goldwasser, and Micali [468] introduced a computational complexity measure of the randomness of functions. They defined a function to be poly-random if no polynomial-time algorithm can distinguish between values of the function and true random strings, even when the algorithm is permitted to select the arguments to the function.

Goldreich, Goldwasser, and Micali presented an algorithm for constructing poly-random functions assuming the existence of one-way functions. This theory was applied by Goldreich, Goldwasser, and Micali [467] to develop provably secure protocols for the (essentially) storageless distribution of secret identification numbers, message authentication with timestamping, dynamic hashing, and identify friend or foe systems. Luby and Rackoff [776] showed how poly-random permutations can be efficiently constructed from poly-random functions. This result was used, together with some of the design principles of DES, to show how any CSPRBG can be used to construct a symmetric-key block cipher which is provably secure against chosen-plaintext attack. A simplified and generalized treatment of Luby and Rackoff's construction was given by Maurer [816].
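The Feistel idea underlying Luby and Rackoff's construction can be sketched as follows; the keyed hash round function below is a stand-in for a poly-random function, and three rounds with independent round keys give the structure of their pseudorandom permutation. This illustrates the data flow only and is not a secure cipher:

    import hashlib

    def feistel_encrypt(keys, left, right):
        # keys: one per round; left/right: equal-length byte strings
        for k in keys:
            f = hashlib.sha256(k + right).digest()[:len(left)]
            left, right = right, bytes(a ^ b for a, b in zip(left, f))
        return left, right

    L, R = feistel_encrypt([b'k1', b'k2', b'k3'], b'abcd', b'efgh')
    print(L, R)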

Schnorr [1096] used Luby and Rackoff's poly-random permutation generator to construct a pseudorandom bit generator that was claimed to pass all statistical tests depending only on a small fraction of the output sequence, even when infinite computational resources are available. Rueppel [1079] showed that this claim is erroneous, and demonstrated that the generator can be distinguished from a truly random bit generator using only a small number of output bits. Maurer and Massey [821] extended Schnorr's work, and proved the existence of pseudorandom bit generators that pass all statistical tests depending only on a small fraction of the output sequence, even when infinite computational resources are available. The security of the generators does not rely on any unproved hypothesis, but rather on the assumption that the adversary can access only a limited number of bits of the generated sequence. This work is primarily of theoretical interest since no such polynomial-time generators are known.

Chapter 6
Stream Ciphers

Contents in Brief

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.2 Feedback shift registers . . . . . . . . . . . . . . . . . . . 195
6.3 Stream ciphers based on LFSRs . . . . . . . . . . . . . . . . 203
6.4 Other stream ciphers . . . . . . . . . . . . . . . . . . . . . 212
6.5 Notes and further references . . . . . . . . . . . . . . . . . 216

6.1 Introduction

Stream ciphers are an important class of encryption algorithms. They encrypt individual characters (usually binary digits) of a plaintext message one at a time, using an encryption transformation which varies with time. By contrast, block ciphers (Chapter 7) tend to simultaneously encrypt groups of characters of a plaintext message using a fixed encryption transformation. Stream ciphers are generally faster than block ciphers in hardware, and have less complex hardware circuitry. They are also more appropriate, and in some cases mandatory (e.g., in some telecommunications applications), when buffering is limited or when characters must be individually processed as they are received.

Because they have limited or no error propagation, stream ciphers may also be advantageous in situations where transmission errors are highly probable.
There is a vast body of theoretical knowledge on stream ciphers, and various design principles for stream ciphers have been proposed and extensively analyzed. However, there are relatively few fully-specified stream cipher algorithms in the open literature. This unfortunate state of affairs can partially be explained by the fact that most stream ciphers used in practice tend to be proprietary and confidential. By contrast, numerous concrete block cipher proposals have been published, some of which have been standardized or placed in the public domain. Nevertheless, because of their significant advantages, stream ciphers are widely used today, and one can expect increasingly more concrete proposals in the coming years.

Chapter outline

The remainder of §6.1 introduces basic concepts relevant to stream ciphers.

Feedback shift registers, in particular linear feedback shift registers (LFSRs), are the basic building block in most stream ciphers that have been proposed; they are studied in §6.2. Three general techniques for utilizing LFSRs in the construction of stream ciphers are presented in §6.3: using a nonlinear combining function on the outputs of several LFSRs (§6.3.1), using a nonlinear filtering function on the contents of a single LFSR (§6.3.2), and using the output of one (or more) LFSRs to control the clock of one (or more) other LFSRs (§6.3.3). Two concrete proposals for clock-controlled generators, the alternating step generator and the shrinking generator, are presented in §6.3.3. §6.4 presents a stream cipher not based on LFSRs, namely SEAL. §6.5 concludes with references and further chapter notes.

6.1.1 Classification

Stream ciphers can be either symmetric-key or public-key. The focus of this chapter is symmetric-key stream ciphers; the Blum-Goldwasser probabilistic public-key encryption scheme (§8.7.2) is an example of a public-key stream cipher.

6.1 Note (block vs. stream ciphers) Block ciphers process plaintext in relatively large blocks (e.g., n ≥ 64 bits). The same function is used to encrypt successive blocks; thus (pure) block ciphers are memoryless. In contrast, stream ciphers process plaintext in blocks as small as a single bit, and the encryption function may vary as plaintext is processed; thus stream ciphers are said to have memory. They are sometimes called state ciphers since encryption depends not only on the key and plaintext, but also on the current state. This distinction between block and stream ciphers is not definitive (see Remark 7.25); adding a small amount of memory to a block cipher (as in the CBC mode) results in a stream cipher with large blocks.

(i) The one-time pad

Recall (Definition 1.39) that a Vernam cipher over the binary alphabet is defined by

    ci = mi ⊕ ki   for i = 1, 2, 3, . . . ,

where m1, m2, m3, . . . are the plaintext digits, k1, k2, k3, . . . (the keystream) are the key digits, c1, c2, c3, . . . are the ciphertext digits, and ⊕ is the XOR function (bitwise addition modulo 2). Decryption is defined by mi = ci ⊕ ki.
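A minimal Python sketch of the Vernam cipher on bytes; applying the same XOR with the same keystream decrypts:

    import os

    def vernam(data, keystream):
        return bytes(m ^ k for m, k in zip(data, keystream))

    key = os.urandom(5)            # truly random, used once: a one-time pad
    c = vernam(b'hello', key)      # encrypt
    print(vernam(c, key))          # decrypt -> b'hello'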

If the keystream digits are generated independently and randomly, the Vernam cipher is called a one-time pad, and is unconditionally secure (§1.13.3(i)) against a ciphertext-only attack. More precisely, if M, C, and K are random variables respectively denoting the plaintext, ciphertext, and secret key, and if H(·) denotes the entropy function (Definition 2.39), then H(M|C) = H(M). Equivalently, I(M; C) = 0 (see Definition 2.45): the ciphertext contributes no information about the plaintext. Shannon proved that a necessary condition for a symmetric-key encryption scheme to be unconditionally secure is that H(K) ≥ H(M). That is, the uncertainty of the secret key must be at least as great as the uncertainty of the plaintext.

If the key has bitlength k, and the key bits are chosen randomly and independently, then H(K) = k, and Shannon's necessary condition for unconditional security becomes k ≥ H(M). The one-time pad is unconditionally secure regardless of the statistical distribution of the plaintext, and is optimal in the sense that its key is the smallest possible among all symmetric-key encryption schemes having this property.
An obvious drawback of the one-time pad is that the key should be as long as the plaintext, which increases the difficulty of key distribution and key management. This motivates the design of stream ciphers where the keystream is pseudorandomly generated from a smaller secret key, with the intent that the keystream appears random to a computationally bounded adversary. Such stream ciphers do not offer unconditional security (since H(K) ≪ H(M)), but the hope is that they are computationally secure (§1.13.3(iv)).

Stream ciphers are commonly classified as being synchronous or self-synchronizing.

(ii) Synchronous stream ciphers

6.2 Definition A synchronous stream cipher is one in which the keystream is generated independently of the plaintext message and of the ciphertext.

The encryption process of a synchronous stream cipher can be described by the equations

    σi+1 = f(σi, k),
    zi   = g(σi, k),
    ci   = h(zi, mi),

where σ0 is the initial state and may be determined from the key k, f is the next-state function, g is the function which produces the keystream zi, and h is the output function which combines the keystream and plaintext mi to produce ciphertext ci. The encryption and decryption processes are depicted in Figure 6.1. The OFB mode of a block cipher (see §7.2.2(iv)) is an example of a synchronous stream cipher.
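The equations above can be made concrete with a toy Python sketch; the particular f, g, and h below are arbitrary placeholders (a linear congruential state update and an XOR output function) with no cryptographic strength, chosen only to show the data flow:

    def sync_encrypt(k, sigma0, plaintext_bits):
        sigma, cipher = sigma0, []
        for m in plaintext_bits:
            z = (sigma ^ k) & 1                          # z_i = g(sigma_i, k)
            cipher.append(z ^ m)                         # c_i = h(z_i, m_i)
            sigma = (1103515245 * sigma + k) % 2**31     # sigma_{i+1} = f(sigma_i, k)
        return cipher

    c = sync_encrypt(12345, 42, [1, 0, 1, 1, 0])
    print(c, sync_encrypt(12345, 42, c))   # decryption is the same operation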

Figure 6.1: General model of a synchronous stream cipher.

6.3 Note (properties of synchronous stream ciphers)
(i) synchronization requirements. In a synchronous stream cipher, both the sender and receiver must be synchronized – using the same key and operating at the same position (state) within that key – to allow for proper decryption. If synchronization is lost due to ciphertext digits being inserted or deleted during transmission, then decryption fails and can only be restored through additional techniques for re-synchronization. Techniques for re-synchronization include re-initialization, placing special markers at regular intervals in the ciphertext, or, if the plaintext contains enough redundancy, trying all possible keystream offsets.
(ii) no error propagation. A ciphertext digit that is modified (but not deleted) during transmission does not affect the decryption of other ciphertext digits.
(iii) active attacks. As a consequence of property (i), the insertion, deletion, or replay of ciphertext digits by an active adversary causes immediate loss of synchronization, and hence might possibly be detected by the decryptor. As a consequence of property (ii), an active adversary might possibly be able to make changes to selected ciphertext digits, and know exactly what effect these changes have on the plaintext. This illustrates that additional mechanisms must be employed in order to provide data origin authentication and data integrity guarantees (see §9.5.4).

Most of the stream ciphers that have been proposed to date in the literature are additive stream ciphers, which are defined below.

6.4 Definition A binary additive stream cipher is a synchronous stream cipher in which the keystream, plaintext, and ciphertext digits are binary digits, and the output function h is the XOR function.

Binary additive stream ciphers are depicted in Figure 6.2. Referring to Figure 6.2, the keystream generator is composed of the next-state function f and the function g (see Figure 6.1), and is also known as the running key generator.

Figure 6.2: General model of a binary additive stream cipher.

(iii) Self-synchronizing stream ciphers

6.5 Definition A self-synchronizing or asynchronous stream cipher is one in which the keystream is generated as a function of the key and a fixed number of previous ciphertext digits.

The encryption function of a self-synchronizing stream cipher can be described by the equations

    σi = (ci−t, ci−t+1, . . . , ci−1),
    zi = g(σi, k),
    ci = h(zi, mi),

where σ0 = (c−t, c−t+1, . . . , c−1) is the (non-secret) initial state, k is the key, g is the function which produces the keystream zi, and h is the output function which combines the keystream and plaintext mi to produce ciphertext ci.

The encryption and decryption processes are depicted in Figure 6.3. The most common presently-used self-synchronizing stream ciphers are based on block ciphers in 1-bit cipher feedback mode (see §7.2.2(iii)).

Figure 6.3: General model of a self-synchronizing stream cipher.

6.6 Note (properties of self-synchronizing stream ciphers)
(i) self-synchronization. Self-synchronization is possible if ciphertext digits are deleted or inserted, because the decryption mapping depends only on a fixed number of preceding ciphertext characters. Such ciphers are capable of re-establishing proper decryption automatically after loss of synchronization, with only a fixed number of plaintext characters unrecoverable.

(ii) limited error propagation. Suppose that the state of a self-synchronizing stream cipher depends on t previous ciphertext digits. If a single ciphertext digit is modified (or even deleted or inserted) during transmission, then decryption of up to t subsequent ciphertext digits may be incorrect, after which correct decryption resumes.
(iii) active attacks. Property (ii) implies that any modification of ciphertext digits by an active adversary causes several other ciphertext digits to be decrypted incorrectly, thereby improving (compared to synchronous stream ciphers) the likelihood of being detected by the decryptor. As a consequence of property (i), it is more difficult (than for synchronous stream ciphers) to detect insertion, deletion, or replay of ciphertext digits by an active adversary. This illustrates that additional mechanisms must be employed in order to provide data origin authentication and data integrity guarantees (see §9.5.4).
(iv) diffusion of plaintext statistics. Since each plaintext digit influences the entire following ciphertext, the statistical properties of the plaintext are dispersed through the ciphertext. Hence, self-synchronizing stream ciphers may be more resistant than synchronous stream ciphers against attacks based on plaintext redundancy.

6.2 Feedback shift registers

Feedback shift registers, in particular linear feedback shift registers, are the basic components of many keystream generators. §6.2.1 introduces linear feedback shift registers. The linear complexity of binary sequences is studied in §6.2.2, while the Berlekamp-Massey algorithm for computing it is presented in §6.2.3. Finally, nonlinear feedback shift registers are discussed in §6.2.4.

6.2.1 Linear feedback shift registers

Linear feedback shift registers (LFSRs) are used in many of the keystream generators that have been proposed in the literature. There are several reasons for this:
1. LFSRs are well-suited to hardware implementation;
2. they can produce sequences of large period (Fact 6.12);
3. they can produce sequences with good statistical properties (Fact 6.14); and
4. because of their structure, they can be readily analyzed using algebraic techniques.

6.7 Definition A linear feedback shift register (LFSR) of length L consists of L stages (or delay elements) numbered 0, 1, . . . , L − 1, each capable of storing one bit and having one input and one output; and a clock which controls the movement of data. During each unit of time the following operations are performed:
(i) the content of stage 0 is output and forms part of the output sequence;
(ii) the content of stage i is moved to stage i − 1 for each i, 1 ≤ i ≤ L − 1; and
(iii) the new content of stage L − 1 is the feedback bit sj which is calculated by adding together modulo 2 the previous contents of a fixed subset of stages 0, 1, . . . , L − 1.

Figure 6.4 depicts an LFSR. Referring to the figure, each ci is either 0 or 1; the closed semi-circles are AND gates; and the feedback bit sj is the modulo 2 sum of the contents of those stages i, 0 ≤ i ≤ L − 1, for which cL−i = 1.

Figure 6.4: A linear feedback shift register (LFSR) of length L.

6.8 Definition The LFSR of Figure 6.4 is denoted ⟨L, C(D)⟩, where C(D) = 1 + c1D + c2D² + · · · + cLD^L ∈ Z2[D] is the connection polynomial. The LFSR is said to be non-singular if the degree of C(D) is L (that is, cL = 1). If the initial content of stage i is si ∈ {0, 1} for each i, 0 ≤ i ≤ L − 1, then [sL−1, . . . , s1, s0] is called the initial state of the LFSR.

6.9 Fact If the initial state of the LFSR in Figure 6.4 is [sL−1, . . . , s1, s0], then the output sequence s = s0, s1, s2, . . . is uniquely determined by the following recursion:

    sj = (c1sj−1 + c2sj−2 + · · · + cLsj−L) mod 2   for j ≥ L.
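A minimal Python sketch of the recursion of Fact 6.9; the connection coefficients c1, . . . , cL and the initial bits s0, . . . , sL−1 are passed as lists, and the example parameters are those of Example 6.10 below:

    def lfsr_output(c, init, n):
        # c = [c1, ..., cL]; init = [s0, s1, ..., s_{L-1}]; returns s0..s_{n-1}
        s, L = list(init), len(c)
        while len(s) < n:
            s.append(sum(c[i] * s[-1 - i] for i in range(L)) % 2)   # Fact 6.9
        return s

    # the LFSR <4, 1 + D + D^4> of Example 6.10 with s0..s3 = 0, 1, 1, 0
    print(lfsr_output([1, 0, 0, 1], [0, 1, 1, 0], 16))
    # -> 0,1,1,0,0,1,0,0,0,1,1,1,1,0,1,0  (period 15)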

6.10 Example (output sequence of an LFSR) Consider the LFSR ⟨4, 1 + D + D⁴⟩ depicted in Figure 6.5. If the initial state of the LFSR is [0, 0, 0, 0], the output sequence is the zero sequence. The following tables show the contents of the stages D3, D2, D1, D0 at the end of each unit of time t when the initial state is [0, 1, 1, 0].

    t  : 0 1 2 3 4 5 6 7        t  : 8 9 10 11 12 13 14 15
    D3 : 0 0 1 0 0 0 1 1        D3 : 1 1 0  1  0  1  1  0
    D2 : 1 0 0 1 0 0 0 1        D2 : 1 1 1  0  1  0  1  1
    D1 : 1 1 0 0 1 0 0 0        D1 : 1 1 1  1  0  1  0  1
    D0 : 0 1 1 0 0 1 0 0        D0 : 0 1 1  1  1  0  1  0

The output sequence is s = 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, . . . , and is periodic with period 15 (see Definition 5.25). □

Figure 6.5: The LFSR ⟨4, 1 + D + D⁴⟩ of Example 6.10.

The significance of an LFSR being non-singular is explained by Fact 6.11.

6.11 Fact Every output sequence (i.e., for all possible initial states) of an LFSR ⟨L, C(D)⟩ is periodic if and only if the connection polynomial C(D) has degree L.

If an LFSR ⟨L, C(D)⟩ is singular (i.e., C(D) has degree less than L), then not all output sequences are periodic. However, the output sequences are ultimately periodic; that is, the sequences obtained by ignoring a certain finite number of terms at the beginning are periodic. For the remainder of this chapter, it will be assumed that all LFSRs are non-singular. Fact 6.12 determines the periods of the output sequences of some special types of non-singular LFSRs.

6.12 Fact (periods of LFSR output sequences) Let C(D) ∈ Z2[D] be a connection polynomial of degree L.
(i) If C(D) is irreducible over Z2 (see Definition 2.190), then each of the 2^L − 1 non-zero initial states of the non-singular LFSR ⟨L, C(D)⟩ produces an output sequence with period equal to the least positive integer N such that C(D) divides 1 + D^N in Z2[D]. (Note: it is always the case that this N is a divisor of 2^L − 1.)
(ii) If C(D) is a primitive polynomial (see Definition 2.228), then each of the 2^L − 1 non-zero initial states of the non-singular LFSR ⟨L, C(D)⟩ produces an output sequence with maximum possible period 2^L − 1.

A method for generating primitive polynomials over Z2 uniformly at random is given in Algorithm 4.78. Table 4.8 lists a primitive polynomial of degree m over Z2 for each m, 1 ≤ m ≤ 229. Fact 6.12(ii) motivates the following definition.

6.13 Definition If C(D) ∈ Z2[D] is a primitive polynomial of degree L, then ⟨L, C(D)⟩ is called a maximum-length LFSR. The output of a maximum-length LFSR with non-zero initial state is called an m-sequence.

Fact 6.14 demonstrates that the output sequences of maximum-length LFSRs have good statistical properties.

6.14 Fact (statistical properties of m-sequences) Let s be an m-sequence that is generated by a maximum-length LFSR of length L.
(i) Let k be an integer, 1 ≤ k ≤ L, and let t be any subsequence of s of length 2^L + k − 2. Then each non-zero sequence of length k appears exactly 2^(L−k) times as a subsequence of t. Furthermore, the zero sequence of length k appears exactly 2^(L−k) − 1 times as a subsequence of t. In other words, the distribution of patterns having fixed length of at most L is almost uniform.
(ii) s satisfies Golomb's randomness postulates (§5.4.3). That is, every m-sequence is also a pn-sequence (see Definition 5.29).

6.15 Example (m-sequence) Since C(D) = 1 + D + D⁴ is a primitive polynomial over Z2, the LFSR ⟨4, 1 + D + D⁴⟩ is a maximum-length LFSR. Hence, the output sequence of this LFSR is an m-sequence of maximum possible period N = 2⁴ − 1 = 15 (cf. Example 6.10). Example 5.30 verifies that this output sequence satisfies Golomb's randomness properties. □

6.2.2 Linear complexity

This subsection summarizes selected results about the linear complexity of sequences. All sequences are assumed to be binary sequences. Notation: s denotes an infinite sequence whose terms are s0, s1, s2, . . . ; sⁿ denotes a finite sequence of length n whose terms are s0, s1, . . . , sn−1 (see Definition 5.24).

6.16 Definition An LFSR is said to generate a sequence s if there is some initial state for which the output sequence of the LFSR is s. Similarly, an LFSR is said to generate a finite sequence sⁿ if there is some initial state for which the output sequence of the LFSR has sⁿ as its first n terms.

6.17 Definition The linear complexity of an infinite binary sequence s, denoted L(s), is defined as follows:
(i) if s is the zero sequence s = 0, 0, 0, . . . , then L(s) = 0;
(ii) if no LFSR generates s, then L(s) = ∞;
(iii) otherwise, L(s) is the length of the shortest LFSR that generates s.

6.18 Definition The linear complexity of a finite binary sequence sⁿ, denoted L(sⁿ), is the length of the shortest LFSR that generates a sequence having sⁿ as its first n terms.

Facts 6.19–6.22 summarize some basic results about linear complexity.

6.19 Fact (properties of linear complexity) Let s and t be binary sequences.
(i) For any n ≥ 1, the linear complexity of the subsequence sⁿ satisfies 0 ≤ L(sⁿ) ≤ n.
(ii) L(sⁿ) = 0 if and only if sⁿ is the zero sequence of length n.
(iii) L(sⁿ) = n if and only if sⁿ = 0, 0, 0, . . . , 0, 1.
(iv) If s is periodic with period N, then L(s) ≤ N.
(v) L(s⊕t) ≤ L(s) + L(t), where s⊕t denotes the bitwise XOR of s and t.

6.20 Fact If the polynomial C(D) ∈ Z2[D] is irreducible over Z2 and has degree L, then each of the 2^L − 1 non-zero initial states of the non-singular LFSR ⟨L, C(D)⟩ produces an output sequence with linear complexity L.

6.21 Fact (expectation and variance of the linear complexity of a random sequence) Let sⁿ be chosen uniformly at random from the set of all binary sequences of length n, and let L(sⁿ) be the linear complexity of sⁿ. Let B(n) denote the parity function: B(n) = 0 if n is even; B(n) = 1 if n is odd.

(i) The expected linear complexity of sⁿ is

    E(L(sⁿ)) = n/2 + (4 + B(n))/18 − (1/2ⁿ)·(n/3 + 2/9).

Hence, for moderately large n, E(L(sⁿ)) ≈ n/2 + 2/9 if n is even, and E(L(sⁿ)) ≈ n/2 + 5/18 if n is odd.
(ii) The variance of the linear complexity of sⁿ is

    Var(L(sⁿ)) = 86/81 − (1/2ⁿ)·((14 − B(n))/27 · n + (82 − 2B(n))/81)
                 − (1/2²ⁿ)·(n²/9 + 4n/27 + 4/81).

Hence, Var(L(sⁿ)) ≈ 86/81 for moderately large n.

6.22 Fact (expectation of the linear complexity of a random periodic sequence) Let sⁿ be chosen uniformly at random from the set of all binary sequences of length n, where n = 2^t for some fixed t ≥ 1, and let s be the n-periodic infinite sequence obtained by repeating the sequence sⁿ. Then the expected linear complexity of s is E(L(s)) = n − 1 + 2^(−n).

The linear complexity profile of a binary sequence is introduced next.

6.23 Definition Let s = s0, s1, . . . be a binary sequence, and let LN denote the linear complexity of the subsequence s^N = s0, s1, . . . , sN−1, N ≥ 0. The sequence L1, L2, . . . is called the linear complexity profile of s. Similarly, if sⁿ = s0, s1, . . . , sn−1 is a finite binary sequence, the sequence L1, L2, . . . , Ln is called the linear complexity profile of sⁿ.

The linear complexity profile of a sequence can be computed using the Berlekamp-Massey algorithm (Algorithm 6.30); see also Note 6.31. The following properties of the linear complexity profile can be deduced from Fact 6.29.

6.24 Fact (properties of linear complexity profile) Let L1, L2, . . . be the linear complexity profile of a sequence s = s0, s1, . . . .
(i) If j > i, then Lj ≥ Li.
(ii) LN+1 > LN is possible only if LN ≤ N/2.
(iii) If LN+1 > LN, then LN+1 + LN = N + 1.

The linear complexity profile of a sequence s can be graphed by plotting the points (N, LN), N ≥ 1, in the N × L plane and joining successive points by a horizontal line followed by a vertical line, if necessary (see Figure 6.6). Fact 6.24 can then be interpreted as saying that the graph of a linear complexity profile is non-decreasing. Moreover, a (vertical) jump in the graph can only occur from below the line L = N/2; if a jump occurs, then it is symmetric about this line. Fact 6.25 shows that the expected linear complexity of a random sequence should closely follow the line L = N/2.

6.25 Fact (expected linear complexity profile of a random sequence) Let s = s0, s1, . . . be a random sequence, and let LN be the linear complexity of the subsequence s^N = s0, s1, . . . , sN−1 for each N ≥ 1. For any fixed index N ≥ 1, the expected smallest j for which LN+j > LN is 2 if LN ≤ N/2, or 2 + 2LN − N if LN > N/2. Moreover, the expected increase in linear complexity is 2 if LN ≥ N/2, or N − 2LN + 2 if LN < N/2.

6.26 Example (linear complexity profile) Consider the 20-periodic sequence s with cycle

    s²⁰ = 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0.

The linear complexity profile of s is 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 6, 6, 6, 8, 8, 8, 9, 9, 10, 10, 11, 11, 11, 11, 14, 14, 14, 14, 15, 15, 15, 17, 17, 17, 18, 18, 19, 19, 19, 19, . . . . Figure 6.6 shows the graph of the linear complexity profile of s. □

Figure 6.6: Linear complexity profile of the 20-periodic sequence of Example 6.26 (axes N and L = L(s^N), with the line L = N/2 shown).

As is the case with all statistical tests for randomness (cf. §5.4), the condition that a sequence s have a linear complexity profile that closely resembles that of a random sequence is necessary but not sufficient for s to be considered random. This point is illustrated in the following example.

6.27 Example (limitations of the linear complexity profile) The linear complexity profile of the sequence s defined as

    si = 1, if i = 2^j − 1 for some j ≥ 0,
    si = 0, otherwise,

follows the line L = N/2 as closely as possible. That is, L(s^N) = ⌊(N + 1)/2⌋ for all N ≥ 1. However, the sequence s is clearly non-random. □

6.2.3 Berlekamp-Massey algorithm

The Berlekamp-Massey algorithm (Algorithm 6.30) is an efficient algorithm for determining the linear complexity of a finite binary sequence sⁿ of length n (see Definition 6.18). The algorithm takes n iterations, with the Nth iteration computing the linear complexity of the subsequence s^N consisting of the first N terms of sⁿ. The theoretical basis for the algorithm is Fact 6.29.

6.28 Definition Consider the finite binary sequence s^(N+1) = s0, s1, . . . , sN−1, sN. For C(D) = 1 + c1D + · · · + cLD^L, let ⟨L, C(D)⟩ be an LFSR that generates the subsequence s^N = s0, s1, . . . , sN−1. The next discrepancy dN is the difference between sN and the (N+1)st term generated by the LFSR:

    dN = (sN + Σ_{i=1}^{L} ci·sN−i) mod 2.

6.29 Fact Let s^N = s0, s1, . . . , sN−1 be a finite binary sequence of linear complexity L = L(s^N), and let ⟨L, C(D)⟩ be an LFSR which generates s^N.
(i) The LFSR ⟨L, C(D)⟩ also generates s^(N+1) = s0, s1, . . . , sN−1, sN if and only if the next discrepancy dN is equal to 0.
(ii) If dN = 0, then L(s^(N+1)) = L.
(iii) Suppose dN = 1. Let m be the largest integer < N such that L(s^m) < L(s^N), and let ⟨L(s^m), B(D)⟩ be an LFSR of length L(s^m) which generates s^m. Then ⟨L′, C′(D)⟩ is an LFSR of smallest length which generates s^(N+1), where

    L′ = L,            if L > N/2,
    L′ = N + 1 − L,    if L ≤ N/2,

and C′(D) = C(D) + B(D)·D^(N−m).

6.30 Algorithm Berlekamp-Massey algorithm
INPUT: a binary sequence sⁿ = s0, s1, s2, . . . , sn−1 of length n.
OUTPUT: the linear complexity L(sⁿ) of sⁿ, 0 ≤ L(sⁿ) ≤ n.
1. Initialization. C(D)←1, L←0, m← −1, B(D)←1, N←0.
2. While (N < n) do the following:
   2.1 Compute the next discrepancy d. d←(sN + Σ_{i=1}^{L} ci·sN−i) mod 2.
   2.2 If d = 1 then do the following:
       T(D)←C(D), C(D)←C(D) + B(D)·D^(N−m).
       If L ≤ N/2 then L←N + 1 − L, m←N, B(D)←T(D).
   2.3 N←N + 1.
3. Return(L).
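A direct Python transcription of Algorithm 6.30, representing polynomials over Z2 as integers with bit i holding the coefficient of D^i:

    def berlekamp_massey(s):
        C, B = 1, 1          # C(D) <- 1, B(D) <- 1
        L, m = 0, -1
        for N in range(len(s)):
            # step 2.1: d = (s_N + sum_{i=1}^{L} c_i * s_{N-i}) mod 2
            d = s[N]
            for i in range(1, L + 1):
                d ^= ((C >> i) & 1) & s[N - i]
            # step 2.2
            if d == 1:
                T = C
                C ^= B << (N - m)        # C(D) <- C(D) + B(D) * D^(N-m)
                if L <= N // 2:
                    L, m, B = N + 1 - L, N, T
        return L

    # the sequence of Example 6.33 has linear complexity 5
    print(berlekamp_massey([0, 0, 1, 1, 0, 1, 1, 1, 0]))   # -> 5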

6.31 Note (intermediate results in Berlekamp-Massey algorithm) At the end of each iteration of step 2, ⟨L, C(D)⟩ is an LFSR of smallest length which generates s^N. Hence, Algorithm 6.30 can also be used to compute the linear complexity profile (Definition 6.23) of a finite sequence.

6.32 Fact The running time of the Berlekamp-Massey algorithm (Algorithm 6.30) for determining the linear complexity of a binary sequence of bitlength n is O(n²) bit operations.

6.33 Example (Berlekamp-Massey algorithm) Table 6.1 shows the steps of Algorithm 6.30 for computing the linear complexity of the binary sequence sⁿ = 0, 0, 1, 1, 0, 1, 1, 1, 0 of length n = 9. This sequence is found to have linear complexity 5, and an LFSR which generates it is ⟨5, 1 + D³ + D⁵⟩. □

6.34 Fact Let sⁿ be a finite binary sequence of length n, and let the linear complexity of sⁿ be L. Then there is a unique LFSR of length L which generates sⁿ if and only if L ≤ n/2.

An important consequence of Fact 6.34 and Fact 6.24(iii) is the following.

6.35 Fact Let s be an (infinite) binary sequence of linear complexity L, and let t be a (finite) subsequence of s of length at least 2L. Then the Berlekamp-Massey algorithm (with step 3 modified to return both L and C(D)) on input t determines an LFSR of length L which generates s.

Table 6.1: Steps of the Berlekamp-Massey algorithm of Example 6.33.

    sN  T(D)              d   C(D)              L   m    B(D)         N
    −   −                 −   1                 0   −1   1            0
    0   −                 0   1                 0   −1   1            1
    0   −                 0   1                 0   −1   1            2
    1   1                 1   1 + D³            3   2    1            3
    1   1 + D³            1   1 + D + D³        3   2    1            4
    0   1 + D + D³        1   1 + D + D² + D³   3   2    1            5
    1   1 + D + D² + D³   1   1 + D + D²        3   2    1            6
    1   1 + D + D² + D³   0   1 + D + D²        3   2    1            7
    1   1 + D + D²        1   1 + D + D² + D⁵   5   7    1 + D + D²   8
    0   1 + D + D² + D⁵   1   1 + D³ + D⁵       5   7    1 + D + D²   9

6.2.4 Nonlinear feedback shift registers

This subsection summarizes selected results about nonlinear feedback shift registers. A function with n binary inputs and one binary output is called a Boolean function of n variables; there are 2^(2ⁿ) different Boolean functions of n variables.

6.36 Definition A (general) feedback shift register (FSR) of length L consists of L stages (or delay elements) numbered 0, 1, . . . , L − 1, each capable of storing one bit and having one input and one output, and a clock which controls the movement of data. During each unit of time the following operations are performed:
(i) the content of stage 0 is output and forms part of the output sequence;
(ii) the content of stage i is moved to stage i − 1 for each i, 1 ≤ i ≤ L − 1; and
(iii) the new content of stage L − 1 is the feedback bit sj = f(sj−1, sj−2, . . . , sj−L), where the feedback function f is a Boolean function and sj−i is the previous content of stage L − i, 1 ≤ i ≤ L.

If the initial content of stage i is si ∈ {0, 1} for each 0 ≤ i ≤ L − 1, then [sL−1, . . . , s1, s0] is called the initial state of the FSR.

Figure 6.7 depicts an FSR. Note that if the feedback function f is a linear function, then the FSR is an LFSR (Definition 6.7). Otherwise, the FSR is called a nonlinear FSR.

Figure 6.7: A feedback shift register (FSR) of length L.

6.37 Fact If the initial state of the FSR in Figure 6.7 is [sL−1, . . . , s1, s0], then the output sequence s = s0, s1, s2, . . . is uniquely determined by the following recursion:

    sj = f(sj−1, sj−2, . . . , sj−L)   for j ≥ L.

6.38 Definition An FSR is said to be non-singular if and only if every output sequence of the FSR (i.e., for all possible initial states) is periodic.

6.39 Fact An FSR with feedback function f(sj−1, sj−2, . . . , sj−L) is non-singular if and only if f is of the form f = sj−L ⊕ g(sj−1, sj−2, . . . , sj−L+1) for some Boolean function g. The period of the output sequence of a non-singular FSR of length L is at most 2^L.

6.40 Definition If the period of the output sequence (for any initial state) of a non-singular FSR of length L is 2^L, then the FSR is called a de Bruijn FSR, and the output sequence is called a de Bruijn sequence.

6.41 Example (de Bruijn sequence) Consider the FSR of length 3 with nonlinear feedback function f(x1, x2, x3) = 1 ⊕ x2 ⊕ x3 ⊕ x1x2. The following tables show the contents of the 3 stages of the FSR at the end of each unit of time t when the initial state is [0, 0, 0].

    t       : 0 1 2 3        t       : 4 5 6 7
    Stage 2 : 0 1 1 1        Stage 2 : 0 1 0 0
    Stage 1 : 0 0 1 1        Stage 1 : 1 0 1 0
    Stage 0 : 0 0 0 1        Stage 0 : 1 1 0 1

The output sequence is the de Bruijn sequence with cycle 0, 0, 0, 1, 1, 1, 0, 1. □

Fact 6.42 demonstrates that the output sequences of de Bruijn FSRs have good statistical properties (compare with Fact 6.14(i)).

6.42 Fact (statistical properties of de Bruijn sequences) Let s be a de Bruijn sequence that is generated by a de Bruijn FSR of length L. Let k be an integer, 1 ≤ k ≤ L, and let t be any subsequence of s of length 2^L + k − 1. Then each sequence of length k appears exactly 2^(L−k) times as a subsequence of t. In other words, the distribution of patterns having fixed length of at most L is uniform.

6.43 Note (converting a maximum-length LFSR to a de Bruijn FSR) Let R1 be a maximum-length LFSR of length L with (linear) feedback function f(sj−1, sj−2, . . . , sj−L). Then the FSR R2 with feedback function g(sj−1, sj−2, . . . , sj−L) = f ⊕ s̄j−1s̄j−2 · · · s̄j−L+1 is a de Bruijn FSR. Here, s̄i denotes the complement of si. The output sequence of R2 is obtained from that of R1 by simply adding a 0 to the end of each subsequence of L − 1 0's occurring in the output sequence of R1.
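A minimal Python sketch of Note 6.43: the maximum-length LFSR ⟨4, 1 + D + D⁴⟩ is converted to a de Bruijn FSR by XORing the AND of the complements of the stages holding sj−1, sj−2, sj−3 into the linear feedback; the helper names are ours:

    def fsr_cycle(f, state):
        # state: [s_{j-1}, ..., s_{j-L}]; returns one full cycle of output bits
        out, seen = [], set()
        while tuple(state) not in seen:
            seen.add(tuple(state))
            out.append(state[-1])               # stage 0 is output
            state = [f(state)] + state[:-1]     # shift in the feedback bit
        return out

    f_lin = lambda st: st[0] ^ st[3]            # s_{j-1} XOR s_{j-4}
    f_db  = lambda st: f_lin(st) ^ ((1 - st[0]) & (1 - st[1]) & (1 - st[2]))
    print(len(fsr_cycle(f_lin, [0, 1, 1, 0])))  # 15 = 2^4 - 1
    print(len(fsr_cycle(f_db, [0, 1, 1, 0])))   # 16 = 2^4 (de Bruijn)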

6.3 Stream ciphers based on LFSRs

As mentioned in the beginning of §6.2.1, linear feedback shift registers are widely used in keystream generators because they are well-suited for hardware implementation, produce sequences having large periods and good statistical properties, and are readily analyzed using algebraic techniques. Unfortunately, the output sequences of LFSRs are also easily predictable, as the following argument shows. Suppose that the output sequence s of an LFSR has linear complexity L. The connection polynomial C(D) of an LFSR of length L which generates s can be efficiently determined using the Berlekamp-Massey algorithm (Algorithm 6.30) from any (short) subsequence t of s having length at least n = 2L (cf. Fact 6.35).

Having determined C(D), the LFSR ⟨L, C(D)⟩ can then be initialized with any substring of t having length L, and used to generate the remainder of the sequence s. An adversary may obtain the required subsequence t of s by mounting a known or chosen-plaintext attack (§1.13.1) on the stream cipher: if the adversary knows the plaintext subsequence m1, m2, . . . , mn corresponding to a ciphertext sequence c1, c2, . . . , cn, the corresponding keystream bits are obtained as mi ⊕ ci, 1 ≤ i ≤ n.
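The attack just outlined can be sketched end-to-end in Python by combining the Berlekamp-Massey transcription given after Algorithm 6.30 (modified, per Fact 6.35, to also return C(D)) with the LFSR recursion; the keystream bits below are those of Example 6.10, standing in for bits recovered as mi ⊕ ci:

    def bm_with_poly(s):
        # Berlekamp-Massey returning (L, C), with bit i of C holding c_i
        C, B, L, m = 1, 1, 0, -1
        for N in range(len(s)):
            d = s[N]
            for i in range(1, L + 1):
                d ^= ((C >> i) & 1) & s[N - i]
            if d:
                T = C
                C ^= B << (N - m)
                if L <= N // 2:
                    L, m, B = N + 1 - L, N, T
        return L, C

    def extend(keystream, L, C, extra=16):
        s = list(keystream)
        for _ in range(extra):      # predict further bits with Fact 6.9
            s.append(sum(((C >> i) & 1) & s[-i] for i in range(1, L + 1)) % 2)
        return s

    z = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1]      # 2L = 8 <= 10 known keystream bits
    L, C = bm_with_poly(z)
    print(L, bin(C))                        # -> 4, 0b10011, i.e. 1 + D + D^4
    print(extend(z, L, C))                  # continues 1, 1, 1, 0, 1, 0, ...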

6.44 Note (use of LFSRs in keystream generators) Since a well-designed system should be secure against known-plaintext attacks, an LFSR should never be used by itself as a keystream generator. Nevertheless, LFSRs are desirable because of their very low implementation costs. Three general methodologies for destroying the linearity properties of LFSRs are discussed in this section:
(i) using a nonlinear combining function on the outputs of several LFSRs (§6.3.1);
(ii) using a nonlinear filtering function on the contents of a single LFSR (§6.3.2); and
(iii) using the output of one (or more) LFSRs to control the clock of one (or more) other LFSRs (§6.3.3).

Desirable properties of LFSR-based keystream generators

For essentially all possible secret keys, the output sequence of an LFSR-based keystream generator should have the following properties:
1. large period;
2. large linear complexity; and
3. good statistical properties (e.g., as described in Fact 6.14).
It is emphasized that these properties are only necessary conditions for a keystream generator to be considered cryptographically secure. Since mathematical proofs of security of such generators are not known, such generators can only be deemed computationally secure (§1.13.3(iv)) after having withstood sufficient public scrutiny.

6.45 Note (connection polynomial) Since a desirable property of a keystream generator is that its output sequences have large periods, component LFSRs should always be chosen to be maximum-length LFSRs, i.e., the LFSRs should be of the form ⟨L, C(D)⟩ where C(D) ∈ Z2[D] is a primitive polynomial of degree L (see Definition 6.13 and Fact 6.12(ii)).

6.46 Note (known vs. secret connection polynomial) The LFSRs in an LFSR-based keystream generator may have known or secret connection polynomials. For known connections, the secret key generally consists of the initial contents of the component LFSRs. For secret connections, the secret key for the keystream generator generally consists of both the initial contents and the connections. For LFSRs of length L with secret connections, the connection polynomials should be selected uniformly at random from the set of all primitive polynomials of degree L over Z2. Secret connections are generally recommended over known connections as the former are more resistant to certain attacks which use precomputation for analyzing the particular connection, and because the former are more amenable to statistical analysis. Secret connection LFSRs have the drawback of requiring extra circuitry to implement in hardware. However, because of the extra security possible with secret connections, this cost may sometimes be compensated for by choosing shorter LFSRs.

6.47 Note (sparse vs. dense connection polynomial) For implementation purposes, it is advantageous to choose an LFSR that is sparse; i.e., only a few of the coefficients of the connection polynomial are non-zero. Then only a small number of connections must be made between the stages of the LFSR in order to compute the feedback bit. For example, the connection polynomial might be chosen to be a primitive trinomial (cf. Table 4.8). However, in some LFSR-based keystream generators, special attacks can be mounted if sparse connection polynomials are used. Hence, it is generally recommended not to use sparse connection polynomials in LFSR-based keystream generators.

6.3.1 Nonlinear combination generators

One general technique for destroying the linearity inherent in LFSRs is to use several LFSRs in parallel. The keystream is generated as a nonlinear function f of the outputs of the component LFSRs; this construction is illustrated in Figure 6.8. Such keystream generators are called nonlinear combination generators, and f is called the combining function. The remainder of this subsection demonstrates that the function f must satisfy several criteria in order to withstand certain particular cryptographic attacks.

Figure 6.8: A nonlinear combination generator (f is a nonlinear combining function).

6.48 Definition A product of m distinct variables is called an mth order product of the variables. Every Boolean function f(x1, x2, . . . , xn) can be written as a modulo 2 sum of distinct mth order products of its variables, 0 ≤ m ≤ n; this expression is called the algebraic normal form of f. The nonlinear order of f is the maximum of the order of the terms appearing in its algebraic normal form.

For example, the Boolean function f(x1, x2, x3, x4, x5) = 1 ⊕ x2 ⊕ x3 ⊕ x4x5 ⊕ x1x3x4x5 has nonlinear order 4. Note that the maximum possible nonlinear order of a Boolean function in n variables is n. Fact 6.49 demonstrates that the output sequence of a nonlinear combination generator has high linear complexity, provided that a combining function f of high nonlinear order is employed.

6.49 Fact Suppose that n maximum-length LFSRs, whose lengths L1, L2, . . . , Ln are pairwise distinct and greater than 2, are combined by a nonlinear function f(x1, x2, . . . , xn) (as in Figure 6.8) which is expressed in algebraic normal form. Then the linear complexity of the keystream is f(L1, L2, . . . , Ln). (The expression f(L1, L2, . . . , Ln) is evaluated over the integers rather than over Z2.)

6.50 Example (Geffe generator) The Geffe generator, as depicted in Figure 6.9, is defined by three maximum-length LFSRs whose lengths L1, L2, L3 are pairwise relatively prime, with nonlinear combining function

    f(x1, x2, x3) = x1x2 ⊕ (1 + x2)x3 = x1x2 ⊕ x2x3 ⊕ x3.

The keystream generated has period (2^L1 − 1)·(2^L2 − 1)·(2^L3 − 1) and linear complexity L = L1L2 + L2L3 + L3.

Figure 6.9: The Geffe generator.

The Geffe generator is cryptographically weak because information about the states of LFSR 1 and LFSR 3 leaks into the output sequence. To see this, let x1(t), x2(t), x3(t), z(t) denote the tth output bits of LFSRs 1, 2, 3 and the keystream, respectively. Then the correlation probability of the sequence x1(t) to the output sequence z(t) is

    P(z(t) = x1(t)) = P(x2(t) = 1) + P(x2(t) = 0)·P(x3(t) = x1(t))
                    = 1/2 + (1/2)·(1/2) = 3/4.

Similarly, P(z(t) = x3(t)) = 3/4. For this reason, despite having high period and moderately high linear complexity, the Geffe generator succumbs to correlation attacks, as described in Note 6.51. □
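The correlation P(z(t) = x1(t)) = 3/4 is easy to confirm empirically; the sketch below feeds uniformly random bits to the Geffe combining function, which suffices for the probability computation above:

    import random

    def geffe(x1, x2, x3):
        return (x1 & x2) ^ ((1 ^ x2) & x3)

    trials, hits = 100000, 0
    for _ in range(trials):
        x1, x2, x3 = (random.getrandbits(1) for _ in range(3))
        hits += (geffe(x1, x2, x3) == x1)
    print(hits / trials)   # ≈ 0.75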

6.51 Note (correlation attacks) Suppose that n maximum-length LFSRs R1, R2, . . . , Rn of lengths L1, L2, . . . , Ln are employed in a nonlinear combination generator. If the connection polynomials of the LFSRs and the combining function f are public knowledge, then the number of different keys of the generator is ∏_{i=1}^{n} (2^Li − 1). (A key consists of the initial states of the LFSRs.) Suppose that there is a correlation between the keystream and the output sequence of R1, with correlation probability p > 1/2. If a sufficiently long segment of the keystream is known (e.g., as is possible under a known-plaintext attack on a binary additive stream cipher), the initial state of R1 can be deduced by counting the number of coincidences between the keystream and all possible shifts of the output sequence of R1, until this number agrees with the correlation probability p. Under these conditions, finding the initial state of R1 will take at most 2^L1 − 1 trials.

In the case where there is a correlation between the keystream and the output sequences of each of R1, R2, . . . , Rn, the (secret) initial state of each LFSR can be determined independently in a total of about Σ_{i=1}^{n} (2^Li − 1) trials; this number is far smaller than the total number of different keys. In a similar manner, correlations between the output sequences of particular subsets of the LFSRs and the keystream can be exploited.
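A minimal Python sketch of the attack of Note 6.51 against the toy Geffe generator, re-using the lfsr_output and geffe sketches above (the registers, seeds, and helper names are ours): each non-zero initial state of R1 is tried, and the one whose output agrees with the keystream in about 75% of positions is selected.

    def correlation_attack_r1(z, c1):
        L1, best = len(c1), None
        for state in range(1, 2 ** L1):
            init = [(state >> i) & 1 for i in range(L1)]   # candidate s0..s_{L1-1}
            agree = sum(a == b for a, b in
                        zip(lfsr_output(c1, init, len(z)), z)) / len(z)
            if best is None or abs(agree - 0.75) < abs(best[1] - 0.75):
                best = (init, agree)
        return best

    # toy Geffe instance: registers <3, 1+D+D^3>, <4, 1+D+D^4>, <5, 1+D^2+D^5>
    c1, c2, c3 = [1, 0, 1], [1, 0, 0, 1], [0, 1, 0, 0, 1]
    s1, s2, s3 = [1, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1, 0]
    n = 200
    z = [geffe(a, b, c) for a, b, c in zip(lfsr_output(c1, s1, n),
                                           lfsr_output(c2, s2, n),
                                           lfsr_output(c3, s3, n))]
    print(correlation_attack_r1(z, c1), "true state:", s1)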

variables, each taking on the values 0 or 1 with probability 12 A Boolean function f (x1 , x2 , , xn ) is mth -order correlation immune if for each subset of m random variables Xi1 , Xi2 , , Xim with 1 ≤ i1 < i2 < · · · < im ≤ n, the random variable Z = f (X1 , X2 , . , Xn ) is statistically independent of the random vector (Xi1 , Xi2 , , Xim ); equivalently, I(Z; Xi1 , Xi2 , , Xim ) = 0 (see Definition 2.45) For example, the function f (x1 , x2 , . , xn ) = x1 ⊕ x2 ⊕ · · · ⊕ xn is (n − 1)th order correlation immune In light of Fact 649, the following shows that there is a tradeoff between achieving high linear complexity and high correlation immunity with a combining function. 6.53 Fact If a Boolean function f (x1 , x2 , , xn ) is mth -order correlation immune, where 1 ≤ m < n, then the nonlinear order of f is at most n − m. Moreover, if f is balanced (ie, exactly half of the output values of f are 0) then the nonlinear order of f

The tradeoff between high linear complexity and high correlation immunity can be avoided by permitting memory in the nonlinear combination function f. This point is illustrated by the summation generator.

6.54 Example (summation generator) The combining function in the summation generator is based on the fact that integer addition, when viewed over Z2, is a nonlinear function with memory whose correlation immunity is maximum. To see this in the case n = 2, let a = am−1·2^(m−1) + · · · + a1·2 + a0 and b = bm−1·2^(m−1) + · · · + b1·2 + b0 be the binary representations of integers a and b. Then the bits of z = a + b are given by the recursive formula:

    zj = f1(aj, bj, cj−1) = aj ⊕ bj ⊕ cj−1,          0 ≤ j ≤ m,
    cj = f2(aj, bj, cj−1) = ajbj ⊕ (aj ⊕ bj)cj−1,    0 ≤ j ≤ m − 1,

where cj is the carry bit, and c−1 = am = bm = 0. Note that f1 is 2nd-order correlation immune, while f2 is a memoryless nonlinear function. The carry bit cj−1 carries all the nonlinear influence of less significant bits of a and b (namely, aj−1, . . . , a1, a0 and bj−1, . . . , b1, b0).

The summation generator, as depicted in Figure 6.10, is defined by n maximum-length LFSRs whose lengths L1, L2, . . . , Ln are pairwise relatively prime. The secret key consists of the initial states of the LFSRs, and an initial (integer) carry C0. The keystream is generated as follows. At time j (j ≥ 1), the LFSRs are stepped producing output bits x1, x2, . . . , xn, and the integer sum Sj = Σ_{i=1}^{n} xi + Cj−1 is computed. The keystream bit is Sj mod 2 (the least significant bit of Sj), while the new carry is computed as Cj = ⌊Sj/2⌋ (the remaining bits of Sj). The period of the keystream is ∏_{i=1}^{n} (2^Li − 1), while its linear complexity is close to this number.

Figure 6.10: The summation generator.

Even though the summation generator has high period, linear complexity, and correlation immunity, it is vulnerable to certain correlation attacks and a known-plaintext attack based on its 2-adic span (see page 218). □
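A minimal Python sketch of the summation generator step, re-using lfsr_output from the earlier sketch; the registers and seeds are toy values:

    def summation_keystream(seqs, C0, n):
        z, C = [], C0
        for j in range(n):
            S = sum(seq[j] for seq in seqs) + C   # S_j = sum of x_i plus C_{j-1}
            z.append(S & 1)                       # keystream bit: S_j mod 2
            C = S >> 1                            # new carry: floor(S_j / 2)
        return z

    seqs = [lfsr_output([1, 0, 1], [1, 0, 0], 24),        # <3, 1 + D + D^3>
            lfsr_output([1, 0, 0, 1], [0, 1, 1, 0], 24)]  # <4, 1 + D + D^4>
    print(summation_keystream(seqs, 0, 24))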

6.3.2 Nonlinear filter generators

Another general technique for destroying the linearity inherent in LFSRs is to generate the keystream as some nonlinear function of the stages of a single LFSR; this construction is illustrated in Figure 6.11. Such keystream generators are called nonlinear filter generators, and f is called the filtering function.

Figure 6.11: A nonlinear filter generator (f is a nonlinear Boolean filtering function).

Fact 6.55 describes the linear complexity of the output sequence of a nonlinear filter generator.

6.55 Fact Suppose that a nonlinear filter generator is constructed using a maximum-length LFSR of length L and a filtering function f of nonlinear order m (as in Figure 6.11).
(i) (Key's bound) The linear complexity of the keystream is at most Lm = Σ_{i=1}^{m} (L choose i).
(ii) For a fixed maximum-length LFSR of prime length L, the fraction of Boolean functions f of nonlinear order m which produce sequences of maximum linear complexity Lm is Pm ≈ exp(−Lm/(L·2^L)) > e^(−1/L). Therefore, for large L, most of the generators produce sequences whose linear complexity meets the upper bound in (i).

The nonlinear function f selected for a filter generator should include many terms of each order up to the nonlinear order of f.

6.56 Example (knapsack generator) The knapsack keystream generator is defined by a maximum-length LFSR ⟨L, C(D)⟩ and a modulus Q = 2^L. The secret key consists of L knapsack integer weights a1, a2, . . . , aL each of bitlength L, and the initial state of the LFSR. Recall that the subset sum problem (§3.10) is to determine a subset of the knapsack weights which add up to a given integer s, provided that such a subset exists; this problem is NP-hard (Fact 3.91).

The keystream is generated as follows: at time j, the LFSR is stepped and the knapsack sum Sj = Σ_{i=1}^{L} xi·ai mod Q is computed, where [xL, . . . , x2, x1] is the state of the LFSR at time j. Finally, selected bits of Sj (after Sj is converted to its binary representation) are extracted to form part of the keystream (the ⌈lg L⌉ least significant bits of Sj should be discarded). The linear complexity of the keystream is then virtually certain to be L·(2^L − 1). Since the state of an LFSR is a binary vector, the function which maps the LFSR state to the knapsack sum Sj is indeed nonlinear. Explicitly, let the function f be defined by f(x) = Σ_{i=1}^{L} xi·ai mod Q, where x = [xL, . . . , x2, x1] is a state. If x and y are two states then, in general, f(x ⊕ y) ≠ f(x) + f(y). □
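A minimal Python sketch of the knapsack generator with toy parameters; exactly which bits of Sj are extracted is a design choice, and here a single bit just above the discarded ⌈lg L⌉ low-order bits is kept. The helper names are ours, and lfsr_output is the sketch given earlier:

    import math, random

    def knapsack_keystream(c, init, weights, nbits):
        L, Q = len(c), 2 ** len(c)
        drop = math.ceil(math.log2(L))          # discard ceil(lg L) low bits
        s = lfsr_output(c, init, nbits + L)     # underlying LFSR bits
        z = []
        for j in range(1, nbits + 1):
            state = s[j:j + L]                  # LFSR stage contents after step j
            S = sum(a for x, a in zip(state, weights) if x) % Q
            z.append((S >> drop) & 1)           # one extracted bit of S_j
        return z

    weights = [random.randrange(2 ** 4) for _ in range(4)]   # secret 4-bit weights
    print(knapsack_keystream([1, 0, 0, 1], [0, 1, 1, 0], weights, 16))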

6.56 Example (knapsack generator) The knapsack keystream generator is defined by a maximum-length LFSR ⟨L, C(D)⟩ and a modulus Q = 2^L. The secret key consists of L knapsack integer weights a1, a2, . . . , aL, each of bitlength L, and the initial state of the LFSR. Recall that the subset sum problem (§3.10) is to determine a subset of the knapsack weights which add up to a given integer s, provided that such a subset exists; this problem is NP-hard (Fact 3.91). The keystream is generated as follows: at time j, the LFSR is stepped and the knapsack sum Sj = Σ_{i=1}^{L} xi·ai mod Q is computed, where [xL, . . . , x2, x1] is the state of the LFSR at time j. Finally, selected bits of Sj (after Sj is converted to its binary representation) are extracted to form part of the keystream (the ⌈lg L⌉ least significant bits of Sj should be discarded). The linear complexity of the keystream is then virtually certain to be L(2^L − 1). Since the state of an LFSR is a binary vector, the function which maps the LFSR state to the knapsack sum Sj is indeed nonlinear. Explicitly, let the function f be defined by f(x) = Σ_{i=1}^{L} xi·ai mod Q, where x = [xL, . . . , x2, x1] is a state. If x and y are two states then, in general, f(x ⊕ y) ≠ f(x) + f(y).
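A small sketch of the knapsack generator follows, again reusing lfsr_step. The toy weights are illustrative, the pairing of state bits with weights simply follows list order (an assumed convention), and all bits above the ⌈lg L⌉ discarded low-order bits are kept.

```python
import math

def knapsack_keystream(state, taps, weights, rounds):
    """Knapsack generator: each clock, form the subset sum of the
    weights selected by the LFSR state, reduce mod Q = 2^L, discard
    the ceil(lg L) low-order bits, and emit the rest."""
    L = len(state)
    Q = 1 << L
    drop = math.ceil(math.log2(L))
    out = []
    for _ in range(rounds):
        s = sum(w for x, w in zip(state, weights) if x) % Q
        out.append(s >> drop)              # keep the high-order bits of S_j
        _, state = lfsr_step(state, taps)
    return out

# Toy parameters: L = 4, so Q = 16 and the 2 low-order bits are dropped.
words = knapsack_keystream([1, 0, 1, 1], [0, 3], [11, 7, 13, 6], 8)
```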

6.3.3 Clock-controlled generators

In nonlinear combination generators and nonlinear filter generators, the component LFSRs are clocked regularly; i.e., the movement of data in all the LFSRs is controlled by the same clock. The main idea behind a clock-controlled generator is to introduce nonlinearity into LFSR-based keystream generators by having the output of one LFSR control the clocking (i.e., stepping) of a second LFSR. Since the second LFSR is clocked in an irregular manner, the hope is that attacks based on the regular motion of LFSRs can be foiled. Two clock-controlled generators are described in this subsection: (i) the alternating step generator and (ii) the shrinking generator.

(i) The alternating step generator

The alternating step generator uses an LFSR R1 to control the stepping of two LFSRs, R2 and R3. The keystream produced is the XOR of the output sequences of R2 and R3.

6.57 Algorithm Alternating step generator

SUMMARY: a control LFSR R1 is used to selectively step two other LFSRs, R2 and R3.
OUTPUT: a sequence which is the bitwise XOR of the output sequences of R2 and R3.
The following steps are repeated until a keystream of desired length is produced.
1. Register R1 is clocked.
2. If the output of R1 is 1 then: R2 is clocked; R3 is not clocked but its previous output bit is repeated. (For the first clock cycle, the "previous output bit" of R3 is taken to be 0.)
3. If the output of R1 is 0 then: R3 is clocked; R2 is not clocked but its previous output bit is repeated. (For the first clock cycle, the "previous output bit" of R2 is taken to be 0.)
4. The output bits of R2 and R3 are XORed; the resulting bit is part of the keystream.

More formally, let the output sequences of LFSRs R1, R2, and R3 be a0, a1, a2, . . . , b0, b1, b2, . . . , and c0, c1, c2, . . . , respectively. Define b−1 = c−1 = 0. Then the keystream produced by the alternating step generator is x0, x1, x2, . . . , where xj = b_{t(j)} ⊕ c_{j−t(j)−1} and t(j) = (Σ_{i=0}^{j} ai) − 1 for all j ≥ 0. The alternating step generator is depicted in Figure 6.12.

Figure 6.12: The alternating step generator.
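The clock-control logic translates directly into code. Below is a minimal Python sketch using the same illustrative lfsr_step model as earlier; the register states and taps are arbitrary toy values, not a transcription of Example 6.58's connection polynomials (which would depend on the LFSR convention chosen).

```python
def alternating_step(r1, r2, r3, nbits):
    """Alternating step generator: R1's output decides whether R2 or
    R3 is clocked; the keystream is the XOR of their latest outputs.
    Each register is a [state, taps] pair; previous outputs start at 0."""
    out2, out3 = 0, 0
    keystream = []
    for _ in range(nbits):
        a, r1[0] = lfsr_step(r1[0], r1[1])
        if a == 1:
            out2, r2[0] = lfsr_step(r2[0], r2[1])   # R2 steps, R3 repeats
        else:
            out3, r3[0] = lfsr_step(r3[0], r3[1])   # R3 steps, R2 repeats
        keystream.append(out2 ^ out3)
    return keystream

ks = alternating_step([[0, 0, 1], [1, 2]],
                      [[1, 0, 1, 1], [2, 3]],
                      [[0, 1, 0, 0, 1], [0, 2, 3, 4]], 20)
```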

6.58 Example (alternating step generator with artificially small parameters) Consider an alternating step generator with component LFSRs R1 = ⟨3, 1 + D^2 + D^3⟩, R2 = ⟨4, 1 + D^3 + D^4⟩, and R3 = ⟨5, 1 + D + D^3 + D^4 + D^5⟩. Suppose that the initial states of R1, R2, and R3 are [0, 0, 1], [1, 0, 1, 1], and [0, 1, 0, 0, 1], respectively. The output sequence of R1 is the 7-periodic sequence with cycle
a^7 = 1, 0, 0, 1, 0, 1, 1.
The output sequence of R2 is the 15-periodic sequence with cycle
b^15 = 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0.
The output sequence of R3 is the 31-periodic sequence with cycle
c^31 = 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0.
The keystream generated is
x = 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, . . . .

Fact 6.59 establishes, under the assumption that R1 produces a de Bruijn sequence (see Definition 6.40), that the output sequence of an alternating step generator satisfies the basic requirements of high period, high linear complexity, and good statistical properties.

6.59 Fact (properties of the alternating step generator) Suppose that R1 produces a de Bruijn sequence of period 2^{L1}. Furthermore, suppose that R2 and R3 are maximum-length LFSRs of lengths L2 and L3, respectively, such that gcd(L2, L3) = 1. Let x be the output sequence of the alternating step generator formed by R1, R2, and R3.
(i) The sequence x has period 2^{L1} · (2^{L2} − 1) · (2^{L3} − 1).
(ii) The linear complexity L(x) of x satisfies (L2 + L3) · 2^{L1−1} < L(x) ≤ (L2 + L3) · 2^{L1}.
(iii) The distribution of patterns in x is almost uniform. More precisely, let P be any binary string of length t bits, where t ≤ min(L2, L3). If x(t) denotes any t consecutive bits in x, then the probability that x(t) = P is (1/2)^t + O(1/2^{L2−t}) + O(1/2^{L3−t}).

Since a de Bruijn sequence can be obtained from the output sequence s of a maximum-length LFSR (of length L) by simply adding a 0 to the end of each subsequence of L − 1 0's occurring in s (see Note 6.43), it is reasonable to expect that the assertions of high period, high linear complexity, and good statistical properties in Fact 6.59 also hold when R1 is a maximum-length LFSR. Note, however, that this has not yet been proven.

6.60 Note (security of the alternating step generator) The LFSRs R1, R2, R3 should be chosen to be maximum-length LFSRs whose lengths L1, L2, L3 are pairwise relatively prime: gcd(L1, L2) = 1, gcd(L2, L3) = 1, gcd(L1, L3) = 1. Moreover, the lengths should be about the same. If L1 ≈ l, L2 ≈ l, and L3 ≈ l, the best known attack on the alternating step generator is a divide-and-conquer attack on the control register R1 which takes approximately 2^l steps. Thus, if l ≈ 128, the generator is secure against all presently known attacks.

(ii) The shrinking generator

The shrinking generator is a relatively new keystream generator, having been proposed in 1993. Nevertheless, due to its simplicity and provable properties, it is a promising candidate for high-speed encryption applications. In the shrinking generator, a control LFSR R1 is used to select a portion of the output sequence of a second LFSR R2. The keystream produced is, therefore, a shrunken version (also known as an irregularly decimated subsequence) of the output sequence of R2, as specified in Algorithm 6.61 and depicted in Figure 6.13.

6.61 Algorithm Shrinking generator

SUMMARY: a control LFSR R1 is used to control the output of a second LFSR R2.
The following steps are repeated until a keystream of desired length is produced.
1. Registers R1 and R2 are clocked.
2. If the output of R1 is 1, the output bit of R2 forms part of the keystream.
3. If the output of R1 is 0, the output bit of R2 is discarded.

More formally, let the output sequences of LFSRs R1 and R2 be a0, a1, a2, . . . and b0, b1, b2, . . . , respectively. Then the keystream produced by the shrinking generator is x0, x1, x2, . . . , where xj = b_{ij} and, for each j ≥ 0, ij is the position of the j-th 1 in the sequence a0, a1, a2, . . . .

Figure 6.13: The shrinking generator. If ai = 1, bi is output; if ai = 0, bi is discarded.
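Since the generator simply filters R2's output through R1's, a sketch is only a few lines; as before, the LFSR model and the toy states are illustrative.

```python
def shrinking_keystream(r1, r2, nbits):
    """Shrinking generator: clock R1 and R2 together; keep R2's output
    bit only when R1's output bit is 1 (irregular decimation)."""
    keystream = []
    while len(keystream) < nbits:
        a, r1[0] = lfsr_step(r1[0], r1[1])
        b, r2[0] = lfsr_step(r2[0], r2[1])
        if a == 1:
            keystream.append(b)   # a_i = 1: output b_i
        # a_i = 0: discard b_i
    return keystream

ks = shrinking_keystream([[1, 0, 0], [0, 2]],
                         [[0, 0, 1, 0, 1], [2, 4]], 16)
```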

6.62 Example (shrinking generator with artificially small parameters) Consider a shrinking generator with component LFSRs R1 = ⟨3, 1 + D + D^3⟩ and R2 = ⟨5, 1 + D^3 + D^5⟩. Suppose that the initial states of R1 and R2 are [1, 0, 0] and [0, 0, 1, 0, 1], respectively. The output sequence of R1 is the 7-periodic sequence with cycle
a^7 = 0, 0, 1, 1, 1, 0, 1,
while the output sequence of R2 is the 31-periodic sequence with cycle
b^31 = 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0.
The keystream generated is
x = 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, . . . .

Fact 6.63 establishes that the output sequence of a shrinking generator satisfies the basic requirements of high period, high linear complexity, and good statistical properties.

6.63 Fact (properties of the shrinking generator) Let R1 and R2 be maximum-length LFSRs of lengths L1 and L2, respectively, and let x be an output sequence of the shrinking generator formed by R1 and R2.
(i) If gcd(L1, L2) = 1, then x has period (2^{L2} − 1) · 2^{L1−1}.
(ii) The linear complexity L(x) of x satisfies L2 · 2^{L1−2} < L(x) ≤ L2 · 2^{L1−1}.
(iii) Suppose that the connection polynomials for R1 and R2 are chosen uniformly at random from the set of all primitive polynomials of degrees L1 and L2 over Z2. Then the distribution of patterns in x is almost uniform. More precisely, if P is any binary string of length t bits and x(t) denotes any t consecutive bits in x, then the probability that x(t) = P is (1/2)^t + O(t/2^{L2}).

6.64 Note (security of the shrinking generator) Suppose that the component LFSRs R1 and R2 of the shrinking generator have lengths L1 and L2, respectively. If the connection polynomials for R1 and R2 are known (but not the initial contents of R1 and R2), the best attack known for recovering the secret key takes O(2^{L1} · L2^3) steps. On the other hand, if secret (and variable) connection polynomials are used, the best attack known takes O(2^{2L1} · L1 · L2) steps. There is also an attack through the linear complexity of the shrinking generator which takes O(2^{L1} · L2^2) steps (regardless of whether the connections are known or secret), but this attack requires 2^{L1} · L2 consecutive bits from the output sequence and is, therefore, infeasible for moderately large L1 and L2. For maximum security, R1 and R2 should be maximum-length LFSRs, and their lengths should satisfy gcd(L1, L2) = 1. Moreover, secret connections should be used. Subject to these constraints, if L1 ≈ l and L2 ≈ l, the shrinking generator has a security level approximately equal to 2^{2l}. Thus, if L1 ≈ 64 and L2 ≈ 64, the generator appears to be secure against all presently known attacks.

6.4 Other stream ciphers

While the LFSR-based stream ciphers discussed in §6.3 are well-suited to hardware implementation, they are not especially amenable to software implementation. This has led to several recent proposals for stream ciphers designed particularly for fast software implementation. Most of these proposals are either proprietary, or are relatively new and have not received sufficient scrutiny from the cryptographic community; for this reason, they are not presented in this section, and instead only mentioned in the chapter notes on page 222.

Two promising stream ciphers specifically designed for fast software implementation are SEAL and RC4. SEAL is presented in §6.4.1. RC4 is used in commercial products, and has a variable key-size, but it remains proprietary and is not presented here. Two other widely used stream ciphers not based on LFSRs are the Output Feedback (OFB; see §7.2.2(iv)) and Cipher Feedback (CFB; see §7.2.2(iii)) modes of block ciphers. Another class of keystream generators not based on LFSRs are those whose security relies on the intractability of an underlying number-theoretic problem; these generators are much slower than those based on LFSRs and are discussed in §5.5.

6.4.1 SEAL

SEAL (Software-optimized Encryption Algorithm) is a binary additive stream cipher (see Definition 6.4) that was proposed in 1993. Since it is relatively new, it has not yet received much scrutiny from the cryptographic community. However, it is presented here because it is one of the few stream ciphers that was specifically designed for efficient software implementation and, in particular, for 32-bit processors.

SEAL is a length-increasing pseudorandom function which maps a 32-bit sequence number n to an L-bit keystream under control of a 160-bit secret key a. In the preprocessing stage (step 1 of Algorithm 6.68), the key is stretched into larger tables using the table-generation function Ga specified in Algorithm 6.67; this function is based on the Secure Hash Algorithm SHA-1 (Algorithm 9.53). Subsequent to this preprocessing, keystream generation requires about 5 machine instructions per byte, and is an order of magnitude faster than DES (Algorithm 7.82).

The following notation is used in SEAL for 32-bit quantities A, B, C, D, Xi, and Yj:
• ¬A: bitwise complement of A.
• A ∧ B, A ∨ B, A ⊕ B: bitwise AND, inclusive-OR, exclusive-OR.
• A <<< s: 32-bit result of rotating A left through s positions.
• A >>> s: 32-bit result of rotating A right through s positions.
• A + B: mod 2^32 sum of the unsigned integers A and B.
• f(B, C, D) = (B ∧ C) ∨ (¬B ∧ D); g(B, C, D) = (B ∧ C) ∨ (B ∧ D) ∨ (C ∧ D); h(B, C, D) = B ⊕ C ⊕ D.
• A ∥ B: concatenation of A and B.
• (X1, . . . , Xj) ← (Y1, . . . , Yj): simultaneous assignments (Xi ← Yi), where (Y1, . . . , Yj) is evaluated prior to any assignments.

6.65 Note (SEAL 1.0 vs SEAL 2.0) The table-generation function (Algorithm 6.67) for the first version of SEAL (SEAL 1.0) was based on the Secure Hash Algorithm (SHA). SEAL 2.0 differs from SEAL 1.0 in that the table-generation function for the former is based on the modified Secure Hash Algorithm SHA-1 (Algorithm 9.53).

6.66 Note (tables) The table generation (step 1 of Algorithm 6.68) uses the compression function of SHA-1 to expand the secret key a into larger tables T, S, and R. These tables can be precomputed, but only after the secret key a has been established. Tables T and S are 2K bytes and 1K byte in size, respectively. The size of table R depends on the desired bitlength L of the keystream: each 1K byte of keystream requires 16 bytes of R.
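Before the algorithms themselves, here is a minimal Python sketch of these 32-bit primitives; masking with 0xffffffff stands in for the unsigned 32-bit arithmetic that the notation assumes.

```python
MASK = 0xffffffff

def rotl(a, s):
    """A <<< s: rotate the 32-bit word a left through s positions."""
    return ((a << s) | (a >> (32 - s))) & MASK

def rotr(a, s):
    """A >>> s: rotate the 32-bit word a right through s positions."""
    return ((a >> s) | (a << (32 - s))) & MASK

def f(b, c, d):
    """f(B, C, D) = (B AND C) OR ((NOT B) AND D)."""
    return (b & c) | (~b & d)

def g(b, c, d):
    """g(B, C, D) = majority of B, C, D."""
    return (b & c) | (b & d) | (c & d)

def h(b, c, d):
    """h(B, C, D) = B XOR C XOR D."""
    return b ^ c ^ d
```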

6.67 Algorithm Table-generation function for SEAL 2.0, Ga(i)

INPUT: a 160-bit string a and an integer i, 0 ≤ i < 2^32.
OUTPUT: a 160-bit string, denoted Ga(i).
1. Definition of constants. Define four 32-bit constants (in hex): y1 = 0x5a827999, y2 = 0x6ed9eba1, y3 = 0x8f1bbcdc, y4 = 0xca62c1d6.
2. Table-generation function.
(initialize 80 32-bit words X0, X1, . . . , X79)
Set X0 ← i. For j from 1 to 15 do: Xj ← 0x00000000.
For j from 16 to 79 do: Xj ← ((Xj−3 ⊕ Xj−8 ⊕ Xj−14 ⊕ Xj−16) <<< 1).
(initialize working variables)
Break up the 160-bit string a into five 32-bit words: a = H0 H1 H2 H3 H4.
(A, B, C, D, E) ← (H0, H1, H2, H3, H4).
(execute four rounds of 20 steps, then update; t is a temporary variable)
(Round 1) For j from 0 to 19 do the following:
t ← ((A <<< 5) + f(B, C, D) + E + Xj + y1),
(A, B, C, D, E) ← (t, A, B <<< 30, C, D).
(Round 2) For j from 20 to 39 do the following:
t ← ((A <<< 5) + h(B, C, D) + E + Xj + y2),
(A, B, C, D, E) ← (t, A, B <<< 30, C, D).
(Round 3) For j from 40 to 59 do the following:
t ← ((A <<< 5) + g(B, C, D) + E + Xj + y3),
(A, B, C, D, E) ← (t, A, B <<< 30, C, D).
(Round 4) For j from 60 to 79 do the following:
t ← ((A <<< 5) + h(B, C, D) + E + Xj + y4),
(A, B, C, D, E) ← (t, A, B <<< 30, C, D).
(update chaining values)
(H0, H1, H2, H3, H4) ← (H0 + A, H1 + B, H2 + C, H3 + D, H4 + E).
(completion) The value of Ga(i) is the 160-bit string H0 ∥ H1 ∥ H2 ∥ H3 ∥ H4.
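Algorithm 6.67 is essentially one SHA-1-style compression of a message block that is all zero except for the word i, with the key a used as the chaining value. The following sketch (reusing rotl and the f, g, h helpers above) is a direct transcription under those assumptions, with a given as a list of five 32-bit words rather than a 160-bit string.

```python
def G(a, i):
    """Table-generation function Ga(i) of Algorithm 6.67: one keyed
    SHA-1-style compression; returns the five updated 32-bit words."""
    y = [0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6]
    rounds = [f, h, g, h]                  # round functions for j = 0-19, 20-39, 40-59, 60-79
    X = [i] + [0] * 15                     # X0 = i, X1..X15 = 0
    for j in range(16, 80):
        X.append(rotl(X[j - 3] ^ X[j - 8] ^ X[j - 14] ^ X[j - 16], 1))
    A, B, C, D, E = a                      # working variables from H0..H4
    for j in range(80):
        r = j // 20
        t = (rotl(A, 5) + rounds[r](B, C, D) + E + X[j] + y[r]) & MASK
        A, B, C, D, E = t, A, rotl(B, 30), C, D
    return [(hv + v) & MASK for hv, v in zip(a, [A, B, C, D, E])]

# Step 1.1 of Algorithm 6.68 then reads, with Fa(i) = G(a, i // 5)[i % 5]:
# T = [G(key, i // 5)[i % 5] for i in range(512)]
```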

6.68 Algorithm Keystream generator for SEAL 2.0, SEAL(a, n)

INPUT: a 160-bit string a (the secret key), a (non-secret) integer n, 0 ≤ n < 2^32 (the sequence number), and the desired bitlength L of the keystream.
OUTPUT: keystream y of bitlength L′, where L′ is the least multiple of 128 which is ≥ L.
1. Table generation. Generate the tables T, S, and R, whose entries are 32-bit words. The function F used below is defined by Fa(i) = H^i_{i mod 5}, where H^i_0 H^i_1 H^i_2 H^i_3 H^i_4 = Ga(⌊i/5⌋), and where the function Ga is defined in Algorithm 6.67.
1.1 For i from 0 to 511 do the following: T[i] ← Fa(i).
1.2 For j from 0 to 255 do the following: S[j] ← Fa(0x00001000 + j).
1.3 For k from 0 to 4 · ⌈(L − 1)/8192⌉ − 1 do: R[k] ← Fa(0x00002000 + k).
2. Initialization procedure. The following is a description of the subroutine Initialize(n, l, A, B, C, D, n1, n2, n3, n4) which takes as input a 32-bit word n and an integer l, and outputs eight 32-bit words A, B, C, D, n1, n2, n3, and n4. This subroutine is used in step 4.
A ← n ⊕ R[4l], B ← (n >>> 8) ⊕ R[4l + 1], C ← (n >>> 16) ⊕ R[4l + 2], D ← (n >>> 24) ⊕ R[4l + 3].
For j from 1 to 2 do the following:
P ← A ∧ 0x000007fc, B ← B + T[P/4], A ← (A >>> 9),
P ← B ∧ 0x000007fc, C ← C + T[P/4], B ← (B >>> 9),
P ← C ∧ 0x000007fc, D ← D + T[P/4], C ← (C >>> 9),
P ← D ∧ 0x000007fc, A ← A + T[P/4], D ← (D >>> 9).
(n1, n2, n3, n4) ← (D, B, A, C).
P ← A ∧ 0x000007fc, B ← B + T[P/4], A ← (A >>> 9).
P ← B ∧ 0x000007fc, C ← C + T[P/4], B ← (B >>> 9).
P ← C ∧ 0x000007fc, D ← D + T[P/4], C ← (C >>> 9).
P ← D ∧ 0x000007fc, A ← A + T[P/4], D ← (D >>> 9).
3. Initialize y to be the empty string, and l ← 0.
4. Repeat the following:
4.1 Execute the procedure Initialize(n, l, A, B, C, D, n1, n2, n3, n4).
4.2 For i from 1 to 64 do the following:
P ← A ∧ 0x000007fc, B ← B + T[P/4], A ← (A >>> 9), B ← B ⊕ A,
Q ← B ∧ 0x000007fc, C ← C ⊕ T[Q/4], B ← (B >>> 9), C ← C + B,
P ← (P + C) ∧ 0x000007fc, D ← D + T[P/4], C ← (C >>> 9), D ← D ⊕ C,
Q ← (Q + D) ∧ 0x000007fc, A ← A ⊕ T[Q/4], D ← (D >>> 9), A ← A + D,
P ← (P + A) ∧ 0x000007fc, B ← B ⊕ T[P/4], A ← (A >>> 9),
Q ← (Q + B) ∧ 0x000007fc, C ← C + T[Q/4], B ← (B >>> 9),
P ← (P + C) ∧ 0x000007fc, D ← D ⊕ T[P/4], C ← (C >>> 9),
Q ← (Q + D) ∧ 0x000007fc, A ← A + T[Q/4], D ← (D >>> 9),
y ← y ∥ (B + S[4i − 4]) ∥ (C ⊕ S[4i − 3]) ∥ (D + S[4i − 2]) ∥ (A ⊕ S[4i − 1]).
If y is ≥ L bits in length then return(y) and stop.
If i is odd, set (A, C) ← (A + n1, C + n2). Otherwise, (A, C) ← (A + n3, C + n4).
4.3 Set l ← l + 1.

6.69 Note (choice of parameter L) In most applications of SEAL 2.0 it is expected that L ≤ 2^19; larger values of L are permissible, but come at the expense of a larger table R. A preferred method for generating a longer keystream without requiring a larger table R is to compute the concatenation of the keystreams SEAL(a, 0), SEAL(a, 1), SEAL(a, 2), . . . . Since the sequence number is n < 2^32, a keystream of length up to 2^51 bits can be obtained in this manner with L = 2^19.

6.70 Example (test vectors for SEAL 2.0) Suppose the key a is the 160-bit (hexadecimal) string
67452301 efcdab89 98badcfe 10325476 c3d2e1f0,
n = 0x013577af, and L = 32768 bits. Table R consists of words R[0], R[1], . . . , R[15]:
5021758d ce577c11 fa5bd5dd 366d1b93 182cff72 ac06d7c6 2683ead8 fabe3573
82a10c96 48c483bd ca92285c 71fe84c0 bd76b700 6fdcc20c 8dada151 4506dd64
The table T consists of words T[0], T[1], . . . , T[511]:
92b404e5 56588ced 6c1acd4e bf053f68 09f73a93 cd5f176a
b863f14e 2b014a2f 4407e646 38665610 222d2f91 4d941a21
. . .
3af3a4bf 021e4080 2a677d95 405c7db0 338e4b1e 19ccf158
The table S consists of words S[0], S[1], . . . , S[255]:
907c1e3d ce71ef0a 48f559ef 2b7ab8bc 4557f4b8 033e9b05
4fde0efa 1a845f94 38512c3b d4b44591 53765dce 469efa02
. . .
bd7dea87 fd036d87 53aa3013 ec60e282 1eaef8f9 0b5a0949
The output y of Algorithm 6.68 consists of 1024 words y[0], y[1], . . . , y[1023]:
37a00595 9b84c49c a4be1e05 0673530f 0ac8389d c5878ec8
da6666d0 6da71328 1419bdf2 d258bebb b6a42a4d 8a311a72
. . .
547dfde9 668d50b5 ba9e2567 413403c5 43120b5a ecf9d062
The XOR of the 1024 words of y is 0x098045fc.

6.5 Notes and further references

§6.1
Although now dated, Rueppel [1075] provides a solid introduction to the analysis and design of stream ciphers. For an updated and more comprehensive survey, see Rueppel [1081]. Another recommended survey is that of Robshaw [1063].

The concept of unconditional security was introduced in the seminal paper by Shannon [1120]. Maurer [819] surveys the role of information theory in cryptography and, in particular, secrecy, authentication, and secret sharing schemes. Maurer [811] devised a randomized stream cipher that is unconditionally secure "with high probability". More precisely, an adversary is unable to obtain any information whatsoever about the plaintext with probability arbitrarily close to 1, unless the adversary can perform an infeasible computation. The cipher utilizes a publicly-accessible source of random bits whose length is much greater than that of all the plaintext to be encrypted, and can conceivably be made practical. Maurer's cipher is based on the impractical Rip van Winkle cipher of Massey and Ingermarsson [789], which is described by Rueppel [1081].

One technique for solving the re-synchronization problem with synchronous stream ciphers is to have the receiver send a resynchronization request to the sender, whereby a new internal state is computed as a (public) function of the original internal state (or key) and some public information (such as the time at the moment of the request). Daemen, Govaerts, and Vandewalle [291] showed that this approach can result in a total loss of security for some published stream cipher proposals. Proctor [1011] considered the trade-off between the security and error propagation problems that arise by varying the number of feedback ciphertext digits. Maurer [808] presented various design approaches for self-synchronizing stream ciphers that are potentially superior to designs based on block ciphers, both with respect to encryption speed and security.

§6.2
An excellent introduction to the theory of both linear and nonlinear shift registers is the book by Golomb [498]; see also Selmer [1107], Chapters 5 and 6 of Beker and Piper [84], and Chapter 8 of Lidl and Niederreiter [764]. A lucid treatment of m-sequences can be found in Chapter 10 of McEliece [830]. While the discussion in this chapter has been restricted to sequences and feedback shift registers over the binary field Z2, many of the results presented can be generalized to sequences and feedback shift registers over any finite field Fq.

appear in Rueppel [1077]. Dai and Yang [294] extended Fact 622 and obtained bounds for the expected linear complexity of an n-periodic sequence for each possible value of n. The bounds imply that the expected linear complexity of a random periodic sequence is close to the period of the sequence. The linear complexity profile of the sequence defined in Example 6.27 was established by Dai [293] For further theoretical analysis of the linear complexity profile, consult the work of Niederreiter [927, 928, 929, 930]. Facts 6.29 and 634 are due to Massey [784] The Berlekamp-Massey algorithm (Algorithm 630) is due to Massey [784], and is based on an earlier algorithm of Berlekamp [118] for decoding BCH codes. While the algorithm in §623 is only described for binary sequences, it can be generalized to find the linear complexity of sequences over any field Further discussion and refinements of the Berlekamp-Massey algorithm are given by Blahut [144]. There are numerous other algorithms for

computing the linear complexity of a sequence For example, Games and Chan [439] and Robshaw [1062] present efficient algorithms for determining the linear complexity of binary sequences of period 2n ; these algorithms have limited practical use since they require an entire cycle of the sequence Jansen and Boekee [632] defined the maximum order complexity of a sequence to be the length of the shortest (not necessarily linear) feedback shift register (FSR) that can generate the sequence. The expected maximum order complexity of a random binary sequence of length n is approximately 2 lg n. An efficient linear-time algorithm for computing this complexity measure was also presented; see also Jansen and Boekee [631]. Another complexity measure, the Ziv-Lempel complexity measure, was proposed by Ziv and Lempel [1273]. This measure quantifies the rate at which new patterns appear in a sequence Mund [912] used a heuristic argument to derive the expected Ziv-Lempel complexity of a random binary

For a detailed study of the relative strengths and weaknesses of the linear, maximum order, and Ziv-Lempel complexity measures, see Erdmann [372].

Kolmogorov [704] and Chaitin [236] introduced the notion of so-called Turing-Kolmogorov-Chaitin complexity, which measures the minimum size of the input to a fixed universal Turing machine which can generate a given sequence; see also Martin-Löf [783]. While this complexity measure is of theoretical interest, there is no algorithm known for computing it and, hence, it has no apparent practical significance. Beth and Dai [124] have shown that the Turing-Kolmogorov-Chaitin complexity is approximately twice the linear complexity for most sequences of sufficient length.

Fact 6.39 is due to Golomb and Welch, and appears in the book of Golomb [498, p.115]. Lai [725] showed that Fact 6.39 is only true for the binary case, and established necessary and sufficient conditions for an FSR over a general finite field to be nonsingular.

Klapper and Goresky [677] introduced a new type of feedback register called a feedback with carry shift register (FCSR), which is equipped with auxiliary memory for storing the (integer) carry. An FCSR is similar to an LFSR (see Figure 6.4), except that the contents of the tapped stages of the shift register are added as integers to the current content of the memory to form a sum S. The least significant bit of S (i.e., S mod 2) is then fed back into the first (leftmost) stage of the shift register, while the remaining higher order bits (i.e., ⌊S/2⌋) are retained as the new value of the memory. If the FCSR has L stages, then the space required for the auxiliary memory is at most lg L bits. FCSRs can be conveniently analyzed using the algebra over the 2-adic numbers, just as the algebra over finite fields is used to analyze LFSRs.
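The FCSR update rule is short enough to state in code. The following Python sketch uses the same illustrative conventions as the LFSR examples in §6.3 (state as a list of bits with the leftmost stage first, taps as index positions).

```python
def fcsr_step(state, taps, memory):
    """Feedback-with-carry shift register: the tapped stages are summed
    as integers together with the memory; the sum's low bit is fed back
    into the leftmost stage, and the higher-order bits become the new memory."""
    out = state[-1]                        # rightmost stage is output
    s = memory + sum(state[t] for t in taps)
    state = [s & 1] + state[:-1]           # S mod 2 enters the leftmost stage
    return out, state, s >> 1              # floor(S/2) is the new memory

# Toy run: 8 steps of a 4-stage FCSR with memory initially 0.
state, mem, bits = [1, 0, 1, 1], 0, []
for _ in range(8):
    b, state, mem = fcsr_step(state, [1, 3], mem)
    bits.append(b)
```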

Any periodic binary sequence can be generated by an FCSR. The 2-adic span of a periodic sequence is the number of stages and memory bits in the smallest FCSR that generates the sequence. Let s be a periodic sequence having a 2-adic span of T; note that T is no more than the period of s. Klapper and Goresky [678] presented an efficient algorithm for finding an FCSR of length T which generates s, given 2T + 2⌈lg T⌉ + 4 of the initial bits of s. A comprehensive treatment of FCSRs and the 2-adic span is given by Klapper and Goresky [676].

§6.3
Notes 6.46 and 6.47 on the selection of connection polynomials were essentially first pointed out by Meier and Staffelbach [834] and Chepyzhov and Smeets [256] in relation to fast correlation attacks on regularly clocked LFSRs. Similar observations were made by Coppersmith, Krawczyk, and Mansour [279] in connection with the shrinking generator. More generally, to withstand sophisticated correlation attacks (e.g., see Meier and Staffelbach [834]), the connection polynomials should not have low-weight polynomial multiples whose degrees are not sufficiently large.

Klapper [675] provides examples of binary sequences having high linear complexity, but whose linear complexity is low when considered as sequences (whose elements happen to be only 0 or 1) over a larger finite field. This demonstrates that high linear complexity (over Z2) by itself is inadequate for security. Fact 6.49 was proven by Rueppel and Staffelbach [1085].

The Geffe generator (Example 6.50) was proposed by Geffe [446]. The Pless generator (Arrangement D of [978]) was another early proposal for a nonlinear combination generator, and uses four J-K flip-flops to combine the output of eight LFSRs. This generator also succumbs to a divide-and-conquer attack, as was demonstrated by Rubin [1074].

The linear syndrome attack of Zeng, Yang, and Rao [1265] is a known-plaintext attack on keystream generators, and is based on earlier work of Zeng and Huang [1263].

It is effective when the known keystream B can be written in the form B = A ⊕ X, where A is the output sequence of an LFSR with known connection polynomial, and the sequence X is unknown but sparse in the sense that it contains more 0's than 1's. If the connection polynomials of the Geffe generator are all known to an adversary, and are primitive trinomials of degrees not exceeding n, then the initial states of the three component LFSRs (i.e., the secret key) can be efficiently recovered from a known keystream segment of length 37n bits.

The correlation attack (Note 6.51) on nonlinear combination generators was first developed by Siegenthaler [1133], and estimates were given for the length of the observed keystream required for the attack to succeed with high probability. The importance of correlation immunity to nonlinear combining functions was pointed out by Siegenthaler [1132], who showed the tradeoff between high correlation immunity and high nonlinear order (Fact 6.53). Meier and Staffelbach [834] presented two new so-called fast correlation attacks which are more efficient than Siegenthaler's attack in the case where the component LFSRs have sparse feedback polynomials, or if they have low-weight polynomial multiples (e.g., each having fewer than 10 non-zero terms) of not too large a degree. Further extensions and refinements of correlation attacks can be found in the papers of Mihaljević and Golić [874], Chepyzhov and Smeets [256], Golić and Mihaljević [491], Mihaljević and Golić [875], Mihaljević [873], Clark, Golić, and Dawson [262], and Penzhorn and Kühn [967]. A comprehensive survey of correlation attacks on LFSR-based stream ciphers is the paper by Golić [486]; the cases where the combining function is memoryless or with memory, as well as when the LFSRs are clocked regularly or irregularly, are all considered.

The summation generator (Example 6.54) was proposed by Rueppel [1075, 1076].

and further references 219 and Staffelbach [837] presented correlation attacks on combination generators having memory, cracked the summation generator having only two component LFSRs, and as a result recommended using several LFSRs of moderate lengths rather than just a few long LFSRs in the summation generator. As an example, if a summation generator employs two LFSRs each having length approximately 200, and if 50 000 keystream bits are known, then Meier and Staffelbach’s attack is expected to take less than 700 trials, where the dominant step in each trial involves solving a 400 × 400 system of binary linear equations. Dawson [312] presented another known-plaintext attack on summation generators having two component LFSRs, which requires fewer known keystream bits than Meier and Staffelbach’s attack. Dawson’s attack is only faster than that of Meier and Staffelbach in the case where both LFSRs are relatively short. Recently, Klapper and Goresky [678] showed that the

summation generator has comparatively low 2-adic span (see page 218) More precisely, if a and b are two sequences of 2-adic span λ2 (a) and λ2 (b), respectively, and if s is the result of combining them with the summation generator, then the 2-adic span of s is at most λ2 (a) + λ2 (b) + 2dlg(λ2 (a))e + 2dlg(λ2 (b))e + 6. For example, if m-sequences of period 2L − 1 for L = 7, 11, 13, 15, 16, 17 are combined with the summation generator, then the resulting sequence has linear complexity nearly 279 , but the 2-adic span is less than 218 . Hence, the summation generator is vulnerable to a known-plaintext attack when the component LFSRs are all relatively short. The probability distribution of the carry for addition of n random integers was analyzed by Staffelbach and Meier [1167]. It was proven that the carry is balanced for even n and biased for odd n. For n = 3 the carry is strongly biased, however, the bias converges to 0 as n tends to ∞. Golić [485] pointed out the

Golić [485] pointed out the importance of the correlation between linear functions of the output and input in general combiners with memory, and introduced the so-called linear sequential circuit approximation method for finding such functions that produce correlated sequences. Golić [488] used this as a basis for developing a linear cryptanalysis technique for stream ciphers, and in the same paper proposed a stream cipher called GOAL, incorporating principles of modified truncated linear congruential generators (see page 187), self-clock-control, and randomly generated combiners with memory.

Fact 6.55(i) is due to Key [670], while Fact 6.55(ii) was proven by Rueppel [1075]. Massey and Serconek [794] gave an alternate proof of Key's bound that is based on the Discrete Fourier Transform. Siegenthaler [1134] described a correlation attack on nonlinear filter generators. Forré [418] has applied fast correlation attacks to such generators. Anderson [29] demonstrated other correlations which may be useful in improving the success of correlation attacks. An attack called the inversion attack, proposed by Golić [490], may be more effective than Anderson's attack. Golić also provides a list of design criteria for nonlinear filter generators. Ding [349] introduced the notion of differential cryptanalysis for nonlinear filter generators where the LFSR is replaced by a simple counter having arbitrary period.

The linear consistency attack of Zeng, Yang, and Rao [1264] is a known-plaintext attack on keystream generators which can discover key redundancies in various generators. It is effective in situations where it is possible to single out a certain portion k1 of the secret key k, and form a linear system of equations Ax = b where the matrix A is determined by k1, and b is determined from the known keystream. The system of equations should have the property that it is consistent (and with high probability has a unique solution) if k1 is the true value of the subkey, while it is inconsistent with high probability otherwise.

In these circumstances, one can mount an exhaustive search for k1, and subsequently mount a separate attack for the remaining bits of k. If the bitlengths of k1 and k are l1 and l, respectively, the attack demonstrates that the security level of the generator is 2^{l1} + 2^{l−l1}, rather than 2^l.

The multiplexer generator was proposed by Jennings [637]. Two maximum-length LFSRs having lengths L1, L2 that are relatively prime are employed. Let h be a positive integer satisfying h ≤ min(L1, lg L2). After each clock cycle, the contents of a fixed subset of h stages of the first LFSR are selected, and converted to an integer t in the interval [0, L2 − 1] using a 1-1 mapping θ. Finally, the content of stage t of the second LFSR is output as part of the keystream. Assuming that the connection polynomials of the LFSRs are known, the linear consistency attack provides a known-plaintext attack on the multiplexer generator requiring a known keystream sequence of length N ≥ L1 + L2·2^h and 2^{L1+h} linear consistency tests. This demonstrates that the choice of the mapping θ and the second LFSR do not contribute significantly to the security of the generator.

The linear consistency attack has also been considered by Zeng, Yang, and Rao [1264] for the multispeed inner-product generator of Massey and Rueppel [793]. In this generator, two LFSRs of lengths L1 and L2 are clocked at different rates, and their contents combined at the lower clock rate by taking the inner-product of the min(L1, L2) stages of the two LFSRs. The paper by Zeng et al. [1266] is a readable survey describing the effectiveness of the linear consistency and linear syndrome attacks in cryptanalyzing stream ciphers.

The knapsack generator (Example 6.56) was proposed by Rueppel and Massey [1084] and extensively analyzed by Rueppel [1075]; however, no concrete suggestions on selecting appropriate parameters (the length L of the LFSR and the knapsack weights) for the generator were given. No weaknesses of the knapsack generator have been reported in the literature.

The idea of using the output of a register to control the stepping of another register was used in several rotor machines during the second world war, for example, the German Lorenz SZ40 cipher. A description of this cipher, and also an extensive survey of clock-controlled shift registers, is provided by Gollmann and Chambers [496].

The alternating step generator (Algorithm 6.57) was proposed in 1987 by Günther [528], who also proved Fact 6.59 and described the divide-and-conquer attack mentioned in Note 6.60. The alternating step generator is based on the stop-and-go generator of Beth and Piper [126]. In the stop-and-go generator, a control register R1 is used to control the stepping of another register R2 as follows. If the output of R1 is 1, then R2 is clocked; if the output of R1 is 0, then R2 is not clocked; however, its previous output is repeated. The output of R2 is then XORed with the output sequence of a third register R3 which is clocked at the same rate as R1.
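For comparison with the alternating step generator sketched in §6.3.3, here is the stop-and-go rule in the same illustrative Python conventions.

```python
def stop_and_go(r1, r2, r3, nbits):
    """Stop-and-go generator: R2 is clocked only when R1 outputs 1
    (otherwise its previous output repeats); the keystream is R2's
    output XORed with that of the regularly clocked R3."""
    out2 = 0
    keystream = []
    for _ in range(nbits):
        a, r1[0] = lfsr_step(r1[0], r1[1])
        if a == 1:
            out2, r2[0] = lfsr_step(r2[0], r2[1])
        out3, r3[0] = lfsr_step(r3[0], r3[1])   # R3 is clocked every cycle
        keystream.append(out2 ^ out3)
    return keystream
```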

Beth and Piper showed how a judicious choice of registers R1, R2, and R3 can guarantee that the output sequence has high linear complexity and period, and good statistical properties. Unfortunately, the generator succumbs to the linear syndrome attack of Zeng, Yang, and Rao [1265] (see also page 218): if the connection polynomials of R1 and R2 are primitive trinomials of degree not exceeding n, and known to the adversary, then the initial states of the three component LFSRs (i.e., the secret key) can be efficiently recovered from a known-plaintext segment of length 37n bits.

Another variant of the stop-and-go generator is the step-1/step-2 generator due to Gollmann and Chambers [496]. This generator uses two maximum-length registers R1 and R2 of the same length. Register R1 is used to control the stepping of R2 as follows. If the output of R1 is 0, then R2 is clocked once; if the output of R1 is 1, then R2 is clocked twice before producing the next output bit. Živković [1274] proposed an embedding correlation attack on R2 whose complexity is O(2^{L2}), where L2 is the length of R2.

A cyclic register of length L is an LFSR with feedback polynomial C(D) = 1 + D^L. Gollmann [494] proposed cascading n cyclic registers of the same prime length p by arranging them serially in such a way that all except the first register are clock-controlled by their predecessors; the Gollmann p-cycle cascade can be viewed as an extension of the stop-and-go generator (page 220). The first register is clocked regularly, and its output bit is the input bit to the second register. In general, if the input bit to the i-th register (for i ≥ 2) at time t is at, then the i-th register is clocked if at = 1; if at = 0, the register is not clocked but its previous output bit is repeated. The output bit of the i-th register is then XORed with at, and the result becomes the input bit to the (i+1)-st register. The output of the last register is the output of the p-cycle cascade. The initial (secret) state of a component cyclic register should not be the all-0's vector or the all-1's vector. Gollmann proved that the period of the output sequence is p^n. Moreover, if p is a prime such that 2 is a generator of Z*_p, then the output sequence has linear complexity p^n. This suggests very strongly using long cascades (i.e., n large) of shorter registers rather than short cascades of longer registers.

A variant of the Gollmann cascade, called an m-sequence cascade, has the cyclic registers replaced by maximum-length LFSRs of the same length L. Chambers [237] showed that the output sequence of such an m-sequence cascade has period (2^L − 1)^n and linear complexity at least L(2^L − 1)^{n−1}. Park, Lee, and Goh [964] extended earlier work of Menicocci [845] and reported breaking 9-stage m-sequence cascades where each LFSR has length 100; they also suggested that 10-stage m-sequence cascades may be insecure. Chambers and Gollmann [239] studied an attack on p-cycle and m-sequence cascades called lock-in, which results in a reduction in the effective key space of the cascades.

The shrinking generator (Algorithm 6.61) was proposed in 1993 by Coppersmith, Krawczyk, and Mansour [279], who also proved Fact 6.63 and described the attacks mentioned in Note 6.64. The irregular output rate of the shrinking generator can be overcome by using a short buffer for the output; the influence of such a buffer is analyzed by Kessler and Krawczyk [669]. Krawczyk [716] mentions some techniques for improving software implementations. A throughput of 2.5 Mbits/sec is reported for a C language implementation on a 33MHz IBM workstation, when the two shift registers each have lengths in the range 61–64 bits and secret connections are employed. The security of the shrinking generator is studied further by Golić [487].

A key generator related to the shrinking generator is the self-shrinking generator (SSG) of Meier and Staffelbach [838]. The self-shrinking generator uses only one maximum-length LFSR R. The output sequence of R is partitioned into pairs of bits. The SSG outputs a 0 if a pair is 10, and outputs a 1 if a pair is 11; 01 and 00 pairs are discarded.
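The self-shrinking rule is easily expressed in code; the sketch below again uses the illustrative lfsr_step model from §6.3.

```python
def self_shrinking(state, taps, nbits):
    """Self-shrinking generator: read the LFSR output in pairs;
    a pair 10 yields 0, a pair 11 yields 1, and 00/01 are discarded."""
    keystream = []
    while len(keystream) < nbits:
        b1, state = lfsr_step(state, taps)
        b2, state = lfsr_step(state, taps)
        if b1 == 1:
            keystream.append(b2)   # a pair 1x encodes the bit x
        # pairs 00 and 01 produce no output
    return keystream

ks = self_shrinking([1, 0, 0, 1, 1], [0, 4], 12)
```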

Meier and Staffelbach proved that the self-shrinking generator can be implemented as a shrinking generator. Moreover, the shrinking generator can be implemented as a self-shrinking generator (whose component LFSR is not maximum-length). More precisely, if the component LFSRs of a shrinking generator have connection polynomials C1(D) and C2(D), its output sequence can be produced by a self-shrinking generator with connection polynomial C(D) = C1(D)^2 · C2(D)^2. Meier and Staffelbach also proved that if the length of R is L, then the period and linear complexity of the output sequence of the SSG are at least 2^{⌊L/2⌋} and 2^{⌊L/2⌋−1}, respectively. Moreover, they provided strong evidence that this period and linear complexity are in fact about 2^{L−1}. Assuming a randomly chosen, but known, connection polynomial, the best attack presented by Meier and Staffelbach on the SSG takes 2^{0.79L} steps. More recently, Mihaljević [871] presented a significantly faster probabilistic attack on the SSG. For example, if L = 100, then the new attack takes 2^{57} steps and requires a portion of the output sequence of length 4.9 × 10^8. The attack does not have an impact on the security of the shrinking generator.

A recent survey of techniques for attacking clock-controlled generators is given by Gollmann [495]. For some newer attack techniques, see Mihaljević [872], Golić and O'Connor [492], and Golić [489].

Chambers [238] proposed a clock-controlled cascade composed of LFSRs each of length 32. Each 32-bit portion of the output sequence of a component LFSR is passed through an invertible scrambler box (S-box), and the resulting 32-bit sequence is used to control the clock of the next LFSR. Baum and Blackburn [77] generalized the notion of a clock-controlled shift register to that of a register based on a finite group.

§6.4
SEAL (Algorithm 6.68) was designed and patented by Coppersmith and Rogaway [281]. Rogaway and Coppersmith [1066] report an encryption speed of 7.2 Mbytes/sec for an assembly language implementation on a 50 MHz 486 processor with L = 4096 bits, assuming precomputed tables (cf. Note 6.66).

Although the stream cipher RC4 remains proprietary, alleged descriptions have been published which are output compatible with certified implementations of RC4; for example, see Schneier [1094].

Blöcher and Dichtl [156] proposed a fast software stream cipher called FISH (Fibonacci Shrinking generator), which is based on the shrinking generator principle applied to the lagged Fibonacci generator (also known as the additive generator) of Knuth [692, p.27]. Anderson [28] subsequently presented a known-plaintext attack on FISH which requires a few thousand 32-bit words of known plaintext and a work factor of about 2^{40} computations. Anderson also proposed a fast software stream cipher called PIKE based on the Fibonacci generator and the stream cipher A5; a description of A5 is given by Anderson [28].

Wolfram [1251, 1252] proposed a stream cipher based on one-dimensional cellular automata with nonlinear feedback. Meier and Staffelbach [835] presented a known-plaintext attack on this cipher which demonstrated that key lengths of 127 bits suggested by Wolfram [1252] are insecure; Meier and Staffelbach recommend key sizes of about 1000 bits.

Klapper and Goresky [679] presented constructions for FCSRs (see page 217) whose output sequences have nearly maximal period, are balanced, and are nearly de Bruijn sequences in the sense that for any fixed non-negative integer t, the number of occurrences of any two t-bit sequences as subsequences of a period differs by at most 2. Such FCSRs are good candidates for usage in the construction of secure stream ciphers, just as maximum-length LFSRs were used in §6.3. Goresky and Klapper [518] introduced a generalization of FCSRs called d-FCSRs, based on ramified extensions of the 2-adic numbers (d is the ramification).

Chapter 7
Block Ciphers

Contents in Brief
7.1 Introduction and overview . . . . . . . . . . . . . . . . . . 223
7.2 Background and general concepts . . . . . . . . . . . . . . 224
7.3 Classical ciphers and historical development . . . . . . . . 237
7.4 DES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.5 FEAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
7.6 IDEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
7.7 SAFER, RC5, and other block ciphers . . . . . . . . . . . 266
7.8 Notes and further references . . . . . . . . . . . . . . . . . 271

7.1 Introduction and overview

Symmetric-key block ciphers are the most prominent and important elements in many cryptographic systems. Individually, they provide confidentiality. As a fundamental building block, their versatility allows construction of pseudorandom number generators, stream ciphers, MACs, and hash functions. They may furthermore serve as a central component in message authentication techniques, data integrity mechanisms, entity authentication protocols, and (symmetric-key) digital signature schemes. This chapter examines symmetric-key block ciphers, including both general concepts and details of specific algorithms. Public-key block ciphers are discussed in Chapter 8.

No block cipher is ideally suited for all applications, even one offering a high level of security. This is a result of inevitable tradeoffs required in practical applications, including those arising from, for example, speed requirements and memory limitations (e.g., code size, data size, cache memory), constraints imposed by implementation platforms (e.g., hardware, software, chipcards), and differing tolerances of applications to properties of various modes of operation. In addition, efficiency must typically be traded off against security. Thus it is beneficial to have a number of candidate ciphers from which to draw.

Of the many block ciphers currently available, focus in this chapter is given to a subset of high profile and/or well-studied algorithms. While not guaranteed to be more secure than other published candidate ciphers (indeed, this status changes as new attacks become known), emphasis is given to those of greatest practical interest. Among these, DES is paramount; FEAL has received both serious commercial backing and a large amount of independent cryptographic analysis; and IDEA (originally proposed as a DES replacement) is widely known and highly regarded. Other recently proposed ciphers of both high promise and high profile (in part due to the reputation of their designers) are SAFER and RC5. Additional ciphers are presented in less detail.

Chapter outline

Basic background on block ciphers and algorithm-independent concepts are presented in §7.2, including modes of operation, multiple encryption, and exhaustive search techniques. Classical ciphers and cryptanalysis thereof are addressed in §7.3, including historical details on cipher machines. Modern block ciphers covered in chronological order are DES (§7.4), FEAL (§7.5), and IDEA (§7.6), followed by SAFER, RC5, and other ciphers in §7.7, collectively illustrating a wide range of modern block cipher design approaches. Further notes, including details on additional ciphers (e.g., Lucifer) and references for the chapter, may be found in §7.8.

7.2 Background and general concepts

Introductory material on block ciphers is followed by subsections addressing modes of operation, and discussion of exhaustive key search attacks and multiple encryption.

7.2.1 Introduction to block ciphers

Block ciphers can be either symmetric-key or public-key. The main focus of this chapter is symmetric-key block ciphers; public-key encryption is addressed in Chapter 8.

(i) Block cipher definitions

A block cipher is a function (see §1.3.1) which maps n-bit plaintext blocks to n-bit ciphertext blocks; n is called the blocklength. It may be viewed as a simple substitution cipher with large character size. The function is parameterized by a k-bit key K, taking values from a subset K (the key space) of the set of all k-bit vectors Vk. (This use of the symbols k and K may differ from other chapters.) It is generally assumed that the key is chosen at random. Use of plaintext and ciphertext blocks of equal size avoids data expansion. To allow unique decryption, the encryption function must be one-to-one (i.e., invertible). For n-bit plaintext and ciphertext blocks and a fixed key, the encryption function is a bijection, defining a permutation on n-bit vectors. Each key potentially defines a different bijection. The number of keys is |K|, and the effective key size is lg |K|; this equals the key length if all k-bit vectors are valid keys (K = Vk). If keys are equiprobable and each defines a different bijection, the entropy of the key space is also lg |K|.

7.1 Definition An n-bit block cipher is a function E : Vn × K → Vn, such that for each key K ∈ K, E(P, K) is an invertible mapping (the encryption function for K) from Vn to Vn, written EK(P). The inverse mapping is the decryption function, denoted DK(C). C = EK(P) denotes that ciphertext C results from encrypting plaintext P under K.

Whereas block ciphers generally process plaintext in relatively large blocks (e.g., n ≥ 64), stream ciphers typically process smaller units (see Note 6.1); the distinction, however, is not definitive (see Remark 7.25). For plaintext messages exceeding one block in length, various modes of operation for block ciphers are used (see §7.2.2).

messages exceeding one block in length, various modes of operation for block ciphers are used (see §7.22) The most general block cipher implements every possible substitution, as per Definition 7.2 To represent the key of such an n-bit (true) random block cipher would require 1 This use of symbols k and K may differ from other chapters. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.2 Background and general concepts 225 lg(2n !) ≈ (n − 1.44)2n bits, or roughly 2n times the number of bits in a message block This excessive bitsize makes (true) random ciphers impractical. Nonetheless, it is an accepted design principle that the encryption function corresponding to a randomly selected key should appear to be a randomly chosen invertible function. 7.2 Definition A (true) random cipher is an n-bit block cipher implementing all 2n ! bijections on 2n elements. Each of the 2n ! keys specifies one such permutation A block cipher whose block size n is too

A block cipher whose block size n is too small may be vulnerable to attacks based on statistical analysis. One such attack involves simple frequency analysis of ciphertext blocks (see Note 7.74). This may be thwarted by appropriate use of modes of operation (e.g., Algorithm 7.13). Other such attacks are considered in Note 7.8. However, choosing too large a value for the blocksize n may create difficulties, as the complexity of implementation of many ciphers grows rapidly with block size. In practice, consequently, for larger n, easily-implementable functions are necessary which appear to be random (without knowledge of the key).

An encryption function per Definition 7.1 is a deterministic mapping. Each pairing of plaintext block P and key K maps to a unique ciphertext block. In contrast, in a randomized encryption technique (Definition 7.3; see also Remark 8.22), each (P, K) pair is associated with a set C(P,K) of eligible ciphertext blocks; each time P is encrypted under K, an output R from a random source non-deterministically selects one of these eligible blocks. To ensure invertibility, for every fixed key K, the subsets C(P,K) over all plaintexts P must be disjoint. Since the encryption function is essentially one-to-many involving an additional parameter R (cf. homophonic substitution, §7.3.2), the requirement for invertibility implies data expansion, which is a disadvantage of randomized encryption and is often unacceptable.

7.3 Definition A randomized encryption mapping is a function E from a plaintext space Vn to a ciphertext space Vm, m > n, drawing elements from a space of random numbers R = Vt. E is defined by E : Vn × K × R → Vm, such that for each key K ∈ K and R ∈ R, E(P, K, R), also written E^R_K(P), maps P ∈ Vn to Vm; and an inverse (corresponding decryption) function exists, mapping Vm × K → Vn.

(ii) Practical security and complexity of attacks

The objective of a block cipher is to provide confidentiality. The corresponding objective of an adversary is to recover plaintext from ciphertext.

A block cipher is totally broken if a key can be found, and partially broken if an adversary is able to recover part of the plaintext (but not the key) from ciphertext.

7.4 Note (standard assumptions) To evaluate block cipher security, it is customary to always assume that an adversary (i) has access to all data transmitted over the ciphertext channel; and (ii) (Kerckhoffs' assumption) knows all details of the encryption function except the secret key (upon which security consequently rests entirely).

Under the assumptions of Note 7.4, attacks are classified based on what information a cryptanalyst has access to in addition to intercepted ciphertext (cf. §1.13.1). The most prominent classes of attack for symmetric-key ciphers are (for a fixed key):
1. ciphertext-only – no additional information is available.
2. known-plaintext – plaintext-ciphertext pairs are available.
3. chosen-plaintext – ciphertexts are available corresponding to plaintexts of the adversary's choice. A variation is an adaptive chosen-plaintext attack, where the choice of plaintexts may depend on previous plaintext-ciphertext pairs.
Additional classes of attacks are given in Note 7.6; while somewhat more hypothetical, these are nonetheless of interest for the purposes of analysis and comparison of ciphers.

7.5 Remark (chosen-plaintext principle) It is customary to use ciphers resistant to chosen-plaintext attack even when mounting such an attack is not feasible. A cipher secure against chosen-plaintext attack is secure against known-plaintext and ciphertext-only attacks.

7.6 Note (chosen-ciphertext and related-key attacks) A chosen-ciphertext attack operates under the following model: an adversary is allowed access to plaintext-ciphertext pairs for some number of ciphertexts of his choice, and thereafter attempts to use this information to recover the key (or plaintext corresponding to some new ciphertext).

In a related-key attack, an adversary is assumed to have access to the encryption of plaintexts under both an unknown key and (unknown) keys chosen to have or known to have certain relationships with this key.

With few exceptions (e.g., the one-time pad), the best available measure of security for practical ciphers is the complexity of the best (currently) known attack. Various aspects of such complexity may be distinguished as follows:
1. data complexity – expected number of input data units required (e.g., ciphertext).
2. storage complexity – expected number of storage units required.
3. processing complexity – expected number of operations required to process input data and/or fill storage with data (at least one time unit per storage unit).
The attack complexity is the dominant of these (e.g., for linear cryptanalysis on DES, essentially the data complexity). When parallelization is possible, processing complexity may be divided across many processors (but not reduced), reducing attack time.

Given a data complexity of 2^n, an attack is always possible; this many different n-bit blocks completely characterize the encryption function for a fixed k-bit key. Similarly, given a processing complexity of 2^k, an attack is possible by exhaustive key search (§7.2.3). Thus as a minimum, the effective key size should be sufficiently large to preclude exhaustive key search, and the block size sufficiently large to preclude exhaustive data analysis. A block cipher is considered computationally secure if these conditions hold and no known attack has both data and processing complexity significantly less than, respectively, 2^n and 2^k. However, see Note 7.8 for additional concerns related to block size.

7.7 Remark (passive vs active complexity) For symmetric-key block ciphers, data complexity is beyond the control of the adversary, and is passive complexity (plaintext-ciphertext pairs cannot be generated by the adversary itself). Processing complexity is active complexity which typically benefits from increased resources (e.g., parallelization).

complexity is active complexity, which typically benefits from increased resources (e.g., parallelization).

7.8 Note (attacks based on small block size) Security concerns which arise if the block size n is too small include the feasibility of text dictionary attacks and matching ciphertext attacks. A text dictionary may be assembled if plaintext-ciphertext pairs become known for a fixed key. The more pairs available, the larger the dictionary and the greater the chance of locating a random ciphertext block therein. A complete dictionary results if 2^n plaintext-ciphertext pairs become known, and fewer suffice if plaintexts contain redundancy and a non-chaining mode of encryption (such as ECB) is used. Moreover, if about 2^{n/2} such pairs are known, and about 2^{n/2} ciphertexts are subsequently created, then by the birthday paradox one expects to locate a ciphertext in the

dictionary. Relatedly, from ciphertext blocks alone, as the number of available blocks approaches 2^{n/2}, one expects to find matching ciphertext blocks. These may reveal partial information about the corresponding plaintexts, depending on the mode of operation of the block cipher and the amount of redundancy in the plaintext.

Computational and unconditional security are discussed in §1.13.3. Unconditional security is both unnecessary in many applications and impractical; for example, it requires as many bits of secret key as plaintext, and cannot be provided by a block cipher used to encrypt more than one block (due to Fact 7.9, since identical ciphertext implies matching plaintext). Nonetheless, results on unconditional security provide insight for the design of practical ciphers, and have motivated many of the principles of cryptographic practice currently in use (see Remark 7.10).

7.9 Fact A cipher provides perfect secrecy (unconditional security) if the ciphertext and plaintext

blocks are statistically independent.

7.10 Remark (theoretically-motivated principles) The unconditional security of the one-time pad motivates both additive stream ciphers (Chapter 6) and the frequent changing of cryptographic keys (§13.3.1). Theoretical results regarding the effect of redundancy on unicity distance (Fact 7.71) motivate the principle that for plaintext confidentiality, the plaintext data should be as random as possible, e.g., via data-compression prior to encryption, use of random-bit fields in message blocks, or randomized encryption (Definition 7.3). The latter two techniques may, however, increase the data length or allow covert channels.

(iii) Criteria for evaluating block ciphers and modes of operation
Many criteria may be used for evaluating block ciphers in practice, including:
1. estimated security level. Confidence in the (historical) security of a cipher grows if it has been subjected to and withstood expert cryptanalysis over a substantial time period, e.g.,

several years or more; such ciphers are certainly considered more secure than those which have not. This may include the performance of selected cipher components relative to various design criteria which have been proposed or gained favor in recent years. The amount of ciphertext required to mount practical attacks often vastly exceeds a cipher's unicity distance (Definition 7.69), which provides a theoretical estimate of the amount of ciphertext required to recover the unique encryption key.
2. key size. The effective bitlength of the key, or more specifically, the entropy of the key space, defines an upper bound on the security of a cipher (by considering exhaustive search). Longer keys typically impose additional costs (e.g., generation, transmission, storage, difficulty of remembering passwords).
3. throughput. Throughput is related to the complexity of the cryptographic mapping (see below), and the degree to which the mapping is tailored to a particular implementation medium or

platform.
4. block size. Block size impacts both security (larger is desirable) and complexity (larger is more costly to implement). Block size may also affect performance, for example, if padding is required.
5. complexity of cryptographic mapping. Algorithmic complexity affects the implementation costs both in terms of development and fixed resources (hardware gate count or software code/data size), as well as real-time performance for fixed resources (throughput). Some ciphers specifically favor hardware or software implementations.
6. data expansion. It is generally desirable, and often mandatory, that encryption does not increase the size of plaintext data. Homophonic substitution and randomized encryption techniques result in data expansion.
7. error propagation. Decryption of ciphertext containing bit errors may result in various effects on the recovered plaintext, including

propagation of errors to subsequent plaintext blocks. Different error characteristics are acceptable in various applications. Block size (above) typically affects error propagation.

7.2.2 Modes of operation

A block cipher encrypts plaintext in fixed-size n-bit blocks (often n = 64). For messages exceeding n bits, the simplest approach is to partition the message into n-bit blocks and encrypt each separately. This electronic-codebook (ECB) mode has disadvantages in most applications, motivating other methods of employing block ciphers (modes of operation) on larger messages. The four most common modes are ECB, CBC, CFB, and OFB. These are summarized in Figure 7.1 and discussed below.

In what follows, E_K denotes the encryption function of the block cipher E parameterized by key K, while E_K^{-1} denotes decryption (cf. Definition 7.1). A plaintext message x = x_1 . . . x_t is assumed to consist of n-bit blocks for ECB and CBC modes (see Algorithm 9.58 regarding padding), and r-bit blocks for CFB and

OFB modes for appropriate fixed r ≤ n.

(i) ECB mode
The electronic codebook (ECB) mode of operation is given in Algorithm 7.11 and illustrated in Figure 7.1(a).

7.11 Algorithm ECB mode of operation
INPUT: k-bit key K; n-bit plaintext blocks x_1, . . . , x_t.
SUMMARY: produce ciphertext blocks c_1, . . . , c_t; decrypt to recover plaintext.
1. Encryption: for 1 ≤ j ≤ t, c_j ← E_K(x_j).
2. Decryption: for 1 ≤ j ≤ t, x_j ← E_K^{-1}(c_j).

Properties of the ECB mode of operation:
1. Identical plaintext blocks (under the same key) result in identical ciphertext blocks.
2. Chaining dependencies: blocks are enciphered independently of other blocks. Re-ordering ciphertext blocks results in correspondingly re-ordered plaintext blocks.
3. Error propagation: one or more bit errors in a single ciphertext block affect decipherment of that block only. For typical ciphers E, decryption of such a block is then random (with about 50% of the recovered plaintext bits in error). Regarding bits being deleted, see Remark 7.15.
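The following minimal sketch implements Algorithm 7.11 around a toy 4-round Feistel network standing in for E_K; the Feistel structure guarantees invertibility, but the construction is illustrative only and in no way secure, and all names are hypothetical:

```python
import hashlib

def _round(key: bytes, r: int, half: bytes) -> bytes:
    # Toy keyed round function (NOT secure): truncated SHA-256.
    return hashlib.sha256(key + bytes([r]) + half).digest()[:len(half)]

def toy_encrypt_block(key: bytes, x: bytes) -> bytes:
    # 4-round Feistel network on an 8-byte block; invertible by construction.
    L, R = x[:4], x[4:]
    for r in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round(key, r, R)))
    return L + R

def toy_decrypt_block(key: bytes, c: bytes) -> bytes:
    # Run the Feistel rounds in reverse order to invert.
    L, R = c[:4], c[4:]
    for r in reversed(range(4)):
        L, R = bytes(a ^ b for a, b in zip(R, _round(key, r, L))), L
    return L + R

def ecb_encrypt(key: bytes, blocks: list[bytes]) -> list[bytes]:
    # Algorithm 7.11, step 1: each block is enciphered independently.
    return [toy_encrypt_block(key, x) for x in blocks]

def ecb_decrypt(key: bytes, blocks: list[bytes]) -> list[bytes]:
    # Algorithm 7.11, step 2.
    return [toy_decrypt_block(key, c) for c in blocks]

if __name__ == "__main__":
    key = b"demo-key"
    msg = [b"ATTACK!!", b"ATTACK!!", b"RETREAT!"]   # two identical blocks
    ct = ecb_encrypt(key, msg)
    assert ecb_decrypt(key, ct) == msg
    # Property 1: identical plaintext blocks give identical ciphertext blocks.
    assert ct[0] == ct[1] and ct[0] != ct[2]
```

The final assertion makes the pattern-leakage of Remark 7.12 (below) concrete: repeated plaintext blocks are visible in the ciphertext.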

7.12 Remark (use of ECB mode) Since ciphertext blocks are independent, malicious substitution of ECB blocks (e.g., insertion of a frequently occurring block) does not affect the decryption of adjacent blocks. Furthermore, block ciphers do not hide data patterns – identical ciphertext blocks imply identical plaintext blocks. For this reason, the ECB mode is not recommended for messages longer than one block, or if keys are re-used for more than

a single one-block message. Security may be improved somewhat by inclusion of random padding bits in each block.

[Figure 7.1: Common modes of operation for an n-bit block cipher – (a) electronic codebook (ECB); (b) cipher-block chaining (CBC); (c) cipher feedback (CFB), r-bit characters/r-bit feedback; (d) output feedback (OFB), r-bit characters/n-bit feedback.]

(ii) CBC mode
The cipher-block chaining (CBC) mode of operation, specified in Algorithm 7.13 and illustrated in Figure 7.1(b), involves use of an n-bit initialization vector, denoted IV.

7.13 Algorithm CBC mode of operation
INPUT: k-bit key K; n-bit IV; n-bit plaintext blocks x_1, . . . , x_t.
SUMMARY: produce ciphertext blocks c_1, . . . , c_t; decrypt to recover plaintext.
1. Encryption: c_0 ← IV. For 1 ≤ j ≤ t, c_j ← E_K(c_{j-1} ⊕ x_j).

2. Decryption: c_0 ← IV. For 1 ≤ j ≤ t, x_j ← c_{j-1} ⊕ E_K^{-1}(c_j).

Properties of the CBC mode of operation:
1. Identical plaintexts: identical ciphertext blocks result when the same plaintext is enciphered under the same key and IV. Changing the IV, key, or first plaintext block (e.g., using a counter or random field) results in different ciphertext.
2. Chaining dependencies: the chaining mechanism causes ciphertext c_j to depend on x_j and all preceding plaintext blocks (the entire dependency on preceding blocks is, however, contained in the value of the previous ciphertext block). Consequently, re-arranging the order of ciphertext blocks affects decryption. Proper decryption of a correct ciphertext block requires a correct preceding ciphertext block.
3. Error propagation: a single bit error in ciphertext block c_j affects decipherment of blocks c_j and c_{j+1} (since x_j depends on c_j and c_{j-1}). Block x'_j recovered from c_j is typically totally random (50% in error), while the recovered plaintext

x'_{j+1} has bit errors precisely where c_j did. Thus an adversary may cause predictable bit changes in x_{j+1} by altering corresponding bits of c_j. See also Remark 7.14.
4. Error recovery: the CBC mode is self-synchronizing or ciphertext autokey (see Remark 7.15) in the sense that if an error (including loss of one or more entire blocks) occurs in block c_j but not c_{j+1}, then c_{j+2} is correctly decrypted to x_{j+2}.

7.14 Remark (error propagation in encryption) Although CBC mode decryption recovers from errors in ciphertext blocks, modifications to a plaintext block x_j during encryption alter all subsequent ciphertext blocks. This impacts the usability of chaining modes for applications requiring random read/write access to encrypted data. The ECB mode is an alternative (but see Remark 7.12).

7.15 Remark (self-synchronizing vs. framing errors) Although self-synchronizing in the sense of recovery from bit errors, recovery from "lost" bits causing errors in block boundaries (framing integrity errors) is not possible in the CBC or other modes.
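A sketch of Algorithm 7.13 wired around the same toy Feistel cipher as the ECB sketch above (all names hypothetical; the tamper check at the end illustrates the error-propagation property):

```python
import hashlib

def _round(key, r, half):
    return hashlib.sha256(key + bytes([r]) + half).digest()[:len(half)]

def E(key, x):   # toy 4-round Feistel block cipher (illustration only)
    L, R = x[:4], x[4:]
    for r in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round(key, r, R)))
    return L + R

def D(key, c):   # inverse of E
    L, R = c[:4], c[4:]
    for r in reversed(range(4)):
        L, R = bytes(a ^ b for a, b in zip(R, _round(key, r, L))), L
    return L + R

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, blocks):
    # Algorithm 7.13, step 1: c_0 = IV; c_j = E_K(c_{j-1} XOR x_j).
    c, out = iv, []
    for x in blocks:
        c = E(key, xor(c, x))
        out.append(c)
    return out

def cbc_decrypt(key, iv, blocks):
    # Algorithm 7.13, step 2: x_j = c_{j-1} XOR E_K^{-1}(c_j).
    prev, out = iv, []
    for c in blocks:
        out.append(xor(prev, D(key, c)))
        prev = c
    return out

if __name__ == "__main__":
    key, iv = b"demo-key", b"\x00" * 8
    msg = [b"ATTACK!!", b"ATTACK!!", b"RETREAT!"]
    ct = cbc_encrypt(key, iv, msg)
    assert cbc_decrypt(key, iv, ct) == msg
    assert ct[0] != ct[1]          # chaining hides repeated plaintext blocks
    # Property 3: flipping one bit of c_1 garbles x_1 and flips that bit of x_2.
    ct[0] = bytes([ct[0][0] ^ 0x01]) + ct[0][1:]
    pt = cbc_decrypt(key, iv, ct)
    assert pt[1] == xor(msg[1], b"\x01" + b"\x00" * 7) and pt[2] == msg[2]
```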

7.16 Remark (integrity of IV in CBC) While the IV in the CBC mode need not be secret, its integrity should be protected, since malicious modification thereof allows an adversary to make predictable bit changes to the first plaintext block recovered. Using a secret IV is one method for preventing this. However, if message integrity is required, an appropriate mechanism should be used (see §9.6.5); encryption mechanisms typically guarantee confidentiality only.

(iii) CFB mode
While the CBC mode processes plaintext n bits at a time (using an n-bit block cipher), some applications require that r-bit plaintext units be encrypted and transmitted without delay, for some fixed r < n (often r = 1 or r = 8). In this case, the cipher feedback (CFB) mode may be used, as specified in Algorithm 7.17 and illustrated in Figure 7.1(c).

7.17 Algorithm CFB mode of operation (CFB-r)
INPUT: k-bit key K; n-bit IV; r-bit plaintext blocks x_1, . . . , x_u (1 ≤ r ≤ n).
SUMMARY: produce r-bit ciphertext blocks c_1, . . . , c_u; decrypt to recover plaintext.
1. Encryption: I_1 ← IV. (I_j is the input value in a shift register.) For 1 ≤ j ≤ u:
(a) O_j ← E_K(I_j). (Compute the block cipher output.)
(b) t_j ← the r leftmost bits of O_j. (Assume the leftmost is identified as bit 1.)
(c) c_j ← x_j ⊕ t_j. (Transmit the r-bit ciphertext block c_j.)
(d) I_{j+1} ← (2^r · I_j + c_j) mod 2^n. (Shift c_j into right end of shift register.)
2. Decryption: I_1 ← IV. For 1 ≤ j ≤ u, upon receiving c_j: x_j ← c_j ⊕ t_j, where t_j, O_j, and I_j are computed as above.

Properties of the CFB mode of operation:
1. Identical plaintexts: as per CBC encryption, changing the IV results in the same plaintext input being enciphered to a different output. The IV need not be secret (although an unpredictable IV may be desired in some applications).
2. Chaining

dependencies: similar to CBC encryption, the chaining mechanism causes ciphertext block c_j to depend on both x_j and preceding plaintext blocks; consequently, re-ordering ciphertext blocks affects decryption. Proper decryption of a correct ciphertext block requires the preceding ⌈n/r⌉ ciphertext blocks to be correct (so that the shift register contains the proper value).
3. Error propagation: one or more bit errors in any single r-bit ciphertext block c_j affects the decipherment of that and the next ⌈n/r⌉ ciphertext blocks (i.e., until n bits of ciphertext are processed, after which the error block c_j has shifted entirely out of the shift register). The recovered plaintext x'_j will differ from x_j precisely in the bit positions c_j was in error; the other incorrectly recovered plaintext blocks will typically be random vectors, i.e., have 50% of bits in error. Thus an adversary may cause predictable bit changes in x_j by altering corresponding bits of c_j.
4. Error recovery: the CFB mode is

self-synchronizing similar to CBC, but requires ⌈n/r⌉ ciphertext blocks to recover.
5. Throughput: for r < n, throughput is decreased by a factor of n/r (vs. CBC) in that each execution of E yields only r bits of ciphertext output.

7.18 Remark (CFB use of encryption only) Since the encryption function E is used for both CFB encryption and decryption, the CFB mode must not be used if the block cipher E is a public-key algorithm; instead, the CBC mode should be used.

7.19 Example (ISO variant of CFB) The CFB mode of Algorithm 7.17 may be modified as follows, to allow processing of plaintext blocks (characters) whose bitsize s is less than the bitsize r of the feedback variable (e.g., 7-bit characters using 8-bit feedback; s < r). The leftmost s (rather than r) bits of O_j are assigned to t_j; the s-bit ciphertext character c_j is computed; the feedback variable is computed from c_j by pre-pending (on the left) r − s 1-bits; the resulting r-bit feedback variable is shifted into the least significant (LS) end of the shift register as before. □
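A sketch of the basic CFB-r of Algorithm 7.17 (not the ISO variant), keeping the shift register as an integer so step (d) is the literal update I_{j+1} = (2^r · I_j + c_j) mod 2^n; the truncated hash standing in for E_K and all names are assumptions, since CFB needs only the forward direction of E:

```python
import hashlib

N = 64          # block size n in bits
R = 8           # feedback/character size r in bits (CFB-8)
MASK = (1 << N) - 1

def E(key: bytes, block_int: int) -> int:
    # Stand-in keyed function for E_K (illustration only, not secure).
    data = key + block_int.to_bytes(N // 8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: N // 8], "big")

def cfb_process(key: bytes, iv: int, units, decrypt: bool):
    # Algorithm 7.17: the keystream derivation is identical both ways;
    # only which value feeds the shift register differs.
    I, out = iv, []
    for u in units:
        O = E(key, I)                      # (a) block cipher output
        t = O >> (N - R)                   # (b) leftmost r bits
        v = u ^ t                          # (c) XOR with the r-bit unit
        c = u if decrypt else v            # ciphertext feeds the register
        I = ((I << R) | c) & MASK          # (d) I_{j+1} = 2^r I_j + c_j mod 2^n
        out.append(v)
    return out

if __name__ == "__main__":
    key, iv = b"demo-key", 0x0123456789ABCDEF
    pt = list(b"CFB demo")                 # eight 8-bit plaintext units
    ct = cfb_process(key, iv, pt, decrypt=False)
    assert cfb_process(key, iv, ct, decrypt=True) == pt
```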

(iv) OFB mode
The output feedback (OFB) mode of operation may be used for applications in which all error propagation must be avoided. It is similar to CFB, and allows encryption of various block sizes (characters), but differs in that the output of the encryption block function E (rather than the ciphertext) serves as the feedback. Two versions of OFB using an n-bit block cipher are common. The ISO version (Figure 7.1(d) and Algorithm 7.20) requires an n-bit feedback, and is more secure (Note 7.24). The earlier FIPS version (Algorithm 7.21) allows r < n bits of feedback.

7.20 Algorithm OFB mode with full feedback (per ISO 10116)
INPUT: k-bit key K; n-bit IV; r-bit plaintext blocks x_1, . . . , x_u (1 ≤ r ≤ n).
SUMMARY: produce r-bit ciphertext blocks c_1, . . . , c_u; decrypt to recover plaintext.
1. Encryption: I_1 ← IV.

For 1 ≤ j ≤ u, given plaintext block x_j:
(a) O_j ← E_K(I_j). (Compute the block cipher output.)
(b) t_j ← the r leftmost bits of O_j. (Assume the leftmost is identified as bit 1.)
(c) c_j ← x_j ⊕ t_j. (Transmit the r-bit ciphertext block c_j.)
(d) I_{j+1} ← O_j. (Update the block cipher input for the next block.)
2. Decryption: I_1 ← IV. For 1 ≤ j ≤ u, upon receiving c_j: x_j ← c_j ⊕ t_j, where t_j, O_j, and I_j are computed as above.

7.21 Algorithm OFB mode with r-bit feedback (per FIPS 81)
INPUT: k-bit key K; n-bit IV; r-bit plaintext blocks x_1, . . . , x_u (1 ≤ r ≤ n).
SUMMARY: produce r-bit ciphertext blocks c_1, . . . , c_u; decrypt to recover plaintext.
As per Algorithm 7.20, but with "I_{j+1} ← O_j" replaced by: I_{j+1} ← (2^r · I_j + t_j) mod 2^n. (Shift output t_j into right end of shift register.)

Properties of the OFB mode of operation:
1. Identical plaintexts: as per CBC and CFB modes, changing the IV results in the same plaintext being enciphered to a different output.

2. Chaining dependencies: the keystream is plaintext-independent (see Remark 7.22).
3. Error propagation: one or more bit errors in any ciphertext character c_j affects the decipherment of only that character, in the precise bit position(s) c_j is in error, causing the corresponding recovered plaintext bit(s) to be complemented.
4. Error recovery: the OFB mode recovers from ciphertext bit errors, but cannot self-synchronize after loss of ciphertext bits, which destroys alignment of the decrypting keystream (in which case explicit re-synchronization is required).
5. Throughput: for r < n, throughput is decreased as per the CFB mode. However, in all cases, since the keystream is independent of plaintext or ciphertext, it may be pre-computed (given the key and IV).

7.22 Remark (changing IV in OFB) The IV, which need not be secret, must be changed if an OFB key K is re-used. Otherwise an identical keystream results, and by XORing corresponding ciphertexts an adversary may reduce

cryptanalysis to that of a running-key cipher with one plaintext as the running key (cf. Example 7.58 ff.). Remark 7.18 on public-key block ciphers applies to the OFB mode as well as CFB.

7.23 Example (counter mode) A simplification of OFB involves updating the input block as a counter, I_{j+1} = I_j + 1, rather than using feedback. This both avoids the short-cycle problem of Note 7.24, and allows recovery from errors in computing E. Moreover, it provides a random-access property: ciphertext block i need not be decrypted in order to decrypt block i + 1. □

7.24 Note (OFB feedback size) In OFB with full n-bit feedback (Algorithm 7.20), the keystream is generated by the iterated function O_j = E_K(O_{j-1}). Since E_K is a permutation, and under the assumption that for random K, E_K is effectively a random choice among all (2^n)! permutations on n elements, it can be shown that for a

fixed (random) key and starting value, the expected cycle length before repeating any value O_j is about 2^{n-1}. On the other hand, if the number of feedback bits is r < n as allowed in Algorithm 7.21, the keystream is generated by the iteration O_j = f(O_{j-1}) for some non-permutation f which, assuming it behaves as a random function, has an expected cycle length of about 2^{n/2}. Consequently, it is strongly recommended to use the OFB mode with full n-bit feedback.

7.25 Remark (modes as stream ciphers) It is clear that both the OFB mode with full feedback (Algorithm 7.20) and the counter mode (Example 7.23) employ a block cipher as a keystream generator for a stream cipher. Similarly, the CFB mode encrypts a character stream using the block cipher as a (plaintext-dependent) keystream generator. The CBC mode may also be considered a stream cipher with n-bit blocks playing the role of very large characters. Thus modes of operation allow one to define stream ciphers from block ciphers.
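Remark 7.25's keystream view, sketched for OFB with full feedback (Algorithm 7.20) and the counter mode of Example 7.23; only the forward direction of E is needed, so a truncated hash stands in for E_K, and all names are illustrative assumptions:

```python
import hashlib

N_BYTES = 8  # n = 64-bit blocks

def E(key: bytes, block: bytes) -> bytes:
    # Stand-in keyed function for E_K (illustration only, not secure).
    return hashlib.sha256(key + block).digest()[:N_BYTES]

def ofb_keystream(key: bytes, iv: bytes):
    # Algorithm 7.20: O_j = E_K(O_{j-1}), full n-bit feedback.
    O = iv
    while True:
        O = E(key, O)
        yield from O

def ctr_keystream(key: bytes, iv: bytes):
    # Example 7.23: I_{j+1} = I_j + 1; random access, avoids short cycles.
    I = int.from_bytes(iv, "big")
    while True:
        yield from E(key, I.to_bytes(N_BYTES, "big"))
        I += 1

def xor_with(stream, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with keystream.
    return bytes(b ^ k for b, k in zip(data, stream))

if __name__ == "__main__":
    key, iv = b"demo-key", b"\x00\x01\x02\x03\x04\x05\x06\x07"
    msg = b"modes of operation define stream ciphers from block ciphers"
    ct = xor_with(ofb_keystream(key, iv), msg)
    assert xor_with(ofb_keystream(key, iv), ct) == msg
    ct2 = xor_with(ctr_keystream(key, iv), msg)
    assert xor_with(ctr_keystream(key, iv), ct2) == msg
```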

7.2.3 Exhaustive key search and multiple encryption

A fixed-size key defines an upper bound on the security of a block cipher, due to exhaustive key search (Fact 7.26). While this requires either known-plaintext or plaintext containing redundancy, it has widespread applicability since cipher operations (including decryption) are generally designed to be computationally efficient.

A design technique which complicates exhaustive key search is to make the task of changing cipher keys computationally expensive, while allowing encryption with a fixed key to remain relatively efficient. Examples of ciphers with this property include the block cipher Khufu and the stream cipher SEAL.

7.26 Fact (exhaustive key search) For an n-bit block cipher with k-bit key, given a small number (e.g., ⌈(k + 4)/n⌉) of plaintext-ciphertext pairs encrypted under key K, K can be recovered by exhaustive key search in an expected time on the order of 2^{k-1} operations.

Justification: Progress through the entire key

space, decrypting a fixed ciphertext C with each trial key, and discarding those keys which do not yield the known plaintext P. The target key is among the undiscarded keys. The number of false alarms expected (non-target keys which map C to P) depends on the relative size of k and n, and follows from unicity distance arguments; additional (P′, C′) pairs suffice to discard false alarms. One expects to find the correct key after searching half the key space.

7.27 Example (exhaustive DES key search) For DES, k = 56, n = 64, and the expected requirement by Fact 7.26 is 2^55 decryptions and a single plaintext-ciphertext pair. □

If the underlying plaintext is known to contain redundancy as in Example 7.28, then ciphertext-only exhaustive key search is possible with a relatively small number of ciphertexts.
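A toy illustration of this justification, with a deliberately tiny 16-bit key space standing in for an infeasible 2^56; since the hash-based stand-in is not invertible, trial encryption of P replaces trial decryption of C (all names hypothetical):

```python
import hashlib

def E(key: int, block: bytes) -> bytes:
    # Stand-in keyed transformation with a 16-bit key (illustration only).
    return hashlib.sha256(key.to_bytes(2, "big") + block).digest()[:8]

def exhaustive_search(P: bytes, C: bytes) -> list[int]:
    # Try every key; keep those mapping P to C. False alarms are possible
    # and would be discarded using additional (P', C') pairs.
    return [k for k in range(2**16) if E(k, P) == C]

if __name__ == "__main__":
    secret = 0xBEEF
    P = b"known pt"
    C = E(secret, P)
    candidates = exhaustive_search(P, C)
    assert secret in candidates   # expected work: half of the 2^16 trials
```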

7.28 Example (ciphertext-only DES key search) Suppose DES is used to encrypt 64-bit blocks of 8 ASCII characters each, with one bit per character serving as an even parity bit. Trial decryption with an incorrect key K yields all 8 parity bits correct with probability 2^{-8}, and correct parity for t different blocks (each encrypted by K) with probability 2^{-8t}. If this is used as a filter over all 2^56 keys, the expected number of unfiltered incorrect keys is 2^56/2^{8t}. For most practical purposes, t = 10 suffices. □

(i) Cascades of ciphers and multiple encryption
If a block cipher is susceptible to exhaustive key search (due to inadequate keylength), encipherment of the same message block more than once may increase security. Various such techniques for multiple encryption of n-bit messages are considered here. Once defined, they may be extended to messages exceeding one block by using standard modes of operation (§7.2.2), with E denoting multiple rather than single encryption.

7.29 Definition A cascade cipher is the concatenation of L ≥ 2 block ciphers

(called stages), each with independent keys. Plaintext is input to the first stage; the output of stage i is input to stage i + 1; and the output of stage L is the cascade's ciphertext output. In the simplest case, all stages in a cascade cipher have k-bit keys, and the stage inputs and outputs are all n-bit quantities. The stage ciphers may differ (general cascade of ciphers), or all be identical (cascade of identical ciphers).

7.30 Definition Multiple encryption is similar to a cascade of L identical ciphers, but the stage keys need not be independent, and the stage ciphers may be either a block cipher E or its corresponding decryption function D = E^{-1}. Two important cases of multiple encryption are double and triple encryption, as illustrated in Figure 7.2 and defined below.

[Figure 7.2: Multiple encryption – (a) double encryption: P → E_{K1} → M → E_{K2} → C; (b) triple encryption: P → E^{(1)}_{K1} → A → E^{(2)}_{K2} → B → E^{(3)}_{K3} → C (K1 = K3 for the two-key variant).]

7.31 Definition Double encryption is defined as E(x) = E_{K2}(E_{K1}(x)), where E_K denotes a block cipher E with key K.

7.32 Definition Triple encryption is defined as E(x) = E^{(3)}_{K3}(E^{(2)}_{K2}(E^{(1)}_{K1}(x))), where E^{(j)}_K denotes either E_K or D_K = E_K^{-1}. The case E(x) = E_{K3}(D_{K2}(E_{K1}(x))) is called E-D-E triple-encryption; the subcase K1 = K3 is often called two-key triple-encryption.

Independent stage keys K1 and K2 are typically used in double encryption. In triple encryption (Definition 7.32), to save on key management and storage costs, dependent stage keys are often used. E-D-E triple-encryption with K1 = K2 = K3 is backwards compatible with (i.e., equivalent to) single encryption.
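Definition 7.32 for the E-D-E case, sketched with the same toy Feistel stand-in used earlier (hypothetical names); the final assertion checks the backwards-compatibility claim:

```python
import hashlib

def _round(key, r, half):
    return hashlib.sha256(key + bytes([r]) + half).digest()[:len(half)]

def E(key, x):   # toy 4-round Feistel block cipher (illustration only)
    L, R = x[:4], x[4:]
    for r in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round(key, r, R)))
    return L + R

def D(key, c):   # inverse of E
    L, R = c[:4], c[4:]
    for r in reversed(range(4)):
        L, R = bytes(a ^ b for a, b in zip(R, _round(key, r, L))), L
    return L + R

def ede_encrypt(k1, k2, k3, x):
    # E(x) = E_K3(D_K2(E_K1(x))); K1 = K3 gives two-key triple encryption.
    return E(k3, D(k2, E(k1, x)))

def ede_decrypt(k1, k2, k3, c):
    return D(k1, E(k2, D(k3, c)))

if __name__ == "__main__":
    x = b"8 bytes!"
    assert ede_decrypt(b"a", b"b", b"a", ede_encrypt(b"a", b"b", b"a", x)) == x
    # Backwards compatibility: with K1 = K2 = K3, E-D-E collapses to single E.
    assert ede_encrypt(b"k", b"k", b"k", x) == E(b"k", x)
```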

(ii) Meet-in-the-middle attacks on multiple encryption
A naive exhaustive key search attack on double encryption tries all 2^{2k} key pairs. The attack of Fact 7.33 reduces time from 2^{2k}, at the cost of substantial space.

7.33 Fact For a block cipher with a k-bit key, a known-plaintext meet-in-the-middle attack defeats double encryption using on the order of 2^k operations and 2^k storage.

Justification (basic meet-in-the-middle): Noting Figure 7.2(a), given a (P, C) pair, compute M_i = E_i(P) under all 2^k possible key values K1 = i; store all pairs (M_i, i), sorted or indexed on M_i (e.g., using conventional hashing). Decipher C under all 2^k possible values K2 = j, and for each pair (M_j, j) where M_j = D_j(C), check for hits M_j = M_i against entries M_i in the first table. (This can be done by creating a second sorted table, or simply checking each M_j entry as generated.) Each hit identifies a candidate solution key pair (i, j), since E_i(P) = M = D_j(C). Using a second known-plaintext pair (P′, C′) (cf. Fact 7.35), discard candidate key pairs which do not map P′ to C′.
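A runnable sketch of this justification, on a toy Feistel cipher with deliberately tiny 12-bit stage keys so that both passes complete instantly (a hypothetical construction, not the handbook's):

```python
import hashlib

def _round(key, r, half):
    return hashlib.sha256(key + bytes([r]) + half).digest()[:len(half)]

def E(k: int, x: bytes) -> bytes:
    # Toy Feistel block cipher with a 12-bit integer key (not secure).
    key = k.to_bytes(2, "big")
    L, R = x[:4], x[4:]
    for r in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round(key, r, R)))
    return L + R

def D(k: int, c: bytes) -> bytes:
    key = k.to_bytes(2, "big")
    L, R = c[:4], c[4:]
    for r in reversed(range(4)):
        L, R = bytes(a ^ b for a, b in zip(R, _round(key, r, L))), L
    return L + R

KEYS = range(2**12)

def meet_in_the_middle(P, C, P2, C2):
    # Forward table: M_i = E_i(P) for all K1 = i (2^k entries).
    table = {}
    for i in KEYS:
        table.setdefault(E(i, P), []).append(i)
    # Backward pass: M_j = D_j(C); a match gives a candidate pair (i, j).
    for j in KEYS:
        for i in table.get(D(j, C), []):
            # Discard false alarms with a second known pair (P', C').
            if E(j, E(i, P2)) == C2:
                return i, j
    return None

if __name__ == "__main__":
    k1, k2 = 0xABC, 0x123
    P, P2 = b"plain--1", b"plain--2"
    C, C2 = E(k2, E(k1, P)), E(k2, E(k1, P2))
    assert meet_in_the_middle(P, C, P2, C2) == (k1, k2)
```

The two passes cost about 2 · 2^k cipher operations plus one table of 2^k entries, exactly the time/space claim of Fact 7.33 in miniature.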

A concept analogous to unicity distance for ciphertext-only attack (Definition 7.69) can be defined for known-plaintext key search, based on the following strategy. Select a key; check if it is consistent with a given set (history) of plaintext-ciphertext pairs; if so, label the key a hit. A hit that is not the target key is a false key hit.

7.34 Definition The number of plaintext-ciphertext pairs required to uniquely determine a key under a known-plaintext key search is the known-plaintext unicity distance. This is the smallest integer t such that a history of length t makes false key hits improbable.

Using Fact 7.35, the (known-plaintext) unicity distance of a cascade of L random ciphers can be estimated. Less than one false hit is expected when t > Lk/n.

7.35 Fact For an L-stage cascade of random block ciphers with n-bit blocks and k-bit keys, the expected number of false key hits for a history of length t is about 2^{Lk−tn}.

Fact 7.35 holds with respect to random block ciphers defined as follows (cf. Definitions 7.2 and 7.70): given n and k, of the possible (2^n)!

permutations on 2^n elements, choose 2^k randomly and with equal probabilities, and associate these with the 2^k keys.

7.36 Example (meet-in-the-middle – double-DES) Applying Fact 7.33 to DES (n = 64, k = 56), the number of candidate key pairs expected for one (P, C) pair is 2^48 = 2^k · 2^k / 2^n, and the likelihood of a false key pair satisfying a second (P′, C′) sample is 2^{-16} = 2^48/2^n. Thus with high probability, two (P, C) pairs suffice for key determination. This agrees with the unicity distance estimate of Fact 7.35: for L = 2, a history of length t = 2 yields 2^{-16} expected false key hits. □

A naive exhaustive attack on all key pairs in double-DES uses 2^{112} time and negligible space, while the meet-in-the-middle attack (Fact 7.33) requires 2^56 time and 2^56 space. Note 7.37 illustrates that the latter can be modified to yield a time-memory trade-off at any point

between these two extremes, with the time-memory product essentially constant at 2^{112} (e.g., 2^72 time, 2^40 space).

7.37 Note (time-memory tradeoff – double-encryption) In the attack of Example 7.36, memory may be reduced (from tables of 2^56 entries) by independently guessing s bits of each of K1, K2 (for any fixed s, 0 ≤ s ≤ k). The tables then each have 2^{k−s} entries (fixing s key bits reduces the table size by a factor of 2^s), but the attack must be run over 2^s · 2^s pairs of such tables to allow all possible key pairs. The memory requirement is 2 · 2^{k−s} entries (each n + k − s bits, omitting s fixed key bits), while time is on the order of 2^{2s} · 2^{k−s} = 2^{k+s}. The time-memory product is 2^{2k+1}.

7.38 Note (generalized meet-in-the-middle trade-off) Variations of Note 7.37 allow time-space tradeoffs for meet-in-the-middle key search on any concatenation of L ≥ 2 ciphers. For L even, meeting between the first and last L/2 stages results in requirements on the order of 2 · 2^{(kL/2)−s} space and

2^{(kL/2)+s} time, 0 ≤ s ≤ kL/2. For L odd, meeting after the first (L − 1)/2 and before the last (L + 1)/2 stages results in requirements on the order of 2 · 2^{k(L−1)/2 − s} space and 2^{k(L+1)/2 + s} time, 1 ≤ s ≤ k(L − 1)/2.

For a block cipher with k-bit key, a naive attack on two-key triple encryption (Definition 7.32) involves trying all 2^{2k} key pairs. Fact 7.39 notes a chosen-plaintext alternative.

7.39 Fact For an n-bit block cipher with k-bit key, two-key triple encryption may be defeated by a chosen-plaintext attack requiring on the order of 2^k of each of the following: cipher operations, words of (n + k)-bit storage, and plaintext-ciphertext pairs with plaintexts chosen.

Justification (chosen-plaintext attack on two-key triple-encryption): Using 2^k chosen plaintexts, two-key triple encryption may be reduced to double-encryption as follows. Noting Figure 7.2(b), focus on the case where the result after the first encryption stage is the all-zero vector A = 0. For all 2^k

values K1 = i, compute P_i = E_i^{-1}(A). Submit each resulting P_i as a chosen plaintext, obtaining the corresponding ciphertext C_i. For each, compute B_i = E_i^{-1}(C_i), representing an intermediate result B after the second of three encryption stages. Note that the values P_i also represent candidate values B. Sort the values P_j and B_i in a table (using standard hashing for efficiency). Identify the keys corresponding to pairs P_j = B_i as candidate solution key pairs K1 = i, K2 = j to the given problem. Confirm these by testing each key pair on a small number of additional known plaintext-ciphertext pairs as required.

While generally impractical due to the storage requirement, the attack of Fact 7.39 is referred to as a certificational attack on two-key triple encryption, demonstrating it to be weaker than triple encryption. This motivates consideration of triple-encryption with three independent keys, although a penalty is a third key to manage. Fact 7.40, stated specifically for DES (n =

64, k = 56), indicates that for the price of additional computation, the memory requirement in Fact 7.39 may be reduced and the chosen-plaintext condition relaxed to known-plaintext. The attack, however, appears impractical even with extreme parallelization; for example, for lg t = 40, the number of operations is still 2^80.

7.40 Fact If t known plaintext-ciphertext pairs are available, an attack on two-key triple-DES requires O(t) space and 2^{120−lg t} operations.

(iii) Multiple-encryption modes of operation
In contrast to the single modes of operation in Figure 7.1, multiple modes are variants of multiple encryption constructed by concatenating selected single modes. For example, the combination of three single-mode CBC operations provides triple-inner-CBC; an alternative is triple-outer-CBC, the composite operation of triple encryption (per Definition

7.32) with one outer ciphertext feedback after the sequential application of three single-ECB operations. With replicated hardware, multiple modes such as triple-inner-CBC may be pipelined, allowing performance comparable to single encryption, offering an advantage over triple-outer-CBC. Unfortunately (Note 7.41), they are often less secure.

7.41 Note (security of triple-inner-CBC) Many multiple modes of operation are weaker than the corresponding multiple-ECB mode (i.e., multiple encryption operating as a black box with only outer feedbacks), and in some cases multiple modes (e.g., ECB-CBC-CBC) are not significantly stronger than single encryption. In particular, under some attacks triple-inner-CBC is significantly weaker than triple-outer-CBC; against other attacks based on the block size (e.g., Note 7.8), it appears stronger.

(iv) Cascade ciphers
Counter-intuitively, it is possible to devise examples whereby cascading of ciphers (Definition 7.29) actually reduces security. However, Fact 7.42

holds under a wide variety of attack models and meaningful definitions of "breaking".

7.42 Fact A cascade of n (independently keyed) ciphers is at least as difficult to break as the first component cipher. Corollary: for stage ciphers which commute (e.g., additive stream ciphers), a cascade is at least as strong as the strongest component cipher.

Fact 7.42 does not apply to product ciphers consisting of component ciphers which may have dependent keys (e.g., two-key triple-encryption); indeed, keying dependencies across stages may compromise security entirely, as illustrated by a two-stage cascade wherein the components are two binary additive stream ciphers using an identical keystream – in this case, the cascade output is the original plaintext.

Fact 7.42 may suggest the following practical design strategy: cascade a set of keystream generators, each of which relies on one or more different design principles. It is not clear, however, if this is preferable to one large keystream

generator which relies on a single principle. The cascade may turn out to be less secure for a fixed set of parameters (number of key bits, block size), since ciphers built piecewise may often be attacked piecewise.

7.3 Classical ciphers and historical development

The term classical ciphers refers to encryption techniques which have become well-known over time, and which were generally created prior to the second half of the twentieth century (in some cases, many hundreds of years earlier). Many classical techniques are variations of simple substitution and simple transposition. Some techniques that are not technically block ciphers are also included here for convenience and context.

Classical ciphers and techniques are presented under §7.3 for historical and pedagogical reasons only. They illustrate important basic principles and common pitfalls. However, since these techniques are neither

sophisticated nor secure against current cryptanalytic capabilities, they are not generally suitable for practical use.

7.3.1 Transposition ciphers (background)

For a simple transposition cipher with fixed period t, encryption involves grouping the plaintext into blocks of t characters, and applying to each block a single permutation e on the numbers 1 through t. More precisely, the ciphertext corresponding to plaintext block m = m_1 . . . m_t is c = E_e(m) = m_{e(1)} . . . m_{e(t)}. The encryption key is e, which implicitly defines t; the key space K has cardinality t! for a given value t. Decryption involves use of the permutation d which inverts e. The above corresponds to Definition 1.32. The mathematical notation obscures the simplicity of the encryption procedure, as is evident from Example 7.43.

7.43 Example (simple transposition) Consider a simple transposition cipher with t = 6 and e = (6 4 1 3 5 2). The message m = CAESAR is encrypted to c = RSCEAA. Decryption uses the inverse permutation d =

(3 6 4 2 5 1). The transposition may be represented by a two-row matrix, with the second row indicating the position to which the element indexed by the corresponding number of the first row is mapped: (1 2 3 4 5 6 / 3 6 4 2 5 1). Encryption may be done by writing a block of plaintext under the headings "3 6 4 2 5 1", and then reading off the characters under the headings in numerical order. □

7.44 Note (terminology: transposition vs. permutation) While the term "transposition" is traditionally used to describe a transposition cipher, the mapping of Example 7.43 may alternately be called a permutation on the set {1, 2, . . . , 6}. The latter terminology is used, for example, in substitution-permutation networks, and in DES (§7.4).

A mnemonic keyword may be used in place of a key, although this may seriously decrease the key space entropy. For example, for n = 6, the keyword "CIPHER" could be used to specify the column ordering 1, 5, 4, 2, 3, 6 (by alphabetic priority).
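Example 7.43 in code (a minimal sketch; function names are illustrative):

```python
def transpose_encrypt(e: list[int], m: str) -> str:
    # c = m_{e(1)} ... m_{e(t)}; e is 1-indexed as in the text.
    return "".join(m[i - 1] for i in e)

def transpose_decrypt(e: list[int], c: str) -> str:
    # Compute and apply the inverse permutation d of e.
    d = [0] * len(e)
    for pos, i in enumerate(e):
        d[i - 1] = pos + 1
    return "".join(c[i - 1] for i in d)

if __name__ == "__main__":
    e = [6, 4, 1, 3, 5, 2]
    assert transpose_encrypt(e, "CAESAR") == "RSCEAA"
    assert transpose_decrypt(e, "RSCEAA") == "CAESAR"
```

Computing d from e in the decrypt routine reproduces exactly the inverse (3 6 4 2 5 1) given in the example.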

7.45 Definition Sequential composition of two or more simple transpositions with respective periods t_1, t_2, . . . , t_i is called a compound transposition.

7.46 Fact The compound transposition of Definition 7.45 is equivalent to a simple transposition of period t = lcm(t_1, . . . , t_i).

7.47 Note (recognizing simple transposition) Although simple transposition ciphers alter dependencies between consecutive characters, they are easily recognized because they preserve the frequency distribution of each character.

7.3.2 Substitution ciphers (background)

This section considers the following types of classical ciphers: simple (or mono-alphabetic) substitution, polygram substitution, and homophonic substitution. The difference between codes and ciphers is also noted. Polyalphabetic substitution ciphers are considered in §7.3.3.

(i) Mono-alphabetic substitution
Suppose the ciphertext and

plaintext character sets are the same. Let m = m_1 m_2 m_3 . . . be a plaintext message consisting of juxtaposed characters m_i ∈ A, where A is some fixed character alphabet such as A = {A, B, . . . , Z}. A simple substitution cipher or mono-alphabetic substitution cipher employs a permutation e over A, with encryption mapping E_e(m) = e(m_1)e(m_2)e(m_3) . . . . Here juxtaposition indicates concatenation (rather than multiplication), and e(m_i) is the character to which m_i is mapped by e. This corresponds to Definition 1.27.

7.48 Example (trivial shift cipher/Caesar cipher) A shift cipher is a simple substitution cipher with the permutation e constrained to an alphabetic shift through k characters for some fixed k. More precisely, if |A| = s, and m_i is associated with the integer value i, 0 ≤ i ≤ s − 1, then c_i = e(m_i) = m_i + k mod s. The decryption mapping is defined by d(c_i) = c_i − k mod s. For English text, s = 26, and characters A through Z are associated with integers 0 through 25.

For k = 1, the message m = HAL is encrypted to c = IBM. According to folklore, Julius Caesar used the key k = 3. □

The shift cipher can be trivially broken because there are only s = |A| keys (e.g., s = 26) to exhaustively search. A similar comment holds for affine ciphers (Example 7.49). More generally, see Fact 7.68.

7.49 Example (affine cipher – historical) The affine cipher on a 26-letter alphabet is defined by e_K(x) = ax + b mod 26, where 0 ≤ a, b ≤ 25. The key is (a, b). Ciphertext c = e_K(x) is decrypted using d_K(c) = (c − b)a^{-1} mod 26, with the necessary and sufficient condition for invertibility that gcd(a, 26) = 1. Shift ciphers are a subclass defined by a = 1. □
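Examples 7.48 and 7.49 in a few lines, assuming the A–Z/0–25 association above (names illustrative):

```python
def shift(msg: str, k: int) -> str:
    # c_i = m_i + k mod 26; decrypt with shift(c, -k).
    return "".join(chr((ord(ch) - 65 + k) % 26 + 65) for ch in msg)

def affine_encrypt(msg: str, a: int, b: int) -> str:
    # e_K(x) = ax + b mod 26; requires gcd(a, 26) = 1.
    return "".join(chr((a * (ord(ch) - 65) + b) % 26 + 65) for ch in msg)

def affine_decrypt(ct: str, a: int, b: int) -> str:
    # d_K(c) = (c - b) * a^{-1} mod 26.
    a_inv = pow(a, -1, 26)  # modular inverse exists iff gcd(a, 26) = 1
    return "".join(chr(a_inv * (ord(ch) - 65 - b) % 26 + 65) for ch in ct)

if __name__ == "__main__":
    assert shift("HAL", 1) == "IBM"          # Example 7.48
    ct = affine_encrypt("CAESAR", 5, 8)
    assert affine_decrypt(ct, 5, 8) == "CAESAR"
```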

7.50 Note (recognizing simple substitution) Mono-alphabetic substitution alters the frequency of individual plaintext characters, but does not alter the frequency distribution of the overall character set. Thus, comparing ciphertext character frequencies to a table of expected letter frequencies (unigram statistics) in the plaintext language allows associations between ciphertext and plaintext characters. (E.g., if the most frequent plaintext character X occurred twelve times, then the ciphertext character that X maps to will occur twelve times.)
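Note 7.50 underlies classical frequency analysis; a minimal sketch, assuming an approximate English unigram ranking (the constant below is an assumption, and with short ciphertexts the pairing is only a starting point to be refined by hand or with digram statistics):

```python
from collections import Counter

# Approximate descending frequency order of English letters (assumption).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def guess_substitution(ciphertext: str) -> dict[str, str]:
    # Pair the i-th most frequent ciphertext letter with the i-th most
    # frequent English letter.
    ranked = [ch for ch, _ in Counter(ciphertext).most_common() if ch.isalpha()]
    return {c: p for c, p in zip(ranked, ENGLISH_ORDER)}

if __name__ == "__main__":
    ct = "WKH TXLFN EURZQ IRA"   # "THE QUICK BROWN FOX" under a shift of 3
    mapping = guess_substitution(ct.replace(" ", ""))
    print(mapping)  # with enough ciphertext, maps cipher 'H' -> plain 'E', etc.
```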

(ii) Polygram substitution
A simple substitution cipher substitutes for single plaintext letters. In contrast, polygram substitution ciphers involve groups of characters being substituted by other groups of characters. For example, sequences of two plaintext characters (digrams) may be replaced by other digrams. The same may be done with sequences of three plaintext characters (trigrams), or more generally using n-grams. In full digram substitution over an alphabet of 26 characters, the key may be given by a table of 26^2 entries, with row and column indices corresponding to the first and second characters in the plaintext digram, and the table entries being the ciphertext digrams substituted for the plaintext pairs. There are then (26^2)! keys.

7.51 Example (Playfair cipher – historical) A digram substitution may be defined by arranging the characters of a 25-letter alphabet (I and J are equated) in a 5 × 5 matrix M. Adjacent plaintext characters are paired. The pair (p_1, p_2) is replaced by the digram (c_3, c_4) as follows. If p_1 and p_2 are in distinct rows and columns, they define the corners of a submatrix (possibly M itself), with the remaining corners c_3 and c_4; c_3 is defined as the character in the same column as p_1. If p_1 and p_2 are in a common row, c_3 is defined as the character immediately to the right of p_1 and c_4 that immediately to the right of p_2 (the first column is

is re-grouped While cryptanalysis based on single character frequencies fails for the Playfair cipher (each letter may be replaced by any other), cryptanalysis employing digram frequencies succeeds.  The key for a Playfair cipher is the 5 × 5 square. A mnemonic aid may be used to more easily remember the square. An example is the use of a meaningful keyphrase, with repeated letters deleted and the remaining alphabet characters included alphabetically at the end. The keyphrase “PLAYFAIR IS A DIGRAM CIPHER” would define a square with rows PLAYF, IRSDG, MCHEB, KNOQT, VWXYZ. To avoid the trailing characters always being from the end of the alphabet, a further shift cipher (Example 7.48) could be applied to the resulting 25-character string. Use of keyphrases may seriously reduce the key space entropy. This effect is reduced if the keyphrase is not directly written into the square. For example, the non-repeated keyphrase characters might be written into an 8-column rectangle (followed

by the remaining alphabet letters), the trailing columns being incomplete. The 25-character string obtained by reading the columns vertically is then used to fill the 5 × 5 square row by row.

7.52 Example (Hill cipher – historical) An n-gram substitution may be defined using an invertible n × n matrix A = (a_ij) as the key to map an n-character plaintext m_1 . . . m_n to a ciphertext n-gram c_i = Σ_{j=1}^{n} a_ij · m_j, i = 1, . . . , n, with arithmetic modulo the alphabet size. Decryption involves using A^{-1}. Here characters A–Z, for example, are associated with integers 0–25. This polygram substitution cipher is a linear transformation, and falls under known-plaintext attack. □
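Example 7.52 for n = 2, with the 2 × 2 inverse computed mod 26 via the adjugate (the key matrix shown is an arbitrary invertible choice, not from the text):

```python
def hill2_encrypt(A, pair):
    # c_i = sum_j a_ij * m_j mod 26, for a 2x2 key matrix A.
    m1, m2 = (ord(ch) - 65 for ch in pair)
    c1 = (A[0][0] * m1 + A[0][1] * m2) % 26
    c2 = (A[1][0] * m1 + A[1][1] * m2) % 26
    return chr(c1 + 65) + chr(c2 + 65)

def hill2_inverse(A):
    # A^{-1} = det^{-1} * adj(A) mod 26; requires gcd(det, 26) = 1.
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % 26
    d = pow(det, -1, 26)
    return [[ d * A[1][1] % 26, -d * A[0][1] % 26],
            [-d * A[1][0] % 26,  d * A[0][0] % 26]]

if __name__ == "__main__":
    A = [[3, 3], [2, 5]]          # det = 9, invertible mod 26
    ct = hill2_encrypt(A, "HI")
    assert hill2_encrypt(hill2_inverse(A), ct) == "HI"
```

Decryption is just encryption under A^{-1}, which is what makes a single known plaintext/ciphertext n-gram pair (or a few) sufficient to solve for the key by linear algebra.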

(iii) Homophonic substitution
The idea of homophonic substitution, introduced in §1.5, is for each fixed key k to associate with each plaintext unit (e.g., character) m a set S(k, m) of potential corresponding ciphertext units (generally all of common size). To encrypt m under k, randomly choose one element from this set as the ciphertext. To allow decryption, for each fixed key this one-to-many encryption function must be injective on ciphertext space. Homophonic substitution results in ciphertext data expansion.

In homophonic substitution, |S(k, m)| should be proportional to the frequency of m in the message space. The motivation is to smooth out obvious irregularities in the frequency distribution of ciphertext characters, which result from irregularities in the plaintext frequency distribution when simple substitution is used. While homophonic substitution complicates cryptanalysis based on simple frequency distribution statistics, sufficient ciphertext may nonetheless allow frequency analysis, in conjunction with additional statistical properties of plaintext manifested in the ciphertext. For example, in long ciphertexts each element of S(k, m) will occur roughly the same number of times. Digram distributions may also provide information.

(iv) Codes vs. ciphers
A technical distinction is made between ciphers and codes. Ciphers are encryption

techniques which are applied to plaintext units (bits, characters, or blocks) independent of their semantic or linguistic meaning; the result is called ciphertext. In contrast, cryptographic codes operate on linguistic units such as words, groups of words, or phrases, and substitute (replace) these by designated words, letter groups, or number groups called codegroups. The key is a dictionary-like codebook listing plaintext units and their corresponding codegroups, indexed by the former; a corresponding codebook for decoding is reverse-indexed.

When there is potential ambiguity, codes in this context (vs. ciphers) may be qualified as cryptographic codebooks, to avoid confusion with error-correcting codes (EC-codes), used to detect and/or correct non-malicious errors, and authentication codes (A-codes, or MACs as per Definition 9.7), which provide data origin

authentication.

Several factors suggest that codes may be more difficult to break than ciphers: the key (codebook) is vastly larger than typical cipher keys; codes may result in data compression (cf. Fact 7.71); and statistical analysis is complicated by the large plaintext unit block size (cf. Note 7.74). Opposing this are several major disadvantages: the coding operation is not easily automated (relative to an algorithmic mapping); and identical encryption of repeated occurrences of plaintext units implies susceptibility to known-plaintext attacks, and allows frequency analysis based on observed traffic. This implies a need for frequent re-keying (changing the codebook), which is both more costly and inconvenient. Consequently, codes are not commonly used to secure modern telecommunications.

7.3.3 Polyalphabetic substitutions and Vigenère ciphers (historical)

A simple substitution cipher involves a single mapping of the plaintext alphabet onto ciphertext characters. A more complex

alternative is to use different substitution mappings (called multiple alphabets) on various portions of the plaintext. This results in so-called polyalphabetic substitution (also introduced in Definition 1.30). In the simplest case, the different alphabets are used sequentially and then repeated, so the position of each plaintext character in the source string determines which mapping is applied to it. Under different alphabets, the same plaintext character is thus encrypted to different ciphertext characters, precluding simple frequency analysis as per mono-alphabetic substitution (§7.3.5).

The simple Vigenère cipher is a polyalphabetic substitution cipher, introduced in Example 1.31. The definition is repeated here for convenience.

7.53 Definition A simple Vigenère cipher of period t, over an s-character alphabet, involves a t-character key k_1 k_2 . . . k_t. The mapping of plaintext m = m_1 m_2 m_3 . . . to ciphertext c = c_1 c_2 c_3 . . . is defined on individual characters by c_i = m_i + k_i mod s,

where the subscript i in k_i is taken modulo t (the key is re-used). The simple Vigenère uses t shift ciphers (see Example 7.48), defined by t shift values k_i, each specifying one of s (mono-alphabetic) substitutions; k_i is used on the characters in positions i, i + t, i + 2t, . . . . In general, each of the t substitutions is different; this is referred to as using t alphabets rather than a single substitution mapping. The shift cipher (Example 7.48) is a simple Vigenère with period t = 1.

7.54 Example (Beaufort variants of Vigenère) Compared to the simple Vigenère mapping c_i = m_i + k_i mod s, the Beaufort cipher has c_i = k_i − m_i mod s, and is its own inverse. The variant Beaufort has encryption mapping c_i = m_i − k_i mod s. □
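Definition 7.53 and the Beaufort variant of Example 7.54, sketched for s = 26 (hypothetical names):

```python
def vigenere(msg: str, key: str, decrypt: bool = False) -> str:
    # c_i = m_i + k_i mod 26, with the key index taken modulo its period t.
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(m) - 65 + sign * (ord(key[i % len(key)]) - 65)) % 26 + 65)
        for i, m in enumerate(msg))

def beaufort(msg: str, key: str) -> str:
    # c_i = k_i - m_i mod 26; applying it twice recovers msg (self-inverse).
    return "".join(
        chr((ord(key[i % len(key)]) - ord(m)) % 26 + 65)
        for i, m in enumerate(msg))

if __name__ == "__main__":
    ct = vigenere("THEQUICKBROWNFOX", "KEY")
    assert vigenere(ct, "KEY", decrypt=True) == "THEQUICKBROWNFOX"
    assert beaufort(beaufort("CAESAR", "LEMON"), "LEMON") == "CAESAR"
```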

7.55 Example (compound Vigenère) The compound Vigenère has encryption mapping c_i = m_i + (k_i^1 + k_i^2 + · · · + k_i^r) mod s, where in general the keys k^j, 1 ≤ j ≤ r, have distinct periods t_j, and the subscript i in k_i^j, indicating the ith character of k^j, is taken modulo t_j. This corresponds to the sequential application of r simple Vigenères, and is equivalent to a simple Vigenère of period lcm(t_1, . . . , t_r). □

7.56 Example (single mixed alphabet Vigenère) A simple substitution mapping defined by a general permutation e (not restricted to an alphabetic shift), followed by a simple Vigenère, is defined by the mapping c_i = e(m_i) + k_i mod s, with inverse m_i = e^{-1}(c_i − k_i) mod s. An alternative is a simple Vigenère followed by a simple substitution: c_i = e(m_i + k_i mod s), with inverse m_i = e^{-1}(c_i) − k_i mod s. □

7.57 Example (full Vigenère) In a simple Vigenère of period t, replace the mapping defined by the shift value k_i (for shifting character m_i) by a general permutation e_i of the alphabet. The

key consists of t permutations e1 , , et  7.58 Example (running-key Vigenère) If the keystream ki of a simple Vigenère is as long as the plaintext, the cipher is called a running-key cipher. For example, the key may be meaningful text from a book  While running-key ciphers prevent cryptanalysis by the Kasiski method (§7.35), if the key has redundancy, cryptanalysis exploiting statistical imbalances may nonetheless succeed. For example, when encrypting plaintext English characters using a meaningful text as a running key, cryptanalysis is possible based on the observation that a significant proportion of ciphertext characters results from the encryption of high-frequency running text characters with high-frequency plaintext characters. 7.59 Fact A running-key cipher can be strengthened by successively enciphering plaintext under two or more distinct running keys For typical English plaintext and running keys, it can be shown that iterating four such encipherments appears

unbreakable.

7.60 Definition An auto-key cipher is a cipher wherein the plaintext itself serves as the key (typically subsequent to the use of an initial priming key).

7.61 Example (auto-key Vigenère) In a running-key Vigenère (Example 7.58) with an s-character alphabet, define a priming key k = k_1 k_2 . . . k_t. Plaintext characters m_i are encrypted as c_i = (m_i + k_i) mod s for 1 ≤ i ≤ t (simplest case: t = 1). For i > t, c_i = (m_i + m_{i−t}) mod s. An alternative involving more keying material is to replace the simple shift by a full Vigenère with permutations e_i, 1 ≤ i ≤ s, defined by the key k_i or character m_i: for 1 ≤ i ≤ t, c_i = e_{k_i}(m_i), and for i > t, c_i = e_{m_{i−t}}(m_i). □

An alternative to Example 7.61 is to auto-key a cipher using the resulting ciphertext as the key: for example, for i > t, c_i = (m_i + c_{i−t}) mod s. This, however, is far less desirable, as it provides an eavesdropping cryptanalyst the key itself.
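A sketch of the plaintext-auto-key mapping of Example 7.61 in its simplest shift form (names illustrative): the keystream is the priming key followed by the plaintext itself, and the receiver extends the known keystream with each character it recovers.

```python
def autokey_encrypt(msg: str, primer: str) -> str:
    # Keystream = priming key, then the plaintext: c_i = m_i + m_{i-t} mod 26.
    key = (primer + msg)[: len(msg)]
    return "".join(chr((ord(m) - 65 + ord(k) - 65) % 26 + 65)
                   for m, k in zip(msg, key))

def autokey_decrypt(ct: str, primer: str) -> str:
    # Each recovered plaintext character extends the known keystream.
    key, out = list(primer), []
    for i, c in enumerate(ct):
        m = chr((ord(c) - ord(key[i])) % 26 + 65)
        out.append(m)
        key.append(m)
    return "".join(out)

if __name__ == "__main__":
    ct = autokey_encrypt("ATTACKATDAWN", "Q")
    assert autokey_decrypt(ct, "Q") == "ATTACKATDAWN"
```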

7.62 Example (Vernam viewed as a Vigenère) Consider a simple Vigenère defined by c_i = m_i + k_i mod s. If the keystream is truly random and independent – as long as the plaintext, and never repeated (cf. Example 7.58) – this yields the unconditionally secure Vernam cipher (Definition 1.39; §6.1.1), generalized from a binary to an arbitrary alphabet. □

7.3.4 Polyalphabetic cipher machines and rotors (historical)

The Jefferson cylinder is a deceptively simple device which implements a polyalphabetic substitution cipher; conceived in the late 18th century, it had remarkable cryptographic strength for its time. Polyalphabetic substitution ciphers implemented by a class of rotor-based machines were the dominant cryptographic tool in World War II. Such machines, including the Enigma machine and those of Hagelin, have an alphabet which changes continuously for a very long period before repeating;

this provides protection against Kasiski analysis and methods based on the index of coincidence (§7.3.5).

(i) Jefferson cylinder
The Jefferson cylinder (Figure 7.3) implements a polyalphabetic substitution cipher while avoiding complex machinery, extensive user computations, and Vigenère tableaus. A solid cylinder 6 inches long is sliced into 36 disks. A rod inserted through the cylinder axis allows the disks to rotate. The periphery of each disk is divided into 26 parts. On each disk, the letters A–Z are inscribed in a (different) random ordering. Plaintext messages are encrypted in 36-character blocks. A reference bar is placed along the cylinder's length. Each of the 36 wheels is individually rotated to bring the appropriate character (matching the plaintext block) into position along the reference line. The 25 other parallel reference positions then each define a ciphertext, from which (in an early instance of randomized encryption) one is selected as the ciphertext to transmit.

[Figure 7.3: The Jefferson cylinder.]

The second party possesses a cylinder with identically marked and ordered disks (1–36). The ciphertext is decrypted by rotating each of the 36 disks to obtain characters along a fixed reference line matching the ciphertext. The other 25 reference positions are examined for a recognizable plaintext. If the original message is not recognizable (e.g., random data), both parties agree beforehand on an index 1 through 25 specifying the offset between plaintext and ciphertext lines. To accommodate plaintext digits 0–9 without extra disk sections, each digit is permanently assigned to one of 10 letters (a, e, i, o, u, y and f, l, r, s), which is encrypted as above but annotated with an overhead dot, identifying that the procedure must be reversed. Re-ordering the disks (1 through 36) alters the polyalphabetic substitution key. The number of possible orderings is 36! ≈ 3.72 × 10^41. Changing the ordering of

letters on each disk affords 25! further mappings (per disk), but is more difficult in practice.

(ii) Rotor-based machines – technical overview
A simplified generic rotor machine (Figure 7.4) consists of a number of rotors (wired codewheels), each implementing a different fixed mono-alphabetic substitution, mapping a character at its input face to one on its output face. A plaintext character input to the first rotor generates an output which is input to the second rotor, and so on, until the final ciphertext character emerges from the last. For fixed rotor positions, the bank of rotors collectively implements a mono-alphabetic substitution which is the composition of the substitutions defined by the individual rotors. To provide polyalphabetic substitution, the encipherment of each plaintext character causes various rotors to move. The simplest case is an odometer-like movement, with a single rotor stepped until it completes a full revolution, at which time it steps the adjacent

rotor one position, and so on. Stepping a rotor changes the mono-alphabetic substitution it defines (the active mapping). More precisely, each rotor R_i effects a mono-alphabetic substitution f_i. R_i can rotate into t_i positions (e.g., t_i = 26). When offset j places from a reference setting, R_i maps input a to f_i(a − j) + j, where both the input to f_i and the final output are reduced mod 26. The cipher key is defined by the mono-alphabetic substitutions determined by the fixed wheel wirings and initial rotor positions. Re-arranging the order of rotors provides additional variability. Providing a machine with more rotors than necessary for operation at any one time allows further keying variation (by changing the active rotors).

[Figure 7.4: A rotor-based machine.]

7.63 Fact Two properties of rotor machines desirable for

security-related reasons are: (1) long periods; and (2) state changes which are almost all "large". The second property concerns the motion of rotors relative to each other, so that the sub-mappings between rotor faces change when the state changes. Rotor machines with odometer-like state changes fail to achieve this second property.

7.64 Note (rotor machine output methods) Rotor machines were categorized by their method of providing ciphertext output. In indicating machines, ciphertext output characters are indicated by means such as lighted lamps or displayed characters in output apertures. In printing machines, ciphertext is printed or typewritten onto an output medium such as paper. With on-line machines, output characters are produced in electronic form suitable for direct transmission over telecommunications media.

(iii) Rotor-based machines – historical notes
A number of individuals are responsible for the development of early machines based on rotor principles. In 1918, the

American EH Hebern built the first rotor apparatus, based on an earlier typewriting machine modified with wired connections to generate a mono-alphabetic substitution. The output was originally by lighted indicators The first rotor patent was filed in 1921, the year Hebern Electric Code, Inc. became the first US cipher machine company (and first to bankrupt in 1926). The US Navy (circa 1929-1930 and some years thereafter) used a number of Hebern’s five-rotor machines. In October 1919, H.A Koch filed Netherlands patent no10,700 (“Geheimschrijfmachine” – secret writing machine), demonstrating a deep understanding of rotor principles; no machine was built. In 1927, the patent rights were assigned to A Scherbius The German inventor Scherbius built a rotor machine called the Enigma. Model A was replaced by Model B with typewriter output, and a portable Model C with indicator lamps. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.3 Classical ciphers and

historical development 245 The company set up in 1923 dissolved in 1934, but thereafter the Germans used the portable battery-powered Enigma, including for critical World War II operations. In October 1919, three days after Koch, A.G Damm filed Swedish patent no52,279 describing a double-rotor device His firm was joined by the Swede, B Hagelin, whose 1925 modification yielded the B-21 rotor machine (with indicating lamps) used by the Swedish army. The B-21 had keywheels with varying number of teeth or gears, each of which was associated with a settable two-state pin. The period of the resulting polyalphabetic substitution was the product of the numbers of keywheel pins; the key was defined by the state of each pin and the initial keywheel positions. Hagelin later produced other models: B-211 (a printing machine); a more compact (phone-sized) model C-36 for the French in 1934; and based on alterations suggested by Friedman and others, model C-48 (of which over 140 000 were produced)

which was called M-209 when used by the U.S Army as a World War II field cipher. His 1948 Swiss factory later produced: model C-52, a strengthened version of M-209 (C-48) with period exceeding 2.75 × 109 (with keywheels of 47, 43, 41, 37, 31, 29 pins); CD-55, a pocket-size version of the C-52; and T-55, an on-line version of the same, modifiable to use a one-time tape. A further model was CD-57 7.65 Note (Enigma details) The Enigma initially had three rotors Ri , each with 26 positions R1 stepped R2 which stepped R3 odometer-like, with R2 also stepping itself; the period was 26 · 25 · 26 ≈ 17 000. The key consisted of the initial positions of these rotors (≈ 17 000 choices), their order (3! = 6 choices), and the state of a plugboard, which implemented a fixed but easily changed (e.g, manually, every hour) mono-alphabetic substitution (26! choices), in addition to that carried out by rotor combinations. 7.66 Note (Hagelin M-209 details) The Hagelin M-209 rotor machine implements

a polyalphabetic substitution using 6 keywheels – more specifically, a self-decrypting Beaufort cipher (Example 7.54), Eki (mi ) = ki −mi mod 26, of period 101 405 850 = 26·25·23·21·19·17 letters. Thus for a fixed ordered set of 6 keywheels, the cipher period exceeds 108 ki may be viewed as the ith character in the key stream, as determined by a particular ordering of keywheels, their pin settings, and starting positions. All keywheels rotate one position forward after each character is enciphered The wheels simultaneously return to their initial position only after a period equal to the least-common-multiple of their gear-counts, which (since these are co-prime) is their product. A ciphertext-only attack is possible with 10002000 characters, using knowledge of the machine’s internal mechanical details, and assuming natural language redundancy in the plaintext; a known-plaintext attack is possible with 50-100 characters. 7.35 Cryptanalysis of classical ciphers (historical)

This section presents background material on redundancy and unicity distance, and techniques for cryptanalysis of classical ciphers, (i) Redundancy All natural languages are redundant. This redundancy results from linguistic structure For example, in English the letter “E” appears far more frequently than “Z”, “Q” is almost always followed by “U”, and “TH” is a common digram. An alphabet with 26 characters (e.g, Roman alphabet) can theoretically carry up to lg 26 = 4.7 bits of information per character Fact 767 indicates that, on average, far less information is actually conveyed by a natural language. Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 246 Ch. 7 Block Ciphers 7.67 Fact The estimated average amount of information carried per character (per-character entropy) in meaningful English alphabetic text is 15 bits The per-character redundancy of English is thus about 4.7 − 15 = 32 bits 7.68 Fact Empirical evidence suggests

that, for essentially any simple substitution cipher on a meaningful message (e.g, with redundancy comparable to English), as few as 25 ciphertext characters suffices to allow a skilled cryptanalyst to recover the plaintext. (ii) Unicity distance and random cipher model 7.69 Definition The unicity distance of a cipher is the minimum amount of ciphertext (number of characters) required to allow a computationally unlimited adversary to recover the unique encryption key. The unicity distance is primarily a theoretical measure, useful in relation to unconditional security. A small unicity distance does not necessarily imply that a block cipher is insecure in practice. For example, consider a 64-bit block cipher with a unicity distance of two ciphertext blocks. It may still be computationally infeasible for a cryptanalyst (of reasonable but bounded computing power) to recover the key, although theoretically there is sufficient information to allow this. The random cipher model (Definition

7.70) is a simplified model of a block cipher providing a reasonable approximation for many purposes, facilitating results on block cipher properties not otherwise easily established (e.g, Fact 771) 7.70 Definition Let C and K be random variables, respectively, denoting the ciphertext block and the key, and let D denote the decryption function. Under the random cipher model, DK (C) is a random variable uniformly distributed over all possible pre-images of C (meaningful messages and otherwise, with and without redundancy). In an intuitive sense, a random cipher as per the model of Definition 7.70 is a random mapping. (A more precise approximation would be as a random permutation) 7.71 Fact Under the random cipher model, the expected unicity distance N0 of a cipher is N0 = H(K)/D, where H(K) is the entropy of the key space (e.g, 64 bits for 264 equiprobable keys), and D is the plaintext redundancy (in bits/character). For a one-time pad, the unbounded entropy of the key space implies, by

Fact 7.71, that the unicity distance is likewise unbounded. This is consistent with the one-time pad being theoretically unbreakable. Data compression reduces redundancy. Fact 771 implies that data compression prior to encryption increases the unicity distance, thus increasing security. If the plaintext contains no redundancy whatsoever, then the unicity distance is infinite; that is, the system is theoretically unbreakable under a ciphertext-only attack. 7.72 Example (unicity distance – transposition cipher) The unicity distance of a simple transposition cipher of period t can be estimated under the random cipher model using Fact 771, and the assumption of plaintext redundancy of D = 3.2 bits/character In this case, H(K)/D = lg(t!)/3.2 and for t = 12 the estimated unicity distance is 9 characters, which is very crude, this being less than one 12-character block. For t = 27, the estimated unicity distance is a more plausible 29 √ characters; this can be computed using Stirling’s

approximation of Fact 2.57(iii) (t! ≈ 2πt(t/e)t , for large t and e = 2718) as H(K)/D = lg(t!)/3.2 ≈ (03t) · lg(t/e)  c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.3 Classical ciphers and historical development 247 7.73 Example (unicity distance – simple substitution) The number of keys for a mono-alphabetic substitution cipher over alphabet A is |K| = s!, where s = |A| For example, s = 26 (Roman alphabet) yields 26! ≈ 4 × 1026 keys. Assuming equiprobable keys, an estimate of the entropy of the key space is then (cf. Example 772) H(K) = lg(26!) ≈ 884 bits Assuming English text with D = 32 bits of redundancy per character (Fact 767), a theoretical estimate of the unicity distance of a simple substitution cipher is H(K)/D = 88.4/32 ≈ 28 characters. This agrees closely with empirical evidence (Fact 768)  (iii) Language statistics Cryptanalysis of classical ciphers typically relies on redundancy in the source language (plaintext). In many

cases a divide-and-conquer approach is possible, whereby the plaintext or key is recovered piece by piece, each facilitating further recovery. Mono-alphabetic substitution on short plaintext blocks (e.g, Roman alphabet characters) is easily defeated by associating ciphertext characters with plaintext characters (Note 7.50) The frequency distribution of individual ciphertext characters can be compared to that of single characters in the source language, as given by Figure 7.5 (estimated from 1964 English text). This is facilitated by grouping plaintext letters by frequency into high, medium, low, and rare classes; focussing on the high-frequency class, evidence supporting trial letter assignments can be obtained by examining how closely hypothesized assignments match those of the plaintext language. Further evidence is available by examination of digram and trigram frequencies. Figure 76 gives the most common English digrams as a percentage of all digrams; note that of 262 = 676

possible digrams, the top 15 account for 27% of all occurrences. Other examples of plaintext redundancy appearing in the ciphertext include associations of vowels with consonants, and repeated letters in pattern words (e.g, “that”, “soon”, “three”) 13 12 11 10 9 8 7 6 5 4 3 2 1 0 % 12.51 9.25 8.04 7.60 7.26 7.09 6.54 6.12 5.49 4.14 3.99 3.06 2.71 2.53 2.30 2.00 1.96 1.92 1.73 1.54 0.99 0.67 0.16 0.11 0.19 0.09 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Figure 7.5: Frequency of single characters in English text 7.74 Note (large blocks preclude statistical analysis) An n-bit block size implies 2n plaintext units (“characters”). Compilation of frequency statistics on plaintext units thus becomes infeasible as the block size of the simple substitution increases; for example, this is clearly infeasible for DES (§7.4), where n = 64 Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 248 Ch. 7 Block Ciphers

Cryptanalysis of simple transposition ciphers is similarly facilitated by source language statistics (see Note 7.47) Cryptanalyzing transposed blocks resembles solving an anagram Attempts to reconstruct common digrams and trigrams are facilitated by frequency statistics. Solutions may be constructed piecewise, with the appearance of digrams and trigrams in trial decryptions confirming (partial) success. 4 % 3.21 3.05 3 2 1.81 2.30 2.13 1.51 132 153 1.90 1.83 1.36 1.28 1.22 130 1.28 1 0 AN AT ED EN ER ES HE IN ON OR RE ST TE TH TI Figure 7.6: Frequency of 15 common digrams in English text Cryptanalysis of polyalphabetic ciphers is possible by various methods, including Kasiski’s method and methods based on the index of coincidence, as discussed below. (iv) Method of Kasiski (vs. polyalphabetic substitution) Kasiski’s method provides a general technique for cryptanalyzing polyalphabetic ciphers with repeated keywords, such as the simple Vigenère cipher (Definition

7.53), based on the following observation: repeated portions of plaintext encrypted with the same portion of the keyword result in identical ciphertext segments. Consequently one expects the number of characters between the beginning of repeated ciphertext segments to be a multiple of the keyword length. Ideally, it suffices to compute the greatest common divisor of the various distances between such repeated segments, but coincidental repeated ciphertext segments may also occur Nonetheless, an analysis (Kasiski examination) of the common factors among all such distances is possible; the largest factor which occurs most commonly is the most likely keyword length. Repeated ciphertext segments of length 4 or longer are most useful, as coincidental repetitions are then less probable. The number of letters in the keyword indicates the number of alphabets t in the polyalphabetic substitution. Ciphertext characters can then be partitioned into t sets, each of which is then the result of a

mono-alphabetic substitution. Trial values for t are confirmed if the frequency distribution of the (candidate) mono-alphabetic groups matches the frequency distribution of the plaintext language. For example, the profile for plaintext English (Figure 7.5) exhibits a long trough characterizing uvwxyz, followed by a spike at a, and preceded by the triple-peak of rst. The resulting mono-alphabetic portions can be solved individually, with additional information available by combining their solution (based on digrams, probable words, etc) If the source language is unknown, comparing the frequency distribution of ciphertext characters to that of candidate languages may allow determination of the source language itself. (v) Index of coincidence (vs. polyalphabetic substitution) The index of coincidence (IC) is a measure of the relative frequency of letters in a ciphertext sample, which facilitates cryptanalysis of polyalphabetic ciphers by allowing determination of the period t (as an

alternative to Kasiski’s method). For concreteness, consider a Vigènere cipher and assume natural language English plaintext. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.3 Classical ciphers and historical development 249 Let the ciphertext alphabet be {a0 , a1 , . , an−1 }, and let pi be the unknown probability that an arbitrarily chosen character in a random ciphertext is ai The measure of roughness measures the deviation of ciphertext characters from a flat frequency distribution as follows: 2 n−1 n−1 X X 1 1 pi − (7.1) MR = = pi 2 − n n i=0 i=0 The minimum value is MRmin = 0, corresponding to a flat distribution (for equiprobable ai , pi = 1/n). The maximum value occurs when the frequency distribution of pi has greatest variability, corresponding to a mono-alphabetic substitution (the plaintext frequency distribution is thenP manifested). Define this maximum value MRmax = κp − 1/n, where κp corresponds to pi 2 when pi are

plaintext frequencies. For English as per Figure 75, the maximum value is MR = κp − 1/n ≈ 0.0658 − 00385 = 00273 (This varies with letter frequency estimates; κp = 0.0667, yielding κp − 1/n = 00282 is commonly cited, and is used in Table 7.1) While MR cannot be computed directly from a ciphertext sample (since the period t is unknown, the mono-alphabetic substitutions cannot be separated), it may be estimated from the frequency distribution of ciphertext characters as follows. P Let fi denote the number of appearances of ai in an L-character ciphertext sample (thus fi = L). The number of pairs of letters among these L is L(L − 1)/2, of which fi (fi − 1)/2 are the pair (ai , ai ) for any fixed character ai . Define IC as the probability that two characters arbitrarily chosen from the given ciphertext sample are equal: Pn−1 fi  Pn−1 i=0 2 i=0 fi (fi − 1) IC =  = (7.2) L L(L − 1) 2 Independent of this given ciphertext Pn−1sample, the probability that two

randomly chosen ciphertextP characters are equal is i=0 pi 2 . Thus (comparing word definitions) IC is an estimate of pi 2 , and by equation (71), thereby an estimate of MR + 1/n Moreover, IC can be directly computed from a ciphertext sample, allowing estimation of MR itself. Since MR varies from 0 to κp − 1/n, one expects IC to range from 1/n (for polyalphabetic substitution with infinite period) to κp (for mono-alphabetic substitution). More precisely, the following result may be established. 7.75 Fact For a polyalphabetic cipher of period t, E(IC) as given below is the expected value of the index of coincidence for a ciphertext string of length L, where n is the number of alphabet characters, κr = 1/n, and κp is given in Table 7.1: E(IC) = 1 L−t t−1 L · · κp + · · κr t L−1 t L−1 (7.3) (p in κp is intended to denote a plaintext frequency distribution, while the r in κr denotes a distribution for random characters.) For Roman-alphabet languages, n = 26 implies

κr = 0.03846; for the Russian Cyrillic alphabet, n = 30 7.76 Example (estimating polyalphabetic period using IC) Tabulating the expected values for IC for periods t = 1, 2, . using Equation (73) (which is essentially independent of L for large L and small t), and comparing this to that obtained from a particular ciphertext using Equation (7.2) allows a crude estimate of the period t of the cipher, eg, whether it is mono-alphabetic or polyalphabetic with small period. Candidate values t in the range thus determined may be tested for correctness by partitioning ciphertext characters into groups of letters separated by t ciphertext positions, and in one or more such groups, comparing the character frequency distribution to that of plaintext.  Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 250 Ch. 7 Block Ciphers Language French Spanish German Italian English Russian κp 0.0778 0.0775 0.0762 0.0738 0.0667 0.0529 Table 7.1: Estimated roughness

constant κp for various languages (see Fact 775) A polyalphabetic period t may be determined either by Example 7.76 or the alternative of Example 7.77, based on the same underlying ideas Once t is determined, the situation is as per after successful completion of the Kasiski method. 7.77 Example (determining period by ciphertext auto-correlation) Given a sample of polyalphabetic ciphertext, the unknown period t may be determined by examining the number of coincidences when the ciphertext is auto-correlated. More specifically, given a ciphertext sample c1 c2 . cL , starting with t = 1, count the total number of occurrences ci = ci+t for 1 ≤ i ≤ L − t. Repeat for t = 2, 3, and tabulate the counts (or plot a bar graph) The actual period t∗ is revealed as follows: for values t that are a multiple of t∗ , the counts will be noticeably higher (easily recognized as spikes on the bar graph). In fact, for L appropriately large, one expects approximately L · κp coincidences in

this case, and significantly fewer in other cases.  In the auto-correlation method of coincidences of Example 7.77, the spikes on the bar graph reveal the period, independent of the source language. Once the period is determined, ciphertext characters from like alphabets can be grouped, and the profile of single-character letter frequencies among these, which differs for each language, may be used to determine the plaintext language. 7.4 DES The Data Encryption Standard (DES) is the most well-known symmetric-key block cipher. Recognized world-wide, it set a precedent in the mid 1970s as the first commercial-grade modern algorithm with openly and fully specified implementation details. It is defined by the American standard FIPS 46–2. 7.41 Product ciphers and Feistel ciphers The design of DES is related to two general concepts: product ciphers and Feistel ciphers. Each involves iterating a common sequence or round of operations. The basic idea of a product cipher (see §1.53) is to

build a complex encryption function by composing several simple operations which offer complementary, but individually insufficient, protection (note cascade ciphers per Definition 7.29 use independent keys) Basic operations include transpositions, translations (eg, XOR) and linear transformations, arithmetic operations, modular multiplication, and simple substitutions. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.4 DES 251 7.78 Definition A product cipher combines two or more transformations in a manner intending that the resulting cipher is more secure than the individual components. 7.79 Definition A substitution-permutation (SP) network is a product cipher composed of a number of stages each involving substitutions and permutations (Figure 7.7) plaintext S S S S S S P S S P ciphertext Figure 7.7: Substitution-permutation (SP) network Many SP networks are iterated ciphers as per Definition 7.80 7.80 Definition An iterated block cipher

is a block cipher involving the sequential repetition of an internal function called a round function. Parameters include the number of rounds r, the block bitsize n, and the bitsize k of the input key K from which r subkeys Ki (round keys) are derived. For invertibility (allowing unique decryption), for each value Ki the round function is a bijection on the round input. 7.81 Definition A Feistel cipher is an iterated cipher mapping a 2t-bit plaintext (L0 , R0 ), for t-bit blocks L0 and R0 , to a ciphertext (Rr , Lr ), through an r-round process where r ≥ 1. K For 1 ≤ i ≤ r, round i maps (Li−1 , Ri−1 ) i (Li , Ri ) as follows: Li = Ri−1 , Ri = Li−1 ⊕f (Ri−1 , Ki ), where each subkey Ki is derived from the cipher key K. Typically in a Feistel cipher, r ≥ 3 and often is even. The Feistel structure specifically orders the ciphertext output as (Rr , Lr ) rather than (Lr , Rr ); the blocks are exchanged from their usual order after the last round. Decryption is thereby

achieved using the same r-round process but with subkeys used in reverse order, Kr through K1 ; for example, the last round is undone by simply repeating it (see Note 7.84) The f function of the Feistel cipher may be a product cipher, though f itself need not be invertible to allow inversion of the Feistel cipher. Figure 7.9(b) illustrates that successive rounds of a Feistel cipher operate on alternating halves of the ciphertext, while the other remains constant Note the round function of Definition 7.81 may also be re-written to eliminate Li : Ri = Ri−2 ⊕f (Ri−1 , Ki ) In this case, the final ciphertext output is (Rr , Rr−1 ), with input labeled (R−1 , R0 ). Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 252 Ch. 7 Block Ciphers 7.42 DES algorithm DES is a Feistel cipher which processes plaintext blocks of n = 64 bits, producing 64-bit ciphertext blocks (Figure 7.8) The effective size of the secret key K is k = 56 bits; more precisely,

the input key K is specified as a 64-bit key, 8 bits of which (bits 8, 16, . , 64) may be used as parity bits. The 256 keys implement (at most) 256 of the 264 ! possible bijections on 64-bit blocks A widely held belief is that the parity bits were introduced to reduce the effective key size from 64 to 56 bits, to intentionally reduce the cost of exhaustive key search by a factor of 256. K K plaintext P 56 P 64 DES 56 ciphertext C C key K C 64 DES−1 P Figure 7.8: DES input-output Full details of DES are given in Algorithm 7.82 and Figures 79 and 710 An overview follows. Encryption proceeds in 16 stages or rounds From the input key K, sixteen 48-bit subkeys Ki are generated, one for each round. Within each round, 8 fixed, carefully selected 6-to-4 bit substitution mappings (S-boxes) Si , collectively denoted S, are used. The 64-bit plaintext is divided into 32-bit halves L0 and R0 . Each round is functionally equivalent, taking 32-bit inputs Li−1 and Ri−1 from the

previous round and producing 32-bit outputs Li and Ri for 1 ≤ i ≤ 16, as follows: Li Ri = Ri−1 ; (7.4) = Li−1 ⊕ f (Ri−1 , Ki ), where f (Ri−1 , Ki ) = P (S(E(Ri−1 ) ⊕ Ki ))(7.5) Here E is a fixed expansion permutation mapping Ri−1 from 32 to 48 bits (all bits are used once; some are used twice). P is another fixed permutation on 32 bits An initial bit permutation (IP) precedes the first round; following the last round, the left and right halves are exchanged and, finally, the resulting string is bit-permuted by the inverse of IP. Decryption involves the same key and algorithm, but with subkeys applied to the internal rounds in the reverse order (Note 7.84) A simplified view is that the right half of each round (after expanding the 32-bit input to 8 characters of 6 bits each) carries out a key-dependent substitution on each of 8 characters, then uses a fixed bit transposition to redistribute the bits of the resulting characters to produce 32 output bits. Algorithm

7.83 specifies how to compute the DES round keys Ki , each of which contains 48 bits of K These operations make use of tables PC1 and PC2 of Table 74, which are called permuted choice 1 and permuted choice 2. To begin, 8 bits (k8 , k16 , , k64 ) of K are discarded (by PC1). The remaining 56 bits are permuted and assigned to two 28-bit variables C and D; and then for 16 iterations, both C and D are rotated either 1 or 2 bits, and 48 bits (Ki ) are selected from the concatenated result. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.4 DES 253 7.82 Algorithm Data Encryption Standard (DES) INPUT: plaintext m1 . m64 ; 64-bit key K = k1 k64 (includes 8 parity bits) OUTPUT: 64-bit ciphertext block C = c1 . c64 (For decryption, see Note 784) 1. (key schedule) Compute sixteen 48-bit round keys Ki from K using Algorithm 783 2. (L0 , R0 ) ← IP(m1 m2 m64 ) (Use IP from Table 72 to permute bits; split the result into left and right 32-bit halves L0

= m58 m50 . m8 , R0 = m57 m49 m7 ) 3. (16 rounds) for i from 1 to 16, compute Li and Ri using Equations (74) and (75) above, computing f (Ri−1 , Ki ) = P (S(E(Ri−1 ) ⊕ Ki )) as follows: (a) Expand Ri−1 = r1 r2 . r32 from 32 to 48 bits using E per Table 73: T ← E(Ri−1 ). (Thus T = r32 r1 r2 r32 r1 ) (b) T 0 ← T ⊕Ki . Represent T 0 as eight 6-bit character strings: (B1 , , B8 ) = T 0. (c) T 00 ← (S1 (B1 ), S2 (B2 ), . S8 (B8 )) (Here Si (Bi ) maps Bi = b1 b2 b6 to the 4-bit entry in row r and column c of Si in Table 7.8, page 260 where r = 2 · b1 + b6 , and b2 b3 b4 b5 is the radix-2 representation of 0 ≤ c ≤ 15. Thus S1 (011011) yields r = 1, c = 13, and output 5, i.e, binary 0101) (d) T 000 ← P (T 00 ). (Use P per Table 73 to permute the 32 bits of T 00 = t1 t2 t32 , yielding t16 t7 . t25 ) 4. b1 b2 b64 ← (R16 , L16 ) (Exchange final blocks L16 , R16 ) 5. C ← IP−1 (b1 b2 b64 ) (Transpose using IP−1 from Table 72; C = b40 b8

b25 ) IP−1 16 56 15 55 14 54 13 53 12 52 11 51 10 50 9 49 IP 58 60 62 64 57 59 61 63 50 52 54 56 49 51 53 55 42 44 46 48 41 43 45 47 34 36 38 40 33 35 37 39 26 28 30 32 25 27 29 31 18 20 22 24 17 19 21 23 10 12 14 16 9 11 13 15 2 4 6 8 1 3 5 7 40 39 38 37 36 35 34 33 8 7 6 5 4 3 2 1 48 47 46 45 44 43 42 41 24 23 22 21 20 19 18 17 64 63 62 61 60 59 58 57 32 31 30 29 28 27 26 25 Table 7.2: DES initial permutation and inverse (IP and IP−1 ) E 32 4 8 12 16 20 24 28 1 5 9 13 17 21 25 29 2 6 10 14 18 22 26 30 P 3 7 11 15 19 23 27 31 4 8 12 16 20 24 28 32 5 9 13 17 21 25 29 1 16 29 1 5 2 32 19 22 7 12 15 18 8 27 13 11 20 28 23 31 24 3 30 4 21 17 26 10 14 9 6 25 Table 7.3: DES per-round functions: expansion E and permutation P Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 254 Ch. 7 Block Ciphers (a) twisted ladder (b) untwisted ladder input m1 m2 · · · input m64 64 initial permutation IP IP 64 L0 R0 48 K1 32

32 R0 L0 32 K1 L0 f f R1 L1 K2 L1 R1 f K2 K3 L2 f R2 f R3 L15 K4 L3 R15 f K16 K16 R15 L15 f f L16 R16 irregular swap R16 R16 L16 L16 64 IP −1 inverse permutation IP −1 64 output output c1 c2 · · · R0 c64 Li = Ri−1 Ri = Li−1 ⊕ f (Ri−1 , Ki ) Figure 7.9: DES computation path c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.4 DES 255 Ki Ri−1 32 expansion 48 E 48 48 8 × 6 bits 6 S1 S2 S3 S4 S5 S6 S7 S8 substitution 4 8 × 4 bits 32 P permutation 32 f (Ri−1 , Ki ) = P (S(E(Ri−1 ) ⊕ Ki )) Figure 7.10: DES inner function f 7.83 Algorithm DES key schedule INPUT: 64-bit key K = k1 . k64 (including 8 odd-parity bits) OUTPUT: sixteen 48-bit keys Ki , 1 ≤ i ≤ 16. 1. Define vi , 1 ≤ i ≤ 16 as follows: vi = 1 for i ∈ {1, 2, 9, 16}; vi = 2 otherwise (These are left-shift values for 28-bit circular rotations below.) 2. T ← PC1(K); represent T as 28-bit halves (C0 , D0 ) (Use

PC1 in Table 74 to select bits from K: C0 = k57 k49 . k36 , D0 = k63 k55 k4 ) 3. For i from 1 to 16, compute Ki as follows: Ci ← (Ci−1 ←- vi ), Di ← (Di−1 ←vi ), Ki ← PC2(Ci , Di ) (Use PC2 in Table 74 to select 48 bits from the concatenation b1 b2 b56 of Ci and Di : Ki = b14 b17 b32 ‘←-’ denotes left circular shift) If decryption is designed as a simple variation of the encryption function, savings result in hardware or software code size. DES achieves this as outlined in Note 784 7.84 Note (DES decryption) DES decryption consists of the encryption algorithm with the same key but reversed key schedule, using in order K16 , K15 , . , K1 (see Note 785) This works as follows (refer to Figure 7.9) The effect of IP−1 is cancelled by IP in decryption, leaving (R16 , L16 ); consider applying round 1 to this input The operation on the left half yields, rather than L0 ⊕f (R0 , K1 ), now R16 ⊕f (L16 , K16 ) which, since L16 = R15 and R16 = L15 ⊕f (R15 ,

K16 ), is equal to L15 ⊕f (R15 , K16 )⊕f (R15 , K16 ) = L15 . Thus round 1 decryption yields (R15 , L15 ), i.e, inverting round 16 Note that the cancellation Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 256 Ch. 7 Block Ciphers 57 1 10 19 63 7 14 21 PC1 49 41 33 25 17 58 50 42 34 26 2 59 51 43 35 11 3 60 52 44 above for Ci ; below for Di 55 47 39 31 23 62 54 46 38 30 6 61 53 45 37 13 5 28 20 12 9 18 27 36 15 22 29 4 14 3 23 16 41 30 44 46 17 28 19 7 52 40 49 42 PC2 11 24 15 6 12 4 27 20 31 37 51 45 39 56 50 36 1 21 26 13 47 33 34 29 5 10 8 2 55 48 53 32 Table 7.4: DES key schedule bit selections (PC1 and PC2) of each round is independent of the definition of f and the specific value of Ki ; the swapping of halves combined with the XOR process is inverted by the second application. The remaining 15 rounds are likewise cancelled one by one in reverse order of application, due to the reversed key schedule. 7.85 Note (DES decryption key

schedule) Subkeys K1 , , K16 may be generated by Algorithm 783 and used in reverse order, or generated in reverse order directly as follows Note that after K16 is generated, the original values of the 28-bit registers C and D are restored (each has rotated 28 bits). Consequently, and due to the choice of shift-values, modifying Algorithm 7.83 as follows generates subkeys in order K16 , , K1 : replace the left-shifts by right-shift rotates; change the shift value v1 to 0. 7.86 Example (DES test vectors) The plaintext “Now is the time for all ”, represented as a string of 8-bit hex characters (7-bit ASCII characters plus leading 0-bit), and encrypted using the DES key specified by the hex string K = 0123456789ABCDEF results in the following plaintext/ciphertext: P = 4E6F772069732074 68652074696D6520 666F7220616C6C20 C = 3FA40E8A984D4815 6A271787AB8883F9 893D51EC4B563B53.  7.43 DES properties and strength There are many desirable characteristics for block ciphers. These

include: each bit of the ciphertext should depend on all bits of the key and all bits of the plaintext; there should be no statistical relationship evident between plaintext and ciphertext; altering any single plaintext or key bit should alter each ciphertext bit with probability 12 ; and altering a ciphertext bit should result in an unpredictable change to the recovered plaintext block. Empirically, DES satisfies these basic objectives. Some known properties and anomalies of DES are given below. (i) Complementation property 7.87 Fact Let E denote DES, and x the bitwise complement of x Then y = EK (x) implies y = EK (x). That is, bitwise complementing both the key K and the plaintext x results in complemented DES ciphertext. Justification: Compare the first round output (see Figure 7.10) to (L0 , R0 ) for the uncomplemented case The combined effect of the plaintext and key being complemented results c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.4 DES

257 in the inputs to the XOR preceding the S-boxes (the expanded Ri−1 and subkey Ki ) both being complemented; this double complementation cancels out in the XOR operation, resulting in S-box inputs, and thus an overall result f (R0 , K1 ), as before. This quantity is then XORed (Figure 7.9) to L0 (previously L0 ), resulting in L1 (rather than L1 ) The same effect follows in the remaining rounds. The complementation property is normally of no help to a cryptanalyst in known-plaintext exhaustive key search. If an adversary has, for a fixed unknown key K, a chosenplaintext set of (x, y) data (P1 , C1 ), (P1 , C2 ), then C2 = EK (P1 ) implies C2 = EK (P1 ) Checking if the key K with plaintext P1 yields either C1 or C2 now rules out two keys with one encryption operation, thus reducing the expected number of keys required before success from 255 to 254 . This is not a practical concern (ii) Weak keys, semi-weak keys, and fixed points If subkeys K1 to K16 are equal, then the reversed and

original schedules create identical subkeys: K1 = K16 , K2 = K15 , and so on. Consequently, the encryption and decryption functions coincide. These are called weak keys (and also: palindromic keys) 7.88 Definition A DES weak key is a key K such that EK (EK (x)) = x for all x, ie, defining an involution. A pair of DES semi-weak keys is a pair (K1 , K2 ) with EK1 (EK2 (x)) = x Encryption with one key of a semi-weak pair operates as does decryption with the other. 7.89 Fact DES has four weak keys and six pairs of semi-weak keys The four DES weak keys are listed in Table 7.5, along with corresponding 28-bit variables C0 and D0 of Algorithm 783; here {0}j represents j repetitions of bit 0 Since C0 and D0 are all-zero or all-one bit vectors, and rotation of these has no effect, it follows that all subkeys Ki are equal and an involution results as noted above. The six pairs of DES semi-weak keys are listed in Table 7.6 Note their defining property (Definition 788) occurs when subkeys K1

through K16 of the first key, respectively, equal subkeys K16 through K1 of the second. This requires that a 1-bit circular left-shift of each of C0 and D0 for the first 56-bit key results in the (C0 , D0 ) pair for the second 56-bit key (see Note 7.84), and thereafter left-rotating Ci and Di one or two bits for the first results in the same value as right-rotating those for the second the same number of positions The values in Table 7.6 satisfy these conditions Given any one 64-bit semi-weak key, its paired semi-weak key may be obtained by splitting it into two halves and rotating each half through 8 bits. 7.90 Fact Let E denote DES For each of the four DES weak keys K, there exist 232 fixed points of EK , i.e, plaintexts x such that EK (x) = x Similarly, four of the twelve semi-weak keys K each have 232 anti-fixed points, i.e, x such that EK (x) = x The four semi-weak keys of Fact 7.90 are in the upper portion of Table 76 These are called anti-palindromic keys, since for these K1 =

K16 , K2 = K15 , and so on. (iii) DES is not a group For a fixed DES key K, DES defines a permutation from {0, 1}64 to {0, 1}64. The set of DES keys defines 256 such (potentially different) permutations. If this set of permutations was closed under composition (i.e, given any two keys K1 , K2 , there exists a third key K3 such that EK3 (x) = EK2 (EK1 (x)) for all x) then multiple encryption would be equivalent to single encryption. Fact 791 states that this is not the case for DES Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 258 Ch. 7 Block Ciphers weak key (hexadecimal) 0101 FEFE 1F1F E0E0 0101 FEFE 1F1F E0E0 0101 FEFE 0E0E F1F1 0101 FEFE 0E0E F1F1 C0 D0 28 {0}28 {1}28 {1}28 {0}28 {0} {1}28 {0}28 {1}28 Table 7.5: Four DES weak keys C0 D0 14 {01} {01}14 {01}14 {01}14 {0}28 {1}28 14 {01} {10}14 {0}28 {1}28 {01}14 {01}14 C0 semi-weak key pair (hexadecimal) 01FE 1FE0 01E0 1FFE 011F E0FE 01FE 1FE0 01E0 1FFE 011F E0FE 01FE 0EF1 01F1

0EFE 010E F1FE 01FE, 0EF1, 01F1, 0EFE, 010E, F1FE, FE01 E01F E001 FE1F 1F01 FEE0 FE01 E01F E001 FE1F 1F01 FEE0 FE01 F10E F101 FE0E 0E01 FEF1 FE01 F10E F101 FE0E 0E01 FEF1 14 {10} {10}14 {10}14 {10}14 {0}28 {1}28 D0 {10}14 {01}14 {0}28 {1}28 {10}14 {10}14 Table 7.6: Six pairs of DES semi-weak keys (one pair per line) 7.91 Fact The set of 256 permutations defined by the 256 DES keys is not closed under functional composition Moreover, a lower bound on the size of the group generated by composing this set of permutations is 102499 The lower bound in Fact 7.91 is important with respect to using DES for multiple encryption If the group generated by functional composition was too small, then multiple encryption would be less secure than otherwise believed. (iv) Linear and differential cryptanalysis of DES Assuming that obtaining enormous numbers of known-plaintext pairs is feasible, linear cryptanalysis provides the most powerful attack on DES to date; it is not, however,

considered a threat to DES in practical environments. Linear cryptanalysis is also possible in a ciphertext-only environment if some underlying plaintext redundancy is known (e.g, parity bits or high-order 0-bits in ASCII characters). Differential cryptanalysis is one of the most general cryptanalytic tools to date against modern iterated block ciphers, including DES, Lucifer, and FEAL among many others. It is, however, primarily a chosen-plaintext attack. Further information on linear and differential cryptanalysis is given in §7.8 7.92 Note (strength of DES) The complexity (see §721) of the best attacks currently known against DES is given in Table 7.7; percentages indicate success rate for specified attack parameters The ‘processing complexity’ column provides only an estimate of the expected cost (operation costs differ across the various attacks); for exhaustive search, the cost is in DES operations. Regarding storage complexity, both linear and differential cryptanalysis

require only negligible storage in the sense that known or chosen texts can be processed individually and discarded, but in a practical attack, storage for accumulated texts would be required if ciphertext was acquired prior to commencing the attack. c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.5 FEAL 259 attack method data complexity known chosen storage complexity processing complexity exhaustive precomputation 1 256 1 (table lookup) exhaustive search 1 negligible 255 43 linear cryptanalysis 2 (85%) 238 (10%) for texts for texts 243 250 differential cryptanalysis 255 247 for texts for texts 247 255 Table 7.7: DES strength against various attacks 7.93 Remark (practicality of attack models) To be meaningful, attack comparisons based on different models (e.g, Table 77) must appropriately weigh the feasibility of extracting (acquiring) enormous amounts of chosen (known) plaintexts, which is considerably more difficult

to arrange than a comparable number of computing cycles on an adversary’s own machine Exhaustive search with one known plaintext-ciphertext pair (for ciphertext-only, see Example 7.28) and 255 DES operations is significantly more feasible in practice (eg, using highly parallelized custom hardware) than linear cryptanalysis (LC) requiring 243 known pairs. While exhaustive search, linear, and differential cryptanalysis allow recovery of a DES key and, therefore, the entire plaintext, the attacks of Note 7.8, which become feasible once about 232 ciphertexts are available, may be more efficient if the goal is to recover only part of the text. 7.5 FEAL The Fast Data Encipherment Algorithm (FEAL) is a family of algorithms which has played a critical role in the development and refinement of various advanced cryptanalytic techniques, including linear and differential cryptanalysis. FEAL-N maps 64-bit plaintext to 64-bit ciphertext blocks under a 64-bit secret key. It is an N -round Feistel

cipher similar to DES (cf. Equations (74), (75)), but with a far simpler f -function, and augmented by initial and final stages which XOR the two data halves as well as XOR subkeys directly onto the data halves. FEAL was designed for speed and simplicity, especially for software on 8-bit microprocessors (e.g, chipcards) It uses byte-oriented operations (8-bit addition mod 256, 2-bit left rotation, and XOR), avoids bit-permutations and table look-ups, and offers small code size. The initial commercially proposed version with 4 rounds (FEAL-4), positioned as a fast alternative to DES, was found to be considerably less secure than expected (see Table 7.10) FEAL-8 was similarly found to offer less security than planned FEAL-16 or FEAL-32 may yet offer security comparable to DES, but throughput decreases as the number of rounds rises. Moreover, whereas the speed of DES implementations can be improved through very large lookup tables, this appears more difficult for FEAL. Algorithm 7.94

specifies FEAL-8 The f -function f (A, Y ) maps an input pair of 32 × 16 bits to a 32-bit output. Within the f function, two byte-oriented data substitutions (Sboxes) S0 and S1 are each used twice; each maps a pair of 8-bit inputs to an 8-bit output Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 260 Ch. 7 Block Ciphers row [0] [1] [2] [3] [0] [1] [2] [3] 14 0 4 15 4 15 1 12 13 7 14 8 1 4 8 2 [0] [1] [2] [3] 15 3 0 13 1 13 14 8 8 4 7 10 14 7 11 1 [0] [1] [2] [3] 10 13 13 1 0 7 6 10 9 0 4 13 14 9 9 0 [0] [1] [2] [3] 7 13 10 3 13 8 6 15 14 11 9 0 3 5 0 6 [0] [1] [2] [3] 2 14 4 11 12 11 2 8 4 2 1 12 1 12 11 7 [0] [1] [2] [3] 12 10 9 4 1 15 14 3 10 4 15 2 15 2 5 12 [0] [1] [2] [3] 4 13 1 6 11 0 4 11 2 11 11 13 14 7 13 8 [0] [1] [2] [3] 13 1 7 2 2 15 11 1 8 13 4 14 4 8 1 7 column number [4] [5] [6] [7] [8] [9] [10] [11] S1 2 15 11 8 3 10 6 12 14 2 13 1 10 6 12 11 13 6 2 11 15 12 9 7 4 9 1 7 5 11 3 14 S2 6 11 3 4 9

7 2 13 15 2 8 14 12 0 1 10 10 4 13 1 5 8 12 6 3 15 4 2 11 6 7 12 S3 6 3 15 5 1 13 12 7 3 4 6 10 2 8 5 14 8 15 3 0 11 1 2 12 6 9 8 7 4 15 14 3 S4 0 6 9 10 1 2 8 5 6 15 0 3 4 7 2 12 12 11 7 13 15 1 3 14 10 1 13 8 9 4 5 11 S5 7 10 11 6 8 5 3 15 4 7 13 1 5 0 15 10 10 13 7 8 15 9 12 5 1 14 2 13 6 15 0 9 S6 9 2 6 8 0 13 3 4 7 12 9 5 6 1 13 14 2 8 12 3 7 0 4 10 9 5 15 10 11 14 1 7 S7 15 0 8 13 3 12 9 7 4 9 1 10 14 3 5 12 12 3 7 14 10 15 6 8 1 4 10 7 9 5 0 15 S8 6 15 11 1 10 9 3 14 10 3 7 4 12 5 6 11 9 12 14 2 0 6 10 13 4 10 8 13 15 12 9 0 [12] [13] [14] [15] 5 9 3 10 9 5 10 0 0 3 5 6 7 8 0 13 12 6 9 0 0 9 3 5 5 11 2 14 10 5 15 9 11 12 5 11 4 11 10 5 2 15 14 2 8 1 7 12 11 1 5 12 12 10 2 7 4 14 8 2 15 9 4 14 13 3 6 10 0 9 3 4 14 8 0 5 9 6 14 3 14 0 1 6 7 11 13 0 5 3 11 8 11 8 6 13 5 2 0 14 10 15 5 2 6 8 9 3 1 6 2 12 5 0 15 3 0 14 3 5 12 9 5 6 7 2 8 11 Table 7.8: DES S-boxes c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.5 FEAL

261 (see Table 7.9) S0 and S1 add a single bit d ∈ {0, 1} to 8-bit arguments x and y, ignore the carry out of the top bit, and left rotate the result 2 bits (ROT2): Sd (x, y) = ROT 2(x + y + d mod 256) (7.6) The key schedule uses a function fK (A, B) similar to the f -function (see Table 7.9; Ai , Bi , Yi , ti , and Ui are 8-bit variables), mapping two 32-bit inputs to a 32-bit output. t1 = t2 = U1 = U2 = U0 = U3 = U ← f (A, Y ) (A0 ⊕A1 )⊕Y0 (A2 ⊕A3 )⊕Y1 S1 (t1 , t2 ) S0 (t2 , U1 ) S0 (A0 , U1 ) S1 (A3 , U2 ) U ← fK (A, B) A0 ⊕A1 A2 ⊕A3 S1 (t1 , t2 ⊕B0 ) S0 (t2 , U1 ⊕B1 ) S0 (A0 , U1 ⊕B2 ) S1 (A3 , U2 ⊕B3 ) Table 7.9: Output U = (U0 , U1 , U2 , U3 ) for FEAL functions f , fK (Algorithm 794) As the operations of 2-bit rotation and XOR are both linear, the only nonlinear elementary operation in FEAL is addition mod 256. 7.94 Algorithm Fast Data Encipherment Algorithm (FEAL-8) INPUT: 64-bit plaintext M = m1 . m64 ; 64-bit key K = k1 k64 OUTPUT:

64-bit ciphertext block C = c1 . c64 (For decryption, see Note 796) 1. (key schedule) Compute sixteen 16-bit subkeys Ki from K using Algorithm 795 2. Define ML = m1 · · · m32 , MR = m33 · · · m64 3. (L0 , R0 ) ← (ML , MR ) ⊕ ((K8 , K9 ), (K10 , K11 )) (XOR initial subkeys) 4. R0 ← R0 ⊕L0 5. For i from 1 to 8 do: Li ← Ri−1 , Ri ← Li−1 ⊕f (Ri−1 , Ki−1 ) (Use Table 79 for f (A, Y ) with A = Ri−1 = (A0 , A1 , A2 , A3 ) and Y = Ki−1 = (Y0 , Y1 ).) 6. L8 ← L8 ⊕R8 7. (R8 , L8 ) ← (R8 , L8 ) ⊕ ((K12 , K13 ), (K14 , K15 )) (XOR final subkeys) 8. C ← (R8 , L8 ) (Note the order of the final blocks is exchanged) 7.95 Algorithm FEAL-8 key schedule INPUT: 64-bit key K = k1 . k64 OUTPUT: 256-bit extended key (16-bit subkeys Ki , 0 ≤ i ≤ 15). 1. (initialize) U (−2) ← 0, U (−1) ← k1 k32 , U (0) ← k33 k64 def 2. U = (U0 , U1 , U2 , U3 ) for 8-bit Ui Compute K0 , , K15 as i runs from 1 to 8: (a) U ← fK (U (i−2) , U (i−1)

⊕U (i−3) ). (fK is defined in Table 79, where A and B denote 4-byte vectors (A0 , A1 , A2 , A3 ), (B0 , B1 , B2 , B3 ).) (b) K2i−2 = (U0 , U1 ), K2i−1 = (U2 , U3 ), U (i) ← U . 7.96 Note (FEAL decryption) Decryption may be achieved using Algorithm 794 with the same key K and ciphertext C = (R8 , L8 ) as the plaintext input M , but with the key schedule reversed. More specifically, subkeys ((K12 , K13 ), (K14 , K15 )) are used for the initial XOR (step 3), ((K8 , K9 ), (K10 , K11 )) for the final XOR (step 7), and the round keys are used from K7 back to K0 (step 5). This is directly analogous to decryption for DES (Note 784) Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 262 Ch. 7 Block Ciphers 7.97 Note (FEAL-N) FEAL with 64-bit key can be generalized to N -rounds, N even N = 2x is recommended; x = 3 yields FEAL-8 (Algorithm 7.94) FEAL-N uses N + 8 sixteen-bit subkeys: K0 , . , KN −1 , respectively, in round i; KN , , KN +3 for the

initial XOR; and KN +4 , . KN +7 for the final XOR The key schedule of Algorithm 795 is directly generalized to compute keys K0 through KN +7 as i runs from 1 to (N/2) + 4. 7.98 Note (FEAL-NX) Extending FEAL-N to use a 128-bit key results in FEAL-NX, with altered key schedule as follows The key is split into 64-bit halves (KL , KR ) KR is partitioned into 32-bit halves (KR1 , KR2 ) For 1 ≤ i ≤ (N/2) + 4, define Qi = KR1 ⊕KR2 for i ≡ 1 mod 3; Qi = KR1 for i ≡ 2 mod 3; and Qi = KR2 for i ≡ 0 mod 3. The second argument (U (i−1) ⊕U (i−3) ) to fK in step 2a of Algorithm 7.95 is replaced by U (i−1) ⊕U (i−3) ⊕Qi . For KR = 0, FEAL-NX matches FEAL-N with KL as the 64-bit FEAL-N key K. 7.99 Example (FEAL test vectors) For hex plaintext M = 00000000 00000000 and hex key K = 01234567 89ABCDEF, Algorithm 7.95 generates subkeys (K0 , , K7 ) = DF3BCA36 F17C1AEC 45A5B9C7 26EBAD25, (K8 , . , K15 ) = 8B2AECB7 AC509D4C 22CD479B A8D50CB5. Algorithm 794 generates FEAL-8

ciphertext C = CEEF2C86 F2490752. For FEAL-16, the corresponding ciphertext is C 0 = 3ADE0D2A D84D0B6F; for FEAL-32, C 00 = 69B0FAE6 DDED6B0B. For 128-bit key (KL , KR ) with KL = KR = K as above, M has corresponding FEAL-8X ciphertext C 000 = 92BEB65D 0E9382FB.  7.100 Note (strength of FEAL) Table 710 gives various published attacks on FEAL; LC and DC denote linear and differential cryptanalysis, and times are on common personal computers or workstations. attack method data complexity known chosen storage complexity processing complexity FEAL-4 – LC 5 30K bytes 6 minutes FEAL-6 – LC 100 100K bytes 40 minutes FEAL-8 – LC FEAL-8 – DC 224 27 pairs 280K bytes 10 minutes 2 minutes FEAL-16 – DC 229 pairs 230 operations FEAL-24 – DC 245 pairs 246 operations FEAL-32 – DC 66 2 pairs 267 operations Table 7.10: FEAL strength against various attacks c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.6 IDEA 263 7.6

IDEA The cipher named IDEA (International Data Encryption Algorithm) encrypts 64-bit plaintext to 64-bit ciphertext blocks, using a 128-bit input key K. Based in part on a novel generalization of the Feistel structure, it consists of 8 computationally identical rounds fol(r) lowed by an output transformation (see Figure 7.11) Round r uses six 16-bit subkeys Ki , 1 ≤ i ≤ 6, to transform a 64-bit input X into an output of four 16-bit blocks, which are input to the next round. The round 8 output enters the output transformation, employing four (9) additional subkeys Ki , 1 ≤ i ≤ 4 to produce the final ciphertext Y = (Y1 , Y2 , Y3 , Y4 ). All subkeys are derived from K. A dominant design concept in IDEA is mixing operations from three different algebraic groups of 2n elements. The corresponding group operations on sub-blocks a and b of bitlength n = 16 are bitwise XOR: a⊕b; addition mod 2n : (a + b) AND 0xFFFF, denoted ab; and (modified) multiplication mod 2n +1, with 0 ∈ Z2n

associated with 2n ∈ Z2n +1 : a b (see Note 7.104) plaintext (X1 , X2 , X3 , X4 ) X1 16 (r) X2 subkeys Ki 16 (1) 16 X4 16 (1) K1 X3 for round r (1) K2 16 16 K3 (1) 16 16 K4 (1) K5 round 1 t0 (1) K6 t2 MA-box t1 round r (2 ≤ r ≤ 8) (9) (9) K1 K2 16 16 Y1 Y2 (9) (9) K3 K4 16 ciphertext (Y1 , Y2 , Y3 , Y4 ) bitwise XOR Y3 output transformation 16 Y4 addition mod 216 multiplication mod 216 + 1 (with 0 interpreted as 216 ) Figure 7.11: IDEA computation path Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 264 Ch. 7 Block Ciphers 7.101 Algorithm IDEA encryption INPUT: 64-bit plaintext M = m1 . m64 ; 128-bit key K = k1 k128 OUTPUT: 64-bit ciphertext block Y = (Y1 , Y2 , Y3 , Y4 ). (For decryption, see Note 7103) (r) (r) 1. (key schedule) Compute 16-bit subkeys K1 , , K6 for rounds 1 ≤ r ≤ 8, and (9) (9) K1 , . , K4 for the output transformation, using Algorithm 7102 2. (X1 , X2 , X3 , X4 )

← (m1 m16 , m17 m32 , m33 m48 , m49 m64 ), where Xi is a 16-bit data store. 3. For round r from 1 to 8 do: (r) (r) (r) (r) (a) X1 ← X1 K1 , X4 ← X4 K4 , X2 ← X2  K2 , X3 ← X3  K3 . (r) (r) (b) t0 ← K5 (X1 ⊕X3 ), t1 ← K6 (t0  (X2 ⊕X4 )), t2 ← t0  t1 . (c) X1 ← X1 ⊕t1 , X4 ← X4 ⊕t2 , a ← X2 ⊕t2 , X2 ← X3 ⊕t1 , X3 ← a. (9) (9) (9) 4. (output transformation) Y1 ← X1 K1 , Y4 ← X4 K4 , Y2 ← X3  K2 , Y3 ← (9) X2  K3 . 7.102 Algorithm IDEA key schedule (encryption) INPUT: 128-bit key K = k1 . k128 (r) OUTPUT: 52 16-bit key sub-blocks Ki for 8 rounds r and the output transformation. (1) (1) (2) (2) (8) (8) (9) (9) 1. Order the subkeys K1 K6 , K1 K6 , , K1 K6 , K1 K4 2. Partition K into eight 16-bit blocks; assign these directly to the first 8 subkeys 3. Do the following until all 52 subkeys are assigned: cyclic shift K left 25 bits; partition the result into 8 blocks; assign these blocks to the next

8 subkeys The key schedule of Algorithm 7.102 may be converted into a table which lists, for each of the 52 keys blocks, which 16 (consecutive) bits of the input key K form it. 7.103 Note (IDEA decryption) Decryption is achieved using Algorithm 7101 with the ciphertext Y provided as input M , and the same encryption key K, but the following change (r) to the key schedule. First use K to derive all encryption subkeys Ki ; from these com(r) (r) (r) pute the decryption subkeys K 0 i per Table 7.11; then use K 0 i in place of Ki in Algo16 rithm 7.101 In Table 711, −Ki denotes the additive inverse (mod 2 ) of Ki : the integer u = (216 − Ki ) AND 0xFFFF, 0 ≤ u ≤ 216 − 1. Ki−1 denotes the multiplicative inverse (mod 216 + 1) of Ki , also in {0, 1, . , 216 − 1}, derivable by the Extended Euclidean algorithm (Algorithm 2107), which on inputs a ≥ b ≥ 0 returns integers x and y such that ax + by = gcd(a, b). Using a = 216 + 1 and b = Ki , the gcd is always 1 (except for Ki =

0, addressed separately) and thus Ki−1 = y, or 216 + 1 + y if y < 0. When Ki = 0, this input is mapped to 216 (since the inverse is defined by Ki Ki−1 = 1; see Note 7.104) and (216 )−1 = 216 is then defined to give Ki−1 = 0. 7.104 Note (definition of ) In IDEA, a b corresponds to a (modified) multiplication, modulo 216 +1, of unsigned 16-bit integers a and b, where 0 ∈ Z216 is associated with 216 ∈ Z∗216 +1 as follows:2 if a = 0 or b = 0, replace it by 216 (which is ≡ −1 mod 216 + 1) prior to modular multiplication; and if the result is 216 , replace this by 0. Thus, maps two 16bit inputs to a 16-bit output Pseudo-code for is as follows (cf Note 7105, for ordinary 2 Thus the operands of are from a set of cardinality 216 (Z∗216 +1 ) as are those of ⊕ and . c 1997 by CRC Press, Inc. See accompanying notice at front of chapter §7.6 IDEA 265 round r r=1 2≤r≤8 r=9 (r) K 01 (10−r) (r) K 02 (r) K 03 (10−r) (10−r) (r) K 04 (10−r)

(r) K 05 (r) K 06 (9−r) (9−r) (K1 )−1 −K2 −K3 (K4 )−1 K5 K6 (10−r) −1 (10−r) (10−r) (10−r) −1 (9−r) (9−r) (K1 ) −K3 −K2 (K4 ) K5 K6 (10−r) −1 (10−r) (10−r) (10−r) −1 (K1 ) −K2 −K3 (K4 ) (r) Table 7.11: IDEA decryption subkeys K 0 i (r) derived from encryption subkeys Ki . multiplication mod 216 + 1), for c a 32-bit unsigned integer: if (a = 0) r ← (0x10001 − b) (since 216 b ≡ −b), elseif (b = 0) r ← (0x10001 − a) (by similar reasoning), else {c ← ab; r ← ((c AND 0xFFFF) − (c >> 16)); if (r < 0) r ← (0x10001 + r)}, with return value (r AND 0xFFFF) in all 3 cases. 7.105 Note (implementing ab mod 2n +1) Multiplication mod 216 +1 may be efficiently implemented as follows, for 0 ≤ a, b ≤ 216 (cf §1434) Let c = ab = c0 · 232 + cH · 216 + cL , where c0 ∈ {0, 1} and 0 ≤ cL , cH < 216 . To compute c0 = c mod (216 + 1), first obtain cL and cH by standard multiplication. For a = b = 216 , note

that c0 = 1, cL = cH = 0, and c0 = (−1)(−1) = 1, since 216 ≡ −1 mod (216 + 1); otherwise, c0 = 0. Consequently, c0 = cL − cH + c0 if cL ≥ cH , while c0 = cL − cH + (216 + 1) if cL < cH (since then −216 < cL − cH < 0). 7.106 Example (IDEA test vectors) Sample data for IDEA encryption of 64-bit plaintext M using 128-bit key K is given in Table 712 All entries are 16-bit values displayed in hexadecimal Table 713 details the corresponding decryption of the resulting 64-bit ciphertext C under the same key K.  r 1 2 3 4 5 6 7 8 9 128-bit key K = (1, 2, 3, 4, 5, 6, 7, 8) (r) (r) (r) (r) (r) K1 K2 K3 K4 K5 0001 0002 0003 0004 0005 0007 0008 0400 0600 0800 0c00 0e00 1000 0200 0010 0018 001c 0020 0004 0008 2800 3000 3800 4000 0800 1800 2000 0070 0080 0010 0030 0040 0050 0060 0000 4000 6000 8000 a000 c000 0080 00c0 0100 0140 (r) K6 0006 0a00 0014 000c 1000 0020 2000 e001 64-bit plaintext M = (0, 1, 2, 3) X1 X2 X3 X4 00f0 00f5 010a 0105 222f 21b5 f45e e959 0f86

39be 8ee8 1173 57df ac58 c65b ba4d 8e81 ba9c f77f 3a4a 6942 9409 e21b 1c64 99d0 c7f6 5331 620e 0a24 0098 ec6b 4925 11fb ed2b 0198 6de5 Table 7.12: IDEA encryption sample: round subkeys and ciphertext (X1 , X2 , X3 , X4 ) 7.107 Note (security of IDEA) For the full 8-round IDEA, other than attacks on weak keys (see page 279), no published attack is better than exhaustive search on the 128-bit key space. The security of IDEA currently appears bounded only by the weaknesses arising from the relatively small (compared to its keylength) blocklength of 64 bits. Handbook of Applied Cryptography by A. Menezes, P van Oorschot and S Vanstone 266 Ch. 7 Block Ciphers r 1 2 3 4 5 6 7 8 9 (r) K01 fe01 fffd a556 554b 332d 4aab aa96 4925 0001 K = (1, 2, 3, 4, 5, 6, 7, 8) (r) (r) (r) (r) K02 K03 K04 K05 ff40 ff00 659a c000 8000 a000 cccc 0000 ffb0 ffc0 52ab 0010 ff90 e000 fe01 0800 c800 d000 fffd 0008 ffe0 ffe4 c001 0010 f000 f200 ff81 0800 fc00 fff8 552b 0005 fffe fffd c001 (r) K06 e001

2000 0020 1000 000c 0014 0a00 0006 C = (11fb,ed2b,0198,6de5) X1 X2 X3 X4 d98d d331 27f6 82b8 bc4d e26b 9449 a576 0aa4 f7ef da9c 24e3 ca46 fe5b dc58 116d 748f 8f08 39da 45cc 3266 045e 2fb5 b02e 0690 050a 00fd 1dfa 0000 0005 0003 000c 0000 0001 0002 0003 Table 7.13: IDEA decryption sample: round subkeys and variables (X1 , X2 , X3 , X4 ) 7.7 SAFER, RC5, and other block ciphers 7.71 SAFER SAFER K-64 (Secure And Fast Encryption Routine, with 64-bit key) is an iterated block cipher with 64-bit plaintext and ciphertext blocks. It consists of r identical rounds followed by an output transformation. The original recommendation of 6 rounds was followed by a recommendation to adopt a slightly modified key schedule (yielding SAFER SK-64, which should be used rather than SAFER K-64 – see Note 7.110) and to use 8 rounds (maximum r = 10). Both key schedules expand the 64-bit external key into 2r +1 subkeys each of 64bits (two for each round plus one for the output transformation) SAFER

consists entirely of simple byte operations, aside from byte-rotations in the key schedule; it is thus suitable for processors with small word size such as chipcards (cf. FEAL) Details of SAFER K-64 are given in Algorithm 7.108 and Figure 712 (see also page 280 regarding SAFER K-128 and SAFER SK-128). The XOR-addition stage beginning each round (identical to the output transformation) XORs bytes 1, 4, 5, and 8 of the (first) round subkey with the respective round input bytes, and respectively adds (mod 256) the remaining 4 subkey bytes to the others. The XOR and addition (mod 256) operations are interchanged in the subsequent addition-XOR stage The S-boxes are an invertible byte-to-byte substitution using one fixed 8-bit bijection (see Note 7.111) A linear transformation f (the Pseudo-Hadamard Transform) used in the 3-level linear layer was specially constructed for rapid diffusion. The introduction of additive key biases in the key schedule eliminates weak keys (cf. DES, IDEA) In

In contrast to Feistel-like and many other ciphers, in SAFER the operations used for encryption differ from those for decryption (see Note 7.113). SAFER may be viewed as an SP network (Definition 7.79). Algorithm 7.108 uses the following definitions (L, R denote left, right 8-bit inputs):
1. f(L, R) = (2L + R, L + R). Addition here is mod 256 (also denoted by ⊞);
2. tables S and Sinv, and the constant table of key biases Bi[j], as per Note 7.111.

[Figure 7.12: SAFER K-64 computation path (r rounds). The 64-bit plaintext (X1, . . . , X8) passes, per round, through a key-mixing stage (bitwise XOR and addition mod 2^8 with subkey K2i−1), a substitution layer alternating S and S^−1, a second key-mixing stage with subkey K2i, and three levels of the Pseudo-Hadamard Transform f(x, y) = (2x ⊞ y, x ⊞ y); an output transformation keyed by K2r+1 yields the 64-bit ciphertext (Y1, . . . , Y8).]

7.108 Algorithm SAFER K-64 encryption (r rounds)

INPUT: r, 6 ≤ r ≤ 10; 64-bit plaintext M = m1 · · · m64 and key K = k1 · · · k64.
OUTPUT: 64-bit ciphertext block Y = (Y1, . . . , Y8). (For decryption, see Note 7.113.)
1. Compute 64-bit subkeys K1, . . . , K2r+1 by Algorithm 7.109 with inputs K and r.
2. (X1, X2, . . . , X8) ← (m1 · · · m8, m9 · · · m16, . . . , m57 · · · m64).
3. For i from 1 to r do: (XOR-addition, S-box, addition-XOR, and 3 linear layers)
   (a) For j = 1, 4, 5, 8: Xj ← Xj ⊕ K2i−1[j]. For j = 2, 3, 6, 7: Xj ← Xj ⊞ K2i−1[j].
   (b) For j = 1, 4, 5, 8: Xj ← S[Xj]. For j = 2, 3, 6, 7: Xj ← Sinv[Xj].
   (c) For j = 1, 4, 5, 8: Xj ← Xj ⊞ K2i[j]. For j = 2, 3, 6, 7: Xj ← Xj ⊕ K2i[j].
   (d) For j = 1, 3, 5, 7: (Xj, Xj+1) ← f(Xj, Xj+1).
   (e) (Y1, Y2) ← f(X1, X3), (Y3, Y4) ← f(X5, X7), (Y5, Y6) ← f(X2, X4), (Y7, Y8) ← f(X6, X8). For j from 1 to 8 do: Xj ← Yj.
   (f) (Y1, Y2) ← f(X1, X3), (Y3, Y4) ← f(X5, X7), (Y5, Y6) ← f(X2, X4), (Y7, Y8) ← f(X6, X8). For j from 1 to 8 do: Xj ← Yj. (This mimics the previous step.)
4. (output transformation): For j = 1, 4, 5, 8: Yj ← Xj ⊕ K2r+1[j]. For j = 2, 3, 6, 7: Yj ← Xj ⊞ K2r+1[j].

7.109 Algorithm SAFER K-64 key schedule

INPUT: 64-bit key K = k1 · · · k64; number of rounds r.
OUTPUT: 64-bit subkeys K1, . . . , K2r+1. Ki[j] is byte j of Ki (numbered left to right).
1. Let R[i] denote an 8-bit data store and let Bi[j] denote byte j of Bi (Note 7.111).
2. (R[1], R[2], . . . , R[8]) ← (k1 · · · k8, k9 · · · k16, . . . , k57 · · · k64).
3. (K1[1], K1[2], . . . , K1[8]) ← (R[1], R[2], . . . , R[8]).
4. For i from 2 to 2r + 1 do: (rotate key bytes left 3 bits, then add in the bias)
   (a) For j from 1 to 8 do: R[j] ← (R[j] ←↩ 3).
   (b) For j from 1 to 8 do: Ki[j] ← R[j] ⊞ Bi[j]. (See Note 7.110.)
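As a cross-check on the two algorithms, here is a compact Python sketch of SAFER K-64 (the function names are mine, and byte positions are 0-based where the text's are 1-based), using the S-box construction of Note 7.111 below. Encrypting the plaintext of Example 7.114 below under key (8, 7, 6, 5, 4, 3, 2, 1) with r = 6 should reproduce the ciphertext given there.

    def safer_tables():
        # S-box of Note 7.111: S[i] = 45^i mod 257, with 256 identified with 0
        S, Sinv = [0] * 256, [0] * 256
        S[0], Sinv[1], t = 1, 0, 1
        for i in range(1, 256):
            t = (45 * t) % 257
            if t == 256:                   # occurs exactly at i = 128
                S[i], Sinv[0] = 0, i
            else:
                S[i], Sinv[t] = t, i
        return S, Sinv

    def safer_k64_key_schedule(key, r):
        # Algorithm 7.109: rotate bytes left 3 bits, add bias B_i[j] = S[S[9i+j]]
        S, _ = safer_tables()
        R, Ks = list(key), [list(key)]
        for i in range(2, 2 * r + 2):
            R = [((x << 3) | (x >> 5)) & 0xFF for x in R]
            Ks.append([(R[j] + S[S[9 * i + j + 1]]) & 0xFF for j in range(8)])
        return Ks                          # Ks[i-1] holds K_i

    def pht(l, r):                         # f(L, R) = (2L + R, L + R) mod 256
        return (2 * l + r) & 0xFF, (l + r) & 0xFF

    def safer_k64_encrypt(m, key, r=6):
        # Algorithm 7.108; m and key are lists of 8 bytes
        S, Sinv = safer_tables()
        Ks, X = safer_k64_key_schedule(key, r), list(m)
        for i in range(1, r + 1):
            Ka, Kb = Ks[2 * i - 2], Ks[2 * i - 1]             # K_{2i-1}, K_{2i}
            for j in (0, 3, 4, 7): X[j] ^= Ka[j]              # (a) XOR-addition
            for j in (1, 2, 5, 6): X[j] = (X[j] + Ka[j]) & 0xFF
            for j in (0, 3, 4, 7): X[j] = S[X[j]]             # (b) substitution
            for j in (1, 2, 5, 6): X[j] = Sinv[X[j]]
            for j in (0, 3, 4, 7): X[j] = (X[j] + Kb[j]) & 0xFF   # (c) addition-XOR
            for j in (1, 2, 5, 6): X[j] ^= Kb[j]
            for j in (0, 2, 4, 6):                            # (d) PHT level 1
                X[j], X[j + 1] = pht(X[j], X[j + 1])
            for _ in range(2):                                # (e), (f) levels 2, 3
                X = [*pht(X[0], X[2]), *pht(X[4], X[6]),
                     *pht(X[1], X[3]), *pht(X[5], X[7])]
        Kout = Ks[2 * r]                                      # output transformation
        for j in (0, 3, 4, 7): X[j] ^= Kout[j]
        for j in (1, 2, 5, 6): X[j] = (X[j] + Kout[j]) & 0xFF
        return X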
7.110 Note (SAFER SK-64 – strengthened key schedule) An improved key schedule for Algorithm 7.108, resulting in SAFER SK-64, involves three changes as follows. (i) After initializing the R[i] in step 1 of Algorithm 7.109, set R[9] ← R[1]⊕R[2]⊕ · · · ⊕R[8]. (ii) Change the upper bound on the loop index in step 4a from 8 to 9. (iii) Replace the iterated line in step 4b by: Ki[j] ← R[((i + j − 2) mod 9) + 1] ⊞ Bi[j]. Thus, key bytes 1, . . . , 8 of R[·] are used for K1; bytes 2, . . . , 9 for K2; bytes 3, . . . , 9, 1 for K3; etc. Here and originally, ⊞ denotes addition mod 256. No attack against SAFER SK-64 better than exhaustive key search is known.

7.111 Note (S-boxes and key biases in SAFER) The S-box, inverse S-box, and key biases for Algorithm 7.108 are constant tables constructed as follows: g ← 45; S[0] ← 1, Sinv[1] ← 0; for i from 1 to 255 do: t ← g · S[i − 1] mod 257, S[i] ← t, Sinv[t] ← i. Finally, S[128] ← 0, Sinv[0] ← 128. (Since g generates Z∗257, S[i] is a bijection on {0, 1, . . . , 255}. Note that g^128 ≡ 256 (mod 257), and associating 256 with 0 makes S a mapping with 8-bit input and output.) The additive key biases are 8-bit constants used in the key schedule (Algorithm 7.109), intended to behave as random numbers, and defined by Bi[j] = S[S[9i + j]] for i from 2 to 2r + 1 and j from 1 to 8. For example: B2 = (22, 115, 59, 30, 142, 112, 189, 134) and B13 = (143, 41, 221, 4, 128, 222, 231, 49).
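The three changes of Note 7.110 amount to little more than a one-line difference in the schedule. A sketch, reusing the hypothetical safer_tables helper from the sketch above:

    def safer_sk64_key_schedule(key, r):
        # Note 7.110: a ninth parity byte, and rotating byte-selection per subkey
        S, _ = safer_tables()
        parity = 0
        for x in key:
            parity ^= x                    # R[9] = R[1] XOR ... XOR R[8]
        R, Ks = list(key) + [parity], [list(key)]
        for i in range(2, 2 * r + 2):
            R = [((x << 3) | (x >> 5)) & 0xFF for x in R]    # rotate all 9 bytes
            # K_i[j] = R[((i + j - 2) mod 9) + 1] + B_i[j] mod 256 (1-based in text)
            Ks.append([(R[(i + j - 1) % 9] + S[S[9 * i + j + 1]]) & 0xFF
                       for j in range(8)])
        return Ks

Substituting this schedule into the encryption sketch should reproduce the SK-64 test vector of Example 7.114 below.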
7.112 Remark (S-box mapping) The S-box of Note 7.111 is based on the function S(x) = g^x mod 257 using the primitive element g = 45 ∈ Z257. This mapping is nonlinear with respect to both Z257 arithmetic and the vector space of 8-tuples over F2 under the XOR operation. The inverse S-box is based on the base-g logarithm function.

7.113 Note (SAFER K-64 decryption) For decryption of Algorithm 7.108, the same key K and subkeys Ki are used as for encryption. Each encryption step is undone in reverse order, from last to first. Begin with an input transformation (XOR-subtraction stage) with key K2r+1 to undo the output transformation, replacing modular addition with subtraction. Follow with r decryption rounds using keys K2r through K1 (two per round), inverting each round in turn. Each starts with a 3-stage inverse linear layer using finv(L, R) = (L − R, 2R − L), with subtraction here mod 256, in a 3-step sequence defined as follows (to invert the byte-permutations between encryption stages): Level 1 (for j = 1, 3, 5, 7): (Xj, Xj+1) ← finv(Xj, Xj+1). Levels 2 and 3 (each): (Y1, Y2) ← finv(X1, X5), (Y3, Y4) ← finv(X2, X6), (Y5, Y6) ← finv(X3, X7), (Y7, Y8) ← finv(X4, X8); for j from 1 to 8 do: Xj ← Yj. A subtraction-XOR stage follows (replace modular addition with subtraction), then an inverse substitution stage (exchange S and S^−1), and an XOR-subtraction stage.
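That finv undoes f can be checked exhaustively, since both maps act on a 16-bit domain. A standalone snippet (the names are mine):

    def pht(l, r):                         # f(L, R) = (2L + R, L + R) mod 256
        return (2 * l + r) & 0xFF, (l + r) & 0xFF

    def pht_inv(l, r):                     # finv(L, R) = (L - R, 2R - L) mod 256
        return (l - r) & 0xFF, (2 * r - l) & 0xFF

    # exhaustive verification over all 65536 byte pairs
    assert all(pht_inv(*pht(l, r)) == (l, r)
               for l in range(256) for r in range(256))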
7.114 Example (SAFER test vectors) Using 6-round SAFER K-64 (Algorithm 7.108) on the 64-bit plaintext M = (1, 2, 3, 4, 5, 6, 7, 8) with the key K = (8, 7, 6, 5, 4, 3, 2, 1) results in the ciphertext C = (200, 242, 156, 221, 135, 120, 62, 217), written as 8 bytes in decimal. Using 6-round SAFER SK-64 (Note 7.110) on the plaintext M above with the key K = (1, 2, 3, 4, 5, 6, 7, 8) results in the ciphertext C = (95, 206, 155, 162, 5, 132, 56, 199). □
7.7.2 RC5

The RC5 block cipher has a word-oriented architecture for variable word sizes w = 16, 32, or 64 bits. It has an extremely compact description, and is suitable for hardware or software. The number of rounds r and the key byte-length b are also variable. It is successively more completely identified as RC5–w, RC5–w/r, and RC5–w/r/b. RC5-32/12/16 is considered a common choice of parameters; r = 12 rounds are recommended for RC5–32, and r = 16 for RC5–64. Algorithm 7.115 specifies RC5. Plaintext and ciphertext are blocks of bitlength 2w.
Each of r rounds updates both w-bit data halves, using 2 subkeys in an input transformation and 2 more for each round. The only operations used, all on w-bit words, are addition mod 2^w (⊞), XOR (⊕), and rotations (left ←↩ and right ↪→). The XOR operation is linear, while the addition may be considered nonlinear depending on the metric for linearity. The data-dependent rotations featured in RC5 are the main nonlinear operation used: x ←↩ y denotes cyclically shifting a w-bit word left y bits; the rotation-count y may be reduced mod w (the low-order lg(w) bits of y suffice). The key schedule expands a key of b bytes into 2r + 2 subkeys Ki of w bits each. Regarding packing/unpacking bytes into words, the byte order is little-endian: for w = 32, the first plaintext byte goes in the low-order end of A, the fourth in A's high-order end, the fifth in B's low-order end, and so on.
7.115 Algorithm RC5 encryption (w-bit wordsize, r rounds, b-byte key)

INPUT: 2w-bit plaintext M = (A, B); r; key K = K[0] . . . K[b − 1].
OUTPUT: 2w-bit ciphertext C. (For decryption, see Note 7.117.)
1. Compute 2r + 2 subkeys K0, . . . , K2r+1 by Algorithm 7.116 from inputs K and r.
2. A ← A ⊞ K0, B ← B ⊞ K1. (Use addition modulo 2^w.)
3. For i from 1 to r do: A ← ((A⊕B) ←↩ B) ⊞ K2i, B ← ((B⊕A) ←↩ A) ⊞ K2i+1.
4. The output is C ← (A, B).

7.116 Algorithm RC5 key schedule

INPUT: word bitsize w; number of rounds r; b-byte key K[0] . . . K[b − 1].
OUTPUT: subkeys K0, . . . , K2r+1 (where Ki is w bits).
1. Let u = w/8 (number of bytes per word) and c = ⌈b/u⌉ (number of words K fills). Pad K on the right with zero-bytes if necessary to achieve a byte-count divisible by u (i.e., K[j] ← 0 for b ≤ j ≤ c · u − 1). For i from 0 to c − 1 do: Li ← Σ_{j=0}^{u−1} 2^{8j} K[i · u + j] (i.e., fill Li low-order to high-order byte using each byte of K[·] once).
2. K0 ← Pw; for i from 1 to 2r + 1 do: Ki ← Ki−1 ⊞ Qw. (Use Table 7.14.)
3. i ← 0, j ← 0, A ← 0, B ← 0, t ← max(c, 2r + 2). For s from 1 to 3t do:
   (a) Ki ← (Ki ⊞ A ⊞ B) ←↩ 3, A ← Ki, i ← i + 1 mod (2r + 2).
   (b) Lj ← (Lj ⊞ A ⊞ B) ←↩ (A ⊞ B), B ← Lj, j ← j + 1 mod c.
4. The output is K0, K1, . . . , K2r+1. (The Li are not used.)

7.117 Note (RC5 decryption) Decryption uses the Algorithm 7.115 subkeys, operating on ciphertext C = (A, B) as follows (subtraction is mod 2^w, denoted ⊟). For i from r down to 1 do: B ← ((B ⊟ K2i+1) ↪→ A)⊕A, A ← ((A ⊟ K2i) ↪→ B)⊕B. Finally M ← (A ⊟ K0, B ⊟ K1).

  w    Pw                  Qw
  16   B7E1                9E37
  32   B7E15163            9E3779B9
  64   B7E151628AED2A6B    9E3779B97F4A7C15

Table 7.14: RC5 magic constants (given as hex strings).
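A compact Python sketch of Algorithms 7.115 and 7.116 and Note 7.117 for w = 32 follows (the function names are mine; bytes are packed into words little-endian, per the convention stated before Algorithm 7.115):

    M32 = 0xFFFFFFFF
    P32, Q32 = 0xB7E15163, 0x9E3779B9       # magic constants, Table 7.14

    def rol(x, s):                          # rotate a 32-bit word left s bits
        s &= 31
        return ((x << s) | (x >> (32 - s))) & M32 if s else x

    def ror(x, s):                          # rotate a 32-bit word right s bits
        s &= 31
        return ((x >> s) | (x << (32 - s))) & M32 if s else x

    def rc5_key_schedule(key, r):           # key: list of b bytes
        u, b = 4, len(key)
        c = max(1, -(-b // u))              # number of words K fills
        L = [0] * c
        for i in range(b - 1, -1, -1):      # fill words low-order byte first
            L[i // u] = ((L[i // u] << 8) | key[i]) & M32
        K = [(P32 + i * Q32) & M32 for i in range(2 * r + 2)]
        A = B = i = j = 0
        for _ in range(3 * max(c, 2 * r + 2)):   # mixing loop, step 3
            A = K[i] = rol((K[i] + A + B) & M32, 3)
            B = L[j] = rol((L[j] + A + B) & M32, A + B)
            i, j = (i + 1) % (2 * r + 2), (j + 1) % c
        return K

    def rc5_encrypt(A, B, K, r):
        A, B = (A + K[0]) & M32, (B + K[1]) & M32
        for i in range(1, r + 1):
            A = (rol(A ^ B, B) + K[2 * i]) & M32
            B = (rol(B ^ A, A) + K[2 * i + 1]) & M32
        return A, B

    def rc5_decrypt(A, B, K, r):            # Note 7.117: undo steps in reverse
        for i in range(r, 0, -1):
            B = ror((B - K[2 * i + 1]) & M32, A) ^ A
            A = ror((A - K[2 * i]) & M32, B) ^ B
        return (A - K[0]) & M32, (B - K[1]) & M32

With the 16 key bytes of Example 7.118 below taken in the order written (52, 69, F1, 49, . . . ) and r = 12, rc5_encrypt(0x65C178B2, 0x84D197CC, K, 12) should return the ciphertext words of that example under this byte-order assumption, and rc5_decrypt inverts it.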
7.118 Example (RC5–32/12/16 test vectors) For the hexadecimal plaintext M = 65C178B2 84D197CC and key K = 5269F149 D41BA015 2497574D 7F153125, RC5 with w = 32, r = 12, and b = 16 generates ciphertext C = EB44E415 DA319824. □

7.7.3 Other block ciphers
LOKI'91 (and earlier, LOKI'89) was proposed as a DES alternative with a larger 64-bit key, a matching 64-bit blocksize, and 16 rounds. It differs from DES mainly in key-scheduling and the f-function. The f-function of each round uses four identical 12-to-8 bit S-boxes, 4 input bits of which select one of 16 functions, each of which implements exponentiation with a fixed exponent in a different representation of GF(2^8). While no significant exploitable weaknesses have been found in LOKI'91 when used for encryption, related-key attacks (see page 281) are viewed as a certificational weakness.

Khufu and Khafre are DES-like ciphers which were proposed as fast software-oriented alternatives to DES. They have 64-bit blocks, 8 × 32 bit S-boxes, and a variable number of rounds (typically 16, 24, or 32). Khufu keys may be up to 512 bits.
Khafre keys have bitlength that is a multiple of 64 (64- and 128-bit keys are typical); 64 key bits are XORed onto the data block before the first and thereafter following every 8 rounds. Whereas a DES round involves eight 6-to-4 bit S-boxes, one round of Khufu involves a single 8-to-32 bit table look-up, with a different S-box for every 8 rounds. The S-boxes are generated pseudorandomly from the user key. Khafre uses fixed S-boxes generated pseudorandomly from an initial S-box constructed from random numbers published by the RAND corporation in 1955. Under the best currently known attacks, 16-round Khufu and 24-round Khafre are each more difficult to break than DES.

7.8 Notes and further references

§7.1
The extensive and particularly readable survey by Diffie and Hellman [347], providing a broad introduction to cryptography especially noteworthy for its treatment of Hagelin and rotor machines and the valuable annotated bibliography circa 1979, is a source for much of the material in §7.2, §7.3, and §7.4 herein.

Aside from the appearance of DES [396] in the mid 1970s and FEAL [884] later in the 1980s, prior to 1990 few fully-specified serious symmetric block cipher proposals were widely available or discussed. (See Chapter 15 for Pohlig and Hellman's 1978 discrete exponentiation cipher.) With the increasing feasibility of exhaustive search on 56-bit DES keys, the period 1990–1995 resulted in a large number of proposals, beginning with PES [728], the preliminary version of IDEA [730]. The Fast Software Encryption workshops (Cambridge, U.K., Dec. 1993; Leuven, Belgium, Dec. 1994; and again Cambridge, Feb. 1996) were a major stimulus and forum for new proposals. The most significant cryptanalytic advances over the 1990–1995 period were Matsui's linear cryptanalysis [796, 795], and the differential cryptanalysis of Biham and Shamir [138] (see also [134, 139]). Extensions of these included the differential-linear analysis by Langford and Hellman [741], and the truncated differential analysis of Knudsen [686].

For additional background on linear cryptanalysis, see Biham [132]; see also Matsui and Yamagishi [798] for a preliminary version of the method. Additional background on differential cryptanalysis is provided by many authors including Lai [726], Lai, Massey, and Murphy [730], and Coppersmith [271]; although more efficient 6-round attacks are known, Stinson [1178] provides detailed examples of attacks on 3-round and 6-round DES. Regarding both linear and differential cryptanalysis, see also Knudsen [684] and Kaliski and Yin [656].

§7.2
Lai [726, Chapter 2] provides an excellent concise introduction to block ciphers, including a lucid discussion of design principles (recommended for all block cipher designers). Regarding text dictionary and matching ciphertext attacks (Note 7.8), see Coppersmith, Johnson, and Matyas [278]. Rivest and Sherman [1061] provide a unified framework for randomized encryption (Definition 7.3); a common example is the use of random “salt” appended to passwords prior to password encryption in some operating systems (§10.2.3).

Fact 7.9 is due to Shannon [1121], whose contributions are many (see below). The four basic modes of operation (including k-bit OFB feedback) were originally defined specifically for DES in 1980 by FIPS 81 [398] and in 1983 by ANSI X3.106 [34], while ISO 8732 [578] and ISO/IEC 10116 [604], respectively, defined these modes for general 64-bit and general n-bit block ciphers, mandating n-bit OFB feedback (see also Chapter 15). Brassard [192] gives a concise summary of modes of operation; Davies and Price [308] provide a comprehensive discussion, including OFB cycling (Note 7.24; see also Jueneman [643] and Davies and Parkin [307]), and a method for encrypting incomplete CBC final blocks without data expansion, which is important if plaintext must be encrypted and returned into its original store.

See Voydock and Kent [1225] for additional requirements on IVs. Recommending r = s for maximum strength, ISO/IEC 10116 [604] specifies the CFB variation of Example 7.19, and provides extensive discussion of properties of the various modes. The counter mode (Example 7.23) was suggested by Diffie and Hellman [347]. The 1977 exhaustive DES key search machine (Example 7.27) proposed by Diffie and Hellman [346] contained 10^6 DES chips, with estimated cost US$20 million (1977 technology) and 12-hour expected search time; Diffie later revised the estimate upwards one order of magnitude in a BNR Inc. report (US$50 million machine, 2-day expected search time, 1980 technology). Diffie and Hellman noted the feasibility of a ciphertext-only attack (Example 7.28), and that attempting to preclude exhaustive search by changing DES keys more frequently, at best, doubles the expected search time before success. Subsequently Wiener [1241] provided a gate-level design for a US$1 million machine (1993 technology) using 57 600 DES chips with expected success in 3.5 hours.

Each chip contains 16 pipelined stages, each stage completing in one clock tick at 50 MHz; a chip with a full pipeline completes a key test every 20 nanoseconds, providing a machine 57 600 × 50 times faster than the 1142 years noted in FIPS 74 [397] as the time required to check 2^55 keys if one key can be tested each microsecond. Comparable key search machines of equivalent cost by Eberle [362] and Wayner [1231] are, respectively, 55 and 200 times slower, although the former does not require a chip design, and the latter uses a general-purpose machine. Wiener also noted adaptations of the ECB known-plaintext attack to other 64-bit modes (CBC, OFB, CFB) and 1-bit and 8-bit CFB. Even and Goldreich [376] discuss the unicity distance of cascade ciphers under known-plaintext attack (Fact 7.35), present a generalized time-memory meet-in-the-middle tradeoff (Note 7.38), and give several other concise results on cascades, including that under reasonable assumptions, the number of permutations realizable by a cascade of L random cipher stages is, with high probability, 2^{Lk}.

Diffie and Hellman [346] noted the meet-in-the-middle attack on double encryption (Fact 7.33), motivating their recommendation that multiple encipherment, if used, should be at least three-fold; Hoffman [558] credits them with suggesting E-E-E triple encryption with three independent keys. Merkle's June 1979 thesis [850] explains the attack on two-key triple-encryption of Fact 7.39 (see also Merkle and Hellman [858]), and after noting Tuchman's proposal of two-key E-D-E triple encryption in a June 1978 conference talk (National Computer Conference, Anaheim, CA; see also [1199]), recommended that E-D-E be used with three independent keys: $E_{K3}(E^{-1}_{K2}(E_{K1}(x)))$. The two-key E-D-E idea, adopted in ANSI X9.17 [37] and ISO 8732 [578], was reportedly conceived circa April 1977 by Tuchman's colleagues, Matyas and Meyer.

The attack of Fact 7.40 is due to van Oorschot and Wiener [1206]. See Coppersmith, Johnson, and Matyas [278] for a proposed construction for a triple-DES algorithm. Other techniques intended to extend the strength of DES include the DESX proposal of Rivest as analyzed by Kilian and Rogaway [672], and the work of Biham and Biryukov [133]. Hellman [549] proposes a time-memory tradeoff for exhaustive key search on a cipher with N = 2^m ciphertexts requiring a chosen-plaintext attack, O(N^{2/3}) time and O(N^{2/3}) space after an O(N) precomputation; search time can be reduced somewhat by use of Rivest's suggestion of distinguished points (see Denning [326, p.100]). Kusuda and Matsumoto [722] recently extended this analysis. Fiat and Naor [393] pursue time-memory tradeoffs for more general functions. Amirazizi and Hellman [25] note that time-memory tradeoff with constant time-memory product offers no asymptotic cost advantage over exhaustive search;

they examine tradeoffs between time, memory, and parallel processing, and using standard parallelization techniques, propose under a simplified model a search machine architecture for which doubling the machine budget (cost) increases the solution rate fourfold. This approach may be applied to exhaustive key search on double encryption, as can the parallel collision search technique of van Oorschot and Wiener [1207, 1208]; see also Quisquater and Delescaille [1017, 1018]. Regarding Note 7.41, see Biham [131] (and earlier [130]) as well as Coppersmith, Johnson, and Matyas [278]. Biham's analysis on DES and FEAL shows that, in many cases, the use of intermediate data as feedback into an intermediate stage reduces security. Fifteen years earlier, reflecting on his chosen-plaintext attack on two-key triple-encryption, Merkle [850, p.149] noted “multiple encryption with any cryptographic system is liable to be much less secure than a system designed originally for the longer key”.

Maurer and Massey [822] formalize Fact 7.42, where “break” means recovering plaintext from ciphertext (under a known-plaintext attack) or recovering the key; the results hold also for chosen-plaintext and chosen-ciphertext attack. They illustrate, however, that the earlier result and commonly-held belief proven by Even and Goldreich [376] – that a cascade is as strong as any of its component ciphers – requires the important qualifying (and non-practical) assumption that an adversary will not exploit statistics of the underlying plaintext; thus, the intuitive result is untrue for most practical ciphertext-only attacks.

§7.3
Kahn [648] is the definitive historical reference for classical ciphers and machines up to 1967, including much of §7.3 and the notes below. The selection of classical ciphers presented largely follows Shannon's lucid 1949 paper [1121]. Standard references for classical cryptanalysis include Friedman [423], Gaines [436], and Sinkov [1152].

More recent books providing expository material on classical ciphers, machines, and cryptanalytic examples include Beker and Piper [84], Meyer and Matyas [859], Denning [326], and Davies and Price [308]. Polyalphabetic ciphers were invented circa 1467 by the Florentine architect Alberti, who devised a cipher disk with a larger outer and smaller inner wheel, respectively indexed by plaintext and ciphertext characters. Letter alignments defined a simple substitution, modified by rotating the disk after enciphering a few words. The first printed book on cryptography, Polygraphia, written in 1508 by the German monk Trithemius and published in 1518, contains the first tableau – a square table on 24 characters listing all shift substitutions for a fixed ordering of plaintext alphabet characters. Tableau rows were used sequentially to substitute one plaintext character each for 24 letters, where-after the same tableau or one based on a different alphabet ordering was used.

In 1553 Belaso (from Lombardy) suggested using an easily changed key (and key-phrases as memory aids) to define the fixed alphabetic (shift) substitutions in a polyalphabetic substitution. The 1563 book of Porta (from Naples) noted the ordering of tableau letters may define arbitrary substitutions (vs. simply shifted alphabets). Various polyalphabetic auto-key ciphers, wherein the key changes with each message (the alteration depending on the message), were explored in the 16th century, most significantly by the Frenchman B. de Vigenère. His 1586 book Traicté des Chiffres proposed the combined use of a mixed tableau (mixed alphabet on both the tableau top and side) and an auto-keying technique (cf. Example 7.61). A single character served as a priming key to select the tableau row for the first character substitution, where-after the i-th plaintext character determined the alphabet (tableau row) for substituting the next.

The far less secure simple Vigenère cipher (Definition 7.53) is incorrectly attributed to Vigenère. The Playfair cipher (Example 7.51), popularized by L. Playfair in England circa 1854 and invented by the British scientist C. Wheatstone, was used as a British field cipher [648, p.6]. J. Mauborgne (see also the Vernam and PURPLE ciphers below) is credited in 1914 with the first known solution of this digram cipher. The Jefferson cylinder was designed by American statesman T. Jefferson, circa 1790–1800. In 1817, fellow American D. Wadsworth introduced the principle of plaintext and ciphertext alphabets of different lengths. His disk (cf. Alberti above) implemented a cipher similar to Trithemius' polyalphabetic substitution, but wherein the various alphabets were brought into play irregularly in a plaintext-dependent manner, foreshadowing both the polyalphabetic ciphers of later 20th century rotor machines, and the concept of chaining.

The inner disk had 26 letters while the outer had an additional 7 digits; one full revolution of the larger caused the smaller to advance 7 characters into its second revolution. The driving disk was always turned in the same clockwise sense; when the character revealed through an aperture in the plaintext disk matched the next plaintext character, that visible through a corresponding ciphertext aperture indicated the resulting ciphertext. In 1867, Wheatstone displayed an independently devised similar device, thereafter called the Wheatstone disc, receiving greater attention although less secure (having disks of respectively 26 and 27 characters, the extra character a plaintext space). Vernam [1222] recorded his idea for telegraph encryption in 1917; a patent filed in September 1918 was issued July 1919. Vernam's device combined a stream of plaintext (5-bit Baudot-coded) characters, via XOR, with a keystream of 5-bit (key) values, resulting in the Vernam cipher (a term often used for related techniques).

This, the first polyalphabetic substitution automated using electrical impulses, had period equal to the length of the key stream; each 5-bit key value determined one of 32 fixed mono-alphabetic substitutions. Credit for the actual one-time system goes to J. Mauborgne (U.S. Army) who, after seeing Vernam's device with a repeated tape, realized that use of a random, non-repeated key improved security. While Vernam's device was a commercial failure, a related German system engineered by W. Kunze, R. Schauffler, and E. Langlotz was put into practice circa 1921–1923 for German diplomatic communications; their encryption system, which involved manually adding a key string to decimal-coded plaintext, was secured by using as the numerical key a random non-repeating decimal digit stream – the original one-time pad. Pads of 50 numbered sheets were used, each with 48 five-digit groups; no pads were repeated aside from one identical pad for a communicating partner, and no sheet was to be used twice; sheets were destroyed once used.

The Vernam cipher proper, when used as a one-time system, involves only 32 alphabets, but provides more security than rotor machines with a far greater number of alphabets because the latter eventually repeat, whereas there is total randomness (for each plaintext character) in selecting among the 32 Vernam alphabets. The matrix cipher of Example 7.52 was proposed in 1929 by Hill [557], providing a practical method for polygraphic substitution, albeit a linear transformation susceptible to known-plaintext attack. Hill also recognized that using an involution as the encryption mapping allowed the same function to provide decryption. Recent contributions on homophonic substitution include Günther [529] and Jendal, Kuhn, and Massey [636].

Among the unrivalled cryptanalytic contributions of the Russian-born American Friedman is his 1920 Riverbank Publication no. 22 [426] on cryptanalysis using the index of coincidence. Friedman coined the term cryptanalysis in 1920, using it in his 1923 book Elements of Cryptanalysis [425], a 1944 expansion of which, Military Cryptanalysis [423], remains highly recommended. The method of Kasiski (from West Prussia) was originally published in 1863; see Kahn [648, pp.208-213] for a detailed example. The discussion on IC and MR follows that of Denning [326], itself based on Sinkov [1152]. Fact 7.75 follows from a standard expectation computation weighted by κp or κr depending on whether the second of a pair of randomly selected ciphertext characters is from the same ciphertext alphabet or one of the t − 1 remaining alphabets. The values in Table 7.1 are from Kahn [648], and vary somewhat over time as languages evolve. Friedman teaches how to cryptanalyze running-key ciphers in his (circa 1918) Riverbank Publication no. 16, Methods for the Solution of Running-Key Ciphers; the two basic techniques are outlined by Diffie and Hellman [347].

The first is a probable word attack, wherein an attacker guesses an (e.g., 10-character) word hopefully present in the underlying text, and subtracts that word (mod 26) from all possible starting locations in the ciphertext in hopes of finding a recognizable 10-character result, where-after the guessed word (as either partial running-key or plaintext) might be extended using context. Probable-word attacks also apply to polyalphabetic substitution. The second technique is based on the fact that each ciphertext letter c results from a pair of plaintext/running-key letters (mi, m′i), and is most likely to result from such pairs wherein both mi and m′i are high-frequency characters; one isolates the highest-probability pairs for each such ciphertext character value c, makes trial assumptions, and attempts to extend apparently successful guesses by similarly decrypting adjacent ciphertext characters; see Denning [326, p.83] for a partial example.

Diffie and Hellman [347] note Fact 7.59 as an obvious method that is little-used (modern ciphers being more convenient); their suggestion that use of four iterative running keys is unbreakable follows from English being 75% redundant. They also briefly summarize various scrambling techniques (encryption via analog rather than digital methods), noting that analog scramblers are sometimes used in practice due to lower bandwidth and cost requirements, although such known techniques appear relatively insecure (possibly an inherent characteristic) and their use is waning as digital networks become prevalent. Denning [326] tabulates digrams into high, medium, low, and rare classes. Konheim [705, p.24] provides transition probabilities p(t|s), the probability that the next letter is t given that the current character is s in English text, in a table also presented by H. van Tilborg [1210]. Single-letter distributions in plaintext languages other than English are given by Davies and Price [308].

The letter frequencies in Figure 7.5, which should be interpreted only as an estimate, were derived by Meyer and Matyas [859] using excerpts totaling 4 million characters from the 1964 publication: W. Francis, A Standard Sample of Present-Day Edited American English for Use with Digital Computers, Linguistics Dept., Brown University, Providence, Rhode Island, USA. Figure 7.6 is based on data from Konheim [705, p.19] giving an estimated probability distribution of 2-grams in English, derived from a sample of size 67 320 digrams. See Shannon [1122] and Cover and King [285] regarding redundancy and Fact 7.67. While not proven in any concrete manner, Fact 7.68 is noted by Friedman [424] and generally accepted. Unicity distance was defined by Shannon [1121]. Related issues are discussed in detail in various appendices of Meyer and Matyas [859]. Fact 7.71 and the random cipher model are due to Shannon [1121]; see also Hellman [548].

Diffie and Hellman [347] give an instructive overview of rotor machines (see also Denning [326]), and note their use in World War II by the Americans in their highest-level system, the British, and the Germans (Enigma); they also give Fact 7.63 and the number of characters required under ciphertext-only and known-plaintext attacks (Note 7.66). Beker and Piper [84] provide technical details of the Hagelin M-209, as does Kahn [648, pp.427-431], who notes its remarkable compactness and weight: 3.25 × 5.5 × 7 inches and 6 lb (including case); see also Barker [74], Morris [906], and Rivest [1053]. Davies and Price [308] briefly discuss the Enigma, noting it was cryptanalyzed during World War II in Poland, France, and then in the U.K. (Bletchley Park); see also Konheim [705]. The Japanese PURPLE cipher, used during World War II, was a polyalphabetic cipher cryptanalyzed in August 1940 [648, pp.18-23] by Friedman's team in the U.S. Signal Intelligence Service, under (Chief Signal Officer) Mauborgne.

The earlier RED cipher used two rotor arrays; preceding it, the ORANGE system implemented a vowels-to-vowels, consonants-to-consonants cipher using sets of rotors.

§7.4
The concept of fractionation, related to product ciphers, is noted by Feistel [387], Shannon [1121], and Kahn [648, p.344], who identifies this idea in an early product cipher, the WWI German ADFGVX field cipher. As an example, an encryption function might operate on a block of t = 8 plaintext characters in three stages as follows: the first substitutes two symbols for each individual character; the second transposes (mixes) the substituted symbols among themselves; the third re-groups adjacent resulting symbols and maps them back to the plaintext alphabet. The action of the transposition on partial (rather than complete) characters contributes to the strength of the principle.

Shannon [1121, §5 and §23–26] explored the idea of the product of two ciphers, noted the principles of confusion and diffusion (Remark 1.36), and introduced the idea of a mixing transformation F (suggesting a preliminary transposition followed by a sequence of alternating substitution and simple linear operations), and combining ciphers in a product using an intervening transformation F. Transposition and substitution, respectively, rest on the principles of diffusion and confusion. Harpes, Kramer, and Massey [541] discuss a general model for iterated block ciphers (cf. Definition 7.80).

The name Lucifer is associated with two very different algorithms. The first is an SP network described by Feistel [387], which employs (bitwise nonlinear) 4 × 4 invertible S-boxes; the second, closely related to DES (albeit significantly weaker), is described by Smith [1160] (see also Sorkin [1165]). Principles related to both are discussed by Feistel, Notz, and Smith [388]; both are analyzed by Biham and Shamir [138], and the latter in greater detail by Ben-Aroya and Biham [108], whose extension of differential cryptanalysis allows, using 2^36 chosen plaintexts and complexity, attack on 55% of the key space in Smith's Lucifer – still infeasible in practice, but illustrating inferiority to DES despite the longer 128-bit key.

Feistel's product cipher Lucifer [387], instantiated by a blocksize n = 128, consists of an unspecified number of alternating substitution and permutation (transposition) stages, using a fixed (unpublished) n-bit permutation P and 32 parallel identical S-boxes each effecting a mapping S0 or S1 (fixed but unpublished bijections on {0, 1}^4), depending on the value of one key bit; the unpublished key schedule requires 32 bits per S-box stage. Each stage operates on all n bits; decryption is by stage-wise inversion of P and Si.

The structure of so-called Feistel ciphers (Definition 7.81) was first introduced in the Lucifer algorithm of Smith [1160], the direct predecessor of DES. This 16-round algorithm with 128-bit key operates on alternating half-blocks of a 128-bit message block, with a simplified f-function based on two published invertible 4 × 4 bit S-boxes S0 and S1 (cf. above).

Feistel, Notz, and Smith [388] discuss both the abstract Feistel cipher structure (suggesting its use with non-invertible S-boxes) and SP networks based on invertible (distinct) S-boxes. Suggestions for SP networks include the use of single key bits to select one of two mappings (a fixed bijection or its inverse) from both S-boxes and permutation boxes; decryption then uses a reversed key schedule with complemented key. They also noted the multi-round avalanche effect of changing a single input bit, subsequently pursued by Kam and Davida [659] in relation to SP networks and S-boxes having a completeness property: for every pair of bit positions i, j, there must exist at least two input blocks x, y which differ only in bit i and whose outputs differ in at least bit j.

More simply, a function is complete if each output bit depends on all input bits. Webster and Tavares [1233] proposed the more stringent strict avalanche criterion: whenever one input bit is changed, every output bit must change with probability 1/2.

DES resulted from IBM's submission to the 1974 U.S. National Bureau of Standards (NBS) solicitation for encryption algorithms for the protection of computer data. The original specification is the 1977 U.S. Federal Information Processing Standards Publication 46 [396], reprinted in its entirety as Appendix A in Meyer and Matyas [859]. DES is now specified in FIPS 46–2, which succeeded FIPS 46–1; the same cipher is defined in the American standard ANSI X3.92 [33] and referred to as the Data Encryption Algorithm (DEA). Differences between FIPS 46/46–1 and ANSI X3.92 included the following: these earlier FIPS required that DES be implemented in hardware and that the parity bits be used for parity; ANSI X3.92 specifies that the parity bits may be used for parity.

Although no purpose was stated by the DES designers for the permutations IP and IP^{−1}, Preneel et al. [1008] provided some evidence of their cryptographic value in the CFB mode. FIPS 81 [398] specifies the common modes of operation. Davies and Price [308] provide a comprehensive discussion of both DES and modes of operation; see also Diffie and Hellman [347], and the extensive treatment of Meyer and Matyas [859]. The survey of Smid and Branstad [1156] discusses DES, its history, and its use in the U.S. government. Test vectors for various