Home/Magazine Archive/June 2023 (Vol. 66, No. 6)/On the (In)Security of ElGamal in OpenPGP/Full Text

Research Highlights
## On the (In)Security of ElGamal in OpenPGP

Roughly four decades ago, Taher ElGamal put forward what is today one of the most widely known and best understood public key encryption schemes. ElGamal encryption has been used in many different contexts, chiefly among them by the OpenPGP email encryption standard. Despite its simplicity, or perhaps because of it, in reality there is a large degree of ambiguity on several key aspects of the cipher. Each library in the OpenPGP ecosystem seems to have implemented a slightly different "flavor" of ElGamal encryption. While-taken in isolation-each implementation may be secure, we reveal that in the interoperable world of OpenPGP, unforeseen *cross-configuration attacks* become possible. Concretely, we propose different such attacks and show their practical efficacy by recovering plaintexts and even secret keys.

The 1984 ElGamal cryptosystem^{9} is one of the oldest and best-known public key encryption schemes. In the 80s and 90s, it earned wide adoption for being simultaneously efficient and patent-free. Its most prominent use is in OpenPGP,^{6} a standard for consumable and interoperable email security, where it has been the default and most popular encryption option for decades. At the time of writing, still at least one in six registered OpenPGP keys have an ElGamal subkey, with about 1000 new registrations per year.

The ElGamal scheme builds on elegant mathematical structures and can be defined very compactly. This simplicity, together with the opportunity to mature for roughly four decades now, suggests that a crisp specification with clear parameter choices, rules, and algorithms would be present in international standards, in particular in OpenPGP. Surprisingly, this turns out *not* to be the case: Our research reveals that OpenPGP's understanding of ElGamal encryption is open to interpretation, with several choices subject to the discretion of the implementer.

In this paper, we consider *cross-configuration attacks* on OpenPGP. Such attacks emerge when different interpretations ('configurations') of the same standard interact insecurely with each other. Toward identifying such conditions for ElGamal encryption, we need to first understand the universe of OpenPGP interpretations that are used in practice. We approach this challenge from various angles: We carefully study RFC4880^{6} which defines OpenPGP, we inspect the source code of a number of relevant OpenPGP-implementing software libraries and we conduct a large-scale examination of millions of keys registered on OpenPGP key servers.

Our results reveal an insecure posture. For instance, we develop a plaintext recovery attack that can be mounted on ciphertexts produced by the ubiquitous GNU Privacy Guard against keys generated following the original ElGamal specification.^{9} The attack is effective against 2048-bit keys, a bit length that is considered practically secure today. We further illustrate how cross-configuration attacks can be combined with known side-channel exploitation techniques like FLUSH+RELOAD^{23} or PRIME+PROBE.^{20} Concretely, using the example of `gcrypt`

—the cryptographic library underlying the GNU Privacy Guard that has already been fixed twice after seminal work^{14, 23} on side-channel attacks—we demonstrate that if standards conform 2048 bit ElGamal key is used to decrypt a cipher-text, then an attacker that is OS- or VM-colocated with the decrypter might be able to fully recover the decryption key. Given that interoperability is the explicit and almost exclusive goal of any standardization effort, and common-place in the OpenPGP world, we conclude that our cross-configuration attack conditions are as realistic as the attack results in awakening.

This manuscript is organized as follows: In Section 2, we survey (a) the meaningful options available when implementing ElGamal encryption, (b) the options adopted by different cryptographic libraries, and (c) the options picked by over 800,000 users in practice (as far as reflected on key server databases). In Section 3, we recall various standard algorithms for solving discrete logarithms. In Section 4, we describe "vanilla" cross-configuration attacks, and in Section 5, we describe those combined with side-channel attacks. In Section 6, we conduct end-to-end exploits and describe how we bring in the required side-channel information. We conclude in Section 7.

**1.1. Related work**

Since ElGamal encryption was first proposed, a number of research results revealed security issues, both in implementations (e.g., CVE-2018-6829, CVE-2018-6594) and at the specification level. For instance, Bone *et al.*^{5} identified security conditions if ElGamal encryption is used in textbook form. However, as OpenPGP employs ElGamal exclusively for key transport, the attacks do not seem to apply in practice. To the best of our knowledge, our work is the first to challenge the security of ElGamal encryption in the specific way OpenPGP builds and relies on it.

Protecting modular exponentiation from side-channel attacks has been the subject of research for years.^{2,10,11,15} Specifically, cache-based side channels have been explored extensively, using various techniques such as EVICT+TIME,^{4} PRIME+PROBE,^{20} or FLUSH+RELOAD.^{3, 23} Yarom and Falkner^{23} in particular target the `gcrypt`

implementation of modular exponentiation which at the time was using square-and-multiply. Prompted by that work, the implementation was significantly revised. The changes are however not sufficient to protect against the attack we present in Section 5. Most cache-based side-channel attacks target L1 or L2. However, Liu *et al.*^{14} show that exploitation based on last-level cache is possible for attacks targeting both data and instructions. The work also targets the `gcrypt`

implementation of modular exponentiation and the library has also been fixed to avoid leaking table accesses. Once again, however, this is not sufficient to prevent the attack described in Section 5.

After providing an abstract formulation of ElGamal encryption, we present common ways to instantiate the scheme in practice.

Let *G* be a group and *g* ∈ *G* a generator. To create a key pair (*sk, pk*), pick a random integer *x*, compute the element *X* = *g ^{x}*, and output (

To instantiate the scheme, the following details have to be fixed: Which group *G* shall be used? How is generator *g* chosen? Shall *g* generate the full group *G* or just a subgroup? From which sets and according to which distributions are exponents *x, y* picked? Multiple configurations for the parameters are possible and promise to lead to correct and secure public key encryption instances. Below we describe four such configurations that have in common that *G* is a "prime field group," that is, the multiplicative group ℤ^{*}* _{p}* = (ℤ /

**Configuration A: "The original".** The original proposal^{9} demands choosing *g* such that it generates *p* − 1 elements, that is, the full group. Exponents *x, y* are picked uniformly from [1 … *p* − 1]. The only condition on *p* is that *p* − 1 has at least one large prime factor [This is to counter the Pohlig-Hellman attack on Discrete Logarithm Problem (DLP).]

If Configuration A is used, any element in *G* could become a public key. However, some of these public keys would have a lower multiplicative order than others, that is, generate a smaller subgroup, and thus promise less security. (E.g., in the extreme case of *X* = 1, the encryption operation does effectively nothing.) This can be remedied by restricting the ElGamal group operations to a prime order subgroup *G'* ⊊ *G.* Indeed, if *G'* = 〈*g*〉 has prime order, then all its elements necessarily have the same order (with the one exception of the neutral element which is easily tested for). Let hence *p* − 1 = *q*_{0} … *q _{n}* be the prime factor decomposition of ord(

**Configuration B: "ElGamal over Schnorr groups".** Pick a large prime group order *q* and a prime *p* such that *q* | (*p* − 1), choose a generator γ with 〈γ〉 = *G* and let *g* = γ^{(p−1)/q} and *G'* = 〈*g*〉. Note that this implies ord(*G'*) = *q.* Pick exponents *x, y* in [1 … *q* − 1].

This setup satisfies the Pohlig–Hellman condition and ensures that all non-trivial elements of *G'* have the same order. Further, if *q* << *p*, as exponents *x, y* are now picked from [1 … *q* − 1] instead of [1 … *p* − 1], implementations of Configuration B are very efficient.

While Configuration B removes the risk of accidentally relying on small order elements, an adversary can still exploit the existence of small subgroups in an active attack: Observe that for any *t* with *t* | (*p* − 1) there exists an element α of order *t*, that is, with α* ^{t}* = 1 and 1 ∉ {α, α

**Configuration C: "ElGamal over safe primes".** The number of subgroups is minimized: Pick a large prime group order *q* and a prime modulus *p* such that *p* − 1 = 2*q*, choose a generator γ with 〈γ〉 = *G* and let *g* = γ^{(p−1)/q} and *G'* = 〈*g*〉. Pick exponents *x, y* in [1 … *q* − 1]. Note that *p* ≈ *q.* Choosing *g* = 4 = (±2)^{2} is always feasible, leading to smaller parameters.

**Configuration D: "ElGamal over Lim-Lee primes".** All non-trivial subgroups are large: Pick a large prime modulus *p* such that for the prime factorization *p* − 1 = 2*q*_{1} … *q _{n}* all

As exponents in Configuration C became large again, some implementations work with "short exponents" for improved performance. Configuration D maintains good performance without such tricks. However, its parameters are larger. When building a secure system from scratch, it seems advisable to use one of these two options.

**2.1. ElGamal in OpenPGP**

**The standard.** A widely-deployed cryptography standard that suggests using, and mandates implementing, ElGamal encryption is OpenPGP. While its first version was put forward in 1998, the latest official version appeared in 2007 as RFC4880.^{6} We observe that the RFC refers to external documents for the specification of the ElGamal cryptosystem. Precisely, Section 9.1 provides two references that both ultimately resolve to ElGamal's original paper,^{9} that is, our "Configuration A". We note that Sections 5.1 and 5.5, which define how ElGamal ciphertexts, public keys, and secret keys are to be represented as binary strings, add no further conditions on the generation and use of ElGamal keys and ciphertexts.

**OpenPGP libraries.** We inspected how OpenPGP libraries interpret the standard. Here are our findings on three popular ones:

`Go Go`

does not provide code to generate ElGamal keys; it only provides algorithms to encrypt and decrypt. The ephemeral exponent*y*used in encryption is chosen from [0 …*p*− 1]. This fully conforms with RFC4880.`Crypto++`

The`Crypto++`

library follows Configuration C by generating a random safe prime and choosing for the generator the smallest quadratic residue. It though deviates from Configuration C by picking both*x*and*y*from a*short interval*of size roughly twice the security parameter. (See the full version^{7, 8}for details.) As the group generator is not primitive, this setting conflicts with RFC4880.`gcrypt`

The`gcrypt`

library generates a Lim-Lee modulus like in Configuration D; the minimum size of the prime factors*q*| (_{i}*p*− 1) is roughly twice the security parameter. (See the full version^{7, 8}for details.) Unlike in Configuration D, the generator is chosen to be the smallest integer generating the full group ℤBoth exponents^{x}_{p}.*x*and*y*are sampled from*short intervals*of size roughly*q*^{3/2}. Due to the short exponents, this setting conflicts with RFC4880.

**ElGamal parameters in the wild.** To get a complete picture of the use of ElGamal in OpenPGP, it is not sufficient to look at the prominent open-source libraries. Indeed a large portion of the user base relies on proprietary or exotic implementations, which are impossible to track. To address these, we look at the OpenPGP keys registered on public key servers. We analyze an OpenPGP server dump^{1} produced on Jan 15, 2021 subkeys.

An OpenPGP ElGamal public key consists of the triplet (*p, g, X*). This information alone is not sufficient to ascribe the key to one configuration with certainty, however, partial information can be deduced by attempting to factor *p* − 1. For example, safe primes are easily recognized by running a primality test on (*p* − 1)/2, then a quadratic residuosity test reveals whether *g* generates the prime order subgroup or the full group. For random primes of the sizes we look at, it is in general infeasible to obtain a full factorization of *p* − 1, however, partial factorizations and residuosity tests let us at least formulate credible hypotheses on the key generation process.

To classify public keys, we conducted trial division on *p* − 1 with primes up to 2^{25}, then repeatedly applied the elliptic curve factorization method (ECM),^{12} until we felt guilty for the carbon emissions. Then we applied *n*-th residuosity tests to *g* for the factors we found. We did not attempt to gain information on the exponent *x* that defines *X* = *g ^{x}.* In our findings, 69.4% of the keys use safe primes, 25.3% use moduli that were very likely generated using the Lim–Lee method, and 5.0% use what we call "quasi-safe primes", that is, primes of the form

In the next sections, we will need to solve discrete logarithms given partial knowledge of the exponent. We review here the necessary algorithms. In what follows, let *g* be a group generator of order *N*, and let *X* = *g ^{x}* be the element of which we seek the discrete logarithm.

Pollard's Rho algorithm^{17} is the most efficient generic algorithm to compute discrete logarithms. On average, it performs group operations and uses a constant amount of memory. Pollard^{17} also introduced the lesser-known Lambda method, which performs better when it is known that *x* < *B* << *N*, requiring only group operations and *O*(log(*B*)) memory (Section 5.1 of van Oorschot and Wiener^{22}).

When *N* = *LM*, we can compute *x* mod *L* by solving a discrete logarithm in the group of order *L* generated by *g ^{M}.* The Pohlig–Hellman algorithm

We will combine all of these techniques in the case where *N* = *q*_{0} ··· *q _{n}*, and

In Section 5, we will need to solve discrete logarithm instances where some non-adjacent bits of *x* are known. Neither the Rho nor the Lambda method can take advantage of this information, but the simpler baby-step/giant-step (BSGS) method^{19} can. If *x* has *n* unknown bits, BSGS performs 1.5 · 2^{n/2} group operations on average, and stores 2^{n/2} hash table entries. A linear time/memory trade-off is possible in BSGS.

However, as *n* becomes larger, BSGS has two important drawbacks: it uses unrealistically large amounts of memory, and it parallelizes poorly. A better alternative is van Oorschot and Wiener's (vOW) parallel collision search applied to meet-in-the-middle algorithms (Section 5.3 of van Oorschot and Wiener^{22}), which is much more memory efficient, and promises a linear parallel speed-up. Based on their analysis, vOW is expected to require 7 · 2^{3n/4-m/2-1}*n* group operations, where 2* ^{m}* is the amount of storage available, counted as a number of hash table entries, and subject to the constraint

The disagreements on the interpretation of the OpenPGP standard may raise doubts about the interoperability between the libraries. For instance, in an imaginary setting where the `Crypto++`

code is used to generate a key pair, the `Go`

code is used to encrypt a message to the public key, and the `gcrypt`

code is used to decrypt the ciphertext, it has to be asked whether confidentiality is maintained. While in a basic scenario these three libraries can, to the best of our knowledge, in fact, interoperate securely, we shall now see that some choices made by `Crypto++`

and `gcrypt`

prove to be fatal in a broader context.

As van Oorschot and Wiener^{21} have already observed: if (1) *p* − 1 contains enough small factors, and (2) *g* generates the full group (e.g., in Configuration A), then using short exponents in the public key *X* = *g ^{x}* can lead to key recovery, as described in Section 3. As we have seen,

`Crypto++`

uses safe primes, so it is not at risk. Although `gcrypt`

generates However, both `Crypto++`

and `gcrypt`

also use short exponents in the ephemeral key *Y* = *g ^{y}.* But the generator

To recap, whenever: (1) the public key of the receiver defines a generator *g* of a group containing "enough" small-order subgroups, and (2) the sender uses short exponents in the ephemeral key, an attacker who intercepts messages can decrypt any communication from sender to receiver. The exact cost of the attack depends on the number and size of the small subgroups, which we analyze next.

**4.1. Practical exploitation**

Like in Section 3, let *N* = *q*_{0} ··· *q _{n}* =:

To quantify the threat, we look at these parameters for those groups in the OpenPGP key dump that contain small factors. None of the "quasi-safe" moduli are seriously at risk, since *Q* is too small. From the remaining 2158 keys, we narrow down the selection to those that had 2048 bits modulus, were not created before 2016, and were not expired nor revoked, leaving us with 2071 keys. In Figure 1, we plot the (normalized) cumulative counting function

**Figure 1. Normalized cumulative distribution function Φ(λ, ρ) of group orders N that contain a 2^{ρ}-smooth factor Q of at least λ bits.**

which counts, among the 2071 keys, how many are exposed to message recovery with a Rho effort at most and a Lambda effort at most . Note that the factors of a single group order can be arranged in more than one way, and thus appear in more than one bucket.

From ϕ(λ,*ρ*), we obtain the number of keys for which message recovery is possible with a given effort with respect to a given encrypting library. For example, for 2048-bit moduli `gcrypt`

implements the sampling of *y* such that *y* < 2^{344}: limiting the Lambda and Rho efforts to roughly 2^{40} modular multiplications, a computation well within reach of a casual attacker, we find that there are at least ϕ(344-80, 80) = 15 vulnerable keys. Even more concerning is the case of `Crypto++`

, which samples exponents in [1 … 2^{226}]: in this case, the number of weak keys goes up to ϕ(226-80, 80) ≥ 231. It is plausible that a state-level attacker could target as many as ϕ(344-140, 140) ≥ 83 (`gcrypt`

) and ϕ(226-140, 140) ≥ 895 (`Crypto++`

) keys.

To confirm the vulnerability we used GPG to encrypt a message to the only key contributing to ϕ(344-56, 59), expecting to complete the attack using roughly 2^{30} modular multiplications. We were able to recover the plaintext in less than 2.5 hours on a single Intel E5-2640 core clocked at 2.40GHz.

The attack conditions exploited in Section 4 did not depend on implementation flaws but were solely implied by different implementers coming up with different conceptual interpretations of ElGamal encryption. The current section considers cross-configuration attacks that are also based on implementation weaknesses. We focus specifically on runtime leakage of the modular exponentiation algorithm of `gcrypt`

, which represents a crucial component of its ElGamal implementation. Specifically, an exponentiation algorithm (EA) computes *R* = *B ^{x}* from

`gcrypt`

implementation took explicit measures to prevent side-channel leakage (in particular, to protect exponent `gcrypt.`

In isolation, that is, not in the cross-configuration setting, the attack is only practically exploitable against `gcrypt`

-generated keys with moduli up to 1024 bits. However, we show that the vulnerability is more worrisome in an interoperability context where `gcrypt`

decryption is used in combination with a `Crypto++`

key: in this case, we experimentally confirm a practical key recovery attack against a non-negligible fraction of `Crypto++`

keys with 2048 bits modulus. In the full version,`Go`

and `Crypto++`

libraries.The `gcrypt`

library implements its EA via the sliding-window method, which is derived from the basic Square-and-Multiply algorithm. The algorithm depends on a parameter *w* (for "width" or "window length") that is instantiated such that 2 ≤ *w* ≤ 5. The first step of the EA is independent of exponent *x* and tabulates the values *B*^{1}, *B*^{3}, …, *B*^{2w−1}. The second step is independent of base *B* and regroups the exponent bits into an *odd-digit representation* (ODR). Precisely, an ODR of a non-negative integer *x* is a sequence Γ = (γ* _{l}*, …, γ

The exponent *x* is first converted into ODR form that has leading coefficient γ* _{l}* { = 1,

The third step of the EA is a computation-intensive online phase in which the results of the first two steps are combined.

We provide a pseudo-code version of `gcrypt`

's EA in Figure 2. The function *f _{w}* invoked in line 2 (and specified in the full version

**Figure 2. Core of exponentiation routine of gcrypt.**

Despite its built-in protection measures, we identify a side-channel condition in `gcrypt`

's EA that can lead to full exponent recovery. The root of the problem is that the algorithm implements sliding-window exponentiation. Intuitively, this method covers the digits of the bit-representation of the exponent *x* with a sequence of *w*−wide non-overlapping windows such that every 1-bit of the exponent is covered by a window and conditioned on this the number of windows is minimized. In Equation (1), value *L* corresponds with the number of windows used for this, any coefficient *c̄ _{i}* corresponds with the position of the

The approach of our side-channel attack is to closely monitor the execution of line 4: The number of iterations of the loop during one exponentiation immediately reveals the encoding length *L*, and the time taken for the *i*-th iteration is, by line 6, linear in *c _{i}* + 1 and thus leaks, one by one, the coefficients

Before assessing the options for amplifying the extracted information toward recovering the full exponent, we make a final observation on `gcrypt`

's EA. Recall that line 7 tries to hide both the table index *e _{i}* and whether the multiplication in line 8 is with

`gcrypt`

were likely not aware of the fact that the coefficients Now that we retrieve copies of the coefficients *c*_{1}, …, *c*_{L+1}, the next subsections are dedicated to assessing the value of this information toward an attack that recovers the full exponent.

**5.1. Modeling the leakage**

As a warm-up, observe that when *w* = 1 the coefficients *c _{i}* leak the whole exponent, indeed all

Letting *n*+1 = ⌈log_{2}(*x*+1)⌉ be the bit length of *x*, we start by observing that *n* = Σ*c _{i}* and is thus fully known. The sequence of

Given *h, h ^{x}* and the leakage for

To this end, we model the encoding function *f _{w}* as a Markov process: Model

Using the Markov chain central limit theorem, we compute that as *n* → ∞ the entropy *H _{w}*(

**Table 1. Per bit statistics of the entropy H_{w}(x) as n → ∞ for different values of the window width w.**

Of course, these asymptotic approximations only give a rough estimate of the actual probability distribution of *H _{w}*(

`gcrypt`

and other libraries. **5.2. Key recovery**

To assess the practical impact of the leakage of sliding window exponentiation exposed so far we need to look into the exact parameters used by the various libraries. Recall that `gcrypt`

uses short secret exponents. The window size used for exponentiation goes from 1 to 5, depending on the bit size of the exponent. Table 2 summarizes the parameters adopted by `gcrypt`

for some common modulus sizes.

**Table 2. Group (| q|) and secret exponent (|x|) bit sizes, and window size (w) used by **

`gcrypt`

for each modulus size (|p|). E(H_{w}(x)) is the mean leakage entropy estimated using Table 1. "Pollard" indicates the (base 2 log of the) expected running time of Pollard's Rho algorithm in a group of size q, as a number of modular multiplications. "vOW" indicates the expected running time of van Oorschot and Wiener's algorithm using a table of 2^{60} entries.Based on `gcrypt`

's parameters, Table 2 also reports on the mean leakage entropy *H _{w}*(

`gcrypt`

's moduli are parameterized so that the best algorithm for computing discrete logarithms is, indifferently, index calculus in ℤTo exploit the leakage, we resort to vOW parallel collision search (see Section 3), which costs 7.2^{3Hw(x)/4-m/2-1}*H*_{w}(*x*) modular multiplications, where 2* ^{m}* is the amount of RAM available, counted as a number of hash table entries (e.g., for |

The last two columns of Table 2 report the base 2 logarithm of the expected efforts of Pollard's Rho, and of vOW assuming a hash table with at most 2^{60} entries (note that even this "little" memory is quite unrealistic). Comparing the columns, the leakage of windowed exponentiation does not appear to be a serious threat for the average `gcrypt`

secret key. However this assessment is deceiving on two accounts: (a) the standard deviation of *H _{w}*(

`gcrypt`

is assumed to be secure even when it operates on secret keys generated by a different library, as long as these are valid OpenPGP keys. Some libraries, such as `Crypto++`

, generate secret exponents with as few as |Taking for reference the most commonly used modulus size |*p*| = 2048, `Crypto++`

exponents are between 1 and 226 bits long, and the window size used by `gcrypt`

is at most *w* = 3, leading to an average entropy of 98.0 bits. Worse still, more than one in 10,000 keys is expected to have at most 80 bits of leakage entropy: for such a small search space, vOWis expected to only require 2^{50} modular multiplications (roughly 2^{35} Gcycles in software) for a hash table of 2^{35} entries (roughly 4 TiB). We thus conclude that attacking a non-negligible proportion of `Crypto++`

-generated public keys within `gcrypt`

's decryption routine is well within reach of a moderately resourceful adversary.^{f}

**5.3. Proof of concept**

To confirm the reality of the threat, without going as far as spending several thousands of dollars, we implement a simple parallelized BSGS using GMP 6.1.2, and test it on a node equipped with 20 Xeon E5-2640 cores clocked at 2.40GHz and 64GiB of RAM. Due to memory contention, the parallel speed-up is sublinear, nevertheless, we observe a speed-up slightly above 10x using all 20 cores, thus justifying our preference for BSGS over the more complex vOW algorithm which is marred by larger constants. Our implementation uses hash table entries of 16 bytes, independently of the modulus size; the node can thus accommodate for tables with as many as 2^{32} entries, ensuring the running time scales like the square root of the entropy, up to 64 bits of entropy. Figure 3 plots the running time of BSGS for increasing values of *H _{w}*(

`Crypto++`

and `gcrypt`

, indicating what percentage of keys is broken in a given time. Since BSGS is deterministic, we use for the benchmarks the exponent that is tested by BSGS last, thus the expected running time for an average exponent of given entropy is half the time reported in Figure 3.

**Figure 3. Running time of BSGS for increasing values of the entropy H_{w}(x), on our node equipped with 20 cores and 64GiB of RAM. The dashed line indicates extrapolated running times. Also represented the cumulative distribution functions of the normal distributions of parameters (225 · 7/16, 225 · 0.098632) and (339 · 341/640, 339 · 0.126731), approximately corresponding to the entropy distributions of **

`Crypto++`

and `gcrypt`

for a modulus of 2,048 bits.As a final confirmation, we instrument the `Crypto++`

code to generate random secret exponents in a tight loop, and output them along with their leakage entropy. Over 2^{28} samples, we obtained a 226 bits exponent with 62 bits of leakage entropy. Our probing strategy described in Section 6 produces the expected leakage pattern, from which our BSGS code recovers the exponent in approximately 30 min using 20 cores.

In this section, we prototype the attacks described in Section 5 to show that practical exploitation is possible. The attacks are mounted on a Core i7-8650U CPU with Ubuntu 18.04.5. We demonstrate how FLUSH+RELOAD may be employed to extract leakage from the sliding window modular exponentiation implementation of `gcrypt`

and use that to obtain full key recovery. The exploitation of `gcrypt`

presents interesting aspects for two reasons: (i) the library was patched to protect its modular exponentiation routine against cache-based side channels attacks highlighted by Yarom and Falkner^{23} and by Liu *et al.*^{14}; and (ii) the attack can be conducted on commodity hardware specifically owing to the ElGamal interoperability issues highlighted in Section 2.

**6.1. Threat model**

We consider a victim process that uses a private key to decrypt ElGamal ciphertexts. We assume the attacker has knowledge of the victim program and of the specific decryption algorithm that is used. We also assume that the attacker may cause the victim process to execute, for example, because the victim uses a PGP-enabled mail client that automatically decrypts emails upon reception. Consequently, we assume that the attacker is able to synchronize side channel measurements with the victim execution. No physical access is required to perform the attacks. Given that we target cache-side channels, the attacker may either be local, or it may be running on a different, colocated, VM. The former may target any cache level, the latter may only target LLC (unless the VM is hyperthread-colocated). Wlog we conduct the exploitation against the L1 cache: as shown by Liu et al.,^{14} the same exploitation may be achieved against higher cache levels.

**6.2. Environment**

We target the function `_gcry_mpi_powm`

of the latest version `gcrypt`

shipped by Ubuntu 18.04.5 LTS. The function implements the approach described in Figure 2. The implementation takes a few precautions to avoid leaking sensitive data and instruction cache accesses. In particular, it hides (i) whether the operation at line 8 is a modular multiplication or squaring; and (ii) whether, at line 7, *W* is initialized with *R* or with an entry from table *T* (and in that case, it also hides which position *e _{i}* of the table is loaded). This is accomplished by way of a conditional

`memcpy`

-like operation (`mpi_set_cond`

) that—depending on whether a flag argument is one or zero—copies a source operand into a destination operand, or leaves the destination operand unchanged. Masking is employed to ensure that an attacker cannot learn the nature of the operation by observing the control flow or changes to data or instruction cache. Furthermore, the flag is set using branchless operations.Despite these precautions, considerable leakage can still be obtained by an attacker who can observe control flow-dependent cache perturbations: in particular, the full set of values *c*_{1}, …, *c _{L}* can be recovered by counting the number of iterations of the

`for`

the loop at line 6 for each of the iterations of the loop at line 4. The attacker may obtain this information by monitoring the state of the instruction cache line that contains code that is executed as part of an iteration of the outer loop `for`

loop. Crucially, the latter depends on the set of values `_gcry_mpi_powm`

function: the implementation inlines the logic to determine the **6.3. Evaluation**

To validate that the assumptions are correct, we prototype an end-to-end attack in the SMT-colocated threat model. The prototype uses two SMT-colocated processes, attacker and victim. The victim process may be triggered by the attacker to perform ElGamal decryption. We use `Crypto++`

to generate a 2048-bit ElGamal instance. The attacker process uses L1i FLUSH+RELOAD on the virtual address of the memory-mapped `libgcrypt.so`

a shared object that contains code that is executed by the outer loop before the inner loop begins. To collect side channel data, the attacker partitions time into *N* slots using the `rdtsc`

instruction: at the beginning of each slot, the target line is loaded into the cache and the access time is measured; after that, the cache line is flushed and finally, the remaining (if any) time of the slot is spent in a busy loop. As we shall discuss later, we do not have to change any of the settings that might influence the running time of the operation (e.g., c/p states or Turbo Boost). The attacker thus collects *N* timing samples for load instructions from the target cache line. We establish a threshold for the target system to determine whether the load is likely to have been served from the cache (cache *hit*). We then collect 10,000 measurements by repeatedly triggering the victim. We keep track of each slot across all runs with a set of per-slot counters, all initialized to zero. For each run, whenever the sample of a slot indicates a hit based on the threshold, we increment the value of the corresponding counter.

Figure 4a plots the value of the counter of each slot for all runs. While patterns are discernible, the leakage cannot be obtained yet without further data processing. We employ a clustering strategy based on execution time, under the hypothesis that Figure 4a contains samples that are generated at different clock frequencies. Influencing factors are likely to include p-states, c-states, frequency scaling, and the power state of certain execution units. Figure 4b confirms the hypothesis: it plots a histogram of the running times for the operation. The distribution of running times has a long tail, but most of the mass is concentrated in the three peaks. We use this information to cluster samples and show the results of the clustering in Figures 4c and 4d, showing probes whose running time falls in the interval covered by the highest, and second highest peak, respectively. The peak intervals in Figures 4c and 4d accurately encode the *c _{i}* leakage for the key used in the exponentiation. Figures 4c and 4d are identical modulo a scaling factor which depends on the difference in running time. With the leakage thus obtained, we refer back to Section 5 for a detailed description of how the secret exponent is fully recovered.

**Figure 4. FLUSH+RELOAD attack using 10,000 decryption operations with gcrypt's modular exponentiation function. a): number of samples with an icache hit on the target cache line across all runs. b): histogram of the running times (tsc count) for the operation. c) as a) but only considering samples whose total running time lies in the [2.25 · 10 ^{6}, 2.31 · 10^{6}] interval. d) as a) but only considering samples whose total running time lies in the [2.2 · 10^{6}, 2.25 · 10^{6}] interval.**

We analyze one of the oldest, best understood and most historically used asymmetric encryption schemes, ElGamal.^{9} We reveal that despite its popularity and longevity, when we speak about ElGamal we are referring to several different flavors of it, with key choices being left at the discretion of vendors and implementers. We show how some of these choices create interoperability challenges that lead to insecurity. We propose two "cross-configuration" attacks that are attributable to different, and—from a security standpoint—incompatible, configurations that operate together in the interconnected OpenPGP ecosystem. We believe our work can improve the security of the OpenPGP community and influence the new revision of the standard that is being drafted at the time of writing.

The authors thank Werner Koch, Anil Kurmus, Yutaka Niibe, and Filippo Valsorda for the discussions and feedback, and the CCS'21 reviewers for their comments that helped us improve the paper.

1. OpenPGP server key dump from 15/01/2021. https://pgp.key-server.io/dump/, 2021. [Accessed 15/01/2021].

2. Aciiçmez, O., Koç, Ç., Seifert, J.-P. Predicting secret keys via branch prediction. In *CT-RSA 2007.* M. Abe, ed. Volume 4377 of *LNCS* (2007). Springer, Heidelberg, 225–242.

3. Apecechea, G.I., Inci, M.S., Eisenbarth, T., Sunar, B. Wait a minute! A fast, cross-VM attack on AES. IACR Cryptology ePrint Archive, Report 2014/435 2014, https://eprint.iacr.org/2014/435.

4. Bernstein, D.J. Cache-timing attacks on AES. Technical report, 2005.

5. Boneh, D., Joux, A., Nguyen, P.Q. Why textbook ElGamal and RSA encryption are insecure. In *ASIACRYPT 2000.* T. Okamoto, ed. Volume 1976 of *LNCS* (2000). Springer, Heidelberg, 30–43.

6. Callas, J., Donnerhacke, L., Finney, H., Shaw, D., Thayer, R. RFC4880: OpenPGP message format, 2007.

7. De Feo, L., Poettering, B., Sorniotti, A. On the (in)security of ElGamal in OpenPGP. In *ACM CCS 2021*, G. Vigna and E. Shi, eds. ACM, NY, 2021, 2066–2080.

8. De Feo, L., Poettering, B., Sorniotti, A. On the (in)security of ElGamal in OpenPGP. Cryptology ePrint Archive, Report 2021/923, 2021. https://eprint.iacr.org/2021/923.

9. ElGamal, T. A public key cryptosystem and a signature scheme based on discrete logarithms. In *CRYPTO'84.* G.R. Blakley and D. Chaum, eds. Volume 196 of *LNCS* (1984). Springer, Heidelberg, 10–18.

10. Genkin, D., Pachmanov, L., Pipman, I., Tromer, E. Stealing keys from PCs using a radio: Cheap electromagnetic attacks on windowed exponentiation. In *CHES 2015.* T. Güneysu and H. Handschuh, eds. Volume 9293 of *LNCS* (2015). Springer, Heidelberg, 207–228.

11. Hachez, G., Quisquater, J.-J. Montgomery exponentiation with no final subtractions: Improved results. In *CHES 2000.* Ç.K. Koç and C. Paar, eds. Volume 1965 of *LNCS* (2000). Springer, Heidelberg, 293–301.

12. Lenstra, H.W. Factoring integers with elliptic curves. *Ann. Math. 126*, (1987), 649–673.

13. Lim, C.H. Lee, P.J. A key recovery attack on discrete log-based schemes using a prime order subgroup. In *CRYPTO'97.* B.S. Kaliski Jr., ed. Volume 1294 of *LNCS* (1997). Springer, Heidelberg, 249–263.

14. Liu, F., Yarom, Y., Ge, Q., Heiser, G., Lee, R.B. Last-level cache side-channel attacks are practical. In *2015 IEEE Symposium on Security and Privacy, SP 2015* (San Jose, CA, USA, May 17–21, 2015), IEEE Computer Society, NY, 2015, 605–622.

15. Messerges, T.S., Dabbish, E.A., Sloan, R.H. Power analysis attacks of modular exponentiation in smartcards. In *CHES'99.* Ç.K. Koç and C. Paar, eds. Volume 1717 of *LNCS* (1999). Springer, Heidelberg, 144–157.

16. Pohlig, S., Hellman, M. An improved algorithm for computing logarithms over GF(*p*) and its cryptographic significance. *IEEE Trans. Inf. Theor. 24*, 1 (Jan. 1978), 106–110.

17. Pollard, J.M. Monte Carlo methods for index computation mod *p. Math. Comput. 32* (1978), 918–924.

18. Schnorr, C.-P. Efficient identification and signatures for smart cards. In, *CRYPTO'89.* G. Brassard, ed. Volume 435 of *LNCS* (1990). Springer, Heidelberg, 239–252.

19. Shanks, D. Class number, a theory of factorization, and genera. *Proc. Symp. Pure Math. 20* (1971), 41–440.

20. Tromer, E., Osvik, D.A., Shamir, A. Efficient cache attacks on AES, and countermeasures. *J. Cryptol. 23*, 1 (2010), 37–71.

21. van Oorschot, P.C., Wiener, M.J. On Diffie-Hellman key agreement with short exponents. In *EUROCRYPT'96.* U.M. Maurer, ed. Volume 1070 of *LNCS* (1996). Springer, Heidelberg, 332–343.

22. van Oorschot, P.C. Wiener, M.J. Parallel collision search with cryptanalytic applications. *J. Cryptol. 12*, 1 (Jan. 1999), 1–28.

23. Yarom, Y., Falkner, K. FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack. In *Proceedings of the 23rd USENIX Security Symposium* (San Diego, CA, USA, August 20–22, 2014). K. Fu and J. Jung, eds. (2014). USENIX Association, Berkeley, CA, 719–732.

a. Every *x* ≥ 1 has at least one, but often multiple, ODRs with leading coefficient 1.

b. Precisely, line 7 implements the instruction "If *j* ≤ *c _{i}* then

c. This argument assumes multiplications and squarings require uniform time. This is the case for the `gcrypt`

routines.

d. The performance gain comes from the possibility of removing some of the decoy code required for securely implementing line 7. For further details, see the source code.

e. Since in practice *q* is unknown to an attacker, this ignores the cost of factoring *p* − 1 first, but accounting for it would only make the side channel attack look better.

f. At the time of writing, 8 Amazon EC2 `r6g.metal`

instances with 0.5 TiB of RAM and 64 virtual CPUs each cost about 12,000$ per month.

To view the accompanying Technical Perspective, visit doi.acm.org/10.1145/3592836

The original version of this paper was published in *Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security*, Nov. 2021, 2066–2080; https://doi.org/10.1145/3460120.3485257

Copyright held by authors/owners. Publication rights licensed to ACM.

Request permission to publish from permissions@acm.org

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.

No entries found