
Chapter 21 Bounded gaps between primes: outline

In this chapter, we outline the proof by Maynard [16] of bounded gaps between primes and compare the approach with the original proofs of shorter-than-average gaps between primes by Goldston–Pintz–Yıldırım [9], [10] and bounded gaps between primes by Zhang [29]. (A similar approach was developed independently by Tao, who did not publish the result separately but instead contributed it to [21].) See also [25] and [8] for expositions of the GPY result.

Section 21.1 The goal: a weakening of the \(k\)-tuples conjecture

As we have observed previously, the Hardy–Littlewood \(k\)-tuples conjecture (Conjecture 20.2) already seems to be impervious to the combinatorial sieve when \(k=2\text{.}\) In particular, to prove existence of short gaps between primes we need a lower bound, which would seem to render the Selberg sieve irrelevant.
The original insight of Goldston–Yıldırım is that it should be much easier to prove a weak version of the \(k\)-tuples conjecture, which would in particular imply bounded gaps between primes.
It suffices to find a \(k\)-tuple \(\calH\) with \(k=105\) and \(\frakS(\calH) > 0\text{,}\) as then we can apply Theorem 21.1 to conclude. An explicit example is given in [16], taken from a table of admissible tuples maintained by Andrew Sutherland.
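Recall that \(\frakS(\calH) > 0\) precisely when \(\calH\) is admissible: for every prime \(p\text{,}\) the shifts \(h_i\) omit at least one residue class modulo \(p\) (only primes \(p \leq k\) need checking, since \(k\) residues cannot cover all classes modulo a larger prime). The following minimal Python sketch of this check is illustrative only; the function name and the sample tuples are not taken from [16].

def is_admissible(H):
    """S(H) > 0 iff H is admissible: for every prime p, the h in H
    miss at least one residue class mod p.  Only p <= len(H) can
    possibly be covered, so the check is finite."""
    k = len(H)
    primes = []
    for n in range(2, k + 1):           # primes up to k by trial division
        if all(n % p for p in primes):
            primes.append(n)
    for p in primes:
        if len({h % p for h in H}) == p:    # every class mod p is hit
            return False
    return True

# toy examples: {0, 2, 6, 8, 12} is admissible, {0, 2, 4} is not
print(is_admissible([0, 2, 6, 8, 12]), is_admissible([0, 2, 4]))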

Remark 21.3.

Corollary 21.2 improves upon the original bounded gaps theorem of Zhang [29], which obtains a bound of \(70,000,000\text{.}\) The method of Zhang is similar but differs in one key detail (see Remark 21.8); this difference makes it possible to combine the two approaches. This was done in [21], using Tao's independently obtained interpretation of the method of [16], to improve the bound to \(246\text{.}\)

Section 21.2 Algebraic setup

We will derive Theorem 21.1 from an algebraic statement about counting primes in \(k\)-tuples.

Definition 21.4.

Let \(P\) be the set of primes and \(\chi_P\) its characteristic function. To prove the conclusion of Theorem 21.1 for some particular tuple \(\calH\text{,}\) it will suffice to show that for \(x\) sufficiently large, we can find an arithmetic function \(a(n)\) with nonnegative values (depending on \(x\)) such that
\begin{equation} \sum_{x \lt n \leq 2x} \left( \sum_{j=1}^k \chi_P(n+h_j) - (\ell-1) \right) a(n) > 0.\tag{21.2.1} \end{equation}
Indeed, since \(a(n) \geq 0\text{,}\) (21.2.1) forces the existence of some \(n \in (x, 2x]\) for which at least \(\ell\) of \(n+h_1, \dots, n+h_k\) are prime; applying this for arbitrarily large \(x\) yields infinitely many such \(n\text{.}\)
To limit some technical complications later, it will be helpful to pin down \(n\) modulo some small primes. Set \(D_0 := \log \log \log x\) (the exact choice here is convenient, not critical) and set \(W := \prod_{p \leq D_0} p\) (so that \(W = O((\log \log x)^2)\) by the prime number theorem). Since we are assuming that \(\frakS(\calH) > 0\text{,}\) we can pick a residue class \(v_0\) modulo \(W\) such that \(\gcd(v_0+h_i,W) = 1\) for \(i=1,\dots,k\text{.}\) We will then further require that
\begin{equation*} a(n) \neq 0 \Longrightarrow n \equiv v_0 \pmod{W}. \end{equation*}
This allows us to forget the hypothesis \(\frakS(\calH) > 0\text{,}\) which will simplify some calculations.
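As a concrete illustration (a minimal Python sketch with an invented toy tuple; since \(D_0\) is tiny in practice, brute force is harmless), one can compute \(W\) and locate such a residue class \(v_0\) as follows.

from math import gcd, prod

def small_primes(bound):
    """Primes up to bound by trial division (D_0 is tiny, so this is fine)."""
    ps = []
    for n in range(2, bound + 1):
        if all(n % p for p in ps):
            ps.append(n)
    return ps

def w_trick(H, D0):
    """Compute W = product of primes <= D0 and find a residue v0 mod W
    with gcd(v0 + h, W) = 1 for every h in H, by brute force."""
    W = prod(small_primes(D0))
    for v0 in range(W):
        if all(gcd(v0 + h, W) == 1 for h in H):
            return W, v0
    raise ValueError("H is not admissible modulo some prime <= D0")

# toy example (not the tuple from [16]): H = {0, 2, 6}, D0 = 5, so W = 30
print(w_trick([0, 2, 6], 5))      # prints (30, 11)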
We restate in the form of a lemma (compare [16], Proposition 4.3(3)): setting
\begin{equation*} S_1 := \sum_{x \lt n \leq 2x} a(n), \qquad S_2^{(j)} := \sum_{x \lt n \leq 2x} \chi_P(n+h_j)\, a(n), \qquad S_2 := \sum_{j=1}^k S_2^{(j)}, \end{equation*}
it suffices to show that \(S_2 - (\ell-1) S_1 > 0\) for all sufficiently large \(x\text{.}\)

Remark 21.6.

As a sanity check, note that if \(a(n)\) were supported only on those \(n\) for which \(n + h_1, \dots, n + h_k\) are all prime, then the \(k\)-tuples conjecture would imply (21.2.1) with \(\ell = k\text{.}\) This shows that, on one hand, this approach gives no new insight towards the \(k\)-tuples conjecture even for \(k=2\text{.}\) On the other hand, since in Theorem 21.1 we have given ourselves the flexibility to make \(k\) large compared to \(\ell\text{,}\) we obtain some extra freedom to choose the weights \(a(n)\text{.}\) In particular, we will make a choice that puts \(S_1\) and \(S_2\) in the form of a main term that admits a simple interpretation (using the analytic techniques with which we began the course) plus an error term that we hope to control using sieve methods.

Section 21.3 Revisiting the Selberg sieve

Our next move is inspired by the transition from the combinatorial sieve to the Selberg sieve. Since we need the weights \(a(n)\) to be nonnegative, it seems reasonable to follow Goldston–Pintz–Yıldırım in choosing them to be of the form
\begin{equation} a(n) := \left( \sum_{d|(n+h_1)\cdots(n+h_k)} \rho(d) \right)^2 \qquad (n \equiv v_0 \pmod{W})\tag{21.3.1} \end{equation}
for some arithmetic function \(\rho\) to be optimized later.
The key observation of Maynard and Tao is that one can gain some crucial added flexibility by taking \(\rho\) to instead be a multivariate function:
\begin{equation} a(n) := \left( \sum_{d_i|n+h_i} \rho(d_1,\dots,d_k) \right)^2 \qquad (n \equiv v_0 \pmod{W})\tag{21.3.2} \end{equation}
for some function \(\rho\colon \NN^k \to \RR\text{;}\) note that we can recover (21.3.1) by taking \(\rho\) to depend only on \(d_1 \cdots d_k\text{.}\)
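As a sanity check on the shape (21.3.2), here is a minimal Python sketch (the helper names are invented for illustration) that evaluates \(a(n)\) by summing a given \(\rho\) over all tuples \((d_1,\dots,d_k)\) with \(d_i \mid n+h_i\) and squaring; feeding in a \(\rho\) that depends only on the product \(d_1 \cdots d_k\) recovers the single-variable shape (21.3.1).

from itertools import product as cartesian
from math import prod

def divisors(m):
    """All positive divisors of m (naive trial division; fine for small m)."""
    return [d for d in range(1, m + 1) if m % d == 0]

def weight(n, H, rho):
    """a(n) as in (21.3.2): square of the sum of rho(d_1, ..., d_k)
    over all tuples with d_i | n + h_i.  The support condition
    n = v_0 (mod W) is assumed to be imposed separately."""
    s = sum(rho(d) for d in cartesian(*[divisors(n + h) for h in H]))
    return s * s

# specialization to (21.3.1): rho depending only on d_1 * ... * d_k,
# with a crude truncation playing the role of the cutoff at R
def rho_onevar(d, R=10):
    return 1 if prod(d) <= R else 0

print(weight(5, [0, 2, 6], rho_onevar))   # the tuple {0, 2, 6} is only a toy example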
This leaves the question of how to choose the function \(\rho\text{.}\) For the shape (21.3.1) we can look directly to the Selberg sieve weights (see Remark 14.2) and make an ansatz of
\begin{equation} \rho(d) = \mu(d) F\left(\frac{\log d}{\log R} \right) \qquad (d \leq R)\tag{21.3.3} \end{equation}
for some cutoff parameter \(R\) and some smooth function \(F\) supported on \([0,1]\text{.}\) The natural analogue of this for the shape (21.3.2) would be
\begin{equation} \rho(d_1,\dots,d_k) = \mu(d_1)\cdots \mu(d_k) F\left( \frac{\log d_1}{\log R}, \dots, \frac{\log d_k}{\log R} \right)\tag{21.3.4} \end{equation}
for a suitable smooth function \(F\) supported on some bounded region, which for consistency with the specialization of (21.3.2) back to (21.3.1) we take to be the standard simplex
\begin{equation*} \Delta_k := \{(x_1,\dots,x_k) \in \RR^k\colon x_i \geq 0, \sum_i x_i \leq 1\}. \end{equation*}
For convenience in later computations, we modify this recipe slightly:
\begin{equation} \rho(d_1,\dots,d_k) = \sum_{\substack{r_i\colon d_i|r_i \\ \gcd(r_i, W) = 1}} \mu\left(\prod_i r_i \right)^2 \prod_i \frac{d_i\mu(d_i)}{\varphi(r_i)} F\left( \frac{\log r_1}{\log R}, \dots, \frac{\log r_k}{\log R} \right).\tag{21.3.5} \end{equation}
(The effect of the term \(\mu\left(\prod_i r_i \right)^2\) is to force the \(r_i\) to be squarefree and pairwise coprime.)
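To see what (21.3.5) does in a small case, here is a hedged Python sketch (brute force, with invented helper names, usable only for tiny \(R\) and \(k\)) that evaluates \(\rho(d_1,\dots,d_k)\) directly from a choice of \(F\) supported on \(\Delta_k\text{;}\) since \(F\) vanishes off the simplex, only tuples with each \(r_i \leq R\) can contribute.

from itertools import product as cartesian
from math import gcd, log

def mobius(n):
    """Mobius function mu(n) by trial factorization (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def phi(n):
    """Euler's totient function by trial factorization."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def rho(d, F, R, W):
    """Brute-force evaluation of (21.3.5): sum over r_i with d_i | r_i,
    gcd(r_i, W) = 1 and prod(r_i) squarefree, of
    mu(prod r_i)^2 * prod_i (d_i mu(d_i) / phi(r_i)) * F(log r_i / log R)."""
    total = 0.0
    ranges = [range(di, int(R) + 1, di) for di in d]
    for r in cartesian(*ranges):
        m = 1
        for ri in r:
            m *= ri
        if mobius(m) == 0:                      # forces the r_i squarefree and pairwise coprime
            continue
        if any(gcd(ri, W) != 1 for ri in r):
            continue
        coeff = 1.0
        for di, ri in zip(d, r):
            coeff *= di * mobius(di) / phi(ri)
        total += coeff * F(*[log(ri) / log(R) for ri in r])
    return total

def F(*t):
    """A toy choice supported on the simplex Delta_k (not smooth, but illustrative)."""
    return max(0.0, 1.0 - sum(t))

print(rho((1, 1), F, R=10.0, W=6))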

Section 21.4 Identification of error terms

Our next step is to separate the quantities \(S_1\) and \(S_2\) into main terms and error terms, with the main terms being quantities we can hope to compute by complex-analytic techniques.
We first reinterpret the sums. To begin with, rewrite (21.3.2) by replacing the square with two copies of the summation:
\begin{equation*} a(n) = \sum_{d_{i,1},d_{i,2} | n+h_i} \rho(d_{1,1},\dots,d_{k,1}) \rho(d_{1,2},\dots,d_{k,2}). \end{equation*}
Then rewrite \(S_1\) and \(S_2^{(j)}\) as
\begin{align*} S_1 \amp= \sum_{d_{i,1},d_{i,2}} \rho(d_{1,1},\dots,d_{k,1}) \rho(d_{1,2},\dots,d_{k,2}) \sum_{\substack{x \lt n \leq 2x \\ n \equiv v_0 \pmod{W} \\ \lcm(d_{i,1},d_{i,2}) | n+h_i}} 1\\ S_2^{(j)} \amp= \sum_{d_{i,1},d_{i,2}} \rho(d_{1,1},\dots,d_{k,1}) \rho(d_{1,2},\dots,d_{k,2}) \sum_{\substack{x \lt n \leq 2x \\ n \equiv v_0 \pmod{W} \\ \lcm(d_{i,1},d_{i,2}) | n+h_i}} \chi_P(n+h_j). \end{align*}
For \(i=j\) we only get a nonzero contribution to \(S_2^{(j)}\) when \(d_{j,1}= d_{j,2} = 1\) (the other option \(d_{j,*} = n+h_j\) lies beyond the cutoff). For \(i \neq j\text{,}\) we can think of first pinning \(n\) down among some number of arithmetic progressions modulo \(X := \prod_{i \neq j} \lcm(d_{i,1},d_{i,2})\) and then picking out prime values of \(n+h_j\) within each progression; we can estimate the effect of restricting to primes by replacing \(\chi_P(n+h_j)\) with \(\frac{X}{\varphi(X) \log (n+h_j)}\) in the sum, at the expense of creating a sum of error terms in the prime number theorem in arithmetic progressions.
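To make the main-term/error-term split explicit in the simplest case, consider the inner sum appearing in \(S_1\text{.}\) Any prime dividing some \(\lcm(d_{i,1},d_{i,2})\) exceeds \(D_0\) (by the coprimality to \(W\) built into (21.3.5)); if such a prime divided the \(\lcm\)s for two distinct indices \(i \neq i'\text{,}\) it would divide \(h_i - h_{i'}\text{,}\) which is impossible once \(D_0\) exceeds the diameter of \(\calH\text{.}\) Hence in every term with a nonzero inner sum, the moduli \(W, \lcm(d_{1,1},d_{1,2}), \dots, \lcm(d_{k,1},d_{k,2})\) are pairwise coprime, and by the Chinese remainder theorem the conditions on \(n\) amount to a single congruence modulo \(q := W \prod_i \lcm(d_{i,1},d_{i,2})\text{,}\) so that
\begin{equation*} \sum_{\substack{x \lt n \leq 2x \\ n \equiv v_0 \pmod{W} \\ \lcm(d_{i,1},d_{i,2}) \mid n+h_i}} 1 = \frac{x}{q} + O(1). \end{equation*}
Summing against the \(\rho\) factors, the \(x/q\) terms assemble into the main term of \(S_1\) and the \(O(1)\) terms into an error term; the cutoff \(R\) is chosen small enough relative to \(x\) that the accumulated error is negligible. The analogous manipulation for \(S_2^{(j)}\) produces error terms in the prime number theorem in arithmetic progressions, which is where the level of distribution enters.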

Remark 21.8.

Applying Lemma 21.7, we obtain
\begin{equation*} \lim_{x \to \infty} \sum_{j=1}^k \frac{S_2^{(j)}}{S_1} = \frac{\log R}{\log x} \sum_{j=1}^k \frac{ J_k^{(j)}(F)}{I_k(F)}. \end{equation*}
The fraction \(\frac{\log R}{\log x}\) equals \(\frac{1}{4} - \epsilon\text{;}\) it arises as \(\frac{1}{2}\) times the level of distribution in Theorem 18.1 (in the sense of Remark 18.3). In particular, Conjecture 18.2 would improve this by a factor of 2.
In [9] (see also [10]), the weights were taken to follow (21.3.1), which led to a weaker lower bound on \(S_2/S_1\text{;}\) as a result, the authors were only able to prove bounded gaps between primes assuming a level of distribution strictly greater than \(\frac{1}{2}\text{.}\) While this remains unknown, it was observed by Zhang that it would suffice for bounded gaps to prove a slightly weaker statement in which one picks out particular arithmetic progressions, rather than taking the worst case among them all (see Remark 18.6). While we will not discuss this strategy here, in the context of our present approach this refinement does improve the bound on gaps between primes; see [21].

Section 21.5 Optimizing the objective function

Given Lemma 21.7, it remains to see that the function \(F\colon \Delta_k \to \RR\) can be chosen to achieve the desired lower bounds on \(\sum_{j=1}^k \frac{J_k^{(j)}(F)}{I_k(F)}\text{.}\) Note that for \(F\) symmetric, this expression simplifies to \(k \frac{J_k^{(1)}(F)}{I_k(F)}\text{.}\)
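To indicate the numerology (the specific values below are quoted from [16] rather than derived in these notes): combining Lemma 21.7 with the criterion \(S_2 > (\ell-1) S_1\) and choosing \(F\) nearly optimal, it suffices to have
\begin{equation*} \frac{1}{4} M_k > \ell - 1, \qquad \text{where } M_k := \sup_F \sum_{j=1}^k \frac{J_k^{(j)}(F)}{I_k(F)}. \end{equation*}
Maynard shows in [16] that \(M_k \to \infty\) as \(k \to \infty\) (roughly like \(\log k\)) and, by an explicit choice of \(F\text{,}\) that \(M_{105} > 4\text{;}\) taking \(k = 105\) and \(\ell = 2\) then produces at least two primes among \(n+h_1, \dots, n+h_k\) for infinitely many \(n\text{,}\) which recovers the conclusion of Theorem 21.1 in the case needed for bounded gaps.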

Exercises 21.6 Exercises

1.

Say we want to produce large gaps between primes. Take \(N\) to be the product of the primes up to \(m\text{,}\) and consider \(N+2, \dots, N+m\text{.}\) For what function \(f\) does this imply \(p_{n+1} - p_n > f(p_n)\) for infinitely many \(n\text{?}\)
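As a hedged numerical check of the construction (without addressing the asymptotics the exercise asks for), the following Python sketch verifies for a small \(m\) that each of \(N+2, \dots, N+m\) is composite.

from math import gcd, prod

def primes_up_to(m):
    """Primes up to m by trial division."""
    ps = []
    for n in range(2, m + 1):
        if all(n % p for p in ps):
            ps.append(n)
    return ps

m = 13
N = prod(primes_up_to(m))          # 2 * 3 * 5 * 7 * 11 * 13 = 30030
# each j in [2, m] has a prime factor p <= m, and p | N as well,
# so gcd(N + j, N) = gcd(j, N) > 1 and N + j is composite
assert all(gcd(N + j, N) > 1 for j in range(2, m + 1))
print(f"{N + 2} through {N + m} are {m - 1} consecutive composite numbers")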