Skip to main content

Chapter 14 The Selberg sieve

In this chapter, we introduce Selberg's delightfully simple approach to upper bound sieving. We will have more to say about applications of the Selberg sieve in Chapter 15.

Section 14.1 Review of notation

Let \(f\colon \NN \to \CC\) be an arithmetic function, and suppose we want to estimate the sum of \(f\) over primes. More precisely, let \(P\) be a set of primes, and put
\begin{equation*} P(z) = \prod_{p \leq z, p \in P} p. \end{equation*}
If we define
\begin{align*} S(x,z) \amp= \sum_{n \leq x, (n, P(z)) = 1} f(n),\\ A_d(x) \amp= \sum_{n \leq x, n \equiv 0 \pmod{d}} f(n) \end{align*}
(with the dependence on \(P\) and \(f\) suppressed from the notation), we have
\begin{equation*} S(x,z) = \sum_{d | P(z)} \mu(d) A_d(x). \end{equation*}
Let \(g(d)\) be a multiplicative function with
\begin{gather*} g(p) \in [0,1) \qquad (p \in P);\\ g(p) = 0 \qquad (p \notin P), \end{gather*}
and write
\begin{equation*} A_d(x) = g(d) x + r_d(x). \end{equation*}
Then
\begin{align*} S(x,z) \amp= V(z)x + R(x,z)\\ V(z) \amp= \prod_{p | P(z)} (1 - g(p))\\ R(x,z) \amp= \sum_{d | P(z)} r_d(x). \end{align*}

Section 14.2 The Selberg upper bound sieve

In Chapter 13, we used the combinatorial sieve to construct an arithmetic function \(\lambda^+\colon \NN \to \RR\) such that
\begin{align*} \lambda^+(1) \amp= 1\\ \sum_{d | n} \lambda^+(d) \amp\geq 0 \quad (n > 1). \end{align*}
By setting
\begin{align*} V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\ R^{+}(x,z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x), \end{align*}
we were able to obtain the bound
\begin{equation} V^-(z) x + R^-(x,z) \leq S(x,z) \leq V^+(z) x + R^+(x,z),\tag{14.2.1} \end{equation}
but controlling \(V^+\) and \(R^+\) was rather painful.
Selberg proposed instead to construct an arithmetic function \(\rho\colon \NN \to \RR\) with \(\rho(1) = 1\) and
\begin{equation*} \sum_{d | n} \lambda^+(n) = \left( \sum_{d | n} \rho(d) \right)^2. \end{equation*}
In other words, let \(\rho\) be any arithmetic function with \(\rho(1) = 1\text{,}\) and put
\begin{equation*} \lambda^+(n) = \sum_{d_1, d_2: \lcm(d_1, d_2) = n} \rho(d_1) \rho(d_2). \end{equation*}
We will typically want \(\lambda^+(d) = 0\) for \(d \geq y\) for some prespecified number \(y\text{;}\) to enforce this, we may insist that \(\rho(n) = 0\) for \(n \geq \sqrt{y}\text{.}\) We call the resulting \(\lambda^+\) an \(L^2\)-sieve of level \(y\), or more commonly a Selberg (upper bound) sieve of level \(y\).
Let us drop \(x\) from consideration by agreeing to only consider functions \(f\) with finite support. (That is, we replace \(f\) by the function vanishing above \(x\text{.}\)) If we again set
\begin{align*} S(z) \amp= \sum_{(n, P(z)) = 1} f(n)\\ V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\ \amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) g(\lcm(d_1,d_2)) \end{align*}
\begin{align*} R^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x)\\ \amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) r_{\lcm(d_1,d_2)}(x), \end{align*}
we again have
\begin{equation} S(z) \leq V^+(z) x + R^+(z).\tag{14.2.2} \end{equation}
Ignoring the error term \(R^+(z)\) for the moment, one can ask about optimizing the main term \(V^+(z) x\) in the bound (14.2.2). This amounts to viewing \(V^+(z)\) as a quadratic form and then minimizing it.
For simplicity, we will assume that \(g(p) \in (0,1)\) for \(p \in P\text{,}\) and \(g(p) = 0\) for \(p \notin P\text{.}\) (Before we only wanted \(g(p) \in [0,1)\) for \(p \in P\text{,}\) but there is no harm in adding to \(P\) those primes \(p\) for which \(g(p) = 0\) into \(P\text{.}\)) Let \(h\) be a multiplicative function with
\begin{equation*} h(p) = \frac{g(p)}{1 - g(p)}. \end{equation*}
We can then diagonalize the quadratic form as follows: first, put \(c = \gcd(d_1, d_2)\text{,}\) \(a = d_1/c\text{,}\) \(b = d_2/c\) to obtain
\begin{align*} V^+(z) \amp= \sum_{a,b,c: abc | P(z)} \rho(ac) \rho(bc) g(abc)\\ \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{a,b: abc | P(z)} (g(ac) \rho(ac)) (g(bc) \rho(bc)). \end{align*}
Note that since \(P(z)\) is squarefree, the condition \(abc | P(z)\) forces \(\gcd(a,b) = 1\text{.}\) We now perform inclusion-exclusion on \(\gcd(a,b)\) to obtain
\begin{align*} V^+(z) \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d | P(z)/c} \mu(d) \left( \sum_{m | P(z)/(cd)} g(cdm) \rho(cdm) \right)^2\\ \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d|P(z)/c} \mu(d) \left( \sum_{m | P(z): m \equiv 0 \,(cd)} g(m) \rho(m) \right)^2. \end{align*}
We next substitute \(e,f/e\) in for \(c,d\text{,}\) and reorder the sum:
\begin{align*} V^+(z) \amp= \sum_{f | P(z)} \sum_{e | f} \mu(f/e) g(e)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2\\ \amp= \sum_{f | P(z)} h(f)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2. \end{align*}
Let's put
\begin{equation*} \xi(d) = \mu(d) \sum_{m|P(z): m \equiv 0\,(d)} g(m) \rho(m), \end{equation*}
so that we have
\begin{equation*} V^+(z) = \sum_{d | P(z)} h(d)^{-1} \xi(d)^2. \end{equation*}
Before we can minimize this quadratic form, we must first reexpress in terms of \(\xi\) the conditions we imposed on \(\rho\text{.}\) Namely, by Möbius inversion (Lemma 12.3),
\begin{equation*} \rho(n) = \frac{\mu(n)}{g(n)} \sum_{d|P(z): d \equiv 0\,(n)} \xi(d), \end{equation*}
so the condition \(\rho(1) = 1\) is equivalent to
\begin{equation*} \sum_{d | P(z)} \xi(d) = 1, \end{equation*}
and the condition \(\rho(d) = 0\) for \(d \geq \sqrt{y}\) is equivalent to
\begin{equation*} \xi(d) = 0 \qquad (d \geq \sqrt{y}). \end{equation*}
That is, \(\xi\) is restricted to a hyperplane.
Here's where the \(L^2\) part comes in. By the Cauchy–Schwartz inequality,
\begin{equation*} V^+(z) \geq H^{-1}, \qquad H = \sum_{d \lt \sqrt{y}, d | P(z)} h(d) \end{equation*}
and equality holds for
\begin{equation*} \xi(d) = h(d) H^{-1} \qquad (d \lt \sqrt{y}). \end{equation*}
Backing up, we get
\begin{equation} \rho(d) = \mu(d) \frac{h(d)}{g(d)} H^{-1} \sum_{n \lt \sqrt{y}/d: \gcd(d,n) = 1} h(n).\tag{14.2.3} \end{equation}
Putting this together, we obtain the following.
As a somewhat miraculous corollary (due to van Lint and Richert), we obtain
\begin{equation} 0 \leq \mu(d) \rho(d) \leq 1\tag{14.2.5} \end{equation}
(exercise; see Exercise 14.3.1); this makes it easy to estimate the error term in (14.2.4), e.g., by
\begin{equation} |\lambda^+(d)| \leq d^{(\log 3)/(\log 2)}\tag{14.2.6} \end{equation}
(exercise; see Exercise 14.3.2).

Remark 14.2.

For applications, it is useful to break the expression (14.2.3) for \(\rho(d)\) down as a product of the sign-alternating term \(\mu(d)\text{,}\) the ratio \(h(d)/g(d)\) which is roughly 1, and a decay factor. This viewpoint will play a key role in Chapter 21.

Exercises 14.3 Exercises

1.

Prove Exercise 14.3.1.
Hint.
Group terms in the definition of \(H\) according to the common divisor of \(d\) with some fixed number \(e\text{.}\)

2.

Deduce (14.2.6) from (14.2.5) by proving that \(|\lambda^+(d)| \leq 3^{\nu(d)}\text{,}\) for \(\nu(d)\) equal to the number of prime factors of \(d\text{.}\)

3.

In Theorem 14.1, prove that if we extend \(g\) to a completely multiplicative function, then
\begin{equation*} H \geq \sum_{n \lt \sqrt{y}} g(n). \end{equation*}

4.

Prove that for some \(c>0\text{,}\)
\begin{equation*} \sum_{n \leq x} \frac{2^{\nu(n)}}{n} \geq c \log^2 x \qquad (x \geq 1). \end{equation*}
Hint.
An elementary proof is possible, but one can also use analytic arguments on the Dirichlet series \(\zeta^2(s)/\zeta(2s) = \sum_{n=1}^\infty 2^{\nu(n)}n^{-s}\text{.}\)

5.

Let \(d(n)\) denote the number of divisors of the positive integer \(n\text{.}\) Prove that
\begin{equation*} \sum_{n \leq x} d(n) \sim x \log x. \end{equation*}