Skip to main content

Chapter 13 The Selberg sieve

Chapter 14

Section 13.1 Review of notation

Let \(f\colon \NN \to \CC\) be an arithmetic function, and suppose we want to estimate the sum of \(f\) over primes. More precisely, let \(P\) be a set of primes, and put
\begin{equation*} P(z) := \prod_{p \leq z, p \in P} p\text{.} \end{equation*}
If we define
\begin{align*} S(x,z) \amp= \sum_{n \leq x, (n, P(z)) = 1} f(n),\\ A_d(x) \amp= \sum_{n \leq x, n \equiv 0 \pmod{d}} f(n) \end{align*}
(with the dependence on \(P\) and \(f\) suppressed from the notation), we have
\begin{equation*} S(x,z) = \sum_{d | P(z)} \mu(d) A_d(x)\text{.} \end{equation*}
Let \(g(d)\) be a multiplicative function with
\begin{gather*} g(p) \in [0,1) \qquad (p \in P)\\ g(p) = 0 \qquad (p \notin P)\text{,} \end{gather*}
and write
\begin{equation*} A_d(x) = g(d) x + r_d(x)\text{.} \end{equation*}
Then
\begin{align*} S(x,z) \amp= V(z)x + R(x,z)\\ V(z) \amp= \prod_{p | P(z)} (1 - g(p))\\ R(x,z) \amp= \sum_{d | P(z)} r_d(x)\text{.} \end{align*}

Section 13.2 The Selberg upper bound sieve

In Chapter 12, we used the combinatorial sieve to construct an arithmetic function \(\lambda^+\colon \NN \to \RR\) such that
\begin{align*} \lambda^+(1) \amp= 1\\ \sum_{d | n} \lambda^+(d) \amp\geq 0 \quad (n \gt 1) \end{align*}
By setting
\begin{align*} V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\ R^{+}(x,z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x)\text{,} \end{align*}
we were able to obtain the bound
\begin{equation} V^-(z) x + R^-(x,z) \leq S(x,z) \leq V^+(z) x + R^+(x,z)\text{,}\tag{13.2.1} \end{equation}
but controlling \(V^+\) and \(R^+\) was rather painful.
Selberg proposed instead to construct an arithmetic function \(\rho\colon \NN \to \RR\) with \(\rho(1) = 1\) and
\begin{equation*} \sum_{d | n} \lambda^+(n) = \left( \sum_{d | n} \rho(d) \right)^2\text{.} \end{equation*}
In other words, let \(\rho\) be any arithmetic function with \(\rho(1) = 1\text{,}\) and put
\begin{equation*} \lambda^+(n) = \sum_{d_1, d_2: \lcm(d_1, d_2) = n} \rho(d_1) \rho(d_2)\text{.} \end{equation*}
We will typically want \(\lambda^+(d) = 0\) for \(d \geq y\) for some prespecified number \(y\text{;}\) to enforce this, we may insist that \(\rho(n) = 0\) for \(n \geq \sqrt{y}\text{.}\) We call the resulting \(\lambda^+\) an \(L^2\)-sieve of level \(y\), or more commonly a Selberg (upper bound) sieve of level \(y\).
Let us drop \(x\) from consideration by agreeing to only consider functions \(f\) with finite support. (That is, we replace \(f\) by the function vanishing above \(x\text{.}\)) If we again set
\begin{align*} S(z) \amp= \sum_{(n, P(z)) = 1} f(n)\\ V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\ \amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) g(\lcm(d_1,d_2)) \end{align*}
\begin{align*} R^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x)\\ \amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) r_{\lcm(d_1,d_2)}(x)\text{,} \end{align*}
we again have
\begin{equation} S(z) \leq V^+(z) x + R^+(z)\text{.}\tag{13.2.2} \end{equation}
Ignoring the error term \(R^+(z)\) for the moment, one can ask about optimizing the main term \(V^+(z) x\) in the bound (13.2.2). This amounts to viewing \(V^+(z)\) as a quadratic form and then minimizing it.
For simplicity, we will assume that \(g(p) \in (0,1)\) for \(p \in P\text{,}\) and \(g(p) = 0\) for \(p \notin P\text{.}\) (Before we only wanted \(g(p) \in [0,1)\) for \(p \in P\text{,}\) but there is no harm in adding to \(P\) those primes \(p\) for which \(g(p) = 0\) into \(P\text{.}\)) Let \(h\) be a multiplicative function with
\begin{equation*} h(p) = \frac{g(p)}{1 - g(p)}\text{.} \end{equation*}
We can then diagonalize the quadratic form as follows: first, put \(c := \gcd(d_1, d_2)\text{,}\) \(a := d_1/c\text{,}\) \(b := d_2/c\) to obtain
\begin{align*} V^+(z) \amp= \sum_{a,b,c: abc | P(z)} \rho(ac) \rho(bc) g(abc)\\ \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{a,b: abc | P(z)} (g(ac) \rho(ac)) (g(bc) \rho(bc))\text{.} \end{align*}
Note that since \(P(z)\) is squarefree, the condition \(abc | P(z)\) forces \(\gcd(a,b) = 1\text{.}\) We now perform inclusion-exclusion on \(\gcd(a,b)\) to obtain
\begin{align*} V^+(z) \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d | P(z)/c} \mu(d) \left( \sum_{m | P(z)/(cd)} g(cdm) \rho(cdm) \right)^2\\ \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d|P(z)/c} \mu(d) \left( \sum_{m | P(z): m \equiv 0 \,(cd)} g(m) \rho(m) \right)^2 \text{.} \end{align*}
We next substitute \(e,f/e\) in for \(c,d\text{,}\) and reorder the sum:
\begin{align*} V^+(z) \amp= \sum_{f | P(z)} \sum_{e | f} \mu(f/e) g(e)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2\\ \amp= \sum_{f | P(z)} h(f)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2\text{.} \end{align*}
Let’s put
\begin{equation*} \xi(d) := \mu(d) \sum_{m|P(z): m \equiv 0\,(d)} g(m) \rho(m)\text{,} \end{equation*}
so that we have
\begin{equation*} V^+(z) = \sum_{d | P(z)} h(d)^{-1} \xi(d)^2\text{.} \end{equation*}
Before we can minimize this quadratic form, we must first reexpress in terms of \(\xi\) the conditions we imposed on \(\rho\text{.}\) Namely, by Möbius inversion (Lemma 11.3),
\begin{equation*} \rho(n) = \frac{\mu(n)}{g(n)} \sum_{d|P(z): d \equiv 0\,(n)} \xi(d)\text{,} \end{equation*}
so the condition \(\rho(1) = 1\) is equivalent to
\begin{equation*} \sum_{d | P(z)} \xi(d) = 1\text{,} \end{equation*}
and the condition \(\rho(d) = 0\) for \(d \geq \sqrt{y}\) is equivalent to
\begin{equation*} \xi(d) = 0 \qquad (d \geq \sqrt{y})\text{.} \end{equation*}
That is, \(\xi\) is restricted to a hyperplane.
Here’s where the \(L^2\) part comes in. By the Cauchy–Schwartz inequality,
\begin{equation*} V^+(z) \geq H^{-1}, \qquad H = \sum_{d \lt \sqrt{y}, d | P(z)} h(d) \end{equation*}
and equality holds for
\begin{equation*} \xi(d) = h(d) H^{-1} \qquad (d \lt \sqrt{y})\text{.} \end{equation*}
Backing up, we get
\begin{equation} \rho(d) = \mu(d) \frac{h(d)}{g(d)} H^{-1} \sum_{n \lt \sqrt{y}/d\colon \gcd(d,n) = 1} h(n)\text{.}\tag{13.2.3} \end{equation}
Putting this together, we obtain the following.
As a somewhat miraculous corollary (due to van Lint and Richert), we obtain
\begin{equation} 0 \leq \mu(d) \rho(d) \leq 1\tag{13.2.5} \end{equation}
(exercise; see Exercise 13.3.1); this makes it easy to estimate the error term in (13.2.4), e.g., by
\begin{equation} |\lambda^+(d)| \leq d^{(\log 3)/(\log 2)}\tag{13.2.6} \end{equation}
(exercise; see Exercise 13.3.2).

Remark 13.2.

For applications, it is useful to break the expression (13.2.3) for \(\rho(d)\) down as a product of the sign-alternating term \(\mu(d)\text{,}\) the ratio \(h(d)/g(d)\) which is roughly 1, and a decay factor. This viewpoint will play a key role in Chapter 20.

Exercises 13.3 Exercises

1.

Hint.
Group terms in the definition of \(H\) according to the common divisor of \(d\) with some fixed number \(e\text{.}\)

2.

Deduce (13.2.6) from (13.2.5) by proving that \(|\lambda^+(d)| \leq 3^{\nu(d)}\text{,}\) for \(\nu(d)\) equal to the number of prime factors of \(d\text{.}\)

3.

In Theorem 13.1, prove that if we extend \(g\) to a completely multiplicative function, then
\begin{equation*} H \geq \sum_{n \lt \sqrt{y}} g(n)\text{.} \end{equation*}

4.

Prove that for some \(c>0\text{,}\)
\begin{equation*} \sum_{n \leq x} \frac{2^{\nu(n)}}{n} \geq c \log^2 x \qquad (x \geq 1)\text{.} \end{equation*}
Hint.
An elementary proof is possible, but one can also use analytic arguments on the Dirichlet series \(\zeta^2(s)/\zeta(2s) = \sum_{n=1}^\infty 2^{\nu(n)}n^{-s}\text{.}\)

5.

Let \(d(n)\) denote the number of divisors of the positive integer \(n\text{.}\) Prove that
\begin{equation*} \sum_{n \leq x} d(n) \sim x \log x\text{.} \end{equation*}