Section 14.1 Review of notation
Let \(f\colon \NN \to \CC\) be an arithmetic function, and suppose we want to estimate the sum of \(f\) over primes. More precisely, let \(P\) be a set of primes, and put
\begin{equation*}
P(z) = \prod_{p \leq z, p \in P} p.
\end{equation*}
If we define
\begin{align*}
S(x,z) \amp= \sum_{n \leq x, (n, P(z)) = 1} f(n),\\
A_d(x) \amp= \sum_{n \leq x, n \equiv 0 \pmod{d}} f(n)
\end{align*}
(with the dependence on \(P\) and \(f\) suppressed from the notation), we have
\begin{equation*}
S(x,z) = \sum_{d | P(z)} \mu(d) A_d(x).
\end{equation*}
Let \(g(d)\) be a multiplicative function with
\begin{gather*}
g(p) \in [0,1) \qquad (p \in P);\\
g(p) = 0 \qquad (p \notin P),
\end{gather*}
and write
\begin{equation*}
A_d(x) = g(d) x + r_d(x).
\end{equation*}
Then
\begin{align*}
S(x,z) \amp= V(z)x + R(x,z)\\
V(z) \amp= \prod_{p | P(z)} (1 - g(p))\\
R(x,z) \amp= \sum_{d | P(z)} r_d(x).
\end{align*}
Section 14.2 The Selberg upper bound sieve
In
Chapter 13, we used the combinatorial sieve to construct an arithmetic function
\(\lambda^+\colon \NN \to \RR\) such that
\begin{align*}
\lambda^+(1) \amp= 1\\
\sum_{d | n} \lambda^+(d) \amp\geq 0 \quad (n > 1).
\end{align*}
By setting
\begin{align*}
V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\
R^{+}(x,z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x),
\end{align*}
we were able to obtain the bound
\begin{equation}
V^-(z) x + R^-(x,z) \leq S(x,z) \leq V^+(z) x + R^+(x,z),\tag{14.2.1}
\end{equation}
but controlling \(V^+\) and \(R^+\) was rather painful.
Selberg proposed instead to construct an arithmetic function \(\rho\colon \NN \to \RR\) with \(\rho(1) = 1\) and
\begin{equation*}
\sum_{d | n} \lambda^+(n) = \left( \sum_{d | n} \rho(d) \right)^2.
\end{equation*}
In other words, let \(\rho\) be any arithmetic function with \(\rho(1) = 1\text{,}\) and put
\begin{equation*}
\lambda^+(n) = \sum_{d_1, d_2: \lcm(d_1, d_2) = n} \rho(d_1) \rho(d_2).
\end{equation*}
We will typically want \(\lambda^+(d) = 0\) for \(d \geq y\) for some prespecified number \(y\text{;}\) to enforce this, we may insist that \(\rho(n) = 0\) for \(n \geq \sqrt{y}\text{.}\) We call the resulting \(\lambda^+\) an \(L^2\)-sieve of level \(y\), or more commonly a Selberg (upper bound) sieve of level \(y\).
Let us drop \(x\) from consideration by agreeing to only consider functions \(f\) with finite support. (That is, we replace \(f\) by the function vanishing above \(x\text{.}\)) If we again set
\begin{align*}
S(z) \amp= \sum_{(n, P(z)) = 1} f(n)\\
V^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) g(d)\\
\amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) g(\lcm(d_1,d_2))
\end{align*}
\begin{align*}
R^{+}(z) \amp= \sum_{d | P(z)} \lambda^{+}(d) r_d(x)\\
\amp= \sum_{d_1, d_2 | P(z)} \rho(d_1) \rho(d_2) r_{\lcm(d_1,d_2)}(x),
\end{align*}
we again have
\begin{equation}
S(z) \leq V^+(z) x + R^+(z).\tag{14.2.2}
\end{equation}
Ignoring the error term
\(R^+(z)\) for the moment, one can ask about optimizing the main term
\(V^+(z) x\) in the bound
(14.2.2). This amounts to viewing
\(V^+(z)\) as a quadratic form and then minimizing it.
For simplicity, we will assume that \(g(p) \in (0,1)\) for \(p \in P\text{,}\) and \(g(p) = 0\) for \(p \notin P\text{.}\) (Before we only wanted \(g(p) \in [0,1)\) for \(p \in P\text{,}\) but there is no harm in adding to \(P\) those primes \(p\) for which \(g(p) = 0\) into \(P\text{.}\)) Let \(h\) be a multiplicative function with
\begin{equation*}
h(p) = \frac{g(p)}{1 - g(p)}.
\end{equation*}
We can then diagonalize the quadratic form as follows: first, put \(c = \gcd(d_1, d_2)\text{,}\) \(a = d_1/c\text{,}\) \(b = d_2/c\) to obtain
\begin{align*}
V^+(z) \amp= \sum_{a,b,c: abc | P(z)} \rho(ac) \rho(bc) g(abc)\\
\amp= \sum_{c|P(z)} g(c)^{-1} \sum_{a,b: abc | P(z)} (g(ac) \rho(ac)) (g(bc) \rho(bc)).
\end{align*}
Note that since \(P(z)\) is squarefree, the condition \(abc | P(z)\) forces \(\gcd(a,b) = 1\text{.}\) We now perform inclusion-exclusion on \(\gcd(a,b)\) to obtain
\begin{align*}
V^+(z) \amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d | P(z)/c} \mu(d) \left( \sum_{m | P(z)/(cd)} g(cdm) \rho(cdm) \right)^2\\
\amp= \sum_{c|P(z)} g(c)^{-1} \sum_{d|P(z)/c} \mu(d) \left( \sum_{m | P(z): m \equiv 0 \,(cd)} g(m) \rho(m) \right)^2.
\end{align*}
We next substitute \(e,f/e\) in for \(c,d\text{,}\) and reorder the sum:
\begin{align*}
V^+(z) \amp= \sum_{f | P(z)} \sum_{e | f} \mu(f/e) g(e)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2\\
\amp= \sum_{f | P(z)} h(f)^{-1} \left( \sum_{m | P(z): m \equiv 0 \,(f)} g(m) \rho(m) \right)^2.
\end{align*}
Let's put
\begin{equation*}
\xi(d) = \mu(d) \sum_{m|P(z): m \equiv 0\,(d)} g(m) \rho(m),
\end{equation*}
so that we have
\begin{equation*}
V^+(z) = \sum_{d | P(z)} h(d)^{-1} \xi(d)^2.
\end{equation*}
Before we can minimize this quadratic form, we must first reexpress in terms of
\(\xi\) the conditions we imposed on
\(\rho\text{.}\) Namely, by Möbius inversion (
Lemma 12.3),
\begin{equation*}
\rho(n) = \frac{\mu(n)}{g(n)} \sum_{d|P(z): d \equiv 0\,(n)} \xi(d),
\end{equation*}
so the condition \(\rho(1) = 1\) is equivalent to
\begin{equation*}
\sum_{d | P(z)} \xi(d) = 1,
\end{equation*}
and the condition \(\rho(d) = 0\) for \(d \geq \sqrt{y}\) is equivalent to
\begin{equation*}
\xi(d) = 0 \qquad (d \geq \sqrt{y}).
\end{equation*}
That is, \(\xi\) is restricted to a hyperplane.
Here's where the \(L^2\) part comes in. By the Cauchy–Schwartz inequality,
\begin{equation*}
V^+(z) \geq H^{-1}, \qquad H = \sum_{d \lt \sqrt{y}, d | P(z)} h(d)
\end{equation*}
and equality holds for
\begin{equation*}
\xi(d) = h(d) H^{-1} \qquad (d \lt \sqrt{y}).
\end{equation*}
Backing up, we get
\begin{equation}
\rho(d) = \mu(d) \frac{h(d)}{g(d)} H^{-1} \sum_{n \lt \sqrt{y}/d: \gcd(d,n) = 1}
h(n).\tag{14.2.3}
\end{equation}
Putting this together, we obtain the following.
Theorem 14.1. (Selberg).
Let \(f: \NN \to \RR_{\geq 0}\) be an arithmetic function with finite support. Let \(P\) be a set of primes, and put \(P(z) = \prod_{p \leq z, p \in P} p\text{.}\) For \(d | P(z)\text{,}\) write
\begin{equation*}
A_d = \sum_{n \equiv 0 \,(d)} f(n) = g(d) X + r_d(z)
\end{equation*}
for \(X>0\) and \(g\) a multiplicative function with \(0 \lt g(p) \lt 1\) for all \(p \in P\text{.}\) Let \(h(d)\) be a multiplicative function with \(h(p) = g(p) (1-g(p))^{-1}\) for all \(p \in P\text{,}\) and put
\begin{equation*}
H = \sum_{d \lt \sqrt{y}, d |P(z)} h(d)
\end{equation*}
for some \(y > 1\text{.}\) Then
\begin{equation}
S(z) = \sum_{(n, P(z)) = 1} f(n) \leq X H^{-1} + \sum_{d | P(z)} \lambda^+(d) r_d(z),\tag{14.2.4}
\end{equation}
for
\begin{align*}
\lambda^+(n) \amp= \sum_{d_1, d_2: \lcm(d_1, d_2) = n} \rho(d_1) \rho(d_2)\\
\rho(d) \amp= \mu(d) \frac{h(d)}{g(d)} H^{-1} \sum_{n \lt \sqrt{y}/d: \gcd(d,n) = 1} h(n).
\end{align*}
As a somewhat miraculous corollary (due to van Lint and Richert), we obtain
\begin{equation}
0 \leq \mu(d) \rho(d) \leq 1\tag{14.2.5}
\end{equation}
\begin{equation}
|\lambda^+(d)| \leq d^{(\log 3)/(\log 2)}\tag{14.2.6}
\end{equation}