
Chapter 2 The prime number theorem

In the first part of the course, we introduce the basic tools of zeta functions and \(L\)-functions, which provide a way to apply complex analysis to the study of the distribution of prime numbers, including in arithmetic progressions.
We begin by presenting a self-contained proof of the prime number theorem. This amounts to a streamlined version of the original methods of Hadamard and de la Vallée-Poussin, who independently established the prime number theorem in 1896; we follow an argument due to D.J. Newman as presented by Zagier [28].

Section 2.1 Euler's idea: revisiting the infinitude of primes

It can be argued that every truly transformative idea in analytic number theory gives rise to a new proof of the infinitude of primes. The first proof of this, due to Euclid, is entirely algebraic: assume there are only finitely many primes, multiply them together, add 1, then factor the result. This can be seen as the inspiration for the sieve of Eratosthenes (Chapter 12), and by extension for the combinatorial sieve-theoretic methods that we treat in the third part of the course.
Let us instead turn now to Euler's proof of the infinitude of primes, which flows from a basic fact of analysis: the divergence of the harmonic series \(\sum_{n=1}^\infty \frac{1}{n}\text{.}\) This divergence follows for instance by writing
\begin{equation*} \sum_{n=1}^N \frac{1}{n} \geq \sum_{n=1}^N \frac{1}{2^{\lceil \log_2 n \rceil}} \geq \frac{1}{2} \lfloor \log_2 N \rfloor \end{equation*}
and observing that the right side tends to \(\infty\) as \(N \to \infty\text{.}\) In practice we will usually want a more precise estimate, for which see Exercise 2.6.1.
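As a quick numerical sanity check (not needed for the argument; the helper names are ours), the displayed bound can be tested directly:

```python
import math

def harmonic(N):
    """Partial sum H_N = 1 + 1/2 + ... + 1/N of the harmonic series."""
    return sum(1.0 / n for n in range(1, N + 1))

def dyadic_lower_bound(N):
    """The lower bound (1/2) * floor(log_2 N) from the displayed inequality."""
    return 0.5 * math.floor(math.log2(N))

# The harmonic sum dominates a quantity that grows without bound.
for N in (10, 1000, 100000):
    assert harmonic(N) >= dyadic_lower_bound(N)
```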
Given the divergence of the harmonic series, Euler argues as follows. If there were only finitely many primes, then unique factorization of positive integers into prime powers would imply that
\begin{equation*} \sum_{n=1}^\infty \frac{1}{n} = \prod_{p} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right) = \prod_{p} \left(1 - \frac{1}{p} \right)^{-1}, \end{equation*}
which would give the equality between a divergent series and a finite quantity, a contradiction.
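We can watch Euler's identity at work numerically: truncating the product over primes \(p \leq N\) gives the sum of \(1/n\) over those \(n\) all of whose prime factors are at most \(N\text{,}\) which dominates the partial harmonic sum, and both sides grow without bound. (A sketch with our own helper names.)

```python
def primes_up_to(N):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

def euler_product(N):
    """prod_{p <= N} (1 - 1/p)^(-1), i.e. the sum of 1/n over N-smooth n."""
    result = 1.0
    for p in primes_up_to(N):
        result *= 1.0 / (1.0 - 1.0 / p)
    return result

def harmonic(N):
    return sum(1.0 / n for n in range(1, N + 1))

# Every n <= N is N-smooth, so the truncated product dominates H_N;
# both grow without bound, reflecting the infinitude of primes.
for N in (10, 100, 1000):
    assert euler_product(N) >= harmonic(N)
```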
Euler's idea turns out to be quite fruitful: the introduction of analysis into the study of prime numbers allows us to prove distribution statements about primes in a much more flexible fashion than is allowed by algebraic techniques. For instance, we will see in Chapter 5 how Dirichlet adapted this idea to prove that every arithmetic progression whose terms do not all share a common factor contains infinitely many primes, a statement for which there seems to be no purely algebraic proof.

Section 2.2 Riemann's zeta function

For the moment, however, let us turn to Riemann's one paper in number theory, in which he fleshes out Euler's idea and fits it into the theory of complex functions of one variable.

Definition 2.2.

Following Riemann, we define the function \(\zeta(s)\) for all \(s \in \CC\) with \(\Real(s) > 1\) as the sum of an absolutely convergent Dirichlet series:
\begin{equation*} \zeta(s) := \sum_{n=1}^\infty n^{-s}. \end{equation*}
More precisely, the series converges uniformly in any region of the form \(\Real(s) \geq 1 + \epsilon\) for \(\epsilon > 0\text{;}\) consequently, \(\zeta(s)\) is an analytic function on the half-plane \(\Real(s) > 1\text{.}\) The boundary \(\Real(s) = 1\) of this half-plane will play a central role in the proof of the prime number theorem. (The critical line for \(\zeta\) is the line \(\Real(s) = \frac{1}{2}\text{,}\) which will appear in Chapter 6.)
Again for \(\Real(s) > 1\text{,}\) we can also write
\begin{equation} \zeta(s) = \prod_p \left( 1 - p^{-s} \right)^{-1},\tag{2.2.1} \end{equation}
and this product converges absolutely and uniformly for \(\Real(s) \geq 1 + \epsilon\) for \(\epsilon > 0\text{.}\) (Reminder: a product \(\prod_i (1 + a_i)\) converges absolutely if and only if \(\sum_i a_i\) converges absolutely.) It follows that \(\zeta(s) \neq 0\) for \(\Real(s) > 1\text{.}\)
For future reference, we note that the product representation is sometimes more useful in the form
\begin{equation} \log \zeta(s) = \sum_p -\log (1-p^{-s}) = \sum_p \sum_{n=1}^\infty \frac{p^{-ns}}{n}.\tag{2.2.2} \end{equation}
We now show that \(\zeta\) extends somewhat beyond the domain of absolute convergence of the original series: precisely, \(\zeta(s) - \frac{1}{s-1}\) extends to a holomorphic function on \(\Real(s) > 0\text{,}\) so that \(\zeta\) is meromorphic there with a single simple pole at \(s = 1\) of residue 1.

Definition 2.3.

By partial summation (or summation by parts, as in integration by parts), we will mean the following identity attributed to Abel: for any two sequences \(a_n,b_n\text{,}\)
\begin{equation*} \sum_{n=1}^N a_n b_n = a_{N+1} B_N - \sum_{n=1}^N (a_{n+1} - a_{n}) B_n, \qquad B_n := \sum_{i=1}^n b_i. \end{equation*}
This remains valid if \(N = \infty\) as long as both sums are absolutely convergent.
We apply partial summation to \(\zeta(s)\) by taking \(a_n = n^{-s}\) and \(b_n = 1\text{,}\) so that \(B_n = n\text{.}\) More precisely, we apply partial summation to the truncated sum \(\sum_{n=1}^N n^{-s}\text{,}\) and note that the error term \(a_{N+1} B_N = (N+1)^{-s} N\) tends to 0 as \(N \to \infty\) for \(\Real(s) > 1\text{.}\) (Warning: in general, I am not going to be nearly so verbose when applying partial summation. So make sure you understand this example!)
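To that end, here is the identity checked numerically in this exact application, first verbatim and then in the rearranged form with the boundary term \((N+1)^{-s} N\) (variable names ours):

```python
# Verify Abel's partial-summation identity
#   sum_{n=1}^N a_n b_n = a_{N+1} B_N - sum_{n=1}^N (a_{n+1} - a_n) B_n
# in the application at hand: a_n = n^{-s}, b_n = 1, B_n = n.
s = 2.5
N = 200

lhs = sum(n ** (-s) for n in range(1, N + 1))
rhs = (N + 1) ** (-s) * N - sum(
    ((n + 1) ** (-s) - n ** (-s)) * n for n in range(1, N + 1)
)
assert abs(lhs - rhs) < 1e-12

# Rearranged: the boundary term tends to 0 and the remaining sum
# becomes sum_n n * (n^{-s} - (n+1)^{-s}), as in the computation below.
alt = sum(n * (n ** (-s) - (n + 1) ** (-s)) for n in range(1, N + 1))
boundary = (N + 1) ** (-s) * N
assert abs(lhs - (alt + boundary)) < 1e-12
```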
With that said, we have
\begin{align*} \zeta(s) &= \sum_{n=1}^\infty n (n^{-s} - (n+1)^{-s})\\ &= s \sum_{n=1}^\infty n \int_n^{n+1} x^{-s-1}\,dx\\ &= s \int_1^\infty \lfloor x \rfloor x^{-s-1}\,dx. \end{align*}
Since \(s \int_1^\infty x \cdot x^{-s-1}\,dx = \frac{s}{s-1}\) for \(\Real(s) > 1\text{,}\) we can thus write
\begin{equation*} \zeta(s) - \frac{s}{s-1} = -s\int_1^\infty \{x\} x^{-s-1}\,dx, \end{equation*}
for \(\{x\} := x - \lfloor x \rfloor\) the fractional part of \(x\text{;}\) the integral converges absolutely for \(\Real(s) > 0\text{,}\) and uniformly for \(\Real(s) \geq \epsilon\) for any \(\epsilon > 0\text{.}\) Since \(\frac{s}{s-1} = 1 + \frac{1}{s-1}\text{,}\) this proves the claim.
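As a numerical sanity check on this representation, we can evaluate it at \(s = 2\text{,}\) where \(\zeta(2) = \pi^2/6\) is known; on each interval \([n, n+1]\) the integrand can be integrated in closed form. (The helper name and cutoff are ours.)

```python
import math

# Evaluate zeta(s) = s/(s-1) - s * int_1^infty {x} x^{-s-1} dx at s = 2.
# On [n, n+1] we have {x} = x - n, and integrating x^{-2} - n*x^{-3} gives
#   int_n^{n+1} (x - n) x^{-3} dx = (1/n - 1/(n+1)) - (n/2)(1/n^2 - 1/(n+1)^2).
def frac_part_integral(M):
    total = 0.0
    for n in range(1, M + 1):
        total += (1.0 / n - 1.0 / (n + 1)) - (n / 2.0) * (
            1.0 / n**2 - 1.0 / (n + 1) ** 2
        )
    return total  # the tail beyond M is at most 1/(2*M^2)

s = 2.0
zeta_2 = s / (s - 1) - s * frac_part_integral(10**5)
assert abs(zeta_2 - math.pi**2 / 6) < 1e-4
```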
In the open half-plane \(\Real(s) > 1\text{,}\) the absolute convergence of the product representation (2.2.1) shows that \(\zeta(s) \neq 0\text{.}\) To prove the prime number theorem, we also need to exclude zeroes on the boundary of that half-plane, which is more subtle.
This is all we need about \(\zeta\) for the proof of the prime number theorem. We will return to Riemann's memoir, establishing more detailed properties of \(\zeta\text{,}\) in Chapter 6.

Section 2.3 Towards the prime number theorem

We next reformulate the prime number theorem in a way that brings it closer to \(\zeta\text{.}\)

Definition 2.6.

For \(x \in \RR\text{,}\) write
\begin{align*} \pi(x) &= \sum_{p \leq x} 1\\ \vartheta(x) &= \sum_{p \leq x} \log p. \end{align*}
In this notation, the prime number theorem asserts that \(\pi(x) \sim x/\log x\text{;}\) this is equivalent to the assertion that \(\vartheta(x) \sim x\text{.}\) The equivalence follows by observing that for any \(\epsilon > 0\text{,}\)
\begin{align*} \vartheta(x) &\leq \sum_{p \leq x} \log x = \pi(x) \log x\\ \vartheta(x) &\geq \sum_{x^{1-\epsilon} \leq p \leq x} \log x^{1-\epsilon} = (1-\epsilon) (\pi(x) + O(x^{1-\epsilon})) \log x. \end{align*}
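Both functions are easy to tabulate by sieving; already at moderate scales the ratio \(\vartheta(x)/x\) is visibly close to 1. (A sketch with our own helper names.)

```python
import math

def primes_up_to(N):
    """Sieve of Eratosthenes."""
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

x = 10**5
ps = primes_up_to(x)
pi_x = len(ps)                          # pi(x)
theta_x = sum(math.log(p) for p in ps)  # vartheta(x)

assert pi_x == 9592  # the classical value of pi(10^5)
assert 0.95 < theta_x / x < 1.0
```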
What we will prove is that the improper integral
\begin{equation} \int_1^\infty \frac{\vartheta(x) - x}{x^2}\,dx\tag{2.3.1} \end{equation}
converges; remember that this means that for every \(\epsilon > 0\text{,}\) there exists \(N\) such that for \(y,z \geq N\text{,}\)
\begin{equation*} \left| \int_y^z \frac{\vartheta(x) - x}{x^2} \,dx \right| < \epsilon. \end{equation*}
To deduce from (2.3.1) that \(\vartheta(x) \sim x\text{,}\) first suppose that there exists \(\lambda>1\) such that \(\vartheta(x) \geq \lambda x\) for arbitrarily large \(x\text{.}\) Since \(\vartheta\) is nondecreasing, it then follows that for any such value of \(x\text{,}\)
\begin{equation*} \int_x^{\lambda x} \frac{\vartheta(t) - t}{t^2} dt \geq \int_x^{\lambda x} \frac{\lambda x - t}{t^2} dt = \int_1^\lambda \frac{\lambda - t}{t^2} dt > 0, \end{equation*}
with the last integral a positive constant independent of \(x\text{;}\) since this occurs for arbitrarily large \(x\text{,}\) it contradicts the convergence of (2.3.1). Similarly, if there exists \(\lambda < 1\) such that \(\vartheta(x) \leq \lambda x\) for arbitrarily large \(x\text{,}\) then such values of \(x\) satisfy
\begin{equation*} \int_{\lambda x}^x \frac{\vartheta(t) - t}{t^2} dt \leq \int_{\lambda x}^x \frac{\lambda x - t}{t^2} dt = \int_\lambda^1 \frac{\lambda - t}{t^2} dt < 0, \end{equation*}
with the last integral a negative constant independent of \(x\text{,}\) again contradicting the convergence of (2.3.1). Hence \(\vartheta(x) \sim x\text{.}\)
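The convergence can also be observed numerically: since \(\vartheta\) is a step function, the partial integral up to \(X\) can be evaluated exactly as a finite sum over primes. (Helper names and cutoffs are ours.)

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

def partial_integral(X):
    """int_1^X (theta(x) - x)/x^2 dx; since theta is a step function, this
    equals sum_{p<=X} log(p)*(1/p - 1/X) - log(X) exactly."""
    ps = primes_up_to(X)
    theta_X = sum(math.log(p) for p in ps)
    return sum(math.log(p) / p for p in ps) - theta_X / X - math.log(X)

# Successive partial integrals barely move, consistent with convergence.
F4, F5 = partial_integral(10**4), partial_integral(10**5)
assert abs(F5 - F4) < 0.1
```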

Section 2.4 The Tauberian argument

We have thus reduced the prime number theorem to the convergence of the integral (2.3.1); we turn to this next.
Consider the function \(\Phi(s) = - \zeta'(s)/\zeta(s)\text{.}\) Starting from the log-product representation (2.2.2), using partial summation as in Theorem 2.4, and substituting \(x = e^t\text{,}\) we find
\begin{align*} \Phi(s) &= \sum_p (\log p) p^{-s} + \sum_p \sum_{n=2}^\infty (\log p) p^{-ns}\\ &= s \int_1^\infty \vartheta(x) x^{-s-1}\,dx + s \int_1^\infty \vartheta(x) \left( \sum_{n=2}^\infty n x^{-ns-1} \right) \,dx\\ &= s \int_0^\infty e^{-st} \vartheta(e^t)\,dt + s \int_0^\infty \frac{2 e^{-2st} - e^{-3st}}{(1-e^{-st})^2} \vartheta(e^t)\,dt \end{align*}
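The starting identity \(\Phi(s) = \sum_p \sum_{n \geq 1} (\log p) p^{-ns}\text{,}\) obtained by differentiating (2.2.2), can be checked numerically at \(s = 2\) against \(-\zeta'(s)/\zeta(s)\) computed directly from the Dirichlet series \(\zeta(s) = \sum n^{-s}\text{,}\) \(\zeta'(s) = -\sum (\log n) n^{-s}\text{.}\) (Truncation cutoffs are ours; they introduce errors well below the stated tolerance.)

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

s = 2.0
N = 10**5

# Left side: Phi(s) = sum_p sum_{n>=1} (log p) p^{-ns}.
lhs = 0.0
for p in primes_up_to(N):
    q = p ** (-s)  # p^{-s}
    term = q
    while term >= 1e-18:
        lhs += math.log(p) * term
        term *= q  # advance to the next power p^{-ns}

# Right side: -zeta'(s)/zeta(s) from the truncated Dirichlet series.
zeta = sum(n ** (-s) for n in range(1, N + 1))
zeta_prime = -sum(math.log(n) * n ** (-s) for n in range(1, N + 1))
rhs = -zeta_prime / zeta
assert abs(lhs - rhs) < 1e-3
```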
Define the functions
\begin{align*} f(t) &:= \vartheta(e^t) e^{-t} - 1\\ g(z) &:= \frac{\Phi(z+1)}{z+1} - \frac{1}{z}; \end{align*}
by the above, for \(\Real(z) > 0\) we have
\begin{equation} g(z) = \int_0^\infty f(t) e^{-zt}\,dt + \int_0^\infty \frac{2 e^{-2(z+1)t} - e^{-3(z+1)t}}{(1-e^{-(z+1)t})^2} \vartheta(e^t)\,dt.\tag{2.4.1} \end{equation}
We will deduce the convergence of (2.3.1), and thus deduce the prime number theorem, by showing that (2.4.1) remains valid for \(z=0\) (as then we can substitute \(x = e^t\)). Note that the second integral in (2.4.1) converges absolutely for \(z=0\text{,}\) so we only have to show that computing the first integral commutes with taking the limit as \(z \to 0^+\text{.}\)
The idea here is to leverage complex function-theoretic information about \(\Phi\text{;}\) this sort of operation is known as a Tauberian argument. To be precise, we know that \(\Phi(s)\) is meromorphic on \(\Real(s) > 0\text{,}\) with a simple pole at \(s=1\) of residue 1 and no other poles in \(\Real(s) \geq 1\) (the latter because \(\zeta\) has no zeroes in \(\Real(s) \geq 1\text{;}\) see Exercise 2.6.4). It follows that \(f\) and \(g\) satisfy the conditions of the following theorem: if \(f\) is a bounded, locally integrable function on \([0, \infty)\) (the boundedness of our \(f\) being Exercise 2.6.5) and the function \(g(z) := \int_0^\infty f(t) e^{-zt}\,dt\text{,}\) a priori defined for \(\Real(z) > 0\text{,}\) extends to a holomorphic function on an open region containing \(\Real(z) \geq 0\text{,}\) then \(\int_0^\infty f(t)\,dt\) converges and equals \(g(0)\text{.}\)
For \(T>0\text{,}\) put \(g_T(z) = \int_0^T f(t) e^{-zt}\,dt\text{;}\) each function \(g_T\) is entire, and we want \(\lim_{T \to \infty} g_T(0) = g(0)\text{.}\)
For \(R\) large (but fixed until further notice), let \(C\) be the boundary of the region
\begin{equation*} \{z \in \CC: |z| \leq R, \quad \Real(z) \geq -\delta\} \end{equation*}
for some \(\delta = \delta(R) > 0\) chosen small enough that \(C\) lies inside the domain on which \(g\) is holomorphic. By the Cauchy integral theorem,
\begin{equation} g(0) - g_T(0) = \frac{1}{2 \pi i} \int_C (g(z) - g_T(z)) e^{zT} \left(1 + \frac{z^2}{R^2} \right) \frac{dz}{z};\tag{2.4.2} \end{equation}
namely, the only pole of the integrand is a simple pole at \(z=0\text{,}\) so we simply pop out the residue there.
To bound the right side of (2.4.2), we separate the contour of integration \(C\) into
\begin{align*} C_+ &= C \cap \{z \in \CC: \Real(z) \geq 0 \}\\ C_- &= C \cap \{z \in \CC: \Real(z) \leq 0 \} \end{align*}
Remember that we assumed \(f\) is bounded; choose \(B >0\) so that \(|f(t)| \leq B\) for all \(t\text{.}\) For \(\Real(z) > 0\) with \(|z| = R\text{,}\) we have
\begin{align*} |g(z) - g_T(z)| &= \left|\int_T^\infty f(t) e^{-zt}\,dt\right|\\ &\leq B \int_T^\infty |e^{-zt}|\,dt\\ &= \frac{B e^{-\Real(z) T}}{\Real(z)} \end{align*}
and
\begin{equation*} \left| e^{zT} \left( 1 + \frac{z^2}{R^2} \right) \frac{1}{z} \right| = e^{\Real(z) T} \frac{2 \Real(z)}{R^2}. \end{equation*}
Since the length of the contour is at most \(2 \pi R\text{,}\) the contribution over \(C_+\) to (2.4.2) is bounded in absolute value by
\begin{equation*} \frac{1}{2\pi} (2 \pi R) \frac{B e^{-\Real(z)T}}{\Real(z)} e^{\Real(z) T} \frac{2 \Real(z)}{R^2} = \frac{2B}{R}. \end{equation*}
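The modulus computation on the circle \(|z| = R\) used here (and again below, with \(|\Real(z)|\text{,}\) on the left half) can be verified numerically:

```python
import cmath
import math

# On |z| = R we have z * conj(z) = R^2, so 1 + z^2/R^2 = z * (2 Re z)/R^2,
# which gives |e^{zT} (1 + z^2/R^2)/z| = e^{Re(z) T} * 2|Re(z)|/R^2.
R, T = 5.0, 3.0
devs = []
for theta in (0.3, 1.0, 2.5, 4.0):
    z = R * cmath.exp(1j * theta)
    lhs = abs(cmath.exp(z * T) * (1 + z * z / R**2) / z)
    rhs = math.exp(z.real * T) * 2 * abs(z.real) / R**2
    devs.append(abs(lhs - rhs) / max(1.0, rhs))  # relative deviation
max_dev = max(devs)
assert max_dev < 1e-9
```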
Over \(C_-\text{,}\) we separate the integral into integrals involving \(g\) and \(g_T\text{.}\) Since \(g_T\) is entire, its integral over \(C_-\) can instead be calculated over the semicircle \(C'_- = \{z \in \CC: |z| = R, \Real(z) \leq 0\}\text{.}\) For \(\Real(z) < 0\) we have
\begin{align*} |g_T(z)| &= \left|\int_0^T f(t) e^{-zt}\,dt\right|\\ &\leq B \int_{-\infty}^T |e^{-zt}|\,dt\\ &= \frac{B e^{-\Real(z) T}}{|\Real(z)|}, \end{align*}
so as above we may bound this contribution to (2.4.2) by \(2B/R\text{.}\)
Finally, we consider the contribution to (2.4.2) from \(g\) over \(C_-\text{;}\) we are going to show that this contribution tends to 0 as \(T \to\infty\text{.}\) By parametrizing the contour, we can write
\begin{equation*} \frac{1}{2 \pi i} \int_{C_-} g(z) e^{zT} \left( 1 + \frac{z^2}{R^2} \right) \frac{dz}{z} = \int_0^1 a(u) e^{b(u)T}\,du, \end{equation*}
where \(a(u)\) and \(b(u)\) are continuous, and \(\Real(b(u)) < 0\) for \(0 < u < 1\text{;}\) the key point is that \(a\) does not depend on \(T\text{,}\) so as \(T \to \infty\) the integrand tends to 0 pointwise except at the endpoints. Since the integrands are all bounded, Lebesgue's dominated convergence theorem implies that the integral tends to 0 as \(T \to \infty\text{.}\) (Again, I'm being more explicit with the analysis than I will be in general.)
We conclude that
\begin{equation*} \limsup_{T \to \infty} |g(0) - g_T(0)| \leq \frac{4B}{R}; \end{equation*}
since \(R\) can be chosen arbitrarily large, this yields the desired result.

Remark 2.9.

You might be thinking at this point that if one knew that \(g\) extends to a holomorphic function on a region a bit larger than \(\Real(s) \geq 0\text{,}\) then maybe one could prove something about the rate of convergence of the integral \(\int_0^\infty f(t)\,dt\text{.}\) In particular, if one could exclude zeroes of \(\zeta\) in some region beyond the line \(\Real(s) = 1\text{,}\) one should correspondingly get a prime number theorem with an improved error term. This turns out to be correct, except that one must replace the approximation \(\pi(x) \sim x/(\log x)\) with Gauss's more accurate approximation \(\pi(x) \sim \li(x)\) (see Exercise 2.6.7 and Exercise 8.4.1); we will take up these issues in the second part of the course.

Section 2.5 Historical aside: the Erdős–Selberg method

About 50 years after the original proof, Erdős and Selberg gave so-called elementary proofs of the prime number theorem, which do not use any complex analysis. The key step in Selberg's proof is to give an elementary proof of the bound
\begin{equation} |R(x)| \leq \frac{1}{\log x} \int_1^x |R(x/t)|\,dt + O \left( x \frac{\log \log x}{\log x} \right),\tag{2.5.1} \end{equation}
where \(R(x) := \vartheta(x) - x\text{.}\)
Using (2.5.1) and the fact that
\begin{equation} \int_1^x \frac{R(t)}{t^2} dt = O(1)\tag{2.5.2} \end{equation}
(much easier than the convergence of the integral; see Exercise 2.6.8), one then produces \(0 < c < 1\) such that if there exists \(\alpha > 0\) such that \(|R(x)| < \alpha x\) for \(x\) large, then also \(|R(x)| < \alpha cx\) for \(x\) large. I find this step somewhat unenlightening; if you must know the details, see [22], Chapter XXII of [12], or [19].

Exercises 2.6 Exercises

1.

Prove that there exists a positive constant \(\gamma\) such that
\begin{equation*} \sum_{i=1}^n \frac{1}{i} - \log n= \gamma + O(n^{-1}). \end{equation*}
The number \(\gamma\) is called Euler's constant or the Euler–Mascheroni constant, and it is one of the most fundamental constants in analytic number theory. However, since it is defined purely analytically, we remain astonishingly ignorant about it; for instance, \(\gamma\) is most likely irrational (even transcendental) but no proof is known.
Hint.
Compare the sum to a Riemann sum for \(\int_{1}^n \frac{1}{x}\,dx\text{.}\)
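A quick computation illustrating the statement (not a proof; the cutoff is ours):

```python
import math

def gamma_approx(n):
    """H_n - log(n), which the exercise asserts equals gamma + O(1/n)."""
    H = sum(1.0 / i for i in range(1, n + 1))
    return H - math.log(n)

# The differences settle down near gamma = 0.57721..., with error ~ 1/(2n).
approx = gamma_approx(10**6)
assert abs(approx - 0.5772156649) < 1e-5
```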

2.

Let \(d(n)\) denote the number of divisors of \(n \in \NN\text{.}\) Prove that
\begin{equation*} \sum_{i=1}^n d(i) = n \log n + (2\gamma-1) n + O(n^{1/2}), \end{equation*}
where \(\gamma\) is the Euler–Mascheroni constant (Exercise 2.6.1).
Hint.
Estimate the number of lattice points in the first quadrant under the curve \(xy = n\text{.}\)
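Again a numerical illustration, using the lattice-point identity \(\sum_{i \leq n} d(i) = \sum_{k \leq n} \lfloor n/k \rfloor\) (helper name and scale ours):

```python
import math

def divisor_sum(n):
    """sum_{i<=n} d(i), computed by counting multiples: sum_{k<=n} floor(n/k)."""
    return sum(n // k for k in range(1, n + 1))

gamma = 0.5772156649  # Euler-Mascheroni constant
n = 10**5
err = abs(divisor_sum(n) - (n * math.log(n) + (2 * gamma - 1) * n))
assert err < 5 * math.sqrt(n)  # consistent with the O(n^{1/2}) error term
```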

3. (Mertens).

Fix \(t \in \RR\) nonzero. Prove that the function
\begin{equation*} Z(s) = \zeta(s)^3 \zeta(s+it)^4 \zeta(s+2it) \end{equation*}
extends to a meromorphic function on \(\Real(s) > 0\text{.}\) Then show that if \(s \in \RR\) and \(s > 1\text{,}\) then \(\log |Z(s)| = \Real(\log Z(s))\) can be written as a series of nonnegative terms, so \(|Z(s)| \geq 1\text{.}\)
Hint.
If you get stuck, peek ahead to the proof of Theorem 9.8 where we generalize this argument.
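The nonnegativity in this exercise ultimately rests, via the series (2.2.2), on the elementary inequality \(3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2 \geq 0\text{.}\) Here is a numerical confirmation of that inequality on a fine grid (not a proof):

```python
import math

# Check 3 + 4*cos(t) + cos(2t) = 2*(1 + cos(t))^2 >= 0 on a fine grid.
min_val = float("inf")
id_dev = 0.0  # worst deviation from the double-angle identity
for k in range(10**5):
    t = 2 * math.pi * k / 10**5
    val = 3 + 4 * math.cos(t) + math.cos(2 * t)
    min_val = min(min_val, val)
    id_dev = max(id_dev, abs(val - 2 * (1 + math.cos(t)) ** 2))
assert min_val > -1e-9 and id_dev < 1e-9
```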

4.

Use Exercise 2.6.3 to prove that \(\zeta(s)\) has no zeroes on the line \(\Real(s) = 1\text{.}\)

5. (Chebyshev).

Prove that
\begin{equation*} \prod_{n < p \leq 2n} p \leq 2^{2n}, \end{equation*}
then deduce that \(\vartheta(x) = O(x)\text{.}\)
Hint.
Consider the central binomial coefficient \(\binom{2n}{n}\text{.}\)
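For small \(n\) the bound can be confirmed exhaustively: each prime in \((n, 2n]\) divides \(\binom{2n}{n}\) exactly once, so the product of these primes divides \(\binom{2n}{n} \leq 2^{2n}\text{,}\) which is the content of the hint. (A sketch with our own helpers.)

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

ok = True
for n in range(1, 301):
    prod = 1
    for p in primes_up_to(2 * n):
        if n < p <= 2 * n:
            prod *= p
    # Each such p divides binom(2n, n), so prod | binom(2n, n) <= 4^n.
    if math.comb(2 * n, n) % prod != 0 or prod > 4**n:
        ok = False
assert ok
```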

6.

Let \(k\) be a positive integer. Prove that for any \(c>0\text{,}\) if we write \(C_R\) for the straight contour from \(c-iR\) to \(c+iR\text{,}\) then
\begin{equation*} \lim_{R \to \infty} \frac{1}{2 \pi i} \int_{C_R} \frac{x^s\,ds}{s(s+1) \cdots (s+k)} = \begin{cases} \frac{1}{k!} \left(1 - \frac{1}{x} \right)^k & x \geq 1 \\ 0 & 0 \leq x \leq 1. \end{cases} \end{equation*}
Hint.
Use a contour-shifting argument.
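Before attempting the contour shift, it may help to see the statement numerically for \(k = 1\text{.}\) Below we approximate the truncated integral by the trapezoid rule; the truncation at height \(R\) costs roughly \(x^c/(\pi R)\text{,}\) and the parameters are ours.

```python
import cmath
import math

def truncated_contour(x, k, c, R, steps):
    """Approximate (1/(2 pi i)) int_{c-iR}^{c+iR} x^s ds/(s(s+1)...(s+k))
    by the trapezoid rule; on the segment s = c + it we have ds = i dt,
    so the prefactor becomes 1/(2 pi)."""
    h = 2.0 * R / steps
    log_x = math.log(x)
    total = 0j
    for j in range(steps + 1):
        s = complex(c, -R + j * h)
        denom = 1 + 0j
        for m in range(k + 1):
            denom *= s + m
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cmath.exp(s * log_x) / denom
    return (total * h / (2 * math.pi)).real

# For x = 2, k = 1 the limiting value is (1/1!)(1 - 1/2)^1 = 0.5;
# for x = 1/2 it is 0.
approx_big = truncated_contour(2.0, 1, 1.0, 500.0, 100000)
approx_small = truncated_contour(0.5, 1, 1.0, 500.0, 100000)
assert abs(approx_big - 0.5) < 0.01
assert abs(approx_small) < 0.01
```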

7. (Gauss).

Define the logarithmic integral function
\begin{equation*} \li(x) = \int_2^x \frac{dt}{\log t}. \end{equation*}
Prove that \(\li(x) - x/(\log x) = O(x/(\log x)^2)\text{,}\) so in particular \(\li(x) \sim x / (\log x)\) and the prime number theorem is equivalent to \(\pi(x) \sim \li(x)\text{.}\) (Note: since \(\li(x)\) is primarily used for asymptotic estimates, the choice of the lower limit of integration is somewhat arbitrary and not uniform across the literature.)
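Here is the comparison checked numerically via Simpson's rule (our own quadrature parameters; the constant in the \(O\)-term is modest at this scale):

```python
import math

def li(x, steps=200000):
    """Logarithmic integral int_2^x dt/log(t), by composite Simpson's rule."""
    if steps % 2:
        steps += 1
    h = (x - 2.0) / steps
    total = 1.0 / math.log(2.0) + 1.0 / math.log(x)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) / math.log(2.0 + j * h)
    return total * h / 3.0

x = 10.0**6
diff = li(x) - x / math.log(x)
ratio = diff / (x / math.log(x) ** 2)
# li(x) - x/log(x) is positive and of the same order as x/(log x)^2 here.
assert diff > 0 and 0.5 < ratio < 2.0
```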

8.

Using the identity
\begin{equation*} \sum_{n \leq x} \log n = \sum_{i=1}^\infty \sum_{p \leq x^{1/i}} \left\lfloor \frac{x}{p^i} \right\rfloor \log p, \end{equation*}
prove that
\begin{equation*} \sum_{p \leq x} \frac{\log p}{p} = \log x + O(1), \end{equation*}
then deduce (2.5.2) by partial summation (Definition 2.3).
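The estimate in this exercise can be observed numerically; in fact the difference \(\sum_{p \leq x} \frac{\log p}{p} - \log x\) approaches a constant near \(-1.33\text{.}\) (A sketch with our own helper names and cutoffs.)

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, N + 1) if sieve[p]]

def mertens_diff(x):
    """sum_{p<=x} log(p)/p - log(x), which the exercise asserts is O(1)."""
    return sum(math.log(p) / p for p in primes_up_to(x)) - math.log(x)

# The difference stays bounded, consistent with the claimed estimate.
d4, d5 = mertens_diff(10**4), mertens_diff(10**5)
assert -2.0 < d4 < -0.8 and -2.0 < d5 < -0.8
```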