#### [SOLVED] How to explain the concentration-of-measure phenomenon intuitively?

One way to phrase the "concentration-of-measure" phenomenon is that, for a Euclidean sphere $S^d$ in $d$ dimensions, for large $d$, "most of the mass is close to the equator, for any equator."¹

Q. How could one explain/justify this intuitively—perhaps just verbally—to a mathematically literate but naive audience (say, advanced undergraduate math majors)?

That "most of the mass is close to the equator, for any equator" seems almost contradictory (imagining orthogonal equatorial hyperplanes), or at the least, superficially quite puzzling. Can one only gain intuition via working through details of the Brunn–Minkowski theorem or the isoperimetric inequality?

¹ Boáz Klartag, in a book review in the AMS Bulletin, July 2015, p. 540. According to the Wikipedia article, the idea goes back to Paul Lévy.

#### @foliations 2015-06-27 20:09:52

I'm not sure how conceptual this is, but I think the explanation is the same as the (intuitively obvious?) fact that most of the area underneath the graph of the function $t^n$ over the unit interval is near $t=1$. In the end it may just be this computational fact, but it is easy enough to see.

Just to flesh things out, consider $\mathbb{S}^n=\{x_1^2+\cdots+x_{n+1}^2=1\}$.

By the co-area formula (one can of course appeal to more elementary calculus in the computations below as well) we have that (for $0\leq h\leq 1$)

$$Vol_n(\mathbb{S}^n\cap \{-h\leq x_1\leq h\})=\int_{-h}^h \int_{\{x_1=t\}\cap \mathbb{S}^n}\frac{1}{|\nabla_{\mathbb{S}^n} x_1|} d\mu \, dt.$$ Observe that $|\nabla_{\mathbb{S}^n} x_1|$ actually depends only on $t$ (and in particular is independent of $n$): on the slice $\{x_1=t\}$ it equals $\sqrt{1-t^2}$.
Therefore, $$Vol_n(\mathbb{S}^n\cap \{-h\leq x_1\leq h\})=\int_{-h}^h \frac{1}{\sqrt{1-t^2}}Vol_{n-1}(\{x_1=t\}\cap \mathbb{S}^n) dt.$$ The slice $\{x_1=t\}\cap \mathbb{S}^n$ is an $(n-1)$-sphere of radius $\sqrt{1-t^2}$, so its volume is $Vol_{n-1}(\mathbb{S}^{n-1})(1-t^2)^{(n-1)/2}$. Hence, $$Vol_n(\mathbb{S}^n\cap \{-h\leq x_1\leq h\})=Vol_{n-1}(\mathbb{S}^{n-1})\int_{-h}^h(1-t^2)^{(n-2)/2} dt.$$ Since $0\leq 1-t^2<1$ when $t\neq 0$, the factor $(1-t^2)^{(n-2)/2}$ decays rapidly away from $t=0$ as $n$ grows, so for large $n$ the area under its graph over $[-1,1]$ is concentrated near $t=0$, i.e. near the equator.
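A quick numerical check of the last formula, as a sketch using only Python's standard library. The helper name `band_fraction` is hypothetical; it integrates $(1-t^2)^{(n-2)/2}$ with the midpoint rule and divides by the integral over $[-1,1]$, so the constant $Vol_{n-1}(\mathbb{S}^{n-1})$ cancels. It assumes $n\ge 2$ so the integrand stays bounded.

```python
import math

def band_fraction(n, h, steps=20000):
    """Fraction of the area of S^n (in R^(n+1)) lying in the band |x_1| <= h,
    via midpoint-rule integration of the density (1 - t^2)^((n-2)/2). Assumes n >= 2."""
    def density(t):
        return (1.0 - t * t) ** ((n - 2) / 2)

    def integrate(a, b):
        width = (b - a) / steps
        return width * sum(density(a + (k + 0.5) * width) for k in range(steps))

    return integrate(-h, h) / integrate(-1.0, 1.0)

# Fraction of the sphere within distance 0.1 of the equator x_1 = 0, for growing n.
for n in (2, 10, 100, 1000):
    print(n, round(band_fraction(n, 0.1), 4))
```

For $n=2$ the density is constant, so the fraction is exactly $h$ (Archimedes); as $n$ grows the same band of width $0.2$ captures almost all of the area.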

#### @Henry Cohn 2015-06-27 20:55:00

As I see it, the key intuition is passing from the equator orthogonal to a single vector to looking at a whole orthonormal basis.

Suppose we pick a uniformly random unit vector $(x_1,\dots,x_n)$. What we want to know is why $x_1$ is probably near zero, since this is equivalent to being near the equator relative to the first basis vector. But this feels intuitively obvious to me: all the coordinates have the same distribution, and they surely can't all be large, so they had better all be small.

To be a little more precise, we have $x_1^2+\dots+x_n^2=1$, and each coordinate has the same distribution, so the expected value of $x_1^2$ is $1/n$. Now we can just apply Markov's inequality. For example, the probability that $|x_1|$ is at least $1/n^{1/4}$ must be at most $1/n^{1/2}$, since otherwise the expected value of $x_1^2$ would be too large.
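A small Monte Carlo sketch of this argument, using only Python's standard library. The names `random_unit_vector` and `markov_check` are hypothetical. Uniform points on the sphere are generated by normalizing a standard Gaussian vector, a standard technique that is not part of the answer above. The script estimates $E[x_1^2]$ (which should be close to $1/n$) and the probability that $|x_1|\ge n^{-1/4}$, which Markov's inequality bounds by $n^{-1/2}$.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform random point on the unit sphere in R^n: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def markov_check(n, trials=2000, seed=0):
    """Return (estimate of E[x_1^2], estimate of P(|x_1| >= n^(-1/4)), Markov bound n^(-1/2))."""
    rng = random.Random(seed)
    threshold = n ** -0.25
    sum_sq = 0.0
    hits = 0
    for _ in range(trials):
        x1 = random_unit_vector(n, rng)[0]
        sum_sq += x1 * x1
        if abs(x1) >= threshold:
            hits += 1
    return sum_sq / trials, hits / trials, n ** -0.5

for n in (10, 100, 500):
    mean_sq, prob, bound = markov_check(n)
    print(f"n={n}: E[x1^2] ~ {mean_sq:.5f} (1/n = {1 / n:.5f}), "
          f"P(|x1| >= n^-1/4) ~ {prob:.4f} <= {bound:.4f}")
```

The empirical probability is typically far below the Markov bound; Markov is crude, but already enough to force $|x_1|$ to be small.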

(This is not so different from Bjørn and Dustin's answers, but expressed in a less sophisticated way.)

#### @Joseph O'Rourke 2015-06-27 23:11:30

Interesting. So maybe one insight that dispels some of the paradoxical nature is that "being near the equator" means that $x_i$ is small, as opposed to the mistaken intuition that the other components must be large.

#### @Henry Cohn 2015-06-27 23:25:09

That's a good way of putting it. Making one component small forces the others to be large in aggregate, but when there are many of them this is still compatible with each one being small individually.

#### @R Hahn 2015-06-28 07:00:47

+1 "all the coordinates have the same distribution, and they surely can't all be large, so they had better all be small"

#### @Bjørn Kjos-Hanssen 2015-06-27 20:32:28

If $$x_1^2+\dots+x_{n+1}^2=1$$ then $$x_1^2+\dots+x_n^2 = 1-x_{n+1}^2 \in [0,1].$$ Now $S = \sum_{i=1}^n x_i^2$ with $-1\le x_i\le 1$ has expectation $c\cdot n$ for a certain $c>0$ (for instance $c=1/3$ if the $x_i$ are independent and uniform on $[-1,1]$), and will be approximately normally distributed. For large $n$, $c\cdot n>1$, so the most likely value of $S$ in $[0,1]$ is 1, corresponding to $x_{n+1}=0$. This much is trivial.

But moreover, crucially, the normal distribution has rapidly decaying tails (looking like $e^{-x^2/2}$), hence most of the points in $\mathbb S^n$ will have $x_{n+1}\approx 0$.
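A simulation sketch of the heuristic, assuming (as one concrete choice, not stated in the answer) that the $x_i$ are independent and uniform on $[-1,1]$. Then $E[x_i^2]=1/3$ and $E[x_i^4]=1/5$, so $S$ has mean $n/3$ and variance $n(1/5-1/9)=4n/45$. The helper `sum_of_squares_samples` is a hypothetical name; the script compares sample moments with these values and checks the fraction of samples within two standard deviations against the normal value of about $0.954$.

```python
import random
import statistics

def sum_of_squares_samples(n, trials=4000, seed=0):
    """Samples of S = x_1^2 + ... + x_n^2 with x_i independent and uniform on [-1, 1]."""
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

n = 200
samples = sum_of_squares_samples(n)
mean = statistics.fmean(samples)
var = statistics.variance(samples)
sd = (4 * n / 45) ** 0.5          # predicted standard deviation of S
within_2sd = sum(abs(s - n / 3) <= 2 * sd for s in samples) / len(samples)

print(f"mean {mean:.2f} vs n/3 = {n / 3:.2f}; variance {var:.2f} vs 4n/45 = {4 * n / 45:.2f}")
print(f"fraction within 2 sd of n/3: {within_2sd:.3f} (normal approximation: about 0.954)")
```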

#### @Dustin G. Mixon 2015-06-27 20:28:10

Take iid Gaussian random variables $X_1,\ldots,X_d$ with mean $0$ and variance $1/d$. Normalizing the vector $X=(X_1,\ldots,X_d)$ will produce a uniformly random point on the unit sphere, but $X$ is already close to having unit norm, so for the sake of intuition we will skip this step. For each unit vector $v$, there is an equator given by

$$\{x\in S^{d-1}:\langle x,v\rangle=0\}$$

Observe that $\langle X,v\rangle=\sum_i v_iX_i$ is Gaussian with mean $0$ and variance $\sum_i v_i^2/d=1/d$. Thus, $X$ is typically close to $v$'s equator when $d$ is large. This is because it only has a unit of energy to spread across $d$ dimensions, so the amount of energy in each dimension must vanish.
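A simulation sketch of this picture using Python's standard library; the names `gaussian_point`, `inner`, and `simulate` are hypothetical. It draws $X$ as above, checks that $|X|$ is close to $1$ and that $\langle X,v\rangle$ has variance close to $1/d$, and, to address the "any equator" puzzle directly, measures how often $X$ is simultaneously within $\varepsilon$ of two orthogonal equators. The two directions chosen (all-ones and alternating signs, normalized) are an arbitrary illustrative choice; they are orthogonal when $d$ is even.

```python
import math
import random
import statistics

def gaussian_point(d, rng):
    """X = (X_1, ..., X_d) with X_i iid Gaussian, mean 0 and variance 1/d."""
    sigma = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, sigma) for _ in range(d)]

def inner(x, v):
    return sum(a * b for a, b in zip(x, v))

def simulate(d=400, trials=1000, eps=0.15, seed=0):
    """Return mean |X|, the variances of <X,v1> and <X,v2>, and the fraction of samples
    within eps of both equators, for two orthogonal unit vectors v1, v2 (d must be even)."""
    rng = random.Random(seed)
    v1 = [1.0 / math.sqrt(d)] * d                         # all-ones direction
    v2 = [(-1) ** i / math.sqrt(d) for i in range(d)]     # alternating signs; <v1, v2> = 0
    norms, p1, p2 = [], [], []
    for _ in range(trials):
        x = gaussian_point(d, rng)
        norms.append(math.sqrt(inner(x, x)))
        p1.append(inner(x, v1))
        p2.append(inner(x, v2))
    both_near = sum(abs(a) <= eps and abs(b) <= eps for a, b in zip(p1, p2)) / trials
    return statistics.fmean(norms), statistics.variance(p1), statistics.variance(p2), both_near

mean_norm, var1, var2, both_near = simulate()
print(f"mean |X| = {mean_norm:.4f}")
print(f"Var<X,v1> = {var1:.5f}, Var<X,v2> = {var2:.5f}, 1/d = {1 / 400:.5f}")
print(f"fraction within 0.15 of both equators: {both_near:.3f}")
```

Being near one equator does not prevent being near an orthogonal one: each constraint only asks one coordinate (in a suitable basis) to be small.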

#### @Joseph O'Rourke 2015-06-28 13:36:37

"it only has a unit of energy to spread across $d$ dimensions": Nice phrasing!

#### @Andreas Blass 2015-06-27 20:13:56

Visualize first, for comparison, the 2-dimensional unit sphere in 3-dimensional Euclidean space (something that I can visualize!), and imagine it cut, by circles of latitude (perpendicular to the $z$-axis), into narrow zones. Of course, the zones closer to the poles have smaller radii, and therefore smaller circumferences, than the zones near the equator. It's well-known that this decrease of circumference, as you approach the poles, exactly compensates for the increasing "tilt" of the zones, so that a zone's area is proportional to its height as measured in the $z$-direction.

Now let's "look" at the $d$-dimensional unit sphere in $(d+1)$-dimensional Euclidean space. Cut it into zones similarly, at the same $z$-coordinates as before. What has changed? The radii of the zones and their tilt are the same as in the 2-dimensional case, but the "circumferences" have become $(d-1)$-dimensional volumes. Now the $(d-1)$-dimensional volume of a sphere depends on its radius $r$ (as $r^{d-1}$) much more violently than a $1$-dimensional circumference does. For large $d$, $r^{d-1}$ is almost exactly zero while $r$ is substantially less than $1$; it becomes respectable only when $r$ is almost up to $1$. So the zones near the poles are far smaller, relative to the equatorial zones, in the high-dimensional case than in the $2$-dimensional case.
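The zone picture can be checked numerically with a short sketch; the helper `zone_areas` is a hypothetical name. It assumes the area element suggested by the description above: the $(d-1)$-dimensional "circumference" contributes $r^{d-1}$ and the tilt contributes a factor $1/r$ per unit of height, with $r=\sqrt{1-z^2}$, giving $r^{d-2}\,dz$ (the same density as in @foliations's answer with $n=d$). It splits the sphere into ten equal-height zones and reports each zone's share of the total area.

```python
def zone_areas(d, zones=10, steps_per_zone=2000):
    """Relative areas of `zones` equal-height zones of the unit sphere S^d in R^(d+1),
    using the area element r^(d-2) dz with r = sqrt(1 - z^2). Assumes d >= 2."""
    height = 2.0 / zones
    areas = []
    for k in range(zones):
        a = -1.0 + k * height                  # bottom of the k-th zone
        w = height / steps_per_zone            # midpoint-rule step inside the zone
        area = w * sum((1.0 - (a + (j + 0.5) * w) ** 2) ** ((d - 2) / 2)
                       for j in range(steps_per_zone))
        areas.append(area)
    total = sum(areas)
    return [area / total for area in areas]

for d in (2, 50):
    print(d, [round(f, 3) for f in zone_areas(d)])
```

For $d=2$ every zone gets the same share (area proportional to height), while for $d=50$ the two zones touching the equator hold most of the area and the polar zones are negligible.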

#### @Andreas Blass 2015-06-27 20:18:10

It's debatable whether my answer here is really different from the one posted by @foliations while I was writing mine, or whether it's more intuitive than a standard proof. But it uses fewer mathematical symbols and so has a chance of being verbally explainable.

#### @Joseph O'Rourke 2015-06-27 22:59:45

Very nice description in terms of zones! All that is needed to understand this is the behavior of $r^{d-1}$, which, as @foliations says, is the key ($t^n$ in @foliations's version).

#### @Joseph O'Rourke 2015-06-28 00:58:43

This view nicely explains why the phenomenon is definitely not present in $\mathbb{R}^3$.