2019-01-12 20:39:58 8 Comments

Equivalently, about variance?

I realize it measures the spread of a distribution, but many other metrics could do the same (e.g., the average absolute deviation). What is its deeper significance? Does it have

- a particular geometric interpretation (in the sense, e.g., that the mean is the balancing point of a distribution)?
- any other intuitive interpretation that differentiates it from other possible measures of spread?

What's so special about it that makes it act as a normalizing factor in all sorts of situations (for example, convert covariance to correlation)?

## 9 comments

## @J.G. 2019-01-12 20:48:11

There's a very nice geometric interpretation.

Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "is a linear transformation of", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean, variance-one variables; it gets you the same outcome in this context.)

Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,\,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.
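For readers who want something concrete, here is a minimal sketch of the picture above using two small made-up data sets (the numbers and the helper names `center` and `dot` are purely illustrative). It treats each centered sample as a vector, uses the averaged dot product as the inner product, and reads off the standard deviation as a length and the correlation as a cosine.

```python
import math

def center(xs):
    """Subtract the mean, i.e. pick the zero-mean representative of the variable."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def dot(u, v):
    """Averaged dot product; on centered data this is the (population) covariance."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

# Illustrative data, not taken from the thread.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]

xc, yc = center(x), center(y)

cov_xy = dot(xc, yc)              # inner product of the two "vectors"
sd_x = math.sqrt(dot(xc, xc))     # length of x in this norm
sd_y = math.sqrt(dot(yc, yc))     # length of y in this norm
corr = cov_xy / (sd_x * sd_y)     # cosine of the "angle" between x and y

print(sd_x, sd_y, cov_xy, corr)
```

Because `corr` is a cosine, Cauchy-Schwarz forces it into $[-1,\,1]$ no matter which data you plug in.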

## @blue_note 2019-01-12 21:03:09

Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...

## @J.G. 2019-01-12 21:06:30

@blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
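A small sketch of that regression remark, on made-up data (the values and variable names are illustrative assumptions): it writes the centered $Y$ as a multiple of the centered $X$ plus a residual, then checks that the residual is orthogonal to $X$ under the covariance inner product and that the explained share of variance equals the squared correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Illustrative data, not taken from the thread.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]

xc = [a - mean(x) for a in x]
yc = [b - mean(y) for b in y]

var_x = mean([a * a for a in xc])
var_y = mean([b * b for b in yc])
cov_xy = mean([a * b for a, b in zip(xc, yc)])

slope = cov_xy / var_x                                # projection coefficient of Y onto X
residual = [b - slope * a for a, b in zip(xc, yc)]    # the part of Y orthogonal to X

resid_cov_x = mean([r * a for r, a in zip(residual, xc)])
var_resid = mean([r * r for r in residual])

r = cov_xy / math.sqrt(var_x * var_y)
explained_share = (slope ** 2 * var_x) / var_y        # variance of the fitted part over variance of Y

print(resid_cov_x, explained_share, r ** 2)
```

The last two printed numbers agree, and `var_y` splits as fitted variance plus residual variance, which is Pythagoras' theorem in this space.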

## @user1717828 2019-01-13 04:43:35

Can someone provide a concrete example or other similar dumbing down of this answer?

## @WorldSEnder 2019-01-13 08:59:57

A paragraph on wikipedia about it @blue_note

## @James Martin 2019-01-13 17:38:55

For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?

## @J.G. 2019-01-13 17:41:37

@JamesMartin That would be necessary if we want to avoid infinite-norm vectors, yes. (Of course, mentioning that restriction upfront can give the impression we care too much about variance from the start.)

## @RcnSc 2019-01-14 13:31:33

I need a 3Blue1Brown video on this.

## @J.G. 2019-01-14 13:50:43

@RcnSc It doesn't look like he's made one yet, more's the pity. You should suggest it.

## @aghostinthefigures 2019-01-14 16:29:29

For anyone looking for a technical reference in the style of this explanation, Larry Evans’ “Introduction to Stochastic Differential Equations” discusses the Hilbert space approach to random variables in some detail.

## @Vim 2019-01-15 10:33:25

Covariance isn't an inner product per se. For example, the covariance between a nonzero constant and itself is 0, but its norm surely isn't. In fact the inner product should be $E(XY)$, just like in usual Hilbert spaces.

## @J.G. 2019-01-15 12:06:06

@Vim That's the point of my discussion of a quotient space.

## @Vim 2019-01-15 12:44:11

It's no problem then. I didn't notice that clarification.

## @Robert Wolfe 2019-02-01 19:27:05

I'm just testing where the bottom of this well is... Inner products aren't necessarily unique on a vector space, even symmetric ones. Is there something about covariance that makes it rise above the other possible inner products on this quotient space?

## @J.G. 2019-02-01 20:08:22

@RobertWolfe Yes: its orthogonal vectors are uncorrelated variables, including independent ones. It makes sense to regard independent components of a multivariate vector as the coefficients of basis elements, since motion along the direction of any such element changes exactly one component. Indeed, orthogonality has earned a metaphorical meaning or two that way. Further, when variables *are* correlated, the covariance gives a natural regression interpretation, viz. earlier comments.

## @Yves Daoust 2019-01-17 18:22:32

Probably the most useful property of the variance is that it is additive: the variance of the sum of two independent random variables is the sum of the variances.

This does not occur with other estimators of the spread.
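A quick exact check of this, assuming two fair six-sided dice as the independent variables (my choice of example, not the answer's). The sketch enumerates all 36 equally likely outcomes with exact fractions and compares the variance with the mean absolute deviation as a competing measure of spread.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

def mean_abs_dev(values):
    m = mean(values)
    return mean([abs(v - m) for v in values])

die = [Fraction(k) for k in range(1, 7)]        # one fair die, outcomes equally likely
sums = [a + b for a, b in product(die, die)]    # two independent dice: 36 equally likely pairs

print(variance(die), variance(sums))            # variance of the sum vs. of one die
print(mean_abs_dev(die), mean_abs_dev(sums))    # same comparison for mean absolute deviation
```

The variance of the sum is exactly twice the variance of one die, while the mean absolute deviation of the sum (35/18) is well short of twice the single-die value (2 × 3/2 = 3).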

## @user1483 2019-01-14 21:18:32

If you draw a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$ then the mean and variance of the sample are sufficient statistics. This means that these two statistics contain all the information in the sample. The distribution of any other statistic (function of the observed values in the sample) given the sample mean and variance is independent of the true population mean and variance.

For the normal distribution the sample variance is the optimal estimator of the population variance. For example the population variance could be estimated by a function of the mean deviation or by some function of the order statistics (interquartile range or the range) but the distribution of that estimator would have a greater spread than the sample variance.

These facts are important as, following the central limit theorem, the distribution of many observed phenomena is approximately normal.
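The efficiency claim can be illustrated with a rough simulation. This is only a sketch under my own assumptions: standard normal data, samples of size 50, a fixed seed, and the spread estimated on the standard-deviation scale rather than the variance scale. It compares the sample standard deviation with the mean absolute deviation rescaled by $\sqrt{\pi/2}$, and measures each estimator's relative spread across repeated samples.

```python
import math
import random

rng = random.Random(1)        # fixed seed so the run is reproducible
n, reps = 50, 4000            # sample size and number of repeated samples (arbitrary choices)

sd_estimates = []
md_estimates = []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(sample) / n
    # Usual sample standard deviation (n - 1 in the denominator).
    sd_estimates.append(math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1)))
    # Mean absolute deviation, rescaled so that it targets sigma for normal data.
    md_estimates.append(math.sqrt(math.pi / 2) * sum(abs(x - m) for x in sample) / n)

def relative_spread(values):
    """Standard deviation of the estimates divided by their mean (scale-free)."""
    avg = sum(values) / len(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values)) / avg

print(relative_spread(sd_estimates), relative_spread(md_estimates))
```

On normal data the estimator built from the mean deviation should come out as the noisier of the two, in line with the answer's claim.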

## @Markus Scheuer 2019-01-14 17:49:57

The following is from *An Introduction to Probability Theory and Its Applications, Vol. 1* by W. Feller.

## @Daniel R. Collins 2019-01-14 16:26:00

Consider Casella/Berger, *Statistical Inference*, Section 10.3.2:

My interpretation of this is that using standard deviation leads one in the direction of an estimator for the *mean*; whereas using average absolute deviation leads one in the direction of an estimator for the *median*.

## @Eric Towers 2019-01-14 05:36:08

The normal distribution has maximum entropy among real distributions supported on $(-\infty, \infty)$ with specified standard deviation (equivalently, variance). (Reference.) Consequently, if the only thing you know about a real distribution supported on $\mathbb{R}$ is its mean and variance, the distribution that presumes the least prior information is the normal distribution.

I don't tend to think of the statement above as the important fact. It's more: normal distributions appear frequently and knowing the location parameter (mean) is reasonable. So what else do I have to know to make the least presumptive model be the normal distribution? The dispersion (variance).
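To make the maximum-entropy statement tangible, here is a small sketch comparing the differential entropy of three distributions that share the same standard deviation. The comparison distributions (uniform and Laplace) are my own choice of examples; the formulas used are the standard closed forms, in nats.

```python
import math

sigma = 1.0   # common standard deviation for all three distributions

# Normal: h = 0.5 * ln(2 * pi * e * sigma^2)
h_normal = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# Uniform of width w has variance w^2 / 12, so w = sqrt(12) * sigma and h = ln(w)
h_uniform = math.log(math.sqrt(12) * sigma)

# Laplace with scale b has variance 2 * b^2, so b = sigma / sqrt(2) and h = 1 + ln(2 * b)
h_laplace = 1 + math.log(math.sqrt(2) * sigma)

print(h_normal, h_laplace, h_uniform)
```

The normal comes out on top (about 1.42 nats against roughly 1.35 and 1.24), as the theorem says it must for any competitor with the same variance.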

## @John Coleman 2019-01-13 12:53:54

I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a random variable with mean $\mu$ and finite standard deviation $\sigma$, and $\overline{X}$ is the mean of $n$ independent observations of $X$, then for large $n$

$$\frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.
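The Central Limit Theorem statement above can be checked with a rough simulation. In this sketch I assume, purely for illustration, that $X$ is exponential with rate 1 (so $\mu = \sigma = 1$, and $X$ is far from normal); the seed, sample size and repetition count are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is reproducible
mu, sigma = 1.0, 1.0       # mean and standard deviation of an Exponential(1) variable
n, reps = 50, 4000         # observations per sample mean, number of sample means

z_values = []
for _ in range(reps):
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    z_values.append((xbar - mu) / (sigma / math.sqrt(n)))   # the standardized mean

z_mean = statistics.mean(z_values)
z_sd = statistics.pstdev(z_values)
inside = sum(abs(z) <= 1.96 for z in z_values) / reps       # share within +/- 1.96

print(z_mean, z_sd, inside)
```

If the standardized mean is close to standard normal, the three printed values should be near 0, 1 and 0.95 respectively.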

## @Winther 2019-01-14 10:45:01

Related question to this: The role of variance in Central Limit Theorem

## @Misha Lavrov 2019-01-14 18:28:25

No other measure of dispersion can so relate $X$ with the *standard* normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$\frac{\overline{X}-\mu}{IQR/\sqrt n}$$ is approximately standard normal.

## @John Coleman 2019-01-14 19:13:10

@MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but *if* you regard $\sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.

## @Anton Golov 2019-01-13 10:17:18

An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.
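A minimal sketch of that claim on a small made-up data set (the numbers and the helper `rmse` are illustrative): it evaluates the root mean square error of several constant predictors and compares the value at the mean with the population standard deviation.

```python
import math
import statistics

# Illustrative data, not taken from the thread.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def rmse(constant, values):
    """Root mean square error of predicting every value by the same constant."""
    return math.sqrt(sum((v - constant) ** 2 for v in values) / len(values))

m = statistics.mean(data)
rmse_at_mean = rmse(m, data)

for c in [3.0, 4.0, 4.5, m, 5.5, 6.0]:
    print(c, rmse(c, data))

print(rmse_at_mean, statistics.pstdev(data))   # the two numbers coincide
```

Every constant other than the mean gives a strictly larger error, and the error at the mean is exactly the (population) standard deviation.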

(This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)

## @blue_note 2019-01-13 11:34:18

Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can't see what the variance is on, e.g., the horizontal axis.

## @Qwerty 2019-01-13 02:22:39

When defining "standard deviation", we want some way to take a bunch of deviations from a mean and quantify how big they typically are using a single number in the same units as the deviations themselves. But any definition of "standard deviation" induces a corresponding definition of "mean" because we want our choice of "mean" to always minimize the value of our "standard deviation" (intuitively, we want to define "mean" to be the "middlemost" point as measured by "standard deviation"). Only by defining "standard deviation" in the usual way do we recover the arithmetic mean while still having a measure in the right units. (Without getting into details, the key point is that the quadratic becomes linear when we take the derivative to find its critical point.)
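A short sketch of the calculus behind that parenthetical, writing $x_1, \dots, x_n$ for the data and $m$ for the candidate "mean" (notation assumed here, not in the original): minimizing the sum of squared deviations gives a linear equation in $m$, whose solution is the arithmetic mean.

$$\frac{d}{dm}\sum_{i=1}^{n}(x_i-m)^2 = -2\sum_{i=1}^{n}(x_i-m) = 0 \quad\Longleftrightarrow\quad m = \frac{1}{n}\sum_{i=1}^{n}x_i$$

Since minimizing the sum of squares is the same as minimizing its square root divided by $\sqrt{n}$, the same $m$ minimizes the standard deviation itself.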

If we want to use some other mean, we can of course find a different "standard deviation" that will match that mean (the process is somewhat analogous to integration), but in practice it's just easier to transform the data so that the arithmetic mean is appropriate.

## @mephistolotl 2019-01-13 02:36:20

If all you want is minimization at the mean and the right units, why not sum/integrate the magnitude of the deviations?
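One way to explore this question numerically: the sketch below, on a made-up skewed data set and a simple grid search (both are illustrative assumptions), finds which center minimizes the summed magnitudes of the deviations and which minimizes the summed squares.

```python
import statistics

# Illustrative skewed data, not taken from the thread.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

def total_abs_dev(center, values):
    return sum(abs(v - center) for v in values)

def total_sq_dev(center, values):
    return sum((v - center) ** 2 for v in values)

# Candidate centers 0.0, 0.1, ..., 100.0
candidates = [i / 10 for i in range(0, 1001)]

best_abs = min(candidates, key=lambda c: total_abs_dev(c, data))
best_sq = min(candidates, key=lambda c: total_sq_dev(c, data))

print(best_abs, statistics.median(data))   # absolute deviations are minimized at the median
print(best_sq, statistics.mean(data))      # squared deviations are minimized at the mean
```

So the summed magnitudes do have the right units, but the center they single out is the median rather than the arithmetic mean.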