By Joseph O'Rourke

2019-03-10 03:01:48 8 Comments

Take a random walk in the plane from the origin, each step of unit length in a uniformly random direction.

Q. How many steps on average until the path self-intersects?

My simulations suggest ~$8.95$ steps.

          Red: origin. Top: the $8$-th step self-intersects. Bottom: the $11$-th step self-intersects. (Not to same scale.)
I suspect this is known in the SAW literature (SAW=Self-Avoiding Walk), but I am not finding this explicit number.

Related: self-avoidance time of random walk.

Added. Here is a histogram of the number of steps to self-intersection.

          $10000$ random trials.


@Chip Hurst 2019-03-25 14:46:07

Here are some computational insights.

I ran $N = 10^k$ simulations for $3 \leq k \leq 12$. For $N = 10^9$ I recorded some explicit instances and recorded which pairs intersected, as opposed to just the lengths. For all other runs, I simply summed the results.

First and foremost here's a summary of the runs:

$$ \begin{array}{ |l|l|l|l| } \hline N & \text{mean} & \text{std/sqrt(}N\text{)} & \text{total number of segments} \\ \hline 10^3 & 8.732 & 0.16106 & 8732 \\ 10^4 & 8.781 & 0.0528827 & 87810 \\ 10^5 & 8.86218 & 0.0166654 & 886218 \\ 10^6 & 8.89447 & 0.00524551 & 8894468 \\ 10^7 & 8.88463 & 0.00165934 & 88846265 \\ 10^8 & 8.88595 & 0.000524796 & 888595095 \\ 10^9 & 8.88615 & 0.000165965 & 8886153545 \\ 10^{10} & 8.88631 & - & 88863113226 \\ 10^{11} & 8.88614 & - & 888613831649 \\ 10^{12} & 8.88614 & - & 8886139852418 \\ \hline \end{array} $$

We can apply the central limit theorem to this (sub)sequence of means. Even though $\text{std/sqrt(}N\text{)}$ was not calculated for $N = 10^{12}$, based on prior terms ~$0.00000524$ seems like a sufficient estimate.

From here I think we can be fairly confident that the first few digits of the mean are $\bf{8.8861}$. Here are the first few standard deviations ($68-95-99.7$ rule):

$$ m \pm 1\sigma \rightarrow [8.886135, 8.886145] \\ m \pm 2\sigma \rightarrow [8.886129, 8.886150] \\ m \pm 3\sigma \rightarrow [8.886124, 8.886156] $$

In fact it takes $8$ standard deviations to lose the digit $1$:

$$ m \pm 8\sigma \rightarrow [8.886098, 8.886181] $$

Finally, based on an inverse symbolic search, we can find that ${\bf\frac{1795}{202}} = 8.886138613...$ falls within $0.25\sigma$ of $m = 8.886139852418$ and so maybe this has potential to be the closed form.

For $N = 10^9$, here is the histogram of the number of steps to self-intersection, whose shape is quite similar to the one posted by OP:

enter image description here

If we plot look at the log plot, we can see the tail follows a decaying exponential quite closely (in red):

From here we might think we could use a gamma distribution or something similar to estimate the data, but I was only able to find a tight approximation with one distribution: the inverse Gaussian distribution $\operatorname{IG}(8.886, 26)$. Here's its PDF overlaid on the histogram:

I also recorded which segments intersected each other for $N = 10^9$. The data shows segment $n$ most often intersects with segment $n-2$, followed by $n-3$, etc which I think makes sense. Borrowing terminology from MattF's answer, the quick intersections are the most probable.

In the following diagram an $(x, y)$ pair indicates how often segment $x$ intersected with segment $y$. Each diagonal approximately decays exponentially, all with the same rate.

For each of the 4 cores I ran the simulations on, I recorded a single sequence of each length. The most interesting ones of course are the long ones.

Here's an instance of length 96. It's riddled with near misses and is ultimately squashed by a quick intersection.

enter image description here

Here the suspicious looking part is indeed intersection free -- it's about $0.01 \,\text{rad}$:

Lastly I looked at the histogram of angles in my saved simulations:

I think this bias (introduced by stopping once an intersection occurs) says that the best way to stay intersection free is to not curl back up towards yourself.

I have compiled all of my data into a Mathematica notebook which can be downloaded here.

@Joseph O'Rourke 2019-03-25 19:27:29

Great data! I especially like the somewhat surprising angles histogram (but I accept your explanation).

@Chip Hurst 2019-03-25 19:31:52

Thanks. Yes I was surprised by that histogram too. Originally I thought there might have been a bug in my code. I reran my code without the intersection stopping condition and the histogram came out uniform. An example of that is in the linked notebook.

@Chip Hurst 2019-03-25 19:32:42

And by the way, I made a couple edits since posting -- I don't know if you saw them -- a conjectured closed form and an approximate distribution fit.

@Matt F. 2019-03-25 20:16:21

$2+11\sqrt{29/74}$ is even closer to the empirical average.

@student 2019-03-26 14:34:13

The tail behaviour means that the intersection probability tends towards a constant $p$ which you could get from the slope. The behaviour is then modelled reasonably well by $$\Pr(\text{length} = k \mid k > k_0) = \Pr(\text{length} > k_0) \, (1-p)^{k-k_0-1} p$$ once the walk is past $k_0$ steps.

@Matt F. 2019-03-22 19:43:20

We can get an upper bound of $15.52$ by considering just the cases of interesction of next-to-consecutive steps, which we call quick intersections.

Let $\alpha$ be the angle between step $AB$ and step $BC$, and let $u=\alpha/\pi$. So $u$ is (initially) uniformly distributed between $0$ and $1$. Then, as Gerhard Paseman and Bill Bradley determined, the probability that steps $AB$ and $CD$ intersect is $$ P(\text{intersection|u})= \begin{cases} \frac14-\frac u4\ \ \text{ if } u\le \frac13\\ \frac12-u\ \ \text{ if } \frac13\le u \le \frac12\\ \phantom{1-2}0 \ \ \text{ if } \frac12 \le u\\ \end{cases} $$ So the (initial) overall probability of an intersection is $\int P(\text{intersection}|u)du=1/12$.

This means that for the next step, we have $$ P(u|\text{no intersection last time})=\frac{12}{11} \begin{cases} \frac34+\frac u4\ \ \text{ if } u\le \frac13\\ \frac12 + u\, \ \ \text{ if } \frac13\le u \le \frac12\\ \phantom{1-2}1 \ \ \text{ if } \frac12 \le u\\ \end{cases} $$ $$ \begin{align} P(\text{intersection|no intersection last time}) &=\frac{12}{11}\int_{u=0}^1 \begin{cases} (\frac34+\frac u4)(\frac14-\frac u4)\ \ \text{ if } u\le \frac13\\ (\frac12+u)(\frac12-u)\ \ \, \text{ if } \frac13\le u \le \frac12\\ \ \ \phantom{1-2}1(0)\ \ \ \ \ \ \ \ \ \ \ \text{ if } \frac12 \le u \end{cases} du\\ &=\frac{29}{396} \end{align} $$ So the probability of no quick intersection after $n$ steps is $$\frac{11}{12}\left(\frac{367}{396}\right)^{n-3}$$ and the expected number of steps until a quick intersection is $$ 3\left(\frac{1}{12}\right)+\sum_{n=4}^\infty n\left( \frac{11}{12}\left(\frac{367}{396}\right)^{n-4}\! - \frac{11}{12}\left(\frac{367}{396}\right)^{n-3} \right)=\frac{450}{29} \simeq 15.52$$

@Joseph O'Rourke 2019-03-23 00:11:15

Nice analysis, MattF!

@Bill Bradley 2019-03-20 15:37:53

(Thanks for noticing my earlier error, Gerhard!)

Here's a loose but very simple upper bound. Consider three consecutive unit steps, which might look like this:

Three consecutive steps

We take $\alpha\in[0, \pi]$ and $\beta\in[0,2\pi)$. (By symmetry we can reflect $\alpha\in (\pi, 2\pi)$ back to $\pi-\alpha$.) Then to overlap, we need $\alpha < \pi/2$, $\beta<\pi/2$ and $\alpha + \beta < \pi$ The first two events are independent, with probabilities 1/2 and 1/4.

For the third event, consider a case where the three segments just barely cross, e.g.:

3 segments barely cross

If we decrease the angles $\alpha, \beta$, the steps will still intersect, so it's sufficient to consider this case.

Because the bottom and left edges are both length one, the upper and rightmost angles are both $\beta$. Since we have a triangle, $\alpha+2\beta=\pi$, so the probability of this event is 2/3. (There's a symmetric case where the roles of $\alpha$ and $\beta$ are reversed.) It's possibly clearer to see this by examining a phase space diagram ("blue" = "line segments intersect"):

phase space

The blue region takes up 2/3rds of the (uniformly sampled) space. Conditioning on everything, the probability of overlap is 1/2 * 1/4 * 2/3 = 1/12

We can apply this argument to multiple non-overlapping pairs of angles. Therefore, the expected number of steps until self-intersection is upper-bounded by $$ \leq 1 + 2\times 12 = 25$$

@Gerhard Paseman 2019-03-20 15:49:25

Something does not look right. Do you want to find it, or shall I expand on my comments above? Gerhard "Pretty Sure About One Twelfth" Paseman, 2019.03.20.

@Gerhard Paseman 2019-03-21 23:55:43

Looks good now. I like the picture and the estimate. Gerhard "Blue Is A Nice Color" Paseman, 2019.03.21.

@Gerhard Paseman 2019-03-20 17:03:41

Inspired by Bill Bradley's post, I've decided to lower bound the probability of an intersection which (using the method suggested in Bill's post) gives a slightly less weak upper bound for the expected length of a random walk.

Using Bill's picture and rescaling so that angles are measured as nonnegative fractions of 2pi, we see one domain of self intersection is when $\alpha \lt \beta$ and $\alpha + 2\beta \lt 1/2$. This region is a triangle of base 1/4, and height 1/6, with an area of 1/48. Missing the measure 0 set where $\alpha=\beta$, we reflect across that line, and then once more across the line $\alpha+\beta=1$ to get the other three regions having an intersection, giving the probability of self intersection of three-step paths to be 4/48=1/12.

Now let's extend this to paths with four steps. The number of cases to determine the probability exactly is more than I care to tackle, so I will cheat and underestimate it. I do this by considering just two cases: that an intersection occurs in the first three steps, or that it does not but occurs in the last three steps.

If we considered these cases as disjoint, we would get a probability of 1/12 + 1/12 = 1/6. They aren't, so we will consider the case that $\beta \leq 1/4$ and $\alpha \gt 1/4$ or more visibly use 3/4 of the chance that the last three steps self-intersect. This gives a lower bound of (1/12) times (1+3/4), or 7/48, which gives 1/7 as a weak lower bound.

Using this and Bill's estimating method gives 1 + 7*(4-1)=22 as a weak upper bound on the number of steps needed to self-intersect. If we could improve the lower bound from 1/7 to 1/6 for self intersection of a four step path, this would lead to an upper bound of 19 instead of 22.

But if we can cheat once, we can cheat again. The first cheat went from two neighboring angles to three, so we will go from three to five and lower bound the crossing probability of six segments as (7/4) times the (lower bound of the) probability of four segments. We share the middle angle so that we can nicely separate the domains of the middle angle turning sharp left or sharp right. We do a similar cut and trim and end up with a lower bound of 49/192 =1/4+1/192 for the probability of 6 segments crossing. This gives a slightly better upper bound of 1 +4*5 or 21 for the step length of a walk that does not cross itself.

To justify the cheat, one has to do a lot more work to make sure that the probability grows by 7/4 at this step. However, I won't do that. I use the second cheat instead to motivate the idea that 22 is an upper bound that can be improved with a simple argument, and hope to inspire someone to come up with a simple and more honest and precise estimate.

Gerhard "Long As It's Not Graded" Paseman, 2019.03.20.

@Gerhard Paseman 2019-03-20 17:21:51

I would like to point out that many of my signatures encode some double meaning. In this particular case, I want to make it clear that this analysis fails on other manifolds. In particular, Joseph's problem makes sense on a torus or pseudo sphere, even on non-oriented manifolds or graphs with a rigid geometric structure. I encourage people to consider the version for a torus. Gerhard "Promises To Not Grade Work" Paseman, 2019.03.20.

@Gerhard Paseman 2019-03-20 18:20:11

The version of Joseph's question on a compact space is likely in the graph theory literature. Consider a cover of the space by uniform mostly disjoint regions, and make a graph with a region a vertex and a line joining two regions if a step takes one from one region to another. One can now approximate a random walk by a transition matrix, or use the theory of random walks in graphs to compute a desired expectation. Thus the actual expectation can be considered as a limit of expectations using various partitions of the space. Gerhard "Still Interested In Torus Version" Paseman, 2019.03.20.

Related Questions

Sponsored Content

2 Answered Questions

1 Answered Questions

1 Answered Questions

[SOLVED] Perimeters of random-walk polygons

0 Answered Questions

Self-avoiding random walks that always turn

3 Answered Questions

1 Answered Questions

2 Answered Questions

[SOLVED] Randomly walking a leashed dog

1 Answered Questions

Sponsored Content