2015-09-12 18:19:43 8 Comments

A long time ago I found this question and asked about an algorithm here.

My code is also available on GitHub.

Given a self-balancing tree (AVL), code a method that returns the median.

(Median: the numerical value separating the higher half of a data sample from the lower half. Example: if the series is

\$2, 7, 4, 9, 1, 5, 8, 3, 6\$

then the median is \$5\$.)

Now I have a solution I'd like to submit. The solution guarantees \$O(\log n)\$ time (since AVL trees are balanced), \$O( \log n)\$ extra space for the stack, and `O(n)`

extra space to the number of children of every node. See the method called `traverseTreeToFind`

The median of a sorted sequence is the element whose index is \$\frac{N}{2}\$ (the number of elements is odd) or \$\frac{a[\frac{N}{2}-1] + a[\frac{N}{2}]}{2.0}\$ (the number of elements is even). With an AVL tree we need to perform an in-order tree walk to find the median.

Let the left subtree has \$L\$ nodes, and the right subtree has \$R\$ nodes. The number of nodes in the is \$N = L + R + 1\$. There are a few possible cases:

- \$L == R\$. There is no reason to traverse the tree. The median is the key of the root element.
- \$L == N / 2\$ and \$N\$ is even. The median is the average of the root and its predecessor.
- \$L == N / 2 - 1\$ and \$N\$ is even. The median is the average of the root and its successor.
- The median is something that we need to find traversing the left of right subtree.

I traverse the left subtree staring with its last nodes and go in the opposite direction of what the regular in-order tree walk would be. Why? Because it's likely the median will be found at the end of the left subtree. Please let me know if I am wrong. Here is how I proved that I was right:

\$\frac{N}{2} > \frac{L}{2}\$ means that the median index \$\frac{N}{2}\$ is at the right half of the left subtree (\$> \frac{L}{2}\$)

```
package com.avl;
import java.util.Deque;
import java.util.LinkedList;
public class AvlTree {
private Node root;
public AvlTree(int... keys) {
if (keys == null || keys.length == 0) {
throw new IllegalArgumentException("Null or empty array");
}
insert(keys);
}
private Node insert(Node parent, int key) {
if (parent == null) {
return new Node(key);
}
if (key < parent.key) {
parent.left = insert(parent.left, key);
} else {
parent.right = insert(parent.right, key);
}
return balance(parent);
}
private Node balance(Node p) {
fixHeightAndChildCount(p);
if (bfactor(p) == 2) {
if (bfactor(p.right) < 0) {
p.right = rotateRight(p.right);
}
return rotateLeft(p);
}
if (bfactor(p) == -2) {
if (bfactor(p.left) > 0) {
p.left = rotateLeft(p.left);
}
return rotateRight(p);
}
return p;
}
private Node rotateRight(Node p) {
Node q = p.left;
p.left = q.right;
q.right = p;
fixHeightAndChildCount(p);
fixHeightAndChildCount(q);
return q;
}
private Node rotateLeft(Node q) {
Node p = q.right;
q.right = p.left;
p.left = q;
fixHeightAndChildCount(q);
fixHeightAndChildCount(p);
return p;
}
private int height(Node p) {
return p == null ? 0 : p.height;
}
private int bfactor(Node p) {
return height(p.right) - height(p.left);
}
public double getMedian() {
final int leftChildCount = root.left == null ? 0 : root.left.childCount + 1;
final int rightChildCount = root.right == null ? 0 : root.right.childCount + 1;
// Let's handle the simplest case
if (leftChildCount == rightChildCount) {
return root.key;
}
final int nodeCount = leftChildCount + rightChildCount + 1;
final boolean evenNodes = nodeCount % 2 == 0;
if (evenNodes) {
if (leftChildCount == nodeCount / 2) {
// the root predecessor and the root
return (root.key + getPredecessor(root)) / 2.0;
}
if (rightChildCount == nodeCount / 2) {
// the root and its successor
return (root.key + getSuccessor(root)) / 2.0;
}
}
final boolean traverseLeft = leftChildCount > rightChildCount;
return traverseTreeToFind(leftChildCount, traverseLeft, nodeCount, evenNodes);
}
private int getPredecessor(Node node) {
Node parent = node.left;
Node current = parent;
while (current != null) {
parent = current;
current = current.right;
}
return parent.key;
}
private int getSuccessor(Node node) {
Node parent = node.right;
Node current = parent;
while (current != null) {
parent = current;
current = current.left;
}
return parent.key;
}
private double traverseTreeToFind(int leftChildCount, boolean traverseLeft, int nodeCount,
boolean evenNodes) {
Node current = traverseLeft ? root.left : root.right;
int i = traverseLeft ? leftChildCount - 1 : leftChildCount + 1;
int medianFirstIndex;
int medianSecondIndex;
if (!evenNodes) {
medianFirstIndex = medianSecondIndex = nodeCount / 2;
} else {
if (traverseLeft) {
medianFirstIndex = nodeCount / 2;
medianSecondIndex = medianFirstIndex - 1;
} else {
medianFirstIndex = nodeCount / 2 - 1;
medianSecondIndex = medianFirstIndex + 1;
}
}
/*
* I chose LinkedList rather than ArrayDeque because LinkedList offers
* constant time for delete() and insert(). pop() calls removeFirst(),
* and push(e) calls addFirst(e).
*
* However, if I understand the answer on
* http://stackoverflow.com/a/249695/1065835 correctly, the difference
* between constant time and amortized constant time is little if we
* perform the operation many times.
*/
Deque<Node> stack = new LinkedList<>();
double smallest = 0.0;
while (true) {
if (current != null) {
stack.push(current);
current = traverseLeft ? current.right : current.left;
} else {
Node last = stack.pop();
if (i == medianFirstIndex) {
smallest = last.key;
if (!evenNodes) {
break;
}
} else if (i == medianSecondIndex) {
smallest += last.key;
smallest /= 2.0;
break;
}
if (traverseLeft) {
i--;
current = last.left;
} else {
i++;
current = last.right;
}
}
}
return smallest;
}
private void fixHeightAndChildCount(Node p) {
int hl = height(p.left);
int hr = height(p.right);
p.height = (hl > hr ? hl : hr) + 1;
p.childCount = 0;
if (p.left != null) {
p.childCount = p.left.childCount + 1;
}
if (p.right != null) {
p.childCount += p.right.childCount + 1;
}
}
public void insert(int... keys) {
for (int key : keys) {
root = insert(root, key);
}
}
private static class Node {
private Node left;
private Node right;
private final int key;
private int height;
private int childCount;
private Node(int value) {
key = value;
height = 1;
}
@Override
public String toString() {
return Integer.toString(key);
}
}
}
```

Tests. I covered all the execution paths of `getMedian()`

:

```
package com.avl;
import org.junit.Assert;
import org.junit.Test;
public class AvlTreeTest {
private static final double DELTA = 1e-15;
@Test(expected = IllegalArgumentException.class)
public void testNull() {
AvlTree avlTree = new AvlTree(null);
}
@Test(expected = IllegalArgumentException.class)
public void testEmpty() {
AvlTree avlTree = new AvlTree();
}
@Test
public void testSingle() {
AvlTree avlTree = new AvlTree(1);
Assert.assertEquals(1.0, avlTree.getMedian(), DELTA);
}
@Test
public void testTwo_OnlyLeft() {
AvlTree avlTree = new AvlTree(3, 2);
Assert.assertEquals(2.5, avlTree.getMedian(), DELTA);
}
@Test
public void testTwo_OnlyRight() {
AvlTree avlTree = new AvlTree(5, 7);
Assert.assertEquals(6.0, avlTree.getMedian(), DELTA);
}
@Test
public void testRootAndSuccessor() {
AvlTree avlTree = new AvlTree(5, 2, 7, 1, 6, 3, 8, 9);
Assert.assertEquals(5.5, avlTree.getMedian(), DELTA);
}
@Test
public void testPredecessorAndRoot() {
AvlTree avlTree = new AvlTree(5, 2, 7, 1, 6, 3, 8, 4);
Assert.assertEquals(4.5, avlTree.getMedian(), DELTA);
}
@Test
public void testGoRightEven() {
AvlTree avlTree = new AvlTree(5, 2, 9, 1, 3, 7, 12, 8, 11, 13);
Assert.assertEquals(7.5, avlTree.getMedian(), DELTA);
}
@Test
public void testGoLeftEven() {
AvlTree avlTree = new AvlTree(6, 3, 9, 1, 4, 7, 12, 0, 2, 5);
Assert.assertEquals(4.5, avlTree.getMedian(), DELTA);
}
@Test
public void testGoRightOdd() {
AvlTree avlTree = new AvlTree(10, 4, 14, 2, 5, 12, 17, 1, 3, 7, 11, 13, 16, 19, 15, 18, 20);
Assert.assertEquals(12.0, avlTree.getMedian(), DELTA);
}
@Test
public void testGoLeftOdd() {
}
}
```

### Related Questions

#### Sponsored Content

#### 2 Answered Questions

#### 1 Answered Questions

### [SOLVED] AVL height-balanced binary search Tree and Dictionary in C# v2

**2019-02-12 23:48:33****George Barwood****211**View**3**Score**1**Answer- Tags: c# tree generics binary-search

#### 2 Answered Questions

#### 2 Answered Questions

#### 1 Answered Questions

### [SOLVED] 2D Bin Packing Algorithm Implementation

**2017-08-21 02:26:41****Solomon Bothwell****3729**View**11**Score**1**Answer- Tags: python algorithm python-3.x tree computational-geometry

#### 2 Answered Questions

#### 2 Answered Questions

### [SOLVED] Java method to check if a binary tree is a valid BST

**2016-02-25 06:44:46****Haider****417**View**2**Score**2**Answer- Tags: java algorithm object-oriented

#### 1 Answered Questions

### [SOLVED] Binary Search Tree in C

**2015-03-12 04:24:58****envy_intelligence****855**View**4**Score**1**Answer- Tags: c tree binary-search

## 3 comments

## @David McManamon 2017-11-22 22:29:52

The Facebook question is really asking if you have heard of order statistic trees as many developers have not. If you have heard of order statistic trees then you know to augment the normal AVL tree node data with size of the subtree at that node and from there the way to proceed is well known.

## @Peter Taylor 2015-09-17 16:39:14

I think you're missing an opportunity to make it

`AvlTree<T>`

and use a`Comparator<T>`

, as per`SortedTree<T>`

. (In fact, it would be worth considering implementing`SortedTree<T>`

).What's wrong with an empty array? Typically a collection has a no-args constructor which initialises it to empty.

There's a lot of code which has special cases for nulls. I wonder whether you could get a simpler implementation by using sentinels. It would be good at least to see a comment mentioning that not using them is an intentional implementation decision rather than an oversight.

These can never be called with null

`p`

; why are they not methods of`Node`

?This should definitely be

`static`

, and I would be tempted to make it a method of`Node`

as well.You pulled out a method to get a node's height: why not also pull out one to get its size? Again, I would make this a

`static`

method of`Node`

. The smaller refactor would beBut I would prefer to replace

`Node.childCount`

with`Node.size`

, to avoid the continual increments.The implementation of

`median`

seems overly complicated, as observed in the earlier answer, although if you have a working`getSuccessor`

then for odd sizes you could make one call to a "get me the nth node" to get the first of the two values that need averaging, and call`getSuccessor`

to get the second.I find

`parent`

a confusing name to use here. I also note that this gives a`NullPointerException`

if`node.left == null`

: if you want to require that callers check this before calling, then you should document it.`getSuccessor`

similarly.This is quite confusing use of names, especially if you later want to refactor from a balanced collection to a balanced map.

## @coderodde 2015-09-17 17:06:11

If

`AvlTree`

is made generic and contains an even number of elements, how would you compute the median?## @Peter Taylor 2015-09-17 19:06:05

@coderodde, my own generic median implementation (which uses the classic quicksort-variant approach) returns a pair. Then it's up to the calling code to add and average or to do something else more appropriate for the type.

## @Maksim Dmitriev 2015-10-12 18:55:38

I didn't understand your idea with generics. A median of a tree is a

number. So what can be a median of a generic tree? However, I see a benefit of a generic tree. One can easily create trees of`float`

s,`double`

s, and`long`

s## @Maksim Dmitriev 2015-10-12 18:59:24

And neither did I get your idea about sentinels. I copied @coderodde's code. There are no

`getPredecessor`

or`getSuccessor`

.`null`

can be passed to`height()`

For example, insert 3, 2, 1, and then the will be rotated right around 3 whose`right`

is null## @Peter Taylor 2015-10-13 07:24:07

@MaksimDmitriev 1. The median of an odd-sized collection of strings is a string. The median of an even-sized collection requires some concept of averaging two elements (or a slight generalisation to "upper median" and "lower median"), but as mentioned in an earlier comment I handle this by returning a pair. 2. I've added some horizontal rules to group comments with code, because your observation about

`null`

being passed to`height()`

shows me that the grouping wasn't clear enough before.## @coderodde 2015-09-16 06:41:55

To me it seems that you are overkilling it. Compare to the following:

Summa summarum, you can optimize a bit if you cache in each node only the size of the

leftsubtree. Also, when appending (`addLast`

), say 2000000 elements, to`ArrayDeque`

vs.`LinkedList`

, the first one takes around 50 times less time than the`LinkedList`

. Same applies to`addFirst`

. (Try it.)## @Maksim Dmitriev 2015-09-16 19:53:51

I was surprised to learn that

amortized constant timeof`add(E)`

in`ArrayDeque`

took less time than`add(E)`

of`LinkedList`

which is constant. I want to know why. However, the answers on stackoverflow.com/q/6163166/1065835 are not good enough. I offered a bounty there. So I'd appreciate your answer## @coderodde 2015-09-17 05:12:32

@MaksimDmitriev Well, I believe that the main performance bottleneck in

`LinkedList`

is the fact that whenever you push to any end of the deque, behind the scene the implementation allocates a new linked list node, which essentially involves JVM/OS, and that's expensive. Also, whenever you pop from any end, the linked list nodes become eligible for garbage collection and that's more work behind the scenes.## @Maksim Dmitriev 2015-09-17 07:41:13

Can you please write your answer about the performance of LinkedList on stackoverflow.com/q/6163166/1065835? I offered 50 points there.

## @coderodde 2015-09-17 07:46:26

@MaksimDmitriev Done!

## @Ram Bavireddi 2019-05-07 03:19:49

@coderodde, Can you please explain, how to handle duplicates in your solution?

## @coderodde 2019-05-07 05:01:32

@RamBavireddi Good question. Looking from code, it allows duplicates.

## @Ram Bavireddi 2019-05-08 15:11:10

@coderrode, I see that you've inserted duplicate keys into right subtree. In general AVL tree implementation, how can you delete one instance of duplicate key?