By Maksim Dmitriev


2015-09-12 18:19:43 8 Comments

A long time ago I found this question and asked about an algorithm here.

My code is also available on GitHub.

Given a self-balancing tree (AVL), code a method that returns the median.

(Median: the numerical value separating the higher half of a data sample from the lower half. Example: if the series is

\$2, 7, 4, 9, 1, 5, 8, 3, 6\$

then the median is \$5\$.)

Now I have a solution I'd like to submit. The solution guarantees \$O(\log n)\$ time (since AVL trees are balanced), \$O( \log n)\$ extra space for the stack, and O(n) extra space to the number of children of every node. See the method called traverseTreeToFind

The median of a sorted sequence is the element whose index is \$\frac{N}{2}\$ (the number of elements is odd) or \$\frac{a[\frac{N}{2}-1] + a[\frac{N}{2}]}{2.0}\$ (the number of elements is even). With an AVL tree we need to perform an in-order tree walk to find the median.

Let the left subtree has \$L\$ nodes, and the right subtree has \$R\$ nodes. The number of nodes in the is \$N = L + R + 1\$. There are a few possible cases:

  1. \$L == R\$. There is no reason to traverse the tree. The median is the key of the root element.
  2. \$L == N / 2\$ and \$N\$ is even. The median is the average of the root and its predecessor.
  3. \$L == N / 2 - 1\$ and \$N\$ is even. The median is the average of the root and its successor.
  4. The median is something that we need to find traversing the left of right subtree.

I traverse the left subtree staring with its last nodes and go in the opposite direction of what the regular in-order tree walk would be. Why? Because it's likely the median will be found at the end of the left subtree. Please let me know if I am wrong. Here is how I proved that I was right:

enter image description here

\$\frac{N}{2} > \frac{L}{2}\$ means that the median index \$\frac{N}{2}\$ is at the right half of the left subtree (\$> \frac{L}{2}\$)

package com.avl;

import java.util.Deque;
import java.util.LinkedList;

public class AvlTree {

    private Node root;

    public AvlTree(int... keys) {
        if (keys == null || keys.length == 0) {
            throw new IllegalArgumentException("Null or empty array");
        }
        insert(keys);
    }

    private Node insert(Node parent, int key) {
        if (parent == null) {
            return new Node(key);
        }
        if (key < parent.key) {
            parent.left = insert(parent.left, key);
        } else {
            parent.right = insert(parent.right, key);
        }
        return balance(parent);
    }

    private Node balance(Node p) {
        fixHeightAndChildCount(p);
        if (bfactor(p) == 2) {
            if (bfactor(p.right) < 0) {
                p.right = rotateRight(p.right);
            }
            return rotateLeft(p);
        }
        if (bfactor(p) == -2) {
            if (bfactor(p.left) > 0) {
                p.left = rotateLeft(p.left);
            }
            return rotateRight(p);
        }
        return p;
    }

    private Node rotateRight(Node p) {
        Node q = p.left;
        p.left = q.right;
        q.right = p;
        fixHeightAndChildCount(p);
        fixHeightAndChildCount(q);
        return q;
    }

    private Node rotateLeft(Node q) {
        Node p = q.right;
        q.right = p.left;
        p.left = q;
        fixHeightAndChildCount(q);
        fixHeightAndChildCount(p);
        return p;
    }

    private int height(Node p) {
        return p == null ? 0 : p.height;
    }

    private int bfactor(Node p) {
        return height(p.right) - height(p.left);
    }

    public double getMedian() {
        final int leftChildCount = root.left == null ? 0 : root.left.childCount + 1;
        final int rightChildCount = root.right == null ? 0 : root.right.childCount + 1;
        // Let's handle the simplest case
        if (leftChildCount == rightChildCount) {
            return root.key;
        }
        final int nodeCount = leftChildCount + rightChildCount + 1;
        final boolean evenNodes = nodeCount % 2 == 0;
        if (evenNodes) {
            if (leftChildCount == nodeCount / 2) {
                // the root predecessor and the root
                return (root.key + getPredecessor(root)) / 2.0;
            }
            if (rightChildCount == nodeCount / 2) {
                // the root and its successor
                return (root.key + getSuccessor(root)) / 2.0;
            }
        }
        final boolean traverseLeft = leftChildCount > rightChildCount;
        return traverseTreeToFind(leftChildCount, traverseLeft, nodeCount, evenNodes);
    }

    private int getPredecessor(Node node) {
        Node parent = node.left;
        Node current = parent;
        while (current != null) {
            parent = current;
            current = current.right;
        }
        return parent.key;
    }

    private int getSuccessor(Node node) {
        Node parent = node.right;
        Node current = parent;
        while (current != null) {
            parent = current;
            current = current.left;
        }
        return parent.key;
    }

    private double traverseTreeToFind(int leftChildCount, boolean traverseLeft, int nodeCount,
            boolean evenNodes) {

        Node current = traverseLeft ? root.left : root.right;
        int i = traverseLeft ? leftChildCount - 1 : leftChildCount + 1;
        int medianFirstIndex;
        int medianSecondIndex;
        if (!evenNodes) {
            medianFirstIndex = medianSecondIndex = nodeCount / 2;
        } else {
            if (traverseLeft) {
                medianFirstIndex = nodeCount / 2;
                medianSecondIndex = medianFirstIndex - 1;
            } else {
                medianFirstIndex = nodeCount / 2 - 1;
                medianSecondIndex = medianFirstIndex + 1;
            }
        }
        /*
         * I chose LinkedList rather than ArrayDeque because LinkedList offers
         * constant time for delete() and insert(). pop() calls removeFirst(),
         * and push(e) calls addFirst(e).
         * 
         * However, if I understand the answer on
         * http://stackoverflow.com/a/249695/1065835 correctly, the difference
         * between constant time and amortized constant time is little if we
         * perform the operation many times.
         */
        Deque<Node> stack = new LinkedList<>();
        double smallest = 0.0;
        while (true) {
            if (current != null) {
                stack.push(current);
                current = traverseLeft ? current.right : current.left;
            } else {
                Node last = stack.pop();
                if (i == medianFirstIndex) {
                    smallest = last.key;
                    if (!evenNodes) {
                        break;
                    }
                } else if (i == medianSecondIndex) {
                    smallest += last.key;
                    smallest /= 2.0;
                    break;
                }

                if (traverseLeft) {
                    i--;
                    current = last.left;
                } else {
                    i++;
                    current = last.right;
                }
            }
        }
        return smallest;
    }

    private void fixHeightAndChildCount(Node p) {
        int hl = height(p.left);
        int hr = height(p.right);
        p.height = (hl > hr ? hl : hr) + 1;
        p.childCount = 0;
        if (p.left != null) {
            p.childCount = p.left.childCount + 1;
        }
        if (p.right != null) {
            p.childCount += p.right.childCount + 1;
        }
    }

    public void insert(int... keys) {
        for (int key : keys) {
            root = insert(root, key);
        }
    }

    private static class Node {

        private Node left;
        private Node right;
        private final int key;
        private int height;
        private int childCount;

        private Node(int value) {
            key = value;
            height = 1;
        }

        @Override
        public String toString() {
            return Integer.toString(key);
        }
    }
}

Tests. I covered all the execution paths of getMedian():

package com.avl;

import org.junit.Assert;
import org.junit.Test;

public class AvlTreeTest {

    private static final double DELTA = 1e-15;

    @Test(expected = IllegalArgumentException.class)
    public void testNull() {
        AvlTree avlTree = new AvlTree(null);
    }

    @Test(expected = IllegalArgumentException.class)
    public void testEmpty() {
        AvlTree avlTree = new AvlTree();
    }


    @Test
    public void testSingle() {
        AvlTree avlTree = new AvlTree(1);
        Assert.assertEquals(1.0, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testTwo_OnlyLeft() {
        AvlTree avlTree = new AvlTree(3, 2);
        Assert.assertEquals(2.5, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testTwo_OnlyRight() {
        AvlTree avlTree = new AvlTree(5, 7);
        Assert.assertEquals(6.0, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testRootAndSuccessor() {
        AvlTree avlTree = new AvlTree(5, 2, 7, 1, 6, 3, 8, 9);
        Assert.assertEquals(5.5, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testPredecessorAndRoot() {
        AvlTree avlTree = new AvlTree(5, 2, 7, 1, 6, 3, 8, 4);
        Assert.assertEquals(4.5, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testGoRightEven() {
        AvlTree avlTree = new AvlTree(5, 2, 9, 1, 3, 7, 12, 8, 11, 13);
        Assert.assertEquals(7.5, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testGoLeftEven() {
        AvlTree avlTree = new AvlTree(6, 3, 9, 1, 4, 7, 12, 0, 2, 5);
        Assert.assertEquals(4.5, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testGoRightOdd() {
        AvlTree avlTree = new AvlTree(10, 4, 14, 2, 5, 12, 17, 1, 3, 7, 11, 13, 16, 19, 15, 18, 20);
        Assert.assertEquals(12.0, avlTree.getMedian(), DELTA);
    }

    @Test
    public void testGoLeftOdd() {
    }

}

3 comments

@David McManamon 2017-11-22 22:29:52

The Facebook question is really asking if you have heard of order statistic trees as many developers have not. If you have heard of order statistic trees then you know to augment the normal AVL tree node data with size of the subtree at that node and from there the way to proceed is well known.

@Peter Taylor 2015-09-17 16:39:14

public class AvlTree {

I think you're missing an opportunity to make it AvlTree<T> and use a Comparator<T>, as per SortedTree<T>. (In fact, it would be worth considering implementing SortedTree<T>).


    public AvlTree(int... keys) {
        if (keys == null || keys.length == 0) {
            throw new IllegalArgumentException("Null or empty array");
        }

What's wrong with an empty array? Typically a collection has a no-args constructor which initialises it to empty.


There's a lot of code which has special cases for nulls. I wonder whether you could get a simpler implementation by using sentinels. It would be good at least to see a comment mentioning that not using them is an intentional implementation decision rather than an oversight.

    private Node balance(Node p) {
    private Node rotateRight(Node p) {
    private Node rotateLeft(Node q) {
    private void fixHeightAndChildCount(Node p) {
    private int getPredecessor(Node node) {
    private int getSuccessor(Node node) {

These can never be called with null p; why are they not methods of Node?


    private int height(Node p) {
        return p == null ? 0 : p.height;
    }

This should definitely be static, and I would be tempted to make it a method of Node as well.


        final int leftChildCount = root.left == null ? 0 : root.left.childCount + 1;
        final int rightChildCount = root.right == null ? 0 : root.right.childCount + 1;
        if (p.left != null) {
            p.childCount = p.left.childCount + 1;
        }
        if (p.right != null) {
            p.childCount += p.right.childCount + 1;
        }

You pulled out a method to get a node's height: why not also pull out one to get its size? Again, I would make this a static method of Node. The smaller refactor would be

int size(Node n) {
    return n == null ? 0 : (n.childCount + 1);
}

But I would prefer to replace Node.childCount with Node.size, to avoid the continual increments.


The implementation of median seems overly complicated, as observed in the earlier answer, although if you have a working getSuccessor then for odd sizes you could make one call to a "get me the nth node" to get the first of the two values that need averaging, and call getSuccessor to get the second.


    private int getPredecessor(Node node) {
        Node parent = node.left;
        Node current = parent;
        while (current != null) {
            parent = current;
            current = current.right;
        }
        return parent.key;
    }

I find parent a confusing name to use here. I also note that this gives a NullPointerException if node.left == null: if you want to require that callers check this before calling, then you should document it. getSuccessor similarly.


        private Node(int value) {
            key = value;

This is quite confusing use of names, especially if you later want to refactor from a balanced collection to a balanced map.

@coderodde 2015-09-17 17:06:11

If AvlTree is made generic and contains an even number of elements, how would you compute the median?

@Peter Taylor 2015-09-17 19:06:05

@coderodde, my own generic median implementation (which uses the classic quicksort-variant approach) returns a pair. Then it's up to the calling code to add and average or to do something else more appropriate for the type.

@Maksim Dmitriev 2015-10-12 18:55:38

I didn't understand your idea with generics. A median of a tree is a number. So what can be a median of a generic tree? However, I see a benefit of a generic tree. One can easily create trees of floats, doubles, and longs

@Maksim Dmitriev 2015-10-12 18:59:24

And neither did I get your idea about sentinels. I copied @coderodde's code. There are no getPredecessor or getSuccessor. null can be passed to height() For example, insert 3, 2, 1, and then the will be rotated right around 3 whose right is null

@Peter Taylor 2015-10-13 07:24:07

@MaksimDmitriev 1. The median of an odd-sized collection of strings is a string. The median of an even-sized collection requires some concept of averaging two elements (or a slight generalisation to "upper median" and "lower median"), but as mentioned in an earlier comment I handle this by returning a pair. 2. I've added some horizontal rules to group comments with code, because your observation about null being passed to height() shows me that the grouping wasn't clear enough before.

@coderodde 2015-09-16 06:41:55

To me it seems that you are overkilling it. Compare to the following:

package com.avl;

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class AvlTree {

    private Node root;
    private int size;

    public AvlTree(int... keys) {
        if (keys == null || keys.length == 0) {
            throw new IllegalArgumentException("Null or empty array");
        }

        insert(keys);
    }

    /**
     * Computes the amount of child nodes in the left subtree of {@code node}.
     * Runs in constant time.
     * 
     * @param  node the node whose left subtree size to compute.
     * @return the amount of child nodes in the left subtree.
     */
    private int getLeftSubtreeSize(Node node) {
        int tmp = node.childCount;

        if (node.right != null) {
            tmp -= (node.right.childCount + 1);
        }

        return tmp;
    }

    /**
     * Returns the value of {@code index}th (in order) entry. Runs in 
     * logarithmic time.
     * 
     * @param root  the root of the tree.
     * @param index the index of the entry whose value to get.
     * @return the value of {@code index}th value.
     */
    private Node getNode(Node root, int index) {
        Node current = root;

        for (;;) {
            int leftSubtreeSize = getLeftSubtreeSize(current);

            if (index == leftSubtreeSize) {
                return current;
            }

            if (index > leftSubtreeSize) {
                index -= (leftSubtreeSize + 1);
                current = current.right;
            } else {
                current = current.left;
            }
        }
    }

    /**
     * Computes the median of this tree. Makes at most two calls to logarithmic 
     * time methods.
     * 
     * @return the median of this tree.
     */
    public double getMedian() {
        if (size == 0) {
            throw new IllegalStateException(
                    "Asking for median from an empty tree.");
        }

        if (size % 2 == 0) {
            int b = getNode(root, size / 2 - 1).key;
            int a = getNode(root, size / 2).key;
            return 0.5 * (a + b);
        } else {
            return getNode(root, size / 2).key;
        }
    }

    private Node insert(Node parent, int key) {
        if (parent == null) {
            ++size;
            return new Node(key);
        }

        if (key < parent.key) {
            parent.left = insert(parent.left, key);
        } else {
            parent.right = insert(parent.right, key);
        }

        return balance(parent);
    }

    private Node balance(Node p) {
        fixHeightAndChildCount(p);
        if (bfactor(p) == 2) {
            if (bfactor(p.right) < 0) {
                p.right = rotateRight(p.right);
            }
            return rotateLeft(p);
        }
        if (bfactor(p) == -2) {
            if (bfactor(p.left) > 0) {
                p.left = rotateLeft(p.left);
            }
            return rotateRight(p);
        }
        return p;
    }

    private Node rotateRight(Node p) {
        Node q = p.left;
        p.left = q.right;
        q.right = p;
        fixHeightAndChildCount(p);
        fixHeightAndChildCount(q);
        return q;
    }

    private Node rotateLeft(Node q) {
        Node p = q.right;
        q.right = p.left;
        p.left = q;
        fixHeightAndChildCount(q);
        fixHeightAndChildCount(p);
        return p;
    }

    private int height(Node p) {
        return p == null ? 0 : p.height;
    }

    private int bfactor(Node p) {
        return height(p.right) - height(p.left);
    }

    private void fixHeightAndChildCount(Node p) {
        int hl = height(p.left);
        int hr = height(p.right);
        p.height = (hl > hr ? hl : hr) + 1;
        p.childCount = 0;
        if (p.left != null) {
            p.childCount = p.left.childCount + 1;
        }
        if (p.right != null) {
            p.childCount += p.right.childCount + 1;
        }
    }

    public void insert(int... keys) {
        for (int key : keys) {
            root = insert(root, key);
        }
    }

    private static final class Node {

        private Node left;
        private Node right;
        private final int key;
        private int height;
        private int childCount;

        private Node(int value) {
            key = value;
            height = 1;
        }

        @Override
        public String toString() {
            return Integer.toString(key);
        }
    }

    public static void main(String[] args) {
        Deque<Integer> list = new LinkedList<>();
        Deque<Integer> arra = new ArrayDeque<>();
        int n = 2000000;

        long ta = System.currentTimeMillis();
        for (int i = 0; i < n; ++i) {
            list.addFirst(i);
        }
        long tb = System.currentTimeMillis();

        System.out.println("LinkedList: " + (tb - ta));

        ta = System.currentTimeMillis();
        for (int i = 0; i < n; ++i) {
            arra.addFirst(i);
        }
        tb = System.currentTimeMillis();

        System.out.println("ArrayDeque: " + (tb - ta));
    }
}

Summa summarum, you can optimize a bit if you cache in each node only the size of the left subtree. Also, when appending (addLast), say 2000000 elements, to ArrayDeque vs. LinkedList, the first one takes around 50 times less time than the LinkedList. Same applies to addFirst. (Try it.)

@Maksim Dmitriev 2015-09-16 19:53:51

I was surprised to learn that amortized constant time of add(E) in ArrayDeque took less time than add(E) of LinkedList which is constant. I want to know why. However, the answers on stackoverflow.com/q/6163166/1065835 are not good enough. I offered a bounty there. So I'd appreciate your answer

@coderodde 2015-09-17 05:12:32

@MaksimDmitriev Well, I believe that the main performance bottleneck in LinkedList is the fact that whenever you push to any end of the deque, behind the scene the implementation allocates a new linked list node, which essentially involves JVM/OS, and that's expensive. Also, whenever you pop from any end, the linked list nodes become eligible for garbage collection and that's more work behind the scenes.

@Maksim Dmitriev 2015-09-17 07:41:13

Can you please write your answer about the performance of LinkedList on stackoverflow.com/q/6163166/1065835? I offered 50 points there.

@coderodde 2015-09-17 07:46:26

@MaksimDmitriev Done!

@Ram Bavireddi 2019-05-07 03:19:49

@coderodde, Can you please explain, how to handle duplicates in your solution?

@coderodde 2019-05-07 05:01:32

@RamBavireddi Good question. Looking from code, it allows duplicates.

@Ram Bavireddi 2019-05-08 15:11:10

@coderrode, I see that you've inserted duplicate keys into right subtree. In general AVL tree implementation, how can you delete one instance of duplicate key?

Related Questions

Sponsored Content

1 Answered Questions

AVL height-balanced binary search Tree and Dictionary

2 Answered Questions

[SOLVED] In-memory B-Tree in C++11

  • 2015-10-01 03:09:59
  • Koby Becker
  • 1652 View
  • 12 Score
  • 2 Answer
  • Tags:   c++ c++11 tree

2 Answered Questions

[SOLVED] Max heap in Java

  • 2013-07-28 17:43:58
  • user2368778
  • 2750 View
  • 4 Score
  • 2 Answer
  • Tags:   java heap

2 Answered Questions

[SOLVED] Generic binary search tree in C++

  • 2018-06-03 21:30:28
  • Incomputable
  • 570 View
  • 8 Score
  • 2 Answer
  • Tags:   c++ tree c++17

1 Answered Questions

[SOLVED] 2D Bin Packing Algorithm Implementation

1 Answered Questions

[SOLVED] Height of a general tree in C++

  • 2017-04-29 01:02:41
  • A user
  • 1472 View
  • 3 Score
  • 1 Answer
  • Tags:   c++ algorithm tree

2 Answered Questions

[SOLVED] Check if a tree is a proper BST

2 Answered Questions

[SOLVED] Constructing and maintaining a complete binary tree

  • 2015-03-05 05:42:57
  • envy_intelligence
  • 4466 View
  • 3 Score
  • 2 Answer
  • Tags:   c tree

2 Answered Questions

[SOLVED] Java method to check if a binary tree is a valid BST

1 Answered Questions

[SOLVED] Binary Search Tree in C

  • 2015-03-12 04:24:58
  • envy_intelligence
  • 757 View
  • 4 Score
  • 1 Answer
  • Tags:   c tree binary-search

Sponsored Content