By Matt Ball


2009-11-12 15:42:39 8 Comments

Let A and B be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B or A \B, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.

Notes:

  • Gecko-specific tricks are okay
  • I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
  • I've seen, but not tested, JS.Set (see previous point)

Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.

10 comments

@SmujMaiku 2017-01-11 04:21:09

As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:

var t, a, b, c, objA;

    // Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
    return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
    return (i*2).toFixed();
});

    // Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);

    // Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
objA = {};
a.forEach(function(v) { objA[v] = true; });
c = b.filter(function(v) { return !objA[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);

Results:

completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length

However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseFloat.

@fabianmoronzirfas 2019-05-29 03:32:30

Shouldn't it be c = b.filter(function(v) { return !A[v]; }); in the second function?

@SmujMaiku 2019-05-30 12:26:56

You are correct. Somehow it appears to be even faster for me

@milan 2016-04-08 16:27:19

Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as pythons A - B), and reportedly faster than indexOf for large arrays:

console.clear();
let a = new Set([1, 2, 3, 4]);
let b = new Set([5, 4, 3, 2]);


let a_minus_b = new Set([...a].filter(x => !b.has(x)));
let b_minus_a = new Set([...b].filter(x => !a.has(x)));
let a_intersect_b = new Set([...a].filter(x => b.has(x))); 

console.log([...a_minus_b]) // {1}
console.log([...b_minus_a]) // {5}
console.log([...a_intersect_b]) // {2,3,4}

@Estus Flask 2016-09-14 06:34:27

Also considerably faster than indexOf for large arrays.

@SwiftsNamesake 2016-11-27 04:50:46

Why JavaScript sets don't have union/intersect/difference built in is beyond me...

@Rafael 2017-01-06 19:37:12

I completely agree; these should be lower level primitives implemented in the js engine. It's beyond me also...

@John 2018-01-23 06:07:05

@SwiftsNamesake There's a proposal for set built-in methods that will hopefully be talked about in Janurary 2018 github.com/tc39/agendas/blob/master/2018/01.md.

@user187291 2009-11-12 15:50:30

if don't know if this is most effective, but perhaps the shortest

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(function(x) { return B.indexOf(x) < 0 })

console.log(diff);

Updated to ES6:

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(x => !B.includes(x) );

console.log(diff);

@Christoph 2009-11-12 17:37:29

+1: not the most efficient solution, but definitely short and readable

@Eric Bréchemier 2009-11-13 16:44:25

Note: array.filter is not supported cross-browser (e.g. not in IE). It seems not to matter to @Matt since he stated that "Gecko-specific tricks are okay" but I think it's worth mentioning.

@glebm 2013-04-08 00:37:06

This is very slow. O(|A| * |B|)

@Quentin Roy 2016-06-21 08:14:47

@EricBréchemier This is now supported (since IE 9). Array.prototype.filter is a standard ECMAScript feature.

@Paolo Biavati 2016-11-14 14:26:03

what is "ES6" ? I found this syntax don't work on safari 9.1 (on 10.0 its ok)

@Cole Cameron 2016-11-16 22:02:10

@PaoloBiavati ES6 is ECMAScript 6th edition. It's the standardization of JavaScript. The 6th edition was published in June 2015. I'm not familiar with Safari's versions, but it's likely version 9.1 didn't support the syntactic sugar form of lambda functions (the "=>" operator) whereas version 10 does.

@gnganpath 2017-03-17 09:25:27

it's working for more than one itme also . A = [200000340755,200000340767, 200000340760]; B = [200000340755]; diff = A.filter(function(x) { return B.indexOf(x) < 0 }) console.log(diff);

@Caius 2017-06-13 13:16:53

neat as I was looking for. Thanks!

@c24w 2017-06-26 11:27:23

In ES6, you could use !B.includes(x) instead of B.indexOf(x) < 0 :)

@eithed 2017-11-21 12:55:02

This doesn't work when computing difference of arrays containing objects - ie: [{a: 5}, {b: 6}].filter(x => [{a: 5}].indexOf(x) < 0 ) gives [{a: 5}, {b: 6}]

@AndreKR 2018-05-21 10:49:31

@glebm That would only be true if indexOf would run a linear search, but benchmarks suggest that it's somehow indexed. It's pretty much as fast as a key lookup in an object.

@Wong Jia Hau 2019-06-27 09:25:29

@glebm I don't think this kind of problem can be solved in linear time at all, it will always be polynomial, unless you solved the P vs NP problem.

@Brian Burns 2017-09-05 09:05:34

Some simple functions, borrowing from @milan's answer:

const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);

Usage:

const a = new Set([1, 2]);
const b = new Set([2, 3]);

setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }

@chribsen 2017-01-02 12:32:17

Using Underscore.js (Library for functional JS)

>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]

@perhelion 2014-10-16 07:13:53

The shortest, using jQuery, is:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = $(A).not(B);

console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

@Drew Baker 2015-09-10 18:31:06

This returns an object of the difference.

@Marc-André Lafortune 2016-06-09 15:38:34

jQuery not no longer works with generic objects as of 3.0.0-rc1. See github.com/jquery/jquery/issues/3147

@Chris Barr 2016-12-05 14:24:55

It's not a great idea to add a dependency on a ~70k 3rd party library just to do this, since the same thing can be accomplished in just a few lines of code as shown in the other answers here. However, if you are already using jQuery on your project this will work just fine.

@Downhillski 2017-01-18 03:24:50

Though this approach has less code, but it does not provide any explanation of the space and time complexity of the the differ algorithms and the data structure it uses to perform the method. It is black boxed for developers to engineer the software with no evaluation when data scale up or with limited memory is allowed. if you use such approach with large data set, the performance might remains unknown until further research onto the source code.

@Alex 2018-03-02 12:29:24

This is just returning the amount (2 in this case) of elements of A which are not in B. Converting 2 into array is pointless...

@Christoph 2009-11-12 16:37:18

You can use an object as a map to avoid linearly scanning B for each element of A as in user187291's answer:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.

@j-g-faustus 2009-11-12 16:44:47

Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

  • myUtils.keys(hash): returns an array with the keys of the hash

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)

Performance: Testing with

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)

I think that's decent payoff for 20 lines of code.

@Christoph 2009-11-12 17:31:40

you should still check hasOwnProperty() even if the elements are numeric: otherwise, something like Object.prototype[42] = true; means 42 can never occur in the result set

@j-g-faustus 2009-11-12 18:54:07

Granted that it would be possible to set 42 in that way, but is there a semi-realistic use case where anyone would actually do so? But for general strings I take the point - it could easily conflict with some Object.prototype variable or function.

@Xavi Ivars 2009-11-12 16:07:59

This works, but I think another one is much more shorter, and elegant too

A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];

diff_set = {
    ar : {},
    diff : Array(),
    remove_set : function(a) { ar = a; return this; },
    remove: function (el) {
        if(ar.indexOf(el)<0) this.diff.push(el);
    }
}

A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;

@Eric Bréchemier 2009-11-12 17:04:43

I would hash the array B, then keep values from the array A not present in B:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
    if ( !hash[value]){
      diff.push(value);
    }
  }
  return diff;
}

@Christoph 2009-11-12 17:15:25

that's exactly the same algorithm I posted half an hour ago

@Eric Bréchemier 2009-11-13 16:41:35

@Christoph: you are right... I failed to notice that. I find my implementation more simple to understand though :)

@Christophe Roussy 2017-04-12 11:54:57

I think it is better to compute the diff outside of getDifference so it may be reused multiple times. Maybe optional like so: getDifference(a, b, hashOfB), if not passed it will be computed otherwise it is reused as-is.

Related Questions

Sponsored Content

8 Answered Questions

[SOLVED] What is the correct way to check for string equality in JavaScript?

  • 2010-08-27 17:37:55
  • JSS
  • 961041 View
  • 753 Score
  • 8 Answer
  • Tags:   javascript string

69 Answered Questions

[SOLVED] What is the most efficient way to deep clone an object in JavaScript?

28 Answered Questions

[SOLVED] What is the difference between null and undefined in JavaScript?

62 Answered Questions

[SOLVED] How to get the difference between two arrays in Javascript?

17 Answered Questions

[SOLVED] Fastest way to duplicate an array in JavaScript - slice vs. 'for' loop

16 Answered Questions

37 Answered Questions

[SOLVED] Most efficient way to create a zero filled JavaScript array?

  • 2009-08-18 18:11:29
  • dil
  • 291491 View
  • 512 Score
  • 37 Answer
  • Tags:   javascript arrays

19 Answered Questions

[SOLVED] What is the most efficient way to concatenate N arrays?

  • 2011-02-22 15:20:36
  • Leonid
  • 155379 View
  • 219 Score
  • 19 Answer
  • Tags:   javascript arrays

8 Answered Questions

[SOLVED] Best way to find if an item is in a JavaScript array?

  • 2008-09-27 15:41:12
  • zimbatm
  • 968499 View
  • 748 Score
  • 8 Answer
  • Tags:   javascript arrays

12 Answered Questions

[SOLVED] Fastest way to convert JavaScript NodeList to Array?

Sponsored Content