By RichK


2014-05-16 15:40:12 8 Comments

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?

For example, I have a list of Person objects and I want to remove people with the same name:

persons.stream().distinct();

will use the default equality check for a Person object, so I need something like:

persons.stream().distinct(p -> p.getName());

Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class, is it possible to do this succinctly?

23 comments

@Flavio Oliva 2019-08-23 17:38:29

In my case I needed to compare each element with the previous one. I created a stateful Predicate that tracks the previous element and keeps the current element only when it differs from the previous one.

public List<Log> fetchLogById(Long id) {
    return this.findLogById(id).stream()
        .filter(new LogPredicate())
        .collect(Collectors.toList());
}

public class LogPredicate implements Predicate<Log> {

    private Log previous;

    @Override
    public boolean test(Log current) {
        boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);

        if (isDifferent) {
            previous = current;
        }
        return isDifferent;
    }

    private boolean verifyIfDifferentLog(Log current, Log previous) {
        return !current.getId().equals(previous.getId());
    }
}

@Abdur Rahman 2018-07-04 06:30:02

If you want a distinct list of Persons, the following would be a simple way:

Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());

Additionally, if you want a distinct list of names rather than Person objects, you can use either of the following two methods as well.

Method 1: using distinct

persons.stream().map(x -> x.getName()).distinct().collect(Collectors.toList());

Method 2: using HashSet

Set<String> set = new HashSet<>();
set.addAll(persons.stream().map(x -> x.getName()).collect(Collectors.toList()));
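For what it's worth, the intermediate List in Method 2 isn't needed: collecting straight into a Set removes duplicates in one step. A minimal sketch, assuming a Person class with a getName() getter (the class below is hypothetical, just for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctNames {
    // Hypothetical minimal Person class, only for this example.
    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Jean"), new Person("Jean"), new Person("Paul"));
        // Collectors.toSet() deduplicates the names directly, no addAll needed.
        Set<String> names = persons.stream()
                .map(Person::getName)
                .collect(Collectors.toSet());
        System.out.println(names.size()); // 2
    }
}
```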

@Hulk 2018-12-06 11:21:46

This produces a list of names, not Persons.

@Raj 2018-12-18 05:29:21

This is exactly what I was looking for. I needed a single line method to eliminate duplicates while transforming a collection to one other. Thanks.

@Saeed Zarinfam 2017-05-30 05:39:22

You can use groupingBy collector:

persons.collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));

If you want to have another stream you can use this:

persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));

@Alex 2017-06-05 12:02:47

A similar approach to the one Saeed Zarinfam used, but in a more Java 8 style :)

persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream()
 .map(plans -> plans.stream().findFirst().get())
 .collect(toList());

@Naveen Dhalaria 2019-04-04 09:23:47

Distinct objects list can be found using:

List<Person> distinctPersons = persons.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
                ArrayList::new));

@uneq95 2019-04-05 06:12:43

My approach is to group all the objects with the same property together, then cut the groups down to a size of 1, and finally collect them as a List.

  List<YourPersonClass> listWithDistinctPersons =   persons.stream()
            //operators to remove duplicates based on person name
            .collect(Collectors.groupingBy(p -> p.getName()))
            .values()
            .stream()
            //cut short the groups to size of 1
            .flatMap(group -> group.stream().limit(1))
            //collect distinct users as list
            .collect(Collectors.toList());

@Andrew Novitskyi 2018-09-07 11:05:30

Set<YourPropertyType> set = new HashSet<>();
list
        .stream()
        .filter(it -> set.add(it.getYourProperty()))
        .forEach(it -> ...);

@Narendra Jadhav 2018-09-07 13:25:40

A good answer includes an explanation. See: How do I write a good answer?

@Lonely Neuron 2019-02-22 16:50:48

HashSet is not thread safe

@Tomasz Linkowski 2018-07-27 11:11:51

Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:

Seq.seq(persons).distinct(Person::getName).toList();

Under the hood, it does practically the same thing as the accepted answer, though.

@Aliaksei Yatsau 2018-05-29 10:02:09

Maybe this will be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all entries that have the same A.b field for the same A.id (the list contained multiple A objects with the same A.id). The stream partition answer by Tagir Valeev inspired me to use a custom Collector that returns Map<A.id, List<A>>. A simple flatMap will do the rest.

public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(
        Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
    return groupingBy(keyFunction, Collector.of(
            (Supplier<Map<K2, T>>) HashMap::new,
            (map, item) -> map.putIfAbsent(distinctFunction.apply(item), item),
            (left, right) -> {
                left.putAll(right);
                return left;
            },
            map -> new ArrayList<>(map.values()),
            Collector.Characteristics.UNORDERED));
}

@Santhosh 2017-08-23 10:42:48

Another solution, using a Set. It may not be the ideal solution, but it works:

Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());

Or, if you can modify the original list, you can use the removeIf method:

persons.removeIf(p -> !set.add(p.getName()));

@Manoj Shrestha 2018-12-22 01:08:42

This is the best answer if you are not using any third party libraries!

@Luvie 2019-07-31 10:27:22

Using the genius idea that Set.add returns true if this set did not already contain the specified element. +1

@Stuart Marks 2015-01-10 04:28:32

Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

Then you can write:

persons.stream().filter(distinctByKey(Person::getName))

Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.

(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)

@Tagir Valeev 2015-09-04 09:49:09

I guess for better compatibility the argument should be Function<? super T, ?>, not Function<? super T, Object>. Also it should be noted that for ordered parallel stream this solution does not guarantee which object will be extracted (unlike normal distinct()). Also for sequential streams there's additional overhead on using CHM (which is absent in @nosid solution). Finally this solution violates the contract of filter method which predicate must be stateless as stated in JavaDoc. Nevertheless upvoted.

@java_newbie 2016-08-12 08:33:49

@Stuart, why are you using ConcurrentHashMap here?

@Stuart Marks 2016-08-12 17:49:14

@java_newbie The Predicate instance returned by distinctByKey has no idea of whether it's being used within a parallel stream. It uses CHM in case it is being used in parallel, though this adds overhead in the sequential case as Tagir Valeev noted above.

@IcedDante 2016-09-15 16:11:59

@TagirValeev what are the ramifications of violating the contract of filter here?

@Tagir Valeev 2016-09-17 01:24:54

@IcedDante, it works with current implementation of Stream API, but violating contract in general is a bad thing. Another Stream API implementation might use your predicate in another way and may break if you violate the contract.

@holandaGo 2017-03-13 21:13:02

Wouldn't this fail on the second run ?

@Stuart Marks 2017-03-14 04:49:41

@holandaGo It will fail if you save and reuse the Predicate instance returned by distinctByKey. But it works if you call distinctByKey each time, so that it creates a fresh Predicate instance each time.

@Kirill Gamazkov 2017-06-22 11:13:17

You don't need a Map for this; a Set is sufficient. The Collection#add method returns a boolean indicating whether the collection was modified by the call, i.e. someSet.add("abc") == true means that "abc" was not seen before.

@Chinmay 2017-07-11 03:40:35

Shouldn't the map inside distinctByKey be static?

@g00glen00b 2017-07-11 11:00:08

@Chinmay no, it shouldn't. If you use .filter(distinctByKey(...)), it will execute the method once and return the predicate. So the map is already being re-used if you use it properly within a stream. If you made the map static, it would be shared across all usages: two streams using distinctByKey() would use the same map, which isn't what you want.
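To illustrate that point with a sketch (not code from the answer itself): each call to distinctByKey captures its own Set, so two separate streams never share seen-keys state:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {
    // Same helper as in the answer: the returned predicate closes over a fresh set.
    public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet(); // fresh set per call
        return t -> seen.add(keyExtractor.apply(t));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        // Two independent streams: each filter gets its own 'seen' set, so a key
        // seen by the first stream is not "already seen" in the second stream.
        List<String> first = words.stream()
                .filter(distinctByKey(w -> w.charAt(0)))
                .collect(Collectors.toList());
        List<String> second = words.stream()
                .filter(distinctByKey(w -> w.charAt(0)))
                .collect(Collectors.toList());
        System.out.println(first);  // [apple, banana]
        System.out.println(second); // [apple, banana] - same result, no shared state
    }
}
```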

@Eugene 2017-07-31 08:29:40

This is sooo smart and completely non-obvious. Generally this is a stateful lambda and the underlying CallSite will be linked to the get$Lambda method - that will return a new instance of the Predicate all the time, but those instances will share the same map and function as far as I understand. Very nice!

@Vitaliy 2017-08-07 20:20:44

@StuartMarks, I agree with the last comment, but from a different perspective. I definitely appreciate the elegance of the solution, but since it is very non-obvious, I don't think it should be used. In other words, it is simple but not straightforward. And in other-other words: if I see this in code I'm gonna go all "WTF just happened here..." :-) C#'s IEnumerable does have a Distinct that takes a function and it is very intuitive. I think you should consider just adding this to the API.

@Stuart Marks 2017-08-08 18:19:03

@Vitaliy Well this answer is mainly demonstrating a technique that can be implemented using only Java 8 and Java 9 APIs. Certainly a distinct-by-key operation would be useful to add to the Streams API, but if it is, it won't be available until a future release.

@Vitaliy 2017-08-09 06:47:47

@StuartMarks, thanks Stu. Please make haste :-)

@János 2017-09-11 11:32:43

This answer is great. Thanks a lot. Readability could be improved a little by using ConcurrentHashMap.newKeySet() instead of the Map itself: Set<Object> seen = ConcurrentHashMap.newKeySet(); return t -> seen.add(t);

@Holger 2017-11-16 23:16:12

Worth noting that this may destroy the encounter order for parallel streams.

@Stuart Marks 2017-11-17 05:24:53

@János Thanks for the suggestion. Updated.

@Stuart Marks 2017-11-17 05:25:01

@Holger Noted, thanks.

@Stuart Marks 2017-11-17 05:27:09

@TagirValeev Finally added a note regarding the choice of duplicate element selected.

@Olivier Boissé 2018-01-29 10:44:26

Can I use a new HashSet() instead of ConcurrentHashMap.newKeySet() if I don't use the parallel method of the stream ?

@Stuart Marks 2018-01-29 22:30:44

@OlivierBoissé It might work, but if it ever accidentally gets put into a parallel stream, it'll almost certainly fail. Sticking with ConcurrentHashMap is probably the safest thing to do.

@Ninja 2018-11-01 12:09:56

It seems use Set will break the sequence of the input list.

@Anyul Rivas 2018-11-08 11:29:12

Could anyone explain this piece of code?

@Hulk 2018-12-06 10:43:21

@Ninja yes, but this is already mentioned in the answer: "Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does."

@Ninja 2018-12-22 07:11:58

@AnyulRivas In the first code section, keyExtractor.apply(t) can be replaced with person.getName(). Then the code should be easy to read.

@user2296988 2019-03-24 14:56:22

Can someone please explain how a new set is created each time and yet it holds previous values?

@Pr0pagate 2019-06-12 22:30:56

@StuartMarks Awesome answer! I implemented this as a private non-static method in my class and it SEEMS to work fine. Is there anything I might be missing as to why it is static?

@Stuart Marks 2019-06-13 00:36:13

@Pr0pagate It's static because it doesn't depend on anything from the class in which it's declared. As such it can be in a utility class or something. I don't think it's a problem for it to be a non-static method, but it does seem a bit odd in that calls to foo.distinctByKey() and bar.distinctByKey() don't have anything to do with the foo or bar instances.

@2Big2BeSmall 2017-10-08 06:42:37

The simplest code you can write:

persons.stream().map(x -> x.getName()).distinct().collect(Collectors.toList());

@RichK 2017-10-08 08:13:50

That'll get a distinct list of names though, not Persons by name

@Mateusz Rasiński 2017-03-14 09:31:21

I recommend using Vavr, if you can. With this library you can do the following:

io.vavr.collection.List.ofAll(persons)
                       .distinctBy(Person::getName)
                       .toJavaSet() // or any another Java 8 Collection

@Sllouyssgort 2017-07-17 15:25:41

You can use StreamEx library:

StreamEx.of(persons)
        .distinct(Person::getName)
        .toList()

@Torque 2019-06-24 10:40:28

Unfortunately, that method of the otherwise awesome StreamEx library is poorly designed - it compares object equality instead of using equals. This may work for Strings thanks to string interning, but it also may not.

@Guillaume Cornet 2017-07-12 15:12:55

I made a generic version:

private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
    return Collectors.collectingAndThen(
            toMap(
                    keyExtractor,
                    t -> t,
                    (t1, t2) -> t1
            ),
            (Map<R, T> map) -> map.values().stream()
    );
}

An example:

Stream.of(new Person("Jean"), 
          new Person("Jean"),
          new Person("Paul")
)
    .filter(...)
    .collect(distinctByKey(Person::getName)) // returns a stream of Person with 2 elements, Jean and Paul
    .map(...)
    .collect(toList())

@Wojciech Górski 2016-10-19 12:27:05

Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    final Set<Object> seen = new HashSet<>();
    return t -> seen.add(keyExtractor.apply(t));
}

Then call:

persons.stream().filter(distinctByKey(p -> p.getName()));

@brunnsbe 2016-12-27 12:08:11

This one doesn't take into consideration that the stream might be parallel.

@Wojciech Górski 2016-12-27 22:39:07

Thanks for the comment, I've updated my answer. If you don't need a parallel stream, not using concurrent maps gives you much better performance.

@Lii 2017-07-28 02:13:08

Your code would probably work for parallel collections if you created a Collections.synchronizedSet(new HashSet<>()) instead. But it would probably be slower than with a ConcurrentHashMap.

@Craig P. Motlin 2016-01-11 21:51:27

You can use the distinct(HashingStrategy) method in Eclipse Collections.

List<Person> persons = ...;
MutableList<Person> distinct =
    ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));

If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.

MutableList<Person> persons = ...;
MutableList<Person> distinct =
    persons.distinct(HashingStrategies.fromFunction(Person::getName));

HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Note: I am a committer for Eclipse Collections.

@Donald Raab 2018-04-28 19:07:07

The method distinctBy was added in Eclipse Collections 9.0 which can further simplify this solution. medium.com/@donraab/…

@frhack 2015-06-24 23:39:52

We can also use RxJava (very powerful reactive extension library)

Observable.from(persons).distinct(Person::getName)

or

Observable.from(persons).distinct(p -> p.getName())

@sdgfsdh 2017-06-01 16:25:12

Rx is awesome, but this is a poor answer. Observable is push-based whereas Stream is pull-based. stackoverflow.com/questions/30216979/…

@frhack 2017-06-02 00:41:39

The question asks for a Java 8 solution, not necessarily using Stream. My answer shows that the Java 8 Stream API is less powerful than the Rx API.

@Ritesh 2017-08-16 14:53:34

Using reactor, it will be Flux.fromIterable(persons).distinct(p -> p.getName())

@M. Justin 2018-05-02 16:43:13

The question literally says "using the Stream API", not "not necessarily using stream". That said, this is a great solution to the XY problem of filtering the stream to distinct values.

@Garrett Smith 2015-06-15 11:11:53

Building on @josketres's answer, I created a generic utility method:

You could make this more Java 8-friendly by creating a Collector.

public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
    return input.stream()
            .collect(toCollection(() -> new TreeSet<>(comparer)));
}


@Test
public void removeDuplicatesWithDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(7), new C(42), new C(42));
    Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
    assertEquals(2, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 7));
    assertTrue(result.stream().anyMatch(c -> c.value == 42));
}

@Test
public void removeDuplicatesWithoutDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(1), new C(2), new C(3));
    Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
    assertEquals(3, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 1));
    assertTrue(result.stream().anyMatch(c -> c.value == 2));
    assertTrue(result.stream().anyMatch(c -> c.value == 3));
}

private class C {
    public final int value;

    private C(int value) {
        this.value = value;
    }
}

@josketres 2015-01-12 15:28:43

There's a simpler approach using a TreeSet with a custom comparator.

persons.stream()
    .collect(Collectors.toCollection(
      () -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName())) 
));

@janagn 2015-03-14 22:01:29

I think your answer helps towards the ordering and not towards uniqueness. However it helped me set my thoughts on how to do it. Check here: stackoverflow.com/questions/1019854/…

@pisaruk 2016-07-04 20:00:39

Keep in mind you will be paying the price for sorting the elements here and we do not need sorting in order to find duplicates or even remove duplicates.

@Jean-François Savard 2017-06-13 20:05:21

Comparator.comparing(Person::getName)

@nosid 2014-05-16 15:47:22

You can wrap the person objects in another class that compares only the persons' names. Afterward, you unwrap them to get a stream of persons again. The stream operations might look as follows:

persons.stream()
    .map(Wrapper::new)
    .distinct()
    .map(Wrapper::unwrap)
    ...;

The class Wrapper might look as follows:

class Wrapper {
    private final Person person;
    public Wrapper(Person person) {
        this.person = person;
    }
    public Person unwrap() {
        return person;
    }
    public boolean equals(Object other) {
        if (other instanceof Wrapper) {
            return ((Wrapper) other).person.getName().equals(person.getName());
        } else {
            return false;
        }
    }
    public int hashCode() {
        return person.getName().hashCode();
    }
}

@Stuart Caie 2014-05-16 18:07:42

This is called the Schwartzian transform

@Marko Topolnik 2014-05-16 20:21:26

@StuartCaie Not really... there's no memoization, and the point is not performance, but adaptation to the existing API.

@bjmi 2017-01-30 12:52:29

com.google.common.base.Equivalence.wrap(S) and com.google.common.base.Equivalence.Wrapper.get() could help too.

@Lii 2017-07-28 02:00:51

You could make the wrapper class generic and parametrized by a key extraction function.
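Following that suggestion, a generic variant of the wrapper might look like this (a sketch of my own, not the original answer's code; the name Keyed is hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical generic wrapper: equality and hashing are delegated to an extracted key.
class Keyed<T> {
    private final T value;
    private final Object key;

    Keyed(T value, Function<? super T, ?> keyExtractor) {
        this.value = value;
        this.key = keyExtractor.apply(value);
    }

    T unwrap() { return value; }

    @Override
    public boolean equals(Object other) {
        return other instanceof Keyed && ((Keyed<?>) other).key.equals(key);
    }

    @Override
    public int hashCode() { return key.hashCode(); }
}

public class KeyedDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jean", "Jean", "Paul");
        // distinct() now compares extracted keys (here the first character),
        // so the second "Jean" is dropped.
        List<String> distinct = names.stream()
                .map(n -> new Keyed<>(n, s -> s.charAt(0)))
                .distinct()
                .map(Keyed::unwrap)
                .collect(Collectors.toList());
        System.out.println(distinct); // [Jean, Paul]
    }
}
```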

@Holger 2017-11-17 08:03:39

The equals method can be simplified to return other instanceof Wrapper && ((Wrapper) other).person.getName().equals(person.getName());

@wilmol 2019-07-30 06:31:48

I really prefer this solution, because it lets you test more than just one attribute. +1 to Guava Equivalence class, looks like it was built for exactly this.

@Holger 2014-05-19 08:58:47

The easiest way to implement this is to piggyback on the sort feature, as it already accepts an optional Comparator which can be created from an element's property. Then you have to filter out the duplicates, which can be done with a stateful Predicate that uses the fact that, in a sorted stream, all equal elements are adjacent:

Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
    Person previous;
    public boolean test(Person p) {
      if(previous!=null && c.compare(previous, p)==0)
        return false;
      previous=p;
      return true;
    }
})./* more stream operations here */;

Of course, a stateful Predicate is not thread-safe. However, if that's what you need, you can move this logic into a Collector and let the stream take care of thread-safety when using your Collector. This depends on what you want to do with the stream of distinct elements, which you didn't tell us in your question.
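One way to sketch that Collector idea (my own assumption of what it could look like, using Collectors.toMap rather than the sort-based approach above): collect into a LinkedHashMap keyed by the property, with a merge function that keeps the first element, then take the values. The combiner provided by toMap handles parallel execution, and LinkedHashMap preserves encounter order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class DistinctCollector {
    // Collector returning a list of elements distinct by the given key,
    // keeping the first element seen for each key and preserving encounter order.
    public static <T, K> Collector<T, ?, List<T>> distinctBy(Function<? super T, ? extends K> key) {
        return Collectors.collectingAndThen(
                Collectors.toMap(key, Function.identity(), (first, second) -> first, LinkedHashMap::new),
                map -> new ArrayList<>(map.values()));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jean", "Jack", "Paul");
        // Distinct by first letter: "Jack" collides with "Jean" and is dropped.
        List<String> distinct = names.stream()
                .collect(distinctBy(n -> n.charAt(0)));
        System.out.println(distinct); // [Jean, Paul]
    }
}
```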

@wha'eve' 2014-05-16 17:47:59

An alternative would be to place the persons in a map using the name as a key:

persons.collect(toMap(Person::getName, p -> p, (p, q) -> p)).values();

Note that the Person that is kept, in case of a duplicate name, will be the first one encountered.

@skiwi 2014-05-17 09:58:52

This does create memory overhead though, which may not be what you want.

@Holger 2014-05-19 08:38:47

@skiwi: do you think there is a way to implement distinct() without that overhead? How would any implementation know if it has seen an object before without actually remembering all distinct values it has seen? So the overhead of toMap and distinct is very likely the same.

@skiwi 2014-05-19 08:50:45

@Holger I may have been wrong there, as I hadn't thought about the overhead distinct() itself creates.

@Mohammad Adnan 2015-10-02 16:18:58

With a small number of objects, this is probably the most concise and readable answer.

@Philipp 2016-11-07 18:56:00

And obviously it messes up the original order of the list

@Kirill Gamazkov 2017-06-22 11:20:24

'Stream#collect' is a terminal operation. There may be a need for further processing after deduplicating: sorting, concatenation, mapping/flatMapping, etc. Though there are Collectors for some of those operations.

@Holger 2017-11-17 08:02:24

@Philipp: could be fixed by changing to persons.collect(toMap(Person::getName, p -> p, (p, q) -> p, LinkedHashMap::new)).values();

@Daniel Earwicker 2018-09-29 17:04:38

@Holger if the input stream is sorted, then all the identical items are next to each other, hence it is only necessary to remember the previous item to see if the next one is the same as it and can be skipped. Streams internally have a way of indicating that they are sorted.

@Holger 2018-09-30 10:36:08

@DanielEarwicker this question is about "distinct by property". It would require the stream to be sorted by the same property to be able to take advantage of it. First, the OP never stated that the stream is sorted at all. Second, streams are not able to detect whether they are sorted by a certain property. Third, there is no genuine "distinct by property" stream operation to do what you suggest. Fourth, in practice, there are only two ways to get such a sorted stream: a sorted source (TreeSet), which is already distinct anyway, or sorting on the stream, which also buffers all elements.

@Daniel Earwicker 2018-09-30 10:37:40

@Holger was answering your comment, not the question.

@Holger 2018-09-30 10:38:58

@DanielEarwicker and my comment was made in the context of this question.
