Code Pumpkin

Performance Issue with HashSet RemoveAll() method

October 12, 2017
Posted by Pumpkin

In our previous article on HashSet internals, we have seen internal implementation of its add(), remove(), contains(), clear(), iterator() and isEmpty() methods. One of our readers has raised question about How efficient is HashSet removeAll() method? As it requires detailed explanation, we have decided to write separate article on it.  

What does removeAll() method do?

Before going into internals of removeAll() method, let's understand the use of removeAll() method.  Here is the method signature :

public boolean removeAll(Collection<?> c)
  1. It removes all elements from the set that are contained in the specified collection.
  2. removeAll() method is defined in AbstractSet, So all Set implementation which extends AbstractSet has this method.
  3. It accepts not only HashSet, but all the Collections types i.e. ArrayList, LinkedList, HashSet.

  4. If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.

Java Program

import java.util.HashSet;
import java.util.LinkedList;

public class HashSetRemoveAllDemo {

    public static void main(String s[])
        HashSet<Integer> original = new HashSet<>();
        for(int i=1; i<=10; i++)

        HashSet<Integer> removalSet = new HashSet<>();
        for(int i=7; i<=12; i++)

        LinkedList<Integer> removalList = new LinkedList<>();
        for(int i=-3; i<=2; i++)

        System.out.println("Original : " + original);
        System.out.println("RemovalSet : " + removalSet);
        System.out.println("RemovalList : " + removalList);

        // removes 7,8,9,10 from original set
        System.out.println("After removing elements of RemovalSet : " + original);

        // removes 1,2 from original set
        System.out.println("After removing elements of RemovalList : " + original);

Output :

Original : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
RemovalSet : [7, 8, 9, 10, 11, 12]
RemovalList : [-3, -2, -1, 0, 1, 2]
After removing elements of RemovalSet : [1, 2, 3, 4, 5, 6]
After removing elements of RemovalList : [3, 4, 5, 6]

Note : In above program, we have used for loop to populate HashSet and LinkedList. It is quite verbose because it cannot be expressed in a single expression. However, Java9 has introduced Convenience Factory Methods for Collections to initialize Set, List or Map as below

List<String> list = List.of("a", "b", "c");
Set<String> set = Set.of("a", "b", "c");
Map<String, Integer> cities = Map.of("a", 1, "b", 2,"c",3);

Internal Implementation of RemoveAll()

public boolean removeAll(Collection c) {
	boolean modified = false;

	if (size() > c.size()) {
		for (Iterator i = c.iterator(); i.hasNext(); )
			modified |= remove(;
	} else {
		for (Iterator i = iterator(); i.hasNext(); ) {
			if (c.contains( {
				modified = true;
	return modified;

So, How it is implemented internally?

1) It first compares the size of both the collections.
2) If size of our Set is greater than passed collection, then it iterates through passed collection, 
– calls remove() method of our Set directly without checking for availability of current element in iterator.
3) If size of our Set is smaller than passed collection, it iteratess through our Set.
– calls contains() method of passed collection (by passing current element of our Set iterator)
– calls remove() method of our Set if contains() method returns true.

When Does it cause performance issue?

As it accepts Collection of all the types, it is required to call contains() method of that collection. For Example, 

    – Assume that you are passing LinkedList object in removeAll() method, and its size is greater than our Set.
    – So for each elements it calls contains() method of LinkedList which is very time consuming i.e. O(N)
    – But if our passed collection is just another HashSet object, then time complexity of contains() method is just O(1)

In Summary, performance of removeAll() method depends on the time complexity of conatins() method  and size of passed collection. If passed collection is HashSet, there won't be any performance issue. But If you pass LinkedlList with size greater than our set,  then removeAll() method will give you very poor performance . 

Here is already logged bug for the same :

You can also read Jon Skeet's article There’s a hole in my abstraction to analyse removeAll() performance further.

Tags: , , , , , ,

Comments and Queries

If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:
<pre><code class="java"> 
String foo = "bar";
For more information on supported HTML tags in disqus comment, click here.

Total Post : 88
Contribute Your Articles

Interview Experiences

Like Us On Facebook