Code Pumpkin

Performance Issue with HashSet RemoveAll() method

October 12, 2017
Posted by Pumpkin
Subscribe

In our previous article on HashSet internals, we have seen internal implementation of its add(), remove(), contains(), clear(), iterator() and isEmpty() methods. One of our readers has raised question about How efficient is HashSet removeAll() method? As it requires detailed explanation, we have decided to write separate article on it.  

What does removeAll() method do?

Before going into internals of removeAll() method, let's understand the use of removeAll() method.  Here is the method signature :


public boolean removeAll(Collection<?> c)
  1. It removes all elements from the set that are contained in the specified collection.
  2. removeAll() method is defined in AbstractSet, So all Set implementation which extends AbstractSet has this method.
  3. It accepts not only HashSet, but all the Collections types i.e. ArrayList, LinkedList, HashSet.

  4. If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.


Java Program


import java.util.HashSet;
import java.util.LinkedList;

public class HashSetRemoveAllDemo {

    public static void main(String s[])
    {
        HashSet<Integer> original = new HashSet<>();
        for(int i=1; i<=10; i++)
        {
            original.add(i);
        }

        HashSet<Integer> removalSet = new HashSet<>();
        for(int i=7; i<=12; i++)
        {
            removalSet.add(i);
        }

        LinkedList<Integer> removalList = new LinkedList<>();
        for(int i=-3; i<=2; i++)
        {
            removalList.add(i);
        }

        System.out.println("Original : " + original);
        System.out.println("RemovalSet : " + removalSet);
        System.out.println("RemovalList : " + removalList);

        // removes 7,8,9,10 from original set
        original.removeAll(removalSet);
        System.out.println("After removing elements of RemovalSet : " + original);

        // removes 1,2 from original set
        original.removeAll(removalList);
        System.out.println("After removing elements of RemovalList : " + original);
    }
}

Output :


Original : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
RemovalSet : [7, 8, 9, 10, 11, 12]
RemovalList : [-3, -2, -1, 0, 1, 2]
After removing elements of RemovalSet : [1, 2, 3, 4, 5, 6]
After removing elements of RemovalList : [3, 4, 5, 6]

Note : In above program, we have used for loop to populate HashSet and LinkedList. It is quite verbose because it cannot be expressed in a single expression. However, Java9 has introduced Convenience Factory Methods for Collections to initialize Set, List or Map as below


List<String> list = List.of("a", "b", "c");
Set<String> set = Set.of("a", "b", "c");
Map<String, Integer> cities = Map.of("a", 1, "b", 2,"c",3);

Internal Implementation of RemoveAll()


public boolean removeAll(Collection c) {
	Objects.requireNonNull(c);
	boolean modified = false;

	if (size() > c.size()) {
		for (Iterator i = c.iterator(); i.hasNext(); )
			modified |= remove(i.next());
	} else {
		for (Iterator i = iterator(); i.hasNext(); ) {
			if (c.contains(i.next())) {
				i.remove();
				modified = true;
			}
		}
	}
	return modified;
}

So, How it is implemented internally?

1) It first compares the size of both the collections.
2) If size of our Set is greater than passed collection, then it iterates through passed collection, 
– calls remove() method of our Set directly without checking for availability of current element in iterator.
3) If size of our Set is smaller than passed collection, it iteratess through our Set.
– calls contains() method of passed collection (by passing current element of our Set iterator)
– calls remove() method of our Set if contains() method returns true.


When Does it cause performance issue?

As it accepts Collection of all the types, it is required to call contains() method of that collection. For Example, 

    – Assume that you are passing LinkedList object in removeAll() method, and its size is greater than our Set.
    – So for each elements it calls contains() method of LinkedList which is very time consuming i.e. O(N)
    – But if our passed collection is just another HashSet object, then time complexity of contains() method is just O(1)

In Summary, performance of removeAll() method depends on the time complexity of conatins() method  and size of passed collection. If passed collection is HashSet, there won't be any performance issue. But If you pass LinkedlList with size greater than our set,  then removeAll() method will give you very poor performance . 

Here is already logged bug for the same : https://bugs.openjdk.java.net/browse/JDK-6982173

You can also read Jon Skeet's article There’s a hole in my abstraction to analyse removeAll() performance further.

That's all for this topic. If you guys have any suggestions or queries, feel free drop a comment. We would be happy to add that in our post. You can also contribute your articles by creating contributor account here

Happy Learning 🙂

Further Reading : 

  1. How is HashSet implemented internally in Java?
  2. How does HashMap work internally in Java?
  3. HashMap vs Hashtable Vs SynchronizedMap Vs ConcurrentHashMap

Tags: , , , , , ,


Comments and Queries

If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:
<pre><code class="java"> 
String foo = "bar";
</code></pre>
For more information on supported HTML tags in disqus comment, click here.

Total Post : 80
Subscribe
Contribute Your Articles

Interview Experiences

Related Books

Like Us On Facebook

Alexa Page Rank