How is HashSet implemented internally in Java?

September 27, 2017

Posted by Abhi Andhariya

In our previous article, we have seen internal implementation of SynchronizedMap and the difference between HashMap, Hashtable, SynchronizedMap and ConcurrentMap. In this article, we will understand how does HashSet work internally? How JDK implementers have reused HashMap effectively for HashSet implementation.

The best way to understand internal implementation of any built-in Java class is by looking into its source code. You can either attach and import source code in your IDE or you can refer GrepCode. I personally prefer GrepCode as it also maintains code for all the releases of jdk and we can easily peep through the evolution of any java class.

Before going into internal implementation HashSet, we should know below two basic properties of HashSet.

HashSet doesn't allow duplicate elements.
It allows at most one null element.

Both of these properties can be achieved using HashMap as HashMap doesn't allow duplicate keys and it also allows only one null key.

HashMap as a Backing DataStructure

HashSet internally uses HashMap as a backing data structure with key as generic type E and value as Object class type. Have a look at below code snippet of HashSet class from jdk 1.6


public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<E,Object>();
    }

   // Other constructors and methods
}

In HashMap, we can have same object/instance as values of multiple keys. In other words, duplicate values are allowed in HashMap.

This property of HashMap has been utilized very efficiently in HashSet implementation. If you notice in above code, there is one static field PRESENT of type Object. Whenever we add any element to HashSet, it puts element into backing HashMap with key as our element and value as static field PRESENT.

For Example,


import java.util.HashSet;

public class HashSetInternalDemo{
    public static void main(String[] a){
        HashSet<String> friendList = new HashSet<>();
        friendList.add("Malay");
        friendList.add("Mahesh");
        friendList.add("Jay");
        friendList.add("Ankit");
        friendList.add("Nirav");
        friendList.add("Dhaval");
    }
}

In above java program, I have created HashSet < String > and added six different Strings in it. Internally it creates HashMap < String, Object> and stores elements as shown below.

add() method

It adds the specified element to HashSet if it is not already present and returns true. If HashSet already contains the element, the call leaves the set unchanged and returns false.

add() method internally uses put() method of HashMap. Here is the code snippet of add() method of HashSet :


public boolean add(E e) {
        return map.put(e, PRESENT)==null;
}

put() method returns the previous value associated with key, or null if there was no mapping for key.

So, if element is not already present in the set, put() method will return null . Hence add() method returns true, because condition in add() method evaluates to true (i.e. null==null)
If element is already present in the set, put() method will return old value i.e PRESENT which is not null. Hence condition in add() method evaluates to false (i.e. PRESENT == null)

Remove() method

remove(Object o) method returns true, if the set contained the specified element.

It internally uses remove() method of HashMap which returns the value of removed key i.e PRESENT in HashSet. If no key-value pair is found in the map, then it returns null.


public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

Here is the internal implementation of other commonly used methods of HashSet. For detailed analysis, please refer HashSet implementation code at Grep Code


public void clear() {
	map.clear();
}

public boolean isEmpty() {
	return map.isEmpty();
}

public boolean contains(Object o) {
	return map.containsKey(o);
}

public Iterator<E> iterator() {
	return map.keySet().iterator();
}

Further Reading : You can read our article on possible Performance issue with HashSet removeAll() method

Summary and Other Important Points

HashSet doesn't allow duplicate elements.
It allows at most one null element.
It internally uses HashMap as backing data structure and stores private static final Object class instance PRESENT as a value of all the keys. As we are storing same static field as value to all the keys, we can neglect additional space being used by this value field.
HashSet's add() method uses HashMap's put() method internally.
As it is backed by HashMap, we must override hashCode() and equals() method while creating HashSet of custom class objects. Refer our article How does HashMap work internally in Java? to know more about HashMap internal implementation.
HashSet is not thread-safe. Here are some of the thread-safe implementation of Set:
–CopyOnWriteArraySet
–Collections.synchronizedSet(Set set)
–ConcurrentSkipListSet
–Collections.newSetFromMap(new ConcurrentHashMap())
HashSet doesn't maintain insertion order. To maintain insertion or access order, we need to use LinkedHashSet.
HashSet also doesn't store elements in sorted order. Instead we should use TreeSet to store elements in sorted order.
Similar to HashSet, LinkedHashSet has also been implemented using LinkedHashMap and TreeSet is implemented using TreeMap. And all these data structure uses private static final Object class instance PRESENT as a value of all their keys.

That's all for this topic. If you guys have any suggestions or queries, feel free to drop a comment. We would be happy to add that in our post. You can also contribute your articles by creating contributor account here.

Happy Learning 🙂

If you like the content on CodePumpkin and if you wish to do something for the community and the planet Earth, you can donate to our campaign for planting more trees at CodePumpkin Cauvery Calling Campaign.

We may not get time to plant a tree, but we can definitely donate ₹42 per Tree.

About the Author

abhi andhariya
Visit Full Profile

Surviving Java Developer, Passionate Blogger, Table Tennis Lover, Bookworm, Occasional illustrator and a big fan of Joey Tribbiani, The Walking Dead and Game of Thrones...!!

Tags: Collection Framework, Collections, Core Java, HashMap, Java

Comments and Queries

If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:

<pre><code class="java"> 
String foo = "bar";
</code></pre>

For more information on supported HTML tags in disqus comment, click here.

Code Pumpkin

How is HashSet implemented internally in Java?

HashMap as a Backing DataStructure

add() method

Remove() method

Summary and Other Important Points

About the Author

abhi andhariya
Visit Full Profile

Comments and Queries

Categories

Interview Questions

Interview Experiences

Recent Posts

Like Us On Facebook

How is HashSet implemented internally in Java?

HashMap as a Backing DataStructure

add() method

Remove() method

Summary and Other Important Points

About the Author

abhi andhariya Visit Full Profile

Comments and Queries

Categories

Tag Cloud

Interview Questions

Popular Posts

Interview Experiences

Recent Posts

Like Us On Facebook

abhi andhariya
Visit Full Profile