Code Pumpkin

How is HashSet implemented internally in Java?

September 27, 2017
Posted by Abhi Andhariya

In our previous article, we have seen internal implementation of SynchronizedMap and the difference between HashMap, Hashtable, SynchronizedMap and ConcurrentMap.  In this article, we will understand how does HashSet work internally? How JDK implementers have reused HashMap effectively for HashSet implementation.  

The best way to understand internal implementation of any built-in Java class is by looking into its source code. You can either attach and import source code in your IDE or you can refer GrepCode. I personally prefer GrepCode as it also maintains code for all the releases of jdk and we can easily peep through the evolution of any java class.

Before going into internal implementation HashSet, we should know below two basic properties of HashSet.

  1. HashSet doesn't allow duplicate elements.
  2. It allows at most one null element.

Both of these properties can be achieved using HashMap as HashMap doesn't allow duplicate keys and it also allows only one null key.


HashMap as a Backing DataStructure

HashSet internally uses HashMap as a backing data structure with key as generic type E and value as Object class type. Have a look at below code snippet of HashSet class from jdk 1.6


public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<E,Object>();
    }

   // Other constructors and methods
}

In HashMap, we can have same object/instance as values of multiple keys. In other words, duplicate values are allowed in HashMap.

This property of HashMap has been utilized very efficiently in HashSet implementation. If you notice in above code, there is one static field PRESENT of type Object. Whenever we add any element to HashSet, it puts element into backing HashMap with key as our element and value as static field PRESENT.

For Example,


import java.util.HashSet;

public class HashSetInternalDemo{
    public static void main(String[] a){
        HashSet<String> friendList = new HashSet<>();
        friendList.add("Malay");
        friendList.add("Mahesh");
        friendList.add("Jay");
        friendList.add("Ankit");
        friendList.add("Nirav");
        friendList.add("Dhaval");
    }
}

In above java program, I have created HashSet < String > and added six different Strings in it. Internally it creates HashMap < String, Object> and stores elements as shown below. 

HashSet Internal Implementation


add() method

It adds the specified element to HashSet if it is not already present and returns true. If HashSet already contains the element, the call leaves the set unchanged and returns false.

add() method internally uses put() method of HashMap. Here is the code snippet of add() method of HashSet :


public boolean add(E e) {
        return map.put(e, PRESENT)==null;
}

put() method returns the previous value associated with key, or null if there was no mapping for key.

  • So, if element is not already present in the set, put() method will return null . Hence add() method returns true, because condition in add() method evaluates to true (i.e. null==null)
  • If element is already present in the set, put() method will return old value i.e PRESENT which is not null. Hence condition in add() method evaluates to false (i.e. PRESENT == null)

Remove() method

remove(Object o) method returns true, if the set contained the specified element.

It internally uses remove() method of HashMap which returns the value of removed key i.e PRESENT in HashSet. If no key-value pair is found in the map, then it returns null.


public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

Here is the internal implementation of other commonly used methods of HashSet. For detailed analysis, please refer HashSet implementation code at Grep Code


public void clear() {
	map.clear();
}

public boolean isEmpty() {
	return map.isEmpty();
}

public boolean contains(Object o) {
	return map.containsKey(o);
}

public Iterator<E> iterator() {
	return map.keySet().iterator();
}

Further Reading : You can read our article on possible Performance issue with HashSet removeAll() method


Summary and Other Important Points

  1. HashSet doesn't allow duplicate elements.
  2. It allows at most one null element.
  3. It internally uses HashMap as backing data structure and stores private static final Object class instance PRESENT as a value of all the keys. As we are storing same static field as value to all the keys, we can neglect additional space being used by this value field.
  4. HashSet's add() method uses HashMap's put() method internally.
  5. As it is backed by HashMap, we must override hashCode()  and  equals() method while creating HashSet of custom class objects. Refer our article How does HashMap work internally in Java? to know more about HashMap internal implementation.
  6. HashSet is not thread-safe. Here are some of the thread-safe implementation of Set:

    CopyOnWriteArraySet
    Collections.synchronizedSet(Set set)
    ConcurrentSkipListSet
    ​Collections.newSetFromMap(new ConcurrentHashMap())

  7. HashSet doesn't maintain insertion order. To maintain insertion or access order, we need to use LinkedHashSet.
  8. HashSet also doesn't store elements in sorted order. Instead we should use TreeSet to store elements in sorted order.
  9. Similar to HashSet, LinkedHashSet has also been implemented using LinkedHashMap and TreeSet is implemented using TreeMap. And all these data structure uses private static final Object class instance PRESENT as a value of all their keys.

That's all for this topic. If you guys have any suggestions or queries, feel free to drop a comment. We would be happy to add that in our post. You can also contribute your articles by creating contributor account here.

Happy Learning 🙂

If you like the content on CodePumpkin and if you wish to do something for the community and the planet Earth, you can donate to our campaign for planting more trees at CodePumpkin Cauvery Calling Campaign.

We may not get time to plant a tree, but we can definitely donate ₹42 per Tree.



About the Author


Surviving Java Developer, Passionate Blogger, Table Tennis Lover, Bookworm, Occasional illustrator and a big fan of Joey Tribbiani, The Walking Dead and Game of Thrones...!!



Tags: , , , ,


Comments and Queries

If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:
<pre><code class="java"> 
String foo = "bar";
</code></pre>
For more information on supported HTML tags in disqus comment, click here.
Total Posts : 124
follow us in feedly

Like Us On Facebook