How is HashSet implemented internally in Java?
In our previous article, we have seen internal implementation of SynchronizedMap and the difference between HashMap, Hashtable, SynchronizedMap and ConcurrentMap. In this article, we will understand how does HashSet work internally? How JDK implementers have reused HashMap effectively for HashSet implementation.
The best way to understand internal implementation of any built-in Java class is by looking into its source code. You can either attach and import source code in your IDE or you can refer GrepCode. I personally prefer GrepCode as it also maintains code for all the releases of jdk and we can easily peep through the evolution of any java class.
Before going into internal implementation HashSet, we should know below two basic properties of HashSet.
-
HashSetdoesn't allow duplicate elements. - It allows at most one null element.
Both of these properties can be achieved using HashMap as HashMap doesn't allow duplicate keys and it also allows only one null key.
HashMap as a Backing DataStructure
HashSet internally uses HashMap as a backing data structure with key as generic type E and value as Object class type. Have a look at below code snippet of HashSet class from jdk 1.6
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
static final long serialVersionUID = -5024744406713321676L;
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<E,Object>();
}
// Other constructors and methods
}
In HashMap, we can have same object/instance as values of multiple keys. In other words, duplicate values are allowed in HashMap.
This property of HashMap has been utilized very efficiently in HashSet implementation. If you notice in above code, there is one static field PRESENT of type Object. Whenever we add any element to HashSet, it puts element into backing HashMap with key as our element and value as static field PRESENT.
For Example,
import java.util.HashSet;
public class HashSetInternalDemo{
public static void main(String[] a){
HashSet<String> friendList = new HashSet<>();
friendList.add("Malay");
friendList.add("Mahesh");
friendList.add("Jay");
friendList.add("Ankit");
friendList.add("Nirav");
friendList.add("Dhaval");
}
}
In above java program, I have created HashSet < String > and added six different Strings in it. Internally it creates HashMap < String, Object> and stores elements as shown below.

add() method
It adds the specified element to HashSet if it is not already present and returns true. If HashSet already contains the element, the call leaves the set unchanged and returns false.
add() method internally uses put() method of HashMap. Here is the code snippet of add() method of HashSet :
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
put() method returns the previous value associated with key, or null if there was no mapping for key.
-
So, if element is not already present in the set,
put()method will returnnull. Henceadd()method returnstrue, because condition inadd()method evaluates to true (i.e.null==null) -
If element is already present in the set,
put()method will return old value i.e PRESENT which is notnull. Hence condition inadd()method evaluates tofalse(i.e.PRESENT == null)
Remove() method
remove(Object o) method returns true, if the set contained the specified element.
It internally uses remove() method of HashMap which returns the value of removed key i.e PRESENT in HashSet. If no key-value pair is found in the map, then it returns null.
public boolean remove(Object o) {
return map.remove(o)==PRESENT;
}
Here is the internal implementation of other commonly used methods of HashSet. For detailed analysis, please refer HashSet implementation code at Grep Code
public void clear() {
map.clear();
}
public boolean isEmpty() {
return map.isEmpty();
}
public boolean contains(Object o) {
return map.containsKey(o);
}
public Iterator<E> iterator() {
return map.keySet().iterator();
}Further Reading : You can read our article on possible Performance issue with HashSet removeAll() method
Summary and Other Important Points
-
HashSetdoesn't allow duplicate elements. - It allows at most one null element.
-
It internally uses
HashMapas backing data structure and storesprivate static final Objectclass instancePRESENTas a value of all the keys. As we are storing same static field as value to all the keys, we can neglect additional space being used by this value field. -
HashSet'sadd()method usesHashMap's put() method internally. -
As it is backed by
HashMap, we must override hashCode() and equals() method while creatingHashSetof custom class objects. Refer our article How does HashMap work internally in Java? to know more aboutHashMapinternal implementation. -
HashSetis not thread-safe. Here are some of the thread-safe implementation of Set:–
CopyOnWriteArraySet
–Collections.synchronizedSet(Set set)
–ConcurrentSkipListSet
–Collections.newSetFromMap(new ConcurrentHashMap()) -
HashSetdoesn't maintain insertion order. To maintain insertion or access order, we need to useLinkedHashSet. -
HashSetalso doesn't store elements in sorted order. Instead we should useTreeSetto store elements in sorted order. -
Similar to
HashSet,LinkedHashSethas also been implemented usingLinkedHashMapandTreeSetis implemented usingTreeMap. And all these data structure usesprivate static final Objectclass instancePRESENTas a value of all their keys.
That's all for this topic. If you guys have any suggestions or queries, feel free to drop a comment. We would be happy to add that in our post. You can also contribute your articles by creating contributor account here.
Happy Learning 🙂
If you like the content on CodePumpkin and if you wish to do something for the community and the planet Earth, you can donate to our campaign for planting more trees at CodePumpkin Cauvery Calling Campaign.
We may not get time to plant a tree, but we can definitely donate ₹42 per Tree.
About the Author
Tags: Collection Framework, Collections, Core Java, HashMap, Java
Comments and Queries
If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:<pre><code class="java"> String foo = "bar"; </code></pre>For more information on supported HTML tags in disqus comment, click here.