How is HashSet implemented internally in Java?
In our previous article, we have seen internal implementation of SynchronizedMap
and the difference between HashMap, Hashtable, SynchronizedMap and ConcurrentMap. In this article, we will understand how does HashSet
work internally? How JDK implementers have reused HashMap
effectively for HashSet
implementation.
The best way to understand internal implementation of any built-in Java class is by looking into its source code. You can either attach and import source code in your IDE or you can refer GrepCode. I personally prefer GrepCode as it also maintains code for all the releases of jdk and we can easily peep through the evolution of any java class.
Before going into internal implementation HashSet
, we should know below two basic properties of HashSet
.
-
HashSet
doesn't allow duplicate elements. - It allows at most one null element.
Both of these properties can be achieved using HashMap
as HashMap
doesn't allow duplicate keys and it also allows only one null
key.
HashMap as a Backing DataStructure
HashSet
internally uses HashMap
as a backing data structure with key as generic type E and value as Object
class type. Have a look at below code snippet of HashSet
class from jdk 1.6
public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable { static final long serialVersionUID = -5024744406713321676L; private transient HashMap<E,Object> map; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); /** * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() { map = new HashMap<E,Object>(); } // Other constructors and methods }
In HashMap
, we can have same object/instance as values of multiple keys. In other words, duplicate values are allowed in HashMap
.
This property of HashMap
has been utilized very efficiently in HashSet
implementation. If you notice in above code, there is one static field PRESENT of type Object
. Whenever we add any element to HashSet
, it puts element into backing HashMap
with key as our element and value as static field PRESENT.
For Example,
import java.util.HashSet; public class HashSetInternalDemo{ public static void main(String[] a){ HashSet<String> friendList = new HashSet<>(); friendList.add("Malay"); friendList.add("Mahesh"); friendList.add("Jay"); friendList.add("Ankit"); friendList.add("Nirav"); friendList.add("Dhaval"); } }
In above java program, I have created HashSet < String >
and added six different Strings in it. Internally it creates HashMap < String, Object>
and stores elements as shown below.
add() method
It adds the specified element to HashSet
if it is not already present and returns true
. If HashSet
already contains the element, the call leaves the set unchanged and returns false
.
add()
method internally uses put() method of HashMap
. Here is the code snippet of add()
method of HashSet
:
public boolean add(E e) { return map.put(e, PRESENT)==null; }
put()
method returns the previous value associated with key, or null
if there was no mapping for key.
-
So, if element is not already present in the set,
put()
method will returnnull
. Henceadd()
method returnstrue
, because condition inadd()
method evaluates to true (i.e.null==null
) -
If element is already present in the set,
put()
method will return old value i.e PRESENT which is notnull
. Hence condition inadd()
method evaluates tofalse
(i.e.PRESENT == null
)
Remove() method
remove(Object o)
method returns true
, if the set contained the specified element.
It internally uses remove()
method of HashMap which returns the value of removed key i.e PRESENT
in HashSet
. If no key-value pair is found in the map, then it returns null
.
public boolean remove(Object o) { return map.remove(o)==PRESENT; }
Here is the internal implementation of other commonly used methods of HashSet
. For detailed analysis, please refer HashSet implementation code at Grep Code
public void clear() { map.clear(); } public boolean isEmpty() { return map.isEmpty(); } public boolean contains(Object o) { return map.containsKey(o); } public Iterator<E> iterator() { return map.keySet().iterator(); }
Further Reading : You can read our article on possible Performance issue with HashSet removeAll() method
Summary and Other Important Points
-
HashSet
doesn't allow duplicate elements. - It allows at most one null element.
-
It internally uses
HashMap
as backing data structure and storesprivate static final Object
class instancePRESENT
as a value of all the keys. As we are storing same static field as value to all the keys, we can neglect additional space being used by this value field. -
HashSet
'sadd()
method usesHashMap
's put() method internally. -
As it is backed by
HashMap
, we must override hashCode() and equals() method while creatingHashSet
of custom class objects. Refer our article How does HashMap work internally in Java? to know more aboutHashMap
internal implementation. -
HashSet
is not thread-safe. Here are some of the thread-safe implementation of Set:–
CopyOnWriteArraySet
–Collections.synchronizedSet(Set set)
–ConcurrentSkipListSet
–Collections.newSetFromMap(new ConcurrentHashMap())
-
HashSet
doesn't maintain insertion order. To maintain insertion or access order, we need to useLinkedHashSet
. -
HashSet
also doesn't store elements in sorted order. Instead we should useTreeSet
to store elements in sorted order. -
Similar to
HashSet
,LinkedHashSet
has also been implemented usingLinkedHashMap
andTreeSet
is implemented usingTreeMap
. And all these data structure usesprivate static final Object
class instancePRESENT
as a value of all their keys.
That's all for this topic. If you guys have any suggestions or queries, feel free to drop a comment. We would be happy to add that in our post. You can also contribute your articles by creating contributor account here.
Happy Learning 🙂
If you like the content on CodePumpkin and if you wish to do something for the community and the planet Earth, you can donate to our campaign for planting more trees at CodePumpkin Cauvery Calling Campaign.
We may not get time to plant a tree, but we can definitely donate ₹42 per Tree.
About the Author
Tags: Collection Framework, Collections, Core Java, HashMap, Java
Comments and Queries
If you want someone to read your code, please put the code inside <pre><code> and </code></pre> tags. For example:<pre><code class="java"> String foo = "bar"; </code></pre>For more information on supported HTML tags in disqus comment, click here.