A set is a data structure that does not allow duplicate entries. It is like an array with unique values.
Note: JavaScript already has a built-in Set data structure.
Take a look at the following example:
const set = new Set();
set.add(1); //↪️ Set [ 1 ]
set.add(1); //↪️ Set [ 1 ]
set.add(2); //↪️ Set [ 1, 2 ]
set.add(3); //↪️ Set [ 1, 2, 3 ]
set.has(1); //↪️ true
set.delete(1); //↪️ removes 1 from the set
set.has(1); //↪️ false, 1 has been removed
set.size; //↪️ 2, we just removed one value
console.log(set); //↪️ Set(2) {2, 3}
As you can see, even if we insert the same value multiple times, it only gets added once.
Can you think of a way to implement it?
Tip: A hint… it should perform all operations in O(1)* or at most O(log n).
If we use a map, we can accomplish this. However, maps store key/value pairs. If we only use the keys, we can avoid duplicates, since a map can hold each key only once.
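To illustrate the idea, here is a minimal sketch that uses only the keys of JavaScript’s built-in Map. The variable names are illustrative and are not part of the book’s code.

```javascript
// Sketch: treat the keys of a built-in Map as the members of a set.
const members = new Map();

for (const value of [1, 1, 2, 3, 3]) {
  // Setting an existing key overwrites its entry instead of adding a second one.
  members.set(value, true); // the stored value is a placeholder we never read
}

const uniqueValues = Array.from(members.keys());
console.log(uniqueValues); // [ 1, 2, 3 ]
console.log(members.has(2)); // true
```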
As you might remember from the part03-graph-data-structures.asc chapter, there are two ways of implementing a map, and both can be used to create a set. Let’s explore the differences between the two implementations.
We can implement a map using a balanced BST or using a hash function. If we use them to implement a Set, then we would have a TreeSet and a HashSet, respectively.
- TreeSet returns the values sorted in ascending order.
- HashSet returns the values in insertion order.
- Operations on a HashSet take O(1) on average; in the worst case (when a rehash is due), they take O(n).
- Operations on a TreeSet are always O(log n).
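The ordering difference can be previewed with built-ins alone. This is only a sketch: the built-in Set stands in for the HashSet (both iterate in insertion order), and a sorted copy mimics what iterating a TreeSet would yield.

```javascript
const input = [3, 1, 2, 3, 1];

// The built-in Set iterates in insertion order, like the HashSet described above.
const insertionOrder = Array.from(new Set(input));

// Sorting the unique values mimics the ascending order a TreeSet would produce.
const ascendingOrder = Array.from(new Set(input)).sort((a, b) => a - b);

console.log(insertionOrder); // [ 3, 1, 2 ]
console.log(ascendingOrder); // [ 1, 2, 3 ]
```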
Let’s implement both!
We are going to use a self-balancing BST (Red-Black Tree) to implement TreeSet.
link:../../../src/data-structures/sets/tree-set.js[role=include]
}
- Converts an array or any iterable data structure to a set.
A common use case for Sets is to remove duplicated values from an array. We can do that by passing them in the constructor as follows:
const set = new TreeSet([1, 2, 3, 2, 1]);
expect(set.size).toBe(3);
expect(Array.from(set.keys())).toEqual([1, 2, 3]);
Ok, now let’s implement the add method.
To add values to the set, we use the Tree.add method.
link:../../../src/data-structures/sets/tree-set.js[role=include]
Our BST implementation can hold duplicated values. It has a multiplicity tally to keep track of duplicates. However, a set does not allow duplicates, so we first check whether the value is already in the tree.
Don’t worry about the extra lookup. Tree.has is also very performant: O(log n).
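The following is a minimal sketch of that check, assuming a plain, unbalanced BST with a per-node count in place of the book’s Red-Black Tree. The names CountingTree and SketchTreeSet are hypothetical and are not the classes from the repository.

```javascript
// Hypothetical stand-in for the book's tree: an unbalanced BST that
// tallies duplicates in a `count` field (its multiplicity).
class CountingTree {
  constructor() {
    this.root = null;
    this.size = 0; // counts every insertion, duplicates included
  }

  add(value) {
    this.size += 1;
    const node = { value, count: 1, left: null, right: null };
    if (!this.root) { this.root = node; return; }
    let current = this.root;
    while (true) {
      if (value === current.value) { current.count += 1; return; }
      const side = value < current.value ? 'left' : 'right';
      if (!current[side]) { current[side] = node; return; }
      current = current[side];
    }
  }

  has(value) {
    let current = this.root;
    while (current) {
      if (value === current.value) return true;
      current = value < current.value ? current.left : current.right;
    }
    return false;
  }
}

// Set wrapper: consult `has` first so the tree never records a duplicate.
class SketchTreeSet {
  constructor(iterable = []) {
    this.tree = new CountingTree();
    for (const value of iterable) this.add(value);
  }

  add(value) {
    if (!this.tree.has(value)) this.tree.add(value);
    return this;
  }

  has(value) {
    return this.tree.has(value);
  }

  get size() {
    return this.tree.size;
  }
}

const treeSet = new SketchTreeSet([1, 2, 3, 2, 1]);
treeSet.add(3);
console.log(treeSet.size); // 3
```

Because this stand-in tree does not rebalance, its lookups can degrade to O(n) on sorted input; the O(log n) bound in the text relies on a self-balancing tree.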
Again, we rely on the Tree implementation to do the heavy lifting:
has method
link:../../../src/data-structures/sets/tree-set.js[role=include]
We delete the elements from the TreeSet using the remove method of the BST.
delete method
link:../../../src/data-structures/sets/tree-set.js[role=include]
Voilà! That’s it!
A common use case for a Set is to convert it to an array or to iterate over it (for loops, forEach, …). Let’s provide the method for that:
link:../../../src/data-structures/sets/tree-set.js[role=include]
We are using the inOrderTraversal method of the BST to visit each key in ascending order.
Symbol iterator
The Symbol.iterator built-in symbol specifies the default iterator for an object. It is used by for…of, Array.from, and others.
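A minimal sketch of this idea follows, assuming a hand-built BST of plain objects rather than the book’s tree. The class name InOrderKeys and the node helper are hypothetical.

```javascript
// Hypothetical helper that builds a BST node from plain objects.
const node = (value, left = null, right = null) => ({ value, left, right });

class InOrderKeys {
  constructor(root) {
    this.root = root;
  }

  // Generator: left subtree, then the node itself, then the right subtree.
  * inOrderTraversal(current = this.root) {
    if (!current) return;
    yield* this.inOrderTraversal(current.left);
    yield current.value;
    yield* this.inOrderTraversal(current.right);
  }

  // Default iterator, picked up by for...of, spread, and Array.from.
  [Symbol.iterator]() {
    return this.inOrderTraversal();
  }
}

//       3
//     /   \
//    1     5
//     \
//      2
const keys = new InOrderKeys(node(3, node(1, null, node(2)), node(5)));

console.log(Array.from(keys)); // [ 1, 2, 3, 5 ]
for (const key of keys) {
  console.log(key); // 1, then 2, then 3, then 5
}
```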
Now we can convert from set to array and vice versa easily. For instance:
const array = [1, 1, 2, 3, 5];
// array to set
const set = new TreeSet(array);
// set to array
Array.from(set); //↪️ (4) [1, 2, 3, 5]
No more duplicates in our array!
Check out our GitHub repo for the full TreeSet implementation.
Let’s now implement a HashSet.
The HashSet is the set implementation using a HashMap as its underlying data structure.
The HashSet interface will be the same as the built-in Set or our previously implemented TreeSet.
link:../../../src/data-structures/sets/hash-set.js[role=include]
}
This constructor is useful for converting an array to a set and initializing the HashMap.
To insert items in a HashSet, we use the set method of the HashMap:
add method
link:../../../src/data-structures/sets/hash-set.js[role=include]
}
HashMap stores key/value pairs, but for this, we only need the key, and we ignore the value.
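A minimal sketch of this approach follows, assuming JavaScript’s built-in Map as a stand-in for the book’s HashMap. The class name SketchHashSet is hypothetical.

```javascript
class SketchHashSet {
  constructor(iterable = []) {
    this.hashMap = new Map(); // stand-in for the book's HashMap
    for (const value of iterable) this.add(value);
  }

  add(value) {
    // The member is stored as the key; the value slot is left unused.
    this.hashMap.set(value, undefined);
    return this;
  }

  get size() {
    return this.hashMap.size;
  }

  keys() {
    return this.hashMap.keys(); // Map iterates keys in insertion order
  }
}

const hashSet = new SketchHashSet([3, 1, 3, 2, 1]);
console.log(hashSet.size); // 3
console.log(Array.from(hashSet.keys())); // [ 3, 1, 2 ]
```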
We use the has method to check whether a value is in the Set or not.
has method
link:../../../src/data-structures/sets/hash-set.js[role=include]
Internally, the HashMap will convert the key into an array index using a hash function. If the bucket at that index contains the key, has returns true; otherwise, it returns false.
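To make the lookup concrete, here is a minimal sketch of hashing a key to a bucket index, assuming string or number keys, a fixed number of buckets, and a simple character-code hash. The class BucketSet and its hash are illustrative and are not the book’s HashMap, which also rehashes as it grows.

```javascript
class BucketSet {
  constructor(bucketCount = 8) {
    // Each bucket is an array so that colliding keys can share an index.
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }

  // Simple illustrative hash: combine character codes, then wrap to an index.
  hash(key) {
    let hashValue = 0;
    for (const char of String(key)) {
      hashValue = (hashValue * 31 + char.codePointAt(0)) % this.buckets.length;
    }
    return hashValue;
  }

  add(key) {
    const bucket = this.buckets[this.hash(key)];
    if (!bucket.includes(key)) bucket.push(key);
    return this;
  }

  has(key) {
    // Jump straight to one bucket, then compare against the keys stored in it.
    return this.buckets[this.hash(key)].includes(key);
  }
}

const bucketSet = new BucketSet();
bucketSet.add('cat').add('dog').add('cat');
console.log(bucketSet.has('cat')); // true
console.log(bucketSet.has('bird')); // false
```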
We can say that a HashSet is, on average, more performant: O(1) vs. O(log n). However, if a rehash happens, the operation will take O(n) instead of O(1). A TreeSet is always O(log n).
| Data Structure | Searching By Index/Key | Searching By Value | Insert | Delete | Space Complexity |
| HashSet | - | O(n) | O(1)* | O(1)* | O(1)* |
| TreeSet | - | O(n) | O(log n) | O(log n) | O(log n) |
* = Amortized run time. E.g. rehashing might affect run time to O(n).
To recap, HashSet and TreeSet will keep data without duplicates. The difference besides runtime is that:
- HashSet keeps data in insertion order.
- TreeSet keeps data sorted in ascending order.