Skip to content

Latest commit

 

History

History
237 lines (173 loc) · 7.22 KB

File metadata and controls

237 lines (173 loc) · 7.22 KB

Set

A set is a data structure where duplicated entries are not allowed. Set is like an array with unique values.

Note
JavaScript has already a built-in Set data structure.

Take a look at the following example:

Set usage example (using JavaScript built-in Set)
const set = new Set();

set.add(1); //↪️ Set [ 1 ]
set.add(1); //↪️ Set [ 1 ]
set.add(2); //↪️ Set [ 1, 2 ]
set.add(3); //↪️ Set [ 1, 2, 3 ]
set.has(1); //↪️ true
set.delete(1); //↪️ removes 1 from the set
set.has(1);    //↪️ false, 1 has been removed
set.size; //↪️ 2, we just removed one value
console.log(set); //↪️ Set(2) {2, 3}

As you can see, even if we insert the same value multiple times, it only gets added once.

Can you think in a way how to implement it?

Tip
A hint…​ it should perform all operations in O(1)* or at most O(log n)

If we use a map, we can accomplish this. However, maps use a key/value pair. If we only use the keys, we can avoid duplicates. Since in a map you can only have one key at a time.

As you might remember from the part03-graph-data-structures.asc chapter, there are two ways of implementing a map and both can be used to create a set. Let’s explore the difference between the two implementations are.

HashSet vs TreeSet

We can implement a map using a balanced BST and using a hash function. If we use them to implement a Set, then we would have a HashSet and TreeSet respectively.

  • TreeSet, would return the values sorted in ascending order.

  • HashSet, would return the values in insertion order.

  • Operations on a HashSet would take on average O(1) and in the worst case (rehash is due), it would take O(n).

  • Operation on a TreeSet is always O(log n).

Let’s implement both!

TreeSet

We are to use a self-balanced BST (Red-Black Tree) to implement TreeSet.

TreeSet’s constructor method and size attribute
link:../../../src/data-structures/sets/tree-set.js[role=include]
}
  1. Converts an array or any iterable data structure to a set.

A common use case for Sets is to remove duplicated values from an array. We can do that by passing them in the constructor as follows:

Removing duplicates from an Array using a Set
set = new TreeSet([1, 2, 3, 2, 1]);
expect(set.size).toBe(3);
expect(Array.from(set.keys())).toEqual([1, 2, 3]);

Ok, now let’s implement the add method.

Adding elements to a TreeSet

For adding values to the set, we Tree.add method.

TreeSet’s constructor method and size attribute
link:../../../src/data-structures/sets/tree-set.js[role=include]

Our BST implementation can hold duplicated values. It has a multiplicity tally to keep track of duplicates. However, we don’t dupe in a set. For that, we check if the value is already in the tree. Don’t worry about adding extra lookups. The Tree.has is also very performant O(log n).

Searching for values in a TreeSet

Again, we rely on the Tree implementation to do the heavy lifting:

TreeSet’s has method
link:../../../src/data-structures/sets/tree-set.js[role=include]
Deleting elements from a TreeSet

We delete the elements from the TreeSet using the remove method of the BST.

TreeSet’s delete method
link:../../../src/data-structures/sets/tree-set.js[role=include]

Voilà! That’s it!

Converting TreeSet to Array

A common use case for a Set is to convert it to an array or use in an iterator (for loops, forEach, …). Let’s provide the method for that:

TreeSet’s iterator
link:../../../src/data-structures/sets/tree-set.js[role=include]

We are using the inOrderTraversal method of the BST to go each key in an ascending order.

JavaScript Built-in Symbol iterator

The Symbol.iterator built-in symbol specifies the default iterator for an object. Used by for…​of, Array.from and others.

Now we can convert from set to array and vice versa easily. For instance:

TreeSet’s iterator
const array = [1, 1, 2, 3, 5];

// array to set
const set = new TreeSet(array);

// set to array
Array.from(set); //↪️ (4) [1, 2, 3, 5]

No more duplicates in our array!

Let’s now, implement a HashSet.

HashSet

The HashSet is the set implementation using a HashMap as its underlying data structure.

The HashSet interface will be the same as the built-in Set or our previously implemented TreeSet.

HashSet’s constructor method and size attribute
link:../../../src/data-structures/sets/hash-set.js[role=include]
}

This constructor is useful for converting an array to set and initializing the HashMap.

Inserting values to a HashSet

To insert items in a HashSet we use the set method of the HashMap:

HashSet’s add method
link:../../../src/data-structures/sets/hash-set.js[role=include]
}

HashMap stores key/value pairs, but for this, we only need the key, and we ignore the value.

Finding values in a HashSet

We use the method has to check if a value is on the Set or not.

HashSet’s has method
link:../../../src/data-structures/sets/hash-set.js[role=include]

Internally, the HashMap will convert the key into an array index using a hash function. If there’s something in the array index bucket, it will return true, and if it’s empty, it will be false.

Deleting values from a HashSet

For deleting a value from a hashSet we use the HashMap’s delete method:

HashSet’s delete method
link:../../../src/data-structures/sets/hash-set.js[role=include]

This method has an average runtime of O(1).

HashSet vs HashMap Time Complexity

We can say that HashMap in on average more performant O(1) vs. O(log n). However, if a rehash happens, it will take O(n) instead of O(1). A TreeSet is always O(log n).

Table 1. Time complexity HashSet vs TreeSet

Data Structure

Searching By

Insert

Delete

Space Complexity

Index/Key

Value

HashSet

-

O(n)

O(1)*

O(1)*

O(1)*

TreeSet

-

O(n)

O(log n)

O(log n)

O(log n)

* = Amortized run time. E.g. rehashing might affect run time to O(n).

To recap, HashSet and TreeSet will keep data without duplicates. The difference besides runtime is that:

TreeSet vs HashSet
  • HashSet keeps data in insertion order

  • TreeSet keeps data sorted in ascending order.