Create Performance2 homework

AdvancedScientificComputingInJuliaWashU · Mar 22, 2023 · 266347c
1 parent 4e603e6
commit 266347c

Showing 7 changed files with 114 additions and 3 deletions.
diff --git a/Project.toml b/Project.toml
@@ -3,7 +3,11 @@ uuid = "40bff39a-32dd-4119-8e13-99e0b6e9a6b9"
 authors = ["Tim Holy <[email protected]> and contributors"]
 version = "1.0.0-DEV"
 
+[deps]
+BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
+
 [compat]
+BenchmarkTools = "1"
 julia = "1"
 
 [extras]

diff --git a/README.md b/README.md
@@ -1,3 +1,25 @@
-# Performance2
+# Performance (part 2)
 
 [![Build Status](https://github.com/AdvancedScientificComputingInJuliaWashU/Performance2.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/AdvancedScientificComputingInJuliaWashU/Performance2.jl/actions/workflows/CI.yml?query=branch%3Amain)
+
+This assignment helps you explore concepts from the analysis of algorithms, specifically the "big-O" notation and its use in evaluating implementations.
+
+Steps:
+
+1. Fill out the (minimal) starter code in `src/`. Make sure the (minimal) tests pass on your implementations.
+
+2. Answer the following question: why do the tests for `unique_set` use `sort`, while the tests for `unique_2loops` do not? If you get stuck, see `?Set`.
+
+3. Create the benchmarks in `benchmarks/run_benchmarks.jl`. The end result should be a few global variables that store information about the sizes of the lists and the runtime needed to execute the algorithm(s).
+
+4. Pick a plotting package (your choice): popular options include [Makie](https://github.com/MakieOrg/Makie.jl), [Plots](https://github.com/JuliaPlots/Plots.jl), and [PyPlot](https://github.com/JuliaPy/PyPlot.jl). Makie is not recommended for this assignment unless you're running at least Julia 1.9. (Makie is the most sophisticated and a good investment for anyone considering Julia for the long term, but it currently has long load and precompilation times; Plots or PyPlot are leaner options. Any of these should be more than sufficient for this assignment.)
+
+5. Plot your results by filling out the plotting script in `benchmarks/plot_benchmarks.jl`.
+
+6. In big-O notation, what order is `unique_2loops` when there are only 10 unique items? What is its order when the number of unique items is half the length of the list? For the same two cases, what order is `unique_set`?
+
+7. To get a glimpse of how `Set`s work, fill out the exercise in `examples/hashtable.jl`. Then fill out the number of "hash collisions" (slots in the table that got more than one item inserted in them) for tables of the following sizes:
+    - 16:
+    - 32:
+    - 64:
+    - 128:
diff --git a/benchmarks/plot_benchmarks.jl b/benchmarks/plot_benchmarks.jl
@@ -0,0 +1,3 @@
+# Write a plotting script to display the results from `run_benchmarks.jl`.
+# The length of the input list should be on the x-axis, and the y-axis should show the runtime.
+# You can use linear or log-log scales.
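The comment above allows log-log axes. On such a plot a runtime of the form `t ≈ c * n^p` appears as a straight line with slope `p`, so the slope gives a rough empirical estimate of the exponent. Here is a minimal sketch of that idea. The helper `loglog_slope` is hypothetical (it is not part of this package), and the timings are synthetic numbers made up for illustration, not measurements.

```julia
# Hypothetical helper, not part of Performance2:
# least-squares slope of log(t) against log(n).
function loglog_slope(ns, ts)
    x = log.(ns)
    y = log.(ts)
    xm = sum(x) / length(x)   # mean of log(n)
    ym = sum(y) / length(y)   # mean of log(t)
    return sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
end

ns = [10, 10^2, 10^3, 10^4]

# Synthetic "timings" (assumed for illustration only).
t_linear = 2e-9 .* ns
t_quadratic = 5e-10 .* ns .^ 2

slope_linear = loglog_slope(ns, t_linear)
slope_quadratic = loglog_slope(ns, t_quadratic)
```

Real benchmark data will be noisier, and small list sizes are often dominated by constant overhead, so a fitted slope is only a guide.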
diff --git a/benchmarks/run_benchmarks.jl b/benchmarks/run_benchmarks.jl
@@ -0,0 +1,20 @@
+using Performance2
+using BenchmarkTools
+
+# Create lists of different sizes. For each list size, you'll evaluate two cases:
+#  - one where there are at most 10 distinct items (e.g., `list = rand(1:10, n)`)
+#  - one where the number of distinct items is half the list length (`list = rand(1:n÷2, n)`)
+# Then run both `unique_2loops` and `unique_set` on both lists, measuring the elapsed time with `@belapsed`
+# (from BenchmarkTools.jl). Put the list creation in a `setup` block; you can see a demo in the README for BenchmarkTools.
+# Hints:
+# - inside a comprehension I would typically use the syntax `@belapsed(<code goes here>, setup=(<setup code>))`
+#   to avoid any ambiguity about where the expressions passed to the macro `@belapsed` start and stop.
+# - you may have to use `$n` to bring an outside variable into the expression passed to `@belapsed`
+
+list_sizes = [10, 10^2, 10^3, 10^4, 10^5]
+
+time_set10 = ...
+time_sethalf = ...
+
+time_2loops10 = ...
+time_2loopshalf = ...
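The hints above describe the parenthesized `@belapsed(..., setup=(...))` form and `$` interpolation. A minimal sketch of that pattern follows, using `sum` as a stand-in workload rather than either of the assignment's functions. The names `sizes` and `times` are illustrative assumptions.

```julia
using BenchmarkTools

sizes = [10, 100, 1000]

# `sum` is only a stand-in workload for this sketch.
# `setup` builds a fresh `list` before each sample, so list creation is not timed.
# `$n` interpolates the comprehension variable into the benchmark expression.
times = [@belapsed(sum(list), setup=(list = rand(1:10, $n))) for n in sizes]
```

`@belapsed` returns the minimum elapsed time in seconds, so `times` is a vector with one entry per list size.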
diff --git a/examples/hashtable.jl b/examples/hashtable.jl
@@ -0,0 +1,10 @@
+using Performance2
+
+# Read the source code for `create_hashtable` and `push_hashtable!` in `src/Performance2.jl`.
+# (You can read `hash2index` too, although you won't call it directly.)
+
+# For the letters of the alphabet, `'a':'z'`, push each once onto a hashtable that you create
+# with `create_hashtable`. Do this four times, for hash tables of size 16, 32, 64, and 128.
+# Be sure you look at the result. Then fill in the table in the README with the number of
+# collisions you observe for each table size.
+
diff --git a/src/Performance2.jl b/src/Performance2.jl
@@ -1,5 +1,53 @@
 module Performance2
 
-# Write your package code here.
+export unique_2loops, unique_set
+export create_hashtable, push_hashtable!
+
+function unique_2loops(list::AbstractVector)
+    # Implement your own version of `unique`, a function that will return a new vector containing
+    # each unique item in `list` (i.e., it drops duplicates).
+    newlist = eltype(list)[]
+    # Here, use two nested `for` loops to `push!` items onto `newlist` only if they do not appear
+    # earlier in the list
+
+    return newlist
+end
+
+function unique_set(list::AbstractVector)
+    # Implement a second version of `unique`.
+    # This time, insert all the items from `list` into a `Set` and then return `collect(set)`.
+end
+
+###
+### Below here, you don't need to modify any code. But you should read & understand it!
+### This is a very simple (and incorrect) "hash table" intended to give a taste of how `Set`s work;
+### it is in no way intended as a serious implementation.
+###
+
+"""
+    create_hashtable(::Type{T}, tablesize) → ht
+
+Create an empty "hash table," here just a list of length `tablesize`, each element itself a list that can store all items
+with the same index as computed by [`hash2index`](@ref).
+"""
+create_hashtable(::Type{T}, tablesize::Int) where T = [T[] for _ = 1:tablesize]
+
+"""
+    hash2index(x, tablesize) → idx
+
+Compute the proper index `idx` for `x`, given a hash table of length `tablesize`.
+"""
+hash2index(x, tablesize::Int) = (hash(x) % tablesize) + 1
+
+"""
+    push_hashtable!(table, x)
+
+Append `x` to the sub-list of items with index `idx` as computed by [`hash2index`](@ref).
+
+!!! warning
+    There is no protection against duplicates. The intended usage is to add each unique item
+    and see how many "collisions" you have.
+"""
+push_hashtable!(table, x) = push!(table[hash2index(x, length(table))], x)
 
 end
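As a sketch of how the functions above fit together, the following self-contained snippet repeats the three one-line definitions from this file and adds a hypothetical helper, `count_collisions`, which is not part of the package. It uses the integers `1:20` and a table of size 8, both arbitrary choices made for illustration.

```julia
# Copied from src/Performance2.jl so the sketch runs on its own.
create_hashtable(::Type{T}, tablesize::Int) where T = [T[] for _ = 1:tablesize]
hash2index(x, tablesize::Int) = (hash(x) % tablesize) + 1
push_hashtable!(table, x) = push!(table[hash2index(x, length(table))], x)

# Hypothetical helper: number of slots holding more than one item.
count_collisions(table) = count(slot -> length(slot) > 1, table)

table = create_hashtable(Int, 8)
for x in 1:20
    push_hashtable!(table, x)
end

ncollisions = count_collisions(table)
nstored = sum(length, table)   # every pushed item lands in exactly one slot
```

With 20 items and only 8 slots, at least one slot must hold more than one item. The exact count depends on Julia's `hash`, which may differ between Julia versions.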
diff --git a/test/runtests.jl b/test/runtests.jl
@@ -2,5 +2,9 @@ using Performance2
 using Test
 
 @testset "Performance2.jl" begin
-    # Write your tests here.
+    @test unique_2loops([1, 2, 2, 2, 3, 2]) == [1, 2, 3]
+    @test unique_2loops(collect("abracadabra")) == ['a', 'b', 'r', 'c', 'd']
+
+    @test sort(unique_set([1, 2, 2, 2, 3, 2])) == [1, 2, 3]
+    @test sort(unique_set(collect("abracadabra"))) == ['a', 'b', 'c', 'd', 'r']
 end