Create Performance2 homework
timholy committed Mar 22, 2023
1 parent 4e603e6 commit 266347c
Showing 7 changed files with 114 additions and 3 deletions.
4 changes: 4 additions & 0 deletions Project.toml
@@ -3,7 +3,11 @@ uuid = "40bff39a-32dd-4119-8e13-99e0b6e9a6b9"
authors = ["Tim Holy <[email protected]> and contributors"]
version = "1.0.0-DEV"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"

[compat]
BenchmarkTools = "1"
julia = "1"

[extras]
24 changes: 23 additions & 1 deletion README.md
@@ -1,3 +1,25 @@
# Performance2
# Performance (part 2)

[![Build Status](https://github.com/AdvancedScientificComputingInJuliaWashU/Performance2.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/AdvancedScientificComputingInJuliaWashU/Performance2.jl/actions/workflows/CI.yml?query=branch%3Amain)

This assignment helps you explore concepts from the analysis of algorithms, specifically the "big-O" notation and its use in evaluating implementations.

Steps:

1. Fill out the (minimal) starter code in `src/`. Make sure the (minimal) tests pass on your implementations.

2. Answer the following question: why do the tests for `unique_set` use `sort`, but the tests for `unique_2loops` do not? If you get stuck, see `?Set`.

3. Create the benchmarks in `benchmarks/run_benchmarks.jl`. The end result should be a few global variables that store information about the sizes of the lists and the runtime needed to execute the algorithm(s).

4. Pick a plotting package (your choice): popular options include [Makie](https://github.com/MakieOrg/Makie.jl), [Plots](https://github.com/JuliaPlots/Plots.jl), and [PyPlot](https://github.com/JuliaPy/PyPlot.jl). Makie is not recommended for this assignment unless you're running at least Julia 1.9. (Makie is the most sophisticated and a good investment for anyone considering Julia for the long term, but it currently has long load and precompilation times; Plots or PyPlot are leaner options. Any of these should be more than sufficient for this assignment.)

5. Plot your results by filling out the plotting script in `benchmarks/plot_benchmarks.jl`.

6. In big-O notation, what order is `unique_2loops` when there are only 10 unique items? What is its order when the number of unique items is half the length of the list? For the same two cases, what order is `unique_set`?

7. To get a glimpse of how `Set`s work, fill out the exercise in `examples/hashtable.jl`. Then fill out the number of "hash collisions" (slots in the table that got more than one item inserted in them) for tables of the following sizes:
- 16:
- 32:
- 64:
- 128:
3 changes: 3 additions & 0 deletions benchmarks/plot_benchmarks.jl
@@ -0,0 +1,3 @@
# Write a plotting script to display the results from `run_benchmarks.jl`.
# The length of the input list should be on the x-axis, and the y-axis should show the runtime.
# You can use linear or log-log scales.
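On log-log axes a power law `t ∝ n^k` appears as a straight line with slope `k`, which gives one way to read an empirical order off the plot. The sketch below is only an illustration under stated assumptions: `loglog_slope` is a hypothetical helper (not part of this package), and the timings are synthetic power laws rather than measurements.

```julia
# Hypothetical helper: least-squares slope of log(time) versus log(size).
function loglog_slope(sizes, times)
    x = log.(sizes)
    y = log.(times)
    xbar = sum(x) / length(x)
    ybar = sum(y) / length(y)
    return sum((x .- xbar) .* (y .- ybar)) / sum((x .- xbar) .^ 2)
end

# Synthetic timings (NOT measurements): exact power laws with made-up constants.
sizes = [10, 10^2, 10^3, 10^4, 10^5]
linear_times = 2e-8 .* sizes            # t ∝ n
quadratic_times = 5e-9 .* sizes .^ 2    # t ∝ n^2

slope_linear = loglog_slope(sizes, linear_times)
slope_quadratic = loglog_slope(sizes, quadratic_times)

println("slope for t ∝ n:   ", round(slope_linear, digits=2))
println("slope for t ∝ n^2: ", round(slope_quadratic, digits=2))
```

Real timings will be noisier, especially for short lists where fixed overhead dominates, so a fitted slope should be treated as a rough guide rather than a proof of the order.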
20 changes: 20 additions & 0 deletions benchmarks/run_benchmarks.jl
@@ -0,0 +1,20 @@
using Performance2
using BenchmarkTools

# Create lists of different sizes. For each list size, you'll evaluate two cases:
# - one where there are at most 10 distinct items (e.g., `list = rand(1:10, n)`)
# - one where the number of distinct items is half the list length (`list = rand(1:n÷2, n)`)
# Then run both `unique_2loops` and `unique_set` on both lists, measuring the elapsed time with `@belapsed`
# (from BenchmarkTools.jl). Put the list creation in a `setup` block; you can see a demo in the README for BenchmarkTools.
# Hints:
# - inside a comprehension I would typically use the syntax `@belapsed(<code goes here>, setup=(<setup code>))`
# to avoid any ambiguity about where the expressions passed to the macro `@belapsed` start and stop.
# - you may have to use `$n` to bring an outside variable into the expression passed to `@belapsed`

list_sizes = [10, 10^2, 10^3, 10^4, 10^5]

time_set10 = ...
time_sethalf = ...

time_2loops10 = ...
time_2loopshalf = ...
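As a rough illustration of the hints above, the following sketch times `Base.unique` (a stand-in, so it does not solve the assignment) inside a comprehension. It assumes BenchmarkTools is installed; `demo_sizes` and `demo_times` are names invented for this example, and the sizes are kept small so the run stays short.

```julia
using BenchmarkTools

# Small sizes keep this sketch quick; the assignment itself uses larger lists.
demo_sizes = [10, 100, 1000]

# `Base.unique` stands in for the functions you will write.
# The `setup` expression builds a fresh list outside the timed region,
# and `$n` interpolates the comprehension variable into the benchmark expression.
demo_times = [@belapsed(unique(list), setup=(list = rand(1:10, $n))) for n in demo_sizes]

println(demo_times)   # one runtime, in seconds, per entry of `demo_sizes`
```

The parenthesized macro call makes it explicit which expression is timed and which is the `setup` keyword, which is easy to get wrong inside a comprehension.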
10 changes: 10 additions & 0 deletions examples/hashtable.jl
@@ -0,0 +1,10 @@
using Performance2

# Read the source code for `create_hashtable` and `push_hashtable!` in `src/Performance2.jl`.
# (You can read `hash2index` too, although you won't call it directly.)

# For the letters of the alphabet, `'a':'z'`, push each once onto a hashtable that you create
# with `create_hashtable`. Do this four times, for hash tables of size 16, 32, 64, and 128.
# Be sure you look at the result. Then fill in the table in the README with the number of
# collisions you observe for each table size.

50 changes: 49 additions & 1 deletion src/Performance2.jl
@@ -1,5 +1,53 @@
module Performance2

# Write your package code here.
export unique_2loops, unique_set
export create_hashtable, push_hashtable!

function unique_2loops(list::AbstractVector)
    # Implement your own version of `unique`, a function that will return a new vector containing
    # each unique item in `list` (i.e., it drops duplicates).
    newlist = eltype(list)[]
    # Here, use two nested `for` loops to `push!` items onto `newlist` only if they do not appear
    # earlier in the list

    return newlist
end

function unique_set(list::AbstractVector)
    # Implement a second version of `unique`.
    # This time, insert all the items from `list` into a `Set` and then `collect` the set.
end

###
### Below here, you don't need to modify any code. But you should read & understand it!
### This is a very simple (and incorrect) "hash table" intended to give a taste of how `Set`s work;
### it is in no way intended as a serious implementation.
###

"""
    create_hashtable(::Type{T}, tablesize) → ht
Create an empty "hash table," here just a list of length `tablesize`, each element itself a list that can store all items
with the same index as computed by [`hash2index`](@ref).
"""
create_hashtable(::Type{T}, tablesize::Int) where T = [T[] for _ = 1:tablesize]

"""
    hash2index(x, tablesize) → idx
Compute the proper index `idx` for `x`, given a hash table of length `tablesize`.
"""
hash2index(x, tablesize::Int) = (hash(x) % tablesize) + 1

"""
    push_hashtable!(table, x)
Append `x` to the sub-list of items with index `idx` as computed by [`hash2index`](@ref).
!!! warning
    There is no protection against duplicates. The intended usage is to add each unique item
    and see how many "collisions" you have.
"""
push_hashtable!(table, x) = push!(table[hash2index(x, length(table))], x)

end
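To make the slot mechanics concrete, here is a minimal standalone sketch. The three definitions are copied from the module above so the snippet runs without the package. `count_collisions` is a hypothetical helper that is not part of the package, and the input (the integers 1 to 20 in a table of size 8) is chosen arbitrarily and differs from the exercise in `examples/hashtable.jl`.

```julia
# Copied from src/Performance2.jl so that this sketch runs on its own.
create_hashtable(::Type{T}, tablesize::Int) where T = [T[] for _ = 1:tablesize]
hash2index(x, tablesize::Int) = (hash(x) % tablesize) + 1
push_hashtable!(table, x) = push!(table[hash2index(x, length(table))], x)

# Hypothetical helper (not in the package): number of slots holding more than one item.
count_collisions(table) = count(slot -> length(slot) > 1, table)

# Arbitrary demo input: 20 integers into 8 slots, so some slots must share items.
table = create_hashtable(Int, 8)
for x in 1:20
    push_hashtable!(table, x)
end

println(length.(table))            # number of items that landed in each slot
println(count_collisions(table))   # slots with more than one item
```

The exact distribution depends on Julia's `hash` function, which may differ between Julia versions, so no expected output is shown here.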
6 changes: 5 additions & 1 deletion test/runtests.jl
@@ -2,5 +2,9 @@ using Performance2
using Test

@testset "Performance2.jl" begin
    # Write your tests here.
    @test unique_2loops([1, 2, 2, 2, 3, 2]) == [1, 2, 3]
    @test unique_2loops(collect("abracadabra")) == ['a', 'b', 'r', 'c', 'd']

    @test sort(unique_set([1, 2, 2, 2, 3, 2])) == [1, 2, 3]
    @test sort(unique_set(collect("abracadabra"))) == ['a', 'b', 'c', 'd', 'r']
end
