Merge pull request #673 from jlumpe/heaps-custom-ordering
Use custom orderings with heaps and nlargest/nsmallest
oxinabox authored Sep 13, 2020
2 parents 2122aa2 + bfd5303 commit db6c845
Showing 7 changed files with 218 additions and 50 deletions.
3 changes: 2 additions & 1 deletion Project.toml
@@ -1,6 +1,7 @@
name = "DataStructures"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.4"
version = "0.18.5"


[deps]
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
94 changes: 69 additions & 25 deletions docs/src/heaps.md
@@ -33,13 +33,13 @@ provides the following interface:
```julia
# Let `h` be a heap, `i` be a handle, and `v` be a value.

i = push!(h, v) # adds a value to the heap and returns a handle to v
i = push!(h, v) # adds a value to the heap and returns a handle to v

update!(h, i, v) # updates the value of an element (referred to by the handle i)
update!(h, i, v) # updates the value of an element (referred to by the handle i)

delete!(h, i) # deletes the node with handle i from the heap
delete!(h, i) # deletes the node with handle i from the heap

v, i = top_with_handle(h) # returns the top value of a heap and its handle
v, i = top_with_handle(h) # returns the top value of a heap and its handle
```

Currently, both min/max versions of binary heap (type `BinaryHeap`) and
@@ -49,38 +49,52 @@ Examples of constructing a heap:

```julia
h = BinaryMinHeap{Int}()
h = BinaryMaxHeap{Int}() # create an empty min/max binary heap of integers
h = BinaryMaxHeap{Int}() # create an empty min/max binary heap of integers

h = BinaryMinHeap([1,4,3,2])
h = BinaryMaxHeap([1,4,3,2]) # create a min/max heap from a vector
h = BinaryMaxHeap([1,4,3,2]) # create a min/max heap from a vector

h = MutableBinaryMinHeap{Int}()
h = MutableBinaryMaxHeap{Int}() # create an empty mutable min/max heap
h = MutableBinaryMaxHeap{Int}() # create an empty mutable min/max heap

h = MutableBinaryMinHeap([1,4,3,2])
h = MutableBinaryMaxHeap([1,4,3,2]) # create a mutable min/max heap from a vector
h = MutableBinaryMaxHeap([1,4,3,2]) # create a mutable min/max heap from a vector
```

Heaps may be constructed with a custom ordering. One use case for custom orderings
is to achieve faster performance with `Float` elements with the risk of random ordering
if any elements are `NaN`. The provided `DataStructures.FasterForward` and
`DataStructures.FasterReverse` orderings are optimized for this purpose.
Custom orderings may also be used for defining the order of structs as heap elements.
## Using alternate orderings

Heaps can also use alternate orderings apart from the default one defined by
`Base.isless`. This is accomplished by passing an instance of `Base.Ordering`
as the first argument to the constructor. The top of the heap will then be the
element that comes first according to this ordering.

The following example uses 2-tuples to track the index of each element in the
original array, but sorts only by the data value:

```julia
h = BinaryHeap{Float64, DataStructures.FasterForward}() # faster min heap
h = BinaryHeap{Float64, DataStructures.FasterReverse}() # faster max heap
data = collect(enumerate(["foo", "bar", "baz"]))

h = MutableBinaryHeap{Float64, DataStructures.FasterForward}() # faster mutable min heap
h = MutableBinaryHeap{Float64, DataStructures.FasterReverse}() # faster mutable max heap
h1 = BinaryHeap(data) # Standard lexicographic ordering for tuples
first(h1) # => (1, "foo")

h = BinaryHeap{MyStruct, MyStructOrdering}() # heap containing custom struct
h2 = BinaryHeap(Base.By(last), data) # Order by 2nd element only
first(h2) # => (2, "bar")
```

If the ordering type is a singleton it can be passed as a type parameter to the
constructor instead:

```julia
BinaryHeap{T, O}() # => BinaryHeap{T}(O())
MutableBinaryHeap{T, O}() # => MutableBinaryHeap{T}(O())
```

## Min-max heaps
Min-max heaps maintain the minimum _and_ the maximum of a set,
allowing both to be retrieved in constant (`O(1)`) time.
The min-max heaps in this package are subtypes of `AbstractMinMaxHeap <: AbstractHeap`
and have the same interface as other heaps with the following additions:

```julia
# Let h be a min-max heap, k an integer
minimum(h) # return the smallest element
@@ -95,6 +109,7 @@ popmax!(h, k) # remove and return the largest k elements
popall!(h) # remove and return all the elements, sorted smallest to largest
popall!(h, o) # remove and return all the elements according to ordering o
```

The usual `first(h)` and `pop!(h)` are defined to be `minimum(h)` and `popmin!(h)`,
respectively.

@@ -104,7 +119,7 @@ This package includes an implementation of a binary min-max heap (`BinaryMinMaxH
Examples:
```julia
h = BinaryMinMaxHeap{Int}() # create an empty min-max heap with integer values
h = BinaryMinMaxHeap{Int}() # create an empty min-max heap with integer values

h = BinaryMinMaxHeap([1, 2, 3, 4]) # create a min-max heap from a vector
```
@@ -115,13 +130,42 @@ Heaps can be used to extract the largest or smallest elements of an
array without sorting the entire array first:

```julia
nlargest(3, [0,21,-12,68,-25,14]) # => [68,21,14]
nsmallest(3, [0,21,-12,68,-25,14]) # => [-25,-12,0]
data = [0,21,-12,68,-25,14]
nlargest(3, data) # => [68,21,14]
nsmallest(3, data) # => [-25,-12,0]
```

Both methods also support the `by` and `lt` keywords to customize the sort order,
as in `Base.sort`:

```julia
nlargest(3, data, by=x -> x^2) # => [68,-25,21]
nsmallest(3, data, by=x -> x^2) # => [0,-12,14]
```

Note that if the array contains floats and is free of NaN values,
then the following alternatives may be used to achieve a 2x performance boost.
The lower-level `DataStructures.nextreme` function takes a `Base.Ordering`
instance as the first argument and returns the first `n` elements according to
this ordering:

```julia
DataStructures.nextreme(Base.Forward, n, a) # Equivalent to nsmallest(n, a)
```
DataStructures.nextreme(DataStructures.FasterReverse(), n, a) # faster nlargest(n, a)
DataStructures.nextreme(DataStructures.FasterForward(), n, a) # faster nsmallest(n, a)


## Improving performance with Float data

One use case for custom orderings is faster performance with floating-point
elements (e.g. `Float64`), at the risk of an arbitrary ordering if any element is `NaN`.
The provided `DataStructures.FasterForward` and `DataStructures.FasterReverse`
orderings are optimized for this purpose and may achieve a 2x performance boost:

```julia
h = BinaryHeap{Float64, DataStructures.FasterForward}() # faster min heap
h = BinaryHeap{Float64, DataStructures.FasterReverse}() # faster max heap

h = MutableBinaryHeap{Float64, DataStructures.FasterForward}() # faster mutable min heap
h = MutableBinaryHeap{Float64, DataStructures.FasterReverse}() # faster mutable max heap

DataStructures.nextreme(DataStructures.FasterReverse(), n, a) # faster nlargest(n, a)
DataStructures.nextreme(DataStructures.FasterForward(), n, a) # faster nsmallest(n, a)
```
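
As a rough illustration of the trade-off described in this section, the following Base-only sketch compares the default `isless`-based ordering with an ordering built on `<`. The `Lt(<)` ordering is an assumed stand-in for `DataStructures.FasterForward`, chosen because the docstrings in this commit describe the faster version as equivalent to `sort(arr, lt = <)`; it is not the package's own type.

```julia
using Base.Order: Forward, Lt, lt

# Assumption: an ordering based on `<` approximates what FasterForward does.
strict = Lt(<)

# With `isless`, NaN sorts after every other float, so the order is total.
nan_last_fwd = lt(Forward, 1.0, NaN)     # true
nan_first_fwd = lt(Forward, NaN, 1.0)    # false

# With `<`, NaN compares false in both directions, so it has no defined
# position and its placement in a heap or sort becomes arbitrary.
nan_last_strict = lt(strict, 1.0, NaN)   # false
nan_first_strict = lt(strict, NaN, 1.0)  # false

# On NaN-free data both orderings give the same result.
clean = [3.5, -1.0, 2.0]
same_on_clean = sort(clean; order=strict) == sort(clean)  # true
```

This is why the faster orderings are only suggested for data known to be free of `NaN` values.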
28 changes: 18 additions & 10 deletions src/heaps.jl
@@ -129,37 +129,45 @@ function nextreme(ord::Base.Ordering, n::Int, arr::AbstractVector{T}) where T
end

"""
nlargest(n, arr)
nlargest(n, arr; kw...)
Return the `n` largest elements of the array `arr`.
Equivalent to:
sort(arr, order = Base.Reverse)[1:min(n, end)]
sort(arr, kw..., rev=true)[1:min(n, end)]
Note that if `arr` contains floats and is free of NaN values,
then the following alternative may be used to achieve 2x performance.
then the following alternative may be used to achieve 2x performance:
DataStructures.nextreme(DataStructures.FasterReverse(), n, arr)
This faster version is equivalent to:
sort(arr, lt = >)[1:min(n, end)]
"""
function nlargest(n::Int, arr::AbstractVector)
return nextreme(Base.Reverse, n, arr)
function nlargest(n::Int, arr::AbstractVector; lt=isless, by=identity)
order = Base.ReverseOrdering(Base.ord(lt, by, nothing))
return nextreme(order, n, arr)
end

"""
nsmallest(n, arr)
nsmallest(n, arr; kw...)
Return the `n` smallest elements of the array `arr`.
Equivalent to:
sort(arr, order = Base.Forward)[1:min(n, end)]
sort(arr; kw...)[1:min(n, end)]
Note that if `arr` contains floats and is free of NaN values,
then the following alternative may be used to achieve 2x performance.
then the following alternative may be used to achieve 2x performance:
DataStructures.nextreme(DataStructures.FasterForward(), n, arr)
This faster version is equivalent to:
sort(arr, lt = <)[1:min(n, end)]
"""
function nsmallest(n::Int, arr::AbstractVector)
return nextreme(Base.Forward, n, arr)
function nsmallest(n::Int, arr::AbstractVector; lt=isless, by=identity)
order = Base.ord(lt, by, nothing)
return nextreme(order, n, arr)
end
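
The new keyword handling can be checked against a plain sort without loading the package. This sketch builds the orderings the same way as the new method bodies above; `first_n`, `smallest_order`, and `largest_order` are hypothetical helper names, and the full sort serves only as a reference, not as the heap-based `nextreme`.

```julia
using Base.Order: Ordering, ReverseOrdering, ord

# Reference behaviour: sort everything, then take the first n elements.
# (The package avoids the full sort by using a heap.)
first_n(o::Ordering, n::Int, arr::AbstractVector) = sort(arr; order=o)[1:min(n, end)]

# Orderings constructed as in the new nsmallest/nlargest bodies.
smallest_order(; lt=isless, by=identity) = ord(lt, by, nothing)
largest_order(; lt=isless, by=identity) = ReverseOrdering(ord(lt, by, nothing))

data = [0, 21, -12, 68, -25, 14]

largest3 = first_n(largest_order(), 3, data)                    # [68, 21, 14]
smallest3 = first_n(smallest_order(), 3, data)                  # [-25, -12, 0]
largest3_sq = first_n(largest_order(by = x -> x^2), 3, data)    # [68, -25, 21]
smallest3_sq = first_n(smallest_order(by = x -> x^2), 3, data)  # [0, -12, 14]
```

The results match the values given in the updated `docs/src/heaps.md` examples for `nlargest` and `nsmallest`.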
18 changes: 13 additions & 5 deletions src/heaps/binary_heap.jl
@@ -34,22 +34,30 @@ mutable struct BinaryHeap{T, O <: Base.Ordering} <: AbstractHeap{T}
ordering::O
valtree::Vector{T}

function BinaryHeap{T, O}() where {T,O}
new{T,O}(O(), Vector{T}())
function BinaryHeap{T}(ordering::Base.Ordering) where T
new{T, typeof(ordering)}(ordering, Vector{T}())
end

function BinaryHeap{T, O}(xs) where {T,O}
ordering = O()
function BinaryHeap{T}(ordering::Base.Ordering, xs::AbstractVector) where T
valtree = heapify(xs, ordering)
new{T,O}(ordering, valtree)
new{T, typeof(ordering)}(ordering, valtree)
end
end

BinaryHeap(ordering::Base.Ordering, xs::AbstractVector{T}) where T = BinaryHeap{T}(ordering, xs)

# Constructors using singleton order types as type parameters rather than arguments
BinaryHeap{T, O}() where {T, O<:Base.Ordering} = BinaryHeap{T}(O())
BinaryHeap{T, O}(xs::AbstractVector) where {T, O<:Base.Ordering} = BinaryHeap{T}(O(), xs)

# Forward/reverse ordering type aliases
const BinaryMinHeap{T} = BinaryHeap{T, Base.ForwardOrdering}
const BinaryMaxHeap{T} = BinaryHeap{T, Base.ReverseOrdering}

BinaryMinHeap(xs::AbstractVector{T}) where T = BinaryMinHeap{T}(xs)
BinaryMaxHeap(xs::AbstractVector{T}) where T = BinaryMaxHeap{T}(xs)


#################################################
#
# interfaces
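
The constructor layout in this diff can be sketched on a toy type. `ToyHeap` is a hypothetical stand-in that mirrors only the constructor pattern (an ordering instance as the first argument, plus a convenience method for singleton ordering types); it performs no heap operations.

```julia
using Base.Order: Ordering, ForwardOrdering, Forward, By

struct ToyHeap{T, O <: Ordering}
    ordering::O
    values::Vector{T}

    # Inner constructor: takes an ordering instance and infers its type,
    # so orderings that carry data (such as By(f)) can be used.
    function ToyHeap{T}(ordering::Ordering) where T
        new{T, typeof(ordering)}(ordering, Vector{T}())
    end
end

# Convenience constructor for singleton ordering types given as a type parameter.
ToyHeap{T, O}() where {T, O <: Ordering} = ToyHeap{T}(O())

h_param = ToyHeap{Int, ForwardOrdering}()        # ordering built by calling O()
h_inst = ToyHeap{Tuple{Int, String}}(By(last))   # ordering passed as an instance
```

The instance form is what makes orderings like `Base.By(last)` usable, since such an ordering cannot be rebuilt from its type alone.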
18 changes: 12 additions & 6 deletions src/heaps/mutable_binary_heap.jl
@@ -160,26 +160,32 @@ mutable struct MutableBinaryHeap{T, O <: Base.Ordering} <: AbstractMutableHeap{T
nodes::Vector{MutableBinaryHeapNode{T}}
node_map::Vector{Int}

function MutableBinaryHeap{T, O}() where {T, O}
ordering = O()
function MutableBinaryHeap{T}(ordering::Base.Ordering) where T
nodes = Vector{MutableBinaryHeapNode{T}}()
node_map = Vector{Int}()
new{T, O}(ordering, nodes, node_map)
new{T, typeof(ordering)}(ordering, nodes, node_map)
end

function MutableBinaryHeap{T, O}(xs::AbstractVector{T}) where {T, O}
ordering = O()
function MutableBinaryHeap{T}(ordering::Base.Ordering, xs::AbstractVector) where T
nodes, node_map = _make_mutable_binary_heap(ordering, T, xs)
new{T, O}(ordering, nodes, node_map)
new{T, typeof(ordering)}(ordering, nodes, node_map)
end
end

MutableBinaryHeap(ordering::Base.Ordering, xs::AbstractVector{T}) where T = MutableBinaryHeap{T}(ordering, xs)

# Constructors using singleton order types as type parameters rather than arguments
MutableBinaryHeap{T, O}() where {T, O<:Base.Ordering} = MutableBinaryHeap{T}(O())
MutableBinaryHeap{T, O}(xs::AbstractVector) where {T, O<:Base.Ordering} = MutableBinaryHeap{T}(O(), xs)

# Forward/reverse ordering type aliases
const MutableBinaryMinHeap{T} = MutableBinaryHeap{T, Base.ForwardOrdering}
const MutableBinaryMaxHeap{T} = MutableBinaryHeap{T, Base.ReverseOrdering}

MutableBinaryMinHeap(xs::AbstractVector{T}) where T = MutableBinaryMinHeap{T}(xs)
MutableBinaryMaxHeap(xs::AbstractVector{T}) where T = MutableBinaryMaxHeap{T}(xs)


function Base.show(io::IO, h::MutableBinaryHeap)
print(io, "MutableBinaryHeap(")
nodes = h.nodes

2 comments on commit db6c845

@oxinabox

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/21321

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.18.5 -m "<description of version>" db6c845a00ff71cf6df9e39462fc0181c8bc2892
git push origin v0.18.5
